Pennsylvania Assessment Data: PSSA and Keystone Results • paschooldata

library(paschooldata)
library(dplyr)
library(tidyr)
library(ggplot2)

Pennsylvania administers two major assessment programs through the Pennsylvania Department of Education (PDE):

PSSA (Pennsylvania System of School Assessment): Grades 3-8, testing English Language Arts and Mathematics
Keystone Exams: End-of-course exams in Algebra I, Biology, and Literature, with results reported for grade 11

This vignette explores assessment data using the paschooldata package, which provides direct access to PDE’s assessment files.

Data Notes

PSSA:
- Grades 3-8
- Subjects: English Language Arts, Math
- Years available: 2015-2019, 2021-2025 (2020 cancelled due to COVID-19)
- Levels: School, District, State

Keystone Exams:
- Grade 11 (end-of-course)
- Subjects: Algebra I, Biology, Literature
- Years available: 2015-2019, 2021-2025

Proficiency Levels:
- Below Basic: Limited knowledge
- Basic: Partial knowledge
- Proficient: Meets grade-level standards
- Advanced: Exceeds grade-level standards
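In the tables below, `pct_proficient_above` equals the Advanced and Proficient shares added together (for 2025 state math, 16.6 + 25.1 = 41.7). A minimal base-R sketch of that relationship, using the statewide math percentages printed in section 1; `summarize_levels()` is a hypothetical helper, not a paschooldata function:

```r
# Hypothetical helper (not part of paschooldata): derive summary shares
# from the four PSSA performance-level percentages
summarize_levels <- function(advanced, proficient, basic, below_basic) {
  c(
    proficient_above = advanced + proficient,   # meets or exceeds standards
    below_proficient = basic + below_basic,     # Basic + Below Basic
    total            = advanced + proficient + basic + below_basic
  )
}

# 2025 statewide PSSA Math shares, copied from the state_math output in section 1
math_2025 <- summarize_levels(advanced = 16.6, proficient = 25.1,
                              basic = 27.4, below_basic = 30.9)
round(math_2025, 1)
# proficient_above = 41.7, below_proficient = 58.3, total = 100
```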

Data Source: PA DOE Assessment Data

Data Preparation

# Fetch 2025 assessment data using package functions
pssa_state <- fetch_pssa(2025, level = "state", tidy = FALSE, use_cache = TRUE)
pssa_school <- fetch_pssa(2025, level = "school", tidy = FALSE, use_cache = TRUE)
keystone_state <- fetch_keystone(2025, level = "state", tidy = FALSE, use_cache = TRUE)

# Clean up school data to standard columns
pssa_school <- pssa_school %>%
  select(any_of(c("aun", "county", "district_name", "school_name", "subject",
                  "group", "grade", "n_scored", "pct_advanced", "pct_proficient",
                  "pct_basic", "pct_below_basic", "pct_proficient_above", "end_year")))

1. Only 42% of Pennsylvania students are proficient in math

Fewer than half of Pennsylvania students meet grade-level math standards. Over 700,000 students took the PSSA Math exam in 2025, and 58% scored below proficient.

state_math <- pssa_state %>%
  filter(subject == "Math",
         group == "All Students",
         grade == "Total") %>%
  select(n_scored, pct_advanced, pct_proficient, pct_basic,
         pct_below_basic, pct_proficient_above)
stopifnot(nrow(state_math) > 0)
state_math
#> # A tibble: 1 × 6
#>   n_scored pct_advanced pct_proficient pct_basic pct_below_basic
#>      <int>        <dbl>          <dbl>     <dbl>           <dbl>
#> 1   703819         16.6           25.1      27.4            30.9
#> # ℹ 1 more variable: pct_proficient_above <dbl>

math_dist <- pssa_state %>%
  filter(subject == "Math",
         group == "All Students",
         grade == "Total") %>%
  select(pct_advanced, pct_proficient, pct_basic, pct_below_basic) %>%
  pivot_longer(everything(), names_to = "Level", values_to = "Percent") %>%
  mutate(Level = gsub("pct_", "", Level),
         Level = tools::toTitleCase(gsub("_", " ", Level)),
         Level = factor(Level, levels = c("Below Basic", "Basic", "Proficient", "Advanced")))
stopifnot(nrow(math_dist) > 0)

ggplot(math_dist, aes(x = Level, y = Percent, fill = Level)) +
  geom_col() +
  scale_fill_manual(values = c("Below Basic" = "#d73027", "Basic" = "#fc8d59",
                                "Proficient" = "#91bfdb", "Advanced" = "#4575b4")) +
  labs(title = "Pennsylvania Math Proficiency Distribution (2025)",
       subtitle = "703,819 students tested",
       x = NULL, y = "Percent of Students") +
  theme_minimal() +
  theme(legend.position = "none")

2. ELA outperforms math by 7 percentage points

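Nearly half of students (48.5%) meet ELA standards, compared to 41.7% in math, a 6.8-point gap. The gap is not uniform across grades, though: math proficiency is actually higher than ELA in grades 3 and 4, and the ELA advantage opens up from grade 5 onward as math proficiency falls (see sections 3 and 17).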

state_ela <- pssa_state %>%
  filter(subject == "English Language Arts",
         group == "All Students",
         grade == "Total") %>%
  select(n_scored, pct_advanced, pct_proficient, pct_basic,
         pct_below_basic, pct_proficient_above)
stopifnot(nrow(state_ela) > 0)
state_ela
#> # A tibble: 1 × 6
#>   n_scored pct_advanced pct_proficient pct_basic pct_below_basic
#>      <int>        <dbl>          <dbl>     <dbl>           <dbl>
#> 1   703650         12.5             36      36.4            15.1
#> # ℹ 1 more variable: pct_proficient_above <dbl>

subjects <- pssa_state %>%
  filter(group == "All Students", grade == "Total") %>%
  select(subject, pct_proficient_above)
stopifnot(nrow(subjects) > 0)

ggplot(subjects, aes(x = subject, y = pct_proficient_above, fill = subject)) +
  geom_col() +
  geom_text(aes(label = paste0(pct_proficient_above, "%")), vjust = -0.5, size = 5) +
  scale_fill_manual(values = c("English Language Arts" = "#4575b4", "Math" = "#d73027")) +
  scale_y_continuous(limits = c(0, 60)) +
  labs(title = "ELA Outperforms Math by 7 Percentage Points",
       subtitle = "2025 PSSA % Proficient and Above",
       x = NULL, y = "% Proficient") +
  theme_minimal() +
  theme(legend.position = "none")
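The 6.8-point statewide gap hides large grade-level differences. This sketch recomputes ELA minus math by grade from the 2025 values printed in sections 3 and 17, hard-coded here so it runs without downloading data:

```r
# 2025 statewide % Proficient and Above by grade, copied from the
# grade_math (section 3) and grade3_ela (section 17) outputs
by_grade <- data.frame(
  grade = 3:8,
  math  = c(53.6, 50.6, 43.9, 37.8, 33.7, 30.5),
  ela   = c(48.6, 48.4, 44.7, 50.8, 49.2, 49.3)
)

# Positive values mean ELA is ahead of math in that grade
by_grade$ela_minus_math <- round(by_grade$ela - by_grade$math, 1)

# Grades where math proficiency is higher than ELA
math_leads <- by_grade$grade[by_grade$ela_minus_math < 0]   # grades 3 and 4
by_grade
```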

3. Math proficiency drops 23 points from grade 3 to grade 8

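Third graders start at 54% proficient in math, but only 31% of eighth graders are proficient, a 23 percentage point difference. These are different cohorts tested in the same year, so this is a cross-sectional snapshot rather than the same students losing ground; still, the decline at every grade is consistent with students struggling as math content becomes more complex.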

grade_math <- pssa_state %>%
  filter(subject == "Math",
         group == "All Students",
         grade %in% c("3", "4", "5", "6", "7", "8")) %>%
  select(grade, n_scored, pct_proficient_above) %>%
  arrange(as.numeric(grade))
stopifnot(nrow(grade_math) > 0)
grade_math
#> # A tibble: 6 × 3
#>   grade n_scored pct_proficient_above
#>   <chr>    <int>                <dbl>
#> 1 3       118946                 53.6
#> 2 4       114405                 50.6
#> 3 5       118011                 43.9
#> 4 6       117605                 37.8
#> 5 7       117200                 33.7
#> 6 8       117652                 30.5

grade_math_chart <- pssa_state %>%
  filter(subject == "Math",
         group == "All Students",
         grade %in% c("3", "4", "5", "6", "7", "8")) %>%
  mutate(grade = factor(grade, levels = c("3", "4", "5", "6", "7", "8")))
stopifnot(nrow(grade_math_chart) > 0)

ggplot(grade_math_chart, aes(x = grade, y = pct_proficient_above)) +
  geom_col(fill = "#e63946") +
  geom_text(aes(label = paste0(pct_proficient_above, "%")), vjust = -0.5) +
  scale_y_continuous(limits = c(0, 60)) +
  labs(title = "Math Proficiency Falls Grade by Grade",
       subtitle = "2025 PSSA Math - 23 point drop from Grade 3 to Grade 8",
       x = "Grade", y = "% Proficient") +
  theme_minimal()
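To see where the decline is steepest, `diff()` gives the grade-to-grade change. Values are copied from the `grade_math` output above; this is a sketch for inspection, not part of the package:

```r
# 2025 statewide PSSA Math % Proficient and Above, grades 3-8 (grade_math output)
math_pct <- c(`3` = 53.6, `4` = 50.6, `5` = 43.9, `6` = 37.8, `7` = 33.7, `8` = 30.5)

# Grade-to-grade change; names are the later grade of each pair.
# Different cohorts tested in the same year, not the same students over time.
step_change <- round(diff(math_pct), 1)   # -3.0, -6.7, -6.1, -4.1, -3.2

total_drop <- round(math_pct[["8"]] - math_pct[["3"]], 1)   # -23.1
steepest   <- names(which.min(step_change))                 # "5": grade 4 to grade 5
```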

4. Nearly 1 in 3 students score “Below Basic” in math

Almost 31% of Pennsylvania students score at the lowest level in math, meaning they have not yet mastered foundational concepts. That is roughly double the ELA rate of 15.1%.

below_basic <- pssa_state %>%
  filter(group == "All Students", grade == "Total") %>%
  select(subject, pct_below_basic)
stopifnot(nrow(below_basic) > 0)
below_basic
#> # A tibble: 2 × 2
#>   subject               pct_below_basic
#>   <chr>                           <dbl>
#> 1 English Language Arts            15.1
#> 2 Math                             30.9

below_basic_chart <- pssa_state %>%
  filter(group == "All Students", grade == "Total")
stopifnot(nrow(below_basic_chart) > 0)

ggplot(below_basic_chart, aes(x = subject, y = pct_below_basic, fill = subject)) +
  geom_col() +
  geom_text(aes(label = paste0(pct_below_basic, "%")), vjust = -0.5) +
  scale_fill_manual(values = c("English Language Arts" = "#fc8d59", "Math" = "#d73027")) +
  scale_y_continuous(limits = c(0, 35)) +
  labs(title = "Math Has Double the 'Below Basic' Rate of ELA",
       subtitle = "2025 PSSA",
       x = NULL, y = "% Below Basic") +
  theme_minimal() +
  theme(legend.position = "none")
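A quick check of the "double" claim, using the two statewide Below Basic rates printed above:

```r
# 2025 statewide % Below Basic, from the below_basic output above
below_basic_pct <- c(ela = 15.1, math = 30.9)

# Percentage-point gap and ratio between the two subjects
pp_gap <- below_basic_pct[["math"]] - below_basic_pct[["ela"]]   # 15.8 points
ratio  <- below_basic_pct[["math"]] / below_basic_pct[["ela"]]   # about 2.05
round(c(pp_gap = pp_gap, ratio = ratio), 2)
```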

5. Philadelphia’s math proficiency is only 22% - half the state average

Philadelphia City SD’s 170 schools average just 22% math proficiency (an unweighted mean of school-level rates), compared to 42% statewide. The state’s largest district tested about 45,000 students but lags dramatically behind.

philly_sum <- pssa_school %>%
  filter(district_name == "PHILADELPHIA CITY SD",
         subject == "Math",
         group == "All Students",
         grade == "Total") %>%
  summarize(
    n_schools = n(),
    total_scored = sum(n_scored, na.rm = TRUE),
    avg_proficient = round(mean(pct_proficient_above, na.rm = TRUE), 1)
  )
stopifnot(nrow(philly_sum) > 0)
philly_sum
#> # A tibble: 1 × 3
#>   n_schools total_scored avg_proficient
#>       <int>        <int>          <dbl>
#> 1       170        45030           21.6

philly_math <- pssa_school %>%
  filter(district_name == "PHILADELPHIA CITY SD",
         subject == "Math",
         group == "All Students",
         grade == "Total") %>%
  filter(!is.na(pct_proficient_above))
stopifnot(nrow(philly_math) > 0)

ggplot(philly_math, aes(x = pct_proficient_above)) +
  geom_histogram(binwidth = 5, fill = "#e63946", color = "white") +
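  # State average comes from state_math (section 1) instead of a hard-coded value
  geom_vline(xintercept = state_math$pct_proficient_above,
             linetype = "dashed", color = "#003366", linewidth = 1) +
  annotate("text", x = 50, y = 22,
           label = paste0("State Avg: ", state_math$pct_proficient_above, "%"),
           color = "#003366") +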
  labs(title = "Philadelphia Schools' Math Proficiency Distribution",
       subtitle = "Most schools fall far below state average",
       x = "% Proficient", y = "Number of Schools") +
  theme_minimal()
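The district figures in this vignette are unweighted means of school-level rates, so a small school counts as much as a large one, while the 41.7% state figure counts students. The toy numbers below are invented purely to show how the two can diverge; `weighted.mean()` is base R (stats):

```r
# Toy numbers for illustration only (NOT real Philadelphia schools):
# one large low-scoring school and two small higher-scoring ones
toy <- data.frame(
  n_scored             = c(500, 100, 100),
  pct_proficient_above = c(10, 40, 40)
)

# What the vignette's summarize() computes: every school counts equally
unweighted <- mean(toy$pct_proficient_above, na.rm = TRUE)   # 30

# Student-weighted alternative; rows with NA rates are dropped along with their weights
weighted <- weighted.mean(toy$pct_proficient_above, toy$n_scored, na.rm = TRUE)   # about 18.6
```

Swapping `mean()` for `weighted.mean(pct_proficient_above, n_scored, na.rm = TRUE)` inside the `summarize()` calls would give a student-weighted district rate; whether that changes any comparison in this vignette has not been checked here.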

6. Central Bucks outperforms Philadelphia by 45 percentage points

The suburban-urban divide is stark. Central Bucks SD averages 67% math proficiency while Philadelphia averages 22%. Same state assessments, radically different outcomes.

major_districts <- c("PHILADELPHIA CITY SD", "PITTSBURGH SD",
                     "CENTRAL BUCKS SD", "NORTH PENN SD")

dist_compare <- pssa_school %>%
  filter(district_name %in% major_districts,
         subject == "Math",
         group == "All Students",
         grade == "Total") %>%
  group_by(district_name) %>%
  summarize(
    n_schools = n(),
    total_scored = sum(n_scored, na.rm = TRUE),
    avg_proficient = round(mean(pct_proficient_above, na.rm = TRUE), 1),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_proficient))
stopifnot(nrow(dist_compare) > 0)
dist_compare
#> # A tibble: 4 × 4
#>   district_name        n_schools total_scored avg_proficient
#>   <chr>                    <int>        <int>          <dbl>
#> 1 CENTRAL BUCKS SD            20         7684           66.8
#> 2 NORTH PENN SD               16         5361           61.4
#> 3 PITTSBURGH SD               48         7114           26.2
#> 4 PHILADELPHIA CITY SD       170        45030           21.6

district_summary <- pssa_school %>%
  filter(district_name %in% major_districts,
         subject == "Math",
         group == "All Students",
         grade == "Total") %>%
  group_by(district_name) %>%
  summarize(avg_proficient = mean(pct_proficient_above, na.rm = TRUE),
            .groups = "drop")
stopifnot(nrow(district_summary) > 0)

ggplot(district_summary, aes(x = reorder(district_name, avg_proficient),
                              y = avg_proficient,
                              fill = avg_proficient > 40)) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(values = c("TRUE" = "#2a9d8f", "FALSE" = "#e63946")) +
  labs(title = "The Urban-Suburban Divide",
       subtitle = "2025 PSSA Math % Proficient",
       x = NULL, y = "% Proficient") +
  theme_minimal() +
  theme(legend.position = "none")

7. Peters Township leads the state at 88% math proficiency

Pennsylvania’s top-performing districts cluster in the Pittsburgh suburbs (Allegheny and Washington counties) and the Philadelphia suburbs (Montgomery and Delaware counties). Among districts with at least 500 students tested, all of the top 10 exceed 78% proficiency.

top_dist <- pssa_school %>%
  filter(subject == "Math",
         group == "All Students",
         grade == "Total") %>%
  group_by(district_name, county) %>%
  summarize(
    n_schools = n(),
    total_scored = sum(n_scored, na.rm = TRUE),
    avg_proficient = round(mean(pct_proficient_above, na.rm = TRUE), 1),
    .groups = "drop"
  ) %>%
  filter(total_scored >= 500) %>%
  arrange(desc(avg_proficient)) %>%
  head(10)
stopifnot(nrow(top_dist) > 0)
top_dist
#> # A tibble: 10 × 5
#>    district_name             county     n_schools total_scored avg_proficient
#>    <chr>                     <chr>          <int>        <int>          <dbl>
#>  1 PETERS TOWNSHIP SD        WASHINGTON         4         1784           87.9
#>  2 FOX CHAPEL AREA SD        ALLEGHENY          5         1906           84.3
#>  3 UPPER ST. CLAIR SD        ALLEGHENY          5         1793           84.1
#>  4 LOWER MERION SD           MONTGOMERY         9         3678           82.7
#>  5 HAMPTON TOWNSHIP SD       ALLEGHENY          4         1156           82.4
#>  6 RADNOR TOWNSHIP SD        DELAWARE           4         1575           82.3
#>  7 WEST ALLEGHENY SD         ALLEGHENY          4         1518           82.2
#>  8 COLONIAL SD               MONTGOMERY         6         2526           79.7
#>  9 NORTH ALLEGHENY SD        ALLEGHENY         10         3887           79.4
#> 10 WALLINGFORD-SWARTHMORE SD DELAWARE           4         1706           78.5

top_districts <- pssa_school %>%
  filter(subject == "Math",
         group == "All Students",
         grade == "Total") %>%
  group_by(district_name) %>%
  summarize(
    total_scored = sum(n_scored, na.rm = TRUE),
    avg_proficient = mean(pct_proficient_above, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(total_scored >= 500) %>%
  arrange(desc(avg_proficient)) %>%
  head(10)
stopifnot(nrow(top_districts) > 0)

ggplot(top_districts, aes(x = reorder(district_name, avg_proficient),
                           y = avg_proficient)) +
  geom_col(fill = "#2a9d8f") +
  coord_flip() +
  labs(title = "Pennsylvania's Top 10 Math Districts (2025)",
       subtitle = "Districts with 500+ students tested",
       x = NULL, y = "% Proficient") +
  theme_minimal()
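The county pattern can be checked directly from the `county` column of the `top_dist` table. The values are copied from the printed output; with the live data frame, `dplyr::count(top_dist, county, sort = TRUE)` should give the same tally:

```r
# County column of the top_dist output above (2025, districts with 500+ tested)
top10_counties <- c("WASHINGTON", "ALLEGHENY", "ALLEGHENY", "MONTGOMERY",
                    "ALLEGHENY", "DELAWARE", "ALLEGHENY", "MONTGOMERY",
                    "ALLEGHENY", "DELAWARE")

county_counts <- table(top10_counties)
sort(county_counts, decreasing = TRUE)   # ALLEGHENY 5, DELAWARE 2, MONTGOMERY 2, WASHINGTON 1

# Pittsburgh-area vs Philadelphia-area districts in the top 10
pittsburgh_area   <- sum(county_counts[c("ALLEGHENY", "WASHINGTON")])   # 6
philadelphia_area <- sum(county_counts[c("MONTGOMERY", "DELAWARE")])    # 4
```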

8. Harrisburg has the lowest math proficiency among major urban districts at 6%

Among the eight urban districts compared here, Harrisburg City SD has the lowest math proficiency at just 5.7%. Every other district in the group, including Pittsburgh and Philadelphia, outperforms the state capital.

urban_districts <- c("PHILADELPHIA CITY SD", "PITTSBURGH SD",
                     "ALLENTOWN CITY SD", "READING SD", "ERIE CITY SD",
                     "SCRANTON SD", "HARRISBURG CITY SD", "LANCASTER SD")

urban_dist <- pssa_school %>%
  filter(district_name %in% urban_districts,
         subject == "Math",
         group == "All Students",
         grade == "Total") %>%
  group_by(district_name) %>%
  summarize(
    n_schools = n(),
    total_scored = sum(n_scored, na.rm = TRUE),
    avg_proficient = round(mean(pct_proficient_above, na.rm = TRUE), 1),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_proficient))
stopifnot(nrow(urban_dist) > 0)
urban_dist
#> # A tibble: 8 × 4
#>   district_name        n_schools total_scored avg_proficient
#>   <chr>                    <int>        <int>          <dbl>
#> 1 SCRANTON SD                 14         3281           27  
#> 2 LANCASTER SD                18         3799           26.9
#> 3 PITTSBURGH SD               48         7114           26.2
#> 4 PHILADELPHIA CITY SD       170        45030           21.6
#> 5 ALLENTOWN CITY SD           17         5887           19.3
#> 6 ERIE CITY SD                13         3864           15.1
#> 7 READING SD                  18         6227           14.5
#> 8 HARRISBURG CITY SD          10         2571            5.7

urban_summary <- pssa_school %>%
  filter(district_name %in% urban_districts,
         subject == "Math",
         group == "All Students",
         grade == "Total") %>%
  group_by(district_name) %>%
  summarize(avg_proficient = mean(pct_proficient_above, na.rm = TRUE),
            .groups = "drop")
stopifnot(nrow(urban_summary) > 0)

ggplot(urban_summary, aes(x = reorder(district_name, avg_proficient),
                           y = avg_proficient)) +
  geom_col(fill = "#e63946") +
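  # State average comes from state_math (section 1) instead of a hard-coded value
  geom_hline(yintercept = state_math$pct_proficient_above,
             linetype = "dashed", color = "#003366") +
  annotate("text", x = 6, y = 44,
           label = paste0("State Avg: ", state_math$pct_proficient_above, "%"),
           color = "#003366") +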
  coord_flip() +
  labs(title = "Pennsylvania's Big City Math Crisis",
       subtitle = "All major urban districts below state average",
       x = NULL, y = "% Proficient") +
  theme_minimal()

9. Keystone Literature leads at 62% proficiency

Among the three Keystone end-of-course exams, Literature has the highest proficiency at 62%, followed by Biology at 49% and Algebra I trailing at 44%.

keystone_ov <- keystone_state %>%
  filter(group == "All Students") %>%
  select(subject, n_scored, pct_proficient_above) %>%
  arrange(desc(pct_proficient_above))
stopifnot(nrow(keystone_ov) > 0)
keystone_ov
#> # A tibble: 3 × 3
#>   subject    n_scored pct_proficient_above
#>   <chr>         <int>                <dbl>
#> 1 Literature   120804                 62.1
#> 2 Biology      121380                 49.4
#> 3 Algebra I    121876                 44.3

keystone_chart_data <- keystone_state %>%
  filter(group == "All Students")
stopifnot(nrow(keystone_chart_data) > 0)

ggplot(keystone_chart_data, aes(x = reorder(subject, pct_proficient_above),
             y = pct_proficient_above,
             fill = subject)) +
  geom_col() +
  geom_text(aes(label = paste0(pct_proficient_above, "%")), hjust = -0.2) +
  coord_flip() +
  scale_fill_brewer(palette = "Set2") +
  scale_y_continuous(limits = c(0, 80)) +
  labs(title = "Keystone Exam Proficiency Rates (2025)",
       subtitle = "Grade 11 End-of-Course Exams",
       x = NULL, y = "% Proficient") +
  theme_minimal() +
  theme(legend.position = "none")

10. Nearly 1 in 5 students score “Below Basic” on Algebra I

Algebra I has the second-highest “Below Basic” rate among the Keystone exams at 19.3%, behind Biology at 23.4%. Nearly one in five students reached grade 11 without demonstrating basic mastery of Algebra I content on the exam.

keystone_bb <- keystone_state %>%
  filter(group == "All Students") %>%
  select(subject, pct_below_basic) %>%
  arrange(desc(pct_below_basic))
stopifnot(nrow(keystone_bb) > 0)
keystone_bb
#> # A tibble: 3 × 2
#>   subject    pct_below_basic
#>   <chr>                <dbl>
#> 1 Biology               23.4
#> 2 Algebra I             19.3
#> 3 Literature            13.7

keystone_long <- keystone_state %>%
  filter(group == "All Students") %>%
  select(subject, pct_advanced, pct_proficient, pct_basic, pct_below_basic) %>%
  pivot_longer(-subject, names_to = "Level", values_to = "Percent") %>%
  mutate(Level = gsub("pct_", "", Level),
         Level = tools::toTitleCase(gsub("_", " ", Level)),
         Level = factor(Level, levels = c("Below Basic", "Basic", "Proficient", "Advanced")))
stopifnot(nrow(keystone_long) > 0)

ggplot(keystone_long, aes(x = subject, y = Percent, fill = Level)) +
  geom_col(position = "stack") +
  scale_fill_manual(values = c("Below Basic" = "#d73027", "Basic" = "#fc8d59",
                                "Proficient" = "#91bfdb", "Advanced" = "#4575b4")) +
  labs(title = "Keystone Proficiency Distribution by Subject",
       subtitle = "2025 Grade 11 Results - 120,000+ students per exam",
       x = NULL, y = "Percent") +
  theme_minimal() +
  theme(legend.position = "bottom")
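A small sketch confirming from the printed `keystone_bb` and `keystone_ov` values that Biology, not Algebra I, has the largest Below Basic share. The implied Basic column assumes the four levels sum to 100, as they do in the PSSA state rows:

```r
# 2025 Keystone shares copied from the keystone_bb and keystone_ov outputs
keystone <- data.frame(
  subject          = c("Algebra I", "Biology", "Literature"),
  below_basic      = c(19.3, 23.4, 13.7),
  proficient_above = c(44.3, 49.4, 62.1)
)

# Exam with the largest Below Basic share
worst_exam <- keystone$subject[which.max(keystone$below_basic)]   # "Biology"

# Implied Basic share, ASSUMING the four levels sum to 100
keystone$basic <- round(100 - keystone$proficient_above - keystone$below_basic, 1)
keystone
```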

11. Asian students outperform the other groups shown by 17+ points in math

Asian students lead with 68% math proficiency, followed by White students at 51%. Black and Hispanic students trail at 16% and 22% respectively - a gap of over 50 percentage points.

racial_groups <- c("White (not Hispanic)", "Black or African American (not Hispanic)",
                   "Hispanic (any race)", "Asian (not Hispanic)")

racial_gap <- pssa_state %>%
  filter(subject == "Math",
         group %in% racial_groups,
         grade == "Total") %>%
  select(group, n_scored, pct_proficient_above) %>%
  arrange(desc(pct_proficient_above))
stopifnot(nrow(racial_gap) > 0)
racial_gap
#> # A tibble: 4 × 3
#>   group                                    n_scored pct_proficient_above
#>   <chr>                                       <int>                <dbl>
#> 1 Asian (not Hispanic)                        35601                 68.4
#> 2 White (not Hispanic)                       424531                 50.8
#> 3 Hispanic (any race)                        105681                 21.7
#> 4 Black or African American (not Hispanic)    97352                 15.7

race_data <- pssa_state %>%
  filter(subject == "Math",
         group %in% racial_groups,
         grade == "Total") %>%
  mutate(group = gsub(" \\(not Hispanic\\)", "", group),
         group = gsub(" \\(any race\\)", "", group))
stopifnot(nrow(race_data) > 0)

ggplot(race_data, aes(x = reorder(group, pct_proficient_above),
                       y = pct_proficient_above,
                       fill = group)) +
  geom_col() +
  coord_flip() +
  scale_fill_brewer(palette = "Set2") +
  labs(title = "Racial Achievement Gaps in Math",
       subtitle = "2025 PSSA - 53 point gap between highest and lowest",
       x = NULL, y = "% Proficient") +
  theme_minimal() +
  theme(legend.position = "none")
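The headline gaps come straight from the `racial_gap` output; `range()` and `diff()` make the arithmetic explicit:

```r
# 2025 statewide PSSA Math % Proficient and Above by race/ethnicity (racial_gap output)
race_pct <- c(Asian = 68.4, White = 50.8, Hispanic = 21.7, Black = 15.7)

# Gap between the highest- and lowest-scoring groups shown
max_gap <- diff(range(race_pct))                              # 52.7 points

# Lead of the top group over the next-highest group
lead_over_next <- race_pct[["Asian"]] - race_pct[["White"]]   # 17.6 points
```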

12. Economically disadvantaged students face a 16-point gap

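Students from low-income families score 26% proficient in math, compared to the state average of 42%. The gap is essentially the same in both subjects: 16.2 points in math and 15.8 in ELA. Because the “All Students” comparison group includes economically disadvantaged students themselves, the gap relative to students who are not disadvantaged is wider still.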

econ_disadv <- pssa_state %>%
  filter(subject == "Math",
         group %in% c("All Students", "Economically Disadvantaged"),
         grade == "Total") %>%
  select(group, n_scored, pct_proficient_above)
stopifnot(nrow(econ_disadv) > 0)
econ_disadv
#> # A tibble: 2 × 3
#>   group                      n_scored pct_proficient_above
#>   <chr>                         <int>                <dbl>
#> 1 All Students                 703819                 41.7
#> 2 Economically Disadvantaged   338533                 25.5

econ_gap_data <- pssa_state %>%
  filter(group %in% c("All Students", "Economically Disadvantaged"),
         grade == "Total")
stopifnot(nrow(econ_gap_data) > 0)

ggplot(econ_gap_data, aes(x = subject, y = pct_proficient_above, fill = group)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = paste0(pct_proficient_above, "%")),
            position = position_dodge(width = 0.9), vjust = -0.5) +
  scale_fill_manual(values = c("All Students" = "#003366",
                                "Economically Disadvantaged" = "#e63946"),
                    name = "Group") +
  scale_y_continuous(limits = c(0, 60)) +
  labs(title = "The Income Achievement Gap",
       subtitle = "2025 PSSA - 16 point gap in both subjects",
       x = NULL, y = "% Proficient") +
  theme_minimal() +
  theme(legend.position = "bottom")
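Since “All Students” is a student-weighted average of the disadvantaged and non-disadvantaged groups, the non-disadvantaged rate can be backed out from the counts and rates printed above. This assumes every tested student falls into exactly one of the two groups; `complement_rate()` is a hypothetical helper, and because published rates are rounded to 0.1, the results are approximate:

```r
# Hypothetical helper: rate for students NOT in a subgroup, recovered from the
# overall row and the subgroup row (assumes the two groups partition all students)
complement_rate <- function(n_all, pct_all, n_sub, pct_sub) {
  (n_all * pct_all - n_sub * pct_sub) / (n_all - n_sub)
}

# 2025 statewide counts and rates from the econ_disadv and iep_compare outputs
math_non_ed <- complement_rate(703819, 41.7, 338533, 25.5)   # about 56.7
ela_non_ed  <- complement_rate(703650, 48.5, 338525, 32.7)   # about 63.1

# Gap between non-disadvantaged and disadvantaged students
round(c(math_gap = math_non_ed - 25.5, ela_gap = ela_non_ed - 32.7), 1)
# math_gap about 31.2, ela_gap about 30.4
```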

13. IEP and ELL students have the lowest proficiency rates, at 14-17%

Students with Individualized Education Programs (IEPs) score just 15% proficient in math and 17% in ELA. English Language Learners (ELL) score similarly low at 14% in ELA and 16% in math, making IEP students the lowest-scoring group in math and ELL students the lowest in ELA.

iep_compare <- pssa_state %>%
  filter(group %in% c("All Students", "IEP", "ELL", "Economically Disadvantaged"),
         grade == "Total") %>%
  select(subject, group, n_scored, pct_proficient_above) %>%
  arrange(subject, desc(pct_proficient_above))
stopifnot(nrow(iep_compare) > 0)
iep_compare
#> # A tibble: 8 × 4
#>   subject               group                      n_scored pct_proficient_above
#>   <chr>                 <chr>                         <int>                <dbl>
#> 1 English Language Arts All Students                 703650                 48.5
#> 2 English Language Arts Economically Disadvantaged   338525                 32.7
#> 3 English Language Arts IEP                          143647                 16.5
#> 4 English Language Arts ELL                           41754                 13.8
#> 5 Math                  All Students                 703819                 41.7
#> 6 Math                  Economically Disadvantaged   338533                 25.5
#> 7 Math                  ELL                           41980                 16  
#> 8 Math                  IEP                          143413                 15.3

subgroups <- pssa_state %>%
  filter(subject == "Math",
         group %in% c("All Students", "IEP", "ELL", "Economically Disadvantaged"),
         grade == "Total")
stopifnot(nrow(subgroups) > 0)

ggplot(subgroups, aes(x = reorder(group, pct_proficient_above),
                       y = pct_proficient_above,
                       fill = group == "All Students")) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(values = c("TRUE" = "#003366", "FALSE" = "#e63946")) +
  labs(title = "Achievement Gaps Across Student Subgroups",
       subtitle = "2025 PSSA Math - All subgroups below state average",
       x = NULL, y = "% Proficient") +
  theme_minimal() +
  theme(legend.position = "none")
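Which group is lowest depends on the subject. A sketch that picks the lowest-scoring subgroup per subject from the `iep_compare` values, hard-coded here:

```r
# 2025 statewide % Proficient and Above for the subgroups in iep_compare
subgroup_pct <- data.frame(
  subject = rep(c("ELA", "Math"), each = 3),
  group   = rep(c("Economically Disadvantaged", "IEP", "ELL"), times = 2),
  pct     = c(32.7, 16.5, 13.8, 25.5, 15.3, 16.0)
)

# Lowest-scoring subgroup within each subject
lowest <- sapply(split(subgroup_pct, subgroup_pct$subject),
                 function(d) d$group[which.min(d$pct)])
lowest   # ELA: "ELL", Math: "IEP"
```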

14. Female students outperform males in ELA by 9 points

Girls score 53% proficient in ELA compared to 44% for boys - a 9 percentage point gap. But boys slightly outperform girls in math (43% vs 40%).

gender_gap <- pssa_state %>%
  filter(group %in% c("Male", "Female"),
         grade == "Total") %>%
  select(subject, group, n_scored, pct_proficient_above) %>%
  arrange(subject, desc(pct_proficient_above))
stopifnot(nrow(gender_gap) > 0)
gender_gap
#> # A tibble: 4 × 4
#>   subject               group  n_scored pct_proficient_above
#>   <chr>                 <chr>     <int>                <dbl>
#> 1 English Language Arts Female   344499                 52.9
#> 2 English Language Arts Male     359151                 44.2
#> 3 Math                  Male     359347                 43.1
#> 4 Math                  Female   344472                 40.2

gender_data <- pssa_state %>%
  filter(group %in% c("Male", "Female"),
         grade == "Total")
stopifnot(nrow(gender_data) > 0)

ggplot(gender_data, aes(x = subject, y = pct_proficient_above, fill = group)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = paste0(pct_proficient_above, "%")),
            position = position_dodge(width = 0.9), vjust = -0.5) +
  scale_fill_manual(values = c("Female" = "#e377c2", "Male" = "#1f77b4")) +
  scale_y_continuous(limits = c(0, 65)) +
  labs(title = "Gender Gaps in PSSA Performance",
       subtitle = "2025 Results - Girls lead ELA, boys edge ahead in math",
       x = NULL, y = "% Proficient") +
  theme_minimal() +
  theme(legend.position = "bottom")

15. The “Advanced” tier is thin - only 17% in math

Only 17% of students score “Advanced” in math, while 13% reach Advanced in ELA. The top tier is thin, with most proficient students clustered in the “Proficient” category rather than excelling.

adv_rates <- pssa_state %>%
  filter(group == "All Students",
         grade == "Total") %>%
  select(subject, pct_advanced, pct_proficient)
stopifnot(nrow(adv_rates) > 0)
adv_rates
#> # A tibble: 2 × 3
#>   subject               pct_advanced pct_proficient
#>   <chr>                        <dbl>          <dbl>
#> 1 English Language Arts         12.5           36  
#> 2 Math                          16.6           25.1

advanced_by_grade <- pssa_state %>%
  filter(subject == "Math",
         group == "All Students",
         grade %in% c("3", "4", "5", "6", "7", "8")) %>%
  mutate(grade = factor(grade, levels = c("3", "4", "5", "6", "7", "8")))
stopifnot(nrow(advanced_by_grade) > 0)

ggplot(advanced_by_grade, aes(x = grade, y = pct_advanced)) +
  geom_col(fill = "#4575b4") +
  geom_text(aes(label = paste0(pct_advanced, "%")), vjust = -0.5) +
  scale_y_continuous(limits = c(0, 30)) +
  labs(title = "Where Are the Advanced Math Students?",
       subtitle = "2025 PSSA % Advanced by Grade",
       x = "Grade", y = "% Advanced") +
  theme_minimal()
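One way to quantify how thin the top tier is: the share of Proficient-or-above students who reach Advanced, computed from the `adv_rates` output. Math, despite its lower overall proficiency, has the larger Advanced share:

```r
# 2025 statewide shares from the adv_rates output
adv  <- c(ela = 12.5, math = 16.6)
prof <- c(ela = 36.0, math = 25.1)

# Of students scoring Proficient or above, the percentage who reach Advanced
adv_share <- round(100 * adv / (adv + prof), 1)
adv_share   # ela about 25.8, math about 39.8
```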

16. Chester Community CS has the lowest math proficiency at 4%

Among local education agencies (school districts and charter schools) with 300+ students tested, Chester Community Charter School has just 3.7% math proficiency, the lowest in Pennsylvania.

bottom_dist <- pssa_school %>%
  filter(subject == "Math",
         group == "All Students",
         grade == "Total") %>%
  group_by(district_name, county) %>%
  summarize(
    n_schools = n(),
    total_scored = sum(n_scored, na.rm = TRUE),
    avg_proficient = round(mean(pct_proficient_above, na.rm = TRUE), 1),
    .groups = "drop"
  ) %>%
  filter(total_scored >= 300) %>%
  arrange(avg_proficient) %>%
  head(10)
stopifnot(nrow(bottom_dist) > 0)
bottom_dist
#> # A tibble: 10 × 5
#>    district_name                    county n_schools total_scored avg_proficient
#>    <chr>                            <chr>      <int>        <int>          <dbl>
#>  1 CHESTER COMMUNITY CS             DELAW…         1         2282            3.7
#>  2 LINDLEY ACADEMY CS AT BIRNEY     PHILA…         1          438            4.1
#>  3 GREATER JOHNSTOWN SD             CAMBR…         4         1022            4.3
#>  4 KIPP NORTH PHILADELPHIA CS       PHILA…         1          338            4.4
#>  5 MEMPHIS STREET ACADEMY CS @ JP … PHILA…         1          420            4.5
#>  6 ALLIANCE FOR PROGRESS CS         PHILA…         1          379            4.7
#>  7 ALIQUIPPA SD                     BEAVER         2          392            5  
#>  8 ESPERANZA ACADEMY CS             PHILA…         1          716            5.2
#>  9 INSIGHT PA CYBER CS              CHEST…         1          989            5.4
#> 10 HARRISBURG CITY SD               DAUPH…        10         2571            5.7

bottom_districts <- pssa_school %>%
  filter(subject == "Math",
         group == "All Students",
         grade == "Total") %>%
  group_by(district_name) %>%
  summarize(
    total_scored = sum(n_scored, na.rm = TRUE),
    avg_proficient = mean(pct_proficient_above, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(total_scored >= 300) %>%
  arrange(avg_proficient) %>%
  head(10)
stopifnot(nrow(bottom_districts) > 0)

ggplot(bottom_districts, aes(x = reorder(district_name, -avg_proficient),
                              y = avg_proficient)) +
  geom_col(fill = "#d73027") +
  coord_flip() +
  labs(title = "Pennsylvania's Lowest-Performing LEAs (2025)",
       subtitle = "Districts and charter schools with 300+ students tested",
       x = NULL, y = "% Proficient") +
  theme_minimal()

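17. Third grade reading is the gateway - 51% are already behind

Only 49% of third graders are proficient in ELA. Because third grade is when students transition from “learning to read” to “reading to learn,” just over half of Pennsylvania’s students begin that transition below grade level in reading, which can make learning harder in every subject. Unlike math, ELA proficiency stays fairly flat from grade 3 to grade 8 (between 44.7% and 50.8%).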

grade3_ela <- pssa_state %>%
  filter(subject == "English Language Arts",
         group == "All Students",
         grade %in% c("3", "4", "5", "6", "7", "8")) %>%
  select(grade, n_scored, pct_proficient_above) %>%
  arrange(as.numeric(grade))
stopifnot(nrow(grade3_ela) > 0)
grade3_ela
#> # A tibble: 6 × 3
#>   grade n_scored pct_proficient_above
#>   <chr>    <int>                <dbl>
#> 1 3       118825                 48.6
#> 2 4       114210                 48.4
#> 3 5       117850                 44.7
#> 4 6       117599                 50.8
#> 5 7       117288                 49.2
#> 6 8       117878                 49.3

ela_by_grade <- pssa_state %>%
  filter(subject == "English Language Arts",
         group == "All Students",
         grade %in% c("3", "4", "5", "6", "7", "8")) %>%
  mutate(grade = factor(grade, levels = c("3", "4", "5", "6", "7", "8")))
stopifnot(nrow(ela_by_grade) > 0)

ggplot(ela_by_grade, aes(x = grade, y = pct_proficient_above)) +
  geom_col(fill = "#003366") +
  geom_hline(yintercept = 50, linetype = "dashed", color = "#e63946") +
  annotate("text", x = 5.5, y = 52, label = "50% threshold", color = "#e63946") +
  labs(title = "ELA Proficiency by Grade",
       subtitle = "2025 PSSA - Just over half start below proficient at grade 3",
       x = "Grade", y = "% Proficient") +
  theme_minimal()
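Two numbers behind this section, sketched from the `grade3_ela` values above: the share of third graders below Proficient, and how little ELA proficiency varies across grades compared with the 23-point spread in math:

```r
# 2025 statewide PSSA ELA % Proficient and Above by grade (grade3_ela output)
ela_pct <- c(`3` = 48.6, `4` = 48.4, `5` = 44.7, `6` = 50.8, `7` = 49.2, `8` = 49.3)

# Share of third graders scoring Basic or Below Basic
not_proficient_g3 <- 100 - ela_pct[["3"]]   # 51.4

# Spread between the best and worst grade
ela_spread <- diff(range(ela_pct))          # 6.1 points
```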

Session Info

sessionInfo()
#> R version 4.5.2 (2025-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] readxl_1.4.5       ggplot2_4.0.2      tidyr_1.3.2        dplyr_1.2.0       
#> [5] paschooldata_0.1.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.6       jsonlite_2.0.0     compiler_4.5.2     tidyselect_1.2.1  
#>  [5] jquerylib_0.1.4    systemfonts_1.3.2  scales_1.4.0       textshaping_1.0.5 
#>  [9] yaml_2.3.12        fastmap_1.2.0      R6_2.6.1           labeling_0.4.3    
#> [13] generics_0.1.4     knitr_1.51         tibble_3.3.1       desc_1.4.3        
#> [17] downloader_0.4.1   bslib_0.10.0       pillar_1.11.1      RColorBrewer_1.1-3
#> [21] rlang_1.1.7        utf8_1.2.6         cachem_1.1.0       xfun_0.56         
#> [25] S7_0.2.1           fs_1.6.7           sass_0.4.10        cli_3.6.5         
#> [29] withr_3.0.2        pkgdown_2.2.0      magrittr_2.0.4     digest_0.6.39     
#> [33] grid_4.5.2         rappdirs_0.3.4     lifecycle_1.0.5    vctrs_0.7.1       
#> [37] evaluate_1.0.5     glue_1.8.0         cellranger_1.1.0   farver_2.1.2      
#> [41] codetools_0.2-20   ragg_1.5.1         rmarkdown_2.30     purrr_1.2.1       
#> [45] tools_4.5.2        pkgconfig_2.0.3    htmltools_0.5.9