Overview
paschooldata provides a simple interface for downloading
and analyzing Pennsylvania public school enrollment data from the
Pennsylvania Department of Education (PDE). This vignette covers basic
usage, data structure, and common analysis patterns.
Installation
Install from GitHub:
# install.packages("remotes")
remotes::install_github("almartin82/paschooldata")
Understanding the Data Schema
Data Format
The data is returned in tidy (long) format by default:
- Each row represents one subgroup for one school/district/state
- subgroup identifies the demographic group (e.g., “total_enrollment”, “white”, “hispanic”, “low_income”)
- grade_level shows the grade (“TOTAL”, “K”, “01”, “02”, etc.)
- n_students is the enrollment count
- pct is the percentage of total enrollment
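To make the long layout concrete, here is a minimal sketch of a few tidy rows built by hand in base R. The district name and all counts are made up for illustration, and treating pct as a 0-1 share of the total_enrollment row is an assumption rather than something confirmed from the package.

```r
# Made-up numbers for illustration; not real PDE figures.
enr_toy <- data.frame(
  end_year    = 2025,
  type        = "District",
  lea_name    = "Example SD",   # hypothetical district
  subgroup    = c("total_enrollment", "white", "hispanic", "low_income"),
  grade_level = "TOTAL",
  n_students  = c(1000, 600, 250, 400),
  stringsAsFactors = FALSE
)

# Assumption: pct is each subgroup's share of the total_enrollment row (0-1 scale)
total <- enr_toy$n_students[enr_toy$subgroup == "total_enrollment"]
enr_toy$pct <- enr_toy$n_students / total

# One row per subgroup, all for the same district and grade level
enr_toy[, c("subgroup", "grade_level", "n_students", "pct")]
```

Each subgroup occupies its own row, so filtering on subgroup and grade_level is enough to pull out any single series.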
Key Columns
| Column | Description |
|---|---|
| end_year | School year end (e.g., 2025 for 2024-25) |
| aun | Administrative Unit Number (9-digit LEA identifier) |
| school_code | 4-digit school identifier within an LEA |
| type | “Statewide”, “District”, or “School” |
| lea_name | Local Education Agency name |
| school_name | School name (for school-level records) |
| county | Pennsylvania county |
| lea_type | Type of LEA (School District, Charter, IU, etc.) |
| subgroup | Demographic subgroup |
| grade_level | Grade level or “TOTAL” |
| n_students | Enrollment count |
| pct | Percentage of total |
Understanding AUN Codes
Pennsylvania uses Administrative Unit Numbers (AUN) to identify Local Education Agencies:
- AUN: 9-digit unique identifier for LEAs
- School districts, charter schools, intermediate units (IUs), and career/technical centers (CTCs) each have unique AUNs
- School Code: 4-digit identifier for schools within an LEA
- Combined, these form the full school identifier
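As a rough sketch of how the two codes might be combined into one key, the base R snippet below pads a school code back to four digits and appends it to the AUN. The identifiers are made up, and the concatenation order (AUN first, then school code) is an assumption for illustration, not a documented PDE format.

```r
# Made-up identifiers for illustration only; not real PDE codes.
aun <- c("123456789", "987654321")   # kept as character to preserve all 9 digits
school_code <- c(42, 1234)           # numeric codes lose their leading zeros

# Restore the 4-digit width before combining
school_code_chr <- sprintf("%04d", as.integer(school_code))

# Hypothetical combined key: AUN followed by the padded school code
school_id <- paste0(aun, school_code_chr)
school_id
```

Storing both codes as character columns avoids the padding step entirely and keeps joins between files predictable.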
Filtering by Aggregation Level
Use the aggregation flags to filter data:
# Setup assumed by the examples below
library(paschooldata)
library(dplyr)
# Tidy enrollment data for one year (2025 = the 2024-25 school year)
enr <- fetch_enr(2025)
# State totals
state <- enr %>%
  filter(is_state, subgroup == "total_enrollment", grade_level == "TOTAL")
state %>% select(end_year, n_students)
# All districts
districts <- enr %>%
  filter(is_district, subgroup == "total_enrollment", grade_level == "TOTAL")
nrow(districts)
# All schools
schools <- enr %>%
  filter(is_school, subgroup == "total_enrollment", grade_level == "TOTAL")
nrow(schools)
Philadelphia School District
Philadelphia is by far the largest district in Pennsylvania. Use the convenience function to get just Philadelphia data:
philly <- fetch_philly_enr(2025)
# District totals
philly %>%
  filter(type == "District", subgroup == "total_enrollment", grade_level == "TOTAL") %>%
  select(lea_name, n_students)
# Schools in Philadelphia
philly %>%
  filter(is_school, subgroup == "total_enrollment", grade_level == "TOTAL") %>%
  arrange(desc(n_students)) %>%
  select(school_name, n_students) %>%
  head(10)
Charter Schools
Pennsylvania has both brick-and-mortar and cyber charter schools. Use
the is_charter flag to filter:
# All charter schools
enr %>%
  filter(is_charter, subgroup == "total_enrollment", grade_level == "TOTAL") %>%
  arrange(desc(n_students)) %>%
  select(lea_name, lea_type, n_students) %>%
  head(10)
# Cyber charters specifically
enr %>%
  filter(grepl("cyber", lea_type, ignore.case = TRUE)) %>%
  filter(subgroup == "total_enrollment", grade_level == "TOTAL") %>%
  arrange(desc(n_students)) %>%
  select(lea_name, n_students)
Wide vs Tidy Format
If you prefer wide format (one column per demographic), set
tidy = FALSE:
enr_wide <- fetch_enr(2025, tidy = FALSE)
enr_wide %>%
  filter(type == "Statewide") %>%
  select(end_year, row_total, white, black, hispanic, asian, low_income)
This is useful when you need to calculate ratios or compare multiple demographics simultaneously.
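For instance, with one column per demographic a ratio is a single column-wise division. The sketch below uses a hand-built data frame with made-up counts and hypothetical district names; only the column names (row_total, white, hispanic) are taken from the wide-format example.

```r
# Made-up counts; column names follow the wide-format example.
enr_wide_toy <- data.frame(
  lea_name  = c("Example SD A", "Example SD B"),   # hypothetical districts
  row_total = c(1000, 400),
  white     = c(600, 100),
  hispanic  = c(250, 200),
  stringsAsFactors = FALSE
)

# Share of total enrollment (0-1 scale)
enr_wide_toy$pct_hispanic <- enr_wide_toy$hispanic / enr_wide_toy$row_total

# Ratio between two subgroups, awkward to compute in long format
enr_wide_toy$hispanic_to_white <- enr_wide_toy$hispanic / enr_wide_toy$white

enr_wide_toy
```

In tidy format the same ratio would need a reshape or a self-join first, which is why the wide layout is convenient here.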
Historical Data and Trends
Fetch multiple years to analyze enrollment trends:
# Fetch 5 years of data
enr_multi <- fetch_enr_years(2021:2025)
# State enrollment trend
enr_multi %>%
  filter(is_state, subgroup == "total_enrollment", grade_level == "TOTAL") %>%
  select(end_year, n_students) %>%
  arrange(end_year)
Visualizing Trends
library(ggplot2)
# State enrollment over time
enr_multi %>%
  filter(is_state, subgroup == "total_enrollment", grade_level == "TOTAL") %>%
  ggplot(aes(x = end_year, y = n_students)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3) +
  scale_y_continuous(labels = scales::comma) +
  labs(
    title = "Pennsylvania Public School Enrollment",
    x = "School Year End",
    y = "Total Enrollment"
  ) +
  theme_minimal()
Demographic Trends
enr_multi %>%
  filter(is_state, grade_level == "TOTAL") %>%
  filter(subgroup %in% c("white", "black", "hispanic", "asian")) %>%
  ggplot(aes(x = end_year, y = pct, color = subgroup)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  scale_y_continuous(labels = scales::percent) +
  labs(
    title = "Pennsylvania Demographics Over Time",
    x = "School Year End",
    y = "Percentage of Enrollment",
    color = "Subgroup"
  ) +
  theme_minimal()
Grade-Level Aggregates
Create common grade groupings (K-8, HS, K-12) using
enr_grade_aggs():
# Get grade-level aggregates
grade_aggs <- enr_grade_aggs(enr)
# View K-8 vs High School enrollment by district
grade_aggs %>%
  filter(is_district) %>%
  select(lea_name, grade_level, n_students) %>%
  tidyr::pivot_wider(names_from = grade_level, values_from = n_students) %>%
  mutate(hs_ratio = HS / K8) %>%
  arrange(desc(K12)) %>%
  head(10)
Cache Management
Downloaded data is cached locally to speed up repeated queries. The cache is stored in a platform-appropriate location and expires after 30 days.
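The 30-day expiry can be pictured as a simple file-age check. The sketch below is not the package's implementation: is_cache_fresh() is a hypothetical helper, and the cache file is a temporary file created only for the demonstration.

```r
# Hypothetical helper; illustrates an age-based expiry rule, not paschooldata internals.
is_cache_fresh <- function(path, max_age_days = 30) {
  if (!file.exists(path)) {
    return(FALSE)
  }
  age_days <- as.numeric(difftime(Sys.time(), file.mtime(path), units = "days"))
  age_days < max_age_days
}

# Stand-in cache file written to a temporary location
cache_file <- tempfile(fileext = ".rds")
saveRDS(data.frame(x = 1), cache_file)
fresh_now <- is_cache_fresh(cache_file)

# Backdate the file by 31 days to simulate an expired entry
Sys.setFileTime(cache_file, Sys.time() - 31 * 24 * 60 * 60)
fresh_later <- is_cache_fresh(cache_file)

c(fresh_now = fresh_now, fresh_later = fresh_later)
```

In practice the package's own helpers handle this, so a manual check like the one above is rarely needed: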
# View cache status
cache_status()
# Clear cache for a specific year
clear_enr_cache(2025)
# Force re-download (bypass cache)
enr_fresh <- fetch_enr(2025, use_cache = FALSE)
Available Years
Check which years of data are available:
Data is typically available from 2005 to the current year. Current year data is usually released in the fall after the school year begins.
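When scripting multi-year downloads, it can help to validate the requested years before fetching anything. The helper below is a hypothetical sketch, not a paschooldata function: the 2005 lower bound comes from the sentence above, and the upper bound is passed explicitly because the latest released year changes over time.

```r
# Hypothetical helper; not part of paschooldata.
check_years <- function(years, last_year, first_year = 2005) {
  bad <- years[years < first_year | years > last_year]
  if (length(bad) > 0) {
    stop("Years outside the assumed range: ", paste(bad, collapse = ", "))
  }
  invisible(years)
}

# Passes silently and returns the years unchanged
years_ok <- check_years(2021:2025, last_year = 2025)

# An out-of-range request raises an error, captured here for inspection
result <- tryCatch(
  check_years(2003:2006, last_year = 2025),
  error = function(e) conditionMessage(e)
)
result
```

Failing early this way avoids a partial download when one year in a long range turns out to be unavailable.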
Next Steps
- Use ?fetch_enr for full function documentation
- Check the Data Quality QA article for data validation
- Explore the PDE Data Portal for additional data sources