Getting Started with paschooldata

Overview

paschooldata provides a simple interface for downloading and analyzing Pennsylvania public school enrollment data from the Pennsylvania Department of Education (PDE). This vignette covers basic usage, data structure, and common analysis patterns.

Installation

Install from GitHub:

# install.packages("remotes")
remotes::install_github("almartin82/paschooldata")

Quick Example

Fetch one year of Pennsylvania enrollment data. Years are identified by the end of the school year, so 2025 means the 2024-25 school year:

library(paschooldata)
library(dplyr)

# Fetch 2025 enrollment data (2024-25 school year)
enr <- fetch_enr(2025)

head(enr)

Understanding the Data Schema

Data Format

The data is returned in tidy (long) format by default:

Each row represents one subgroup for one school/district/state
subgroup identifies the demographic group (e.g., “total_enrollment”, “white”, “hispanic”, “low_income”)
grade_level shows the grade (“TOTAL”, “K”, “01”, “02”, etc.)
n_students is the enrollment count
pct is the percentage of total enrollment

enr %>%
  filter(is_state) %>%
  select(end_year, type, subgroup, grade_level, n_students) %>%
  head(10)
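To make the long layout concrete, here is a small hand-built data frame that mimics the column names described above. The values are invented for illustration and are not PDE figures, and the share is stored as a proportion purely for this sketch. Only base R is used, so it runs without downloading anything.

```r
# Toy data in the tidy (long) layout: one row per subgroup per entity.
# All values are invented for illustration; they are not real PDE counts.
toy <- data.frame(
  end_year    = 2025,
  type        = "Statewide",
  subgroup    = c("total_enrollment", "white", "hispanic"),
  grade_level = "TOTAL",
  n_students  = c(1000, 600, 150),
  stringsAsFactors = FALSE
)

# Each subgroup is its own row, so selecting one is a row filter
hispanic <- toy[toy$subgroup == "hispanic", ]
print(hispanic$n_students)

# Share of total enrollment, recomputed from the counts
total <- toy$n_students[toy$subgroup == "total_enrollment"]
toy$share <- toy$n_students / total
print(toy[, c("subgroup", "n_students", "share")])
```

Because every subgroup lives in the same n_students column, adding another subgroup means adding rows, not columns.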

Key Columns

Column	Description
`end_year`	School year end (e.g., 2025 for 2024-25)
`aun`	Administrative Unit Number (9-digit LEA identifier)
`school_code`	4-digit school identifier within an LEA
`type`	“Statewide”, “District”, or “School”
`lea_name`	Local Education Agency name
`school_name`	School name (for school-level records)
`county`	Pennsylvania county
`lea_type`	Type of LEA (School District, Charter, IU, etc.)
`subgroup`	Demographic subgroup
`grade_level`	Grade level or “TOTAL”
`n_students`	Enrollment count
`pct`	Percentage of total

Aggregation Flags

The package adds boolean flags that make it easy to filter by aggregation level:

Flag	Description
`is_state`	Statewide aggregate record
`is_district`	District-level record
`is_school`	School-level record
`is_charter`	Charter school (brick-and-mortar or cyber)
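
One way to think about the level flags is as shorthand for comparisons on the type column. The sketch below rests on that assumption; it illustrates the idea on invented rows and does not describe how the package actually derives the flags.

```r
# Assumption: the level flags mirror the `type` column.
# The package may compute them differently; this is only an illustration.
toy <- data.frame(
  type = c("Statewide", "District", "School"),
  stringsAsFactors = FALSE
)

toy$is_state    <- toy$type == "Statewide"
toy$is_district <- toy$type == "District"
toy$is_school   <- toy$type == "School"

# Under this assumption every row belongs to exactly one level
flags_per_row <- toy$is_state + toy$is_district + toy$is_school
print(toy)
print(flags_per_row)
```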

Understanding AUN Codes

Pennsylvania uses Administrative Unit Numbers (AUN) to identify Local Education Agencies:

- AUN: 9-digit unique identifier for an LEA. School districts, charter schools, intermediate units (IUs), and career/technical centers (CTCs) each have their own AUN.
- School Code: 4-digit identifier for a school within an LEA.
- Combined, the AUN and School Code form the full school identifier.

enr %>%
  filter(is_school) %>%
  select(aun, school_code, lea_name, school_name) %>%
  distinct() %>%
  head(5)
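
If a single key per school is needed, the two codes can be pasted together. The helpers below are hypothetical (they are not part of paschooldata), the "-" separator is an arbitrary choice, and the school code in the example is invented. Codes are handled as text so that leading zeros are preserved.

```r
# Hypothetical helper: left-pad a code with zeros to a fixed width
pad_left <- function(x, width) {
  x <- as.character(x)
  paste0(strrep("0", pmax(width - nchar(x), 0)), x)
}

# Hypothetical helper: combine a 9-digit AUN and a 4-digit school code
make_school_id <- function(aun, school_code) {
  paste0(pad_left(aun, 9), "-", pad_left(school_code, 4))
}

# The school code "101" is invented for illustration
id <- make_school_id("126515001", "101")
print(id)
```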

Common AUNs

AUN	LEA
126515001	Philadelphia City School District
102027451	Pittsburgh School District
122092102	Central Bucks School District

Filtering by Aggregation Level

Use the aggregation flags to filter data:

# State totals
state <- enr %>%
  filter(is_state, subgroup == "total_enrollment", grade_level == "TOTAL")
state %>% select(end_year, n_students)

# All districts
districts <- enr %>%
  filter(is_district, subgroup == "total_enrollment", grade_level == "TOTAL")
nrow(districts)

# All schools
schools <- enr %>%
  filter(is_school, subgroup == "total_enrollment", grade_level == "TOTAL")
nrow(schools)

Common Analysis Examples

Top 10 Largest Districts

enr %>%
  filter(is_district, subgroup == "total_enrollment", grade_level == "TOTAL") %>%
  arrange(desc(n_students)) %>%
  select(lea_name, county, n_students) %>%
  head(10)

Enrollment by County

enr %>%
  filter(is_district, subgroup == "total_enrollment", grade_level == "TOTAL") %>%
  group_by(county) %>%
  summarize(
    n_districts = n(),
    total_enrollment = sum(n_students, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(total_enrollment)) %>%
  head(10)

Demographic Breakdown

# State-level demographics
enr %>%
  filter(is_state, grade_level == "TOTAL") %>%
  filter(subgroup %in% c("white", "black", "hispanic", "asian")) %>%
  select(subgroup, n_students, pct) %>%
  arrange(desc(n_students))

Philadelphia School District

Philadelphia is by far the largest district in Pennsylvania. Use the convenience function to get just Philadelphia data:

philly <- fetch_philly_enr(2025)

# District totals
philly %>%
  filter(type == "District", subgroup == "total_enrollment", grade_level == "TOTAL") %>%
  select(lea_name, n_students)

# Schools in Philadelphia
philly %>%
  filter(is_school, subgroup == "total_enrollment", grade_level == "TOTAL") %>%
  arrange(desc(n_students)) %>%
  select(school_name, n_students) %>%
  head(10)

Charter Schools

Pennsylvania has both brick-and-mortar and cyber charter schools. Use the is_charter flag to filter:

# All charter schools
enr %>%
  filter(is_charter, subgroup == "total_enrollment", grade_level == "TOTAL") %>%
  arrange(desc(n_students)) %>%
  select(lea_name, lea_type, n_students) %>%
  head(10)

# Cyber charters specifically
enr %>%
  filter(grepl("cyber", lea_type, ignore.case = TRUE)) %>%
  filter(subgroup == "total_enrollment", grade_level == "TOTAL") %>%
  arrange(desc(n_students)) %>%
  select(lea_name, n_students)
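
The pattern match used for cyber charters can be tried on its own. The lea_type labels below are invented, since the exact strings returned by the package are not shown here; inspect unique(enr$lea_type) on real data before relying on the pattern.

```r
# Invented labels; the real lea_type values may be spelled differently
lea_type <- c("School District", "Charter School", "Cyber Charter School", "IU")

# Case-insensitive substring match, as in the filter above
is_cyber <- grepl("cyber", lea_type, ignore.case = TRUE)
print(data.frame(lea_type, is_cyber))
```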

Wide vs Tidy Format

If you prefer wide format (one column per demographic), set tidy = FALSE:

enr_wide <- fetch_enr(2025, tidy = FALSE)

enr_wide %>%
  filter(type == "Statewide") %>%
  select(end_year, row_total, white, black, hispanic, asian, low_income)

This is useful when you need to calculate ratios or compare multiple demographics simultaneously.
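
A ratio calculation of that kind can be sketched on toy data. The example reshapes a small long table to wide with base R's reshape() and divides one column by another. The counts are invented, and the resulting column names are not guaranteed to match what fetch_enr(..., tidy = FALSE) returns.

```r
# Invented long-format counts for a single statewide row set
long <- data.frame(
  end_year   = c(2025, 2025, 2025),
  subgroup   = c("total_enrollment", "white", "hispanic"),
  n_students = c(1000, 600, 150),
  stringsAsFactors = FALSE
)

# One row per year, one column per subgroup
wide <- reshape(long, idvar = "end_year", timevar = "subgroup", direction = "wide")

# reshape() prefixes the value column name; drop it for readability
names(wide) <- sub("^n_students\\.", "", names(wide))

# With subgroups side by side, a ratio is a single column operation
wide$hispanic_share <- wide$hispanic / wide$total_enrollment
print(wide)
```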

Historical Data and Trends

Fetch multiple years to analyze enrollment trends:

# Fetch 5 years of data
enr_multi <- fetch_enr_years(2021:2025)

# State enrollment trend
enr_multi %>%
  filter(is_state, subgroup == "total_enrollment", grade_level == "TOTAL") %>%
  select(end_year, n_students) %>%
  arrange(end_year)
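
Once the yearly totals are in order, year-over-year change is a short calculation. The sketch below uses invented totals (not PDE figures) and base R only. The same steps would apply to the end_year and n_students columns selected above.

```r
# Invented statewide totals, one row per year
trend <- data.frame(
  end_year   = 2021:2025,
  n_students = c(1000, 990, 985, 980, 970)
)

# Sort by year so that diff() compares consecutive years
trend <- trend[order(trend$end_year), ]

# Absolute change from the previous year (first year has no predecessor)
trend$change <- c(NA, diff(trend$n_students))

# Relative change, using the previous year's total as the base
trend$pct_change <- trend$change / c(NA, head(trend$n_students, -1))
print(trend)
```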

Visualizing Trends

library(ggplot2)

# State enrollment over time
enr_multi %>%
  filter(is_state, subgroup == "total_enrollment", grade_level == "TOTAL") %>%
  ggplot(aes(x = end_year, y = n_students)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3) +
  scale_y_continuous(labels = scales::comma) +
  labs(
    title = "Pennsylvania Public School Enrollment",
    x = "School Year End",
    y = "Total Enrollment"
  ) +
  theme_minimal()

Demographic Trends

enr_multi %>%
  filter(is_state, grade_level == "TOTAL") %>%
  filter(subgroup %in% c("white", "black", "hispanic", "asian")) %>%
  ggplot(aes(x = end_year, y = pct, color = subgroup)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  scale_y_continuous(labels = scales::percent) +
  labs(
    title = "Pennsylvania Demographics Over Time",
    x = "School Year End",
    y = "Percentage of Enrollment",
    color = "Subgroup"
  ) +
  theme_minimal()

Grade-Level Aggregates

Create common grade groupings (K-8, HS, K-12) using enr_grade_aggs():

# Get grade-level aggregates
grade_aggs <- enr_grade_aggs(enr)

# View K-8 vs High School enrollment by district
grade_aggs %>%
  filter(is_district) %>%
  select(lea_name, grade_level, n_students) %>%
  tidyr::pivot_wider(names_from = grade_level, values_from = n_students) %>%
  mutate(hs_ratio = HS / K8) %>%
  arrange(desc(K12)) %>%
  head(10)
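
For intuition, the groupings can be reproduced by hand on toy data. The sketch assumes K8 covers grades K through 08, HS covers 09 through 12, and K12 is their sum. That mapping is an assumption about enr_grade_aggs(), not taken from its source, and the counts are invented.

```r
# Invented per-grade counts for one entity
grades <- data.frame(
  grade_level = c("K", sprintf("%02d", 1:12)),
  n_students  = c(70, rep(80, 8), rep(90, 4)),
  stringsAsFactors = FALSE
)

# Assumed grade bands
k8_levels <- c("K", sprintf("%02d", 1:8))
hs_levels <- sprintf("%02d", 9:12)

k8 <- sum(grades$n_students[grades$grade_level %in% k8_levels])
hs <- sum(grades$n_students[grades$grade_level %in% hs_levels])

aggs <- data.frame(
  grade_level = c("K8", "HS", "K12"),
  n_students  = c(k8, hs, k8 + hs),
  stringsAsFactors = FALSE
)
print(aggs)
```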

Cache Management

Downloaded data is cached locally to speed up repeated queries. The cache is stored in a platform-appropriate location and expires after 30 days.

# View cache status
cache_status()

# Clear cache for a specific year
clear_enr_cache(2025)

# Force re-download (bypass cache)
enr_fresh <- fetch_enr(2025, use_cache = FALSE)
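
The 30-day expiry can be pictured as a check on a cached file's modification time. The helper below is hypothetical and is not how paschooldata is known to implement its cache; it only illustrates the idea with a temporary file.

```r
# Hypothetical helper, not part of paschooldata:
# TRUE if the file exists and was modified within max_age_days
is_cache_fresh <- function(path, max_age_days = 30) {
  if (!file.exists(path)) {
    return(FALSE)
  }
  age_days <- as.numeric(difftime(Sys.time(), file.mtime(path), units = "days"))
  age_days <= max_age_days
}

# Write a throwaway file to stand in for a cached download
cache_file <- tempfile(fileext = ".rds")
saveRDS(data.frame(x = 1), cache_file)
fresh_now <- is_cache_fresh(cache_file)

# Backdate the modification time by 31 days to simulate an expired entry
Sys.setFileTime(cache_file, Sys.time() - 31 * 24 * 60 * 60)
fresh_old <- is_cache_fresh(cache_file)

print(c(fresh_now = fresh_now, fresh_old = fresh_old))
unlink(cache_file)
```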

Available Years

Check which years of data are available:

available_years()

Data are typically available from 2005 through the current school year. A new year is usually released in the fall, after that school year has begun.

Next Steps

Use ?fetch_enr for full function documentation
Check the Data Quality QA article for data validation
Explore the PDE Data Portal for additional data sources

Session Info

sessionInfo()