Overview

paschooldata provides a simple interface for downloading and analyzing Pennsylvania public school enrollment data from the Pennsylvania Department of Education (PDE). This vignette covers basic usage, data structure, and common analysis patterns.

Installation

Install from GitHub:

# install.packages("remotes")
remotes::install_github("almartin82/paschooldata")

Quick Example

Fetch the most recent year of Pennsylvania enrollment data:

library(paschooldata)
library(dplyr)

# Fetch 2025 enrollment data (2024-25 school year)
enr <- fetch_enr(2025)

head(enr)

Understanding the Data Schema

Data Format

The data is returned in tidy (long) format by default:

  • Each row represents one subgroup for one school/district/state
  • subgroup identifies the demographic group (e.g., “total_enrollment”, “white”, “hispanic”, “low_income”)
  • grade_level shows the grade (“TOTAL”, “K”, “01”, “02”, etc.)
  • n_students is the enrollment count
  • pct is the percentage of total enrollment

enr %>%
  filter(is_state) %>%
  select(end_year, type, subgroup, grade_level, n_students) %>%
  head(10)
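
To make the long layout concrete, here is a minimal sketch built on a small hand-made data frame with invented counts (not real PDE figures). It assumes pct is each subgroup's count divided by the matching total_enrollment count; that reading follows the column description above but is not confirmed as the package's exact calculation.

```r
library(dplyr)

# Hypothetical long-format rows for one statewide record (counts are invented)
toy <- tibble(
  end_year    = 2025,
  type        = "Statewide",
  subgroup    = c("total_enrollment", "white", "hispanic"),
  grade_level = "TOTAL",
  n_students  = c(1000, 600, 150)
)

# Assumed definition: subgroup count divided by the total_enrollment count
toy_pct <- toy %>%
  group_by(end_year, type, grade_level) %>%
  mutate(pct = n_students / n_students[subgroup == "total_enrollment"]) %>%
  ungroup()

toy_pct
```

Each subgroup stays on its own row, so adding a new subgroup adds rows rather than columns.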

Key Columns

Column        Description
end_year      School year end (e.g., 2025 for 2024-25)
aun           Administrative Unit Number (9-digit LEA identifier)
school_code   4-digit school identifier within an LEA
type          “Statewide”, “District”, or “School”
lea_name      Local Education Agency name
school_name   School name (for school-level records)
county        Pennsylvania county
lea_type      Type of LEA (School District, Charter, IU, etc.)
subgroup      Demographic subgroup
grade_level   Grade level or “TOTAL”
n_students    Enrollment count
pct           Percentage of total

Aggregation Flags

The package adds logical flag columns so you can filter by aggregation level without matching on strings:

Flag          Description
is_state      Statewide aggregate record
is_district   District-level record
is_school     School-level record
is_charter    Charter school (brick-and-mortar or cyber)
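
As a rough illustration of what these flags encode, the sketch below derives them on a hand-built data frame. The derivation rules (level flags from type, charter flag from lea_type) and the sample lea_type values are assumptions made for the example, not the package's confirmed logic.

```r
library(dplyr)

# Hypothetical records at each aggregation level (values are invented)
toy <- tibble(
  type     = c("Statewide", "District", "School", "School"),
  lea_type = c(NA, "School District", "School District", "Charter School")
)

# Assumed derivation: level flags from `type`, charter flag from `lea_type`
toy_flags <- toy %>%
  mutate(
    is_state    = type == "Statewide",
    is_district = type == "District",
    is_school   = type == "School",
    is_charter  = !is.na(lea_type) & grepl("charter", lea_type, ignore.case = TRUE)
  )

# Flags combine naturally inside filter()
toy_flags %>% filter(is_school, !is_charter)
```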

Understanding AUN Codes

Pennsylvania uses Administrative Unit Numbers (AUN) to identify Local Education Agencies:

  • AUN: 9-digit unique identifier for LEAs
    • School districts, charter schools, intermediate units (IUs), and career/technical centers (CTCs) each have unique AUNs
  • School Code: 4-digit identifier for schools within an LEA
  • Combined, these form the full school identifier

enr %>%
  filter(is_school) %>%
  select(aun, school_code, lea_name, school_name) %>%
  distinct() %>%
  head(5)
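
If the identifiers arrive as numbers, leading zeros are lost, so it can help to pad them back to their fixed widths before combining. The sketch below assumes the combined identifier is a plain concatenation of the 9-digit AUN and the 4-digit school code; the school codes shown are invented.

```r
# Identifiers stored as numbers, which drops any leading zeros
aun         <- c(126515001, 102027451)
school_code <- c(42, 1234)  # hypothetical school codes

# Pad to the fixed widths described above, then concatenate
aun_chr    <- formatC(aun, width = 9, format = "d", flag = "0")
school_chr <- formatC(school_code, width = 4, format = "d", flag = "0")
school_id  <- paste0(aun_chr, school_chr)

school_id
```

Keeping the identifiers as character strings from then on avoids the padding problem when joining to other PDE files.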

Common AUNs

AUN         LEA
126515001   Philadelphia City School District
102027451   Pittsburgh School District
122092102   Central Bucks School District

Filtering by Aggregation Level

Use the aggregation flags to filter data:

# State totals
state <- enr %>%
  filter(is_state, subgroup == "total_enrollment", grade_level == "TOTAL")
state %>% select(end_year, n_students)

# All districts
districts <- enr %>%
  filter(is_district, subgroup == "total_enrollment", grade_level == "TOTAL")
nrow(districts)

# All schools
schools <- enr %>%
  filter(is_school, subgroup == "total_enrollment", grade_level == "TOTAL")
nrow(schools)

Common Analysis Examples

Top 10 Largest Districts

enr %>%
  filter(is_district, subgroup == "total_enrollment", grade_level == "TOTAL") %>%
  arrange(desc(n_students)) %>%
  select(lea_name, county, n_students) %>%
  head(10)

Enrollment by County

enr %>%
  filter(is_district, subgroup == "total_enrollment", grade_level == "TOTAL") %>%
  group_by(county) %>%
  summarize(
    n_districts = n(),
    total_enrollment = sum(n_students, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(total_enrollment)) %>%
  head(10)

Demographic Breakdown

# State-level demographics
enr %>%
  filter(is_state, grade_level == "TOTAL") %>%
  filter(subgroup %in% c("white", "black", "hispanic", "asian")) %>%
  select(subgroup, n_students, pct) %>%
  arrange(desc(n_students))

Philadelphia School District

Philadelphia is by far the largest district in Pennsylvania. Use the convenience function to get just Philadelphia data:

philly <- fetch_philly_enr(2025)

# District totals
philly %>%
  filter(type == "District", subgroup == "total_enrollment", grade_level == "TOTAL") %>%
  select(lea_name, n_students)

# Schools in Philadelphia
philly %>%
  filter(is_school, subgroup == "total_enrollment", grade_level == "TOTAL") %>%
  arrange(desc(n_students)) %>%
  select(school_name, n_students) %>%
  head(10)

Charter Schools

Pennsylvania has both brick-and-mortar and cyber charter schools. Use the is_charter flag to filter:

# All charter schools
enr %>%
  filter(is_charter, subgroup == "total_enrollment", grade_level == "TOTAL") %>%
  arrange(desc(n_students)) %>%
  select(lea_name, lea_type, n_students) %>%
  head(10)

# Cyber charters specifically
enr %>%
  filter(grepl("cyber", lea_type, ignore.case = TRUE)) %>%
  filter(subgroup == "total_enrollment", grade_level == "TOTAL") %>%
  arrange(desc(n_students)) %>%
  select(lea_name, n_students)
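
A common follow-up is the charter share of total enrollment. The sketch below shows the arithmetic on an invented LEA-level data frame; the names and counts are hypothetical, and it assumes each student is counted under exactly one LEA.

```r
library(dplyr)

# Hypothetical LEA-level totals (names and counts are invented)
toy_leas <- tibble(
  lea_name   = c("District A", "District B", "Charter C", "Cyber Charter D"),
  is_charter = c(FALSE, FALSE, TRUE, TRUE),
  n_students = c(5000, 3000, 1200, 800)
)

# Sum enrollment by sector, then express each sector as a share of the total
charter_share <- toy_leas %>%
  group_by(is_charter) %>%
  summarize(n_students = sum(n_students), .groups = "drop") %>%
  mutate(share = n_students / sum(n_students))

charter_share
```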

Wide vs Tidy Format

If you prefer wide format (one column per demographic), set tidy = FALSE:

enr_wide <- fetch_enr(2025, tidy = FALSE)

enr_wide %>%
  filter(type == "Statewide") %>%
  select(end_year, row_total, white, black, hispanic, asian, low_income)

This is useful when you need to calculate ratios or compare multiple demographics simultaneously.
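
To show why the wide layout suits ratio calculations, the sketch below reshapes an invented long-format data frame with tidyr::pivot_wider() and computes shares by column arithmetic. The district names and counts are hypothetical, and the total column here is named total_enrollment because it comes from the pivot, whereas the package's own wide output uses row_total as shown above.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format rows for two districts (counts are invented)
toy_long <- tibble(
  lea_name   = rep(c("District A", "District B"), each = 3),
  subgroup   = rep(c("total_enrollment", "hispanic", "low_income"), times = 2),
  n_students = c(1000, 150, 400, 500, 50, 300)
)

# One column per subgroup, so ratios become simple column arithmetic
toy_wide <- toy_long %>%
  pivot_wider(names_from = subgroup, values_from = n_students) %>%
  mutate(
    pct_hispanic   = hispanic / total_enrollment,
    pct_low_income = low_income / total_enrollment
  )

toy_wide
```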

Multi-Year Analysis

Fetch multiple years to analyze enrollment trends:

# Fetch 5 years of data
enr_multi <- fetch_enr_years(2021:2025)

# State enrollment trend
enr_multi %>%
  filter(is_state, subgroup == "total_enrollment", grade_level == "TOTAL") %>%
  select(end_year, n_students) %>%
  arrange(end_year)

library(ggplot2)

# State enrollment over time
enr_multi %>%
  filter(is_state, subgroup == "total_enrollment", grade_level == "TOTAL") %>%
  ggplot(aes(x = end_year, y = n_students)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3) +
  scale_y_continuous(labels = scales::comma) +
  labs(
    title = "Pennsylvania Public School Enrollment",
    x = "School Year End",
    y = "Total Enrollment"
  ) +
  theme_minimal()

# Demographic composition over time
enr_multi %>%
  filter(is_state, grade_level == "TOTAL") %>%
  filter(subgroup %in% c("white", "black", "hispanic", "asian")) %>%
  ggplot(aes(x = end_year, y = pct, color = subgroup)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
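  # scales::percent multiplies by 100, so this assumes pct is a proportion (0-1);
  # if pct is already on a 0-100 scale, use scales::label_percent(scale = 1)
  scale_y_continuous(labels = scales::percent) +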
  labs(
    title = "Pennsylvania Demographics Over Time",
    x = "School Year End",
    y = "Percentage of Enrollment",
    color = "Subgroup"
  ) +
  theme_minimal()
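
Alongside the plots, a year-over-year change table is often useful. The sketch below uses dplyr::lag() on an invented statewide series; the values are hypothetical and stand in for the filtered enr_multi rows.

```r
library(dplyr)

# Hypothetical statewide totals (invented values, not PDE data)
toy_trend <- tibble(
  end_year   = 2021:2025,
  n_students = c(1000, 990, 985, 980, 970)
)

# Sort by year first so lag() compares each year with the previous one
toy_change <- toy_trend %>%
  arrange(end_year) %>%
  mutate(
    change     = n_students - lag(n_students),
    pct_change = change / lag(n_students)
  )

toy_change
```

The first year has no predecessor, so its change and pct_change are NA.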

Grade-Level Aggregates

Create common grade groupings (K-8, HS, K-12) using enr_grade_aggs():

# Get grade-level aggregates
grade_aggs <- enr_grade_aggs(enr)

# View K-8 vs High School enrollment by district
grade_aggs %>%
  filter(is_district) %>%
  select(lea_name, grade_level, n_students) %>%
  tidyr::pivot_wider(names_from = grade_level, values_from = n_students) %>%
  mutate(hs_ratio = HS / K8) %>%
  arrange(desc(K12)) %>%
  head(10)
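
For readers curious about what such groupings involve, here is a minimal sketch of banding grades by hand on an invented data frame. The band definitions (K8 as K through 08, HS as 09 through 12) are assumptions for illustration; this is not the implementation of enr_grade_aggs().

```r
library(dplyr)

# Hypothetical grade-level rows for one district (counts are invented)
toy_grades <- tibble(
  lea_name    = "District A",
  grade_level = c("K", sprintf("%02d", 1:12)),
  n_students  = rep(100, 13)
)

# Assumed band definitions: K8 = K through 08, HS = 09 through 12
banded <- toy_grades %>%
  mutate(band = if_else(grade_level %in% c("09", "10", "11", "12"), "HS", "K8")) %>%
  group_by(lea_name, band) %>%
  summarize(n_students = sum(n_students), .groups = "drop")

# K12 is the sum of both bands
k12 <- banded %>%
  group_by(lea_name) %>%
  summarize(band = "K12", n_students = sum(n_students), .groups = "drop")

toy_aggs <- bind_rows(banded, k12)
toy_aggs
```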

Cache Management

Downloaded data is cached locally to speed up repeated queries. The cache is stored in a platform-appropriate location and expires after 30 days.

# View cache status
cache_status()

# Clear cache for a specific year
clear_enr_cache(2025)

# Force re-download (bypass cache)
enr_fresh <- fetch_enr(2025, use_cache = FALSE)
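
The 30-day expiry can be pictured as a file-age check. The sketch below is one way such a check might look in base R; is_cache_fresh() is a hypothetical helper written for this example and is not part of paschooldata.

```r
# Hypothetical helper: TRUE when `path` exists and was modified
# within the last `max_age_days` days
is_cache_fresh <- function(path, max_age_days = 30) {
  if (!file.exists(path)) {
    return(FALSE)
  }
  age_days <- as.numeric(difftime(Sys.time(), file.mtime(path), units = "days"))
  age_days <= max_age_days
}

# A temporary file stands in for a cached download
cache_file <- tempfile(fileext = ".rds")
saveRDS(data.frame(x = 1), cache_file)
fresh_now <- is_cache_fresh(cache_file)

# Backdate the modification time by 31 days to simulate an expired entry
Sys.setFileTime(cache_file, Sys.time() - 31 * 24 * 60 * 60)
fresh_later <- is_cache_fresh(cache_file)

c(fresh_now = fresh_now, fresh_later = fresh_later)
```

A missing file is treated the same as an expired one, so the caller only needs a single branch to decide whether to download again.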

Available Years

Check which years of data are available:

Data is typically available from 2005 to the current year. Current year data is usually released in the fall after the school year begins.
