Introduction
The njschooldata package provides a simple interface for
accessing New Jersey Department of Education (NJ DOE) school data in R.
The NJ DOE publishes raw data files covering enrollment, assessments,
graduation rates, and more, but these files use inconsistent formats
across years, which makes longitudinal analysis difficult.
This package solves that problem by:
- Automatically downloading data from NJ DOE servers
- Parsing various file formats (fixed-width, Excel, CSV)
- Standardizing column names across years
- Providing both wide and tidy data formats
Installation
Install the package from GitHub:
# Using remotes (recommended)
remotes::install_github("almartin82/njschooldata")
# Or using devtools
devtools::install_github("almartin82/njschooldata")
Load the package:
library(njschooldata)
library(dplyr) # for data manipulation examples
Understanding NJ Education Data
The end_year Convention
All functions use end_year to specify the school year.
This refers to the spring semester year:
| School Year | end_year Value |
|---|---|
| 2023-24 | 2024 |
| 2022-23 | 2023 |
| 2019-20 | 2020 |
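The mapping in the table can be sketched as a small helper. The function name school_year_to_end_year() is hypothetical and not part of njschooldata; it assumes school-year labels of the form "YYYY-YY".

```r
# Hypothetical helper (not part of njschooldata): convert a school-year
# label such as "2023-24" to the end_year value the fetch functions expect.
school_year_to_end_year <- function(school_year) {
  # The first four characters hold the fall (start) year
  start_year <- as.integer(substr(school_year, 1, 4))
  # end_year is the spring year, one after the start year
  start_year + 1L
}

school_year_to_end_year(c("2023-24", "2022-23", "2019-20"))
#> [1] 2024 2023 2020
```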
CDS Codes
New Jersey uses a County-District-School (CDS) identifier system:
- county_id: 2-digit county code (e.g., “13” = Essex County)
- district_id: 4-digit district code
- school_id: 3-digit school code (“999” indicates a district-level aggregate)
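As a rough illustration, the three identifiers can be combined into and split out of a single code with base R. This sketch assumes the combined code is simply the three parts concatenated in order (2 + 4 + 3 characters); the variable names are illustrative. Keeping the identifiers as character strings preserves leading zeros.

```r
# Illustrative values: Essex County, Newark, and one specific school
county_id   <- "13"
district_id <- "3570"
school_id   <- "050"

# Concatenate the three parts into a 9-character CDS code
# (assumed layout: county, then district, then school)
cds_code <- paste0(county_id, district_id, school_id)
cds_code
#> [1] "133570050"

# Split a CDS code back into its parts by character position
parts <- c(
  county_id   = substr(cds_code, 1, 2),
  district_id = substr(cds_code, 3, 6),
  school_id   = substr(cds_code, 7, 9)
)
```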
# A complete CDS code example:
# County 13 (Essex), District 3570 (Newark), School 050 (specific school)
Quick Start: Enrollment Data
Fetching Basic Enrollment
# Get 2024 enrollment data (2023-24 school year)
enr_2024 <- fetch_enr(2024)
# View the structure
glimpse(enr_2024)
#> Rows: 101,650
#> Columns: 26
#> $ end_year <dbl> 2024, 2024, 2024, 2024, 2024, 2024, 2024, 2024, 2024,…
#> $ cds_code <chr> "010010999", "010010999", "010010999", "010010999", "…
#> $ county_id <chr> "01", "01", "01", "01", "01", "01", "01", "01", "01",…
#> $ county_name <chr> "Atlantic", "Atlantic", "Atlantic", "Atlantic", "Atla…
#> $ district_id <chr> "0010", "0010", "0010", "0010", "0010", "0010", "0010…
#> $ district_name <chr> "Absecon Public Schools District", "Absecon Public Sc…
#> $ school_id <chr> "999", "999", "999", "999", "999", "999", "999", "999…
#> $ school_name <chr> "District Total", "District Total", "District Total",…
#> $ program_code <chr> "PH", NA, "PF", NA, "KH", NA, "KF", NA, "01", NA, "02…
#> $ program_name <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ male <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ female <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ white <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ black <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ hispanic <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ asian <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ native_american <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ pacific_islander <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ multiracial <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ row_total <dbl> 0.0, 0.0, 123.0, 13.3, 0.0, 0.0, 77.0, 8.3, 89.0, 9.6…
#> $ free_lunch <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ reduced_lunch <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ lep <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ migrant <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ homeless <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ grade_level <chr> "PK", NA, "PK", NA, "K", NA, "K", NA, "01", NA, "02",…
# Check dimensions
dim(enr_2024) # Rows and columns
#> [1] 101650 26
Wide vs. Tidy Format
The tidy parameter transforms data for easier analysis:
# Wide format (default) - one row per school, many demographic columns
enr_wide <- fetch_enr(2024, tidy = FALSE)
# Tidy format - one row per school-subgroup combination
enr_tidy <- fetch_enr(2024, tidy = TRUE)
The tidy format is better for:
- Comparing subgroups within schools
- Longitudinal analysis across years
- Filtering to specific demographics
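To make the difference between the two shapes concrete, here is a minimal base R sketch on made-up data. It does not call the package, and the output column names subgroup and n_students are assumptions rather than the names fetch_enr() is known to return.

```r
# Toy wide data: one row per school, one column per subgroup
# (values are made up for illustration)
enr_wide_toy <- data.frame(
  school_id = c("050", "060"),
  white     = c(10, 20),
  black     = c(30, 40),
  stringsAsFactors = FALSE
)

subgroup_cols <- c("white", "black")

# Stack each subgroup column into rows: one row per school-subgroup pair.
# The column names `subgroup` and `n_students` are assumptions.
enr_tidy_toy <- do.call(rbind, lapply(subgroup_cols, function(col) {
  data.frame(
    school_id  = enr_wide_toy$school_id,
    subgroup   = col,
    n_students = enr_wide_toy[[col]],
    stringsAsFactors = FALSE
  )
}))

# Filtering to one demographic is now a plain row filter
enr_tidy_toy[enr_tidy_toy$subgroup == "black", ]
```

In the long shape, adding another year or subgroup adds rows rather than columns, which is why it tends to suit longitudinal comparisons.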
Assessment Data
NJSLA / PARCC (2015-present)
Use fetch_parcc() for both NJSLA (2019+) and PARCC (2015-2018) data:
# Get 2024 Grade 4 Math results
math_g4_2024 <- fetch_parcc(
end_year = 2024,
grade_or_subj = 4,
subj = "math"
)
# Get 2024 Grade 8 ELA results
ela_g8_2024 <- fetch_parcc(
end_year = 2024,
grade_or_subj = 8,
subj = "ela"
)
# Get Algebra 1 results (high school)
alg1_2024 <- fetch_parcc(
end_year = 2024,
grade_or_subj = "ALG1",
subj = "math"
)
Note: 2020 assessments were cancelled due to COVID-19.
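When looping over several assessment years, it may help to drop the cancelled year up front. A minimal sketch, assuming the 2015-2024 range covered by this guide; the variable name is illustrative.

```r
# Assessment years in the 2015-2024 range, minus the cancelled 2020 administration
assessment_years <- setdiff(2015:2024, 2020)
assessment_years
#> [1] 2015 2016 2017 2018 2019 2021 2022 2023 2024
```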
Available Grade Levels
| Subject | PARCC (2015-2018) | NJSLA (2019+) |
|---|---|---|
| ELA | Grades 3-11 | Grades 3-10 |
| Math | Grades 3-8 | Grades 3-8 |
| Math Courses | ALG1, GEO, ALG2 | ALG1, GEO, ALG2 |
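The ELA row of the table can be encoded as a small lookup so that a script can check a grade before requesting it. The helper valid_ela_grades() is hypothetical and not part of njschooldata.

```r
# Hypothetical helper: ELA grades offered for a given end_year,
# following the table above (PARCC 2015-2018, NJSLA 2019+)
valid_ela_grades <- function(end_year) {
  stopifnot(end_year >= 2015)
  if (end_year >= 2019) 3:10 else 3:11
}

# Grade 11 ELA was offered under PARCC but is not listed for NJSLA
11 %in% valid_ela_grades(2018)
#> [1] TRUE
11 %in% valid_ela_grades(2024)
#> [1] FALSE
```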
Legacy Assessments (2004-2014)
For historical data, use fetch_old_nj_assess():
# Get 2010 Grade 5 NJASK results
njask_2010 <- fetch_old_nj_assess(
end_year = 2010,
grade = 5,
tidy = TRUE
)
Graduation Data
Graduation Rates
# Get 4-year graduation rates for 2024
grad_rate_2024 <- fetch_grad_rate(
end_year = 2024,
methodology = "4 year"
)
# Get 5-year graduation rates (available 2012-2019)
grad_rate_5yr <- fetch_grad_rate(
end_year = 2019,
methodology = "5 year"
)
Graduation Counts
# Get graduation counts
grad_count_2024 <- fetch_grad_count(end_year = 2024)
School and District Directories
Get metadata about schools and districts:
# Current school directory with addresses, coordinates, grades served
schools <- get_school_directory()
# Current district directory
districts <- get_district_directory()
# View available columns
names(schools)
#> [1] "county_id" "county_name"
#> [3] "district_id" "district_name"
#> [5] "school_id" "school_name"
#> [7] "princ_title" "princ_first_name"
#> [9] "princ_last_name" "princ_title_2"
#> [11] "princ_email" "address1"
#> [13] "address2" "city"
#> [15] "state" "zip"
#> [17] "mailing_address1" "mailing_address2"
#> [19] "mailing_city" "mailing_state"
#> [21] "mailing_zip" "hib_title1"
#> [23] "hib_first_nname" "hib_last_name"
#> [25] "hib_title2" "homeless_liaison_title1"
#> [27] "homeless_liaison_first_name" "homeless_liaison_last_name"
#> [29] "homeless_liaison_title2" "phone"
#> [31] "pre_k" "kindergarten"
#> [33] "grade_1" "grade_2"
#> [35] "grade_3" "grade_4"
#> [37] "grade_5" "grade_6"
#> [39] "grade_7" "grade_8"
#> [41] "grade_9" "grade_10"
#> [43] "grade_11" "grade_12"
#> [45] "post_grad" "adult_ed"
#> [47] "nces_code" "address"
#> [49] "cds_code"
Data Coverage Summary
| Data Type | Function | Years Available |
|---|---|---|
| Enrollment | fetch_enr() | 2000-2025 |
| NJSLA/PARCC | fetch_parcc() | 2015-2024 |
| NJASK | fetch_old_nj_assess() | 2004-2014 |
| Graduation Rates | fetch_grad_rate() | 2011-2024 |
| Graduation Counts | fetch_grad_count() | 2012-2024 |
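One way to use this table in a script is to encode it as a lookup and check a year before requesting it. The coverage data frame and is_covered() helper below are hypothetical, not package objects, and the ranges ignore gaps inside a span (such as the cancelled 2020 assessments).

```r
# Coverage table from this guide, encoded as a lookup
coverage <- data.frame(
  fn = c("fetch_enr", "fetch_parcc", "fetch_old_nj_assess",
         "fetch_grad_rate", "fetch_grad_count"),
  first_year = c(2000, 2015, 2004, 2011, 2012),
  last_year  = c(2025, 2024, 2014, 2024, 2024),
  stringsAsFactors = FALSE
)

# Hypothetical helper: is end_year inside the listed range for a function?
is_covered <- function(fn, end_year) {
  row <- coverage[coverage$fn == fn, ]
  stopifnot(nrow(row) == 1)  # fail loudly on an unknown function name
  end_year >= row$first_year & end_year <= row$last_year
}

is_covered("fetch_grad_rate", 2010)
#> [1] FALSE
is_covered("fetch_enr", 2024)
#> [1] TRUE
```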
Common Subgroups
When working with tidy data, you’ll encounter these subgroup codes:
Other Demographics
- male, female
- free_lunch, reduced_lunch, free_reduced_lunch
- lep (English learners)
- migrant
- total_enrollment
Note: Special education data is available via
fetch_sped(), not through enrollment subgroups.
Tips and Best Practices
1. Use Session Caching
The package includes built-in session caching to avoid re-downloading:
# Use session cache (avoids re-downloading within same session)
enr_2024 <- fetch_enr(2024, tidy = TRUE, use_cache = TRUE)
# Or save to disk for reuse across sessions
dir.create("data", showWarnings = FALSE) # saveRDS() fails if the folder is missing
saveRDS(enr_2024, "data/enr_2024.rds")
enr_2024 <- readRDS("data/enr_2024.rds")
# Check cache status
njsd_cache_info()
3. Multi-Year Analysis
Use purrr::map_df() for combining multiple years:
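A minimal sketch of the pattern follows. So that it runs offline, it uses a stand-in function, fetch_one_year(), which is hypothetical and returns placeholder rows; in real use it would wrap a package call such as fetch_enr(y, tidy = TRUE). Base R is used for the row-binding here, with the purrr call shown in a comment.

```r
# Stand-in for a fetch function so this sketch runs without network access.
# In real use, replace the body with e.g. fetch_enr(end_year, tidy = TRUE).
# The columns here are placeholders, not the package's actual output.
fetch_one_year <- function(end_year) {
  data.frame(
    end_year   = end_year,
    subgroup   = "total_enrollment",
    n_students = NA_real_,
    stringsAsFactors = FALSE
  )
}

years <- 2022:2024

# Fetch each year, then stack the results into one data frame
enr_multi <- do.call(rbind, lapply(years, fetch_one_year))

# With purrr loaded, the equivalent call is:
# enr_multi <- purrr::map_df(years, fetch_one_year)
```

Note that rbind() requires every year to return the same columns. In recent purrr releases map_df() is marked as superseded in favor of map() followed by list_rbind(), though it still works.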
Next Steps
- See vignette("nj-enrollment-insights") for 15 data stories with visualizations
- See vignette("spr-dictionary") for School Performance Report data
- Visit the package website for full documentation
Getting Help
- File issues: https://github.com/almartin82/njschooldata/issues
- Email: almartin@gmail.com
sessionInfo()
#> R version 4.5.3 (2026-03-11)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] purrr_1.2.1 dplyr_1.2.0 njschooldata_0.9.0
#>
#> loaded via a namespace (and not attached):
#> [1] sass_0.4.10 generics_0.1.4 tidyr_1.3.2 stringi_1.8.7
#> [5] hms_1.1.4 digest_0.6.39 magrittr_2.0.4 evaluate_1.0.5
#> [9] timechange_0.4.0 fastmap_1.2.0 cellranger_1.1.0 jsonlite_2.0.0
#> [13] httr_1.4.8 codetools_0.2-20 textshaping_1.0.5 jquerylib_0.1.4
#> [17] cli_3.6.5 crayon_1.5.3 rlang_1.1.7 bit64_4.6.0-1
#> [21] withr_3.0.2 cachem_1.1.0 yaml_2.3.12 parallel_4.5.3
#> [25] tools_4.5.3 downloader_0.4.1 tzdb_0.5.0 curl_7.0.0
#> [29] vctrs_0.7.2 R6_2.6.1 lifecycle_1.0.5 lubridate_1.9.5
#> [33] snakecase_0.11.1 stringr_1.6.0 bit_4.6.0 fs_2.0.1
#> [37] vroom_1.7.1 ragg_1.5.2 janitor_2.2.1 pkgconfig_2.0.3
#> [41] desc_1.4.3 pkgdown_2.2.0 pillar_1.11.1 bslib_0.10.0
#> [45] glue_1.8.0 systemfonts_1.3.2 xfun_0.57 tibble_3.3.1
#> [49] tidyselect_1.2.1 knitr_1.51 htmltools_0.5.9 rmarkdown_2.31
#> [53] readr_2.2.0 compiler_4.5.3 readxl_1.4.5