Overview
The ctschooldata package provides tools for accessing
Connecticut public school enrollment data from the Connecticut State
Department of Education (CSDE). Enrollment data with 13 demographic
subgroups is available for 2020-2023 via CT attendance datasets on
data.ct.gov.
Installation
# install.packages("remotes")
remotes::install_github("almartin82/ctschooldata")
Enrollment Data (2020-2023)
The enrollment pipeline fetches real data from CT attendance datasets on data.ct.gov. District-level data includes 13 demographic subgroups; school-level data provides total enrollment.
library(ctschooldata)
library(dplyr)
enr_2023 <- fetch_enr(2023, use_cache = TRUE)
cat("Total rows:", nrow(enr_2023), "\n")
#> Total rows: 3150
cat("Entity types:", paste(unique(enr_2023$type), collapse = ", "), "\n")
#> Entity types: State, District, Campus
cat("Subgroups:", length(unique(enr_2023$subgroup)), "\n")
#> Subgroups: 13
# State total enrollment
enr_2023 |>
  filter(is_state, subgroup == "total_enrollment", grade_level == "TOTAL") |>
  select(end_year, type, n_students)
#> end_year type n_students
#> 1 2023 State 494006
# Top 10 districts by enrollment
enr_2023 |>
  filter(is_district, subgroup == "total_enrollment", grade_level == "TOTAL") |>
  arrange(desc(n_students)) |>
  select(district_name, n_students) |>
  head(10)
#> district_name n_students
#> 1 Bridgeport School District 18508
#> 2 Waterbury School District 17786
#> 3 New Haven School District 17776
#> 4 Stamford School District 15938
#> 5 Hartford School District 15448
#> 6 Danbury School District 11925
#> 7 Norwalk School District 11326
#> 8 Connecticut Technical Education and Career System 10949
#> 9 New Britain School District 9367
#> 10 Fairfield School District 9279
School Directory (Working)
The school directory is fully functional and returns real data from CT Open Data.
dir_data <- fetch_directory()
cat("Records:", nrow(dir_data), "\n")
#> Records: 1232
# Top districts by number of schools
dir_data |>
  count(district_name, sort = TRUE) |>
  head(10)
#> # A tibble: 10 × 2
#> district_name n
#> <chr> <int>
#> 1 Hartford School District 42
#> 2 Bridgeport School District 37
#> 3 New Haven School District 36
#> 4 Waterbury School District 30
#> 5 Norwalk School District 23
#> 6 Stamford School District 22
#> 7 Danbury School District 21
#> 8 Connecticut Technical Education and Career System 18
#> 9 Capitol Region Education Council 17
#> 10 Fairfield School District 17
Understanding the Directory Data
| Column | Description |
|---|---|
| `state_school_id` | State organization code (7 characters) |
| `state_district_id` | District code (3 characters) |
| `school_name` | Organization/school name |
| `district_name` | District name |
| `school_type` | Type of organization |
| `grades_served` | Comma-separated list of grades offered |
| `address`, `city`, `zip` | Location |
| `latitude`, `longitude` | Coordinates |
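Because `grades_served` is stored as a comma-separated string, filtering for a single grade needs a split first. The following is a minimal base R sketch that uses a small made-up data frame in place of `fetch_directory()` output. The school names and the grade labels (such as "KG" and "09") are illustrative assumptions, not values confirmed from the dataset, and `serves_grade()` is a hypothetical helper rather than a package function.

```r
# Hypothetical rows standing in for fetch_directory() output
dir_sample <- data.frame(
  school_name   = c("Example Elementary", "Example High"),
  grades_served = c("KG, 01, 02, 03", "09, 10, 11, 12"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string into a trimmed character vector
grades_list <- lapply(strsplit(dir_sample$grades_served, ","), trimws)

# Hypothetical helper: flag rows whose grade list contains `grade`
serves_grade <- function(grades, grade) {
  vapply(grades, function(g) grade %in% g, logical(1))
}

dir_sample$serves_09 <- serves_grade(grades_list, "09")
dir_sample[dir_sample$serves_09, "school_name"]
```

If the real labels differ (for example "9" instead of "09"), only the value passed to `serves_grade()` needs to change.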
Importing EdSight Data
For historical enrollment data (before 2020) or data with grade-level breakdown, manually export from EdSight and import:
# Import manually downloaded EdSight data
enr_2024 <- import_local_enr(
  "~/Downloads/CT_Enrollment_2023-24.xlsx",
  end_year = 2024
)
# View state totals
enr_2024 |>
  filter(is_state, subgroup == "total_enrollment", grade_level == "TOTAL")
# View district totals
enr_2024 |>
  filter(is_district, subgroup == "total_enrollment", grade_level == "TOTAL") |>
  arrange(desc(n_students)) |>
  head(10)
Data Sources
Primary: CT Attendance Datasets (Automated)
- District-level: https://data.ct.gov/resource/he4h-bgqh.json
  - Dataset ID: he4h-bgqh
  - Years: 2020-2023
  - Includes: 13 demographic subgroups, state + district level
- School-level: https://data.ct.gov/resource/vpbj-j9a4.json
  - Dataset ID: vpbj-j9a4
  - Years: 2021-2023
  - Includes: Total enrollment only (no demographics at school level)
Supplemental: EdSight (Manual Export)
- URL: https://public-edsight.ct.gov/Students/Enrollment-Dashboard/Public-School-Enrollment-Export
- Years: 2007-2024
- Includes: Full enrollment counts, demographics, special populations, grade levels
- Limitation: Requires browser interaction (Qlik Sense dashboard)
Directory: CT Open Data Education Directory
- URL: https://data.ct.gov/resource/9k2y-kqxn.json
- Includes: District/school names, organization codes, grade-level offerings
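The `resource/<id>.json` endpoints listed above can also be queried directly, which is handy for inspecting raw field names before they pass through the package. The sketch below only builds a request URL. `soda_url()` is a hypothetical helper, not part of ctschooldata, and the `$limit` query parameter assumes the endpoints follow the usual Socrata (SODA) conventions.

```r
# Hypothetical helper (not part of ctschooldata): build a resource URL
# for a data.ct.gov dataset ID, capped at `limit` rows.
# Assumes the endpoint accepts the Socrata-style `$limit` parameter.
soda_url <- function(dataset_id, limit = 5) {
  sprintf(
    "https://data.ct.gov/resource/%s.json?$limit=%d",
    dataset_id, as.integer(limit)
  )
}

district_url <- soda_url("he4h-bgqh")
district_url

# Fetching needs network access and the jsonlite package:
# raw <- jsonlite::fromJSON(district_url)
# str(raw)
```

Keeping the limit small makes it cheap to check column names and types before pulling a full year.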
Available Functions
| Function | Description | Status |
|---|---|---|
| `fetch_enr(end_year)` | Fetch enrollment for one year | Working (real data, 2020-2023) |
| `fetch_enr_multi(years)` | Fetch enrollment for multiple years | Working (real data, 2020-2023) |
| `fetch_directory()` | School/district directory | Working (real data) |
| `import_local_enr(path, year)` | Import local EdSight export | Working (requires manual download) |
| `get_available_years()` | Get range of available years | Working |
| `tidy_enr(df)` | Convert wide data to tidy format | Working |
| `id_enr_aggs(df)` | Add aggregation level flags | Working |
| `enr_grade_aggs(df)` | Create K-8, HS, K-12 aggregates | Working |
| `clear_cache()` | Clear cached data | Working |
| `cache_status()` | Show cached files | Working |
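A common use of multi-year data is a year-over-year comparison of state totals. The sketch below works on a made-up data frame so it runs without network access. It assumes that `fetch_enr_multi()` returns the same columns as the `fetch_enr()` output shown earlier (`end_year`, `is_state`, `subgroup`, `grade_level`, `n_students`). The counts are placeholders, not CSDE figures.

```r
# Made-up rows mimicking the tidy layout; counts are placeholders only
enr_multi <- data.frame(
  end_year    = c(2021, 2022, 2023, 2023),
  is_state    = c(TRUE, TRUE, TRUE, FALSE),
  subgroup    = "total_enrollment",
  grade_level = "TOTAL",
  n_students  = c(1000, 1100, 1210, 300)
)

# Keep state-level totals only, ordered by year
keep <- enr_multi$is_state &
  enr_multi$subgroup == "total_enrollment" &
  enr_multi$grade_level == "TOTAL"
state_totals <- enr_multi[keep, ]
state_totals <- state_totals[order(state_totals$end_year), ]

# Year-over-year change in students and in percent of the prior year
prior <- c(NA, head(state_totals$n_students, -1))
state_totals$change     <- state_totals$n_students - prior
state_totals$pct_change <- round(100 * state_totals$change / prior, 1)

state_totals[, c("end_year", "n_students", "change", "pct_change")]
```

With real data, replacing `enr_multi` with the result of `fetch_enr_multi(2020:2023)` should be the only change needed, provided the column assumption holds.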
Known Limitations
- No grade-level breakdown: CT attendance datasets provide only TOTAL enrollment, not per-grade counts. The `grade_level` column is always `"TOTAL"`. Grade-level detail requires manual EdSight export via `import_local_enr()`.
- Limited year range (2020-2023): The automated pipeline covers 4 years only. Historical data (2007-2019) requires manual EdSight export.
- "Other races" lumping: CT attendance data combines Asian, Native American, Pacific Islander, and Multiracial into a single `other_races` category. Standard subgroup names `asian`, `native_american`, `pacific_islander`, and `multiracial` are not available.
- School-level demographics: School-level data (dataset `vpbj-j9a4`) provides total enrollment only, with no demographic subgroups at the campus level.
- 2020 has no school data: The 2019-20 year is only available at the district level (dataset `he4h-bgqh`); school-level data starts in 2021.
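The year and level coverage described in these limitations can be encoded as a small lookup, which is useful for guarding a script before it requests data that does not exist. `automated_levels()` is a hypothetical helper written for illustration, not a ctschooldata function. The entity type labels are taken from the `fetch_enr()` output shown earlier.

```r
# Hypothetical helper (not part of ctschooldata): entity types the
# automated pipeline covers for a given end_year, per the limitations above.
automated_levels <- function(end_year) {
  if (end_year >= 2021 && end_year <= 2023) {
    c("State", "District", "Campus")
  } else if (end_year == 2020) {
    c("State", "District")   # no school-level data for 2019-20
  } else {
    character(0)             # outside 2020-2023: manual EdSight export needed
  }
}

automated_levels(2020)
automated_levels(2023)

# Example guard before asking for campus rows
if (!"Campus" %in% automated_levels(2020)) {
  message("School-level enrollment is not available for 2020.")
}
```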
sessionInfo()
#> R version 4.5.0 (2025-04-11)
#> Platform: aarch64-apple-darwin22.6.0
#> Running under: macOS 26.1
#>
#> Matrix products: default
#> BLAS: /opt/homebrew/Cellar/openblas/0.3.30/lib/libopenblasp-r0.3.30.dylib
#> LAPACK: /opt/homebrew/Cellar/r/4.5.0/lib/R/lib/libRlapack.dylib; LAPACK version 3.12.1
#>
#> locale:
#> [1] C.UTF-8/C.UTF-8/C.UTF-8/C/C.UTF-8/C.UTF-8
#>
#> time zone: America/New_York
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] dplyr_1.2.0 ctschooldata_0.1.0 testthat_3.3.1
#>
#> loaded via a namespace (and not attached):
#> [1] vctrs_0.7.1 cli_3.6.5 knitr_1.51 rlang_1.1.7
#> [5] xfun_0.55 otel_0.2.0 generics_0.1.4 textshaping_1.0.4
#> [9] jsonlite_2.0.0 glue_1.8.0 htmltools_0.5.9 ragg_1.5.0
#> [13] sass_0.4.10 rappdirs_0.3.4 brio_1.1.5 rmarkdown_2.30
#> [17] tibble_3.3.1 evaluate_1.0.5 jquerylib_0.1.4 fastmap_1.2.0
#> [21] yaml_2.3.12 lifecycle_1.0.5 compiler_4.5.0 codetools_0.2-20
#> [25] fs_1.6.6 pkgconfig_2.0.3 htmlwidgets_1.6.4 systemfonts_1.3.1
#> [29] digest_0.6.39 R6_2.6.1 utf8_1.2.6 tidyselect_1.2.1
#> [33] pillar_1.11.1 magrittr_2.0.4 bslib_0.9.0 withr_3.0.2
#> [37] tools_4.5.0 pkgdown_2.2.0 cachem_1.1.0 desc_1.4.3