
Overview

The ctschooldata package provides tools for accessing Connecticut public school enrollment data from the Connecticut State Department of Education (CSDE). Automated enrollment data is available for 2020-2023 from CT attendance datasets on data.ct.gov, with 13 demographic subgroups at the district level.

Installation

# install.packages("remotes")
remotes::install_github("almartin82/ctschooldata")

Enrollment Data (2020-2023)

The enrollment pipeline fetches data directly from the CT attendance datasets on data.ct.gov. District-level data includes 13 demographic subgroups; school-level data provides total enrollment only and is not available for 2020.

library(ctschooldata)
library(dplyr)

enr_2023 <- fetch_enr(2023, use_cache = TRUE)
cat("Total rows:", nrow(enr_2023), "\n")
#> Total rows: 3150
cat("Entity types:", paste(unique(enr_2023$type), collapse = ", "), "\n")
#> Entity types: State, District, Campus
cat("Subgroups:", length(unique(enr_2023$subgroup)), "\n")
#> Subgroups: 13
# State total enrollment
enr_2023 |>
  filter(is_state, subgroup == "total_enrollment", grade_level == "TOTAL") |>
  select(end_year, type, n_students)
#>   end_year  type n_students
#> 1     2023 State     494006
# Top 10 districts by enrollment
enr_2023 |>
  filter(is_district, subgroup == "total_enrollment", grade_level == "TOTAL") |>
  arrange(desc(n_students)) |>
  select(district_name, n_students) |>
  head(10)
#>                                        district_name n_students
#> 1                         Bridgeport School District      18508
#> 2                          Waterbury School District      17786
#> 3                          New Haven School District      17776
#> 4                           Stamford School District      15938
#> 5                           Hartford School District      15448
#> 6                            Danbury School District      11925
#> 7                            Norwalk School District      11326
#> 8  Connecticut Technical Education and Career System      10949
#> 9                        New Britain School District       9367
#> 10                         Fairfield School District       9279
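The printed totals also support a quick back-of-the-envelope check. The plain-Python sketch below (Python because the package also ships a Python wrapper, although this snippet does not call it) uses only the numbers shown above to express each of the ten largest districts as a share of the 494,006-student state total. `share_of_state()` is a hypothetical helper written for this illustration.

```python
# Figures copied from the fetch_enr(2023) output above.
STATE_TOTAL = 494006

TOP_10 = {
    "Bridgeport School District": 18508,
    "Waterbury School District": 17786,
    "New Haven School District": 17776,
    "Stamford School District": 15938,
    "Hartford School District": 15448,
    "Danbury School District": 11925,
    "Norwalk School District": 11326,
    "Connecticut Technical Education and Career System": 10949,
    "New Britain School District": 9367,
    "Fairfield School District": 9279,
}


def share_of_state(n_students: int, state_total: int = STATE_TOTAL) -> float:
    """Return an enrollment count as a percentage of the state total."""
    return 100 * n_students / state_total


# One line per district: name, enrollment, share of the state.
for name, n in TOP_10.items():
    print(f"{name:<50} {n:>6} {share_of_state(n):5.2f}%")

combined = sum(TOP_10.values())
print(f"Top 10 combined: {combined} students, {share_of_state(combined):.1f}% of the state")
# Top 10 combined: 138302 students, 28.0% of the state
```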

School Directory (Working)

The school directory is fully functional and returns real data from CT Open Data.

dir_data <- fetch_directory()
cat("Records:", nrow(dir_data), "\n")
#> Records: 1232
# Top districts by number of schools
dir_data |>
  count(district_name, sort = TRUE) |>
  head(10)
#> # A tibble: 10 × 2
#>    district_name                                         n
#>    <chr>                                             <int>
#>  1 Hartford School District                             42
#>  2 Bridgeport School District                           37
#>  3 New Haven School District                            36
#>  4 Waterbury School District                            30
#>  5 Norwalk School District                              23
#>  6 Stamford School District                             22
#>  7 Danbury School District                              21
#>  8 Connecticut Technical Education and Career System    18
#>  9 Capitol Region Education Council                     17
#> 10 Fairfield School District                            17

Understanding the Directory Data

Column               Description
state_school_id      State organization code (7 characters)
state_district_id    District code (3 characters)
school_name          Organization/school name
district_name        District name
school_type          Type of organization
grades_served        Comma-separated list of grades offered
address, city, zip   Location
latitude, longitude  Coordinates
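Because grades_served packs several grades into one comma-separated string, filtering schools by grade means splitting it first. The Python sketch below shows one way to do that. The grade labels it expects ("PK", "K", "1" to "12", with zero-padded forms such as "09" normalised) are assumptions about the directory's format, and `parse_grades()` and `serves_grade()` are hypothetical helpers, not part of ctschooldata or pyctschooldata.

```python
from typing import List, Optional

# Assumed grade labels, in teaching order; the real directory may differ.
GRADE_ORDER = ["PK", "K"] + [str(g) for g in range(1, 13)]


def _normalise(label: str) -> str:
    """Upper-case a label and strip zero padding ("09" -> "9")."""
    label = label.strip().upper()
    if label.isdigit():
        return str(int(label))
    return label


def _sort_key(grade: str):
    # Known grades sort by GRADE_ORDER; unknown labels go last, alphabetically.
    rank = GRADE_ORDER.index(grade) if grade in GRADE_ORDER else len(GRADE_ORDER)
    return (rank, grade)


def parse_grades(grades_served: Optional[str]) -> List[str]:
    """Split a comma-separated grades_served value into ordered, unique grades."""
    if not grades_served:
        return []
    labels = {_normalise(g) for g in grades_served.split(",") if g.strip()}
    return sorted(labels, key=_sort_key)


def serves_grade(grades_served: Optional[str], grade: str) -> bool:
    """Return True if the grades_served string includes `grade`."""
    return _normalise(grade) in parse_grades(grades_served)


print(parse_grades("K, 1, 2, 3, 4, 5, PK"))  # ['PK', 'K', '1', '2', '3', '4', '5']
print(serves_grade("09,10,11,12", "9"))      # True
```

Using a set before sorting also drops duplicate labels, so a value like "K,K,1" yields each grade once.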

Importing EdSight Data

For historical enrollment data (before 2020) or grade-level breakdowns, export the data manually from EdSight and load it with import_local_enr():

# Import manually downloaded EdSight data
enr_2024 <- import_local_enr(
  "~/Downloads/CT_Enrollment_2023-24.xlsx",
  end_year = 2024
)

# View state totals
enr_2024 |>
  filter(is_state, subgroup == "total_enrollment", grade_level == "TOTAL")

# View district totals
enr_2024 |>
  filter(is_district, subgroup == "total_enrollment", grade_level == "TOTAL") |>
  arrange(desc(n_students)) |>
  head(10)
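With grade-level rows available from an EdSight import, enr_grade_aggs() (listed in the function table below) builds K-8, HS, and K-12 aggregates. The Python sketch below illustrates that kind of roll-up on a plain dictionary of per-grade counts. It is not the package's implementation: the grade labels, the output keys, and the choice to leave pre-K out of every bucket are all assumptions, and `grade_aggs()` is a hypothetical helper.

```python
from typing import Dict

# Assumed grade labels; pre-K ("PK") is deliberately left out of every bucket.
K8_GRADES = ["K"] + [str(g) for g in range(1, 9)]
HS_GRADES = [str(g) for g in range(9, 13)]


def grade_aggs(counts: Dict[str, int]) -> Dict[str, int]:
    """Roll per-grade counts up into K-8, high school (9-12), and K-12 totals."""
    k8 = sum(counts.get(g, 0) for g in K8_GRADES)
    hs = sum(counts.get(g, 0) for g in HS_GRADES)
    return {"K8": k8, "HS": hs, "K12": k8 + hs}


# Hypothetical district: 40 pre-K students, 100 per grade K-8, 90 per grade 9-12.
example = {"PK": 40, "K": 100}
example.update({str(g): 100 for g in range(1, 9)})
example.update({str(g): 90 for g in range(9, 13)})

print(grade_aggs(example))  # {'K8': 900, 'HS': 360, 'K12': 1260}
```

Missing grades count as zero, so a K-5 elementary school simply reports HS = 0.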

Data Sources

Primary: CT Attendance Datasets (Automated)

Supplemental: EdSight (Manual Export)

Directory: CT Open Data Education Directory

Available Functions

Function                          Description                          Status
fetch_enr(end_year)               Fetch enrollment for one year        Working (real data, 2020-2023)
fetch_enr_multi(years)            Fetch enrollment for multiple years  Working (real data, 2020-2023)
fetch_directory()                 School/district directory            Working (real data)
import_local_enr(path, end_year)  Import local EdSight export          Working (requires manual download)
get_available_years()             Get range of available years         Working
tidy_enr(df)                      Convert wide data to tidy format     Working
id_enr_aggs(df)                   Add aggregation level flags          Working
enr_grade_aggs(df)                Create K-8, HS, K-12 aggregates      Working
clear_cache()                     Clear cached data                    Working
cache_status()                    Show cached files                    Working
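tidy_enr() is listed as converting wide data to tidy format, which matches the long layout used throughout this page: one row per entity, subgroup, and grade level, with the count in n_students. Here is a Python sketch of that reshaping step, assuming the wide input has one column per subgroup. total_enrollment and other_races are subgroup names used on this page; the district names and counts are hypothetical, and `tidy()` is an illustrative helper, not the package's code.

```python
from typing import Dict, Iterable, List, Sequence

# Columns that identify an entity and are copied onto every long row.
ID_COLS = ("end_year", "district_name")


def tidy(wide_rows: Iterable[Dict], id_cols: Sequence[str] = ID_COLS) -> List[Dict]:
    """Pivot wide rows (one column per subgroup) into long rows.

    Every non-ID column becomes its own row with `subgroup` and `n_students`.
    grade_level is fixed to "TOTAL", mirroring the attendance-based data.
    """
    long_rows = []
    for row in wide_rows:
        ids = {col: row[col] for col in id_cols}
        for col, value in row.items():
            if col in id_cols:
                continue
            long_rows.append(
                {**ids, "subgroup": col, "grade_level": "TOTAL", "n_students": value}
            )
    return long_rows


# Hypothetical wide input: two districts, two subgroup columns each.
wide = [
    {"end_year": 2023, "district_name": "District A", "total_enrollment": 1200, "other_races": 150},
    {"end_year": 2023, "district_name": "District B", "total_enrollment": 800, "other_races": 60},
]

for r in tidy(wide):
    print(r)
```

Keeping the data long means every query on this page reduces to the same filter on subgroup and grade_level, regardless of how many subgroups a year provides.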

Python Support

A Python wrapper is also available:

import pyctschooldata as ct

# Fetch directory data
directory = ct.fetch_directory()

# Fetch enrollment data
enr = ct.fetch_enr(2023)

# Get available years
years = ct.get_available_years()
print(f"Data available: {years['min_year']}-{years['max_year']}")
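Since fetch_enr() only covers the automated 2020-2023 window, it can help to check a requested year against get_available_years() before fetching. The sketch below assumes, as the example above implies, that the result behaves like a dict with min_year and max_year keys. `check_year()` is a hypothetical helper, and the fixed dict stands in for a live call so the snippet runs offline.

```python
from typing import Mapping


def check_year(end_year: int, years: Mapping[str, int]) -> int:
    """Return end_year, or raise ValueError if it is outside the automated range."""
    lo, hi = years["min_year"], years["max_year"]
    if not lo <= end_year <= hi:
        raise ValueError(
            f"{end_year} is outside {lo}-{hi}; "
            "export it from EdSight and load it with import_local_enr() in R"
        )
    return end_year


# Stand-in for ct.get_available_years(), using the range documented on this page.
years = {"min_year": 2020, "max_year": 2023}

print(check_year(2023, years))  # 2023
try:
    check_year(2018, years)
except ValueError as err:
    print(err)
```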

Known Limitations

  1. No grade-level breakdown: CT attendance datasets provide only TOTAL enrollment, not per-grade counts. The grade_level column is always "TOTAL". Grade-level detail requires manual EdSight export via import_local_enr().
  2. Limited year range (2020-2023): The automated pipeline covers only four years. Historical data (2007-2019) requires a manual EdSight export.
  3. “Other races” lumping: CT attendance data combines Asian, Native American, Pacific Islander, and Multiracial into a single other_races category. Standard subgroup names asian, native_american, pacific_islander, and multiracial are not available.
  4. No school-level demographics: School-level data (dataset vpbj-j9a4) provides total enrollment only, with no demographic subgroups at the campus level.
  5. No school-level data for 2020: The 2019-20 school year is available only at the district level (dataset he4h-bgqh); school-level data starts in 2021.
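When comparing CT figures with a source that reports these four groups separately, the other source has to be collapsed to match CT's coarser category, not the reverse. A minimal Python sketch, assuming the other source uses the standard names listed above (asian, native_american, pacific_islander, multiracial). `collapse_other_races()` is a hypothetical helper, and the example counts are made up.

```python
from typing import Dict

# Groups that CT attendance data reports together as other_races.
OTHER_RACES = ("asian", "native_american", "pacific_islander", "multiracial")


def collapse_other_races(counts: Dict[str, int]) -> Dict[str, int]:
    """Sum separately reported groups into a single other_races count."""
    present = [k for k in OTHER_RACES if k in counts]
    if not present:
        return dict(counts)  # nothing to collapse
    if "other_races" in counts:
        raise ValueError("counts already has other_races; summing would double count")
    collapsed = {k: v for k, v in counts.items() if k not in OTHER_RACES}
    collapsed["other_races"] = sum(counts[k] for k in present)
    return collapsed


# Made-up counts from a hypothetical non-CT source.
other_state = {
    "total_enrollment": 1000,
    "asian": 50,
    "native_american": 5,
    "pacific_islander": 2,
    "multiracial": 43,
}
print(collapse_other_races(other_state))
# {'total_enrollment': 1000, 'other_races': 100}
```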
sessionInfo()
#> R version 4.5.0 (2025-04-11)
#> Platform: aarch64-apple-darwin22.6.0
#> Running under: macOS 26.1
#> 
#> Matrix products: default
#> BLAS:   /opt/homebrew/Cellar/openblas/0.3.30/lib/libopenblasp-r0.3.30.dylib 
#> LAPACK: /opt/homebrew/Cellar/r/4.5.0/lib/R/lib/libRlapack.dylib;  LAPACK version 3.12.1
#> 
#> locale:
#> [1] C.UTF-8/C.UTF-8/C.UTF-8/C/C.UTF-8/C.UTF-8
#> 
#> time zone: America/New_York
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] dplyr_1.2.0        ctschooldata_0.1.0 testthat_3.3.1    
#> 
#> loaded via a namespace (and not attached):
#>  [1] vctrs_0.7.1       cli_3.6.5         knitr_1.51        rlang_1.1.7      
#>  [5] xfun_0.55         otel_0.2.0        generics_0.1.4    textshaping_1.0.4
#>  [9] jsonlite_2.0.0    glue_1.8.0        htmltools_0.5.9   ragg_1.5.0       
#> [13] sass_0.4.10       rappdirs_0.3.4    brio_1.1.5        rmarkdown_2.30   
#> [17] tibble_3.3.1      evaluate_1.0.5    jquerylib_0.1.4   fastmap_1.2.0    
#> [21] yaml_2.3.12       lifecycle_1.0.5   compiler_4.5.0    codetools_0.2-20 
#> [25] fs_1.6.6          pkgconfig_2.0.3   htmlwidgets_1.6.4 systemfonts_1.3.1
#> [29] digest_0.6.39     R6_2.6.1          utf8_1.2.6        tidyselect_1.2.1 
#> [33] pillar_1.11.1     magrittr_2.0.4    bslib_0.9.0       withr_3.0.2      
#> [37] tools_4.5.0       pkgdown_2.2.0     cachem_1.1.0      desc_1.4.3