Recover suppressed district graduation rates from school data
Source: R/recover_graduation.R (recover_suppressed_grate.Rd)

When NJ DOE suppresses a district-level graduation rate (it appears as NA in the data), this function attempts to recalculate it from the school-level rows for the same district and subgroup. Recovery is possible only when school-level data for that subgroup is available.
Usage
recover_suppressed_grate(
df,
min_schools = 1,
min_cohort = 10,
log_dir = tempdir()
)

Arguments
- df
Graduation rate data frame containing both school-level and district-level rows. Must include the columns district_id, school_id, subgroup, grad_rate, cohort_count, end_year, is_district, and is_school.
- min_schools
Minimum number of schools with a non-NA rate required to calculate a district rate. Default is 1.
- min_cohort
Minimum total cohort size, summed over the schools used in the calculation. Default is 10.
- log_dir
Directory for log files. Default is tempdir().
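Before calling the function, it can help to confirm that an input frame has every column listed for `df`. The sketch below assumes only that list of names; `missing_grate_cols()` is a hypothetical helper, not part of the package.

```r
# Columns the documentation lists as required for `df`
required_cols <- c("district_id", "school_id", "subgroup", "grad_rate",
                   "cohort_count", "end_year", "is_district", "is_school")

# Hypothetical helper (not part of the package): returns the required
# columns that `df` lacks, in the order listed above
missing_grate_cols <- function(df) {
  setdiff(required_cols, names(df))
}

# Toy frame with only three of the required columns
toy <- data.frame(district_id = "0001", subgroup = "total", grad_rate = NA)
missing_grate_cols(toy)
# "school_id" "cohort_count" "end_year" "is_district" "is_school"
```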
Value
Data frame with district rates recovered where possible. Adds columns:
- `grad_rate_recovered`: TRUE if the rate was recovered from school data
- `grad_rate_original`: the original (suppressed) value before recovery
- `recovered_n_schools`: number of schools used in the calculation
- `recovered_cohort`: total cohort used in the calculation
Details
The function calculates each district rate as the average of the school rates weighted by cohort count, that is, sum(grad_rate * cohort_count) / sum(cohort_count) over the contributing schools. This matches NJ DOE's methodology.
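A small worked example of the cohort-weighted average, using invented numbers for three schools. The manual formula and base R's `weighted.mean()` give the same result.

```r
# Toy numbers for illustration only, not real NJ DOE data
school_rates  <- c(90, 80, 70)   # grad_rate for each school
school_cohort <- c(100, 50, 50)  # cohort_count for each school

# Cohort-weighted district rate: (9000 + 4000 + 3500) / 200
district_rate <- sum(school_rates * school_cohort) / sum(school_cohort)
district_rate  # 82.5

# Same value from base R's weighted.mean()
weighted.mean(school_rates, school_cohort)  # 82.5
```

The larger school pulls the district rate toward its own 90, which is why the result (82.5) sits above the unweighted mean of 80.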
Recovery occurs only when:
- the district-level grad_rate is NA (suppressed),
- at least `min_schools` schools have non-NA rates for that subgroup, and
- the total cohort is at least `min_cohort`.
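The conditions above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: `recover_sketch()` is hypothetical, and it assumes that schools are matched to a district row on district_id, subgroup, and end_year, that only schools with non-NA grad_rate and cohort_count contribute, and that "total cohort" means the summed cohort_count of those schools. It adds the same four columns described under Value but writes no log.

```r
# Hypothetical sketch of the recovery rule (not the package's code)
recover_sketch <- function(df, min_schools = 1, min_cohort = 10) {
  df$grad_rate_recovered <- FALSE
  df$grad_rate_original  <- df$grad_rate
  df$recovered_n_schools <- NA_integer_
  df$recovered_cohort    <- NA_real_

  # Candidates: district rows whose rate is suppressed (NA)
  targets <- which(df$is_district & is.na(df$grad_rate))
  for (i in targets) {
    # Assumed matching: same district, subgroup, and year; usable rate and cohort
    usable <- which(df$is_school &
                      df$district_id == df$district_id[i] &
                      df$subgroup    == df$subgroup[i] &
                      df$end_year    == df$end_year[i] &
                      !is.na(df$grad_rate) & !is.na(df$cohort_count))
    n_schools <- length(usable)
    cohort    <- sum(df$cohort_count[usable])

    if (n_schools > 0 && n_schools >= min_schools && cohort >= min_cohort) {
      # Cohort-weighted average of the school rates
      df$grad_rate[i] <- sum(df$grad_rate[usable] * df$cohort_count[usable]) / cohort
      df$grad_rate_recovered[i] <- TRUE
      df$recovered_n_schools[i] <- n_schools
      df$recovered_cohort[i]    <- cohort
    }
  }
  df
}

# Toy data: one suppressed district row and two schools (invented values)
grate <- data.frame(
  end_year     = 2023,
  district_id  = "0001",
  school_id    = c(NA, "010", "020"),
  subgroup     = "total",
  grad_rate    = c(NA, 90, 75),
  cohort_count = c(NA, 40, 20),
  is_district  = c(TRUE, FALSE, FALSE),
  is_school    = c(FALSE, TRUE, TRUE)
)

out <- recover_sketch(grate)
out$grad_rate[1]  # 85, i.e. (90 * 40 + 75 * 20) / 60

# Raising min_cohort above 60 leaves the district rate suppressed
out_strict <- recover_sketch(grate, min_cohort = 100)
out_strict$grad_rate_recovered[1]  # FALSE
```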
A log file with details of every recovery is written to `log_dir`.