Recover suppressed district graduation rates from school data — recover_suppressed

When NJ DOE suppresses a district-level graduation rate (reported as NA), this function attempts to recalculate it from school-level data. Recovery is possible only when school-level rates are available for the same subgroup.

Usage

recover_suppressed_grate(
  df,
  min_schools = 1,
  min_cohort = 10,
  log_dir = tempdir()
)

Arguments

df: Graduation rate data frame with both school and district level data. Must include columns: district_id, school_id, subgroup, grad_rate, cohort_count, end_year, is_district, is_school
min_schools: Minimum number of schools with non-NA rates required to calculate a district rate. Default is 1.
min_cohort: Minimum total cohort size required. Default is 10.
log_dir: Directory for log files. Default is tempdir().
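
Because `df` must carry a specific set of columns, it can help to check for them before calling the function. The following is a minimal sketch in base R; `required_cols`, `missing_grate_cols()`, and the `toy` data frame are hypothetical names introduced for illustration and are not part of the package.

```r
# Columns listed under the `df` argument above.
required_cols <- c("district_id", "school_id", "subgroup", "grad_rate",
                   "cohort_count", "end_year", "is_district", "is_school")

# Hypothetical helper: return the required columns that `df` lacks.
missing_grate_cols <- function(df) setdiff(required_cols, names(df))

# Toy input with only three of the required columns.
toy <- data.frame(district_id = "0001", school_id = "010", subgroup = "total")
missing_cols <- missing_grate_cols(toy)
missing_cols
```

An empty result means every required column is present; here the five columns not supplied in `toy` are reported.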

Value

Data frame with recovered district rates where possible. Adds columns:

- `grad_rate_recovered`: TRUE if the rate was recovered from school data
- `grad_rate_original`: Original (suppressed) value before recovery
- `recovered_n_schools`: Number of schools used in calculation
- `recovered_cohort`: Total cohort used in calculation

Details

The function calculates each district rate as the cohort-weighted average of its schools' rates: the sum of each school's rate times its cohort count, divided by the total cohort. This matches NJ DOE's methodology.
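
The weighting can be illustrated with base R's `weighted.mean()`. This is a sketch only, not the function's internal code: the `schools` data frame and its values are made up, and the rates are assumed to be stored as proportions (the same arithmetic applies to percentages).

```r
# Hypothetical school-level rows for one district, subgroup, and year.
schools <- data.frame(
  school_id    = c("010", "020", "030"),
  grad_rate    = c(0.90, 0.80, 0.70),   # assumed scale: proportions
  cohort_count = c(100, 50, 50)
)

# Cohort-weighted average: sum(rate * cohort) / sum(cohort)
district_rate <- weighted.mean(schools$grad_rate, w = schools$cohort_count)
district_rate
# 0.825
```

The unweighted mean of the three rates would be 0.80; weighting by cohort pulls the district rate toward the largest school.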

Recovery only occurs when:

- District-level grad_rate is NA (suppressed)
- At least `min_schools` schools have non-NA rates for that subgroup
- Total cohort is at least `min_cohort`
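
The recovery conditions can be sketched for a single district, subgroup, and year as follows. `recover_one()` is a hypothetical helper written for illustration, not the package's implementation. It assumes the total cohort counts only schools with a non-NA rate and cohort count, which the documentation does not specify.

```r
# Hypothetical helper (not the package's code): try to recover one district rate.
recover_one <- function(school_rates, school_cohorts, district_rate,
                        min_schools = 1, min_cohort = 10) {
  # Condition 1: only suppressed (NA) district rates are candidates.
  if (!is.na(district_rate)) return(district_rate)

  # Assumption: use only schools with both a rate and a cohort count.
  ok <- !is.na(school_rates) & !is.na(school_cohorts)
  n_schools <- sum(ok)
  total_cohort <- sum(school_cohorts[ok])

  # Conditions 2 and 3: enough schools and a large enough combined cohort.
  if (n_schools < min_schools || total_cohort < min_cohort) return(NA_real_)

  weighted.mean(school_rates[ok], w = school_cohorts[ok])
}

recovered   <- recover_one(c(0.9, NA, 0.7), c(30, 5, 10), district_rate = NA)
too_small   <- recover_one(c(0.9, NA), c(6, 5), district_rate = NA)
not_missing <- recover_one(c(0.9), c(100), district_rate = 0.88)
```

In this sketch `recovered` is computed from the two schools with usable rates, `too_small` stays NA because the usable cohort of 6 is below `min_cohort`, and `not_missing` keeps the published district rate unchanged.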

A log file with details of all recoveries is written to `log_dir`.