Skip to contents

Compares district-level graduation rates against weighted averages calculated from school-level data. Flags discrepancies that exceed a threshold, which may indicate data quality issues in the source files.

Usage

validate_grate_aggregation(df, tolerance = 2, log_dir = tempdir())

Arguments

df

Graduation rate data frame with both school and district level data

tolerance

Maximum allowed difference (in percentage points) between reported district rate and calculated rate. Default is 2 (2 percentage points).

log_dir

Directory for log files. Default is tempdir().

Value

Data frame with validation columns added: - `calculated_from_schools`: Rate calculated from school-level data - `rate_discrepancy_pp`: Difference in percentage points - `aggregation_flag`: "OK", "DISCREPANCY", "MISSING_SCHOOL_DATA", or "SUPPRESSED"

Details

This function helps identify potential data quality issues where: - District totals don't match sum of school data - Rates are mathematically inconsistent - Data may have been incorrectly entered or processed

A detailed log file is written documenting all discrepancies found.