Key Findings
Calibrated predictions work
The best Y1 Brier score is 0.188, against a base-rate baseline of 0.209. Platt-calibrated XGBoost produces well-calibrated probabilities across race and gender subgroups.
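As a rough illustration of how a Brier score compares with a base-rate baseline, the sketch below computes both on toy data. The function names and the data are hypothetical and not taken from the project's code. It also assumes the baseline is a constant forecast equal to the observed prevalence, which this page does not state.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def base_rate_brier(y_true):
    """Brier score of a constant forecast equal to the observed prevalence."""
    rate = sum(y_true) / len(y_true)
    return brier_score(y_true, [rate] * len(y_true))


# Toy data (hypothetical, not from the study).
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_prob = [0.1, 0.2, 0.7, 0.3, 0.6, 0.2, 0.4, 0.8]

model_bs = brier_score(y_true, y_prob)
baseline_bs = base_rate_brier(y_true)

# Brier skill score: fraction of the baseline error removed by the model.
skill = 1 - model_bs / baseline_bs
```

Applied to the figures reported above, the same skill ratio would be 1 - 0.188/0.209, or roughly 0.10.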
Model performance
Fairness gaps are threshold-dependent
The race gap in false positive rate (FPR) is 0.002 at threshold t=0.5 but peaks at 0.116 near t=0.25. The choice of threshold changes who bears the errors.
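A threshold sweep of this kind might look like the minimal sketch below, which computes the absolute FPR difference between two groups at each threshold. The helper names, the two-group setup, and the toy data are assumptions made for illustration; the project's own sweep may differ.

```python
def false_positive_rate(y_true, y_prob, threshold):
    """Share of true negatives flagged positive at the given threshold."""
    negatives = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not negatives:
        raise ValueError("no negative examples in this group")
    return sum(p >= threshold for p in negatives) / len(negatives)


def fpr_gap_sweep(groups, thresholds):
    """Map each threshold to the absolute FPR difference between two groups.

    `groups` is a dict with exactly two entries: name -> (y_true, y_prob).
    """
    (a_true, a_prob), (b_true, b_prob) = groups.values()
    return {
        t: abs(false_positive_rate(a_true, a_prob, t)
               - false_positive_rate(b_true, b_prob, t))
        for t in thresholds
    }


# Toy data (hypothetical): labels and predicted probabilities per group.
groups = {
    "group_a": ([0, 0, 0, 0, 1, 1], [0.10, 0.20, 0.30, 0.60, 0.70, 0.90]),
    "group_b": ([0, 0, 0, 0, 1, 1], [0.05, 0.15, 0.20, 0.55, 0.65, 0.80]),
}
gaps = fpr_gap_sweep(groups, [0.25, 0.50])

# Threshold at which the two groups' false positive rates differ most.
worst_threshold = max(gaps, key=gaps.get)
```

In this toy example the gap vanishes at t=0.5 and opens up at t=0.25, which mirrors the qualitative pattern described above without reproducing its numbers.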
Fairness analysis
Gang affiliation is strongest predictor
Gang affiliation has a SHAP importance of 0.163 and is associated with a +19.1 pp rearrest delta. Age 18-22 (+12.2 pp) and supervision risk score are also dominant.
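The page does not define the comparison behind a "pp rearrest delta". One plausible reading, sketched below, is the rearrest rate among people with the feature minus the rate among those without it, expressed in percentage points. The function name, that definition, and the toy data are all assumptions.

```python
def rearrest_delta_pp(has_feature, rearrested):
    """Rearrest-rate difference (with feature minus without), in percentage points."""
    with_f = [r for f, r in zip(has_feature, rearrested) if f]
    without_f = [r for f, r in zip(has_feature, rearrested) if not f]
    if not with_f or not without_f:
        raise ValueError("both subgroups must be non-empty")
    rate_with = sum(with_f) / len(with_f)
    rate_without = sum(without_f) / len(without_f)
    return 100 * (rate_with - rate_without)


# Toy data (hypothetical): 4 people with the feature, 6 without.
has_feature = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
rearrested = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

delta = rearrest_delta_pp(has_feature, rearrested)
```

A delta computed this way is a raw subgroup difference, not a causal effect, and it need not rank features in the same order as SHAP importance.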
Predictive factors
At a Glance
Brier Score by Model and Horizon
Lower is better. Y2/Y3 are conditional on non-rearrest at prior horizons.
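To make the conditioning concrete, the sketch below restricts each horizon's evaluation set to people with no rearrest at earlier horizons, and shows how conditional per-horizon probabilities combine into a cumulative one. The record layout and function names are hypothetical; only the conditioning rule comes from the caption above.

```python
def at_risk(records, horizon):
    """Keep records with no rearrest at any horizon before `horizon` (1-indexed)."""
    return [r for r in records if not any(r["rearrest"][:horizon - 1])]


def cumulative_rearrest_prob(conditional_probs):
    """Combine per-horizon conditional probabilities into P(rearrest by the last horizon)."""
    survive = 1.0
    for p in conditional_probs:
        survive *= 1.0 - p
    return 1.0 - survive


# Toy records (hypothetical): rearrest indicator for Y1, Y2, Y3.
records = [
    {"rearrest": (0, 0, 0)},
    {"rearrest": (1, 0, 0)},
    {"rearrest": (0, 1, 0)},
    {"rearrest": (0, 0, 1)},
    {"rearrest": (0, 0, 0)},
]

n_y1 = len(at_risk(records, 1))  # everyone is at risk in Y1
n_y2 = len(at_risk(records, 2))  # drops those rearrested in Y1
n_y3 = len(at_risk(records, 3))  # drops those rearrested in Y1 or Y2

# Hypothetical conditional probabilities for Y1, Y2, Y3.
p_by_y3 = cumulative_rearrest_prob([0.3, 0.2, 0.1])
```

Because the evaluation sets shrink and change composition at each horizon, Brier scores for Y2 and Y3 are not directly comparable with Y1 scores computed on the full cohort.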
Explore
Model Performance
XGBoost leaderboard, baseline comparisons, seed stability, and COMPAS benchmark.
View models
Fairness
Threshold sweeps, subgroup calibration, and error-rate parity analysis across race and gender.
View fairness
Predictive Factors
SHAP values, feature importance rankings, and subgroup rearrest-rate breakdowns.
View factors
Methods
Data pipeline, feature engineering, model training, calibration, and evaluation methodology.
View methods