Predictive Branch: Findings

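Using 1.6 million public arraignment records (2021–2025), three classifiers (a prior-only dummy baseline, logistic regression, and histogram gradient boosting) estimate conviction likelihood from arraignment-time features. The best model, histogram gradient boosting, reaches AUROC 0.86 on a held-out test set of 271,701 cases.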

Model Performance

3 Classifiers

Metrics Across Classifiers

Evaluated on 271,701 held-out test cases. All models use the same 19-feature set.

Model	Accuracy	AUROC	PR-AUC	Brier	Log Loss
Dummy (prior)	0.5684	0.5000	0.4316	0.2461	0.6854
Logistic Regression	0.7688	0.8465	0.8148	0.1569	0.4783
Hist. Gradient Boosting	0.7861	0.8644	0.8302	0.1496	0.4615
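
The Dummy (prior) row can be checked by hand, because a model that outputs one constant probability has closed-form metrics. The sketch below assumes the test-set positive rate is 0.4316 (inferred from the table: 1 - 0.5684, which also equals the dummy PR-AUC) and, as a simplification, that the constant prediction equals that rate. The function name is illustrative and not part of the original pipeline.

```python
import math


def constant_baseline_metrics(pos_rate: float, predicted_prob: float) -> dict:
    """Metrics for a model that outputs `predicted_prob` for every case.

    pos_rate is the share of positive labels in the evaluation set.
    """
    p, q = pos_rate, predicted_prob
    return {
        # Thresholding a constant at 0.5 labels every case the same way.
        "accuracy": p if q >= 0.5 else 1 - p,
        # All scores are tied, so ranking is no better than chance.
        "auroc": 0.5,
        # With constant scores, average precision equals the positive rate.
        "pr_auc": p,
        # Mean squared error between the constant q and the 0/1 labels.
        "brier": p * (1 - q) ** 2 + (1 - p) * q ** 2,
        # Mean negative log-likelihood of the labels under q.
        "log_loss": -(p * math.log(q) + (1 - p) * math.log(1 - q)),
    }


# Assumed positive rate, inferred from the table (not stated in the source).
metrics = constant_baseline_metrics(pos_rate=0.4316, predicted_prob=0.4316)
for name, value in metrics.items():
    print(f"{name:>9}: {value:.4f}")
```

Under these assumptions accuracy, AUROC, and PR-AUC match the table, while Brier and log loss come out near 0.245 and 0.684, slightly below the reported 0.2461 and 0.6854. A small gap in that direction would be expected if the prior was estimated on the training split rather than the test split, though the source does not say so.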

Calibration

Reliability Curve

A well-calibrated model's predicted probabilities should match observed outcome rates. The chart below compares the model's average prediction in each quantile bin against the actual conviction rate in that bin.

Calibration Curve (Hist. Gradient Boosting)

8 quantile bins. Points close to the diagonal indicate good calibration.
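
As a rough illustration of how a quantile-binned reliability curve is built, the sketch below sorts cases by predicted probability, splits them into 8 equal-count bins, and compares the mean prediction with the observed positive rate in each bin. The function name and the synthetic data are assumptions for illustration; the equal-count split may differ slightly at bin edges from library implementations such as scikit-learn's `calibration_curve(strategy="quantile")`.

```python
import random


def quantile_reliability(y_true, y_prob, n_bins=8):
    """Return (mean predicted prob, observed positive rate, count) per bin."""
    # Sort cases by predicted probability so each bin is a contiguous slice.
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # can only happen when there are fewer cases than bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, observed, len(chunk)))
    return rows


# Synthetic, perfectly calibrated data: each label is drawn with its own
# predicted probability, so points should sit close to the diagonal.
rng = random.Random(0)
probs = [rng.random() for _ in range(80_000)]
labels = [1 if rng.random() < p else 0 for p in probs]

curve = quantile_reliability(labels, probs, n_bins=8)
for mean_pred, observed, count in curve:
    print(f"predicted={mean_pred:.3f}  observed={observed:.3f}  n={count}")
```

Quantile bins hold the same number of cases each, so every point on the curve is estimated with similar precision even when predictions cluster in one part of the range.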

Post-hoc sigmoid (Platt) calibration applied via CalibratedClassifierCV with 3-fold CV.

Subgroup Performance

Fairness Audit

Model performance varies across demographic groups and regions. The charts below show AUROC for each subgroup. Higher AUROC means the model ranks cases more accurately within that group.

AUROC by Race

Ranking quality varies across racial groups

AUROC by Ethnicity

AUROC by Gender

AUROC by Age Group

AUROC by Region

Notable: The model's ranking quality (AUROC) varies substantially across racial groups, from 0.69 (Asian) to 0.77 (White). The NYC vs. non-NYC region split also shows very different positive rates (23% vs. 69%) despite similar AUROC values.
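
To make the distinction between positive rate and within-group AUROC concrete, here is a small sketch that computes both per subgroup. AUROC is computed from ranks (the probability that a random positive outranks a random negative, with ties counted as one half). The group labels "A" and "B" and the toy scores are hypothetical, chosen so the two groups have very different positive rates but identical AUROC.

```python
from collections import defaultdict


def auroc(y_true, y_score):
    """Rank-based AUROC; returns NaN if a group has only one class."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    n_pos = sum(y for _, y in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run of tied scores and give each the average 1-based rank.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2
        rank_sum_pos += avg_rank * sum(y for _, y in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def subgroup_report(groups, y_true, y_score):
    """Case count, positive rate, and AUROC for each group value."""
    buckets = defaultdict(lambda: ([], []))
    for g, y, s in zip(groups, y_true, y_score):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {
        g: {
            "cases": len(ys),
            "positive_rate": sum(ys) / len(ys),
            "auroc": auroc(ys, ss),
        }
        for g, (ys, ss) in buckets.items()
    }


# Hypothetical toy data: group A is mostly negative, group B mostly positive.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_score = [0.1, 0.2, 0.4, 0.3, 0.9, 0.8, 0.6, 0.7]

report = subgroup_report(groups, y_true, y_score)
for g, row in report.items():
    print(g, row)
```

Because AUROC depends only on how positives are ranked relative to negatives within a group, it is insensitive to the group's base rate. Metrics that depend on a fixed threshold or on probability values, such as accuracy and Brier score, are not.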

Full subgroup metrics table

Group	Value	Cases	Positive Rate	Accuracy	AUROC	PR-AUC	Brier

County-Level Performance

61 Counties

Ranking quality varies substantially across the 61 New York counties in the data, with county-level AUROC running from 0.52 to 0.77. Some counties are much easier for the model to predict than others. This chart shows AUROC for every county, sorted from highest to lowest.

AUROC by County

Counties with more cases are shown in a darker color.

Bar color intensity reflects log-scaled case count. Monroe (n=16,629) leads at 0.77; Schoharie (n=693) trails at 0.52.
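
One way such a log-scaled intensity could be computed is sketched below. The function name, the min-max normalization, and the use of the Monroe and Schoharie counts as the scale endpoints are assumptions; the source states only that intensity is log-scaled by case count.

```python
import math


def log_intensity(n_cases: int, n_min: int, n_max: int) -> float:
    """Map a case count to [0, 1] on a log10 scale (0 = lightest, 1 = darkest)."""
    lo, hi = math.log10(n_min), math.log10(n_max)
    if hi == lo:
        return 1.0  # all counties have the same count
    t = (math.log10(n_cases) - lo) / (hi - lo)
    return min(1.0, max(0.0, t))  # clamp counts outside the assumed range


# Endpoints are illustrative: the true smallest and largest county counts
# are not given in the text.
N_MIN, N_MAX = 693, 16_629

for n in (693, 3_000, 16_629):  # 3,000 is a hypothetical mid-sized county
    linear = (n - N_MIN) / (N_MAX - N_MIN)
    print(f"n={n:>6}  log={log_intensity(n, N_MIN, N_MAX):.3f}  linear={linear:.3f}")
```

A log scale spreads small and mid-sized counties across the color range instead of compressing them near the light end, as a linear scale dominated by the largest counties would.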