Frequently Asked Questions

Short answers to common questions about this project, its data, methods, and findings.

What does this project do?
It analyzes public New York State court records to describe patterns in criminal case outcomes. It has two parts: a predictive branch that estimates the likelihood of conviction from arraignment data, and a pretrial branch that examines how pretrial release rates changed around the bail-law amendments of 2022 and 2023.

What is OCA-STAT?
OCA-STAT is a public dataset maintained by the New York State Office of Court Administration (OCA). It contains case-level records of criminal arraignments across the state and is the main data source for the predictive branch.

What does the predictive model estimate?
It estimates the probability that a criminal case will end in conviction rather than in dismissal or acquittal. It uses only information available around the time of arraignment, such as county, charge severity, arrest type, gender, ethnicity, and race. It does not use post-arraignment information such as plea negotiations or evidence.
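
A minimal sketch of the "arraignment-time information only" rule, assuming each case is a dictionary of categorical fields. The field names (`county`, `charge_severity`, `arrest_type`, and so on) and the one-hot encoding are illustrative assumptions, not the project's actual schema or pipeline. Because only the listed fields are read, any other key in a record, such as a hypothetical `plea_offer`, can never reach the model.

```python
from typing import Dict, List

# Hypothetical names for the arraignment-time fields listed above.
ARRAIGNMENT_FIELDS = ["county", "charge_severity", "arrest_type",
                      "gender", "ethnicity", "race"]

def build_vocabulary(records: List[Dict[str, str]]) -> List[str]:
    """Collect every field=value pair seen in training records, sorted for a stable column order."""
    return sorted({f"{field}={rec[field]}"
                   for rec in records for field in ARRAIGNMENT_FIELDS})

def encode(record: Dict[str, str], vocabulary: List[str]) -> List[int]:
    """One-hot encode a record; only ARRAIGNMENT_FIELDS are read, so any other key is ignored."""
    active = {f"{field}={record[field]}" for field in ARRAIGNMENT_FIELDS}
    return [1 if column in active else 0 for column in vocabulary]

# Made-up training records for illustration.
train = [
    {"county": "Kings", "charge_severity": "Felony", "arrest_type": "Custodial",
     "gender": "M", "ethnicity": "Non-Hispanic", "race": "Black"},
    {"county": "Erie", "charge_severity": "Misdemeanor", "arrest_type": "Desk appearance",
     "gender": "F", "ethnicity": "Hispanic", "race": "White"},
]
vocab = build_vocabulary(train)

# A post-arraignment field (hypothetical "plea_offer") has no effect on the features.
with_plea = dict(train[0], plea_offer="yes")
print(encode(with_plea, vocab) == encode(train[0], vocab))  # True
```

Because the vocabulary is built from the training records, a category that first appears at test time encodes as all zeros for its field; this is one simple design choice among several.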

How accurate is the model?
The best model correctly classified about 79% of test cases and achieved an AUROC of 0.86, where 0.5 corresponds to chance and 1.0 to perfect discrimination. In practical terms, it performs substantially better than guessing and captures real patterns, but it is not perfect. Performance also varies across counties (per-county AUROC of 0.52–0.77) and across demographic groups.
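
The two headline metrics can be sketched in a few lines. Here AUROC is computed as the probability that a randomly chosen convicted case receives a higher score than a randomly chosen non-convicted case (ties count as half), and accuracy thresholds scores at 0.5; that threshold is an assumption. The made-up data also shows one way a pooled AUROC can exceed every per-county AUROC: if scores mostly reflect each county's base rate, they rank cases well across counties but not within a single county.

```python
from typing import Sequence

def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """P(random convicted case outscores random non-convicted case); ties count as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Share of cases where (score >= threshold) matches the observed outcome."""
    hits = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return hits / len(labels)

# Made-up data: county A (first four cases) has a low conviction rate, county B
# (last four) a high one, and every case is scored with its county's base rate.
labels = [1, 0, 0, 0, 1, 1, 1, 0]
scores = [0.25] * 4 + [0.75] * 4
print(auroc(labels, scores))          # 0.75 (pooled)
print(auroc(labels[:4], scores[:4]))  # 0.5  (within county A)
print(auroc(labels[4:], scores[4:]))  # 0.5  (within county B)
print(accuracy(labels, scores))       # 0.75
```

The pairwise loop is quadratic in the number of cases and is meant only for illustration; at scale, a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` is the practical choice.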

How does race factor into the model and its findings?
Race is included as a feature in the model. Both conviction rates and model accuracy vary across racial groups. Three independent adjusted-analysis methods indicate that non-White defendants have lower conviction rates than White defendants with similar case profiles.

The Black–White gap is roughly 2.5–2.8 percentage points across all three methods. However, the public data lacks many important variables that could explain these differences, such as evidence strength, attorney quality, and detailed criminal history. See the full race analysis for details.
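
The FAQ does not name the three adjustment methods. As one illustration of what comparing defendants "with similar case profiles" can mean, the sketch below uses exact stratification: it compares Black and White conviction rates within each stratum (a hypothetical grouping of case characteristics) and averages the differences, weighting each stratum by its number of Black defendants. The record layout, strata, and counts are made up; they show how an adjusted gap can be much smaller than the raw gap when groups are spread unevenly across strata. This is not the project's method and does not reproduce its figures.

```python
from collections import defaultdict
from typing import List, Tuple

# Hypothetical record layout: (stratum, race, convicted as 0/1).
Case = Tuple[str, str, int]

def raw_gap(cases: List[Case], group: str = "Black", reference: str = "White") -> float:
    """Unadjusted group-minus-reference conviction-rate gap, in percentage points."""
    conv = {group: 0, reference: 0}
    n = {group: 0, reference: 0}
    for _, race, convicted in cases:
        if race in n:
            conv[race] += convicted
            n[race] += 1
    return 100 * (conv[group] / n[group] - conv[reference] / n[reference])

def stratified_gap(cases: List[Case], group: str = "Black", reference: str = "White") -> float:
    """Within-stratum gaps averaged with weights equal to each stratum's `group` count, in pp."""
    # cells[stratum][race] = [convictions, cases]
    cells = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for stratum, race, convicted in cases:
        cells[stratum][race][0] += convicted
        cells[stratum][race][1] += 1
    weighted_sum, total_weight = 0.0, 0
    for by_race in cells.values():
        if group not in by_race or reference not in by_race:
            continue  # a stratum without both groups offers no like-for-like comparison
        g_conv, g_n = by_race[group]
        r_conv, r_n = by_race[reference]
        weighted_sum += g_n * (g_conv / g_n - r_conv / r_n)
        total_weight += g_n
    if total_weight == 0:
        raise ValueError("no stratum contains both groups")
    return 100 * weighted_sum / total_weight

def cases_for(stratum: str, race: str, convicted: int, total: int) -> List[Case]:
    """Hypothetical helper: expand counts into individual made-up case records."""
    return [(stratum, race, 1)] * convicted + [(stratum, race, 0)] * (total - convicted)

cases = (cases_for("NYC", "Black", 2, 10) + cases_for("NYC", "White", 1, 4)
         + cases_for("Outside NYC", "Black", 3, 4) + cases_for("Outside NYC", "White", 8, 10))
print(round(raw_gap(cases), 1))         # -28.6
print(round(stratified_gap(cases), 1))  # -5.0
```

Exact stratification only works when strata are coarse enough to contain both groups; regression adjustment and matching are common alternatives when they are not.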

What did the pretrial analysis find?
The pretrial branch compares release rates before and after the May 2022 and June 2023 bail-law amendments. The largest shift was in firearm-related charges after May 2022, where release rates dropped 20–32 percentage points more than for other case types. The June 2023 amendment is associated with smaller differences. See the pretrial impact page for interactive charts.
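
A comparison of the form "release rates dropped N percentage points more than for other case types" is a difference-in-differences: the before-to-after change for one group minus the change for a comparison group. A minimal sketch with made-up counts, not the project's figures:

```python
def release_rate(released: int, total: int) -> float:
    """Share of cases released pretrial."""
    return released / total

def diff_in_diff(treated_before: float, treated_after: float,
                 control_before: float, control_after: float) -> float:
    """Change in the treated group's rate minus change in the comparison group's, in pp."""
    return 100 * ((treated_after - treated_before) - (control_after - control_before))

# Made-up counts for illustration only.
firearm_before = release_rate(600, 1000)   # 60% released before the amendment
firearm_after = release_rate(300, 1000)    # 30% after
other_before = release_rate(8000, 10000)   # 80%
other_after = release_rate(7500, 10000)    # 75%

# Negative: firearm release rates fell 25 pp more than for other case types.
print(round(diff_in_diff(firearm_before, firearm_after, other_before, other_after), 1))  # -25.0
```

Reading such a difference as the amendment's effect assumes both groups would otherwise have followed parallel trends; the calculation itself does not check that assumption.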

What are the strongest predictors of conviction?
Geography: Cases in NYC courts have a raw conviction rate of about 25%, compared with about 69% in courts outside NYC. This is the single strongest predictor.

Charge severity: Felonies have a conviction rate of about 66%; violations have a rate of about 25%. Together, these two factors outweigh all other predictors by a wide margin.

Where does the data come from?
Both datasets are public. The predictive branch uses OCA-STAT arraignment records published by the New York State Office of Court Administration. The pretrial branch uses the DCJS/OCA supplemental pretrial release file published by the Division of Criminal Justice Services. No row-level data is committed to the repository; only aggregate analysis outputs are.

What are the main limitations?
  • The predictive model uses only information available at the time of arraignment. It does not include plea negotiations, evidence, or other post-arraignment factors.
  • The pretrial analysis is a before-and-after comparison, not a controlled experiment.
  • Both branches cover New York State only (2021–2025).
  • Model accuracy varies by county and demographic group.
  • About 31% of cases have race recorded as "Unknown."

Can the model be used to make decisions about individual cases?
No. This project is for research use only. It must not be used for charging, plea bargaining, release, sentencing, supervision, or any individual legal decision. The model describes aggregate patterns in public data; it does not and cannot predict individual case outcomes.

Is the code available?
Yes. The full codebase is available under the MIT License. The pipeline is reproducible: data fetching, dataset building, model training, evaluation, and figure generation are all scripted. See the methodology page for technical details.
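
One common way to keep a scripted pipeline reproducible is to run every stage from a single entry point in a fixed order and to derive all randomness from one seed. The sketch below assumes hypothetical stage functions (`fetch_data`, `build_dataset`, `train_model`, `evaluate`) with placeholder bodies; the project's actual scripts, stage names, and seeding strategy may differ.

```python
import random
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]

def fetch_data(ctx: Context) -> None:
    # Hypothetical placeholder for downloading the public files.
    ctx["rows"] = list(range(100))

def build_dataset(ctx: Context) -> None:
    # A private RNG seeded from the context gives the same split on every run.
    rng = random.Random(ctx["seed"])
    rows = list(ctx["rows"])
    rng.shuffle(rows)
    cut = int(0.8 * len(rows))
    ctx["train"], ctx["test"] = rows[:cut], rows[cut:]

def train_model(ctx: Context) -> None:
    # Placeholder "model": the mean of the training rows.
    ctx["model"] = sum(ctx["train"]) / len(ctx["train"])

def evaluate(ctx: Context) -> None:
    # Placeholder metric: mean absolute error of the constant prediction on the test rows.
    test = ctx["test"]
    ctx["metric"] = sum(abs(x - ctx["model"]) for x in test) / len(test)

STAGES: List[Callable[[Context], None]] = [fetch_data, build_dataset, train_model, evaluate]

def run_pipeline(seed: int = 0) -> Context:
    """Run every stage in order against one shared context."""
    ctx: Context = {"seed": seed}
    for stage in STAGES:
        stage(ctx)
    return ctx

if __name__ == "__main__":
    first, second = run_pipeline(seed=42), run_pipeline(seed=42)
    print(first["metric"] == second["metric"])  # True: same seed, same result
```

Using a local `random.Random` instance rather than the global generator keeps each stage's randomness isolated from any other code that draws random numbers.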