Project information

  • Category: Responsible AI / Data Science
  • Project duration: Jan '25 - Mar '25
  • Team size: 3
  • Dataset: NYPD Stop, Question and Frisk (2003-2024)
  • GitHub URL: Stop-and-Frisk Bias Analysis
  • Video Presentation URL: Video URL

Freeze! Is This Model Fair?

Technologies Used: Scikit-learn, Fairness Indicators, Platt Scaling, Random Forest, AdaBoost, Python

  • Objective: Analyzed over two decades of NYPD stop-and-frisk data to detect and mitigate predictive bias in arrest outcomes across sensitive demographics (race, gender, and age).
  • Fairness Interventions: Built a three-stage fairness pipeline: pre-processing with uniform and preferential sampling to mitigate bias in the training data, and post-processing with Platt scaling to recalibrate predictions and enforce equalized odds after model training (the in-processing stage is covered under Modeling below).
  • Modeling: Trained Logistic Regression, Random Forest, and AdaBoost classifiers with in-processing fairness strategies, and evaluated each model on balanced accuracy and demographic parity.
  • Results: The Random Forest with post-processing was the best performer, raising balanced accuracy from 0.68 to 0.71 while substantially narrowing true-positive and false-positive rate (TPR/FPR) gaps across demographic groups.
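
The resampling idea behind the pre-processing stage can be sketched as follows. This is an illustrative sketch, not the project's code: it implements uniform sampling (resizing each group/label cell to the count expected if group and label were independent); preferential sampling additionally ranks borderline instances with a scorer when choosing which rows to duplicate or drop. Function names and the behavior shown are assumptions.

```python
import numpy as np

def uniform_group_sizes(group, label):
    """Expected count for each (group, label) cell if group and
    label were statistically independent: n_g * n_y / n."""
    n = len(group)
    sizes = {}
    for g in np.unique(group):
        for y in np.unique(label):
            sizes[(g, y)] = int(round((group == g).sum() * (label == y).sum() / n))
    return sizes

def preferential_sample(group, label, rng=None):
    """Resample row indices so each (group, label) cell hits its
    independence-expected size: over-represented cells are subsampled,
    under-represented cells are sampled with replacement."""
    rng = np.random.default_rng(rng)
    target = uniform_group_sizes(group, label)
    idx = []
    for (g, y), k in target.items():
        cell = np.where((group == g) & (label == y))[0]
        idx.append(rng.choice(cell, size=k, replace=k > len(cell)))
    return np.concatenate(idx)
```

After resampling, every group carries the same positive-label rate, which removes the group/label correlation a classifier would otherwise learn from the raw data.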
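
The post-processing stage can be illustrated with a minimal NumPy sketch: Platt scaling fits a sigmoid to raw classifier scores, and a per-group threshold search matches true-positive rates across groups (equal opportunity, a common relaxation of full equalized odds). Function names, hyperparameters, and the target TPR here are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np

def platt_scale(scores, labels, lr=0.1, n_iter=2000):
    """Platt scaling: fit p = sigmoid(a*score + b) by gradient
    descent on the log-loss of held-out scores."""
    a, b = 1.0, 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(a * scores + b)))
        grad = p - labels  # d(log-loss)/d(logit)
        a -= lr * np.mean(grad * scores)
        b -= lr * np.mean(grad)
    return a, b

def equal_opportunity_thresholds(scores, labels, group, target_tpr=0.8):
    """One decision threshold per group, chosen so each group's TPR
    lands near target_tpr (assumes every group has positives)."""
    thresholds = {}
    for g in np.unique(group):
        pos = scores[(group == g) & (labels == 1)]
        # The (1 - target_tpr) quantile of positive-class scores
        # passes roughly target_tpr of that group's positives.
        thresholds[g] = np.quantile(pos, 1.0 - target_tpr)
    return thresholds
```

Full equalized odds would constrain FPR as well as TPR, which in general requires randomized decisions near the threshold; the quantile trick above equalizes only the TPR side.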
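
The evaluation quantities named above — balanced accuracy, demographic parity, and per-group TPR/FPR gaps — reduce to a few lines of NumPy. A minimal sketch (function names are illustrative, not the project's code):

```python
import numpy as np

def rates(y_true, y_pred):
    """(TPR, FPR) for binary labels and binary predictions."""
    return y_pred[y_true == 1].mean(), y_pred[y_true == 0].mean()

def balanced_accuracy(y_true, y_pred):
    """Mean of TPR and TNR; robust to class imbalance."""
    tpr, fpr = rates(y_true, y_pred)
    return 0.5 * (tpr + (1.0 - fpr))

def demographic_parity_diff(y_pred, group):
    """Largest gap in positive-prediction rate across groups."""
    r = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(r) - min(r)

def equalized_odds_diff(y_true, y_pred, group):
    """Largest per-group gap in either TPR or FPR."""
    tprs, fprs = [], []
    for g in np.unique(group):
        t, f = rates(y_true[group == g], y_pred[group == g])
        tprs.append(t)
        fprs.append(f)
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))
```

Both fairness gaps are 0 for a perfectly fair model; reporting them alongside balanced accuracy makes the accuracy/fairness trade-off of each pipeline stage visible.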