Portfolio Details - Diabetes Risk Prediction

Diabetes Risk Prediction

Data Engineering: Cleaned and preprocessed the large CDC Diabetes Health Indicators dataset. Addressed significant class imbalance with oversampling so that the minority class (diabetes/prediabetes) was adequately represented.
Feature Selection: Conducted multicollinearity analysis and used Random Forest Gini importance to reduce 21 variables to 7 key health indicators, lowering model complexity.
Model Development: Implemented and compared Logistic Regression, SVM, Random Forest, and KNN classifiers. Tuned hyperparameters with GridSearchCV, using cross-validation to limit overfitting.
Performance: Achieved 88.79% accuracy and 89.24% precision. The model provides a reliable framework for early screening based on behavioral and health metrics.
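
The four steps above can be sketched end to end with scikit-learn. This is a minimal illustration under stated assumptions, not the project's actual code: the data is a synthetic stand-in generated with make_classification (21 features, roughly 15% positive), random oversampling via sklearn.utils.resample stands in for whichever oversampling technique was used, only two of the four model families are tuned to keep it short, and the parameter grids and the 75/25 split are hypothetical choices. Its scores will not match the figures reported above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.utils import resample

RANDOM_STATE = 0

# Assumption: synthetic stand-in for the real dataset (21 features, imbalanced).
X, y = make_classification(
    n_samples=2000, n_features=21, n_informative=7, n_redundant=4,
    weights=[0.85, 0.15], random_state=RANDOM_STATE,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=RANDOM_STATE
)

# Step 1: randomly oversample the minority class, on the training split only,
# so the test split keeps the original class distribution.
majority = X_train[y_train == 0]
minority = X_train[y_train == 1]
minority_up = resample(
    minority, replace=True, n_samples=len(majority), random_state=RANDOM_STATE
)
X_bal = np.vstack([majority, minority_up])
y_bal = np.concatenate([
    np.zeros(len(majority), dtype=int),
    np.ones(len(minority_up), dtype=int),
])

# Step 2: rank features by Random Forest Gini importance and keep the top 7.
forest = RandomForestClassifier(n_estimators=100, random_state=RANDOM_STATE)
forest.fit(X_bal, y_bal)
top7 = np.argsort(forest.feature_importances_)[::-1][:7]

# Step 3: tune each candidate with 5-fold cross-validated grid search.
# The grids below are hypothetical; SVM and KNN would slot in the same way.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=RANDOM_STATE),
        {"n_estimators": [50, 100], "max_depth": [5, None]},
    ),
}

# Step 4: score each tuned model on the untouched test split.
results = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
    search.fit(X_bal[:, top7], y_bal)
    y_pred = search.predict(X_test[:, top7])
    results[name] = {
        "best_params": search.best_params_,
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
    }

for name, r in results.items():
    print(f"{name}: accuracy={r['accuracy']:.4f} precision={r['precision']:.4f} "
          f"params={r['best_params']}")
```

One design caveat with this sketch: because oversampling happens before the grid search, duplicated minority rows can appear in both the training and validation folds, so the cross-validated scores tend to be optimistic. The held-out test scores are the ones to compare.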