Data Science | Machine Learning
Loan Default Risk Analysis
A comprehensive analysis of loan default patterns using machine learning to identify high-risk borrowers and recommend data-driven strategies for minimizing defaults.
Tech Stack
Python
SQL
Tableau
Jupyter
GitHub
Key Features
Predictive modeling for default risk
Feature importance analysis
Interactive dashboards
Business recommendations
Data Insights
Distribution of Credit Limits

Most customers have credit limits below $200,000, with a right-skewed distribution.
Payment Behavior Over Time

Customers who delayed payments in earlier months tend to keep paying late in later months and are more likely to default.
Feature Importance Analysis

Key factors affecting loan default risk, identified through Random Forest analysis.
Model Performance Comparison

Comparison of different machine learning models' performance metrics.
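For an imbalanced problem such as default prediction, models are usually compared on precision, recall, and F1 rather than accuracy alone. The sketch below shows how these metrics follow from confusion-matrix counts. The function name `classification_metrics` and the counts are hypothetical and are not results from this project.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard metrics from confusion-matrix counts.

    tp: defaulters correctly flagged, fp: non-defaulters wrongly flagged,
    fn: defaulters missed, tn: non-defaulters correctly cleared.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Hypothetical counts for illustration only
metrics = classification_metrics(tp=60, fp=40, fn=40, tn=860)
```

With these made-up counts the accuracy is high while recall is much lower, which is why recall on the default class matters when the goal is to catch high-risk borrowers.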
Model Performance
Logistic Regression with SMOTE
Random Forest with SMOTE
Key Insights
SMOTE Enhancement
SMOTE (Synthetic Minority Over-sampling Technique) oversampled the default class during training, which increased recall and improved the models' ability to detect high-risk borrowers.
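SMOTE creates synthetic minority-class rows by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a minimal pure-Python sketch of that idea, not the code used in this project. The function name `smote`, its parameters, and the toy data are illustrative assumptions.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic rows from a list of minority-class rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base row (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical toy rows: [credit limit in thousands, months of delay]
defaulters = [[20.0, 2.0], [30.0, 3.0], [50.0, 2.0], [40.0, 4.0]]
new_rows = smote(defaulters, n_new=6)
```

In practice a library implementation such as `SMOTE` from imbalanced-learn would normally be used, with features scaled first and oversampling applied to the training split only so that the test set keeps its real class balance.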
Model Selection
Random Forest outperformed Logistic Regression overall, making it the preferred model for this use case.
Future Development
Fine-tuning hyperparameters and exploring additional models like XGBoost.
Business Recommendations
Early Intervention
Customers with early payment delays (PAY_0, PAY_2) should be flagged for risk monitoring.
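A flagging rule of this kind can be sketched in a few lines. The example assumes that positive values in `PAY_0` and `PAY_2` mean the payment was late by that many months and that zero or negative values mean no delay. The `ID` field, the function name `flag_early_delays`, and the sample records are hypothetical.

```python
def flag_early_delays(customers, columns=("PAY_0", "PAY_2")):
    """Return IDs of customers with a recorded delay in any given column."""
    # Assumption: a value > 0 means the payment was late by that many months
    return [c["ID"] for c in customers
            if any(c[col] > 0 for col in columns)]

# Hypothetical sample records
customers = [
    {"ID": 1, "PAY_0": 2, "PAY_2": 2},
    {"ID": 2, "PAY_0": -1, "PAY_2": 0},
    {"ID": 3, "PAY_0": 0, "PAY_2": 1},
]
flagged = flag_early_delays(customers)  # customers 1 and 3 are flagged
```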
Credit Limit Adjustments
Borrowers with low credit limits and a history of repayment problems should undergo stricter approval checks.
Payment Behavior Monitoring
Implement a real-time alert system for customers showing consistent delayed payments.
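One simple way to define "consistent" delays is a sliding window over each customer's most recent payment statuses. The sketch below is an assumption about how such an alert could work, not a description of an existing system. The class name `DelayMonitor`, the three-payment window, and the customer ID are hypothetical.

```python
from collections import defaultdict, deque

class DelayMonitor:
    """Track recent payment statuses and alert on consecutive delays."""

    def __init__(self, window=3):
        self.window = window
        # One fixed-length history per customer; old entries drop off
        self.history = defaultdict(lambda: deque(maxlen=window))

    def record(self, customer_id, months_late):
        """Store the latest status; return True if an alert should fire."""
        recent = self.history[customer_id]
        recent.append(months_late)
        # Alert only when the window is full and every payment was late
        return len(recent) == self.window and all(m > 0 for m in recent)

monitor = DelayMonitor(window=3)
alerts = [monitor.record("C001", m) for m in [1, 0, 1, 2, 1]]
```

Here the alert fires only on the fifth payment, the first point at which the three most recent payments were all late.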
Automated Risk-Based Interest Rates
Adjust loan interest rates dynamically based on default probability scores.
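As an illustration, a rate could be set as a base rate plus a premium that grows with the model's predicted default probability. The linear form, the function name `risk_based_rate`, and the 8% base rate and 10% maximum premium are assumptions chosen for the example, not values from this analysis.

```python
def risk_based_rate(default_probability, base_rate=0.08, max_premium=0.10):
    """Map a default probability in [0, 1] to an annual interest rate."""
    if not 0.0 <= default_probability <= 1.0:
        raise ValueError("default_probability must be between 0 and 1")
    # Linear risk premium: no premium at p = 0, full premium at p = 1
    return base_rate + max_premium * default_probability

low_risk = risk_based_rate(0.05)
high_risk = risk_based_rate(0.60)
```

A real pricing policy would also need rate caps and a fairness and regulatory review, since a purely linear premium is only the simplest possible choice.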