Data Science | Machine Learning

Loan Default Risk Analysis

A comprehensive analysis of loan default patterns using machine learning to identify high-risk borrowers and recommend data-driven strategies for minimizing defaults.

Tech Stack

Python

SQL

Tableau

Jupyter

GitHub

Key Features

Predictive modeling for default risk

Feature importance analysis

Interactive dashboards

Business recommendations

Data Insights

Distribution of Credit Limits

Most customers have credit limits below $200,000, with a right-skewed distribution.

Payment Behavior Over Time

Customers who delayed payments in earlier months tend to keep paying late and go on to default.

Feature Importance Analysis

Key factors affecting loan default risk, identified through Random Forest analysis.

Model Performance Comparison

Precision, recall, F1 score, and accuracy compared for Logistic Regression and Random Forest, both trained with SMOTE.

Model Performance

Logistic Regression with SMOTE

Precision: 0.93
Recall: 0.95
F1 score: 0.94
Accuracy: 0.89

Random Forest with SMOTE

Precision: 0.97
Recall: 0.91
F1 score: 0.94
Accuracy: 0.90
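
The four metrics reported for each model follow from the counts in a confusion matrix. A minimal sketch of that arithmetic in Python is shown here; the function names and the example counts are illustrative assumptions, not values taken from this project. The last two lines check that the reported precision and recall are consistent with the reported F1 scores.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = classification_metrics(tp=90, fp=10, fn=5, tn=95)

# Consistency check on the reported scores: both round to 0.94.
lr_f1 = round(f1_from(0.93, 0.95), 2)  # Logistic Regression with SMOTE
rf_f1 = round(f1_from(0.97, 0.91), 2)  # Random Forest with SMOTE
```

Because the two models tie on F1, the choice between them depends on whether missed defaulters (recall) or false alarms (precision) cost the business more.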

Key Insights

SMOTE Enhancement

SMOTE (Synthetic Minority Over-sampling Technique) increased recall, improving the models' ability to detect high-risk borrowers.
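
The core idea behind SMOTE is to create synthetic minority-class samples by interpolating between an existing minority sample and one of its nearest minority-class neighbours. The sketch below is a simplified pure-Python illustration of that idea, not the implementation used in this project (the source does not say which library was used). The function name, parameters, and sample points are all hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` within the minority class, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority class (for example, scaled defaulter records).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_points = smote_sketch(minority, n_new=6)
```

Oversampling of this kind is normally applied to the training split only, so that the test set still reflects the real class balance.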

Model Selection

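Random Forest achieved higher precision (0.97 vs. 0.93) and slightly higher accuracy (0.90 vs. 0.89) at the same F1 score of 0.94, while Logistic Regression had higher recall (0.95 vs. 0.91). Random Forest was selected as the preferred model for this use case.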

Future Development

Fine-tuning hyperparameters and exploring additional models like XGBoost.

Business Recommendations

Early Intervention

Customers with early payment delays (PAY_0, PAY_2) should be flagged for risk monitoring.
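
A flagging rule of this kind could be sketched as follows. This is only an illustration under stated assumptions: it assumes that a value above 0 in PAY_0 or PAY_2 means the payment was late by that many months, and the `ID` key, the function name, and the sample records are hypothetical.

```python
def flag_early_delays(customers, columns=("PAY_0", "PAY_2")):
    """Return IDs of customers with a payment delay in any of the given columns.

    Assumption: a value greater than 0 means the payment was late by that
    many months; values of 0 or below mean no delay.
    """
    return [
        c["ID"]
        for c in customers
        if any(c.get(col, 0) > 0 for col in columns)
    ]


# Hypothetical records for illustration only.
customers = [
    {"ID": 1, "PAY_0": 2, "PAY_2": 0},
    {"ID": 2, "PAY_0": -1, "PAY_2": -1},
    {"ID": 3, "PAY_0": 0, "PAY_2": 1},
]
flagged = flag_early_delays(customers)
```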

Credit Limit Adjustments

Borrowers with low credit limits and a history of repayment problems should undergo stricter approval checks.

Payment Behavior Monitoring

Implement a real-time alert system for customers showing consistent delayed payments.
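
One way to make "consistent delayed payments" concrete is to alert once a customer reaches a set number of consecutive late payments. The sketch below tracks that streak as payment statuses arrive. The class name, the threshold of three, and the customer ID are hypothetical, and a status above 0 is assumed to mean a delayed payment.

```python
from collections import defaultdict


class DelayAlertMonitor:
    """Track consecutive delayed payments per customer as events arrive."""

    def __init__(self, min_streak=3):
        self.min_streak = min_streak
        self._streaks = defaultdict(int)  # customer_id -> current late streak

    def record(self, customer_id, status):
        """Record one payment status; return True if an alert should fire."""
        if status > 0:  # assumed encoding: above 0 means a delayed payment
            self._streaks[customer_id] += 1
        else:
            self._streaks[customer_id] = 0  # an on-time payment resets the streak
        return self._streaks[customer_id] >= self.min_streak


monitor = DelayAlertMonitor(min_streak=3)
# Statuses for one hypothetical customer, oldest first.
alerts = [monitor.record("C001", s) for s in [1, 2, 0, 1, 1, 2]]
```

Here the alert fires only on the final event, when the third consecutive delay is recorded.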

Automated Risk-Based Interest Rates

Adjust loan interest rates dynamically based on default probability scores.
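
As a rough illustration, a default probability from the model could be mapped to a rate with a simple linear premium. The function name, the base rate, and the maximum premium below are placeholder assumptions, not figures from this project; a real pricing rule would also need to satisfy regulatory and fairness constraints.

```python
def risk_based_rate(default_probability, base_rate=0.08, max_premium=0.12):
    """Map a default probability in [0, 1] to an annual interest rate.

    The rate rises linearly from base_rate (probability 0) to
    base_rate + max_premium (probability 1). All numbers are placeholders.
    """
    if not 0.0 <= default_probability <= 1.0:
        raise ValueError("default_probability must be between 0 and 1")
    return base_rate + max_premium * default_probability


low_risk = risk_based_rate(0.05)   # borrower with a 5% predicted default risk
high_risk = risk_based_rate(0.60)  # borrower with a 60% predicted default risk
```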