This project analyzed salary patterns across the global data science industry using historical datasets. The goal was to understand how experience, company size, and work flexibility impact compensation, while also building predictive models to forecast salaries. The challenge: how do you translate raw data into insights that are both technically rigorous and strategically valuable?
I served as Data Scientist, leading the end-to-end workflow: data collection, cleaning, exploratory analysis, feature engineering, model development, and interpretation.
I combined technical execution with analytical storytelling, ensuring the results were not just accurate but also actionable for both technical and business stakeholders.
Cleaned, normalized, and engineered features from raw salary datasets
Built and validated predictive models using machine learning algorithms
Translated technical findings into business recommendations and insights
Handling missing values, outliers, and noisy compensation data across different industries, regions, and reporting standards.
Accounting for multi-variable interactions between role, experience, geography, company size, and remote work arrangements.
Ensuring results could be understood and acted upon by non-technical stakeholders while maintaining statistical rigor.
Cleaned and normalized datasets for consistency, handling missing values and removing outliers that could skew model performance.
Engineered features to capture seniority levels, company size categories, and work flexibility metrics for better model interpretability.
Applied exploratory visualizations to uncover initial patterns and validate assumptions about salary distributions.
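The cleaning and feature-engineering steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual pipeline. The field names (`experience`, `company_size`, `remote_ratio`, `salary_usd`), the category codes, the sample records, and the 1.5 × IQR outlier rule are all assumptions made for the example.

```python
from statistics import quantiles

# Hypothetical raw records; field names and values are illustrative only.
RAW = [
    {"experience": "EN", "company_size": "S", "remote_ratio": 0, "salary_usd": 60000},
    {"experience": "MI", "company_size": "M", "remote_ratio": 50, "salary_usd": 80000},
    {"experience": "MI", "company_size": "L", "remote_ratio": 100, "salary_usd": 95000},
    {"experience": "SE", "company_size": "M", "remote_ratio": 0, "salary_usd": 120000},
    {"experience": "SE", "company_size": "L", "remote_ratio": 100, "salary_usd": 150000},
    {"experience": "EX", "company_size": "L", "remote_ratio": 100, "salary_usd": 5000000},
    {"experience": "EN", "company_size": "S", "remote_ratio": 0, "salary_usd": None},
]

# Assumed ordinal encodings for seniority and company size.
SENIORITY = {"EN": 0, "MI": 1, "SE": 2, "EX": 3}
COMPANY_SIZE = {"S": 0, "M": 1, "L": 2}


def drop_missing(rows):
    """Keep only records in which every field has a value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def remove_outliers(rows, key="salary_usd", k=1.5):
    """Drop records whose value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [r[key] for r in rows]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [r for r in rows if low <= r[key] <= high]


def work_mode(remote_ratio):
    """Collapse a 0-100 remote ratio into three flexibility categories."""
    if remote_ratio == 0:
        return "onsite"
    if remote_ratio == 100:
        return "remote"
    return "hybrid"


def engineer(rows):
    """Turn cleaned records into model-ready feature dictionaries."""
    return [
        {
            "seniority": SENIORITY[r["experience"]],
            "company_size": COMPANY_SIZE[r["company_size"]],
            "work_mode": work_mode(r["remote_ratio"]),
            "salary_usd": r["salary_usd"],
        }
        for r in rows
    ]


clean = remove_outliers(drop_missing(RAW))
features = engineer(clean)
```

Here the record with a missing salary and the 5,000,000 outlier are both dropped before encoding. Other outlier rules (for example, percentile clipping) would fit the same structure.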
I trained several models to predict salaries, including Random Forests and baseline regressions. Random Forest was the top performer, scoring highest on every evaluation metric.
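Accuracy: share of all predictions that were correct
Precision: share of predicted positives that were actually positive
Recall: share of actual positives that the model detected
F1 score: harmonic mean of precision and recall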
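For reference, the four metrics above can be computed from true and predicted labels as sketched below. These are classification metrics, so the example assumes the salary target was treated as a binary label (for instance, above or below some threshold). The source does not state how the target was framed, and the labels here are made up for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = "high salary", 0 = "not high salary".
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]
metrics = classification_metrics(y_true, y_pred)
```

With these labels there are 3 true positives, 2 false positives, 1 false negative, and 2 true negatives, so precision is 3/5 and recall is 3/4.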
The Random Forest algorithm was selected for its ability to handle complex feature interactions while maintaining interpretability. The model was trained on engineered features including experience level, company size, job role, and remote work ratio.
Clear salary progression from entry-level to executive positions, with substantial increases at senior and executive tiers.
Larger companies consistently offer higher median salaries across all experience levels and job roles.
Fully remote roles showed compensation on par with, or even above, onsite jobs in many categories.
Career progression yields substantial salary increases, especially at the executive tier. The data shows clear compensation bands that align with industry expectations and provide a roadmap for career growth.
Larger firms consistently set higher salary baselines across all roles, while smaller firms compete through specialized roles, equity packages, and growth opportunities.
Remote jobs are not just competitive; they often pay more than onsite roles, particularly at senior levels. This likely reflects the premium companies place on accessing global talent and the value of flexible work arrangements.
Fully remote positions show median salaries 15-20% higher than onsite roles, indicating a market premium for distributed work capabilities.
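A median premium of this kind is a simple ratio of group medians. The sketch below shows the arithmetic on made-up salary lists. The numbers are hypothetical and are not taken from the project's dataset.

```python
from statistics import median


def median_premium_pct(group, baseline):
    """Percentage by which the median of `group` exceeds the median of `baseline`."""
    return (median(group) / median(baseline) - 1) * 100


# Hypothetical salaries in USD, for illustration only.
onsite = [90000, 100000, 110000]
remote = [105000, 118000, 130000]

premium = round(median_premium_pct(remote, onsite), 1)
```

Comparing medians rather than means keeps a few very high salaries from dominating the result.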
Align pay structures with industry benchmarks to attract and retain top talent. Use data-driven salary bands to ensure competitive positioning.
Action: Implement quarterly salary reviews against market data
Build retention through compensation pathways tied to career growth. Create clear progression models based on experience and performance.
Action: Develop transparent career ladders with salary expectations
Account for regional salary variations when expanding or hiring remotely. Leverage geographic arbitrage while maintaining competitive offers.
Action: Create location-adjusted compensation frameworks
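One simple way to express a location-adjusted framework is a base salary band per level multiplied by a location factor. The sketch below is illustrative only: the band values, the location tiers, and the factors are hypothetical placeholders, not figures from the analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    low: int
    high: int


# Hypothetical base bands (USD) and location multipliers.
BASE_BANDS = {
    "mid": Band(80_000, 110_000),
    "senior": Band(120_000, 160_000),
}
LOCATION_FACTOR = {"high_cost": 1.00, "mid_cost": 0.90, "low_cost": 0.80}


def adjusted_band(level, location):
    """Scale the base band for a level by the factor for a location tier."""
    base = BASE_BANDS[level]
    factor = LOCATION_FACTOR[location]
    return Band(round(base.low * factor), round(base.high * factor))


offer_band = adjusted_band("senior", "mid_cost")
```

In practice the factors would come from the regional salary data itself rather than fixed constants.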
Use predictive models to align offers with market reality and reduce turnover. Optimize hiring budgets based on role requirements and market conditions.
Action: Deploy salary prediction tools in recruitment workflows
This project demonstrated how salary data can be more than a snapshot: it can guide workforce planning, market positioning, and individual career choices.
By blending machine learning with data storytelling, the results moved beyond numbers into strategic insights that inform real-world decisions.