Employee Attrition Prediction

Most ML projects stop at "my model has 97% recall." This one asks: OK, but what does that actually cost? Predicting who will leave is only half the problem. Knowing what each wrong prediction costs your business is the other half.

Algorithms compared

ROC-AUC (LDA): 0.857

Saved vs. the default 0.5 threshold: CHF 82k

Business scenarios: 3

The problem

Employee attrition costs companies between 50% and 200% of the departing employee's annual salary. For a Swiss IT profile, that's roughly CHF 70,000 per person: recruiting, onboarding, lost productivity, and institutional knowledge walking out the door.

The standard approach is to train a model and pick the one with the best accuracy or F1. But that ignores something important: in this problem, a false negative (missing someone who's about to leave) does not cost the same as a false positive (intervening with someone who was staying anyway). The default threshold of 0.5 implicitly treats the two errors as equally costly. They're not.
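A small sketch makes the asymmetry concrete: score predictions by business cost instead of accuracy, then sweep the decision threshold and keep the cheapest one. This is illustrative only, not the project's code. It assumes label 1 means "leaves", that only errors cost money (correct predictions cost nothing), and relative costs at the 3.4:1 FN:FP ratio the project uses for Switzerland. The helper names, the grid from 0.05 to 0.95 in 0.01 steps, and the toy data are hypothetical; `proba` stands for predicted leave probabilities, e.g. column 1 of a scikit-learn classifier's `predict_proba` output.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def total_cost(y_true, y_pred, cost_fn, cost_fp):
    """Error-only business cost of 0/1 predictions (1 = employee leaves).

    Hypothetical helper: missed leavers cost cost_fn each,
    unnecessary interventions cost cost_fp each, everything else is free.
    """
    # labels=[0, 1] fixes the 2x2 layout even if a class is missing
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return fn * cost_fn + fp * cost_fp


def best_threshold(y_true, proba, cost_fn, cost_fp):
    """Sweep cut-offs 0.05..0.95 (step 0.01); return (threshold, cost) with lowest cost."""
    grid = np.round(np.linspace(0.05, 0.95, 91), 2)
    costs = [total_cost(y_true, (proba >= t).astype(int), cost_fn, cost_fp)
             for t in grid]
    i = int(np.argmin(costs))  # ties resolve to the lowest threshold
    return float(grid[i]), float(costs[i])


# Toy data in relative cost units (assumed 3.4:1 FN:FP ratio)
y_true = np.array([0, 0, 0, 0, 1, 1, 1])
proba = np.array([0.12, 0.22, 0.335, 0.62, 0.45, 0.71, 0.88])
print("cost at default 0.5:", total_cost(y_true, (proba >= 0.5).astype(int), 3.4, 1.0))
print("best (threshold, cost):", best_threshold(y_true, proba, 3.4, 1.0))
```

In the toy example the default 0.5 cut-off misses the leaver scored 0.45; the sweep settles on 0.34, which catches that employee without adding a false positive. Choosing the threshold on validation or cross-validated predictions, rather than on the final test set, keeps the reported cost from being optimistically biased.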

What makes this different


Cost-based threshold tuning

Instead of optimizing for recall or F1, I optimized for total business cost. The threshold that minimizes cost isn't 0.5: for the Swiss IT market, it's 0.35.


Swiss market adaptation

The IBM dataset uses US salaries. I recalibrated all cost assumptions using real Swiss IT market data (jobs.ch, SECO 2025) with a scaling factor of 1.287.


3 business scenarios

The "best" model depends on your company. Retaining senior talent? Use threshold 0.12. Budget constraints? Use 0.50. Mixed profiles? 0.25.


Honest limitations

The model predicts whether someone will leave, not when. Overtime is the top predictor, but it may reflect job type rather than being a direct cause of attrition. Both limitations are documented openly.
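Because the business scenarios differ only in the cut-off, one fitted model can serve all of them. The sketch below applies the three thresholds named above (0.12, 0.50, 0.25) to the same predicted leave probabilities. The dictionary keys, the function name, and the example probabilities are hypothetical; in practice `proba` would come from the model's `predict_proba(...)[:, 1]`.

```python
import numpy as np

# Hypothetical mapping of scenario -> decision threshold (values from the text)
SCENARIO_THRESHOLDS = {
    "retention": 0.12,     # senior talent: catch almost every leaver
    "cost_control": 0.50,  # limited budget: few unnecessary interventions
    "balanced": 0.25,      # mixed profiles
}


def flag_for_intervention(proba, scenario):
    """Boolean mask of employees to approach under the given scenario."""
    threshold = SCENARIO_THRESHOLDS[scenario]
    return np.asarray(proba) >= threshold


# Same four predicted probabilities, three different decisions
proba = np.array([0.08, 0.15, 0.30, 0.55])
for name in SCENARIO_THRESHOLDS:
    print(name, int(flag_for_intervention(proba, name).sum()))
# prints: retention 3 / cost_control 1 / balanced 2 (one per line)
```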

The 3 scenarios

One threshold doesn't fit all companies. I defined three scenarios based on different business priorities:

Scenario 1: Retention

Total cost: CHF 1,096k

Threshold 0.12 · Recall 88% · Detects 30 of the 34 employees who left. Best for senior profiles, where losing anyone is expensive.

Scenario 2: Cost Control

Total cost: CHF 1,090k

Threshold 0.50 · Precision 79% · Only 4 unnecessary interventions. Best when the retention budget is limited.

Scenario 3: Balanced (⭐ lowest cost of the three)

Total cost: CHF 1,016k

Threshold 0.25 · Detects 22 of the 34 employees who left · Moderate number of interventions. Best for mixed profiles and general use.
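The scenario figures follow from confusion-matrix counts: recall is TP / (TP + FN) and precision is TP / (TP + FP). A minimal sketch, assuming 34 actual leavers in the evaluation set as the detection counts imply, recovers Scenario 1's 88% recall and gives the recall behind Scenario 3's 22 detections. The function names are illustrative.

```python
def recall(tp, fn):
    """Share of actual leavers the model flags: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """Share of flagged employees who actually leave: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


LEAVERS = 34  # actual leavers in the evaluation set (implied by "x of 34")
for scenario, detected in [("Retention", 30), ("Balanced", 22)]:
    print(f"{scenario}: recall {recall(detected, LEAVERS - detected):.0%}")
# Retention: recall 88%
# Balanced: recall 65%
```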

Tech stack

Python · scikit-learn · LDA · pandas · NumPy · Matplotlib · Seaborn · Jupyter · StandardScaler · Label Encoding · Confusion Matrix

What I learned

The most interesting insight from this project isn't technical: a logistic regression tuned with GridSearchCV (97% recall, cost CHF 1,782k) was dramatically worse than simple threshold tuning on a base LDA model (88% recall, cost CHF 1,008k), costing about CHF 774k more. Better metrics don't mean better business decisions.
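The comparison can be reproduced in spirit on synthetic data. This sketch is an assumption-laden stand-in, not the project's notebook: it uses `make_classification` instead of the IBM data, relative costs at an assumed 3.4:1 FN:FP ratio, a hypothetical search grid over `C` and `class_weight`, and picks the LDA threshold from cross-validated training predictions so the test set is never used for tuning. On synthetic data either approach may come out ahead; the point is that both are scored on the same business cost rather than on recall.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_predict, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

COST_FN, COST_FP = 3.4, 1.0  # relative costs (assumed 3.4:1 FN:FP ratio)


def business_cost(y_true, y_pred):
    """Error-only cost: missed leavers plus unnecessary interventions."""
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return COST_FN * fn + COST_FP * fp


# Synthetic, imbalanced stand-in for the HR data (about 16% positives)
X, y = make_classification(n_samples=1500, n_features=20, weights=[0.84],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

# Approach A: tune hyperparameters for recall, keep the default 0.5 cut-off
search = GridSearchCV(
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    param_grid={  # hypothetical grid
        "logisticregression__C": [0.01, 0.1, 1.0, 10.0],
        "logisticregression__class_weight": [None, "balanced"],
    },
    scoring="recall",
    cv=5,
)
search.fit(X_tr, y_tr)
cost_a = business_cost(y_te, search.predict(X_te))

# Approach B: untuned LDA; cut-off chosen to minimise cost on
# out-of-fold training probabilities
lda = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
oof = cross_val_predict(lda, X_tr, y_tr, cv=5, method="predict_proba")[:, 1]
grid = np.round(np.linspace(0.05, 0.95, 91), 2)
train_costs = [business_cost(y_tr, (oof >= t).astype(int)) for t in grid]
t_best = float(grid[int(np.argmin(train_costs))])

lda.fit(X_tr, y_tr)
cost_b = business_cost(y_te, (lda.predict_proba(X_te)[:, 1] >= t_best).astype(int))

print(f"GridSearchCV LogReg @ 0.50: cost {cost_a:.1f}")
print(f"LDA @ {t_best:.2f}: cost {cost_b:.1f}")
```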

I also learned that adapting a model to a specific market context takes more than changing currency symbols. The ratio between the cost of a false negative and the cost of a false positive fundamentally changes the optimal threshold: it moves from 0.12 under SHRM's 6.5:1 ratio to 0.35 under Switzerland's 3.4:1 ratio.
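Decision theory gives a reference point for why the ratio moves the threshold. Under idealized assumptions (perfectly calibrated probabilities, cost C_FN for a missed leaver, C_FP for an unnecessary intervention, zero cost otherwise), intervening is cheaper in expectation when p · C_FN > (1 − p) · C_FP, i.e. when p > 1 / (1 + C_FN/C_FP). The empirically tuned values above need not match this idealized cut-off, since LDA probabilities are not guaranteed to be calibrated; the sketch only shows the direction of the effect: a lower FN:FP ratio pushes the threshold up. The function name is illustrative.

```python
def break_even_threshold(fn_to_fp_ratio):
    """Probability above which intervening has lower expected cost.

    Assumes calibrated probabilities and error-only costs:
    intervene when p * C_FN > (1 - p) * C_FP, i.e. p > 1 / (1 + ratio).
    """
    return 1.0 / (1.0 + fn_to_fp_ratio)


for label, ratio in [("SHRM 6.5:1", 6.5), ("Swiss 3.4:1", 3.4)]:
    print(f"{label}: {break_even_threshold(ratio):.3f}")
# SHRM 6.5:1: 0.133
# Swiss 3.4:1: 0.227
```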

