Two questions, always: Is the model effective? And is its reasoning acceptable?
| Part | Question we'll answer | Data | Main tool |
|---|---|---|---|
| Part 1 | How far can classic interpretability take us? | Census (tabular) | Tree, coefficients, PCA, SHAP |
| Part 2 | Does the same logic work on text and multiclass? | Emotions (text) | Multiclass SHAP |
| Part 3 | Can you run the workflow alone? | Your dataset | Independent SHAP analysis |
Features are human-readable. When the model highlights marital-status or hours-per-week, we can immediately judge whether the pattern is reasonable — or worth auditing.
              [ROOT NODE]
            feature ≤ value
             /          \
          True          False
   [left child]      [right child]
         |                 |
      [LEAF]            [LEAF]
    class: A           class: B
DecisionTreeClassifier(max_depth=3, random_state=42) — depth 3 to stay readable.
The feature at the root is the strongest first separator in the data.
Readable only because it's shallow. Grow the tree, lose the interpretability.
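A minimal sketch of this setup, using synthetic data as a stand-in for the census table (the feature names here are hypothetical):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the census data (hypothetical features)
X, y = make_classification(n_samples=500, n_features=4, random_state=42)

tree_clf = DecisionTreeClassifier(max_depth=3, random_state=42)
tree_clf.fit(X, y)

# With max_depth=3, the printed rules ARE the model
print(export_text(tree_clf, feature_names=[f"feat_{i}" for i in range(4)]))
```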
Total impurity reduction from each feature's splits, weighted by the number of samples those splits affect, normalized to sum to 1.
Positive → class 1 · negative → class 0.
tree_clf.feature_importances_ · pipeline['logistic_regression'].coef_[0] (with StandardScaler).
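Both attributes can be read directly off fitted models. A sketch on the same kind of synthetic stand-in data (the step name `logistic_regression` mirrors the pipeline above):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=4, random_state=42)

# Impurity-based importances: non-negative, sum to 1
tree_clf = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)
print(tree_clf.feature_importances_)

# Scaling first makes coefficient magnitudes comparable across features
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("logistic_regression", LogisticRegression()),
]).fit(X, y)
coefs = pipeline["logistic_regression"].coef_[0]
print(coefs)  # positive pushes toward class 1, negative toward class 0
```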
 PC2 ▲
     │  ●  ●       ○  ○  ○        ● = class 0 (≤50K)
     │ ●  ●      ○  ○  ○  ○       ○ = class 1 (>50K)
     │● ●●     ○   ○  ○           ↑ loading arrow for feature A
     │  ●     ○  ○  ○  ○          → loading arrow for feature B
     └─────────────────────► PC1
PCA is excellent for exploration. But projection loses information — never a final explanation.
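The information loss is quantifiable: `explained_variance_ratio_` says exactly how much variance the 2-D view keeps. A sketch, again on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# Scale first: PCA directions are driven by variance
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2).fit(X_scaled)
X_2d = pca.transform(X_scaled)

# Fraction of variance each component keeps; the rest is lost in the plot
print(pca.explained_variance_ratio_)
```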
| Model | F1 score | Interpretability |
|---|---|---|
| Decision tree (depth 3) | ~0.75 | Readable rule set |
| Logistic regression | ~0.78 | Coefficients directly inspectable |
| XGBoost | ~0.81 | Not directly inspectable |
Better performance → less readable. No single tree to look at. No coefficient table. Reasoning is distributed across hundreds of trees.
xgb.XGBClassifier(eval_metric='logloss', random_state=42) — 100 trees by default, max_depth=6. This is the model we'll explain with SHAP.
For this prediction, how much did each feature contribute, relative to a baseline?
baseline prediction (expected value)
+ contribution from feature 1
+ contribution from feature 2
+ contribution from feature 3
+ ...
= model output for this person
Think of SHAP as a prediction-decomposition tool. It splits one prediction into additive pieces you can read.
TreeExplainer for XGBoost:

explainer = shap.TreeExplainer(xgb_model, data=X_train, model_output="probability")

data=X_train is the background set — the mean prediction over it becomes the expected value.

shap_values = explainer.shap_values(X_test) → a matrix (n_samples, n_features):

|  | feature 1 | feature 2 | feature 3 |
|---|---|---|---|
| person 1 | +0.12 | -0.03 | +0.01 |
| person 2 | -0.08 | +0.10 | 0.00 |
| person 3 | +0.02 | -0.01 | -0.05 |
Near zero → the feature barely influenced this particular prediction.
The expected value is simply the model's average prediction over the background data — with model_output="probability", E[f(X)] = model.predict_proba(X_background)[:, 1].mean().
model.predict_proba(X_train)[:,1].mean()
≈ 0.24 ← expected_value
(~24% earn >50K in X_train)
expected_value + Σ shap_values[person] = model.predict_proba(person)
High capital-gain → >50K. gender_Female slightly pushes toward ≤50K — worth an audit.
SHAP value  │                      •   •
for age     │         •  •  •  •        color = value of another feature
          0 ┼───────────────────────
            │    •  •  •
            │  •  •
            └───────────────────────
               low  ───────►  high
                  value of age
The force plot tells you why this specific person got this prediction.
| Caveat | What it means |
|---|---|
| Correlated features | Credit may split between twins in a messy way |
| Compute cost | TreeExplainer is fast — other explainers can be slow |
| Not causality | A strong SHAP value is association, not cause |
| Local instability | Similar people can get visibly different explanations |
| Explainer choice | Different explainers behave differently across models |
Use SHAP as a structured way to understand the model — not as absolute truth.
| Aspect | Part 1 (binary) | Part 2 (multiclass text) |
|---|---|---|
| Features | Tabular columns | Words / tokens |
| Task | Binary | 6-class (sadness · joy · fear · anger · surprise · disgust) |
| SHAP output space | Probabilities | Logits (raw class scores) |
| Explanation unit | One per prediction | One per class, per prediction |
Same toolkit — TreeExplainer, shap_values, force plots. We slow down on two new concepts: raw-score (logit) outputs, and one SHAP array per class.
xgb.XGBClassifier(objective='multi:softprob', num_class=6)
raw model scores (logits) ──softmax──► probabilities that sum to 1

| class | logit | probability after softmax |
|---|---|---|
| sadness | 3.1 | 0.75 |
| joy | 0.7 | 0.07 |
| fear | 1.0 | 0.09 |
| anger | 0.2 | 0.04 |
| surprise | -0.3 | 0.03 |
| disgust | -0.6 | 0.02 |
In multiclass, TreeExplainer explains the raw class scores — probabilities come after softmax.
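The softmax step is a one-liner in numpy, using the example logits above (probabilities rounded to two decimals):

```python
import numpy as np

# Raw class scores (logits) from the example
logits = np.array([3.1, 0.7, 1.0, 0.2, -0.3, -0.6])

# Softmax: exponentiate, then normalize so the six scores sum to 1
probs = np.exp(logits) / np.exp(logits).sum()
print(probs.round(2))  # sadness dominates
```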
shap_values
├── class 0: sadness  → array (n_samples, n_features)
├── class 1: joy      → array (n_samples, n_features)
├── class 2: fear     → array (n_samples, n_features)
├── class 3: anger    → array (n_samples, n_features)
├── class 4: surprise → array (n_samples, n_features)
└── class 5: disgust  → array (n_samples, n_features)
shap_values[3][5] — person 5, anger class (older shap: a list of per-class arrays)
shap_values[5, :, 3] — the same thing (newer shap: one array of shape (n_samples, n_features, n_classes))
Words like horrible, ungrateful push the sadness score up. Other words pull gently the other way.
A single strong term (profane) dominates. Calming words like feeling appear on the opposite side.
The point isn't just which words appear — it's which class they support in this explanation.
Fit an XGBClassifier, print a classification_report.
SHAP values are local and associative — one prediction, one person, one contribution.
shap — shap.readthedocs.io · lime · eli5 · captum (PyTorch)