Feature Importance and Interpretation

  • ID: MLPY-F-L08
  • Type: Lesson
  • Audience: Public
  • Theme: From prediction to cautious interpretation

Interpretation Is Not the Same as Prediction

A model can predict well and still be poorly understood.

Interpretation asks different questions:

  • Which features does the model rely on?
  • How sensitive are predictions to each feature?
  • Are relationships stable or dataset-specific?

Interpretation is useful, but it requires caution.

Feature importance is not causality.


Load Data

import pandas as pd

df = pd.read_csv("data/ml-ready/cdi-customer-churn.csv")

X = df.drop(columns="churn")
y = df["churn"]

Train/Test Split

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42,
    stratify=y
)
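The stratify=y argument keeps the class balance consistent across the two splits, which matters for an imbalanced target like churn. A minimal sketch with synthetic labels (hypothetical names and proportions, not the churn data):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for an imbalanced target (~20% positives)
rng = np.random.default_rng(42)
y_demo = pd.Series(rng.choice([0, 1], size=1000, p=[0.8, 0.2]))
X_demo = pd.DataFrame({"x": rng.normal(size=1000)})

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Stratification keeps the positive rate (almost) identical in both splits
print(round(y_tr.mean(), 3), round(y_te.mean(), 3))
```

Without stratify, a small test set can end up with a noticeably different class ratio by chance.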

Define Preprocessing

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import Pipeline

numeric_features = X.select_dtypes(include=["int64", "float64"]).columns
categorical_features = X.select_dtypes(include=["object"]).columns

numeric_transformer = Pipeline(
    steps=[("scaler", StandardScaler())]
)

categorical_transformer = Pipeline(
    steps=[("encoder", OneHotEncoder(handle_unknown="ignore"))]
)

preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numeric_features),
        ("cat", categorical_transformer, categorical_features)
    ]
)
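To see what the ColumnTransformer produces, you can fit it on a tiny frame and inspect the output shape. This is a sketch with made-up values echoing the churn schema, not the real data:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny hypothetical frame: one numeric column, one categorical column
toy = pd.DataFrame({
    "tenure_months": [1, 12, 24],
    "contract_type": ["month-to-month", "annual", "annual"],
})

pre = ColumnTransformer(transformers=[
    ("num", StandardScaler(), ["tenure_months"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["contract_type"]),
])

out = pre.fit_transform(toy)
print(out.shape)  # 3 rows; 1 scaled column + 2 one-hot columns = 3 columns
```

The categorical column expands into one column per category, which is why the transformed matrix is wider than the input.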

Train a Model

For feature importance, tree-based models are convenient because they provide built-in importances.

from sklearn.tree import DecisionTreeClassifier

tree = Pipeline(
    steps=[
        ("preprocessor", preprocessor),
        ("classifier", DecisionTreeClassifier(
            max_depth=4,
            min_samples_leaf=10,
            random_state=42
        ))
    ]
)

tree.fit(X_train, y_train)
Pipeline(steps=[('preprocessor',
                 ColumnTransformer(transformers=[('num',
                                                  Pipeline(steps=[('scaler',
                                                                   StandardScaler())]),
                                                  Index(['tenure_months', 'monthly_spend', 'support_calls'], dtype='str')),
                                                 ('cat',
                                                  Pipeline(steps=[('encoder',
                                                                   OneHotEncoder(handle_unknown='ignore'))]),
                                                  Index(['customer_id', 'contract_type', 'autopay'], dtype='str'))])),
                ('classifier',
                 DecisionTreeClassifier(max_depth=4, min_samples_leaf=10,
                                        random_state=42))])
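Before reading any importances, it is worth confirming that the model actually predicts: importances from a model that barely beats chance are noise. A hedged sketch of that check on synthetic data (in the lesson itself you would score tree on X_test):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

# Synthetic stand-in: the target depends mostly on the first column
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(500, 3))
y_syn = (X_syn[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, random_state=0, stratify=y_syn
)

clf = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10, random_state=0)
clf.fit(X_tr, y_tr)

# Only interpret importances once held-out performance is acceptable
print(round(f1_score(y_te, clf.predict(X_te)), 2))
```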

Built-in Feature Importance (Expanded Features)

Decision trees expose a built-in importance score for each feature, computed from how much the splits on that feature reduce impurity.

Because one-hot encoding expands categorical variables into multiple columns, we first build the expanded feature names.

import numpy as np

ohe = (
    tree.named_steps["preprocessor"]
        .named_transformers_["cat"]
        .named_steps["encoder"]
)

cat_names = ohe.get_feature_names_out(categorical_features)
expanded_feature_names = np.concatenate([numeric_features, cat_names])

len(expanded_feature_names)
648

Nearly all of these 648 columns come from one-hot encoding customer_id. An identifier that is unique per customer carries no generalizable signal, so we should expect most of its columns to receive zero importance.

Now extract importance values from the fitted tree.

importances = tree.named_steps["classifier"].feature_importances_

imp_df = pd.DataFrame({
    "feature": expanded_feature_names,
    "importance": importances
}).sort_values("importance", ascending=False)

imp_df.head(10)
feature importance
1 monthly_spend 0.609083
0 tenure_months 0.201316
643 contract_type_month-to-month 0.136050
2 support_calls 0.053552
435 customer_id_C100544 0.000000
427 customer_id_C100534 0.000000
428 customer_id_C100535 0.000000
429 customer_id_C100536 0.000000
430 customer_id_C100538 0.000000
431 customer_id_C100539 0.000000
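The long tail of exact zeros is expected: a tree with max_depth=4 contains at most 15 internal splits, so at most 15 of the 648 expanded columns can ever receive nonzero importance. A small synthetic sketch of the same effect:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Many hypothetical features, but only the first one drives the label
rng = np.random.default_rng(1)
X_many = rng.normal(size=(400, 50))
y_many = (X_many[:, 0] > 0).astype(int)

shallow = DecisionTreeClassifier(max_depth=4, random_state=1).fit(X_many, y_many)

# A depth-4 binary tree has at most 2**4 - 1 = 15 internal nodes,
# so no more than 15 of the 50 features can get nonzero importance
print(int((shallow.feature_importances_ > 0).sum()))
```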

Plot the top features.

import matplotlib.pyplot as plt

top = imp_df.head(10).iloc[::-1]

fig, ax = plt.subplots()
ax.barh(top["feature"], top["importance"])
ax.set_xlabel("Importance")
ax.set_title("Top Feature Importances (Decision Tree)")
plt.show()


Permutation Importance (Original Features)

Permutation importance answers a different question:

How much does performance drop if we shuffle one input feature?

When we compute permutation importance on a pipeline, the permutation happens on the original input columns, not on expanded one-hot columns.

That means the importance results align with:

X.columns

not with expanded feature names.
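A minimal sketch of that alignment, using a made-up two-column frame (names hypothetical): the pipeline one-hot expands the categorical column internally, yet permutation importance still returns exactly one score per original column.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.inspection import permutation_importance
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Two original columns; only "spend" drives the label by construction
rng = np.random.default_rng(0)
n = 400
X_demo = pd.DataFrame({
    "spend": rng.normal(size=n),
    "plan": rng.choice(["a", "b"], size=n),
})
y_demo = (X_demo["spend"] > 0).astype(int)

pre = ColumnTransformer(transformers=[
    ("num", StandardScaler(), ["spend"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan"]),
])
model = Pipeline(steps=[
    ("preprocessor", pre),
    ("classifier", DecisionTreeClassifier(max_depth=3, random_state=0)),
])
model.fit(X_demo, y_demo)

# Permutation happens at the DataFrame level: one entry per ORIGINAL column,
# even though "plan" was expanded inside the pipeline
result = permutation_importance(model, X_demo, y_demo, n_repeats=5, random_state=0)
print(len(result.importances_mean), X_demo.shape[1])
```

For brevity this sketch scores on the same data it was fit on; in practice, compute permutation importance on held-out data, as the lesson does with X_test.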

from sklearn.inspection import permutation_importance

result = permutation_importance(
    tree,
    X_test,
    y_test,
    n_repeats=10,
    random_state=42,
    scoring="f1"
)

perm_df = pd.DataFrame({
    "feature": X.columns,
    "importance_mean": result.importances_mean,
    "importance_std": result.importances_std
}).sort_values("importance_mean", ascending=False)

perm_df.head(10)
feature importance_mean importance_std
2 monthly_spend 0.051062 0.018469
3 support_calls 0.020419 0.014855
4 contract_type 0.014887 0.007450
1 tenure_months 0.009057 0.011963
0 customer_id 0.000000 0.000000
5 autopay 0.000000 0.000000

Plot the permutation importances.

top_p = perm_df.head(10).iloc[::-1]

fig, ax = plt.subplots()
ax.barh(top_p["feature"], top_p["importance_mean"])
ax.set_xlabel("Mean importance (F1 drop)")
ax.set_title("Top Permutation Importances (Original Features)")
plt.show()


Interpreting Importances Responsibly

Important does not mean causal.

A feature can appear important because:

  • it is correlated with the true driver
  • it encodes historical behavior
  • it captures an operational artifact
  • it is specific to this dataset
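The first of these failure modes is easy to reproduce. In this hedged synthetic sketch, the target is generated from a hidden driver variable that never appears in the feature matrix; a merely correlated proxy still soaks up nearly all of the importance:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(7)
n = 1000
driver = rng.normal(size=n)                # true cause (not a model input)
proxy = driver + 0.3 * rng.normal(size=n)  # observed, merely correlated
noise = rng.normal(size=n)                 # observed, unrelated

X_obs = np.column_stack([proxy, noise])
y_obs = (driver > 0).astype(int)

witness = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_obs, y_obs)

# The proxy looks dominant even though the real cause never entered the data
print(witness.feature_importances_.round(2))
```

High importance here says only that the model leaned on the proxy, not that the proxy causes the outcome.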

Use feature importance as a diagnostic tool.

Not as a causal claim.


Looking Ahead

In the next lesson, we close the free track.

We summarize what you can now do confidently, and what requires deeper study.