Appendix: Troubleshooting and Extra Notes

  • ID: MLPY-APP
  • Type: Lesson
  • Audience: Public
  • Theme: Troubleshooting and extra notes

Common Rendering Issues

NameError: variable not defined

Cause: Each Quarto chapter runs in a fresh Python session.

Solution: Ensure every chapter is self-contained:

  • load the dataset
  • split the data
  • define preprocessing
  • define the model

Do not rely on variables from previous chapters.
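As a sketch, a self-contained chapter preamble might look like the following (using scikit-learn's built-in breast-cancer data as a stand-in for the chapter's own dataset):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# 1. Load the dataset (stand-in for the chapter's own data).
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# 2. Split the data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 3./4. Define preprocessing and the model together in a pipeline.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
```

Because every name the chapter needs is created here, the chapter renders identically whether or not any other chapter ran first.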


ValueError: All arrays must be of the same length

Cause: Mixing expanded one-hot feature names with original column names.

Solution:

  • Built-in tree feature importance → use the expanded one-hot feature names
  • Permutation importance → use the original X.columns

These represent different feature spaces.


ModuleNotFoundError

Cause: Virtual environment not activated.

Solution:

#| label: activate-env
source .venv/bin/activate

Verify installation:

import sklearn
import pandas

print(sklearn.__version__)
print(pandas.__version__)

Reproducibility Notes

To ensure consistent results:

  • Use fixed random_state values
  • Keep train/test splits consistent
  • Avoid modifying datasets mid-analysis
  • Do not evaluate on training data
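For example, a fixed random_state makes a split repeatable across runs (sketched with the built-in iris data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# The same random_state yields the identical split every run,
# so results can be reproduced in a fresh session.
split_a = train_test_split(X, y, test_size=0.3, random_state=42)
split_b = train_test_split(X, y, test_size=0.3, random_state=42)
```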

For more robust validation, consider:

  • Cross-validation
  • Multiple random seeds
  • Version control for datasets
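A minimal sketch of combining the first two ideas, repeating cross-validation under several seeds to check how stable a score really is:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Repeat 5-fold cross-validation under several seeds;
# a small spread between means suggests a stable estimate.
scores = []
for seed in (0, 1, 2):
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    scores.append(cross_val_score(model, X, y, cv=cv).mean())

spread = max(scores) - min(scores)
```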

Model Reuse and Persistence

Applied machine learning rarely ends at evaluation.

In real settings, you often need to reuse a trained pipeline outside the notebook.

Deployment does not automatically mean cloud infrastructure.

A practical first step is being able to:

  • save a trained model or pipeline
  • reload it in a fresh session
  • run predictions reliably on new records

The format matters less than the habit.

Reproducibility and versioning are more important than the specific deployment tool.
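One common pattern, sketched here with joblib (installed alongside scikit-learn), is to dump the fitted pipeline to disk and reload it later:

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)

# Save the fitted pipeline to disk ...
path = os.path.join(tempfile.gettempdir(), "model.joblib")
joblib.dump(pipe, path)

# ... and reload it, as you would in a fresh session.
loaded = joblib.load(path)
new_record = X[:1]  # stands in for a new incoming record
prediction = loaded.predict(new_record)
```

Saving the whole pipeline, not just the model, keeps preprocessing and prediction together, so new records are transformed exactly as the training data was.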


On Metrics and Interpretation

Accuracy is not always sufficient.

Always ask:

  • What error is more costly?
  • False positive or false negative?
  • Does the metric match the operational objective?

Metrics should reflect consequences.
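A small hypothetical example makes the trade-off concrete: with 1 meaning "condition present", recall tracks missed cases (false negatives) and precision tracks false alarms (false positives):

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels: 1 = condition present.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# If missing a case (FN) is the costly error, watch recall;
# if a false alarm (FP) is the costly error, watch precision.
recall = recall_score(y_true, y_pred)        # tp / (tp + fn)
precision = precision_score(y_true, y_pred)  # tp / (tp + fp)
```

Here accuracy is 5/8, yet two of three actual cases were missed; which number matters depends on the cost of each error.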


On Feature Importance

Feature importance indicates influence on prediction within this dataset.

It does not imply:

  • causation
  • mechanism
  • domain-level truth

Treat feature importance as a diagnostic tool.

Not as a claim.


Scaling Up

If you apply this workflow to real-world data, expect additional challenges:

  • Missing data
  • Severe class imbalance
  • High-dimensional features
  • Temporal leakage
  • Deployment constraints

The structured workflow still applies:

Design → Data → Model → Evaluation → Interpretation
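As one illustration, severe class imbalance can often be addressed at the model level; a sketch on synthetic data using scikit-learn's class_weight option:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data with roughly 5% positives.
X = rng.normal(size=(2000, 4))
y = (X[:, 0] + rng.normal(scale=0.5, size=2000) > 1.8).astype(int)

# Stratify so the rare class appears in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# class_weight="balanced" re-weights the rare class during fitting,
# typically trading some false positives for fewer missed positives.
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)

recall_plain = recall_score(y_te, plain.predict(X_te))
recall_weighted = recall_score(y_te, weighted.predict(X_te))
```

The workflow is unchanged: the imbalance is handled inside the Model step, and the Evaluation step uses a metric (recall) that reflects it.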


Final Note

Machine learning is powerful when disciplined.

The workflow you learned here is more important than any single algorithm.

Structure first.
Then complexity.