Machine learning is often presented as a collection of algorithms.
Linear regression.
Logistic regression.
Decision trees.
Neural networks.
But predictive modeling is not a menu of techniques.
It is a structured workflow:
Design → Data → Model → Evaluation → Interpretation
The focus of this guide is not complexity.
The focus is discipline.
This guide introduces the foundations of supervised machine learning:
This is the free foundational track.
Advanced modeling, tuning, ensembles, and deployment belong to the premium track.
This guide does not:
Models produce predictions.
Interpretation requires judgment.
You should be comfortable with:
If not, complete the CDI Data Science Free Track first.
This project uses a virtual environment to isolate dependencies.
#| label: create-environment
bash scripts/setup-env.sh
source .venv/bin/activate
This installs:
#| label: generate-dataset
python scripts/make-cdi-customer-churn.py
The dataset will be saved to:
data/ml-ready/cdi-customer-churn.csv
This dataset will be used throughout the guide.
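Before moving on, it can help to confirm that the generation script actually produced the file. The following is a minimal sketch, not part of the project's scripts: `summarize_csv` is a hypothetical helper, and it assumes the file is a plain comma-separated file with a header row and that you run it from the project root. It uses only the standard library, so it works even before the environment check further down.

```python
import csv
from pathlib import Path

# Path quoted from this guide; relative to the project root.
DATASET_PATH = Path("data/ml-ready/cdi-customer-churn.csv")


def summarize_csv(path):
    """Return (column names, number of data rows) for a CSV file.

    Hypothetical helper for illustration; assumes the first row is a header.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; run the generation script first")
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # first row: column names
        n_rows = sum(1 for _ in reader)    # remaining rows: data records
    return header, n_rows


# Report on the dataset if it exists; otherwise say so without failing.
if DATASET_PATH.is_file():
    columns, n_rows = summarize_csv(DATASET_PATH)
    print(f"{n_rows} rows, {len(columns)} columns")
    print(columns)
else:
    print(f"{DATASET_PATH} not found; run the generation script first")
```

If the row count is zero or the column list looks wrong, regenerate the dataset before continuing.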
import sklearn
import pandas as pd
import numpy as np
sklearn.__version__  # displays the installed version, for example '1.8.0'
If a version number prints, the environment is working correctly.
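A slightly more defensive variant of this check reports each package separately, so a single missing dependency is easy to spot. This is a sketch under assumptions: `report_versions` is a hypothetical helper, and the package list simply mirrors the three imports above. Note that the distribution is named `scikit-learn`, while the import name is `sklearn`.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as installed by pip (assumed from the imports above).
REQUIRED = ("scikit-learn", "pandas", "numpy")


def report_versions(packages=REQUIRED):
    """Return {distribution name: installed version, or None if missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the active environment.
            found[name] = None
    return found


for name, installed in report_versions().items():
    print(f"{name}: {installed or 'NOT INSTALLED'}")
```

If any line reports `NOT INSTALLED`, check that the virtual environment is activated and rerun the setup script.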
To render the Quarto book locally:
#| label: render-book
bash scripts/build-all.sh
The rendered output will appear in:
docs/
Navigation is handled automatically by Quarto.
Do not rush.
Most machine learning errors occur in:
The next lesson begins with the most important skill:
Framing the prediction problem correctly.