Chapter 10: Experiment Tracking with MLflow & DVC🔗
"You can't improve what you don't measure. Experiment tracking is your ML lab notebook."
10.1 The Problem Without Tracking🔗
WITHOUT experiment tracking:
- Tried XGBoost with lr=0.1 last Tuesday... what accuracy did I get?
- Which model did we deploy last month?
- My colleague's model was better — what were their settings?
- We need to reproduce model v3... where's the data?
WITH MLflow:
✅ Every run logged: params, metrics, artifacts
✅ Compare runs visually in the UI
✅ Register and version models
✅ Reproduce any experiment anytime
10.2 What is MLflow?🔗
MLflow is an open-source platform for managing the end-to-end ML lifecycle. It has four core components:
┌───────────────────────────────────────────────────────────┐
│ MLFLOW COMPONENTS │
│ │
│ ┌──────────────────┐ ┌──────────────────────────────┐ │
│ │ MLflow Tracking │ │ MLflow Projects │ │
│ │ │ │ │ │
│ │ Log experiments:│ │ Reproducible runs: │ │
│ │ - Parameters │ │ - MLproject file │ │
│ │ - Metrics │ │ - Conda/Docker env │ │
│ │ - Artifacts │ │ - Run with: mlflow run . │ │
│ └──────────────────┘ └──────────────────────────────┘ │
│ ┌──────────────────┐ ┌──────────────────────────────┐ │
│ │ MLflow Models │ │ MLflow Model Registry │ │
│ │ │ │ │ │
│ │ Standard format │ │ Lifecycle stages: │ │
│ │ for packaging: │ │ None → Staging → Production │ │
│ │ - sklearn │ │ + Version control │ │
│ │ - pytorch │ │ + Annotations │ │
│ │ - tensorflow │ │ │ │
│ └──────────────────┘ └──────────────────────────────┘ │
└───────────────────────────────────────────────────────────┘
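As a concrete example of the Projects component, here is a minimal MLproject file sketch (the entry-point parameters and `train.py` are illustrative, not from this chapter's code):

```yaml
name: churn-prediction

python_env: python_env.yaml   # or conda_env: conda.yaml / docker_env

entry_points:
  main:
    parameters:
      n_estimators: {type: int, default: 200}
      learning_rate: {type: float, default: 0.05}
    command: "python train.py --n-estimators {n_estimators} --learning-rate {learning_rate}"
```

With this file in place, `mlflow run . -P n_estimators=300` executes the entry point in the declared environment, so anyone can reproduce the run from a clean checkout.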
10.3 MLflow Tracking — Logging Experiments🔗
import mlflow
import mlflow.sklearn
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
# Set the tracking server (or use local ./mlruns by default)
mlflow.set_tracking_uri("http://mlflow-server:5000")
mlflow.set_experiment("churn-prediction")
# Start an experiment run
with mlflow.start_run(run_name="GBM-experiment-v3"):
    # ── Log parameters ──────────────────────────────
    params = {
        "n_estimators": 200,
        "learning_rate": 0.05,
        "max_depth": 5,
        "subsample": 0.8,
    }
    mlflow.log_params(params)

    # ── Train the model (X_train, X_test, y_train, y_test
    #    come from the earlier preprocessing step) ────
    model = GradientBoostingClassifier(**params)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_proba = model.predict_proba(X_test)[:, 1]

    # ── Log metrics ─────────────────────────────────
    mlflow.log_metrics({
        "accuracy": accuracy_score(y_test, y_pred),
        "f1_score": f1_score(y_test, y_pred),
        "auc_roc": roc_auc_score(y_test, y_proba),
    })

    # ── Log the model artifact ──────────────────────
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="churn-classifier",
    )

    # ── Log other artifacts ─────────────────────────
    mlflow.log_artifact("reports/confusion_matrix.png")
    mlflow.log_artifact("data/processed/features.csv")

    print("Run ID:", mlflow.active_run().info.run_id)
10.4 MLflow Tracking UI🔗
MLflow UI (http://localhost:5000):
Experiments
└── churn-prediction
├── Run: GBM-v1 params: lr=0.1, n=100 accuracy: 0.82 ← old
├── Run: GBM-v2 params: lr=0.1, n=200 accuracy: 0.87 ← better
└── Run: GBM-v3 params: lr=0.05, n=200 accuracy: 0.89 ← BEST ✓
│
├── Parameters: {n_estimators: 200, lr: 0.05, ...}
├── Metrics: {accuracy: 0.89, f1: 0.87, auc: 0.93}
└── Artifacts: model.pkl, confusion_matrix.png
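To bring this UI up locally, the bundled CLI is enough (the SQLite path in the second variant is an illustrative choice, not required):

```
# Browse runs stored in the local ./mlruns file store
mlflow ui --port 5000

# Or back the UI with a SQLite database instead
mlflow ui --backend-store-uri sqlite:///mlflow.db --port 5000
```

In a team setting you would instead run a shared `mlflow server` (see the Docker Compose setup later in this chapter) and point everyone's clients at it.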
10.5 Model Registry🔗
The MLflow Model Registry provides lifecycle management for production models.
Model Lifecycle:
Training ──▶ None ──▶ Staging ──▶ Production ──▶ Archived
│ │ │
New model Testing Live traffic
(unreviewed) (QA) (serving)
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Register a new model version
# (run_id is the ID printed by the tracking run above)
result = mlflow.register_model(
    model_uri=f"runs:/{run_id}/model",
    name="churn-classifier"
)

# Transition to Staging
client.transition_model_version_stage(
    name="churn-classifier",
    version=result.version,
    stage="Staging"
)

# After testing passes, promote to Production
client.transition_model_version_stage(
    name="churn-classifier",
    version=result.version,
    stage="Production"
)

# Load the Production model from anywhere that can reach the server
model = mlflow.sklearn.load_model("models:/churn-classifier/Production")
prediction = model.predict(new_data)
10.6 MLflow + Jenkins Integration🔗
// In Jenkinsfile: evaluate model and gate deployment
stage('Evaluate & Register Model') {
    steps {
        sh '''
            python -c "
import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Get the best run by accuracy
experiment = client.get_experiment_by_name('churn-prediction')
runs = client.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=['metrics.accuracy DESC'],
    max_results=1
)
best_run = runs[0]
accuracy = best_run.data.metrics['accuracy']
print(f'Best accuracy: {accuracy}')

# Gate: only register if the model clears the threshold
assert accuracy >= 0.85, f'Model accuracy {accuracy} below threshold!'

# Register the model
mlflow.register_model(
    model_uri=f'runs:/{best_run.info.run_id}/model',
    name='churn-classifier'
)
print('Model registered successfully!')
"
        '''
    }
}
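The threshold check inside that pipeline is easier to unit-test if it lives in a small pure function that the Jenkins step simply calls. A sketch (the function name and thresholds are illustrative, not part of MLflow):

```python
def passes_quality_gate(metrics: dict, thresholds: dict) -> tuple[bool, dict]:
    """Compare run metrics against minimum thresholds.

    Returns (ok, failures), where failures maps each failing metric
    to its (actual, required) pair. A missing metric counts as failing.
    """
    failures = {
        name: (metrics.get(name), required)
        for name, required in thresholds.items()
        if metrics.get(name) is None or metrics[name] < required
    }
    return (not failures, failures)

ok, failures = passes_quality_gate(
    {"accuracy": 0.89, "f1_score": 0.87},
    {"accuracy": 0.85, "f1_score": 0.80},
)
print(ok, failures)   # True {}
```

Keeping the gate logic out of the Jenkinsfile also means the same check can run in pre-merge CI, in the registration step, and in local development without duplication.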
10.7 Comparing Experiments🔗
# Compare multiple runs programmatically
import mlflow
import pandas as pd

client = mlflow.tracking.MlflowClient()
experiment = client.get_experiment_by_name("churn-prediction")
runs = client.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=["metrics.accuracy DESC"]
)

# Build a comparison dataframe
comparison = pd.DataFrame([{
    "run_id": r.info.run_id[:8],
    "run_name": r.data.tags.get("mlflow.runName"),
    "accuracy": r.data.metrics.get("accuracy"),
    "f1_score": r.data.metrics.get("f1_score"),
    "n_estimators": r.data.params.get("n_estimators"),
    "learning_rate": r.data.params.get("learning_rate"),
} for r in runs])
print(comparison.to_string(index=False))
# Example output:
#   run_id run_name  accuracy  f1_score  n_estimators  learning_rate
# a3f1bc9d   GBM-v3    0.8900    0.8700           200           0.05  ← BEST
# 7e2d9a4c   GBM-v2    0.8700    0.8500           200           0.10
# 2f4c11b8    RF-v1    0.8200    0.8000           100           None
10.8 MLflow Setup (Docker Compose)🔗
# docker-compose.yml addition for MLflow
mlflow:
image: ghcr.io/mlflow/mlflow:latest
ports:
- "5000:5000"
environment:
- MLFLOW_BACKEND_STORE_URI=postgresql://mlflow:password@db:5432/mlflow
- MLFLOW_ARTIFACT_ROOT=gs://my-project/mlflow-artifacts
command: >
mlflow server
--backend-store-uri postgresql://mlflow:password@db:5432/mlflow
--artifact-root gs://my-project/mlflow-artifacts
--host 0.0.0.0
--port 5000
depends_on:
- db
db:
image: postgres:15
environment:
POSTGRES_USER: mlflow
POSTGRES_PASSWORD: password
POSTGRES_DB: mlflow
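Once the stack is up, clients only need the tracking URI, either via `mlflow.set_tracking_uri()` in code as shown earlier, or, more commonly in CI, via the environment:

```
# Point every MLflow client in this shell at the Compose-hosted server
export MLFLOW_TRACKING_URI=http://localhost:5000
```

With `MLFLOW_TRACKING_URI` set, the training scripts, Jenkins steps, and comparison queries in this chapter all talk to the same shared server without code changes.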
Next Chapter → 11: Model Monitoring