Chapter 06: Docker for MLOps
"Docker goes a long way toward solving the 'it works on my machine' problem."
6.1 What is Docker?
Docker is a platform for packaging applications and their dependencies into lightweight, portable units called containers.
Virtual Machine vs Container

VIRTUAL MACHINE:                      CONTAINER:
┌──────────┬──────────┐               ┌──────────┬──────────┐
│  App A   │  App B   │               │  App A   │  App B   │
├──────────┼──────────┤               ├──────────┼──────────┤
│  Libs A  │  Libs B  │               │  Libs A  │  Libs B  │
├──────────┼──────────┤               ├──────────┴──────────┤
│ Guest OS │ Guest OS │               │ Docker Engine (thin)│
├──────────┴──────────┤               ├─────────────────────┤
│     Hypervisor      │               │       Host OS       │
├─────────────────────┤               └─────────────────────┘
│       Host OS       │
└─────────────────────┘

VMs: heavy (GBs), slow boot; each one boots a full guest OS.
Containers: lightweight (MBs), near-instant start; they share the host kernel.
6.2 Core Docker Concepts
| Concept | What It Is |
|---|---|
| Image | Read-only blueprint (like a recipe) |
| Container | Running instance of an image (like a cooked meal) |
| Dockerfile | Instructions to build an image |
| Registry | Storage for images (Docker Hub, GCR) |
| Volume | Persistent storage attached to containers |
| Network | Communication between containers |
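These concepts map directly onto files and commands. A minimal hypothetical example: the Dockerfile below is the blueprint, `docker build` turns it into an image, and `docker run` starts a container from that image.

```dockerfile
# Dockerfile: the build instructions (the image's blueprint)
FROM python:3.10-slim
CMD ["python", "-c", "print('hello from a container')"]
```

`docker build -t hello .` produces the image; `docker run hello` starts a container that prints the message and exits.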
6.3 Dockerfile for ML Model Serving
Dockerfile layers (RUN, COPY, and ADD each add a filesystem layer; read the stack bottom-up, the base image is the lowest layer):
┌────────────────────────────────────────┐
│ COPY . /app                ← your code │
├────────────────────────────────────────┤
│ RUN pip install -r req.txt ← libs      │
├────────────────────────────────────────┤
│ WORKDIR /app               ← work dir  │
├────────────────────────────────────────┤
│ FROM python:3.10-slim      ← base image│
└────────────────────────────────────────┘
Production ML Dockerfile
# Dockerfile
# ─── Stage 1: Build dependencies ───────────────────────────────
FROM python:3.10-slim AS builder
WORKDIR /app
# Install dependencies first (cached if requirements.txt unchanged)
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt
# ─── Stage 2: Production image ──────────────────────────────────
FROM python:3.10-slim AS production
WORKDIR /app
# Copy installed packages from builder stage
COPY --from=builder /install /usr/local
# Copy application code
COPY src/ ./src/
COPY models/ ./models/
COPY config/ ./config/
# Non-root user (security best practice)
RUN useradd --create-home appuser
USER appuser
# Expose API port
EXPOSE 8000
# Health check (python:3.10-slim has no curl, so probe with the stdlib)
HEALTHCHECK --interval=30s --timeout=3s \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
# Start model server
CMD ["uvicorn", "src.serve:app", "--host", "0.0.0.0", "--port", "8000"]
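The builder stage copies a requirements.txt that is not shown. The exact packages and pins are project-specific; a hypothetical set matching the FastAPI server used in this chapter:

```text
# requirements.txt (hypothetical pins; match these to your tested environment)
fastapi==0.110.0
uvicorn[standard]==0.29.0
numpy==1.26.4
scikit-learn==1.4.2
```

Pinning exact versions keeps the image reproducible and lets Docker reuse the cached dependency layer until this file changes.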
6.4 ML Model Server (FastAPI)
# src/serve.py
from fastapi import FastAPI
from pydantic import BaseModel
import pickle
import numpy as np

app = FastAPI(title="ML Model API", version="1.0")

# Load the model once at startup, not per request
with open("models/model.pkl", "rb") as f:
    model = pickle.load(f)

class PredictionRequest(BaseModel):
    features: list[float]

class PredictionResponse(BaseModel):
    prediction: float
    confidence: float

@app.get("/health")
def health_check():
    return {"status": "healthy"}

@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    features = np.array(request.features).reshape(1, -1)
    prediction = model.predict(features)[0]
    confidence = model.predict_proba(features).max()
    # Cast NumPy scalars to plain floats for the response model
    return PredictionResponse(prediction=float(prediction),
                              confidence=float(confidence))
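Once the container is running, any HTTP client can exercise the endpoint. A minimal sketch using only the standard library; the host and port assume the `docker run -p 8000:8000` mapping shown below, and the `/predict` path and payload shape match the server above:

```python
import json
import urllib.request

API = "http://localhost:8000"  # host port from `docker run -p 8000:8000`

def build_request(features: list[float]) -> urllib.request.Request:
    """Encode a feature vector as the JSON body /predict expects."""
    body = json.dumps({"features": features}).encode()
    return urllib.request.Request(
        f"{API}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def predict(features: list[float]) -> dict:
    """POST to the running container and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(features)) as resp:
        return json.loads(resp.read())  # {"prediction": ..., "confidence": ...}
```

Calling `predict([5.1, 3.5, 1.4, 0.2])` against a live container returns the prediction and confidence as a plain dict.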
6.5 Docker Commands Cheatsheet
# ── Build ──────────────────────────────────────────
# Build image from Dockerfile in current directory
docker build -t my-ml-model:v1 .
# Build with build arguments
docker build --build-arg MODEL_VERSION=v2 -t my-ml-model:v2 .
# ── Run ───────────────────────────────────────────
# Run container (foreground)
docker run -p 8000:8000 my-ml-model:v1
# Run in background (detached)
docker run -d -p 8000:8000 --name ml-server my-ml-model:v1
# Run with volume mount (for model updates)
docker run -d -p 8000:8000 -v $(pwd)/models:/app/models my-ml-model:v1
# ── Inspect ────────────────────────────────────────
docker ps # running containers
docker logs ml-server # view logs
docker exec -it ml-server bash # shell into container
docker inspect ml-server # detailed config
# ── Registry ───────────────────────────────────────
docker tag my-ml-model:v1 gcr.io/my-project/ml-model:v1
docker push gcr.io/my-project/ml-model:v1
docker pull gcr.io/my-project/ml-model:v1
# ── Clean up ───────────────────────────────────────
docker stop ml-server
docker rm ml-server
docker rmi my-ml-model:v1
docker system prune -a # remove ALL unused resources
6.6 Docker Compose for Multi-Service ML Stack
# docker-compose.yml
version: '3.8'

services:
  # ML Model API
  ml-api:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - ./models:/app/models
    environment:
      - MODEL_PATH=/app/models/model.pkl
      - LOG_LEVEL=INFO
    depends_on:
      - redis
    healthcheck:
      # python:3.10-slim has no curl; probe with the stdlib instead
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s

  # MLflow Tracking Server
  mlflow:
    image: ghcr.io/mlflow/mlflow:latest
    ports:
      - "5000:5000"
    volumes:
      - mlflow_data:/mlflow
    command: mlflow server --host 0.0.0.0 --port 5000

  # Redis — feature cache
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

  # Prometheus — metrics collection
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml

  # Grafana — dashboards
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    depends_on:
      - prometheus

volumes:
  mlflow_data:
# Start the entire stack
docker-compose up -d
# Scale the API to 3 instances
# (first remove the fixed "8000:8000" host port from ml-api, or the
#  replicas will collide on port 8000; use "8000" alone to let Docker
#  assign ephemeral host ports)
docker-compose up -d --scale ml-api=3
# Stop everything
docker-compose down
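The compose file mounts ./monitoring/prometheus.yml into the Prometheus container. A minimal sketch of that file, assuming the API container exposes a /metrics endpoint (the serve.py in this chapter does not add one; a library such as prometheus-fastapi-instrumentator would):

```yaml
# monitoring/prometheus.yml: minimal scrape config (assumes ml-api serves /metrics)
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: ml-api
    # Compose service names resolve via the default network's built-in DNS
    static_configs:
      - targets: ["ml-api:8000"]
```

Prometheus scrapes `http://ml-api:8000/metrics` every 15 seconds; Grafana can then use Prometheus as a data source for dashboards.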
6.7 Docker Layer Caching (Speed Optimization)
# SLOW: code is copied first, so any code change invalidates the pip layer
COPY . /app
RUN pip install -r requirements.txt

# FAST: install deps first (cached unless requirements.txt changes)
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . /app     # only this layer rebuilds on code changes
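Caching also depends on what `COPY . /app` pulls in: a changed file anywhere in the build context invalidates that layer. A `.dockerignore` keeps volatile or heavyweight paths out of the context (the entries below are common examples, not a requirement):

```text
# .dockerignore
.git
__pycache__/
*.pyc
.venv/
data/
notebooks/
.ipynb_checkpoints/
```

Excluding large data directories also shrinks the context Docker sends to the daemon, which speeds up every build.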
6.8 Container Registries
┌────────────────────────────────────────────────────────┐
│ CONTAINER REGISTRIES │
│ │
│ Docker Hub → hub.docker.com (public images) │
│ GCR → gcr.io (Google Container Registry, legacy) │
│ GAR → pkg.dev (Google Artifact Registry) │
│ ECR → AWS Elastic Container Registry │
│ ACR → Azure Container Registry │
│ GHCR → ghcr.io (GitHub Container Registry) │
└────────────────────────────────────────────────────────┘
Next Chapter → 07: Kubernetes