Chapter 07: Kubernetes (K8s) for MLOps
"Kubernetes is the operating system for your cloud: it runs, scales, and heals your ML services automatically."
7.1 What is Kubernetes?
Kubernetes (K8s) is an open-source container orchestration platform. It manages how Docker containers are deployed, scaled, and maintained across a cluster of machines.
Without vs With Kubernetes

```
WITHOUT K8s:
  Server 1 runs the container
  Container crashes  → 🔥 service down!
  Traffic spikes     → 🔥 service slow!
  Update             → 🔥 downtime!

WITH K8s:
  ┌───────────────────────────┐
  │    KUBERNETES CLUSTER     │
  │  ┌──┐ ┌──┐ ┌──┐ ┌──┐      │
  │  │P1│ │P2│ │P3│ │P4│      │
  │  └──┘ └──┘ └──┘ └──┘      │
  │  ✓ Auto-heal              │
  │  ✓ Auto-scale             │
  │  ✓ Zero-downtime updates  │
  └───────────────────────────┘
```
7.2 Kubernetes Architecture

```
┌──────────────────────────────────────────────────────────┐
│                    KUBERNETES CLUSTER                    │
│                                                          │
│  ┌────────────────────────────────────────────────────┐  │
│  │                   CONTROL PLANE                    │  │
│  │  ┌────────────┐  ┌───────────────────────────┐     │  │
│  │  │ API Server │  │ Scheduler                 │     │  │
│  │  │            │  │ (where to place new pods) │     │  │
│  │  └────────────┘  └───────────────────────────┘     │  │
│  │  ┌────────────┐  ┌───────────────────────────┐     │  │
│  │  │ Controller │  │ etcd                      │     │  │
│  │  │ Manager    │  │ (cluster state database)  │     │  │
│  │  └────────────┘  └───────────────────────────┘     │  │
│  └────────────────────────────────────────────────────┘  │
│                                                          │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │
│  │ WORKER NODE │  │ WORKER NODE │  │ WORKER NODE │       │
│  │ [Pod] [Pod] │  │ [Pod] [Pod] │  │ [Pod] [Pod] │       │
│  │   Kubelet   │  │   Kubelet   │  │   Kubelet   │       │
│  └─────────────┘  └─────────────┘  └─────────────┘       │
└──────────────────────────────────────────────────────────┘
```
7.3 Key Kubernetes Objects

| Object | What It Does |
|---|---|
| Pod | Smallest deployable unit; wraps one or more containers |
| Deployment | Manages Pods (desired state, rolling updates) |
| Service | Stable network endpoint to reach Pods |
| Ingress | HTTP routing / load balancing |
| ConfigMap | Stores non-secret config data |
| Secret | Stores sensitive data (API keys, passwords) |
| HPA | Horizontal Pod Autoscaler; scales Pods based on CPU/memory |
| Namespace | Logical isolation (dev, staging, prod) |
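As a minimal illustration of the ConfigMap object from the table, here is a sketch of a config for the model service; the key names and values are hypothetical, not from this chapter's project:

```yaml
# k8s/configmap.yaml (hypothetical keys and values)
apiVersion: v1
kind: ConfigMap
metadata:
  name: model-config
  namespace: production
data:
  BATCH_SIZE: "32"
  MODEL_THRESHOLD: "0.85"
---
# A container in a Pod spec can then consume every key as an
# environment variable:
#
#   envFrom:
#     - configMapRef:
#         name: model-config
```

Secrets follow the same pattern, except the values are base64-encoded and intended for sensitive data.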
7.4 Kubernetes Objects Hierarchy

```
Cluster
└── Namespace (prod)
    ├── Deployment (ml-model)
    │   └── ReplicaSet
    │       ├── Pod 1 [container: ml-model:v2]
    │       ├── Pod 2 [container: ml-model:v2]
    │       └── Pod 3 [container: ml-model:v2]
    ├── Service (ml-model-svc)   → routes traffic to Pods
    ├── Ingress                  → routes external HTTP
    ├── ConfigMap (model-config)
    └── Secret (gcp-credentials)
```
7.5 Deploying an ML Model to Kubernetes

Deployment YAML

```yaml
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model
  namespace: production
  labels:
    app: ml-model
    version: v2
spec:
  replicas: 3                      # run 3 instances
  selector:
    matchLabels:
      app: ml-model
  strategy:
    type: RollingUpdate            # zero-downtime updates
    rollingUpdate:
      maxSurge: 1                  # add 1 extra pod during the update
      maxUnavailable: 0            # never kill a pod before its replacement is ready
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
        - name: ml-model
          image: gcr.io/my-project/ml-model:v2
          ports:
            - containerPort: 8000
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          env:
            - name: MODEL_VERSION
              value: "v2"
            - name: GCP_PROJECT
              valueFrom:
                secretKeyRef:
                  name: gcp-credentials
                  key: project_id
          readinessProbe:          # don't route traffic until ready
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:           # restart the container if unhealthy
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
```
Service YAML

```yaml
# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: ml-model-service
  namespace: production
spec:
  selector:
    app: ml-model          # routes to Pods with this label
  type: LoadBalancer       # exposes externally (GCP creates a load balancer)
  ports:
    - protocol: TCP
      port: 80             # external port
      targetPort: 8000     # container port
```
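The objects table in 7.3 also lists Ingress. As a sketch, an Ingress that routes external HTTP traffic to the Service above might look like this; the hostname is a placeholder, and an ingress controller (e.g. GKE's built-in one or NGINX) must be running in the cluster:

```yaml
# k8s/ingress.yaml (hypothetical hostname)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ml-model-ingress
  namespace: production
spec:
  rules:
    - host: ml.example.com          # placeholder domain
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ml-model-service
                port:
                  number: 80
```

With an Ingress in front, the Service can usually be switched from `LoadBalancer` to `ClusterIP`, since the Ingress becomes the single external entry point.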
Horizontal Pod Autoscaler

```yaml
# k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-model-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-model
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale up when avg CPU > 70%
```
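The HPA's core scaling rule is roughly `desired = ceil(currentReplicas × currentUtilization / targetUtilization)`, clamped to `minReplicas`/`maxReplicas` (the real controller also applies stabilization windows and tolerances). A sketch of that calculation:

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int,
                     max_replicas: int) -> int:
    """Approximate HPA rule: ceil(current * current/target), clamped to bounds."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

# With the manifest above: 3 replicas averaging 140% CPU against a 70% target
print(desired_replicas(3, 140, 70, min_replicas=2, max_replicas=10))  # -> 6
```

Note that HPA needs a metrics source (typically metrics-server) installed in the cluster to read CPU/memory utilization at all.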
7.6 kubectl Commands Cheatsheet

```bash
# ── Cluster info ─────────────────────────────────
kubectl cluster-info
kubectl get nodes

# ── Apply configs ────────────────────────────────
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/                 # apply all YAMLs in the folder

# ── View resources ───────────────────────────────
kubectl get pods -n production
kubectl get deployments -n production
kubectl get services -n production
kubectl get all -n production

# ── Debugging ────────────────────────────────────
kubectl logs pod/ml-model-xyz-abc -n production
kubectl describe pod ml-model-xyz-abc -n production
kubectl exec -it ml-model-xyz-abc -n production -- bash

# ── Scaling ──────────────────────────────────────
kubectl scale deployment ml-model --replicas=5 -n production

# ── Updates ──────────────────────────────────────
kubectl set image deployment/ml-model ml-model=gcr.io/my-project/ml-model:v3 -n production
kubectl rollout status deployment/ml-model -n production

# ── Rollback ─────────────────────────────────────
kubectl rollout undo deployment/ml-model -n production
kubectl rollout history deployment/ml-model -n production
```
7.7 Namespaces for Environments

```
┌───────────────────────────────────────────┐
│            NAMESPACE ISOLATION            │
│                                           │
│  namespace: dev        → developers       │
│  namespace: staging    → QA/testing       │
│  namespace: production → live traffic     │
│                                           │
│  Each namespace has its own:              │
│  ├── Deployments                          │
│  ├── Services                             │
│  ├── ConfigMaps                           │
│  └── Secrets                              │
└───────────────────────────────────────────┘
```
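Namespaces are ordinary objects and can be created declaratively alongside the rest of the manifests; a minimal sketch:

```yaml
# k8s/namespaces.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: staging
---
apiVersion: v1
kind: Namespace
metadata:
  name: production
```

Equivalently, `kubectl create namespace staging` creates one imperatively, and `kubectl config set-context --current --namespace=staging` makes it the default for subsequent commands.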
7.8 Rolling Update Flow

```
BEFORE UPDATE (3 pods running v1):
  [v1] [v1] [v1]

ROLLING UPDATE to v2 (maxSurge: 1, maxUnavailable: 0):

  Step 1: add a v2 pod        [v1] [v1] [v1] [v2]
  Step 2: remove one v1 pod   [v1] [v1] [v2]
  Step 3: add another v2      [v1] [v1] [v2] [v2]
  Step 4: remove another v1   [v1] [v2] [v2]
  Step 5: complete            [v2] [v2] [v2]

✓ Zero downtime! Traffic served continuously.
```
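The flow above can be sketched as a small simulation that checks the key invariant: with `maxSurge: 1` and `maxUnavailable: 0`, the pod count never drops below the desired replica count. This is a simplification; the real controller also waits for each new pod's readiness probe before scaling down:

```python
def rolling_update(replicas: int, max_surge: int = 1, max_unavailable: int = 0):
    """Simulate a rolling update from v1 to v2, yielding each intermediate state."""
    pods = ["v1"] * replicas
    yield list(pods)
    while "v1" in pods:
        # Surge: add new-version pods up to (replicas + max_surge) total.
        while len(pods) < replicas + max_surge and "v1" in pods:
            pods.append("v2")
            yield list(pods)
        # Scale down: remove old pods, keeping at least (replicas - max_unavailable).
        while len(pods) > replicas - max_unavailable and "v1" in pods:
            pods.remove("v1")
            yield list(pods)

states = list(rolling_update(3))
assert all(len(s) >= 3 for s in states)   # capacity never drops below desired
assert states[-1] == ["v2", "v2", "v2"]   # ends fully on v2
```

Raising `maxSurge` trades extra resource usage during the rollout for speed, while raising `maxUnavailable` trades temporary capacity for speed; `0`/`1` as in the Deployment above is the conservative zero-downtime choice.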
Next Chapter → 08: Google Cloud Platform (GCP)