
Chapter 07: Kubernetes (K8s) for MLOps

"Kubernetes is the operating system for your cloud β€” it runs, scales, and heals your ML services automatically."


7.1 What is Kubernetes?

Kubernetes (K8s) is an open-source container orchestration platform. It manages how Docker containers are deployed, scaled, and maintained across a cluster of machines.

Without vs With Kubernetes

WITHOUT K8s:                        WITH K8s:
  Server 1: runs container          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  Container crashes β†’ πŸ’₯ down!      β”‚   KUBERNETES CLUSTER    β”‚
  Traffic spikes β†’ πŸ’₯ slow!         β”‚                         β”‚
  Update β†’ πŸ’₯ downtime!             β”‚  β”Œβ”€β”€β” β”Œβ”€β”€β” β”Œβ”€β”€β” β”Œβ”€β”€β”   β”‚
                                    β”‚  β”‚P1β”‚ β”‚P2β”‚ β”‚P3β”‚ β”‚P4β”‚   β”‚
                                    β”‚  β””β”€β”€β”˜ β””β”€β”€β”˜ β””β”€β”€β”˜ β””β”€β”€β”˜   β”‚
                                    β”‚  Auto-heal βœ…            β”‚
                                    β”‚  Auto-scale βœ…           β”‚
                                    β”‚  Zero-downtime βœ…        β”‚
                                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

7.2 Kubernetes Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                       KUBERNETES CLUSTER                             β”‚
β”‚                                                                      β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                            β”‚
β”‚  β”‚           CONTROL PLANE              β”‚                            β”‚
β”‚  β”‚                                      β”‚                            β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚                            β”‚
β”‚  β”‚  β”‚  API     β”‚  β”‚   Scheduler      β”‚  β”‚                            β”‚
β”‚  β”‚  β”‚ Server   β”‚  β”‚ (where to place  β”‚  β”‚                            β”‚
β”‚  β”‚  β”‚          β”‚  β”‚  new pods)       β”‚  β”‚                            β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚                            β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚                            β”‚
β”‚  β”‚  β”‚Controllerβ”‚  β”‚   etcd           β”‚  β”‚                            β”‚
β”‚  β”‚  β”‚ Manager  β”‚  β”‚ (cluster state   β”‚  β”‚                            β”‚
β”‚  β”‚  β”‚          β”‚  β”‚  database)       β”‚  β”‚                            β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚                            β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                            β”‚
β”‚                                                                      β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”          β”‚
β”‚  β”‚   WORKER NODE  β”‚  β”‚   WORKER NODE  β”‚  β”‚   WORKER NODE  β”‚          β”‚
β”‚  β”‚                β”‚  β”‚                β”‚  β”‚                β”‚          β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β” β”‚  β”‚  β”Œβ”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β” β”‚  β”‚  β”Œβ”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β” β”‚          β”‚
β”‚  β”‚  β”‚Pod β”‚ β”‚Pod β”‚ β”‚  β”‚  β”‚Pod β”‚ β”‚Pod β”‚ β”‚  β”‚  β”‚Pod β”‚ β”‚Pod β”‚ β”‚          β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”˜ β”‚  β”‚  β””β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”˜ β”‚  β”‚  β””β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”˜ β”‚          β”‚
β”‚  β”‚  Kubelet       β”‚  β”‚  Kubelet       β”‚  β”‚  Kubelet       β”‚          β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

7.3 Key Kubernetes Objects

Object       What It Does
-----------  ----------------------------------------------------------
Pod          Smallest unit β€” wraps one or more containers
Deployment   Manages Pods (desired state, rolling updates)
Service      Stable network endpoint to reach Pods
Ingress      HTTP routing / load balancing for external traffic
ConfigMap    Stores non-secret config data
Secret       Stores sensitive data (API keys, passwords)
HPA          Horizontal Pod Autoscaler β€” scales Pods based on CPU/memory
Namespace    Logical isolation (dev, staging, prod)
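
You rarely create bare Pods by hand (the Deployment in 7.5 manages them for you), but a standalone Pod manifest makes the smallest unit concrete. A minimal sketch, reusing the image and port from this chapter's examples:

```yaml
# k8s/pod.yaml (illustrative -- in practice, let a Deployment manage Pods)
apiVersion: v1
kind: Pod
metadata:
  name: ml-model-pod
  labels:
    app: ml-model
spec:
  containers:
  - name: ml-model
    image: gcr.io/my-project/ml-model:v2   # image from this chapter's examples
    ports:
    - containerPort: 8000
```

A bare Pod is not rescheduled if its node dies; that self-healing comes from the Deployment/ReplicaSet layer above it.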

7.4 Kubernetes Objects Hierarchy

Cluster
  └── Namespace (prod)
        β”œβ”€β”€ Deployment (ml-model)
        β”‚     └── ReplicaSet
        β”‚           β”œβ”€β”€ Pod 1 [container: ml-model:v2]
        β”‚           β”œβ”€β”€ Pod 2 [container: ml-model:v2]
        β”‚           └── Pod 3 [container: ml-model:v2]
        β”œβ”€β”€ Service (ml-model-svc)  β†’ routes traffic to Pods
        β”œβ”€β”€ Ingress                 β†’ routes external HTTP
        β”œβ”€β”€ ConfigMap (model-config)
        └── Secret (gcp-credentials)

7.5 Deploying an ML Model to Kubernetes

Deployment YAML

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model
  namespace: production
  labels:
    app: ml-model
    version: v2

spec:
  replicas: 3                    # run 3 instances

  selector:
    matchLabels:
      app: ml-model

  strategy:
    type: RollingUpdate          # zero-downtime updates
    rollingUpdate:
      maxSurge: 1                # add 1 extra pod during update
      maxUnavailable: 0          # never kill pod before new one is ready

  template:
    metadata:
      labels:
        app: ml-model

    spec:
      containers:
      - name: ml-model
        image: gcr.io/my-project/ml-model:v2
        ports:
        - containerPort: 8000

        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"

        env:
        - name: MODEL_VERSION
          value: "v2"
        - name: GCP_PROJECT
          valueFrom:
            secretKeyRef:
              name: gcp-credentials
              key: project_id

        readinessProbe:          # don't route traffic until ready
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 5

        livenessProbe:           # restart if unhealthy
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
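
Both probes above assume the container serves a /health endpoint on port 8000. What that endpoint looks like is up to your serving code; a minimal stdlib sketch (handler name and response body are illustrative, not from this chapter):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    """Answers the kubelet's readiness/liveness probes with 200 OK."""

    def do_GET(self):
        if self.path == "/health":
            body = b'{"status": "ok"}'
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep probe traffic out of the logs

# To serve on the containerPort from the manifest (blocking call):
# HTTPServer(("0.0.0.0", 8000), HealthHandler).serve_forever()
```

A real readiness check should also verify the model is actually loaded; returning 200 before the model is in memory defeats the purpose of the probe.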

Service YAML

# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: ml-model-service
  namespace: production

spec:
  selector:
    app: ml-model             # routes to pods with this label

  type: LoadBalancer          # exposes externally (GCP creates a Load Balancer)

  ports:
  - protocol: TCP
    port: 80                  # external port
    targetPort: 8000          # container port

Horizontal Pod Autoscaler

# k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-model-hpa
  namespace: production

spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-model

  minReplicas: 2
  maxReplicas: 10

  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70    # scale up when avg CPU > 70%
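
The rule the HPA applies is proportional: desiredReplicas = ceil(currentReplicas Γ— currentMetric / targetMetric), clamped to the min/max bounds. A sketch of that arithmetic (function name is ours; the real controller also adds stabilization windows and a tolerance band):

```python
import math

def desired_replicas(current, current_util, target_util,
                     min_replicas=2, max_replicas=10):
    """HPA core rule: scale in proportion to the metric ratio, then clamp."""
    desired = math.ceil(current * current_util / target_util)
    return max(min_replicas, min(max_replicas, desired))

# 3 pods averaging 90% CPU against a 70% target:
# ceil(3 * 90 / 70) = 4, so one pod is added.
```

Note the symmetry: the same formula scales down when average utilization falls below the target, bounded by minReplicas.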

7.6 kubectl Commands Cheatsheet

# ── Cluster info ──────────────────────────────────
kubectl cluster-info
kubectl get nodes

# ── Apply configs ─────────────────────────────────
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/                    # apply all yamls in folder

# ── View resources ────────────────────────────────
kubectl get pods -n production
kubectl get deployments -n production
kubectl get services -n production
kubectl get all -n production

# ── Debugging ────────────────────────────────────
kubectl logs pod/ml-model-xyz-abc -n production
kubectl describe pod ml-model-xyz-abc -n production
kubectl exec -it ml-model-xyz-abc -n production -- bash

# ── Scaling ───────────────────────────────────────
kubectl scale deployment ml-model --replicas=5 -n production

# ── Updates ───────────────────────────────────────
kubectl set image deployment/ml-model ml-model=gcr.io/my-project/ml-model:v3 -n production
kubectl rollout status deployment/ml-model -n production

# ── Rollback ──────────────────────────────────────
kubectl rollout undo deployment/ml-model -n production
kubectl rollout history deployment/ml-model -n production

7.7 Namespaces for Environments

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚          NAMESPACE ISOLATION              β”‚
β”‚                                           β”‚
β”‚  namespace: dev       β†’ developers        β”‚
β”‚  namespace: staging   β†’ QA/testing        β”‚
β”‚  namespace: production β†’ live traffic     β”‚
β”‚                                           β”‚
β”‚  Each namespace has own:                  β”‚
β”‚  β”œβ”€β”€ Deployments                          β”‚
β”‚  β”œβ”€β”€ Services                             β”‚
β”‚  β”œβ”€β”€ ConfigMaps                           β”‚
β”‚  └── Secrets                              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

7.8 Rolling Update Flow

BEFORE UPDATE (3 pods running v1):
  [v1] [v1] [v1]

ROLLING UPDATE to v2:
Step 1: Add v2 pod
  [v1] [v1] [v1] [v2]

Step 2: Remove 1 v1 pod
  [v1] [v1] [v2]

Step 3: Add another v2
  [v1] [v1] [v2] [v2]

Step 4: Remove another v1
  [v1] [v2] [v2]

Step 5: Add last v2
  [v1] [v2] [v2] [v2]

Step 6: Remove last v1 (update complete)
  [v2] [v2] [v2]

β†’ Zero downtime! Traffic served continuously.
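
A toy simulation of this maxSurge: 1 / maxUnavailable: 0 policy (our own model: every listed pod counts as Ready; a real rollout waits on readiness probes between steps):

```python
def rolling_update(replicas=3, max_surge=1):
    """Simulate a RollingUpdate from v1 to v2 with maxUnavailable: 0.
    Returns the pod list after each step."""
    pods = ["v1"] * replicas
    history = [list(pods)]
    while pods.count("v2") < replicas or len(pods) > replicas:
        if len(pods) < replicas + max_surge and pods.count("v2") < replicas:
            pods.append("v2")    # surge: bring up a new-version pod
        else:
            pods.remove("v1")    # new pod is Ready, retire an old one
        history.append(list(pods))
    return history

for step in rolling_update():
    print(step)
```

With 3 replicas the invariant holds at every step: at least 3 pods serving, at most 4 running, which is exactly what maxUnavailable: 0 and maxSurge: 1 promise.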

Next Chapter β†’ 08: Google Cloud Platform (GCP)