
Chapter 32: MLOps Security

"An insecure ML system is a liability. Secure MLOps protects your data, your models, and your users."


32.1 MLOps Security Threat Landscape

THREATS TO ML SYSTEMS:

  DATA THREATS                     MODEL THREATS
  β”œβ”€β”€ Data poisoning                β”œβ”€β”€ Model theft/extraction
  β”‚   (inject bad training data)    β”‚   (adversary queries β†’ steal model)
  β”œβ”€β”€ Training data leakage         β”œβ”€β”€ Adversarial attacks
  β”‚   (PII in model weights)        β”‚   (crafted inputs β†’ wrong output)
  └── Unauthorized access           β”œβ”€β”€ Model inversion
      to datasets                   β”‚   (recover training data from model)
                                    └── Prompt injection (LLMs)

  INFRASTRUCTURE THREATS            SUPPLY CHAIN THREATS
  β”œβ”€β”€ Compromised CI/CD             β”œβ”€β”€ Malicious dependencies
  β”œβ”€β”€ Container vulnerabilities     β”œβ”€β”€ Compromised base images
  β”œβ”€β”€ K8s RBAC misconfiguration     └── Poisoned datasets from 3rd party
  └── Exposed API endpoints
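One practical defense against the supply-chain threats above (poisoned or silently swapped third-party datasets) is to pin every training file to a known-good checksum and refuse to train on anything that does not match. A minimal sketch; the manifest format (filename to SHA-256 digest) is an illustrative convention, not a standard:

```python
import hashlib
from pathlib import Path

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large datasets are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_dataset(path: str, manifest: dict) -> bool:
    """Check a file against a pinned manifest mapping filename -> sha256 digest."""
    expected = manifest.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

Run the check as the first step of the training job and fail loudly on a mismatch; the manifest itself should live in version control so changes to data are reviewed like changes to code.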

32.2 IAM & Access Control

Principle of Least Privilege: Grant only the minimum permissions needed.

# GCP: Create service account with minimum permissions

# Training job SA β€” only needs storage and Vertex AI training
gcloud iam service-accounts create ml-trainer-sa

gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:ml-trainer-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"  # read training data

gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:ml-trainer-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectCreator"  # write model artifacts

gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:ml-trainer-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"  # submit Vertex AI jobs

# Serving SA β€” read-only access to model artifacts
gcloud iam service-accounts create ml-serving-sa

gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:ml-serving-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"

Kubernetes RBAC for ML

# k8s/rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ml-model-role
  namespace: production
rules:
  # Only allow reading pods (for health checks)
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
  # No write access to deployments β€” only CI/CD has that
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ml-model-binding
  namespace: production
subjects:
  - kind: ServiceAccount
    name: ml-serving-sa
    namespace: production  # required for ServiceAccount subjects
roleRef:
  kind: Role
  name: ml-model-role
  apiGroup: rbac.authorization.k8s.io

32.3 Secrets Management

Never hard-code API keys, passwords, or credentials in code or Docker images.

# Use Kubernetes Secrets
kubectl create secret generic ml-secrets \
  --from-literal=mlflow-uri="http://mlflow:5000" \
  --from-literal=gcp-project="my-project" \
  --from-file=gcp-sa-key=./sa-key.json \
  -n production

# Use the secrets in the deployment spec
spec:
  containers:
  - name: ml-model
    env:
    - name: MLFLOW_TRACKING_URI
      valueFrom:
        secretKeyRef:
          name: ml-secrets
          key: mlflow-uri
    - name: GCP_PROJECT
      valueFrom:
        secretKeyRef:
          name: ml-secrets
          key: gcp-project
    # Point Google client libraries at the mounted key file
    - name: GOOGLE_APPLICATION_CREDENTIALS
      value: /secrets/gcp-sa-key
    volumeMounts:
    - name: gcp-key
      mountPath: /secrets
      readOnly: true
  volumes:
  - name: gcp-key
    secret:
      secretName: ml-secrets

// In Jenkins: use credentials binding
withCredentials([
    string(credentialsId: 'mlflow-uri', variable: 'MLFLOW_URI'),
    file(credentialsId: 'gcp-sa-key', variable: 'GOOGLE_APPLICATION_CREDENTIALS'),
]) {
    sh 'python src/train.py'
}

Never put secrets in:
- Git repositories (even private)
- Docker images (build args are visible in the image history; use BuildKit secret mounts or runtime injection)
- CI/CD logs
- Environment variables in Dockerfiles
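The "never" list above can be enforced mechanically in CI. Dedicated scanners such as gitleaks or truffleHog are the usual choice; as an illustration of the idea, here is a toy regex-based check (the patterns shown are a small, deliberately incomplete sample):

```python
import re

# A few common credential shapes -- real scanners ship hundreds of rules.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                          # AWS access key ID
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # PEM private key
    re.compile(r"(?i)(api[_-]?key|password)\s*[:=]\s*['\"][^'\"]{8,}"),
]

def find_secrets(text: str) -> list:
    """Return matched substrings so CI can fail the build and report them."""
    hits = []
    for pattern in SECRET_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(text))
    return hits
```

Wire this (or better, a real scanner) into a pre-commit hook and the CI pipeline so a leaked key blocks the merge instead of landing in Git history.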


32.4 Container Security

# Security best practices in Dockerfile

# 1. Use specific image versions (not :latest)
FROM python:3.10.12-slim

# 2. Run as non-root user
RUN useradd --create-home --uid 1001 appuser
USER appuser

# 3. Read-only filesystem
# (set in K8s: securityContext.readOnlyRootFilesystem: true)

# 4. Don't copy unnecessary files
COPY --chown=appuser:appuser src/ ./src/
COPY --chown=appuser:appuser models/ ./models/
# DON'T: COPY . .  (copies .env, keys, etc.)

# K8s Pod security context
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1001
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false
    seccompProfile:
      type: RuntimeDefault
    capabilities:
      drop: ["ALL"]
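The Dockerfile rules above are easy to gate in CI. Real linters and scanners (hadolint, trivy) cover far more ground; purely to illustrate the idea, a toy check for two of the rules:

```python
def lint_dockerfile(text: str) -> list:
    """Flag unpinned base images and a missing USER directive (toy linter)."""
    problems = []
    lines = [l.strip() for l in text.splitlines() if l.strip()]
    for line in lines:
        tokens = line.split()
        if tokens[0].upper() == "FROM" and len(tokens) > 1:
            image = tokens[1]
            if ":" not in image or image.endswith(":latest"):
                problems.append(f"unpinned base image: {image}")
    if not any(l.split()[0].upper() == "USER" for l in lines):
        problems.append("no USER directive -- container runs as root")
    return problems
```

An empty result means the two checks pass; anything returned should fail the build.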

32.5 Network Security

# K8s NetworkPolicy β€” restrict communication
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ml-model-network-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: ml-model
  policyTypes: ["Ingress", "Egress"]

  # Allow incoming traffic only from API gateway
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: api-gateway
      ports:
        - port: 8000

  # Allow outgoing traffic only to the feature store and MLflow
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: feature-store
      ports:
        - port: 6566
    - to:
        - podSelector:
            matchLabels:
              app: mlflow   # label assumed; match your MLflow deployment
      ports:
        - port: 5000

32.6 Audit Logging

# Log all predictions for audit trail
import logging
import json
from datetime import datetime

audit_logger = logging.getLogger("audit")
audit_logger.setLevel(logging.INFO)  # default effective level (WARNING) would drop these records
audit_handler = logging.FileHandler("logs/audit.log")
audit_logger.addHandler(audit_handler)

@app.post("/predict")
def predict(req: PredictionRequest, request: Request):
    response = make_prediction(req)

    # Audit log every prediction
    audit_logger.info(json.dumps({
        "timestamp": datetime.utcnow().isoformat(),
        "request_id": request.headers.get("X-Request-ID"),
        "customer_id": req.customer_id,
        "features": req.dict(),
        "prediction": response.prediction,
        "confidence": response.confidence,
        "model_version": MODEL_VERSION,
        "user_agent": request.headers.get("User-Agent"),
    }))

    return response
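Note that the audit log itself becomes a PII liability if raw identifiers like customer_id are stored verbatim. A common mitigation is to pseudonymize identifiers with a keyed hash before logging, so records can still be correlated without exposing the raw value. A sketch; the AUDIT_HMAC_KEY name and the field list are assumptions for illustration:

```python
import hashlib
import hmac
import os

# In production, fetch this key from a secrets manager, not an env default.
AUDIT_HMAC_KEY = os.environ.get("AUDIT_HMAC_KEY", "dev-only-key").encode()

def pseudonymize(value: str) -> str:
    """Keyed hash: stable for correlation, not reversible without the key."""
    return hmac.new(AUDIT_HMAC_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

def redact_record(record: dict, pii_fields=("customer_id", "email")) -> dict:
    """Replace PII fields with pseudonyms before writing to the audit log."""
    return {
        k: pseudonymize(str(v)) if k in pii_fields and v is not None else v
        for k, v in record.items()
    }
```

Passing each record through redact_record before audit_logger.info keeps the trail useful for investigations while limiting what an attacker gains from the log file alone.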

Chapter 33: Model Governance & Compliance

"Governance turns MLOps from a technical discipline into an organizational one β€” with accountability at every step."


33.1 Model Governance Framework

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                  MODEL GOVERNANCE FRAMEWORK                      β”‚
β”‚                                                                  β”‚
β”‚  DEVELOP          VALIDATE          APPROVE          OPERATE     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ Build    │───▢│ Test     │───▢│ Review   │───▢│ Monitor  β”‚   β”‚
β”‚  β”‚ Document β”‚    β”‚ Evaluate β”‚    β”‚ Approve  β”‚    β”‚ Audit    β”‚   β”‚
β”‚  β”‚ Version  β”‚    β”‚ Bias chk β”‚    β”‚ Sign off β”‚    β”‚ Alert    β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                                  β”‚
β”‚  ARTIFACTS:                                                      β”‚
β”‚  β”œβ”€β”€ Model Card          ← What does this model do?             β”‚
β”‚  β”œβ”€β”€ Data Lineage        ← Where did training data come from?   β”‚
β”‚  β”œβ”€β”€ Evaluation Report   ← Performance + fairness metrics       β”‚
β”‚  β”œβ”€β”€ Approval Sign-off   ← Who approved this for production?    β”‚
β”‚  └── Monitoring Alerts   ← What triggers re-review?            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

33.2 EU AI Act Relevance for MLOps

The EU AI Act (in force since August 2024) classifies AI systems by risk level and imposes requirements proportional to that risk.

RISK LEVELS:
  β”œβ”€β”€ Unacceptable Risk β†’ PROHIBITED
  β”‚     (social scoring, biometric mass surveillance)
  β”‚
  β”œβ”€β”€ High Risk β†’ STRICT REQUIREMENTS
  β”‚     (credit scoring, hiring, medical, critical infrastructure)
  β”‚     Requirements: explainability, bias testing, human oversight,
  β”‚                   documentation, registration in EU database
  β”‚
  β”œβ”€β”€ Limited Risk β†’ TRANSPARENCY OBLIGATIONS
  β”‚     (chatbots must disclose they're AI)
  β”‚
  └── Minimal Risk β†’ Voluntary code of conduct
        (spam filters, recommendation engines)

MLOps IMPLICATIONS for High-Risk AI:
  βœ… Log all training data sources (data lineage)
  βœ… Document model design and training process
  βœ… Test for bias across demographics
  βœ… Maintain performance metrics history
  βœ… Enable human override of model decisions
  βœ… Provide explanations for individual decisions
  βœ… Alert when model performance degrades
  βœ… Audit trail of all model changes
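In a pipeline, several of these obligations reduce to "block deployment unless the required artifacts exist." A hedged sketch of such a gate; the field names are illustrative, not a standard schema:

```python
# Artifacts a high-risk model must document before release (illustrative schema).
REQUIRED_GOVERNANCE_FIELDS = [
    "model_card_uri",
    "data_lineage_uri",
    "bias_report_uri",
    "approver",
    "human_oversight_plan",
]

def governance_gate(metadata: dict) -> list:
    """Return the missing artifacts; an empty list means the gate passes."""
    return [f for f in REQUIRED_GOVERNANCE_FIELDS if not metadata.get(f)]
```

The CI/CD deploy stage calls this against the model registry entry and aborts with the missing-field list, turning the checklist above into an enforced contract rather than a convention.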

33.3 ML Metadata & Lineage Tracking

# MLMD β€” ML Metadata (used by TFX and Vertex AI Pipelines)
from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

# Connect to metadata store
connection_config = metadata_store_pb2.ConnectionConfig()
connection_config.sqlite.filename_uri = "mlmd.db"
store = metadata_store.MetadataStore(connection_config)

# Register artifact types
dataset_type = metadata_store_pb2.ArtifactType(name="Dataset")
dataset_type.properties["source"] = metadata_store_pb2.STRING
dataset_type.properties["num_rows"] = metadata_store_pb2.INT
dataset_type_id = store.put_artifact_type(dataset_type)

# Log artifacts and executions
dataset = metadata_store_pb2.Artifact(
    type_id=dataset_type_id,
    uri="gs://my-bucket/data/train.csv",
    properties={
        "source": metadata_store_pb2.Value(string_value="BigQuery"),
        "num_rows": metadata_store_pb2.Value(int_value=150000),
    }
)
dataset_id = store.put_artifacts([dataset])[0]

# Now you can trace: which model used which data

33.4 Model Approval Workflow

Code + Data Changes
        β”‚
        β–Ό
  Automated CI checks
  (tests, data validation, training)
        β”‚
        β–Ό
  Auto-evaluation report
  (accuracy, fairness, SHAP)
        β”‚
        β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚ PEER REVIEW         β”‚
  β”‚ ML Engineer review  β”‚
  β”‚ Data Scientist sign β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚
        β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚ GOVERNANCE REVIEW   β”‚
  β”‚ (for High-Risk AI)  β”‚
  β”‚ Risk Officer review β”‚
  β”‚ Legal team sign-off β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚
        β–Ό
  Deploy to Staging β†’ Test β†’ Deploy to Production
  (tagged in Git with who approved what)
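The workflow above can be enforced in the deploy pipeline by checking that sign-offs were recorded in order before promoting. A minimal sketch; the role names mirror the diagram, and the approval-record format is an assumption:

```python
# Sign-offs required before a production deploy, in the order of the diagram.
REQUIRED_APPROVALS = ["ml_engineer", "data_scientist", "risk_officer", "legal"]

def can_deploy(approvals: dict, high_risk: bool = True) -> bool:
    """approvals maps role -> approver name (falsy if not yet signed off).

    Non-high-risk models only need the peer-review stage; high-risk models
    also need the governance review (risk officer + legal).
    """
    required = REQUIRED_APPROVALS if high_risk else REQUIRED_APPROVALS[:2]
    return all(approvals.get(role) for role in required)
```

Storing the approvals dict alongside the Git tag gives the "who approved what" audit trail the diagram calls for.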

Next β†’ Chapter 34: LLMOps