Chapter 32: MLOps Security
"An insecure ML system is a liability. Secure MLOps protects your data, your models, and your users."
32.1 MLOps Security Threat Landscape
THREATS TO ML SYSTEMS:

DATA THREATS                         MODEL THREATS
├── Data poisoning                   ├── Model theft/extraction
│   (inject bad training data)       │   (adversary queries → steal model)
├── Training data leakage            ├── Adversarial attacks
│   (PII in model weights)           │   (crafted inputs → wrong output)
└── Unauthorized access              ├── Model inversion
    to datasets                      │   (recover training data from model)
                                     └── Prompt injection (LLMs)

INFRASTRUCTURE THREATS               SUPPLY CHAIN THREATS
├── Compromised CI/CD                ├── Malicious dependencies
├── Container vulnerabilities        ├── Compromised base images
├── K8s RBAC misconfiguration        └── Poisoned datasets from 3rd party
└── Exposed API endpoints
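Model extraction usually needs thousands of queries, so throttling unusually chatty clients raises the attacker's cost. A minimal sketch of a sliding-window rate limiter (the limits and client-ID scheme here are illustrative assumptions, not a complete defense):

```python
import time
from collections import defaultdict, deque

class QueryRateLimiter:
    """Sliding-window limiter: clients issuing unusually many prediction
    requests (a common precursor to model extraction) get throttled."""

    def __init__(self, max_requests=100, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # client_id -> request timestamps

    def allow(self, client_id, now=None):
        """Return True if this request is within quota, else False."""
        now = time.monotonic() if now is None else now
        window = self._history[client_id]
        # Evict timestamps that fell out of the sliding window
        while window and now - window[0] > self.window_seconds:
            window.popleft()
        if len(window) >= self.max_requests:
            return False  # over quota: reject, throttle, or raise an alert
        window.append(now)
        return True
```

In a serving API this would wrap the `/predict` handler, keyed on an authenticated client identity rather than an IP address, since extraction attacks are easy to distribute across IPs.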
32.2 IAM & Access Control
Principle of Least Privilege: Grant only the minimum permissions needed.
# GCP: Create service accounts with minimum permissions

# Training job SA → only needs storage and Vertex AI training
gcloud iam service-accounts create ml-trainer-sa

gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:ml-trainer-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"   # read training data

gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:ml-trainer-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectCreator"  # write model artifacts

gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:ml-trainer-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"        # submit Vertex AI jobs

# Serving SA → read-only access to model artifacts
gcloud iam service-accounts create ml-serving-sa

gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:ml-serving-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"
Kubernetes RBAC for ML
# k8s/rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ml-model-role
  namespace: production
rules:
# Only allow reading pods (for health checks)
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
# No write access to deployments; only CI/CD has that
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ml-model-binding
  namespace: production
subjects:
- kind: ServiceAccount
  name: ml-serving-sa
  namespace: production  # required for ServiceAccount subjects
roleRef:
  kind: Role
  name: ml-model-role
  apiGroup: rbac.authorization.k8s.io
32.3 Secrets Management
Never hard-code API keys, passwords, or credentials in code or Docker images.
# Use Kubernetes Secrets
kubectl create secret generic ml-secrets \
  --from-literal=mlflow-uri="http://mlflow:5000" \
  --from-literal=gcp-project="my-project" \
  --from-file=gcp-sa-key=./sa-key.json \
  -n production
# Use secrets in deployment
spec:
  containers:
  - name: ml-model
    env:
    - name: MLFLOW_TRACKING_URI
      valueFrom:
        secretKeyRef:
          name: ml-secrets
          key: mlflow-uri
    - name: GCP_PROJECT
      valueFrom:
        secretKeyRef:
          name: ml-secrets
          key: gcp-project
    volumeMounts:
    - name: gcp-key
      mountPath: /secrets
      readOnly: true
  volumes:
  - name: gcp-key
    secret:
      secretName: ml-secrets
// In Jenkins: use credentials binding
withCredentials([
    string(credentialsId: 'mlflow-uri', variable: 'MLFLOW_URI'),
    file(credentialsId: 'gcp-sa-key', variable: 'GOOGLE_APPLICATION_CREDENTIALS'),
]) {
    sh 'python src/train.py'
}
Never put secrets in:
- Git repositories (even private)
- Docker images (build args leak via docker history; use BuildKit secret mounts or runtime injection)
- CI/CD logs
- Environment variables in Dockerfiles
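These rules can be partially enforced in CI before a commit lands. A minimal regex-based scanner as a sketch; real projects should rely on dedicated tools such as gitleaks or detect-secrets, and the patterns below are illustrative, not exhaustive:

```python
import re

# Illustrative patterns only; production scanners ship far larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "gcp_sa_key": re.compile(r'"type":\s*"service_account"'),
    "private_key": re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),
    "generic_password": re.compile(r"(?i)password\s*=\s*['\"][^'\"]{8,}['\"]"),
}

def scan_text(text):
    """Return the names of secret patterns found in the given text."""
    return [name for name, pattern in SECRET_PATTERNS.items()
            if pattern.search(text)]

def scan_file(path):
    """Scan one file; returns the matched pattern names."""
    with open(path, errors="ignore") as f:
        return scan_text(f.read())
```

Wired into a pre-commit hook or CI step, a non-empty result fails the build, which catches the most common accident: a real key pasted into source before it ever reaches the repository.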
32.4 Container Security
# Security best practices in Dockerfile
# 1. Use specific image versions (not :latest)
FROM python:3.10.12-slim
# 2. Run as non-root user
RUN useradd --create-home --uid 1001 appuser
USER appuser
# 3. Read-only filesystem
# (set in K8s: securityContext.readOnlyRootFilesystem: true)
# 4. Don't copy unnecessary files
COPY --chown=appuser:appuser src/ ./src/
COPY --chown=appuser:appuser models/ ./models/
# DON'T: COPY . . (copies .env, keys, etc)
# K8s Security Context
# Note: runAsNonRoot/runAsUser/seccompProfile can be set pod-wide;
# readOnlyRootFilesystem, allowPrivilegeEscalation, and capabilities
# are per-container settings.
spec:
  securityContext:          # pod-level
    runAsNonRoot: true
    runAsUser: 1001
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: ml-model
    securityContext:        # container-level
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
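These settings can be checked in CI before a manifest reaches the cluster. A minimal validator over a parsed pod spec; the required-fields list below mirrors the context shown here and is an assumption, not a full Pod Security Standards check:

```python
def check_security(pod_spec):
    """Return a list of violations for a parsed pod spec (dict).

    Checks pod-level runAsNonRoot plus the hardening fields of each
    container's securityContext.
    """
    violations = []
    pod_ctx = pod_spec.get("securityContext", {})
    if not pod_ctx.get("runAsNonRoot"):
        violations.append("pod: runAsNonRoot must be true")
    for container in pod_spec.get("containers", []):
        name = container.get("name", "?")
        ctx = container.get("securityContext", {})
        if not ctx.get("readOnlyRootFilesystem"):
            violations.append(f"{name}: readOnlyRootFilesystem must be true")
        if ctx.get("allowPrivilegeEscalation", True):  # defaults to allowed
            violations.append(f"{name}: allowPrivilegeEscalation must be false")
        if ctx.get("capabilities", {}).get("drop") != ["ALL"]:
            violations.append(f'{name}: capabilities.drop must be ["ALL"]')
    return violations
```

In practice the same policy is usually enforced in-cluster with an admission controller (e.g. Pod Security Admission or a policy engine), but failing CI early gives a much faster feedback loop.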
32.5 Network Security
# K8s NetworkPolicy: restrict communication
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ml-model-network-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: ml-model
  policyTypes: ["Ingress", "Egress"]
  # Allow incoming traffic only from the API gateway
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: api-gateway
    ports:
    - port: 8000
  # Allow outgoing traffic only to the feature store and MLflow
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: feature-store
    ports:
    - port: 6566
  - to:
    - podSelector:
        matchLabels:
          app: mlflow
    ports:
    - port: 5000
32.6 Audit Logging
# Log all predictions for an audit trail
import json
import logging
from datetime import datetime, timezone

from fastapi import FastAPI, Request

app = FastAPI()

audit_logger = logging.getLogger("audit")
audit_logger.setLevel(logging.INFO)  # default WARNING would drop .info() calls
audit_logger.addHandler(logging.FileHandler("logs/audit.log"))

# PredictionRequest, make_prediction, MODEL_VERSION are defined elsewhere
@app.post("/predict")
def predict(req: PredictionRequest, request: Request):
    response = make_prediction(req)
    # Audit-log every prediction
    audit_logger.info(json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "request_id": request.headers.get("X-Request-ID"),
        "customer_id": req.customer_id,
        "features": req.dict(),  # consider redacting PII fields here
        "prediction": response.prediction,
        "confidence": response.confidence,
        "model_version": MODEL_VERSION,
        "user_agent": request.headers.get("User-Agent"),
    }))
    return response
Chapter 33: Model Governance & Compliance
"Governance turns MLOps from a technical discipline into an organizational one, with accountability at every step."
33.1 Model Governance Framework
┌────────────────────────────────────────────────────────────────────┐
│                    MODEL GOVERNANCE FRAMEWORK                      │
│                                                                    │
│   DEVELOP         VALIDATE        APPROVE         OPERATE          │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐      │
│  │ Build    │───▶│ Test     │───▶│ Review   │───▶│ Monitor  │      │
│  │ Document │    │ Evaluate │    │ Approve  │    │ Audit    │      │
│  │ Version  │    │ Bias chk │    │ Sign off │    │ Alert    │      │
│  └──────────┘    └──────────┘    └──────────┘    └──────────┘      │
│                                                                    │
│  ARTIFACTS:                                                        │
│  ├── Model Card         → What does this model do?                 │
│  ├── Data Lineage       → Where did training data come from?       │
│  ├── Evaluation Report  → Performance + fairness metrics           │
│  ├── Approval Sign-off  → Who approved this for production?        │
│  └── Monitoring Alerts  → What triggers re-review?                 │
└────────────────────────────────────────────────────────────────────┘
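These artifacts become enforceable when they are structured records rather than documents in a wiki. A minimal sketch of a model card with machine-checkable sign-offs (the field names are illustrative assumptions, not a standard schema):

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    """Minimal governance record accompanying one model version."""
    model_name: str
    version: str
    intended_use: str
    training_data_sources: list   # data lineage
    evaluation_metrics: dict      # performance + fairness metrics
    approved_by: list = field(default_factory=list)  # (role, person) sign-offs

    def is_approved(self, required_roles):
        """Release only when every required role has signed off."""
        signed_roles = {role for role, _ in self.approved_by}
        return set(required_roles) <= signed_roles
```

A CI gate can then refuse to promote a model whose card lacks a required sign-off:

```python
card = ModelCard("churn-model", "1.4.0", "customer churn scoring",
                 ["bq://my-project.ds.train"], {"auc": 0.91},
                 approved_by=[("ml_engineer", "alice")])
card.is_approved({"ml_engineer", "risk_officer"})  # still missing risk_officer
```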
33.2 EU AI Act Relevance for MLOps
The EU AI Act (in force August 2024) classifies AI systems by risk and mandates different requirements.
RISK LEVELS:
├── Unacceptable Risk → PROHIBITED
│     (social scoring, biometric mass surveillance)
│
├── High Risk → STRICT REQUIREMENTS
│     (credit scoring, hiring, medical, critical infrastructure)
│     Requirements: explainability, bias testing, human oversight,
│     documentation, registration in EU database
│
├── Limited Risk → TRANSPARENCY OBLIGATIONS
│     (chatbots must disclose they're AI)
│
└── Minimal Risk → Voluntary code of conduct
      (spam filters, recommendation engines)
MLOps IMPLICATIONS for High-Risk AI:
✅ Log all training data sources (data lineage)
✅ Document model design and training process
✅ Test for bias across demographics
✅ Maintain performance metrics history
✅ Enable human override of model decisions
✅ Provide explanations for individual decisions
✅ Alert when model performance degrades
✅ Audit trail of all model changes
33.3 ML Metadata & Lineage Tracking
# MLMD - ML Metadata (used by TFX and Vertex AI Pipelines)
from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

# Connect to the metadata store
connection_config = metadata_store_pb2.ConnectionConfig()
connection_config.sqlite.filename_uri = "mlmd.db"
connection_config.sqlite.connection_mode = 3  # READWRITE_OPENCREATE
store = metadata_store.MetadataStore(connection_config)

# Register artifact types
dataset_type = metadata_store_pb2.ArtifactType(name="Dataset")
dataset_type.properties["source"] = metadata_store_pb2.STRING
dataset_type.properties["num_rows"] = metadata_store_pb2.INT
dataset_type_id = store.put_artifact_type(dataset_type)

# Log artifacts and executions
dataset = metadata_store_pb2.Artifact(
    type_id=dataset_type_id,
    uri="gs://my-bucket/data/train.csv",
    properties={
        "source": metadata_store_pb2.Value(string_value="BigQuery"),
        "num_rows": metadata_store_pb2.Value(int_value=150000),
    },
)
dataset_id = store.put_artifacts([dataset])[0]
# Now you can trace which model used which data
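The trace itself is conceptually simple, independent of MLMD: record which execution read which artifacts and which it produced, then walk the links. A toy in-memory sketch of that idea (for illustration only; MLMD stores the same relationships durably via executions and events):

```python
from collections import defaultdict

class LineageIndex:
    """Toy lineage store: executions consume input artifacts and
    produce output artifacts; queries walk the links backwards."""

    def __init__(self):
        self._inputs = defaultdict(set)  # execution -> input artifact URIs
        self._produced_by = {}           # artifact URI -> producing execution

    def record(self, execution, inputs, outputs):
        """Record one run: what it read and what it wrote."""
        self._inputs[execution].update(inputs)
        for artifact in outputs:
            self._produced_by[artifact] = execution

    def upstream_data(self, artifact):
        """All artifacts the producing execution read, e.g. 'which
        training data went into this model version?'"""
        execution = self._produced_by.get(artifact)
        return set(self._inputs[execution]) if execution else set()
```

With one `record()` call per training run, "which data trained model v1" becomes a dictionary lookup, which is exactly the question auditors and incident responders ask after a data-poisoning report.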
33.4 Model Approval Workflow
Code + Data Changes
        │
        ▼
Automated CI checks
(tests, data validation, training)
        │
        ▼
Auto-evaluation report
(accuracy, fairness, SHAP)
        │
        ▼
┌─────────────────────┐
│     PEER REVIEW     │
│ ML Engineer review  │
│ Data Scientist sign │
└─────────────────────┘
        │
        ▼
┌─────────────────────┐
│  GOVERNANCE REVIEW  │
│ (for High-Risk AI)  │
│ Risk Officer review │
│ Legal team sign-off │
└─────────────────────┘
        │
        ▼
Deploy to Staging → Test → Deploy to Production
(tagged in Git with who approved what)
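The workflow above reduces to a simple gate in the deployment pipeline: production deploys are blocked until every stage has recorded a sign-off. A minimal sketch (stage names mirror the diagram; the sign-off record format is an assumption):

```python
REQUIRED_STAGES = [
    "ci_checks",          # automated tests + data validation
    "evaluation_report",  # accuracy, fairness, SHAP
    "peer_review",        # ML Engineer / Data Scientist
    "governance_review",  # Risk Officer / Legal (high-risk only)
]

def deployment_gate(signoffs, high_risk=True):
    """Return (allowed, reason) for a production deploy.

    `signoffs` maps stage name -> who/what signed off. The governance
    stage is only required for high-risk models, as in the diagram.
    """
    stages = list(REQUIRED_STAGES)
    if not high_risk:
        stages.remove("governance_review")
    for stage in stages:
        if not signoffs.get(stage):
            return False, f"missing sign-off: {stage}"
    return True, "approved for production"
```

Running this check in CI (and tagging the release with the sign-off record) gives the Git-level audit trail the diagram ends with: every production model is traceable to who approved what.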
Next → Chapter 34: LLMOps