# Chapter 24: Multi-Cloud MLOps
## Why Multi-Cloud?
- Avoid vendor lock-in
- Use the best service from each provider
- Meet compliance and data-residency requirements
- Optimize cost (e.g., run training wherever GPUs are cheapest)
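The cost-optimization point can be sketched in a few lines. The prices and regions below are made-up placeholders, not real quotes; in practice you would pull live spot prices from each provider's pricing API:

```python
# Hypothetical A100 spot prices in USD/hour -- illustrative only.
SPOT_PRICES = {
    ("aws", "us-east-1"): 1.20,
    ("gcp", "us-central1"): 1.35,
    ("azure", "eastus"): 1.50,
    ("aws", "eu-west-1"): 1.45,
}

def cheapest_gpu(prices):
    """Return the (provider, region) pair with the lowest hourly price."""
    return min(prices, key=prices.get)

print(cheapest_gpu(SPOT_PRICES))  # → ('aws', 'us-east-1')
```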
## Portability Strategies
**Use open standards:**

- KFP SDK → pipelines run on Kubeflow, Vertex AI, or locally
- MLflow → tracking and registry run anywhere
- Docker → images run on any cloud container platform
- DVC → works equally with GCS, S3, and Azure Blob Storage
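One reason tools like DVC stay portable is that only the remote URI scheme changes per cloud. A minimal sketch of that idea; `remote_uri` and the bucket names are hypothetical helpers, not part of any tool's API:

```python
# Map each provider to its object-storage URI scheme. A DVC remote,
# for example, accepts any of these -- only the URI differs per cloud.
SCHEMES = {"gcp": "gs", "aws": "s3", "azure": "azure"}

def remote_uri(provider: str, bucket: str, path: str = "") -> str:
    """Build a storage URI for the given cloud (hypothetical helper)."""
    scheme = SCHEMES[provider]
    return f"{scheme}://{bucket}/{path}".rstrip("/")

print(remote_uri("aws", "ml-artifacts", "models"))  # s3://ml-artifacts/models
print(remote_uri("gcp", "ml-artifacts", "models"))  # gs://ml-artifacts/models
```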
**Abstract the infrastructure:**

- Terraform → provision GCP, AWS, and Azure from the same codebase
- Kubernetes → runs on GKE, EKS, and AKS alike
- Helm charts → deploy the same apps to any Kubernetes cluster
**Avoid cloud-specific lock-in:**

- ❌ Vertex AI AutoML → use MLflow + Optuna instead (portable)
- ⚠️ BigQuery → fine if you are GCP-only; use Delta Lake for multi-cloud
- ❌ SageMaker Pipelines → use Kubeflow Pipelines instead (portable)
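The portability idea behind preferring Kubeflow Pipelines is that steps are defined once and handed to whichever orchestrator is available. A toy sketch under that assumption, with a local runner standing in for a KFP or Airflow backend (all names here are illustrative, not any real SDK):

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Step:
    name: str
    fn: Callable[[dict], dict]

@dataclass
class Pipeline:
    steps: list[Step] = field(default_factory=list)

    def step(self, name: str):
        """Decorator that registers a function as a named pipeline step."""
        def register(fn):
            self.steps.append(Step(name, fn))
            return fn
        return register

    def run_local(self, ctx=None) -> dict:
        """Local runner; a KFP backend would compile the same step list."""
        ctx = dict(ctx or {})
        for s in self.steps:
            ctx = s.fn(ctx)
        return ctx

pipe = Pipeline()

@pipe.step("ingest")
def ingest(ctx):
    ctx["rows"] = 100  # pretend we loaded 100 rows
    return ctx

@pipe.step("train")
def train(ctx):
    ctx["model"] = f"trained on {ctx['rows']} rows"
    return ctx

print(pipe.run_local())
```

Because the steps are plain functions, swapping the runner (local, Kubeflow, Airflow) does not require rewriting the pipeline logic.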
## Multi-Cloud Architecture
- Training: AWS (cheapest A100 spot instances)
- Data: GCP (BigQuery for analytics)
- Serving: GCP (GKE, lower egress cost)
- Monitoring: open source (Prometheus/Grafana) on any cloud
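Monitoring stays portable because Prometheus scrapes a plain-text exposition format that is identical on every cloud. A minimal formatter for one gauge sample; the metric and label names are illustrative:

```python
def prom_gauge(name: str, value: float, labels: dict) -> str:
    """Render one gauge sample in Prometheus text exposition format."""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"

# The same metric line works whether the model serves on GKE, EKS, or AKS.
line = prom_gauge("model_request_latency_seconds", 0.042,
                  {"cloud": "gcp", "model": "fraud-v3"})
print(line)  # model_request_latency_seconds{cloud="gcp",model="fraud-v3"} 0.042
```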
Next → Chapter 25: Model Serving Strategies