MLOps Glossary
A-Z reference of key MLOps terms.
| Term | Definition |
|---|---|
| Artifact | Any output of a pipeline step: model files, metrics, charts, datasets |
| AutoML | Automated Machine Learning — automates algorithm selection, hyperparameter tuning, and ensembling |
| Blue/Green Deployment | Run the old (blue) and new (green) model versions side by side; switch all traffic to green once it is validated, keeping blue available for fast rollback |
| Canary Deployment | Route a small percentage of traffic to the new model version, then increase that share gradually until full rollout |
| CD (Continuous Delivery) | Automatically build, test, and stage every change so it is always ready to deploy; the production release itself may still require manual approval (unlike Continuous Deployment) |
| CI (Continuous Integration) | Automatically test and validate every code commit |
| CM (Continuous Monitoring) | Ongoing tracking of model performance and data quality in production |
| Concept Drift | The statistical relationship between features and labels changes over time, so the same inputs come to map to different outcomes |
| Container | A lightweight, portable package containing application code + all dependencies |
| Container Registry | Repository for storing and distributing container images (e.g., Google Artifact Registry, which replaces the deprecated GCR; Amazon ECR; Docker Hub) |
| CT (Continuous Training) | Automatically retrain models when triggered by new data or drift |
| Data Drift | Input feature distributions shift from what the model was trained on |
| Data Pipeline | Automated workflow for extracting, transforming, and loading data |
| Docker | Platform for building and running containers |
| Dockerfile | Text file with instructions to build a Docker image |
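| DVC | Data Version Control: versions large data files and ML models alongside Git by committing small `.dvc` metadata files to Git while the data itself lives in remote storage |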
| Endpoint | A deployed model accessible via a URL for making predictions |
| Experiment | A named group of training runs tracked in MLflow or a similar tool |
| Feature | An input variable used by the ML model to make predictions |
| Feature Drift | Statistical shift in one or more input features |
| Feature Engineering | Creating or transforming variables to improve model performance |
| Feature Store | Centralized repository for storing, sharing, and serving ML features so that training and inference use the same feature definitions |
| GCS | Google Cloud Storage — object storage for data and artifacts |
| Git | Distributed version control system for tracking code changes |
| GitHub Actions | CI/CD automation built into GitHub |
| GKE | Google Kubernetes Engine — managed Kubernetes on GCP |
| HPA | Horizontal Pod Autoscaler: scales the number of Pod replicas in a workload up or down based on observed metrics such as CPU, memory, or custom metrics |
| HPO | Hyperparameter Optimization — finding the best training configuration |
| Hyperparameter | Model setting chosen before training (not learned from data) |
| Image (Docker) | Read-only template used to create containers |
| Ingress | Kubernetes object managing external HTTP/S access to services |
| Jenkins | Open-source CI/CD automation server |
| Jenkinsfile | Pipeline-as-code definition for Jenkins |
| Kubeflow | ML toolkit for Kubernetes — workflows, pipelines, serving |
| Kubernetes (K8s) | Container orchestration platform — deploys, scales, heals containers |
| Label Drift | Target variable distribution shifts over time |
| Latency | Time taken to serve a single prediction |
| MLflow | Open-source platform for ML lifecycle management |
| MLOps | Machine Learning Operations — DevOps principles applied to ML |
| Model Registry | Version-controlled storage for trained models with lifecycle stages |
| Namespace (K8s) | Virtual cluster within a Kubernetes cluster that logically isolates resources, e.g., to separate dev/staging/production environments |
| NAS | Neural Architecture Search — automated design of neural networks |
| Observability | Ability to understand a system's internal state from its external outputs, such as metrics, logs, and traces |
| Pipeline | Sequence of automated steps transforming inputs to outputs |
| Pod | Smallest deployable unit in Kubernetes — wraps one or more containers |
| Prometheus | Open-source metrics collection and alerting system |
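| Reproducibility | Ability to recreate an experiment exactly using the same code, data, configuration, and environment (including random seeds) |
| Rolling Update | Gradually replace old Pods with new ones, avoiding downtime when readiness probes and enough healthy replicas are in place |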
| Service (K8s) | Stable network endpoint routing traffic to Pods |
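| Shadow Mode | Run a new model in parallel with production on live traffic and log its predictions for comparison, without returning them to users |
| Training-Serving Skew | Mismatch between the feature values or preprocessing logic used at training time and those used at inference time |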
| Vertex AI | Google's unified ML platform on GCP |
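| Volume (Docker/K8s) | Storage mounted into containers that can outlive an individual container; in Kubernetes, durability depends on the volume type (an `emptyDir` is deleted with its Pod, a PersistentVolume is not) |
| Webhook | HTTP callback that one system sends to another when an event occurs, e.g., a Git host notifying a CI/CD server that code was pushed |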
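The Canary Deployment entry above leaves open how a "small percentage" of traffic is actually selected. The sketch below shows one common approach: hashing a request or user ID into 10,000 buckets. The function name `route_request`, the `"canary"`/`"stable"` labels, and the ID format are hypothetical. In practice, a service mesh or model-serving platform usually applies traffic weights for you.

```python
import hashlib


def route_request(request_id: str, canary_percent: float) -> str:
    """Return "canary" for roughly canary_percent% of IDs, else "stable" (hypothetical router).

    Hashing the ID instead of calling random() keeps routing sticky: the same
    ID always reaches the same model version for a given percentage.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # bucket in 0..9999
    return "canary" if bucket < canary_percent * 100 else "stable"


# Ramp the canary up in stages once its metrics look healthy.
ids = [f"user-{n}" for n in range(20_000)]
for pct in (5, 25, 100):
    share = sum(route_request(rid, pct) == "canary" for rid in ids) / len(ids)
    print(f"{pct:>3}% target -> {share:.1%} routed to canary")
```

Because a bucket below the 5% cutoff is also below the 25% cutoff, users already on the canary stay there as the rollout widens.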
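The Data Drift and Feature Drift entries describe a distribution shift without saying how to measure one. A minimal sketch follows, using one widely used statistic: the Population Stability Index, PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ). Here eᵢ and aᵢ are the fractions of the training and production samples that fall in bin i. Several choices are illustrative assumptions rather than a standard: the function name `psi`, equal-width binning over the training range, 10 bins, and the 0.2 alert threshold (a common rule of thumb).

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `actual` (production) against `expected` (training).

    Bins are equal-width over the range of the training sample; values outside
    that range are clamped into the first or last bin. Both inputs must be non-empty.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width if the training sample is constant

    def proportions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # eps keeps log() finite when a bin is empty
        return [max(c / len(values), eps) for c in counts]

    p_train, p_prod = proportions(expected), proportions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(p_train, p_prod))


# Synthetic example: a roughly uniform feature on [0, 1), then two production samples.
train = [i / 100 for i in range(100)]
same = [i / 100 + 0.005 for i in range(100)]    # nearly the same distribution
shifted = [0.5 + i / 200 for i in range(100)]   # all mass moved to the upper half
print(f"PSI (no drift): {psi(train, same):.4f}")
print(psi(train, shifted) > 0.2)  # True: this shift exceeds the rule-of-thumb threshold
```

A two-sample Kolmogorov-Smirnov test is a common alternative for continuous features. PSI is shown here because it needs only the standard library.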
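The HPA entry above says the autoscaler reacts to observed metrics. Its documented core rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), and no scaling happens when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below reproduces only that rule plus a min/max clamp. The function name and the default `min_replicas`/`max_replicas` values are hypothetical. The real controller also applies stabilization windows and special handling for unready Pods, which are omitted here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA scaling rule.

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within `tolerance` of 1.0, then clamped
    to [min_replicas, max_replicas].
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling action
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Target: 60% average CPU utilization, currently 4 Pods.
print(desired_replicas(4, 90, 60))   # 6: overloaded, scale out
print(desired_replicas(4, 20, 60))   # 2: underused, scale in
print(desired_replicas(4, 63, 60))   # 4: within the 10% tolerance, no change
print(desired_replicas(4, 300, 60))  # 10: capped by max_replicas
```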
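The Shadow Mode entry describes running a candidate model on live traffic without serving its output. The sketch below captures the essential contract: the production result is always returned, the shadow result is only logged, and a shadow failure never reaches the user. Several details are assumptions for illustration: the function `predict_with_shadow`, the `Model` type (any callable from features to a score), and the two toy models. The shadow call here is synchronous for simplicity. Real deployments often mirror traffic asynchronously so the shadow model adds no latency.

```python
import logging
from typing import Callable, Sequence

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("shadow")

# Assumption: a "model" is any callable mapping a feature vector to a score.
Model = Callable[[Sequence[float]], float]


def predict_with_shadow(features: Sequence[float], production: Model, shadow: Model) -> float:
    """Serve the production prediction; run the shadow model only for comparison."""
    served = production(features)
    try:
        candidate = shadow(features)
        # Logged for offline comparison; the caller never receives `candidate`.
        log.info("production=%.4f shadow=%.4f diff=%.4f", served, candidate, candidate - served)
    except Exception:
        # A failing shadow model must not break the production response.
        log.exception("shadow model failed")
    return served


def prod_model(x: Sequence[float]) -> float:  # hypothetical current model
    return 0.5 * x[0] + 0.5 * x[1]


def new_model(x: Sequence[float]) -> float:  # hypothetical candidate model
    return 0.4 * x[0] + 0.6 * x[1]


print(predict_with_shadow([1.0, 3.0], prod_model, new_model))  # 2.0, the production output
```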