Chapter 09: AutoML (Automated Machine Learning)
"AutoML democratizes ML, automating the tedious parts so experts can focus on strategy."
9.1 What is AutoML?
AutoML (Automated Machine Learning) automates the process of selecting, training, and tuning machine learning models. Instead of manually trying different algorithms and hyperparameters, you define a search space and a time or compute budget, and the AutoML system explores that space automatically.
Manual ML vs AutoML
MANUAL ML:
  Data → You pick algorithm → You tune hyperparameters → You evaluate → Repeat
  Time: Days to weeks
AUTOML:
  Data → AutoML searches algorithms + hyperparameters → Best model returned
  Time: Hours to days (with less human effort)
9.2 What AutoML Automates
┌──────────────────────────────────────────────────────────┐
│ AUTOML AUTOMATION SCOPE                                  │
│                                                          │
│ Automated:                                               │
│   - Feature Preprocessing (encoding, scaling, imputing)  │
│   - Algorithm Selection (RF, XGB, SVM, Neural Net...)    │
│   - Hyperparameter Optimization (HPO)                    │
│   - Model Ensembling (stack best models)                 │
│   - Neural Architecture Search (NAS)                     │
│   - Model Evaluation & Comparison                        │
│   - Cross-validation strategy                            │
│                                                          │
│ NOT automated:                                           │
│   - Problem definition, data collection,                 │
│     business metrics, deployment decisions               │
└──────────────────────────────────────────────────────────┘
9.3 AutoML Search Process
                   ┌───────────────────────────┐
                   │       AUTOML SEARCH       │
                   │                           │
Input Data ──────▶ │ Algorithm Space:          │
                   │  ├── Random Forest        │
                   │  ├── Gradient Boosting    │
                   │  ├── SVM                  │
                   │  ├── Neural Networks      │
                   │  └── Linear Models        │
                   │                           │
                   │ Hyperparameter Space:     │
                   │  ├── learning_rate        │
                   │  ├── max_depth            │
                   │  ├── n_estimators         │
                   │  └── dropout_rate         │
                   │                           │
                   │ Search Strategy:          │
                   │  ├── Bayesian Opt         │
                   │  ├── Random Search        │
                   │  └── Evolutionary         │
                   └─────────────┬─────────────┘
                                 │
                                 ▼
                         ┌───────────────┐
                         │  Best Model   │
                         │  + Config     │
                         └───────────────┘
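The search loop in the diagram can be sketched in a few lines. The example below is a minimal illustration, not how any particular AutoML product works: it assumes scikit-learn is installed, uses plain random search as the strategy, and the names `SEARCH_SPACE` and `random_search` are hypothetical helpers invented for this sketch.

```python
import random

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical search space: each algorithm maps to its estimator class and a
# function that samples one hyperparameter configuration.
SEARCH_SPACE = {
    "random_forest": (
        RandomForestClassifier,
        lambda rng: {"n_estimators": rng.randint(20, 100),
                     "max_depth": rng.randint(2, 10)},
    ),
    "gradient_boosting": (
        GradientBoostingClassifier,
        lambda rng: {"n_estimators": rng.randint(20, 100),
                     "learning_rate": 10 ** rng.uniform(-2, 0)},
    ),
    "linear_model": (
        LogisticRegression,
        lambda rng: {"C": 10 ** rng.uniform(-2, 2), "max_iter": 1000},
    ),
}


def random_search(X, y, n_trials=10, seed=0):
    """Sample (algorithm, hyperparameters) pairs and keep the best by CV score."""
    rng = random.Random(seed)
    best = {"score": float("-inf"), "algorithm": None, "params": None}
    for _ in range(n_trials):
        name = rng.choice(sorted(SEARCH_SPACE))         # pick an algorithm
        model_cls, sample_params = SEARCH_SPACE[name]
        params = sample_params(rng)                      # pick its hyperparameters
        score = cross_val_score(model_cls(**params), X, y, cv=3).mean()
        if score > best["score"]:
            best = {"score": score, "algorithm": name, "params": params}
    return best


# Synthetic data stands in for "Input Data" in the diagram.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
best = random_search(X, y, n_trials=10)
print(best["algorithm"], best["params"], round(best["score"], 3))
```

Real AutoML systems typically replace the random sampling step with a smarter strategy (Bayesian or evolutionary) and add preprocessing and ensembling around this loop.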
9.4 Hyperparameter Optimization (HPO)
Hyperparameters are settings you choose before training (not learned by the model).
HPO Strategies Compared
GRID SEARCH:            RANDOM SEARCH:
┌───┬───┬───┬───┬───┐   ┌───┬───┬───┬───┬───┐
│ x │ x │ x │ x │ x │   │   │   │ x │   │   │
├───┼───┼───┼───┼───┤   ├───┼───┼───┼───┼───┤
│ x │ x │ x │ x │ x │   │   │   │   │ x │   │
├───┼───┼───┼───┼───┤   ├───┼───┼───┼───┼───┤
│ x │ x │ x │ x │ x │   │   │ x │   │   │   │
└───┴───┴───┴───┴───┘   └───┴───┴───┴───┴───┘
Tests every combo       Random samples
Slow, exhaustive        Faster, misses some
BAYESIAN OPTIMIZATION (usually the most sample-efficient of the three):
  Start → Try 5 random points → Build a model of "which areas are good"
        → Sample from good areas → Update model → Repeat
  Result: smart and efficient; tends to find a good configuration in fewer trials
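The difference between grid and random search shows up directly in how many configurations each one evaluates. The sketch below assumes scikit-learn and SciPy are available; the dataset is synthetic and the parameter ranges are arbitrary choices for illustration.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
model = RandomForestClassifier(n_estimators=30, random_state=0)

# Grid search: evaluates every combination (3 x 3 = 9 configurations).
grid = GridSearchCV(
    model,
    param_grid={"max_depth": [3, 6, 9], "min_samples_split": [2, 8, 14]},
    cv=3,
)
grid.fit(X, y)

# Random search: draws a fixed number of configurations (n_iter=5)
# from distributions over the same ranges.
rand = RandomizedSearchCV(
    model,
    param_distributions={
        "max_depth": randint(3, 10),          # integers 3..9
        "min_samples_split": randint(2, 15),  # integers 2..14
    },
    n_iter=5,
    cv=3,
    random_state=0,
)
rand.fit(X, y)

n_grid = len(grid.cv_results_["params"])
n_rand = len(rand.cv_results_["params"])
print(n_grid, grid.best_params_)  # 9 configurations tried
print(n_rand, rand.best_params_)  # 5 configurations tried
```

The grid's cost grows multiplicatively with every added hyperparameter, while the random search budget stays whatever `n_iter` is set to.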
HPO with Optuna (Python)
import optuna
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Example data so the snippet runs end to end (a stand-in chosen for
# illustration); substitute your own X_train / y_train.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

def objective(trial):
    # Define hyperparameter search space
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 50, 500),
        'max_depth': trial.suggest_int('max_depth', 3, 20),
        'min_samples_split': trial.suggest_int('min_samples_split', 2, 20),
        'max_features': trial.suggest_categorical('max_features', ['sqrt', 'log2']),
    }
    clf = RandomForestClassifier(**params, random_state=42)
    # Mean 5-fold cross-validated accuracy is the value Optuna maximizes
    score = cross_val_score(clf, X_train, y_train, cv=5, scoring='accuracy').mean()
    return score

# Run optimization (100 trials)
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

print(f"Best accuracy: {study.best_value:.4f}")
print(f"Best params: {study.best_params}")
9.5 Vertex AI AutoML (GCP)
Vertex AI AutoML lets you train high-quality models without writing ML code.
Supported AutoML Types
| Task | What You Provide | What You Get |
|---|---|---|
| AutoML Tables | Tabular CSV data | Classification/Regression model |
| AutoML Vision | Labeled images | Image classifier |
| AutoML NLP | Labeled text | Text classifier |
| AutoML Video | Labeled videos | Video classifier |
AutoML Tables Workflow
┌────────────────────────────────────────────────────────┐
│ VERTEX AI AUTOML TABLES WORKFLOW                       │
│                                                        │
│ 1. Upload CSV to BigQuery or GCS                       │
│        │                                               │
│        ▼                                               │
│ 2. Create Dataset in Vertex AI                         │
│        │                                               │
│        ▼                                               │
│ 3. Configure target column + training budget           │
│    (e.g., 1 hour, 8 hours, 24 hours)                   │
│        │                                               │
│        ▼                                               │
│ 4. AutoML trains 100s of models internally             │
│    (feature engineering + HPO + ensembling)            │
│        │                                               │
│        ▼                                               │
│ 5. Evaluate: AUC, Precision, Recall, Confusion Matrix  │
│        │                                               │
│        ▼                                               │
│ 6. Deploy best model to Vertex AI Endpoint             │
│        │                                               │
│        ▼                                               │
│ 7. Online predictions via REST API                     │
└────────────────────────────────────────────────────────┘
Vertex AI AutoML via Python SDK
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Create a Dataset from BigQuery
dataset = aiplatform.TabularDataset.create(
    display_name="customer-churn-dataset",
    bq_source="bq://my-project.ml_datasets.customer_churn",
)

# Train AutoML Tabular model
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="churn-automl-v1",
    optimization_prediction_type="classification",  # or 'regression'
    optimization_objective="maximize-au-roc",
)
model = job.run(
    dataset=dataset,
    target_column="churned",  # column to predict
    training_fraction_split=0.8,
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
    budget_milli_node_hours=1000,  # 1 hour of training budget
    model_display_name="churn-model-v1",
)

# Deploy
endpoint = model.deploy(machine_type="n1-standard-2")

# Predict
prediction = endpoint.predict(instances=[{
    "age": 35,
    "tenure_months": 12,
    "monthly_charges": 65.5
}])
print(prediction.predictions)
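The `budget_milli_node_hours` argument is expressed in thousandths of a node hour, which is easy to get wrong by a factor of 1000. The helper below is a small sketch of the conversion; `node_hours_to_budget` is a hypothetical name, and the default limits of 1,000 and 72,000 are assumptions based on the tabular training limits documented at the time of writing, so check the current Vertex AI documentation before relying on them.

```python
def node_hours_to_budget(node_hours, min_milli=1_000, max_milli=72_000):
    """Convert node hours to the integer milli node hours the SDK expects.

    The default limits are assumptions for illustration; verify them against
    the current Vertex AI documentation.
    """
    milli = int(round(node_hours * 1000))
    if not min_milli <= milli <= max_milli:
        raise ValueError(
            f"{node_hours} node hours = {milli} milli node hours, "
            f"outside the assumed range [{min_milli}, {max_milli}]"
        )
    return milli


print(node_hours_to_budget(1))   # 1000
print(node_hours_to_budget(8))   # 8000
print(node_hours_to_budget(24))  # 24000
```

The result can be passed straight to `job.run(..., budget_milli_node_hours=node_hours_to_budget(8))`.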
9.6 Neural Architecture Search (NAS)
NAS automates the design of neural network architectures (layer types, depths, connections).
TRADITIONAL: Data Scientist manually designs network architecture
  ┌────┐   ┌────┐   ┌────┐   ┌────┐   ┌────┐
  │Conv│ → │Conv│ → │Pool│ → │ FC │ → │Soft│   (hand-designed)
  └────┘   └────┘   └────┘   └────┘   └────┘
NAS: Algorithm searches for best architecture
  Search Space: [Conv3x3, Conv5x5, Pool, Skip, Dense, ...]
       │
       ├── Try Architecture 1 → eval → 0.82 accuracy
       ├── Try Architecture 2 → eval → 0.85 accuracy
       ├── Try Architecture N → eval → 0.91 accuracy ← WINNER
       └── Return best architecture
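The try-evaluate-keep-best loop can be illustrated with a toy example. This is a deliberately simplified sketch, not a real NAS system: it assumes scikit-learn, restricts the search space to the depth and width of a fully connected network (no convolution, pooling, or skip connections), and uses plain random sampling. `sample_architecture` and `search_architectures` are hypothetical names.

```python
import random
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier


def sample_architecture(rng):
    """Draw one candidate: 1 to 3 hidden layers, each 8, 16, or 32 units wide."""
    depth = rng.randint(1, 3)
    return tuple(rng.choice([8, 16, 32]) for _ in range(depth))


def search_architectures(X, y, n_candidates=5, seed=0):
    """Evaluate sampled architectures by CV accuracy and return the best one."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_candidates):
        arch = sample_architecture(rng)
        net = MLPClassifier(hidden_layer_sizes=arch, max_iter=300, random_state=0)
        with warnings.catch_warnings():
            # Small networks may not fully converge in 300 iterations;
            # that is acceptable for a toy comparison.
            warnings.simplefilter("ignore", ConvergenceWarning)
            score = cross_val_score(net, X, y, cv=3).mean()
        results.append((score, arch))
    return max(results)  # highest score wins


X, y = make_classification(n_samples=300, n_features=10, random_state=0)
best_score, best_arch = search_architectures(X, y)
print(best_arch, round(best_score, 3))
```

Production NAS methods search far richer spaces and use techniques such as weight sharing or learned controllers to avoid training every candidate from scratch.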
9.7 Popular AutoML Libraries
| Library | Best For | Notes |
|---|---|---|
| Vertex AI AutoML | GCP production workloads | Fully managed, no code |
| H2O AutoML | Tabular data, fast | Open source |
| TPOT | Sklearn pipeline optimization | Uses genetic programming |
| AutoKeras | Deep learning architectures | Keras/TF-based |
| Optuna | HPO only | Flexible, widely used |
| Ray Tune | Distributed HPO | Scales to clusters |
9.8 AutoML in the MLOps Pipeline
MLOps Pipeline with AutoML:
  New Data Arrives
                 │
                 ▼
  Data Validation
                 │
                 ▼
  Trigger AutoML Training (Vertex AI)
  ┌─────────────────────────────────┐
  │ AutoML internally:              │
  │  - Tries 100s of configs        │
  │  - Logs all experiments         │
  │  - Returns best model           │
  └──────────────┬──────────────────┘
                 │
                 ▼
  Evaluate vs current prod model
                 │
                 ├── Better? → Deploy (CI/CD)
                 └── Worse?  → Alert team
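The final branch of the pipeline is a simple comparison gate. The function below is a minimal sketch of that decision; `promotion_decision` and its `min_improvement` margin are hypothetical, and it assumes both models were scored on the same held-out data with a metric where higher is better. A tie counts as "not better" here, which is one possible policy among several.

```python
def promotion_decision(candidate_score, production_score, min_improvement=0.0):
    """Return "deploy" if the candidate beats production by more than the margin.

    Assumes a higher-is-better metric (for example AUC) computed on the same
    evaluation set for both models.
    """
    if candidate_score > production_score + min_improvement:
        return "deploy"
    return "alert"


print(promotion_decision(0.91, 0.88))                          # deploy
print(promotion_decision(0.86, 0.88))                          # alert
print(promotion_decision(0.885, 0.88, min_improvement=0.01))   # alert
```

Requiring a small margin helps avoid redeploying on differences that are within evaluation noise.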
Next Chapter → 10: Experiment Tracking