Chapter 09: ClearML & Neptune.ai🔗

"More experiment tracking choices — ClearML for self-hosted control, Neptune for enterprise collaboration."


9.1 ClearML🔗

ClearML is a full open-source MLOps platform — experiment tracking, data versioning, pipeline orchestration, and model serving in one tool.

Key Features🔗

ClearML SUITE:
  ├── ClearML Experiment  → Experiment tracking (like MLflow)
  ├── ClearML Data        → Dataset versioning (like DVC)
  ├── ClearML Pipelines   → Workflow orchestration
  ├── ClearML Serving     → Model deployment
  └── ClearML Agent       → Remote execution on any machine

Quick Start🔗

# pip install clearml

from clearml import Task, Logger

# Initialize task (auto-logs everything)
task = Task.init(
    project_name="Churn Prediction",
    task_name="GBM Experiment v3",
    task_type=Task.TaskTypes.training,
)

# Auto-logging captures: hyperparams, metrics, stdout, git diff, requirements.txt
# Manual logging
logger = task.get_logger()
logger.report_scalar("accuracy", "train", value=0.91, iteration=1)
logger.report_scalar("accuracy", "test", value=0.88, iteration=1)
logger.report_text("Model trained successfully")

# Connect config dict
config = {"n_estimators": 200, "lr": 0.05}
task.connect(config)

# Log model
task.upload_artifact("model", artifact_object="models/model.pkl")

# Close
task.close()

ClearML Agent (Remote Execution)🔗

# Install agent on GPU machine
pip install clearml-agent

# Start agent (picks up queued experiments)
clearml-agent daemon --queue default --detached
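Once an agent is listening on a queue, experiments can be pushed to it from any machine. One way is the `clearml-task` CLI — a sketch only; exact flags may vary by ClearML version, and the project/script names below are illustrative:

```shell
# Enqueue a local script to run on whatever agent serves the "default" queue.
# Project and task names here are examples, not required values.
clearml-task \
  --project "Churn Prediction" \
  --name "GBM Experiment v3 (remote)" \
  --script train.py \
  --queue default
```

The agent clones the task's recorded environment (git state, requirements) before executing, so the remote run reproduces the local one.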

9.2 Neptune.ai🔗

Neptune.ai is a managed metadata store for ML experiments — it logs runs, compares them side by side, and ships integrations for most popular ML frameworks and libraries.

# pip install neptune

import os

import neptune

run = neptune.init_run(
    project="my-org/churn-prediction",
    api_token=os.getenv("NEPTUNE_API_TOKEN"),  # read the token from the environment, don't hardcode it
)

# Log parameters
run["parameters"] = {
    "n_estimators": 200,
    "learning_rate": 0.05,
}

# Log metrics
run["train/accuracy"] = 0.91
run["test/accuracy"] = 0.88
run["test/f1"] = 0.85

# Log files
run["model/pickled"].upload("models/model.pkl")
run["reports/confusion_matrix"].upload("reports/confusion.png")

# Stop
run.stop()
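Neptune organizes everything under slash-separated keys, so `run["train/accuracy"]` and `run["test/accuracy"]` land in separate folders of the run's metadata tree in the UI. As a rough illustration of that model (plain Python, not the Neptune SDK), grouping flat keys by their top-level prefix looks like:

```python
from collections import defaultdict

# Toy illustration of Neptune's namespace model (not the Neptune SDK):
# slash-separated keys act like folder paths in the run's metadata tree.
logged = {
    "parameters/n_estimators": 200,
    "parameters/learning_rate": 0.05,
    "train/accuracy": 0.91,
    "test/accuracy": 0.88,
    "test/f1": 0.85,
}

def group_by_namespace(entries):
    """Group flat slash-keys into top-level namespaces, as the Neptune UI does."""
    tree = defaultdict(dict)
    for key, value in entries.items():
        namespace, _, leaf = key.partition("/")
        tree[namespace][leaf] = value
    return dict(tree)

tree = group_by_namespace(logged)
print(tree["test"])  # {'accuracy': 0.88, 'f1': 0.85}
```

This folder-like layout is why the comparison view can line up `test/accuracy` across runs automatically.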

9.3 Tool Selection Guide🔗

Situation                        → Recommended Tool
Open source, self-hosted needed  → ClearML or MLflow
Best UI and collaboration        → W&B
Enterprise compliance            → Neptune.ai
GCP-native                       → Vertex AI Experiments
Starting out, free               → MLflow (self-hosted) or W&B free tier
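The guide above can be encoded as a simple lookup table in code — a hedged sketch where the situation keys are arbitrary names invented for this example:

```python
# Illustrative lookup encoding the selection guide above (keys are arbitrary labels).
RECOMMENDATIONS = {
    "open_source_self_hosted": ["ClearML", "MLflow"],
    "best_ui_collaboration": ["W&B"],
    "enterprise_compliance": ["Neptune.ai"],
    "gcp_native": ["Vertex AI Experiments"],
    "starting_out_free": ["MLflow (self-hosted)", "W&B free tier"],
}

def recommend(situation: str) -> list[str]:
    """Return the recommended tools for a situation, or an empty list if unknown."""
    return RECOMMENDATIONS.get(situation, [])

print(recommend("enterprise_compliance"))  # ['Neptune.ai']
```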

Next → Chapter 10: AutoML & HPO