
Chapter 35: Edge ML🔗

"Edge ML brings inference to the device — no network, no cloud, ultra-low latency."


What is Edge ML?🔗

Edge ML runs ML models on edge devices (phones, IoT sensors, embedded systems) rather than in the cloud.

CLOUD INFERENCE:                  EDGE INFERENCE:
  Device → [network] → Cloud        Device runs model locally
  Latency: 100ms–2s                 Latency: <10ms
  Privacy: data leaves device       Privacy: data stays on device
  Cost: per-request API/compute     Cost: one-time model conversion
  Connectivity required             Works offline

Model Optimization for Edge🔗

ONNX — Cross-Platform Model Format🔗

# Convert a fitted sklearn model to ONNX
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# sklearn_model: any fitted estimator; here it expects 5 float features
initial_type = [("float_input", FloatTensorType([None, 5]))]
onnx_model = convert_sklearn(sklearn_model, "churn_model", initial_types=initial_type)
with open("models/churn.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

# Run with ONNX Runtime (any platform)
import onnxruntime as ort
import numpy as np

session = ort.InferenceSession("models/churn.onnx")
input_name = session.get_inputs()[0].name
# features: np.ndarray of shape (n_samples, 5)
prediction = session.run(None, {input_name: features.astype(np.float32)})

TensorFlow Lite (Mobile & IoT)🔗

import tensorflow as tf

# Convert TF model to TFLite
converter = tf.lite.TFLiteConverter.from_saved_model("models/tf_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic-range quantization
# For full INT8 quantization, also set converter.representative_dataset
tflite_model = converter.convert()

with open("models/model.tflite", "wb") as f:
    f.write(tflite_model)

# Typical model size: SavedModel ~40MB → TFLite INT8 ~3MB
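On-device, the converted model runs through the TFLite interpreter. A self-contained sketch that builds a tiny Keras model, converts it in memory, and runs one inference (the layer sizes and input shape are illustrative):

```python
import numpy as np
import tensorflow as tf

# A tiny stand-in model: 5 inputs → 1 sigmoid output
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(5,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Convert with dynamic-range quantization, keeping the bytes in memory
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_bytes = converter.convert()

# The interpreter is the on-device runtime: allocate, set input, invoke
interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

x = np.random.rand(1, 5).astype(np.float32)
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
pred = interpreter.get_tensor(out["index"])  # shape (1, 1), in (0, 1)
```

In production you would pass `model_path="models/model.tflite"` instead of `model_content`; the allocate/set/invoke/get cycle is the same.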

Model Quantization🔗

Full precision (FP32): 4 bytes per weight → 40MB model
Half precision (FP16): 2 bytes per weight → 20MB model
INT8 quantization:     1 byte per weight  → 10MB model
INT4 quantization:     0.5 bytes per weight → 5MB model

Trade-off: smaller size ↔ slight accuracy loss
INT8 typically: <1% accuracy drop, 4x smaller
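The INT8 scheme above can be made concrete with plain NumPy: affine quantization maps each float weight to an int8 via a scale and zero-point, and dequantizing shows the rounding error that causes the small accuracy drop. A sketch under illustrative assumptions (per-tensor min/max calibration; real toolchains often use per-channel scales):

```python
import numpy as np

def quantize_int8(w):
    """Per-tensor affine quantization: float32 → int8 with scale/zero-point."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0                       # 256 int8 levels span the range
    zero_point = round(-lo / scale) - 128           # int8 value representing 0.0's offset
    q = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Reconstruct approximate floats from the int8 codes."""
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, scale, zp = quantize_int8(w)
w_hat = dequantize(q, scale, zp)

# 4x memory saving, with per-weight error bounded by the quantization step
assert q.nbytes * 4 == w.nbytes
assert np.abs(w - w_hat).max() < 2 * scale
```

The error bound is roughly half the step size `scale`, which is why INT8 usually costs well under 1% accuracy while INT4, with 16x fewer levels, bites harder.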

Edge Deployment Targets🔗

| Target | Framework | Use Case |
|---|---|---|
| Mobile (Android/iOS) | TFLite, Core ML, ONNX | Apps, camera |
| Browser | TensorFlow.js, ONNX.js | Web apps |
| Raspberry Pi / ARM | ONNX Runtime, TFLite | IoT, robotics |
| NVIDIA Jetson | TensorRT, TFLite | Computer vision |
| MCU (Arduino) | TensorFlow Lite Micro | Ultra-tiny sensors |

Next → Chapter 36: Cost Optimization