# Chapter 35: Edge ML
"Edge ML brings inference to the device — no network, no cloud, ultra-low latency."
## What is Edge ML?
Edge ML runs ML models on edge devices (phones, IoT sensors, embedded systems) rather than in the cloud.
| | Cloud inference | Edge inference |
|---|---|---|
| Path | Device → [network] → Cloud | Device runs the model locally |
| Latency | 100ms–2s | <10ms |
| Privacy | Data leaves the device | Data stays on the device |
| Cost | Ongoing API/cloud compute | One-time model conversion |
| Connectivity | Required | Works offline |
## Model Optimization for Edge
### ONNX — Cross-Platform Model Format
```python
# Convert a fitted sklearn model to ONNX
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# Declare the input signature: float tensor with 5 features,
# dynamic batch dimension
initial_type = [("float_input", FloatTensorType([None, 5]))]
onnx_model = convert_sklearn(sklearn_model, "churn_model", initial_type)

with open("models/churn.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
```

```python
# Run with ONNX Runtime (any platform)
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("models/churn.onnx")
input_name = session.get_inputs()[0].name
prediction = session.run(None, {input_name: features.astype(np.float32)})
```
### TensorFlow Lite (Mobile & IoT)
```python
import tensorflow as tf

# Convert a SavedModel to TFLite
converter = tf.lite.TFLiteConverter.from_saved_model("models/tf_model")
# Optimize.DEFAULT enables dynamic-range quantization; full INT8
# additionally requires a representative_dataset for calibration
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("models/model.tflite", "wb") as f:
    f.write(tflite_model)

# Typical size: SavedModel ~40MB → TFLite INT8 ~3MB
```
### Model Quantization
For a fixed parameter count, model size scales directly with bytes per weight (sizes below assume a ~10M-parameter model):

| Precision | Bytes per weight | Model size |
|---|---|---|
| Full precision (FP32) | 4 | 40 MB |
| Half precision (FP16) | 2 | 20 MB |
| INT8 | 1 | 10 MB |
| INT4 | 0.5 | 5 MB |

The trade-off: smaller size ↔ slight accuracy loss. INT8 typically gives a 4x size reduction for under 1% accuracy drop.
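The arithmetic behind these numbers is simple to check. A minimal sketch of symmetric per-tensor INT8 quantization in plain Python (the weight values and the 10M-parameter count are illustrative assumptions; real frameworks quantize per-channel and calibrate scales on representative data):

```python
def quantize_int8(weights):
    """Symmetric INT8 quantization: w ≈ q * scale, q in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.82, -1.27, 0.05, 0.33, -0.91]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Worst-case rounding error is half a quantization step (scale / 2)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
assert max_error <= scale / 2 + 1e-9

# Size math: 10M parameters at 4 bytes (FP32) vs 1 byte (INT8)
fp32_mb = 10_000_000 * 4 / 1e6  # 40.0 MB
int8_mb = 10_000_000 * 1 / 1e6  # 10.0 MB
```

The 4x ratio between FP32 and INT8 falls directly out of the bytes-per-weight column above; the accuracy cost comes entirely from the rounding step.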
## Edge Deployment Targets
| Target | Framework | Use Case |
|---|---|---|
| Mobile (Android/iOS) | TFLite, Core ML, ONNX Runtime | Apps, camera |
| Browser | TensorFlow.js, ONNX Runtime Web | Web apps |
| Raspberry Pi/ARM | ONNX Runtime, TFLite | IoT, robotics |
| NVIDIA Jetson | TensorRT, TFLite | Computer vision |
| MCU (Arduino) | TensorFlow Lite Micro | Ultra-tiny sensors |