Chapter 31: Responsible AI & Explainability
"A model that achieves 94% accuracy but discriminates against a protected group is not a good model β it's a liability."
31.1 Why Responsible AI in MLOps?
MLOps without Responsible AI is dangerous: reported AI incidents rose 56% in 2024, and regulation and guidance (the EU AI Act, the NIST AI Risk Management Framework) now require organizations to demonstrate model transparency, fairness, and accountability.
RESPONSIBLE AI PILLARS:
┌────────────────┬──────────────────────────────────────────┐
│ FAIRNESS       │ Model treats all groups equitably        │
│ EXPLAINABILITY │ We can explain why a prediction is made  │
│ TRANSPARENCY   │ Model behavior is documented & auditable │
│ PRIVACY        │ PII is protected in data and outputs     │
│ SECURITY       │ Model can't be fooled or extracted       │
│ ACCOUNTABILITY │ Humans are responsible for AI decisions  │
└────────────────┴──────────────────────────────────────────┘
31.2 SHAP: SHapley Additive exPlanations
SHAP is the most widely used model explanation framework. It assigns each feature a "Shapley value" β how much that feature contributed to the prediction.
# pip install shap
import shap
import pickle
import pandas as pd
# Load model and data
with open("models/model.pkl", "rb") as f:
model = pickle.load(f)
X_test = pd.read_csv("data/test.csv")
# ── Global Explanation (feature importance) ─────────────────────
explainer = shap.TreeExplainer(model) # fast for tree-based models
shap_values = explainer(X_test)
# Summary plot (bar chart of mean absolute SHAP values)
shap.summary_plot(shap_values, X_test, plot_type="bar")
# Output: monthly_charges: 0.32  ← most important
# tenure_months: 0.24
# income: 0.18
# age: 0.11
# Beeswarm plot (distribution of impact)
shap.summary_plot(shap_values, X_test)
# ── Local Explanation (one prediction) ──────────────────────────
sample = X_test.iloc[0:1]
shap_val_single = explainer(sample)
# Waterfall plot β why did this customer get a churn prediction?
shap.plots.waterfall(shap_val_single[0])
# Output:
# Base value: 0.35 (population average)
# + monthly_charges = 95: +0.18  ← high bills increase churn risk
# + tenure_months = 3:    +0.12  ← new customer
# + income = 38000:       +0.08  ← lower income
# = Final prediction: 0.73 (73% churn probability)
# Force plot (interactive HTML) — the Explanation object from explainer(X_test)
# already carries the base values and SHAP values we need
shap.force_plot(shap_values.base_values[0], shap_values.values[0], X_test.iloc[0])
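The waterfall arithmetic above is not a coincidence: SHAP values are additive by construction, so the base value plus all feature contributions always equals the model's output. A self-contained sketch of that property on a toy linear model (no `shap` dependency; for linear models the exact Shapley value of feature i is w_i * (x_i - mean(x_i))):

```python
import numpy as np

# Toy linear model f(x) = w·x + b with synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
w, b = np.array([0.5, -1.2, 2.0]), 0.3

def predict(x):
    return x @ w + b

# Base value = expected prediction over the background data
base_value = predict(X).mean()

# Exact Shapley values for a linear model: w_i * (x_i - E[x_i])
x = X[0]
shap_vals = w * (x - X.mean(axis=0))

# Additivity: base value + contributions reconstructs the prediction
print(np.isclose(base_value + shap_vals.sum(), predict(x)))  # → True
```

This is the same identity the waterfall plot visualizes: 0.35 (base) + 0.18 + 0.12 + 0.08 = 0.73.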
31.3 LIME: Local Interpretable Model-agnostic Explanations
LIME explains any model's prediction by approximating it locally with a simple linear model.
# pip install lime
import lime
import lime.lime_tabular
import numpy as np
# Create explainer (X_train loaded alongside X_test and the model)
explainer = lime.lime_tabular.LimeTabularExplainer(
    training_data=X_train.values,
    feature_names=X_train.columns.tolist(),
    class_names=["No Churn", "Churn"],
    mode="classification",
    discretize_continuous=True,
)
# Explain single prediction
instance = X_test.iloc[5].values
explanation = explainer.explain_instance(
    data_row=instance,
    predict_fn=model.predict_proba,
    num_features=6,
)
# Show explanation
explanation.show_in_notebook()
# 0.73 churn probability because:
# monthly_charges > 80: +0.22
# tenure_months <= 6: +0.15
# plan = basic: +0.09
# age <= 30: +0.05
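Under the hood, LIME samples perturbations around the instance, weights them by proximity, and fits a weighted linear surrogate whose coefficients become the explanation. A minimal sketch of that core loop (illustrative only; `local_surrogate` is not the library's API):

```python
import numpy as np
from sklearn.linear_model import Ridge

def local_surrogate(predict_fn, instance, n_samples=500, width=0.75, seed=0):
    """Fit a proximity-weighted linear model around one instance (LIME's core idea)."""
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance
    perturbed = instance + rng.normal(scale=0.5, size=(n_samples, len(instance)))
    preds = predict_fn(perturbed)
    # 2. Weight samples by closeness to the instance (exponential kernel)
    dists = np.linalg.norm(perturbed - instance, axis=1)
    weights = np.exp(-(dists ** 2) / width ** 2)
    # 3. Fit a weighted linear surrogate; its coefficients are the explanation
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(perturbed, preds, sample_weight=weights)
    return surrogate.coef_

# Toy nonlinear model: depends strongly on feature 0 near x = [1, 0]
coefs = local_surrogate(lambda X: X[:, 0] ** 2 + 0.1 * X[:, 1], np.array([1.0, 0.0]))
print(coefs)  # local slope of feature 0 (~2.0) dominates feature 1 (~0.1)
```

The real library adds discretization, categorical handling, and feature selection on top of this loop, but the weighted-local-linear-fit core is the same.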
31.4 Bias Detection & Fairness
Detecting and measuring bias in ML models across demographic groups.
# pip install fairlearn aif360
import pandas as pd
from fairlearn.metrics import (
MetricFrame,
selection_rate,
false_positive_rate,
false_negative_rate,
)
from sklearn.metrics import accuracy_score
# Load predictions
df = pd.read_csv("predictions_with_demographics.csv")
y_true = df["actual_churn"]
y_pred = df["predicted_churn"]
sensitive_feature = df["age_group"] # e.g., "18-30", "31-50", "51+"
# ── Measure fairness metrics across groups ─────────────────────
metric_frame = MetricFrame(
    metrics={
        "accuracy": accuracy_score,
        "selection_rate": selection_rate,
        "false_positive_rate": false_positive_rate,
        "false_negative_rate": false_negative_rate,
    },
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=sensitive_feature,
)
print(metric_frame.by_group)
# age_group accuracy selection_rate false_positive_rate false_negative_rate
# 18-30 0.91 0.28 0.15 0.09
# 31-50 0.88 0.25 0.17 0.12
# 51+ 0.79 0.35 0.28 0.21 ← BIAS!
# ── Fairness metrics ───────────────────────────────────────────
print("Demographic Parity Difference:", metric_frame.difference(method="between_groups")["selection_rate"])
# Equalized odds compares both error rates across groups; check the FPR and FNR gaps
print("FPR Difference:", metric_frame.difference()["false_positive_rate"])
print("FNR Difference:", metric_frame.difference()["false_negative_rate"])
# Alert if fairness violation
if metric_frame.difference()["false_positive_rate"] > 0.1:
    print("⚠️ FAIRNESS ALERT: False positive rate gap > 10% between groups!")
    raise ValueError("Model fails fairness threshold; requires bias mitigation")
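These metrics have simple definitions; for example, the demographic parity difference is just the gap between the highest and lowest per-group selection rates. A hand computation on toy data (hypothetical values) makes the formula concrete:

```python
import numpy as np

# Toy predictions and group membership (illustrative values only)
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 1])
groups = np.array(["18-30", "18-30", "18-30", "51+", "51+", "51+", "51+", "51+"])

# Selection rate per group = fraction of positive predictions in that group
rates = {g: y_pred[groups == g].mean() for g in np.unique(groups)}
print(rates)  # {'18-30': 0.666..., '51+': 0.6}

# Demographic parity difference = max rate - min rate across groups
dp_diff = max(rates.values()) - min(rates.values())
print(round(dp_diff, 3))  # → 0.067
```

This is exactly what `MetricFrame.difference(method="between_groups")` computes for `selection_rate`, just without the bookkeeping.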
31.5 Fairness Mitigation Techniques
BIAS CAN ENTER AT:                     MITIGATION STRATEGIES:

Data Collection                        Pre-processing:
 └── Underrepresented groups             Resampling, reweighting
                                         Synthetic data (SMOTE)
Feature Engineering                    In-processing:
 └── Proxy features                      Fairness constraints in training
     (zip code → race)                   Adversarial debiasing
Model Training                         Post-processing:
 └── Optimizing only accuracy            Threshold adjustment per group
     ignores group disparities           Calibration
from fairlearn.reductions import ExponentiatedGradient, DemographicParity
from sklearn.ensemble import GradientBoostingClassifier

# ── Mitigate with fairness constraints (in-processing) ─────────
mitigator = ExponentiatedGradient(
    estimator=GradientBoostingClassifier(),
    constraints=DemographicParity(),
    eps=0.01,  # fairness tolerance
)
mitigator.fit(X_train, y_train, sensitive_features=sensitive_train)
y_pred_fair = mitigator.predict(X_test)
# Re-evaluate after mitigation
fair_metric = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y_test,
    y_pred=y_pred_fair,
    sensitive_features=sensitive_test,
)
print("After mitigation:")
print(fair_metric.by_group)
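Fairlearn also covers the post-processing route (its `ThresholdOptimizer`); the underlying idea (a separate decision threshold per group so selection rates line up) can be sketched without the library. Here `equalize_selection_rates` is a hypothetical helper, not a Fairlearn API:

```python
import numpy as np

def equalize_selection_rates(scores, groups, target_rate=0.25):
    """Pick a per-group score threshold so every group selects ~target_rate."""
    thresholds = {}
    y_pred = np.zeros_like(scores, dtype=int)
    for g in np.unique(groups):
        g_scores = scores[groups == g]
        # Threshold at the (1 - target_rate) quantile of this group's scores
        thresholds[g] = np.quantile(g_scores, 1 - target_rate)
        y_pred[groups == g] = (g_scores > thresholds[g]).astype(int)
    return y_pred, thresholds

# Two groups whose score distributions are skewed differently
rng = np.random.default_rng(0)
scores = np.concatenate([rng.beta(2, 5, 500), rng.beta(5, 2, 500)])
groups = np.array(["A"] * 500 + ["B"] * 500)

y_pred, thresholds = equalize_selection_rates(scores, groups)
for g in ("A", "B"):
    print(g, y_pred[groups == g].mean())  # both selection rates ≈ 0.25
```

The trade-off: per-group thresholds equalize outcomes but mean two people with the same score can receive different decisions, which may itself be unacceptable in some jurisdictions; that judgment belongs in the model card and review process.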
31.6 Google's What-If Tool
Google's What-If Tool provides an interactive no-code interface for investigating ML model behavior.
# In Vertex AI Workbench or Jupyter
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget
config = WitConfigBuilder(X_test.values[:200].tolist(), X_test.columns.tolist())
config.set_target_feature("churned")
config.set_model_type("classification")
config.set_custom_predict_fn(model.predict_proba)
config.set_label_vocab(["No Churn", "Churn"])
WitWidget(config)
Features of What-If Tool:
- Edit individual examples and see how predictions change (counterfactuals)
- Compare two models side-by-side
- Analyze performance by demographic group
- Identify nearest counterfactual ("What would need to change for a different prediction?")
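The nearest-counterfactual idea in the last bullet is simple to sketch: among examples the model labels differently from the query, return the closest one (here under plain L2 distance, a simplifying assumption; the What-If Tool supports other distance measures):

```python
import numpy as np

def nearest_counterfactual(predict_fn, X, query):
    """Return the example in X closest to query that gets a different prediction."""
    query_pred = predict_fn(query[None, :])[0]
    flipped = X[predict_fn(X) != query_pred]       # examples labeled differently
    dists = np.linalg.norm(flipped - query, axis=1)
    return flipped[dists.argmin()]

# Toy model: predict 1 when the feature sum exceeds 1
predict = lambda X: (X.sum(axis=1) > 1).astype(int)
X = np.array([[0.2, 0.3], [0.4, 0.4], [0.9, 0.5], [1.0, 0.8]])
query = np.array([0.3, 0.3])                       # predicted 0

print(nearest_counterfactual(predict, X, query))   # → [0.9 0.5]
```

Reading the result as a "what would need to change" statement: the closest customer who gets the opposite prediction differs mainly in the first feature, so that feature is the cheapest lever for flipping the outcome.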
31.7 Model Cards
A Model Card is a short document describing an ML model: its intended use, performance, limitations, and fairness metrics.
# Model Card: Churn Prediction Model v2.1
## Overview
- **Task**: Binary classification (predict customer churn)
- **Version**: 2.1.0
- **Last Updated**: 2024-03-01
- **Team**: ML Platform Team
## Model Details
- **Architecture**: Gradient Boosting Classifier (sklearn)
- **Training Data**: 150,000 customers, Jan 2022–Dec 2023
- **Features**: age, income, tenure, monthly_charges, plan
## Intended Use
- **Primary Use**: Proactive customer retention campaigns
- **Users**: Marketing team, customer success managers
- **Out-of-scope**: Credit scoring, hiring decisions
## Performance
| Metric | Overall | Age 18-30 | Age 31-50 | Age 51+ |
|-------------|---------|-----------|-----------|---------|
| Accuracy | 0.89 | 0.91 | 0.88 | 0.87 |
| F1 Score | 0.86 | 0.88 | 0.85 | 0.83 |
| False Pos. | 0.11 | 0.09 | 0.12 | 0.13 |
## Fairness Evaluation
- Demographic parity difference: 0.05 (within acceptable range)
- Equalized odds difference: 0.07 (within acceptable range)
## Limitations & Risks
- Model trained on historical data; may not reflect new customer behaviors
- Performance degrades for customers with < 3 months tenure
- Does not account for seasonal effects
## Ethical Considerations
- Not used to deny service to any customer group
- Predictions are recommendations, not automated decisions
- Human review required before any customer-facing action
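Model cards go stale quickly when written by hand; generating them from pipeline metadata keeps them in sync with each release. A minimal sketch (the field names and `render_model_card` helper are illustrative, not a standard):

```python
def render_model_card(meta: dict) -> str:
    """Render a Markdown model card from a metadata dict produced by the pipeline."""
    lines = [f"# Model Card: {meta['name']} v{meta['version']}", "", "## Performance"]
    for metric, value in meta["metrics"].items():
        lines.append(f"- **{metric}**: {value}")
    lines += ["", "## Limitations"]
    lines += [f"- {item}" for item in meta["limitations"]]
    return "\n".join(lines)

card = render_model_card({
    "name": "Churn Prediction Model",
    "version": "2.1.0",
    "metrics": {"accuracy": 0.89, "f1": 0.86},
    "limitations": ["Performance degrades for customers with < 3 months tenure"],
})
print(card)
```

In a real pipeline the metadata dict would come from the evaluation step (e.g. logged MLflow metrics), and the rendered card would be committed or attached to the model registry entry alongside the artifact.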
31.8 Explainability in the MLOps Pipeline
# In evaluate.py: add explainability to every training run
import numpy as np
import pandas as pd
import shap
import mlflow
from sklearn.metrics import accuracy_score

with mlflow.start_run():
    model.fit(X_train, y_train)

    # Standard metrics
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))

    # SHAP global feature importance
    explainer = shap.TreeExplainer(model)
    shap_values = explainer(X_test[:100])

    # Feature importance (mean |SHAP|)
    importance = pd.DataFrame({
        "feature": X_test.columns,
        "importance": np.abs(shap_values.values).mean(axis=0),
    }).sort_values("importance", ascending=False)

    # Log as artifact
    importance.to_csv("reports/feature_importance.csv", index=False)
    mlflow.log_artifact("reports/feature_importance.csv")

    # Log individual feature importances as metrics
    for _, row in importance.iterrows():
        mlflow.log_metric(f"shap_{row['feature']}", row["importance"])
Next → Chapter 32: MLOps Security