Chapter 31: Responsible AI & Explainability
"A model that achieves 94% accuracy but discriminates against a protected group is not a good model β it's a liability."
31.1 Why Responsible AI in MLOps?
MLOps without Responsible AI is dangerous: reported AI incidents rose 56% in 2024, and regulation and guidance (the EU AI Act, the NIST AI Risk Management Framework) now require organizations to demonstrate model transparency, fairness, and accountability.
RESPONSIBLE AI PILLARS:
┌────────────────┬──────────────────────────────────────────┐
│ FAIRNESS       │ Model treats all groups equitably        │
│ EXPLAINABILITY │ We can explain why a prediction is made  │
│ TRANSPARENCY   │ Model behavior is documented & auditable │
│ PRIVACY        │ PII is protected in data and outputs     │
│ SECURITY       │ Model can't be fooled or extracted       │
│ ACCOUNTABILITY │ Humans are responsible for AI decisions  │
└────────────────┴──────────────────────────────────────────┘
31.2 SHAP: SHapley Additive exPlanations
SHAP is the most widely used model explanation framework. It assigns each feature a "Shapley value" β how much that feature contributed to the prediction.
# pip install shap
import shap
import pickle
import pandas as pd
# Load model and data
with open("models/model.pkl", "rb") as f:
model = pickle.load(f)
X_test = pd.read_csv("data/test.csv")
# ── Global Explanation (feature importance) ─────────────────────
explainer = shap.TreeExplainer(model) # fast for tree-based models
shap_values = explainer(X_test)
# Summary plot (bar chart of mean absolute SHAP values)
shap.summary_plot(shap_values, X_test, plot_type="bar")
# Output: monthly_charges: 0.32  ← most important
# tenure_months: 0.24
# income: 0.18
# age: 0.11
# Beeswarm plot (distribution of impact)
shap.summary_plot(shap_values, X_test)
# ── Local Explanation (one prediction) ──────────────────────────
sample = X_test.iloc[0:1]
shap_val_single = explainer(sample)
# Waterfall plot β why did this customer get a churn prediction?
shap.plots.waterfall(shap_val_single[0])
# Output:
# Base value: 0.35 (population average)
# + monthly_charges = 95: +0.18  ← high bills increase churn risk
# + tenure_months = 3:    +0.12  ← new customer
# + income = 38000:       +0.08  ← lower income
# = Final prediction: 0.73 (73% churn probability)
# Force plot (interactive HTML) — the Explanation object from explainer(X_test)
# already carries the base values and SHAP values we need
shap.force_plot(shap_values.base_values[0], shap_values.values[0], X_test.iloc[0])
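The waterfall arithmetic above is not a coincidence: SHAP values are additive by construction, so the base value plus all feature contributions always equals the model's output. A self-contained sketch of that property on a toy linear model (no `shap` dependency; for linear models the exact Shapley value of feature i is w_i * (x_i - mean(x_i))):

```python
import numpy as np

# Toy linear model f(x) = w·x + b with synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
w, b = np.array([0.5, -1.2, 2.0]), 0.3

def predict(x):
    return x @ w + b

# Base value = expected prediction over the background data
base_value = predict(X).mean()

# Exact Shapley values for a linear model: w_i * (x_i - E[x_i])
x = X[0]
shap_vals = w * (x - X.mean(axis=0))

# Additivity: base value + contributions reconstructs the prediction
print(np.isclose(base_value + shap_vals.sum(), predict(x)))  # → True
```

This is the same identity the waterfall plot visualizes: 0.35 (base) + 0.18 + 0.12 + 0.08 = 0.73.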
31.3 LIME: Local Interpretable Model-agnostic Explanations
LIME explains any model's prediction by approximating it locally with a simple linear model.
# pip install lime
import lime
import lime.lime_tabular
import numpy as np
# Create explainer (X_train loaded alongside X_test and the model)
explainer = lime.lime_tabular.LimeTabularExplainer(
    training_data=X_train.values,
    feature_names=X_train.columns.tolist(),
    class_names=["No Churn", "Churn"],
    mode="classification",
    discretize_continuous=True,
)
# Explain single prediction
instance = X_test.iloc[5].values
explanation = explainer.explain_instance(
    data_row=instance,
    predict_fn=model.predict_proba,
    num_features=6,
)
# Show explanation
explanation.show_in_notebook()
# 0.73 churn probability because:
# monthly_charges > 80: +0.22
# tenure_months <= 6: +0.15
# plan = basic: +0.09
# age <= 30: +0.05
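Under the hood, LIME samples perturbations around the instance, weights them by proximity, and fits a weighted linear surrogate whose coefficients become the explanation. A minimal sketch of that core loop (illustrative only; `local_surrogate` is not the library's API):

```python
import numpy as np
from sklearn.linear_model import Ridge

def local_surrogate(predict_fn, instance, n_samples=500, width=0.75, seed=0):
    """Fit a proximity-weighted linear model around one instance (LIME's core idea)."""
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance
    perturbed = instance + rng.normal(scale=0.5, size=(n_samples, len(instance)))
    preds = predict_fn(perturbed)
    # 2. Weight samples by closeness to the instance (exponential kernel)
    dists = np.linalg.norm(perturbed - instance, axis=1)
    weights = np.exp(-(dists ** 2) / width ** 2)
    # 3. Fit a weighted linear surrogate; its coefficients are the explanation
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(perturbed, preds, sample_weight=weights)
    return surrogate.coef_

# Toy nonlinear model: depends strongly on feature 0 near x = [1, 0]
coefs = local_surrogate(lambda X: X[:, 0] ** 2 + 0.1 * X[:, 1], np.array([1.0, 0.0]))
print(coefs)  # local slope of feature 0 (~2.0) dominates feature 1 (~0.1)
```

The real library adds discretization, categorical handling, and feature selection on top of this loop, but the weighted-local-linear-fit core is the same.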
31.4 Bias Detection & Fairness
Detecting and measuring bias in ML models across demographic groups.
# pip install fairlearn aif360
import pandas as pd
from fairlearn.metrics import (
MetricFrame,
selection_rate,
false_positive_rate,
false_negative_rate,
)
from sklearn.metrics import accuracy_score
# Load predictions
df = pd.read_csv("predictions_with_demographics.csv")
y_true = df["actual_churn"]
y_pred = df["predicted_churn"]
sensitive_feature = df["age_group"] # e.g., "18-30", "31-50", "51+"
# ── Measure fairness metrics across groups ─────────────────────
metric_frame = MetricFrame(
    metrics={
        "accuracy": accuracy_score,
        "selection_rate": selection_rate,
        "false_positive_rate": false_positive_rate,
        "false_negative_rate": false_negative_rate,
    },
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=sensitive_feature,
)
print(metric_frame.by_group)
# age_group accuracy selection_rate false_positive_rate false_negative_rate
# 18-30 0.91 0.28 0.15 0.09
# 31-50 0.88 0.25 0.17 0.12
# 51+ 0.79 0.35 0.28 0.21 ← BIAS!
# ── Fairness metrics ───────────────────────────────────────────
print("Demographic Parity Difference:", metric_frame.difference(method="between_groups")["selection_rate"])
# Equalized odds compares both error rates across groups; check the FPR and FNR gaps
print("FPR Difference:", metric_frame.difference()["false_positive_rate"])
print("FNR Difference:", metric_frame.difference()["false_negative_rate"])
# Alert if fairness violation
if metric_frame.difference()["false_positive_rate"] > 0.1:
    print("⚠️ FAIRNESS ALERT: False positive rate gap > 10% between groups!")
    raise ValueError("Model fails fairness threshold; requires bias mitigation")
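These metrics have simple definitions; for example, the demographic parity difference is just the gap between the highest and lowest per-group selection rates. A hand computation on toy data (hypothetical values) makes the formula concrete:

```python
import numpy as np

# Toy predictions and group membership (illustrative values only)
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 1])
groups = np.array(["18-30", "18-30", "18-30", "51+", "51+", "51+", "51+", "51+"])

# Selection rate per group = fraction of positive predictions in that group
rates = {g: y_pred[groups == g].mean() for g in np.unique(groups)}
print(rates)  # {'18-30': 0.666..., '51+': 0.6}

# Demographic parity difference = max rate - min rate across groups
dp_diff = max(rates.values()) - min(rates.values())
print(round(dp_diff, 3))  # → 0.067
```

This is exactly what `MetricFrame.difference(method="between_groups")` computes for `selection_rate`, just without the bookkeeping.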
31.5 Fairness Mitigation Techniques
BIAS CAN ENTER AT:                     MITIGATION STRATEGIES:

Data Collection                        Pre-processing:
 └── Underrepresented groups             Resampling, reweighting
                                         Synthetic data (SMOTE)
Feature Engineering                    In-processing:
 └── Proxy features                      Fairness constraints in training
     (zip code → race)                   Adversarial debiasing
Model Training                         Post-processing:
 └── Optimizing only accuracy            Threshold adjustment per group
     ignores group disparities           Calibration
from fairlearn.reductions import ExponentiatedGradient, DemographicParity
from sklearn.ensemble import GradientBoostingClassifier

# ── Mitigate with fairness constraints (in-processing) ─────────
mitigator = ExponentiatedGradient(
    estimator=GradientBoostingClassifier(),
    constraints=DemographicParity(),
    eps=0.01,  # fairness tolerance
)
mitigator.fit(X_train, y_train, sensitive_features=sensitive_train)
y_pred_fair = mitigator.predict(X_test)
# Re-evaluate after mitigation
fair_metric = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y_test,
    y_pred=y_pred_fair,
    sensitive_features=sensitive_test,
)
print("After mitigation:")
print(fair_metric.by_group)
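Fairlearn also covers the post-processing route (its `ThresholdOptimizer`); the underlying idea (a separate decision threshold per group so selection rates line up) can be sketched without the library. Here `equalize_selection_rates` is a hypothetical helper, not a Fairlearn API:

```python
import numpy as np

def equalize_selection_rates(scores, groups, target_rate=0.25):
    """Pick a per-group score threshold so every group selects ~target_rate."""
    thresholds = {}
    y_pred = np.zeros_like(scores, dtype=int)
    for g in np.unique(groups):
        g_scores = scores[groups == g]
        # Threshold at the (1 - target_rate) quantile of this group's scores
        thresholds[g] = np.quantile(g_scores, 1 - target_rate)
        y_pred[groups == g] = (g_scores > thresholds[g]).astype(int)
    return y_pred, thresholds

# Two groups whose score distributions are skewed differently
rng = np.random.default_rng(0)
scores = np.concatenate([rng.beta(2, 5, 500), rng.beta(5, 2, 500)])
groups = np.array(["A"] * 500 + ["B"] * 500)

y_pred, thresholds = equalize_selection_rates(scores, groups)
for g in ("A", "B"):
    print(g, y_pred[groups == g].mean())  # both selection rates ≈ 0.25
```

The trade-off: per-group thresholds equalize outcomes but mean two people with the same score can receive different decisions, which may itself be unacceptable in some jurisdictions; that judgment belongs in the model card and review process.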
31.6 Google's What-If Tool
Google's What-If Tool provides an interactive no-code interface for investigating ML model behavior.
# In Vertex AI Workbench or Jupyter
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget
config = WitConfigBuilder(X_test.values[:200].tolist(), X_test.columns.tolist())
config.set_target_feature("churned")
config.set_model_type("classification")
config.set_custom_predict_fn(model.predict_proba)
config.set_label_vocab(["No Churn", "Churn"])
WitWidget(config)
Features of What-If Tool:
- Edit individual examples and see how predictions change (counterfactuals)
- Compare two models side-by-side
- Analyze performance by demographic group
- Identify nearest counterfactual ("What would need to change for a different prediction?")
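The nearest-counterfactual idea in the last bullet is simple to sketch: among examples the model labels differently from the query, return the closest one (here under plain L2 distance, a simplifying assumption; the What-If Tool supports other distance measures):

```python
import numpy as np

def nearest_counterfactual(predict_fn, X, query):
    """Return the example in X closest to query that gets a different prediction."""
    query_pred = predict_fn(query[None, :])[0]
    flipped = X[predict_fn(X) != query_pred]       # examples labeled differently
    dists = np.linalg.norm(flipped - query, axis=1)
    return flipped[dists.argmin()]

# Toy model: predict 1 when the feature sum exceeds 1
predict = lambda X: (X.sum(axis=1) > 1).astype(int)
X = np.array([[0.2, 0.3], [0.4, 0.4], [0.9, 0.5], [1.0, 0.8]])
query = np.array([0.3, 0.3])                       # predicted 0

print(nearest_counterfactual(predict, X, query))   # → [0.9 0.5]
```

Reading the result as a "what would need to change" statement: the closest customer who gets the opposite prediction differs mainly in the first feature, so that feature is the cheapest lever for flipping the outcome.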
31.7 Model Cards
A Model Card is a short document describing an ML model: its intended use, performance, limitations, and fairness metrics.
# Model Card: Churn Prediction Model v2.1
## Overview
- **Task**: Binary classification (predict customer churn)
- **Version**: 2.1.0
- **Last Updated**: 2024-03-01
- **Team**: ML Platform Team
## Model Details
- **Architecture**: Gradient Boosting Classifier (sklearn)
- **Training Data**: 150,000 customers, Jan 2022–Dec 2023
- **Features**: age, income, tenure, monthly_charges, plan
## Intended Use
- **Primary Use**: Proactive customer retention campaigns
- **Users**: Marketing team, customer success managers
- **Out-of-scope**: Credit scoring, hiring decisions
## Performance
| Metric | Overall | Age 18-30 | Age 31-50 | Age 51+ |
|-------------|---------|-----------|-----------|---------|
| Accuracy | 0.89 | 0.91 | 0.88 | 0.87 |
| F1 Score | 0.86 | 0.88 | 0.85 | 0.83 |
| False Pos. | 0.11 | 0.09 | 0.12 | 0.13 |
## Fairness Evaluation
- Demographic parity difference: 0.05 (within acceptable range)
- Equalized odds difference: 0.07 (within acceptable range)
## Limitations & Risks
- Model trained on historical data; may not reflect new customer behaviors
- Performance degrades for customers with < 3 months tenure
- Does not account for seasonal effects
## Ethical Considerations
- Not used to deny service to any customer group
- Predictions are recommendations, not automated decisions
- Human review required before any customer-facing action
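Model cards go stale quickly when written by hand; generating them from pipeline metadata keeps them in sync with each release. A minimal sketch (the field names and `render_model_card` helper are illustrative, not a standard):

```python
def render_model_card(meta: dict) -> str:
    """Render a Markdown model card from a metadata dict produced by the pipeline."""
    lines = [f"# Model Card: {meta['name']} v{meta['version']}", "", "## Performance"]
    for metric, value in meta["metrics"].items():
        lines.append(f"- **{metric}**: {value}")
    lines += ["", "## Limitations"]
    lines += [f"- {item}" for item in meta["limitations"]]
    return "\n".join(lines)

card = render_model_card({
    "name": "Churn Prediction Model",
    "version": "2.1.0",
    "metrics": {"accuracy": 0.89, "f1": 0.86},
    "limitations": ["Performance degrades for customers with < 3 months tenure"],
})
print(card)
```

In a real pipeline the metadata dict would come from the evaluation step (e.g. logged MLflow metrics), and the rendered card would be committed or attached to the model registry entry alongside the artifact.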
31.8 Explainability in the MLOps Pipeline
# In evaluate.py: add explainability to every training run
import numpy as np
import pandas as pd
import shap
import mlflow
from sklearn.metrics import accuracy_score

with mlflow.start_run():
    model.fit(X_train, y_train)

    # Standard metrics
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))

    # SHAP global feature importance
    explainer = shap.TreeExplainer(model)
    shap_values = explainer(X_test[:100])

    # Feature importance (mean |SHAP|)
    importance = pd.DataFrame({
        "feature": X_test.columns,
        "importance": np.abs(shap_values.values).mean(axis=0),
    }).sort_values("importance", ascending=False)

    # Log as artifact
    importance.to_csv("reports/feature_importance.csv", index=False)
    mlflow.log_artifact("reports/feature_importance.csv")

    # Log individual feature importances as metrics
    for _, row in importance.iterrows():
        mlflow.log_metric(f"shap_{row['feature']}", row["importance"])
Next → Chapter 32: MLOps Security