> “What gets measured gets managed.”
> — Peter Drucker
Measuring MLOps success requires more than tracking ROI. This chapter introduces leading indicators that predict future success before financial results materialize.
| Type | Definition | Examples | Usefulness |
|:-----|:-----------|:---------|:-----------|
| Leading | Predicts future outcomes | Deployment velocity, adoption rate | High (actionable) |
| Lagging | Measures past outcomes | Revenue, ROI | High (proves value) |
| Vanity | Looks good, doesn’t inform | Total models (regardless of use) | Low |
```mermaid
graph LR
    subgraph "Before Results"
        A[Leading Indicators] --> B[Predict]
    end
    subgraph "After Results"
        C[Lagging Indicators] --> D[Prove]
    end
    B --> E[Future Outcomes]
    D --> E
```
Scenario: You’ve invested $2M in MLOps. The CFO asks for ROI.
| Situation | Lagging Only | With Leading Indicators |
|:----------|:-------------|:-------------------------|
| Month 3 | “We don’t have ROI data yet…” | “Deployment velocity up 3x, on track for $5M benefit” |
| Month 6 | “Still early…” | “Adoption at 60%, incidents down 80%, ROI crystallizing” |
| Month 12 | “Here’s the ROI: $8M” | “Leading indicators predicted $7.5M; we hit $8M” |
Leading indicators give early visibility and credibility.
**Platform adoption**

| Metric | Definition | Target | Warning Sign |
|:-------|:-----------|:-------|:-------------|
| Active Users | DS/MLEs using platform weekly | >80% of ML team | <50% after 6 months |
| Models on Platform | % of production models using MLOps | >90% | <50% |
| Feature Store Usage | Features served via store | >70% | Features computed ad hoc |
| Experiment Tracking | Experiments logged | >95% | Notebooks in personal folders |
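Most of these adoption numbers fall straight out of platform usage logs. A minimal sketch, assuming a hypothetical pandas DataFrame of usage events with `user_id`, `event_type`, and tz-aware `timestamp` columns:

```python
from datetime import datetime, timedelta, timezone

import pandas as pd


def weekly_adoption_rate(events: pd.DataFrame, ml_team_size: int) -> float:
    """Fraction of the ML team that touched the platform in the last 7 days."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=7)
    active = events.loc[events["timestamp"] >= cutoff, "user_id"].nunique()
    return active / ml_team_size


def experiment_tracking_rate(events: pd.DataFrame, total_experiments: int) -> float:
    """Fraction of experiments that were logged to the tracking server."""
    logged = int((events["event_type"] == "experiment_logged").sum())
    return logged / total_experiments if total_experiments else 0.0
```

Tracked weekly, a flat or declining adoption curve is the earliest warning sign in the table above.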
**Deployment velocity**

| Metric | Definition | Target | Warning Sign |
|:-------|:-----------|:-------|:-------------|
| Time-to-Production | Days from model dev to production | <14 days | >60 days |
| Deployment Frequency | Models deployed per month | ↑ trend | ↓ trend |
| Deployment Success Rate | % without rollback | >95% | <80% |
| Time to Rollback | Minutes to revert bad deployment | <5 min | >60 min |
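Time-to-production and success rate come straight from deployment records. A sketch over a hypothetical `DeploymentRecord` that carries development-start and deployment timestamps plus a rollback flag:

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import List


@dataclass
class DeploymentRecord:
    model_id: str
    dev_started_at: datetime
    deployed_at: datetime
    rolled_back: bool


def median_time_to_production(records: List[DeploymentRecord]) -> float:
    """Median days from start of development to production deployment."""
    durations = [(r.deployed_at - r.dev_started_at).total_seconds() / 86400 for r in records]
    return median(durations) if durations else 0.0


def deployment_success_rate(records: List[DeploymentRecord]) -> float:
    """Fraction of deployments that did not require a rollback."""
    if not records:
        return 1.0
    return sum(not r.rolled_back for r in records) / len(records)
```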
**Operational health**

| Metric | Definition | Target | Warning Sign |
|:-------|:-----------|:-------|:-------------|
| Model Uptime | % of time models serving | >99.9% | <99% |
| P50/P99 Latency | Inference latency percentiles | Meets SLA | Exceeds SLA |
| Error Rate | % of inference requests failing | <0.1% | >1% |
| MTTR | Mean time to recover | <1 hour | >24 hours |
**Model quality**

| Metric | Definition | Target | Warning Sign |
|:-------|:-----------|:-------|:-------------|
| Accuracy / AUC | Performance on recent data | Within 5% of training | >10% degradation |
| Drift Score | Statistical distance from training | Low | High + sustained |
| Prediction Confidence | Average model confidence | Stable | Declining |
| Ground Truth Alignment | Predictions vs. actual | >90% | <80% |
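Accuracy degradation is a relative drop against the training baseline, and the drift score can be any statistical distance. The sketch below uses the population stability index (PSI) as one common choice; the thresholds in the comment are the usual rule of thumb, not targets from this chapter:

```python
import numpy as np


def accuracy_degradation(current_accuracy: float, training_accuracy: float) -> float:
    """Relative drop versus the training baseline (0.10 == 10% degradation)."""
    return (training_accuracy - current_accuracy) / training_accuracy


def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population stability index between a training sample and a serving sample.

    Rule of thumb: < 0.1 stable, 0.1-0.25 investigate, > 0.25 significant drift.
    """
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip empty bins so the log term stays finite
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))
```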
**Model and data freshness**

| Metric | Definition | Target | Warning Sign |
|:-------|:-----------|:-------|:-------------|
| Model Age | Days since last retrain | <30 days | >90 days |
| Data Freshness | Lag between data and model | <24 hours | >7 days |
| Feature Freshness | Lag in Feature Store updates | <1 hour | >24 hours |
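Freshness metrics are plain timestamp arithmetic against the model registry and the Feature Store. A sketch, assuming hypothetical tz-aware `last_trained_at` and `last_feature_update` timestamps are available:

```python
from datetime import datetime, timezone


def model_age_days(last_trained_at: datetime) -> float:
    """Days since the model was last retrained."""
    return (datetime.now(timezone.utc) - last_trained_at).total_seconds() / 86400


def feature_freshness_hours(last_feature_update: datetime) -> float:
    """Hours since the Feature Store last refreshed this feature."""
    return (datetime.now(timezone.utc) - last_feature_update).total_seconds() / 3600
```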
**Fairness**

| Metric | Definition | Target | Warning Sign |
|:-------|:-----------|:-------|:-------------|
| Disparate Impact | Outcome ratio across groups | >0.8 | <0.7 |
| Equal Opportunity | TPR parity | <10% gap | >20% gap |
| Demographic Parity | Prediction rate parity | <10% gap | >20% gap |
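These fairness metrics are ratios and gaps over group-level prediction rates. A minimal sketch with binary predictions and labels as NumPy arrays, assuming group 1 is the privileged group and both groups are present:

```python
import numpy as np


def disparate_impact(y_pred: np.ndarray, group: np.ndarray) -> float:
    """Ratio of positive-prediction rates, unprivileged / privileged (target > 0.8)."""
    return float(y_pred[group == 0].mean() / y_pred[group == 1].mean())


def demographic_parity_gap(y_pred: np.ndarray, group: np.ndarray) -> float:
    """Absolute gap in positive-prediction rates between the two groups."""
    return float(abs(y_pred[group == 0].mean() - y_pred[group == 1].mean()))


def equal_opportunity_gap(y_true: np.ndarray, y_pred: np.ndarray, group: np.ndarray) -> float:
    """Absolute gap in true-positive rates between the two groups."""
    def tpr(g: int) -> float:
        mask = (group == g) & (y_true == 1)
        return float(y_pred[mask].mean())
    return abs(tpr(0) - tpr(1))
```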
**Team productivity**

| Metric | Definition | Target | Warning Sign |
|:-------|:-----------|:-------|:-------------|
| Value-Added Time | % on model dev (not ops) | >60% | <30% |
| Experiments per Week | Experiments run per DS | >10 | <3 |
| Toil Ratio | Time on repetitive tasks | <10% | >40% |
| Support Ticket Volume | Platform help requests | ↓ trend | ↑ trend |
**Team satisfaction**

| Metric | Definition | Target | Warning Sign |
|:-------|:-----------|:-------|:-------------|
| NPS | Would recommend platform? | >40 | <0 |
| CSAT | How satisfied? | >4.0/5 | <3.0/5 |
| Effort Score | How easy to use? | >4.0/5 | <3.0/5 |
| Attrition Rate | ML team turnover | <10% | >20% |
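NPS comes from the standard 0-10 "would you recommend" question: the share of promoters (9-10) minus the share of detractors (0-6), reported on a -100 to 100 scale. A sketch over raw survey responses:

```python
from typing import List


def net_promoter_score(ratings: List[int]) -> float:
    """NPS on a -100 to 100 scale from 0-10 survey ratings."""
    if not ratings:
        return 0.0
    promoters = sum(r >= 9 for r in ratings)
    detractors = sum(r <= 6 for r in ratings)
    return 100.0 * (promoters - detractors) / len(ratings)


print(net_promoter_score([10, 9, 9, 8, 7, 6, 3, 10]))  # 4 promoters, 2 detractors, 8 responses -> 25.0
```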
**Governance and compliance**

| Metric | Definition | Target | Warning Sign |
|:-------|:-----------|:-------|:-------------|
| Documentation Rate | % models with Model Cards | 100% | <80% |
| Approval Compliance | % through approval process | 100% | <90% |
| Audit Findings | Issues found in audits | 0 critical | Any critical |
| Regulatory Violations | Fines, warnings | 0 | Any |
**Risk and security**

| Metric | Definition | Target | Warning Sign |
|:-------|:-----------|:-------|:-------------|
| High-Risk Coverage | % risky models monitored | 100% | <80% |
| Security Incidents | Model security events | 0 | Any major |
| Data Lineage | % features with lineage | 100% | <70% |
These indicators can be instrumented directly as Prometheus metrics:

```python
from prometheus_client import Counter, Histogram, Gauge, start_http_server

# Platform Health
active_users = Gauge(
    'mlops_active_users_total',
    'Number of active platform users',
    ['team']
)

deployments = Counter(
    'mlops_deployments_total',
    'Total model deployments',
    ['model', 'status']
)

deployment_duration = Histogram(
    'mlops_deployment_duration_seconds',
    'Time to deploy a model',
    ['model'],
    buckets=[60, 300, 600, 1800, 3600, 7200, 86400]
)

# Model Quality
model_accuracy = Gauge(
    'mlops_model_accuracy',
    'Current model accuracy score',
    ['model', 'version']
)

drift_score = Gauge(
    'mlops_drift_score',
    'Current data drift score',
    ['model', 'feature']
)

inference_latency = Histogram(
    'mlops_inference_latency_seconds',
    'Model inference latency',
    ['model', 'endpoint'],
    buckets=[0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5]
)

# Team Productivity
experiments_created = Counter(
    'mlops_experiments_created_total',
    'Total experiments created',
    ['user', 'project']
)

toil_hours = Gauge(
    'mlops_toil_hours',
    'Hours spent on toil',
    ['team', 'category']
)
```
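Exposing these metrics is then a matter of starting the client's HTTP endpoint and updating values as events occur. A sketch that reuses the collectors defined above (the port, label values, and numbers are illustrative):

```python
import time

from prometheus_client import start_http_server

start_http_server(8000)  # Metrics exposed at http://localhost:8000/metrics

while True:
    # Hypothetical update path: real values would come from auth logs and CI/CD events
    active_users.labels(team="recommendations").set(12)
    deployments.labels(model="churn", status="success").inc()
    deployment_duration.labels(model="churn").observe(420)  # Seconds taken by the last deployment
    time.sleep(60)
```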
A lightweight collector can pull the same numbers from source systems on a schedule:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class MetricDatapoint:
    name: str
    value: float
    timestamp: datetime
    labels: Dict[str, str]


class MetricsCollector:
    def __init__(self):
        self.sources = {}

    def collect_platform_metrics(self) -> List[MetricDatapoint]:
        """Collect platform health metrics."""
        metrics = []

        # Active users from auth logs
        active = self._count_active_users(days=7)
        metrics.append(MetricDatapoint(
            name="active_users",
            value=active,
            timestamp=datetime.utcnow(),
            labels={"scope": "weekly"}
        ))

        # Deployment velocity
        deployments_week = self._count_deployments(days=7)
        metrics.append(MetricDatapoint(
            name="deployments_weekly",
            value=deployments_week,
            timestamp=datetime.utcnow(),
            labels={}
        ))

        return metrics

    def collect_model_metrics(self, model_id: str) -> List[MetricDatapoint]:
        """Collect model quality metrics."""
        metrics = []

        # Get current accuracy
        accuracy = self._get_latest_accuracy(model_id)
        metrics.append(MetricDatapoint(
            name="model_accuracy",
            value=accuracy,
            timestamp=datetime.utcnow(),
            labels={"model": model_id}
        ))

        # Get drift score
        drift = self._calculate_drift(model_id)
        metrics.append(MetricDatapoint(
            name="drift_score",
            value=drift,
            timestamp=datetime.utcnow(),
            labels={"model": model_id}
        ))

        return metrics

    def _count_active_users(self, days: int) -> int:
        # Implementation depends on auth system
        pass

    def _count_deployments(self, days: int) -> int:
        # Implementation depends on CI/CD system
        pass

    def _get_latest_accuracy(self, model_id: str) -> float:
        # Implementation depends on monitoring system
        pass

    def _calculate_drift(self, model_id: str) -> float:
        # Implementation depends on drift detection
        pass
```
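A scheduler or a simple loop can then push the datapoints to whatever sink you use. The sketch below just prints them, and assumes the private helpers have been implemented against your auth, CI/CD, and monitoring systems:

```python
import time
from typing import List


def run_collection_loop(collector: MetricsCollector, model_ids: List[str], interval_s: int = 300) -> None:
    """Collect platform and model metrics on a fixed interval and print them."""
    while True:
        datapoints = collector.collect_platform_metrics()
        for model_id in model_ids:
            datapoints.extend(collector.collect_model_metrics(model_id))
        for dp in datapoints:
            print(f"{dp.timestamp.isoformat()} {dp.name}={dp.value} {dp.labels}")
        time.sleep(interval_s)


# Example: run_collection_loop(MetricsCollector(), model_ids=["churn", "fraud"])
```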
A Grafana dashboard definition turns these series into panels, with thresholds that mirror the targets and warning signs above:

```json
{
  "dashboard": {
    "title": "MLOps Leading Indicators",
    "panels": [
      {
        "title": "Platform Adoption",
        "type": "gauge",
        "targets": [
          {
            "expr": "mlops_active_users_total / mlops_total_ml_team * 100",
            "legendFormat": "Adoption Rate %"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "thresholds": {
              "steps": [
                {"value": 0, "color": "red"},
                {"value": 50, "color": "yellow"},
                {"value": 80, "color": "green"}
              ]
            }
          }
        }
      },
      {
        "title": "Time to Production (days)",
        "type": "stat",
        "targets": [
          {
            "expr": "histogram_quantile(0.5, sum by (le) (rate(mlops_deployment_duration_seconds_bucket[1d]))) / 86400",
            "legendFormat": "P50"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "thresholds": {
              "steps": [
                {"value": 0, "color": "green"},
                {"value": 14, "color": "yellow"},
                {"value": 30, "color": "red"}
              ]
            }
          }
        }
      },
      {
        "title": "Model Drift Alerts",
        "type": "timeseries",
        "targets": [
          {
            "expr": "mlops_drift_score > 0.1",
            "legendFormat": "{{model}}"
          }
        ]
      }
    ]
  }
}
```
Prometheus alerting rules turn the warning signs into automated notifications:

```yaml
# prometheus_rules.yaml
groups:
  - name: mlops_leading_indicators
    rules:
      # Platform Health Alerts
      - alert: LowPlatformAdoption
        expr: mlops_active_users_total / mlops_total_ml_team < 0.5
        for: 7d
        labels:
          severity: warning
        annotations:
          summary: "Platform adoption below 50% for 7 days"
          runbook: "https://wiki/mlops/adoption-playbook"

      - alert: SlowDeployments
        expr: histogram_quantile(0.9, sum by (le) (rate(mlops_deployment_duration_seconds_bucket[1d]))) > 2592000
        for: 1d
        labels:
          severity: warning
        annotations:
          summary: "P90 deployment time exceeds 30 days"

      # Model Quality Alerts
      - alert: HighDriftScore
        expr: mlops_drift_score > 0.3
        for: 6h
        labels:
          severity: critical
        annotations:
          summary: "Model {{ $labels.model }} has high drift"
          runbook: "https://wiki/mlops/drift-response"

      - alert: AccuracyDegradation
        expr: (mlops_model_accuracy - mlops_model_baseline_accuracy) / mlops_model_baseline_accuracy < -0.1
        for: 24h
        labels:
          severity: critical
        annotations:
          summary: "Model accuracy dropped >10% from baseline"

      # Productivity Alerts
      - alert: HighToilRatio
        expr: sum(mlops_toil_hours) / sum(mlops_total_hours) > 0.4
        for: 14d
        labels:
          severity: warning
        annotations:
          summary: "Team spending >40% time on toil"

      # Governance Alerts
      - alert: MissingModelDocs
        expr: mlops_models_without_docs > 0
        for: 7d
        labels:
          severity: warning
        annotations:
          summary: "Models without documentation in production"
```
| Alert Level | Response Time | Responder | Action |
|:------------|:--------------|:----------|:-------|
| Green | - | - | Continue monitoring |
| Yellow | 1 business day | Platform Team Lead | Investigate, add to sprint |
| Red | 4 hours | Platform Team + Manager | Immediate action, status updates |
| Critical | 1 hour | Leadership + On-call | War room, incident management |
The same data can feed a weekly stakeholder report:

```python
def generate_weekly_report(metrics: dict) -> str:
    """Generate the weekly leading-indicators report from a dict of metric values."""
    template = """\
# MLOps Leading Indicators - Week of {date}

## Executive Summary
- Platform Adoption: {adoption}% ({adoption_trend})
- Mean Time to Production: {mttp} days ({mttp_trend})
- Model Health Score: {health}/100 ({health_trend})

## Platform Health
| Metric | This Week | Last Week | Target | Status |
|:-------|:----------|:----------|:-------|:-------|
| Active Users | {users} | {users_prev} | 80% | {users_status} |
| Deployments | {deploys} | {deploys_prev} | ↑ | {deploys_status} |
| Success Rate | {success}% | {success_prev}% | 95% | {success_status} |

## Model Quality
| Model | Accuracy | Drift | Age (days) | Status |
|:------|:---------|:------|:-----------|:-------|
{model_table}

## Action Items
{action_items}
"""
    return template.format(**metrics)
```
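Rendering the report is then a single call with the week's values; everything below is placeholder data for illustration:

```python
print(generate_weekly_report({
    "date": "2025-06-02",
    "adoption": 72, "adoption_trend": "up 5 pts",
    "mttp": 11, "mttp_trend": "down 2 days",
    "health": 88, "health_trend": "stable",
    "users": 36, "users_prev": 33, "users_status": "On track",
    "deploys": 14, "deploys_prev": 11, "deploys_status": "On track",
    "success": 96, "success_prev": 93, "success_status": "On track",
    "model_table": "| churn | 0.91 | 0.04 | 12 | Healthy |",
    "action_items": "- Unblock the approval bottleneck flagged under deployment velocity",
}))
```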
A composite health score rolls the categories into a single number:

| Indicator Category | Weight | Score | Notes |
|:-------------------|:-------|:------|:------|
| Platform Adoption | 25% | 85 | Strong uptake |
| Deployment Velocity | 25% | 72 | Bottleneck in approval |
| Model Quality | 30% | 90 | All models healthy |
| Team Productivity | 20% | 68 | Toil remains high |
| Composite Score | 100% | 80 | On track |
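The composite is a weighted average of the category scores, which is worth automating so the weights stay explicit. A sketch that reproduces the scorecard above:

```python
from typing import Dict


def composite_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of category scores; weights must sum to 1.0."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 100%"
    return sum(scores[name] * weights[name] for name in weights)


weights = {"adoption": 0.25, "velocity": 0.25, "quality": 0.30, "productivity": 0.20}
scores = {"adoption": 85, "velocity": 72, "quality": 90, "productivity": 68}
print(round(composite_score(scores, weights), 2))  # 79.85, reported in the scorecard as 80
```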
Leading indicators chain into the lagging outcomes they predict:

```mermaid
graph LR
    A[↑ Deployment Velocity] --> B[↑ Model Experiments]
    B --> C[↑ Model Quality]
    C --> D[↑ Business Impact]
    E[↑ Platform Adoption] --> F[↓ Shadow IT]
    F --> G[↓ Risk]
    G --> H[↓ Incidents]
    I[↓ Time to Production] --> J[↑ Time to Value]
    J --> K[↑ ROI]
```
Finally, the leading indicators can be rolled into a rough ROI forecast:

```python
import numpy as np


def predict_roi_from_leading_indicators(
    adoption_rate: float,
    deployment_velocity: float,
    model_quality_score: float,
    productivity_gain: float
) -> float:
    """
    Predict expected ROI based on leading indicators.

    Model trained on historical data from similar MLOps implementations.
    """
    # Coefficients from trained model
    coefficients = {
        'adoption': 0.15,
        'velocity': 0.25,
        'quality': 0.35,
        'productivity': 0.25,
        'intercept': -0.5
    }

    # Normalize inputs (0-1 scale)
    features = np.array([
        adoption_rate,
        min(deployment_velocity / 10, 1.0),  # Cap at 10x improvement
        model_quality_score,
        productivity_gain
    ])

    # Predict ROI multiplier
    roi_multiplier = (
        coefficients['adoption'] * features[0] +
        coefficients['velocity'] * features[1] +
        coefficients['quality'] * features[2] +
        coefficients['productivity'] * features[3] +
        coefficients['intercept']
    )

    return max(0, roi_multiplier)
```
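Run quarterly, this gives an early ROI estimate to sanity-check against the lagging numbers once they arrive. The inputs below are illustrative:

```python
predicted = predict_roi_from_leading_indicators(
    adoption_rate=0.85,        # 85% of the ML team active weekly
    deployment_velocity=4.0,   # 4x faster than the pre-MLOps baseline
    model_quality_score=0.90,  # Composite model-health score on a 0-1 scale
    productivity_gain=0.40,    # 40% more value-added time
)
print(f"Predicted ROI multiplier: {predicted:.2f}x")
```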
Key takeaways:

- Leading indicators predict success: Don’t wait for ROI to know if you’re on track.
- Measure across dimensions: Platform, models, people, governance.
- Set targets and warning signs: Know what good looks like.
- Collect continuously: Automate data collection.
- Build early warning systems: Catch problems before they impact business.
- Connect to business outcomes: Leading indicators should predict lagging ROI.
```mermaid
graph TB
    A[Leading Indicators] --> B[Early Warning]
    B --> C[Corrective Action]
    C --> D[Improved Outcomes]
    D --> E[Lagging Indicators]
    E --> F[Prove ROI]
    F --> A
```
Next: 8.2 ROI Tracking Dashboard — Building the executive dashboard.