43.1. The Buy vs Build Decision Matrix
Status: Production-Ready | Version: 2.0.0 | Tags: #Strategy, #Startups, #MLOps
The “Not Invented Here” Syndrome
Startups are founded by engineers. Engineers love to code. Therefore, startups tend to overbuild.
The result is “resume-driven development”: a great custom platform, zero customers, and two months of runway left.
The Overbuilding Trap
graph TD
A[Engineer joins startup] --> B[Sees missing tooling]
B --> C{Decision Point}
C -->|Build| D[3 months building Feature Store]
C -->|Buy| E[2 days integrating Feast]
D --> F[Still no customers]
E --> G[Shipping ML features]
F --> H[Runway: 2 months]
G --> I[Revenue growing]
Common Overbuilding Patterns
| Pattern | What They Built | What They Should Have Bought |
|---|---|---|
| Custom Orchestrator | Airflow clone in Python | Managed Airflow (MWAA/Composer) |
| Feature Store v1 | Redis + custom SDK | Feast or Tecton |
| Model Registry | S3 + DynamoDB + scripts | MLflow or Weights & Biases |
| GPU Scheduler | Custom K8s controller | Karpenter or GKE Autopilot |
| Monitoring Stack | Prometheus + custom dashboards | Datadog or managed Cloud Monitoring |
The Time-to-Value Calculation
from dataclasses import dataclass
from typing import Optional
@dataclass
class TimeToValue:
"""Calculate the true cost of build vs buy decisions."""
build_time_weeks: int
buy_setup_days: int
engineer_weekly_rate: float
opportunity_cost_per_week: float
def build_cost(self) -> float:
"""Total cost of building in-house."""
engineering = self.build_time_weeks * self.engineer_weekly_rate
opportunity = self.build_time_weeks * self.opportunity_cost_per_week
return engineering + opportunity
def buy_cost(self, monthly_license: float, months: int = 12) -> float:
"""Total cost of buying for first year."""
setup_cost = (self.buy_setup_days / 5) * self.engineer_weekly_rate
license_cost = monthly_license * months
return setup_cost + license_cost
def breakeven_analysis(self, monthly_license: float) -> dict:
"""When does building become cheaper than buying?"""
build = self.build_cost()
yearly_license = monthly_license * 12
        if monthly_license == 0:
            # A free tool has no recurring cost to recoup, so adopting it always wins.
            return {"breakeven_months": float("inf"), "recommendation": "BUY"}
        breakeven_months = build / monthly_license
recommendation = "BUY" if breakeven_months > 24 else "BUILD"
return {
"build_cost": build,
"yearly_license": yearly_license,
"breakeven_months": round(breakeven_months, 1),
"recommendation": recommendation
}
# Example: Feature Store decision
feature_store_calc = TimeToValue(
build_time_weeks=12,
buy_setup_days=5,
engineer_weekly_rate=5000,
opportunity_cost_per_week=10000
)
result = feature_store_calc.breakeven_analysis(monthly_license=2000)
# {'build_cost': 180000, 'yearly_license': 24000, 'breakeven_months': 90.0, 'recommendation': 'BUY'}
Core vs Context Framework
Geoffrey Moore’s framework helps distinguish what to build:
| Type | Definition | Action | Examples |
|---|---|---|---|
| Core | Differentiating activities that drive competitive advantage | BUILD | Recommendation algorithm, Pricing model |
| Context | Necessary but generic, doesn’t differentiate | BUY | Payroll, Email, Monitoring |
| Mission-Critical Context | Generic but must be reliable | BUY + SLA | Authentication, Payment processing |
The Core/Context Matrix
quadrantChart
title Core vs Context Analysis
x-axis Low Differentiation --> High Differentiation
y-axis Low Strategic Value --> High Strategic Value
quadrant-1 Build & Invest
quadrant-2 Buy Premium
quadrant-3 Buy Commodity
quadrant-4 Build if Easy
"ML Model Logic": [0.9, 0.9]
"Feature Engineering": [0.7, 0.8]
"Model Serving": [0.5, 0.6]
"Experiment Tracking": [0.3, 0.5]
"Orchestration": [0.2, 0.4]
"Compute": [0.1, 0.3]
"Logging": [0.1, 0.2]
MLOps-Specific Examples
Core (BUILD):
- Your recommendation algorithm’s core logic
- Domain-specific feature engineering pipelines
- Custom evaluation metrics for your use case
- Agent/LLM prompt chains that define your product
Context (BUY):
- GPU compute (AWS/GCP/Azure)
- Workflow orchestration (Airflow/Prefect)
- Experiment tracking (W&B/MLflow)
- Model serving infrastructure (SageMaker/Vertex)
- Feature stores for most companies (Feast/Tecton)
- Vector databases (Pinecone/Weaviate)
Industry-Specific Core Activities
| Industry | Core ML Activities | Everything Else |
|---|---|---|
| E-commerce | Personalization, Search ranking | Infrastructure, Monitoring |
| Fintech | Risk scoring, Fraud patterns | Compute, Experiment tracking |
| Healthcare | Diagnostic models, Treatment prediction | Data storage, Model serving |
| Autonomous | Perception stack, Decision making | GPU clusters, Logging |
Decision Matrix
Component-Level Analysis
| Component | Evolution Stage | Decision | Reason | Typical Cost |
|---|---|---|---|---|
| GPU Compute | Commodity | BUY | Don’t build datacenters | $$/hour |
| Container Orchestration | Commodity | BUY | K8s managed services mature | $100-500/mo |
| Workflow Orchestration | Product | BUY | Airflow/Prefect are battle-tested | $200-2000/mo |
| Experiment Tracking | Product | BUY | W&B/MLflow work well | $0-500/mo |
| Feature Store | Product | BUY* | Unless at massive scale | $500-5000/mo |
| Model Serving | Custom* | DEPENDS | May need custom for latency | Variable |
| Inference Optimization | Custom | BUILD | Your models, your constraints | Engineering time |
| Agent Logic | Genesis | BUILD | This IS your differentiation | Engineering time |
| Domain Features | Genesis | BUILD | Your competitive moat | Engineering time |
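As a rule of thumb, the Evolution Stage column alone predicts most of the Decision column. A hedged heuristic sketch (the starred and DEPENDS rows in the table still need case-by-case analysis):
# Heuristic defaults only; components marked * above need deeper analysis.
STAGE_DEFAULTS = {
    "Genesis": "BUILD",    # no mature product exists; likely your differentiation
    "Custom": "DEPENDS",   # weigh latency, scale, and team capacity
    "Product": "BUY",      # battle-tested options exist
    "Commodity": "BUY",    # never rebuild what the market sells by the hour
}

def default_decision(stage: str) -> str:
    return STAGE_DEFAULTS.get(stage, "DEPENDS")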
The Wardley Map Approach
graph TB
subgraph "Genesis (Build)"
A[Agent Logic]
B[Custom Eval Framework]
C[Domain Features]
end
subgraph "Custom (Build or Buy)"
D[Model Fine-tuning]
E[Inference Serving]
F[Feature Pipelines]
end
subgraph "Product (Buy)"
G[Experiment Tracking]
H[Orchestration]
I[Vector DB]
end
subgraph "Commodity (Buy)"
J[GPU Compute]
K[Object Storage]
L[Managed K8s]
end
A --> D
B --> G
C --> F
D --> J
E --> L
F --> K
G --> K
H --> L
I --> K
TCO Calculator
Total Cost of Ownership goes beyond license fees:
from dataclasses import dataclass, field
from typing import List, Optional
from enum import Enum
class CostCategory(Enum):
SETUP = "setup"
LICENSE = "license"
INFRASTRUCTURE = "infrastructure"
MAINTENANCE = "maintenance"
TRAINING = "training"
OPPORTUNITY = "opportunity"
@dataclass
class CostItem:
category: CostCategory
name: str
monthly_cost: float = 0
one_time_cost: float = 0
hours_per_month: float = 0
@dataclass
class Solution:
name: str
costs: List[CostItem] = field(default_factory=list)
def add_cost(self, cost: CostItem) -> None:
self.costs.append(cost)
def calculate_tco(
solution: Solution,
hourly_rate: float = 100,
months: int = 12
) -> dict:
"""Calculate Total Cost of Ownership with breakdown."""
one_time = sum(c.one_time_cost for c in solution.costs)
monthly_fixed = sum(c.monthly_cost for c in solution.costs)
monthly_labor = sum(
c.hours_per_month * hourly_rate
for c in solution.costs
)
total_monthly = monthly_fixed + monthly_labor
total = one_time + (total_monthly * months)
breakdown = {}
for category in CostCategory:
category_costs = [c for c in solution.costs if c.category == category]
category_total = sum(
c.one_time_cost + (c.monthly_cost + c.hours_per_month * hourly_rate) * months
for c in category_costs
)
if category_total > 0:
breakdown[category.value] = category_total
return {
"solution": solution.name,
"one_time": one_time,
"monthly": total_monthly,
"total_12_months": total,
"breakdown": breakdown
}
# Build scenario
build = Solution("In-House Feature Store")
build.add_cost(CostItem(
    CostCategory.SETUP, "Initial development",
    one_time_cost=16000  # 4 weeks full-time: 160 hours x $100/hr, billed once
))
build.add_cost(CostItem(
CostCategory.INFRASTRUCTURE, "Redis cluster",
monthly_cost=200
))
build.add_cost(CostItem(
CostCategory.INFRASTRUCTURE, "S3 storage",
monthly_cost=50
))
build.add_cost(CostItem(
CostCategory.MAINTENANCE, "Ongoing maintenance",
hours_per_month=20
))
# Buy scenario
buy = Solution("Tecton Feature Store")
buy.add_cost(CostItem(
CostCategory.SETUP, "Integration & training",
one_time_cost=5000 # 50 hours setup
))
buy.add_cost(CostItem(
CostCategory.LICENSE, "Platform fee",
monthly_cost=2000
))
buy.add_cost(CostItem(
CostCategory.MAINTENANCE, "Administration",
hours_per_month=5
))
print("BUILD:", calculate_tco(build))
print("BUY:", calculate_tco(buy))
# BUILD: {'solution': 'In-House Feature Store', 'one_time': 16000, 'monthly': 2250,
#         'total_12_months': 43000, 'breakdown': {...}}
# BUY: {'solution': 'Tecton Feature Store', 'one_time': 5000, 'monthly': 2500,
# 'total_12_months': 35000, 'breakdown': {...}}
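A thin wrapper turns the two TCO dictionaries into a recommendation. A sketch; the 10% "too close to call" margin is an arbitrary assumption:
def compare_tco(build_tco: dict, buy_tco: dict, margin: float = 0.10) -> str:
    """Recommend the cheaper option unless totals are within `margin` of each other."""
    b = build_tco["total_12_months"]
    y = buy_tco["total_12_months"]
    if abs(b - y) <= margin * max(b, y):
        return "TOO CLOSE: decide on strategic grounds, not cost"
    return "BUILD" if b < y else "BUY"

print(compare_tco(calculate_tco(build), calculate_tco(buy)))
# BUY (43000 vs 35000 over 12 months)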
Hidden Costs Checklist
Many organizations underestimate the true cost of building:
| Hidden Cost | Description | Typical Multiplier |
|---|---|---|
| Maintenance | Bug fixes, upgrades, security patches | 2-3x initial build |
| Documentation | Internal docs, onboarding materials | 10-20% of build |
| On-call | 24/7 support for production systems | $5-15K/month |
| Opportunity Cost | What else could engineers build? | 2-5x direct cost |
| Knowledge Drain | When builders leave | 50-100% rebuild |
| Security | Audits, penetration testing, compliance | $10-50K/year |
| Integration | Connecting with other systems | 20-40% of build |
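These multipliers compound. A back-of-envelope sketch using the midpoints of the ranges above (the midpoint choice is an assumption; on-call, security, and opportunity cost are left out because they don't scale linearly with build size):
def true_build_cost(initial_build: float) -> float:
    """Inflate a naive build estimate with hidden-cost midpoints from the table."""
    maintenance = 2.5 * initial_build     # 2-3x initial build
    documentation = 0.15 * initial_build  # 10-20% of build
    integration = 0.30 * initial_build    # 20-40% of build
    return initial_build + maintenance + documentation + integration

print(true_build_cost(60_000))  # 237000.0: $60K of build becomes ~$237K of TCO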
The 3-Year View
Short-term thinking leads to bad decisions:
def project_costs(
build_initial: float,
build_monthly: float,
buy_setup: float,
buy_monthly: float,
years: int = 3
) -> dict:
"""Project costs over multiple years."""
results = {"year": [], "build_cumulative": [], "buy_cumulative": []}
build_total = build_initial
buy_total = buy_setup
for year in range(1, years + 1):
build_total += build_monthly * 12
buy_total += buy_monthly * 12
results["year"].append(year)
results["build_cumulative"].append(build_total)
results["buy_cumulative"].append(buy_total)
crossover = None
    for i, (build_c, buy_c) in enumerate(zip(results["build_cumulative"], results["buy_cumulative"])):
        if build_c < buy_c:
            crossover = i + 1
break
return {
"projection": results,
"crossover_year": crossover,
"recommendation": "BUILD" if crossover and crossover <= 2 else "BUY"
}
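Running the projection with the feature-store figures from the earlier examples (illustrative inputs, combining the TimeToValue and TCO numbers above):
result = project_costs(
    build_initial=180_000,  # build_cost() from the TimeToValue example
    build_monthly=2_250,    # infra plus maintenance labor, per the TCO example
    buy_setup=5_000,
    buy_monthly=2_500,
    years=3,
)
print(result["crossover_year"], result["recommendation"])
# None BUY (building never becomes cheaper within the 3-year window)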
Escape Hatch Architecture
The worst outcome: vendor lock-in with no exit path. Build abstraction layers:
The Interface Pattern
from abc import ABC, abstractmethod
from typing import Dict, Any, Optional, List
from dataclasses import dataclass
@dataclass
class ExperimentRun:
run_id: str
metrics: Dict[str, float]
params: Dict[str, Any]
artifacts: List[str]
class ExperimentLogger(ABC):
"""Abstract interface for experiment tracking."""
@abstractmethod
def start_run(self, name: str, tags: Optional[Dict] = None) -> str:
"""Start a new experiment run."""
pass
@abstractmethod
def log_param(self, key: str, value: Any) -> None:
"""Log a hyperparameter."""
pass
@abstractmethod
def log_metric(self, name: str, value: float, step: Optional[int] = None) -> None:
"""Log a metric value."""
pass
@abstractmethod
def log_artifact(self, local_path: str, artifact_path: Optional[str] = None) -> None:
"""Log a file as an artifact."""
pass
@abstractmethod
def end_run(self, status: str = "FINISHED") -> ExperimentRun:
"""End the current run."""
pass
class WandBLogger(ExperimentLogger):
"""Weights & Biases implementation."""
def __init__(self, project: str, entity: Optional[str] = None):
import wandb
self.wandb = wandb
self.project = project
self.entity = entity
self._run = None
def start_run(self, name: str, tags: Optional[Dict] = None) -> str:
self._run = self.wandb.init(
project=self.project,
entity=self.entity,
name=name,
tags=list(tags.keys()) if tags else None
)
return self._run.id
def log_param(self, key: str, value: Any) -> None:
self.wandb.config[key] = value
def log_metric(self, name: str, value: float, step: Optional[int] = None) -> None:
self.wandb.log({name: value}, step=step)
def log_artifact(self, local_path: str, artifact_path: Optional[str] = None) -> None:
self.wandb.save(local_path)
    def end_run(self, status: str = "FINISHED") -> ExperimentRun:
        run_id = self._run.id
        # Capture summary and config before finish(); the run handle is
        # not reliable for reads after the run has finished.
        metrics = dict(self._run.summary)
        params = dict(self.wandb.config)
        self._run.finish()
        return ExperimentRun(
            run_id=run_id,
            metrics=metrics,
            params=params,
            artifacts=[]
        )
class MLflowLogger(ExperimentLogger):
"""MLflow implementation."""
def __init__(self, tracking_uri: str, experiment_name: str):
import mlflow
self.mlflow = mlflow
self.mlflow.set_tracking_uri(tracking_uri)
self.mlflow.set_experiment(experiment_name)
self._run_id = None
def start_run(self, name: str, tags: Optional[Dict] = None) -> str:
run = self.mlflow.start_run(run_name=name, tags=tags)
self._run_id = run.info.run_id
return self._run_id
def log_param(self, key: str, value: Any) -> None:
self.mlflow.log_param(key, value)
def log_metric(self, name: str, value: float, step: Optional[int] = None) -> None:
self.mlflow.log_metric(name, value, step=step)
def log_artifact(self, local_path: str, artifact_path: Optional[str] = None) -> None:
self.mlflow.log_artifact(local_path, artifact_path)
    def end_run(self, status: str = "FINISHED") -> ExperimentRun:
        self.mlflow.end_run(status=status)
        # Re-fetch the finished run so metrics and params come back populated.
        run = self.mlflow.get_run(self._run_id)
        return ExperimentRun(
            run_id=self._run_id,
            metrics=dict(run.data.metrics),
            params=dict(run.data.params),
            artifacts=[]
        )
# Factory pattern for easy switching
def get_logger(backend: str = "mlflow", **kwargs) -> ExperimentLogger:
"""Factory to get appropriate logger implementation."""
backends = {
"wandb": WandBLogger,
"mlflow": MLflowLogger,
}
if backend not in backends:
raise ValueError(f"Unknown backend: {backend}. Options: {list(backends.keys())}")
return backends[backend](**kwargs)
# Training code uses interface, not vendor-specific API
def train_model(model, train_data, val_data, logger: ExperimentLogger):
"""Training loop that works with any logging backend."""
run_id = logger.start_run(name="training-run")
logger.log_param("model_type", type(model).__name__)
logger.log_param("train_size", len(train_data))
for epoch in range(100):
loss = model.train_epoch(train_data)
val_loss = model.validate(val_data)
logger.log_metric("train_loss", loss, step=epoch)
logger.log_metric("val_loss", val_loss, step=epoch)
if epoch % 10 == 0:
model.save("checkpoint.pt")
logger.log_artifact("checkpoint.pt")
return logger.end_run()
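Switching backends is then a configuration change, not a refactor. An illustrative wiring, assuming `model`, `train_data`, and `val_data` come from your training setup and the URI is a placeholder:
# Today: self-hosted MLflow.
logger = get_logger(
    "mlflow",
    tracking_uri="http://mlflow.internal:5000",  # placeholder URI
    experiment_name="churn-model",
)
# If the vendor relationship changes, only this line moves:
# logger = get_logger("wandb", project="churn-model")
run = train_model(model, train_data, val_data, logger)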
Multi-Cloud Escape Hatch
from abc import ABC, abstractmethod
from typing import BinaryIO
class ObjectStorage(ABC):
"""Abstract interface for object storage."""
@abstractmethod
def put(self, key: str, data: BinaryIO) -> str:
pass
@abstractmethod
def get(self, key: str) -> BinaryIO:
pass
@abstractmethod
def delete(self, key: str) -> None:
pass
@abstractmethod
def list(self, prefix: str) -> list:
pass
class S3Storage(ObjectStorage):
def __init__(self, bucket: str, region: str = "us-east-1"):
import boto3
self.s3 = boto3.client("s3", region_name=region)
self.bucket = bucket
def put(self, key: str, data: BinaryIO) -> str:
self.s3.upload_fileobj(data, self.bucket, key)
return f"s3://{self.bucket}/{key}"
def get(self, key: str) -> BinaryIO:
import io
buffer = io.BytesIO()
self.s3.download_fileobj(self.bucket, key, buffer)
buffer.seek(0)
return buffer
def delete(self, key: str) -> None:
self.s3.delete_object(Bucket=self.bucket, Key=key)
def list(self, prefix: str) -> list:
response = self.s3.list_objects_v2(Bucket=self.bucket, Prefix=prefix)
return [obj["Key"] for obj in response.get("Contents", [])]
class GCSStorage(ObjectStorage):
def __init__(self, bucket: str, project: str):
from google.cloud import storage
self.client = storage.Client(project=project)
self.bucket = self.client.bucket(bucket)
def put(self, key: str, data: BinaryIO) -> str:
blob = self.bucket.blob(key)
blob.upload_from_file(data)
return f"gs://{self.bucket.name}/{key}"
def get(self, key: str) -> BinaryIO:
import io
blob = self.bucket.blob(key)
buffer = io.BytesIO()
blob.download_to_file(buffer)
buffer.seek(0)
return buffer
def delete(self, key: str) -> None:
self.bucket.blob(key).delete()
def list(self, prefix: str) -> list:
return [blob.name for blob in self.bucket.list_blobs(prefix=prefix)]
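The same factory pattern used for experiment loggers applies here, so call sites never import boto3 or google-cloud-storage directly. A minimal sketch:
def get_storage(backend: str = "s3", **kwargs) -> ObjectStorage:
    """Factory mirroring get_logger: swap clouds without touching call sites."""
    backends = {"s3": S3Storage, "gcs": GCSStorage}
    if backend not in backends:
        raise ValueError(f"Unknown backend: {backend}. Options: {list(backends.keys())}")
    return backends[backend](**kwargs)

# storage = get_storage("s3", bucket="ml-artifacts")
# storage = get_storage("gcs", bucket="ml-artifacts", project="my-gcp-project")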
Wardley Map for MLOps 2024
Current evolution of MLOps components:
| Component | Evolution Stage | Strategy | Recommended Vendors |
|---|---|---|---|
| GPU Compute | Commodity | Buy cloud | AWS/GCP/Azure |
| LLM Base Models | Commodity | Buy/Download | OpenAI, Anthropic, HuggingFace |
| Vector Database | Product | Buy | Pinecone, Weaviate, Qdrant |
| Experiment Tracking | Product | Buy OSS | MLflow, W&B |
| Orchestration | Product | Buy OSS | Airflow, Prefect, Dagster |
| Feature Store | Product | Buy | Feast, Tecton |
| Model Serving | Custom → Product | Buy + Customize | KServe, Seldon, Ray Serve |
| Agent Logic | Genesis | Build | Your IP |
| Eval Framework | Genesis | Build/Adapt | Custom + LangSmith |
| Domain Prompts | Genesis | Build | Your IP |
Evolution Over Time
timeline
title MLOps Component Evolution
2019 : Experiment Tracking (Genesis)
: Feature Stores (Genesis)
2021 : Experiment Tracking (Custom)
: Feature Stores (Custom)
: Vector DBs (Genesis)
2023 : Experiment Tracking (Product)
: Feature Stores (Product)
: Vector DBs (Custom)
: LLM APIs (Custom)
2025 : Experiment Tracking (Commodity)
: Feature Stores (Product)
: Vector DBs (Product)
: LLM APIs (Commodity)
: Agent Frameworks (Genesis)
Vendor Evaluation Framework
Due Diligence Checklist
Before buying, verify:
from dataclasses import dataclass
from typing import List, Optional
from enum import Enum
class RiskLevel(Enum):
LOW = "low"
MEDIUM = "medium"
HIGH = "high"
CRITICAL = "critical"
@dataclass
class VendorEvaluation:
vendor_name: str
# Financial stability
funding_status: str # "Series A", "Profitable", "Public"
runway_months: Optional[int]
revenue_growth: Optional[float]
# Technical evaluation
uptime_sla: float # 99.9%, 99.99%
data_export_api: bool
self_hosted_option: bool
open_source_core: bool
# Strategic risk
acquisition_risk: RiskLevel
pricing_lock_risk: RiskLevel
def calculate_risk_score(self) -> dict:
"""Calculate overall vendor risk."""
scores = {
"financial": 0,
"technical": 0,
"strategic": 0
}
# Financial scoring
        if self.funding_status in ("Public", "Profitable"):
scores["financial"] = 10
elif self.runway_months and self.runway_months > 24:
scores["financial"] = 7
elif self.runway_months and self.runway_months > 12:
scores["financial"] = 4
else:
scores["financial"] = 2
# Technical scoring
tech_score = 0
if self.data_export_api:
tech_score += 4
if self.self_hosted_option:
tech_score += 3
if self.open_source_core:
tech_score += 3
scores["technical"] = tech_score
# Strategic scoring
risk_values = {RiskLevel.LOW: 10, RiskLevel.MEDIUM: 6, RiskLevel.HIGH: 3, RiskLevel.CRITICAL: 1}
scores["strategic"] = (
risk_values[self.acquisition_risk] +
risk_values[self.pricing_lock_risk]
) / 2
overall = sum(scores.values()) / 3
return {
"scores": scores,
"overall": round(overall, 1),
"recommendation": "SAFE" if overall >= 7 else "CAUTION" if overall >= 4 else "AVOID"
}
# Example evaluation
wandb_eval = VendorEvaluation(
vendor_name="Weights & Biases",
funding_status="Series C",
runway_months=36,
revenue_growth=0.8,
uptime_sla=99.9,
data_export_api=True,
self_hosted_option=True,
open_source_core=False,
acquisition_risk=RiskLevel.MEDIUM,
pricing_lock_risk=RiskLevel.LOW
)
print(wandb_eval.calculate_risk_score())
# {'scores': {'financial': 7, 'technical': 7, 'strategic': 8.0},
# 'overall': 7.3, 'recommendation': 'SAFE'}
Data Portability Requirements
Always verify before signing:
| Requirement | Question to Ask | Red Flag |
|---|---|---|
| Data Export | “Can I export all my data via API?” | “Export available on request” |
| Format | “What format is the export?” | Proprietary format only |
| Frequency | “Can I schedule automated exports?” | Manual only |
| Completeness | “Does export include all metadata?” | Partial exports |
| Cost | “Is there an export fee?” | Per-GB charges |
| Self-hosting | “Can I run this on my infra?” | SaaS only |
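These checks are simple enough to encode as a pre-signature gate. A sketch; the answer keys are illustrative assumptions, not a standard schema:
def portability_red_flags(answers: dict) -> list:
    """Return the red flags from the table above that a vendor trips."""
    checks = {
        "api_export": "Export available on request only",
        "open_format": "Proprietary export format",
        "scheduled_export": "Manual exports only",
        "full_metadata": "Partial exports",
        "free_export": "Per-GB export charges",
        "self_hostable": "SaaS only",
    }
    return [flag for key, flag in checks.items() if not answers.get(key, False)]

print(portability_red_flags({"api_export": True, "open_format": True}))
# ['Manual exports only', 'Partial exports', 'Per-GB export charges', 'SaaS only']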
Cloud Credit Strategy
Startups can get significant free credits:
| Program | Credits | Requirements |
|---|---|---|
| AWS Activate | $10K-$100K | Affiliated with accelerator |
| Google for Startups | $100K-$200K | Series A or earlier |
| Azure for Startups | $25K-$150K | Association membership |
| NVIDIA Inception | GPU credits + DGX access | ML-focused startup |
Stacking Credits Strategy
graph LR
A[Seed Stage] --> B[AWS: $10K]
A --> C[GCP: $100K]
A --> D[Azure: $25K]
B --> E[Series A]
C --> E
D --> E
E --> F[AWS: $100K]
E --> G[GCP: $200K]
E --> H[NVIDIA: GPU Access]
F --> I[$435K Total Credits]
G --> I
H --> I
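The $435K total in the diagram is simply the stacked programs summed:
credits = {
    "AWS (seed)": 10_000, "GCP (seed)": 100_000, "Azure (seed)": 25_000,
    "AWS (Series A)": 100_000, "GCP (Series A)": 200_000,
}
print(f"${sum(credits.values()):,}")  # $435,000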
Troubleshooting Common Decisions
| Problem | Cause | Solution |
|---|---|---|
| Vendor acquired/shutdown | Startup risk | Own your data, use interfaces |
| Unexpected bill spike | Auto-scaling without limits | Set budgets, alerts, quotas |
| Shadow IT emerging | Official tooling too slow | Improve DX, reduce friction |
| Vendor price increase | Contract renewal | Multi-year lock, exit clause |
| Integration nightmare | Closed ecosystem | Prefer open standards |
| Performance issues | Shared infra limits | Negotiate dedicated resources |
Acquisition Contingency Plan
# acquisition_contingency.yaml
vendor_dependencies:
- name: "Experiment Tracker (W&B)"
criticality: high
alternative_vendors:
- mlflow-self-hosted
- neptune-ai
migration_time_estimate: "2-4 weeks"
data_export_method: "wandb sync --export"
- name: "Vector Database (Pinecone)"
criticality: high
alternative_vendors:
- weaviate
- qdrant
- pgvector
migration_time_estimate: "1-2 weeks"
data_export_method: "pinecone export --format parquet"
migration_procedures:
quarterly_export_test:
- Export all data from each vendor
- Verify import into alternative
- Document any schema changes
- Update migration runbooks
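A plan file is only useful if something reads it. A minimal sketch that surfaces high-criticality dependencies and their first fallback (assumes PyYAML and the schema above):
import yaml

with open("acquisition_contingency.yaml") as f:
    plan = yaml.safe_load(f)

for dep in plan["vendor_dependencies"]:
    if dep["criticality"] == "high":
        print(f"{dep['name']}: fall back to {dep['alternative_vendors'][0]} "
              f"in {dep['migration_time_estimate']}")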
Decision Flowchart
flowchart TD
A[New Capability Needed] --> B{Is this your<br>core differentiator?}
B -->|Yes| C[BUILD IT]
B -->|No| D{Does a good<br>product exist?}
D -->|No| E{Can you wait<br>6 months?}
E -->|Yes| F[Wait & Monitor]
E -->|No| G[Build Minimum]
D -->|Yes| H{Open source<br>or SaaS?}
H -->|OSS Available| I{Do you have ops<br>capacity?}
I -->|Yes| J[Deploy OSS]
I -->|No| K[Buy Managed]
H -->|SaaS Only| L{Vendor risk<br>acceptable?}
L -->|Yes| M[Buy SaaS]
L -->|No| N[Build with<br>abstraction layer]
C --> O[Document & Abstract]
G --> O
J --> O
K --> O
M --> O
N --> O
O --> P[Review Annually]
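The same flowchart, transcribed into a function you can drop into an architecture-review template:
def buy_vs_build(
    is_core_differentiator: bool,
    good_product_exists: bool,
    can_wait_6_months: bool = False,
    oss_available: bool = False,
    have_ops_capacity: bool = False,
    vendor_risk_acceptable: bool = True,
) -> str:
    """Direct transcription of the decision flowchart above."""
    if is_core_differentiator:
        return "BUILD IT"
    if not good_product_exists:
        return "Wait & Monitor" if can_wait_6_months else "Build Minimum"
    if oss_available:
        return "Deploy OSS" if have_ops_capacity else "Buy Managed"
    return "Buy SaaS" if vendor_risk_acceptable else "Build with abstraction layer"

Whatever branch you land in, the final steps still apply: document and abstract the choice, then review it annually.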
Summary Checklist
| Step | Action | Owner | Frequency |
|---|---|---|---|
| 1 | Inventory all tools (Built vs Bought) | Platform Team | Quarterly |
| 2 | Audit “Built” tools for TCO | Engineering Lead | Bi-annually |
| 3 | Get startup credits from all clouds | Finance/Founders | At funding rounds |
| 4 | Verify data export capability | Platform Team | Before signing |
| 5 | Wrap vendor SDKs in interfaces | Engineering | At integration |
| 6 | Test vendor migration path | Platform Team | Annually |
| 7 | Review vendor financial health | Finance | Quarterly |
| 8 | Update contingency plans | Platform Team | Bi-annually |
Quick Decision Matrix
| If… | Then… | Because… |
|---|---|---|
| < 3 engineers | Buy everything | Focus on product |
| Revenue < $1M ARR | Buy managed | Can’t afford ops |
| Core ML capability | Build it | Your IP moat |
| Generic infrastructure | Buy it | Not differentiating |
| Vendor is tiny startup | Build abstraction | Acquisition risk |
| Open source exists | Deploy if ops capacity | Lower cost long-term |
[End of Section 43.1]