43.1. The Buy vs Build Decision Matrix

Status: Production-Ready | Version: 2.0.0 | Tags: #Strategy, #Startups, #MLOps


The “Not Invented Here” Syndrome

Startups are founded by engineers. Engineers love to code. Therefore, startups tend to overbuild.

The result: “resume-driven development.” You end up with a great custom platform, zero customers, and two months of runway left.

The Overbuilding Trap

graph TD
    A[Engineer joins startup] --> B[Sees missing tooling]
    B --> C{Decision Point}
    C -->|Build| D[3 months building Feature Store]
    C -->|Buy| E[2 days integrating Feast]
    D --> F[Still no customers]
    E --> G[Shipping ML features]
    F --> H[Runway: 2 months]
    G --> I[Revenue growing]

Common Overbuilding Patterns

| Pattern | What They Built | What They Should Have Bought |
|---|---|---|
| Custom Orchestrator | Airflow clone in Python | Managed Airflow (MWAA/Composer) |
| Feature Store v1 | Redis + custom SDK | Feast or Tecton |
| Model Registry | S3 + DynamoDB + scripts | MLflow or Weights & Biases |
| GPU Scheduler | Custom K8s controller | Karpenter or GKE Autopilot |
| Monitoring Stack | Prometheus + custom dashboards | Datadog or managed Cloud Monitoring |

The Time-to-Value Calculation

from dataclasses import dataclass

@dataclass
class TimeToValue:
    """Calculate the true cost of build vs buy decisions."""
    
    build_time_weeks: int
    buy_setup_days: int
    engineer_weekly_rate: float
    opportunity_cost_per_week: float
    
    def build_cost(self) -> float:
        """Total cost of building in-house."""
        engineering = self.build_time_weeks * self.engineer_weekly_rate
        opportunity = self.build_time_weeks * self.opportunity_cost_per_week
        return engineering + opportunity
    
    def buy_cost(self, monthly_license: float, months: int = 12) -> float:
        """Total cost of buying for first year."""
        setup_cost = (self.buy_setup_days / 5) * self.engineer_weekly_rate
        license_cost = monthly_license * months
        return setup_cost + license_cost
    
    def breakeven_analysis(self, monthly_license: float) -> dict:
        """When does building become cheaper than buying?"""
        build = self.build_cost()
        yearly_license = monthly_license * 12
        
        if yearly_license == 0:
            # A $0 license (e.g. self-hosted OSS) means building never pays back
            return {"breakeven_months": float("inf"), "recommendation": "BUY"}
        
        breakeven_months = build / (yearly_license / 12)
        
        recommendation = "BUY" if breakeven_months > 24 else "BUILD"
        
        return {
            "build_cost": build,
            "yearly_license": yearly_license,
            "breakeven_months": round(breakeven_months, 1),
            "recommendation": recommendation
        }

# Example: Feature Store decision
feature_store_calc = TimeToValue(
    build_time_weeks=12,
    buy_setup_days=5,
    engineer_weekly_rate=5000,
    opportunity_cost_per_week=10000
)

result = feature_store_calc.breakeven_analysis(monthly_license=2000)
# {'build_cost': 180000, 'yearly_license': 24000, 'breakeven_months': 90.0, 'recommendation': 'BUY'}

Core vs Context Framework

Geoffrey Moore’s framework helps distinguish what to build:

| Type | Definition | Action | Examples |
|---|---|---|---|
| Core | Differentiating activities that drive competitive advantage | BUILD | Recommendation algorithm, Pricing model |
| Context | Necessary but generic, doesn’t differentiate | BUY | Payroll, Email, Monitoring |
| Mission-Critical Context | Generic but must be reliable | BUY + SLA | Authentication, Payment processing |

The Core/Context Matrix

quadrantChart
    title Core vs Context Analysis
    x-axis Low Differentiation --> High Differentiation
    y-axis Low Strategic Value --> High Strategic Value
    quadrant-1 Build & Invest
    quadrant-2 Buy Premium
    quadrant-3 Buy Commodity
    quadrant-4 Build if Easy
    
    "ML Model Logic": [0.9, 0.9]
    "Feature Engineering": [0.7, 0.8]
    "Model Serving": [0.5, 0.6]
    "Experiment Tracking": [0.3, 0.5]
    "Orchestration": [0.2, 0.4]
    "Compute": [0.1, 0.3]
    "Logging": [0.1, 0.2]

MLOps-Specific Examples

Core (BUILD):

  • Your recommendation algorithm’s core logic
  • Domain-specific feature engineering pipelines
  • Custom evaluation metrics for your use case
  • Agent/LLM prompt chains that define your product

Context (BUY):

  • GPU compute (AWS/GCP/Azure)
  • Workflow orchestration (Airflow/Prefect)
  • Experiment tracking (W&B/MLflow)
  • Model serving infrastructure (SageMaker/Vertex)
  • Feature stores for most companies (Feast/Tecton)
  • Vector databases (Pinecone/Weaviate)

Industry-Specific Core Activities

| Industry | Core ML Activities | Everything Else |
|---|---|---|
| E-commerce | Personalization, Search ranking | Infrastructure, Monitoring |
| Fintech | Risk scoring, Fraud patterns | Compute, Experiment tracking |
| Healthcare | Diagnostic models, Treatment prediction | Data storage, Model serving |
| Autonomous | Perception stack, Decision making | GPU clusters, Logging |

Decision Matrix

Component-Level Analysis

| Component | Evolution Stage | Decision | Reason | Typical Cost |
|---|---|---|---|---|
| GPU Compute | Commodity | BUY | Don’t build datacenters | $$/hour |
| Container Orchestration | Commodity | BUY | Managed K8s services are mature | $100-500/mo |
| Workflow Orchestration | Product | BUY | Airflow/Prefect are battle-tested | $200-2000/mo |
| Experiment Tracking | Product | BUY | W&B/MLflow work well | $0-500/mo |
| Feature Store | Product | BUY* | Unless at massive scale | $500-5000/mo |
| Model Serving | Custom* | DEPENDS | May need custom for latency | Variable |
| Inference Optimization | Custom | BUILD | Your models, your constraints | Engineering time |
| Agent Logic | Genesis | BUILD | This IS your differentiation | Engineering time |
| Domain Features | Genesis | BUILD | Your competitive moat | Engineering time |

The Wardley Map Approach

graph TB
    subgraph "Genesis (Build)"
        A[Agent Logic]
        B[Custom Eval Framework]
        C[Domain Features]
    end
    
    subgraph "Custom (Build or Buy)"
        D[Model Fine-tuning]
        E[Inference Serving]
        F[Feature Pipelines]
    end
    
    subgraph "Product (Buy)"
        G[Experiment Tracking]
        H[Orchestration]
        I[Vector DB]
    end
    
    subgraph "Commodity (Buy)"
        J[GPU Compute]
        K[Object Storage]
        L[Managed K8s]
    end
    
    A --> D
    B --> G
    C --> F
    D --> J
    E --> L
    F --> K
    G --> K
    H --> L
    I --> K

TCO Calculator

Total Cost of Ownership goes beyond license fees:

from dataclasses import dataclass, field
from typing import List
from enum import Enum

class CostCategory(Enum):
    SETUP = "setup"
    LICENSE = "license"
    INFRASTRUCTURE = "infrastructure"
    MAINTENANCE = "maintenance"
    TRAINING = "training"
    OPPORTUNITY = "opportunity"

@dataclass
class CostItem:
    category: CostCategory
    name: str
    monthly_cost: float = 0
    one_time_cost: float = 0
    hours_per_month: float = 0

@dataclass
class Solution:
    name: str
    costs: List[CostItem] = field(default_factory=list)
    
    def add_cost(self, cost: CostItem) -> None:
        self.costs.append(cost)

def calculate_tco(
    solution: Solution,
    hourly_rate: float = 100,
    months: int = 12
) -> dict:
    """Calculate Total Cost of Ownership with breakdown."""
    
    one_time = sum(c.one_time_cost for c in solution.costs)
    
    monthly_fixed = sum(c.monthly_cost for c in solution.costs)
    
    monthly_labor = sum(
        c.hours_per_month * hourly_rate 
        for c in solution.costs
    )
    
    total_monthly = monthly_fixed + monthly_labor
    total = one_time + (total_monthly * months)
    
    breakdown = {}
    for category in CostCategory:
        category_costs = [c for c in solution.costs if c.category == category]
        category_total = sum(
            c.one_time_cost + (c.monthly_cost + c.hours_per_month * hourly_rate) * months
            for c in category_costs
        )
        if category_total > 0:
            breakdown[category.value] = category_total
    
    return {
        "solution": solution.name,
        "one_time": one_time,
        "monthly": total_monthly,
        "total_12_months": total,
        "breakdown": breakdown
    }

# Build scenario
build = Solution("In-House Feature Store")
build.add_cost(CostItem(
    CostCategory.SETUP, "Initial development",
    one_time_cost=16000  # 160 hours (4 weeks full-time) at $100/hr, booked once
))
build.add_cost(CostItem(
    CostCategory.INFRASTRUCTURE, "Redis cluster",
    monthly_cost=200
))
build.add_cost(CostItem(
    CostCategory.INFRASTRUCTURE, "S3 storage",
    monthly_cost=50
))
build.add_cost(CostItem(
    CostCategory.MAINTENANCE, "Ongoing maintenance",
    hours_per_month=20
))

# Buy scenario
buy = Solution("Tecton Feature Store")
buy.add_cost(CostItem(
    CostCategory.SETUP, "Integration & training",
    one_time_cost=5000  # 50 hours setup
))
buy.add_cost(CostItem(
    CostCategory.LICENSE, "Platform fee",
    monthly_cost=2000
))
buy.add_cost(CostItem(
    CostCategory.MAINTENANCE, "Administration",
    hours_per_month=5
))

print("BUILD:", calculate_tco(build))
print("BUY:", calculate_tco(buy))

# BUILD: {'solution': 'In-House Feature Store', 'one_time': 16000, 'monthly': 2250, 
#         'total_12_months': 43000, 'breakdown': {...}}
# BUY: {'solution': 'Tecton Feature Store', 'one_time': 5000, 'monthly': 2500, 
#       'total_12_months': 35000, 'breakdown': {...}}

Hidden Costs Checklist

Many organizations underestimate the true cost of building:

| Hidden Cost | Description | Typical Multiplier |
|---|---|---|
| Maintenance | Bug fixes, upgrades, security patches | 2-3x initial build |
| Documentation | Internal docs, onboarding materials | 10-20% of build |
| On-call | 24/7 support for production systems | $5-15K/month |
| Opportunity Cost | What else could engineers build? | 2-5x direct cost |
| Knowledge Drain | When builders leave | 50-100% rebuild |
| Security | Audits, penetration testing, compliance | $10-50K/year |
| Integration | Connecting with other systems | 20-40% of build |
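
Applied to a concrete number, these multipliers compound quickly. A rough sketch using midpoints from the table above (the $100K base figure is illustrative):

def true_build_cost(initial_build: float) -> float:
    """Lifetime cost estimate using midpoints of the multipliers above."""
    maintenance = 2.5 * initial_build       # 2-3x initial build
    documentation = 0.15 * initial_build    # 10-20% of build
    integration = 0.30 * initial_build      # 20-40% of build
    knowledge_drain = 0.75 * initial_build  # 50-100% rebuild when builders leave
    return initial_build + maintenance + documentation + integration + knowledge_drain

print(true_build_cost(100_000))  # 470000.0 -- almost 5x the headline estimate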

The 3-Year View

Short-term thinking leads to bad decisions:

def project_costs(
    build_initial: float,
    build_monthly: float,
    buy_setup: float,
    buy_monthly: float,
    years: int = 3
) -> dict:
    """Project costs over multiple years."""
    
    results = {"year": [], "build_cumulative": [], "buy_cumulative": []}
    
    build_total = build_initial
    buy_total = buy_setup
    
    for year in range(1, years + 1):
        build_total += build_monthly * 12
        buy_total += buy_monthly * 12
        
        results["year"].append(year)
        results["build_cumulative"].append(build_total)
        results["buy_cumulative"].append(buy_total)
    
    crossover = None
    for i, (b, y) in enumerate(zip(results["build_cumulative"], results["buy_cumulative"])):
        if b < y:
            crossover = i + 1
            break
    
    return {
        "projection": results,
        "crossover_year": crossover,
        "recommendation": "BUILD" if crossover and crossover <= 2 else "BUY"
    }
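
With illustrative numbers (cheap to run once built, relatively pricey license), the crossover logic looks like this:

result = project_costs(
    build_initial=30_000,  # upfront engineering
    build_monthly=500,     # light ongoing maintenance
    buy_setup=1_000,
    buy_monthly=2_000
)
# Build cumulative: 36K / 42K / 48K; Buy cumulative: 25K / 49K / 73K
print(result["crossover_year"], result["recommendation"])  # 2 BUILD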

Escape Hatch Architecture

The worst outcome: vendor lock-in with no exit path. Build abstraction layers:

The Interface Pattern

from abc import ABC, abstractmethod
from typing import Dict, Any, Optional, List
from dataclasses import dataclass

@dataclass
class ExperimentRun:
    run_id: str
    metrics: Dict[str, float]
    params: Dict[str, Any]
    artifacts: List[str]

class ExperimentLogger(ABC):
    """Abstract interface for experiment tracking."""
    
    @abstractmethod
    def start_run(self, name: str, tags: Optional[Dict] = None) -> str:
        """Start a new experiment run."""
        pass
    
    @abstractmethod
    def log_param(self, key: str, value: Any) -> None:
        """Log a hyperparameter."""
        pass
    
    @abstractmethod
    def log_metric(self, name: str, value: float, step: Optional[int] = None) -> None:
        """Log a metric value."""
        pass
    
    @abstractmethod
    def log_artifact(self, local_path: str, artifact_path: Optional[str] = None) -> None:
        """Log a file as an artifact."""
        pass
    
    @abstractmethod
    def end_run(self, status: str = "FINISHED") -> ExperimentRun:
        """End the current run."""
        pass


class WandBLogger(ExperimentLogger):
    """Weights & Biases implementation."""
    
    def __init__(self, project: str, entity: Optional[str] = None):
        import wandb
        self.wandb = wandb
        self.project = project
        self.entity = entity
        self._run = None
    
    def start_run(self, name: str, tags: Optional[Dict] = None) -> str:
        self._run = self.wandb.init(
            project=self.project,
            entity=self.entity,
            name=name,
            tags=list(tags.keys()) if tags else None
        )
        return self._run.id
    
    def log_param(self, key: str, value: Any) -> None:
        self.wandb.config[key] = value
    
    def log_metric(self, name: str, value: float, step: Optional[int] = None) -> None:
        self.wandb.log({name: value}, step=step)
    
    def log_artifact(self, local_path: str, artifact_path: Optional[str] = None) -> None:
        self.wandb.save(local_path)
    
    def end_run(self, status: str = "FINISHED") -> ExperimentRun:
        run_id = self._run.id
        # Snapshot metrics and params before finish() closes the run
        metrics = dict(self._run.summary)
        params = dict(self.wandb.config)
        self._run.finish()
        return ExperimentRun(
            run_id=run_id,
            metrics=metrics,
            params=params,
            artifacts=[]
        )


class MLflowLogger(ExperimentLogger):
    """MLflow implementation."""
    
    def __init__(self, tracking_uri: str, experiment_name: str):
        import mlflow
        self.mlflow = mlflow
        self.mlflow.set_tracking_uri(tracking_uri)
        self.mlflow.set_experiment(experiment_name)
        self._run_id = None
    
    def start_run(self, name: str, tags: Optional[Dict] = None) -> str:
        run = self.mlflow.start_run(run_name=name, tags=tags)
        self._run_id = run.info.run_id
        return self._run_id
    
    def log_param(self, key: str, value: Any) -> None:
        self.mlflow.log_param(key, value)
    
    def log_metric(self, name: str, value: float, step: Optional[int] = None) -> None:
        self.mlflow.log_metric(name, value, step=step)
    
    def log_artifact(self, local_path: str, artifact_path: Optional[str] = None) -> None:
        self.mlflow.log_artifact(local_path, artifact_path)
    
    def end_run(self, status: str = "FINISHED") -> ExperimentRun:
        self.mlflow.end_run(status=status)
        # Fetch the final run state from the tracking store
        run = self.mlflow.get_run(self._run_id)
        return ExperimentRun(
            run_id=self._run_id,
            metrics=dict(run.data.metrics),
            params=dict(run.data.params),
            artifacts=[]
        )


# Factory pattern for easy switching
def get_logger(backend: str = "mlflow", **kwargs) -> ExperimentLogger:
    """Factory to get appropriate logger implementation."""
    
    backends = {
        "wandb": WandBLogger,
        "mlflow": MLflowLogger,
    }
    
    if backend not in backends:
        raise ValueError(f"Unknown backend: {backend}. Options: {list(backends.keys())}")
    
    return backends[backend](**kwargs)


# Training code uses interface, not vendor-specific API
def train_model(model, train_data, val_data, logger: ExperimentLogger):
    """Training loop that works with any logging backend."""
    
    run_id = logger.start_run(name="training-run")
    
    logger.log_param("model_type", type(model).__name__)
    logger.log_param("train_size", len(train_data))
    
    for epoch in range(100):
        loss = model.train_epoch(train_data)
        val_loss = model.validate(val_data)
        
        logger.log_metric("train_loss", loss, step=epoch)
        logger.log_metric("val_loss", val_loss, step=epoch)
        
        if epoch % 10 == 0:
            model.save("checkpoint.pt")
            logger.log_artifact("checkpoint.pt")
    
    return logger.end_run()
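
Switching vendors then becomes a one-line configuration change. The URIs and project names below are placeholders:

# Same train_model call, different vendor behind the interface
logger = get_logger(
    "mlflow",
    tracking_uri="http://localhost:5000",
    experiment_name="churn-model"
)
# logger = get_logger("wandb", project="churn-model")  # swap backends in one line
# run = train_model(model, train_data, val_data, logger)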

Multi-Cloud Escape Hatch

from abc import ABC, abstractmethod
from typing import BinaryIO

class ObjectStorage(ABC):
    """Abstract interface for object storage."""
    
    @abstractmethod
    def put(self, key: str, data: BinaryIO) -> str:
        pass
    
    @abstractmethod
    def get(self, key: str) -> BinaryIO:
        pass
    
    @abstractmethod
    def delete(self, key: str) -> None:
        pass
    
    @abstractmethod
    def list(self, prefix: str) -> list:
        pass


class S3Storage(ObjectStorage):
    def __init__(self, bucket: str, region: str = "us-east-1"):
        import boto3
        self.s3 = boto3.client("s3", region_name=region)
        self.bucket = bucket
    
    def put(self, key: str, data: BinaryIO) -> str:
        self.s3.upload_fileobj(data, self.bucket, key)
        return f"s3://{self.bucket}/{key}"
    
    def get(self, key: str) -> BinaryIO:
        import io
        buffer = io.BytesIO()
        self.s3.download_fileobj(self.bucket, key, buffer)
        buffer.seek(0)
        return buffer
    
    def delete(self, key: str) -> None:
        self.s3.delete_object(Bucket=self.bucket, Key=key)
    
    def list(self, prefix: str) -> list:
        response = self.s3.list_objects_v2(Bucket=self.bucket, Prefix=prefix)
        return [obj["Key"] for obj in response.get("Contents", [])]


class GCSStorage(ObjectStorage):
    def __init__(self, bucket: str, project: str):
        from google.cloud import storage
        self.client = storage.Client(project=project)
        self.bucket = self.client.bucket(bucket)
    
    def put(self, key: str, data: BinaryIO) -> str:
        blob = self.bucket.blob(key)
        blob.upload_from_file(data)
        return f"gs://{self.bucket.name}/{key}"
    
    def get(self, key: str) -> BinaryIO:
        import io
        blob = self.bucket.blob(key)
        buffer = io.BytesIO()
        blob.download_to_file(buffer)
        buffer.seek(0)
        return buffer
    
    def delete(self, key: str) -> None:
        self.bucket.blob(key).delete()
    
    def list(self, prefix: str) -> list:
        return [blob.name for blob in self.bucket.list_blobs(prefix=prefix)]
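
A factory analogous to get_logger keeps the storage backend a configuration choice as well (a sketch; the function name is our own):

def get_storage(backend: str = "s3", **kwargs) -> ObjectStorage:
    """Factory mirroring get_logger: select a storage backend by name."""
    backends = {
        "s3": S3Storage,
        "gcs": GCSStorage,
    }
    if backend not in backends:
        raise ValueError(f"Unknown backend: {backend}. Options: {list(backends.keys())}")
    return backends[backend](**kwargs)

# storage = get_storage("s3", bucket="ml-artifacts")
# storage = get_storage("gcs", bucket="ml-artifacts", project="my-project")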

Wardley Map for MLOps 2024

Current evolution of MLOps components:

| Component | Evolution Stage | Strategy | Recommended Vendors |
|---|---|---|---|
| GPU Compute | Commodity | Buy cloud | AWS/GCP/Azure |
| LLM Base Models | Commodity | Buy/Download | OpenAI, Anthropic, HuggingFace |
| Vector Database | Product | Buy | Pinecone, Weaviate, Qdrant |
| Experiment Tracking | Product | Buy OSS | MLflow, W&B |
| Orchestration | Product | Buy OSS | Airflow, Prefect, Dagster |
| Feature Store | Product | Buy | Feast, Tecton |
| Model Serving | Custom → Product | Buy + Customize | KServe, Seldon, Ray Serve |
| Agent Logic | Genesis | Build | Your IP |
| Eval Framework | Genesis | Build/Adapt | Custom + LangSmith |
| Domain Prompts | Genesis | Build | Your IP |

Evolution Over Time

timeline
    title MLOps Component Evolution
    2019 : Experiment Tracking (Genesis)
         : Feature Stores (Genesis)
    2021 : Experiment Tracking (Custom)
         : Feature Stores (Custom)
         : Vector DBs (Genesis)
    2023 : Experiment Tracking (Product)
         : Feature Stores (Product)
         : Vector DBs (Custom)
         : LLM APIs (Custom)
    2025 : Experiment Tracking (Commodity)
         : Feature Stores (Product)
         : Vector DBs (Product)
         : LLM APIs (Commodity)
         : Agent Frameworks (Genesis)

Vendor Evaluation Framework

Due Diligence Checklist

Before buying, verify:

from dataclasses import dataclass
from typing import Optional
from enum import Enum

class RiskLevel(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

@dataclass
class VendorEvaluation:
    vendor_name: str
    
    # Financial stability
    funding_status: str  # "Series A", "Profitable", "Public"
    runway_months: Optional[int]
    revenue_growth: Optional[float]
    
    # Technical evaluation
    uptime_sla: float  # 99.9%, 99.99%
    data_export_api: bool
    self_hosted_option: bool
    open_source_core: bool
    
    # Strategic risk
    acquisition_risk: RiskLevel
    pricing_lock_risk: RiskLevel
    
    def calculate_risk_score(self) -> dict:
        """Calculate overall vendor risk."""
        
        scores = {
            "financial": 0,
            "technical": 0,
            "strategic": 0
        }
        
        # Financial scoring
        if self.funding_status in ("Public", "Profitable"):
            scores["financial"] = 10
        elif self.runway_months and self.runway_months > 24:
            scores["financial"] = 7
        elif self.runway_months and self.runway_months > 12:
            scores["financial"] = 4
        else:
            scores["financial"] = 2
        
        # Technical scoring
        tech_score = 0
        if self.data_export_api:
            tech_score += 4
        if self.self_hosted_option:
            tech_score += 3
        if self.open_source_core:
            tech_score += 3
        scores["technical"] = tech_score
        
        # Strategic scoring
        risk_values = {RiskLevel.LOW: 10, RiskLevel.MEDIUM: 6, RiskLevel.HIGH: 3, RiskLevel.CRITICAL: 1}
        scores["strategic"] = (
            risk_values[self.acquisition_risk] + 
            risk_values[self.pricing_lock_risk]
        ) / 2
        
        overall = sum(scores.values()) / 3
        
        return {
            "scores": scores,
            "overall": round(overall, 1),
            "recommendation": "SAFE" if overall >= 7 else "CAUTION" if overall >= 4 else "AVOID"
        }


# Example evaluation
wandb_eval = VendorEvaluation(
    vendor_name="Weights & Biases",
    funding_status="Series C",
    runway_months=36,
    revenue_growth=0.8,
    uptime_sla=99.9,
    data_export_api=True,
    self_hosted_option=True,
    open_source_core=False,
    acquisition_risk=RiskLevel.MEDIUM,
    pricing_lock_risk=RiskLevel.LOW
)

print(wandb_eval.calculate_risk_score())
# {'scores': {'financial': 7, 'technical': 7, 'strategic': 8.0}, 
#  'overall': 7.3, 'recommendation': 'SAFE'}

Data Portability Requirements

Always verify before signing:

| Requirement | Question to Ask | Red Flag |
|---|---|---|
| Data Export | “Can I export all my data via API?” | “Export available on request” |
| Format | “What format is the export?” | Proprietary format only |
| Frequency | “Can I schedule automated exports?” | Manual only |
| Completeness | “Does export include all metadata?” | Partial exports |
| Cost | “Is there an export fee?” | Per-GB charges |
| Self-hosting | “Can I run this on my infra?” | SaaS only |

Cloud Credit Strategy

Startups can get significant free credits:

| Program | Credits | Requirements |
|---|---|---|
| AWS Activate | $10K-$100K | Affiliated with accelerator |
| Google for Startups | $100K-$200K | Series A or earlier |
| Azure for Startups | $25K-$150K | Association membership |
| NVIDIA Inception | GPU credits + DGX access | ML-focused startup |

Stacking Credits Strategy

graph LR
    A[Seed Stage] --> B[AWS: $10K]
    A --> C[GCP: $100K]
    A --> D[Azure: $25K]
    
    B --> E[Series A]
    C --> E
    D --> E
    
    E --> F[AWS: $100K]
    E --> G[GCP: $200K]
    E --> H[NVIDIA: GPU Access]
    
    F --> I[$435K Total Credits]
    G --> I
    H --> I

Troubleshooting Common Decisions

| Problem | Cause | Solution |
|---|---|---|
| Vendor acquired/shutdown | Startup risk | Own your data, use interfaces |
| Unexpected bill spike | Auto-scaling without limits | Set budgets, alerts, quotas |
| Shadow IT emerging | Official tooling too slow | Improve DX, reduce friction |
| Vendor price increase | Contract renewal | Multi-year lock, exit clause |
| Integration nightmare | Closed ecosystem | Prefer open standards |
| Performance issues | Shared infra limits | Negotiate dedicated resources |

Acquisition Contingency Plan

# acquisition_contingency.yaml
vendor_dependencies:
  - name: "Experiment Tracker (W&B)"
    criticality: high
    alternative_vendors:
      - mlflow-self-hosted
      - neptune-ai
    migration_time_estimate: "2-4 weeks"
    data_export_method: "wandb sync --export"
    
  - name: "Vector Database (Pinecone)"
    criticality: high
    alternative_vendors:
      - weaviate
      - qdrant
      - pgvector
    migration_time_estimate: "1-2 weeks"
    data_export_method: "pinecone export --format parquet"

migration_procedures:
  quarterly_export_test:
    - Export all data from each vendor
    - Verify import into alternative
    - Document any schema changes
    - Update migration runbooks
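
The plan is only useful if it is exercised. A minimal sketch that turns the YAML above into a quarterly checklist (assumes the file is saved as acquisition_contingency.yaml and PyYAML is installed):

import yaml

with open("acquisition_contingency.yaml") as f:
    plan = yaml.safe_load(f)

# Print one checklist entry per vendor dependency
for dep in plan["vendor_dependencies"]:
    print(f"[{dep['criticality'].upper()}] {dep['name']}")
    print(f"  export via:     {dep['data_export_method']}")
    print(f"  fallbacks:      {', '.join(dep['alternative_vendors'])}")
    print(f"  est. migration: {dep['migration_time_estimate']}")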

Decision Flowchart

flowchart TD
    A[New Capability Needed] --> B{Is this your<br>core differentiator?}
    B -->|Yes| C[BUILD IT]
    B -->|No| D{Does a good<br>product exist?}
    
    D -->|No| E{Can you wait<br>6 months?}
    E -->|Yes| F[Wait & Monitor]
    E -->|No| G[Build Minimum]
    
    D -->|Yes| H{Open source<br>or SaaS?}
    
    H -->|OSS Available| I{Do you have ops<br>capacity?}
    I -->|Yes| J[Deploy OSS]
    I -->|No| K[Buy Managed]
    
    H -->|SaaS Only| L{Vendor risk<br>acceptable?}
    L -->|Yes| M[Buy SaaS]
    L -->|No| N[Build with<br>abstraction layer]
    
    C --> O[Document & Abstract]
    G --> O
    J --> O
    K --> O
    M --> O
    N --> O
    
    O --> P[Review Annually]

Summary Checklist

| Step | Action | Owner | Frequency |
|---|---|---|---|
| 1 | Inventory all tools (Built vs Bought) | Platform Team | Quarterly |
| 2 | Audit “Built” tools for TCO | Engineering Lead | Bi-annually |
| 3 | Get startup credits from all clouds | Finance/Founders | At funding rounds |
| 4 | Verify data export capability | Platform Team | Before signing |
| 5 | Wrap vendor SDKs in interfaces | Engineering | At integration |
| 6 | Test vendor migration path | Platform Team | Annually |
| 7 | Review vendor financial health | Finance | Quarterly |
| 8 | Update contingency plans | Platform Team | Bi-annually |

Quick Decision Matrix

| If… | Then… | Because… |
|---|---|---|
| < 3 engineers | Buy everything | Focus on product |
| Revenue < $1M ARR | Buy managed | Can’t afford ops |
| Core ML capability | Build it | Your IP moat |
| Generic infrastructure | Buy it | Not differentiating |
| Vendor is tiny startup | Build abstraction | Acquisition risk |
| Open source exists | Deploy if ops capacity | Lower cost long-term |

[End of Section 43.1]