43.1. The Buy vs Build Decision Matrix

Status: Production-Ready | Version: 2.0.0 | Tags: #Strategy, #Startups, #MLOps


The “Not Invented Here” Syndrome

Startups are founded by engineers. Engineers love to code. Therefore, startups tend to overbuild.

The result: “resume-driven development.” You end up with a great custom platform, zero customers, and two months of runway left.

The Overbuilding Trap

graph TD
    A[Engineer joins startup] --> B[Sees missing tooling]
    B --> C{Decision Point}
    C -->|Build| D[3 months building Feature Store]
    C -->|Buy| E[2 days integrating Feast]
    D --> F[Still no customers]
    E --> G[Shipping ML features]
    F --> H[Runway: 2 months]
    G --> I[Revenue growing]

Common Overbuilding Patterns

| Pattern | What They Built | What They Should Have Bought |
|---|---|---|
| Custom Orchestrator | Airflow clone in Python | Managed Airflow (MWAA/Composer) |
| Feature Store v1 | Redis + custom SDK | Feast or Tecton |
| Model Registry | S3 + DynamoDB + scripts | MLflow or Weights & Biases |
| GPU Scheduler | Custom K8s controller | Karpenter or GKE Autopilot |
| Monitoring Stack | Prometheus + custom dashboards | Datadog or managed Cloud Monitoring |

The Time-to-Value Calculation

from dataclasses import dataclass

@dataclass
class TimeToValue:
    """Calculate the true cost of build vs buy decisions."""
    
    build_time_weeks: int
    buy_setup_days: int
    engineer_weekly_rate: float
    opportunity_cost_per_week: float
    
    def build_cost(self) -> float:
        """Total cost of building in-house."""
        engineering = self.build_time_weeks * self.engineer_weekly_rate
        opportunity = self.build_time_weeks * self.opportunity_cost_per_week
        return engineering + opportunity
    
    def buy_cost(self, monthly_license: float, months: int = 12) -> float:
        """Total cost of buying for first year."""
        setup_cost = (self.buy_setup_days / 5) * self.engineer_weekly_rate
        license_cost = monthly_license * months
        return setup_cost + license_cost
    
    def breakeven_analysis(self, monthly_license: float) -> dict:
        """When does building become cheaper than buying?"""
        build = self.build_cost()
        yearly_license = monthly_license * 12
        
        if yearly_license == 0:
            # A $0 license (e.g. self-hosted OSS) means building never pays back
            return {"breakeven_months": float("inf"), "recommendation": "BUY"}
        
        breakeven_months = build / (yearly_license / 12)
        
        recommendation = "BUY" if breakeven_months > 24 else "BUILD"
        
        return {
            "build_cost": build,
            "yearly_license": yearly_license,
            "breakeven_months": round(breakeven_months, 1),
            "recommendation": recommendation
        }

# Example: Feature Store decision
feature_store_calc = TimeToValue(
    build_time_weeks=12,
    buy_setup_days=5,
    engineer_weekly_rate=5000,
    opportunity_cost_per_week=10000
)

result = feature_store_calc.breakeven_analysis(monthly_license=2000)
# {'build_cost': 180000, 'yearly_license': 24000, 'breakeven_months': 90.0, 'recommendation': 'BUY'}

Core vs Context Framework

Geoffrey Moore’s framework helps distinguish what to build:

| Type | Definition | Action | Examples |
|---|---|---|---|
| Core | Differentiating activities that drive competitive advantage | BUILD | Recommendation algorithm, Pricing model |
| Context | Necessary but generic, doesn’t differentiate | BUY | Payroll, Email, Monitoring |
| Mission-Critical Context | Generic but must be reliable | BUY + SLA | Authentication, Payment processing |

The Core/Context Matrix

quadrantChart
    title Core vs Context Analysis
    x-axis Low Differentiation --> High Differentiation
    y-axis Low Strategic Value --> High Strategic Value
    quadrant-1 Build & Invest
    quadrant-2 Buy Premium
    quadrant-3 Buy Commodity
    quadrant-4 Build if Easy
    
    "ML Model Logic": [0.9, 0.9]
    "Feature Engineering": [0.7, 0.8]
    "Model Serving": [0.5, 0.6]
    "Experiment Tracking": [0.3, 0.5]
    "Orchestration": [0.2, 0.4]
    "Compute": [0.1, 0.3]
    "Logging": [0.1, 0.2]

MLOps-Specific Examples

Core (BUILD):

  • Your recommendation algorithm’s core logic
  • Domain-specific feature engineering pipelines
  • Custom evaluation metrics for your use case
  • Agent/LLM prompt chains that define your product

Context (BUY):

  • GPU compute (AWS/GCP/Azure)
  • Workflow orchestration (Airflow/Prefect)
  • Experiment tracking (W&B/MLflow)
  • Model serving infrastructure (SageMaker/Vertex)
  • Feature stores for most companies (Feast/Tecton)
  • Vector databases (Pinecone/Weaviate)

Industry-Specific Core Activities

| Industry | Core ML Activities | Everything Else |
|---|---|---|
| E-commerce | Personalization, Search ranking | Infrastructure, Monitoring |
| Fintech | Risk scoring, Fraud patterns | Compute, Experiment tracking |
| Healthcare | Diagnostic models, Treatment prediction | Data storage, Model serving |
| Autonomous | Perception stack, Decision making | GPU clusters, Logging |

Decision Matrix

Component-Level Analysis

| Component | Evolution Stage | Decision | Reason | Typical Cost |
|---|---|---|---|---|
| GPU Compute | Commodity | BUY | Don’t build datacenters | $$/hour |
| Container Orchestration | Commodity | BUY | Managed K8s services are mature | $100-500/mo |
| Workflow Orchestration | Product | BUY | Airflow/Prefect are battle-tested | $200-2000/mo |
| Experiment Tracking | Product | BUY | W&B/MLflow work well | $0-500/mo |
| Feature Store | Product | BUY* | Unless at massive scale | $500-5000/mo |
| Model Serving | Custom* | DEPENDS | May need custom for latency | Variable |
| Inference Optimization | Custom | BUILD | Your models, your constraints | Engineering time |
| Agent Logic | Genesis | BUILD | This IS your differentiation | Engineering time |
| Domain Features | Genesis | BUILD | Your competitive moat | Engineering time |

The Wardley Map Approach

graph TB
    subgraph "Genesis (Build)"
        A[Agent Logic]
        B[Custom Eval Framework]
        C[Domain Features]
    end
    
    subgraph "Custom (Build or Buy)"
        D[Model Fine-tuning]
        E[Inference Serving]
        F[Feature Pipelines]
    end
    
    subgraph "Product (Buy)"
        G[Experiment Tracking]
        H[Orchestration]
        I[Vector DB]
    end
    
    subgraph "Commodity (Buy)"
        J[GPU Compute]
        K[Object Storage]
        L[Managed K8s]
    end
    
    A --> D
    B --> G
    C --> F
    D --> J
    E --> L
    F --> K
    G --> K
    H --> L
    I --> K

TCO Calculator

Total Cost of Ownership goes beyond license fees:

from dataclasses import dataclass, field
from typing import List
from enum import Enum

class CostCategory(Enum):
    SETUP = "setup"
    LICENSE = "license"
    INFRASTRUCTURE = "infrastructure"
    MAINTENANCE = "maintenance"
    TRAINING = "training"
    OPPORTUNITY = "opportunity"

@dataclass
class CostItem:
    category: CostCategory
    name: str
    monthly_cost: float = 0
    one_time_cost: float = 0
    hours_per_month: float = 0

@dataclass
class Solution:
    name: str
    costs: List[CostItem] = field(default_factory=list)
    
    def add_cost(self, cost: CostItem) -> None:
        self.costs.append(cost)

def calculate_tco(
    solution: Solution,
    hourly_rate: float = 100,
    months: int = 12
) -> dict:
    """Calculate Total Cost of Ownership with breakdown."""
    
    one_time = sum(c.one_time_cost for c in solution.costs)
    
    monthly_fixed = sum(c.monthly_cost for c in solution.costs)
    
    monthly_labor = sum(
        c.hours_per_month * hourly_rate 
        for c in solution.costs
    )
    
    total_monthly = monthly_fixed + monthly_labor
    total = one_time + (total_monthly * months)
    
    breakdown = {}
    for category in CostCategory:
        category_costs = [c for c in solution.costs if c.category == category]
        category_total = sum(
            c.one_time_cost + (c.monthly_cost + c.hours_per_month * hourly_rate) * months
            for c in category_costs
        )
        if category_total > 0:
            breakdown[category.value] = category_total
    
    return {
        "solution": solution.name,
        "one_time": one_time,
        "monthly": total_monthly,
        "total_12_months": total,
        "breakdown": breakdown
    }

# Build scenario
build = Solution("In-House Feature Store")
build.add_cost(CostItem(
    CostCategory.SETUP, "Initial development",
    one_time_cost=16000  # 160 hours (4 weeks full-time) at $100/hr, booked once
))
build.add_cost(CostItem(
    CostCategory.INFRASTRUCTURE, "Redis cluster",
    monthly_cost=200
))
build.add_cost(CostItem(
    CostCategory.INFRASTRUCTURE, "S3 storage",
    monthly_cost=50
))
build.add_cost(CostItem(
    CostCategory.MAINTENANCE, "Ongoing maintenance",
    hours_per_month=20
))

# Buy scenario
buy = Solution("Tecton Feature Store")
buy.add_cost(CostItem(
    CostCategory.SETUP, "Integration & training",
    one_time_cost=5000  # 50 hours setup
))
buy.add_cost(CostItem(
    CostCategory.LICENSE, "Platform fee",
    monthly_cost=2000
))
buy.add_cost(CostItem(
    CostCategory.MAINTENANCE, "Administration",
    hours_per_month=5
))

print("BUILD:", calculate_tco(build))
print("BUY:", calculate_tco(buy))

# BUILD: {'solution': 'In-House Feature Store', 'one_time': 16000, 'monthly': 2250, 
#         'total_12_months': 43000, 'breakdown': {...}}
# BUY: {'solution': 'Tecton Feature Store', 'one_time': 5000, 'monthly': 2500, 
#       'total_12_months': 35000, 'breakdown': {...}}

Hidden Costs Checklist

Many organizations underestimate the true cost of building:

| Hidden Cost | Description | Typical Multiplier |
|---|---|---|
| Maintenance | Bug fixes, upgrades, security patches | 2-3x initial build |
| Documentation | Internal docs, onboarding materials | 10-20% of build |
| On-call | 24/7 support for production systems | $5-15K/month |
| Opportunity Cost | What else could engineers build? | 2-5x direct cost |
| Knowledge Drain | When builders leave | 50-100% rebuild |
| Security | Audits, penetration testing, compliance | $10-50K/year |
| Integration | Connecting with other systems | 20-40% of build |
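
Applied to a concrete number, these multipliers compound quickly. A rough sketch using midpoints from the table above (the $100K base figure is illustrative):

def true_build_cost(initial_build: float) -> float:
    """Lifetime cost estimate using midpoints of the multipliers above."""
    maintenance = 2.5 * initial_build       # 2-3x initial build
    documentation = 0.15 * initial_build    # 10-20% of build
    integration = 0.30 * initial_build      # 20-40% of build
    knowledge_drain = 0.75 * initial_build  # 50-100% rebuild when builders leave
    return initial_build + maintenance + documentation + integration + knowledge_drain

print(true_build_cost(100_000))  # 470000.0 -- almost 5x the headline estimate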

The 3-Year View

Short-term thinking leads to bad decisions:

def project_costs(
    build_initial: float,
    build_monthly: float,
    buy_setup: float,
    buy_monthly: float,
    years: int = 3
) -> dict:
    """Project costs over multiple years."""
    
    results = {"year": [], "build_cumulative": [], "buy_cumulative": []}
    
    build_total = build_initial
    buy_total = buy_setup
    
    for year in range(1, years + 1):
        build_total += build_monthly * 12
        buy_total += buy_monthly * 12
        
        results["year"].append(year)
        results["build_cumulative"].append(build_total)
        results["buy_cumulative"].append(buy_total)
    
    crossover = None
    for i, (b, y) in enumerate(zip(results["build_cumulative"], results["buy_cumulative"])):
        if b < y:
            crossover = i + 1
            break
    
    return {
        "projection": results,
        "crossover_year": crossover,
        "recommendation": "BUILD" if crossover and crossover <= 2 else "BUY"
    }
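
With illustrative numbers (cheap to run once built, relatively pricey license), the crossover logic looks like this:

result = project_costs(
    build_initial=30_000,  # upfront engineering
    build_monthly=500,     # light ongoing maintenance
    buy_setup=1_000,
    buy_monthly=2_000
)
# Build cumulative: 36K / 42K / 48K; Buy cumulative: 25K / 49K / 73K
print(result["crossover_year"], result["recommendation"])  # 2 BUILD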

Escape Hatch Architecture

The worst outcome: vendor lock-in with no exit path. Build abstraction layers:

The Interface Pattern

from abc import ABC, abstractmethod
from typing import Dict, Any, Optional, List
from dataclasses import dataclass

@dataclass
class ExperimentRun:
    run_id: str
    metrics: Dict[str, float]
    params: Dict[str, Any]
    artifacts: List[str]

class ExperimentLogger(ABC):
    """Abstract interface for experiment tracking."""
    
    @abstractmethod
    def start_run(self, name: str, tags: Optional[Dict] = None) -> str:
        """Start a new experiment run."""
        pass
    
    @abstractmethod
    def log_param(self, key: str, value: Any) -> None:
        """Log a hyperparameter."""
        pass
    
    @abstractmethod
    def log_metric(self, name: str, value: float, step: Optional[int] = None) -> None:
        """Log a metric value."""
        pass
    
    @abstractmethod
    def log_artifact(self, local_path: str, artifact_path: Optional[str] = None) -> None:
        """Log a file as an artifact."""
        pass
    
    @abstractmethod
    def end_run(self, status: str = "FINISHED") -> ExperimentRun:
        """End the current run."""
        pass


class WandBLogger(ExperimentLogger):
    """Weights & Biases implementation."""
    
    def __init__(self, project: str, entity: Optional[str] = None):
        import wandb
        self.wandb = wandb
        self.project = project
        self.entity = entity
        self._run = None
    
    def start_run(self, name: str, tags: Optional[Dict] = None) -> str:
        self._run = self.wandb.init(
            project=self.project,
            entity=self.entity,
            name=name,
            tags=list(tags.keys()) if tags else None
        )
        return self._run.id
    
    def log_param(self, key: str, value: Any) -> None:
        self.wandb.config[key] = value
    
    def log_metric(self, name: str, value: float, step: Optional[int] = None) -> None:
        self.wandb.log({name: value}, step=step)
    
    def log_artifact(self, local_path: str, artifact_path: Optional[str] = None) -> None:
        self.wandb.save(local_path)
    
    def end_run(self, status: str = "FINISHED") -> ExperimentRun:
        run_id = self._run.id
        # Snapshot metrics and params before finish() closes the run
        metrics = dict(self._run.summary)
        params = dict(self.wandb.config)
        self._run.finish()
        return ExperimentRun(
            run_id=run_id,
            metrics=metrics,
            params=params,
            artifacts=[]
        )


class MLflowLogger(ExperimentLogger):
    """MLflow implementation."""
    
    def __init__(self, tracking_uri: str, experiment_name: str):
        import mlflow
        self.mlflow = mlflow
        self.mlflow.set_tracking_uri(tracking_uri)
        self.mlflow.set_experiment(experiment_name)
        self._run_id = None
    
    def start_run(self, name: str, tags: Optional[Dict] = None) -> str:
        run = self.mlflow.start_run(run_name=name, tags=tags)
        self._run_id = run.info.run_id
        return self._run_id
    
    def log_param(self, key: str, value: Any) -> None:
        self.mlflow.log_param(key, value)
    
    def log_metric(self, name: str, value: float, step: Optional[int] = None) -> None:
        self.mlflow.log_metric(name, value, step=step)
    
    def log_artifact(self, local_path: str, artifact_path: Optional[str] = None) -> None:
        self.mlflow.log_artifact(local_path, artifact_path)
    
    def end_run(self, status: str = "FINISHED") -> ExperimentRun:
        self.mlflow.end_run(status=status)
        # Fetch the final run state from the tracking store
        run = self.mlflow.get_run(self._run_id)
        return ExperimentRun(
            run_id=self._run_id,
            metrics=dict(run.data.metrics),
            params=dict(run.data.params),
            artifacts=[]
        )


# Factory pattern for easy switching
def get_logger(backend: str = "mlflow", **kwargs) -> ExperimentLogger:
    """Factory to get appropriate logger implementation."""
    
    backends = {
        "wandb": WandBLogger,
        "mlflow": MLflowLogger,
    }
    
    if backend not in backends:
        raise ValueError(f"Unknown backend: {backend}. Options: {list(backends.keys())}")
    
    return backends[backend](**kwargs)


# Training code uses interface, not vendor-specific API
def train_model(model, train_data, val_data, logger: ExperimentLogger):
    """Training loop that works with any logging backend."""
    
    run_id = logger.start_run(name="training-run")
    
    logger.log_param("model_type", type(model).__name__)
    logger.log_param("train_size", len(train_data))
    
    for epoch in range(100):
        loss = model.train_epoch(train_data)
        val_loss = model.validate(val_data)
        
        logger.log_metric("train_loss", loss, step=epoch)
        logger.log_metric("val_loss", val_loss, step=epoch)
        
        if epoch % 10 == 0:
            model.save("checkpoint.pt")
            logger.log_artifact("checkpoint.pt")
    
    return logger.end_run()
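
Switching vendors then becomes a one-line configuration change. The URIs and project names below are placeholders:

# Same train_model call, different vendor behind the interface
logger = get_logger(
    "mlflow",
    tracking_uri="http://localhost:5000",
    experiment_name="churn-model"
)
# logger = get_logger("wandb", project="churn-model")  # swap backends in one line
# run = train_model(model, train_data, val_data, logger)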

Multi-Cloud Escape Hatch

from abc import ABC, abstractmethod
from typing import BinaryIO

class ObjectStorage(ABC):
    """Abstract interface for object storage."""
    
    @abstractmethod
    def put(self, key: str, data: BinaryIO) -> str:
        pass
    
    @abstractmethod
    def get(self, key: str) -> BinaryIO:
        pass
    
    @abstractmethod
    def delete(self, key: str) -> None:
        pass
    
    @abstractmethod
    def list(self, prefix: str) -> list:
        pass


class S3Storage(ObjectStorage):
    def __init__(self, bucket: str, region: str = "us-east-1"):
        import boto3
        self.s3 = boto3.client("s3", region_name=region)
        self.bucket = bucket
    
    def put(self, key: str, data: BinaryIO) -> str:
        self.s3.upload_fileobj(data, self.bucket, key)
        return f"s3://{self.bucket}/{key}"
    
    def get(self, key: str) -> BinaryIO:
        import io
        buffer = io.BytesIO()
        self.s3.download_fileobj(self.bucket, key, buffer)
        buffer.seek(0)
        return buffer
    
    def delete(self, key: str) -> None:
        self.s3.delete_object(Bucket=self.bucket, Key=key)
    
    def list(self, prefix: str) -> list:
        response = self.s3.list_objects_v2(Bucket=self.bucket, Prefix=prefix)
        return [obj["Key"] for obj in response.get("Contents", [])]


class GCSStorage(ObjectStorage):
    def __init__(self, bucket: str, project: str):
        from google.cloud import storage
        self.client = storage.Client(project=project)
        self.bucket = self.client.bucket(bucket)
    
    def put(self, key: str, data: BinaryIO) -> str:
        blob = self.bucket.blob(key)
        blob.upload_from_file(data)
        return f"gs://{self.bucket.name}/{key}"
    
    def get(self, key: str) -> BinaryIO:
        import io
        blob = self.bucket.blob(key)
        buffer = io.BytesIO()
        blob.download_to_file(buffer)
        buffer.seek(0)
        return buffer
    
    def delete(self, key: str) -> None:
        self.bucket.blob(key).delete()
    
    def list(self, prefix: str) -> list:
        return [blob.name for blob in self.bucket.list_blobs(prefix=prefix)]
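
A factory analogous to get_logger keeps the storage backend a configuration choice as well (a sketch; the function name is our own):

def get_storage(backend: str = "s3", **kwargs) -> ObjectStorage:
    """Factory mirroring get_logger: select a storage backend by name."""
    backends = {
        "s3": S3Storage,
        "gcs": GCSStorage,
    }
    if backend not in backends:
        raise ValueError(f"Unknown backend: {backend}. Options: {list(backends.keys())}")
    return backends[backend](**kwargs)

# storage = get_storage("s3", bucket="ml-artifacts")
# storage = get_storage("gcs", bucket="ml-artifacts", project="my-project")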

Wardley Map for MLOps 2024

Current evolution of MLOps components:

| Component | Evolution Stage | Strategy | Recommended Vendors |
|---|---|---|---|
| GPU Compute | Commodity | Buy cloud | AWS/GCP/Azure |
| LLM Base Models | Commodity | Buy/Download | OpenAI, Anthropic, HuggingFace |
| Vector Database | Product | Buy | Pinecone, Weaviate, Qdrant |
| Experiment Tracking | Product | Buy OSS | MLflow, W&B |
| Orchestration | Product | Buy OSS | Airflow, Prefect, Dagster |
| Feature Store | Product | Buy | Feast, Tecton |
| Model Serving | Custom → Product | Buy + Customize | KServe, Seldon, Ray Serve |
| Agent Logic | Genesis | Build | Your IP |
| Eval Framework | Genesis | Build/Adapt | Custom + LangSmith |
| Domain Prompts | Genesis | Build | Your IP |

Evolution Over Time

timeline
    title MLOps Component Evolution
    2019 : Experiment Tracking (Genesis)
         : Feature Stores (Genesis)
    2021 : Experiment Tracking (Custom)
         : Feature Stores (Custom)
         : Vector DBs (Genesis)
    2023 : Experiment Tracking (Product)
         : Feature Stores (Product)
         : Vector DBs (Custom)
         : LLM APIs (Custom)
    2025 : Experiment Tracking (Commodity)
         : Feature Stores (Product)
         : Vector DBs (Product)
         : LLM APIs (Commodity)
         : Agent Frameworks (Genesis)

Vendor Evaluation Framework

Due Diligence Checklist

Before buying, verify:

from dataclasses import dataclass
from typing import Optional
from enum import Enum

class RiskLevel(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

@dataclass
class VendorEvaluation:
    vendor_name: str
    
    # Financial stability
    funding_status: str  # "Series A", "Profitable", "Public"
    runway_months: Optional[int]
    revenue_growth: Optional[float]
    
    # Technical evaluation
    uptime_sla: float  # 99.9%, 99.99%
    data_export_api: bool
    self_hosted_option: bool
    open_source_core: bool
    
    # Strategic risk
    acquisition_risk: RiskLevel
    pricing_lock_risk: RiskLevel
    
    def calculate_risk_score(self) -> dict:
        """Calculate overall vendor risk."""
        
        scores = {
            "financial": 0,
            "technical": 0,
            "strategic": 0
        }
        
        # Financial scoring
        if self.funding_status in ("Public", "Profitable"):
            scores["financial"] = 10
        elif self.runway_months and self.runway_months > 24:
            scores["financial"] = 7
        elif self.runway_months and self.runway_months > 12:
            scores["financial"] = 4
        else:
            scores["financial"] = 2
        
        # Technical scoring
        tech_score = 0
        if self.data_export_api:
            tech_score += 4
        if self.self_hosted_option:
            tech_score += 3
        if self.open_source_core:
            tech_score += 3
        scores["technical"] = tech_score
        
        # Strategic scoring
        risk_values = {RiskLevel.LOW: 10, RiskLevel.MEDIUM: 6, RiskLevel.HIGH: 3, RiskLevel.CRITICAL: 1}
        scores["strategic"] = (
            risk_values[self.acquisition_risk] + 
            risk_values[self.pricing_lock_risk]
        ) / 2
        
        overall = sum(scores.values()) / 3
        
        return {
            "scores": scores,
            "overall": round(overall, 1),
            "recommendation": "SAFE" if overall >= 7 else "CAUTION" if overall >= 4 else "AVOID"
        }


# Example evaluation
wandb_eval = VendorEvaluation(
    vendor_name="Weights & Biases",
    funding_status="Series C",
    runway_months=36,
    revenue_growth=0.8,
    uptime_sla=99.9,
    data_export_api=True,
    self_hosted_option=True,
    open_source_core=False,
    acquisition_risk=RiskLevel.MEDIUM,
    pricing_lock_risk=RiskLevel.LOW
)

print(wandb_eval.calculate_risk_score())
# {'scores': {'financial': 7, 'technical': 7, 'strategic': 8.0}, 
#  'overall': 7.3, 'recommendation': 'SAFE'}

Data Portability Requirements

Always verify before signing:

| Requirement | Question to Ask | Red Flag |
|---|---|---|
| Data Export | “Can I export all my data via API?” | “Export available on request” |
| Format | “What format is the export?” | Proprietary format only |
| Frequency | “Can I schedule automated exports?” | Manual only |
| Completeness | “Does export include all metadata?” | Partial exports |
| Cost | “Is there an export fee?” | Per-GB charges |
| Self-hosting | “Can I run this on my infra?” | SaaS only |

Cloud Credit Strategy

Startups can get significant free credits:

| Program | Credits | Requirements |
|---|---|---|
| AWS Activate | $10K-$100K | Affiliated with accelerator |
| Google for Startups | $100K-$200K | Series A or earlier |
| Azure for Startups | $25K-$150K | Association membership |
| NVIDIA Inception | GPU credits + DGX access | ML-focused startup |

Stacking Credits Strategy

graph LR
    A[Seed Stage] --> B[AWS: $10K]
    A --> C[GCP: $100K]
    A --> D[Azure: $25K]
    
    B --> E[Series A]
    C --> E
    D --> E
    
    E --> F[AWS: $100K]
    E --> G[GCP: $200K]
    E --> H[NVIDIA: GPU Access]
    
    F --> I[$435K Total Credits]
    G --> I
    H --> I

Troubleshooting Common Decisions

| Problem | Cause | Solution |
|---|---|---|
| Vendor acquired/shutdown | Startup risk | Own your data, use interfaces |
| Unexpected bill spike | Auto-scaling without limits | Set budgets, alerts, quotas |
| Shadow IT emerging | Official tooling too slow | Improve DX, reduce friction |
| Vendor price increase | Contract renewal | Multi-year lock, exit clause |
| Integration nightmare | Closed ecosystem | Prefer open standards |
| Performance issues | Shared infra limits | Negotiate dedicated resources |

Acquisition Contingency Plan

# acquisition_contingency.yaml
vendor_dependencies:
  - name: "Experiment Tracker (W&B)"
    criticality: high
    alternative_vendors:
      - mlflow-self-hosted
      - neptune-ai
    migration_time_estimate: "2-4 weeks"
    data_export_method: "wandb sync --export"
    
  - name: "Vector Database (Pinecone)"
    criticality: high
    alternative_vendors:
      - weaviate
      - qdrant
      - pgvector
    migration_time_estimate: "1-2 weeks"
    data_export_method: "pinecone export --format parquet"

migration_procedures:
  quarterly_export_test:
    - Export all data from each vendor
    - Verify import into alternative
    - Document any schema changes
    - Update migration runbooks
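
The plan is only useful if it is exercised. A minimal sketch that turns the YAML above into a quarterly checklist (assumes the file is saved as acquisition_contingency.yaml and PyYAML is installed):

import yaml

with open("acquisition_contingency.yaml") as f:
    plan = yaml.safe_load(f)

# Print one checklist entry per vendor dependency
for dep in plan["vendor_dependencies"]:
    print(f"[{dep['criticality'].upper()}] {dep['name']}")
    print(f"  export via:     {dep['data_export_method']}")
    print(f"  fallbacks:      {', '.join(dep['alternative_vendors'])}")
    print(f"  est. migration: {dep['migration_time_estimate']}")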

Decision Flowchart

flowchart TD
    A[New Capability Needed] --> B{Is this your<br>core differentiator?}
    B -->|Yes| C[BUILD IT]
    B -->|No| D{Does a good<br>product exist?}
    
    D -->|No| E{Can you wait<br>6 months?}
    E -->|Yes| F[Wait & Monitor]
    E -->|No| G[Build Minimum]
    
    D -->|Yes| H{Open source<br>or SaaS?}
    
    H -->|OSS Available| I{Do you have ops<br>capacity?}
    I -->|Yes| J[Deploy OSS]
    I -->|No| K[Buy Managed]
    
    H -->|SaaS Only| L{Vendor risk<br>acceptable?}
    L -->|Yes| M[Buy SaaS]
    L -->|No| N[Build with<br>abstraction layer]
    
    C --> O[Document & Abstract]
    G --> O
    J --> O
    K --> O
    M --> O
    N --> O
    
    O --> P[Review Annually]

Summary Checklist

| Step | Action | Owner | Frequency |
|---|---|---|---|
| 1 | Inventory all tools (Built vs Bought) | Platform Team | Quarterly |
| 2 | Audit “Built” tools for TCO | Engineering Lead | Bi-annually |
| 3 | Get startup credits from all clouds | Finance/Founders | At funding rounds |
| 4 | Verify data export capability | Platform Team | Before signing |
| 5 | Wrap vendor SDKs in interfaces | Engineering | At integration |
| 6 | Test vendor migration path | Platform Team | Annually |
| 7 | Review vendor financial health | Finance | Quarterly |
| 8 | Update contingency plans | Platform Team | Bi-annually |

Quick Decision Matrix

| If… | Then… | Because… |
|---|---|---|
| < 3 engineers | Buy everything | Focus on product |
| Revenue < $1M ARR | Buy managed | Can’t afford ops |
| Core ML capability | Build it | Your IP moat |
| Generic infrastructure | Buy it | Not differentiating |
| Vendor is tiny startup | Build abstraction | Acquisition risk |
| Open source exists | Deploy if ops capacity | Lower cost long-term |

[End of Section 43.1]