43.4. Minimum Viable Platform (MVP)

Status: Production-Ready
Version: 2.0.0
Tags: #PlatformEngineering, #MVP, #Startup

The Trap of “Scaling Prematurely”

You are a Series A startup with 3 data scientists. Do NOT build a Kubernetes Controller. Do NOT build a Feature Store.

Your goal is Iteration Speed. The “Platform” should be just enough to stop people from overwriting each other’s code.

MLOps Maturity Model

Level	Name	Characteristic	Tooling
0	ClickOps	SSH, nohup	Terminal, Jupyter
1	ScriptOps	Bash scripts	Make, Shell
2	GitOps	CI/CD on merge	GitHub Actions
3	PlatformOps	Self-serve APIs	Backstage, Kubeflow
4	AutoOps	Automated retrain/rollback	Airflow, Evidently

Goal for Startups: Reach Level 2. Stay there until Series C.

graph LR
    A[Level 0: ClickOps] --> B[Level 1: ScriptOps]
    B --> C[Level 2: GitOps]
    C --> D[Level 3: PlatformOps]
    D --> E[Level 4: AutoOps]
    
    F[Series A] -.-> C
    G[Series C] -.-> D

Level 1: The Golden Path

Standard project template:

my-project/
├── data/            # Git-ignored
├── notebooks/       # Exploration only
├── src/             # Python modules
│   ├── __init__.py
│   ├── train.py
│   └── predict.py
├── tests/           # Pytest
├── Dockerfile
├── Makefile
├── pyproject.toml
└── .github/
    └── workflows/
        └── ci.yaml
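
A template only helps if projects keep following it, so one option is to check the layout automatically, for example as a CI step. The sketch below is a minimal illustration of that idea. The function name `missing_paths` and the decision about which entries count as required are assumptions made for this example, not part of any existing tool.

```python
from pathlib import Path

# Entries taken from the template above. data/ and notebooks/ are left out
# on the assumption that they may be absent in a fresh clone.
REQUIRED_PATHS = [
    "src/__init__.py",
    "src/train.py",
    "src/predict.py",
    "tests",
    "Dockerfile",
    "Makefile",
    "pyproject.toml",
    ".github/workflows/ci.yaml",
]


def missing_paths(project_root: str) -> list[str]:
    """Return the required template entries that do not exist under project_root."""
    root = Path(project_root)
    return [entry for entry in REQUIRED_PATHS if not (root / entry).exists()]
```

Calling `missing_paths(".")` from the repository root returns an empty list when the project matches the template, so a CI job could fail the build whenever the list is non-empty.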

Universal Makefile

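# Image name used by docker-build; override with `make docker-build PROJECT=...`
PROJECT ?= my-project

.PHONY: help setup train test docker-build deploy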

help:  ## Show this help
	@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | \
	awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-15s\033[0m %s\n", $$1, $$2}'

setup:  ## Install dependencies
	poetry install
	pre-commit install

train:  ## Run training
	poetry run python src/train.py

test:  ## Run tests
	poetry run pytest tests/ -v

docker-build:  ## Build container
	docker build -t $(PROJECT):latest .

deploy:  ## Deploy to staging
	cd terraform && terraform apply -auto-approve

Level 2: The Monorepo

Don’t split 5 services into 5 repos.

Benefits:

Atomic commits across services
Shared libraries
Consistent tooling

ml-platform/
├── packages/
│   ├── model-training/
│   ├── model-serving/
│   └── shared-utils/
├── infra/
│   └── terraform/
├── Makefile
└── pants.toml

GitHub Actions for Monorepo

# .github/workflows/ci.yaml
name: CI

on:
  push:
    paths:
      - 'packages/**'
  pull_request:

jobs:
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      training: ${{ steps.filter.outputs.training }}
      serving: ${{ steps.filter.outputs.serving }}
    steps:
      # paths-filter needs a checkout to diff files on push events
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v2
        id: filter
        with:
          filters: |
            training:
              - 'packages/model-training/**'
            serving:
              - 'packages/model-serving/**'

  test-training:
    needs: detect-changes
    if: needs.detect-changes.outputs.training == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make -C packages/model-training test

  test-serving:
    needs: detect-changes
    if: needs.detect-changes.outputs.serving == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make -C packages/model-serving test
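
The change-detection step above boils down to mapping changed file paths onto package directories. The following sketch shows that logic in plain Python so it can be reasoned about or reused in a local script. It is an illustration only, not how `dorny/paths-filter` is implemented, and the names `FILTERS` and `detect_changes` are assumptions.

```python
from pathlib import PurePosixPath

# Same mapping as the `filters` block in the workflow above, with the
# trailing /** glob expressed as "anything under this directory".
FILTERS = {
    "training": "packages/model-training",
    "serving": "packages/model-serving",
}


def detect_changes(changed_files: list[str]) -> dict[str, bool]:
    """Map each filter name to True if any changed file lies under its directory."""
    result = {name: False for name in FILTERS}
    for changed in changed_files:
        path = PurePosixPath(changed)
        for name, directory in FILTERS.items():
            # Component-wise comparison, so packages/model-training-v2/
            # does not count as packages/model-training/.
            if path.is_relative_to(directory):
                result[name] = True
    return result


print(detect_changes(["packages/model-training/src/train.py", "README.md"]))
# {'training': True, 'serving': False}
```

A commit that touches only `README.md` sets both flags to `False`, which is what lets the `test-training` and `test-serving` jobs be skipped.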

Cookiecutter Template

{
  "project_name": "My ML Project",
  "project_slug": "{{ cookiecutter.project_name.lower().replace(' ', '-') }}",
  "python_version": ["3.10", "3.11"],
  "use_gpu": ["yes", "no"],
  "author_email": "team@company.com"
}
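
The `project_slug` default is a Jinja expression that calls ordinary Python string methods on `project_name`. For list-valued entries such as `python_version` and `use_gpu`, Cookiecutter offers the items as choices and uses the first one as the default. The sketch below reproduces the slug expression outside the template; the function name is an assumption for illustration.

```python
def project_slug(project_name: str) -> str:
    """Mirror the Jinja expression used for project_slug in cookiecutter.json."""
    return project_name.lower().replace(" ", "-")


print(project_slug("My ML Project"))  # my-ml-project
```

Because the slug contains hyphens, it works as a directory or repository name but not as an importable Python package name.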

Template Structure

ml-template/
├── cookiecutter.json
├── hooks/
│   └── post_gen_project.py
└── {{cookiecutter.project_slug}}/
    ├── Dockerfile
    ├── Makefile
    ├── pyproject.toml
    ├── src/
    │   ├── __init__.py
    │   └── train.py
    └── .github/
        └── workflows/
            └── ci.yaml

Usage:

cookiecutter https://github.com/company/ml-template
# New hire has working CI/CD in 2 minutes

Level 3: Platform Abstraction

Users declare what they need, not how it is provisioned:

# model.yaml
apiVersion: mlplatform/v1
kind: Model
metadata:
  name: fraud-detection
spec:
  type: inference
  framework: pytorch
  resources:
    gpu: T4
    memory: 16Gi
  scaling:
    minReplicas: 1
    maxReplicas: 10

A platform controller reads this spec and generates the underlying Kubernetes resources, for example a Deployment and an autoscaler.
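
To make the translation step concrete, here is a minimal sketch of the rendering logic such a controller might contain. It is not a real controller: there is no watch loop or API call, and the spec is given as an already-parsed dict so the example needs only the standard library. The function name `render`, the image reference, the one-GPU-per-replica limit, and the `mlplatform/gpu-type` node label are all assumptions for illustration.

```python
from typing import Any

# The model.yaml above, already parsed into a dict.
MODEL = {
    "apiVersion": "mlplatform/v1",
    "kind": "Model",
    "metadata": {"name": "fraud-detection"},
    "spec": {
        "type": "inference",
        "framework": "pytorch",
        "resources": {"gpu": "T4", "memory": "16Gi"},
        "scaling": {"minReplicas": 1, "maxReplicas": 10},
    },
}


def render(model: dict[str, Any], image: str) -> list[dict[str, Any]]:
    """Expand a Model spec into a Deployment and a HorizontalPodAutoscaler."""
    name = model["metadata"]["name"]
    spec = model["spec"]
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        "resources": {
            "limits": {
                "memory": spec["resources"]["memory"],
                "nvidia.com/gpu": 1,  # assumes one GPU per replica
            }
        },
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": spec["scaling"]["minReplicas"],
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Hypothetical node label; real GPU labels are cluster-specific.
                    "nodeSelector": {"mlplatform/gpu-type": spec["resources"]["gpu"]},
                    "containers": [container],
                },
            },
        },
    }
    autoscaler = {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "minReplicas": spec["scaling"]["minReplicas"],
            "maxReplicas": spec["scaling"]["maxReplicas"],
            # model.yaml names no scaling signal, so `metrics` is left out here;
            # a real controller would have to choose one.
        },
    }
    return [deployment, autoscaler]


manifests = render(MODEL, "registry.example.com/fraud-detection:latest")
print([m["kind"] for m in manifests])  # ['Deployment', 'HorizontalPodAutoscaler']
```

Even this small sketch has to invent several defaults that the 16-line spec never mentions, which is a fair indication of why the recommendation below is to adopt an existing tool rather than build the controller.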

Recommendation: Don’t build this yourself. Use:

Backstage (Spotify)
Port
Humanitec

Strangler Fig Pattern

Migrate off the legacy platform incrementally, shifting traffic in stages instead of cutting over all at once:

graph TB
    subgraph "Phase 1"
        A[100% Old Platform]
    end
    
    subgraph "Phase 2"
        B[Old Platform] --> C[70%]
        D[New Platform] --> E[30%]
    end
    
    subgraph "Phase 3"
        F[New Platform] --> G[100%]
    end
    
    A --> B
    A --> D
    B --> F
    E --> F
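
The percentage split in the diagram needs some routing rule. One common choice, sketched below, is to hash a stable request key into one of 100 buckets and send the lowest buckets to the new platform. This is an assumed approach for illustration, not something the pattern prescribes, and the function name `route` and the choice of key (a user or model ID) are hypothetical.

```python
import hashlib


def route(request_key: str, new_platform_percent: int) -> str:
    """Return "new" or "old" for a request, stable for a given key.

    The key is hashed into a bucket from 0 to 99; buckets below
    new_platform_percent are served by the new platform.
    """
    if not 0 <= new_platform_percent <= 100:
        raise ValueError("new_platform_percent must be between 0 and 100")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "new" if bucket < new_platform_percent else "old"


# Phase 1, 2, and 3 from the diagram correspond to 0, 30, and 100 percent.
for percent in (0, 30, 100):
    keys = [f"user-{i}" for i in range(1000)]
    share = sum(route(k, percent) == "new" for k in keys) / len(keys)
    print(percent, round(share, 2))
```

Because the bucket depends only on the key, a caller that was moved to the new platform at 30% stays there as the percentage rises, which keeps each caller's experience consistent during the migration.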

Troubleshooting

Problem	Cause	Solution
Ticket queues	Human gatekeepers	Self-service Terraform
“Works on my machine”	Env mismatch	Dev Containers
Slow CI	Rebuilds everything	Change detection
Shadow IT	Platform too complex	Improve UX

Summary Checklist

Item	Status
Monorepo created	[ ]
CI on push	[ ]
CD on merge	[ ]
Makefile standards	[ ]
CONTRIBUTING.md	[ ]
Time to First PR < 1 day	[ ]

[End of Section 43.4]
