43.4. Minimum Viable Platform (MVP)
Status: Production-Ready Version: 2.0.0 Tags: #PlatformEngineering, #MVP, #Startup
The Trap of “Scaling Prematurely”
You are a Series A startup with 3 data scientists. Do NOT build a Kubernetes Controller. Do NOT build a Feature Store.
Your goal is Iteration Speed. The “Platform” should be just enough to stop people from overwriting each other’s code.
MLOps Maturity Model
| Level | Name | Characteristic | Tooling |
|---|---|---|---|
| 0 | ClickOps | SSH, nohup | Terminal, Jupyter |
| 1 | ScriptOps | Bash scripts | Make, Shell |
| 2 | GitOps | CI/CD on merge | GitHub Actions |
| 3 | PlatformOps | Self-serve APIs | Backstage, Kubeflow |
| 4 | AutoOps | Automated retrain/rollback | Airflow, Evidently |
Goal for Startups: Reach Level 2. Stay there until Series C.
graph LR
A[Level 0: ClickOps] --> B[Level 1: ScriptOps]
B --> C[Level 2: GitOps]
C --> D[Level 3: PlatformOps]
D --> E[Level 4: AutoOps]
F[Series A] -.-> C
G[Series C] -.-> D
Level 1: The Golden Path
Standard project template:
my-project/
├── data/ # GitIgnored
├── notebooks/ # Exploration only
├── src/ # Python modules
│ ├── __init__.py
│ ├── train.py
│ └── predict.py
├── tests/ # Pytest
├── Dockerfile
├── Makefile
├── pyproject.toml
└── .github/
└── workflows/
└── ci.yaml
Universal Makefile
.PHONY: help setup train test docker-build deploy
help: ## Show this help
@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | \
awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-15s\033[0m %s\n", $$1, $$2}'
setup: ## Install dependencies
poetry install
pre-commit install
train: ## Run training
poetry run python src/train.py
test: ## Run tests
poetry run pytest tests/ -v
docker-build: ## Build container
docker build -t $(PROJECT):latest .
deploy: ## Deploy to staging
cd terraform && terraform apply -auto-approve
Level 2: The Monorepo
Don’t split 5 services into 5 repos.
Benefits:
- Atomic commits across services
- Shared libraries
- Consistent tooling
ml-platform/
├── packages/
│ ├── model-training/
│ ├── model-serving/
│ └── shared-utils/
├── infra/
│ └── terraform/
├── Makefile
└── pants.toml
GitHub Actions for Monorepo
# .github/workflows/ci.yaml
name: CI
on:
push:
paths:
- 'packages/**'
pull_request:
jobs:
detect-changes:
runs-on: ubuntu-latest
outputs:
training: ${{ steps.filter.outputs.training }}
serving: ${{ steps.filter.outputs.serving }}
steps:
- uses: dorny/paths-filter@v2
id: filter
with:
filters: |
training:
- 'packages/model-training/**'
serving:
- 'packages/model-serving/**'
test-training:
needs: detect-changes
if: needs.detect-changes.outputs.training == 'true'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: make -C packages/model-training test
test-serving:
needs: detect-changes
if: needs.detect-changes.outputs.serving == 'true'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: make -C packages/model-serving test
Cookiecutter Template
{
"project_name": "My ML Project",
"project_slug": "{{ cookiecutter.project_name.lower().replace(' ', '-') }}",
"python_version": ["3.10", "3.11"],
"use_gpu": ["yes", "no"],
"author_email": "team@company.com"
}
Template Structure
ml-template/
├── cookiecutter.json
├── hooks/
│ └── post_gen_project.py
└── {{cookiecutter.project_slug}}/
├── Dockerfile
├── Makefile
├── pyproject.toml
├── src/
│ ├── __init__.py
│ └── train.py
└── .github/
└── workflows/
└── ci.yaml
Usage:
cookiecutter https://github.com/company/ml-template
# New hire has working CI/CD in 2 minutes
Level 3: Platform Abstraction
Users define what they need, not how:
# model.yaml
apiVersion: mlplatform/v1
kind: Model
metadata:
name: fraud-detection
spec:
type: inference
framework: pytorch
resources:
gpu: T4
memory: 16Gi
scaling:
minReplicas: 1
maxReplicas: 10
Platform Controller reads this and generates Kubernetes resources.
Recommendation: Don’t build this yourself. Use:
- Backstage (Spotify)
- Port
- Humanitec
Strangler Fig Pattern
Migrate from legacy incrementally:
graph TB
subgraph "Phase 1"
A[100% Old Platform]
end
subgraph "Phase 2"
B[Old Platform] --> C[70%]
D[New Platform] --> E[30%]
end
subgraph "Phase 3"
F[New Platform] --> G[100%]
end
A --> B
A --> D
B --> F
E --> F
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Ticket queues | Human gatekeepers | Self-service Terraform |
| “Works on my machine” | Env mismatch | Dev Containers |
| Slow CI | Rebuilds everything | Change detection |
| Shadow IT | Platform too complex | Improve UX |
Summary Checklist
| Item | Status |
|---|---|
| Monorepo created | [ ] |
| CI on push | [ ] |
| CD on merge | [ ] |
| Makefile standards | [ ] |
| CONTRIBUTING.md | [ ] |
| Time to First PR < 1 day | [ ] |
[End of Section 43.4]