21.1 Prompt Versioning: Git vs. Database
In the early days of LLMs, prompts were hardcoded strings in Python files:
response = openai.Completion.create(prompt=f"Summarize {text}")
This is the “Magic String” Anti-Pattern. It leads to:
- No History: “Who changed the prompt yesterday? Why is the bot rude now?”
- No Rollbacks: “V2 is broken, how do I go back to V1?”
- Engineering Bottleneck: Product Managers want to iterate on text, but they need to file a Pull Request to change a Python string.
This chapter solves the Prompt Lifecycle Management problem.
1. The Core Debate: Code vs. Data
Is a prompt “Source Code” (Logic) or “Config” (Data)?
1.1. Strategy A: Prompts as Code (Git)
Treat prompts like functions. Store them in .yaml or .jinja2 files in the repo.
- Pros:
- Versioning is free (Git).
- Code Review (PRs) is built-in.
- CI/CD runs automatically on change.
- Cons:
- Non-technical people (Subject Matter Experts) cannot edit them easily.
- Release velocity is tied to App deployment velocity.
1.2. Strategy B: Prompts as Data (Database/CMS)
Store prompts in a Postgres DB or a SaaS (PromptLayer, W&B). Fetch them at runtime.
- Pros:
- Decoupled Deployment: Update prompt without re-deploying the app.
- UI for PMs/SMEs.
- A/B Testing is easier (Traffic splitting features).
- Cons:
- Latency (Network call to fetch prompt).
- “Production Surprise”: Someone changes the prompt in the UI, breaking the live app.
1.3. The Hybrid Consensus
“Git for Logic, DB for Content.”
- Structure (Chain of Thought, Few-Shot Logic) stays in Git.
- Wording (Tone, Style, Examples) lives in DB/CMS.
- Or better: Sync Strategy. Edit in UI -> Commit to Git -> Deploy to DB.
2. Strategy A: The GitOps Workflow
Use this workflow if you choose Git (recommended for engineering-heavy teams).
2.1. File Structure
Organize by domain/model/version.
/prompts
  /customer_support
    /triage
      v1.yaml
      v2.yaml
      latest.yaml -> symlink to v2.yaml
2.2. The Prompts.yaml Standard
Do not use .txt. Use structured YAML to capture metadata.
id: support_triage_v2
version: 2.1.0
model: gpt-4-turbo
parameters:
  temperature: 0.0
  max_tokens: 500
input_variables: ["ticket_body", "user_tier"]
template: |
  You are a triage agent.
  User Tier: {{user_tier}}
  Ticket: {{ticket_body}}
  Classify urgency (High/Medium/Low).
tests:
  - inputs: { "ticket_body": "My server is on fire", "user_tier": "Free" }
    assert_contains: "High"
2.3. Loading in Python
Write a simple PromptLoader that caches these files.
import os
import yaml
from jinja2 import Template

class PromptLoader:
    def __init__(self, prompt_dir="./prompts"):
        self.cache = {}
        self.load_all(prompt_dir)

    def load_all(self, prompt_dir):
        # Walk the prompt directory and cache every YAML definition by its id
        for root, _, files in os.walk(prompt_dir):
            for name in files:
                if name.endswith(".yaml"):
                    with open(os.path.join(root, name)) as f:
                        data = yaml.safe_load(f)
                        self.cache[data["id"]] = data

    def get(self, prompt_id, **kwargs):
        p = self.cache[prompt_id]
        t = Template(p["template"])
        return t.render(**kwargs)
3. Strategy B: The Database Registry
If you need dynamic updates (e.g., A/B tests), you need a DB.
3.1. Schema Design (Postgres)
We need to support Immutability. Never UPDATE a prompt. Only INSERT.
CREATE TABLE prompt_definitions (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,        -- e.g. "checkout_flow"
    version INT NOT NULL,
    template TEXT NOT NULL,
    model_config JSONB,                -- { "temp": 0.7 }
    created_at TIMESTAMP DEFAULT NOW(),
    author VARCHAR(100),
    is_active BOOLEAN DEFAULT FALSE,
    UNIQUE (name, version)
);

-- Index for fast lookup of "latest"
CREATE INDEX idx_prompt_name_ver ON prompt_definitions (name, version DESC);
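For reference, the "latest" lookup that this index is meant to serve might look like the following (table and columns as defined above; the prompt name is illustrative):

-- Fetch the newest version of a prompt; served by idx_prompt_name_ver
SELECT template, model_config
FROM prompt_definitions
WHERE name = 'checkout_flow'
ORDER BY version DESC
LIMIT 1;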
3.2. Caching Layer (Redis)
You cannot hit Postgres on every LLM call; the extra round trip adds latency to every request.
- Write Path: New Prompt -> Postgres -> Redis Pub/Sub -> App Instances clear cache.
- Read Path: App Memory -> Redis -> Postgres.
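A minimal sketch of the write-path invalidation using Redis Pub/Sub; the channel name `prompt_updates` is illustrative, not part of any standard:

import redis

r = redis.Redis()

# Publisher side: runs after the new version is INSERTed into Postgres
def publish_update(prompt_name: str):
    r.publish("prompt_updates", prompt_name)

# Subscriber side: each app instance runs this in a background thread
def listen_for_updates(local_cache: dict):
    pubsub = r.pubsub()
    pubsub.subscribe("prompt_updates")
    for message in pubsub.listen():
        if message["type"] == "message":
            name = message["data"].decode()
            local_cache.pop(name, None)  # drop the stale in-memory copy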
3.3. The “Stale Prompt” Safety Mechanism
What if the DB is down?
- Pattern: Bake the “Last Known Good” version into the Container Image as a fallback.
- If `reg.get("checkout")` fails, load `./fallbacks/checkout.yaml`.
4. Hands-On Lab: Building the Registry Client
Let’s build a production-grade Python client that handles Versioning and Fallbacks.
4.1. The Interface
class PromptRegistryClient:
    def get_prompt(self, name: str, version: str = "latest", tags: list = None) -> PromptObject:
        pass
4.2. Implementation
import redis
import json
import os

class Registry:
    def __init__(self, redis_url):
        self.redis = redis.from_url(redis_url)

    def get(self, name, version="latest"):
        cache_key = f"prompt:{name}:{version}"
        # 1. Try Cache
        data = self.redis.get(cache_key)
        if data:
            return json.loads(data)
        # 2. Try DB (Mocked here)
        # prompt = db.fetch(...)
        # if not prompt and version == "latest":
        #     raise FatalError("Prompt not found")
        # 3. Fallback to Local File
        try:
            with open(f"prompts/{name}.json") as f:
                print("⚠️ Serving local fallback")
                return json.load(f)
        except FileNotFoundError:
            raise Exception(f"Prompt {name} missing in DB and Disk.")

    def render(self, name, variables, version="latest"):
        p = self.get(name, version)
        return p['template'].format(**variables)
5. Migration Strategy: Git to DB
How do you move a team from Git files to a DB Registry?
5.1. The Deployment Hook
Do not make devs manually insert SQL. Add a step in CI/CD (GitHub Actions).
- Developer: Edits `prompts/login.yaml`. Pushes to Git.
- CI/CD:
  - Parses the YAML.
  - Checks if the content differs from "latest" in the DB.
  - If changed, runs `INSERT INTO prompts ...` (new version).
  - Tags it `sha-123`.
This gives you the Best of Both Worlds:
- Git History for Blame/Review.
- DB for dynamic serving and tracking.
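To automate the hook, a workflow along these lines wires the sync into GitHub Actions; the file name, secret name, and the `sync_prompts.py` entry point (developed in Section 16.1) are illustrative, so adapt them to your repo:

# .github/workflows/sync_prompts.yml
name: Sync prompts to registry
on:
  push:
    paths:
      - "prompts/**"
jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install pyyaml sqlalchemy psycopg2-binary
      - run: python sync_prompts.py prompts/
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}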
6. A/B Testing Prompts
The main reason to use a DB is traffic splitting. “Is the ‘Polite’ prompt better than the ‘Direct’ prompt?”
6.1. The Traffic Splitter
In the Registry, define a “Split Config”.
{
  "name": "checkout_flow",
  "strategies": [
    { "variant": "v12", "weight": 0.9 },
    { "variant": "v13", "weight": 0.1 }
  ]
}
6.2. Deterministic Hashing
Use the user_id to determine the variant. Do not use random().
If User A sees “Variant B” today, they must see “Variant B” tomorrow.
import hashlib

def get_variant(user_id, split_config):
    # Hash user_id deterministically into the range 0-99
    hash_val = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    cumulative = 0
    for strat in split_config['strategies']:
        cumulative += strat['weight'] * 100
        if hash_val < cumulative:
            return strat['variant']
    # Guard against floating-point rounding leaving a tiny uncovered range
    return split_config['strategies'][0]['variant']
In the next section, we will discuss how to Evaluate these variants to decide if V13 is actually better than V12.
7. Unit Testing for Prompts
How do you “Test” a prompt? You can’t assert the exact output string because LLMs are probabilistic. But you can assert:
- Format: Is it valid JSON?
- Determinism: Does the template render correctly?
- Safety: Does it leak PII?
7.1. Rendering Tests
Before sending to OpenAI, test the Jinja2 template.
from jinja2 import Template, StrictUndefined
from jinja2.exceptions import UndefinedError

def test_prompt_rendering():
    # Ensure no {{variable}} is left unrendered
    template = Template("Hello {{name}}", undefined=StrictUndefined)
    # Bad case: 'name' is missing, so StrictUndefined must raise
    try:
        template.render({})
        assert False, "Expected an error for the missing variable"
    except UndefinedError:
        print("Pass")
Ops Rule: Your CI pipeline must fail if a prompt variable is renamed in Python but not in the YAML.
7.2. Assertions (The “Vibe Check” Automator)
Use a library like pytest combined with lightweight LLM checks.
# test_prompts.py
import pytest
from llm_client import run_llm

@pytest.mark.parametrize("name", ["Alice", "Bob"])
def test_greeting_tone(name):
    prompt_template = load_prompt("greeting_v2")
    prompt = prompt_template.format(name=name)
    response = run_llm(prompt, temperature=0)
    # 1. Structure Check
    assert len(response) < 100
    # 2. Semantic Check (Simple)
    assert "Polite" in classify_tone(response)
    # 3. Negative Constraint
    assert "I hate you" not in response
8. Localization (I18N) for Prompts
If your app supports 20 languages, do you write 20 prompts? No. Strategy: English Logic, Localized Content.
8.1. The “English-First” Pattern
LLMs reason best in English, even if the user asks in Japanese. Flow:
- User (JP): “Konnichiwa…”
- App: Translate to English.
- LLM (EN): Reason about the query. Generate English response.
- App: Translate to Japanese.
- Pros: Consistent logic. Easier debugging.
- Cons: Latency (2x translation steps). Loss of nuance.
8.2. The “Native Template” Pattern
Use Jinja2 to swap languages.
# customer_service.yaml
variants:
  en: "You are a helpful assistant."
  es: "Eres un asistente útil."
  fr: "Vous êtes un assistant utile."

def get_prompt(prompt_id, lang="en"):
    p = registry.get(prompt_id)
    template = p['variants'].get(lang, p['variants']['en'])  # Fallback to EN
    return template
Ops Challenge: Maintaining feature parity.
If you update English v2 to include “Ask for email”, you must update es and fr.
Tool: Use GPT-4 to auto-translate the diffs in your CI/CD pipeline.
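A rough sketch of that CI step, assuming the OpenAI ChatCompletion API and treating English as the source of truth (the helper names are made up for illustration):

import openai

def translate_variant(english_template: str, lang: str) -> str:
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Translate this prompt template into {lang}. "
                       f"Keep every {{{{variable}}}} placeholder untouched:\n\n{english_template}"
        }],
        temperature=0,
    )
    return resp["choices"][0]["message"]["content"]

def sync_variants(variants: dict, langs=("es", "fr")) -> dict:
    # Regenerate the non-English variants whenever the English source changes
    for lang in langs:
        variants[lang] = translate_variant(variants["en"], lang)
    return variants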
9. Semantic Versioning for Prompts
What is a v1.0.0 vs v2.0.0 prompt change?
9.1. MAJOR (Breaking)
- Changing `input_variables` (e.g., removing `{user_name}`).
  - Why: Breaks the Python code calling `.format()`.
- Changing the Output Format (e.g., JSON -> XML).
  - Why: Breaks the response parser.
9.2. MINOR (Feature)
- Adding a new Few-Shot example.
- Changing the System Instruction significantly (“Be rude” -> “Be polite”).
- Why: Logic changes, but code signatures remain compatible.
9.3. PATCH (Tweak)
- Fixing a typo.
- Changing whitespace.
Ops Rule: Enforce SemVer in your Registry. A MAJOR change must trigger a new deployment of the App Code. MINOR and PATCH can be hot-swapped via DB.
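One way to enforce this in CI is to derive the bump automatically by diffing the new definition against the previous one. A sketch under the YAML schema from 2.2 (the `output_format` field is an assumed extension):

def classify_change(old: dict, new: dict) -> str:
    # MAJOR: the contract with the calling code changed
    if set(old["input_variables"]) != set(new["input_variables"]):
        return "major"
    if old.get("output_format") != new.get("output_format"):  # assumed optional field
        return "major"
    # PATCH: only whitespace / typo-level edits to the template
    if old["template"].split() == new["template"].split():
        return "patch"
    # MINOR: wording or few-shot changes with a stable interface
    return "minor"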
10. Code Gallery: SQLAlchemy Registry Model
A production-ready ORM definition for the Registry.
from sqlalchemy import Column, Integer, String, JSON, DateTime, UniqueConstraint
from sqlalchemy.orm import declarative_base
from datetime import datetime

Base = declarative_base()

class PromptVersion(Base):
    __tablename__ = 'prompt_versions'

    id = Column(Integer, primary_key=True)
    name = Column(String(255), index=True)
    version = Column(Integer)

    # content
    template = Column(String)           # The Jinja string
    input_variables = Column(JSON)      # ["var1", "var2"]

    # metadata
    model_settings = Column(JSON)       # {"temp": 0.7, "stop": ["\n"]}
    tags = Column(JSON)                 # ["prod", "experiment-A"]
    created_at = Column(DateTime, default=datetime.utcnow)

    __table_args__ = (
        UniqueConstraint('name', 'version', name='_name_version_uc'),
    )

    def to_langchain(self):
        from langchain.prompts import PromptTemplate
        return PromptTemplate(
            template=self.template,
            input_variables=self.input_variables
        )
Usage with FastAPI:
@app.post("/prompts/render")
def render_prompt(req: RenderRequest, db: Session = Depends(get_db)):
    # 1. Fetch
    prompt = db.query(PromptVersion).filter_by(
        name=req.name,
        version=req.version
    ).first()
    # 2. Validate Inputs
    missing = set(prompt.input_variables) - set(req.variables.keys())
    if missing:
        raise HTTPException(400, f"Missing variables: {missing}")
    # 3. Render
    return {"text": prompt.template.format(**req.variables)}
11. Cost Ops: Prompt Compression
Managing prompts is also about managing Length. If prompt v1 is 4000 tokens and v2 is 8000 tokens, you just doubled your cloud bill.
11.1. Compression Strategies
- Stop Words Removal: “The”, “A”, “Is”. (Low impact).
- Summarization: Use a cheap model (GPT-3.5) to summarize the History context before feeding it to GPT-4.
- LLMLingua: A structured compression method (Microsoft).
- Uses a small language model (LLaMA-7B) to calculate the perplexity of each token.
- Removes tokens with low perplexity (low information density).
- Result: up to 20x compression with minimal accuracy loss.
11.2. Implementation
# pip install llmlingua
from llmlingua import PromptCompressor

compressor = PromptCompressor()

original_prompt = "..."  # Long context
compressed = compressor.compress_prompt(
    original_prompt,
    instruction="Summarize this",
    question="What is X?",
    target_token=500
)

print(f"Compressed from {len(original_prompt)} to {len(compressed['compressed_prompt'])}")
# Send compressed['compressed_prompt'] to GPT-4
12. Comparison: Template Engines
Which syntax should your Registry use?
| Engine | Syntax | Pros | Cons | Verdict |
|---|---|---|---|---|
| f-strings | {var} | Python Native. Fast. Zero deps. | Security Risk. Arbitrary code execution if using eval. No logic loops. | Good for prototypes. |
| Mustache | {{var}} | Logic-less. Multi-language support (JS, Go, Py). | No if/else logic. Hard to handle complex few-shot lists. | Good for cross-platform. |
| Jinja2 | {% if x %} | Powerful logic. Loops. Filters. | Python specific. | The Industry Standard. |
| LangChain | {var} | Built-in to framework. | Proprietary syntax quirks. | Use if using LangChain. |
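To make the "logic" argument concrete, here is a small sketch of a few-shot block rendered with a Jinja2 loop, the kind of structure that is awkward in f-strings or Mustache (the example tickets are made up):

from jinja2 import Template

few_shot = Template(
    "Classify the ticket.\n"
    "{% for ex in examples %}"
    "Ticket: {{ ex.ticket }}\nUrgency: {{ ex.urgency }}\n"
    "{% endfor %}"
    "Ticket: {{ ticket }}\nUrgency:"
)

print(few_shot.render(
    examples=[{"ticket": "My server is on fire", "urgency": "High"}],
    ticket="Cannot change my avatar",
))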
13. Glossary of Prompt Management
- Prompt Registry: A centralized database to store, version, and fetch prompts.
- System Prompt: The initial instruction (`"You are a helpful assistant"`) that sets the behavior.
- Zero-Shot: Asking for a completion without examples.
- Few-Shot: Providing examples (`input -> output`) in the context.
- Jinja2: The templating engine used to inject variables into prompts.
- Prompt Injection: A security exploit where user input overrides system instructions.
- Token: The atomic unit of cost.
- Context Window: The maximum memory of the model (e.g. 128k tokens).
14. Bibliography
1. "Jinja2 Documentation", Pallets Projects. The reference for templating syntax.
2. "LLMLingua: Compressing Prompts for Accelerated Inference", Jiang et al. (Microsoft), 2023. The paper on token-dropping optimization.
3. "The Art of Prompt Engineering", OpenAI Cookbook. Getting-started guide.
15. Final Checklist: The “PromptOps” Maturity Model
How mature is your organization?
- Level 0 (Chaos): Hardcoded string literals in Python code.
- Level 1 (Structured): Prompts as constants in a `prompts.py` file.
- Level 2 (GitOps): Prompts in `.yaml` files in Git.
- Level 3 (Registry): Database-backed registry with a UI/CMS.
- Level 4 (Automated): A/B testing framework automatically promoting the winner.
16. Deep Dive: The Hybrid Git+DB Architecture
We said “Git for Logic, DB for Content”. How do you build that?
16.1. The Sync Script
We need a script that runs on every CI/CD deploy. It reads the `/prompts` directory and upserts new versions into Postgres.
# sync_prompts.py
import os
import hashlib
import yaml
from sqlalchemy.orm import Session
from database import Engine, PromptVersion

def calculate_hash(content):
    return hashlib.sha256(content.encode()).hexdigest()

def sync(directory):
    session = Session(Engine)
    for file in os.listdir(directory):
        if not file.endswith(".yaml"):
            continue
        with open(os.path.join(directory, file)) as f:
            data = yaml.safe_load(f)
        content_hash = calculate_hash(data['template'])

        # Check redundancy: skip if this exact content is already registered
        existing = session.query(PromptVersion).filter_by(
            name=data['id'],
            hash=content_hash
        ).first()
        if existing:
            print(f"Skipping {data['id']} (No change)")
            continue

        # Create new version
        latest = session.query(PromptVersion).filter_by(
            name=data['id']
        ).order_by(PromptVersion.version.desc()).first()
        new_ver = (latest.version + 1) if latest else 1

        pv = PromptVersion(
            name=data['id'],
            version=new_ver,
            template=data['template'],
            hash=content_hash,
            author="system (git)"
        )
        session.add(pv)
        print(f"Deployed {data['id']} v{new_ver}")
    session.commit()
16.2. The UI Overlay
The “Admin Panel” reads from DB. If a PM edits a prompt in the Admin Panel:
- We save a new version `v2.1 (draft)` in the DB.
- We allow them to "Test" it in the UI.
- We do not promote it to `latest` automatically.
- Option A: The UI generates a Pull Request via the GitHub API to update the YAML file.
- Option B: The UI updates the DB, and the App reads from the DB. Git becomes a "Backup".
- Recommendation: Option A (Git as Truth). A sketch of the PR automation follows below.
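A minimal sketch of Option A using PyGithub; the library choice, branch naming, and repo layout are assumptions, since the source only says "via GitHub API":

from github import Github

def open_prompt_pr(token, repo_name, prompt_path, new_yaml, draft_version):
    repo = Github(token).get_repo(repo_name)
    branch = f"prompt-edit-{draft_version}"
    base = repo.get_branch("main")
    # Create a branch for the proposed edit
    repo.create_git_ref(ref=f"refs/heads/{branch}", sha=base.commit.sha)
    current = repo.get_contents(prompt_path, ref=branch)
    repo.update_file(
        path=prompt_path,
        message=f"Update prompt to {draft_version} (edited in Admin Panel)",
        content=new_yaml,
        sha=current.sha,
        branch=branch,
    )
    return repo.create_pull(
        title=f"Prompt update {draft_version}",
        body="Proposed from the prompt Admin Panel.",
        head=branch,
        base="main",
    )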
17. Operational Efficiency: Semantic Caching
If two users ask the same thing, we pay twice. Exact match caching (“Hello” vs “Hello “) fails. Semantic Caching saves money.
17.1. Architecture
- User Query: "How do I reset password?"
- Embed: `[0.1, 0.2, ...]`
- Vector Search (Redis VSS): Find neighbors.
- Found: "Reset my pass" (Distance 0.1).
- Action: Return cached answer.
17.2. Implementation with GPTCache
GPTCache is the standard library for this.
from gptcache import cache, Config
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.embedding import Onnx
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation
from gptcache.adapter import openai  # drop-in ChatCompletion that checks the cache first

# 1. Init
onnx = Onnx()
cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=get_data_manager(
        CacheBase("sqlite"),
        VectorBase("faiss", dimension=onnx.dimension),
    ),
    similarity_evaluation=SearchDistanceEvaluation(),
    config=Config(similarity_threshold=0.9),  # strict match
)
cache.set_openai_key()

# 2. Call OpenAI through the patched adapter
def cached_completion(prompt):
    return openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}]
    )

# 3. Validation
# First call: ~3000ms (API)
# Second call: ~10ms (Cache)
17.3. The Cache Invalidation Problem
If you update the Prompt Template (v1 -> v2), all cache entries are invalid.
Ops Rule: Cache Key must include prompt_version_hash.
Key = Embed(UserQuery) + Hash(SystemPrompt).
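A minimal sketch of that rule; the key layout is illustrative:

import hashlib

def semantic_cache_key(query_vector_id: str, system_prompt: str) -> str:
    # Namespace cached answers by the current prompt version,
    # so deploying v2 silently invalidates every v1 entry.
    prompt_hash = hashlib.sha256(system_prompt.encode()).hexdigest()[:12]
    return f"semcache:{prompt_hash}:{query_vector_id}"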
18. Governance: RBAC for Prompts
Who controls the brain of the AI?
18.1. Roles
- Developer: Full access to Code and YAML.
- Product Manager: Can Edit Content in UI. Cannot deploy to Prod without approval.
- Legal/Compliance: Read-Only. Can flag prompts as “Unsafe”.
- System: CI/CD bot.
18.2. Approval Workflow
Implementing “Prompt Review” gates.
- Trigger: Any change to `prompts/legal/*.yaml`.
- Gate: CI fails unless the `CODEOWNERS` (@legal-team) approve the PR (see the snippet below).
- Why: You don't want a dev accidentally changing the liability waiver.
19. Case Study: Migration from “Magic Strings”
You joined a startup. They have 50 files with f"Translate {x}".
How do you fix this?
Phase 1: Discovery (Grep)
Run `grep -r "openai.Chat" .` to build an inventory; it turns up 32 call sites.
Phase 2: Refactor (The “Proxy”)
Create registry.py with a simple mapping.
Don’t move to DB yet. Just move strings to one file.
# prompts.py
PROMPTS = {
    "translation": "Translate {x}",
    "summary": "Summarize {x}"
}

# In app code, replace the literal with:
# prompt = PROMPTS["translation"].format(x=...)
Phase 3: Externalize (YAML)
Move dictionary to prompts.yaml.
Ops Team can now see them.
Phase 4: Instrumentation (W&B)
Add W&B Tracing. Discover that “Summary” fails 20% of the time.
Phase 5: Optimization
Now you can iterate on “Summary” in the YAML without touching the App Code. Result: You lowered error rate to 5%. Value: You proved MLOps ROI.
20. Code Gallery: The Migration Script
A script to hunt down magic strings and propose a refactor.
import ast
import os

class PromptHunter(ast.NodeVisitor):
    def visit_Call(self, node):
        # Look for openai.ChatCompletion.create (any attribute call named .create)
        if isinstance(node.func, ast.Attribute) and node.func.attr == 'create':
            print(f"Found OpenAI call at line {node.lineno}")
            # Analyze arguments for 'messages'
            for keyword in node.keywords:
                if keyword.arg == 'messages':
                    print("  'messages' argument found. Manual review needed.")
        self.generic_visit(node)

def scan(directory):
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith(".py"):
                with open(os.path.join(root, file)) as f:
                    try:
                        tree = ast.parse(f.read())
                        print(f"Scanning {file}...")
                        PromptHunter().visit(tree)
                    except (SyntaxError, UnicodeDecodeError):
                        pass  # skip files that do not parse

if __name__ == "__main__":
    scan("./src")
21. Summary
We have built a System of Record for our prompts. No more magic strings. No more “Who changed that?”. No more deploying code to fix a typo.
We have Versioned, Tested, and Localized our probabilistic logic. Now, we need to know if our prompts are any good. Metrics like “Accuracy” are fuzzy in GenAI. In the next chapter, we build the Evaluation Framework (21.2).
22. Architecture Patterns: The Prompt Middleware
We don’t just call the registry. We often need “Interceptors”.
22.1. The Chain of Responsibility Pattern
A request goes through layers:
- Auth Layer: Checks JWT.
- Rate Limit Layer: Checks Redis quota.
- Prompt Layer: Fetches template from Registry.
- Guardrail Layer: Scans input for Injection.
- Cache Layer: Checks semantic cache.
- Model Layer: Calls Azure/OpenAI.
- Audit Layer: Logs result to Data Lake.
Code Skeleton:
class Middleware:
    def __init__(self, next_layer=None):
        self.next = next_layer

    def process(self, req):
        # pre-hook
        resp = self.next.process(req)
        # post-hook
        return resp

class PromptMiddleware(Middleware):
    def process(self, req):
        prompt = registry.get(req.prompt_id)
        req.rendered_text = prompt.format(**req.vars)
        return self.next.process(req)
22.2. The Circuit Breaker Pattern
Trip the breaker if OpenAI is down or latency exceeds 5 seconds.
- State: Closed (Normal), Open (Failing), Half-Open (Testing).
- Fallback: If State == Open, switch to `Azure` or `Llama-Local`.
- Registry Implication: Your Registry must store multiple model configs for the same prompt ID (see the sketch below).
  - v1 (Primary): `gpt-4`
  - v1 (Fallback): `gpt-3.5-turbo`
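A small sketch of how the registry entry and the breaker interact; the `models` structure is illustrative, not a fixed schema from this chapter:

PROMPT_ENTRY = {
    "id": "checkout_flow_v1",
    "template": "...",
    "models": {
        "primary":  {"provider": "openai", "model_name": "gpt-4"},
        "fallback": {"provider": "azure", "model_name": "gpt-3.5-turbo"},
    },
}

def select_model(entry, breaker_state: str):
    # breaker_state is one of "closed", "open", "half_open"
    return entry["models"]["fallback" if breaker_state == "open" else "primary"]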
23. The Tooling Landscape: Build vs. Buy
You can build the Registry (as we did), or buy it.
23.1. General Purpose (Encouraged)
- Weights & Biases (W&B):
- Pros: You likely already use it for Training. “Prompts” are just artifacts. Good visualization.
- Cons: No real-time serving SLA. Use it for Logging, not Serving.
- MLflow:
- Pros: Open Source. “AI Gateway” feature.
- Cons: Heavyweight to self-host.
23.2. Specialized PromptOps (Niche)
- LangSmith:
- Pros: Essential if using LangChain. “Playground” is excellent.
- Cons: Vendor lock-in risk.
- Helicone:
- Pros: Focus on Caching and Analytics. “Proxy” architecture (change 1 line of URL).
- Cons: Smaller ecosystem.
- PromptLayer:
- Pros: Great visual CMS for PMs.
- Cons: Another SaaS bill.
Verdict:
- Start with Git + W&B (Logging).
- Move to Postgres + Redis (Serving) when you hit 10k users.
- Use Helicone if you purely want Caching/Monitoring without build effort.
24. Comparison: Configuration Formats
We used YAML. Why not JSON?
| Format | Readability | Comments? | Multi-line Strings? | Verdict |
|---|---|---|---|---|
| JSON | Low (quotes everywhere) | No | No (needs `\n`) | Bad. Hard for humans to write prompts in. |
| YAML | High | Yes | Yes (using the `\|` block operator) | The winner for prompt files. |
| TOML | High | Yes | Yes (using `"""`) | Good. Popular in Rust/Python config. |
| Python | Medium | Yes | Yes | Okay, but dangerous (arbitrary execution). |
Why YAML Wins: the `|` block operator.

template: |
  You are a helpful assistant.
  You answer in haikus.

This preserves newlines perfectly without ugly `\n` characters.
25. Final Ops Checklist: The “Prompt Freeze”
Before Black Friday (or Launch Day):
- Registry Lock: Revoke “Write” access to the Registry for all non-Admins.
- Cache Warmup: Run a script to populate Redis with the top 1000 queries.
- Fallback Verification: Kill the OpenAI connection and ensure the app switches to Azure (or error handles gracefully).
- Token Budget: Verify current burn rate projected against traffic spike.
- Latency Budget: Verify P99 is under 2s.
26. Code Gallery: The Complete Registry (Pydantic)
A production-grade implementation you can copy-paste.
from typing import List, Optional, Dict, Any
from pydantic import BaseModel, Field, validator
from datetime import datetime
import yaml
import hashlib
import os

# 1. Models
class PromptMetadata(BaseModel):
    author: str
    tags: List[str] = []
    created_at: datetime = Field(default_factory=datetime.utcnow)
    deprecated: bool = False

class ModelConfig(BaseModel):
    provider: str                       # "openai", "azure"
    model_name: str                     # "gpt-4"
    parameters: Dict[str, Any] = {}     # {"temperature": 0.5}

class PromptVersion(BaseModel):
    id: str                             # "checkout_flow"
    version: int                        # 1, 2, 3
    template: str
    input_variables: List[str]
    config: ModelConfig
    metadata: PromptMetadata
    hash: Optional[str] = None

    @validator('input_variables')
    def check_template_vars(cls, v, values):
        # Validate that declared variables actually appear in the template.
        # Simple string check (in reality use the Jinja AST).
        template = values.get('template', '')
        for i in v:
            token = f"{{{{{i}}}}}"  # {{var}}
            if token not in template:
                raise ValueError(f"Variable {i} declared but not used in template")
        return v

    def calculate_hash(self):
        content = f"{self.template}{self.config.json()}"
        self.hash = hashlib.sha256(content.encode()).hexdigest()

# 2. Storage Interface
class RegistryStore:
    def save(self, prompt: PromptVersion):
        raise NotImplementedError

    def get(self, id: str, version: int = None) -> PromptVersion:
        raise NotImplementedError

# 3. File System Implementation
class FileRegistry(RegistryStore):
    def __init__(self, root_dir="./prompts"):
        self.root = root_dir
        os.makedirs(root_dir, exist_ok=True)

    def save(self, prompt: PromptVersion):
        prompt.calculate_hash()
        path = f"{self.root}/{prompt.id}_v{prompt.version}.yaml"
        with open(path, 'w') as f:
            yaml.dump(prompt.dict(), f)

    def get(self, id: str, version: int = None) -> PromptVersion:
        if version is None:
            # Find latest
            files = [f for f in os.listdir(self.root) if f.startswith(f"{id}_v")]
            if not files:
                raise FileNotFoundError(id)
            # Sort by version number
            version = max(int(f.split('_v')[1].split('.yaml')[0]) for f in files)
        path = f"{self.root}/{id}_v{version}.yaml"
        with open(path) as f:
            data = yaml.safe_load(f)
        return PromptVersion(**data)

# 4. Usage
if __name__ == "__main__":
    # Create
    p = PromptVersion(
        id="summarize",
        version=1,
        template="Summarize this: {{text}}",
        input_variables=["text"],
        config=ModelConfig(provider="openai", model_name="gpt-3.5"),
        metadata=PromptMetadata(author="alex")
    )
    reg = FileRegistry()
    reg.save(p)
    print("Saved.")

    # Load
    p2 = reg.get("summarize")
    print(f"Loaded v{p2.version}: {p2.config.model_name}")
27. Future Architecture: The Prompt Compiler
In 2025, we won’t write prompts. We will write Intent. DSPy (Declarative Self-improving Language Programs) is leading this.
- You write: `Maximize(Accuracy)`.
- Compiler: Automatically tries 50 variations of the prompt ("Think step by step", "Act as an expert") and selects the best one based on your validation set.
- Ops: The "Prompt Registry" becomes a "Program Registry". The artifacts are optimized weights/instructions, not human-readable text.
- Constraint: Requires a labeled validation set (Golden Data). A rough sketch follows below.
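A rough sketch of the compiler idea with DSPy; its API evolves quickly, so treat the class, optimizer, and example data below as indicative rather than definitive:

import dspy
from dspy.teleprompt import BootstrapFewShot

class Triage(dspy.Signature):
    """Classify ticket urgency."""
    ticket = dspy.InputField()
    urgency = dspy.OutputField(desc="High, Medium, or Low")

program = dspy.Predict(Triage)

# Golden Data: the labeled set is the hard requirement (examples are made up)
train_examples = [
    dspy.Example(ticket="My server is on fire", urgency="High").with_inputs("ticket"),
    dspy.Example(ticket="Change my avatar color", urgency="Low").with_inputs("ticket"),
]

def exact_match(example, prediction, trace=None):
    return example.urgency == prediction.urgency

# The "compile" step searches over prompt variations / demonstrations
optimizer = BootstrapFewShot(metric=exact_match)
compiled_program = optimizer.compile(program, trainset=train_examples)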
28. Epilogue
Chapter 21.1 has transformed the “Magic String” into a “Managed Artifact”.
But a managed artifact is useless if it’s bad.
How do we know if v2 is better than v1?
We cannot just “eyeball” it.
We need Metrics.
Proceed to Chapter 21.2: Evaluation Frameworks.
29. Recommended Reading
- The Pragmatic Programmer: For the ‘Don’t Repeat Yourself’ (DRY) principle applied to prompts.
- Site Reliability Engineering (Google): For the ‘Error Budget’ concept applied to hallucinations.
- LangChain Handbook (Pinecone): Excellent patterns for prompt management.