Chapter 11: The Feature Store Architecture
11.4. Open Source: Deploying Feast on EKS/GKE with Redis
“The only thing worse than a proprietary lock-in that slows you down is building your own platform that slows you down even more. But when done right, open source is the only path to true sovereignty over your data semantics.” — Infrastructure Engineering Maxim
In the previous sections, we explored the managed offerings: AWS SageMaker Feature Store and Google Cloud Vertex AI Feature Store. These services offer the allure of the "Easy Button": managed infrastructure, integrated security, and SLA-backed availability. For the high-maturity organization, however, they often present serious friction: opaque pricing, limited support for complex custom data types, vendor lock-in, and latency floors that are too high for high-frequency trading or real-time ad bidding.
Enter Feast (Feature Store).
Feast has emerged as the de-facto open-source standard for feature stores. It is not a database; it is a connector. It manages the registry of features, standardizes the retrieval of data for training (offline) and serving (online), and orchestrates the movement of data between the two.
Deploying Feast effectively requires a shift in mindset from “Consumer” to “Operator.” You are no longer just calling an API; you are responsible for the CAP theorem properties of your serving layer. You own the Redis eviction policies. You own the Kubernetes Horizontal Pod Autoscalers. You own the synchronization lag.
This section serves as the definitive reference architecture for deploying Feast in a high-scale production environment, leveraging Kubernetes (EKS/GKE) for compute and Managed Redis (ElastiCache/Memorystore) for state.
11.4.1. The Architecture of Self-Hosted Feast
To operate Feast, one must understand its anatomy. Unlike its early versions (0.9 and below), modern Feast (0.10+) is highly modular and unopinionated about infrastructure. It does not require a heavy JVM stack or Kafka by default. It runs where your compute runs.
The Core Components
1. The Registry: The central catalog. It maps feature names (user_churn_score) to data sources (Parquet on S3) and entity definitions (user_id).
   - Production Storage: An object store bucket (S3/GCS) or a SQL database (PostgreSQL).
   - Behavior: Clients (training pipelines, inference services) pull the registry to understand how to fetch data.
2. The Offline Store: The historical data warehouse. Feast does not manage this data; it manages the queries against it.
   - AWS: Redshift, Snowflake, or Athena (S3).
   - GCP: BigQuery.
   - Role: Used for generating point-in-time correct training datasets (see the retrieval sketch after this list).
3. The Online Store: The low-latency cache. This is the critical piece for real-time inference.
   - AWS: ElastiCache for Redis.
   - GCP: Cloud Memorystore for Redis.
   - Role: Serves the latest known value of a feature for a specific entity ID at millisecond latency.
4. The Feature Server: A lightweight HTTP/gRPC service (usually Python or Go) that exposes the retrieval API.
   - Deployment: A scalable microservice on Kubernetes.
   - Role: It parses the request, hashes the entity keys, queries Redis, deserializes the Protobuf payloads, and returns the feature vector.
5. The Materialization Engine: The worker process that moves data from Offline to Online.
   - Deployment: Airflow DAGs, Kubernetes CronJobs, or a stream processor.
   - Role: Ensures the Online Store is eventually consistent with the Offline Store.
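To make the two retrieval paths concrete, here is a minimal sketch. It assumes a feature view named user_churn_features (defined later in this section) exists and is keyed by a user_id entity; the entity values and timestamps are illustrative.
from datetime import datetime
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Offline path: point-in-time correct join against the Offline Store,
# used to build a training dataset.
entity_df = pd.DataFrame({
    "user_id": [1001, 1002],
    "event_timestamp": [datetime(2023, 6, 1), datetime(2023, 6, 2)],
})
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "user_churn_features:total_purchases",
        "user_churn_features:avg_order_value",
    ],
).to_df()

# Online path: latest known values from the Online Store (Redis),
# used at inference time.
feature_vector = store.get_online_features(
    features=["user_churn_features:total_purchases"],
    entity_rows=[{"user_id": 1001}],
).to_dict()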
The “Thin Client” vs. “Feature Server” Model
One of the most significant architectural decisions you will make is how your inference service consumes features.
Pattern A: The Embedded Client (Fat Client)
- How: Your inference service (e.g., a FastAPI container running the model) imports the feast Python library directly. It connects to Redis and the Registry itself.
- Pros: Lowest possible latency (no extra network hop).
- Cons: Tight coupling. Your inference image bloats with Feast dependencies. Configuration updates (e.g., changing Redis endpoints) require redeploying the model container.
- Verdict: Use for extreme latency sensitivity (< 5ms).

Pattern B: The Feature Service (Sidecar or Microservice)
- How: You deploy the Feast Feature Server as a standalone deployment behind a Service/LoadBalancer. Your model calls POST /get-online-features.
- Pros: Decoupling. The Feature Server can scale independently of the model. Multiple models can share the same feature server.
- Cons: Adds network latency (serialization + wire time).
- Verdict: The standard enterprise pattern. Easier to secure and govern.

Both patterns are sketched in code after this list.
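A minimal sketch of both consumption patterns. The in-cluster hostname and feature references are illustrative, and the request payload follows the shape the Feast Python feature server expects; verify the exact fields against your Feast version.
import requests
from feast import FeatureStore

# Pattern A: embedded (fat) client; the inference container talks to Redis directly.
store = FeatureStore(repo_path="/app")  # feature_store.yaml baked into the image
vector_a = store.get_online_features(
    features=["user_churn_features:total_purchases"],
    entity_rows=[{"user_id": 1001}],
).to_dict()

# Pattern B: call the standalone Feature Server over HTTP.
resp = requests.post(
    "http://feast-feature-server.ml-platform.svc.cluster.local/get-online-features",
    json={
        "features": ["user_churn_features:total_purchases"],
        "entities": {"user_id": [1001]},
    },
)
vector_b = resp.json()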
11.4.2. The AWS Reference Architecture (EKS + ElastiCache)
Building this on AWS requires navigating the VPC networking intricacies of connecting EKS (Kubernetes) to ElastiCache (Redis).
1. Network Topology
Do not expose Redis to the public internet. Do not peer VPCs unnecessarily.
- VPC: One VPC for the ML Platform.
- Subnets:
- Private App Subnets: Host the EKS Worker Nodes.
- Private Data Subnets: Host the ElastiCache Subnet Group.
- Security Groups:
- sg-eks-nodes: Allow outbound 6379 to sg-elasticache.
- sg-elasticache: Allow inbound 6379 from sg-eks-nodes.
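A hedged Terraform sketch of these two rules, assuming the security groups themselves are declared elsewhere as aws_security_group.eks_nodes and aws_security_group.elasticache:
resource "aws_security_group_rule" "eks_to_redis_egress" {
  type                     = "egress"
  from_port                = 6379
  to_port                  = 6379
  protocol                 = "tcp"
  security_group_id        = aws_security_group.eks_nodes.id
  source_security_group_id = aws_security_group.elasticache.id
}

resource "aws_security_group_rule" "redis_from_eks_ingress" {
  type                     = "ingress"
  from_port                = 6379
  to_port                  = 6379
  protocol                 = "tcp"
  security_group_id        = aws_security_group.elasticache.id
  source_security_group_id = aws_security_group.eks_nodes.id
}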
2. The Online Store: ElastiCache for Redis
We choose Cluster Mode Enabled for scale. If your feature set fits in one node (< 25GB), Cluster Mode Disabled is simpler, but ML systems tend to grow.
Terraform Implementation Detail:
resource "aws_elasticache_replication_group" "feast_online_store" {
replication_group_id = "feast-production-store"
description = "Feast Online Store for Low Latency Serving"
node_type = "cache.r6g.xlarge" # Graviton2 for cost/performance
port = 6379
parameter_group_name = "default.redis7.cluster.on"
automatic_failover_enabled = true
# Cluster Mode Configuration
num_node_groups = 3 # Shards
replicas_per_node_group = 1 # High Availability
subnet_group_name = aws_elasticache_subnet_group.ml_data.name
security_group_ids = [aws_security_group.elasticache.id]
at_rest_encryption_enabled = true
transit_encryption_enabled = true
auth_token = var.redis_auth_token # Or utilize IAM Auth if supported by client
}
3. The Registry: S3 + IAM Roles for Service Accounts (IRSA)
Feast needs to read the registry file (registry.db or registry.pb) from S3. The Feast Feature Server running in a Pod should not have hardcoded AWS keys.
- Create an OIDC Provider for the EKS cluster.
- Create an IAM Role with s3:GetObject and s3:PutObject permissions on the registry bucket.
- Annotate the ServiceAccount:
apiVersion: v1
kind: ServiceAccount
metadata:
name: feast-service-account
namespace: ml-platform
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/FeastRegistryAccessRole
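For completeness, a sketch of what the IAM side of IRSA might look like in Terraform. The bucket path, namespace, and service account name mirror the examples above; the aws_iam_openid_connect_provider.eks resource is assumed to be defined elsewhere.
# Assumption: aws_iam_openid_connect_provider.eks already exists for the cluster
data "aws_iam_policy_document" "feast_registry_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.eks.arn]
    }
    condition {
      test     = "StringEquals"
      variable = "${replace(aws_iam_openid_connect_provider.eks.url, "https://", "")}:sub"
      values   = ["system:serviceaccount:ml-platform:feast-service-account"]
    }
  }
}

resource "aws_iam_role" "feast_registry_access" {
  name               = "FeastRegistryAccessRole"
  assume_role_policy = data.aws_iam_policy_document.feast_registry_trust.json
}

resource "aws_iam_role_policy" "feast_registry_s3" {
  name = "feast-registry-s3"
  role = aws_iam_role.feast_registry_access.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject", "s3:PutObject"]
      Resource = "arn:aws:s3:::my-ml-platform-bucket/feast/*"
    }]
  })
}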
4. The Feast Configuration (feature_store.yaml)
This file controls how Feast connects. In a containerized environment, we inject secrets via Environment Variables.
project: my_organization_ml
registry: s3://my-ml-platform-bucket/feast/registry.pb
provider: aws
online_store:
  type: redis
  redis_type: redis_cluster  # matches the cluster-mode-enabled ElastiCache group
  # The cluster endpoint from Terraform output; TLS and the auth token (injected
  # via a K8s Secret) are passed as comma-separated options on the connection string
  connection_string: master.feast-production-store.xxxxxx.use1.cache.amazonaws.com:6379,ssl=true,password=${REDIS_AUTH_TOKEN}
offline_store:
type: snowflake.offline
account: ${SNOWFLAKE_ACCOUNT}
user: ${SNOWFLAKE_USER}
database: ML_FEATURES
warehouse: COMPUTE_WH
11.4.3. The GCP Reference Architecture (GKE + Memorystore)
Google Cloud offers a smoother integration for networking but stricter constraints on the Redis service types.
1. Network Topology: VPC Peering
Memorystore instances reside in a Google-managed project. To access them from GKE, you must use Private Services Access (VPC Peering).
- Action: Allocate an IP range (CIDR /24) for Service Networking.
- Constraint: Memorystore for Redis (Basic/Standard Tier) does not support “Cluster Mode” in the same way open Redis does. It uses a Primary/Read-Replica model. For massive scale, you might need Memorystore for Redis Cluster (a newer offering).
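A minimal Terraform sketch of the Private Services Access setup, assuming the VPC is declared as google_compute_network.vpc_network (as in the Memorystore example below); the range name and /24 prefix are illustrative.
resource "google_compute_global_address" "private_service_range" {
  name          = "feast-psa-range"
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  prefix_length = 24
  network       = google_compute_network.vpc_network.id
}

resource "google_service_networking_connection" "private_service_access" {
  network                 = google_compute_network.vpc_network.id
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.private_service_range.name]
}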
2. The Online Store: Memorystore
For most use cases, a Standard Tier (High Availability) instance suffices.
Terraform Implementation Detail:
resource "google_redis_instance" "feast_online_store" {
name = "feast-online-store"
tier = "STANDARD_HA"
memory_size_gb = 50
location_id = "us-central1-a"
alternative_location_id = "us-central1-f"
authorized_network = google_compute_network.vpc_network.id
connect_mode = "PRIVATE_SERVICE_ACCESS"
redis_version = "REDIS_7_0"
display_name = "Feast Feature Store Cache"
# Auth is critical
auth_enabled = true
}
3. GKE Workload Identity
Similar to AWS IRSA, GKE Workload Identity maps a Kubernetes Service Account (KSA) to a Google Service Account (GSA).
- GSA: feast-sa@my-project.iam.gserviceaccount.com has roles/storage.objectAdmin (for Registry GCS) and roles/bigquery.dataViewer (for Offline Store).
- Binding:
gcloud iam service-accounts add-iam-policy-binding feast-sa@... \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:my-project.svc.id.goog[ml-platform/feast-sa]"
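On the Kubernetes side, the KSA carries an annotation pointing at the GSA. A minimal sketch, reusing the namespace and account names from the binding above:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: feast-sa
  namespace: ml-platform
  annotations:
    iam.gke.io/gcp-service-account: feast-sa@my-project.iam.gserviceaccount.com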
11.4.4. Deploying the Feast Feature Server
Whether on AWS or GCP, the Feature Server is a stateless deployment.
The Dockerfile
We need a lean image. Start with a Python slim base.
FROM python:3.10-slim
# Install system dependencies for building C extensions (if needed)
RUN apt-get update && apt-get install -y build-essential
# Install Feast with specific extras
# redis: for online store
# snowflake/postgres/bigquery: for offline store dependencies
RUN pip install "feast[redis,snowflake,aws]" gunicorn
WORKDIR /app
COPY feature_store.yaml .
# We assume the registry is pulled from S3/GCS at runtime or pointed to via S3 path
# The Feast CLI exposes a server command
# --no-access-log is crucial for high throughput performance
CMD ["feast", "serve", "--host", "0.0.0.0", "--port", "6566", "--no-access-log"]
The Kubernetes Deployment
This is where we define the scale.
apiVersion: apps/v1
kind: Deployment
metadata:
name: feast-feature-server
namespace: ml-platform
spec:
replicas: 3
selector:
matchLabels:
app: feast-server
template:
metadata:
labels:
app: feast-server
spec:
serviceAccountName: feast-service-account # Critical for IRSA/Workload Identity
containers:
- name: feast
image: my-registry/feast-server:v1.0.0
env:
- name: FEAST_USAGE
value: "False" # Disable telemetry
- name: REDIS_AUTH_TOKEN
valueFrom:
secretKeyRef:
name: redis-secrets
key: auth-token
resources:
requests:
cpu: "1000m"
memory: "2Gi"
limits:
cpu: "2000m"
memory: "4Gi"
readinessProbe:
tcpSocket:
port: 6566
initialDelaySeconds: 5
periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
name: feast-feature-server
spec:
selector:
app: feast-server
ports:
- protocol: TCP
port: 80
targetPort: 6566
type: ClusterIP
Autoscaling (HPA)
The CPU usage of the Feature Server is dominated by Protobuf serialization/deserialization. It is CPU-bound.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: feast-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: feast-feature-server
minReplicas: 3
maxReplicas: 50
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 60
11.4.5. The Materialization Engine (Syncing Data)
The “Online Store” in Redis is useless if it’s empty. We must populate it from the Offline Store (Data Warehouse). This process is called Materialization.
The Challenge of Freshness
- Full Refresh: Overwriting the entire Redis cache. Safe but slow. High IOPS.
- Incremental: Only writing rows that changed since the last run.
In a naive setup, engineers run feast materialize-incremental from their laptop. In production, this must be orchestrated.
Pattern: The Airflow Operator
Using Apache Airflow (Managed Workflows for Apache Airflow on AWS or Cloud Composer on GCP) is the standard pattern.
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator
# Definition of the sync window
# We want to sync data up to the current moment
default_args = {
'owner': 'ml-platform',
'retries': 3,
'retry_delay': timedelta(minutes=5),
}
with DAG(
'feast_materialization_hourly',
default_args=default_args,
schedule_interval='0 * * * *', # Every hour
start_date=datetime(2023, 1, 1),
catchup=False,
) as dag:
# The Docker image used here must match the feature definitions
# It must have access to feature_store.yaml and credentials
materialize = BashOperator(
task_id='materialize_features',
bash_command='cd /app && feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")',
)
Pattern: Stream Materialization (Near Real-Time)
For features like “number of clicks in the last 10 minutes,” hourly batch jobs are insufficient. You need streaming. Feast supports Push Sources.
- Event Source: Kafka or Kinesis.
- Stream Processor: Flink or Spark Streaming.
- Feast Push API: The processor calculates the feature and pushes it directly to the Feast Online Store, bypassing the Offline Store synchronization lag.
# In your stream processor (e.g., a Spark Structured Streaming foreach sink)
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

def write_to_online_store(row):
    # Feast's push API expects a DataFrame, so wrap the row in a single-row frame
    df = pd.DataFrame([row.asDict()])
    # Push to the Online Store via the registered push source
    store.push("click_stream_push_source", df)
11.4.6. Operational Challenges and Performance Tuning
Deploying Feast is easy; keeping it fast and consistent at scale is hard.
1. Redis Memory Management
Redis is an in-memory store. RAM is expensive.
- The Debt: You define a feature user_embedding (a 768-float vector). You have 100M users.
  - Size = 100M * 768 * 4 bytes ≈ 300 GB.
  - This requires a massive Redis cluster (e.g., AWS cache.r6g.4xlarge clusters).
- The Fix: Use Entity TTL.
  - Feast allows setting a TTL (Time To Live) on features. ttl=timedelta(days=7) means "if the user hasn't been active in 7 days, let Redis evict their features."
  - Feast Configuration: Feast uses Redis hashes. It does not natively map Feast TTL to Redis TTL perfectly in all versions. You may need to rely on Redis maxmemory-policy allkeys-lru to handle eviction when memory is full.
2. Serialization Overhead (The Protobuf Tax)
Feast stores data in Redis as Protocol Buffers.
- Write Path: Python Object -> Protobuf -> Bytes -> Redis.
- Read Path: Redis -> Bytes -> Protobuf -> Python Object -> JSON (HTTP response).
- Impact: At 10,000 RPS, CPU becomes the bottleneck, not Redis network I/O.
- Mitigation: Use the Feast Go Server or Feast Java Server (alpha features) if Python’s Global Interpreter Lock (GIL) becomes a blocker. Alternatively, scale the Python Deployment horizontally.
3. The “Thundering Herd” on Registry
If you have 500 pods of your Inference Service starting simultaneously (e.g., after a deploy), they all try to download registry.pb from S3.
- Result: S3 503 Slow Down errors or latency spikes.
- Mitigation: Set cache_ttl_seconds in the Feature Store config. This caches the registry in memory in the client/server, checking for updates only periodically.
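In feature_store.yaml this looks like the following fragment; the path mirrors the earlier registry example, and the 5-minute TTL is an illustrative value.
registry:
  registry_type: file
  path: s3://my-ml-platform-bucket/feast/registry.pb
  cache_ttl_seconds: 300  # refresh the in-memory registry at most every 5 minutes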
4. Connection Pooling
Standard Redis clients in Python create a new connection per request or use a small pool. In Kubernetes with sidecars (Istio/Envoy), connection management can get messy.
- Symptom: RedisTimeoutError or ConnectionRefusedError.
- Fix: Tune the redis_pool_size in the Feast config (passed to the underlying redis-py client). Ensure tcp_keepalive is enabled to detect dead connections in cloud networks.
11.4.7. Feature Definition Management: GitOps for Data
How do you manage the definitions of features? Do not let Data Scientists run feast apply from their laptops against the production registry; that is a recipe for schema drift.
The GitOps Workflow
1. Repository Structure:

my-feature-repo/
├── features/
│   ├── user_churn.py
│   ├── product_recs.py
├── feature_store.yaml
└── .github/workflows/feast_apply.yml

2. The feature_store.yaml: The configuration is versioned.

3. Feature Definitions as Code:

# features/user_churn.py
from feast import Entity, Feature, FeatureView, ValueType
from feast.data_source import FileSource
from datetime import timedelta

user = Entity(name="user", value_type=ValueType.INT64, description="User ID")

user_features_source = FileSource(
    path="s3://data/user_features.parquet",
    event_timestamp_column="event_timestamp"
)

user_churn_fv = FeatureView(
    name="user_churn_features",
    entities=[user],
    ttl=timedelta(days=365),
    features=[
        Feature(name="total_purchases", dtype=ValueType.INT64),
        Feature(name="avg_order_value", dtype=ValueType.DOUBLE),
        Feature(name="days_since_last_purchase", dtype=ValueType.INT64)
    ],
    source=user_features_source
)

4. CI/CD Pipeline (GitHub Actions):

# .github/workflows/feast_apply.yml
name: Deploy Features
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Setup Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.9'
      - name: Install Feast
        run: pip install feast[redis,aws]
      - name: Validate Features
        run: |
          cd my-feature-repo
          feast plan
      - name: Deploy to Production
        if: github.ref == 'refs/heads/main'
        run: |
          cd my-feature-repo
          feast apply
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      - name: Materialize Features
        run: |
          cd my-feature-repo
          feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")

5. Pull Request Review: Feature changes require approval from the ML Platform team.
11.4.8. Real-World Case Study: E-Commerce Personalization
Company: ShopCo (anonymized retailer)
Challenge: Deploy Feast on EKS to serve 20M users, 50k requests/second peak.
Architecture:
# Production Infrastructure (Terraform + Helm)
# 1. EKS Cluster
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "18.0"
cluster_name = "feast-prod"
cluster_version = "1.27"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets
eks_managed_node_groups = {
feast_workers = {
min_size = 10
max_size = 100
desired_size = 20
instance_types = ["c6i.2xlarge"] # 8 vCPU, 16 GiB RAM
capacity_type = "ON_DEMAND"
labels = {
workload = "feast"
}
taints = [{
key = "feast"
value = "true"
effect = "NO_SCHEDULE"
}]
}
}
}
# 2. ElastiCache for Redis (Online Store)
# Custom parameter group: cluster mode plus the eviction policy
resource "aws_elasticache_parameter_group" "feast_online" {
  name   = "feast-online-prod-redis7"
  family = "redis7"

  parameter {
    name  = "cluster-enabled"
    value = "yes"
  }

  parameter {
    name  = "maxmemory-policy"
    value = "allkeys-lru" # Evict least recently used keys when memory full
  }
}

resource "aws_elasticache_replication_group" "feast_online" {
  replication_group_id       = "feast-online-prod"
  description                = "Feast Online Store"
  node_type                  = "cache.r6g.8xlarge" # 256 GB RAM
  num_node_groups            = 10 # 10 shards
  replicas_per_node_group    = 2  # 1 primary + 2 replicas per shard
  automatic_failover_enabled = true
  parameter_group_name       = aws_elasticache_parameter_group.feast_online.name
}
# 3. S3 for Registry and Offline Store
resource "aws_s3_bucket" "feast_data" {
bucket = "shopco-feast-data"
versioning {
enabled = true
}
lifecycle_rule {
enabled = true
noncurrent_version_expiration {
days = 90
}
}
}
Helm Deployment:
# feast-values.yaml
replicaCount: 20
image:
repository: shopco/feast-server
tag: "0.32.0"
pullPolicy: IfNotPresent
resources:
requests:
cpu: 2000m
memory: 4Gi
limits:
cpu: 4000m
memory: 8Gi
autoscaling:
enabled: true
minReplicas: 20
maxReplicas: 100
targetCPUUtilizationPercentage: 70
service:
type: ClusterIP
port: 6566
ingress:
enabled: true
className: alb
annotations:
alb.ingress.kubernetes.io/scheme: internal
alb.ingress.kubernetes.io/target-type: ip
hosts:
- host: feast.internal.shopco.com
paths:
- path: /
pathType: Prefix
env:
- name: FEAST_USAGE
value: "False"
- name: REDIS_CONNECTION_STRING
valueFrom:
secretKeyRef:
name: feast-secrets
key: redis-connection
serviceAccount:
create: true
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/FeastServerRole
nodeSelector:
workload: feast
tolerations:
- key: feast
operator: Equal
value: "true"
effect: NoSchedule
Results:
- P99 latency: 8ms (target: <10ms) ✓
- Availability: 99.97% (target: 99.95%) ✓
- Cost: $18k/month (ElastiCache $12k + EKS $6k)
- Requests handled: 50k RPS peak without issues
Key Lessons:
- HPA scaled Feast pods from 20 → 85 during Black Friday
- Redis cluster mode prevented hotspotting issues
- Connection pooling critical (default pool size too small)
- Registry caching (5 min TTL) reduced S3 costs by 90%
11.4.9. Cost Optimization Strategies
Strategy 1: Right-Size Redis
def calculate_redis_memory(num_entities, avg_feature_vector_size_bytes):
"""
Estimate Redis memory requirements
"""
# Feature data
feature_data = num_entities * avg_feature_vector_size_bytes
# Overhead: Redis adds ~25% overhead (pointers, metadata)
overhead = feature_data * 0.25
# Buffer: Keep 20% free for operations
buffer = (feature_data + overhead) * 0.20
total_memory_bytes = feature_data + overhead + buffer
total_memory_gb = total_memory_bytes / (1024**3)
print(f"Entities: {num_entities:,}")
print(f"Avg feature size: {avg_feature_vector_size_bytes:,} bytes")
print(f"Raw data: {feature_data / (1024**3):.1f} GB")
print(f"With overhead: {(feature_data + overhead) / (1024**3):.1f} GB")
print(f"Recommended: {total_memory_gb:.1f} GB")
return total_memory_gb
# Example: 20M users, 5KB feature vector
required_gb = calculate_redis_memory(20_000_000, 5000)
# Output:
# Entities: 20,000,000
# Avg feature size: 5,000 bytes
# Raw data: 93.1 GB
# With overhead: 116.4 GB
# Recommended: 139.7 GB
# Choose instance: cache.r6g.8xlarge (256 GB) = $1.344/hr = $981/month
Strategy 2: Use Spot Instances for Feast Pods
# EKS Node Group with Spot
eks_managed_node_groups = {
feast_spot = {
min_size = 5
max_size = 50
desired_size = 10
instance_types = ["c6i.2xlarge", "c5.2xlarge", "c5a.2xlarge"]
capacity_type = "SPOT"
labels = {
workload = "feast-spot"
}
}
}
# Savings: ~70% compared to on-demand
# Risk: Pods may be terminated (but Kubernetes reschedules automatically)
Strategy 3: Tiered Feature Access
import boto3
import redis

class TieredFeatureRetrieval:
"""
Hot features: Redis
Warm features: DynamoDB (cheaper than Redis for infrequent access)
Cold features: S3 direct read
"""
def __init__(self):
self.redis = redis.StrictRedis(...)
self.dynamodb = boto3.resource('dynamodb')
self.s3 = boto3.client('s3')
self.hot_features = set(['clicks_last_hour', 'cart_items'])
self.warm_features = set(['lifetime_value', 'favorite_category'])
# Everything else is cold
def get_features(self, entity_id, feature_list):
results = {}
# Hot tier (Redis)
hot_needed = [f for f in feature_list if f in self.hot_features]
if hot_needed:
# Feast retrieval from Redis
results.update(self.fetch_from_redis(entity_id, hot_needed))
# Warm tier (DynamoDB)
warm_needed = [f for f in feature_list if f in self.warm_features]
if warm_needed:
table = self.dynamodb.Table('features_warm')
response = table.get_item(Key={'entity_id': entity_id})
results.update(response.get('Item', {}))
# Cold tier (S3)
cold_needed = [f for f in feature_list if f not in self.hot_features and f not in self.warm_features]
if cold_needed:
# Read from Parquet file in S3
results.update(self.fetch_from_s3(entity_id, cold_needed))
return results
# Cost savings: 50% reduction by moving infrequent features out of Redis
11.4.10. Monitoring and Alerting
Prometheus Metrics:
from prometheus_client import Counter, Histogram, Gauge, start_http_server
# Define metrics
feature_requests = Counter(
'feast_feature_requests_total',
'Total feature requests',
['feature_view', 'status']
)
feature_request_duration = Histogram(
'feast_feature_request_duration_seconds',
'Feature request duration',
['feature_view']
)
redis_connection_pool_size = Gauge(
'feast_redis_pool_size',
'Redis connection pool size'
)
feature_cache_hit_rate = Gauge(
'feast_cache_hit_rate',
'Feature cache hit rate'
)
# Instrument Feast retrieval
def get_online_features_instrumented(feature_store, entity_rows, features):
feature_view_name = features[0].split(':')[0]
with feature_request_duration.labels(feature_view=feature_view_name).time():
try:
result = feature_store.get_online_features(
entity_rows=entity_rows,
features=features
)
feature_requests.labels(
feature_view=feature_view_name,
status='success'
).inc()
return result
except Exception as e:
feature_requests.labels(
feature_view=feature_view_name,
status='error'
).inc()
raise
# Start metrics server
start_http_server(9090)
Grafana Dashboard:
{
"dashboard": {
"title": "Feast Feature Store",
"panels": [
{
"title": "Request Rate",
"targets": [{
"expr": "rate(feast_feature_requests_total[5m])"
}]
},
{
"title": "P99 Latency",
"targets": [{
"expr": "histogram_quantile(0.99, feast_feature_request_duration_seconds)"
}]
},
{
"title": "Error Rate",
"targets": [{
"expr": "rate(feast_feature_requests_total{status='error'}[5m]) / rate(feast_feature_requests_total[5m])"
}]
},
{
"title": "Redis Memory Usage",
"targets": [{
"expr": "redis_memory_used_bytes / redis_memory_max_bytes * 100"
}]
}
]
}
}
11.4.11. Troubleshooting Guide
| Issue | Symptoms | Diagnosis | Solution |
|---|---|---|---|
| High latency | P99 >100ms | Check Redis CPU, network | Scale Redis nodes, add connection pooling |
| Memory pressure | Redis evictions increasing | INFO memory on Redis | Increase instance size or enable LRU eviction |
| Feast pods crashing | OOM kills | kubectl describe pod | Increase memory limits, reduce registry cache size |
| Features missing | Get returns null | Check materialization logs | Run feast materialize, verify Offline Store data |
| Registry errors | “Registry not found” | S3 access logs | Fix IAM permissions, check S3 path |
| Slow materialization | Takes >1 hour | Profile Spark job | Partition data, increase parallelism |
Debugging Commands:
# Check Feast server logs
kubectl logs -n ml-platform deployment/feast-feature-server --tail=100 -f
# Test Redis connectivity
kubectl run -it --rm redis-test --image=redis:7 --restart=Never -- \
redis-cli -h feast-redis.cache.amazonaws.com -p 6379 PING
# Check registry
aws s3 ls s3://my-ml-platform-bucket/feast/registry.pb
# Test feature retrieval
kubectl exec -it -n ml-platform deployment/feast-feature-server -- python3 -c "
from feast import FeatureStore
store = FeatureStore(repo_path='.')
features = store.get_online_features(
entity_rows=[{'user_id': 123}],
features=['user_churn_features:total_purchases']
)
print(features.to_dict())
"
# Monitor Redis performance
redis-cli --latency -h feast-redis.cache.amazonaws.com
11.4.12. Advanced: Multi-Region Deployment
For global applications requiring low latency worldwide:
# Architecture: Active-Active Multi-Region
# Region 1: US-East-1
resource "aws_elasticache_replication_group" "feast_us_east" {
provider = aws.us_east_1
# ... Redis config ...
}
resource "aws_eks_cluster" "feast_us_east" {
provider = aws.us_east_1
# ... EKS config ...
}
# Region 2: EU-West-1
resource "aws_elasticache_replication_group" "feast_eu_west" {
provider = aws.eu_west_1
# ... Redis config ...
}
resource "aws_eks_cluster" "feast_eu_west" {
provider = aws.eu_west_1
# ... EKS config ...
}
# Global Accelerator for routing
resource "aws_globalaccelerator_accelerator" "feast" {
name = "feast-global"
enabled = true
}
resource "aws_globalaccelerator_endpoint_group" "us_east" {
listener_arn = aws_globalaccelerator_listener.feast.id
endpoint_group_region = "us-east-1"
endpoint_configuration {
endpoint_id = aws_lb.feast_us_east.arn
weight = 100
}
}
resource "aws_globalaccelerator_endpoint_group" "eu_west" {
listener_arn = aws_globalaccelerator_listener.feast.id
endpoint_group_region = "eu-west-1"
endpoint_configuration {
endpoint_id = aws_lb.feast_eu_west.arn
weight = 100
}
}
Synchronization Strategy:
# Option 1: Write to all regions (strong consistency)
import concurrent.futures

import pandas as pd
from feast import FeatureStore

def write_features_multi_region(features_df: pd.DataFrame) -> bool:
    """Push the same feature rows to every regional online store."""
    # Assumes one feature repo (feature_store.yaml) checked out per region,
    # each pointing at that region's Redis cluster
    regions = ['us-east-1', 'eu-west-1', 'ap-southeast-1']

    def write_to_region(region):
        store = FeatureStore(repo_path=f"/app/repos/{region}")
        store.push('user_features_push_source', features_df)
        return True

    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [executor.submit(write_to_region, r) for r in regions]
        return all(f.result() for f in futures)
# Option 2: Async replication (eventual consistency, lower cost)
# Write to primary region, replicate asynchronously to others via Kinesis
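A hedged sketch of Option 2 on the consuming side: a Lambda-style handler in the secondary region that drains the Kinesis stream and pushes rows into that region's online store. The stream wiring, repo path, and push source name are assumptions.
import base64
import json

import pandas as pd
from feast import FeatureStore

# Feature repo for the *local* (secondary) region's online store (illustrative path)
store = FeatureStore(repo_path="/app/repos/eu-west-1")

def handler(event, context):
    """Kinesis-triggered handler: replicate feature rows into this region."""
    rows = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        rows.append(json.loads(payload))
    if rows:
        # Eventually consistent write into the regional online store
        store.push("user_features_push_source", pd.DataFrame(rows))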
11.4.13. Best Practices Summary
- Start Small: Deploy Feast in dev/staging before production
- Version Registry: Use S3 versioning for rollback capability
- Monitor Everything: Track latency, error rate, memory usage
- Connection Pooling: Configure appropriate pool sizes for Redis
- Cache Registry: Set cache_ttl_seconds to reduce S3 calls
- Right-Size Redis: Calculate memory needs, don’t over-provision
- Use Spot Instances: For Feast pods (not Redis)
- Test Failover: Regularly test Redis failover scenarios
- Document Features: Maintain feature catalog with owners and SLAs
11.4.14. Comparison: Managed vs. Self-Hosted
| Aspect | AWS SageMaker | GCP Vertex AI | Feast (Self-Hosted) |
|---|---|---|---|
| Setup Complexity | Low | Low | High |
| Operational Overhead | None | None | High (you manage K8s, Redis) |
| Cost | $$$ | $$$ | $$ (compute + storage only) |
| Flexibility | Limited | Limited | Full control |
| Multi-Cloud | AWS only | GCP only | Yes |
| Customization | Limited | Limited | Unlimited |
| Latency | ~5-10ms | ~5-10ms | ~3-8ms (if optimized) |
| Vendor Lock-In | High | High | None |
When to Choose Self-Hosted Feast:
- Need multi-cloud or hybrid deployment
- Require custom feature transformations
- Have Kubernetes expertise in-house
- Want to avoid vendor lock-in
- Need <5ms latency with aggressive optimization
- Cost-sensitive (can optimize infrastructure)
When to Choose Managed:
- Small team without K8s expertise
- Want to move fast without ops burden
- Already invested in AWS/GCP ecosystem
- Compliance requirements met by managed service
- Prefer predictable support SLAs
11.4.15. Exercises
Exercise 1: Local Deployment. Set up Feast locally:
- Install Feast: pip install feast[redis]
- Initialize repository: feast init my_repo
- Define features for your use case
- Test materialization and retrieval
Exercise 2: Cost Calculator. Build a cost model:
- Calculate Redis memory needs for your workload
- Estimate EKS costs (nodes, load balancers)
- Compare with managed alternative (SageMaker/Vertex AI)
- Determine break-even point
Exercise 3: Load Testing. Benchmark Feast performance:
- Deploy Feast on EKS/GKE
- Use Locust or k6 to generate load
- Measure P50, P95, P99 latencies
- Identify bottlenecks (Redis, network, serialization)
Exercise 4: Disaster Recovery. Implement and test:
- Redis AOF backups
- Registry versioning in S3
- Cross-region replication
- Measure RTO and RPO
Exercise 5: Feature Skew Detection. Build monitoring to detect training-serving skew:
- Log feature vectors from production
- Compare with offline store snapshots
- Calculate statistical divergence
- Alert on significant drift
11.4.16. Summary
Deploying Feast on Kubernetes provides maximum flexibility and control over your Feature Store, at the cost of operational complexity.
Key Capabilities:
- Multi-Cloud: Deploy anywhere Kubernetes runs
- Open Source: No vendor lock-in, community-driven
- Customizable: Full control over infrastructure and configuration
- Cost-Effective: Pay only for compute and storage, no managed service markup
Operational Requirements:
- Kubernetes expertise (EKS/GKE/AKS)
- Redis cluster management (ElastiCache/Memorystore)
- Monitoring and alerting setup (Prometheus/Grafana)
- CI/CD pipeline for feature deployment
Cost Structure:
- EKS/GKE: ~$0.10/hour per cluster + worker nodes
- Redis: $0.50-2.00/hour depending on size
- Storage: S3/GCS standard rates
- Total: Typically 40-60% cheaper than managed alternatives
Critical Success Factors:
- Robust connection pooling for Redis
- Horizontal pod autoscaling for Feast server
- Registry caching to minimize S3 calls
- Comprehensive monitoring and alerting
- GitOps workflow for feature definitions
- Regular disaster recovery testing
Trade-Offs:
- ✓ Full control and flexibility
- ✓ Multi-cloud portability
- ✓ Lower cost at scale
- ✗ Higher operational burden
- ✗ Requires Kubernetes expertise
- ✗ No managed support SLA
Feast is the right choice for mature engineering organizations that value control and cost efficiency over operational simplicity. For teams without Kubernetes expertise or those wanting to move fast, managed solutions (SageMaker, Vertex AI) remain compelling alternatives.
In the next chapter, we move from feature management to model training orchestration, exploring Kubeflow Pipelines and SageMaker Pipelines for reproducible, scalable training workflows.