Chapter 11: The Feature Store Architecture
11.4. Open Source: Deploying Feast on EKS/GKE with Redis
“The only thing worse than a proprietary lock-in that slows you down is building your own platform that slows you down even more. But when done right, open source is the only path to true sovereignty over your data semantics.” — Infrastructure Engineering Maxim
In the previous sections, we explored the managed offerings: AWS SageMaker Feature Store and Google Cloud Vertex AI Feature Store. These services offer the allure of the "Easy Button": managed infrastructure, integrated security, and SLA-backed availability. For the high-maturity organization, however, they often present serious friction: opaque pricing, limited support for complex custom data types, vendor lock-in, and latency floors that are too high for high-frequency trading or real-time ad bidding.
Enter Feast (Feature Store).
Feast has emerged as the de-facto open-source standard for feature stores. It is not a database; it is a connector. It manages the registry of features, standardizes the retrieval of data for training (offline) and serving (online), and orchestrates the movement of data between the two.
Deploying Feast effectively requires a shift in mindset from “Consumer” to “Operator.” You are no longer just calling an API; you are responsible for the CAP theorem properties of your serving layer. You own the Redis eviction policies. You own the Kubernetes Horizontal Pod Autoscalers. You own the synchronization lag.
This section serves as the definitive reference architecture for deploying Feast in a high-scale production environment, leveraging Kubernetes (EKS/GKE) for compute and Managed Redis (ElastiCache/Memorystore) for state.
11.4.1. The Architecture of Self-Hosted Feast
To operate Feast, one must understand its anatomy. Unlike its early versions (0.9 and below), modern Feast (0.10+) is highly modular and unopinionated about infrastructure. It does not require a heavy JVM stack or Kafka by default. It runs where your compute runs.
The Core Components
1. The Registry: The central catalog. It maps feature names (user_churn_score) to data sources (Parquet on S3) and entity definitions (user_id).
   - Production Storage: An object store bucket (S3/GCS) or a SQL database (PostgreSQL).
   - Behavior: Clients (training pipelines, inference services) pull the registry to understand how to fetch data.
2. The Offline Store: The historical data warehouse. Feast does not manage this data; it manages the queries against it.
   - AWS: Redshift, Snowflake, or Athena (S3).
   - GCP: BigQuery.
   - Role: Used for generating point-in-time correct training datasets (see the retrieval sketch after this list).
3. The Online Store: The low-latency cache. This is the critical piece for real-time inference.
   - AWS: ElastiCache for Redis.
   - GCP: Cloud Memorystore for Redis.
   - Role: Serves the latest known value of a feature for a specific entity ID at millisecond latency.
4. The Feature Server: A lightweight HTTP/gRPC service (usually Python or Go) that exposes the retrieval API.
   - Deployment: A scalable microservice on Kubernetes.
   - Role: It parses the request, hashes the entity keys, queries Redis, deserializes the Protobuf payloads, and returns the feature vector.
5. The Materialization Engine: The worker process that moves data from Offline to Online.
   - Deployment: Airflow DAGs, Kubernetes CronJobs, or a stream processor.
   - Role: Ensures the Online Store is eventually consistent with the Offline Store.
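To make the two retrieval paths concrete, here is a minimal sketch. It assumes a feature view named user_churn_features (defined later in this section) exists and is keyed by a user_id entity; the entity values and timestamps are illustrative.
from datetime import datetime
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Offline path: point-in-time correct join against the Offline Store,
# used to build a training dataset.
entity_df = pd.DataFrame({
    "user_id": [1001, 1002],
    "event_timestamp": [datetime(2023, 6, 1), datetime(2023, 6, 2)],
})
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "user_churn_features:total_purchases",
        "user_churn_features:avg_order_value",
    ],
).to_df()

# Online path: latest known values from the Online Store (Redis),
# used at inference time.
feature_vector = store.get_online_features(
    features=["user_churn_features:total_purchases"],
    entity_rows=[{"user_id": 1001}],
).to_dict()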
The “Thin Client” vs. “Feature Server” Model
One of the most significant architectural decisions you will make is how your inference service consumes features.
Pattern A: The Embedded Client (Fat Client)
- How: Your inference service (e.g., a FastAPI container running the model) imports the feast Python library directly. It connects to Redis and the Registry itself.
- Pros: Lowest possible latency (no extra network hop).
- Cons: Tight coupling. Your inference image bloats with Feast dependencies. Configuration updates (e.g., changing Redis endpoints) require redeploying the model container.
- Verdict: Use for extreme latency sensitivity (< 5ms).

Pattern B: The Feature Service (Sidecar or Microservice)
- How: You deploy the Feast Feature Server as a standalone deployment behind a Service/LoadBalancer. Your model calls POST /get-online-features.
- Pros: Decoupling. The Feature Server can scale independently of the model. Multiple models can share the same feature server.
- Cons: Adds network latency (serialization + wire time).
- Verdict: The standard enterprise pattern. Easier to secure and govern.

Both patterns are sketched in code after this list.
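A minimal sketch of both consumption patterns. The in-cluster hostname and feature references are illustrative, and the request payload follows the shape the Feast Python feature server expects; verify the exact fields against your Feast version.
import requests
from feast import FeatureStore

# Pattern A: embedded (fat) client; the inference container talks to Redis directly.
store = FeatureStore(repo_path="/app")  # feature_store.yaml baked into the image
vector_a = store.get_online_features(
    features=["user_churn_features:total_purchases"],
    entity_rows=[{"user_id": 1001}],
).to_dict()

# Pattern B: call the standalone Feature Server over HTTP.
resp = requests.post(
    "http://feast-feature-server.ml-platform.svc.cluster.local/get-online-features",
    json={
        "features": ["user_churn_features:total_purchases"],
        "entities": {"user_id": [1001]},
    },
)
vector_b = resp.json()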
11.4.2. The AWS Reference Architecture (EKS + ElastiCache)
Building this on AWS requires navigating the VPC networking intricacies of connecting EKS (Kubernetes) to ElastiCache (Redis).
1. Network Topology
Do not expose Redis to the public internet. Do not peer VPCs unnecessarily.
- VPC: One VPC for the ML Platform.
- Subnets:
- Private App Subnets: Host the EKS Worker Nodes.
- Private Data Subnets: Host the ElastiCache Subnet Group.
- Security Groups:
- sg-eks-nodes: Allow outbound 6379 to sg-elasticache.
- sg-elasticache: Allow inbound 6379 from sg-eks-nodes.
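A hedged Terraform sketch of these two rules, assuming the security groups themselves are declared elsewhere as aws_security_group.eks_nodes and aws_security_group.elasticache:
resource "aws_security_group_rule" "eks_to_redis_egress" {
  type                     = "egress"
  from_port                = 6379
  to_port                  = 6379
  protocol                 = "tcp"
  security_group_id        = aws_security_group.eks_nodes.id
  source_security_group_id = aws_security_group.elasticache.id
}

resource "aws_security_group_rule" "redis_from_eks_ingress" {
  type                     = "ingress"
  from_port                = 6379
  to_port                  = 6379
  protocol                 = "tcp"
  security_group_id        = aws_security_group.elasticache.id
  source_security_group_id = aws_security_group.eks_nodes.id
}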
2. The Online Store: ElastiCache for Redis
We choose Cluster Mode Enabled for scale. If your feature set fits in one node (< 25GB), Cluster Mode Disabled is simpler, but ML systems tend to grow.
Terraform Implementation Detail:
resource "aws_elasticache_replication_group" "feast_online_store" {
replication_group_id = "feast-production-store"
description = "Feast Online Store for Low Latency Serving"
node_type = "cache.r6g.xlarge" # Graviton2 for cost/performance
port = 6379
parameter_group_name = "default.redis7.cluster.on"
automatic_failover_enabled = true
# Cluster Mode Configuration
num_node_groups = 3 # Shards
replicas_per_node_group = 1 # High Availability
subnet_group_name = aws_elasticache_subnet_group.ml_data.name
security_group_ids = [aws_security_group.elasticache.id]
at_rest_encryption_enabled = true
transit_encryption_enabled = true
auth_token = var.redis_auth_token # Or utilize IAM Auth if supported by client
}
3. The Registry: S3 + IAM Roles for Service Accounts (IRSA)
Feast needs to read the registry file (registry.db or registry.pb) from S3. The Feast Feature Server running in a Pod should not have hardcoded AWS keys.
- Create an OIDC Provider for the EKS cluster.
- Create an IAM Role with s3:GetObject and s3:PutObject permissions on the registry bucket.
- Annotate the ServiceAccount:
apiVersion: v1
kind: ServiceAccount
metadata:
name: feast-service-account
namespace: ml-platform
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/FeastRegistryAccessRole
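For completeness, a sketch of what the IAM side of IRSA might look like in Terraform. The bucket path, namespace, and service account name mirror the examples above; the aws_iam_openid_connect_provider.eks resource is assumed to be defined elsewhere.
# Assumption: aws_iam_openid_connect_provider.eks already exists for the cluster
data "aws_iam_policy_document" "feast_registry_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.eks.arn]
    }
    condition {
      test     = "StringEquals"
      variable = "${replace(aws_iam_openid_connect_provider.eks.url, "https://", "")}:sub"
      values   = ["system:serviceaccount:ml-platform:feast-service-account"]
    }
  }
}

resource "aws_iam_role" "feast_registry_access" {
  name               = "FeastRegistryAccessRole"
  assume_role_policy = data.aws_iam_policy_document.feast_registry_trust.json
}

resource "aws_iam_role_policy" "feast_registry_s3" {
  name = "feast-registry-s3"
  role = aws_iam_role.feast_registry_access.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject", "s3:PutObject"]
      Resource = "arn:aws:s3:::my-ml-platform-bucket/feast/*"
    }]
  })
}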
4. The Feast Configuration (feature_store.yaml)
This file controls how Feast connects. In a containerized environment, we inject secrets via Environment Variables.
project: my_organization_ml
registry: s3://my-ml-platform-bucket/feast/registry.pb
provider: aws
online_store:
  type: redis
  redis_type: redis_cluster  # matches the cluster-mode-enabled ElastiCache group
  # The cluster endpoint from Terraform output; TLS and the auth token (injected
  # via a K8s Secret) are passed as comma-separated options on the connection string
  connection_string: master.feast-production-store.xxxxxx.use1.cache.amazonaws.com:6379,ssl=true,password=${REDIS_AUTH_TOKEN}
offline_store:
type: snowflake.offline
account: ${SNOWFLAKE_ACCOUNT}
user: ${SNOWFLAKE_USER}
database: ML_FEATURES
warehouse: COMPUTE_WH
11.4.3. The GCP Reference Architecture (GKE + Memorystore)
Google Cloud offers a smoother integration for networking but stricter constraints on the Redis service types.
1. Network Topology: VPC Peering
Memorystore instances reside in a Google-managed project. To access them from GKE, you must use Private Services Access (VPC Peering).
- Action: Allocate an IP range (CIDR /24) for Service Networking.
- Constraint: Memorystore for Redis (Basic/Standard Tier) does not support “Cluster Mode” in the same way open Redis does. It uses a Primary/Read-Replica model. For massive scale, you might need Memorystore for Redis Cluster (a newer offering).
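A minimal Terraform sketch of the Private Services Access setup, assuming the VPC is declared as google_compute_network.vpc_network (as in the Memorystore example below); the range name and /24 prefix are illustrative.
resource "google_compute_global_address" "private_service_range" {
  name          = "feast-psa-range"
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  prefix_length = 24
  network       = google_compute_network.vpc_network.id
}

resource "google_service_networking_connection" "private_service_access" {
  network                 = google_compute_network.vpc_network.id
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.private_service_range.name]
}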
2. The Online Store: Memorystore
For most use cases, a Standard Tier (High Availability) instance suffices.
Terraform Implementation Detail:
resource "google_redis_instance" "feast_online_store" {
name = "feast-online-store"
tier = "STANDARD_HA"
memory_size_gb = 50
location_id = "us-central1-a"
alternative_location_id = "us-central1-f"
authorized_network = google_compute_network.vpc_network.id
connect_mode = "PRIVATE_SERVICE_ACCESS"
redis_version = "REDIS_7_0"
display_name = "Feast Feature Store Cache"
# Auth is critical
auth_enabled = true
}
3. GKE Workload Identity
Similar to AWS IRSA, GKE Workload Identity maps a Kubernetes Service Account (KSA) to a Google Service Account (GSA).
- GSA: feast-sa@my-project.iam.gserviceaccount.com has roles/storage.objectAdmin (for Registry GCS) and roles/bigquery.dataViewer (for Offline Store).
- Binding:
gcloud iam service-accounts add-iam-policy-binding feast-sa@... \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:my-project.svc.id.goog[ml-platform/feast-sa]"
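On the Kubernetes side, the KSA carries an annotation pointing at the GSA. A minimal sketch, reusing the namespace and account names from the binding above:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: feast-sa
  namespace: ml-platform
  annotations:
    iam.gke.io/gcp-service-account: feast-sa@my-project.iam.gserviceaccount.com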
11.4.4. Deploying the Feast Feature Server
Whether on AWS or GCP, the Feature Server is a stateless deployment.
The Dockerfile
We need a lean image. Start with a Python slim base.
FROM python:3.10-slim
# Install system dependencies for building C extensions (if needed)
RUN apt-get update && apt-get install -y build-essential
# Install Feast with specific extras
# redis: for online store
# snowflake/postgres/bigquery: for offline store dependencies
RUN pip install "feast[redis,snowflake,aws]" gunicorn
WORKDIR /app
COPY feature_store.yaml .
# We assume the registry is pulled from S3/GCS at runtime or pointed to via S3 path
# The Feast CLI exposes a server command
# --no-access-log is crucial for high throughput performance
CMD ["feast", "serve", "--host", "0.0.0.0", "--port", "6566", "--no-access-log"]
The Kubernetes Deployment
This is where we define the scale.
apiVersion: apps/v1
kind: Deployment
metadata:
name: feast-feature-server
namespace: ml-platform
spec:
replicas: 3
selector:
matchLabels:
app: feast-server
template:
metadata:
labels:
app: feast-server
spec:
serviceAccountName: feast-service-account # Critical for IRSA/Workload Identity
containers:
- name: feast
image: my-registry/feast-server:v1.0.0
env:
- name: FEAST_USAGE
value: "False" # Disable telemetry
- name: REDIS_AUTH_TOKEN
valueFrom:
secretKeyRef:
name: redis-secrets
key: auth-token
resources:
requests:
cpu: "1000m"
memory: "2Gi"
limits:
cpu: "2000m"
memory: "4Gi"
readinessProbe:
tcpSocket:
port: 6566
initialDelaySeconds: 5
periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
name: feast-feature-server
spec:
selector:
app: feast-server
ports:
- protocol: TCP
port: 80
targetPort: 6566
type: ClusterIP
Autoscaling (HPA)
The CPU usage of the Feature Server is dominated by Protobuf serialization/deserialization. It is CPU-bound.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: feast-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: feast-feature-server
minReplicas: 3
maxReplicas: 50
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 60
11.4.5. The Materialization Engine (Syncing Data)
The “Online Store” in Redis is useless if it’s empty. We must populate it from the Offline Store (Data Warehouse). This process is called Materialization.
The Challenge of Freshness
- Full Refresh: Overwriting the entire Redis cache. Safe but slow. High IOPS.
- Incremental: Only writing rows that changed since the last run.
In a naive setup, engineers run feast materialize-incremental from their laptop. In production, this must be orchestrated.
Pattern: The Airflow Operator
Using Apache Airflow (Managed Workflows for Apache Airflow on AWS or Cloud Composer on GCP) is the standard pattern.
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator
# Definition of the sync window
# We want to sync data up to the current moment
default_args = {
'owner': 'ml-platform',
'retries': 3,
'retry_delay': timedelta(minutes=5),
}
with DAG(
'feast_materialization_hourly',
default_args=default_args,
schedule_interval='0 * * * *', # Every hour
start_date=datetime(2023, 1, 1),
catchup=False,
) as dag:
# The Docker image used here must match the feature definitions
# It must have access to feature_store.yaml and credentials
materialize = BashOperator(
task_id='materialize_features',
bash_command='cd /app && feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")',
)
Pattern: Stream Materialization (Near Real-Time)
For features like “number of clicks in the last 10 minutes,” hourly batch jobs are insufficient. You need streaming. Feast supports Push Sources.
- Event Source: Kafka or Kinesis.
- Stream Processor: Flink or Spark Streaming.
- Feast Push API: The processor calculates the feature and pushes it directly to the Feast Online Store, bypassing the Offline Store synchronization lag.
# In your stream processor (e.g., a Spark Structured Streaming foreach sink)
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

def write_to_online_store(row):
    # Feast's push API expects a DataFrame, so wrap the row in a single-row frame
    df = pd.DataFrame([row.asDict()])
    # Push to the Online Store via the registered push source
    store.push("click_stream_push_source", df)
11.4.6. Operational Challenges and Performance Tuning
Deploying Feast is easy; keeping it fast and consistent at scale is hard.
1. Redis Memory Management
Redis is an in-memory store. RAM is expensive.
- The Debt: You define a feature user_embedding (a 768-float vector). You have 100M users.
  - Size = 100M * 768 * 4 bytes ≈ 300 GB.
  - This requires a massive Redis cluster (e.g., AWS cache.r6g.4xlarge clusters).
- The Fix: Use Entity TTL.
  - Feast allows setting a TTL (Time To Live) on features. ttl=timedelta(days=7) means "if the user hasn't been active in 7 days, let Redis evict their features."
  - Feast Configuration: Feast uses Redis hashes. It does not natively map Feast TTL to Redis TTL perfectly in all versions. You may need to rely on Redis maxmemory-policy allkeys-lru to handle eviction when memory is full.
2. Serialization Overhead (The Protobuf Tax)
Feast stores data in Redis as Protocol Buffers.
- Write Path: Python Object -> Protobuf -> Bytes -> Redis.
- Read Path: Redis -> Bytes -> Protobuf -> Python Object -> JSON (HTTP response).
- Impact: At 10,000 RPS, CPU becomes the bottleneck, not Redis network I/O.
- Mitigation: Use the Feast Go Server or Feast Java Server (alpha features) if Python’s Global Interpreter Lock (GIL) becomes a blocker. Alternatively, scale the Python Deployment horizontally.
3. The “Thundering Herd” on Registry
If you have 500 pods of your Inference Service starting simultaneously (e.g., after a deploy), they all try to download registry.pb from S3.
- Result: S3 503 Slow Down errors or latency spikes.
- Mitigation: Set cache_ttl_seconds in the Feature Store config. This caches the registry in memory in the client/server, checking for updates only periodically.
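In feature_store.yaml this looks like the following fragment; the path mirrors the earlier registry example, and the 5-minute TTL is an illustrative value.
registry:
  registry_type: file
  path: s3://my-ml-platform-bucket/feast/registry.pb
  cache_ttl_seconds: 300  # refresh the in-memory registry at most every 5 minutes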
4. Connection Pooling
Standard Redis clients in Python create a new connection per request or use a small pool. In Kubernetes with sidecars (Istio/Envoy), connection management can get messy.
- Symptom: RedisTimeoutError or ConnectionRefusedError.
- Fix: Tune the redis_pool_size in the Feast config (passed to the underlying redis-py client). Ensure tcp_keepalive is enabled to detect dead connections in cloud networks.
11.4.7. Feature Definition Management: GitOps for Data
How do you manage the definitions of features? Do not let Data Scientists run feast apply from their laptops against the production registry; that is a recipe for schema drift.
The GitOps Workflow
1. Repository Structure:

my-feature-repo/
├── features/
│   ├── user_churn.py
│   ├── product_recs.py
├── feature_store.yaml
└── .github/workflows/feast_apply.yml

2. The feature_store.yaml: The configuration is versioned.

3. Feature Definitions as Code:

# features/user_churn.py
from feast import Entity, Feature, FeatureView, ValueType
from feast.data_source import FileSource
from datetime import timedelta

user = Entity(name="user", value_type=ValueType.INT64, description="User ID")

user_features_source = FileSource(
    path="s3://data/user_features.parquet",
    event_timestamp_column="event_timestamp"
)

user_churn_fv = FeatureView(
    name="user_churn_features",
    entities=[user],
    ttl=timedelta(days=365),
    features=[
        Feature(name="total_purchases", dtype=ValueType.INT64),
        Feature(name="avg_order_value", dtype=ValueType.DOUBLE),
        Feature(name="days_since_last_purchase", dtype=ValueType.INT64)
    ],
    source=user_features_source
)

4. CI/CD Pipeline (GitHub Actions):

# .github/workflows/feast_apply.yml
name: Deploy Features
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Setup Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.9'
      - name: Install Feast
        run: pip install feast[redis,aws]
      - name: Validate Features
        run: |
          cd my-feature-repo
          feast plan
      - name: Deploy to Production
        if: github.ref == 'refs/heads/main'
        run: |
          cd my-feature-repo
          feast apply
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      - name: Materialize Features
        run: |
          cd my-feature-repo
          feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")

5. Pull Request Review: Feature changes require approval from the ML Platform team.
11.4.8. Real-World Case Study: E-Commerce Personalization
Company: ShopCo (anonymized retailer)
Challenge: Deploy Feast on EKS to serve 20M users, 50k requests/second peak.
Architecture:
# Production Infrastructure (Terraform + Helm)
# 1. EKS Cluster
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "18.0"
cluster_name = "feast-prod"
cluster_version = "1.27"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets
eks_managed_node_groups = {
feast_workers = {
min_size = 10
max_size = 100
desired_size = 20
instance_types = ["c6i.2xlarge"] # 8 vCPU, 16 GiB RAM
capacity_type = "ON_DEMAND"
labels = {
workload = "feast"
}
taints = [{
key = "feast"
value = "true"
effect = "NO_SCHEDULE"
}]
}
}
}
# 2. ElastiCache for Redis (Online Store)
# Custom parameter group: cluster mode plus the eviction policy
resource "aws_elasticache_parameter_group" "feast_online" {
  name   = "feast-online-prod-redis7"
  family = "redis7"

  parameter {
    name  = "cluster-enabled"
    value = "yes"
  }

  parameter {
    name  = "maxmemory-policy"
    value = "allkeys-lru" # Evict least recently used keys when memory full
  }
}

resource "aws_elasticache_replication_group" "feast_online" {
  replication_group_id       = "feast-online-prod"
  description                = "Feast Online Store"
  node_type                  = "cache.r6g.8xlarge" # 256 GB RAM
  num_node_groups            = 10 # 10 shards
  replicas_per_node_group    = 2  # 1 primary + 2 replicas per shard
  automatic_failover_enabled = true
  parameter_group_name       = aws_elasticache_parameter_group.feast_online.name
}
# 3. S3 for Registry and Offline Store
resource "aws_s3_bucket" "feast_data" {
bucket = "shopco-feast-data"
versioning {
enabled = true
}
lifecycle_rule {
enabled = true
noncurrent_version_expiration {
days = 90
}
}
}
Helm Deployment:
# feast-values.yaml
replicaCount: 20
image:
repository: shopco/feast-server
tag: "0.32.0"
pullPolicy: IfNotPresent
resources:
requests:
cpu: 2000m
memory: 4Gi
limits:
cpu: 4000m
memory: 8Gi
autoscaling:
enabled: true
minReplicas: 20
maxReplicas: 100
targetCPUUtilizationPercentage: 70
service:
type: ClusterIP
port: 6566
ingress:
enabled: true
className: alb
annotations:
alb.ingress.kubernetes.io/scheme: internal
alb.ingress.kubernetes.io/target-type: ip
hosts:
- host: feast.internal.shopco.com
paths:
- path: /
pathType: Prefix
env:
- name: FEAST_USAGE
value: "False"
- name: REDIS_CONNECTION_STRING
valueFrom:
secretKeyRef:
name: feast-secrets
key: redis-connection
serviceAccount:
create: true
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/FeastServerRole
nodeSelector:
workload: feast
tolerations:
- key: feast
operator: Equal
value: "true"
effect: NoSchedule
Results:
- P99 latency: 8ms (target: <10ms) ✓
- Availability: 99.97% (target: 99.95%) ✓
- Cost: $18k/month (ElastiCache $12k + EKS $6k)
- Requests handled: 50k RPS peak without issues
Key Lessons:
- HPA scaled Feast pods from 20 → 85 during Black Friday
- Redis cluster mode prevented hotspotting issues
- Connection pooling critical (default pool size too small)
- Registry caching (5 min TTL) reduced S3 costs by 90%
11.4.9. Cost Optimization Strategies
Strategy 1: Right-Size Redis
def calculate_redis_memory(num_entities, avg_feature_vector_size_bytes):
"""
Estimate Redis memory requirements
"""
# Feature data
feature_data = num_entities * avg_feature_vector_size_bytes
# Overhead: Redis adds ~25% overhead (pointers, metadata)
overhead = feature_data * 0.25
# Buffer: Keep 20% free for operations
buffer = (feature_data + overhead) * 0.20
total_memory_bytes = feature_data + overhead + buffer
total_memory_gb = total_memory_bytes / (1024**3)
print(f"Entities: {num_entities:,}")
print(f"Avg feature size: {avg_feature_vector_size_bytes:,} bytes")
print(f"Raw data: {feature_data / (1024**3):.1f} GB")
print(f"With overhead: {(feature_data + overhead) / (1024**3):.1f} GB")
print(f"Recommended: {total_memory_gb:.1f} GB")
return total_memory_gb
# Example: 20M users, 5KB feature vector
required_gb = calculate_redis_memory(20_000_000, 5000)
# Output:
# Entities: 20,000,000
# Avg feature size: 5,000 bytes
# Raw data: 93.1 GB
# With overhead: 116.4 GB
# Recommended: 139.7 GB
# Choose instance: cache.r6g.8xlarge (256 GB) = $1.344/hr = $981/month
Strategy 2: Use Spot Instances for Feast Pods
# EKS Node Group with Spot
eks_managed_node_groups = {
feast_spot = {
min_size = 5
max_size = 50
desired_size = 10
instance_types = ["c6i.2xlarge", "c5.2xlarge", "c5a.2xlarge"]
capacity_type = "SPOT"
labels = {
workload = "feast-spot"
}
}
}
# Savings: ~70% compared to on-demand
# Risk: Pods may be terminated (but Kubernetes reschedules automatically)
Strategy 3: Tiered Feature Access
import boto3
import redis

class TieredFeatureRetrieval:
"""
Hot features: Redis
Warm features: DynamoDB (cheaper than Redis for infrequent access)
Cold features: S3 direct read
"""
def __init__(self):
self.redis = redis.StrictRedis(...)
self.dynamodb = boto3.resource('dynamodb')
self.s3 = boto3.client('s3')
self.hot_features = set(['clicks_last_hour', 'cart_items'])
self.warm_features = set(['lifetime_value', 'favorite_category'])
# Everything else is cold
def get_features(self, entity_id, feature_list):
results = {}
# Hot tier (Redis)
hot_needed = [f for f in feature_list if f in self.hot_features]
if hot_needed:
# Feast retrieval from Redis
results.update(self.fetch_from_redis(entity_id, hot_needed))
# Warm tier (DynamoDB)
warm_needed = [f for f in feature_list if f in self.warm_features]
if warm_needed:
table = self.dynamodb.Table('features_warm')
response = table.get_item(Key={'entity_id': entity_id})
results.update(response.get('Item', {}))
# Cold tier (S3)
cold_needed = [f for f in feature_list if f not in self.hot_features and f not in self.warm_features]
if cold_needed:
# Read from Parquet file in S3
results.update(self.fetch_from_s3(entity_id, cold_needed))
return results
# Cost savings: 50% reduction by moving infrequent features out of Redis
11.4.10. Monitoring and Alerting
Prometheus Metrics:
from prometheus_client import Counter, Histogram, Gauge, start_http_server
# Define metrics
feature_requests = Counter(
'feast_feature_requests_total',
'Total feature requests',
['feature_view', 'status']
)
feature_request_duration = Histogram(
'feast_feature_request_duration_seconds',
'Feature request duration',
['feature_view']
)
redis_connection_pool_size = Gauge(
'feast_redis_pool_size',
'Redis connection pool size'
)
feature_cache_hit_rate = Gauge(
'feast_cache_hit_rate',
'Feature cache hit rate'
)
# Instrument Feast retrieval
def get_online_features_instrumented(feature_store, entity_rows, features):
feature_view_name = features[0].split(':')[0]
with feature_request_duration.labels(feature_view=feature_view_name).time():
try:
result = feature_store.get_online_features(
entity_rows=entity_rows,
features=features
)
feature_requests.labels(
feature_view=feature_view_name,
status='success'
).inc()
return result
except Exception as e:
feature_requests.labels(
feature_view=feature_view_name,
status='error'
).inc()
raise
# Start metrics server
start_http_server(9090)
Grafana Dashboard:
{
"dashboard": {
"title": "Feast Feature Store",
"panels": [
{
"title": "Request Rate",
"targets": [{
"expr": "rate(feast_feature_requests_total[5m])"
}]
},
{
"title": "P99 Latency",
"targets": [{
"expr": "histogram_quantile(0.99, feast_feature_request_duration_seconds)"
}]
},
{
"title": "Error Rate",
"targets": [{
"expr": "rate(feast_feature_requests_total{status='error'}[5m]) / rate(feast_feature_requests_total[5m])"
}]
},
{
"title": "Redis Memory Usage",
"targets": [{
"expr": "redis_memory_used_bytes / redis_memory_max_bytes * 100"
}]
}
]
}
}
11.4.11. Troubleshooting Guide
| Issue | Symptoms | Diagnosis | Solution |
|---|---|---|---|
| High latency | P99 >100ms | Check Redis CPU, network | Scale Redis nodes, add connection pooling |
| Memory pressure | Redis evictions increasing | INFO memory on Redis | Increase instance size or enable LRU eviction |
| Feast pods crashing | OOM kills | kubectl describe pod | Increase memory limits, reduce registry cache size |
| Features missing | Get returns null | Check materialization logs | Run feast materialize, verify Offline Store data |
| Registry errors | “Registry not found” | S3 access logs | Fix IAM permissions, check S3 path |
| Slow materialization | Takes >1 hour | Profile Spark job | Partition data, increase parallelism |
Debugging Commands:
# Check Feast server logs
kubectl logs -n ml-platform deployment/feast-feature-server --tail=100 -f
# Test Redis connectivity
kubectl run -it --rm redis-test --image=redis:7 --restart=Never -- \
redis-cli -h feast-redis.cache.amazonaws.com -p 6379 PING
# Check registry
aws s3 ls s3://my-ml-platform-bucket/feast/registry.pb
# Test feature retrieval
kubectl exec -it -n ml-platform deployment/feast-feature-server -- python3 -c "
from feast import FeatureStore
store = FeatureStore(repo_path='.')
features = store.get_online_features(
entity_rows=[{'user_id': 123}],
features=['user_churn_features:total_purchases']
)
print(features.to_dict())
"
# Monitor Redis performance
redis-cli --latency -h feast-redis.cache.amazonaws.com
11.4.12. Advanced: Multi-Region Deployment
For global applications requiring low latency worldwide:
# Architecture: Active-Active Multi-Region
# Region 1: US-East-1
resource "aws_elasticache_replication_group" "feast_us_east" {
provider = aws.us_east_1
# ... Redis config ...
}
resource "aws_eks_cluster" "feast_us_east" {
provider = aws.us_east_1
# ... EKS config ...
}
# Region 2: EU-West-1
resource "aws_elasticache_replication_group" "feast_eu_west" {
provider = aws.eu_west_1
# ... Redis config ...
}
resource "aws_eks_cluster" "feast_eu_west" {
provider = aws.eu_west_1
# ... EKS config ...
}
# Global Accelerator for routing
resource "aws_globalaccelerator_accelerator" "feast" {
name = "feast-global"
enabled = true
}
resource "aws_globalaccelerator_endpoint_group" "us_east" {
listener_arn = aws_globalaccelerator_listener.feast.id
endpoint_group_region = "us-east-1"
endpoint_configuration {
endpoint_id = aws_lb.feast_us_east.arn
weight = 100
}
}
resource "aws_globalaccelerator_endpoint_group" "eu_west" {
listener_arn = aws_globalaccelerator_listener.feast.id
endpoint_group_region = "eu-west-1"
endpoint_configuration {
endpoint_id = aws_lb.feast_eu_west.arn
weight = 100
}
}
Synchronization Strategy:
# Option 1: Write to all regions (strong consistency)
import concurrent.futures

import pandas as pd
from feast import FeatureStore

def write_features_multi_region(features_df: pd.DataFrame) -> bool:
    """Push the same feature rows to every regional online store."""
    # Assumes one feature repo (feature_store.yaml) checked out per region,
    # each pointing at that region's Redis cluster
    regions = ['us-east-1', 'eu-west-1', 'ap-southeast-1']

    def write_to_region(region):
        store = FeatureStore(repo_path=f"/app/repos/{region}")
        store.push('user_features_push_source', features_df)
        return True

    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [executor.submit(write_to_region, r) for r in regions]
        return all(f.result() for f in futures)
# Option 2: Async replication (eventual consistency, lower cost)
# Write to primary region, replicate asynchronously to others via Kinesis
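A hedged sketch of Option 2 on the consuming side: a Lambda-style handler in the secondary region that drains the Kinesis stream and pushes rows into that region's online store. The stream wiring, repo path, and push source name are assumptions.
import base64
import json

import pandas as pd
from feast import FeatureStore

# Feature repo for the *local* (secondary) region's online store (illustrative path)
store = FeatureStore(repo_path="/app/repos/eu-west-1")

def handler(event, context):
    """Kinesis-triggered handler: replicate feature rows into this region."""
    rows = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        rows.append(json.loads(payload))
    if rows:
        # Eventually consistent write into the regional online store
        store.push("user_features_push_source", pd.DataFrame(rows))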
11.4.13. Best Practices Summary
- Start Small: Deploy Feast in dev/staging before production
- Version Registry: Use S3 versioning for rollback capability
- Monitor Everything: Track latency, error rate, memory usage
- Connection Pooling: Configure appropriate pool sizes for Redis
- Cache Registry: Set cache_ttl_seconds to reduce S3 calls
- Right-Size Redis: Calculate memory needs, don’t over-provision
- Use Spot Instances: For Feast pods (not Redis)
- Test Failover: Regularly test Redis failover scenarios
- Document Features: Maintain feature catalog with owners and SLAs
11.4.14. Comparison: Managed vs. Self-Hosted
| Aspect | AWS SageMaker | GCP Vertex AI | Feast (Self-Hosted) |
|---|---|---|---|
| Setup Complexity | Low | Low | High |
| Operational Overhead | None | None | High (you manage K8s, Redis) |
| Cost | $$$ | $$$ | $$ (compute + storage only) |
| Flexibility | Limited | Limited | Full control |
| Multi-Cloud | AWS only | GCP only | Yes |
| Customization | Limited | Limited | Unlimited |
| Latency | ~5-10ms | ~5-10ms | ~3-8ms (if optimized) |
| Vendor Lock-In | High | High | None |
When to Choose Self-Hosted Feast:
- Need multi-cloud or hybrid deployment
- Require custom feature transformations
- Have Kubernetes expertise in-house
- Want to avoid vendor lock-in
- Need <5ms latency with aggressive optimization
- Cost-sensitive (can optimize infrastructure)
When to Choose Managed:
- Small team without K8s expertise
- Want to move fast without ops burden
- Already invested in AWS/GCP ecosystem
- Compliance requirements met by managed service
- Prefer predictable support SLAs
11.4.15. Exercises
Exercise 1: Local Deployment. Set up Feast locally:
- Install Feast: pip install feast[redis]
- Initialize repository: feast init my_repo
- Define features for your use case
- Test materialization and retrieval
Exercise 2: Cost Calculator. Build a cost model:
- Calculate Redis memory needs for your workload
- Estimate EKS costs (nodes, load balancers)
- Compare with managed alternative (SageMaker/Vertex AI)
- Determine break-even point
Exercise 3: Load Testing. Benchmark Feast performance:
- Deploy Feast on EKS/GKE
- Use Locust or k6 to generate load
- Measure P50, P95, P99 latencies
- Identify bottlenecks (Redis, network, serialization)
Exercise 4: Disaster Recovery. Implement and test:
- Redis AOF backups
- Registry versioning in S3
- Cross-region replication
- Measure RTO and RPO
Exercise 5: Feature Skew Detection. Build monitoring to detect training-serving skew:
- Log feature vectors from production
- Compare with offline store snapshots
- Calculate statistical divergence
- Alert on significant drift
11.4.16. Summary
Deploying Feast on Kubernetes provides maximum flexibility and control over your Feature Store, at the cost of operational complexity.
Key Capabilities:
- Multi-Cloud: Deploy anywhere Kubernetes runs
- Open Source: No vendor lock-in, community-driven
- Customizable: Full control over infrastructure and configuration
- Cost-Effective: Pay only for compute and storage, no managed service markup
Operational Requirements:
- Kubernetes expertise (EKS/GKE/AKS)
- Redis cluster management (ElastiCache/Memorystore)
- Monitoring and alerting setup (Prometheus/Grafana)
- CI/CD pipeline for feature deployment
Cost Structure:
- EKS/GKE: ~$0.10/hour per cluster + worker nodes
- Redis: $0.50-2.00/hour depending on size
- Storage: S3/GCS standard rates
- Total: Typically 40-60% cheaper than managed alternatives
Critical Success Factors:
- Robust connection pooling for Redis
- Horizontal pod autoscaling for Feast server
- Registry caching to minimize S3 calls
- Comprehensive monitoring and alerting
- GitOps workflow for feature definitions
- Regular disaster recovery testing
Trade-Offs:
- ✓ Full control and flexibility
- ✓ Multi-cloud portability
- ✓ Lower cost at scale
- ✗ Higher operational burden
- ✗ Requires Kubernetes expertise
- ✗ No managed support SLA
Feast is the right choice for mature engineering organizations that value control and cost efficiency over operational simplicity. For teams without Kubernetes expertise or those wanting to move fast, managed solutions (SageMaker, Vertex AI) remain compelling alternatives.
In the next chapter, we move from feature management to model training orchestration, exploring Kubeflow Pipelines and SageMaker Pipelines for reproducible, scalable training workflows.