Chapter 11: The Feature Store Architecture
11.3. GCP Implementation: Vertex AI Feature Store
“Data gravity is the only law of physics that applies to software. Where your data rests, there your applications must reside.”
If the AWS SageMaker Feature Store is a lesson in managed persistence (using DynamoDB and S3), the Google Cloud Platform (GCP) Vertex AI Feature Store is a masterclass in virtualization.
In 2023, Google fundamentally re-architected this service. They deprecated the “Legacy” Feature Store (which required expensive data copying and proprietary ingestion jobs) and released the “Next Generation” Feature Store.
The architectural philosophy of the modern Vertex AI Feature Store is radical: The Feature Store is just a metadata layer over BigQuery.
There is no separate “Offline Store” storage pricing. There are no proprietary ingestion pipelines to maintain for offline data. If your data is in BigQuery, it is already in the Feature Store. This “Zero-Copy” architecture eliminates the single biggest source of technical debt in ML data systems: the synchronization lag between the data warehouse and the ML platform.
For the Architect operating on GCP, understanding this paradigm shift is critical. It transforms the Feature Store from a storage bucket into a compute engine that manages the flow of data from BigQuery (Analytical) to Bigtable (Transactional).
11.3.1. The “Zero-Copy” Architecture
In traditional Feature Store designs (including Feast and AWS), the workflow is:
- Source: Data lands in a Data Lake.
- Ingest: A Spark job copies data into the Offline Store (Parquet/Iceberg).
- Sync: Another job copies data into the Online Store (Redis/DynamoDB).
This creates synchronization lag: the Offline Store is always $N$ hours behind the Source.
The GCP “BigQuery Backend” Approach
In Vertex AI, the “Offline Store” is BigQuery.
- Source: You create a standard BigQuery Table or View containing your features.
- Register: You define a FeatureView resource that points to that BigQuery query.
- Sync: The Feature Store manages the materialization of that query into the Online Store (Bigtable) for low-latency serving.
This inversion of control implies that Feature Engineering is SQL Engineering. If you can write a SQL query to calculate a rolling average, you have created a feature. You do not need Python implementations of your features for the offline path.
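To make this concrete, here is a minimal sketch of a rolling-spend feature defined entirely in SQL. The source names (raw.orders, order_value, order_ts) are illustrative:
-- A 30-day rolling-spend feature, defined purely in SQL
CREATE OR REPLACE VIEW `feature_engineering.user_spend_features` AS
SELECT
  user_id AS entity_id,
  CURRENT_TIMESTAMP() AS feature_timestamp,
  SUM(order_value) AS rolling_spend  -- 30-day window
FROM `raw.orders`
WHERE order_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY user_id;
Pointing a FeatureView at this view is all it takes to make rolling_spend servable online; no Python appears anywhere in the offline path.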
Architectural Components
- FeatureOnlineStore: The container resource. This dictates the infrastructure used for low-latency serving. It can be backed by Bigtable (for high-throughput auto-scaling) or an Optimized Serving endpoint (for ultra-low latency embedding retrieval).
- FeatureView: The logical definition. It maps a BigQuery source (Table or View) to the Online Store. It defines what data to sync and when.
- BigQuery Source: The source of truth. All historical retrieval, point-in-time joins, and batch training data generation happen here using standard BigQuery SQL.
11.3.2. Offline Store: The BigQuery Foundation
Since the Offline Store is BigQuery, optimizing the Feature Store effectively means optimizing BigQuery. A poorly optimized BigQuery table will lead to slow training jobs and massive slot-consumption bills.
Partitioning and Clustering Strategy
To support efficient Point-in-Time (PIT) lookups, your source tables must be structurally optimized.
1. The Timestamp Requirement
Every source table must have a timestamp column. This is not optional. It represents the “Event Time”—the moment the feature value became known.
2. Partitioning by Time
You must partition the BigQuery table by the event-time column.
- Why: When generating training data for “Last Month,” you do not want to scan “All History.” Partitioning allows the BigQuery engine to prune partitions.
3. Clustering by Entity ID
You must cluster the table by the Entity ID (e.g., user_id, product_id).
- Why: A Feature Store lookup retrieves specific entities. Clustering colocates the data for user_123 in the same storage blocks.
SQL DDL Example: Optimized Source Table
CREATE TABLE `my_project.feature_engineering.user_features_v1`
(
entity_id STRING NOT NULL,
feature_timestamp TIMESTAMP NOT NULL,
total_spend_30d FLOAT64,
last_category_viewed STRING,
embedding ARRAY<FLOAT64>
)
PARTITION BY DATE(feature_timestamp)
CLUSTER BY entity_id
OPTIONS(
description="Precomputed user activity features, updated hourly via Dataflow"
);
Feature Views: Tables vs. SQL Views
Vertex AI allows a FeatureView to point to a static Table or a logical SQL View.
- Pointing to a Table: High performance. The sync process simply reads rows. Best for features calculated by upstream ETL pipelines (dbt, Dataform, Dataflow).
- Pointing to a View: High agility. You can change the logic of the feature (e.g., the window of a rolling average) by updating the SQL View, without moving data (see the sketch after this list).
- Risk: If the SQL View is complex (multiple JOINs), the sync process to the Online Store will be slow and expensive.
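For instance, narrowing the window from 30 to 14 days is a one-line change to the view from the earlier sketch (same illustrative names); no rows move and no backfill job runs:
CREATE OR REPLACE VIEW `feature_engineering.user_spend_features` AS
SELECT
  user_id AS entity_id,
  CURRENT_TIMESTAMP() AS feature_timestamp,
  SUM(order_value) AS rolling_spend  -- same feature name, new window
FROM `raw.orders`
WHERE order_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 14 DAY)  -- was 30
GROUP BY user_id;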
11.3.3. Online Store: The Serving Layer Options
When provisioning the FeatureOnlineStore, GCP offers two distinct backends. This is a critical architectural decision based on your latency and QPS (Queries Per Second) requirements.
Option A: Bigtable Online Store
This uses Cloud Bigtable under the hood. It is designed for tabular data (Key-Value lookups).
- Scalability: Linear. You can add nodes to handle millions of QPS.
- Latency: Single-digit milliseconds (p95 ~5-10ms).
- Use Case: Recommendation systems, Fraud detection, dynamic pricing.
- Auto-scaling: Supports CPU-based auto-scaling of the underlying Bigtable nodes.
Option B: Optimized Online Store (PSC)
This is a fully managed service where Google manages the infrastructure completely. It exposes a public endpoint or a Private Service Connect (PSC) endpoint.
- Capabilities: Supports Vector Search (Approximate Nearest Neighbor) natively.
- Latency: Ultra-low latency (p95 < 5ms).
- Use Case: RAG (Retrieval Augmented Generation), Semantic Search, Real-time embedding retrieval.
- Constraint: The dataset size must fit in the memory provisioning of the nodes.
Terraform Implementation: Provisioning the Store
Unlike AWS, where you provision a “Feature Group,” in GCP you provision the infrastructure (Online Store) and the logic (Feature View) separately.
resource "google_vertex_ai_feature_online_store" "main" {
name = "omni_recsys_store"
project = var.project_id
location = "us-central1"
# Option A: Bigtable Backend
bigtable {
auto_scaling {
min_node_count = 1
max_node_count = 10
cpu_utilization_target = 60
}
}
}
resource "google_vertex_ai_feature_view" "user_features" {
name = "user_features_view"
location = "us-central1"
project = var.project_id
feature_online_store = google_vertex_ai_feature_online_store.main.name
big_query_source {
uri = "bq://my_project.feature_engineering.user_features_v1"
entity_id_columns = ["entity_id"]
}
sync_config {
cron = "0 * * * *" # Sync hourly
}
}
11.3.4. Point-in-Time Correctness (The ASOF JOIN)
The most complex mathematical operation in any Feature Store is generating a historical training dataset without Data Leakage.
The Problem
Imagine a Fraud Model.
- Label: Transaction at 2023-10-05 14:30:00.
- Feature: num_transactions_last_hour.
If you simply join on date, you might include the fraud transaction itself in the count, or transactions that happened at 14:55:00. This leaks the future into the past. The model will learn a correlation that doesn’t exist at inference time.
You need the value of num_transactions_last_hour known exactly at 2023-10-05 14:29:59.
The BigQuery Solution
Vertex AI Feature Store leverages BigQuery’s ability to perform efficient ASOF JOIN logic. When you use the SDK to batch_serve_to_bq, it generates a complex SQL query under the hood.
For architects building custom SQL pipelines, the logic looks like this using BigQuery’s window functions:
/*
  Manual implementation of Point-in-Time correctness.
  Use this if you are bypassing the Vertex SDK for custom ETL.
*/
WITH observation_data AS (
  -- Your labels (e.g., the transaction log)
  SELECT user_id, transaction_time, is_fraud
  FROM `raw.transactions`
),
feature_history AS (
  -- Your feature updates
  SELECT user_id, feature_timestamp, account_balance
  FROM `features.user_balance_updates`
)
SELECT
  obs.user_id,
  obs.transaction_time,
  obs.is_fraud,
  -- The last known balance strictly BEFORE the transaction
  fh.account_balance AS pit_account_balance
FROM observation_data obs
LEFT JOIN feature_history fh
  ON fh.user_id = obs.user_id
  AND fh.feature_timestamp < obs.transaction_time
WHERE TRUE  -- BigQuery requires WHERE/GROUP BY/HAVING alongside QUALIFY
-- Keep only the most recent feature row per observation
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY obs.user_id, obs.transaction_time
  ORDER BY fh.feature_timestamp DESC
) = 1;
The Vertex AI SDK abstracts this: it takes a list of entities and timestamps, creates a temporary table in BigQuery, performs the join, and exports the result.
Architectural Tip: For massive datasets (billions of rows), the ASOF JOIN can be computationally expensive. Ensure your feature tables are clustered by Entity ID to prevent BigQuery from shuffling petabytes of data during this join.
11.3.5. Streaming Ingestion: The Real-Time Path
For features that change second-by-second (e.g., “Clicks in the last minute”), the scheduled BigQuery sync is too slow. We need a streaming path.
In the Next Gen architecture, streaming is treated as High-Frequency BigQuery Ingestion.
The Pipeline Topology
- Source: Application emits events to Cloud Pub/Sub.
- Processing: Cloud Dataflow (Apache Beam) aggregates the stream (e.g., tumbling window count).
- Storage: Dataflow writes to BigQuery using the Storage Write API.
- Sync: The Feature Online Store is configured to sync continuously or the application writes directly to the Online Store (if using the Optimized backend).
Wait, does it write to BigQuery or the Online Store?
In the purest Next-Gen implementation, you write to BigQuery. Why? Because the Feature Store sync process monitors the BigQuery table. However, there is a latency lag (minutes).
Ultra-Low Latency Streaming (The “write-back” pattern)
If you need sub-second freshness (data available for inference 100ms after the event), you cannot wait for the BigQuery sync.
You must Dual-Write:
- Dataflow writes to BigQuery (for offline training/logging).
- Dataflow writes directly to the FeatureOnlineStore serving endpoint using the write_feature_values API.
# Python snippet for real-time feature injection (Serving path)
from google.cloud import aiplatform
aiplatform.init(location="us-central1")
my_store = aiplatform.FeatureOnlineStore("omni_recsys_store")
my_view = my_store.feature_views["user_features_view"]
# Write immediately to the online serving layer
my_view.write_feature_values(
entity_id="user_123",
feature_values={
"click_count_1min": 42,
"last_click_ts": "2023-10-27T10:00:01Z"
}
)
Warning: This creates a potential Training-Serving Skew. If your Dataflow logic for writing to the Online Store differs slightly from the logic writing to BigQuery (or if one fails), your inference data will diverge from your training data.
11.3.6. Vector Embeddings and RAG Integration
GCP treats Vector Embeddings as first-class citizens in the Feature Store. This is a significant differentiator from AWS (where vectors are often relegated to OpenSearch).
Structuring the Embedding Feature
In BigQuery, an embedding is just an ARRAY<FLOAT64>.
CREATE TABLE `features.product_embeddings` (
product_id STRING,
feature_timestamp TIMESTAMP,
description_embedding ARRAY<FLOAT64>, -- 768-dim vector
category STRING
)
Configuring Vector Search
When defining the FeatureView, you enable embedding management. The Feature Store will automatically index these vectors using ScaNN (Scalable Nearest Neighbors), the same algorithm powering Google Search.
feature_view = my_store.create_feature_view(
name="product_embedding_view",
source=bigquery_source,
# Enable Vector Search
embedding_management_config=aiplatform.gapic.EmbeddingManagementConfig(
enabled=True,
dimension=768
)
)
The Retrieval Workflow
At inference time, you can query by ID or by Vector similarity.
- Fetch: get_feature_values("product_55") -> returns the vector.
- Search: search_nearest_entities(embedding=[0.1, ...]) -> returns similar product IDs.
This unifies the Feature Store and the Vector Database into a single architectural component. For RAG architectures, this simplifies the stack immensely: the same system that provides the LLM with context (features) also performs the retrieval.
11.3.7. The Sync: Online/Offline Consistency Management
The synchronization process (“Materialization”) is the heartbeat of the system.
The cron Schedule
You define a sync schedule (e.g., hourly, daily).
- Full Sync: Overwrites the entire Online Store with the latest snapshot from BigQuery. Safe but expensive for large tables.
- Delta Sync: Since BigQuery tables are partitioned, the Feature Store engine is smart enough to query only the partitions modified since the last sync (sketched below).
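Conceptually, a delta sync reads only the partitions written since the last sync watermark. An illustrative form of that query against the table from 11.3.2 (not the literal query the service generates):
SELECT entity_id, feature_timestamp, total_spend_30d, last_category_viewed
FROM `my_project.feature_engineering.user_features_v1`
-- Only partitions touched since the last sync (watermark is illustrative)
WHERE DATE(feature_timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY);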
Monitoring the Sync
Sync jobs are standard GCP operations. You must monitor them via Cloud Logging.
Key Metric: aiplatform.googleapis.com/feature_view/sync/latency
If this spikes, your BigQuery table might be growing too large, or your BigQuery slots are exhausted.
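To check whether slot exhaustion is the culprit, one option is to inspect recent slot consumption through INFORMATION_SCHEMA (the region qualifier is illustrative; adjust to your location):
SELECT
  TIMESTAMP_TRUNC(period_start, HOUR) AS hour,
  -- period_slot_ms summed per hour, divided by ms-per-hour = average slots in use
  SUM(period_slot_ms) / (1000 * 60 * 60) AS avg_slots_used
FROM `region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT
WHERE period_start >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
GROUP BY hour
ORDER BY hour;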
Handling “Ghosts” (Deletions)
If a user is deleted from BigQuery, do they disappear from the Online Store?
- Full Sync: Yes, eventually.
- Delta Sync: No. The deletion in BigQuery is a “state of absence.” The Feature Store needs an explicit signal.
- Mitigation: You must handle TTL (Time To Live) in the Online Store configuration, or explicitly write “tombstone” records if using the write-back API (see the sketch below).
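A sketch of the tombstone idea on the SQL side, assuming a hypothetical raw.deleted_users table; the next sync then overwrites the stale online values with NULLs:
-- Emit explicit "no value" rows for deleted users (illustrative)
INSERT INTO `my_project.feature_engineering.user_features_v1`
  (entity_id, feature_timestamp, total_spend_30d, last_category_viewed)
SELECT
  d.user_id,
  CURRENT_TIMESTAMP(),
  NULL,
  NULL
FROM `raw.deleted_users` AS d;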
11.3.8. Performance Tuning & Cost (FinOps)
The Vertex AI Feature Store billing model is composite. You pay for:
- BigQuery Storage & Compute: Storing the offline features and running the sync queries.
- Feature Store Node Allocation: The hourly cost of the Online Store nodes (Bigtable or Optimized).
- Data Processing: Costs associated with syncing.
The BigQuery Trap
The sync process runs a SQL query. If your feature view is SELECT * FROM huge_table, and you run it every 5 minutes, you will burn through thousands of BigQuery slots.
Optimization 1: Projected Columns
Select only the columns you actually need for inference.
-- Bad
SELECT * FROM users;
-- Good
SELECT user_id, timestamp, age, geo_hash FROM users;
Optimization 2: Sync Frequency vs. Freshness
Do not sync hourly if the features only change weekly. Align the sync schedule with the upstream ETL schedule: if dbt runs at 2 AM, schedule the Feature Store sync for 3 AM.
Feature Store Node Sizing
For the Bigtable backend:
- As a rule of thumb, start with 1 node per 10,000 QPS (highly dependent on payload size; this matches the sizing function in 11.3.15).
- Use standard Bigtable monitoring (CPU utilization) to tune.
- Enable autoscaling. Set min_node_count to cover your baseline traffic and max_node_count to handle marketing spikes.
For the Optimized backend:
- Pricing is based on node hours and data volume.
- Since data is loaded into memory, cost scales with dataset size.
- The trade-off: If you have 10TB of sparse features, Bigtable is cheaper. If you have 10GB of dense embeddings requiring vector search, Optimized is better.
11.3.9. Comparison: AWS vs. GCP
To close the chapter, let’s contrast the two major cloud approaches.
| Feature | AWS SageMaker Feature Store | GCP Vertex AI Feature Store (Next Gen) |
|---|---|---|
| Offline Storage | S3 (Iceberg/Parquet) | BigQuery |
| Online Storage | DynamoDB (Managed) | Bigtable or Optimized Memory |
| Ingestion | PutRecord API (Push) | SQL Sync (Pull) or Streaming |
| Point-in-Time | Requires Spark/Athena processing | Native SQL (ASOF JOIN) |
| Vector Search | Via OpenSearch integration | Native (ScaNN) |
| Philosophy | Storage Container | Data Virtualization |
| Latency | Low (DynamoDB speeds) | Low (Bigtable speeds) |
| DevEx | Python/Boto3 heavy | SQL/Terraform heavy |
The Verdict for the Architect:
- Choose AWS if you want granular control over storage files (S3) and a unified Python SDK experience.
- Choose GCP if you are already heavily invested in BigQuery. The integration is seamless and significantly reduces the code footprint required to move data from warehouse to production.
11.3.10. Real-World Case Study: Recommendation System at Scale
Company: StreamFlix (anonymized video streaming platform)
Challenge: Personalized recommendations for 50M users, <50ms p99 latency, 100k recommendations/second peak.
Initial Architecture (Problems):
# Problem: Separate data warehouse and feature store
# Daily ETL: BigQuery → CSV → GCS → Feature Store (12 hour lag)
def daily_feature_sync():
"""Legacy approach with massive lag"""
# 1. Export from BigQuery (slow)
bq_client.extract_table(
'analytics.user_features',
'gs://exports/features.csv'
) # 2 hours for 50M rows
# 2. Transform CSV (slow)
df = pd.read_csv('gs://exports/features.csv') # 1 hour
# 3. Upload to Feature Store (slow)
for _, row in df.iterrows():
feature_store.write(row) # 8 hours for 50M rows
# Total: 11 hours lag
# Problem: User watched a show at 9am, recommendation still uses yesterday's data at 8pm
Optimized Architecture (BigQuery Native):
-- Step 1: Create optimized feature table directly in BigQuery
CREATE OR REPLACE TABLE `feature_engineering.user_viewing_features`
PARTITION BY DATE(feature_timestamp)
CLUSTER BY user_id
AS
SELECT
user_id,
CURRENT_TIMESTAMP() as feature_timestamp,
-- Viewing features (last 7 days)
COUNT(DISTINCT show_id) as shows_watched_7d,
SUM(watch_duration_sec) / 3600.0 as hours_watched_7d,
-- Genre preferences
APPROX_TOP_COUNT(genre, 5) as top_genres,
  -- Time-of-day preference: the modal slot across the window
  -- (the CASE must live inside an aggregate, since we GROUP BY user_id)
  APPROX_TOP_COUNT(
    CASE
      WHEN EXTRACT(HOUR FROM watch_timestamp) BETWEEN 6 AND 11 THEN 'morning'
      WHEN EXTRACT(HOUR FROM watch_timestamp) BETWEEN 12 AND 17 THEN 'afternoon'
      WHEN EXTRACT(HOUR FROM watch_timestamp) BETWEEN 18 AND 23 THEN 'evening'
      ELSE 'night'
    END,
    1
  )[OFFSET(0)].value AS preferred_time_slot,
  -- Engagement metrics (BigQuery uses COUNTIF, not FILTER (WHERE ...))
  AVG(rating) AS avg_rating,
  COUNTIF(completed) / COUNT(*) AS completion_rate,
-- Recency
TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MAX(watch_timestamp), HOUR) as hours_since_last_watch
FROM
`raw.viewing_events`
WHERE
watch_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY
user_id;
-- Step 2: Create Feature View pointing to this table
# Python: Register with Vertex AI Feature Store
from google.cloud import aiplatform
aiplatform.init(project='streamflix-prod', location='us-central1')
# Create Feature Online Store (Bigtable backend)
online_store = aiplatform.FeatureOnlineStore.create(
name='streaming-recommendations',
bigtable=aiplatform.gapic.FeatureOnlineStore.Bigtable(
auto_scaling=aiplatform.gapic.FeatureOnlineStore.Bigtable.AutoScaling(
min_node_count=10,
max_node_count=100,
cpu_utilization_target=70
)
)
)
# Create Feature View
feature_view = online_store.create_feature_view(
name='user_viewing_features',
big_query_source=aiplatform.gapic.FeatureView.BigQuerySource(
uri='bq://streamflix-prod.feature_engineering.user_viewing_features',
entity_id_columns=['user_id']
),
sync_config=aiplatform.gapic.FeatureView.SyncConfig(
cron='*/15 * * * *' # Sync every 15 minutes
)
)
Results:
- Freshness: 12 hours → 15 minutes (98% improvement)
- Infrastructure cost: $45k/month → $18k/month (60% reduction, no ETL jobs)
- Recommendation CTR: 12.3% → 14.8% (fresher data = better recs)
- Development velocity: Feature deployment 3 days → 4 hours (just update SQL)
Key Lesson: By treating BigQuery as the Feature Store, StreamFlix eliminated the entire ETL pipeline and its associated lag.
11.3.11. Advanced Pattern: Real-Time Feature Augmentation
For features that change second-by-second, batch sync isn’t enough:
import json
import logging
from datetime import datetime

import apache_beam as beam
from apache_beam import DoFn, Pipeline
from apache_beam.options.pipeline_options import PipelineOptions
from google.cloud import aiplatform, bigquery
class ComputeRealTimeFeatures(DoFn):
"""
Dataflow pipeline: Pub/Sub → Features → Dual Write
"""
def setup(self):
self.bq_client = bigquery.Client()
self.feature_store = aiplatform.FeatureOnlineStore('streaming-recommendations')
self.feature_view = self.feature_store.get_feature_view('user_realtime_features')
def process(self, element):
"""Process each event"""
event = json.loads(element)
user_id = event['user_id']
# Compute windowed features (last 5 minutes)
features = self.compute_windowed_features(user_id, window_minutes=5)
# Dual write pattern
# 1. Write to BigQuery (source of truth, for training)
self.bq_client.insert_rows_json(
'feature_engineering.user_realtime_features',
[{
'user_id': user_id,
'feature_timestamp': datetime.now().isoformat(),
'clicks_last_5min': features['clicks_last_5min'],
'watches_last_5min': features['watches_last_5min']
}]
)
# 2. Write to Online Store (for inference)
self.feature_view.write_feature_values(
entity_id=user_id,
feature_values=features
)
return [features]
# Define Dataflow pipeline
def run_streaming_pipeline():
options = PipelineOptions(
project='streamflix-prod',
runner='DataflowRunner',
streaming=True,
region='us-central1',
temp_location='gs://temp/dataflow'
)
with Pipeline(options=options) as pipeline:
(pipeline
| 'Read from Pub/Sub' >> beam.io.ReadFromPubSub(
topic='projects/streamflix-prod/topics/user-events'
)
| 'Compute Features' >> beam.ParDo(ComputeRealTimeFeatures())
| 'Log Results' >> beam.Map(lambda x: logging.info(f"Processed: {x}"))
)
11.3.12. Cost Optimization: BigQuery Slot Management
BigQuery costs can explode if not managed carefully:
Problem: Uncapped Slot Usage
-- This query scans 500GB and uses 2000 slots for 30 minutes
-- Cost: 500GB * $5/TB = $2.50 (scan) + slot time
-- But if run hourly: $2.50 * 24 * 30 = $1,800/month just for this one query!
SELECT
user_id,
COUNT(*) as events,
-- Expensive: Full table scan without partition filter
AVG(watch_duration) as avg_duration
FROM `raw.viewing_events`
GROUP BY user_id;
Solution 1: Partition Pruning
-- Optimized: Only scan last 7 days
SELECT
user_id,
COUNT(*) as events,
AVG(watch_duration) as avg_duration
FROM `raw.viewing_events`
WHERE DATE(event_timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) -- Partition filter!
GROUP BY user_id;
-- Cost: 7 days of data = ~50GB * $5/TB = $0.25 (92% savings)
Solution 2: Materialized Views
-- Create a materialized view (incrementally refreshed by BigQuery).
-- Note: non-deterministic functions such as CURRENT_TIMESTAMP() or
-- CURRENT_DATE() are not allowed inside a materialized view, so the
-- freshness window is applied at query time instead.
CREATE MATERIALIZED VIEW `feature_engineering.user_stats_mv`
PARTITION BY event_date
AS
SELECT
  user_id,
  DATE(event_timestamp) AS event_date,
  COUNT(*) AS total_events,
  SUM(watch_duration) AS total_duration
FROM `raw.viewing_events`
GROUP BY user_id, event_date;
-- Query the MV (automatically maintained by BigQuery),
-- applying the 7-day window at read time
SELECT
  user_id,
  SUM(total_events) AS total_events,
  SUM(total_duration) / SUM(total_events) AS avg_duration
FROM `feature_engineering.user_stats_mv`
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
GROUP BY user_id;
-- Cost: you pay only for incremental MV refreshes, not full recomputes
-- Savings: ~95% compared to rerunning the full query
Solution 3: Slot Reservations
# Reserve slots for predictable cost
from google.cloud import bigquery_reservation_v1

client = bigquery_reservation_v1.ReservationServiceClient()

# Create reservation: 1000 slots at $0.04/slot-hour
# (the reservation ID is passed separately, not via Reservation.name)
reservation = client.create_reservation(
    parent='projects/streamflix-prod/locations/us-central1',
    reservation_id='ml-feature-store',
    reservation=bigquery_reservation_v1.Reservation(
        slot_capacity=1000,
        ignore_idle_slots=False
    )
)
# Assign to specific project
assignment = client.create_assignment(
parent=reservation.name,
assignment=bigquery_reservation_v1.Assignment(
job_type='QUERY',
assignee='projects/streamflix-prod'
)
)
# Cost: 1000 slots * $0.04/hr * 730 hrs/month = $29,200/month (flat rate)
# Compare to on-demand spikes: $50k-80k/month
# Savings: ~$25k/month with predictable billing
11.3.13. Monitoring and Alerting
Custom Metrics for Feature Store Health:
import time

from google.cloud import bigquery, monitoring_v3

bq_client = bigquery.Client()

def publish_feature_metrics():
    """
    Publish custom Feature Store health metrics to Cloud Monitoring.
    """
    client = monitoring_v3.MetricServiceClient()
    project_name = "projects/streamflix-prod"

    # Metric 1: Feature freshness (worst-case staleness, in minutes)
    freshness_query = """
        SELECT
          IFNULL(
            TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MIN(feature_timestamp), MINUTE),
            0) AS staleness_minutes
        FROM `feature_engineering.user_viewing_features`
    """
    result = bq_client.query(freshness_query).result()
    max_staleness = list(result)[0].staleness_minutes
# Write to Cloud Monitoring
series = monitoring_v3.TimeSeries()
series.metric.type = 'custom.googleapis.com/feature_store/staleness_minutes'
series.resource.type = 'global'
point = monitoring_v3.Point()
point.value.int64_value = max_staleness
point.interval.end_time.seconds = int(time.time())
series.points = [point]
client.create_time_series(name=project_name, time_series=[series])
# Metric 2: Feature completeness
completeness_query = """
SELECT
COUNTIF(shows_watched_7d IS NOT NULL) / COUNT(*) * 100 as completeness_pct
FROM `feature_engineering.user_viewing_features`
"""
result = bq_client.query(completeness_query).result()
completeness = list(result)[0].completeness_pct
series = monitoring_v3.TimeSeries()
series.metric.type = 'custom.googleapis.com/feature_store/completeness_percent'
series.resource.type = 'global'
point = monitoring_v3.Point()
point.value.double_value = completeness
point.interval.end_time.seconds = int(time.time())
series.points = [point]
client.create_time_series(name=project_name, time_series=[series])
# Create alert policy
from google.protobuf import duration_pb2

def create_staleness_alert():
    """Alert when features go stale."""
    alert_client = monitoring_v3.AlertPolicyServiceClient()
    policy = monitoring_v3.AlertPolicy(
        display_name='Feature Store Staleness Alert',
        conditions=[
            monitoring_v3.AlertPolicy.Condition(
                display_name='Features older than 30 minutes',
                condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
                    filter='metric.type="custom.googleapis.com/feature_store/staleness_minutes"',
                    comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
                    threshold_value=30,
                    # Durations are protobuf Duration messages, not monitoring_v3 types
                    duration=duration_pb2.Duration(seconds=300)
                )
            )
        ],
        notification_channels=['projects/streamflix-prod/notificationChannels/123'],
        alert_strategy=monitoring_v3.AlertPolicy.AlertStrategy(
            auto_close=duration_pb2.Duration(seconds=3600)
        )
    )
    alert_client.create_alert_policy(
        name='projects/streamflix-prod',
        alert_policy=policy
    )
11.3.14. Advanced: Multi-Tenant Feature Store
For organizations serving multiple business units:
from google.cloud import aiplatform, bigquery

class MultiTenantFeatureStore:
    """
    Isolate features by tenant using separate BigQuery datasets.
    """
def __init__(self):
self.bq_client = bigquery.Client()
self.tenant_configs = {
'tenant_a': {
'dataset': 'features_tenant_a',
'online_store': 'tenant-a-store',
'billing_project': 'tenant-a-billing'
},
'tenant_b': {
'dataset': 'features_tenant_b',
'online_store': 'tenant-b-store',
'billing_project': 'tenant-b-billing'
}
}
def get_features(self, tenant_id, user_id, feature_list):
"""Retrieve features for specific tenant"""
config = self.tenant_configs[tenant_id]
# Query tenant-specific dataset
query = f"""
SELECT {', '.join(feature_list)}
FROM `{config['dataset']}.user_features`
WHERE user_id = @user_id
"""
        job_config = bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter('user_id', 'STRING', user_id)
            ]
        )
        # Run the job under the tenant's project so the tenant is billed
        # (QueryJobConfig.default_dataset does not control billing)
        result = self.bq_client.query(
            query,
            job_config=job_config,
            project=config['billing_project']
        ).result()
return list(result)[0] if result.total_rows > 0 else None
def create_tenant_feature_table(self, tenant_id, schema):
"""Provision feature table for new tenant"""
config = self.tenant_configs[tenant_id]
table_id = f"{config['billing_project']}.{config['dataset']}.user_features"
table = bigquery.Table(table_id, schema=schema)
table.time_partitioning = bigquery.TimePartitioning(
type_=bigquery.TimePartitioningType.DAY,
field='feature_timestamp'
)
table.clustering_fields = ['user_id']
table = self.bq_client.create_table(table)
# Create corresponding Feature View
online_store = aiplatform.FeatureOnlineStore(config['online_store'])
feature_view = online_store.create_feature_view(
name='user_features',
big_query_source=aiplatform.gapic.FeatureView.BigQuerySource(
uri=f'bq://{table_id}',
entity_id_columns=['user_id']
),
sync_config=aiplatform.gapic.FeatureView.SyncConfig(
cron='0 * * * *'
)
)
return feature_view
11.3.15. Performance Tuning
Bigtable Node Sizing:
def calculate_bigtable_nodes(qps, avg_row_size_kb, target_latency_ms):
"""
Size Bigtable cluster for Feature Store workload
"""
# Bigtable capacity: ~10k QPS per node (for < 10KB rows)
# Latency: <10ms for properly sized cluster
nodes_for_qps = qps / 10000
    # Storage consideration (if the dataset is large)
    # Bigtable: ~2.5TB SSD storage per node
    total_rows = 50_000_000  # 50M users
    total_storage_gb = (total_rows * avg_row_size_kb) / (1024 ** 2)  # KB -> GB
    nodes_for_storage = total_storage_gb / 2500  # 2.5TB per node
# Take max
required_nodes = max(nodes_for_qps, nodes_for_storage)
# Add 20% buffer
recommended_nodes = int(required_nodes * 1.2)
print(f"QPS: {qps}")
print(f"Avg row size: {avg_row_size_kb}KB")
print(f"Nodes for QPS: {nodes_for_qps:.1f}")
print(f"Nodes for storage: {nodes_for_storage:.1f}")
print(f"Recommended nodes: {recommended_nodes}")
# Cost calculation
cost_per_node_hour = 0.65 # us-central1 SSD
monthly_cost = recommended_nodes * cost_per_node_hour * 730
print(f"Estimated cost: ${monthly_cost:,.0f}/month")
return recommended_nodes
# Example
calculate_bigtable_nodes(qps=100000, avg_row_size_kb=5, target_latency_ms=10)
# Output:
# QPS: 100000
# Avg row size: 5KB
# Nodes for QPS: 10.0
# Nodes for storage: 0.1
# Recommended nodes: 12
# Estimated cost: $5,694/month
11.3.16. Disaster Recovery
from datetime import datetime, timedelta
from google.cloud import bigquery, bigquery_datatransfer

bq_client = bigquery.Client(project='streamflix-prod')

def setup_cross_region_dr():
    """
    Multi-region disaster recovery setup.
    """
    # 1. BigQuery: replicate the feature dataset to a second region
    eu_dataset = bigquery.Dataset('streamflix-prod.feature_engineering_eu')
    eu_dataset.location = 'EU'
    eu_dataset = bq_client.create_dataset(eu_dataset, exists_ok=True)

    # Schedule a daily dataset copy via the Data Transfer Service
    transfer_client = bigquery_datatransfer.DataTransferServiceClient()
    transfer_config = bigquery_datatransfer.TransferConfig(
        destination_dataset_id='feature_engineering_eu',
        display_name='Daily Feature DR Sync',
        data_source_id='cross_region_copy',
        schedule='every day 03:00',
        params={
            'source_dataset_id': 'feature_engineering',
            'source_project_id': 'streamflix-prod'
        }
    )
    transfer_config = transfer_client.create_transfer_config(
        parent='projects/streamflix-prod/locations/EU',
        transfer_config=transfer_config
    )
# 2. Bigtable: Create backup
from google.cloud import bigtable
client = bigtable.Client(project='streamflix-prod', admin=True)
instance = client.instance('feature-store-instance')
    table = instance.table('user_features')
    backup_id = f'backup-{datetime.now().strftime("%Y%m%d")}'
    expire_time = datetime.now() + timedelta(days=7)
    # Backups are created on a specific cluster within the instance
    backup = table.backup(
        backup_id,
        cluster_id='feature-store-cluster',
        expire_time=expire_time
    )
operation = backup.create()
operation.result(timeout=3600) # Wait up to 1 hour
print(f"Backup created: {backup_id}")
11.3.17. Best Practices
- Partition Everything: Always partition BigQuery tables by date
- Cluster by Entity: Cluster on user_id/entity_id for fast lookups
- Use Materialized Views: For frequently computed aggregations
- Reserve Slots: For predictable costs and guaranteed performance
- Monitor Freshness: Alert if sync lag exceeds SLA
- Dual Write Carefully: Ensure consistency between BigQuery and Online Store
- Test Point-in-Time: Verify no data leakage in training data
- Size Bigtable Properly: Don’t under-provision (latency) or over-provision (cost)
- Enable Backups: Daily Bigtable backups and cross-region BigQuery copies
- Document Schema: Every feature needs clear definition and owner
11.3.18. Troubleshooting Guide
| Issue | Symptoms | Solution |
|---|---|---|
| High BigQuery costs | Bill >$10k/month | Add partition filters, use materialized views, reserve slots |
| Stale features | Sync lag >30min | Check Dataflow pipeline, increase sync frequency |
| High Bigtable latency | p99 >50ms | Add nodes, check hotspotting, optimize row key |
| Sync failures | Features not appearing | Check service account permissions, verify BigQuery table exists |
| Out of memory | Dataflow pipeline crashes | Increase worker machine type, reduce batch size |
| Inconsistent features | Training vs inference mismatch | Verify same BigQuery query, check write_feature_values calls |
11.3.19. Exercises
Exercise 1: Cost Optimization. Analyze your current BigQuery queries (a starter query follows this list):
- Identify queries scanning >100GB
- Add partition filters
- Measure cost savings
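A possible starting point for finding the heavy hitters, assuming the region-us qualifier and a 30-day lookback (adjust both to your project):
SELECT
  job_id,
  query,
  total_bytes_processed / POW(1024, 3) AS gb_scanned
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND total_bytes_processed > 100 * POW(1024, 3)  -- more than 100 GB scanned
ORDER BY total_bytes_processed DESC;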
Exercise 2: Latency Benchmarking. Compare retrieval latency:
- Direct BigQuery query: ? ms
- Bigtable Online Store: ? ms
- Memorystore cache + Bigtable: ? ms
Exercise 3: Disaster Recovery. Implement and test:
- BigQuery cross-region copy
- Bigtable backup/restore
- Measure RTO and RPO
Exercise 4: Monitoring Dashboard. Create a Cloud Monitoring dashboard:
- Feature freshness
- BigQuery slot utilization
- Bigtable node CPU
- Sync success rate
Exercise 5: Point-in-Time Verification. Write a test ensuring (a SQL sketch follows this list):
- Training data has correct timestamps
- No future information leaks
- Features match inference
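One way to frame the leakage check in SQL, assuming your exported training table retains the feature timestamp used for each row (training.fraud_dataset, label_timestamp, and feature_timestamp are hypothetical names):
-- Expect zero rows: any hit means a future feature leaked into training
SELECT COUNT(*) AS leaked_rows
FROM `training.fraud_dataset`
WHERE feature_timestamp >= label_timestamp;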
11.3.20. Summary
The Vertex AI Feature Store represents the modern “Data-Centric AI” philosophy. By collapsing the distinction between the Data Warehouse and the ML Offline Store, it removes a massive synchronization headache.
Key Advantages:
- Zero-Copy Architecture: BigQuery IS the Offline Store
- SQL-First: Feature engineering is just SQL queries
- Native Integration: Seamless with Vertex AI Pipelines
- Flexible Storage: Bigtable (scale) or Optimized (vectors)
Cost Model:
- BigQuery: $5/TB scanned OR reserved slots
- Bigtable: $0.65/node-hour (~$474/node-month)
- Data processing: Dataflow worker costs
Best For:
- Organizations heavy on BigQuery
- SQL-proficient teams
- Need for real-time and batch features
- Vector search requirements
Challenges:
- Requires BigQuery/SQL expertise
- Slot management complexity
- Dual-write consistency for streaming
- Limited offline storage format control
Critical Success Factors:
- Partition and cluster BigQuery tables properly
- Use materialized views for expensive computations
- Reserve slots for cost predictability
- Monitor freshness continuously
- Size Bigtable appropriately for QPS
However, it shifts the complexity to SQL and BigQuery Optimization. The MLOps engineer on GCP must effectively be a DBA. They must understand partitioning, clustering, and slot utilization.
In the next chapter, we will leave the world of managed services and explore the open-source alternative: deploying Feast on Kubernetes, for those who require total control or multi-cloud portability.