39.2. Cold-Start Strategies
Status: Draft Version: 1.0.0 Tags: #RecSys, #ColdStart, #Bandits, #Rust, #Redis Author: MLOps Team
Table of Contents
- The Zero-Data Problem
- Taxonomy of Cold Start
- Technique 1: Heuristic Ladders & Onboarding
- Technique 2: Content-Based Hybrids (DropoutNet)
- Technique 3: Multi-Armed Bandits (MAB)
- Rust Implementation: Thompson Sampling Bandit
- Python Simulation: Greedy vs Thompson
- Architecture: The Dual-Track Serving Pattern
- Infrastructure: Redis State Management
- Troubleshooting: Bandit Convergence
- MLOps Interview Questions
- Glossary
- Summary Checklist
Prerequisites
Before diving into this chapter, ensure you have the following installed:
- Rust: 1.70+ (`rand`, `statrs`, `redis` crates)
- Redis: Local instance or Docker (`docker run -p 6379:6379 redis`)
- Python: For the bandit simulation comparison.
The Zero-Data Problem
Collaborative Filtering (Matrix Factorization) relies on the intersection of Users and Items. $$ \hat{r}_{ui} = \mathbf{u}_u \cdot \mathbf{v}_i $$
If User $u$ is new, $\mathbf{u}_u$ is a random vector. Prediction is garbage. If Item $i$ is new, $\mathbf{v}_i$ is a random vector. Prediction is garbage.
This is the Cold Start problem, and it is the #1 killer of new products: if a user's first 5 minutes (the onboarding session) are bad, they churn and rarely come back, which shows up directly in Day-1 Retention.
Case Study: TikTok’s New User Feed
TikTok solves the User Cold Start problem not by asking “What topics do you like?” but by Rapid Bandit Exploration.
- Immerse: Show 8 random high-quality viral videos from different clusters (Pets, Comedy, DIY, Dance).
- Measure: Track “Watch Time” and “Re-watch” signal.
- Converge: Within 5 minutes (30 videos), the bandit has narrowed the distribution to 2 clusters.
- Result: The algorithm learns the user vector $\mathbf{u}_u$ faster than the user could type “I like cats”.
Taxonomy of Cold Start
1. New User Cold Start
A user signs up. We know nothing about them.
- Data Available: IP Address (Geo), Device Type, Referral Source, Time of Day.
- Strategy: Popularity Baseline, Demographic Targeting, Onboarding Quiz.
2. New Item Cold Start
A new video is uploaded. No one has watched it.
- Data Available: Content (Video Frames, Audio, Text metadata).
- Strategy: Content-Based Filtering (Embeddings), Bandits (Explore).
3. System Cold Start
You launch a completely new app. NO users, NO interaction history.
- Strategy: Rule-based, Curated lists, “Fake it till you make it” (Simulated data).
Technique 1: Heuristic Ladders & Onboarding
Do not over-engineer ML for a user's first 10 seconds. Use a Heuristic Ladder.
The fallback logic, from most to least personalized (a Rust sketch follows the list):
- Personalized CF: Have >= 10 interactions? -> Use Deep Model.
- Near-Cold CF: Have >= 1 interaction? -> Item-to-Item Similarity on that 1 item.
- Contextual Heuristic: No history? -> “Trending in your City (GeoIP)”.
- Global Heuristic: Geo lookup failed? -> “Trending Globally (Last 1hr)”.
- Fail-Safe: Redis down? -> Hardcoded “Editor’s Picks”.
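A minimal sketch of this ladder as a plain fallback chain in Rust. The type and function names here (`RecContext`, `recommend_with_fallback`, the retrieval stubs) are hypothetical and stand in for your real serving backends:

```rust
/// Hypothetical per-request context for the heuristic ladder.
pub struct RecContext {
    pub interaction_count: u64,
    pub geo_city: Option<String>,
    pub redis_alive: bool,
}

/// Walk the ladder from most to least personalized.
pub fn recommend_with_fallback(ctx: &RecContext) -> Vec<String> {
    if !ctx.redis_alive {
        return editors_picks(); // Fail-safe: hardcoded list
    }
    if ctx.interaction_count >= 10 {
        return deep_model_recs(ctx); // Personalized CF
    }
    if ctx.interaction_count >= 1 {
        return item_to_item_recs(ctx); // Near-cold: similarity on the seed item
    }
    match &ctx.geo_city {
        Some(city) => trending_in_city(city), // Contextual heuristic (GeoIP)
        None => trending_global_last_hour(),  // Global heuristic
    }
}

// Stubs standing in for the real retrieval backends.
fn editors_picks() -> Vec<String> { vec!["editorial_1".into(), "editorial_2".into()] }
fn deep_model_recs(_ctx: &RecContext) -> Vec<String> { vec![] }
fn item_to_item_recs(_ctx: &RecContext) -> Vec<String> { vec![] }
fn trending_in_city(_city: &str) -> Vec<String> { vec![] }
fn trending_global_last_hour() -> Vec<String> { vec![] }
```

The key property is that every branch returns something; the ladder should never produce an empty response.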
The Onboarding Quiz
“Select 3 topics you like.” This seeds the user vector $\mathbf{u}_{new}$ with the average of the selected topics’ centroids: $$ \mathbf{u}_{new} = \frac{1}{|S|} \sum_{topic \in S} \mathbf{v}_{topic} $$
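A small sketch of this seeding step, assuming topic centroids are dense `Vec<f32>` vectors of equal dimension (the function name is illustrative):

```rust
/// Seed a new user's vector as the mean of the selected topics' centroids.
/// Returns None if no topics were selected or if dimensions mismatch.
pub fn seed_user_vector(selected_centroids: &[Vec<f32>]) -> Option<Vec<f32>> {
    let dim = selected_centroids.first()?.len();
    let mut acc = vec![0.0f32; dim];
    for centroid in selected_centroids {
        if centroid.len() != dim {
            return None;
        }
        for (a, c) in acc.iter_mut().zip(centroid) {
            *a += *c;
        }
    }
    let n = selected_centroids.len() as f32;
    Some(acc.into_iter().map(|x| x / n).collect())
}
```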
Technique 2: Content-Based Hybrids (DropoutNet)
Standard Deep Recommendation models (Two-Tower) learn embeddings for UserID and ItemID.
For new items, the ItemID embedding was never trained, so it is useless.
We must rely on Content Embeddings (BERT for text, ResNet for images).
DropoutNet Trick: During training, we simulate cold start (a toy sketch of the masking follows this list).
- For a batch of interactions, randomly “dropout” the input User/Item ID embeddings.
- Force the network to rely only on the Content Embeddings for those samples.
- Inference:
- Warm Item: Use ID Embedding + Content Embedding.
- Cold Item: Use Content Embedding (Network is robust to missing ID).
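A toy sketch of the masking idea only, not a full training loop: with probability `p_dropout` the ID embedding is zeroed, so the network has to lean on the content embedding. The concatenation and names are assumptions for illustration; a real Two-Tower model would combine the parts inside the tower:

```rust
use rand::Rng;

/// Build the item input vector for one training sample.
/// With probability `p_dropout`, pretend the item is cold: zero out the ID embedding.
pub fn item_input(
    id_emb: &[f32],
    content_emb: &[f32],
    p_dropout: f64,
    rng: &mut impl Rng,
) -> Vec<f32> {
    let id_part: Vec<f32> = if rng.gen_bool(p_dropout) {
        vec![0.0; id_emb.len()] // simulate a cold item
    } else {
        id_emb.to_vec()
    };
    // Simple concatenation; a real model might sum or gate the two parts.
    id_part.iter().chain(content_emb.iter()).copied().collect()
}

/// At inference time, a cold item uses a zero ID embedding plus its content embedding.
pub fn cold_item_input(id_dim: usize, content_emb: &[f32]) -> Vec<f32> {
    std::iter::repeat(0.0f32)
        .take(id_dim)
        .chain(content_emb.iter().copied())
        .collect()
}
```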
Technique 3: Multi-Armed Bandits (MAB)
We treat new items as slot machines: we want to find the “winning” items (high CTR) quickly, while minimizing the cost of showing “losing” items.
Algorithms
- Epsilon-Greedy: 10% of time, show random new item. Slow convergence.
- UCB1 (Upper Confidence Bound): score each arm as $\mu_i + \sqrt{\frac{2 \ln N}{n_i}}$, where $\mu_i$ is the arm's observed mean reward, $n_i$ its pull count, and $N$ the total number of pulls. Optimism in the face of uncertainty: arms with less data (higher variance) get a larger bonus (see the sketch after this list).
- Thompson Sampling: Sample from the posterior distribution. State-of-the-art for production.
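For contrast with the Thompson sampler implemented below, here is a minimal UCB1 score for a single arm (the function name is illustrative; `total_pulls` is $N$):

```rust
/// UCB1 score: mean reward plus an optimism bonus that shrinks as the arm gets pulled.
/// Unpulled arms score +infinity so they are always tried at least once.
pub fn ucb1_score(clicks: u64, pulls: u64, total_pulls: u64) -> f64 {
    if pulls == 0 {
        return f64::INFINITY;
    }
    let mean = clicks as f64 / pulls as f64;
    let bonus = (2.0 * (total_pulls as f64).ln() / pulls as f64).sqrt();
    mean + bonus
}
```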
Rust Implementation: Thompson Sampling Bandit
We model the Click-Through Rate (CTR) of an item as a Beta Distribution ($\alpha, \beta$).
- $\alpha$: Successes (Clicks) + 1
- $\beta$: Failures (No-Clicks) + 1
- Mean: $\frac{\alpha}{\alpha + \beta}$
For each request, we sample a value from $Beta(\alpha_i, \beta_i)$ for every candidate item, and sort by the sampled value. Items with less data have wider distributions, so they have a chance to sample a high value (Exploration).
Project Structure
bandit-server/
├── Cargo.toml
└── src/
└── lib.rs
Cargo.toml:
[package]
name = "bandit-server"
version = "0.1.0"
edition = "2021"
[dependencies]
rand = "0.8"
rand_distr = "0.4" # Contains Beta distribution
statrs = "0.16" # Statistics
redis = "0.23" # State management
serde = { version = "1.0", features = ["derive"] }
src/lib.rs:
```rust
//! Thompson Sampling Bandit implementation.
//! Provides a stateful abstraction for managing exploration/exploitation.

use rand::distributions::Distribution;
use rand_distr::Beta;
use std::collections::HashMap;

/// Represents a single arm (Item) in the bandit.
#[derive(Clone, Debug)]
pub struct BanditArm {
    pub id: String,
    pub clicks: u64,
    pub impressions: u64,
}

impl BanditArm {
    pub fn new(id: &str) -> Self {
        Self {
            id: id.to_string(),
            clicks: 0,
            impressions: 0,
        }
    }

    /// Sample a score from the Beta posterior distribution.
    /// Beta(alpha, beta) where alpha = clicks + 1, beta = misses + 1.
    pub fn sample_score(&self) -> f64 {
        // Add 1.0 prior for Laplace smoothing.
        let alpha = 1.0 + self.clicks as f64;
        let beta_param = 1.0 + (self.impressions - self.clicks) as f64;
        let beta_dist = Beta::new(alpha, beta_param).unwrap();
        let mut rng = rand::thread_rng();
        // This is the core magic of Thompson Sampling:
        // with little data the variance is high, so we might sample a high score;
        // with lots of data and a low CTR, the samples will be consistently low.
        beta_dist.sample(&mut rng)
    }

    pub fn update(&mut self, clicked: bool) {
        self.impressions += 1;
        if clicked {
            self.clicks += 1;
        }
    }
}

pub struct ThompsonSampler {
    arms: HashMap<String, BanditArm>,
}

impl ThompsonSampler {
    pub fn new() -> Self {
        Self { arms: HashMap::new() }
    }

    pub fn add_arm(&mut self, id: &str) {
        self.arms.insert(id.to_string(), BanditArm::new(id));
    }

    /// Record feedback for an arm (no-op if the arm is unknown).
    pub fn update(&mut self, id: &str, clicked: bool) {
        if let Some(arm) = self.arms.get_mut(id) {
            arm.update(clicked);
        }
    }

    /// Select the best arm to show by sampling all posteriors.
    /// O(N) per request. For large N, use Hierarchical Bandits.
    pub fn select_arm(&self) -> Option<String> {
        if self.arms.is_empty() {
            return None;
        }
        let mut best_arm = None;
        let mut best_score = -1.0;
        for (id, arm) in &self.arms {
            let score = arm.sample_score();
            if score > best_score {
                best_score = score;
                best_arm = Some(id.clone());
            }
        }
        best_arm
    }
}
```
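A quick usage sketch of the sampler above, assuming the crate is available as `bandit_server` and `rand` is a dependency; the click feedback is simulated here, whereas production updates would come from the click log:

```rust
use bandit_server::ThompsonSampler;

fn main() {
    let mut sampler = ThompsonSampler::new();
    sampler.add_arm("video_a");
    sampler.add_arm("video_b");

    // Serve 100 impressions; pretend "video_a" truly gets clicked ~30% of the time.
    for _ in 0..100 {
        if let Some(arm_id) = sampler.select_arm() {
            let clicked = arm_id == "video_a" && rand::random::<f64>() < 0.3;
            sampler.update(&arm_id, clicked);
        }
    }
    // After enough feedback, select_arm() returns "video_a" most of the time.
}
```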
Python Simulation: Greedy vs Thompson
To sanity-check the Rust implementation, here is the equivalent simulation logic in Python. You can extend it to plot regret curves or to compare Thompson Sampling against a greedy baseline.
```python
import numpy as np

class Bandit:
    def __init__(self, p):
        self.p = p  # True probability
        self.alpha = 1
        self.beta = 1

    def pull(self):
        return np.random.random() < self.p

    def sample(self):
        return np.random.beta(self.alpha, self.beta)

    def update(self, x):
        self.alpha += x
        self.beta += (1 - x)

# Simulation
bandits = [Bandit(0.1), Bandit(0.5), Bandit(0.8)]
rewards = []
for i in range(1000):
    # Thompson Sampling: pick the arm with the highest posterior sample
    j = np.argmax([b.sample() for b in bandits])
    # Reward
    x = bandits[j].pull()
    rewards.append(x)
    # Update
    bandits[j].update(x)

print(f"Total Reward: {sum(rewards)}")
```
Architecture: The Dual-Track Serving Pattern
You cannot easily mix Bandits and Deep Learning in one prediction call. Use a Dual-Track architecture.
```
                 +----------------+
                 | Request (User) |
                 +--------+-------+
                          |
          +---------------+---------------+
          |                               |
  +-------v-------+               +-------v-------+
  |  Warm Track   |               |  Cold Track   |
  |  (Vector DB)  |               |   (Bandit)    |
  |  Use: Faiss   |               |  Use: Redis   |
  +-------+-------+               +-------+-------+
          |                               |
          | Top 50 Candidates (0.8)       | Top 10 New Candidates (0.5)
          +---------------+---------------+
                          |
                  +-------v-------+
                  |    Ranker     |  <-- Blends & Interleaves
                  |   (XGBoost)   |      "If user is eclectic, boost cold items"
                  +-------+-------+
                          |
                  +-------v-------+
                  |   Response    |
                  +---------------+
```
The Ranker is responsible for the final merge. It might learn that “New Users prefer Cold Track items” (Trends) while “Old Users prefer Warm Track”.
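A simplified version of that merge step, assuming both tracks arrive already scored; the fixed cold-item quota is an illustrative policy, not a prescription:

```rust
use std::cmp::Ordering;

/// A scored candidate coming from either track.
pub struct Candidate {
    pub item_id: String,
    pub score: f64,
    pub is_cold: bool,
}

/// Merge warm and cold candidates into one page, reserving at least `min_cold`
/// slots for cold items so exploration traffic never silently drops to zero.
pub fn blend(
    mut warm: Vec<Candidate>,
    mut cold: Vec<Candidate>,
    page_size: usize,
    min_cold: usize,
) -> Vec<Candidate> {
    warm.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap_or(Ordering::Equal));
    cold.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap_or(Ordering::Equal));

    let cold_slots = min_cold.min(cold.len()).min(page_size);
    let warm_slots = page_size.saturating_sub(cold_slots);

    let mut page: Vec<Candidate> = Vec::with_capacity(page_size);
    page.extend(warm.into_iter().take(warm_slots));
    page.extend(cold.into_iter().take(cold_slots));
    // Final on-screen ordering (e.g. spreading cold items evenly) is left to presentation logic.
    page
}
```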
Infrastructure: Redis State Management
Bandits require Atomic Updates. Two users might click the same item at the same time.
We use Redis HINCRBY.
Redis Schema:
- Key: `bandit:campaign_v1:item:123`
  - Field: `clicks` -> increment on click.
  - Field: `impressions` -> increment on view.
```rust
// redis_bandit.rs
use redis::Commands;

/// Atomically bump the bandit counters for an item.
/// HINCRBY is atomic on the Redis server, so concurrent clicks are counted correctly.
pub fn update_bandit(
    con: &mut redis::Connection,
    item_id: &str,
    clicked: bool,
) -> redis::RedisResult<()> {
    // Key pattern simplified here; production keys may be scoped per campaign (see schema above).
    let key = format!("bandit:item:{}", item_id);
    let _: () = con.hincr(&key, "impressions", 1)?;
    if clicked {
        let _: () = con.hincr(&key, "clicks", 1)?;
    }
    Ok(())
}
```
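To close the loop, here is a sketch of reading those counters back at serving time so the sampler can be hydrated from Redis; the key layout matches the simplified pattern used above and is an assumption:

```rust
use redis::Commands;

/// Hydrate (clicks, impressions) for an item from the Redis hash written above.
/// Missing fields are treated as zero (brand-new item).
pub fn load_counts(con: &mut redis::Connection, item_id: &str) -> redis::RedisResult<(u64, u64)> {
    let key = format!("bandit:item:{}", item_id);
    let clicks: Option<u64> = con.hget(&key, "clicks")?;
    let impressions: Option<u64> = con.hget(&key, "impressions")?;
    Ok((clicks.unwrap_or(0), impressions.unwrap_or(0)))
}
```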
Troubleshooting: Bandit Convergence
Common issues when deploying Bandits:
Scenario 1: Bandit converges to sub-optimal arm
- Cause: Early bad luck. The "Best" arm got 0 clicks in its first 10 tries while an "OK" arm got 1 click, so Thompson Sampling temporarily believes the best arm is trash.
- Fix: Use "Optimistic Initialization" (e.g., start every arm at $\alpha=5, \beta=1$) or force a minimum number of impressions before trusting the posterior.
Scenario 2: Bandit keeps exploring forever
- Cause: Click rates are very low (0.001). Beta(1, 1000) and Beta(2, 2000) overlap significantly.
- Fix: Scale your rewards. Or accept that separating 0.1% CTR from 0.11% CTR takes millions of samples.
Scenario 3: Non-Stationary Trends
- Cause: An item was good yesterday, bad today. The Bandit remembers history forever.
- Fix: Time-Decay. Multiply $\alpha, \beta$ by 0.999 every hour (a sketch follows below). This "forgets" old history and keeps the variance high.
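A sketch of that time-decay fix applied directly to the `BanditArm` counters from `src/lib.rs` (decaying the raw counts is equivalent to decaying $\alpha, \beta$ up to the +1 prior); the decay factor is an example value, and this would typically run from a scheduled job:

```rust
use bandit_server::BanditArm; // the struct defined in src/lib.rs above

/// Exponentially decay an arm's counters, e.g. once per hour.
/// Shrinking the counts widens the Beta posterior again, so the bandit
/// can re-explore items whose popularity has changed.
pub fn decay_arm(arm: &mut BanditArm, factor: f64) {
    arm.clicks = (arm.clicks as f64 * factor).round() as u64;
    arm.impressions = (arm.impressions as f64 * factor).round() as u64;
    // Keep the invariant clicks <= impressions after rounding.
    arm.clicks = arm.clicks.min(arm.impressions);
}
```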
MLOps Interview Questions
- Q: How do you evaluate a cold-start improvement? A: You cannot use standard offline recall. You must run an Online A/B Test targeting only new users (Cohort Analysis). Metrics: Day-1 Retention, Session Length.
- Q: Why not just use Item-Item similarity based on Content? A: It works, but "Visual Similarity" != "Semantic Similarity". Just because two movies have red posters doesn't mean they appeal to the same user. Bandits learn actual preference.
- Q: What happens to the Bandit counters over time? A: They grow indefinitely, the variance shrinks to zero, and the bandit stops exploring. Fix: use a Sliding Window or apply "Time Decay" to the Beta distribution parameters ($\alpha \leftarrow \alpha \cdot 0.99$) to keep the system adaptable to trend changes.
- Q: Explain DropoutNet. A: It's a training technique to align ID embeddings and Content embeddings. By masking IDs during training, the content encoder is forced to learn to predict interactions, making it robust when IDs are missing at inference time.
- Q: How do you handle "Fake" cold start (Bots)? A: Bots look like new users. If your Cold Start strategy is "Show Trending", bots will crawl trending items. You need a Bot Detection layer in front of the Recommender.
Glossary
- Cold Start: Prediction with sparse or missing history.
- MAB (Multi-Armed Bandit): Algorithm balancing Exploration and Exploitation.
- Thompson Sampling: Probability matching strategy using posterior sampling.
- Content-Based Filtering: Using item metadata (text/image) instead of interaction graph.
- Heuristic Ladder: Hierarchy of fallback strategies from complex to simple.
Summary Checklist
- Exploration: Have a dedicated strategy (Bandits) for new items. Do not let them rot in the database.
- Onboarding: Gather explicit signals (Tags/Topics) during signup to skip the cold phase.
- Hybrid Models: Train models that accept both ID and Content features.
- Decay: Implement time-decay on bandit statistics to handle non-stationary trends.
- Fallback: Ensure your API never returns 500 or Empty list. Always have a “Global Popular” fallback.
- Real-Time: Cold start requires Real-Time updates. If your bandit updates only once a day, you lose the “Viral” window.
- Dual Track: Separate your serving logic. Don’t pollute your main Vector DB with untested items.
- Monitoring: Track “Traffic % to Cold Items”. If it drops to 0%, your exploration mechanism is broken.
- Diversity: Ensure your cold start items cover diverse categories, not just “Action Movies”.
- Latency: Bandit sampling is fast ($O(K)$ per request), but fetching content embeddings is slow. Optimize accordingly.