46.1. Federated Learning at Scale

The Decentralized Training Paradigm

As privacy regulations tighten (GDPR, CCPA, EU AI Act) and data gravity becomes a more significant bottleneck, the centralized training paradigm—where all data is moved to a central data lake for model training—is becoming increasingly untenable for certain classes of problems. Federated Learning (FL) represents a fundamental shift in MLOps, moving the compute to the data rather than the data to the compute.

In a Federated Learning system, a global model is trained across multiple decentralized edge devices or servers holding local data samples, without exchanging them. This approach addresses critical challenges in privacy, data security, and access rights, but it introduces a new set of massive operational complexities that MLOps engineers must solve.

The Core Architectural Components

A production-grade Federated Learning system consists of four primary architectural layers:

  1. The Orchestration Server (The Coordinator): This is the central nervous system of the FL topology. It manages the training lifecycle, selects clients for participation, aggregates model updates, and manages the global model versioning (a minimal round-loop sketch follows this list).
  2. The Client Runtime (The Edge): This is the software stack running on the remote device (smartphone, IoT gateway, hospital server, or cross-silo enterprise server). It is responsible for local training, validation, and communication with the coordinator.
  3. The Aggregation Engine: The mathematical core that combines local model weights or gradients into a global update. This often involves complex secure multi-party computation (SMPC) protocols.
  4. The Governance & Trust Layer: The security framework that ensures malicious clients cannot poison the model and that the coordinator cannot infer private data from the updates (Differential Privacy).
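
To make the coordinator's role concrete, here is a minimal sketch of one synchronous training round. The registry, collect, and fed_avg names are illustrative stand-ins for your platform's RPC and state layers, not any specific framework's API.

import random

# Sketch of one synchronous FL round. `registry`, `collect`, and `fed_avg`
# are hypothetical helpers representing client state, RPC collection, and
# weighted averaging respectively.
def run_round(registry, global_weights, round_id,
              cohort_size=1000, min_reports=800):
    # 1. Client selection: sample eligible devices from the registry
    eligible = [c for c in registry.clients() if c.eligible()]
    cohort = random.sample(eligible, min(cohort_size, len(eligible)))

    # 2. Broadcast the current global model to the cohort
    for client in cohort:
        client.send(global_weights, round_id)

    # 3. Collect local updates until quorum or timeout
    updates = collect(cohort, timeout_s=600)
    if len(updates) < min_reports:
        return global_weights  # abandon the round: too many dropouts

    # 4. Aggregate (e.g., weighted FedAvg) and version the new global model
    new_weights = fed_avg(updates)
    registry.save_checkpoint(round_id, new_weights)
    return new_weights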

Federated Learning Topologies

There are two distinct topologies in FL, each requiring different MLOps strategies:

1. Cross-Silo Federated Learning

  • Context: A consortium of organizations (typically 2-100 banks or hospitals) collaborating to train a shared model.
  • Compute Resources: High-performance servers with GPUs/TPUs.
  • Connectivity: High bandwidth, reliable, always-on.
  • Data Partitioning: Often non-IID (not Independent and Identically Distributed) but relatively stable.
  • State: Stateful clients.
  • MLOps Focus: Security, governance, auditability, and precise version control.

2. Cross-Device Federated Learning

  • Context: Training on millions of consumer devices (e.g., Android phone keyboards, smart home assistants).
  • Compute Resources: Severely constrained (mobile CPUs/NPUs), battery limited.
  • Connectivity: Flaky, intermittent, WiFi-only constraints.
  • Data Partitioning: Highly non-IID, unbalanced.
  • State: Stateless clients (devices drop in and out).
  • MLOps Focus: Scalability, fault tolerance, device profiling, and over-the-air (OTA) efficiency.

Operational Challenges in FL

  1. Communication Efficiency: Sending full model weights (e.g., a 7B LLM) to millions of devices is infeasible. We need compression, federated dropout, and LoRA adapters.
  2. System Heterogeneity: Clients have vastly different hardware. Stragglers (slow devices) can stall the entire training round.
  3. Statistical Heterogeneity: Data on one user’s phone is not representative of the population. This “client drift” causes the optimization to diverge.
  4. Privacy Attacks: “Model Inversion” attacks can reconstruct training data from gradients. “Membership Inference” can determine if a specific user was in the training set.

46.1.1. Feature Engineering in a Federated World

In centralized ML, feature engineering is a batch process on a data lake. In FL, feature engineering must happen on the device, often in a streaming fashion, using only local context. This creates a “Feature Engineering Consistency” problem.

The Problem of Feature Skew

If the Android team implements the “time of day” feature extraction differently than the iOS team, or differently than the server-side validator, the model will fail silently.

Solution: Portable Feature Definitions

We need a way to define features as code that can compile to multiple targets (Python for server, Java/Kotlin for Android, Swift for iOS, C++ for embedded).

Implementation Pattern: WASM-based Feature Stores

WebAssembly (WASM) is emerging as the standard for portable feature logic in FL.

// Rust implementation of a portable feature extractor compiling to WASM
// src/lib.rs

use wasm_bindgen::prelude::*;
use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize)]
pub struct RawInput {
    pub timestamp_ms: u64,
    pub location_lat: f64,
    pub location_lon: f64,
    pub battery_level: f32,
}

#[derive(Serialize, Deserialize)]
pub struct FeatureVector {
    pub hour_of_day: u8,
    pub is_weekend: bool,
    pub battery_bucket: u8,
    pub location_hash: String,
}

#[wasm_bindgen]
pub fn extract_features(input_json: &str) -> String {
    let input: RawInput = serde_json::from_str(input_json).unwrap();

    // Feature logic is strictly versioned here
    let features = FeatureVector {
        hour_of_day: ((input.timestamp_ms / 3_600_000) % 24) as u8,
        is_weekend: is_weekend(input.timestamp_ms),
        battery_bucket: (input.battery_level * 10.0) as u8,
        location_hash: geohash::encode(
            geohash::Coord { x: input.location_lon, y: input.location_lat },
            5
        ).unwrap(),
    };

    serde_json::to_string(&features).unwrap()
}

fn is_weekend(ts: u64) -> bool {
    // Deterministic logic independent of device locale.
    // The Unix epoch (1970-01-01) was a Thursday, so shift by 4
    // to make 0 = Sunday and 6 = Saturday.
    let day = ((ts / 86_400_000) + 4) % 7;
    day == 0 || day == 6
}

This WASM binary is versioned in the Model Registry and deployed to all clients alongside the model weights. This guarantees that hour_of_day is calculated exactly the same way on a Samsung fridge as it is on an iPhone.

Federated Preprocessing Pipelines

Data normalization (e.g., Z-score scaling) requires global statistics (mean, variance) which no single client possesses.

The Two-Pass Approach:

  1. Pass 1 (Statistics): The coordinator requests summary statistics (sum, sum_of_squares, count) from a random sample of clients. These are aggregated using Secure Aggregation to produce global mean and variance.
  2. Pass 2 (Training): The coordinator broadcasts the global scaler (mean, std_dev) to clients. Clients use this to normalize local data before computing gradients.

# Conceptual flow for federated statistics with TensorFlow Federated (TFF).
# local_sum_and_count, run_statistics_round, and run_training_round are
# illustrative helpers, not TFF APIs.

import tensorflow as tf
import tensorflow_federated as tff

@tff.federated_computation(tff.type_at_clients(tf.float32))
def get_global_statistics(client_data):
    # Each client computes its local (sum, sum_of_squares, count)
    local_stats = tff.federated_map(local_sum_and_count, client_data)

    # Securely aggregate to get the global totals
    global_stats = tff.federated_sum(local_stats)

    return global_stats

# The coordinator runs this round first
global_mean, global_std = run_statistics_round(coordinator, client_selector)

# Then broadcasts the scaler for training
run_training_round(coordinator, client_selector, preprocessing_metadata={
    'mean': global_mean,
    'std': global_std
})

46.1.2. Cross-Silo Governance and Architecture

In Cross-Silo FL (e.g., between competing banks for fraud detection), trust is zero. The architecture must enforce that no raw data ever leaves the silo.

The “Sidecar” Architecture for FL Containers

A robust pattern for Cross-Silo FL is deploying a “Federated Sidecar” container into the partner’s Kubernetes cluster. This sidecar has limited egress permissions—it can only talk to the Aggregation Server, and only transmit encrypted gradients.

Reference Architecture: KubeFed for FL

# Kubernetes Deployment for a Federated Client Node
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fl-client-bank-a
  namespace: federated-learning
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fl-client-bank-a
  template:
    metadata:
      labels:
        app: fl-client-bank-a
    spec:
      containers:
        # The actual Training Container (The Worker)
        - name: trainer
          image: bank-a/fraud-model:v1.2
          volumeMounts:
            - name: local-data
              mountPath: /data
              readOnly: true
          # Network isolated - no egress (enforced via NetworkPolicy)
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]

        # The FL Sidecar (The Communicator)
        - name: fl-sidecar
          image: federated-platform/sidecar:v2.0
          env:
            - name: AGGREGATOR_URL
              value: "grpcs://aggregator.federated-consortium.com:443"
            - name: CLIENT_ID
              value: "bank-a-node-1"
          # Only this container is granted egress

      volumes:
        - name: local-data
          persistentVolumeClaim:
            claimName: sensitive-financial-data

The Governance Policy Registry

We need a policy engine (like Open Policy Agent - OPA) to enforce rules on the updates.

Example Policy: Gradient Norm Clipping

To prevent a malicious actor from overwhelming the global model with massive weights (a “model poisoning” attack), we enforce strict clipping norms.

# OPA Policy for FL Updates
package fl.governance

default allow = false

# Allow update if...
allow {
    valid_signature
    gradient_norm_acceptable
    differential_privacy_budget_ok
}

valid_signature {
    # Cryptographic check of the client's identity.
    # Conceptual: OPA has no generic crypto.verify builtin; in practice the
    # verifying gateway sets this flag (or a policy uses io.jwt / crypto.x509).
    input.metadata.signature_valid == true
}

gradient_norm_acceptable {
    # Prevent model poisoning by capping the L2 norm of the update
    input.metadata.l2_norm < 5.0
}

differential_privacy_budget_ok {
    # Check if this client has exhausted their "privacy budget" (epsilon)
    input.client_stats.current_epsilon < input.policy.max_epsilon
}

46.1.3. Secure Aggregation Protocols

Secure Aggregation ensures that the server never sees an individual client’s update in the clear. It only sees the sum of the updates.

One-Time Pad Masking (The Google Protocol)

The most common protocol (Bonawitz et al.) works by having pairs of clients exchange Diffie-Hellman keys to generate shared masking values.

  1. Every pair of clients $(u, v)$ runs a Diffie-Hellman exchange to agree on a shared seed $s_{uv}$.
  2. Client $u$ expands each seed with a PRG: if $u < v$ it adds $PRG(s_{uv})$ to its weights $w_u$, otherwise it subtracts it.
  3. Each client also adds a self-mask $b_u$ (secret-shared with peers) so its contribution can be recovered or discarded cleanly if it drops out.
  4. When the server sums all masked updates, every pairwise term $PRG(s_{uv})$ cancels out (and the self-masks are removed during unmasking), leaving $\sum_u w_u$.

MLOps Implication: If a client drops out during the protocol (which happens 20% of the time in mobile), the sum cannot be reconstructed. Recovery requires complex “secret sharing” (Shamir’s Secret Sharing) to reconstruct the masks of dropped users without revealing their data.
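
A minimal sketch of Shamir's scheme over a prime field (toy parameters; in the Bonawitz protocol what gets shared are the mask seeds, not the updates themselves):

import random

P = 2**61 - 1  # a Mersenne prime; all share arithmetic happens mod P

def make_shares(secret: int, k: int, n: int):
    """Split `secret` into n shares such that any k of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def poly(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over GF(P)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = make_shares(secret=42, k=3, n=5)
assert reconstruct(shares[:3]) == 42  # any 3 of the 5 shares suffice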

Homomorphic Encryption (HE)

A more robust but computationally expensive approach is Homomorphic Encryption. The clients encrypt their weights $Enc(w_u)$. The server computes $Enc(W) = \sum Enc(w_u)$ directly on the ciphertexts. Note that summation only requires an additively homomorphic scheme (e.g., Paillier or CKKS); Fully Homomorphic Encryption (FHE) is needed only if the server must compute arbitrary functions on the ciphertexts. The server cannot decrypt the result; only a trusted key holder (or a committee holding key shares) can.

Hardware Acceleration for FHE: Running HE on CPUs is notoriously slow (1000x overhead). We are seeing the rise of “FHE Accelerators” (ASICs and FPGA implementations) specifically for this.

Integration with NVIDIA Flare: NVIDIA Flare offers a pluggable aggregation strategy.

# Custom aggregator sketch for NVIDIA Flare. Shareable is dict-like;
# the "encrypted_weights" key and distributed decryption are illustrative.
from nvflare.apis.shareable import Shareable
from nvflare.app_common.abstract.aggregator import Aggregator

class HomomorphicAggregator(Aggregator):
    def __init__(self, he_context):
        super().__init__()
        self.he_context = he_context
        self.encrypted_sum = None

    def accept(self, shareable: Shareable, fl_ctx) -> bool:
        # Received encrypted weights (ciphertext; the server never sees plaintext)
        enc_weights = shareable["encrypted_weights"]

        if self.encrypted_sum is None:
            self.encrypted_sum = enc_weights
        else:
            # Homomorphic addition: '+' computed directly on ciphertexts
            self.encrypted_sum = self.he_context.add(
                self.encrypted_sum,
                enc_weights
            )
        return True

    def aggregate(self, fl_ctx) -> Shareable:
        # Return the encrypted sum to the clients for distributed decryption
        result = Shareable()
        result["encrypted_sum"] = self.encrypted_sum
        return result

46.1.4. Update Compression and Bandwidth Optimization

In cross-device FL, bandwidth is the bottleneck. Uploading a 500MB ResNet model update from a phone over 4G is unacceptable.

Techniques for Bandwidth Reduction

  1. Federated Dropout: Randomly remove 20-40% of neurons for each client. They train a sub-network and upload a smaller sparse vector.
  2. Ternary Quantization: Quantize gradients to {-1, 0, 1}. This creates extreme compression (from 32-bit floats to ~1.6 bits per parameter); a minimal sketch follows this list.
  3. Golomb Coding: Entropy coding optimized for sparse updates.
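
A minimal NumPy sketch of ternary quantization (item 2 above), assuming the common threshold-and-scale scheme where small-magnitude entries become 0 and the rest become ±(mean surviving magnitude):

import numpy as np

def ternarize(grad: np.ndarray, threshold_ratio: float = 0.7):
    """Quantize a gradient tensor to {-1, 0, +1} plus one float scale."""
    threshold = threshold_ratio * np.abs(grad).mean()
    mask = np.abs(grad) > threshold
    # Scale = mean magnitude of the surviving entries
    scale = float(np.abs(grad[mask]).mean()) if mask.any() else 0.0
    ternary = (np.sign(grad) * mask).astype(np.int8)  # entries in {-1, 0, +1}
    return ternary, scale

def deternarize(ternary: np.ndarray, scale: float) -> np.ndarray:
    return ternary.astype(np.float32) * scale

# The client uploads (ternary, scale); the server reconstructs an approximation.
g = np.random.randn(1000).astype(np.float32)
t, s = ternarize(g)
g_hat = deternarize(t, s)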

Differential Privacy (DP) as a Service

DP adds noise to the gradients to mask individual contributions. This is often parameterized by $\epsilon$ (epsilon).

  • Local DP: Noise added on the device. High privacy, high utility loss.
  • Central DP: Noise added by the trusted aggregator.
  • Distributed DP: Noise added by shuffling or secure aggregation so the aggregator never sees raw values.

Managing the Privacy Budget

In MLOps, $\epsilon$ is a resource like CPU or RAM. Each query to the data consumes budget. When the budget is exhausted, the data “locks.”

Tracking Epsilon in MLflow:

import mlflow

# MAX_EPSILON, alert_governance_team, and stop_training are deployment-specific
# (illustrative names, not library APIs).

def log_privacy_metrics(round_id, used_epsilon, total_delta):
    mlflow.log_metric("privacy_epsilon", used_epsilon, step=round_id)
    mlflow.log_metric("privacy_delta", total_delta, step=round_id)

    if used_epsilon > MAX_EPSILON:
        alert_governance_team("Privacy Budget Exceeded")
        stop_training()

46.1.5. Tools of the Trade: The FL Ecosystem

Open Source Frameworks

| Framework | Backer | Strength | Best For |
|---|---|---|---|
| TensorFlow Federated (TFF) | Google | Research, Simulation | Research verification of algorithms |
| PySyft | OpenMined | Privacy, Encryption | Heavy privacy requirements, healthcare |
| Flower (Flwr) | Independent | Mobile, Heterogeneous | Production deployment to iOS/Android |
| NVIDIA Flare | NVIDIA | Hospital/Medical Imaging | Cross-silo, HPC integration |
| FATE | WeBank | Fintech | Financial institution interconnects |

Implementing a Flower Client on Android

Flower is becoming the de facto standard for mobile deployment because it is ML-framework agnostic (it supports TFLite, PyTorch Mobile, etc.).

Android (Kotlin) Client Stub:

// Conceptual client stub: getWeights()/updateWeights() are illustrative
// extensions, not stock TFLite Interpreter APIs.
class MyFlowerClient(
    private val tflite: Interpreter,
    private val data: List<FloatArray>,
    private val testData: List<FloatArray>
) : Client {

    override fun getParameters(): Array<ByteBuffer> {
        // Extract weights from the TFLite model
        return tflite.getWeights()
    }

    override fun fit(
        parameters: Array<ByteBuffer>,
        config: Config
    ): FitRes {
        // 1. Update local model with global parameters
        tflite.updateWeights(parameters)

        // 2. Train on local data (On-Device Training)
        val loss = trainOneEpoch(tflite, data)

        // 3. Return updated weights to server
        return FitRes(
            tflite.getWeights(),
            data.size,
            mapOf("loss" to loss)
        )
    }

    override fun evaluate(
        parameters: Array<ByteBuffer>,
        config: Config
    ): EvaluateRes {
        // Validation step on held-out local data
        tflite.updateWeights(parameters)
        val (loss, accuracy) = runValidation(tflite, testData)
        return EvaluateRes(loss, testData.size, mapOf("acc" to accuracy))
    }
}

46.1.6. Over-the-Air (OTA) Management for FL

Managing the lifecycle of FL binaries is closer to MDM (Mobile Device Management) than standard Kubernetes deployments.

Versioning Matrix

You must track:

  1. App Version: The version of the binary (APK/IPA) installed on the phone.
  2. Runtime Version: The version of the FL library (e.g., Flower v1.2.0).
  3. Model Architecture Version: “MobileNetV2_Quantized_v3”.
  4. Global Model Checkpoint: “Round_452_Weights”.

If a client has an incompatible App Version (e.g., an old feature extractor), it must be rejected from the training round to prevent polluting the global model.
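
A sketch of that compatibility gate, assuming a simple minimum-version policy; the field names mirror the versioning matrix above, and parse_version is a plain tuple comparison, not a full semver parser.

def parse_version(v: str) -> tuple:
    # "2.4.1" -> (2, 4, 1); enough for a minimum-version gate
    return tuple(int(part) for part in v.split("."))

def is_eligible(client: dict, policy: dict):
    """Return (eligible, rejection_reason) for a training round."""
    if parse_version(client["app_version"]) < parse_version(policy["min_app_version"]):
        return False, "app_version_too_old"
    if client["feature_extractor_version"] != policy["feature_extractor_version"]:
        return False, "feature_extractor_mismatch"
    if client["model_arch_version"] != policy["model_arch_version"]:
        return False, "architecture_mismatch"
    return True, None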

The Client Registry

A DynamoDB table usually serves as the state store for millions of clients.

{
  "client_id": "uuid-5521...",
  "device_class": "high-end-android",
  "battery_status": "charging",
  "wifi_status": "connected",
  "app_version": "2.4.1",
  "last_seen": "2024-03-20T10:00:00Z",
  "eligibility": {
    "can_train": true,
    "rejection_reason": null
  }
}

The Selector Service queries this table:

“Give me 1000 clients that are charging, on WiFi, running app version > 2.4, and have at least 2GB of RAM.”
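
A sketch of that query with boto3 against the registry table above. The attribute names follow the sample document; ram_gb is an assumed extra attribute, and note that comparing app_version as a string is lexicographic, so production registries should store a numeric version code.

import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource("dynamodb").Table("fl-client-registry")

def select_cohort(limit: int = 1000):
    # A scan with a server-side filter; at fleet scale you would maintain a
    # GSI or a dedicated "eligible" partition rather than scanning.
    resp = table.scan(
        FilterExpression=(
            Attr("battery_status").eq("charging")
            & Attr("wifi_status").eq("connected")
            & Attr("app_version").gte("2.4")   # lexicographic: see caveat above
            & Attr("ram_gb").gte(2)
        )
    )
    return resp["Items"][:limit]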


46.1.7. FL-Specific Monitoring

Standard metrics (latency, error rate) are insufficient. We need FL Telemetry.

  1. Client Drop Rate: What % of clients disconnect mid-round? High drop rates indicate the training job is too heavy for the device.
  2. Straggler Index: The distribution of training times. The “tail latency” (p99) determines the speed of global convergence.
  3. Model Divergence: The distance (Euclidean or cosine) between a client’s update and the global average. A sudden spike indicates “Model Poisoning” or a corrupted client (a sketch follows this list).
  4. Cohort Fairness: Are we only training on high-end iPhones? We must monitor the distribution of participating device types to ensure the model works on budget Android phones too.
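
A minimal sketch of the divergence metric from item 3: flatten each client update and measure its cosine distance from the cohort mean.

import numpy as np

def model_divergence(client_updates):
    """Cosine distance of each flattened client update from the cohort mean."""
    flat = np.stack([u.ravel() for u in client_updates])
    mean = flat.mean(axis=0)
    mean_norm = np.linalg.norm(mean) + 1e-12
    distances = []
    for u in flat:
        cos_sim = float(u @ mean) / ((np.linalg.norm(u) + 1e-12) * mean_norm)
        distances.append(1.0 - cos_sim)  # 0 = aligned with consensus, 2 = opposite
    return distances

# Updates with distance near 2 point away from the consensus direction:
# candidate poisoners or clients with corrupted feature extractors.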

Visualizing Client Drift

We often use dimensionality reduction (t-SNE or PCA) on the updates (gradients) sent by clients.

  • Cluster Analysis: If clients cluster tightly into 2 or 3 distinct groups, it suggests we have distinct data distributions (e.g., “Day Users” vs “Night Users”, or “bimodal usage patterns”).
  • Action: This signals the need for Personalized Federated Learning, where we might train separate models for each cluster rather than forcing a single global average.

46.1.8. Checklist for Production Readiness

  • Client Selection: Implemented logic to only select devices on WiFi/Charging.
  • Versioning: Host/Client compatibility checks in place.
  • Bandwidth: Gradient compression (quantization/sparsification) active.
  • Privacy: Differential Privacy budget tracking active.
  • Security: Secure Aggregation enabled; model updates signed.
  • Fallbacks: Strategy for when >50% of clients drop out of a round.
  • Evaluation: Federated evaluation rounds separate from training rounds.

46.1.9. Deep Dive: Mathematical Foundations of Secure Aggregation

To truly understand why FL is “secure,” we must prove the mathematical guarantees of the aggregation protocols.

The Bonawitz Algorithm (2017) Detailed

Let $U$ be the set of users. For each pair of users $(u, v)$, they agree on a symmetric key $s_{uv}$. The value $u$ adds to their update $x_u$ is: $$ y_u = x_u + \sum_{v > u} PRG(s_{uv}) - \sum_{v < u} PRG(s_{uv}) $$

When the server sums $y_u$: $$ \sum_u y_u = \sum_u x_u + \sum_u (\sum_{v > u} PRG(s_{uv}) - \sum_{v < u} PRG(s_{uv})) $$

The double summation terms cancel out exactly (a toy simulation follows the proof).

  • Proof: For every pair $\{i, j\}$ with $i < j$, the term $PRG(s_{ij})$ is added exactly once (by $i$, since $i < j$) and subtracted exactly once (by $j$).
  • Result: The server sees $\sum x_u$ but sees nothing about an individual $x_u$, provided that at least one honest participant exists in the summation who keeps their $s_{uv}$ secret.
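
A toy NumPy simulation of the cancellation, with small integer vectors and np.random standing in for the PRG and the pairwise key agreement:

import numpy as np

rng = np.random.default_rng(0)
n_clients, dim = 5, 4
x = rng.integers(0, 10, size=(n_clients, dim))           # private updates
seeds = {(u, v): int(rng.integers(0, 2**31))             # shared seeds s_uv
         for u in range(n_clients) for v in range(u + 1, n_clients)}

def prg(seed, dim):
    return np.random.default_rng(seed).integers(0, 1000, size=dim)

masked = []
for u in range(n_clients):
    y = x[u].copy()
    for v in range(n_clients):
        if v == u:
            continue
        s = seeds[(min(u, v), max(u, v))]
        y = y + prg(s, dim) if u < v else y - prg(s, dim)
    masked.append(y)

# Each masked vector individually looks like noise, but the sum is exact:
assert (np.sum(masked, axis=0) == x.sum(axis=0)).all()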

Differential Privacy: The Moments Accountant

Standard Composition theorems for DP are too loose for deep learning (where we might do 10,000 steps). The Moments Accountant method tracks the specific privacy loss random variable and bounds its moments.
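
For contrast, here is the classical single-release baseline that the Moments Accountant improves upon: the analytic Gaussian mechanism, where a query with L2 sensitivity $\Delta$ achieves $(\epsilon, \delta)$-DP with noise $\sigma = \Delta \sqrt{2 \ln(1.25/\delta)} / \epsilon$ (valid for $\epsilon \le 1$). In DP-SGD, clipping bounds the sensitivity to max_grad_norm.

import math

def gaussian_sigma(epsilon: float, delta: float, sensitivity: float) -> float:
    """Noise std-dev for a single (epsilon, delta)-DP Gaussian release."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

# Example: clipping norm C = 1.0, epsilon = 1.0, delta = 1e-5 -> sigma ~= 4.8
sigma = gaussian_sigma(1.0, 1e-5, 1.0)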

Code Implementation: DP-SGD Optimizer from Scratch

import torch
from torch.optim import Optimizer

class DPSGD(Optimizer):
    def __init__(self, params, lr=0.1, noise_multiplier=1.0, max_grad_norm=1.0):
        defaults = dict(lr=lr, noise_multiplier=noise_multiplier, max_grad_norm=max_grad_norm)
        super(DPSGD, self).__init__(params, defaults)

    def step(self):
        """
        Performs a single optimization step with Differential Privacy.
        1. Clip Gradients (per sample).
        2. Add Gaussian Noise.
        3. Average.
        """
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                
                # 1. Per-Sample Gradient Clipping
                # Note: In PyTorch vanilla, p.grad is already the MEAN of the batch.
                # To do true DP, we need "Ghost Clipping" or per-sample gradients 
                # (using Opacus library). 
                # This is a simplified "Batch Processing" view for illustration.
                
                grad_norm = p.grad.norm(2)
                clip_coef = group['max_grad_norm'] / (grad_norm + 1e-6)
                clip_coef = torch.clamp(clip_coef, max=1.0)
                p.grad.mul_(clip_coef)

                # 2. Add Noise
                # True DP-SGD sums per-sample gradients, adds noise with
                # std = noise_multiplier * max_grad_norm, then divides by the
                # batch size; here we noise the already-averaged gradient
                # directly as a simplification.
                noise = torch.normal(
                    mean=0.0,
                    std=group['noise_multiplier'] * group['max_grad_norm'],
                    size=p.grad.shape,
                    device=p.grad.device
                )
                
                # 3. Apply Update (SGD step on the noised gradient)
                p.data.add_(p.grad + noise, alpha=-group['lr'])

46.1.10. Operational Playbook: Handling Failures

In a fleet of 10 million devices, “rare” errors happen every second.

Scenario A: The “Poisoned Model” Rollback

Symptoms:

  • Global model accuracy drops by 20% in one round.
  • Validation loss spikes to NaN.

Root Cause:

  • A malicious actor injected gradients to maximize error (Byzantine Attack).
  • OR: A software bug in ExtractFeatures caused integer overflow on a specific Android version.

Recovery Protocol:

  1. Stop the Coordinator: systemctl stop fl-server.
  2. Identify the Bad Round: Look at the “Model Divergence” metric in Grafana.
  3. Rollback: git checkout models/global_v451.pt (The last good state).
  4. Device Ban: Identify the Client IDs that participated in Round 452. Mark them as SUSPENDED in DynamoDB.
  5. Resume: Restart the coordinator with the old weights.

Scenario B: The “Straggler” Gridlock

Symptoms:

  • Round 105 has been running for 4 hours (average is 5 mins).
  • Waiting on 3 clients out of 1000.

Root Cause:

  • Clients are on weak WiFi or have gone offline without sending FIN.

Recovery Protocol:

  • Timeouts: Set a strict round_timeout_seconds = 600.
  • Partial Aggregation: If more than 80% of clients have reported, close the round and ignore the stragglers (see the sketch below).
    • Trade-off: This biases the model towards “Fast Devices” (new iPhones), potentially hurting performance on “Slow Devices” (old Androids). This is a fairness issue.
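
A sketch of that round-closing rule, assuming the coordinator tracks received and pending client sets against a quorum fraction and a hard timeout:

import time

def should_close_round(received, pending, started_at,
                       quorum=0.8, timeout_s=600):
    """Return True to finalize with partial aggregation, False to keep waiting."""
    frac_reported = len(received) / max(len(received) + len(pending), 1)
    timed_out = (time.time() - started_at) > timeout_s

    # Close early once quorum is met, or on timeout if anything arrived at all.
    # Log the device classes of the stragglers so fairness dashboards can
    # surface any bias toward fast devices.
    return frac_reported >= quorum or (timed_out and len(received) > 0)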

46.1.11. Reference Architecture: Terraform for Cross-Silo FL

Setting up a secure aggregation server on AWS with enclave support.

# main.tf

provider "aws" {
  region = "us-east-1"
}

# 1. The Coordinator Enclave (Nitro Enclaves)
resource "aws_instance" "fl_coordinator" {
  ami           = "ami-0c55b159cbfafe1f0" # Amazon Linux 2 with Nitro Enclave support
  instance_type = "m5.xlarge" # Nitro supported
  enclave_options {
    enabled = true
  }

  # IAM instance profile and security group are assumed defined elsewhere
  iam_instance_profile = aws_iam_instance_profile.fl_coordinator_profile.name
  vpc_security_group_ids = [aws_security_group.fl_sg.id]

  user_data = <<-EOF
              #!/bin/bash
              yum install -y nitro-enclaves-cli nitro-enclaves-cli-devel
              systemctl enable nitro-enclaves-allocator.service
              systemctl start nitro-enclaves-allocator.service
              
              # Allocate hugepages for the enclave
              # 2 CPU, 6GB RAM
              nitro-cli run-enclave --cpu-count 2 --memory 6144 \
                --eif-path /home/ec2-user/server.eif \
                --enclave-cid 10
              EOF
}

# 2. The Client Registration Table
resource "aws_dynamodb_table" "fl_clients" {
  name           = "fl-client-registry"
  billing_mode   = "PAY_PER_REQUEST"
  hash_key       = "client_id"
  range_key      = "last_seen_timestamp"

  attribute {
    name = "client_id"
    type = "S"
  }

  attribute {
    name = "last_seen_timestamp"
    type = "N"
  }

  ttl {
    attribute_name = "ttl"
    enabled        = true
  }
}

# 3. Model Storage (Checkpointing)
resource "aws_s3_bucket" "fl_models" {
  bucket = "enterprise-fl-checkpoints-v1"
}

resource "aws_s3_bucket_versioning" "fl_models_ver" {
  bucket = aws_s3_bucket.fl_models.id
  versioning_configuration {
    status = "Enabled"
  }
}

46.1.12. Vendor Landscape Analysis (2025)

| Vendor | Product | Primary Use Case | Deployment Model | Pricing |
|---|---|---|---|---|
| NVIDIA | Flare (NVFlare) | Medical Imaging, Financial Services | Self-Hosted, sidecar container | Open Source / Enterprise Support |
| HPE | Swarm Learning | Blockchain-based FL (Decentralized Coordinator) | On-Prem / Edge | Licensing |
| Google | Gboard FL | Mobile Keyboards (internal tech, now public via TFF) | Mobile (Android) | Free (OSS) |
| Sherpa.ai | Sherpa | Privacy-Preserving AI | SaaS / Hybrid | Enterprise |
| OpenMined | PyGrid | Research & Healthcare | Self-Hosted | Open Source |

Feature Comparison: NVFlare vs. Flower

NVIDIA Flare:

  • Architecture: Hub-and-Spoke with strict “Site” definitions.
  • Security: Built-in support for HA (High Availability) and Root-of-Trust.
  • Simulators: Accurate simulation of multi-threaded clients on a single GPU.
  • Best For: When you control the nodes (e.g., 5 hospitals).

Flower:

  • Architecture: Extremely lightweight client (just a callback function).
  • Mobile: First-class support for iOS/Android/C++.
  • Scaling: Tested up to 10M concurrent clients.
  • Best For: When you don’t control the nodes (Consumer devices).

46.1.13. Evaluating the Feasibility of Training Llama-3 (70B) via FL

The Bottleneck:

  • Parameter size: 140GB (BF16).
  • Upload speed: 20Mbps (Consumer Uplink).
  • Time to upload one update: $140,000 \text{ MB} / 2.5 \text{ MB/s} \approx 56,000 \text{ seconds} \approx 15 \text{ hours}$.
  • Conclusion: Full fine-tuning of LLMs on consumer edge is impossible today.

The Solution: PEFT + QLoRA

  • Instead of updating 70B params, we update LoRA Adapters (Rank 8).
  • Adapter Size: ~10MB.
  • Upload time: 4 seconds.
  • Architecture:
    • Frozen Backbone: The 70B weights are pre-loaded on the device (or streamed).
    • Trainable Parts: Only the Adapter matrices $A$ and $B$.
    • Aggregation: The server aggregates only the adapters.

# Federated PEFT configuration (conceptual)
from peft import LoraConfig, TaskType

peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1
)

# On Client
def train_step(model, batch):
    # Only gradients for 'lora_A' and 'lora_B' are computed
    loss = model(batch)
    loss.backward()
    
    # Extract only the adapter gradients for transmission
    adapter_grads = {k: v.grad for k, v in model.named_parameters() if "lora" in k}
    return adapter_grads
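
On the server side, aggregation touches only the adapter tensors. A minimal example-weighted FedAvg over the transmitted adapter state dicts might look like the following (the payload shape is an assumption: each client reports its adapter tensors plus a sample count):

def aggregate_adapters(client_payloads):
    """client_payloads: list of (adapter_state_dict, num_examples) tuples."""
    total = sum(n for _, n in client_payloads)
    keys = client_payloads[0][0].keys()
    return {
        k: sum(state[k] * (n / total) for state, n in client_payloads)
        for k in keys
    }

# Only the lora_A / lora_B tensors ever cross the network;
# the frozen 70B backbone never moves.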

46.1.14. Case Study: Predictive Maintenance in Manufacturing

The User: Global Heavy Industry Corp (GHIC).
The Assets: 5,000 wind turbines across 30 countries.
The Problem: Turbine vibration data is 10TB/day. Satellite internet is expensive ($10/GB).

The FL Solution:

  1. Edge Compute: NVIDIA Jetson mounted on every turbine.
  2. Local Training: An Autoencoder learns the “Normal Vibration Pattern” for that specific turbine.
  3. Federated Round: Every night, turbines send updates to a global “Anomaly Detector” model.
  4. Bandwidth Savings:
    • Raw Data: 10TB/day.
    • Model Updates: 50MB/day.
    • Cost Reduction: 99.9995%.

Outcome: GHIC detected a gearbox failure signature in the North Sea turbines (high wind) and propagated the learned pattern to the Brazil turbines (low wind) before the Brazil turbines experienced the failure conditions.


46.1.15. Anti-Patterns in Federated Learning

1. “Just use the Centralized Hyperparameters”

  • Mistake: Using lr=0.001 because it worked on the data lake.
  • Reality: FL optimization landscapes are “bumpy” due to non-IID data. You often need Server Learning Rates (applying the update to the global model) separate from Client Learning Rates.

2. “Assuming Client Availability”

  • Mistake: Waiting for specific high-value clients to report.
  • Reality: Clients die. Batteries die. WiFi drops. Your system must be statistically robust to any subset of clients disappearing.

3. “Ignoring System Heterogeneity”

  • Mistake: Sending the same model to a standard iPhone 15 and a budget Android.
  • Reality: The Android runs out of RAM (OOM) and crashes. You have biased your model towards rich users.
  • Fix: Ordered Dropout. Structure the model so that “first 50% layers” is a valid sub-model for weak devices, and “100% layers” is for strong devices.

4. “Leakage via Metadata”

  • Mistake: Encrypting the gradients but leaving the client_id and timestamp visible.
  • Reality: Side-channel attack. “This client sends updates at 3 AM” -> “User is an insomniac.”

46.1.16. Checklist: The Zero-Trust FL Deployment

Security Audit

  • Attestation: Does the server verify the client runs a signed binary? (Android SafetyNet / iOS DeviceCheck).
  • Man-in-the-Middle: Is TLS 1.3 pinned?
  • Model Signing: Are global weights signed by the server private key?

Data Governance

  • Right to be Forgotten: If User X deletes their account, can we “unlearn” their contribution? (Machine Unlearning is an active research field; typical answer: “Re-train from checkpoint before User X joined”).
  • Purpose Limitation: Are we ensuring the model learns “Keyboard Prediction” and not “Credit Card Numbers”?

Performance

  • Quantization: Are we using INT8 transfer?
  • Caching: Do clients cache the dataset locally to avoid re-reading from flash storage every epoch?

Federated Learning allows us to unlock the “Dark Matter” of data—the petabytes of private, sensitive data living on edges that will never see a cloud data lake. It is the ultimate frontier of decentralized MLOps.