45.11. Production Case Studies: Rust in the Wild
Note
Theory is nice. Production is better. These are five architectural patterns derived from real-world deployments where Rust replaced Python/Java and achieved 10x-100x gains.
45.11.1. Case Study 1: High Frequency Trading (The Microsecond Barrier)
The Problem: A hedge fund runs a Market Maker bot. It receives Order Book updates (WebSocket/UDP), runs a small XGBoost model, and places orders. Python latency: 800 microseconds (includes GC spikes). Target latency: < 50 microseconds.
The Solution: Rust with io_uring and Static Dispatch.
Architecture Overview
┌─────────────────────────────────────────────────────────────────┐
│ HFT Trading System │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────────┐ │
│ │ Network │ │ Feature │ │ Model Inference │ │
│ │ io_uring │───▶│ Extraction │───▶│ (XGBoost/LGB) │ │
│ │ UDP │ │ Zero-Alloc │ │ Static Dispatch │ │
│ └─────────────┘ └──────────────┘ └──────────┬──────────┘ │
│ │ │
│ ┌─────────────────────────────────────────────────▼──────────┐ │
│ │ Order Execution │ │
│ │ Direct Memory-Mapped FIX Protocol │ │
│ └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
1. Network Layer: io_uring
Traditional sockets require syscalls for every recv().
io_uring batches syscalls via a submission queue.
#![allow(unused)]
fn main() {
use io_uring::{opcode, types, IoUring};
use std::os::unix::io::AsRawFd;
struct MarketDataReceiver {
ring: IoUring,
socket: std::net::UdpSocket,
buffer: [u8; 4096],
}
impl MarketDataReceiver {
fn new(port: u16) -> std::io::Result<Self> {
let socket = std::net::UdpSocket::bind(format!("0.0.0.0:{}", port))?;
socket.set_nonblocking(true)?;
// Create io_uring with 256 entries
let ring = IoUring::builder()
.setup_sqpoll(1000) // Kernel-side polling, no syscalls!
.build(256)?;
Ok(Self {
ring,
socket,
buffer: [0u8; 4096]
})
}
fn run(&mut self) -> ! {
let fd = self.socket.as_raw_fd();
loop {
// 1. Submit Read Request (Zero Syscall in SQPOLL mode)
let read_e = opcode::Recv::new(
types::Fd(fd),
self.buffer.as_mut_ptr(),
self.buffer.len() as u32
)
.build()
.user_data(0x01);
unsafe {
self.ring.submission().push(&read_e).expect("queue full");
}
// 2. Wait for Completion (this is the only blocking point)
self.ring.submit_and_wait(1).unwrap();
            // 3. Process Completion Queue. Entries are copied out by value, so the
            //    queue's mutable borrow of the ring ends before on_packet borrows &self.
            loop {
                let cqe = match self.ring.completion().next() {
                    Some(cqe) => cqe,
                    None => break,
                };
                if cqe.user_data() == 0x01 && cqe.result() > 0 {
                    self.on_packet(cqe.result() as usize);
                }
            }
}
}
#[inline(always)]
fn on_packet(&self, len: usize) {
// Zero-Copy Parsing using repr(C, packed)
if len >= std::mem::size_of::<MarketPacket>() {
let packet = unsafe {
&*(self.buffer.as_ptr() as *const MarketPacket)
};
// Extract features
let features = self.extract_features(packet);
// Run model
let signal = MODEL.predict(&features);
// Execute if signal is strong
if signal.abs() > 0.5 {
ORDER_SENDER.send(Order {
symbol: packet.symbol,
side: if signal > 0.0 { Side::Buy } else { Side::Sell },
price: packet.price,
quantity: 100,
});
}
}
}
#[inline(always)]
fn extract_features(&self, packet: &MarketPacket) -> [f32; 64] {
let mut features = [0.0f32; 64];
// Feature 0: Normalized Price
features[0] = packet.price as f32 / 10000.0;
// Feature 1: Bid-Ask Spread
features[1] = (packet.ask - packet.bid) as f32;
// Feature 2: Volume Imbalance
features[2] = (packet.bid_qty as f32 - packet.ask_qty as f32)
/ (packet.bid_qty as f32 + packet.ask_qty as f32 + 1.0);
// ... more features
features
}
}
#[repr(C, packed)]
struct MarketPacket {
symbol: [u8; 8],
timestamp: u64,
price: f64,
bid: f64,
ask: f64,
bid_qty: u32,
ask_qty: u32,
trade_id: u64,
}
}
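The hot path above leans on a few items defined elsewhere in the crate (Order, Side, MODEL, ORDER_SENDER). A minimal sketch of what they might look like, assuming once_cell for the statics and crossbeam-channel for the order queue:
use crossbeam_channel::Sender;
use once_cell::sync::Lazy;

#[derive(Clone, Copy)]
enum Side { Buy, Sell }

struct Order {
    symbol: [u8; 8],
    side: Side,
    price: f64,
    quantity: u32,
}

// Thin wrapper so the hot path can call MODEL.predict(..); it delegates to the
// statically dispatched ensemble shown in the next section.
struct Model;
impl Model {
    #[inline(always)]
    fn predict(&self, features: &[f32; 64]) -> f32 {
        predict(features)
    }
}
static MODEL: Model = Model;

static ORDER_SENDER: Lazy<OrderSender> = Lazy::new(OrderSender::connect);

struct OrderSender {
    tx: Sender<Order>,
}

impl OrderSender {
    fn connect() -> Self {
        let (tx, rx) = crossbeam_channel::bounded::<Order>(1024);
        // Execution thread: in production this feeds the memory-mapped FIX gateway.
        std::thread::spawn(move || {
            for order in rx {
                let _ = order; // hand off to the exchange connection here
            }
        });
        Self { tx }
    }

    #[inline(always)]
    fn send(&self, order: Order) {
        // Never block the hot path; drop the order if the queue is full.
        let _ = self.tx.try_send(order);
    }
}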
2. Model Layer: Static Dispatch
Python ML libraries go through layers of dynamic dispatch (virtual function calls). Here the trained XGBoost ensemble is compiled into plain Rust if/else trees, so everything can be inlined and optimized at compile time.
#![allow(unused)]
fn main() {
// Auto-generated from XGBoost model
#[inline(always)]
fn predict_tree_0(features: &[f32; 64]) -> f32 {
if features[2] < 0.35 {
if features[0] < 0.52 {
if features[15] < 0.18 {
-0.0423
} else {
0.0156
}
} else {
if features[3] < 0.71 {
0.0287
} else {
-0.0089
}
}
} else {
if features[7] < 0.44 {
0.0534
} else {
-0.0312
}
}
}
// Ensemble of 100 trees
pub fn predict(features: &[f32; 64]) -> f32 {
let mut sum = 0.0;
sum += predict_tree_0(features);
sum += predict_tree_1(features);
// ... 98 more trees
sum += predict_tree_99(features);
1.0 / (1.0 + (-sum).exp()) // Sigmoid
}
}
Why this is fast:
- No dynamic dispatch (if/else compiles to cmov or branch prediction)
- Features accessed via direct array indexing (no hash maps)
- All code inlined into a single function
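The tree functions are not written by hand. They can be generated at build time from XGBoost's JSON dump (Booster.get_dump(dump_format="json")); a minimal codegen sketch, assuming the default f0, f1, ... feature names and eliding error handling:
use serde_json::Value;

/// Recursively turn one node of an XGBoost JSON dump into a Rust expression.
fn emit_node(node: &Value) -> String {
    if let Some(leaf) = node.get("leaf").and_then(Value::as_f64) {
        return format!("{:.6}", leaf);
    }
    // Split nodes carry the feature name ("f2"), the threshold, and child node ids.
    let feature = node["split"].as_str().unwrap().trim_start_matches('f');
    let threshold = node["split_condition"].as_f64().unwrap();
    let yes_id = node["yes"].as_i64().unwrap();
    let no_id = node["no"].as_i64().unwrap();
    let children = node["children"].as_array().unwrap();
    let child = |id: i64| {
        children
            .iter()
            .find(|c| c["nodeid"].as_i64() == Some(id))
            .unwrap()
    };
    format!(
        "if features[{feature}] < {threshold:.6} {{ {} }} else {{ {} }}",
        emit_node(child(yes_id)),
        emit_node(child(no_id)),
    )
}

/// Emit one `predict_tree_N` function per tree in the dump.
fn emit_ensemble(trees: &[Value]) -> String {
    let mut code = String::new();
    for (i, tree) in trees.iter().enumerate() {
        code += &format!(
            "#[inline(always)]\nfn predict_tree_{i}(features: &[f32; 64]) -> f32 {{\n    {}\n}}\n",
            emit_node(tree)
        );
    }
    code
}
One option is to run this from build.rs and include! the generated file, so the deployed binary always matches the trained model.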
3. Results
| Metric | Python (NumPy + LightGBM) | Rust (io_uring + Static) |
|---|---|---|
| P50 Latency | 450 μs | 8 μs |
| P99 Latency | 2,100 μs | 18 μs |
| P99.9 Latency | 15,000 μs (GC) | 35 μs |
| Throughput | 50k events/sec | 2M events/sec |
| CPU Usage | 95% (1 core) | 12% (1 core) |
Business Impact:
- Fund profitability increased by 15% due to winning more races
- Reduced server count from 10 to 1
- Eliminated GC-induced losses during market volatility
45.11.2. Case Study 2: Satellite Imagery Pipeline (The Throughput Monster)
The Problem: Processing 50TB of GeoTIFFs per day.
Detecting illegal deforestation for a conservation NGO.
Python Stack: rasterio + PyTorch.
Bottleneck: Python’s multiprocessing overhead (pickling 100MB images across processes).
The Solution: Rust + gdal + wgpu + Tokio.
Architecture Overview
┌─────────────────────────────────────────────────────────────────────┐
│ Satellite Processing Pipeline │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌────────────────────────┐ │
│ │ S3 Input │───▶│ COG Reader │───▶│ Tile Generator │ │
│ │ (HTTP GET) │ │ (GDAL/Rust) │ │ (rayon) │ │
│ └──────────────┘ └──────────────┘ └───────────┬────────────┘ │
│ │ │
│ ┌───────────────────────────────────────────────────▼────────────┐ │
│ │ GPU Inference Pool │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ WGPU 0 │ │ WGPU 1 │ │ WGPU 2 │ │ WGPU 3 │ │ │
│ │ │ (Metal) │ │ (Vulkan)│ │ (DX12) │ │ (WebGPU)│ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
│ └───────────────────────────────────────────────────┬────────────┘ │
│ │ │
│ ┌───────────────────────────────────────────────────▼────────────┐ │
│ │ Result Aggregation │ │
│ │ Deforestation Alerts → PostGIS → Vector Tiles → Dashboard │ │
│ └─────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
1. Cloud Optimized GeoTIFF Reader
COG files support HTTP Range requests, allowing partial reads.
#![allow(unused)]
fn main() {
use gdal::Dataset;
use rayon::prelude::*;
use tokio::sync::mpsc;
pub struct CogReader {
url: String,
tile_size: usize,
}
impl CogReader {
pub async fn read_tiles(&self, tx: mpsc::Sender<Tile>) -> Result<(), Error> {
// Open with VSICURL for HTTP range requests
let path = format!("/vsicurl/{}", self.url);
let dataset = Dataset::open(&path)?;
let (width, height) = dataset.raster_size();
let bands = dataset.raster_count();
// Calculate tile grid
let tiles_x = (width + self.tile_size - 1) / self.tile_size;
let tiles_y = (height + self.tile_size - 1) / self.tile_size;
// Read tiles in parallel using rayon
let tiles: Vec<Tile> = (0..tiles_y)
.into_par_iter()
.flat_map(|ty| {
(0..tiles_x).into_par_iter().map(move |tx_idx| {
self.read_single_tile(&dataset, tx_idx, ty)
})
})
.collect();
// Send to channel
for tile in tiles {
tx.send(tile).await?;
}
Ok(())
}
fn read_single_tile(&self, dataset: &Dataset, tx: usize, ty: usize) -> Tile {
let x_off = tx * self.tile_size;
let y_off = ty * self.tile_size;
// Read all bands at once
let mut data = vec![0f32; self.tile_size * self.tile_size * 4]; // RGBI
for band_idx in 1..=4 {
let band = dataset.rasterband(band_idx).unwrap();
let offset = (band_idx - 1) * self.tile_size * self.tile_size;
band.read_into_slice(
(x_off as isize, y_off as isize),
(self.tile_size, self.tile_size),
(self.tile_size, self.tile_size),
&mut data[offset..offset + self.tile_size * self.tile_size],
None,
).unwrap();
}
Tile {
x: tx,
y: ty,
data,
width: self.tile_size,
height: self.tile_size,
}
}
}
}
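Hypothetical glue (run_pipeline is not part of the original system) showing how the reader and the GPU engine might be wired together; the channel capacity is illustrative and Error is the same assumed error type as above:
use tokio::sync::mpsc;

async fn run_pipeline(reader: CogReader, engine: GpuInferenceEngine) -> Result<(), Error> {
    // Bounded channel gives backpressure so tile decoding can't outrun the GPU.
    let (tx, mut rx) = mpsc::channel::<Tile>(64);

    // Consumer task: run the deforestation kernel on each tile as it arrives.
    let consumer = tokio::spawn(async move {
        while let Some(tile) = rx.recv().await {
            let mask = engine.process_tile(&tile).await;
            let flagged = mask.iter().filter(|&&v| v == 1.0).count();
            if flagged > 0 {
                // The real system pushes these into PostGIS / the alert dashboard.
                println!("tile ({}, {}): {} flagged pixels", tile.x, tile.y, flagged);
            }
        }
    });

    // Producer: stream tiles out of the COG; dropping tx closes the channel.
    reader.read_tiles(tx).await?;
    consumer.await.expect("GPU consumer task panicked");
    Ok(())
}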
2. WGPU Inference Kernel
Cross-platform GPU compute without CUDA dependency.
// shader.wgsl - Deforestation Detection Kernel
struct Params {
width: u32,
height: u32,
ndvi_threshold: f32,
min_cluster_size: u32,
};
@group(0) @binding(0) var<uniform> params: Params;
@group(0) @binding(1) var<storage, read> input: array<f32>; // RGBI interleaved
@group(0) @binding(2) var<storage, read_write> output: array<f32>; // Mask
@compute @workgroup_size(16, 16)
fn main(@builtin(global_invocation_id) global_id: vec3<u32>) {
let x = global_id.x;
let y = global_id.y;
if (x >= params.width || y >= params.height) {
return;
}
let idx = y * params.width + x;
let pixel_size = 4u; // RGBI
// Read RGBI values
let red = input[idx * pixel_size + 0u];
let green = input[idx * pixel_size + 1u];
let blue = input[idx * pixel_size + 2u];
let nir = input[idx * pixel_size + 3u]; // Near-infrared
// Calculate NDVI (Normalized Difference Vegetation Index)
let ndvi = (nir - red) / (nir + red + 0.0001);
// Calculate EVI (Enhanced Vegetation Index) for better sensitivity
let evi = 2.5 * ((nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0));
// Combined vegetation index
let veg_index = (ndvi + evi) / 2.0;
// Deforestation detection: low vegetation index = potential deforestation
if (veg_index < params.ndvi_threshold && veg_index > -0.2) {
output[idx] = 1.0; // Potential deforestation
} else if (veg_index >= params.ndvi_threshold) {
output[idx] = 0.0; // Healthy vegetation
} else {
output[idx] = -1.0; // Water or clouds
}
}
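A CPU reference of the same per-pixel rule is handy for unit-testing the shader output; a straightforward port:
/// CPU reference for the WGSL kernel above, useful for unit tests.
/// 1.0 = potential deforestation, 0.0 = healthy vegetation, -1.0 = water/clouds.
fn classify_pixel(red: f32, blue: f32, nir: f32, ndvi_threshold: f32) -> f32 {
    let ndvi = (nir - red) / (nir + red + 0.0001);
    let evi = 2.5 * ((nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0));
    let veg_index = (ndvi + evi) / 2.0;
    if veg_index < ndvi_threshold && veg_index > -0.2 {
        1.0
    } else if veg_index >= ndvi_threshold {
        0.0
    } else {
        -1.0
    }
}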
Rust Host Code:
#![allow(unused)]
fn main() {
use wgpu::util::DeviceExt;
pub struct GpuInferenceEngine {
device: wgpu::Device,
queue: wgpu::Queue,
pipeline: wgpu::ComputePipeline,
bind_group_layout: wgpu::BindGroupLayout,
}
impl GpuInferenceEngine {
pub async fn new() -> Self {
let instance = wgpu::Instance::new(wgpu::InstanceDescriptor::default());
let adapter = instance.request_adapter(&wgpu::RequestAdapterOptions {
power_preference: wgpu::PowerPreference::HighPerformance,
..Default::default()
}).await.unwrap();
let (device, queue) = adapter.request_device(
&wgpu::DeviceDescriptor {
label: Some("Inference Device"),
required_features: wgpu::Features::empty(),
required_limits: wgpu::Limits::default(),
memory_hints: wgpu::MemoryHints::Performance,
},
None,
).await.unwrap();
let shader = device.create_shader_module(wgpu::ShaderModuleDescriptor {
label: Some("Deforestation Shader"),
source: wgpu::ShaderSource::Wgsl(include_str!("shader.wgsl").into()),
});
// Create pipeline and bind group layout...
// (simplified for brevity)
Self { device, queue, pipeline, bind_group_layout }
}
pub async fn process_tile(&self, tile: &Tile) -> Vec<f32> {
let input_buffer = self.device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
label: Some("Input Buffer"),
contents: bytemuck::cast_slice(&tile.data),
usage: wgpu::BufferUsages::STORAGE | wgpu::BufferUsages::COPY_DST,
});
        let output_size = (tile.width * tile.height * std::mem::size_of::<f32>()) as u64;
        let output_buffer = self.device.create_buffer(&wgpu::BufferDescriptor {
            label: Some("Output Buffer"),
            size: output_size,
            usage: wgpu::BufferUsages::STORAGE | wgpu::BufferUsages::COPY_SRC,
            mapped_at_creation: false,
        });
        // Storage buffers cannot be mapped directly; copy into a staging buffer for readback
        let staging_buffer = self.device.create_buffer(&wgpu::BufferDescriptor {
            label: Some("Staging Buffer"),
            size: output_size,
            usage: wgpu::BufferUsages::MAP_READ | wgpu::BufferUsages::COPY_DST,
            mapped_at_creation: false,
        });
        // Dispatch compute shader
        let mut encoder = self.device.create_command_encoder(&wgpu::CommandEncoderDescriptor::default());
        {
            let mut pass = encoder.begin_compute_pass(&wgpu::ComputePassDescriptor::default());
            pass.set_pipeline(&self.pipeline);
            // pass.set_bind_group(0, &bind_group, &[]);
            pass.dispatch_workgroups(
                (tile.width as u32 + 15) / 16,
                (tile.height as u32 + 15) / 16,
                1
            );
        }
        encoder.copy_buffer_to_buffer(&output_buffer, 0, &staging_buffer, 0, output_size);
        self.queue.submit(std::iter::once(encoder.finish()));
        // Read back results from the staging buffer
        let buffer_slice = staging_buffer.slice(..);
        let (tx, rx) = tokio::sync::oneshot::channel();
        buffer_slice.map_async(wgpu::MapMode::Read, move |result| {
            tx.send(result).unwrap();
        });
        self.device.poll(wgpu::Maintain::Wait);
        rx.await.unwrap().unwrap();
        let data = buffer_slice.get_mapped_range();
        bytemuck::cast_slice(&data).to_vec()
}
}
}
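The constructor above elides the pipeline setup. A sketch of what it could look like, written against wgpu ~22 (descriptor fields shift between wgpu releases), with build_pipeline as a hypothetical helper:
impl GpuInferenceEngine {
    fn build_pipeline(
        device: &wgpu::Device,
        shader: &wgpu::ShaderModule,
    ) -> (wgpu::ComputePipeline, wgpu::BindGroupLayout) {
        let bind_group_layout = device.create_bind_group_layout(&wgpu::BindGroupLayoutDescriptor {
            label: Some("Deforestation BGL"),
            entries: &[
                // @binding(0): uniform Params
                wgpu::BindGroupLayoutEntry {
                    binding: 0,
                    visibility: wgpu::ShaderStages::COMPUTE,
                    ty: wgpu::BindingType::Buffer {
                        ty: wgpu::BufferBindingType::Uniform,
                        has_dynamic_offset: false,
                        min_binding_size: None,
                    },
                    count: None,
                },
                // @binding(1): read-only storage (RGBI input)
                wgpu::BindGroupLayoutEntry {
                    binding: 1,
                    visibility: wgpu::ShaderStages::COMPUTE,
                    ty: wgpu::BindingType::Buffer {
                        ty: wgpu::BufferBindingType::Storage { read_only: true },
                        has_dynamic_offset: false,
                        min_binding_size: None,
                    },
                    count: None,
                },
                // @binding(2): read-write storage (mask output)
                wgpu::BindGroupLayoutEntry {
                    binding: 2,
                    visibility: wgpu::ShaderStages::COMPUTE,
                    ty: wgpu::BindingType::Buffer {
                        ty: wgpu::BufferBindingType::Storage { read_only: false },
                        has_dynamic_offset: false,
                        min_binding_size: None,
                    },
                    count: None,
                },
            ],
        });
        let layout = device.create_pipeline_layout(&wgpu::PipelineLayoutDescriptor {
            label: Some("Deforestation Pipeline Layout"),
            bind_group_layouts: &[&bind_group_layout],
            push_constant_ranges: &[],
        });
        let pipeline = device.create_compute_pipeline(&wgpu::ComputePipelineDescriptor {
            label: Some("Deforestation Pipeline"),
            layout: Some(&layout),
            module: shader,
            entry_point: "main",
            compilation_options: Default::default(),
            cache: None,
        });
        (pipeline, bind_group_layout)
    }
}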
3. Results
| Metric | Python (rasterio + PyTorch) | Rust (WGPU + Rayon) |
|---|---|---|
| Data Processed/Day | 5 TB | 80 TB |
| Tile Latency | 450 ms | 8 ms |
| Memory Usage | 32 GB (OOM common) | 4 GB (stable) |
| EC2 Cost | $12,000/month (8x p3.2xlarge) | $800/month (2x g4dn.xlarge) |
| Cross-Platform | CUDA only | Mac/Windows/Linux/Web |
Business Impact:
- 93% reduction in compute costs
- Real-time alerts instead of next-day batch
- Runs on local workstations for field teams
45.11.3. Case Study 3: Real-time Recommendation Engine (The Scale Problem)
The Problem: E-commerce site. “You might also like…”. Traffic: 50,000 req/sec during Black Friday. Legacy System: Java (Spring Boot) + Elasticsearch. Issues: JVM GC pauses caused 500ms latency spikes, killing conversion.
The Solution: Rust + lance (Vector Search) + axum.
Architecture Overview
┌─────────────────────────────────────────────────────────────────────┐
│ Recommendation System │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────────────────────────────────┐ │
│ │ API Layer │ │ Vector Store │ │
│ │ (Axum) │◀──▶│ ┌──────────────────────────────────┐ │ │
│ │ 50k rps │ │ │ Lance (mmap'd NVMe) │ │ │
│ └──────────────┘ │ │ • Product Embeddings (768d) │ │ │
│ │ │ • 10M vectors │ │ │
│ ┌──────────────┐ │ │ • IVF-PQ Index │ │ │
│ │ Updater │───▶│ └──────────────────────────────────┘ │ │
│ │ (Nightly) │ │ │ │
│ └──────────────┘ └──────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
1. Vector Store: Lance
Lance is a columnar format optimized for ML (vectors + metadata).
#![allow(unused)]
fn main() {
use lance::dataset::Dataset;
use lance::index::vector::VectorIndexParams;
use arrow_array::{Array, RecordBatch, Float32Array, StringArray};
use arrow_schema::{Schema, Field, DataType};
use futures::TryStreamExt;
pub struct ProductIndex {
dataset: Dataset,
}
impl ProductIndex {
pub async fn from_embeddings(path: &str) -> Result<Self, Error> {
let dataset = Dataset::open(path).await?;
Ok(Self { dataset })
}
pub async fn create_index(&mut self) -> Result<(), Error> {
// Create IVF-PQ index for fast ANN search
let params = VectorIndexParams::ivf_pq(
256, // num_partitions
8, // num_sub_vectors
8, // bits per sub-vector
lance::index::vector::MetricType::Cosine,
);
self.dataset.create_index(
&["embedding"],
lance::index::IndexType::Vector,
Some("product_idx"),
            &params,
true, // replace existing
).await?;
Ok(())
}
pub async fn search(
&self,
query_embedding: &[f32],
limit: usize,
) -> Result<Vec<ProductRecommendation>, Error> {
let results = self.dataset
.scan()
.nearest("embedding", query_embedding, limit)?
.nprobes(20) // Search 20 IVF partitions
.refine_factor(2) // Rerank top 2x candidates
.project(&["product_id", "title", "price", "image_url"])?
.try_into_stream()
.await?
.try_collect::<Vec<_>>()
.await?;
// Convert to response type
let recommendations = results
.into_iter()
.flat_map(|batch| batch_to_recommendations(batch))
.collect();
Ok(recommendations)
}
}
fn batch_to_recommendations(batch: RecordBatch) -> Vec<ProductRecommendation> {
let ids = batch.column(0).as_any().downcast_ref::<StringArray>().unwrap();
let titles = batch.column(1).as_any().downcast_ref::<StringArray>().unwrap();
let prices = batch.column(2).as_any().downcast_ref::<Float32Array>().unwrap();
let images = batch.column(3).as_any().downcast_ref::<StringArray>().unwrap();
(0..batch.num_rows())
.map(|i| ProductRecommendation {
product_id: ids.value(i).to_string(),
title: titles.value(i).to_string(),
price: prices.value(i),
image_url: images.value(i).to_string(),
})
.collect()
}
}
2. API Layer: Zero-Copy Search
use axum::{extract::State, Json, Router, routing::post};
use std::sync::Arc;
use tokio::sync::RwLock;
#[derive(Clone)]
struct AppState {
index: Arc<RwLock<ProductIndex>>,
embedding_model: Arc<EmbeddingModel>,
}
async fn recommend(
State(state): State<AppState>,
Json(request): Json<RecommendRequest>,
) -> Json<RecommendResponse> {
// 1. Get user's recent interaction embeddings
let query_embedding = state.embedding_model
.encode(&request.user_context)
.await;
// 2. Search (read lock only, never blocks writers)
let index = state.index.read().await;
let recommendations = index
.search(&query_embedding, request.limit)
.await
.unwrap_or_default();
// 3. Apply business rules (filtering, boosting)
let filtered = apply_business_rules(recommendations, &request);
Json(RecommendResponse {
recommendations: filtered,
request_id: uuid::Uuid::new_v4().to_string(),
})
}
fn apply_business_rules(
mut recs: Vec<ProductRecommendation>,
request: &RecommendRequest,
) -> Vec<ProductRecommendation> {
// Filter out of stock
recs.retain(|r| r.in_stock);
// Boost items on sale
recs.sort_by(|a, b| {
let a_score = if a.on_sale { 1.5 } else { 1.0 };
let b_score = if b.on_sale { 1.5 } else { 1.0 };
b_score.partial_cmp(&a_score).unwrap()
});
// Limit to requested count
recs.truncate(request.limit);
recs
}
#[tokio::main]
async fn main() {
let index = ProductIndex::from_embeddings("products.lance").await.unwrap();
let embedding_model = EmbeddingModel::new().await;
let state = AppState {
index: Arc::new(RwLock::new(index)),
embedding_model: Arc::new(embedding_model),
};
let app = Router::new()
.route("/recommend", post(recommend))
.with_state(state);
let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
axum::serve(listener, app).await.unwrap();
}
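The handler assumes a few serde types defined elsewhere in the service. A plausible shape (field names are illustrative, apart from the ones the handler and apply_business_rules reference):
use serde::{Deserialize, Serialize};

#[derive(Deserialize)]
struct RecommendRequest {
    user_context: String,
    limit: usize,
}

#[derive(Serialize)]
struct RecommendResponse {
    recommendations: Vec<ProductRecommendation>,
    request_id: String,
}

#[derive(Serialize, Clone)]
struct ProductRecommendation {
    product_id: String,
    title: String,
    price: f32,
    image_url: String,
    // Referenced by apply_business_rules; in the full system these come from extra
    // metadata columns (the projection in search() above omits them for brevity).
    in_stock: bool,
    on_sale: bool,
}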
3. Atomic Index Updates
Update the index without downtime:
#![allow(unused)]
fn main() {
async fn update_index(state: AppState, new_path: &str) {
// 1. Build new index in background
    let mut new_index = ProductIndex::from_embeddings(new_path).await.unwrap();
new_index.create_index().await.unwrap();
// 2. Warm up (optional: pre-fetch popular queries)
for query in get_popular_queries().await {
let _ = new_index.search(&query, 10).await;
}
// 3. Atomic swap
let mut index_guard = state.index.write().await;
*index_guard = new_index;
// Old index dropped here, mmap'd files cleaned up
}
}
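A hypothetical driver for the rebuild; the 24-hour cadence and the products_new.lance path are illustrative:
use std::time::Duration;

async fn nightly_updater(state: AppState) {
    let mut ticker = tokio::time::interval(Duration::from_secs(24 * 60 * 60));
    ticker.tick().await; // the first tick fires immediately; skip it
    loop {
        ticker.tick().await;
        update_index(state.clone(), "products_new.lance").await;
    }
}
Spawn it from main with tokio::spawn(nightly_updater(state.clone())); searches keep taking the read lock, and the write lock is only held for the swap itself.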
4. Results
| Metric | Java (Spring + ES) | Rust (Axum + Lance) |
|---|---|---|
| P50 Latency | 45 ms | 3 ms |
| P99 Latency | 500 ms (GC) | 12 ms |
| Throughput | 5,000 rps | 80,000 rps |
| Memory | 64 GB (ES heap) | 8 GB (mmap) |
| Server Count | 20 nodes | 2 nodes |
| Annual Cost | $480,000 | $36,000 |
Business Impact:
- 93% reduction in infrastructure costs
- 0 GC-induced latency spikes during Black Friday
- Conversion rate increased 12% due to faster recommendations
45.11.4. Case Study 4: Privacy-Preserving AI (Confidential Computing)
The Problem: Running medical AI on patient data. Hospitals refuse to send data to the cloud unless it stays encrypted during computation. Python is difficult to run inside SGX enclaves (interpreter size, OS dependencies).
The Solution: Rust + Intel SGX + Gramine.
Why Rust for Enclaves?
| Concern | C++ | Python | Rust |
|---|---|---|---|
| Buffer Overflows | Common | N/A (interpreted) | Prevented in safe Rust |
| Binary Size | Large | Huge (interpreter) | Small |
| Side Channels | Manual prevention | Very hard | Library support |
| Attestation | Complex | Very hard | Clean abstractions |
#![allow(unused)]
fn main() {
use sgx_isa::{Report, Targetinfo};
use sgx_crypto::sha256;
/// Generate attestation report for remote verification
pub fn generate_attestation(user_data: &[u8]) -> Report {
let mut report_data = [0u8; 64];
let hash = sha256::hash(user_data);
report_data[..32].copy_from_slice(&hash);
// Get target info for quoting enclave
let target_info = Targetinfo::for_self();
// Generate report (signed by CPU)
Report::for_target(&target_info, &report_data)
}
/// Secure inference inside enclave
pub fn secure_predict(encrypted_input: &[u8], key: &[u8; 32]) -> Vec<u8> {
// 1. Decrypt input inside enclave
let input = aes_gcm_decrypt(encrypted_input, key);
// 2. Run model (all in enclave memory)
let output = MODEL.forward(&input);
// 3. Encrypt output before leaving enclave
aes_gcm_encrypt(&output, key)
}
}
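The aes_gcm_encrypt / aes_gcm_decrypt helpers are assumed above. A minimal sketch using the RustCrypto aes-gcm crate; the fixed nonce keeps the example short and is never acceptable in production (derive a fresh nonce per message, e.g. from the attested key exchange):
use aes_gcm::aead::{Aead, KeyInit};
use aes_gcm::{Aes256Gcm, Key, Nonce};

// WARNING: a fixed nonce is shown only to keep the sketch short.
const NONCE: [u8; 12] = [0u8; 12];

fn aes_gcm_encrypt(plaintext: &[u8], key: &[u8; 32]) -> Vec<u8> {
    let cipher = Aes256Gcm::new(Key::<Aes256Gcm>::from_slice(key));
    cipher
        .encrypt(Nonce::from_slice(&NONCE), plaintext)
        .expect("encryption failure")
}

fn aes_gcm_decrypt(ciphertext: &[u8], key: &[u8; 32]) -> Vec<u8> {
    let cipher = Aes256Gcm::new(Key::<Aes256Gcm>::from_slice(key));
    cipher
        .decrypt(Nonce::from_slice(&NONCE), ciphertext)
        .expect("decryption / authentication failure")
}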
Results
- Binary size: 15 MB (vs 500 MB for Python + deps)
- Attack surface: Minimal (no interpreter vulnerabilities)
- Certification: Passed HIPAA security audit
45.11.5. Case Study 5: Log Analytics (The Grep Replacement)
The Problem: Searching 10TB of JSON logs per query. Current tool: Elasticsearch. Issues: Cluster overhead, slow cold queries, expensive.
The Solution: Rust CLI tool with SIMD JSON parsing.
#![allow(unused)]
fn main() {
use memmap2::Mmap;
use rayon::prelude::*;
use simd_json::prelude::*;
use std::path::PathBuf;
fn search_logs(pattern: &str, paths: &[PathBuf]) -> Vec<LogMatch> {
paths.par_iter()
.flat_map(|path| {
let file = std::fs::File::open(path).unwrap();
let mmap = unsafe { Mmap::map(&file).unwrap() };
// Split into lines (SIMD-accelerated)
let lines: Vec<&[u8]> = mmap
.par_split(|&b| b == b'\n')
.collect();
// Parse and filter in parallel
lines.par_iter()
.filter_map(|line| {
let mut owned = line.to_vec();
                let json = simd_json::to_owned_value(&mut owned).ok()?;
if json["message"].as_str()?.contains(pattern) {
Some(LogMatch {
timestamp: json["timestamp"].as_str()?.to_string(),
message: json["message"].as_str()?.to_string(),
file: path.to_string_lossy().to_string(),
})
} else {
None
}
})
.collect::<Vec<_>>()
})
.collect()
}
}
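The snippet assumes a LogMatch record and an entry point. A minimal sketch of both (the logsearch binary name is made up):
use std::path::PathBuf;

struct LogMatch {
    timestamp: String,
    message: String,
    file: String,
}

// Usage: logsearch <pattern> <file> [<file>...]
fn main() {
    let mut args = std::env::args().skip(1);
    let pattern = args.next().expect("usage: logsearch <pattern> <files...>");
    let paths: Vec<PathBuf> = args.map(PathBuf::from).collect();

    let matches = search_logs(&pattern, &paths);
    for m in &matches {
        println!("{}\t{}\t{}", m.timestamp, m.file, m.message);
    }
    eprintln!("{} matches", matches.len());
}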
Results
| Tool | 10TB Query Time | Memory | Setup Time |
|---|---|---|---|
| Elasticsearch | 45 seconds | 128 GB cluster | 2 hours |
| grep + jq | 4 hours | 1 GB | 0 |
| Rust CLI | 3 seconds | 4 GB | 0 |
45.11.6. Key Takeaways for Architects
When to Use Rust
- Latency Sensitive (< 10ms requirement): HFT, AdTech, Gaming
- Cost Sensitive (> $10k/month compute): Batch processing, ETL
- Scale Critical (> 10k rps): Core infrastructure, gateways
- Security Critical: Enclaves, cryptography, medical devices
- Edge/Embedded: IoT, mobile SDKs, browser extensions
When to Keep Python
- Rapid Prototyping: < 1 week development time
- ML Training: PyTorch ecosystem is unmatched
- Data Exploration: Jupyter notebooks
- Glue Code: Orchestrating existing services
- UI Development: Streamlit, Gradio
The Hybrid Pattern (Recommended)
┌─────────────────────────────────────────────────┐
│ ML Application │
├─────────────────────────────────────────────────┤
│ Training │ Python (PyTorch, Notebook) │
│ Inference │ Rust (Axum, Candle) │
│ Data Prep │ Rust (Polars) │
│ Experiment │ Python (MLflow) │
│ Platform │ Rust (APIs, Gateways) │
│ Monitoring │ Rust (Metrics) + Grafana │
└─────────────────────────────────────────────────┘
This is not either/or. The best teams use both languages where they excel.
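One concrete hybrid example: expose the Rust inference path to Python as a native extension module. A sketch using PyO3 (written against the ~0.22 Bound API; module and function names are illustrative, and the body is a placeholder rather than a real model):
use pyo3::prelude::*;

#[pyfunction]
fn predict(features: Vec<f32>) -> PyResult<f32> {
    // Placeholder scoring; a real module would call the same statically
    // dispatched model the Rust services use.
    Ok(features.iter().sum::<f32>().tanh())
}

#[pymodule]
fn fastmodel(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(predict, m)?)?;
    Ok(())
}
Built with maturin it ships as an ordinary wheel, so notebooks and training code just import fastmodel while the latency-critical services link the same crate directly.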
[End of Section 45.11]