45.5. Edge & Embedded ML: Rust on Bare Metal
Warning
The Constraint: You have 320KB of RAM. You have no OS (no Linux, no Windows). You have no
malloc. Python cannot run here. C++ compiles, but gives you no memory safety. Rust is the only mainstream high-level language that can target no_std.
45.5.1. Understanding no_std
In normal Rust (std), you have:
- Vec<T> (heap)
- std::fs (filesystem)
- std::thread (threads)
In Embedded Rust (core), you have:
- slice (stack arrays)
- iter (iterators)
- Result / Option (error handling)
You lose convenience, but you gain Determinism. You know exactly how many bytes your program uses.
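What remains is enough for ML, which ultimately reduces to arithmetic over fixed-size buffers. A minimal sketch (names and layer sizes are illustrative): a dot product and a tiny dense layer written against only core items, so the same code compiles unchanged under no_std.

```rust
/// Dot product over slices: only `core` items (slices, iterators, f32).
/// No heap, no OS, no allocation.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

/// A dense layer as a plain function over stack arrays.
/// Every byte it uses is visible in the signature.
pub fn dense_4_to_2(
    input: &[f32; 4],
    weights: &[[f32; 4]; 2],
    bias: &[f32; 2],
) -> [f32; 2] {
    let mut out = [0.0f32; 2];
    for (o, (row, b)) in out.iter_mut().zip(weights.iter().zip(bias.iter())) {
        *o = dot(input, row) + b;
    }
    out
}
```

Because the arrays are sized in the type, the stack cost of an inference call is known before the program ever runs.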
The Cargo.toml
[package]
name = "embedded-ml"
version = "0.1.0"
edition = "2021"
[dependencies]
cortex-m = "0.7"
cortex-m-rt = "0.7" # Runtime (Reset handler)
embedded-hal = "0.2"
panic-halt = "0.2" # Halt on panic (no stack trace printing)
microflow = "0.1" # Hypothetical TinyML inference crate
45.5.2. Your First Embedded Program (ESP32-C3)
The ESP32-C3 is a RISC-V microcontroller. Cost: $2.
#![no_std]
#![no_main]
use esp32c3_hal::{
clock::ClockControl,
gpio::IO,
peripherals::Peripherals,
prelude::*,
timer::TimerGroup,
};
use panic_halt as _;
#[entry]
fn main() -> ! {
let peripherals = Peripherals::take();
let system = peripherals.SYSTEM.split();
let clocks = ClockControl::boot_defaults(system.clock_control).freeze();
let io = IO::new(peripherals.GPIO, peripherals.IO_MUX);
let mut led = io.pins.gpio2.into_push_pull_output();
// The Inference Loop
loop {
led.toggle().unwrap();
// ML Inference would go here
// run_model();
// Busy wait (bad power efficiency)
for _ in 0..100_000 {}
}
}
45.5.3. Managing Memory: The Allocator Question
ML models need weights. Weights need memory.
If you don’t have an OS malloc, where does a Vec<f32> go?
Option 1: Static Allocation (Safest)
Everything is a static buffer: [f32; 1000].
- Pros: Impossible to run OOM at runtime. Linker fails if RAM is insufficient.
- Cons: Inflexible.
#![allow(unused)]
fn main() {
static WEIGHTS: [f32; 1000] = [0.0; 1000]; // sized at link time, lives in Flash/.bss
fn inference(input: &[f32]) {
    // Zero-allocation inference: output lives on the stack
    let mut output = [0.0f32; 10];
    // ...
}
}
Option 2: Embedded Allocator (Flexible)
We can bring in a small embedded allocator (for example the embedded-alloc crate, or a hand-rolled “Bump Allocator”) to enable Vec support.
#![allow(unused)]
fn main() {
use embedded_alloc::Heap;
#[global_allocator]
static HEAP: Heap = Heap::empty();
fn init_heap() {
use core::mem::MaybeUninit;
const HEAP_SIZE: usize = 32 * 1024; // 32KB
static mut HEAP_MEM: [MaybeUninit<u8>; HEAP_SIZE] = [MaybeUninit::uninit(); HEAP_SIZE];
unsafe { HEAP.init(core::ptr::addr_of_mut!(HEAP_MEM) as usize, HEAP_SIZE) }
}
}
Now you can use extern crate alloc; and Vec<f32>!
Just be careful: allocation can now fail at runtime. On a 32KB heap, fragmentation and unbounded Vec growth turn into out-of-memory panics in the field.
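To demystify what such an allocator does, here is the core bump-allocation logic modeled as a plain struct over a fixed-size arena. This is illustrative only: a real #[global_allocator] additionally implements the GlobalAlloc trait and needs interior mutability plus a critical section for interrupt safety.

```rust
/// Bump allocator over a fixed arena: hands out aligned blocks,
/// never frees individually, resets everything at once.
pub struct Bump {
    arena_len: usize,
    next: usize, // offset of the first free byte
}

impl Bump {
    pub const fn new(arena_len: usize) -> Self {
        Bump { arena_len, next: 0 }
    }

    /// Returns the offset of a `size`-byte block aligned to `align`
    /// (a power of two), or None if the arena is exhausted.
    pub fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        let start = (self.next + align - 1) & !(align - 1); // round up to alignment
        let end = start.checked_add(size)?;
        if end > self.arena_len {
            return None; // OOM: the caller must handle it
        }
        self.next = end;
        Some(start)
    }

    /// Free everything in O(1): the only "dealloc" a bump allocator has.
    pub fn reset(&mut self) {
        self.next = 0;
    }
}
```

The appeal on a microcontroller is determinism: alloc is a handful of instructions, and the failure mode is an explicit None rather than a heap walk of unbounded latency.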
45.5.4. TinyML: tflite-micro vs Rust
Google’s TensorFlow Lite for Microcontrollers is written in C++. It requires defining a “Tensor Arena” (a big byte array).
Rust Approach (tract or microflow):
Because buffer and tensor sizes are compile-time constants, Rust can verify at compile time whether your model fits in RAM.
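One way to get that guarantee is a const assertion that fails the build if the static budget is exceeded. A sketch (the layer sizes are illustrative, and the RAM figure is the ESP32-C3's; adjust for your chip):

```rust
// Static memory budget, evaluated entirely at compile time.
// Illustrative model: one 128x64 f32 layer plus double-buffered activations.
const WEIGHTS_BYTES: usize = 128 * 64 * core::mem::size_of::<f32>();
const ACTIVATIONS_BYTES: usize = 2 * 128 * core::mem::size_of::<f32>();
const MODEL_RAM: usize = WEIGHTS_BYTES + ACTIVATIONS_BYTES;
const RAM_BUDGET: usize = 320 * 1024; // ESP32-C3: 320KB SRAM

// If the model outgrows the budget, this is a *compile* error,
// not a runtime OOM in the field.
const _: () = assert!(MODEL_RAM <= RAM_BUDGET, "model does not fit in RAM");
```

The same trick extends to Flash: sum your include_bytes! weight blobs into a const and assert against the part's Flash size.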
Example: Audio Keyword Spotting (Rust)
#![allow(unused)]
fn main() {
// 1. ADC (Microphone) Interrupt
#[interrupt]
fn ADC0() {
let sample = adc.read();
RING_BUFFER.push(sample);
}
// 2. FFT Feature Extraction (no_std)
use microfft::real::rfft_256;
fn extract_features() -> [f32; 128] {
let mut buffer = [0.0; 256];
// ... fill buffer from RING_BUFFER ...
let spectrum = rfft_256(&mut buffer);
// ... compute power ...
}
// 3. Inference
fn run_inference(features: &[f32; 128]) -> bool {
// Hardcoded weights (Flash Memory)
const W1: [[f32; 64]; 128] = include_weights!("layer1.bin");
// Matrix Mul logic (f32, no SIMD on Cortex-M0)
// ...
}
}
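The RING_BUFFER above is doing real work: the ADC interrupt produces samples while the main loop consumes them. A minimal fixed-capacity ring buffer might look like the sketch below (single-threaded for clarity; a buffer actually shared with an ISR needs atomics or a critical section, e.g. heapless::spsc::Queue):

```rust
/// Fixed-capacity ring buffer: the ISR pushes, the main loop drains.
pub struct RingBuffer<const N: usize> {
    buf: [i16; N],
    head: usize, // next write index
    len: usize,  // number of valid samples
}

impl<const N: usize> RingBuffer<N> {
    pub const fn new() -> Self {
        RingBuffer { buf: [0; N], head: 0, len: 0 }
    }

    /// Push a sample, overwriting the oldest one when full.
    /// Dropping stale audio beats blocking inside an ISR.
    pub fn push(&mut self, sample: i16) {
        self.buf[self.head] = sample;
        self.head = (self.head + 1) % N;
        if self.len < N {
            self.len += 1;
        }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Pop the oldest sample, if any.
    pub fn pop(&mut self) -> Option<i16> {
        if self.len == 0 {
            return None;
        }
        let tail = (self.head + N - self.len) % N;
        self.len -= 1;
        Some(self.buf[tail])
    }
}
```

With N a const generic, the buffer is a plain stack (or static) array: no allocator, and the capacity is part of the type.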
45.5.5. Peripherals: Interacting with Sensors
ML input comes from sensors.
Rust’s embedded-hal traits provide a universal API.
Whether you are on STM32, ESP32, or nRF52, the code looks the same.
#![allow(unused)]
fn main() {
use embedded_hal::blocking::i2c::WriteRead;
const IMU_ADDR: u8 = 0x68;
fn read_accelerometer<I2C>(i2c: &mut I2C) -> [i16; 3]
where I2C: WriteRead {
let mut buffer = [0u8; 6];
// Write 0x3B (ACCEL_XOUT_H register), Read 6 bytes
i2c.write_read(IMU_ADDR, &[0x3B], &mut buffer).unwrap();
let x = i16::from_be_bytes([buffer[0], buffer[1]]);
let y = i16::from_be_bytes([buffer[2], buffer[3]]);
let z = i16::from_be_bytes([buffer[4], buffer[5]]);
[x, y, z]
}
}
45.5.6. Deployment: probe-rs
In C++, you use OpenOCD and GDB. It’s complex.
In Rust, cargo flash just works.
# Flash the code to the plugged-in chip
cargo flash --chip esp32c3 --release
Monitor Logs (RTT):
C++ printf requires configuring UART.
Rust defmt (Deferred Formatting) sends compressed logs over the debug probe. It is microscopically cheap (microseconds).
#![allow(unused)]
fn main() {
use defmt::info;
info!("Inference took {} ms", latency);
}
45.5.7. Battery Life Optimization
Rust’s ownership model helps power consumption too.
If you own the Peripheral, you know nobody else is using it. You can safely power it down.
#![allow(unused)]
fn main() {
{
let i2c = peripherals.I2C0.into_active();
let data = read_sensor(&i2c);
} // i2c goes out of scope -> Drop impl powers down the peripheral automatically.
}
This pattern gives you zero-cost power management: the scope is the power policy.
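A host-runnable sketch of the pattern: a wrapper type whose Drop powers the peripheral down. An atomic flag stands in for the real clock-gating register write, and all names here are illustrative.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Stand-in for the peripheral's power-enable register bit.
/// On target this would be a write to the RCC / clock-gating register.
static I2C_POWERED: AtomicBool = AtomicBool::new(false);

/// Owning this value == the peripheral is powered.
pub struct ActiveI2c;

impl ActiveI2c {
    pub fn power_up() -> Self {
        I2C_POWERED.store(true, Ordering::SeqCst);
        ActiveI2c
    }

    pub fn read_sensor(&self) -> i16 {
        42 // placeholder reading
    }
}

impl Drop for ActiveI2c {
    fn drop(&mut self) {
        // Runs automatically at end of scope: power the bus down.
        I2C_POWERED.store(false, Ordering::SeqCst);
    }
}

pub fn is_powered() -> bool {
    I2C_POWERED.load(Ordering::SeqCst)
}
```

Because ownership is exclusive, the compiler proves no one can touch the bus after it is powered down; the Drop call itself compiles to the same store instruction you would have written by hand.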
45.5.8. Case Study: Smart Agriculture Node
Goal: Detect pests using microphone audio. Device: nRF52840 (Bluetooth + Cortex M4). Power Budget: 1 year on Coin Cell.
Architecture:
- Sleep: CPU OFF.
- Wake on Sound: Low-power comparator triggers interrupt.
- Record: DMA transfers audio to RAM (CPU sleeping).
- Infer: Rust microfft + Tiny Neural Net (CPU 100%).
- Alert: If pest detected, wake up Bluetooth Radio and send packet.
- Sleep.
Why Rust? Memory safety ensures the complex state machine (Sleep -> Wake -> DMA -> BLE) never enters an undefined state. In C, race conditions in Interrupt Handlers are notoriously common.
45.5.9. The “Safe” Embedded Pattern: heapless
Allocating memory (Heap) on a device with 16KB RAM is risky (Fragmentation).
The heapless crate provides standard collections that live on the Stack.
#![allow(unused)]
fn main() {
use heapless::{Vec, String, FnvIndexMap};
fn safe_buffers() {
// A vector with max capacity 32.
// Allocated as a fixed-size array [T; 32] on stack.
let mut buffer: Vec<f32, 32> = Vec::new();
// Pushing beyond capacity 32 returns Err instead of crashing:
// let _ = buffer.push(1.0);
// A string of max 64 chars
let mut log_line: String<64> = String::new();
}
}
This guarantees worst-case memory usage at compile time.
45.5.10. Async Embedded: The embassy Revolution
Traditionally, you use an RTOS like FreeRTOS to handle tasks.
In Rust, async/await is a compile-time state machine transformation.
This means you can have multitasking without an OS kernel.
Embassy is the standard framework for this.
use embassy_executor::Spawner;
use embassy_time::{Duration, Timer};
#[embassy_executor::task]
async fn blink_task(mut pin: AnyPin) {
loop {
pin.toggle();
Timer::after(Duration::from_millis(500)).await;
// The CPU sleeps here!
}
}
#[embassy_executor::task]
async fn infer_task() {
loop {
let input = wait_for_sensor().await;
let output = model.predict(input);
send_over_lora(output).await;
}
}
#[embassy_executor::main]
async fn main(spawner: Spawner) {
// Spawn two concurrent tasks onto the same single core.
// The compiler generates the interleaving state machine.
spawner.spawn(blink_task(led)).unwrap();
spawner.spawn(infer_task()).unwrap();
}
Advantage over FreeRTOS:
- Memory: Each FreeRTOS task needs its own pre-allocated stack. Embassy tasks are compiler-generated state machines that share one stack.
- Safety: Data races between tasks are caught at compile time.
45.5.11. Digital Signal Processing (DSP)
Before ML, you need DSP. Rust has excellent iterator optimizations for this.
#![allow(unused)]
fn main() {
struct LowPassFilter {
alpha: f32,
last: f32,
}
impl LowPassFilter {
fn update(&mut self, input: f32) -> f32 {
self.last = self.last + self.alpha * (input - self.last);
self.last
}
}
// Zero-Cost Abstraction:
// this iterator chain compiles down to a single tight loop.
fn filter_buffer(input: &[f32], output: &mut [f32]) {
let mut lpf = LowPassFilter { alpha: 0.1, last: 0.0 };
input.iter()
.zip(output.iter_mut())
.for_each(|(in_val, out_val)| {
*out_val = lpf.update(*in_val);
});
}
}
45.5.12. OTA Updates: embassy-boot
Deploying 1000 IoT sensors is easy. Updating them is hard. Rust prevents “Bricking” the device. We use A/B partitioning.
- Bootloader: Checks firmware CRC.
- Partition A: Active App.
- Partition B: Incoming App.
#![allow(unused)]
fn main() {
// Updating Logic
async fn update_firmware(uart: &mut Uart) {
let mut writer = PartitionB::writer();
while let Some(chunk) = uart.read_chunk().await {
writer.write(chunk).await;
}
// Verify Signature (Ed25519)
if verify_signature(writer.digest()) {
embassy_boot::set_boot_partition(PartitionB);
cortex_m::peripheral::SCB::sys_reset();
}
}
}
If signature fails, the device reboots into Partition A. Safe.
45.5.13. Hardware-in-the-Loop (HIL) Testing with QEMU
You don’t need the physical board to test code.
qemu-system-arm supports popular boards (micro:bit, STM32).
Cargo Config:
[target.thumbv7em-none-eabihf]
runner = "qemu-system-arm -cpu cortex-m4 -machine lm3s6965evb -nographic -semihosting -kernel"
Now, cargo run launches QEMU.
You can mock sensors by writing to specific memory addresses that QEMU intercepts.
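Mocking at the memory level works because MMIO is nothing more than volatile loads and stores at fixed addresses. A host-runnable sketch of the access pattern, with a local word standing in for the hardware register address (on target, and under QEMU, the pointer would be the device's documented address instead):

```rust
use core::ptr;

/// Volatile read-modify-write: the primitive underneath every PAC call,
/// and the operation QEMU intercepts when the address maps to a device.
///
/// Safety: `reg` must point to a valid, writable u32.
pub unsafe fn set_bit(reg: *mut u32, bit: u32) {
    let v = ptr::read_volatile(reg);
    ptr::write_volatile(reg, v | (1 << bit));
}

pub fn demo() -> u32 {
    // On hardware this would be e.g. 0x5000_0504; here a stack word
    // stands in so the sketch runs on the host.
    let mut fake_gpio_odr: u32 = 0;
    unsafe {
        set_bit(&mut fake_gpio_odr, 5);
        set_bit(&mut fake_gpio_odr, 0);
    }
    fake_gpio_odr
}
```

The volatile calls are what stop the compiler from optimizing "dead" stores away, which is exactly the behavior hardware registers (and QEMU's device models) rely on.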
45.5.14. Final Checklist for Edge AI
- Model Size: Does it fit in Flash? (Use cargo size -- -A.)
- RAM: Does inference fit in Stack/Heap? (Use heapless to be sure.)
- Power: Are you sleeping when idle? (Use embassy.)
- Updates: Can you recover from a bad update? (Use A/B partitions.)
- Monitoring: Use defmt for efficient logging.
45.5.15. Deep Dive: Memory-Mapped I/O and PACs
How does led.toggle() actually work?
In C, you do *(volatile uint32_t*)(0x50000000) |= (1 << 5). This is unsafe.
In Rust, we use PACs (Peripheral Access Crates) generated from SVD files via svd2rust.
The Magic of svd2rust
The vendor (ST, Espressif) provides an XML file (SVD) describing every register address.
svd2rust converts this into safe Rust code.
#![allow(unused)]
fn main() {
// C-style (unsafe)
unsafe {
let gpio_out = 0x5000_0504 as *mut u32;
*gpio_out |= 1 << 5;
}
// Rust PAC (Safe)
let dp = pac::Peripherals::take().unwrap();
let gpioa = dp.GPIOA;
// modify() performs a safe read-modify-write of the whole register
gpioa.odr.modify(|_, w| w.odr5().set_bit());
}
The Rust compiler collapses all this “abstraction” into the exact same single assembly instruction (LDR, ORR, STR) as the C code. Zero Overhead.
45.5.16. Direct Memory Access (DMA): The MLOps Accelerator
In MLOps, we move heavy tensors. Copying 1MB of audio data byte-by-byte using the CPU is slow. DMA allows the hardware to copy memory while the CPU sleeps (or runs inference).
DMA with embedded-dma
#![allow(unused)]
fn main() {
use embedded_dma::{ReadBuffer, WriteBuffer};
// 1. Setup Buffers
static mut RX_BUF: [u8; 1024] = [0; 1024];
fn record_audio_dma(adc: &ADC, dma: &mut DMA) {
// 2. Configure Transfer
// Source: ADC Data Register
// Dest: RX_BUF in RAM
let transfer = dma.transfer(
adc.data_register(),
unsafe { &mut RX_BUF },
);
// 3. Start (Non-blocking)
let transfer_handle = transfer.start();
// 4. Do other work (e.g. Inference on previous buffer)
run_inference();
// 5. Wait for finish
transfer_handle.wait();
}
}
45.5.17. Custom Panic Handlers: The “Blue Screen” of LEDs
When unwrap() fails in no_std, where does the error go?
There is no console.
We write a handler that blinks the error code in Morse Code on the Status LED.
#![allow(unused)]
fn main() {
#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
// 1. Disable Interrupts (Critical Section)
cortex_m::interrupt::disable();
// 2. Get LED hardware
// Note: We must use 'steal()' because Peripherals might be already taken
let p = unsafe { pac::Peripherals::steal() };
let mut led = p.GPIOC.odr;
// 3. Blink "SOS" (... --- ...)
loop {
blink_dot(&mut led);
blink_dot(&mut led);
blink_dot(&mut led);
blink_dash(&mut led);
// ...
}
}
}
This is crucial for debugging field devices where you don’t have a UART cable attached.
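The blink pattern itself is pure logic you can unit-test on the host before it ever gates an LED. A sketch that expands SOS into (on, duration) pairs; the 1-unit/3-unit timings follow Morse convention, and the function names are illustrative:

```rust
/// Morse timing: a dot is 1 unit on, a dash 3 units on,
/// with 1 unit off between elements.
/// SOS = dot dot dot, dash dash dash, dot dot dot.
pub fn sos_pattern() -> Vec<(bool, u32)> {
    let elements = [false, false, false, true, true, true, false, false, false];
    let mut out = Vec::new();
    for (i, &dash) in elements.iter().enumerate() {
        if i > 0 {
            out.push((false, 1)); // gap between elements
        }
        out.push((true, if dash { 3 } else { 1 }));
    }
    out
}
```

On target you would iterate this table, driving the LED pin and a busy-wait delay per unit, and you would collect into a heapless::Vec (or a const table) rather than an allocating Vec.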
45.5.18. Writing a Bootloader in Rust
If you want OTA, you need a custom Bootloader.
It resides at address 0x0800_0000 (on STM32).
It decides whether to jump to 0x0801_0000 (App A) or 0x0802_0000 (App B).
#[entry]
fn main() -> ! {
let p = pac::Peripherals::take().unwrap();
// 1. Check Button State
if p.GPIOC.idr.read().idr13().is_low() {
// Recovery Mode
flash_led();
loop {}
}
// 2. Validate App Checksum
let app_ptr = 0x0801_0000 as *const u32;
if verify_checksum(app_ptr) {
// 3. Jump to Application
unsafe {
let stack_ptr = *app_ptr;
let reset_vector = *(app_ptr.offset(1));
// Set Main Stack Pointer
cortex_m::register::msp::write(stack_ptr);
// Re-interpret the address as a function and call it
let app_entry: extern "C" fn() -> ! = core::mem::transmute(reset_vector);
app_entry();
}
}
// Fallback
loop {}
}
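The verify_checksum step can be as simple as a Fletcher-style sum over the application image, with the expected value written into the image's last word by the build script. A sketch of that convention (real deployments prefer a proper CRC32 or, as in 45.5.12, a cryptographic signature):

```rust
/// Fletcher-style checksum over the application's words.
/// Two running sums catch both corrupted and reordered words.
pub fn checksum(words: &[u32]) -> u32 {
    let (mut a, mut b) = (0u32, 0u32);
    for &w in words {
        a = a.wrapping_add(w);
        b = b.wrapping_add(a);
    }
    a ^ b
}

/// Image layout convention (illustrative): payload words followed by
/// one trailing checksum word appended at build time.
pub fn verify_image(image: &[u32]) -> bool {
    match image.split_last() {
        Some((&stored, payload)) => checksum(payload) == stored,
        None => false,
    }
}
```

The bootloader would call this over a slice reconstructed from the app partition's base address and length before deciding whether the jump is safe.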
45.5.19. Benchmarking: Counting Cycles
std::time::Instant doesn’t exist.
On ARM Cortex-M, we use the DWT (Data Watchpoint and Trace) Cycle Counter (CYCCNT).
#![allow(unused)]
fn main() {
use cortex_m::peripheral::DWT;
fn measure_inference() {
let mut dwt = unsafe { pac::CorePeripherals::steal().DWT };
// Enable Cycle Counter
dwt.enable_cycle_counter();
let start = DWT::cycle_count();
// Run Model
let _ = model.predict(&input);
let end = DWT::cycle_count();
let cycles = end.wrapping_sub(start); // tolerate counter wrap-around
let time_ms = cycles as f32 / (CLOCK_HZ as f32 / 1000.0);
defmt::info!("Inference Cycles: {}, Time: {} ms", cycles, time_ms);
}
}
This gives you cycle-accurate profiling. You can count exactly how many cycles a Matrix Multiplication takes.
45.5.20. Cargo Embed & Defmt
The tooling experience is superior to C.
cargo-embed (by Ferrous Systems) is an all-in-one tool.
Embed.toml:
[default.probe]
protocol = "Swd"
[default.rtt]
enabled = true
[default.gdb]
enabled = false
Usage: cargo embed --release.
- Compiles.
- Flashes.
- Resets chip.
- Opens RTT console to show defmt logs.
All in 2 seconds.
45.5.21. Final Exam: The Spec Sheet
Scenario: You are building a “Smart Doorbell” with Face Recognition.
- MCU: STM32H7 (480MHz, 1MB RAM).
- Camera: OV2640 (DCMI interface).
- Model: MobileNetV2-SSD (Quantized int8).
Stack:
- Driver: stm32h7xx-hal (DCMI for Camera).
- DMA: Transfer Image -> RAM (Double buffering).
- Preprocessing: image-proc (Resize 320x240 -> 96x96).
- Inference: tract-core (Pulse backend).
- Output: embedded-graphics (Draw Box on LCD).
In C++, integrating these 5 components (Vendor HAL + OpenCV port + TFLite + GUI) would take months.
In Rust, cargo add and trait compatibility make it a 2-week job.
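The "Quantized int8" part of that spec is worth making concrete. Affine quantization maps f32 values to i8 with a scale and zero-point; a sketch of the scheme (the calibration range is illustrative):

```rust
/// Affine quantization: real_value = scale * (q - zero_point).
pub fn quantize(x: f32, scale: f32, zero_point: i32) -> i8 {
    let q = (x / scale).round() as i32 + zero_point;
    q.clamp(i8::MIN as i32, i8::MAX as i32) as i8 // saturate, don't wrap
}

pub fn dequantize(q: i8, scale: f32, zero_point: i32) -> f32 {
    scale * (q as i32 - zero_point) as f32
}

/// Pick scale/zero-point so the observed [min, max] range
/// spans the full 256-value i8 range.
pub fn calibrate(min: f32, max: f32) -> (f32, i32) {
    let scale = (max - min) / 255.0;
    let zero_point = (i8::MIN as f32 - min / scale).round() as i32;
    (scale, zero_point)
}
```

This is why int8 models take a quarter of the Flash of f32 ones, and why Cortex-M cores with SMLAD-style instructions run them faster too: the matmul inner loop becomes integer multiply-accumulates.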
45.5.22. Real-Time Operating Systems (RTOS) Integration
For hard real-time requirements, integrate with RTOS.
Embassy: Async on Bare Metal
#![no_std]
#![no_main]
use embassy_executor::Spawner;
use embassy_time::{Duration, Timer, Instant};
use embassy_sync::channel::Channel;
use embassy_sync::blocking_mutex::raw::ThreadModeRawMutex;
// Channel for sensor data
static SENSOR_CHANNEL: Channel<ThreadModeRawMutex, SensorData, 10> = Channel::new();
#[embassy_executor::task]
async fn sensor_task() {
let mut adc = Adc::new();
loop {
let reading = adc.read().await;
let data = SensorData {
timestamp: Instant::now(),
value: reading,
};
SENSOR_CHANNEL.send(data).await;
Timer::after(Duration::from_millis(10)).await; // 100 Hz sampling
}
}
#[embassy_executor::task]
async fn inference_task() {
let model = load_model();
let mut buffer = RingBuffer::new(100);
loop {
let data = SENSOR_CHANNEL.receive().await;
buffer.push(data);
if buffer.is_full() {
let features = extract_features(&buffer);
let prediction = model.predict(&features);
if prediction.anomaly_detected() {
trigger_alert().await;
}
buffer.clear();
}
}
}
#[embassy_executor::main]
async fn main(spawner: Spawner) {
spawner.spawn(sensor_task()).unwrap();
spawner.spawn(inference_task()).unwrap();
}
FreeRTOS Integration
use freertos_rust::*;
fn main() {
// Create tasks
Task::new()
.name("sensor")
.stack_size(2048)
.priority(TaskPriority(3))
.start(sensor_task)
.unwrap();
Task::new()
.name("inference")
.stack_size(4096) // ML needs more stack
.priority(TaskPriority(2))
.start(inference_task)
.unwrap();
// Start scheduler
FreeRtosUtils::start_scheduler();
}
fn inference_task(_: ()) {
let model = TinyModel::load();
let queue = Queue::<SensorData>::new(10).unwrap();
loop {
if let Ok(data) = queue.receive(Duration::ms(100)) {
let result = model.predict(&data.features);
// Process result...
}
}
}
45.5.23. Power Management
Battery life is critical for edge devices.
use embassy_stm32::low_power::{stop_with_rtc, Executor};
#[embassy_executor::main]
async fn main(spawner: Spawner) {
let p = embassy_stm32::init(Default::default());
// Configure RTC for wake-up
let rtc = Rtc::new(p.RTC, RtcClockSource::LSE);
loop {
// 1. Collect sensor data
let data = read_sensors().await;
// 2. Run inference
let result = model.predict(&data);
// 3. Transmit if interesting
if result.is_significant() {
radio.transmit(&result).await;
}
// 4. Enter low-power mode for 5 seconds
stop_with_rtc(&rtc, Duration::from_secs(5)).await;
// CPU wakes up here after 5 seconds
}
}
Power Profiles
#![allow(unused)]
fn main() {
#[derive(Clone, Copy)]
pub enum PowerMode {
Active, // Full speed, max power
LowPower, // Reduced clock, peripherals off
Sleep, // CPU halted, RAM retained
DeepSleep, // Only RTC running
}
pub fn set_power_mode(mode: PowerMode) {
match mode {
PowerMode::Active => {
// Max performance
rcc.set_sysclk(480_000_000); // 480 MHz
enable_all_peripherals();
}
PowerMode::LowPower => {
// Reduce clock, disable unused peripherals
rcc.set_sysclk(8_000_000); // 8 MHz
disable_unused_peripherals();
}
PowerMode::Sleep => {
cortex_m::asm::wfi(); // Wait for interrupt
}
PowerMode::DeepSleep => {
// Configure wake-up sources
pwr.enter_stop_mode();
}
}
}
}
45.5.24. ML Accelerator Integration
Many MCUs have built-in NPUs (Neural Processing Units).
STM32 with X-CUBE-AI
#![allow(unused)]
fn main() {
// Wrapper for ST's X-CUBE-AI generated code
extern "C" {
    fn ai_mnetwork_init() -> i32;
    fn ai_mnetwork_run(input: *const f32, output: *mut f32) -> i32;
    fn ai_mnetwork_get_input_size() -> u32;
    fn ai_mnetwork_get_output_size() -> u32;
}
pub struct StmAiNetwork;
impl StmAiNetwork {
pub fn new() -> Self {
unsafe {
// Initialize the network
ai_mnetwork_init();
}
Self
}
pub fn predict(&self, input: &[f32]) -> Vec<f32> {
let input_size = unsafe { ai_mnetwork_get_input_size() } as usize;
let output_size = unsafe { ai_mnetwork_get_output_size() } as usize;
assert_eq!(input.len(), input_size);
let mut output = vec![0.0f32; output_size];
unsafe {
ai_mnetwork_run(input.as_ptr(), output.as_mut_ptr());
}
output
}
}
}
Coral Edge TPU
#![allow(unused)]
fn main() {
use edgetpu::EdgeTpuContext;
pub struct CoralInference {
context: EdgeTpuContext,
model: Vec<u8>,
}
impl CoralInference {
pub fn new(model_path: &str) -> Result<Self, Error> {
let context = EdgeTpuContext::open_device()?;
let model = std::fs::read(model_path)?;
Ok(Self { context, model })
}
pub fn predict(&self, input: &[u8]) -> Vec<u8> {
// Delegate to Edge TPU
self.context.run_inference(&self.model, input)
}
}
}
45.5.25. OTA (Over-The-Air) Updates
Deploy model updates remotely.
#![allow(unused)]
fn main() {
use embassy_net::tcp::TcpSocket;
use embedded_storage::nor_flash::NorFlash;
pub struct OtaUpdater<F: NorFlash> {
flash: F,
update_partition: u32,
}
impl<F: NorFlash> OtaUpdater<F> {
pub async fn check_for_update(&mut self, socket: &mut TcpSocket<'_>) -> Result<bool, Error> {
// Connect to update server
socket.connect(UPDATE_SERVER).await?;
// Check version
let current_version = self.get_current_version();
socket.write_all(b"VERSION ").await?;
socket.write_all(current_version.as_bytes()).await?;
let mut response = [0u8; 8];
socket.read_exact(&mut response).await?;
Ok(&response == b"OUTDATED")
}
pub async fn download_and_flash(&mut self, socket: &mut TcpSocket<'_>) -> Result<(), Error> {
// Request new firmware
socket.write_all(b"DOWNLOAD").await?;
// Read size
let mut size_buf = [0u8; 4];
socket.read_exact(&mut size_buf).await?;
let size = u32::from_le_bytes(size_buf);
// Flash in chunks
let mut offset = self.update_partition;
let mut buffer = [0u8; 4096];
let mut remaining = size as usize;
while remaining > 0 {
let chunk_size = remaining.min(buffer.len());
socket.read_exact(&mut buffer[..chunk_size]).await?;
// Erase and write
self.flash.erase(offset, offset + chunk_size as u32)?;
self.flash.write(offset, &buffer[..chunk_size])?;
offset += chunk_size as u32;
remaining -= chunk_size;
}
// Mark update ready
self.set_update_pending(true);
Ok(())
}
pub fn apply_update(&mut self) {
// Copy from update partition to active partition
// Reset to boot new firmware
cortex_m::peripheral::SCB::sys_reset();
}
}
}
45.5.26. Sensor Fusion
Combine multiple sensors for better predictions.
#![allow(unused)]
fn main() {
pub struct SensorFusion {
imu: Imu,
magnetometer: Mag,
kalman_filter: KalmanFilter,
}
impl SensorFusion {
pub fn update(&mut self) -> Orientation {
// Read raw sensors
let accel = self.imu.read_accel();
let gyro = self.imu.read_gyro();
let mag = self.magnetometer.read();
// Kalman filter prediction
self.kalman_filter.predict(gyro);
// Kalman filter update with measurements
self.kalman_filter.update_accel(accel);
self.kalman_filter.update_mag(mag);
// Get fused orientation
self.kalman_filter.get_orientation()
}
}
pub struct KalmanFilter {
state: [f32; 4], // Quaternion
covariance: [[f32; 4]; 4],
process_noise: f32,
measurement_noise: f32,
}
impl KalmanFilter {
pub fn predict(&mut self, gyro: Vector3) {
// Update state based on gyroscope
let dt = 0.01; // 100 Hz
let omega = Quaternion::from_gyro(gyro, dt);
// q_new = q * omega
self.state = quaternion_multiply(self.state, omega);
// Update covariance
// P = P + Q
for i in 0..4 {
self.covariance[i][i] += self.process_noise;
}
}
pub fn update_accel(&mut self, accel: Vector3) {
// Compute expected gravity in body frame
let expected = rotate_vector(self.state, GRAVITY);
// Innovation
let innovation = vector_subtract(accel, expected);
// Kalman gain and state update
// ... (full implementation omitted)
}
}
}
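The quaternion_multiply used in predict() is the Hamilton product. A sketch with the [w, x, y, z] layout matching the state: [f32; 4] field above:

```rust
/// Hamilton product of two quaternions stored as [w, x, y, z].
/// Composes rotations: applying `b` then `a` is `a * b`.
pub fn quaternion_multiply(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let [aw, ax, ay, az] = a;
    let [bw, bx, by, bz] = b;
    [
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ]
}
```

Note the product is not commutative (i * j = k but j * i = -k), which is exactly why the predict step multiplies in a fixed order; in practice you also renormalize the state periodically to counter floating-point drift.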
45.5.27. Production Deployment Checklist
Hardware Requirements
- Flash: Minimum 512KB for model + firmware
- RAM: Minimum 64KB for inference
- Clock: 80 MHz+ for real-time inference
- ADC: 12-bit minimum for sensor quality
Software Requirements
- Watchdog: Prevent hangs
- Error handling: Graceful degradation
- Logging: Debug via RTT/UART
- OTA: Remote updates
Testing
- Unit tests: Core algorithms
- Hardware-in-loop: Real sensors
- Power profiling: Battery life
- Stress testing: Edge cases
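The "Unit tests: Core algorithms" item works precisely because DSP and inference code like 45.5.11's filter is plain no_std-compatible Rust that also compiles on the host. A sketch of a host-side property test (the filter is redefined here so the example is self-contained):

```rust
/// Same exponential low-pass filter as in 45.5.11.
pub struct LowPassFilter {
    alpha: f32,
    last: f32,
}

impl LowPassFilter {
    pub fn new(alpha: f32) -> Self {
        LowPassFilter { alpha, last: 0.0 }
    }

    pub fn update(&mut self, input: f32) -> f32 {
        self.last += self.alpha * (input - self.last);
        self.last
    }
}

/// Host-testable property: feeding a constant (step) input
/// must converge the output to the step value.
pub fn converges_to_step(step: f32, alpha: f32, iterations: usize) -> f32 {
    let mut lpf = LowPassFilter::new(alpha);
    let mut y = 0.0;
    for _ in 0..iterations {
        y = lpf.update(step);
    }
    y
}
```

Running these as ordinary `cargo test` on your workstation catches algorithm bugs in seconds; only the thin hardware-access layer then needs hardware-in-the-loop time.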
45.5.28. Final Architecture: Complete Edge ML System
┌─────────────────────────────────────────────────────────────────────┐
│ Edge ML System Architecture │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Sensors │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │Camera │ │IMU │ │Mic │ │Temp │ │
│ │(DCMI) │ │(I2C) │ │(I2S) │ │(ADC) │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │ │
│ ┌────▼────────────▼────────────▼────────────▼────────────────────┐│
│ │ DMA Engine ││
│ │ (Zero-copy transfer from peripherals to RAM) ││
│ └─────────────────────────────┬───────────────────────────────────┘│
│ │ │
│ ┌─────────────────────────────▼───────────────────────────────────┐│
│ │ Preprocessing ││
│ │ • Normalization • FFT • Resize • Quantization ││
│ └─────────────────────────────┬───────────────────────────────────┘│
│ │ │
│ ┌─────────────────────────────▼───────────────────────────────────┐│
│ │ ML Inference ││
│ │ • tract-core • TensorFlow Lite Micro • NPU delegation ││
│ └─────────────────────────────┬───────────────────────────────────┘│
│ │ │
│ ┌─────────────┬───────────────┴───────────────┬───────────────────┐│
│ │ Local │ Alert │ Cloud ││
│ │ Display │ GPIO/Buzzer │ (WiFi/LoRa) ││
│ └─────────────┴───────────────────────────────┴───────────────────┘│
│ │
└─────────────────────────────────────────────────────────────────────┘
Edge ML enables AI everywhere:
- Medical devices monitoring patients
- Industrial sensors predicting failures
- Smart home devices understanding context
- Wearables tracking health
- Agricultural systems optimizing crops
All running on $5 chips with Rust’s safety guarantees.
[End of Section 45.5]