45.5. Edge & Embedded ML: Rust on Bare Metal
Warning
The Constraint: You have 320KB of RAM. You have no OS (no Linux, no Windows). You have no
malloc. Python cannot run here. C++ compiles, but gives you no memory safety. Rust is the only mainstream high-level language that can target no_std.
45.5.1. Understanding no_std
In normal Rust (std), you have:
- Vec<T> (heap)
- std::fs (filesystem)
- std::thread (threads)
In Embedded Rust (core), you have:
- slice (stack arrays)
- iter (iterators)
- Result / Option (error handling)
You lose convenience, but you gain Determinism. You know exactly how many bytes your program uses.
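What remains is enough for ML, which ultimately reduces to arithmetic over fixed-size buffers. A minimal sketch (names and layer sizes are illustrative): a dot product and a tiny dense layer written against only core items, so the same code compiles unchanged under no_std.

```rust
/// Dot product over slices: only `core` items (slices, iterators, f32).
/// No heap, no OS, no allocation.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

/// A dense layer as a plain function over stack arrays.
/// Every byte it uses is visible in the signature.
pub fn dense_4_to_2(
    input: &[f32; 4],
    weights: &[[f32; 4]; 2],
    bias: &[f32; 2],
) -> [f32; 2] {
    let mut out = [0.0f32; 2];
    for (o, (row, b)) in out.iter_mut().zip(weights.iter().zip(bias.iter())) {
        *o = dot(input, row) + b;
    }
    out
}
```

Because the arrays are sized in the type, the stack cost of an inference call is known before the program ever runs.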
The Cargo.toml
[package]
name = "embedded-ml"
version = "0.1.0"
edition = "2021"
[dependencies]
cortex-m = "0.7"
cortex-m-rt = "0.7" # Runtime (Reset handler)
embedded-hal = "0.2"
panic-halt = "0.2" # Halt on panic (no stack trace printing)
microflow = "0.1" # Hypothetical TinyML inference crate
45.5.2. Your First Embedded Program (ESP32-C3)
The ESP32-C3 is a RISC-V microcontroller. Cost: $2.
#![no_std]
#![no_main]
use esp32c3_hal::{
clock::ClockControl,
gpio::IO,
peripherals::Peripherals,
prelude::*,
timer::TimerGroup,
};
use panic_halt as _;
#[entry]
fn main() -> ! {
let peripherals = Peripherals::take();
let system = peripherals.SYSTEM.split();
let clocks = ClockControl::boot_defaults(system.clock_control).freeze();
let io = IO::new(peripherals.GPIO, peripherals.IO_MUX);
let mut led = io.pins.gpio2.into_push_pull_output();
// The Inference Loop
loop {
led.toggle().unwrap();
// ML Inference would go here
// run_model();
// Busy wait (bad power efficiency)
for _ in 0..100_000 {}
}
}
45.5.3. Managing Memory: The Allocator Question
ML models need weights. Weights need memory.
If you don’t have an OS malloc, where does a Vec<f32> go?
Option 1: Static Allocation (Safest)
Everything is a static buffer: [f32; 1000].
- Pros: Impossible to run OOM at runtime. Linker fails if RAM is insufficient.
- Cons: Inflexible.
#![allow(unused)]
fn main() {
static WEIGHTS: [f32; 1000] = [0.0; 1000]; // sized at link time, lives in Flash/.bss
fn inference(input: &[f32]) {
    // Zero-allocation inference: output lives on the stack
    let mut output = [0.0f32; 10];
    // ...
}
}
Option 2: Embedded Allocator (Flexible)
We can bring in a small embedded allocator (for example the embedded-alloc crate, or a hand-rolled “Bump Allocator”) to enable Vec support.
#![allow(unused)]
fn main() {
use embedded_alloc::Heap;
#[global_allocator]
static HEAP: Heap = Heap::empty();
fn init_heap() {
use core::mem::MaybeUninit;
const HEAP_SIZE: usize = 32 * 1024; // 32KB
static mut HEAP_MEM: [MaybeUninit<u8>; HEAP_SIZE] = [MaybeUninit::uninit(); HEAP_SIZE];
unsafe { HEAP.init(core::ptr::addr_of_mut!(HEAP_MEM) as usize, HEAP_SIZE) }
}
}
Now you can use extern crate alloc; and Vec<f32>!
Just be careful: allocation can now fail at runtime. On a 32KB heap, fragmentation and unbounded Vec growth turn into out-of-memory panics in the field.
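To demystify what such an allocator does, here is the core bump-allocation logic modeled as a plain struct over a fixed-size arena. This is illustrative only: a real #[global_allocator] additionally implements the GlobalAlloc trait and needs interior mutability plus a critical section for interrupt safety.

```rust
/// Bump allocator over a fixed arena: hands out aligned blocks,
/// never frees individually, resets everything at once.
pub struct Bump {
    arena_len: usize,
    next: usize, // offset of the first free byte
}

impl Bump {
    pub const fn new(arena_len: usize) -> Self {
        Bump { arena_len, next: 0 }
    }

    /// Returns the offset of a `size`-byte block aligned to `align`
    /// (a power of two), or None if the arena is exhausted.
    pub fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        let start = (self.next + align - 1) & !(align - 1); // round up to alignment
        let end = start.checked_add(size)?;
        if end > self.arena_len {
            return None; // OOM: the caller must handle it
        }
        self.next = end;
        Some(start)
    }

    /// Free everything in O(1): the only "dealloc" a bump allocator has.
    pub fn reset(&mut self) {
        self.next = 0;
    }
}
```

The appeal on a microcontroller is determinism: alloc is a handful of instructions, and the failure mode is an explicit None rather than a heap walk of unbounded latency.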
45.5.4. TinyML: tflite-micro vs Rust
Google’s TensorFlow Lite for Microcontrollers is written in C++. It requires defining a “Tensor Arena” (a big byte array).
Rust Approach (tract or microflow):
Because buffer and tensor sizes are compile-time constants, Rust can verify at compile time whether your model fits in RAM.
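One way to get that guarantee is a const assertion that fails the build if the static budget is exceeded. A sketch (the layer sizes are illustrative, and the RAM figure is the ESP32-C3's; adjust for your chip):

```rust
// Static memory budget, evaluated entirely at compile time.
// Illustrative model: one 128x64 f32 layer plus double-buffered activations.
const WEIGHTS_BYTES: usize = 128 * 64 * core::mem::size_of::<f32>();
const ACTIVATIONS_BYTES: usize = 2 * 128 * core::mem::size_of::<f32>();
const MODEL_RAM: usize = WEIGHTS_BYTES + ACTIVATIONS_BYTES;
const RAM_BUDGET: usize = 320 * 1024; // ESP32-C3: 320KB SRAM

// If the model outgrows the budget, this is a *compile* error,
// not a runtime OOM in the field.
const _: () = assert!(MODEL_RAM <= RAM_BUDGET, "model does not fit in RAM");
```

The same trick extends to Flash: sum your include_bytes! weight blobs into a const and assert against the part's Flash size.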
Example: Audio Keyword Spotting (Rust)
#![allow(unused)]
fn main() {
// 1. ADC (Microphone) Interrupt
#[interrupt]
fn ADC0() {
let sample = adc.read();
RING_BUFFER.push(sample);
}
// 2. FFT Feature Extraction (no_std)
use microfft::real::rfft_256;
fn extract_features() -> [f32; 128] {
let mut buffer = [0.0; 256];
// ... fill buffer from RING_BUFFER ...
let spectrum = rfft_256(&mut buffer);
// ... compute power ...
}
// 3. Inference
fn run_inference(features: &[f32; 128]) -> bool {
// Hardcoded weights (Flash Memory)
const W1: [[f32; 64]; 128] = include_weights!("layer1.bin");
// Matrix Mul logic (f32, no SIMD on Cortex-M0)
// ...
}
}
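The RING_BUFFER above is doing real work: the ADC interrupt produces samples while the main loop consumes them. A minimal fixed-capacity ring buffer might look like the sketch below (single-threaded for clarity; a buffer actually shared with an ISR needs atomics or a critical section, e.g. heapless::spsc::Queue):

```rust
/// Fixed-capacity ring buffer: the ISR pushes, the main loop drains.
pub struct RingBuffer<const N: usize> {
    buf: [i16; N],
    head: usize, // next write index
    len: usize,  // number of valid samples
}

impl<const N: usize> RingBuffer<N> {
    pub const fn new() -> Self {
        RingBuffer { buf: [0; N], head: 0, len: 0 }
    }

    /// Push a sample, overwriting the oldest one when full.
    /// Dropping stale audio beats blocking inside an ISR.
    pub fn push(&mut self, sample: i16) {
        self.buf[self.head] = sample;
        self.head = (self.head + 1) % N;
        if self.len < N {
            self.len += 1;
        }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Pop the oldest sample, if any.
    pub fn pop(&mut self) -> Option<i16> {
        if self.len == 0 {
            return None;
        }
        let tail = (self.head + N - self.len) % N;
        self.len -= 1;
        Some(self.buf[tail])
    }
}
```

With N a const generic, the buffer is a plain stack (or static) array: no allocator, and the capacity is part of the type.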
45.5.5. Peripherals: Interacting with Sensors
ML input comes from sensors.
Rust’s embedded-hal traits provide a universal API.
Whether you are on STM32, ESP32, or nRF52, the code looks the same.
#![allow(unused)]
fn main() {
use embedded_hal::blocking::i2c::WriteRead;
const IMU_ADDR: u8 = 0x68;
fn read_accelerometer<I2C>(i2c: &mut I2C) -> [i16; 3]
where I2C: WriteRead {
let mut buffer = [0u8; 6];
// Write 0x3B (ACCEL_XOUT_H register), Read 6 bytes
i2c.write_read(IMU_ADDR, &[0x3B], &mut buffer).unwrap();
let x = i16::from_be_bytes([buffer[0], buffer[1]]);
let y = i16::from_be_bytes([buffer[2], buffer[3]]);
let z = i16::from_be_bytes([buffer[4], buffer[5]]);
[x, y, z]
}
}
45.5.6. Deployment: probe-rs
In C++, you use OpenOCD and GDB. It’s complex.
In Rust, cargo flash just works.
# Flash the code to the plugged-in chip
cargo flash --chip esp32c3 --release
Monitor Logs (RTT):
C++ printf requires configuring UART.
Rust defmt (Deferred Formatting) sends compressed logs over the debug probe. It is microscopically cheap (microseconds).
#![allow(unused)]
fn main() {
use defmt::info;
info!("Inference took {} ms", latency);
}
45.5.7. Battery Life Optimization
Rust’s ownership model helps power consumption too.
If you own the Peripheral, you know nobody else is using it. You can safely power it down.
#![allow(unused)]
fn main() {
{
let i2c = peripherals.I2C0.into_active();
let data = read_sensor(&i2c);
} // i2c goes out of scope -> Drop impl powers down the peripheral automatically.
}
This pattern gives you zero-cost power management: the scope is the power policy.
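A host-runnable sketch of the pattern: a wrapper type whose Drop powers the peripheral down. An atomic flag stands in for the real clock-gating register write, and all names here are illustrative.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Stand-in for the peripheral's power-enable register bit.
/// On target this would be a write to the RCC / clock-gating register.
static I2C_POWERED: AtomicBool = AtomicBool::new(false);

/// Owning this value == the peripheral is powered.
pub struct ActiveI2c;

impl ActiveI2c {
    pub fn power_up() -> Self {
        I2C_POWERED.store(true, Ordering::SeqCst);
        ActiveI2c
    }

    pub fn read_sensor(&self) -> i16 {
        42 // placeholder reading
    }
}

impl Drop for ActiveI2c {
    fn drop(&mut self) {
        // Runs automatically at end of scope: power the bus down.
        I2C_POWERED.store(false, Ordering::SeqCst);
    }
}

pub fn is_powered() -> bool {
    I2C_POWERED.load(Ordering::SeqCst)
}
```

Because ownership is exclusive, the compiler proves no one can touch the bus after it is powered down; the Drop call itself compiles to the same store instruction you would have written by hand.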
45.5.8. Case Study: Smart Agriculture Node
Goal: Detect pests using microphone audio. Device: nRF52840 (Bluetooth + Cortex M4). Power Budget: 1 year on Coin Cell.
Architecture:
- Sleep: CPU OFF.
- Wake on Sound: Low-power comparator triggers interrupt.
- Record: DMA transfers audio to RAM (CPU sleeping).
- Infer: Rust microfft + Tiny Neural Net (CPU 100%).
- Alert: If pest detected, wake up Bluetooth Radio and send packet.
- Sleep.
Why Rust? Memory safety ensures the complex state machine (Sleep -> Wake -> DMA -> BLE) never enters an undefined state. In C, race conditions in Interrupt Handlers are notoriously common.
45.5.9. The “Safe” Embedded Pattern: heapless
Allocating memory (Heap) on a device with 16KB RAM is risky (Fragmentation).
The heapless crate provides standard collections that live on the Stack.
#![allow(unused)]
fn main() {
use heapless::{Vec, String, FnvIndexMap};
fn safe_buffers() {
// A vector with max capacity 32.
// Allocated as a fixed-size array [T; 32] on stack.
let mut buffer: Vec<f32, 32> = Vec::new();
// Pushing beyond capacity 32 returns Err instead of crashing:
// let _ = buffer.push(1.0);
// A string of max 64 chars
let mut log_line: String<64> = String::new();
}
}
This guarantees worst-case memory usage at compile time.
45.5.10. Async Embedded: The embassy Revolution
Traditionally, you use an RTOS like FreeRTOS to handle tasks.
In Rust, async/await is a compile-time state machine transformation.
This means you can have multitasking without an OS kernel.
Embassy is the standard framework for this.
use embassy_executor::Spawner;
use embassy_time::{Duration, Timer};
#[embassy_executor::task]
async fn blink_task(mut pin: AnyPin) {
loop {
pin.toggle();
Timer::after(Duration::from_millis(500)).await;
// The CPU sleeps here!
}
}
#[embassy_executor::task]
async fn infer_task() {
loop {
let input = wait_for_sensor().await;
let output = model.predict(input);
send_over_lora(output).await;
}
}
#[embassy_executor::main]
async fn main(spawner: Spawner) {
// Spawn two concurrent tasks onto the same single core.
// The compiler generates the interleaving state machine.
spawner.spawn(blink_task(led)).unwrap();
spawner.spawn(infer_task()).unwrap();
}
Advantage over FreeRTOS:
- Memory: Each FreeRTOS task needs its own pre-allocated stack. Embassy tasks are compiler-generated state machines that share one stack.
- Safety: Data races between tasks are caught at compile time.
45.5.11. Digital Signal Processing (DSP)
Before ML, you need DSP. Rust has excellent iterator optimizations for this.
#![allow(unused)]
fn main() {
struct LowPassFilter {
alpha: f32,
last: f32,
}
impl LowPassFilter {
fn update(&mut self, input: f32) -> f32 {
self.last = self.last + self.alpha * (input - self.last);
self.last
}
}
// Zero-Cost Abstraction:
// this iterator chain compiles down to a single tight loop.
fn filter_buffer(input: &[f32], output: &mut [f32]) {
let mut lpf = LowPassFilter { alpha: 0.1, last: 0.0 };
input.iter()
.zip(output.iter_mut())
.for_each(|(in_val, out_val)| {
*out_val = lpf.update(*in_val);
});
}
}
45.5.12. OTA Updates: embassy-boot
Deploying 1000 IoT sensors is easy. Updating them is hard. Rust prevents “Bricking” the device. We use A/B partitioning.
- Bootloader: Checks firmware CRC.
- Partition A: Active App.
- Partition B: Incoming App.
#![allow(unused)]
fn main() {
// Updating Logic
async fn update_firmware(uart: &mut Uart) {
let mut writer = PartitionB::writer();
while let Some(chunk) = uart.read_chunk().await {
writer.write(chunk).await;
}
// Verify Signature (Ed25519)
if verify_signature(writer.digest()) {
embassy_boot::set_boot_partition(PartitionB);
cortex_m::peripheral::SCB::sys_reset();
}
}
}
If signature fails, the device reboots into Partition A. Safe.
45.5.13. Hardware-in-the-Loop (HIL) Testing with QEMU
You don’t need the physical board to test code.
qemu-system-arm supports popular boards (micro:bit, STM32).
Cargo Config:
[target.thumbv7em-none-eabihf]
runner = "qemu-system-arm -cpu cortex-m4 -machine lm3s6965evb -nographic -semihosting -kernel"
Now, cargo run launches QEMU.
You can mock sensors by writing to specific memory addresses that QEMU intercepts.
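Mocking at the memory level works because MMIO is nothing more than volatile loads and stores at fixed addresses. A host-runnable sketch of the access pattern, with a local word standing in for the hardware register address (on target, and under QEMU, the pointer would be the device's documented address instead):

```rust
use core::ptr;

/// Volatile read-modify-write: the primitive underneath every PAC call,
/// and the operation QEMU intercepts when the address maps to a device.
///
/// Safety: `reg` must point to a valid, writable u32.
pub unsafe fn set_bit(reg: *mut u32, bit: u32) {
    let v = ptr::read_volatile(reg);
    ptr::write_volatile(reg, v | (1 << bit));
}

pub fn demo() -> u32 {
    // On hardware this would be e.g. 0x5000_0504; here a stack word
    // stands in so the sketch runs on the host.
    let mut fake_gpio_odr: u32 = 0;
    unsafe {
        set_bit(&mut fake_gpio_odr, 5);
        set_bit(&mut fake_gpio_odr, 0);
    }
    fake_gpio_odr
}
```

The volatile calls are what stop the compiler from optimizing "dead" stores away, which is exactly the behavior hardware registers (and QEMU's device models) rely on.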
45.5.14. Final Checklist for Edge AI
- Model Size: Does it fit in Flash? (Use cargo size -- -A.)
- RAM: Does inference fit in Stack/Heap? (Use heapless to be sure.)
- Power: Are you sleeping when idle? (Use embassy.)
- Updates: Can you recover from a bad update? (Use A/B partitions.)
- Monitoring: Use defmt for efficient logging.
45.5.15. Deep Dive: Memory-Mapped I/O and PACs
How does led.toggle() actually work?
In C, you do *(volatile uint32_t*)(0x50000000) |= (1 << 5). This is unsafe.
In Rust, we use PACs (Peripheral Access Crates) generated from SVD files via svd2rust.
The Magic of svd2rust
The vendor (ST, Espressif) provides an XML file (SVD) describing every register address.
svd2rust converts this into safe Rust code.
#![allow(unused)]
fn main() {
// C-style (unsafe)
unsafe {
let gpio_out = 0x5000_0504 as *mut u32;
*gpio_out |= 1 << 5;
}
// Rust PAC (Safe)
let dp = pac::Peripherals::take().unwrap();
let gpioa = dp.GPIOA;
// modify() performs a safe read-modify-write of the whole register
gpioa.odr.modify(|_, w| w.odr5().set_bit());
}
The Rust compiler collapses all this “abstraction” into the exact same single assembly instruction (LDR, ORR, STR) as the C code. Zero Overhead.
45.5.16. Direct Memory Access (DMA): The MLOps Accelerator
In MLOps, we move heavy tensors. Copying 1MB of audio data byte-by-byte using the CPU is slow. DMA allows the hardware to copy memory while the CPU sleeps (or runs inference).
DMA with embedded-dma
#![allow(unused)]
fn main() {
use embedded_dma::{ReadBuffer, WriteBuffer};
// 1. Setup Buffers
static mut RX_BUF: [u8; 1024] = [0; 1024];
fn record_audio_dma(adc: &ADC, dma: &mut DMA) {
// 2. Configure Transfer
// Source: ADC Data Register
// Dest: RX_BUF in RAM
let transfer = dma.transfer(
adc.data_register(),
unsafe { &mut RX_BUF },
);
// 3. Start (Non-blocking)
let transfer_handle = transfer.start();
// 4. Do other work (e.g. Inference on previous buffer)
run_inference();
// 5. Wait for finish
transfer_handle.wait();
}
}
45.5.17. Custom Panic Handlers: The “Blue Screen” of LEDs
When unwrap() fails in no_std, where does the error go?
There is no console.
We write a handler that blinks the error code in Morse Code on the Status LED.
#![allow(unused)]
fn main() {
#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
// 1. Disable Interrupts (Critical Section)
cortex_m::interrupt::disable();
// 2. Get LED hardware
// Note: We must use 'steal()' because Peripherals might be already taken
let p = unsafe { pac::Peripherals::steal() };
let mut led = p.GPIOC.odr;
// 3. Blink "SOS" (... --- ...)
loop {
blink_dot(&mut led);
blink_dot(&mut led);
blink_dot(&mut led);
blink_dash(&mut led);
// ...
}
}
}
This is crucial for debugging field devices where you don’t have a UART cable attached.
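The blink pattern itself is pure logic you can unit-test on the host before it ever gates an LED. A sketch that expands SOS into (on, duration) pairs; the 1-unit/3-unit timings follow Morse convention, and the function names are illustrative:

```rust
/// Morse timing: a dot is 1 unit on, a dash 3 units on,
/// with 1 unit off between elements.
/// SOS = dot dot dot, dash dash dash, dot dot dot.
pub fn sos_pattern() -> Vec<(bool, u32)> {
    let elements = [false, false, false, true, true, true, false, false, false];
    let mut out = Vec::new();
    for (i, &dash) in elements.iter().enumerate() {
        if i > 0 {
            out.push((false, 1)); // gap between elements
        }
        out.push((true, if dash { 3 } else { 1 }));
    }
    out
}
```

On target you would iterate this table, driving the LED pin and a busy-wait delay per unit, and you would collect into a heapless::Vec (or a const table) rather than an allocating Vec.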
45.5.18. Writing a Bootloader in Rust
If you want OTA, you need a custom Bootloader.
It resides at address 0x0800_0000 (on STM32).
It decides whether to jump to 0x0801_0000 (App A) or 0x0802_0000 (App B).
#[entry]
fn main() -> ! {
let p = pac::Peripherals::take().unwrap();
// 1. Check Button State
if p.GPIOC.idr.read().idr13().is_low() {
// Recovery Mode
flash_led();
loop {}
}
// 2. Validate App Checksum
let app_ptr = 0x0801_0000 as *const u32;
if verify_checksum(app_ptr) {
// 3. Jump to Application
unsafe {
let stack_ptr = *app_ptr;
let reset_vector = *(app_ptr.offset(1));
// Set Main Stack Pointer
cortex_m::register::msp::write(stack_ptr);
// Re-interpret the address as a function and call it
let app_entry: extern "C" fn() -> ! = core::mem::transmute(reset_vector);
app_entry();
}
}
// Fallback
loop {}
}
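The verify_checksum step can be as simple as a Fletcher-style sum over the application image, with the expected value written into the image's last word by the build script. A sketch of that convention (real deployments prefer a proper CRC32 or, as in 45.5.12, a cryptographic signature):

```rust
/// Fletcher-style checksum over the application's words.
/// Two running sums catch both corrupted and reordered words.
pub fn checksum(words: &[u32]) -> u32 {
    let (mut a, mut b) = (0u32, 0u32);
    for &w in words {
        a = a.wrapping_add(w);
        b = b.wrapping_add(a);
    }
    a ^ b
}

/// Image layout convention (illustrative): payload words followed by
/// one trailing checksum word appended at build time.
pub fn verify_image(image: &[u32]) -> bool {
    match image.split_last() {
        Some((&stored, payload)) => checksum(payload) == stored,
        None => false,
    }
}
```

The bootloader would call this over a slice reconstructed from the app partition's base address and length before deciding whether the jump is safe.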
45.5.19. Benchmarking: Counting Cycles
std::time::Instant doesn’t exist.
On ARM Cortex-M, we use the DWT (Data Watchpoint and Trace) Cycle Counter (CYCCNT).
#![allow(unused)]
fn main() {
use cortex_m::peripheral::DWT;
fn measure_inference() {
let mut dwt = unsafe { pac::CorePeripherals::steal().DWT };
// Enable Cycle Counter
dwt.enable_cycle_counter();
let start = DWT::cycle_count();
// Run Model
let _ = model.predict(&input);
let end = DWT::cycle_count();
let cycles = end.wrapping_sub(start); // tolerate counter wrap-around
let time_ms = cycles as f32 / (CLOCK_HZ as f32 / 1000.0);
defmt::info!("Inference Cycles: {}, Time: {} ms", cycles, time_ms);
}
}
This gives you cycle-accurate profiling. You can count exactly how many cycles a Matrix Multiplication takes.
45.5.20. Cargo Embed & Defmt
The tooling experience is superior to C.
cargo-embed (by Ferrous Systems) is an all-in-one tool.
Embed.toml:
[default.probe]
protocol = "Swd"
[default.rtt]
enabled = true
[default.gdb]
enabled = false
Usage: cargo embed --release.
- Compiles.
- Flashes.
- Resets chip.
- Opens RTT console to show defmt logs.
All in 2 seconds.
45.5.21. Final Exam: The Spec Sheet
Scenario: You are building a “Smart Doorbell” with Face Recognition.
- MCU: STM32H7 (480MHz, 1MB RAM).
- Camera: OV2640 (DCMI interface).
- Model: MobileNetV2-SSD (Quantized int8).
Stack:
- Driver: stm32h7xx-hal (DCMI for Camera).
- DMA: Transfer Image -> RAM (Double buffering).
- Preprocessing: image-proc (Resize 320x240 -> 96x96).
- Inference: tract-core (Pulse backend).
- Output: embedded-graphics (Draw Box on LCD).
In C++, integrating these 5 components (Vendor HAL + OpenCV port + TFLite + GUI) would take months.
In Rust, cargo add and trait compatibility make it a 2-week job.
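The "Quantized int8" part of that spec is worth making concrete. Affine quantization maps f32 values to i8 with a scale and zero-point; a sketch of the scheme (the calibration range is illustrative):

```rust
/// Affine quantization: real_value = scale * (q - zero_point).
pub fn quantize(x: f32, scale: f32, zero_point: i32) -> i8 {
    let q = (x / scale).round() as i32 + zero_point;
    q.clamp(i8::MIN as i32, i8::MAX as i32) as i8 // saturate, don't wrap
}

pub fn dequantize(q: i8, scale: f32, zero_point: i32) -> f32 {
    scale * (q as i32 - zero_point) as f32
}

/// Pick scale/zero-point so the observed [min, max] range
/// spans the full 256-value i8 range.
pub fn calibrate(min: f32, max: f32) -> (f32, i32) {
    let scale = (max - min) / 255.0;
    let zero_point = (i8::MIN as f32 - min / scale).round() as i32;
    (scale, zero_point)
}
```

This is why int8 models take a quarter of the Flash of f32 ones, and why Cortex-M cores with SMLAD-style instructions run them faster too: the matmul inner loop becomes integer multiply-accumulates.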
45.5.22. Real-Time Operating Systems (RTOS) Integration
For hard real-time requirements, integrate with RTOS.
Embassy: Async on Bare Metal
#![no_std]
#![no_main]
use embassy_executor::Spawner;
use embassy_time::{Duration, Timer, Instant};
use embassy_sync::channel::Channel;
use embassy_sync::blocking_mutex::raw::ThreadModeRawMutex;
// Channel for sensor data
static SENSOR_CHANNEL: Channel<ThreadModeRawMutex, SensorData, 10> = Channel::new();
#[embassy_executor::task]
async fn sensor_task() {
let mut adc = Adc::new();
loop {
let reading = adc.read().await;
let data = SensorData {
timestamp: Instant::now(),
value: reading,
};
SENSOR_CHANNEL.send(data).await;
Timer::after(Duration::from_millis(10)).await; // 100 Hz sampling
}
}
#[embassy_executor::task]
async fn inference_task() {
let model = load_model();
let mut buffer = RingBuffer::new(100);
loop {
let data = SENSOR_CHANNEL.receive().await;
buffer.push(data);
if buffer.is_full() {
let features = extract_features(&buffer);
let prediction = model.predict(&features);
if prediction.anomaly_detected() {
trigger_alert().await;
}
buffer.clear();
}
}
}
#[embassy_executor::main]
async fn main(spawner: Spawner) {
spawner.spawn(sensor_task()).unwrap();
spawner.spawn(inference_task()).unwrap();
}
FreeRTOS Integration
use freertos_rust::*;
fn main() {
// Create tasks
Task::new()
.name("sensor")
.stack_size(2048)
.priority(TaskPriority(3))
.start(sensor_task)
.unwrap();
Task::new()
.name("inference")
.stack_size(4096) // ML needs more stack
.priority(TaskPriority(2))
.start(inference_task)
.unwrap();
// Start scheduler
FreeRtosUtils::start_scheduler();
}
fn inference_task(_: ()) {
let model = TinyModel::load();
let queue = Queue::<SensorData>::new(10).unwrap();
loop {
if let Ok(data) = queue.receive(Duration::ms(100)) {
let result = model.predict(&data.features);
// Process result...
}
}
}
45.5.23. Power Management
Battery life is critical for edge devices.
use embassy_stm32::low_power::{stop_with_rtc, Executor};
#[embassy_executor::main]
async fn main(spawner: Spawner) {
let p = embassy_stm32::init(Default::default());
// Configure RTC for wake-up
let rtc = Rtc::new(p.RTC, RtcClockSource::LSE);
loop {
// 1. Collect sensor data
let data = read_sensors().await;
// 2. Run inference
let result = model.predict(&data);
// 3. Transmit if interesting
if result.is_significant() {
radio.transmit(&result).await;
}
// 4. Enter low-power mode for 5 seconds
stop_with_rtc(&rtc, Duration::from_secs(5)).await;
// CPU wakes up here after 5 seconds
}
}
Power Profiles
#![allow(unused)]
fn main() {
#[derive(Clone, Copy)]
pub enum PowerMode {
Active, // Full speed, max power
LowPower, // Reduced clock, peripherals off
Sleep, // CPU halted, RAM retained
DeepSleep, // Only RTC running
}
pub fn set_power_mode(mode: PowerMode) {
match mode {
PowerMode::Active => {
// Max performance
rcc.set_sysclk(480_000_000); // 480 MHz
enable_all_peripherals();
}
PowerMode::LowPower => {
// Reduce clock, disable unused peripherals
rcc.set_sysclk(8_000_000); // 8 MHz
disable_unused_peripherals();
}
PowerMode::Sleep => {
cortex_m::asm::wfi(); // Wait for interrupt
}
PowerMode::DeepSleep => {
// Configure wake-up sources
pwr.enter_stop_mode();
}
}
}
}
45.5.24. ML Accelerator Integration
Many MCUs have built-in NPUs (Neural Processing Units).
STM32 with X-CUBE-AI
#![allow(unused)]
fn main() {
// Wrapper for ST's X-CUBE-AI generated code
extern "C" {
    fn ai_mnetwork_init() -> i32;
    fn ai_mnetwork_run(input: *const f32, output: *mut f32) -> i32;
    fn ai_mnetwork_get_input_size() -> u32;
    fn ai_mnetwork_get_output_size() -> u32;
}
pub struct StmAiNetwork;
impl StmAiNetwork {
pub fn new() -> Self {
unsafe {
// Initialize the network
ai_mnetwork_init();
}
Self
}
pub fn predict(&self, input: &[f32]) -> Vec<f32> {
let input_size = unsafe { ai_mnetwork_get_input_size() } as usize;
let output_size = unsafe { ai_mnetwork_get_output_size() } as usize;
assert_eq!(input.len(), input_size);
let mut output = vec![0.0f32; output_size];
unsafe {
ai_mnetwork_run(input.as_ptr(), output.as_mut_ptr());
}
output
}
}
}
Coral Edge TPU
#![allow(unused)]
fn main() {
use edgetpu::EdgeTpuContext;
pub struct CoralInference {
context: EdgeTpuContext,
model: Vec<u8>,
}
impl CoralInference {
pub fn new(model_path: &str) -> Result<Self, Error> {
let context = EdgeTpuContext::open_device()?;
let model = std::fs::read(model_path)?;
Ok(Self { context, model })
}
pub fn predict(&self, input: &[u8]) -> Vec<u8> {
// Delegate to Edge TPU
self.context.run_inference(&self.model, input)
}
}
}
45.5.25. OTA (Over-The-Air) Updates
Deploy model updates remotely.
#![allow(unused)]
fn main() {
use embassy_net::tcp::TcpSocket;
use embedded_storage::nor_flash::NorFlash;
pub struct OtaUpdater<F: NorFlash> {
flash: F,
update_partition: u32,
}
impl<F: NorFlash> OtaUpdater<F> {
pub async fn check_for_update(&mut self, socket: &mut TcpSocket<'_>) -> Result<bool, Error> {
// Connect to update server
socket.connect(UPDATE_SERVER).await?;
// Check version
let current_version = self.get_current_version();
socket.write_all(b"VERSION ").await?;
socket.write_all(current_version.as_bytes()).await?;
let mut response = [0u8; 8];
socket.read_exact(&mut response).await?;
Ok(&response == b"OUTDATED")
}
pub async fn download_and_flash(&mut self, socket: &mut TcpSocket<'_>) -> Result<(), Error> {
// Request new firmware
socket.write_all(b"DOWNLOAD").await?;
// Read size
let mut size_buf = [0u8; 4];
socket.read_exact(&mut size_buf).await?;
let size = u32::from_le_bytes(size_buf);
// Flash in chunks
let mut offset = self.update_partition;
let mut buffer = [0u8; 4096];
let mut remaining = size as usize;
while remaining > 0 {
let chunk_size = remaining.min(buffer.len());
socket.read_exact(&mut buffer[..chunk_size]).await?;
// Erase and write
self.flash.erase(offset, offset + chunk_size as u32)?;
self.flash.write(offset, &buffer[..chunk_size])?;
offset += chunk_size as u32;
remaining -= chunk_size;
}
// Mark update ready
self.set_update_pending(true);
Ok(())
}
pub fn apply_update(&mut self) {
// Copy from update partition to active partition
// Reset to boot new firmware
cortex_m::peripheral::SCB::sys_reset();
}
}
}
45.5.26. Sensor Fusion
Combine multiple sensors for better predictions.
#![allow(unused)]
fn main() {
pub struct SensorFusion {
imu: Imu,
magnetometer: Mag,
kalman_filter: KalmanFilter,
}
impl SensorFusion {
pub fn update(&mut self) -> Orientation {
// Read raw sensors
let accel = self.imu.read_accel();
let gyro = self.imu.read_gyro();
let mag = self.magnetometer.read();
// Kalman filter prediction
self.kalman_filter.predict(gyro);
// Kalman filter update with measurements
self.kalman_filter.update_accel(accel);
self.kalman_filter.update_mag(mag);
// Get fused orientation
self.kalman_filter.get_orientation()
}
}
pub struct KalmanFilter {
state: [f32; 4], // Quaternion
covariance: [[f32; 4]; 4],
process_noise: f32,
measurement_noise: f32,
}
impl KalmanFilter {
pub fn predict(&mut self, gyro: Vector3) {
// Update state based on gyroscope
let dt = 0.01; // 100 Hz
let omega = Quaternion::from_gyro(gyro, dt);
// q_new = q * omega
self.state = quaternion_multiply(self.state, omega);
// Update covariance
// P = P + Q
for i in 0..4 {
self.covariance[i][i] += self.process_noise;
}
}
pub fn update_accel(&mut self, accel: Vector3) {
// Compute expected gravity in body frame
let expected = rotate_vector(self.state, GRAVITY);
// Innovation
let innovation = vector_subtract(accel, expected);
// Kalman gain and state update
// ... (full implementation omitted)
}
}
}
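The quaternion_multiply used in predict() is the Hamilton product. A sketch with the [w, x, y, z] layout matching the state: [f32; 4] field above:

```rust
/// Hamilton product of two quaternions stored as [w, x, y, z].
/// Composes rotations: applying `b` then `a` is `a * b`.
pub fn quaternion_multiply(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let [aw, ax, ay, az] = a;
    let [bw, bx, by, bz] = b;
    [
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ]
}
```

Note the product is not commutative (i * j = k but j * i = -k), which is exactly why the predict step multiplies in a fixed order; in practice you also renormalize the state periodically to counter floating-point drift.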
45.5.27. Production Deployment Checklist
Hardware Requirements
- Flash: Minimum 512KB for model + firmware
- RAM: Minimum 64KB for inference
- Clock: 80 MHz+ for real-time inference
- ADC: 12-bit minimum for sensor quality
Software Requirements
- Watchdog: Prevent hangs
- Error handling: Graceful degradation
- Logging: Debug via RTT/UART
- OTA: Remote updates
Testing
- Unit tests: Core algorithms
- Hardware-in-loop: Real sensors
- Power profiling: Battery life
- Stress testing: Edge cases
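The "Unit tests: Core algorithms" item works precisely because DSP and inference code like 45.5.11's filter is plain no_std-compatible Rust that also compiles on the host. A sketch of a host-side property test (the filter is redefined here so the example is self-contained):

```rust
/// Same exponential low-pass filter as in 45.5.11.
pub struct LowPassFilter {
    alpha: f32,
    last: f32,
}

impl LowPassFilter {
    pub fn new(alpha: f32) -> Self {
        LowPassFilter { alpha, last: 0.0 }
    }

    pub fn update(&mut self, input: f32) -> f32 {
        self.last += self.alpha * (input - self.last);
        self.last
    }
}

/// Host-testable property: feeding a constant (step) input
/// must converge the output to the step value.
pub fn converges_to_step(step: f32, alpha: f32, iterations: usize) -> f32 {
    let mut lpf = LowPassFilter::new(alpha);
    let mut y = 0.0;
    for _ in 0..iterations {
        y = lpf.update(step);
    }
    y
}
```

Running these as ordinary `cargo test` on your workstation catches algorithm bugs in seconds; only the thin hardware-access layer then needs hardware-in-the-loop time.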
45.5.28. Final Architecture: Complete Edge ML System
┌─────────────────────────────────────────────────────────────────────┐
│ Edge ML System Architecture │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Sensors │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │Camera │ │IMU │ │Mic │ │Temp │ │
│ │(DCMI) │ │(I2C) │ │(I2S) │ │(ADC) │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │ │
│ ┌────▼────────────▼────────────▼────────────▼────────────────────┐│
│ │ DMA Engine ││
│ │ (Zero-copy transfer from peripherals to RAM) ││
│ └─────────────────────────────┬───────────────────────────────────┘│
│ │ │
│ ┌─────────────────────────────▼───────────────────────────────────┐│
│ │ Preprocessing ││
│ │ • Normalization • FFT • Resize • Quantization ││
│ └─────────────────────────────┬───────────────────────────────────┘│
│ │ │
│ ┌─────────────────────────────▼───────────────────────────────────┐│
│ │ ML Inference ││
│ │ • tract-core • TensorFlow Lite Micro • NPU delegation ││
│ └─────────────────────────────┬───────────────────────────────────┘│
│ │ │
│ ┌─────────────┬───────────────┴───────────────┬───────────────────┐│
│ │ Local │ Alert │ Cloud ││
│ │ Display │ GPIO/Buzzer │ (WiFi/LoRa) ││
│ └─────────────┴───────────────────────────────┴───────────────────┘│
│ │
└─────────────────────────────────────────────────────────────────────┘
Edge ML enables AI everywhere:
- Medical devices monitoring patients
- Industrial sensors predicting failures
- Smart home devices understanding context
- Wearables tracking health
- Agricultural systems optimizing crops
All running on $5 chips with Rust’s safety guarantees.
[End of Section 45.5]