Keyboard shortcuts

Press or to navigate between chapters

Press ? to show this help

Press Esc to hide this help

41.3. Hardware-in-the-Loop (HIL) Testing

Status: Draft Version: 1.0.0 Tags: #Sim2Real, #HIL, #Embedded, #Rust, #Robotics Author: MLOps Team


Table of Contents

  1. Beyond Simulation: The Need for HIL
  2. SIL vs HIL vs PIL
  3. The Interface: Mocking Reality
  4. Rust Implementation: Virtual CAN Bus
  5. Time Synchronization: PTP and Real-Time Linux
  6. Infrastructure: The HIL Micro-Farm
  7. Safety Protocols: Watchdogs and Kill Switches
  8. Troubleshooting: The “Ghost in the Machine”
  9. Future Trends: Cloud HIL
  10. MLOps Interview Questions
  11. Glossary
  12. Summary Checklist

Prerequisites

Before diving into this chapter, ensure you have the following installed:

  • Rust: 1.70+
  • Hardware: A Raspberry Pi or Jetson Nano (optional, but helpful).
  • Protocol: Basic understanding of CAN Bus.

Beyond Simulation: The Need for HIL

Simulation teaches the “Brain” (Policy) how to plan. But it doesn’t test the “Nerves” (Drivers) or “Muscles” (Actuators).

What Simulation Misses:

  1. Driver Latency: A USB camera driver taking 30ms to wake up.
  2. Bus Saturation: CAN Bus dropping packets at 90% load.
  3. Thermal Throttling: The Jetson GPU slowing down after 5 minutes.

HIL (Hardware-in-the-Loop): Connect the Real Embedded Computer (running the AI) to a Simulated World (providing Sensor Data).


SIL vs HIL vs PIL

AcronymNameWhat Runs Where?Goal
SILSoftware-in-the-LoopAgent and Env on same PC.Train Logic. Fast.
HILHardware-in-the-LoopAgent on Embedded HW. Env on PC.Validate Latency/Drivers.
PILProcessor-in-the-LoopAgent on FPGA/MCU. Env on PC.Validate Timing/FPGA Logic.

The Interface: Mocking Reality

In HIL, the Embedded Computer “thinks” it is talking to motors and cameras. Actually, it is talking to a Simulator via a Bridge.

The Bridge:

  • Real Robot: Camera -> CSI -> /dev/video0.
  • HIL Robot: Simulator -> Ethernet -> HIL Bridge -> /dev/video0 (v4l2loopback).

The AI code should not change. It reads /dev/video0 in both cases.


Rust Implementation: Virtual CAN Bus

Robots use CAN (Controller Area Network) to talk to motors. In HIL, we must fake the Motor Controllers.

Scenario:

  1. Agent sends SetTorque(10NM) to CAN ID 0x100.
  2. Simulator receives this, applies torque to virtual physics model.
  3. Simulator calculates new Velocity.
  4. Bridge sends Status(Vel=5m/s) from CAN ID 0x101.

Project Structure

hil-bridge/
├── Cargo.toml
└── src/
    └── main.rs

Cargo.toml:

[package]
name = "hil-bridge"
version = "0.1.0"
edition = "2021"

[dependencies]
socketcan = "1.7" # Linux SocketCAN
tokio = { version = "1", features = ["full"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
zmq = "0.9" # To talk to Unity

src/main.rs:

//! HIL Bridge: Unity <-> ZeroMQ <-> SocketCAN <-> Robot Brain
//! This process runs on the Simulator PC (Linux).
//! It emulates the CAN bus traffic of real motors.

use socketcan::{CanFrame, CanSocket, EmbeddedFrame, StandardId};
use std::sync::{Arc, Mutex};
use tokio::time::{interval, Duration};

// CAN IDs for a standard Motor Controller
const MOTOR_CMD_ID: u16 = 0x100;
const MOTOR_STATUS_ID: u16 = 0x101;

/// Shared state between the Receiving Task (ZMQ) and Sending Task (CAN)
struct SimState {
    velocity: f32, // From Unity Physics
    torque: f32,   // To Unity Physics
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Setup Virtual CAN Interface (vcan0)
    // Run: sudo modprobe vcan; sudo ip link add dev vcan0 type vcan; sudo ip link set up vcan0
    let socket = CanSocket::open("vcan0")?;
    
    // 2. Setup ZeroMQ (Mock connection to Unity)
    // We use PUB/SUB to broadcast physics state updates
    let ctx = zmq::Context::new();
    let subscriber = ctx.socket(zmq::SUB)?;
    subscriber.connect("tcp://localhost:5556")?; // Unity Publisher
    subscriber.set_subscribe(b"")?; // Subscribe to all topics
    
    let shared_state = Arc::new(Mutex::new(SimState { velocity: 0.0, torque: 0.0 }));

    // Task 1: Read CAN (Motor Commands from Robot)
    let state_writer = shared_state.clone();
    
    // We spawn a blocking thread for CAN reading because socketcan crate is sync.
    // In production, use `tokio-socketcan` for true async.
    tokio::spawn(async move {
        loop {
            if let Ok(frame) = socket.read_frame() {
                if let Ok(id) = frame.id().standard() {
                    // Check if the frame ID matches our Motor Command ID
                    if id == StandardId::new(MOTOR_CMD_ID).unwrap() {
                        // Parse Torque (Simple serialization)
                        // Real CAN frames use bit-packing (DBC files).
                        let data = frame.data();
                        if data.len() >= 4 {
                            // Assume float32 is packed in first 4 bytes
                            let torque = f32::from_le_bytes([data[0], data[1], data[2], data[3]]);
                            
                            let mut state = state_writer.lock().unwrap();
                            state.torque = torque;
                            println!("Received Torque cmd: {} Nm", torque);
                            // TODO: Send torque to Unity via ZeroMQ REQ
                        }
                    }
                }
            }
            // Small sleep to prevent CPU spin if socket is non-blocking
            tokio::time::sleep(Duration::from_millis(1)).await;
        }
    });

    // Task 2: Write CAN (Sensor Data to Robot)
    // We strictly simulate a 100Hz Motor Controller loop
    let mut ticker = interval(Duration::from_millis(10)); // 10ms = 100Hz
    let socket_tx = CanSocket::open("vcan0")?; // Clone for writing

    loop {
        ticker.tick().await;
        
        let velocity;
        {
            let state = shared_state.lock().unwrap();
            velocity = state.velocity; // Updated by ZMQ subscriber task (not shown)
        }

        // Pack Velocity into CAN Frame (Little Endian float)
        let v_bytes = velocity.to_le_bytes();
        let data = [v_bytes[0], v_bytes[1], v_bytes[2], v_bytes[3], 0, 0, 0, 0];
        
        let frame = CanFrame::new(StandardId::new(MOTOR_STATUS_ID).unwrap(), &data).unwrap();
        
        // Write to bus. The robot will see this as if a real motor responded.
        socket_tx.write_frame(&frame)?;
    }
}

Time Synchronization: PTP and Real-Time Linux

In HIL, “Time” is tricky.

  • Unity Time: Variable. Depends on rendering speed.
  • Robot Time: Real-Time (Wall Clock).

If Unity runs at 0.5x speed (slow rendering), the Robot Control Loop (running at strict 100Hz) will think the world is in slow motion. The Integral term (I in PID) will explode.

Solutions:

  1. Lockstep: The Robot pauses and waits for Unity’s next tick. (Not “True” HIL, but safe).
  2. Hard Real-Time Sim: Ensure Unity runs EXACTLY at Wall Clock speed. Requires high-end PC.
  3. PTP (Precision Time Protocol): Sync the clocks of Sim PC and Robot PC to within 1 microsecond hardware timestamping.

Infrastructure: The HIL Micro-Farm

A scalable HIL setup looks like a server rack.

[ Rack Unit 1 ]
   |-- [ Sim PC (RTX 4090) ] -- Ethernet -- [ Jetson Orin (Agent) ]
   |                                            |
   `-- [ Switch ] ------------------------------'

[ Rack Unit 2 ]
   |-- [ Sim PC (RTX 4090) ] -- Ethernet -- [ Raspberry Pi 5 (Agent) ]

Ops Challenge:

  • Remote Reboot: How to “Reboot” the Jetson remotely? Use Smart PDU (Power Distribution Unit) with an API.
  • Netboot: How to flash new firmware to the Jetson? Use PXE Boot.

Safety Protocols: Watchdogs and Kill Switches

Even in HIL, safety matters. If the Sim crashes but the Bridge keeps sending “Velocity=0”, the Robot might think it’s stopped while the Sim physics (if it were running) would show it falling.

The Watchdog Pattern:

  1. Unity sends a heartbeat counter every frame.
  2. Bridge checks: if (last_heartbeat > 100ms) { EMERGENCY_STOP_CAN_MSG() }.
  3. Robot sees E-Stop and enters safe state.

Troubleshooting: The “Ghost in the Machine”

Scenario 1: The “Hiccup” (Jitter)

  • Symptom: Robot moves smoothly, then jerks every 5 seconds.
  • Cause: Linux Scheduler. A background process (e.g. apt-get update) preempted the HIL Bridge.
  • Fix: Use PREEMPT_RT Kernel patch on the Linux Sim PC. Assign HIL Bridge process nice -n -20 (Realtime priority).

Scenario 2: Network Latency

  • Symptom: Control loop instabilities. Oscillations.
  • Cause: Using WiFi for HIL. WiFi is non-deterministic.
  • Fix: ALWAYS use Ethernet cables. Direct connection (No switch) is best.

Scenario 3: The Ground Loop

  • Symptom: CAN errors or scorched GPIO pins.
  • Cause: Determining the voltage potential difference between the PC USPC and the Robot Ground.
  • Fix: Use Galvanic Isolation (Optocouplers) on your CAN adapters. Never connect two power supplies without a common ground reference, but isolate data lines.

Scenario 4: “Bus Off” State

  • Symptom: The robot stops listening to commands entirely.
  • Cause: You flooded the CAN bus with too many messages. The CAN controller entered “Bus Off” mode to save the bus.
  • Fix: Respect the Bandwidth. 1Mbps CAN = ~2000 frames/sec max. Don’t send debug logs over CAN.

AWS RoboMaker and other services are trying to offer “Cloud HIL”. Instead of physical Jetsons, they use QEMU Emulation of the ARM processor in the cloud.

  • Pros: Infinite scale.
  • Cons: QEMU is slower than real hardware. Timing bugs are missed.

MLOps Interview Questions

  1. Q: How do you test a “Camera Driver” crash in HIL? A: The HIL Bridge can simulate faults. It can intentionally stop sending v4l2 frames or send garbage data to test the Agent’s error handling.

  2. Q: What is vcan0? A: Virtual CAN interface in Linux. It acts like a loopback device for CAN bus frames, allowing code to be tested without physical CAN transceivers.

  3. Q: Why is jitter bad for PID controllers? A: PID assumes constant $dt$. If $dt$ varies (jitter), the derivative term $D = (e_t - e_{t-1}) / dt$ becomes noisy, causing the motors to hum or shake.

  4. Q: How do you power cycle a frozen HIL board remotely? A: Use a Smart PDU (Power Distribution Unit) with an API (SNMP/HTTP) to toggle the power outlet. Or use a Relay controlled by the Sim PC (GPIO).

  5. Q: Difference between “ Soft Real-Time“ and “Hard Real-Time”? A: Soft: “Usually meets deadline” (Video Streaming). Hard: “Missed deadline = Failure” (Airbag, ABS Brakes). HIL for flight control requires Hard RT.


Glossary

  • HIL (Hardware-in-the-Loop): Testing real embedded hardware against a simulation.
  • PTP (Precision Time Protocol): IEEE 1588. Protocol for sub-microsecond clock sync.
  • CAN Bus: Controller Area Network. Robust vehicle bus standard.
  • Watchdog: A timer that triggers a system reset/safe-mode if not reset periodically.
  • PREEMPT_RT: Linux kernel patch turning Linux into a Real-Time OS.

Summary Checklist

  1. Network: Use Gigabit Ethernet cables (Cat6) between Sim and Agent. Disable “Green Ethernet” power saving.
  2. Kernel: Install linux-image-rt kernel on the Bridge machine to minimize jitter.
  3. Isolation: Isolate CPU cores (isolcpus=2,3) for the Bridge process to prevent context switching.
  4. Monitoring: Run candump vcan0 to inspect raw traffic during debugging.
  5. Validation: Measure Round-Trip Time (RTT) from “Motor Command” to “Sensor Update”. Should be < 10ms for 100Hz loops.