41.2. Domain Randomization & Synthetic Data
Status: Draft Version: 1.0.0 Tags: #Sim2Real, #DataGen, #Python, #ZeroMQ, #ComputerVision Author: MLOps Team
Table of Contents
- The “Reality Gap” Dilemma
- Taxonomy of Randomization
- Configuration as Code: The DR Schema
- Python Implementation: Remote Control DataGen
- Unity Side: The Command Listener
- Visual vs Dynamics Randomization
- Infrastructure: Massive Parallel Data Generation
- Troubleshooting: Common Artifacts
- Future Trends: Differentiable Simulation
- MLOps Interview Questions
- Glossary
- Summary Checklist
Prerequisites
Before diving into this chapter, ensure you have the following installed:
- Python: pyzmq (ZeroMQ), pydantic
- Unity: A scene with a movable object.
The “Reality Gap” Dilemma
If you train a Robot Arm to pick up a Red Cube in a White Room, and then deploy it to a Red Cube in a Beige Room, it fails. Neural Networks overfit to the simulator’s specific rendering artifacts and physics biases.
Solution: Domain Randomization (DR). Instead of trying to make the simulation perfect (Photorealism), we make it diverse. We randomize textures, lighting, camera angles, friction, and mass. If the model sees 10,000 variations, the “Real World” just becomes the 10,001st variation.
Taxonomy of Randomization
- Visual Randomization: Changing colors, textures, lighting intensity, glare.
- Goal: Invariance to lighting conditions.
- Dynamics Randomization: Changing mass, friction, damping, joint limits.
- Goal: Robustness to hardware wear and tear.
- Procedural Generation: Changing the topology of the world (Room dimensions, Obstacle placement).
- Goal: Generalization to new environments.
Configuration as Code: The DR Schema
We define the randomization distribution in a JSON/YAML file. This is our “Dataset Definition”.
from pydantic import BaseModel, Field
from typing import List, Tuple
class LightConfig(BaseModel):
    # Tuple[min, max]
    intensity_range: Tuple[float, float] = (0.5, 2.0)
    # Hue jitter amount (0.0 = no color change, 1.0 = full rainbow)
    color_hsv_jitter: float = 0.1

class ObjectConfig(BaseModel):
    # Dynamic properties are critical for contact-rich tasks
    mass_range: Tuple[float, float] = (0.1, 5.0)
    friction_range: Tuple[float, float] = (0.5, 0.9)
    # Visual properties
    scale_range: Tuple[float, float] = (0.8, 1.2)
    # How many distractor objects to spawn
    distractor_count: int = 5

class ScenarioConfig(BaseModel):
    version: str = "1.0.0"
    seed: int = 42
    lighting: LightConfig
    objects: ObjectConfig
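As a quick usage sketch (assuming Pydantic v2; the file name scenario.json is illustrative), the schema can be dumped to disk as the dataset definition and re-validated on load:

# Illustrative round-trip of the dataset definition (assumes Pydantic v2).
from schema import ScenarioConfig, LightConfig, ObjectConfig

config = ScenarioConfig(lighting=LightConfig(), objects=ObjectConfig())

# Serialize the "Dataset Definition" so it can be versioned alongside the data.
with open("scenario.json", "w") as f:
    f.write(config.model_dump_json(indent=2))

# Reload and validate. Pydantic rejects malformed or out-of-schema files.
with open("scenario.json") as f:
    loaded = ScenarioConfig.model_validate_json(f.read())

assert loaded.objects.mass_range == (0.1, 5.0)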
Python Implementation: Remote Control DataGen
We don’t want to write C# logic for MLOps. We want to control Unity from Python. We use ZeroMQ (Request-Reply pattern).
Project Structure
datagen/
├── main.py
├── schema.py
└── client.py
client.py:
import json
import random

import zmq

from schema import ScenarioConfig

class SimClient:
    """
    SimClient acts as the 'God Mode' controller for the simulation.
    It tells Unity exactly what to spawn and where.
    """
    def __init__(self, port: int = 5555):
        self.context = zmq.Context()
        self.socket = self.context.socket(zmq.REQ)
        # Unity runs inside Docker, mapped to localhost:5555
        self.socket.connect(f"tcp://localhost:{port}")

    def send_command(self, cmd: str, data: dict) -> dict:
        payload = json.dumps({"command": cmd, "data": data})
        self.socket.send_string(payload)
        # Blocking wait for Unity to confirm.
        # This ensures frame-perfect synchronization.
        reply = self.socket.recv_string()
        return json.loads(reply)

    def randomize_scene(self, config: ScenarioConfig):
        rng = random.Random(config.seed)

        # 1. Randomize lights: sample intensity from the configured range
        self.send_command("set_lighting", {
            "intensity": rng.uniform(*config.lighting.intensity_range),
            "color": [1.0, 0.9, 0.8]
        })

        # 2. Spawn objects with randomized mass and position
        for i in range(config.objects.distractor_count):
            self.send_command("spawn_object", {
                "id": i,
                "type": "cube",
                "mass": rng.uniform(*config.objects.mass_range),
                # Illustrative random placement; tune ranges per scene.
                "position": [rng.uniform(-1, 1), 0.5, rng.uniform(-1, 1)]
            })

        # 3. Capture frame
        # After randomization is applied, we take the photo.
        return self.send_command("capture_frame", {})
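The project layout above lists a main.py driver that is not shown; a minimal sketch of what it might look like, assuming the schema and client modules above (the episode count is arbitrary):

# main.py -- hypothetical driver script tying the schema and client together.
from schema import ScenarioConfig, LightConfig, ObjectConfig
from client import SimClient

def main(episodes: int = 100):
    client = SimClient(port=5555)
    for episode in range(episodes):
        # One config per episode; the seed makes the run reproducible.
        config = ScenarioConfig(
            seed=episode,
            lighting=LightConfig(),
            objects=ObjectConfig(),
        )
        reply = client.randomize_scene(config)
        print(f"Episode {episode}: captured {reply.get('path')}")

if __name__ == "__main__":
    main()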
Unity Side: The Command Listener
In Unity, we attach a C# script to a GameObject that listens on port 5555.
using UnityEngine;
using NetMQ;
using NetMQ.Sockets;
using Newtonsoft.Json.Linq;
using System.IO;
// Requires AsyncIO and NetMQ DLLs in the Plugins folder
public class ZeroMQListener : MonoBehaviour
{
    private ResponseSocket server;
    public Light sceneLight;
    private bool running = true;

    void Start()
    {
        // Required for NetMQ initialization on some platforms
        AsyncIO.ForceDotNet.Force();
        server = new ResponseSocket("@tcp://*:5555");
        Debug.Log("ZeroMQ Listener started on port 5555");
    }

    void Update()
    {
        if (!running) return;

        // Non-blocking poll in the game loop.
        // We handle one request per frame to ensure stability.
        string message = null;
        if (server.TryReceiveFrameString(out message))
        {
            var json = JObject.Parse(message);
            string cmd = (string)json["command"];

            if (cmd == "set_lighting")
            {
                float intensity = (float)json["data"]["intensity"];
                sceneLight.intensity = intensity;
                // Acknowledge receipt
                server.SendFrame("{\"status\": \"ok\"}");
            }
            else if (cmd == "capture_frame")
            {
                // Trigger ScreenCapture
                // Note: Capturing usually takes 1 frame to render
                string path = Path.Combine(Application.persistentDataPath, "img_0.png");
                ScreenCapture.CaptureScreenshot(path);
                server.SendFrame($"{{\"path\": \"{path}\"}}");
            }
            else
            {
                // Handlers for the remaining commands (e.g. spawn_object) go here.
                server.SendFrame("{\"error\": \"unknown_command\"}");
            }
        }
    }

    void OnDestroy()
    {
        running = false;
        server?.Dispose();
        NetMQConfig.Cleanup();
    }
}
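It can be useful to smoke-test the Python client before involving Unity at all. A minimal sketch of a stand-in REP server that speaks the same JSON protocol (the file name fake_unity.py and the fake image path are assumptions, not part of the project layout):

# fake_unity.py -- hypothetical stub that mimics the Unity listener for local testing.
import json
import zmq

def serve(port: int = 5555):
    context = zmq.Context()
    socket = context.socket(zmq.REP)
    socket.bind(f"tcp://*:{port}")
    while True:
        request = json.loads(socket.recv_string())
        cmd = request["command"]
        if cmd == "capture_frame":
            # Pretend a frame was written to disk.
            socket.send_string(json.dumps({"path": "/tmp/img_0.png"}))
        else:
            # Accept set_lighting / spawn_object without doing anything.
            socket.send_string(json.dumps({"status": "ok"}))

if __name__ == "__main__":
    serve()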
Visual vs Dynamics Randomization
Visual (Texture Swapping)
- Technique: Use MaterialPropertyBlock in Unity to change colors without creating new materials (avoids GC).
- Advanced: Use “Triplanar Mapping” shaders so textures don’t stretch when we scale objects.
Dynamics (Physics Fuzzing)
- Technique: Modify Rigidbody.mass and PhysicMaterial.dynamicFriction at the start of every episode.
- Danger: If you randomize gravity to be negative, the robot flies away.
- Bounds: Always sanity-check random values: Mass > 0, Friction in [0, 1]. See the sampling sketch below.
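A small sketch of episode-level sampling with explicit bounds checking, tying the visual and dynamics ranges back to the config above (the clamp limits and the warm-white base hue are illustrative choices):

# Hypothetical episode-level sampler with sanity bounds.
import colorsys
import random

def clamp(value: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, value))

def sample_dynamics(rng: random.Random, mass_range, friction_range) -> dict:
    # Mass must stay strictly positive; friction is clamped to [0, 1].
    return {
        "mass": max(rng.uniform(*mass_range), 0.001),
        "friction": clamp(rng.uniform(*friction_range), 0.0, 1.0),
    }

def sample_light_color(rng: random.Random, hsv_jitter: float) -> list:
    # Jitter the hue around a warm white; saturation and value stay fixed.
    hue = (0.1 + rng.uniform(-hsv_jitter, hsv_jitter)) % 1.0
    return [round(c, 3) for c in colorsys.hsv_to_rgb(hue, 0.2, 1.0)]

rng = random.Random(42)
print(sample_dynamics(rng, mass_range=(0.1, 5.0), friction_range=(0.5, 0.9)))
print(sample_light_color(rng, hsv_jitter=0.1))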
Infrastructure: Massive Parallel Data Generation
Generating 1 Million synthetic images on a laptop takes forever. We scale out using Kubernetes Jobs.
[ Orchestrator (Python) ]
|
+---> [ Job 1: Seed 0-1000 ] --> [ Unity Pod ] --> [ S3 Bucket /batch_1 ]
|
+---> [ Job 2: Seed 1000-2000 ] --> [ Unity Pod ] --> [ S3 Bucket /batch_2 ]
|
...
+---> [ Job N ]
Key Requirement: Deterministic Seeding.
Job 2 MUST produce data that is distinct from Job 1's.
Seed = JobIndex * 1000 + EpisodeIndex.
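A sketch of how a worker pod might derive its seeds, assuming the orchestrator injects a JOB_INDEX environment variable (the variable name and batch size are assumptions):

# Hypothetical seeding logic inside a Kubernetes Job pod.
import os

from schema import ScenarioConfig, LightConfig, ObjectConfig

job_index = int(os.environ.get("JOB_INDEX", "0"))  # injected by the orchestrator
episodes_per_job = 1000

for episode_index in range(episodes_per_job):
    # Seeds never collide across jobs, so batches are disjoint and reproducible.
    seed = job_index * episodes_per_job + episode_index
    config = ScenarioConfig(seed=seed, lighting=LightConfig(), objects=ObjectConfig())
    # randomize_scene(config) and the upload to the batch's S3 prefix would go here.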
Troubleshooting: Common Artifacts
Scenario 1: The “Disco Effect” (Epilepsy)
- Symptom: The robot sees a world that changes colors every frame.
- Cause: You are randomizing visuals every timestep (Update()) instead of every episode (OnEpisodeStart()).
- Fix: Only randomize visuals when the environment resets. Dynamics can be randomized continually (to simulate wind), but visuals usually shouldn’t flicker.
Scenario 2: Physics Explosion
- Symptom: Objects fly violently apart at $t=0$.
- Cause: You spawned objects overlapping each other. The Physics Engine resolves the collision by applying infinite force.
- Fix: Use “Poisson Disk Sampling” to place objects with a guaranteed minimum distance (see the placement sketch below), or set Physics.autoSimulation = false until placement is verified.
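A lightweight placement sketch using rejection sampling with a minimum-distance check, a simpler stand-in for full Poisson disk sampling (bounds, radius, and retry budget are illustrative):

# Hypothetical overlap-free placement via rejection sampling.
import math
import random

def place_objects(rng: random.Random, count: int, min_dist: float = 0.3,
                  bound: float = 2.0, max_tries: int = 1000) -> list:
    positions = []
    tries = 0
    while len(positions) < count and tries < max_tries:
        tries += 1
        candidate = (rng.uniform(-bound, bound), rng.uniform(-bound, bound))
        # Reject candidates that would overlap an already-placed object.
        if all(math.dist(candidate, p) >= min_dist for p in positions):
            positions.append(candidate)
    return positions

print(place_objects(random.Random(0), count=5))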
Scenario 3: The Material Leak
- Symptom: Memory usage grows by 100MB per episode. OOM after 1 hour.
- Cause: GetComponent<Renderer>().material.color = Random.ColorHSV(). Accessing .material creates a copy of the material, and Unity does not garbage collect materials automatically.
- Fix: Use GetComponent<Renderer>().SetPropertyBlock(mpb) instead of modifying materials directly, or call Resources.UnloadUnusedAssets() periodically.
Scenario 4: Z-Fighting
- Symptom: Flickering textures where the floor meets the wall.
- Cause: Two planes occupy the exact same coordinate.
- Fix: Randomize positions with a small epsilon (0.001). Add “jitter” to everything.
Future Trends: Differentiable Simulation
DR is a “Black Box” approach: we guess the distributions. Differentiable Physics (Brax, Dojo) lets us backpropagate through the physics engine itself: with $Loss = (RealWorld - SimWorld)^2$, the gradient $\nabla_{friction} Loss$ tells us exactly how to tune the simulator friction to match reality.
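As a toy illustration of the idea, consider an analytically differentiable “simulator” (a block sliding to rest, with stopping distance $d = v_0^2 / (2 \mu g)$) instead of a real engine like Brax or Dojo; the measured distance and learning rate are made up:

# Toy system identification: fit friction mu by gradient descent on (d_real - d_sim)^2.
G = 9.81
V0 = 2.0            # initial slide velocity (m/s), illustrative
D_REAL = 0.68       # "measured" stopping distance (m), illustrative

def sim_distance(mu: float) -> float:
    # Closed-form stopping distance under Coulomb friction: d = v0^2 / (2 * mu * g)
    return V0 ** 2 / (2.0 * mu * G)

def grad_loss(mu: float) -> float:
    # dL/dmu with L = (sim - real)^2 and d(sim)/dmu = -v0^2 / (2 * g * mu^2)
    d_sim = sim_distance(mu)
    return 2.0 * (d_sim - D_REAL) * (-V0 ** 2 / (2.0 * G * mu ** 2))

mu = 0.2  # initial guess
for step in range(200):
    mu -= 0.05 * grad_loss(mu)

print(f"fitted mu = {mu:.3f}, sim distance = {sim_distance(mu):.3f} m (target {D_REAL} m)")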
MLOps Interview Questions
- Q: What is “Curriculum Learning” in DR? A: Start with easy randomization (gravity = 9.8, friction = 0.5). Once the robot learns, expand the ranges to [5.0, 15.0] and [0.1, 0.9]. This prevents the agent from failing early and learning nothing (see the sketch after this list).
- Q: How do you validate Synthetic Data? A: Train a model on Synthetic. Test it on Real (a small validation set). If performance correlates, your data is good. If not, you have a “Sim2Real Gap”.
- Q: Explain “Automatic Domain Randomization” (ADR). A: An RL algorithm (like OpenAI used for the Rubik’s Cube) that automatically expands the randomization bounds as the agent gets better. It removes the need for manual tuning.
- Q: Why ZeroMQ over HTTP? A: Latency and overhead. HTTP (JSON/REST) creates a new connection per request. ZeroMQ keeps a persistent TCP connection and packs binary frames. For 60 Hz control, HTTP is too slow.
- Q: How do you handle “Transparent Objects”? A: Depth sensors fail on glass; simulation renders glass perfectly. To match reality, we must introduce “Sensor Noise” models that simulate the failure modes of RealSense cameras on transparent surfaces.
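A minimal sketch of the curriculum idea from the first question, assuming an evaluation hook reports a success rate; the threshold and widening step sizes are arbitrary:

# Hypothetical curriculum controller that widens randomization bounds over time.
def widen(bounds, delta, hard_min, hard_max):
    lo, hi = bounds
    return (max(hard_min, lo - delta), min(hard_max, hi + delta))

gravity_range = (9.8, 9.8)      # start with no randomization
friction_range = (0.5, 0.5)

def on_evaluation(success_rate: float):
    global gravity_range, friction_range
    # Only widen the ranges once the agent is doing well on the current ones.
    if success_rate > 0.8:
        gravity_range = widen(gravity_range, delta=1.0, hard_min=5.0, hard_max=15.0)
        friction_range = widen(friction_range, delta=0.1, hard_min=0.1, hard_max=0.9)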
Glossary
- DR (Domain Randomization): Varying simulation parameters to improve generalization.
- Sim2Real Gap: The drop in performance when moving from Sim to Physical world.
- ZeroMQ: High-performance asynchronous messaging library.
- MaterialPropertyBlock: Unity API for efficient per-object material overrides.
- Differentiable Physics: A physics engine where every operation is differentiable (like PyTorch).
Summary Checklist
- Protocol: Use Protobuf or Flatbuffers over ZeroMQ for type safety, not raw JSON.
- Halt Physics: Pause simulation (Time.timeScale = 0) while applying randomization to prevent physics glitches during setup.
- Metadata: Save the JSON config alongside the image: img_0.png + img_0.json (contains pose, mass, lighting). See the sketch below.
- Distribution: Use Beta distributions instead of Uniform for randomization. Reality is rarely Uniform.
- Sanity Check: Always render a “Human View” occasionally to verify the randomization doesn’t look broken (e.g. a black sky).
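A sketch of the metadata sidecar mentioned in the checklist, assuming Pydantic v2 and that the capture reply supplies the image path (the extras dict and pose values are illustrative):

# Hypothetical sidecar writer: img_0.png gets a matching img_0.json.
import json
from pathlib import Path

def save_sidecar(image_path: str, config, extras: dict) -> Path:
    sidecar = Path(image_path).with_suffix(".json")
    record = {
        "image": Path(image_path).name,
        "scenario": config.model_dump(),   # the full randomization config (Pydantic v2)
        **extras,                          # e.g. object poses reported by the simulator
    }
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar

# save_sidecar("/data/img_0.png", config, {"pose": [0.0, 0.5, 0.0]})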