41.1. Unity/Unreal CI/CD (Headless Builds)

Status: Draft
Version: 1.0.0
Tags: #Sim2Real, #Unity, #UnrealEngine, #CICD, #Docker
Author: MLOps Team


Table of Contents

  1. The “Game” is actually a “Simulation”
  2. The Headless Build: Running Graphics without a Monitor
  3. Unity CI/CD Pipeline
  4. C# Implementation: Automated Build Script
  5. Unreal Engine: Pixel Streaming & Vulkan
  6. Determinism: The PhysX Problem
  7. Infrastructure: Dockerizing a 40GB Engine
  8. Troubleshooting: Common Rendering Crashes
  9. Future Trends: NeRF-based Simulation
  10. MLOps Interview Questions
  11. Glossary
  12. Summary Checklist

Prerequisites

Before diving into this chapter, ensure you have the following installed:

  • Unity Hub / Unreal Engine 5: For local testing.
  • GameCI: A community toolset for Unity Actions.
  • Docker: With NVIDIA Container Toolkit support.

The “Game” is actually a “Simulation”

In Traditional MLOps, “Environment” means a Python venv or Docker container. In Embodied AI (Robotics), “Environment” means a 3D World with physics, lighting, and collision.

This world is usually built in a Game Engine (Unity or Unreal). The problem? Game Engines are GUI-heavy, Windows-centric, and hostile to CLI automation.

Sim2Real Pipeline:

  1. Artist updates the 3D model of the warehouse (adds a shelf).
  2. Commit .fbx and .prefab files to Git (LFS).
  3. CI triggers a “Headless Build” of the Linux Server binary.
  4. Deploy to a fleet of 1000 simulation pods.
  5. Train the Robot Policy (RL) in these parallel worlds.

The Headless Build: Running Graphics without a Monitor

You cannot just launch the Unity Editor on a bare EC2 instance with no display attached. It will crash looking for a display. You must run it in Batch Mode with Headless flags.

The Command Line:

/opt/unity/Editor/Unity \
  -batchmode \
  -nographics \
  -silent-crashes \
  -logFile /var/log/unity.log \
  -projectPath /app/MySimProject \
  -executeMethod MyEditor.BuildScript.PerformBuild \
  -quit
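
  • -batchmode: Don’t pop up windows or wait for human input.
  • -nographics: Don’t initialize a graphics device at all, so the build runs on machines without a GPU. If the simulation must render camera images, drop this flag and set up offscreen rendering instead.
  • -executeMethod: Run a static C# method from an Editor script.
  • -quit: Exit the Editor once the other commands have finished.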
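
When the build is launched from an orchestration script rather than typed by hand, it helps to assemble the argument list in one place and enforce a timeout. The following is a minimal sketch, assuming the editor path, project path, and method name from the command above; `build_unity_command` and `run_headless_build` are hypothetical helper names, not part of Unity or GameCI.

```python
import subprocess


def build_unity_command(editor: str, project: str, method: str, log_file: str) -> list[str]:
    """Assemble the batch-mode argument list using only the flags shown above."""
    return [
        editor,
        "-batchmode",
        "-nographics",
        "-silent-crashes",
        "-logFile", log_file,
        "-projectPath", project,
        "-executeMethod", method,
        "-quit",
    ]


def run_headless_build(cmd: list[str], timeout_s: int = 3600) -> int:
    """Run the editor and return its exit code; a hung build raises TimeoutExpired."""
    completed = subprocess.run(cmd, timeout=timeout_s, check=False)
    return completed.returncode


cmd = build_unity_command(
    editor="/opt/unity/Editor/Unity",
    project="/app/MySimProject",
    method="MyEditor.BuildScript.PerformBuild",
    log_file="/var/log/unity.log",
)
print(" ".join(cmd))
```

Passing the arguments as a list (instead of a shell string) avoids quoting problems when paths contain spaces. The timeout value is an assumption and should be tuned to the project.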

Unity CI/CD Pipeline

The pipeline below uses GitHub Actions with a GameCI editor image.

# .github/workflows/build-sim.yaml
name: Build Simulation
on: [push]

jobs:
  build:
    name: Build for Linux
    runs-on: ubuntu-latest
    container: unityci/editor:ubuntu-2022.3.10f1-linux-il2cpp
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          lfs: true  # Critical for 3D assets

      - name: Cache Library
        uses: actions/cache@v3
        with:
          path: Library
          key: Library-${{ hashFiles('Packages/manifest.json') }}

      - name: Activate License
        # You need a valid Unity Serial (PRO/PLUS) for headless builds
        env:
          UNITY_SERIAL: ${{ secrets.UNITY_SERIAL }}
          UNITY_USERNAME: ${{ secrets.UNITY_USERNAME }}
          UNITY_PASSWORD: ${{ secrets.UNITY_PASSWORD }}
        run: |
          /opt/unity/Editor/Unity \
            -quit \
            -batchmode \
            -nographics \
            -serial "$UNITY_SERIAL" \
            -username "$UNITY_USERNAME" \
            -password "$UNITY_PASSWORD"

      - name: Build
        run: |
          /opt/unity/Editor/Unity \
            -batchmode \
            -nographics \
            -projectPath . \
            -executeMethod BuildScript.BuildLinuxServer \
            -logFile /dev/stdout \
            -quit

      - name: Upload Artifact
        uses: actions/upload-artifact@v4
        with:
          name: SimBuild
          path: Builds/Linux/

Git LFS Note: Unity projects are huge. The Library/ folder is a generated cache, while Assets/ is the source. Never commit Library/; cache it in CI instead.
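
If `lfs: true` is forgotten, the checkout contains small Git LFS pointer files instead of the real assets, and the build fails later in confusing ways. A pre-build check can catch this early. The sketch below relies on the fact that LFS pointer files begin with the line `version https://git-lfs.github.com/spec/v1`; the function names and the in-memory sample checkout are hypothetical.

```python
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(data: bytes) -> bool:
    """True if the bytes look like a Git LFS pointer rather than real asset data."""
    return data.startswith(LFS_POINTER_PREFIX)


def find_unresolved_assets(files: dict[str, bytes]) -> list[str]:
    """Return the paths whose content is still an LFS pointer, sorted by name."""
    return sorted(path for path, data in files.items() if is_lfs_pointer(data))


# Hypothetical checkout: one unresolved pointer and one real text-serialized scene.
checkout = {
    "Assets/Models/Shelf.fbx": (
        b"version https://git-lfs.github.com/spec/v1\n"
        b"oid sha256:" + b"0" * 64 + b"\n"
        b"size 123\n"
    ),
    "Assets/Scenes/Warehouse.unity": b"%YAML 1.1\n%TAG !u! tag:unity3d.com,2011:\n",
}

unresolved = find_unresolved_assets(checkout)
print(unresolved)  # ['Assets/Models/Shelf.fbx']
```

In a real pipeline the dictionary would be replaced by reading the first bytes of each file under Assets/, and a non-empty result would fail the job before the Editor starts.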


C# Implementation: Automated Build Script

You need a C# script inside an Editor folder to handle the build logic.

Project Structure

MySimProject/
├── Assets/
│   ├── Editor/
│   │   └── BuildScript.cs
│   └── Scenes/
│       └── Warehouse.unity
└── ProjectSettings/

Assets/Editor/BuildScript.cs:

using UnityEditor;
using UnityEditor.Build.Reporting; // BuildReport, BuildSummary, BuildResult
using UnityEngine;
using System;
using System.Linq;

// This class must be public for Unity's CLI to find it via reflection.
public class BuildScript
{
    /// <summary>
    /// The entry point for our CI/CD pipeline.
    /// Usage: -executeMethod BuildScript.BuildLinuxServer
    /// </summary>
    public static void BuildLinuxServer()
    {
        Console.WriteLine("---------------------------------------------");
        Console.WriteLine("       Starting Build for Linux Server       ");
        Console.WriteLine("---------------------------------------------");

        // 1. Define Scenes
        // We only fetch scenes that are enabled in the Build Settings UI.
        string[] scenes = EditorBuildSettings.scenes
            .Where(s => s.enabled)
            .Select(s => s.path)
            .ToArray();

        if (scenes.Length == 0)
        {
             Console.WriteLine("Error: No scenes selected for build.");
             EditorApplication.Exit(1);
             return; // Do not fall through into the build with an empty scene list
        }

        // 2. Configure Options
        // Just like clicking File -> Build Settings -> Build
        BuildPlayerOptions buildPlayerOptions = new BuildPlayerOptions();
        buildPlayerOptions.scenes = scenes;
        buildPlayerOptions.locationPathName = "Builds/Linux/SimServer.x86_64";
        buildPlayerOptions.target = BuildTarget.StandaloneLinux64;
        
        // Critical for RL: "Server Build" removes Audio/GUI overhead
        // This makes the binary smaller and faster.
        // Also enables the "BatchMode" friendly initialization.
        buildPlayerOptions.subtarget = (int)StandaloneBuildSubtarget.Server; 
        
        // Fail if compiler errors exist. Don't produce a broken binary.
        buildPlayerOptions.options = BuildOptions.StrictMode; 

        // 3. Execute
        Console.WriteLine("Invoking BuildPipeline...");
        BuildReport report = BuildPipeline.BuildPlayer(buildPlayerOptions);
        BuildSummary summary = report.summary;

        // 4. Report Results
        if (summary.result == BuildResult.Succeeded)
        {
            Console.WriteLine("---------------------------------------------");
            Console.WriteLine($"Build succeeded: {summary.totalSize} bytes");
            Console.WriteLine($"Time: {summary.totalTime}");
            Console.WriteLine("---------------------------------------------");
        }

        else
        {
            // Failed, Cancelled, or Unknown: anything but success must fail CI.
            Console.WriteLine("---------------------------------------------");
            Console.WriteLine($"Build did not succeed: {summary.result}");
            foreach (var step in report.steps)
            {
                foreach (var msg in step.messages)
                {
                    // Print compiler errors to stdout so CI logs capture it
                    Console.WriteLine($"[{msg.type}] {msg.content}");
                }
            }
            Console.WriteLine("---------------------------------------------");
            // Exit code 1 so CI fails
            EditorApplication.Exit(1);
        }
    }
}
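
Once the build log is captured, CI can surface compiler errors directly instead of making people scroll through the whole file. The sketch below assumes the errors appear in the standard C# compiler diagnostic shape, `path(line,col): error CSxxxx: message`; the sample log and the function name are hypothetical.

```python
import re

# Standard C# compiler diagnostic shape: path(line,col): error CSxxxx: message
ERROR_PATTERN = re.compile(
    r"^(?P<file>.+)\((?P<line>\d+),(?P<col>\d+)\): error (?P<code>CS\d+): (?P<msg>.*)$"
)


def extract_compiler_errors(log_text: str) -> list[dict[str, str]]:
    """Return one dict per compiler error line found in the log text."""
    errors = []
    for raw_line in log_text.splitlines():
        match = ERROR_PATTERN.match(raw_line.strip())
        if match:
            errors.append(match.groupdict())
    return errors


# Hypothetical log excerpt for illustration only.
sample_log = """\
Invoking BuildPipeline...
Assets/Scripts/Robot.cs(12,9): error CS0103: The name 'speed' does not exist in the current context
Build failed
"""

errors = extract_compiler_errors(sample_log)
for err in errors:
    print(f"{err['file']}:{err['line']} {err['code']} {err['msg']}")
```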

Unreal Engine: Pixel Streaming & Vulkan

Unreal (UE5) is heavier but more photorealistic. Ops for Unreal is dominated by C++ and shader compilation.

Shader Compilation Hell: UE5 compiles shaders on startup. In a Docker container, this can take 20 minutes and consume 32GB RAM. Fix: Compile shaders once and store the DerivedDataCache (DDC) on a shared NFS volume or in an S3 bucket, then configure UE5 to read the DDC from there.

Pixel Streaming: For debugging the Robot, you often want to see what it sees. Unreal Pixel Streaming creates a WebRTC server. You can view the simulation in Chrome.

  • Ops: Deploy a separate “Observer” pod with GPU rendering enabled, strictly for human debugging.

Determinism: The PhysX Problem

RL depends on determinism: the same seed and the same actions should produce the same trajectory. If Run 1 moves the robot forward 1m and Run 2 moves it 1.0001m under identical inputs, that extra variance makes the policy gradient noisier.

Sources of Non-Determinism:

  1. Floating Point Math: $(a + b) + c \neq a + (b + c)$ in floating point, so the result depends on evaluation order.
  2. Physics Engine (PhysX): Often sacrifices determinism for speed.
  3. Variable Timestep: If FPS drops, Time.deltaTime changes, integration changes.
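
The floating-point point is easy to verify. The snippet below is a small illustration using ordinary double-precision numbers; the specific values are chosen only because they make the effect visible.

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # rounds after a + b, then again after adding c
right = a + (b + c)  # rounds after b + c, then again after adding a

print(left == right)  # False
print(left, right)    # 0.6000000000000001 0.6
```

The difference is tiny, but a physics engine that sums forces in a different order on each run (for example, because of thread scheduling) can accumulate such differences over thousands of steps.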

Fix:

  • Fix Timestep: Set Time.fixedDeltaTime = 0.02 (50Hz).
  • Seeding: Set Random.InitState(42).
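  • Physics: Enable the engine’s deterministic option where one exists (for Unity’s built-in PhysX, “Enable Enhanced Determinism” under Project Settings > Physics), or use a physics package designed for determinism (Unity Physics / Havok).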
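
The effect of a fixed timestep can be shown without a game engine. The sketch below is a toy model, not Unity's implementation: it integrates a falling body once per frame (variable step) and with a fixed 20 ms accumulator loop, then compares the results under two different frame rates. Integer milliseconds are used for the accumulator so the step count is exact. All names and numbers are illustrative assumptions.

```python
import random

FIXED_STEP_MS = 20  # 50 Hz, mirroring Time.fixedDeltaTime = 0.02
GRAVITY = 9.81


def step(pos: float, vel: float, dt: float) -> tuple[float, float]:
    """Advance a falling body by one semi-implicit Euler step of dt seconds."""
    vel = vel - GRAVITY * dt
    pos = pos + vel * dt
    return pos, vel


def simulate_variable(frame_times_ms: list[int]) -> float:
    """Integrate once per rendered frame, so the result depends on frame rate."""
    pos, vel = 10.0, 0.0
    for frame_ms in frame_times_ms:
        pos, vel = step(pos, vel, frame_ms / 1000.0)
    return pos


def simulate_fixed(frame_times_ms: list[int]) -> float:
    """Accumulate frame time and integrate only in fixed 20 ms steps."""
    pos, vel, accumulator_ms = 10.0, 0.0, 0
    for frame_ms in frame_times_ms:
        accumulator_ms += frame_ms
        while accumulator_ms >= FIXED_STEP_MS:
            pos, vel = step(pos, vel, FIXED_STEP_MS / 1000.0)
            accumulator_ms -= FIXED_STEP_MS
    return pos


smooth = [20] * 50  # one simulated second at 50 FPS
choppy = [50] * 20  # the same second at 20 FPS

print(simulate_variable(smooth) == simulate_variable(choppy))  # False
print(simulate_fixed(smooth) == simulate_fixed(choppy))        # True

# Seeding: two generators with the same seed produce the same sequence.
print(random.Random(42).random() == random.Random(42).random())  # True
```

With the fixed step, both frame patterns execute exactly the same 50 integration steps, so the final position is bit-identical. With the variable step, the integration error changes with the frame rate.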

Infrastructure: Dockerizing a 40GB Engine

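You don’t want to install Unity on every CI agent, so you use Docker. The catch is that the editor image alone is 15GB or more, which is why the Dockerfile below uses a multi-stage build and ships only the compiled player.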

# Dockerfile for Unity Simulation
# Stage 1: Editor (Huge Image, 15GB+)
FROM unityci/editor:ubuntu-2022.3.10f1-linux-il2cpp as builder

WORKDIR /project

# 1. Copy Manifest (for Package Manager resolution)
# We copy this first to leverage Docker Layer Caching for dependencies
COPY Packages/manifest.json Packages/manifest.json
COPY Packages/packages-lock.json Packages/packages-lock.json

# 2. Copy Source
COPY Assets/ Assets/
COPY ProjectSettings/ ProjectSettings/

# 3. Build
# We write logs to build.log and print the file if the build fails, because Unity swallows stdout sometimes
RUN /opt/unity/Editor/Unity \
    -batchmode \
    -nographics \
    -projectPath . \
    -executeMethod BuildScript.BuildLinuxServer \
    -quit \
    -logFile build.log || (cat build.log && exit 1)

# Stage 2: Runtime (Small Image, <1GB)
FROM ubuntu:22.04

WORKDIR /app
COPY --from=builder /project/Builds/Linux/ .

# Libraries needed for Unity Player (Vulkan/OpenGL drivers)
RUN apt-get update && apt-get install -y \
    libglu1-mesa \
    libxcursor1 \
    libxrandr2 \
    vulkan-tools \
    && rm -rf /var/lib/apt/lists/*

# Run in Server Mode (Headless)
ENTRYPOINT ["./SimServer.x86_64", "-batchmode", "-nographics"]

Troubleshooting: Common Rendering Crashes

Scenario 1: “Display not found”

  • Symptom: [HeadlessRender] Failed to open display.
  • Cause: You forgot -batchmode or -nographics. Or your code is trying to access Screen.width in a static constructor.
  • Fix: Ensure you strictly use Headless flags. Wrap GUI code in #if !UNITY_SERVER.

Scenario 2: The Shader Compilation Hang

  • Symptom: CI hangs for 6 hours at “Compiling Shaders…”.
  • Cause: Linux builder has no GPU. Software compilation of 10,000 shaders is slow.
  • Fix: Pre-compile shaders on a Windows machine with a GPU and restore Library/ShaderCache from the CI cache rather than committing it. For Unreal, use a shared DDC.

Scenario 3: Memory Leaks in Simulation

  • Symptom: Pod crashes after 1000 episodes.
  • Cause: You are instantiating GameObjects (Instantiate(Bullet)) but never destroying them (Destroy(Bullet)).
  • Fix: Use Object Pooling. Allocate objects once at startup and reuse them, and avoid allocating memory inside the per-step simulation loop.
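
The pooling idea in Scenario 3 is language-independent. The following is a minimal sketch in plain Python rather than Unity C#; `Bullet` and `BulletPool` are hypothetical classes that stand in for a pooled GameObject and its manager.

```python
class Bullet:
    """Stand-in for a pooled simulation object."""

    def __init__(self) -> None:
        self.active = False
        self.position = (0.0, 0.0, 0.0)


class BulletPool:
    """Pre-allocates objects and hands them out instead of creating new ones."""

    def __init__(self, size: int) -> None:
        self._free = [Bullet() for _ in range(size)]
        self.created = size  # total objects ever allocated

    def acquire(self, position: tuple[float, float, float]) -> Bullet:
        if self._free:
            bullet = self._free.pop()
        else:
            # Pool exhausted: grow, and count it so the growth is visible in metrics.
            bullet = Bullet()
            self.created += 1
        bullet.active = True
        bullet.position = position
        return bullet

    def release(self, bullet: Bullet) -> None:
        bullet.active = False
        self._free.append(bullet)


pool = BulletPool(size=4)
for episode in range(1000):
    bullets = [pool.acquire((0.0, 0.0, float(i))) for i in range(4)]
    for bullet in bullets:
        pool.release(bullet)

print(pool.created)  # 4
```

After 1000 episodes the pool still holds only the four objects created at startup, which is the property that keeps a long-running pod's memory flat.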

Scenario 4: License Activation Failure

  • Symptom: User has no authorization to use Unity.
  • Cause: The Docker container cannot reach Unity Licensing Servers, or the .ulf file is invalid.
  • Fix: Use “Manual Activation” via .ulf file in secrets, or set up a local Unity Floating License Server.

Future Trends: NeRF-based Simulation

Traditional simulation uses polygons (triangles). Reality is not made of triangles. Neural Radiance Fields (NeRFs) and Gaussian Splatting reconstruct real environments (for example, by scanning a room) so that the reconstruction can serve as the simulation.

  • Ops Challenge: NeRF rendering is far heavier than polygon rasterization, because every pixel requires many neural-network evaluations along its camera ray. Rendering even the background takes substantial GPU inference.
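
A rough count makes the cost concrete. In NeRF-style volume rendering, one ray is cast per pixel and the network is evaluated at several sample points along each ray. The resolution, sample count, and frame rate below are assumed values for illustration, not measurements.

```python
def nerf_evals_per_frame(width: int, height: int, samples_per_ray: int) -> int:
    """One ray per pixel, one network evaluation per sample along the ray."""
    return width * height * samples_per_ray


# Assumed camera sensor: 640x480 with 64 samples per ray.
evals = nerf_evals_per_frame(width=640, height=480, samples_per_ray=64)

print(evals)       # 19660800 network evaluations per frame
print(evals * 30)  # 589824000 evaluations per second at 30 FPS
```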

MLOps Interview Questions

  1. Q: Why not just run the simulation on the training node (GPU)? A: CPU bottleneck. Physics runs on CPU. Rendering runs on GPU. If you run both on the training node, the GPU waits for Physics. It’s better to Scale Out simulation (1000 CPU pods) and feed one Training GPU pod over the network.

  2. Q: How do you handle “Asset Versioning”? A: 3D assets are binary blobs. Git is bad at diffing them. We use Git LFS (Large File Storage) and Lock mechanisms (“I am editing the MainMenu.unity, nobody else touch it”).

  3. Q: What is “Isaac Gym”? A: NVIDIA’s simulator that runs Physics entirely on the GPU. This avoids the CPU-GPU bottleneck. It can run 10,000 agents in parallel on a single A100.

  4. Q: Explain “Time Scaling” in Simulation. A: In Sim, we can run Time.timeScale = 100.0. 100 seconds of experience happen in 1 second of wall-clock time. This is the superpower of RL. Ops must verify that physics remains stable at high speed.

  5. Q: How do you test a Headless build? A: You can’t see it. You must add Application Metrics (Prometheus).

    • sim_fps
    • sim_episode_reward
    • sim_collisions

    If sim_collisions spikes, the floor collider is probably missing.
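
The metrics from question 5 can be exposed in the Prometheus text exposition format, where each metric has a `# TYPE` line followed by a `name value` line. The sketch below only renders that text; serving it over HTTP is left out. Treating all three metrics as gauges and the threshold value are assumptions made for illustration.

```python
def render_metrics(metrics: dict[str, float]) -> str:
    """Render metrics in Prometheus text exposition format (all as gauges)."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def floor_collider_suspected(collisions: float, threshold: float = 1000.0) -> bool:
    """Flag an episode whose collision count exceeds an assumed threshold."""
    return collisions > threshold


# Hypothetical sample values from one simulation pod.
snapshot = {"sim_fps": 60.0, "sim_episode_reward": 12.5, "sim_collisions": 3.0}
text = render_metrics(snapshot)
print(text)
print(floor_collider_suspected(snapshot["sim_collisions"]))  # False
```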

Glossary

  • Headless: Running software without a Graphical User Interface (GUI).
  • Prefab: A reusable Unity asset (template for a GameObject).
  • IL2CPP: Intermediate Language to C++. Unity’s scripting backend that converts compiled C# (IL) into C++, which is then compiled to native code for performance.
  • Git LFS: Git extension for versioning large files.
  • Pixel Streaming: Rendering frames on a server and streaming video to a web client.

Summary Checklist

  1. License: Unity requires a Pro License for Headless CI. Ensure you activate the serial number via environment variable $UNITY_SERIAL.
  2. Caching: Cache the Library folder (Unity) or DerivedDataCache (Unreal). It saves 30+ minutes per build.
  3. Tests: Write Unity Test Runner tests (PlayMode) to verify physics stability before building.
  4. Artifacts: Store the built binary in S3/Artifactory with a version tag (sim-v1.0.2). RL training jobs should pull specific versions.
  5. Logs: Redirect logs to stdout (-logFile /dev/stdout) so Kubernetes/Datadog can scrape them.
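
For item 4, training jobs need to validate and compare version tags such as sim-v1.0.2. A plain string comparison gets this wrong (it ranks "1.0.10" before "1.0.2"), so the sketch below parses the tag into integers first. The tag pattern is inferred from the single example above, and the function name is hypothetical.

```python
import re

# Assumed tag shape, inferred from the example "sim-v1.0.2".
TAG_PATTERN = re.compile(r"^sim-v(\d+)\.(\d+)\.(\d+)$")


def parse_sim_tag(tag: str) -> tuple[int, int, int]:
    """Parse 'sim-vMAJOR.MINOR.PATCH' into a tuple of integers."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a sim version tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


tags = ["sim-v1.0.2", "sim-v1.0.10", "sim-v0.9.7"]
newest = max(tags, key=parse_sim_tag)

print(parse_sim_tag("sim-v1.0.2"))  # (1, 0, 2)
print(newest)                       # sim-v1.0.10
```

A training job would still record the exact tag it pulled; the ordering is only useful for reporting which build is newest.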