Vol 8: The Autonomy & Vision Stack

The three build tiers in Volumes 5 through 7 are a hardware story: servo choices, actuator torque budgets, frame materials, and sensor bills of materials. This volume is the software counterpart — the stack that transforms those hardware tiers from remote-controlled machines into autonomous patrol platforms. Every major element of the build ladder has a software correlate: the CHAMP gait controller that makes Build 1’s servos walk, the Nav2 behavior tree that gives Build 2 assisted autonomy, and the SLAM-anchored full-patrol loop that defines Build 3’s mission capability. Understanding how the software stack is structured, how each layer depends on the one below it, and how the stack grows as hardware capability increases is the prerequisite for building, commissioning, and debugging any tier of the RoboDog program.

Figure 1 — The autonomy stack architecture for the RoboDog program. Sensor inputs (LiDAR, depth camera, IMU) feed the state estimation and SLAM layers, which supply a localized map to Nav2. Nav2's behavior tree planner commands the locomotion controller (CHAMP or RL policy), which drives the actuators. The edge-AI perception node (YOLO on TensorRT) runs in parallel and publishes detections to the patrol behavior manager. DDS topics interconnect all ROS 2 nodes. Generated with Ideogram 3.0. (Prompt + seed in fig-vol8-ros2-nav2-arch.jpg.prompt.json.)

8.1 The Stack at a Glance

The autonomy stack for a patrolling quadruped comprises six functional layers, each building on the one below:

Hardware interface — actuator drivers (PCA9685 PWM for Build 1; CAN-bus FOC for Builds 2 and 3) and sensor drivers (IMU, joint encoders, LiDAR, depth camera).
Locomotion control — gait generation and joint-level control: either a classic model-based controller such as CHAMP or a reinforcement-learned (RL) policy trained in simulation and transferred to hardware.
State estimation — continuous tracking of the robot’s body pose (position, velocity, attitude) by fusing IMU data, joint kinematics, and contact state through an Extended Kalman Filter (EKF) or its Invariant variant (InEKF).
SLAM — building and maintaining a map of the environment while simultaneously localizing the robot within it, using LiDAR (Cartographer) or multi-modal sensing (RTAB-Map).
Navigation — planning collision-free paths through the map, managing obstacle avoidance in real time, and executing high-level tasks via behavior trees: this is Nav2, the ROS 2 navigation stack [1].
Perception — on-board visual inference that classifies objects (people, vehicles) in the camera stream; on the Jetson Orin NX this runs as a YOLO model through the TensorRT and DeepStream pipelines.

Middleware binding all layers together is ROS 2 (Robot Operating System 2), which provides the inter-node communication fabric, the sensor driver ecosystem, the Nav2 navigation framework, and the simulation hooks used during development.

8.2 ROS 2 — Nodes, DDS, and Why It Replaces ROS 1

ROS 2 is the current-generation robot middleware standard maintained by Open Robotics. The active long-term support (LTS) release is Jazzy Jalisco, published 2024-05-23 and supported until 2029-05; its Tier 1 platforms are Ubuntu 24.04 Noble Numbat (amd64 and arm64) and Windows 10. [2] The next standard-support release, Kilted Kaiju, follows the same annual cadence. [2]

8.2.1 Architecture — Nodes, Topics, Services, Actions

A ROS 2 application is decomposed into nodes — single-purpose computational processes. Nodes communicate through three patterns:

Topics (publish–subscribe): a node advertises a named topic and publishes typed messages; any number of subscriber nodes receive those messages asynchronously. Sensor data, velocity commands, and map updates travel over topics.
Services (request–response): a synchronous, one-to-one call for configuration or status queries.
Actions: long-running tasks with feedback and cancellation, used by Nav2 for navigation goals.

All communication passes through the Data Distribution Service (DDS) middleware layer, the same standard used in defense avionics and medical devices. [3] DDS is fully decentralized: there is no ROS Master process whose failure brings down the entire system, a fundamental fragility of ROS 1’s TCP-based XMLRPC architecture. QoS (Quality of Service) policies govern reliability, message durability, deadline enforcement, and lifespan; these policies let a LiDAR scan topic use “best effort” delivery while a Nav2 goal action uses “reliable.” [3] Security is native in ROS 2 through SROS 2, which builds on the DDS-Security specification: certificate-based (PKI) authentication, encryption, and fine-grained topic-level access control.

8.2.2 Why ROS 2 Over ROS 1

ROS 1 reached end-of-life with its final distribution (Noetic) in May 2025. Three technical gaps make ROS 2 the appropriate choice for an autonomous patrol platform:

Real-time performance. Figures attributed to Fraunhofer IPA research (as summarized in [4]) show approximately 10× lower average communication latency and 5× higher throughput for ROS 2 over ROS 1. A locomotion controller running at 500 Hz cannot tolerate the multi-millisecond jitter that ROS 1’s single-threaded spin introduces; ROS 2’s multi-threaded executors and DDS bounded-latency paths make sub-millisecond inter-node messaging achievable.

Multi-robot and distributed operation. ROS 1 requires workarounds — custom multi-master patches, SSH tunnels — to run nodes on multiple machines. DDS’s topic-based discovery works across network boundaries without coordination, enabling the patrol dog to relay detections to a base-station node over WiFi without configuration overhead.

Ecosystem maturity. Nav2, MoveIt 2, the NVIDIA Jetson ROS 2 package collection, and all major simulation bridges (Gazebo Sim, Isaac Lab) are ROS 2-only or ROS 2-primary. Starting a new build on ROS 1 in 2026 means forgoing the entire current-generation tool ecosystem.

8.3 Locomotion Control

Locomotion control is the software layer that accepts a body-velocity command (linear velocity in x/y and angular velocity about z) and produces joint-angle or joint-torque targets for all twelve joints (three per leg) simultaneously at rates between 200 Hz and 1 kHz. Two paradigms are in use across the three build tiers: model-based gait controllers and RL-trained neural policies.

8.3.1 CHAMP — The Model-Based Controller for the FDM Tier

CHAMP is an open-source quadruped locomotion framework whose control approach is based on the hierarchical controller developed for the MIT Cheetah, described in Lee’s 2013 MIT thesis “Hierarchical controller for highly dynamic locomotion utilizing pattern modulation and impedance control: implementation on the MIT Cheetah robot.” [5] The implementation on GitHub (chvmp/champ, last updated 2024-07-04 [5]) provides:

A gait scheduler that cycles legs through swing and stance phases using parameterized contact schedules
Foot trajectory generation in Cartesian space, inverse-kinematics-resolved to joint angles
A setup assistant that takes a URDF robot description and generates a CHAMP configuration without writing gait math by hand
Pre-configured URDFs for ANYmal, MIT Mini Cheetah, Boston Dynamics Spot, LittleDog, and SpotMicroAI builds

In Build 1 (Volume 5), CHAMP runs on the Raspberry Pi 5, computing joint-angle targets that are forwarded to the PCA9685 PWM board. The gait operates quasi-statically at approximately 2–4 Hz stride frequency (typical for servo-driven CHAMP deployments), sufficient for a crawl-gait walk but insufficient for a dynamic trot. CHAMP is also the standard simulation bridge: its Gazebo integration allows the same gait code to run in simulation before deployment on hardware, reducing the risk of gait and kinematics configuration errors during commissioning.
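The foot-trajectory and inverse-kinematics steps listed above can be sketched for a single planar leg. This is a minimal illustration, not CHAMP's implementation: the link lengths, the half-sine swing arc, and the function names `swing_foot`, `leg_ik`, and `leg_fk` are all assumptions chosen for the example.

```python
import math

# Hypothetical link lengths in meters (upper leg, lower leg); not from any RoboDog drawing.
L1, L2 = 0.11, 0.12

def swing_foot(phase, step_len=0.06, clearance=0.03, stance_z=0.18):
    """Foot target during swing, phase in [0, 1]: linear sweep in x, half-sine lift.

    x is forward from the hip, z is the distance below the hip (so lifting reduces z).
    """
    x = step_len * (phase - 0.5)
    z = stance_z - clearance * math.sin(math.pi * phase)
    return x, z

def leg_ik(x, z):
    """Planar 2-link inverse kinematics: foot (x, z) -> (hip, knee) angles in radians."""
    d2 = x * x + z * z
    # Law of cosines gives the knee angle.
    c = (d2 - L1 * L1 - L2 * L2) / (2.0 * L1 * L2)
    if abs(c) > 1.0 + 1e-9:
        raise ValueError("foot target out of reach")
    knee = math.acos(max(-1.0, min(1.0, c)))  # clamp guards against rounding
    hip = math.atan2(x, z) - math.atan2(L2 * math.sin(knee), L1 + L2 * math.cos(knee))
    return hip, knee

def leg_fk(hip, knee):
    """Forward kinematics, used here only to check the IK solution."""
    x = L1 * math.sin(hip) + L2 * math.sin(hip + knee)
    z = L1 * math.cos(hip) + L2 * math.cos(hip + knee)
    return x, z
```

Sampling `swing_foot` at the gait scheduler's phase and passing the result through `leg_ik` yields the joint-angle targets for one leg; a real controller adds the third (abduction) joint and a stance-phase trajectory.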

8.3.2 Model Predictive Control and Whole-Body Control

Dynamic trot gaits and stair negotiation, targeted from Build 2 onward, require controllers that can reason about contact forces and body dynamics over a short prediction horizon. Two complementary formulations are used in the research and commercial quadruped community:

Model Predictive Control (MPC) treats body-velocity tracking as an optimization problem over a receding horizon, typically a few tenths of a second discretized into steps of tens of milliseconds. At each control cycle the MPC solves a convex quadratic program for the ground-reaction forces each stance foot should exert to minimize the difference between the desired and predicted body trajectory, subject to friction cone and force limits. The result is a set of commanded foot forces, which the joint-level controller realizes by mapping them to joint torques through each leg’s Jacobian transpose, with joint impedance control on the swing legs. MIT’s Cheetah 3 and Mini Cheetah controllers use this convex MPC formulation, and Boston Dynamics has described MPC as part of Spot’s production controller.

Whole-Body Control (WBC) extends MPC by simultaneously solving for the joint torques needed to realize the commanded foot forces while respecting joint limits, actuator saturation, and secondary objectives (center-of-mass height, arm task-space targets). WBC operates at the inner-loop rate — typically 500–1000 Hz for QDD whole-body controllers — and takes the MPC’s force plan as an input.

For Builds 2 and 3, the CubeMars AK-series actuators’ CAN-bus FOC drivers receive torque commands from a WBC node running on the Jetson, closing the joint-level loop at approximately 500 Hz — a rate typical for QDD whole-body controllers on legged robots.
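The hand-off from a force plan to joint torques can be illustrated with a planar two-joint leg. The sketch below is a simplified static mapping, not a whole-body controller: it assumes a massless leg, hypothetical link lengths, and illustrative friction and force limits, and the function names are invented for the example.

```python
import math

L1, L2 = 0.11, 0.12  # hypothetical upper/lower link lengths in meters

def leg_jacobian(hip, knee):
    """Jacobian of foot position (x forward, z up, relative to the hip) w.r.t. (hip, knee).

    Angles are measured from the straight-down pose:
    x = L1*sin(hip) + L2*sin(hip+knee), z = -(L1*cos(hip) + L2*cos(hip+knee)).
    """
    s1, c1 = math.sin(hip), math.cos(hip)
    s12, c12 = math.sin(hip + knee), math.cos(hip + knee)
    return [[L1 * c1 + L2 * c12, L2 * c12],
            [L1 * s1 + L2 * s12, L2 * s12]]

def clamp_to_friction_cone(grf_x, grf_z, mu=0.6, fz_max=150.0):
    """Limit a planar ground-reaction force: 0 <= fz <= fz_max and |fx| <= mu * fz."""
    grf_z = max(0.0, min(grf_z, fz_max))
    grf_x = max(-mu * grf_z, min(grf_x, mu * grf_z))
    return grf_x, grf_z

def joint_torques(hip, knee, grf_x, grf_z):
    """tau = J^T * (-f): torques that make the foot push on the ground with -f,
    so the ground pushes back on the robot with the commanded reaction force f."""
    J = leg_jacobian(hip, knee)
    fx, fz = -grf_x, -grf_z
    return (J[0][0] * fx + J[1][0] * fz,
            J[0][1] * fx + J[1][1] * fz)
```

With the leg straight down (`hip = knee = 0`), a purely vertical reaction force needs no joint torque, which matches the intuition that a straight leg carries load through its structure.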

8.3.3 RL Locomotion Policies — Sim-to-Real via legged_gym and Isaac Lab

The dominant path for training neural locomotion policies on quadruped hardware in 2024–2026 is reinforcement learning (RL) in GPU-accelerated simulation followed by zero-shot or fine-tuned transfer to physical hardware. The foundational framework is legged_gym, the Isaac Gym environment library released by Nikita Rudin et al. at the Robotic Systems Lab, ETH Zurich. [6] The associated paper, “Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning” (arxiv:2109.11978, 2021) [7], demonstrated that training ANYmal to walk required under four minutes of wall-clock time on flat terrain and about twenty minutes on rough terrain, achieved by parallelizing thousands of simulation environments on a single GPU.

The leggedrobotics/legged_gym repository [6] provides all components necessary for sim-to-real transfer:

Actuator network: a small MLP that maps joint position errors and joint velocities to the torque a real motor would produce, trained on data from the physical actuators so the policy sees realistic joint responses during simulation training.
Domain randomization: friction coefficients, link masses, center-of-mass offsets, motor gains, and external push forces are randomized at the beginning of each episode, forcing the policy to learn behaviors robust to model mismatch.
Noisy observations: IMU signals, joint velocities, and base velocity estimates are corrupted with additive noise during training, encouraging the policy to develop implicit filtering.
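The randomization and observation-noise ideas above can be sketched in a few lines. The parameter names and ranges below are illustrative assumptions, not legged_gym defaults, and the helper functions are hypothetical.

```python
import random

# Illustrative ranges only; these are assumptions, not values from legged_gym.
RANGES = {
    "friction": (0.4, 1.25),
    "added_base_mass_kg": (-1.0, 1.0),
    "com_offset_m": (-0.03, 0.03),
    "motor_gain_scale": (0.9, 1.1),
    "push_velocity_mps": (0.0, 1.0),
}

def sample_episode_params(rng):
    """Draw one set of physical parameters, uniformly within each range, for a new episode."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}

def add_observation_noise(obs, rng, sigma=0.02):
    """Corrupt each observation channel with zero-mean Gaussian noise."""
    return [x + rng.gauss(0.0, sigma) for x in obs]
```

Passing a seeded `random.Random` instance keeps each training run reproducible while still presenting the policy with a different simulated robot every episode.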

Following NVIDIA’s transition from Isaac Gym to Isaac Sim, locomotion environments from legged_gym have been migrated to Isaac Lab (GitHub: isaac-sim/IsaacLab), the successor RL framework. [8] Isaac Lab was announced in March 2024 alongside NVIDIA’s Project GR00T; version 1.0 shipped July 2024. [8] A documented collaboration between Boston Dynamics, NVIDIA, and The AI Institute demonstrated training Spot’s locomotion controller to zero-shot sim-to-real performance entirely within Isaac Lab, using GPU-parallelized domain randomization. [9] Isaac Lab version 2.3 (early developer preview, 2025) extends support to whole-body control and imitation learning workflows. [8]

For the RoboDog program, RL locomotion policies are the target for Build 3’s full-autonomy tier. The recommended path is Isaac Lab, training on terrain curricula (slopes, stairs, irregular ground) with the AK80-64 actuator network from Build 3’s hardware parameters, then deploying the resulting PyTorch policy (converted to TensorRT via ONNX export) on the Jetson AGX Orin 64GB.

8.4 State Estimation

Before Nav2 can plan a path, the robot must know where it is and how fast it is moving. On a legged robot this is non-trivial: leg contacts with the ground are intermittent, foot slip corrupts kinematic odometry, and IMU-only integration drifts quickly. State estimation fuses multiple sensor streams to track the robot’s body pose in real time.

The dominant approach is the Extended Kalman Filter (EKF), which maintains a probability distribution over body position, velocity, and attitude and updates it with each incoming sensor measurement. For a quadruped, the EKF fuses:

IMU (accelerometer + gyroscope): provides body linear acceleration and angular velocity at 200–400 Hz; integrates to velocity and orientation but drifts over seconds without correction.
Joint encoders + kinematics: when a foot is in stance (contact with the ground), the forward kinematics of that leg from the hip to the foot provides a kinematic velocity estimate of the body relative to the stance foot.
Contact state: a contact detector (current-based on FOC actuators, or foot force sensors) identifies which legs are in stance so the kinematic update is only applied when the foot is reliably in contact.
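The predict-and-correct cycle over these three inputs can be shown with a scalar Kalman filter on forward velocity. This is a deliberately reduced sketch: a real legged-robot EKF tracks full 3D pose with nonlinear models, and the noise values and function names here are assumptions.

```python
def predict(v, p, accel, dt, q=0.05):
    """IMU step: integrate acceleration; variance p grows with process noise q."""
    return v + accel * dt, p + q * dt

def update(v, p, v_kin, in_contact, r=0.01):
    """Kinematic step: fuse the leg-odometry velocity, but only while the foot is in stance."""
    if not in_contact:
        return v, p              # swing leg: kinematic velocity is meaningless, skip it
    k = p / (p + r)              # Kalman gain: trust the measurement more when p is large
    return v + k * (v_kin - v), (1.0 - k) * p
```

Between stance phases only `predict` runs, so the variance grows; each valid contact measurement pulls the estimate back toward the kinematic velocity and shrinks the variance again.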

Recent work on legged robot state estimation has explored Moving Horizon Estimation (MHE) for linear velocity, formulating velocity estimation as a sliding-window quadratic program that can leverage contact geometry more explicitly than a single-step EKF update. A 2024 decentralized framework combining an EKF for orientation with MHE for linear velocity (arxiv:2405.20567) reported MHE linear-velocity RMSE of 0.0654 m/s versus 0.1122 m/s for the standard EKF formulation on the same hardware. [10]

In the ROS 2 ecosystem, the robot_localization package provides EKF and UKF nodes that accept arbitrary input topics (IMU, odometry, GPS) with configurable measurement noise covariances and output a fused nav_msgs/Odometry message, which Nav2 consumes as the robot’s localization input.

8.5 SLAM — LiDAR and Visual Mapping

Simultaneous Localization and Mapping (SLAM) is the process of building a map of an unknown environment while tracking the robot’s position within that map. Without a pre-built map, Nav2 cannot plan globally; SLAM provides that map online during the first patrol traversal.

Figure 2 — An occupancy grid map as produced by a LiDAR SLAM algorithm. Occupied cells (walls, obstacles) appear black; free space is white; unobserved cells are gray. The robot's trajectory during map construction is shown in green. This 2D representation is the primary input to Nav2's global costmap. Generated with Ideogram 3.0. (Prompt + seed in fig-vol8-slam-map.jpg.prompt.json.)

8.5.1 LiDAR SLAM — Cartographer

Cartographer is a real-time SLAM library developed at Google and open-sourced in 2016, supporting both 2D and 3D operation across multiple platforms and sensor configurations. [11] The associated paper, “Real-Time Loop Closure in 2D LIDAR SLAM” (Hess et al., ICRA 2016 [11]), described the core algorithm:

Scans are accumulated into submaps via a local scan matcher (a nonlinear least-squares fit solved with Ceres) that minimizes the scan-to-submap residual.
When a submap is complete, Cartographer searches all existing submaps for loop closure constraints using a branch-and-bound correlative scan matcher, which evaluates the likelihood of the current scan matching each candidate submap over a search window.
Detected constraints are fed into a pose graph optimizer (using Ceres Solver) that corrects accumulated drift globally.
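The idea of scoring a scan against a map over a search window can be shown with a toy, translation-only matcher. This sketch is an exhaustive search on integer grid cells; Cartographer itself also searches over rotation and uses branch and bound to prune the same search, so the code illustrates the scoring principle rather than the library's algorithm. The function name and data layout are assumptions.

```python
def match_scan(occupied, scan, window=3):
    """Exhaustive correlative match on a grid.

    occupied: set of (x, y) map cells known to be obstacles.
    scan:     list of (x, y) cells hit by the current scan, in the robot's drifted frame.
    Returns (dx, dy, score) for the integer offset that places the most scan points
    on occupied cells.
    """
    best = (0, 0, -1)
    for dx in range(-window, window + 1):
        for dy in range(-window, window + 1):
            score = sum((x + dx, y + dy) in occupied for x, y in scan)
            if score > best[2]:
                best = (dx, dy, score)
    return best
```

The returned offset is the correction that a loop-closure constraint would feed to the pose graph; the score indicates how much confidence to place in it.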

Cartographer operates on 2D LiDAR data (such as a planar slice from the Livox Mid-360) or full 3D point clouds; the 3D path uses a scan-to-model matching approach suited to the uneven point distribution that 3D LiDARs produce. A ROS 2 integration guide (ArduPilot Dev documentation, 2024) confirms Cartographer’s compatibility with ROS 2 Humble and later. [12] For Build 3, Cartographer running on the Jetson AGX Orin 64GB processes Livox Mid-360 point clouds and publishes an occupancy grid map consumed by Nav2’s static layer.

8.5.2 Multi-Modal SLAM with RTAB-Map

RTAB-Map (Real-Time Appearance-Based Mapping) is a graph-based SLAM library that supports RGB-D cameras, stereo cameras, and LiDAR as primary sensors, allowing multi-modal operation. [13] A 2024 paper at arxiv:2403.06341 describes its capabilities for large-scale, long-term online operation. [13]

RTAB-Map’s loop closure uses an appearance-based approach: a bag-of-words model built on an incrementally learned visual vocabulary encodes each image frame as a compact vector of visual words and computes its similarity to previously stored keyframes. When the loop-closure hypothesis exceeds a configurable threshold, the system adds a loop closure constraint to the pose graph and re-optimizes. The pose-graph back end is sensor-agnostic: the same optimization applies whether odometry and local geometry come from a depth camera or a LiDAR point cloud, although appearance-based detection itself needs a camera image stream.
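The thresholded similarity test can be sketched with plain word-count vectors. This is a simplification under stated assumptions: it uses cosine similarity on raw counts, whereas RTAB-Map weights words and filters hypotheses over time; the visual-word IDs, threshold value, and function names are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words count vectors (Counter objects)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def find_loop_closure(current_words, keyframes, threshold=0.8):
    """Return (index, score) of the most similar stored keyframe, or None if below threshold."""
    current = Counter(current_words)
    best_idx, best_score = None, 0.0
    for idx, words in enumerate(keyframes):
        score = cosine_similarity(current, Counter(words))
        if score > best_score:
            best_idx, best_score = idx, score
    return (best_idx, best_score) if best_score >= threshold else None
```

A frame that shares most of its visual words with an earlier keyframe returns that keyframe's index, which is the cue to add a constraint between the two poses.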

ROS 2 binary packages (ros-$ROS_DISTRO-rtabmap-ros, release 0.21.9 as of 2024) are available for Humble, Jazzy, and Kilted distributions. [13] RTAB-Map is the preferred SLAM choice for Build 2 on the OAK-D Pro, where the depth camera provides the RGB-D stream directly. The richer visual information allows RTAB-Map to re-localize in previously mapped areas without relying solely on geometric structure, which is advantageous in outdoor environments with sparse geometric features.

8.6 Navigation with Nav2

Nav2 is the ROS 2 navigation framework: “the professionally-supported successor of the ROS Navigation Stack deploying the same kinds of technology powering Autonomous Vehicles brought down, optimized, and reworked for mobile and surface robotics.” [1] At version 1.0.0, Nav2 is trusted by more than 100 companies for production robot deployments. [1]

Nav2 publishes velocity commands for differential-drive, holonomic, legged, and Ackermann (car-like) robots. For the RoboDog, the velocity output targets the CHAMP locomotion controller (Build 1) or the WBC torque planner (Builds 2 and 3).

8.6.1 Costmaps

Nav2 maintains two costmaps, each built from the same nav2_costmap_2d LayeredCostmap framework:

Global costmap: built from the SLAM-produced occupancy grid (static layer) plus an inflation layer that grows obstacle cells outward by the robot’s inscribed radius. The global planner uses this to find a collision-free path from the robot’s current pose to the goal.
Local costmap: a rolling window around the robot updated from live LiDAR and depth sensor data (obstacle layer, voxel layer). The local controller uses this to handle dynamic obstacles — people, vehicles — that are not in the static map.
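The effect of the inflation layer can be sketched on a small binary grid. This is a simplified illustration: Nav2's inflation layer assigns graded cost values that decay with distance, while the sketch below only marks cells as blocked or free. The function name and grid encoding (1 = obstacle, 0 = free) are assumptions.

```python
import math

def inflate(grid, radius_cells):
    """Return a copy of grid in which every cell within radius_cells of an obstacle is 1."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    r = int(math.ceil(radius_cells))
    for i in range(rows):
        for j in range(cols):
            if grid[i][j] != 1:
                continue
            # Stamp a disc of blocked cells around each original obstacle cell.
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ni, nj = i + di, j + dj
                    inside = 0 <= ni < rows and 0 <= nj < cols
                    if inside and math.hypot(di, dj) <= radius_cells:
                        out[ni][nj] = 1
    return out
```

Planning on the inflated grid lets the planner treat the robot as a point: any path through free cells keeps the robot's body clear of the original obstacles.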

8.6.2 Global Planners and Local Controllers

Nav2’s architecture is fully plugin-based; planners and controllers are selected in nav2_params.yaml:

Table 1 — Nav2 planner and controller plugins, selected in nav2_params.yaml

Plugin	Role	Recommended use case
NavFn	A* / Dijkstra global planner	Structured indoor environments
Smac Planner (Hybrid-A*)	Kinodynamically feasible global planner	Outdoor environments, large maps
DWB (Dynamic Window Approach)	Local controller	Low-CPU environments; flat terrain
MPPI (Model Predictive Path Integral)	Local controller	Dynamic obstacle environments; smooth trajectories
Regulated Pure Pursuit	Local controller	Narrow corridors; precise path tracking

For the RoboDog program, Smac Planner is recommended for outdoor patrol routes and MPPI for the local controller on Builds 2 and 3, where the Jetson Orin NX (Build 2) and Jetson AGX Orin (Build 3) have the compute budget to run MPPI’s parallel trajectory sampling. [14]

8.6.3 Behavior Trees

Nav2’s BT Navigator exposes NavigateToPose and NavigateThroughPoses action servers and delegates all navigation logic to a configurable XML behavior tree. [1] The behavior tree allows composing arbitrarily complex navigation behaviors from standard nodes: ComputePathToPose, FollowPath, ClearEntireCostmap, RecoveryNode, Spin, Wait. A patrol robot’s behavior tree might be structured as:

Sequence
├─ ComputePathToWaypoint[N]
├─ FollowPath
├─ CheckDetectionAlerts
└─ ReturnToDockIfBatteryLow
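The control flow of such a tree can be sketched in plain Python. This is a minimal illustration, not Nav2's implementation: Nav2 trees are written in XML and executed by a C++ behavior tree engine that also supports a RUNNING status, which is omitted here. The leaf functions and the `goal_reachable` flag are hypothetical.

```python
SUCCESS, FAILURE = "SUCCESS", "FAILURE"

def sequence(*children):
    """Tick children in order; stop and fail on the first FAILURE."""
    def tick(state):
        for child in children:
            if child(state) == FAILURE:
                return FAILURE
        return SUCCESS
    return tick

def fallback(*children):
    """Tick children in order; stop and succeed on the first SUCCESS."""
    def tick(state):
        for child in children:
            if child(state) == SUCCESS:
                return SUCCESS
        return FAILURE
    return tick

def compute_path(state):
    state["log"].append("compute_path")
    return SUCCESS if state["goal_reachable"] else FAILURE

def clear_costmap(state):
    state["log"].append("clear_costmap")
    state["goal_reachable"] = True  # pretend stale obstacles were blocking the goal
    return SUCCESS

def follow_path(state):
    state["log"].append("follow_path")
    return SUCCESS

# Plan; if planning fails, clear the costmap and plan again; then follow the path.
patrol_leg = sequence(
    fallback(compute_path, sequence(clear_costmap, compute_path)),
    follow_path,
)
```

Calling `patrol_leg({"goal_reachable": False, "log": []})` first fails to plan, recovers by clearing the costmap, replans, and then follows the path, which is the recovery pattern that behavior trees make easy to express.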

The nav2_waypoint_follower server, included in Nav2, accepts a list of poses and executes each in order, with a configurable task-execution plugin (e.g., photograph, wait, check alert) at each waypoint stop.

8.7 Perception and Edge-AI

The patrol mission requires the robot to identify people and vehicles in real time and report detections through a WiFi link. This inference runs on the Jetson Orin NX 16GB (Build 2) or Jetson AGX Orin 64GB (Build 3), using NVIDIA’s TensorRT and DeepStream SDK to achieve real-time throughput within the platform’s power budget.

Figure 3 — Simulated security-camera view showing real-time person detection by a YOLO model. Green bounding boxes are drawn around detected persons with confidence scores (0.87, 0.94). This inference class, person and vehicle detection at 52–65 FPS on the Jetson Orin NX 16GB, is the target for the RoboDog perception pipeline. Generated with Ideogram 3.0. (Prompt + seed in fig-vol8-yolo-detection.jpg.prompt.json.)

8.7.1 YOLO on Jetson Orin NX

YOLO (You Only Look Once) is a single-stage object detection architecture that processes a full image in one forward pass, delivering inference speeds that competing two-stage detectors cannot match at the same accuracy. The current family, YOLOv8 and its successors (Ultralytics), provides nano-through-extra-large size variants tunable to the available compute budget.

Benchmark (primary source): A 2025 peer-reviewed benchmark study (MDPI Computers, vol. 15, no. 2, article 74) evaluated multiple YOLOv8 variants on the Jetson Orin NX platform. [15] YOLOv8n with TensorRT achieved 52 FPS at FP16 precision and up to 65 FPS at INT8 on the Jetson Orin NX. TensorRT INT8-quantized models outperformed PyTorch FP32 baselines by approximately 17.7% at batch size 2 (from the paper’s abstract/excerpt; full text paywalled). [15]

Second-source cross-check: An NVIDIA Developer Forum thread (2024) documents YOLOv8s inference on the Jetson Orin NX 16GB at 18.9 ms per frame (≈ 53 FPS) at TensorRT FP16 for an 832×832 input resolution. [16] Because YOLOv8s is a larger model than YOLOv8n, YOLOv8n should run faster at the same resolution. The two sources are consistent: the performance range for YOLOv8 nano-to-small on the Orin NX with TensorRT is approximately 52–65 FPS, hardware- and quantization-dependent.

For context on the latest YOLO generation: Ultralytics documentation (accessed 2026-06-19) reports YOLO26n inference at 4.13 ms (≈ 242 FPS) on the Jetson Orin NX 16GB at TensorRT FP16, demonstrating how far inference efficiency has advanced across model generations. [17]
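The frame-rate figures quoted in this section follow directly from per-frame latency. The short sketch below reproduces that arithmetic; the helper names are illustrative, and the inputs are the latency values cited above.

```python
def fps_from_latency_ms(latency_ms):
    """Frames per second achievable when each inference takes latency_ms milliseconds."""
    return 1000.0 / latency_ms

def frame_budget_ms(camera_fps):
    """Time available per frame, in milliseconds, for a camera running at camera_fps."""
    return 1000.0 / camera_fps

yolov8s_fps = fps_from_latency_ms(18.9)   # forum-reported YOLOv8s latency
yolo26n_fps = fps_from_latency_ms(4.13)   # documentation-reported YOLO26n latency
budget_30fps = frame_budget_ms(30.0)      # about 33.3 ms per frame
```

Comparing an inference latency against `frame_budget_ms` for the camera's frame rate is a quick way to check whether a model keeps up with a given stream.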

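At a sustained 52 FPS, the Jetson Orin NX runs the YOLOv8n inference loop about 1.7 times faster than a standard 30 fps video stream requires. Detecting several classes (person, vehicle, animal) adds no extra passes, because a single YOLO forward pass scores every class. A dual-camera front+rear configuration at 30 fps per camera needs 60 inferences per second, which the FP16 figure does not cover; it fits only within the 65 FPS INT8 figure or at a reduced per-camera frame rate.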

8.7.2 TensorRT and DeepStream SDK

TensorRT is NVIDIA’s inference optimization library. It applies layer fusion (combining adjacent operations into single GPU kernels), precision calibration (mapping FP32 weights to FP16 or INT8 with minimal accuracy loss), and hardware-specific kernel auto-tuning at engine build time to produce a .engine binary optimized for the exact GPU silicon on the target device. [17] A YOLO model exported from PyTorch via ONNX is converted to a TensorRT engine once and executed thousands of times per patrol session without re-compilation.

DeepStream SDK is NVIDIA’s GStreamer-based streaming analytics toolkit for end-to-end inference pipelines. [18] The canonical DeepStream pipeline for the RoboDog is:

NVDEC (hardware video decoder): decodes H.264/H.265 camera stream from the OAK-D Pro at minimal CPU load.
gst-nvstreammux: batches decoded frames from one or more camera sources.
gst-nvinfer: runs the TensorRT YOLO engine on the batched frames; publishes detection metadata.
gst-nvtracker: assigns persistent tracking IDs to detected objects across frames, enabling “person entered zone” logic.
ROS 2 bridge: the deepstream_ros2_bridge package (NVIDIA AI IoT GitHub) publishes detection messages to a ROS 2 topic, making detections available to the patrol behavior manager. [18]

8.8 Patrol Behaviors

The patrol behavior layer is the top-level orchestrator: it commands Nav2 to execute waypoint routes, monitors detection events from the perception pipeline, decides when to alert, and manages the return-to-charge cycle. In the RoboDog program, this logic lives in a dedicated ROS 2 node (patrol_manager) that subscribes to detection topics and publishes Nav2 action goals.

8.8.1 Waypoint Patrol Routes

A patrol route is a YAML file listing a sequence of GPS-anchored or map-frame poses:

patrol_route:  # x, y in meters (map frame); heading in radians
  - {x: 12.4, y: 3.1, heading: 0.0}    # Front gate
  - {x: 24.0, y: 8.5, heading: 1.57}   # Corner camera post
  - {x: 5.2, y: 22.0, heading: 3.14}   # Rear fence line

The patrol_manager node feeds these poses to Nav2’s NavigateThroughPoses action in sequence, cycling indefinitely. Between waypoints, the robot targets cruise speeds of approximately 0.8–1.2 m/s on flat hardscape (Build 2 design target) or up to 1.5 m/s on hardscape with terrain-adaptive gait (Build 3 design target). At each waypoint stop, the manager executes a configurable dwell task (e.g., pan-tilt-zoom camera sweep, thermal sensor scan).
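The endless cycling over waypoints, and the conversion from a planar heading to the quaternion that a ROS 2 pose message carries, can be sketched as follows. The route values mirror the example above; the function names are hypothetical, and the sketch stops short of sending anything to Nav2.

```python
import itertools
import math

ROUTE = [
    {"x": 12.4, "y": 3.1, "heading": 0.0},    # Front gate
    {"x": 24.0, "y": 8.5, "heading": 1.57},   # Corner camera post
    {"x": 5.2, "y": 22.0, "heading": 3.14},   # Rear fence line
]

def yaw_to_quaternion(yaw):
    """Planar heading in radians -> quaternion (x, y, z, w) for a rotation about z."""
    return (0.0, 0.0, math.sin(yaw / 2.0), math.cos(yaw / 2.0))

def patrol_goals(route):
    """Endless generator of (x, y, quaternion) goals that loops back to the first waypoint."""
    for wp in itertools.cycle(route):
        yield wp["x"], wp["y"], yaw_to_quaternion(wp["heading"])
```

A manager node would pull the next goal from `patrol_goals(ROUTE)` each time the previous navigation action reports success.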

For Build 3, the real-time kinematic (RTK) GPS receiver (ArduSimple RTK module, introduced in Volume 7) anchors waypoints to geodetic coordinates, allowing the patrol route to survive a power cycle without re-teaching. The Nav2 waypoint follower’s WaypointTaskExecutor plugin is extended with a GPSWaypointTaskPlugin that converts lat/lon/alt to map-frame poses using the EKF GPS fusion node.
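The geometry behind converting a geodetic waypoint to a local offset can be sketched with a flat-earth approximation. This is an illustration only, adequate over the few hundred meters of a patrol site; it is not how the EKF GPS fusion node performs the transform, and the function name and origin convention are assumptions.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius

def latlon_to_local(lat, lon, origin_lat, origin_lon):
    """Equirectangular approximation: (east, north) offset in meters from a fixed origin."""
    east = (math.radians(lon - origin_lon) * EARTH_RADIUS_M
            * math.cos(math.radians(origin_lat)))
    north = math.radians(lat - origin_lat) * EARTH_RADIUS_M
    return east, north
```

One thousandth of a degree of latitude comes out at roughly 111 m, which is a handy sanity check when entering waypoints by hand.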

8.8.2 Detection Alerting

When the YOLO detection node publishes a detection above a configurable confidence threshold (example configuration defaults: 0.75 for person, 0.70 for vehicle), the patrol_manager triggers an alert sequence:

Halt and track: Nav2 is preempted; the robot stops, optionally rotates to face the detected object, and holds position for a configurable dwell period (default: 10 s).
Evidence capture: the OAK-D Pro (Build 2) or the thermal+visible dual-camera array (Build 3) records a timestamped image burst to onboard SSD.
Push notification: the patrol_alerter node publishes the detection event (timestamp, object class, confidence, camera image URI, GPS position) to an MQTT broker over the WiFi link, which triggers a push notification on the owner’s mobile device.
Resume: the patrol_manager resumes the interrupted waypoint sequence.
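The thresholding and the alert message from the sequence above can be sketched without any network code. The thresholds are the example defaults quoted in the text; the detection dictionary layout, field names, and function names are assumptions, and the resulting JSON string is what a node would hand to an MQTT client.

```python
import json

# Example defaults from the text; per-class confidence thresholds.
THRESHOLDS = {"person": 0.75, "vehicle": 0.70}

def should_alert(det):
    """True when the detection's class is monitored and its confidence meets the threshold."""
    thr = THRESHOLDS.get(det["label"])
    return thr is not None and det["confidence"] >= thr

def alert_payload(det, stamp, position, image_uri):
    """Serialize one alert as JSON, ready to pass to an MQTT client's publish call."""
    return json.dumps({
        "timestamp": stamp,
        "class": det["label"],
        "confidence": det["confidence"],
        "image_uri": image_uri,
        "position": position,
    }, sort_keys=True)
```

Keeping the decision (`should_alert`) separate from the serialization makes the thresholds easy to unit-test without a broker or a camera.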

8.8.3 Return-to-Charge

Build 3 introduces an unattended return-to-dock cycle managed entirely in software. The battery_monitor node subscribes to the cell-voltage topic from the BMS over CAN and computes a state-of-charge (SoC) estimate. When SoC falls below a configurable threshold (default: 20%), the patrol_manager publishes a ReturnToDock behavior tree goal. Nav2 plans a path to the dock pose; within approximately 1.5 m of the dock, an AprilTag-guided docking approach takes over:

The OAK-D Pro depth camera detects the AprilTag marker on the dock (tag family: 36h11).
A proportional-derivative visual servoing controller drives the robot toward the dock’s alignment pose with centimeter-level precision.
On contact, the spring-loaded charging pin connector closes; the BMS begins charging.

The return-to-charge cycle operates without operator intervention. Patrol resumes automatically when SoC exceeds a configurable resume threshold (default: 80%).
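The two thresholds form a hysteresis band, which keeps the robot from oscillating between patrol and charging near a single cutoff. A minimal sketch of that decision follows, using the default values quoted above; the mode names and function name are assumptions.

```python
def next_mode(mode, soc, low=20.0, resume=80.0):
    """Hysteresis on state of charge (percent): 'patrol' -> 'charge' below low,
    'charge' -> 'patrol' above resume, otherwise stay in the current mode."""
    if mode == "patrol" and soc < low:
        return "charge"
    if mode == "charge" and soc > resume:
        return "patrol"
    return mode
```

Because the robot stays in `charge` anywhere between 20% and 80%, a brief voltage recovery after the motors stop does not send it back out on patrol prematurely.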

8.9 How the Stack Grows — Teleop to Full Autonomy

The RoboDog build ladder is not just a hardware progression; each tier unlocks a specific software capability tier. The mapping across tiers follows a deliberate escalation:

Table 2 — Software capability unlocked at each build tier

Stack layer	Build 1 (FDM hobby, Vol 5)	Build 2 (Mid machined, Vol 6)	Build 3 (Full-CNC finale, Vol 7)
Platform	Raspberry Pi 5	Jetson Orin NX 16GB	Jetson AGX Orin 64GB
Locomotion	CHAMP quasi-static crawl	CHAMP trot + optional RL policy	RL policy (Isaac Lab trained)
State estimation	Kinematic odometry only	EKF (IMU + kinematics)	EKF + MHE (IMU + kinematics + GPS)
Mapping	None	RTAB-Map (OAK-D Pro)	Cartographer (Livox Mid-360) + RTAB-Map
Navigation	Teleoperation only	Nav2 (basic waypoints, obstacle avoidance)	Nav2 (full patrol, GPS-anchored waypoints)
Perception	None	YOLOv8n (Orin NX, TensorRT)	YOLOv8 + thermal (dual-camera, TensorRT + DeepStream)
Patrol behaviors	None	Supervised route-following	Unsupervised scheduled patrol, auto-return, auto-dock
Autonomy level	Teleop	Assisted autonomy	Full autonomy

8.9.1 Build 1 — Teleop Only

The software stack on Build 1 is deliberately minimal: a joy_node reads the gamepad over Bluetooth and publishes to /cmd_vel; CHAMP’s gait controller subscribes and commands the PCA9685. No SLAM, no Nav2, no perception — the operator drives the robot at all times. The value of this tier is hardware commissioning: verifying that all twelve servos respond, that CHAMP’s inverse kinematics resolves cleanly for the frame geometry, and that the gait controller can be tuned without a deadline from a more demanding software stack.

8.9.2 Build 2 — Assisted Autonomy

Build 2 adds the Jetson Orin NX 16GB and with it the compute budget to run four significant additions in parallel: RTAB-Map for online mapping, Nav2 for waypoint navigation with real-time obstacle avoidance, YOLOv8n for person/vehicle detection at 52–65 FPS [15], and the CHAMP trot gait at higher stride frequency enabled by the QDD actuators. The operator sets waypoints and initiates patrol sessions; the robot executes the route and avoids obstacles autonomously. The operator receives push alerts when detections occur. The distinction from full autonomy is that the operator initiates each session and must physically return the robot to charge.

8.9.3 Build 3 — Full Autonomy

Build 3 closes every remaining gap: scheduled patrol without operator initiation, GPS-anchored waypoints that survive power cycles, the RL locomotion policy that handles rough terrain and stairs, the Livox Mid-360 for reliable LiDAR SLAM in all lighting conditions, the thermal camera for after-dark detection, and the auto-dock cycle that sustains multi-day unattended operation. The Jetson AGX Orin 64GB (275 TOPS of AI compute [20]) runs all these simultaneously: Cartographer SLAM, Nav2 with MPPI controller, YOLO + thermal inference through DeepStream, the EKF + MHE state estimator at 200 Hz, the WBC torque planner at 500 Hz (design target; consistent with the QDD inner-loop rate established above), and the patrol_manager behavior tree at 10 Hz. The software stack that operates on Build 3 is the full autonomy stack described throughout this volume.

8.10 Simulation Tools

Simulation plays two roles in the RoboDog program: controller development and commissioning verification before putting hardware at risk, and RL policy training at scale.

8.10.1 Gazebo Sim

Gazebo Sim (formerly Ignition Gazebo; renamed by Open Robotics in April 2022) is the standard open-source robot simulator for ROS 2. [19] A URDF robot description plus an SDF world file are sufficient to spawn a simulated RoboDog with modeled joint dynamics, a LiDAR sensor plugin, a depth-camera plugin, and a ground-truth odometry plugin. The ros_gz_bridge package translates between ROS 2 topics and Gazebo Transport topics, allowing the same Nav2 launch file to run against a simulated robot or the physical hardware with only a parameter change. Gazebo Classic reached end-of-life in January 2025; Gazebo Sim (Garden, Harmonic, Ionic releases) is the supported replacement. [19]

CHAMP’s official repository includes a Gazebo simulation environment that runs the gait controller and lets the robot walk in simulation before mechanical assembly is complete. This capability is used throughout Build 1’s commissioning: leg geometry, IK solutions, and contact patterns are verified in Gazebo before any power is applied to physical servos. The CHAMP Gazebo setup works with both the ROS 2 Humble and Jazzy Jalisco distributions. [5]

8.10.2 Isaac Lab (successor to Isaac Gym)

For RL locomotion policy training, the simulation tool of record is Isaac Lab, NVIDIA’s GPU-accelerated robot learning framework. [8] Isaac Lab runs thousands of parallel physics simulations on a single GPU — an NVIDIA A100 or H100 in the cloud, or the owner’s workstation GPU for lighter-duty training jobs — enabling a locomotion policy to accumulate millions of simulation steps per hour. The same terrain curriculum used to train Spot’s locomotion policy in the Boston Dynamics + NVIDIA collaboration [9] can be adapted to the AK80-64 actuator parameters and Build 3’s mass budget.

The Isaac Lab-to-hardware deployment path for the RoboDog program follows the pattern established by leggedrobotics/legged_gym and its Isaac Lab successor [6][8]:

Train a PPO or SAC policy on a randomized terrain curriculum in Isaac Lab.
Export the trained policy network to ONNX, then convert to a TensorRT engine for the target Jetson platform.
Deploy the TensorRT engine in a lightweight policy_runner ROS 2 node that reads proprioceptive observations at 200 Hz (a typical proprioceptive loop rate for legged RL deployments) and publishes joint-torque targets to the CAN-bus actuator driver node.
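The core of the third step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the program's policy_runner implementation: the 12-joint layout, the observation ordering, and the torque limit are hypothetical, and the policy argument is a stand-in callable where a real node would call the TensorRT engine from a 200 Hz ROS 2 timer.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

NUM_JOINTS = 12          # assumption: 4 legs x 3 actuated joints
CONTROL_RATE_HZ = 200.0  # loop rate quoted in the text
TORQUE_LIMIT_NM = 20.0   # hypothetical placeholder, not a datasheet value


@dataclass
class Proprioception:
    """One snapshot of onboard sensing (hypothetical layout)."""
    base_ang_vel: Sequence[float]  # 3 values from the IMU gyro, rad/s
    gravity_dir: Sequence[float]   # 3 values, gravity direction in body frame
    joint_pos: Sequence[float]     # NUM_JOINTS encoder angles, rad
    joint_vel: Sequence[float]     # NUM_JOINTS encoder velocities, rad/s


def build_observation(p: Proprioception, last_action: Sequence[float]) -> List[float]:
    """Flatten sensing plus the previous action into one policy input vector."""
    return [*p.base_ang_vel, *p.gravity_dir, *p.joint_pos, *p.joint_vel, *last_action]


def policy_step(
    policy: Callable[[List[float]], Sequence[float]],
    p: Proprioception,
    last_action: Sequence[float],
) -> List[float]:
    """Run one inference and clamp the result to the torque limit."""
    raw = policy(build_observation(p, last_action))
    if len(raw) != NUM_JOINTS:
        raise ValueError(f"policy returned {len(raw)} values, expected {NUM_JOINTS}")
    return [max(-TORQUE_LIMIT_NM, min(TORQUE_LIMIT_NM, t)) for t in raw]


if __name__ == "__main__":
    # Stand-in policy: a damping term on joint velocity, used only to
    # exercise the loop. A deployed node would run the exported network.
    def damping_policy(obs: List[float]) -> List[float]:
        joint_vel = obs[6 + NUM_JOINTS: 6 + 2 * NUM_JOINTS]
        return [-0.5 * v for v in joint_vel]

    state = Proprioception([0.0] * 3, [0.0, 0.0, -1.0],
                           [0.0] * NUM_JOINTS, [1.0] * NUM_JOINTS)
    torques = policy_step(damping_policy, state, [0.0] * NUM_JOINTS)
    print(f"period: {1000.0 / CONTROL_RATE_HZ:.1f} ms, torques: {torques}")
```

Clamping the policy output before it reaches the actuator driver is a defensive choice in this sketch: a network that behaves well in simulation can still produce out-of-range commands on hardware, and a hard limit in the runner keeps that failure bounded.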

The combination of Isaac Lab’s training environment and the Jetson AGX Orin’s 275 TOPS inference capacity [20] creates a complete sim-to-real pipeline that is proven at the Spot-class level and available as open tooling for custom builds.

Sources

[1] Nav2 Documentation — Nav2 1.0.0 — https://docs.nav2.org/ (accessed 2026-06-19)
[2] ROS 2 Jazzy Jalisco Release Notes — https://docs.ros.org/en/rolling/Releases/Release-Jazzy-Jalisco.html (accessed 2026-06-19)
[3] ROS 2 Middleware Implementations (DDS) — https://docs.ros.org/en/rolling/Concepts/Advanced/About-Middleware-Implementations.html (accessed 2026-06-19)
[4] ROS 2 vs ROS 1 Performance Overview — https://www.roboticsunveiled.com/ros2-ipc-dds-topics-services-actions-interfaces/ (accessed 2026-06-19)
[5] CHAMP Quadruped Controller — GitHub chvmp/champ — https://github.com/chvmp/champ (accessed 2026-06-19)
[6] legged_gym — Isaac Gym Environments for Legged Robots — GitHub leggedrobotics/legged_gym — https://github.com/leggedrobotics/legged_gym (accessed 2026-06-19)
[7] Rudin, N. et al., “Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning,” arXiv:2109.11978, 2021 — https://arxiv.org/abs/2109.11978 (accessed 2026-06-19)
[8] NVIDIA Isaac Lab — https://developer.nvidia.com/isaac/lab (accessed 2026-06-19); GitHub: https://github.com/isaac-sim/IsaacLab (accessed 2026-06-19)
[9] “Closing the Sim-to-Real Gap: Training Spot Quadruped Locomotion with NVIDIA Isaac Lab” — NVIDIA Technical Blog — https://developer.nvidia.com/blog/closing-the-sim-to-real-gap-training-spot-quadruped-locomotion-with-nvidia-isaac-lab/ (accessed 2026-06-19)
[10] Hartley, R. et al., “Fast Decentralized State Estimation for Legged Robot Locomotion via EKF and MHE,” arXiv:2405.20567, 2024 — https://arxiv.org/html/2405.20567v1 (accessed 2026-06-19)
[11] Hess, W. et al., “Real-Time Loop Closure in 2D LIDAR SLAM,” ICRA 2016 — https://www.researchgate.net/publication/303886149_Real-time_loop_closure_in_2D_LIDAR_SLAM (accessed 2026-06-19)
[12] Cartographer SLAM with ROS 2 — ArduPilot Dev Docs — https://ardupilot.org/dev/docs/ros2-cartographer-slam.html (accessed 2026-06-19)
[13] Labbé, M. & Michaud, F., “RTAB-Map as an Open-Source Lidar and Visual SLAM Library for Large-Scale and Long-Term Online Operation,” arXiv:2403.06341, 2024 — https://arxiv.org/abs/2403.06341 (accessed 2026-06-19)
[14] Nav2 MPPI Controller Documentation (Jazzy) — https://docs.ros.org/en/jazzy/p/nav2_mppi_controller/ (accessed 2026-06-19)
[15] Farouk, A. et al., “Benchmarking YOLOv8 Variants for Object Detection Efficiency on Jetson Orin NX for Edge Computing Applications,” Computers, 2025, 15(2), 74 — https://www.mdpi.com/2073-431X/15/2/74 (accessed 2026-06-19)
[16] NVIDIA Developer Forums — “Understanding Real-World Latency vs. Theoretical Estimates on Jetson Orin NX for YOLOv8s” — https://forums.developer.nvidia.com/t/nderstanding-real-world-latency-vs-theoretical-estimates-on-jetson-orin-nx-for-yolov8s/308749 (accessed 2026-06-19)
[17] Ultralytics — Quick Start Guide: NVIDIA Jetson with Ultralytics YOLO — https://docs.ultralytics.com/guides/nvidia-jetson (accessed 2026-06-19)
[18] Ultralytics — YOLO on NVIDIA Jetson using DeepStream SDK and TensorRT — https://docs.ultralytics.com/guides/deepstream-nvidia-jetson (accessed 2026-06-19)
[19] Gazebo Sim (formerly Ignition Gazebo) — Official site: https://gazebosim.org/ (accessed 2026-06-19); Open Robotics announcement of rename, April 2022
[20] NVIDIA Jetson AGX Orin Series Technical Brief, July 2022 — https://www.nvidia.com/content/dam/en-zz/Solutions/gtcf21/jetson-orin/nvidia-jetson-agx-orin-technical-brief.pdf (accessed 2026-06-19)