Report 1: Active Perception for Accurate Object Localization and Navigation
Robot: TurtleBot4 · Stack: ROS 2, Nav2, visual odometry, RGB-D perception, next-best-view (NBV)
Table of Contents
- Table of Contents
- 1. Mission Statement & Scope
- 2. Technical Specifications
- 3. High-Level System Architecture
- 4. Module Intent
- 5. Safety & Operational Protocol
- 6. Git Infrastructure
1. Mission Statement & Scope
1.1 Mission Statement
The goal of this project is to develop an autonomous mobile robot system capable of accurately localizing a target object using RGB-D perception and actively improving this estimate through motion. The TurtleBot4 will estimate the target object’s ground-plane pose relative to the robot and compute a confidence metric representing the reliability of the estimate.
Using an active perception loop, the system will determine the next-best viewpoint that is expected to reduce pose uncertainty. The robot will autonomously navigate to these viewpoints while avoiding obstacles, using the ROS 2 Nav2 navigation stack or a reactive controller, until a desired confidence threshold is achieved.
1.2 Scope
| In scope | Out of scope |
|---|---|
| Localization of a single target object (e.g., box or cylinder) on the ground plane | Multi-object simultaneous tracking |
| Indoor navigation with static and dynamic obstacles | Outdoor or unstructured terrain |
| Next-best-view selection based on confidence/uncertainty | Full 6-DOF object pose or manipulation |
| Nav2 for path planning and obstacle avoidance | Developing a custom SLAM or navigation framework |
1.3 Success State (Measurable)
The system will be considered successful if the following conditions are met:
- The robot can estimate the ground-plane pose of a target object using RGB-D data.
- The system can evaluate pose confidence and select a next-best viewpoint to improve localization accuracy.
- The robot autonomously navigates between viewpoints while avoiding obstacles.
- The pose estimate converges to a stable solution within a predefined confidence threshold.
1.4 Environment Description
- Indoor hallway/room.
- Target objects (e.g., boxes, cylinders) on the ground.
- Static obstacles (furniture, walls) and optional dynamic obstacles (e.g., pedestrians).
2. Technical Specifications
2.1 Robot Platform
- Platform: TurtleBot 4.
- Base: Differential drive.
- Onboard sensors: RGB-D camera, LiDAR, IMU.
2.2 Kinematic Model
- Model: Differential drive.
- State: (x, y, θ) on the ground plane; forward kinematics from wheel velocities.
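The forward-kinematics step above can be sketched as follows. This is a minimal illustration, not the ROS 2 Control implementation; the wheel radius and wheel separation values used in the test are illustrative, not calibrated TurtleBot4 parameters.

```python
import math

def diff_drive_step(x, y, theta, w_l, w_r, r, L, dt):
    """One Euler-integration step of differential-drive forward kinematics.

    w_l, w_r: left/right wheel angular velocities [rad/s]
    r: wheel radius [m], L: wheel separation [m]
    """
    v = r * (w_r + w_l) / 2.0      # body-frame linear velocity
    omega = r * (w_r - w_l) / L    # body-frame angular velocity
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += omega * dt
    return x, y, theta
```

Equal wheel speeds produce pure translation along the current heading; opposite speeds produce rotation in place.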
2.3 Perception Stack
| Component | Role |
|---|---|
| RGB-D camera | Depth + color; point cloud and images |
| LiDAR | 2D scan for Nav2 costmaps and obstacle detection; fused with camera depth for more reliable depth estimation |
| IMU | Odometry / orientation support |
3. High-Level System Architecture
3.1 Data Flow Diagram (Perception → Estimation → Planning → Actuation)
```mermaid
flowchart LR
  subgraph Perception
    RGBD[RGBD Camera]
    LIDAR[LiDAR]
    IMU[IMU]
  end
  subgraph ObjectPerception
    PCP[Point Cloud]
    OPE[Object Pose]
  end
  subgraph RobotLocalization
    VO[Visual Odometry]
    EKF[EKF]
  end
  subgraph Planning
    CONF[Confidence Evaluation]
    NBV[Next Best View]
    NAV2[Nav2 Global Planner]
    RC[Reactive Controller]
  end
  subgraph Actuation
    DDC[Diff Drive Controller]
    MHI[Motor Hardware Interface]
  end
  %% Perception → Object perception
  RGBD --> PCP
  PCP --> OPE
  %% Perception → Localization
  RGBD --> VO
  IMU --> EKF
  VO --> EKF
  %% Estimation to planning
  OPE --> CONF
  CONF --> NBV
  EKF --> NBV
  %% Planning to actuation
  NBV --> NAV2
  NAV2 --> RC
  LIDAR --> RC
  RC --> DDC
  DDC --> MHI
  %% Styles
  style Perception fill:#ffe6e6,stroke:#333,stroke-width:2px
  style ObjectPerception fill:#fff2cc,stroke:#333,stroke-width:2px
  style RobotLocalization fill:#fff2cc,stroke:#333,stroke-width:2px
  style Planning fill:#e6e6ff,stroke:#333,stroke-width:2px
  style Actuation fill:#d9f2d9,stroke:#333,stroke-width:2px
```
3.2 Module Declaration Table
| Module / Node | Function Domain | Software Type | Description | Owner |
|---|---|---|---|---|
| RGBD Camera + LiDAR | Perception | Library | Provides RGB images, depth data, and LiDAR range measurements used for perception and obstacle detection. | ROS2 Driver |
| IMU | Estimation | Library | Provides inertial measurements used for robot motion estimation and fusion with visual odometry. | ROS2 Driver |
| Object Pose Estimation | Perception | Custom (Course Algorithm) | Estimates the ground-plane pose (x, y, yaw) of the target object from the segmented point cloud generated from RGB-D data. | Mohammad |
| Visual Odometry | Estimation | Library / Custom Integration | Tracks visual features between frames to estimate robot motion relative to the environment. | Vikas |
| EKF | Estimation | Library | Fuses IMU and visual odometry data to produce a filtered estimate of robot pose. | Vikas |
| Next Best View | Planning | Custom | Determines the next viewpoint that maximizes expected improvement in object pose accuracy. | Mohammad |
| Nav2 Global Planner | Planning | Library | Generates a collision-free global path from the robot’s current pose to the selected viewpoint. | Nav2 |
| Reactive Controller | Planning | Library | Performs local obstacle avoidance and trajectory tracking using LiDAR data. | Nav2 |
| Diff Drive Controller | Actuation | Library | Converts velocity commands into wheel commands for the differential drive robot. | ROS2 Control |
| Motor Hardware Interface | Actuation | Library | Interface between controller outputs and the TurtleBot4 motor hardware. | ROS2 Control |
4. Module Intent
4.1 Library Modules
RGB-D Camera Driver
This module provides synchronized RGB images and depth measurements from the TurtleBot4 camera. The depth stream is used to generate point clouds for object localization while RGB images support visual odometry.
LiDAR Driver
This module publishes laser scan data used for obstacle detection and local navigation. The LiDAR measurements are used by the reactive controller and the Nav2 stack to detect obstacles and maintain safe navigation.
IMU Driver
This module provides inertial measurements including angular velocity and linear acceleration. These measurements are fused with visual odometry in the EKF to produce a stable estimate of the robot’s motion.
EKF
This module fuses IMU measurements and visual odometry to produce a filtered estimate of the robot’s pose. The resulting state estimate improves localization stability for navigation and planning.
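The fusion idea can be illustrated with a deliberately reduced one-state filter that predicts yaw from the IMU gyro and corrects it with a visual-odometry yaw measurement. This is a sketch only: the noise values `q` and `r` are hypothetical tuning parameters, and the actual EKF (e.g., from `robot_localization`) fuses the full pose, not just yaw.

```python
class YawEKF:
    """Minimal one-state EKF: predict yaw with the IMU gyro, correct with VO."""

    def __init__(self, q=1e-3, r=1e-2):
        self.theta = 0.0   # yaw estimate [rad]
        self.P = 1.0       # estimate variance
        self.q = q         # process noise added per prediction
        self.r = r         # VO yaw measurement noise

    def predict(self, gyro_z, dt):
        """Propagate yaw with the measured angular rate."""
        self.theta += gyro_z * dt
        self.P += self.q   # uncertainty grows during dead reckoning

    def update(self, vo_yaw):
        """Correct the prediction with a visual-odometry yaw measurement."""
        K = self.P / (self.P + self.r)           # Kalman gain
        self.theta += K * (vo_yaw - self.theta)  # weighted correction
        self.P *= (1.0 - K)                      # uncertainty shrinks
```

The same predict/update structure carries over to the full-state filter; only the dimensions of the state, Jacobians, and covariances change.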
Nav2 Global Planner
This module computes a collision-free path from the robot’s current pose to the desired viewpoint goal. It uses the global map and costmaps to generate safe navigation routes.
Reactive Controller (Nav2 Local Planner)
This module tracks the planned path while reacting to nearby obstacles using LiDAR data. It generates real-time velocity commands that safely guide the robot along the planned trajectory.
Diff Drive Controller (ROS2 Control)
This controller converts velocity commands into wheel commands for the TurtleBot4 differential drive system. It ensures that motion commands are correctly translated into left and right wheel velocities.
Motor Hardware Interface (ROS2 Control)
This module interfaces the ROS2 control framework with the TurtleBot4 motor hardware. It sends the wheel commands to the motors and reads back hardware state information.
4.2 Custom Modules
4.2.1 Active Perception
Object Pose Estimation from RGB-D (ground-plane x, y, yaw)
This module estimates the ground-plane pose of the target object using RGB-D point cloud data. The algorithm follows the point cloud processing pipeline introduced in the course: voxel grid filtering to downsample the cloud, RANSAC plane segmentation to remove the floor, and Euclidean clustering to isolate the object. The centroid of the resulting cluster is used to compute the object’s planar position (x, y), while the dominant orientation of the cluster is used to estimate the yaw angle.
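The final centroid-plus-orientation step can be sketched as below, assuming the floor has already been removed and the object cluster projected onto the ground plane. The principal axis of the 2D covariance stands in for the "dominant orientation" described above; note it is ambiguous by 180°.

```python
import numpy as np

def cluster_pose_2d(points_xy):
    """Estimate (x, y, yaw) of an object cluster on the ground plane.

    points_xy: (N, 2) array of cluster points. The centroid gives the
    planar position; the principal axis of the covariance gives the
    dominant direction, whose angle is taken as yaw (mod pi).
    """
    pts = np.asarray(points_xy, dtype=float)
    centroid = pts.mean(axis=0)
    cov = np.cov((pts - centroid).T)
    eigvals, eigvecs = np.linalg.eigh(cov)
    major = eigvecs[:, np.argmax(eigvals)]   # dominant spread direction
    yaw = float(np.arctan2(major[1], major[0]))
    return float(centroid[0]), float(centroid[1]), yaw
```

In the full pipeline, the input would be the Euclidean cluster extracted from the voxel-filtered, plane-segmented RGB-D point cloud.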
Confidence Scoring / Stability Filtering
This module evaluates the reliability of the estimated object pose across multiple observations. The system computes a confidence score based on pose stability over time, such as variance in estimated position and orientation between frames. If the pose estimates converge and remain consistent across several viewpoints, the confidence increases; otherwise, additional viewpoints are requested.
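One way to turn that stability criterion into a number is sketched below: scatter in position and yaw over a window of estimates is mapped to a score in [0, 1]. The tolerances `sigma_xy` and `sigma_yaw` are hypothetical tuning parameters, not values fixed by the design.

```python
import numpy as np

def pose_confidence(poses, sigma_xy=0.05, sigma_yaw=0.1):
    """Map pose scatter over a window of (x, y, yaw) estimates to [0, 1].

    Scatter well below the tolerances sigma_xy [m] and sigma_yaw [rad]
    yields confidence near 1; large scatter yields confidence near 0.
    """
    p = np.asarray(poses, dtype=float)
    std_xy = float(np.linalg.norm(p[:, :2].std(axis=0)))
    # circular statistics for yaw, to avoid wrap-around artifacts at +/-pi
    R = min(1.0, float(np.hypot(np.sin(p[:, 2]).mean(),
                                np.cos(p[:, 2]).mean())))
    std_yaw = np.sqrt(-2.0 * np.log(R)) if R > 0.0 else np.pi
    return float(np.exp(-(std_xy / sigma_xy) ** 2)
                 * np.exp(-(std_yaw / sigma_yaw) ** 2))
```

Consistent estimates across viewpoints push the score toward 1, triggering convergence; inconsistent ones keep it low, requesting further viewpoints.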
Next-Best-View (NBV) Viewpoint Selection Policy
This module determines where the robot should move next to improve object pose estimation. The intended implementation will explore next-best-view strategies from active perception literature to select viewpoints that reduce pose uncertainty. If a suitable method is not identified, a fallback strategy will sample candidate viewpoints around the object and navigate to them sequentially until the confidence score exceeds a predefined threshold.
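The fallback strategy above — sampling candidate viewpoints around the object — can be sketched as follows; the viewing radius and number of candidates are hypothetical defaults.

```python
import math

def candidate_viewpoints(obj_x, obj_y, radius=1.0, n=8):
    """Sample n viewpoints on a circle around the current object estimate.

    Each candidate is (x, y, yaw) with yaw oriented to face the object,
    so the camera keeps the target in view on arrival.
    """
    views = []
    for k in range(n):
        ang = 2.0 * math.pi * k / n
        x = obj_x + radius * math.cos(ang)
        y = obj_y + radius * math.sin(ang)
        yaw = math.atan2(obj_y - y, obj_x - x)   # face the object
        views.append((x, y, yaw))
    return views
```

A learned or information-theoretic NBV policy would instead rank these candidates by expected uncertainty reduction rather than visiting them in order.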
Goal Update Gating / Replanning Trigger
This module monitors the confidence score of the object pose estimate and determines when navigation goals should be updated. If the confidence is below the desired threshold, the module triggers the next-best-view planner to generate a new viewpoint goal.
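The gating condition is simple enough to state directly; the threshold and minimum view count below are hypothetical tuning parameters (a minimum view count avoids declaring success from a single lucky observation).

```python
def needs_new_viewpoint(confidence, views_done, threshold=0.9, min_views=2):
    """Return True while the NBV planner should produce another goal."""
    return views_done < min_views or confidence < threshold
```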
4.2.2 Visual Odometry
Overview
Visual odometry (VO) estimates the TurtleBot 4’s motion by tracking how the indoor scene changes across a sequence of stereo images. The pipeline detects visual features, matches them between frames, uses stereo depth to compute relative motion in metric scale, and integrates these motions into a trajectory. Accurate camera calibration (intrinsics, stereo baseline, and rectification) is required so pixel measurements map to correct geometry and depth. To reduce drift and improve stability during fast turns or brief visual dropouts, the VO pose will be fused with the IMU using an EKF, combining vision-based corrections with high-rate inertial motion cues.
Feature Detection
This step finds repeatable points (e.g., corners or textured patches) in each image that can be tracked over time. It is needed because VO relies on observing consistent scene points across frames. The system will detect keypoints and compute descriptors to represent their local appearance.
Feature Matching
Matching links the same features between consecutive frames (and between the left/right stereo images) to form correspondences. This is required to measure how the scene moved relative to the camera. Matches will be computed using descriptor similarity and then filtered to reject outliers (e.g., ratio test and/or robust geometric checks).
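The ratio-test filtering mentioned above can be sketched over raw descriptor arrays; in practice an OpenCV matcher would do this, but the logic is the same. The ratio value is the commonly used default, not a tuned parameter of this system.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Brute-force nearest-neighbor matching with Lowe's ratio test.

    desc_a: (N, D) and desc_b: (M, D) float descriptor arrays, M >= 2.
    A match (i, j) is kept only when the nearest neighbor in desc_b is
    clearly better than the second nearest, rejecting ambiguous matches.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches
```

Surviving matches would then pass through a robust geometric check (e.g., RANSAC on the epipolar constraint) before motion estimation.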
Motion Estimation
Motion estimation computes the relative rotation and translation between frames from the filtered correspondences. With stereo, depth from disparity provides real-world scale, making the motion estimate physically meaningful indoors. The output is an incremental pose change at each time step.
Pose Integration
Pose integration composes the incremental motions over time to produce a continuous trajectory. Since small errors accumulate and cause drift, the integrated VO pose will be corrected by fusing it with IMU measurements in an EKF for smoother and more reliable indoor localization.
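Restricting to the ground plane, composing incremental motions reduces to SE(2) composition, sketched below: each robot-frame increment (dx, dy, dyaw) is rotated into the world frame and accumulated, with the heading wrapped to (-pi, pi].

```python
import math

def integrate_pose(pose, delta):
    """Compose a robot-frame motion increment onto a world-frame pose.

    pose: current (x, y, yaw) in the world frame.
    delta: incremental (dx, dy, dyaw) from VO, in the robot frame.
    """
    x, y, yaw = pose
    dx, dy, dyaw = delta
    x += dx * math.cos(yaw) - dy * math.sin(yaw)
    y += dx * math.sin(yaw) + dy * math.cos(yaw)
    # wrap the accumulated heading to (-pi, pi]
    yaw = math.atan2(math.sin(yaw + dyaw), math.cos(yaw + dyaw))
    return (x, y, yaw)
```

Because each increment carries a small error, the accumulated pose drifts over time, which is exactly why the EKF correction described above is needed.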
5. Safety & Operational Protocol
5.1 Deadman Switch / Timeout Logic
The system implements a deadman (command-timeout) mechanism to prevent continued motion in the absence of recent control commands. Each received velocity command updates a timestamp, and a periodic safety monitor checks the age of the most recent command. If the elapsed time exceeds a configured timeout threshold, the robot immediately publishes a zero-velocity command and transitions to a safe-stop state. Motion is permitted again only after fresh commands are received and the system remains healthy for a short re-enable window. This logic ensures that a stalled controller, dropped network connection, or crashed teleoperation/planning node results in a prompt, controlled stop.
5.2 Conditions Triggering E-Stop
An emergency stop (E-stop) condition forces an immediate zero-velocity command and places the robot in a latched safe state that requires explicit operator recovery. E-stop is triggered by any of the following:
- Manual E-stop request (hardware button if available, or software E-stop command).
- Safety zone violation, defined as an obstacle detected within a minimum stopping distance in the direction of motion.
- Contact or hazard events reported by onboard safety sensors (e.g., bumper activation or cliff detection when available).
- Violation of configured motion limits (linear speed and/or angular rate exceeding allowable bounds).
5.3 Behavior on Sensor Dropout or Localization Failure
The system monitors both message freshness and validity/quality for sensors and state estimation. Any failure transitions the robot to SAFE_STOP (zero velocity) immediately and blocks autonomous motion until recovery criteria are met.
Sensor dropout monitoring
LiDAR: The system monitors `/scan` (`sensor_msgs/msg/LaserScan`). If no scan is received for T_scan (typical: 0.5–1.0 s), the robot commands zero velocity and enters SAFE_STOP. Autonomous motion remains disabled until `/scan` returns and remains healthy for a sustained window (e.g., 1–2 s of continuous messages).
Odometry: The system monitors `/odom`. If odometry messages stop for T_odom (typical: 0.5–1.0 s), the robot enters SAFE_STOP. This prevents operation without reliable velocity and pose integration.
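The per-topic freshness check can be sketched independently of ROS as below; the timeout values are the typical ones quoted above, and in the real node `on_message` would be called from each subscription callback.

```python
class FreshnessMonitor:
    """Track message freshness per topic; stale data triggers SAFE_STOP."""

    def __init__(self, timeouts):
        self.timeouts = dict(timeouts)  # e.g. {"/scan": 0.5, "/odom": 0.5}
        self.last_seen = {}             # topic -> last message stamp [s]

    def on_message(self, topic, stamp):
        """Record the receive time of the latest message on a topic."""
        self.last_seen[topic] = stamp

    def stale_topics(self, now):
        """Topics whose last message is older than their timeout."""
        return [t for t, dt in self.timeouts.items()
                if now - self.last_seen.get(t, float("-inf")) > dt]
```

A non-empty `stale_topics` result would immediately command zero velocity and latch SAFE_STOP until all monitored topics are healthy again.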
Logging and status reporting
All safety triggers (deadman timeout, obstacle stop, sensor timeout, localization fault, manual E-stop) are logged with timestamps and the trigger source. A consolidated safety status topic (e.g., /safety_status) is published to support debugging and to provide evidence of correct safety behavior during demonstrations.
6. Git Infrastructure
6.1 Repository
- GitHub Page: https://seasonedleo.github.io/RAS_Mobile_Robotics_Vision/
- GitHub Repository: https://github.com/mohammadnsr1/MobileRobots_Active_Perception/