Unstable nav2_system_tests results #5789

@AJedancov

Bug report

Required Info:

  • Operating System:
    • Ubuntu 24.04.3 (Docker container and host)
  • Computer:
    • Intel Core i7-8750H CPU
  • ROS2 Version:
    • Rolling
  • Version or commit hash:
  • DDS implementation:
    • Default

Steps to reproduce issue

  • Create a Docker container from the project's current Dockerfile. For my tests I used a Dev Container with the parameters from .devcontainer/devcontainer.json. As the cache image I used ghcr.io/ros-navigation/nav2_docker:rolling-nightly, and I added the following arguments:
   "runArgs": [
       "--name=nav2",
       "--ipc=host",
       "--volume=/tmp/.X11-unix:/tmp/.X11-unix",
       "--env=DISPLAY",
   ],
  • Run the test in the container:
source ./install/setup.bash && \
colcon test --event-handlers console_direct+ --packages-select nav2_system_tests --ctest-args -R test_route

I'm using test_route as an example, but the problem is not limited to it. I see similar behavior in several tests from nav2_system_tests, including:

  • test_route
  • test_dynamic_obstacle
  • test_bt_navigator
  • test_bt_navigator_with_groot_monitoring
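
To put a number on how often these tests fail, the command above could be repeated and the outcomes tallied. The following is only a minimal sketch: `tally_runs` and `TEST_COMMAND` are hypothetical names, and it assumes that adding `--return-code-on-test-failure` makes `colcon test` exit non-zero when a test fails. If that assumption does not hold in your setup, checking `colcon test-result` after each run would be an alternative.

```python
import subprocess
from collections import Counter

# Command from the reproduction steps. The --return-code-on-test-failure
# option is an addition, assumed to make colcon exit non-zero on failure.
TEST_COMMAND = (
    "source ./install/setup.bash && "
    "colcon test --event-handlers console_direct+ "
    "--packages-select nav2_system_tests "
    "--return-code-on-test-failure --ctest-args -R test_route"
)


def tally_runs(command, runs, runner=subprocess.run):
    """Run `command` in bash `runs` times and count passes and failures.

    `runner` is injectable so the loop can be exercised without colcon;
    it must return an object with a `returncode` attribute.
    """
    counts = Counter()
    for _ in range(runs):
        result = runner(["bash", "-c", command])
        outcome = "passed" if result.returncode == 0 else "failed"
        counts[outcome] += 1
    return counts


# Example (inside the container, from the workspace root):
# print(dict(tally_runs(TEST_COMMAND, runs=10)))
```

Swapping `test_route` for one of the other test names listed above would give a per-test failure rate under the same assumptions.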

Expected behavior

The tests should produce the same results on every run.

Actual behavior

In most test runs I get error messages like the following (the excerpts below come from several different runs):

24: [smoother_server-15] [INFO] [1765872746.159755195] [smoother_server]: Activating
24: [smoother_server-15] [INFO] [1765872746.159799549] [smoother_server]: Creating bond (smoother_server) to lifecycle manager.
...
24: [lifecycle_manager-25] [ERROR] [1765872748.230282231] [lifecycle_manager_navigation]: Server smoother_server was unable to be reached after 4.00s by bond. This server may be misconfigured.
24: [lifecycle_manager-25] [ERROR] [1765872748.230363560] [lifecycle_manager_navigation]: Failed to bring up all requested nodes. Aborting bringup.
15: [controller_server-14] [INFO] [1765824219.152607780] [controller_server]: Activating
...
15: [lifecycle_manager-25] [ERROR] [1765824222.454117441] [lifecycle_manager_navigation]: Server controller_server was unable to be reached after 4.00s by bond. This server may be misconfigured.
15: [lifecycle_manager-25] [ERROR] [1765824222.454199653] [lifecycle_manager_navigation]: Failed to bring up all requested nodes. Aborting bringup.
24: [opennav_docking-23] [INFO] [1765871248.986823135] [docking_server]: Activating docking_server
24: [opennav_docking-23] [INFO] [1765871249.105586677] [docking_server]: Creating bond (docking_server) to lifecycle manager.
...
24: [lifecycle_manager-25] [ERROR] [1765871251.196135795] [lifecycle_manager_navigation]: Server docking_server was unable to be reached after 4.00s by bond. This server may be misconfigured.
24: [lifecycle_manager-25] [ERROR] [1765871251.196231276] [lifecycle_manager_navigation]: Failed to bring up all requested nodes. Aborting bringup.
24: [lifecycle_manager-25] [ERROR] [1765807343.684458023] [lifecycle_manager_navigation]: Server controller_server was unable to be reached after 10.00s by bond. This server may be misconfigured.
24: [lifecycle_manager-25] [ERROR] [1765807343.684536461] [lifecycle_manager_navigation]: Failed to bring up all requested nodes. Aborting bringup.

As the last excerpt shows, I also tried increasing the bond timeout to 10 seconds, but the results were similar.

The node that cannot be reached varies from run to run. It is usually one of the following:

  • smoother_server
  • controller_server
  • velocity_smoother
  • docking_server
  • bt_navigator
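
To see which server fails most often across runs, the bond errors could be counted directly from captured test output. This is a rough sketch rather than an existing tool: `count_bond_failures` is a hypothetical helper, and the regular expression is based only on the lifecycle manager error lines quoted above, so it may need adjusting if the message format differs.

```python
import re
from collections import Counter

# Pattern derived from the lifecycle_manager error lines quoted in this issue.
BOND_FAILURE = re.compile(
    r"Server (?P<server>\S+) was unable to be reached "
    r"after (?P<timeout>[\d.]+)s by bond"
)


def count_bond_failures(log_lines):
    """Count bond failures per (server name, timeout in seconds)."""
    counts = Counter()
    for line in log_lines:
        match = BOND_FAILURE.search(line)
        if match:
            key = (match["server"], float(match["timeout"]))
            counts[key] += 1
    return counts


# Two of the lines from the output above, used as sample input.
SAMPLE_LINES = [
    "24: [lifecycle_manager-25] [ERROR] [1765872748.230282231] "
    "[lifecycle_manager_navigation]: Server smoother_server was unable to be "
    "reached after 4.00s by bond. This server may be misconfigured.",
    "24: [lifecycle_manager-25] [ERROR] [1765807343.684458023] "
    "[lifecycle_manager_navigation]: Server controller_server was unable to be "
    "reached after 10.00s by bond. This server may be misconfigured.",
]

for (server, timeout), n in count_bond_failures(SAMPLE_LINES).items():
    print(f"{server} (timeout {timeout:.2f}s): {n}")
```

The function accepts any iterable of lines, so an open log file can be passed to it directly.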

After an error occurs, the test continues to execute until the preset timeout expires (180 seconds).

I hit this problem in most runs, but, as the issue title suggests, the tests do sometimes pass. I suspect it may be related to the internals of bondcpp, which the lifecycle manager uses, but I'm not sure, since I haven't seen the problem during a normal launch.

Additional information

I'd be happy to help further identify the cause, but for now I'll open this issue here in case you have any suggestions or ideas that will help narrow the search.
