Skip to content

Conversation

@sampan-s-nayak
Copy link
Contributor

@sampan-s-nayak sampan-s-nayak commented Dec 31, 2025

Description

run state api and task event unit tests with both the default (task_event -> gcs flow) and aggregator (task_event -> aggregator -> gcs) to smoothen the transition from default to aggregator flow

- Add event_routing_config fixture for dual-mode testing
- Parametrize state_api tests to run with default and aggregator routing
- Parametrize task_events tests to run with default and aggregator routing

Signed-off-by: sampan <sampan@anyscale.com>
@sampan-s-nayak sampan-s-nayak changed the base branch from master to aggr-to-gcs-fixes December 31, 2025 07:30
@sampan-s-nayak sampan-s-nayak added the go add ONLY when ready to merge, run all tests label Dec 31, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enables testing the new aggregator mode for task events by parameterizing a large number of state API and task event tests. It introduces a new pytest fixture event_routing_config to switch between the default and aggregator modes. Additionally, it enhances the aggregator event path by adding several missing fields to the event protos and the corresponding C++ implementation to achieve feature parity with the existing GCS path. The changes are well-structured, and the test additions are comprehensive.

My review has two suggestions: one for improving code conciseness in a test file and another for removing a leftover debug print statement.

Signed-off-by: sampan <sampan@anyscale.com>
@sampan-s-nayak sampan-s-nayak marked this pull request as ready for review January 2, 2026 04:28
@sampan-s-nayak sampan-s-nayak requested a review from a team as a code owner January 2, 2026 04:28
@sampan-s-nayak
Copy link
Contributor Author

this pr was originally part of #56880

@pytest.mark.parametrize(
"event_routing_config", ["default", "aggregator"], indirect=True
)
@pytest.mark.usefixtures("event_routing_config")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixture scope mismatch prevents aggregator mode testing

The TestListActors class is parametrized with event_routing_config (function-scoped fixture) but uses class_ray_instance (class-scoped fixture) to start Ray. Pytest executes higher-scoped fixtures first, so class_ray_instance starts Ray BEFORE event_routing_config sets the aggregator environment variables. This means when running with event_routing_config="aggregator", the environment variables like RAY_enable_core_worker_ray_event_to_aggregator are set after Ray has already started, and aggregator mode is never actually enabled. The tests will pass but won't actually test the aggregator code path, defeating the purpose of the parametrization.

Fix in Cursor Fix in Web

@pytest.mark.parametrize(
"event_routing_config", ["default", "aggregator"], indirect=True
)
@pytest.mark.usefixtures("event_routing_config")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing aggregator agent wait causes flaky aggregator mode tests

The test_actor_summary and test_object_summary tests have the event_routing_config parametrization for aggregator mode but don't call wait_for_aggregator_agent_if_enabled after ray.init(). In contrast, test_task_summary in the same file correctly calls this wait for all nodes. A TODO comment in other tests (e.g., test_fault_tolerance_chained_task_fail) states this wait is required until task event buffering is implemented internally. Without the wait, these tests may fail or be flaky in aggregator mode if the aggregator agent isn't ready when actors/tasks are created.

Additional Locations (1)

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

go add ONLY when ready to merge, run all tests

4 participants