[core] Enable aggregator mode in state API and task event tests #59784

sampan-s-nayak · 2025-12-31T07:29:50Z

Description

Run the state API and task event unit tests with both the default flow (task_event -> gcs) and the aggregator flow (task_event -> aggregator -> gcs) to smooth the transition from the default flow to the aggregator flow.

- Add event_routing_config fixture for dual-mode testing
- Parametrize state_api tests to run with default and aggregator routing
- Parametrize task_events tests to run with default and aggregator routing

Signed-off-by: sampan <sampan@anyscale.com>
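For readers unfamiliar with indirect parametrization, the dual-mode fixture described above could look roughly like the sketch below. This is not the fixture from the PR: the helper `routing_env`, the value `"1"`, and the assumption that a single environment variable is enough to switch modes are all illustrative. Only the fixture name and the variable name `RAY_enable_core_worker_ray_event_to_aggregator` come from this thread.

```python
import os

import pytest

# Variable name taken from the review discussion. The value "1" and the idea
# that this one variable is sufficient are assumptions made for this sketch.
AGGREGATOR_ENV_VAR = "RAY_enable_core_worker_ray_event_to_aggregator"


def routing_env(mode):
    """Return environment overrides for a routing mode (hypothetical helper)."""
    if mode == "default":
        return {}
    if mode == "aggregator":
        return {AGGREGATOR_ENV_VAR: "1"}
    raise ValueError(f"unknown routing mode: {mode!r}")


@pytest.fixture
def event_routing_config(request, monkeypatch):
    # With indirect=True, the parametrize value arrives as request.param.
    mode = request.param
    # Start from a clean slate so a pre-set variable cannot leak into "default".
    monkeypatch.delenv(AGGREGATOR_ENV_VAR, raising=False)
    for key, value in routing_env(mode).items():
        monkeypatch.setenv(key, value)
    return mode


@pytest.mark.parametrize(
    "event_routing_config", ["default", "aggregator"], indirect=True
)
def test_routing_mode(event_routing_config):
    # The flag is set only when the aggregator mode was requested.
    enabled = os.environ.get(AGGREGATOR_ENV_VAR) == "1"
    assert enabled == (event_routing_config == "aggregator")
```

Because `monkeypatch` is function-scoped, the variable is restored after each test, so the two parametrized runs do not affect each other.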

gemini-code-assist

Code Review

This pull request enables testing the new aggregator mode for task events by parameterizing a large number of state API and task event tests. It introduces a new pytest fixture event_routing_config to switch between the default and aggregator modes. Additionally, it enhances the aggregator event path by adding several missing fields to the event protos and the corresponding C++ implementation to achieve feature parity with the existing GCS path. The changes are well-structured, and the test additions are comprehensive.

My review has two suggestions: one for improving code conciseness in a test file and another for removing a leftover debug print statement.

python/ray/tests/test_state_api.py

python/ray/tests/test_task_events.py

Signed-off-by: sampan <sampan@anyscale.com>

sampan-s-nayak · 2026-01-02T04:29:26Z

This PR was originally part of #56880.

cursor · 2026-01-02T04:35:45Z

python/ray/tests/test_state_api.py

+@pytest.mark.parametrize(
+    "event_routing_config", ["default", "aggregator"], indirect=True
+)
+@pytest.mark.usefixtures("event_routing_config")


Fixture scope mismatch prevents aggregator mode testing

The TestListActors class is parametrized with event_routing_config (function-scoped fixture) but uses class_ray_instance (class-scoped fixture) to start Ray. Pytest executes higher-scoped fixtures first, so class_ray_instance starts Ray BEFORE event_routing_config sets the aggregator environment variables. This means when running with event_routing_config="aggregator", the environment variables like RAY_enable_core_worker_ray_event_to_aggregator are set after Ray has already started, and aggregator mode is never actually enabled. The tests will pass but won't actually test the aggregator code path, defeating the purpose of the parametrization.
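One possible way to avoid this ordering problem is to set the environment and start the cluster inside the same class-scoped fixture, so the variables are guaranteed to be in place at startup. The sketch below is an assumption about how that could be done, not the change made in this PR: `routed_cluster` and `start_fake_cluster` are hypothetical names, and `start_fake_cluster` only records the environment instead of starting Ray.

```python
import os

import pytest

AGGREGATOR_ENV_VAR = "RAY_enable_core_worker_ray_event_to_aggregator"


def start_fake_cluster():
    """Stand-in for starting Ray: records whether the flag was set at startup."""
    return {"aggregator_enabled": os.environ.get(AGGREGATOR_ENV_VAR) == "1"}


@pytest.fixture(scope="class")
def routed_cluster(request):
    # Class-scoped: environment setup and startup happen here, in this order.
    mode = request.param
    # The function-scoped monkeypatch fixture cannot be requested from a
    # class-scoped fixture, so use the MonkeyPatch context manager instead.
    with pytest.MonkeyPatch.context() as mp:
        mp.delenv(AGGREGATOR_ENV_VAR, raising=False)
        if mode == "aggregator":
            mp.setenv(AGGREGATOR_ENV_VAR, "1")  # value "1" is an assumption
        yield mode, start_fake_cluster()


@pytest.mark.parametrize(
    "routed_cluster", ["default", "aggregator"], indirect=True
)
class TestListActorsSketch:
    def test_mode_applied_before_startup(self, routed_cluster):
        mode, cluster = routed_cluster
        assert cluster["aggregator_enabled"] == (mode == "aggregator")
```

With this layout the class is instantiated once per routing mode, and the stand-in cluster observes the flag at startup in the aggregator run.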

cursor · 2026-01-02T04:35:45Z

python/ray/tests/test_state_api_summary.py

+@pytest.mark.parametrize(
+    "event_routing_config", ["default", "aggregator"], indirect=True
+)
+@pytest.mark.usefixtures("event_routing_config")


Missing aggregator agent wait causes flaky aggregator mode tests

The test_actor_summary and test_object_summary tests have the event_routing_config parametrization for aggregator mode but don't call wait_for_aggregator_agent_if_enabled after ray.init(). In contrast, test_task_summary in the same file correctly calls this wait for all nodes. A TODO comment in other tests (e.g., test_fault_tolerance_chained_task_fail) states this wait is required until task event buffering is implemented internally. Without the wait, these tests may fail or be flaky in aggregator mode if the aggregator agent isn't ready when actors/tasks are created.

Additional Locations (1)

python/ray/tests/test_state_api_summary.py#L411-L421
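The kind of readiness wait discussed above can be illustrated with a generic polling helper. This is a sketch only: the signature of Ray's `wait_for_aggregator_agent_if_enabled` is not shown in this thread, so `wait_until`, `wait_for_agent_if_enabled`, and the `is_agent_ready` callback below are hypothetical stand-ins.

```python
import time


def wait_until(predicate, timeout_s=10.0, interval_s=0.1):
    """Poll predicate until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def wait_for_agent_if_enabled(mode, is_agent_ready, timeout_s=10.0):
    """Block until the agent is ready, but only in aggregator mode.

    Hypothetical analogue of the helper named in the review comment.
    """
    if mode != "aggregator":
        return True  # nothing to wait for in the default flow
    if not wait_until(is_agent_ready, timeout_s=timeout_s):
        raise TimeoutError("aggregator agent was not ready in time")
    return True
```

A test would call such a helper right after starting the cluster and before creating actors or tasks, so that events emitted early are not lost while the agent is still starting.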

sampan-s-nayak changed the base branch from master to aggr-to-gcs-fixes December 31, 2025 07:30

sampan-s-nayak added the "go" label (add ONLY when ready to merge, run all tests) Dec 31, 2025

gemini-code-assist bot reviewed Dec 31, 2025

sampan-s-nayak mentioned this pull request Jan 2, 2026

[core] Run state api and task event tests using both existing and new event aggregator based flows #56880

Closed

8 tasks

fix flaky test + remove debug statement

2197f74

Signed-off-by: sampan <sampan@anyscale.com>

sampan-s-nayak marked this pull request as ready for review January 2, 2026 04:28

sampan-s-nayak requested a review from a team as a code owner January 2, 2026 04:28

sampan-s-nayak assigned jjyao Jan 2, 2026

sampan-s-nayak assigned MengjinYan Jan 2, 2026

cursor bot reviewed Jan 2, 2026
