-
Notifications
You must be signed in to change notification settings - Fork 7.1k
[core] Enable aggregator mode in state API and task event tests #59784
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: aggr-to-gcs-fixes
Are you sure you want to change the base?
Conversation
- Add event_routing_config fixture for dual-mode testing - Parametrize state_api tests to run with default and aggregator routing - Parametrize task_events tests to run with default and aggregator routing Signed-off-by: sampan <sampan@anyscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request enables testing the new aggregator mode for task events by parameterizing a large number of state API and task event tests. It introduces a new pytest fixture event_routing_config to switch between the default and aggregator modes. Additionally, it enhances the aggregator event path by adding several missing fields to the event protos and the corresponding C++ implementation to achieve feature parity with the existing GCS path. The changes are well-structured, and the test additions are comprehensive.
My review has two suggestions: one for improving code conciseness in a test file and another for removing a leftover debug print statement.
Signed-off-by: sampan <sampan@anyscale.com>
|
this pr was originally part of #56880 |
| @pytest.mark.parametrize( | ||
| "event_routing_config", ["default", "aggregator"], indirect=True | ||
| ) | ||
| @pytest.mark.usefixtures("event_routing_config") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixture scope mismatch prevents aggregator mode testing
The TestListActors class is parametrized with event_routing_config (function-scoped fixture) but uses class_ray_instance (class-scoped fixture) to start Ray. Pytest executes higher-scoped fixtures first, so class_ray_instance starts Ray BEFORE event_routing_config sets the aggregator environment variables. This means when running with event_routing_config="aggregator", the environment variables like RAY_enable_core_worker_ray_event_to_aggregator are set after Ray has already started, and aggregator mode is never actually enabled. The tests will pass but won't actually test the aggregator code path, defeating the purpose of the parametrization.
| @pytest.mark.parametrize( | ||
| "event_routing_config", ["default", "aggregator"], indirect=True | ||
| ) | ||
| @pytest.mark.usefixtures("event_routing_config") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing aggregator agent wait causes flaky aggregator mode tests
The test_actor_summary and test_object_summary tests have the event_routing_config parametrization for aggregator mode but don't call wait_for_aggregator_agent_if_enabled after ray.init(). In contrast, test_task_summary in the same file correctly calls this wait for all nodes. A TODO comment in other tests (e.g., test_fault_tolerance_chained_task_fail) states this wait is required until task event buffering is implemented internally. Without the wait, these tests may fail or be flaky in aggregator mode if the aggregator agent isn't ready when actors/tasks are created.
Description
run state api and task event unit tests with both the default (task_event -> gcs flow) and aggregator (task_event -> aggregator -> gcs) to smoothen the transition from default to aggregator flow