Add optimized path for intermediate values aggregator #131390
Merged
dnhatn merged 10 commits into elastic:main on Jul 21, 2025
Collaborator
Hi @dnhatn, I've created a changelog YAML for you.
Collaborator
Pinging @elastic/es-analytical-engine (Team:Analytics)
nik9000 (Member) approved these changes on Jul 18, 2025
I just scanned it, but I approve of the general approach of letting aggs optimize their intermediate join and think VALUES is the right place to do it. I'm not sure if you did it right, but I think @idegtiarenko is checking this more closely.
ivancea approved these changes on Jul 21, 2025
            ordinals = asOrdinals.getOrdinalsBlock();
        }
    }
    if (dict != null && dict.getPositionCount() < groupIds.getPositionCount()) {
Contributor
Should this use OrdinalBytesRefBlock.isDense(), or is that logic unrelated?
Member, Author
Thanks friends!
dnhatn added a commit that referenced this pull request on Jul 28, 2025
There are two bugs introduced in #130510 and #131390 affecting the VALUES aggregator. The random tests do not cover these edge cases:
1. The check should be firstValues.size() <= group instead of firstValues.size() < group when reading values from the firstValues array. We need to inject nulls with repeated values (to simulate ordinals) to trigger this case.
2. We incorrectly added positionOffset when reading the group ID. We need to generate more groups to trigger chunking.
Relates #130510
Relates #131390
Closes #131878
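The two fixes described in this commit message can be illustrated with a small, self-contained sketch. It does not use the Elasticsearch classes: plain int arrays stand in for the firstValues array and for the group ID chunk, and the class name, method names, the NO_VALUE sentinel, and the exact meaning of positionOffset (an offset into the value block only) are assumptions made for illustration.

```java
public class ValuesBugSketch {
    // Hypothetical sentinel meaning "this group has no first value yet".
    static final int NO_VALUE = -1;

    /**
     * Bug 1: firstValues has a logical size that can be smaller than its capacity.
     * A group equal to size is already out of range, so the guard must be
     * "size <= group". With "size < group" the slot at index == size would be read.
     */
    static int firstValue(int[] firstValues, int size, int group) {
        if (size <= group) {
            return NO_VALUE;
        }
        return firstValues[group];
    }

    /**
     * Bug 2 (assumed shape): the group ID chunk is indexed from 0, while the
     * value block is indexed from positionOffset. Adding positionOffset when
     * reading the group ID would read the wrong entry or run past the chunk.
     */
    static int[] sumByGroup(int[] chunkGroupIds, int positionOffset, int[] values, int groupCount) {
        int[] sums = new int[groupCount];
        for (int i = 0; i < chunkGroupIds.length; i++) {
            int group = chunkGroupIds[i];              // not chunkGroupIds[i + positionOffset]
            sums[group] += values[i + positionOffset]; // only the value lookup is offset
        }
        return sums;
    }
}
```

Both cases only show up with specific inputs (a group ID equal to the current size, and a non-zero offset), which is consistent with the note that the random tests needed extra nulls and more groups to reach them.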
Similar to #127849, this change adds an optimized path for leveraging ordinal blocks of intermediate input pages in the Values aggregator. Below are the micro-benchmark results.
Before -> after:
1K groups: 112 ms -> 34.4 ms
1M groups: 113 s -> 64 s
More to come with #130510
Relates #127849
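As a rough illustration of why an ordinal (dictionary-encoded) input can be cheaper, here is a minimal sketch that does not use the Elasticsearch block classes. Strings and int arrays stand in for the dictionary and ordinals blocks, all class and method names are hypothetical, and the dispatch condition only mirrors the size comparison visible in the review snippet; the real aggregator may differ in every other respect.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class OrdinalValuesSketch {
    /** Plain path: hash the value itself at every position. */
    static Map<Integer, Set<String>> collectPlain(int[] groupIds, String[] values) {
        Map<Integer, Set<String>> out = new HashMap<>();
        for (int p = 0; p < groupIds.length; p++) {
            out.computeIfAbsent(groupIds[p], g -> new LinkedHashSet<>()).add(values[p]);
        }
        return out;
    }

    /** Ordinal path: hash each dictionary entry once, then handle only ints per position. */
    static Map<Integer, Set<String>> collectOrdinals(int[] groupIds, String[] dict, int[] ordinals) {
        // Assign one id per distinct dictionary entry; this is the only string hashing here.
        Map<String, Integer> valueIds = new HashMap<>();
        String[] idToValue = new String[dict.length];
        int[] ordToId = new int[dict.length];
        for (int o = 0; o < dict.length; o++) {
            Integer id = valueIds.get(dict[o]);
            if (id == null) {
                id = valueIds.size();
                valueIds.put(dict[o], id);
                idToValue[id] = dict[o];
            }
            ordToId[o] = id;
        }
        // Per position, record the (group, value id) pair using ints only.
        Map<Integer, Set<Integer>> idsByGroup = new HashMap<>();
        for (int p = 0; p < groupIds.length; p++) {
            idsByGroup.computeIfAbsent(groupIds[p], g -> new LinkedHashSet<>()).add(ordToId[ordinals[p]]);
        }
        // Resolve ids back to values once, when building the output.
        Map<Integer, Set<String>> out = new HashMap<>();
        for (Map.Entry<Integer, Set<Integer>> e : idsByGroup.entrySet()) {
            Set<String> resolved = new LinkedHashSet<>();
            for (int id : e.getValue()) {
                resolved.add(idToValue[id]);
            }
            out.put(e.getKey(), resolved);
        }
        return out;
    }

    /** Use the ordinal path only when the dictionary is smaller than the number of positions. */
    static Map<Integer, Set<String>> collect(int[] groupIds, String[] dict, int[] ordinals) {
        if (dict.length < groupIds.length) {
            return collectOrdinals(groupIds, dict, ordinals);
        }
        String[] values = new String[groupIds.length];
        for (int p = 0; p < groupIds.length; p++) {
            values[p] = dict[ordinals[p]];
        }
        return collectPlain(groupIds, values);
    }
}
```

The idea is that the expensive per-value work scales with the dictionary size rather than the position count, so the ordinal path is only worth taking when the dictionary is the smaller of the two.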