ESQL: Group by all optimization #139130

Merged
leontyevdv merged 14 commits into elastic:main from leontyevdv:feature/esql-group-by-all-dimensions-optimization on Dec 12, 2025
@leontyevdv
Contributor

@leontyevdv leontyevdv commented Dec 5, 2025

Normalizes ordinals so that each unique tsid uses the first ordinal encountered for that tsid.

The FirstDocIdGroupingAggregatorFunction collects the first doc id for each group. With time buckets, the same tsid can appear in multiple groups, so loading the dimension fields can end up reading several different documents for the same tsid.

There are two options:

  1. Load only one document per tsid and apply the tsid ordinals to the dimension values. This requires either a separate value source reader for dimension fields or changes to the current reader so that it understands ordinals and applies them to the dictionary.
  2. Remap the group ids in the selected vector so that all groups with the same tsid share a single group id. The same document may then be loaded several times for one tsid, but different documents never are. Re-loading the same document costs little compared to loading different documents for the same tsid.

This implementation uses the second option as it is a more contained change.
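The remapping in option 2 can be sketched in a few lines. This is a minimal illustration, not the code merged in this PR: the class `TsidOrdinalRemap`, the method `remapToFirstOrdinal`, and the use of plain `String` tsids and `int[]` arrays are all assumptions made for readability. The real operator works on blocks and vectors.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class TsidOrdinalRemap {
    /**
     * Hypothetical helper (not part of the PR): for each position, return the
     * group id of the first position that carries the same tsid.
     */
    public static int[] remapToFirstOrdinal(String[] tsids, int[] selected) {
        Map<String, Integer> firstGroupForTsid = new HashMap<>();
        int[] remapped = new int[selected.length];
        for (int i = 0; i < selected.length; i++) {
            // putIfAbsent returns null when this tsid is seen for the first time,
            // otherwise the group id recorded for its first occurrence.
            Integer first = firstGroupForTsid.putIfAbsent(tsids[i], selected[i]);
            remapped[i] = first == null ? selected[i] : first;
        }
        return remapped;
    }

    public static void main(String[] args) {
        String[] tsids = {"t1", "t2", "t1", "t3", "t2"};
        int[] selected = {0, 1, 2, 3, 4};
        // prints [0, 1, 0, 3, 1]
        System.out.println(Arrays.toString(remapToFirstOrdinal(tsids, selected)));
    }
}
```

A hash map keyed by tsid keeps the sketch short. How the actual implementation tracks the first ordinal per tsid is not shown in this description.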

Example:

_tsid key:     [t1, t2, t1, t3, t2]
selected:      [0,  1,  2,  3,  4]
first doc ids: [10, 20, 30, 40, 50]

re-mapped selected:             [0, 1, 0, 3, 1]
first doc ids after re-mapping: [10, 20, 10, 40, 20]
Docs loaded: [10, 10, 20, 20, 40], which is not much more expensive than loading [10, 20, 40].
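To make the cost comparison concrete, the sketch below replays the example's numbers and counts how many distinct documents each variant reads. The class `FirstDocLoadOrder` and its methods are hypothetical names introduced here, and sorting the doc ids is only an assumption used to mirror the listing above.

```java
import java.util.Arrays;

public class FirstDocLoadOrder {
    /** Hypothetical helper: look up the first doc id for every group id, in doc order. */
    public static int[] docsToLoad(int[] selected, int[] firstDocIds) {
        int[] docs = new int[selected.length];
        for (int i = 0; i < selected.length; i++) {
            docs[i] = firstDocIds[selected[i]];
        }
        Arrays.sort(docs); // assumption: reads happen in ascending doc id order
        return docs;
    }

    /** Number of different documents that actually have to be read. */
    public static long distinctDocs(int[] docs) {
        return Arrays.stream(docs).distinct().count();
    }

    public static void main(String[] args) {
        int[] firstDocIds = {10, 20, 30, 40, 50};

        int[] before = docsToLoad(new int[] {0, 1, 2, 3, 4}, firstDocIds);
        int[] after = docsToLoad(new int[] {0, 1, 0, 3, 1}, firstDocIds);

        // prints [10, 20, 30, 40, 50] -> 5 distinct docs
        System.out.println(Arrays.toString(before) + " -> " + distinctDocs(before) + " distinct docs");
        // prints [10, 10, 20, 20, 40] -> 3 distinct docs
        System.out.println(Arrays.toString(after) + " -> " + distinctDocs(after) + " distinct docs");
    }
}
```

With the re-mapped ids, the number of lookups stays at five, but only three different documents are touched, one per tsid.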

Part of #136253

Normalizes ordinals so that each unique tsid uses the
first ordinal encountered for that tsid.

Part of elastic#136253
@leontyevdv leontyevdv requested a review from dnhatn December 5, 2025 14:51
@leontyevdv leontyevdv self-assigned this Dec 5, 2025
leontyevdv and others added 7 commits December 5, 2025 16:02
Add integration test to test bare aggs _over_time

Part of elastic#136253
Add test to test tsid normalization on operator layer

Part of elastic#136253
@leontyevdv leontyevdv added labels Dec 11, 2025: Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)), Team:StorageEngine, :StorageEngine/TSDB (You know, for Metrics), :StorageEngine/ES|QL (Timeseries / metrics / PromQL / logsdb capabilities in ES|QL), >enhancement, :Analytics/ES|QL (AKA ESQL)
@elasticsearchmachine
Collaborator

Hi @leontyevdv, I've created a changelog YAML for you.

@leontyevdv leontyevdv requested review from a team and kkrik-es December 11, 2025 09:08
@leontyevdv leontyevdv marked this pull request as ready for review December 11, 2025 09:08
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@leontyevdv leontyevdv requested a review from kkrik-es December 12, 2025 14:44
@leontyevdv leontyevdv merged commit 79f3165 into elastic:main Dec 12, 2025
35 checks passed
parkertimmins pushed a commit to parkertimmins/elasticsearch that referenced this pull request Dec 17, 2025
Normalizes ordinals so that each unique tsid uses the
first ordinal encountered for that tsid.

Part of elastic#136253

---------

Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>

Labels

:Analytics/ES|QL (AKA ESQL), >enhancement, :StorageEngine/ES|QL (Timeseries / metrics / PromQL / logsdb capabilities in ES|QL), :StorageEngine/TSDB (You know, for Metrics), Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)), Team:StorageEngine, v9.3.0

4 participants