Late materialization of dimension fields in time-series#135961

Merged
dnhatn merged 12 commits into elastic:main from dnhatn:extract-dimensions
Oct 17, 2025

Conversation

@dnhatn
Member

@dnhatn dnhatn commented Oct 3, 2025

This change adds an optimization rule for time-series queries that moves the reading of dimension fields from before the time-series aggregation operator to after it, so that each dimension field is read once per group instead of once per document. This is possible because dimension field values are identical across all documents that belong to the same time series (i.e., share the same _tsid).

For example:

TS .. | STATS sum(rate(r1)), sum(rate(r2)) BY cluster, host, tbucket(1m)

Without this rule:

TS ..
| EXTRACT_FIELDS(r1, r2, cluster, host)
| STATS rate(r1), rate(r2), VALUES(cluster), VALUES(host) BY _tsid, tbucket(1m)
| ...

With this rule:

TS ..
| EXTRACT_FIELDS(r1, r2)
| STATS rate(r1), rate(r2), FIRST_DOC_ID(_doc) BY _tsid, tbucket(1m)
| EXTRACT_FIELDS(cluster, host)
| ...

Ideally, dimension fields should be read once per _tsid in the final result, similar to the fetch phase. Currently, dimension fields are read once per group key in each pipeline; if there are multiple time buckets, dimensions for the same _tsid are read multiple times. This can be avoided by extending ValuesSourceReaderOperator to understand the ordinals of _tsid. I will follow up with this improvement later to keep this PR small.
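The FIRST_DOC_ID(_doc) step in the rewritten plan above can be sketched roughly as follows. This is an illustrative stand-in, not the actual Elasticsearch operator: `firstDocPerTsid` and the flat lists replace the real block-based DocVector machinery, but the core idea is the same, remember one representative doc id per _tsid during grouping, then read dimension fields only for those representative docs afterwards.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of late materialization: names are illustrative,
// not Elasticsearch internals.
class LateMaterializationSketch {
    // Each input row carries its _tsid and a doc id; dimension values are
    // NOT read here. We only remember one representative doc per _tsid,
    // mirroring FIRST_DOC_ID(_doc) in the plan. Dimension fields (cluster,
    // host, ...) are then extracted once per entry of the returned map.
    static Map<String, Integer> firstDocPerTsid(List<String> tsids, List<Integer> docs) {
        Map<String, Integer> firstDoc = new LinkedHashMap<>();
        for (int i = 0; i < tsids.size(); i++) {
            firstDoc.putIfAbsent(tsids.get(i), docs.get(i)); // keep the first doc seen
        }
        return firstDoc;
    }
}
```

With three rows for _tsid "a" and one for "b", dimensions would be read twice in total rather than four times, which is where the savings come from when series are long.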

segments.set(groupId, docVector.segments().getInt(valuePosition));
docIds = bigArrays.grow(docIds, groupId + 1);
docIds.set(groupId, docVector.docs().getInt(valuePosition));
contextRefs.computeIfAbsent(shard, s -> {
Contributor

Nit: Felix noticed separately that this is too slow, mind replacing with a check and insert?
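The suggested "check and insert" can be sketched as below; the helper name and map type are illustrative, not the actual fix (which landed in the commit referenced in the reply). The point is that a plain get() on the hot path avoids the per-call overhead of computeIfAbsent (lambda dispatch and, for concurrent maps, possible locking) when the key is almost always already present.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of check-and-insert replacing computeIfAbsent.
class CheckAndInsert {
    static <K, V> V getOrCreate(Map<K, V> map, K key, Function<K, V> factory) {
        V value = map.get(key);          // fast path: key is usually present
        if (value == null) {
            value = factory.apply(key);  // slow path: create and insert once
            map.put(key, value);
        }
        return value;
    }
}
```

This single-threaded form is only safe when the map is confined to one driver thread; a concurrent map would still need computeIfAbsent (or a get-then-putIfAbsent dance) for the insert.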

Member Author

sure, I pushed e333cdc

try {
blocks[offset] = new DocVector(shardRefs::get, shardVector, segmentVector, docVector, null).asBlock();
} catch (Exception e) {
throw e;
Contributor

What do we get by catching and rethrowing?

Member Author

sorry, a leftover from debugging

Contributor

@kkrik-es kkrik-es left a comment

So awesome.

I assume this doesn't apply if there are functions applied to dimensions in the grouping, e.g.

TS metrics | STATS sum(rate(reqs)) BY substr(host, 3)

@kkrik-es kkrik-es marked this pull request as ready for review October 15, 2025 14:50
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Oct 15, 2025
@kkrik-es kkrik-es added :StorageEngine/TSDB You know, for Metrics Team:StorageEngine >enhancement and removed needs:triage Requires assignment of a team area label labels Oct 15, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Collaborator

Hi @dnhatn, I've created a changelog YAML for you.

@dnhatn dnhatn marked this pull request as draft October 15, 2025 15:13
@dnhatn dnhatn marked this pull request as ready for review October 17, 2025 00:52
@dnhatn
Member Author

dnhatn commented Oct 17, 2025

@kkrik-es Thanks for the review!

@dnhatn dnhatn merged commit 3ee3331 into elastic:main Oct 17, 2025
34 checks passed
@dnhatn dnhatn deleted the extract-dimensions branch October 17, 2025 00:53