Late materialization of dimension fields in time-series#135961

Merged
dnhatn merged 12 commits into elastic:main from dnhatn:extract-dimensions
Oct 17, 2025

Conversation

@dnhatn
Member

@dnhatn dnhatn commented Oct 3, 2025

This change adds an optimization rule for time-series queries that moves the reading of dimension fields from before the time-series aggregation operator to after it, so that each dimension field is read once per group instead of once per document. This is possible because dimension field values are identical across all documents that belong to the same time series (i.e., share the same _tsid).

For example:

TS .. | STATS sum(rate(r1)), sum(rate(r2)) BY cluster, host, tbucket(1m)

Without this rule:

TS ..
| EXTRACT_FIELDS(r1, r2, cluster, host)
| STATS rate(r1), rate(r2), VALUES(cluster), VALUES(host) BY _tsid, tbucket(1m)
| ...

With this rule:

TS ..
| EXTRACT_FIELDS(r1, r2)
| STATS rate(r1), rate(r2), FIRST_DOC_ID(_doc) BY _tsid, tbucket(1m)
| EXTRACT_FIELDS(cluster, host)
| ...

Ideally, dimension fields should be read once per _tsid in the final result, similar to the fetch phase. Currently, dimension fields are read once per group key in each pipeline; if there are multiple time buckets, dimensions for the same _tsid are read multiple times. This can be avoided by extending ValuesSourceReaderOperator to understand the ordinals of _tsid. I will follow up with this improvement later to keep this PR small.
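The FIRST_DOC_ID(_doc) step in the rewritten plan above can be sketched roughly as follows. This is an illustrative stand-in, not the actual Elasticsearch operator: `firstDocPerTsid` and the flat lists replace the real block-based DocVector machinery, but the core idea is the same, remember one representative doc id per _tsid during grouping, then read dimension fields only for those representative docs afterwards.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of late materialization: names are illustrative,
// not Elasticsearch internals.
class LateMaterializationSketch {
    // Each input row carries its _tsid and a doc id; dimension values are
    // NOT read here. We only remember one representative doc per _tsid,
    // mirroring FIRST_DOC_ID(_doc) in the plan. Dimension fields (cluster,
    // host, ...) are then extracted once per entry of the returned map.
    static Map<String, Integer> firstDocPerTsid(List<String> tsids, List<Integer> docs) {
        Map<String, Integer> firstDoc = new LinkedHashMap<>();
        for (int i = 0; i < tsids.size(); i++) {
            firstDoc.putIfAbsent(tsids.get(i), docs.get(i)); // keep the first doc seen
        }
        return firstDoc;
    }
}
```

With three rows for _tsid "a" and one for "b", dimensions would be read twice in total rather than four times, which is where the savings come from when series are long.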

segments.set(groupId, docVector.segments().getInt(valuePosition));
docIds = bigArrays.grow(docIds, groupId + 1);
docIds.set(groupId, docVector.docs().getInt(valuePosition));
contextRefs.computeIfAbsent(shard, s -> {
Contributor

Nit: Felix noticed separately that this is too slow, mind replacing with a check and insert?
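The suggested "check and insert" can be sketched as below; the helper name and map type are illustrative, not the actual fix (which landed in the commit referenced in the reply). The point is that a plain get() on the hot path avoids the per-call overhead of computeIfAbsent (lambda dispatch and, for concurrent maps, possible locking) when the key is almost always already present.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of check-and-insert replacing computeIfAbsent.
class CheckAndInsert {
    static <K, V> V getOrCreate(Map<K, V> map, K key, Function<K, V> factory) {
        V value = map.get(key);          // fast path: key is usually present
        if (value == null) {
            value = factory.apply(key);  // slow path: create and insert once
            map.put(key, value);
        }
        return value;
    }
}
```

This single-threaded form is only safe when the map is confined to one driver thread; a concurrent map would still need computeIfAbsent (or a get-then-putIfAbsent dance) for the insert.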

Member Author

sure, I pushed e333cdc

try {
blocks[offset] = new DocVector(shardRefs::get, shardVector, segmentVector, docVector, null).asBlock();
} catch (Exception e) {
throw e;
Contributor

What do we get by catching and rethrowing?

Member Author

sorry, a leftover from debugging

Contributor

@kkrik-es kkrik-es left a comment

So awesome.

I assume this doesn't apply if there are functions applied to dimensions in the grouping, e.g.

TS metrics | STATS sum(rate(reqs)) BY substr(host, 3)

@kkrik-es kkrik-es marked this pull request as ready for review October 15, 2025 14:50
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Oct 15, 2025
@kkrik-es kkrik-es added :StorageEngine/TSDB You know, for Metrics Team:StorageEngine >enhancement and removed needs:triage Requires assignment of a team area label labels Oct 15, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Collaborator

Hi @dnhatn, I've created a changelog YAML for you.

@dnhatn dnhatn marked this pull request as draft October 15, 2025 15:13
@dnhatn dnhatn marked this pull request as ready for review October 17, 2025 00:52
@dnhatn
Member Author

dnhatn commented Oct 17, 2025

@kkrik-es Thanks for the review!

@dnhatn dnhatn merged commit 3ee3331 into elastic:main Oct 17, 2025
34 checks passed
@dnhatn dnhatn deleted the extract-dimensions branch October 17, 2025 00:53