Minimize doc values fetches in TSDBSyntheticIdFieldsProducer #139053

Merged
tlrx merged 4 commits into elastic:main from tlrx:2025/12/04/minimize-doc-values-in-seeks
Dec 5, 2025


@tlrx tlrx commented Dec 4, 2025

When seeking to a synthetic _id term in TSDBSyntheticIdFieldsProducer, we populate all the information required to build a SyntheticTerm instance, even if the term is never retrieved after the seek.

Instead, we can track only the current document ID and, if they were fetched for seeking, the _tsid ordinal and timestamp values. The rest of the information (_tsid term, routing hash term) can be fetched on demand when the synthetic _id term is actually built.
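
To illustrate the idea outside of the real code, here is a minimal sketch of seek state that defers the expensive lookups until the term is requested. Everything in it is an assumption made for illustration: `LazySeekSketch`, `DocValuesSource`, `CountingSource`, `SeekState`, `positionOn` and the `tsid/timestamp/hash` string layout are hypothetical and are not the classes, methods, or term encoding used by TSDBSyntheticIdFieldsProducer.

```java
/** Illustrative sketch only: every name here is hypothetical, not Elasticsearch code. */
public class LazySeekSketch {

    /** Hypothetical stand-in for the per-document lookups a seek may need. */
    interface DocValuesSource {
        int tsidOrd(int docID);
        long timestamp(int docID);
        String tsidTerm(int tsidOrd);
        int routingHash(int docID);
    }

    /** In-memory source that counts every lookup, so the laziness is observable. */
    static final class CountingSource implements DocValuesSource {
        private final int[] ords;
        private final long[] timestamps;
        private final String[] tsidTerms;
        private final int[] hashes;
        int fetches = 0;

        CountingSource(int[] ords, long[] timestamps, String[] tsidTerms, int[] hashes) {
            this.ords = ords;
            this.timestamps = timestamps;
            this.tsidTerms = tsidTerms;
            this.hashes = hashes;
        }

        @Override public int tsidOrd(int docID) { fetches++; return ords[docID]; }
        @Override public long timestamp(int docID) { fetches++; return timestamps[docID]; }
        @Override public String tsidTerm(int tsidOrd) { fetches++; return tsidTerms[tsidOrd]; }
        @Override public int routingHash(int docID) { fetches++; return hashes[docID]; }
    }

    /** State kept after a seek: the doc ID, plus ordinal and timestamp only if already read. */
    static final class SeekState {
        private final DocValuesSource source;
        private int docID = -1;
        private boolean hasOrdAndTimestamp = false;
        private int tsidOrd;
        private long timestamp;

        SeekState(DocValuesSource source) {
            this.source = source;
        }

        /** Seek that only resolved a doc ID: remember nothing else. */
        void positionOn(int docID) {
            this.docID = docID;
            this.hasOrdAndTimestamp = false;
        }

        /** Seek that already read the ordinal and timestamp: keep them for reuse. */
        void positionOn(int docID, int tsidOrd, long timestamp) {
            this.docID = docID;
            this.tsidOrd = tsidOrd;
            this.timestamp = timestamp;
            this.hasOrdAndTimestamp = true;
        }

        /** Builds the term on demand; the "tsid/timestamp/hash" layout is made up. */
        String term() {
            if (hasOrdAndTimestamp == false) {
                tsidOrd = source.tsidOrd(docID);
                timestamp = source.timestamp(docID);
                hasOrdAndTimestamp = true;
            }
            return source.tsidTerm(tsidOrd) + "/" + timestamp + "/" + source.routingHash(docID);
        }
    }

    public static void main(String[] args) {
        CountingSource source = new CountingSource(
            new int[] {0, 0, 1},
            new long[] {1000L, 2000L, 1500L},
            new String[] {"tsid-a", "tsid-b"},
            new int[] {7, 8, 9}
        );
        SeekState state = new SeekState(source);

        state.positionOn(1); // seek only: no lookups performed
        System.out.println("fetches after seek: " + source.fetches);
        System.out.println(state.term() + " fetches: " + source.fetches);
    }
}
```

In this sketch a seek that is never followed by `term()` costs no lookups at all, and a seek that already read the ordinal and timestamp only pays for the two remaining lookups when the term is built.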

@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Collaborator

Hi @tlrx, I've created a changelog YAML for you.

@tlrx tlrx requested a review from fcofdez December 4, 2025 12:41
Contributor

@fcofdez fcofdez left a comment

LGTM. Nice improvement 👍

long docTimestamp;
firstDocID = docValues.skipDocIDForTimestamp(timestamp, firstDocID);
if (firstDocID != DocIdSetIterator.NO_MORE_DOCS) {
    int nextDocID = firstDocID;
Contributor

Nice improvement to the variable names.

@tlrx tlrx merged commit 70845e3 into elastic:main Dec 5, 2025
34 checks passed
@tlrx tlrx deleted the 2025/12/04/minimize-doc-values-in-seeks branch December 5, 2025 11:50

tlrx commented Dec 5, 2025

Thanks Francisco!

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Dec 8, 2025
…#139053)

When seeking for a synthetic _id term in TSDBSyntheticIdFieldsProducer
we populate all the information required to build a SyntheticTerm instance,
even if the term is never retrieved after seeking.

Instead we can only track the current document ID and eventually the _tsid
ordinal and timestamp values if those were fetched for seeking. The rest of
the information (_tsid term, routing hash term) can be fetched on demand
to build the synthetic _id term.
# Conflicts:
#	server/src/main/java/org/elasticsearch/index/codec/tsdb/TSDBSyntheticIdFieldsProducer.java