5 changes: 5 additions & 0 deletions docs/changelog/138568.yaml
@@ -0,0 +1,5 @@
pr: 138568
summary: Use doc values skipper for `_tsid` in synthetic `_id` postings
area: TSDB
type: enhancement
issues: []
@@ -280,22 +280,44 @@ private BytesRef lookupTsIdOrd(int tsIdOrdinal) throws IOException {
     }
 
     /**
-     * Scan all documents to find the first document that has a _tsid equal or greater than the provided _tsid ordinal, returning its
-     * document ID. If no document is found, the method returns {@link DocIdSetIterator#NO_MORE_DOCS}.
+     * Use a doc values skipper to find a starting document ID for the provided _tsid ordinal. The returned document ID might have the
+     * exact _tsid ordinal provided, or a lower one.
      *
-     * Warning: This method is very slow because it potentially scans all documents in the segment.
+     * @param tsIdOrd the _tsid ordinal
+     * @return a docID to start scanning documents from in order to find the first document ID matching the provided _tsid
+     * @throws IOException if any I/O exception occurs
      */
-    private int slowScanToFirstDocWithTsIdOrdinalEqualOrGreaterThan(int tsIdOrd) throws IOException {
+    private int findStartDocIDForTsIdOrd(int tsIdOrd) throws IOException {
+        var skipper = docValuesProducer.getSkipper(tsIdFieldInfo);
+        assert skipper != null;
+        if (skipper.minValue() > tsIdOrd || tsIdOrd > skipper.maxValue()) {
+            return DocIdSetIterator.NO_MORE_DOCS;
+        }
+        skipper.advance(tsIdOrd, Long.MAX_VALUE);
+        return skipper.minDocID(0);
@fcofdez (Contributor), Nov 25, 2025

Just for my understanding: if we don't find the tsIdOrd at level 0, will this return NO_MORE_DOCS? I might be missing something here.

Member Author

If the ordinal is not in the first level-0 interval, advance() skips forward, using the higher levels to skip larger ranges when they are also out of range, until it finds an interval that can include the ordinal or exhausts the iterator. In the latter case, the Javadoc indicates that minDocID returns NO_MORE_DOCS.

Member Author

My understanding is that a DocValuesSkipper is a kind of skip list on top of blocks of doc values.

If that helps, here is a representation of such skipper levels:

minValue: 0, maxValue: 0, [minDocID: 0, maxDocID: 31], docCount: 32, level: 0/3
minValue: 0, maxValue: 7, [minDocID: 0, maxDocID: 718], docCount: 719,  level: 1/3
minValue: 0, maxValue: 64, [minDocID: 0, maxDocID: 4732], docCount: 4733,  level: 2/3
minValue: 0, maxValue: 0, [minDocID: -1, maxDocID: -1], docCount: 0,  level: 3/3
minValue: 1, maxValue: 1, [minDocID: 32, maxDocID: 178], docCount: 147, level: 0/3
minValue: 0, maxValue: 7, [minDocID: 0, maxDocID: 718], docCount: 719,  level: 1/3
minValue: 0, maxValue: 64, [minDocID: 0, maxDocID: 4732], docCount: 4733,  level: 2/3
minValue: 0, maxValue: 0, [minDocID: -1, maxDocID: -1], docCount: 0,  level: 3/3
minValue: 2, maxValue: 2, [minDocID: 179, maxDocID: 269], docCount: 91, level: 0/3
minValue: 0, maxValue: 7, [minDocID: 0, maxDocID: 718], docCount: 719,  level: 1/3
minValue: 0, maxValue: 64, [minDocID: 0, maxDocID: 4732], docCount: 4733,  level: 2/3
minValue: 0, maxValue: 0, [minDocID: -1, maxDocID: -1], docCount: 0,  level: 3/3
...
minValue: 8, maxValue: 8, [minDocID: 719, maxDocID: 765], docCount: 47, level: 0/3
minValue: 8, maxValue: 15, [minDocID: 719, maxDocID: 1440], docCount: 722,  level: 1/3
minValue: 0, maxValue: 64, [minDocID: 0, maxDocID: 4732], docCount: 4733,  level: 2/3
minValue: 0, maxValue: 0, [minDocID: -1, maxDocID: -1], docCount: 0,  level: 3/3
...

For example, when looking for tsIdOrd == 9, the advance(min, max) method executes the following steps:

  • the first level-0 interval, minValue: 0, maxValue: 0, [minDocID: 0, maxDocID: 31], has max value 0 < 9, so we can skip to maxDocID + 1 = 32
  • before moving, we check whether we can skip even more docs: level 1, minValue: 0, maxValue: 7, [minDocID: 0, maxDocID: 718], also has max value 7 < 9, so we can in fact skip to maxDocID + 1 = 718 + 1 = 719
  • level 2 has max value 64 >= 9, so we cannot skip further
  • we advance the iterator to 719
  • our new level-0 interval is minValue: 8, maxValue: 8, [minDocID: 719, maxDocID: 765]; with max value 8 < 9 we can skip all docs up to 765, i.e. to 765 + 1 = 766
  • again we check whether level 1 lets us skip more; it is now minValue: 8, maxValue: 15, [minDocID: 719, maxDocID: 1440], and its max value 15 >= 9 means we cannot skip further, so tsIdOrd == 9 lies within doc IDs [766, 1440]
  • we advance the iterator to 766, and the while loop ends with minDocID(0) == 766

I hope it helps. It took me some time to understand all of this 🫠
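The walkthrough above can be replayed with a small, self-contained model. This is a sketch, not Lucene's `DocValuesSkipper`: `ToySkipper`, `Interval`, `exampleSkipper` and `startDoc` are hypothetical names, and the `advance(long, long)` loop simply follows the steps described in this thread (skip the current level-0 interval while it cannot hold a value in the range, widening the skip with coarser levels that are also out of range). Interval boundaries come from the level dump where they are visible; the ones hidden behind `...` are made-up placeholders, and the empty level 3 is left out.

```java
import java.util.List;

public class SkipperAdvanceSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    // Doc range [minDoc, maxDoc] whose values all lie in [minValue, maxValue].
    record Interval(int minDoc, int maxDoc, long minValue, long maxValue) {}

    // Hypothetical toy skipper: levels.get(0) is the finest level, higher levels are coarser.
    static final class ToySkipper {
        private final List<List<Interval>> levels;
        private int doc = -1; // -1 until advance(int) is called for the first time

        ToySkipper(List<List<Interval>> levels) {
            this.levels = levels;
        }

        long minValue() { // global minimum over all documents
            return levels.get(0).stream().mapToLong(Interval::minValue).min().orElseThrow();
        }
        long maxValue() { // global maximum over all documents
            return levels.get(0).stream().mapToLong(Interval::maxValue).max().orElseThrow();
        }

        // Moves to the first doc >= target covered by a level-0 interval, or to NO_MORE_DOCS.
        void advance(int target) {
            for (Interval i : levels.get(0)) {
                if (i.maxDoc() >= target) {
                    doc = Math.max(target, i.minDoc());
                    return;
                }
            }
            doc = NO_MORE_DOCS;
        }

        // Skips level-0 intervals that cannot hold a value in [minValue, maxValue],
        // widening each skip with coarser levels that are out of range as well.
        void advance(long minValue, long maxValue) {
            if (doc == -1) {
                advance(0);
            }
            while (doc != NO_MORE_DOCS && disjoint(0, minValue, maxValue)) {
                int maxDocID = at(0).maxDoc();
                int level = 1;
                while (level < levels.size() && disjoint(level, minValue, maxValue)) {
                    maxDocID = at(level).maxDoc();
                    level++;
                }
                advance(maxDocID + 1);
            }
        }

        int minDocID(int level) {
            return (doc == -1 || doc == NO_MORE_DOCS) ? doc : at(level).minDoc();
        }
        private boolean disjoint(int level, long minValue, long maxValue) {
            Interval i = at(level);
            return i.minValue() > maxValue || i.maxValue() < minValue;
        }
        // The interval on the given level that contains the current doc.
        private Interval at(int level) {
            for (Interval i : levels.get(level)) {
                if (i.minDoc() <= doc && doc <= i.maxDoc()) {
                    return i;
                }
            }
            throw new IllegalStateException("level " + level + " does not cover doc " + doc);
        }
    }

    // Visible intervals come from the level dump; the "placeholder" ones are invented to fill the "..." gaps.
    static ToySkipper exampleSkipper() {
        List<Interval> level0 = List.of(
                new Interval(0, 31, 0, 0), new Interval(32, 178, 1, 1), new Interval(179, 269, 2, 2),
                new Interval(270, 718, 3, 7), // placeholder
                new Interval(719, 765, 8, 8),
                new Interval(766, 800, 9, 9), new Interval(801, 1440, 10, 15), // placeholders
                new Interval(1441, 4732, 16, 64)); // placeholder
        List<Interval> level1 = List.of(
                new Interval(0, 718, 0, 7), new Interval(719, 1440, 8, 15),
                new Interval(1441, 4732, 16, 64)); // placeholder
        List<Interval> level2 = List.of(new Interval(0, 4732, 0, 64));
        return new ToySkipper(List.of(level0, level1, level2));
    }

    // Mirrors findStartDocIDForTsIdOrd: global range check, advance(ord, Long.MAX_VALUE), then minDocID(0).
    static int startDoc(long ord) {
        ToySkipper skipper = exampleSkipper();
        if (skipper.minValue() > ord || ord > skipper.maxValue()) {
            return NO_MORE_DOCS;
        }
        skipper.advance(ord, Long.MAX_VALUE);
        return skipper.minDocID(0);
    }

    public static void main(String[] args) {
        System.out.println(startDoc(9)); // 766, as in the walkthrough
        System.out.println(startDoc(8)); // 719
    }
}
```

With these intervals, a range that no level can satisfy, such as `advance(100, 200)`, keeps skipping until it moves past the last interval, after which `minDocID(0)` returns `NO_MORE_DOCS`. That matches the answer to the question about what happens when the ordinal is not found at level 0.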

Contributor

Thanks for the detailed explanation, this makes sense 👍

+    }
+
+    /**
+     * Find the first document that has a _tsid equal or greater than the provided _tsid ordinal, returning its document ID. If no
+     * document is found, the method returns {@link DocIdSetIterator#NO_MORE_DOCS}.
+     *
+     * Warning: This method can be slow because it potentially scans many documents in the segment.
+     */
+    private int findFirstDocWithTsIdOrdinalEqualOrGreaterThan(int tsIdOrd) throws IOException {
+        final int startDocId = findStartDocIDForTsIdOrd(tsIdOrd);
+        if (startDocId == DocIdSetIterator.NO_MORE_DOCS) {
+            return startDocId;
+        }
         // recreate even if doc values are already on the same ordinal, to ensure the method returns the first doc
-        if (tsIdDocValues == null || (cachedTsIdOrd != -1 && cachedTsIdOrd >= tsIdOrd)) {
+        if (tsIdDocValues == null || (cachedTsIdOrd != -1 && cachedTsIdOrd >= tsIdOrd) || tsIdDocValues.docID() > startDocId) {
             tsIdDocValues = docValuesProducer.getSorted(tsIdFieldInfo);
             cachedTsIdOrd = -1;
             cachedTsId = null;
         }
         assert 0 <= tsIdOrd : tsIdOrd;
         assert tsIdOrd < tsIdDocValues.getValueCount() : tsIdOrd;
 
-        for (int docID = 0; docID != DocIdSetIterator.NO_MORE_DOCS; docID = tsIdDocValues.nextDoc()) {
+        for (int docID = startDocId; docID != DocIdSetIterator.NO_MORE_DOCS; docID = tsIdDocValues.nextDoc()) {
             boolean found = tsIdDocValues.advanceExact(docID);
             assert found : "No value found for field [" + tsIdFieldInfo.getName() + " and docID " + docID;
             var ord = tsIdDocValues.ordValue();
@@ -313,22 +335,25 @@ private int slowScanToFirstDocWithTsIdOrdinalEqualOrGreaterThan(int tsIdOrd) thr
     }
 
     /**
-     * Scan all documents to find the first document that has a _tsid equal to the provided _tsid ordinal, returning its
-     * document ID. If no document is found, the method returns {@link DocIdSetIterator#NO_MORE_DOCS}.
+     * Find the first document that has a _tsid equal to the provided _tsid ordinal, returning its document ID. If no document is found,
+     * the method returns {@link DocIdSetIterator#NO_MORE_DOCS}.
      *
-     * Warning: This method is very slow because it potentially scans all documents in the segment.
+     * Warning: This method can be slow because it potentially scans many documents in the segment.
      */
-    private int slowScanToFirstDocWithTsIdOrdinalEqualTo(int tsIdOrd) throws IOException {
+    private int findFirstDocWithTsIdOrdinalEqualTo(int tsIdOrd) throws IOException {
+        final int startDocId = findStartDocIDForTsIdOrd(tsIdOrd);
+        assert startDocId != DocIdSetIterator.NO_MORE_DOCS : startDocId;
+
         // recreate even if doc values are already on the same ordinal, to ensure the method returns the first doc
-        if (tsIdDocValues == null || (cachedTsIdOrd != -1 && cachedTsIdOrd >= tsIdOrd)) {
+        if (tsIdDocValues == null || (cachedTsIdOrd != -1 && cachedTsIdOrd >= tsIdOrd) || tsIdDocValues.docID() > startDocId) {
             tsIdDocValues = docValuesProducer.getSorted(tsIdFieldInfo);
             cachedTsIdOrd = -1;
             cachedTsId = null;
         }
         assert 0 <= tsIdOrd : tsIdOrd;
         assert tsIdOrd < tsIdDocValues.getValueCount() : tsIdOrd;
 
-        for (int docID = 0; docID != DocIdSetIterator.NO_MORE_DOCS; docID = tsIdDocValues.nextDoc()) {
+        for (int docID = startDocId; docID != DocIdSetIterator.NO_MORE_DOCS; docID = tsIdDocValues.nextDoc()) {
             boolean found = tsIdDocValues.advanceExact(docID);
             assert found : "No value found for field [" + tsIdFieldInfo.getName() + " and docID " + docID;
             var ord = tsIdDocValues.ordValue();
@@ -441,7 +466,7 @@ public SeekStatus seekCeil(BytesRef id) throws IOException {
         tsIdOrd = -tsIdOrd - 1;
         // set the terms enum on the first non-matching document
         if (tsIdOrd < docValues.getTsIdValueCount()) {
-            int docID = docValues.slowScanToFirstDocWithTsIdOrdinalEqualOrGreaterThan(tsIdOrd);
+            int docID = docValues.findFirstDocWithTsIdOrdinalEqualOrGreaterThan(tsIdOrd);
             if (docID != DocIdSetIterator.NO_MORE_DOCS) {
                 current = new SyntheticTerm(
                     docID,
@@ -461,8 +486,8 @@ public SeekStatus seekCeil(BytesRef id) throws IOException {
         // _tsid found, extract the timestamp
         final long timestamp = TsidExtractingIdFieldMapper.extractTimestampFromSyntheticId(id);
 
-        // Slow scan to the first document matching the _tsid
-        final int startDocID = docValues.slowScanToFirstDocWithTsIdOrdinalEqualTo(tsIdOrd);
+        // Find the first document matching the _tsid
+        final int startDocID = docValues.findFirstDocWithTsIdOrdinalEqualTo(tsIdOrd);
         assert startDocID >= 0 : startDocID;
 
         int docID = startDocID;
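Once a start doc is known, both `findFirstDoc...` methods scan forward from it instead of from doc 0. The sketch below models that scan under stated assumptions: documents are sorted by `_tsid`, so ordinals never decrease as the doc ID grows, and `OrdCursor` is a hypothetical forward-only stand-in for the `SortedDocValues` iterator (`cursorFor`, `firstDocWithOrdAtLeast` and `firstDocWithOrdEqualTo` are illustrative names too). It also illustrates why the new recreate condition checks `tsIdDocValues.docID() > startDocId`: a forward-only iterator that has already moved past the start doc cannot go back to it. The real loop bodies are collapsed in this diff, so this models the documented behavior rather than reproducing their code.

```java
public class TsIdScanSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Hypothetical forward-only cursor over per-document _tsid ordinals (a stand-in for SortedDocValues).
    static final class OrdCursor {
        private final int[] ords; // ords[doc], assumed non-decreasing because documents are sorted by _tsid
        private int doc = -1;

        OrdCursor(int[] ords) {
            this.ords = ords;
        }

        int docID() {
            return doc;
        }

        // Positions on target; like doc values iterators, it cannot move backwards.
        boolean advanceExact(int target) {
            if (target < doc) {
                throw new IllegalStateException("cannot move back from doc " + doc + " to doc " + target);
            }
            doc = target;
            return target < ords.length;
        }

        int ordValue() {
            return ords[doc];
        }
    }

    // Reuse the cursor only if it has not already moved past startDoc.
    static OrdCursor cursorFor(OrdCursor existing, int[] ords, int startDoc) {
        return (existing == null || existing.docID() > startDoc) ? new OrdCursor(ords) : existing;
    }

    // First doc >= startDoc whose ordinal is >= target, or NO_MORE_DOCS.
    static int firstDocWithOrdAtLeast(OrdCursor cursor, int startDoc, int target) {
        if (startDoc == NO_MORE_DOCS) {
            return NO_MORE_DOCS;
        }
        for (int doc = startDoc; cursor.advanceExact(doc); doc++) {
            if (cursor.ordValue() >= target) {
                return doc;
            }
        }
        return NO_MORE_DOCS;
    }

    // First doc >= startDoc whose ordinal equals target, or NO_MORE_DOCS.
    static int firstDocWithOrdEqualTo(OrdCursor cursor, int startDoc, int target) {
        int doc = firstDocWithOrdAtLeast(cursor, startDoc, target);
        return (doc != NO_MORE_DOCS && cursor.ordValue() == target) ? doc : NO_MORE_DOCS;
    }

    public static void main(String[] args) {
        int[] ords = {0, 0, 1, 1, 1, 3, 3, 4};
        OrdCursor cursor = cursorFor(null, ords, 2);
        System.out.println(firstDocWithOrdAtLeast(cursor, 2, 3)); // 5
        // The cursor now sits on doc 5, so a lookup starting at doc 2 needs a fresh one.
        cursor = cursorFor(cursor, ords, 2);
        System.out.println(firstDocWithOrdEqualTo(cursor, 2, 1)); // 2
        cursor = cursorFor(cursor, ords, 2);
        System.out.println(firstDocWithOrdEqualTo(cursor, 2, 2)); // 2147483647 (NO_MORE_DOCS)
    }
}
```

Here `startDoc` stands in for the value `findStartDocIDForTsIdOrd` would return; passing `startDoc = 0` reproduces the previous behavior of scanning from the beginning of the segment.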
@@ -75,6 +75,7 @@ public static ParsedDocument noopTombstone(SeqNoFieldMapper.SeqNoIndexOptions se
      * The returned document consists only _uid, _seqno, _term and _version fields; other metadata fields are excluded.
      * @param id the id of the deleted document
      */
+    // used by tests
     public static ParsedDocument deleteTombstone(SeqNoFieldMapper.SeqNoIndexOptions seqNoIndexOptions, String id) {
         return deleteTombstone(seqNoIndexOptions, false /* ignored */, false, id, null /* ignored */);
     }
@@ -101,6 +102,9 @@ public static ParsedDocument deleteTombstone(
         // Use a synthetic _id field which is not indexed nor stored
         document.add(IdFieldMapper.syntheticIdField(id));
 
+        // Add doc values fields that are used to synthesize the synthetic _id.
+        // Note: It is not strictly required for tombstones documents but we decided to add them so that iterating and seeking synthetic
+        // _id terms over tombstones also work as if a regular _id field was present.
         var timeSeriesId = TsidExtractingIdFieldMapper.extractTimeSeriesIdFromSyntheticId(uid);
         var timestamp = TsidExtractingIdFieldMapper.extractTimestampFromSyntheticId(uid);
         var routingHash = TsidExtractingIdFieldMapper.extractRoutingHashBytesFromSyntheticId(uid);