Use doc values skipper for _tsid in synthetic _id postings#138568
tlrx merged 9 commits into elastic:main
Relates ES-13604
Pinging @elastic/es-storage-engine (Team:StorageEngine)
Hi @tlrx, I've created a changelog YAML for you.
server/src/main/java/org/elasticsearch/index/codec/tsdb/TSDBSyntheticIdFieldsProducer.java
// _id terms over tombstones also work as if a regular _id field was present.
document.add(SortedDocValuesField.indexedField(TimeSeriesIdFieldMapper.NAME, extractTimeSeriesIdFromSyntheticId(uid)));
document.add(SortedNumericDocValuesField.indexedField("@timestamp", extractTimestampFromSyntheticId(uid)));
document.add(new SortedDocValuesField(TimeSeriesRoutingHashFieldMapper.NAME, extractRoutingHashBytesFromSyntheticId(uid)));
Does this need to be gated by an IndexVersion?
I don't think it needs to be, but it's better to be safe indeed. So I reverted the change which uses the USE_DOC_VALUES_SKIPPER index setting.
    }
    skipper.advance(maxDocID + 1);
}
return skipper.minDocID(0);
Just for my understanding, if we don't find the tsIdOrd at level 0, this will return NO_MORE_DOCS? I think that I might be missing something here.
If the ordinal is not found in the first level 0, then it skips to the next levels until it finds a level that includes the ordinal or exhausts the iterator, in which case the Javadoc indicates that minDocID returns NO_MORE_DOCS.
My understanding is that a DocValuesSkipper is kind of a skip list on top of doc values blocks of data.
If that helps, here is a representation of such skipper levels:
minValue: 0, maxValue: 0, [minDocID: 0, maxDocID: 31], docCount: 32, level: 0/3
minValue: 0, maxValue: 7, [minDocID: 0, maxDocID: 718], docCount: 719, level: 1/3
minValue: 0, maxValue: 64, [minDocID: 0, maxDocID: 4732], docCount: 4733, level: 2/3
minValue: 0, maxValue: 0, [minDocID: -1, maxDocID: -1], docCount: 0, level: 3/3
minValue: 1, maxValue: 1, [minDocID: 32, maxDocID: 178], docCount: 147, level: 0/3
minValue: 0, maxValue: 7, [minDocID: 0, maxDocID: 718], docCount: 719, level: 1/3
minValue: 0, maxValue: 64, [minDocID: 0, maxDocID: 4732], docCount: 4733, level: 2/3
minValue: 0, maxValue: 0, [minDocID: -1, maxDocID: -1], docCount: 0, level: 3/3
minValue: 2, maxValue: 2, [minDocID: 179, maxDocID: 269], docCount: 91, level: 0/3
minValue: 0, maxValue: 7, [minDocID: 0, maxDocID: 718], docCount: 719, level: 1/3
minValue: 0, maxValue: 64, [minDocID: 0, maxDocID: 4732], docCount: 4733, level: 2/3
minValue: 0, maxValue: 0, [minDocID: -1, maxDocID: -1], docCount: 0, level: 3/3
...
minValue: 8, maxValue: 8, [minDocID: 719, maxDocID: 765], docCount: 47, level: 0/3
minValue: 8, maxValue: 15, [minDocID: 719, maxDocID: 1440], docCount: 722, level: 1/3
minValue: 0, maxValue: 64, [minDocID: 0, maxDocID: 4732], docCount: 4733, level: 2/3
minValue: 0, maxValue: 0, [minDocID: -1, maxDocID: -1], docCount: 0, level: 3/3
...
For example, when looking for tsIdOrd == 9 the advance(min, max) method executes:
- the first level 0, minValue: 0, maxValue: 0, [minDocID: 0, maxDocID: 31], has max value 0 below 9 so we can skip to maxDocID + 1 = 32
- while there we can check if we can skip even more docs, so we look up the next level 1, which is minValue: 0, maxValue: 7, [minDocID: 0, maxDocID: 718] and also has a max value 7 < 9, so we can in fact skip to maxDocID + 1 = 718 + 1 = 719
- next level 2 has a max value of 64 so we cannot skip more
- we advance the iterator to 719
- our new level 0 is now minValue: 8, maxValue: 8, [minDocID: 719, maxDocID: 765]; with a max value of 8 we can skip all docs until 765 + 1
- while there we check if we can skip more in the next level 1, which is minValue: 8, maxValue: 15, [minDocID: 719, maxDocID: 1440] and has a max value of 15, so tsIdOrd == 9 is between doc IDs [766, 1440]
- the while loop ends with minDocID(0) == 766
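The walk above can be reproduced with a small, self-contained sketch. This is not the code from this PR and it does not use Lucene's DocValuesSkipper: ToySkipper, Interval, example() and firstCandidateDoc are hypothetical names, only two levels are modelled, and the level-0 intervals [270, 718] and [766, 800] are invented to fill the parts elided by "..." in the dump. The loop only mirrors the steps described in the list.

```java
public class SkipperWalkSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** One skipper interval: the min/max ordinal seen in a range of doc IDs. */
    static final class Interval {
        final long minValue, maxValue;
        final int minDocID, maxDocID;
        Interval(long minValue, long maxValue, int minDocID, int maxDocID) {
            this.minValue = minValue; this.maxValue = maxValue;
            this.minDocID = minDocID; this.maxDocID = maxDocID;
        }
    }

    /** Toy two-level skipper over in-memory intervals (hypothetical, not Lucene's class). */
    static final class ToySkipper {
        private final Interval[] level0;
        private final Interval[] level1;
        private int current = 0; // index into level0; level0.length means exhausted

        ToySkipper(Interval[] level0, Interval[] level1) {
            this.level0 = level0;
            this.level1 = level1;
        }

        int numLevels() { return 2; }

        /** Moves forward to the first level-0 interval that ends at or after target. */
        void advance(int target) {
            while (current < level0.length && level0[current].maxDocID < target) {
                current++;
            }
        }

        /** Interval of the given level covering the current position, or null if exhausted. */
        private Interval at(int level) {
            if (current >= level0.length) return null;
            Interval base = level0[current];
            if (level == 0) return base;
            for (Interval parent : level1) {
                if (parent.minDocID <= base.minDocID && base.maxDocID <= parent.maxDocID) return parent;
            }
            throw new IllegalStateException("no level-1 interval covers doc " + base.minDocID);
        }

        int minDocID(int level) { Interval i = at(level); return i == null ? NO_MORE_DOCS : i.minDocID; }
        int maxDocID(int level) { Interval i = at(level); return i == null ? NO_MORE_DOCS : i.maxDocID; }
        long maxValue(int level) { Interval i = at(level); return i == null ? Long.MAX_VALUE : i.maxValue; }
    }

    /** Skips every interval whose max ordinal is below tsIdOrd, using the widest level possible. */
    static int firstCandidateDoc(ToySkipper skipper, long tsIdOrd) {
        skipper.advance(0);
        while (skipper.minDocID(0) != NO_MORE_DOCS && skipper.maxValue(0) < tsIdOrd) {
            int maxDocID = skipper.maxDocID(0);
            // A higher level covers more docs; use it while its max ordinal is still too small.
            for (int level = 1; level < skipper.numLevels() && skipper.maxValue(level) < tsIdOrd; level++) {
                maxDocID = skipper.maxDocID(level);
            }
            skipper.advance(maxDocID + 1);
        }
        return skipper.minDocID(0);
    }

    /** Intervals from the dump above; the two marked entries are invented for illustration. */
    static ToySkipper example() {
        Interval[] level0 = {
            new Interval(0, 0, 0, 31),
            new Interval(1, 1, 32, 178),
            new Interval(2, 2, 179, 269),
            new Interval(3, 7, 270, 718),   // hypothetical: stands in for the elided intervals
            new Interval(8, 8, 719, 765),
            new Interval(9, 9, 766, 800),   // hypothetical: first interval containing ordinal 9
        };
        Interval[] level1 = {
            new Interval(0, 7, 0, 718),
            new Interval(8, 15, 719, 1440),
        };
        return new ToySkipper(level0, level1);
    }

    public static void main(String[] args) {
        System.out.println(firstCandidateDoc(example(), 9)); // prints 766
    }
}
```

With this toy data, an ordinal larger than every stored max value (for example 100) exhausts the skipper, so firstCandidateDoc returns NO_MORE_DOCS, which matches the behaviour discussed earlier in this thread.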
I hope it helps. It took me some time to understand all of this 🫠
Thanks for the detailed explanation, this makes sense 👍
assert skipper != null;

if (skipper.minValue() >= tsIdOrd) {
    skipper.advance(0);
If the skipper minValue is greater than your requested tsid then that means the tsid isn't present in the segment, so this should probably also return NO_MORE_DOCS? Or maybe trigger an assertion?
Thanks Francisco & Alan!
…ic#138568 Follow-up of elastic#138568. Relates ES-13604
Instead of scanning all documents to find the first document that has a _tsid less than, or equal to, a given ordinal, we can use a doc values skipper to skip as many documents as possible, and only then scan the remaining docs.
When seeking a synthetic _id, we look up the _tsid ordinal, then use the DV skipper to find a starting doc ID, then scan each doc to find the first doc ID matching the exact _tsid ordinal. Then we finally scan the remaining docs to find the one matching the timestamp.
I wonder if it also makes sense to use the DV skipper for the timestamp too?
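To make the two scan phases concrete, here is a minimal sketch over plain arrays. It is an illustration only, not the producer code from this PR: SyntheticIdSeekSketch and seek are hypothetical names, the routing hash is ignored, docs are assumed to be sorted by _tsid ordinal, and startDoc stands for whatever doc ID the skipper phase returned.

```java
public class SyntheticIdSeekSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /**
     * tsIdOrds[doc] and timestamps[doc] model the per-document doc values.
     * Assumes tsIdOrds is sorted in ascending order and startDoc is at or before
     * the first doc carrying tsIdOrd.
     */
    static int seek(int[] tsIdOrds, long[] timestamps, int startDoc, int tsIdOrd, long timestamp) {
        int doc = startDoc;
        // Scan forward until the ordinal is no longer below the target.
        while (doc < tsIdOrds.length && tsIdOrds[doc] < tsIdOrd) {
            doc++;
        }
        // Scan the run of docs sharing the exact ordinal, looking for the timestamp.
        while (doc < tsIdOrds.length && tsIdOrds[doc] == tsIdOrd) {
            if (timestamps[doc] == timestamp) {
                return doc;
            }
            doc++;
        }
        return NO_MORE_DOCS; // ordinal or timestamp not present
    }

    public static void main(String[] args) {
        int[] ords = {0, 0, 1, 1, 1, 2};
        long[] ts = {30, 10, 50, 40, 20, 60};
        System.out.println(seek(ords, ts, 0, 1, 40L)); // prints 3
    }
}
```

The second loop is the part the question about a timestamp skipper refers to: in this sketch it is still a linear scan over every doc that shares the ordinal.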
Relates ES-13604