Improve downsample performance by buffering docids and do bulk processing.#124477
Improve downsample performance by buffering docids and do bulk processing.#124477martijnvg merged 5 commits intoelastic:mainfrom
Conversation
|
Hi @martijnvg, I've created a changelog YAML for you. |
ff95cf7 to
0f70d99
Compare
|
The result without this change downsampling tsdb index to 1m interval buckets: The result without this change downsampling tsdb index to 1h interval buckets: The result with this change downsampling tsdb index to 1m interval buckets: The result with this change downsampling tsdb index to 1h interval buckets: |
0f70d99 to
04c909d
Compare
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
| } | ||
|
|
||
| void bulkCollection() throws IOException { | ||
| // The leaf bucked collectors newer timestamp go first, other we capture the incorrect last value for counters and labels. |
There was a problem hiding this comment.
The leaf bucket collectors with newer timestamp go first, to correctly capture the last value for counters and labels.
| if (docValuesCount == 1) { | ||
| label.collect(docValues.nextValue()); | ||
| } else { | ||
| var values = new Object[docValuesCount]; |
There was a problem hiding this comment.
How large can this be? Do we need to use bigarrays to track its memory?
There was a problem hiding this comment.
Seems like it's smaller than DOCID_BUFFER_SIZE.. Should be fine.
There was a problem hiding this comment.
It is for multivalued fields. If a document has a json array, then this is the size of the json array (actually the number of all unique values this document has for the field being read). Typically this is small.
Note that this has been this way also before this change. I don't recall issues around this.
| firstTimeStampForBulkCollection = aggCtx.getTimestamp(); | ||
| } | ||
| // buffer.add() always delegates to system.arraycopy() and checks buffer size for resizing purposes: | ||
| buffer.buffer[buffer.elementsCount++] = docId; |
There was a problem hiding this comment.
Nit: let's rename buffer to docIdBuffer, to avoid the buffer.buffer occurrence..
kkrik-es
left a comment
There was a problem hiding this comment.
LGTM, added Nhat too to play it safe.
No description provided.