Speed up sorts on secondary sort fields#137533
Conversation
… with the recorded bottom. When construction a new CompetitiveDISIBuilder, then check whether global min/max points or global min/max doc values skipper are comparative with the bottom. If so, then update competitiveIterator with an empty iterator, because no documents will have a value that is competitive with the current recorded bottom in the current segment. Doing this at CompetitiveDISIBuilder construction is cheap and allows to immediately prune, instead of waiting until doUpdateCompetitiveIterator(...) is invoked.
…g/sort-still-slow
|
Pinging @elastic/es-search (Team:Search) |
|
Hi @romseygeek, I've created a changelog YAML for you. |
…g/sort-still-slow
…g/sort-still-slow
…g/sort-still-slow
|
Buildkite benchmark this with elastic/logs please |
|
Buildkite benchmark this with elastic-logs please |
💚 Build Succeeded
This build ran two elastic-logs benchmarks to evaluate performance impact of this PR. Historycc @romseygeek |
martijnvg
left a comment
There was a problem hiding this comment.
Nice work 👍 . I left a few comments, but otherwise LGTM.
| * range, using DocValueSkippers on the primary sort field to advance rapidly | ||
| * to the next block of values. | ||
| */ | ||
| class SecondarySortIterator extends DocIdSetIterator { |
There was a problem hiding this comment.
Maybe add a unit test for this iterator? Maybe we can do a test duel against DocValuesRangeIterator wrapped in an interator?
| } | ||
| DocValuesSkipper skipper = context.reader().getDocValuesSkipper(field); | ||
| DocValuesSkipper primaryFieldSkipper = context.reader().getDocValuesSkipper(sortFields[0].getField()); | ||
| if (primaryFieldSkipper == null || skipper.docCount() != maxDoc || primaryFieldSkipper.docCount() != maxDoc) { |
There was a problem hiding this comment.
In the case primaryFieldSkipper is null, then the secondary sort field can be treated as primary sort and we can go faster (like in SortedNumericDocValuesRangeQuery#getDocIdSetIteratorOrNullForPrimarySort). But I don't think this happens now? Because super.buildCompetitiveDISIBuilder(context); will not detect this?
The same applies if primary sort field has just one value.
If this is true. Let's address this then in a followup?
| competitiveDISIBuilder = buildCompetitiveDISIBuilder(context); | ||
| } | ||
|
|
||
| protected CompetitiveDISIBuilder buildCompetitiveDISIBuilder(LeafReaderContext context) throws IOException { |
There was a problem hiding this comment.
This is the bit we want to contribute to upstream?
…g/sort-still-slow
This adds a competitive iterator implementation that will take advantage of doc value skippers in the case that: * the index is sorted by a low cardinality field like hostname, and then by a high cardinality field like timestamp * skippers are enabled on both of these fields * the query is sorted by the high cardinality field. To be able to plug this new implementation into the lucene sort architecture, we need to fork NumericComparator and some associated classes. LongValuesComparatorSource now returns the forked version with the new competitive iterator builder.
This adds a competitive iterator implementation that will take advantage of doc
value skippers in the case that:
field like timestamp
To be able to plug this new implementation into the lucene sort architecture, we need
to fork NumericComparator and some associated classes. LongValuesComparatorSource
now returns the forked version with the new competitive iterator builder.