Improve SingleValueMatchQuery performance #135714
* If a field is single-valued and dense, then rewrite to match all docs.
* Allow SingleValueMatchQuery to be cached if a field is single-valued, given that it will never emit a warning.
Pinging @elastic/es-analytical-engine (Team:Analytics)
Hi @martijnvg, I've created a changelog YAML for you.
dnhatn left a comment
@martijnvg I have one question, but these changes will be helpful! Thanks Martijn.
|| pointValues.size() != pointValues.getDocCount()) {
return super.rewrite(indexSearcher);
NumericDocValues singleton = DocValues.unwrapSingleton(reader.getSortedNumericDocValues(fieldData.getFieldName()));
if (singleton != null) {
I might be missing something, but I think checking if the singleton is not null is sufficient. We don't need to verify that all documents have values to return match_all.
This was my thinking as well; however, tests then started to fail. I think this is because the query is supposed to return only the docs with exactly one value. This is also in line with the logic that checks points and terms (detecting that the field is dense).
> I think this is because the query is supposed to return only the docs with exactly one value
Yes, if we have a singleton, we should be able to shortcut to match_all. I think there might be an issue with the test.
Nhat and I discussed in Slack:
The contract of SingleValueMatchQuery is that it filters out every doc that doesn't have exactly one value. If we were to check only whether the doc values are a singleton during query rewrite, that would break the contract for sparse fields (since doc IDs with zero values would also be matched).
The plan is to rewrite to a match-all query whenever the field's doc values are a singleton in a follow-up change, given that this changes the contract of SingleValueMatchQuery. That shouldn't be an issue for ES|QL. The contract would then be to exclude doc IDs with more than one value.
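The difference between the two contracts can be illustrated with a small, self-contained sketch. It does not use Lucene or Elasticsearch: the class `SingleValueContractSketch` and all of its methods are hypothetical names, and each document is modelled only by how many values it has for the field. Under those assumptions, it shows why a singleton check alone is not enough to rewrite to match-all while the query must still exclude docs with zero values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; valueCounts[doc] is the number of values doc has for the field.
public class SingleValueContractSketch {

    // Current contract: match only docs that have exactly one value.
    static List<Integer> matchExactlyOne(int[] valueCounts) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < valueCounts.length; doc++) {
            if (valueCounts[doc] == 1) {
                hits.add(doc);
            }
        }
        return hits;
    }

    // Stand-in for "doc values unwrap to a singleton": no doc has more than one value.
    static boolean isSingleton(int[] valueCounts) {
        for (int count : valueCounts) {
            if (count > 1) {
                return false;
            }
        }
        return true;
    }

    // Stand-in for "the field is dense": every doc has at least one value.
    static boolean isDense(int[] valueCounts) {
        for (int count : valueCounts) {
            if (count < 1) {
                return false;
            }
        }
        return true;
    }

    // Under the current contract, match-all is only equivalent when both checks hold.
    static boolean canRewriteToMatchAll(int[] valueCounts) {
        return isSingleton(valueCounts) && isDense(valueCounts);
    }

    public static void main(String[] args) {
        int[] sparse = {1, 0, 1}; // doc 1 has no value
        int[] dense = {1, 1, 1};  // every doc has exactly one value
        System.out.println("sparse: singleton=" + isSingleton(sparse)
            + ", rewrite=" + canRewriteToMatchAll(sparse)
            + ", hits=" + matchExactlyOne(sparse));
        System.out.println("dense: singleton=" + isSingleton(dense)
            + ", rewrite=" + canRewriteToMatchAll(dense)
            + ", hits=" + matchExactlyOne(dense));
    }
}
```

For the sparse input the singleton check passes, yet match-all would also return doc 1, which has no value. That is the contract break described above, and it goes away only if the contract is relaxed to "exclude docs with more than one value".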
Changes:
docIDRunEnd() method. This method will return maxDoc if the field is dense.
Note that the first change also allows WHERE clauses for single-valued fields to be cached again.
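As a rough illustration of why returning maxDoc from a run-end method helps for dense fields, here is a toy model in plain Java. It is not the PR's code and does not use Lucene: `DenseRunSketch`, `DenseIterator`, and `countMatches` are hypothetical names, and the sketch assumes the run end is an exclusive upper bound on a run of consecutive matching doc IDs.

```java
// Hypothetical toy model of an iterator over a dense, single-valued field.
public class DenseRunSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    static final class DenseIterator {
        private final int maxDoc;
        private int doc = -1;

        DenseIterator(int maxDoc) {
            this.maxDoc = maxDoc;
        }

        int nextDoc() {
            return advance(doc + 1);
        }

        // Moves to target if it is a valid doc ID, otherwise exhausts the iterator.
        int advance(int target) {
            doc = target < maxDoc ? target : NO_MORE_DOCS;
            return doc;
        }

        // Assumed semantics: exclusive end of the run of consecutive matches
        // containing the current doc. For a dense field every doc matches,
        // so the run extends to maxDoc.
        int docIDRunEnd() {
            return maxDoc;
        }
    }

    // Counts matches by consuming whole runs instead of visiting each doc.
    static int countMatches(DenseIterator it) {
        int count = 0;
        int doc = it.nextDoc();
        while (doc != NO_MORE_DOCS) {
            int end = it.docIDRunEnd();
            count += end - doc;    // every doc in [doc, end) matches
            doc = it.advance(end); // skip past the run in one step
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches(new DenseIterator(5)));
    }
}
```

In this model a consumer handles the whole segment in a single loop iteration, which is the kind of shortcut a dense field makes possible.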
Running the following without this change:
Running the following with the change: