Skip to content

Improve SingleValueMatchQuery performance#135714

Merged
martijnvg merged 6 commits intoelastic:mainfrom
martijnvg:SingleValueMatchQuery_tweaks_2
Oct 3, 2025
Merged

Improve SingleValueMatchQuery performance#135714
martijnvg merged 6 commits intoelastic:mainfrom
martijnvg:SingleValueMatchQuery_tweaks_2

Conversation

@martijnvg
Copy link
Member

@martijnvg martijnvg commented Sep 30, 2025

Changes:

  1. If a field is single-valued and dense, then rewrite to match all docs. This logic already exists for when a field has points or inverted index. The logic can now also work with doc values only using the docIDRunEnd() method. This method will return maxDoc if field is dense.
  2. Allow SingleValueMatchQuery to be cached if a field is single-valued, given that it will never emit a warning.

Note that the first change also allows WHERE clauses for single valued fields to be cached again.

Running the following without this change:

repeat 50000 curl -s -k -XPOST -H "Content-Type: application/json" -d '{"query": "FROM hits | WHERE AdvEngineID == 0 | STATS count = COUNT(*)"}' "http://localhost:9200/_query"  | jq .took

945
778
739
720
892
524
521
517
517
520
517
518
520
517
517
521
517
519

Running the following with the change:

repeat 50000 curl -s -k -XPOST -H "Content-Type: application/json" -d '{"query": "FROM hits | WHERE AdvEngineID == 0 | STATS count = COUNT(*)"}' "http://localhost:9200/_query"  | jq .took

432
249
247
247
377
5
5
5
5
4
5
4
4
4
4
4
4
4
4
4
4
4
4
* If a field is single-valued and dense, then rewrite to match all docs.
* Allow SingleValueMatchQuery to be cached if a field is single-valued, given that it will never emit a warning.
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.2.0 labels Sep 30, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @martijnvg, I've created a changelog YAML for you.

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! I don't know the lucene bits well enough these days to approve it, but I could reread if you need it. Otherwise, maybe @iverase can look too?

@martijnvg martijnvg requested a review from iverase October 1, 2025 07:08
Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martijnvg I have one question, but these changes will be helpful! Thanks Martijn.

|| pointValues.size() != pointValues.getDocCount()) {
return super.rewrite(indexSearcher);
NumericDocValues singleton = DocValues.unwrapSingleton(reader.getSortedNumericDocValues(fieldData.getFieldName()));
if (singleton != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something, but I think checking if the singleton is not null is sufficient. We don't need to verify that all documents have values to return match_all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my thinking as well, however then tests started to fail. I think this is because the query is supposed to returns only the docs with exactly one value. Also this is inline with the logic that checks points and terms (detecting that field is dense).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is because the query is supposed to returns only the docs with exactly one value

Yes, if we have a singleton, we should be able to shortcut to match_all. I think there might be an issue with the test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nhat and I discussed in Slack:

the contract of the SingleValueMatchQuery query is that it only filters out docs that don't have exactly one value. If we were only to check whether doc values is a singleton during query rewrite, then that would break that contract for sparse dense fields (since also doc ids with zero values are included).

The plan is to do the rewrite to match only query if fields are singleton change in a followup change. Given that this changes to contract of SingleValueMatchQuery, but that shouldn't be an issue for es|ql. The contract would then be to exclude doc ids with more than 1 value.

@martijnvg martijnvg merged commit 89f37e8 into elastic:main Oct 3, 2025
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/Compute Engine Analytics in ES|QL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.3.0

5 participants