ESQL: Add documents_found and values_loaded (#125631) #130029
Merged
nik9000 merged 6 commits into elastic:8.19 on Jun 26, 2025
This adds `documents_found` and `values_loaded` to the ESQL response:
```json
{
"took" : 194,
"is_partial" : false,
"documents_found" : 100000,
"values_loaded" : 200000,
"columns" : [
{ "name" : "a", "type" : "long" },
{ "name" : "b", "type" : "long" }
],
"values" : [[10, 1]]
}
```
These are cheap enough to collect that we can do it for every query and
return them with every response. It's a small addition, but it still gives
you a reasonable sense of how much work Elasticsearch had to go through to
perform the query.
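As a rough illustration of how a client might use the two counters, here is a minimal sketch. It assumes the response body arrives as a JSON string shaped like the example above. The class `EsqlWorkSummary` and its methods are hypothetical names, and the regex extraction is only a dependency-free stand-in for a real JSON parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Elasticsearch: reads the two counters
// from an ESQL response body shaped like the example above.
final class EsqlWorkSummary {
    // Returns the first occurrence of a numeric field such as
    // "documents_found" : 100000. Real code should use a JSON parser.
    static long readCounter(String json, String field) {
        Pattern p = Pattern.compile("\"" + Pattern.quote(field) + "\"\\s*:\\s*(\\d+)");
        Matcher m = p.matcher(json);
        if (m.find() == false) {
            throw new IllegalArgumentException("missing field: " + field);
        }
        return Long.parseLong(m.group(1));
    }

    // Average number of values loaded per document found; 0 if no documents were found.
    static double valuesPerDocument(String json) {
        long docs = readCounter(json, "documents_found");
        long values = readCounter(json, "values_loaded");
        return docs == 0 ? 0.0 : (double) values / docs;
    }

    public static void main(String[] args) {
        String response = "{ \"took\" : 194, \"is_partial\" : false, "
            + "\"documents_found\" : 100000, \"values_loaded\" : 200000 }";
        System.out.println("documents_found = " + readCounter(response, "documents_found"));
        System.out.println("values per document = " + valuesPerDocument(response));
    }
}
```

For the example response this works out to two values loaded per document found, which matches a query that reads two fields from every matching document.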
I've also added these two fields to the driver profile and task status:
```json
"drivers" : [
{
"description" : "data",
"cluster_name" : "runTask",
"node_name" : "runTask-0",
"start_millis" : 1742923173077,
"stop_millis" : 1742923173087,
"took_nanos" : 9557014,
"cpu_nanos" : 9091340,
"documents_found" : 5, <---- THESE
"values_loaded" : 15, <---- THESE
"iterations" : 6,
...
```
These are at a high level and should be easy to reason about. We'd like to
extract this into a "show me how difficult this running query is" API one
day. But today, just plumbing it into the debugging output is good.
Any `Operator` can claim to "find documents" or "load values" by overriding
a method on its `Operator.Status` implementation:
```java
/**
* The number of documents found by this operator. Most operators
* don't find documents and will return {@code 0} here.
*/
default long documentsFound() {
return 0;
}
/**
* The number of values loaded by this operator. Most operators
* don't load values and will return {@code 0} here.
*/
default long valuesLoaded() {
return 0;
}
```
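To make the override pattern concrete, here is a minimal, self-contained sketch. The nested `Status` interface is a simplified stand-in for `Operator.Status` that models only the two methods above. The record types and the summing helpers are hypothetical, and summing across operators is an assumption made for illustration, not necessarily how the driver-level numbers are rolled up in this PR.

```java
import java.util.List;

// Simplified, hypothetical model of the pattern described above; the real
// Operator.Status interface has more to it than these two methods.
final class StatusSketch {
    interface Status {
        default long documentsFound() {
            return 0;
        }

        default long valuesLoaded() {
            return 0;
        }
    }

    // Hypothetical source status: each emitted position counts as a document found.
    record SourceStatus(long positionsEmitted) implements Status {
        @Override
        public long documentsFound() {
            return positionsEmitted;
        }
    }

    // Hypothetical reader status: each value produced counts as a value loaded.
    record ReaderStatus(long valuesRead) implements Status {
        @Override
        public long valuesLoaded() {
            return valuesRead;
        }
    }

    // Hypothetical status that overrides nothing, so it reports 0 for both counters.
    record LimitStatus(long rowsPassed) implements Status {}

    // Assumption: a driver's total is the sum over its operators' statuses.
    static long totalDocumentsFound(List<? extends Status> statuses) {
        return statuses.stream().mapToLong(Status::documentsFound).sum();
    }

    static long totalValuesLoaded(List<? extends Status> statuses) {
        return statuses.stream().mapToLong(Status::valuesLoaded).sum();
    }

    public static void main(String[] args) {
        List<Status> driver = List.of(new SourceStatus(5), new ReaderStatus(15), new LimitStatus(5));
        System.out.println("documents_found = " + totalDocumentsFound(driver));
        System.out.println("values_loaded = " + totalValuesLoaded(driver));
    }
}
```

The numbers mirror the driver profile example: 5 documents found and 15 values loaded, as you would see when three fields are read for each of five documents. Operators that override neither method contribute nothing to either total.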
In this PR all of the `LuceneOperator`s declare that each `position` they
emit is a "document found" and the `ValuesSourceReaderOperator`
says each value it makes is a "value loaded". That's pretty much
true. The `LuceneCountOperator` and `LuceneMinMaxOperator` sort of pretend
that the count/min/max that they emit is a "document" - but that's good
enough to give you a sense of what's going on. It's *like* a document.
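One way to read the count/min/max caveat, as a minimal sketch: a count-style source emits a single row that summarizes many documents, and that row is what gets reported as a "document found". The class below is a hypothetical model written for illustration. It is an interpretation of the paragraph above, not the actual `LuceneCountOperator` code.

```java
// Hypothetical model of a count-style source; not the real LuceneCountOperator.
final class CountSourceSketch {
    private long rowsEmitted;

    // Emits one row holding the count for a batch of matching documents.
    long emitCount(long matchingDocs) {
        rowsEmitted++;
        return matchingDocs;
    }

    // Each emitted row is reported as one "document found", regardless of
    // how many documents were counted to produce it.
    long documentsFound() {
        return rowsEmitted;
    }

    public static void main(String[] args) {
        CountSourceSketch source = new CountSourceSketch();
        long count = source.emitCount(100_000);
        System.out.println("count = " + count + ", documents_found = " + source.documentsFound());
    }
}
```

Under this reading, a count over many documents can report a much smaller `documents_found` than the count itself, which is worth keeping in mind when comparing the counter across query shapes.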
Also needs a manual review from me to make sure the backport is truly just what it should be. It wasn't clean at all and I had to make a lot of modifications. I have to double check those are sane.
nik9000 commented on Jun 25, 2025
Seems right to me modulo the three things I found. I'll fix those in a moment.
    columns,
    result.pages(),
    result.completionInfo().documentsFound(),
    result.completionInfo().documentsFound(),
This looks wrong. And it's wrong in main too!
    default -> throw new IllegalArgumentException();
    };
    }
    ;
@idegtiarenko, could you have a look at this and double check it against the original PR? It looks right to me, but I'd appreciate a second set of eyes. And you are the one who needs this backport in so you get to suffer a little.
idegtiarenko approved these changes on Jun 26, 2025