ESQL: Track memory in evaluators #133392
Merged
nik9000 merged 10 commits into elastic:main on Aug 26, 2025
If you write very very large ESQL queries you can spend a lot of memory on the expression evaluators themselves. You can certainly do it in real life, but our tests do something like:

```
FROM foo
| EVAL a0001 = n + 1
| EVAL a0002 = a0001 + 1
| EVAL a0003 = a0002 + 1
...
| EVAL a5000 = a4999 + 1
| STATS MAX(a5000)
```

Each evaluator costs like 200 bytes a pop. For thousands of evaluators this adds up. So! We have to track it.

Nhat had suggested charging a flat 200 bytes a pop. I thought about it and decided that it'd be pretty easy to get the actual size. Most of the evaluators are generated and it's a fairly small generated change to pick that up. So I did.

We *do* build the evaluators before we cost them, but that's fine because they are very very small. So long as we account for them, I think it's safe.
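For anyone who wants to reproduce a query of that shape, here is a minimal sketch of how one could be generated. The class `ManyEvalsQuery` and its `build` method are hypothetical names made up for illustration and are not taken from the Elasticsearch test suite; only the query shape comes from the description above.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Elasticsearch.
final class ManyEvalsQuery {

    /** Builds FROM foo | EVAL a0001 = n + 1 | ... | STATS MAX(a<count>) on a single line. */
    static String build(int count) {
        if (count < 1) {
            throw new IllegalArgumentException("count must be at least 1");
        }
        StringBuilder query = new StringBuilder("FROM foo");
        String previous = "n"; // the first EVAL reads the field n
        for (int i = 1; i <= count; i++) {
            // Zero-pad to four digits to match the a0001 ... a5000 naming.
            String current = String.format(Locale.ROOT, "a%04d", i);
            query.append(" | EVAL ").append(current).append(" = ").append(previous).append(" + 1");
            previous = current;
        }
        // Aggregate over the last column so every EVAL in the chain is needed.
        query.append(" | STATS MAX(").append(previous).append(")");
        return query.toString();
    }

    public static void main(String[] args) {
        String query = build(5000);
        System.out.println(query.length() + " characters");
    }
}
```

Each `EVAL` depends on the one before it, so none of the 5000 expressions is unused, and each one presumably needs its own evaluator.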
Collaborator
Pinging @elastic/es-analytical-engine (Team:Analytics)
Collaborator
Hi @nik9000, I've created a changelog YAML for you.
Member
Author
If you are reading this for the first time, start here: https://github.com/elastic/elasticsearch/pull/133392/files#diff-411bc9dd7ffd062b664e3e2dc83482512d03a14cf15cb281f0f1897276fbffe5R68
dnhatn
approved these changes
Aug 22, 2025
Member
dnhatn
left a comment
Thank you for fixing this. I see you labeled this as a bug; should we also backport it to 9.1 and 8.19?
Member
Author
Probably. I think I can do that.
ivancea
approved these changes
Aug 25, 2025
Member
Author
Thanks friends!
Member
Author
I'll backport this by hand.
nik9000
added a commit
to nik9000/elasticsearch
that referenced
this pull request
Aug 26, 2025
nik9000
added a commit
to nik9000/elasticsearch
that referenced
this pull request
Aug 26, 2025
nik9000
added a commit
to nik9000/elasticsearch
that referenced
this pull request
Aug 26, 2025
When we compile the code for `CONTAINS` we generate an evaluator Java class and commit that, as is our ancient custom. But because elastic#133016 didn't see elastic#133392, we committed out-of-date code. That's fine because we regenerate the code on every compile. But it's annoying because every clone is out of date. This updates the generated file.

You may be asking "why do you commit the generated code if you just generate it at compile time?" That's a good question! It's a grand tradition, one that we will probably one day leave behind. But let's celebrate it today by committing more code.
dnhatn
pushed a commit
that referenced
this pull request
Aug 27, 2025
mjmbischoff
added a commit
to mjmbischoff/elasticsearch
that referenced
this pull request
Aug 27, 2025
ESQL: Track memory in evaluators (elastic#133392) got merged to main at the same time as Add MV_CONTAINS function (elastic#133099), which caused a compile error, and the merge was reverted. This commit addresses the compile error.
nik9000
added a commit
that referenced
this pull request
Aug 27, 2025
nik9000
added a commit
that referenced
this pull request
Aug 27, 2025
sarog
pushed a commit
to portsbuild/elasticsearch
that referenced
this pull request
Sep 11, 2025
sarog
pushed a commit
to portsbuild/elasticsearch
that referenced
this pull request
Sep 19, 2025
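If you write very very large ESQL queries you can spend a lot of memory on the expression evaluators themselves. You can certainly do it in real life, but our tests do something like `FROM foo | EVAL a0001 = n + 1 | EVAL a0002 = a0001 + 1 | ... | EVAL a5000 = a4999 + 1 | STATS MAX(a5000)`.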
Each evaluator costs like 200 bytes a pop. For thousands of evaluators this adds up. So! We have to track it. This prevents OOMs in these semi-degenerate cases, instead throwing a CircuitBreakerException.
Nhat had suggested charging a flat 200 bytes a pop. I thought about it and decided that it'd be pretty easy to get the actual size. Most of the evaluators are generated and it's a fairly small generated change to pick that up. So I did.
We do build the evaluators before we cost them, but that's fine because they are very very small. So long as we account for them, I think it's safe.
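To make the accounting idea concrete, here is a toy sketch of the pattern described above: build a small evaluator first, then charge its reported size against a memory budget, and fail with an exception instead of running out of memory. Every name in it (`ToyBreaker`, `ToyBreakerException`, `ToyEvaluator`, `AddOne`, `buildAll`) is hypothetical and is not Elasticsearch's real breaker or evaluator API. The 200-byte figure is just the rough estimate quoted above, hard-coded as an assumption, whereas the actual change picks up each evaluator's real size.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch only: every class and method name below is hypothetical,
// not Elasticsearch's real circuit breaker or evaluator API.
final class EvaluatorMemorySketch {

    /** Stand-in for the CircuitBreakerException mentioned above. */
    static final class ToyBreakerException extends RuntimeException {
        ToyBreakerException(String message) {
            super(message);
        }
    }

    /** A fixed memory budget that rejects reservations past its limit. */
    static final class ToyBreaker {
        private final long limitBytes;
        private long usedBytes;

        ToyBreaker(long limitBytes) {
            this.limitBytes = limitBytes;
        }

        void addOrBreak(long bytes, String label) {
            if (usedBytes + bytes > limitBytes) {
                throw new ToyBreakerException(
                    label + " would use " + (usedBytes + bytes) + " of " + limitBytes + " bytes");
            }
            usedBytes += bytes;
        }

        void release(long bytes) {
            usedBytes -= bytes;
        }

        long usedBytes() {
            return usedBytes;
        }
    }

    /** An evaluator that can report its own size. */
    interface ToyEvaluator {
        long ramBytesUsed();
    }

    /** Models one "aN = aM + 1" evaluator. */
    static final class AddOne implements ToyEvaluator {
        // Assumption: the rough "200 bytes" estimate from the description.
        static final long ASSUMED_BYTES = 200;

        @Override
        public long ramBytesUsed() {
            return ASSUMED_BYTES;
        }
    }

    /** Builds each evaluator first, then charges it to the breaker. */
    static List<ToyEvaluator> buildAll(int count, ToyBreaker breaker) {
        List<ToyEvaluator> built = new ArrayList<>(count);
        long charged = 0;
        try {
            for (int i = 0; i < count; i++) {
                ToyEvaluator evaluator = new AddOne();       // tiny, so building before costing is cheap
                long bytes = evaluator.ramBytesUsed();
                breaker.addOrBreak(bytes, "evaluator " + i); // may throw
                charged += bytes;
                built.add(evaluator);
            }
            return built;
        } catch (ToyBreakerException e) {
            breaker.release(charged); // give back what this call reserved
            throw e;
        }
    }

    public static void main(String[] args) {
        ToyBreaker breaker = new ToyBreaker(2_000_000);
        buildAll(5000, breaker);
        System.out.println("tracked bytes: " + breaker.usedBytes()); // 5000 * 200 = 1000000
    }
}
```

With a 500,000 byte limit the same call would throw at the 2501st evaluator, which is the toy version of getting an exception instead of an OOM.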