ESQL: Track memory in evaluators #133392

Merged
nik9000 merged 10 commits into elastic:main from nik9000:esql_track_evaluators
Aug 26, 2025

@nik9000
Member

@nik9000 nik9000 commented Aug 22, 2025

If you write very very large ESQL queries you can spend a lot of memory on the expression evaluators themselves. You can certainly do it in real life, but our tests do something like:

FROM foo
| EVAL a0001 = n + 1
| EVAL a0002 = a0001 + 1
| EVAL a0003 = a0002 + 1
...
| EVAL a5000 = a4999 + 1
| STATS MAX(a5000)

Each evaluator costs like 200 bytes a pop. For thousands of evaluators this adds up. So! We have to track it. This prevents OOMs in these semi-degenerate cases, instead throwing a CircuitBreakerException.
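
As a rough illustration of what "tracking" means here, this is a minimal sketch of a byte-counting breaker that throws instead of letting the heap fill up. `ToyBreaker`, `TrippedException`, and `chargeEvaluators` are hypothetical names made up for this sketch and are not the Elasticsearch `CircuitBreaker` API; the 200-byte figure is just the rough estimate quoted above.

```java
import java.util.concurrent.atomic.AtomicLong;

final class EvaluatorBreakerSketch {
    /** Toy memory breaker; hypothetical, not the Elasticsearch CircuitBreaker API. */
    static final class ToyBreaker {
        private final long limitBytes;
        private final AtomicLong usedBytes = new AtomicLong();

        ToyBreaker(long limitBytes) {
            this.limitBytes = limitBytes;
        }

        /** Reserve bytes, or throw if the reservation would exceed the limit. */
        void addOrBreak(long bytes, String label) {
            long now = usedBytes.addAndGet(bytes);
            if (now > limitBytes) {
                usedBytes.addAndGet(-bytes); // roll back the failed reservation
                throw new TrippedException(label + ": " + now + " > " + limitBytes + " bytes");
            }
        }

        long used() {
            return usedBytes.get();
        }
    }

    /** Thrown by the toy breaker in place of an OutOfMemoryError. */
    static final class TrippedException extends RuntimeException {
        TrippedException(String message) {
            super(message);
        }
    }

    // Assumption: the rough per-evaluator figure quoted in the description.
    static final long ASSUMED_BYTES_PER_EVALUATOR = 200;

    /** Charge one evaluator per EVAL; returns how many fit before the breaker trips. */
    static int chargeEvaluators(ToyBreaker breaker, int count) {
        for (int i = 0; i < count; i++) {
            try {
                breaker.addOrBreak(ASSUMED_BYTES_PER_EVALUATOR, "evaluator");
            } catch (TrippedException e) {
                return i; // the query would fail here with a clean error
            }
        }
        return count;
    }

    public static void main(String[] args) {
        ToyBreaker breaker = new ToyBreaker(512 * 1024);
        System.out.println(chargeEvaluators(breaker, 5000) + " evaluators accounted for");
    }
}
```

The point of the sketch is only that a failed reservation surfaces as an exception on the query, while the bytes already reserved stay counted.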

Nhat had suggested charging a flat 200 bytes a pop. I thought about it and decided that it'd be pretty easy to get the actual size. Most of the evaluators are generated and it's a fairly small generated change to pick that up. So I did.
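
A sketch of the "actual size" idea, assuming each evaluator can report its own footprint plus that of its children. The interface, the method name `baseRamBytesUsed`, and the byte constants are assumptions for illustration; real shallow sizes depend on the JVM and object layout, and the generated code in this PR may look different.

```java
final class EvaluatorSizeSketch {
    /** Hypothetical evaluator contract: every node reports its own footprint. */
    interface Evaluator {
        long baseRamBytesUsed();
    }

    // Illustrative shallow sizes only; not measured values.
    static final long FIELD_SHALLOW = 24;
    static final long LITERAL_SHALLOW = 24;
    static final long ADD_SHALLOW = 32;

    record LoadField(String name) implements Evaluator {
        public long baseRamBytesUsed() {
            return FIELD_SHALLOW;
        }
    }

    record Literal(long value) implements Evaluator {
        public long baseRamBytesUsed() {
            return LITERAL_SHALLOW;
        }
    }

    /** The kind of method a code generator could emit: own size plus children. */
    record Add(Evaluator lhs, Evaluator rhs) implements Evaluator {
        public long baseRamBytesUsed() {
            return ADD_SHALLOW + lhs.baseRamBytesUsed() + rhs.baseRamBytesUsed();
        }
    }

    /** Total for a chain like EVAL a0001 = n + 1 | EVAL a0002 = a0001 + 1 | ... */
    static long chainBytes(int evals) {
        long total = 0;
        String previous = "n";
        for (int i = 1; i <= evals; i++) {
            Evaluator e = new Add(new LoadField(previous), new Literal(1));
            total += e.baseRamBytesUsed();
            previous = String.format("a%04d", i);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(chainBytes(5000) + " bytes for 5000 evaluators (illustrative sizes)");
    }
}
```

Compared with a flat per-evaluator charge, a per-class figure stays correct when an evaluator has more children or extra fields, at the cost of one small generated method per class.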

We do build the evaluators before we cost them, but that's fine because they are very very small. So long as we account for them, I think it's safe.
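
The build-then-account ordering can be sketched like this. `Budget`, `TrackedEvaluator`, and `build` are hypothetical names for this example, not classes from the PR; the sketch only shows that the object exists briefly before its bytes are reserved, and that a failed reservation leaves nothing to refund.

```java
final class BuildThenChargeSketch {
    /** Minimal byte budget; hypothetical stand-in for a real breaker. */
    static final class Budget {
        private final long limit;
        private long used;

        Budget(long limit) {
            this.limit = limit;
        }

        void reserve(long bytes) {
            if (used + bytes > limit) {
                throw new IllegalStateException("over budget: " + (used + bytes) + " > " + limit);
            }
            used += bytes;
        }

        void release(long bytes) {
            used -= bytes;
        }

        long used() {
            return used;
        }
    }

    /** An evaluator that is already built when it is charged and refunds on close. */
    static final class TrackedEvaluator implements AutoCloseable {
        private final Budget budget;
        private final long bytes;

        private TrackedEvaluator(Budget budget, long bytes) {
            this.budget = budget;
            this.bytes = bytes;
        }

        @Override
        public void close() {
            budget.release(bytes);
        }
    }

    /** Build first (briefly untracked), then account for the object. */
    static TrackedEvaluator build(Budget budget, long sizeBytes) {
        TrackedEvaluator evaluator = new TrackedEvaluator(budget, sizeBytes); // tiny allocation
        budget.reserve(sizeBytes); // throws if over budget; nothing was reserved, so nothing to undo
        return evaluator;
    }

    public static void main(String[] args) {
        Budget budget = new Budget(1_000);
        try (TrackedEvaluator e = build(budget, 200)) {
            System.out.println("in use: " + budget.used());
        }
        System.out.println("after close: " + budget.used());
    }
}
```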

@nik9000 nik9000 requested a review from dnhatn August 22, 2025 13:09
@elasticsearchmachine elasticsearchmachine added the Team:Analytics label (Meta label for analytical engine team (ESQL/Aggs/Geo)) Aug 22, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Collaborator

Hi @nik9000, I've created a changelog YAML for you.

Member

@dnhatn dnhatn left a comment

Thank you for fixing this. I see you labeled this as a bug; should we also backport it to 9.1 and 8.19?

@nik9000
Member Author

nik9000 commented Aug 22, 2025

> I see you labeled this as a bug; should we also backport it to 9.1 and 8.19?

Probably. I think I can do that.

@nik9000 nik9000 merged commit f32c348 into elastic:main Aug 26, 2025
33 checks passed
@nik9000
Member Author

nik9000 commented Aug 26, 2025

Thanks friends!

@nik9000
Member Author

nik9000 commented Aug 26, 2025

I'll backport this by hand.

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Aug 26, 2025
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Aug 26, 2025
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Aug 26, 2025
When we compile the code for `CONTAINS` we generate an evaluator Java
class and commit that, as is our ancient custom. But because elastic#133016
didn't see elastic#133392, we committed out-of-date code. That's fine because
we regenerate the code on every compile. But it's annoying because every
clone is out of date. This updates the generated file.

You may be asking "why do you commit the generated code if you just
generate it at compile time?" That's a good question! It's a grand
tradition, one that we will probably one day leave behind. But let's
celebrate it today by committing more code.
dnhatn pushed a commit that referenced this pull request Aug 27, 2025
mjmbischoff added a commit to mjmbischoff/elasticsearch that referenced this pull request Aug 27, 2025
ESQL: Track memory in evaluators (elastic#133392) got merged to main at the same time as Add MV_CONTAINS function (elastic#133099), which caused a compile error, and the merge was reverted. This commit addresses the compile error.
nik9000 added a commit that referenced this pull request Aug 27, 2025
nik9000 added a commit that referenced this pull request Aug 27, 2025
sarog pushed a commit to portsbuild/elasticsearch that referenced this pull request Sep 11, 2025
sarog pushed a commit to portsbuild/elasticsearch that referenced this pull request Sep 19, 2025
Labels

:Analytics/ES|QL (AKA ESQL), >bug, Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)), v8.19.4, v9.1.0, v9.2.0

4 participants