ESQL: Reserve memory TopN by nik9000 · Pull Request #134235 · elastic/elasticsearch

nik9000 · 2025-09-05T17:13:49Z

Tracks more of the memory that's involved in topn.

Lucene TopN

Lucene doesn't track memory usage for TopN and can use a fair bit of it.
Try this query:

```
FROM big_table
| SORT a, b, c, d, e
| LIMIT 1000000
| STATS MAX(a)
```

We attempt to return all million documents from Lucene. If we did this
with the compute engine we'd track all of the memory usage. With Lucene
we have to reserve it.

In the case of the query above the sort keys weigh 8 bytes each, 40
bytes total. Plus another 72 for Lucene's FieldDoc. And another 40 at
least for copying the values to FieldDoc. That totals something
like 152 bytes apiece. Across a million documents that's about 145MB. Worth tracking!
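
The arithmetic above can be written out as a small sketch. The class and method names here are hypothetical, and the constants are just the rough figures quoted in the description. Treating the copy cost as equal to the key size is an assumption standing in for "another 40 at least". This is not the estimate the PR itself uses.

```java
// Back-of-the-envelope estimate for the Lucene TopN reservation.
// Hypothetical helper: constants are the rough figures from the description, not measurements.
final class LuceneTopNEstimate {
    static final long BYTES_PER_SORT_KEY = 8;  // each sort key weighs 8 bytes
    static final long FIELD_DOC_OVERHEAD = 72; // rough cost of Lucene's FieldDoc

    static long bytesPerDoc(int sortKeys) {
        long keys = sortKeys * BYTES_PER_SORT_KEY;
        long copied = keys; // assumption: copying the values costs at least as much as the keys
        return keys + FIELD_DOC_OVERHEAD + copied;
    }

    static long totalBytes(int sortKeys, long limit) {
        // multiplyExact throws on overflow instead of silently wrapping
        return Math.multiplyExact(bytesPerDoc(sortKeys), limit);
    }

    public static void main(String[] args) {
        long total = totalBytes(5, 1_000_000);
        System.out.printf(
            "%d bytes per doc, %d bytes total (~%.0f MiB)%n",
            bytesPerDoc(5),
            total,
            total / (1024.0 * 1024.0)
        );
    }
}
```

For five sort keys and `LIMIT 1000000` this gives 152 bytes per document and 152,000,000 bytes in total, which is where the 145MB figure comes from.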

Esql Engine TopN

Esql does track memory for topn, but it doesn't track the memory used by the min heap itself. It's just a big array of pointers. But it can get very big!
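
To get a feel for how big "a big array of pointers" is, here is a rough sketch. The header and reference sizes are assumed JVM layout figures (they vary with JVM flags such as compressed object pointers), and `HeapArrayEstimate` is a hypothetical name, not the `Queue.sizeOf` from this PR.

```java
// Rough size of a heap's backing array of references.
// Assumed layout figures for illustration only; real values depend on the JVM.
final class HeapArrayEstimate {
    static final long ARRAY_HEADER_BYTES = 16; // assumed array object header
    static final long REFERENCE_BYTES = 4;     // assumed compressed references; 8 without

    static long sizeOf(int topCount) {
        // One reference slot per retained row, plus the array header.
        return ARRAY_HEADER_BYTES + REFERENCE_BYTES * topCount;
    }

    public static void main(String[] args) {
        System.out.println(sizeOf(1_000_000) + " bytes for a million-row heap");
    }
}
```

Under these assumptions a million-row heap needs about 4MB for the array alone, before counting any of the rows it points at.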


elasticsearchmachine · 2025-09-05T17:14:33Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2025-09-05T17:14:33Z

Hi @nik9000, I've created a changelog YAML for you.


dnhatn

LGTM, thanks Nik!

dnhatn · 2025-09-05T22:16:37Z

.../plugin/esql/compute/src/main/java/org/elasticsearch/compute/operator/topn/TopNOperator.java

        this.encoders = encoders;
        this.sortOrders = sortOrders;
-        this.inputQueue = new Queue(topCount);
+        breaker.addEstimateBytesAndMaybeBreak(Queue.sizeOf(topCount), "esql engine topn");


nit: maybe move this memory acquire to Queue ctor for consistency with Queue#close.

I'll leave a comment about why I'm not doing that! The trouble is that we allocate the memory in the super. I guess I can make the ctor private and call a static. That's good. I'll do that.
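
The approach discussed in this thread (reserve first, then let the superclass allocate, with a private ctor behind a static) might look something like the sketch below. Everything in it is a hypothetical stand-in: `ToyBreaker` is not the Elasticsearch circuit breaker API, `ArrayHeap` stands in for a base class that allocates in its constructor, and the sizes are assumed. It is not the code that was merged.

```java
// Demo entry point; all classes in this sketch are hypothetical stand-ins.
final class ReserveThenAllocateDemo {
    public static void main(String[] args) {
        ToyBreaker breaker = new ToyBreaker(1024);
        try (TrackedQueue queue = TrackedQueue.build(breaker, 100)) {
            System.out.println("reserved while open: " + breaker.used());
        }
        System.out.println("reserved after close: " + breaker.used());
    }
}

/** Toy memory accountant that refuses reservations past a limit. */
final class ToyBreaker {
    private final long limit;
    private long used;

    ToyBreaker(long limit) {
        this.limit = limit;
    }

    void reserve(long bytes, String label) {
        if (used + bytes > limit) {
            throw new IllegalStateException("[" + label + "] would exceed limit of " + limit + " bytes");
        }
        used += bytes;
    }

    void release(long bytes) {
        used -= bytes;
    }

    long used() {
        return used;
    }
}

/** Stand-in for a base class that allocates its backing array in the constructor. */
class ArrayHeap {
    protected final Object[] slots;

    ArrayHeap(int maxSize) {
        this.slots = new Object[maxSize];
    }
}

final class TrackedQueue extends ArrayHeap implements AutoCloseable {
    private final ToyBreaker breaker;
    private final long reserved;

    // Private so callers must go through build(), which reserves before super() allocates.
    private TrackedQueue(ToyBreaker breaker, int topCount, long reserved) {
        super(topCount);
        this.breaker = breaker;
        this.reserved = reserved;
    }

    static long sizeOf(int topCount) {
        return 16 + 4L * topCount; // assumed array header + assumed 4-byte references
    }

    static TrackedQueue build(ToyBreaker breaker, int topCount) {
        long bytes = sizeOf(topCount);
        breaker.reserve(bytes, "topn queue");
        boolean success = false;
        try {
            TrackedQueue queue = new TrackedQueue(breaker, topCount, bytes);
            success = true;
            return queue;
        } finally {
            if (success == false) {
                breaker.release(bytes); // give the reservation back if construction failed
            }
        }
    }

    @Override
    public void close() {
        breaker.release(reserved);
    }
}
```

Keeping the reservation in `build` and the release in `close` puts acquire and release on the same class, which is the consistency the review comment asks for.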


nik9000 · 2025-09-08T19:07:24Z

9.1: #134321
8.19: #134331
8.18: #134335


nik9000 added 2 commits September 5, 2025 13:12

Merge branch 'main' into esql_reserve_memory_for_lucene_topn

fd03810

nik9000 added >bug :Analytics/ES|QL AKA ESQL v9.2.0 labels Sep 5, 2025

nik9000 requested review from benwtrent and dnhatn September 5, 2025 17:14

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Sep 5, 2025

Update docs/changelog/134235.yaml

bc5de0a

elasticsearchmachine and others added 5 commits September 5, 2025 17:21

[CI] Auto commit changes from spotless

673bdcf

Format

64fac59

Merge branch 'main' into esql_reserve_memory_for_lucene_topn

15a08aa

Merge remote-tracking branch 'nik9000/esql_reserve_memory_for_lucene_…

843fb25

…topn' into esql_reserve_memory_for_lucene_topn

Switch to row size estimates - worse, but easier

1920982

nik9000 added v9.1.4 v8.19.4 labels Sep 5, 2025

More

2d39c12

nik9000 changed the title ~~ESQL: Reserve memory for Lucene's TopN~~ ESQL: Reserve memory TopN Sep 5, 2025

dnhatn approved these changes Sep 5, 2025


nik9000 added the v8.18.7 label Sep 5, 2025

Move, explain, rename

0409da1

nik9000 enabled auto-merge (squash) September 5, 2025 22:52

Merge branch 'main' into esql_reserve_memory_for_lucene_topn

53aada5

nik9000 merged commit e9c145b into elastic:main Sep 8, 2025
33 checks passed

nik9000 added the backport pending label Sep 8, 2025

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Sep 8, 2025

ESQL: Remove some unused code

92dd38c

I committed it in elastic#134235 by accident. We were going to use it as part of that but decided against it.

nik9000 mentioned this pull request Sep 8, 2025

ESQL: Remove some unused code #134322

Merged

elasticsearchmachine pushed a commit that referenced this pull request Sep 8, 2025

ESQL: Remove some unused code (#134322)

3ae1dea

I committed it in #134235 by accident. We were going to use it as part of that but decided against it.

rjernst pushed a commit to rjernst/elasticsearch that referenced this pull request Sep 9, 2025

ESQL: Remove some unused code (elastic#134322)

6d94889

I committed it in elastic#134235 by accident. We were going to use it as part of that but decided against it.

nik9000 removed the backport pending label Oct 7, 2025

nik9000 merged 11 commits into elastic:main from nik9000:esql_reserve_memory_for_lucene_topn
