ESQL: Reserve memory TopN (#134235) #134335

Merged
elasticsearchmachine merged 5 commits into elastic:8.18 from
nik9000:esql_reserve_memory_for_lucene_topn_8_18
Sep 9, 2025
Conversation

@nik9000
Member

@nik9000 nik9000 commented Sep 8, 2025

Tracks more of the memory that's involved in TopN.

Lucene doesn't track memory usage for TopN and can use a fair bit of it. Try this query:

FROM big_table
| SORT a, b, c, d, e
| LIMIT 1000000
| STATS MAX(a)

We attempt to return all million documents from Lucene. If we did this with the compute engine we'd track all of the memory usage. With Lucene we have to reserve it.

In the case of the query above the sort keys weigh 8 bytes each, 40 bytes total. Add another 72 for Lucene's FieldDoc, and at least another 40 for copying the values into the FieldDoc. That totals something like 152 bytes apiece, or about 145MB across the million documents. Worth tracking!
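The arithmetic can be checked with a quick sketch. This is only a back-of-the-envelope calculation using the per-document figures quoted in this description; the function and constant names are made up for illustration and are not part of the change itself.

```python
# Per-document figures quoted in the description (not measured here).
SORT_KEY_BYTES = 8      # each sort key weighs 8 bytes
NUM_SORT_KEYS = 5       # SORT a, b, c, d, e
FIELD_DOC_BYTES = 72    # Lucene's FieldDoc, per the description
LIMIT = 1_000_000       # LIMIT 1000000


def estimate_lucene_topn_bytes(limit, num_keys,
                               key_bytes=SORT_KEY_BYTES,
                               field_doc_bytes=FIELD_DOC_BYTES):
    """Hypothetical helper: rough bytes Lucene holds for a TopN of `limit` docs."""
    keys = num_keys * key_bytes      # the sort keys themselves
    copied = num_keys * key_bytes    # at least this much again for the copy into FieldDoc
    per_doc = keys + field_doc_bytes + copied
    return per_doc * limit


total = estimate_lucene_topn_bytes(LIMIT, NUM_SORT_KEYS)
print(f"{total} bytes, about {total / 2**20:.0f} MiB")
```

The 145 figure comes out when the total is divided by 2^20, so it is mebibytes rather than decimal megabytes.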

Esql Engine TopN

Esql does track memory for topn, but it doesn't track the memory used by the min heap itself. It's just a big array of pointers. But it can get very big!
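To get a feel for "very big", here is a rough sketch of the size of a heap backed by an array of pointers. The pointer width and array header size are assumptions about a typical 64-bit JVM, not figures from this PR, and the function name is hypothetical.

```python
def estimate_heap_array_bytes(limit, pointer_bytes=8, array_header_bytes=16):
    """Hypothetical helper: bytes used by an array holding `limit` pointers.

    pointer_bytes and array_header_bytes are assumed values: 8-byte
    references without compressed pointers, 4-byte with them.
    """
    return array_header_bytes + limit * pointer_bytes


# The heap array alone for LIMIT 1000000, under both pointer-size assumptions.
uncompressed = estimate_heap_array_bytes(1_000_000, pointer_bytes=8)
compressed = estimate_heap_array_bytes(1_000_000, pointer_bytes=4)
print(uncompressed, compressed)
```

Under these assumptions the array alone costs a few megabytes per million-row TopN, before counting any of the rows it points at.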

@nik9000 nik9000 added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Sep 9, 2025
@elasticsearchmachine elasticsearchmachine merged commit 783ebc0 into elastic:8.18 Sep 9, 2025
17 checks passed
@nik9000 nik9000 deleted the esql_reserve_memory_for_lucene_topn_8_18 branch September 9, 2025 18:28

Labels

:Analytics/ES|QL AKA ESQL auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport >bug v8.18.7

2 participants