Skip to content

ESQL: Aggressively free topn#140126

Merged
nik9000 merged 9 commits intoelastic:mainfrom
nik9000:esql_topn_free_faster
Jan 5, 2026
Merged

ESQL: Aggressively free topn#140126
nik9000 merged 9 commits intoelastic:mainfrom
nik9000:esql_topn_free_faster

Conversation

@nik9000
Copy link
Member

@nik9000 nik9000 commented Jan 2, 2026

Without this we free the entire TopNOperator and then pitch the reference to it. It'd be best if these two things were atomically locked together, but so long as everything we free is small-ish we can live with keeping the reference for a bit. We reserve some room to maneuver here. But if the thing we're freeing is quite large this ain't great. We think we have room to allocate but we don't.

With this we pitch the reference to each row as we go so they can be GCed immediately. This keeps the untracked-but-hard-referenced state to a single row at a time.

Without this we free the entire `TopNOperator` and then pitch the
reference to it. It'd be best if these two things were atomically locked
together, but so long as everything we free is small-ish we can live
with keeping the reference for a bit. We reserve some room to maneuver
here. But if the thing we're freeing is quite large this ain't great. We
think we have room to allocate but we don't.

With this we pitch the reference to each row as we go so they can be
GCed immediately. This keeps the untracked-but-hard-referenced state to
a single row at a time.
@nik9000 nik9000 added auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v9.3.0 v9.4.0 labels Jan 2, 2026
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jan 2, 2026
Copy link
Contributor

@ivancea ivancea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment on lines +694 to +695
* the circuit breaker itself. With this we're still racing, but we're only racing a
* single row at a time. And single rows can only be so large. And we have enough slop
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're only racing a single row at a time

WDYM there? Would it be enough too swap the close() and "= null" statements here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the row is still referenced on the current thread. It's a shame, but I think it's somewhat inherent in having a close clear the breaker. If we swapped it so we had a ramBytesUsed() method on Row then we could do what you are saying. But that doesn't seem worth it at the moment because we'd have to rewrite a bunch of tracking bits.

})));
}
try {
OperatorTestCase.runDriver(drivers);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this fail if the breaker doesn't trip? I suppose an OOM should happen otherwise, but I would fail() just to be sure the test is right, and remains right (E.g. No optimizations breaking)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are tricky. If, say, this runs in a single thread and each Driver finishes before the others wake up, then it won't OOM.

I would try and take more "personal" control of the Drivers and run a couple of single-loop-iterations so I could be sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran it in a loop for many hours and it kept failing. So I'll update it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worst case, it fails in CI, and we revert it.
A similar test that expected a CB 100% was muted for aggs, one I added for reduce phase, and I'm considering having it lenient like this one was initially

@nik9000 nik9000 added the >bug label Jan 2, 2026
@elasticsearchmachine
Copy link
Collaborator

Hi @nik9000, I've created a changelog YAML for you.

@nik9000 nik9000 enabled auto-merge (squash) January 2, 2026 23:07
Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Nik!

*/
() -> {
for (int i = 0; i < getHeapArray().length; i++) {
Row row = ((Row) getHeapArray()[i]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: redundant parentheses.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines 71 to 72
int approxSizePerFilledRow = RamUsageEstimator.NUM_BYTES_OBJECT_REF * 5 + repeats * Integer.BYTES;
int topCount = (int) ((double) limit.getBytes() / approxSizePerFilledRow * usedByEachThread) - 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract 5 to a constant (assuming it's the "same" 5 as the one in parameters).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not. I'll push more comments.

blockFactory,
breaker,
topCount,
IntStream.range(0, repeats).mapToObj(i -> ElementType.INT).toList(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Collections.nCopies instead (here and below).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's shorter!

import static org.hamcrest.Matchers.equalTo;
import static org.hamcrest.Matchers.greaterThan;

/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I get the setup of this test, I'm not sure what is it actually trying to test... what are we trying to verify here? That we properly break the circuits?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That we don't oom. I'll push a bigger comment explaining. But without the change that I made to the close code the test does OOM.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% of the time though. Maybe, 10%. Which is why I've got it running 20 times.

}

public void testFromHeapDump1() {
TopNOperator.Row row = new TopNOperator.Row(new NoopCircuitBreaker(CircuitBreaker.REQUEST), sortOrders(5), 56, 24);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The relationship between 5, 56, and 24 and 304L isn't obvious to me (same thing for the numbers in the method below). Are they just magic numbers, do they come from the same heap dump or the 304L, or is it something else?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They come from the heap dump. They are actual measured results. I'll leave a comment.

}

public void testFromHeapDump2() {
TopNOperator.Row row = new TopNOperator.Row(new NoopCircuitBreaker(CircuitBreaker.REQUEST), sortOrders(1), 1160, 1153096);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/1153096/1_153_096

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@nik9000 nik9000 requested a review from GalLalouche January 5, 2026 14:10
@nik9000
Copy link
Member Author

nik9000 commented Jan 5, 2026

@GalLalouche, I pushed a patch for your comments. Could you have another look?

@nik9000 nik9000 merged commit d6762fb into elastic:main Jan 5, 2026
35 checks passed
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jan 5, 2026
Without this we free the entire `TopNOperator` and then pitch the
reference to it. It'd be best if these two things were atomically locked
together, but so long as everything we free is small-ish we can live
with keeping the reference for a bit. We reserve some room to maneuver
here. But if the thing we're freeing is quite large this ain't great. We
think we have room to allocate but we don't.

With this we pitch the reference to each row as we go so they can be
GCed immediately. This keeps the untracked-but-hard-referenced state to
a single row at a time.
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
9.3
elasticsearchmachine pushed a commit that referenced this pull request Jan 5, 2026
Without this we free the entire `TopNOperator` and then pitch the
reference to it. It'd be best if these two things were atomically locked
together, but so long as everything we free is small-ish we can live
with keeping the reference for a bit. We reserve some room to maneuver
here. But if the thing we're freeing is quite large this ain't great. We
think we have room to allocate but we don't.

With this we pitch the reference to each row as we go so they can be
GCed immediately. This keeps the untracked-but-hard-referenced state to
a single row at a time.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.3.0 v9.4.0

5 participants