Avoid stack overflow in IndicesClusterStateService applyClusterState by albertzaharovits · Pull Request #132536 · elastic/elasticsearch

albertzaharovits · 2025-08-07T13:11:40Z

Every cluster state applied in the IndicesClusterStateService has the potential to add a new RefCountingListener to a chain of such listeners. If the chain is too long, the unlucky thread that decreases the ref count to 0 for the head of the chain ends up calling each listener in turn and, assuming all ref counts are thereby decreased to 0, traverses the whole chain on its own thread stack, possibly resulting in a StackOverflowError.

This fix chains at most 8 RefCountingListeners; any listener beyond that limit is forked on a generic thread when it gets to execution.
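To make the failure mode concrete, here is a minimal sketch of the idea, not the code from this PR. Plain `Runnable`s stand in for the chained `RefCountingListener`s, a single-thread executor stands in for the generic thread pool, and the class name, method name, and `MAX_CHAIN_LENGTH` constant are all hypothetical. The limit of 8 is assumed from the description above. The sketch measures how deeply callback invocations nest on one thread stack, with and without forking.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative model only: plain Runnables stand in for chained RefCountingListeners.
class BoundedListenerChainDemo {

    // Assumed limit, taken from the "max 8" in the PR description.
    static final int MAX_CHAIN_LENGTH = 8;

    /**
     * Completes a chain of `links` callbacks and returns the deepest nesting of
     * callback invocations observed on any single thread stack.
     */
    static int maxNestingDepth(int links, boolean bounded) {
        // Stand-in for the generic thread pool.
        ExecutorService generic = Executors.newSingleThreadExecutor();
        try {
            AtomicInteger maxDepth = new AtomicInteger();
            ThreadLocal<Integer> depth = ThreadLocal.withInitial(() -> 0);
            CountDownLatch done = new CountDownLatch(1);

            // Build the chain from tail to head; the tail signals completion.
            Runnable next = done::countDown;
            for (int i = links; i >= 1; i--) {
                final Runnable downstream = next;
                Runnable link = () -> {
                    int d = depth.get() + 1;
                    depth.set(d);
                    maxDepth.accumulateAndGet(d, Math::max);
                    try {
                        // Completing this link completes the next one on the same stack.
                        downstream.run();
                    } finally {
                        depth.set(d - 1);
                    }
                };
                // When bounded, every (MAX_CHAIN_LENGTH + 1)-th link is handed to the
                // executor, so it starts on a fresh stack.
                boolean fork = bounded && i % (MAX_CHAIN_LENGTH + 1) == 0;
                next = fork ? () -> generic.execute(link) : link;
            }

            // The "unlucky thread" completes the head of the chain.
            next.run();
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("chain did not complete");
            }
            return maxDepth.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            generic.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("unbounded: " + maxNestingDepth(100, false));
        System.out.println("bounded:   " + maxNestingDepth(100, true));
    }
}
```

In this model the unbounded chain nests as deep as the chain is long, while the bounded variant never nests deeper than `MAX_CHAIN_LENGTH + 1` (a forked link plus the links chained directly after it). The cost is one extra task submission per fork.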

elasticsearchmachine · 2025-08-07T13:12:06Z

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

elasticsearchmachine · 2025-08-07T13:12:06Z

Hi @albertzaharovits, I've created a changelog YAML for you.

albertzaharovits · 2025-08-07T13:12:54Z

Honestly, I think I prefer that every chained listener be executed on a generic thread, for code simplicity's sake.

DaveCTurner

I'd rather we didn't extend the chain in the (overwhelmingly common) case where the cluster state update doesn't close any more shards.

Also can you cover this in a test?

DaveCTurner · 2025-08-07T14:36:04Z

server/src/main/java/org/elasticsearch/indices/cluster/IndicesClusterStateService.java

        lastClusterStateShardsClosedListener = new SubscribableListener<>();
        currentClusterStateShardsClosedListeners = new RefCountingListener(lastClusterStateShardsClosedListener);
        try {
-            previousShardsClosedListener.addListener(currentClusterStateShardsClosedListeners.acquire());


Hmm, are you sure we should move all this listener stuff below doApplyClusterState()?

albertzaharovits · Aug 8, 2025

I can't think of any impact on execution.

albertzaharovits · Aug 8, 2025

But I've put it back in its original place.

albertzaharovits · 2025-08-08T09:58:57Z

> I'd rather we didn't extend the chain in the (overwhelmingly common) case where the cluster state update doesn't close any more shards.

Pushed 3a00599

albertzaharovits · 2025-08-11T14:42:13Z

@DaveCTurner can you take another look please?

I've changed the code to avoid linking listeners when the applied cluster state doesn't close any shards.
I've also added a test that asserts that all the runnables before the oldest shard close listener that's not complete are run, while the others are not.
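The "don't link when nothing closes" behaviour can be sketched roughly as follows. This is an illustration under assumptions, not the code pushed in 3a00599: `CompletableFuture` stands in for `SubscribableListener`/`RefCountingListener`, and the class, its methods, and the `links` counter are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Illustrative model only: CompletableFuture stands in for the Elasticsearch listener types.
class ShardsClosedTrackerDemo {

    // Completes once every shard closed by the cluster states applied so far has finished closing.
    private CompletableFuture<Void> lastShardsClosed = CompletableFuture.completedFuture(null);

    // How many times the chain has been extended.
    private int links = 0;

    /** closingShards: one future per shard that this (hypothetical) cluster state update closes. */
    synchronized void applyClusterState(List<CompletableFuture<Void>> closingShards) {
        if (closingShards.isEmpty()) {
            // Common case: nothing new to wait for, so keep the previous listener as-is.
            return;
        }
        CompletableFuture<Void> current =
            CompletableFuture.allOf(closingShards.toArray(new CompletableFuture<?>[0]));
        // Link the new listener to the previous one, extending the chain by one.
        lastShardsClosed = CompletableFuture.allOf(lastShardsClosed, current);
        links++;
    }

    synchronized CompletableFuture<Void> lastShardsClosed() {
        return lastShardsClosed;
    }

    synchronized int links() {
        return links;
    }

    public static void main(String[] args) {
        ShardsClosedTrackerDemo tracker = new ShardsClosedTrackerDemo();
        for (int i = 0; i < 1000; i++) {
            tracker.applyClusterState(List.of()); // updates that close no shards
        }
        System.out.println("links after 1000 no-op updates: " + tracker.links());

        CompletableFuture<Void> shard = new CompletableFuture<>();
        tracker.applyClusterState(List.of(shard)); // an update that closes one shard
        System.out.println("links after one closing update: " + tracker.links());
    }
}
```

With this shape, the chain grows only with the number of updates that actually close shards, which is why the depth limit is rarely reached in practice.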

DaveCTurner

LGTM

fcofdez · 2025-08-26T20:23:19Z

@elasticmachine update branch

fcofdez · 2025-08-26T22:11:52Z

@elasticmachine test this

fcofdez · 2025-08-27T05:43:11Z

@elasticmachine update branch

…lastic#132536)

Every cluster state applied in the IndicesClusterStateService has the potential to add a new RefCountingListener to a chain of such listeners. If the chain is too long, the unlucky thread that decreases the ref count to 0 for the head of the chain ends up calling each listener in turn and, assuming all ref counts are thereby decreased to 0, traverses the whole chain on its own thread stack, possibly resulting in a StackOverflowError.

This fix chains at most 8 RefCountingListeners; any listener beyond that limit is forked on a generic thread when it gets to execution.

…rState (#139499)

* Avoid stack overflow in IndicesClusterStateService applyClusterState (#132536), with the same commit message as above
* MockTransportService.createNewService

…State (#139498)

* Avoid stack overflow in IndicesClusterStateService applyClusterState (#132536), with the same commit message as above
* MockTransportService.createNewService

sometimes fork the thread

054f5ee

albertzaharovits requested a review from DaveCTurner August 7, 2025 13:11

albertzaharovits self-assigned this Aug 7, 2025

albertzaharovits added the >bug, :Distributed/Cluster Coordination, v9.2.0, v8.19.2, and v9.1.2 labels Aug 7, 2025

elasticsearchmachine added the Team:Distributed Coordination (obsolete) label Aug 7, 2025

Update docs/changelog/132536.yaml

73e3304

DaveCTurner reviewed Aug 7, 2025

albertzaharovits added 2 commits August 8, 2025 10:24

Merge branch 'main' into fix-3855

fa49097

Avoid chaining when no shard has been closed

3a00599

albertzaharovits added 8 commits August 8, 2025 12:59

Merge branch 'main' into fix-3855

e18cbd5

ooops

e07e36d

Test skeleton

c6f1d0b

Test WIP

447e140

WIP no threadpool

40bfdf7

Merge branch 'main' into fix-3855

027380e

test done

13eecf9

nit

b6d6742

albertzaharovits force-pushed the fix-3855 branch from f4c5977 to b6d6742 August 11, 2025 14:35

albertzaharovits requested a review from DaveCTurner August 11, 2025 14:36

[CI] Auto commit changes from spotless

fa44494

elasticsearchmachine removed the v8.19.2 label Aug 11, 2025

elasticsearchmachine added v8.19.3 v9.1.3 and removed v9.1.2 labels Aug 11, 2025

Merge branch 'main' into fix-3855

2d5d9ca

DaveCTurner approved these changes Aug 18, 2025

elasticsearchmachine added v9.1.4 v8.19.4 and removed v9.1.3 v8.19.3 labels Aug 21, 2025

Merge branch 'main' into fix-3855

5f8bd19

Merge branch 'main' into fix-3855

8ae0256

fcofdez merged commit eb75ba3 into elastic:main Aug 27, 2025
33 checks passed

DaveCTurner removed v9.1.4 v8.19.4 labels Dec 12, 2025

This was referenced Dec 14, 2025

[9.1] Avoid stack overflow in IndicesClusterStateService applyClusterState #139498

Merged

[8.19] Avoid stack overflow in IndicesClusterStateService applyClusterState #139499

Merged

DaveCTurner added v9.1.10 v8.19.10 labels Dec 15, 2025

fcofdez merged 16 commits into elastic:main from albertzaharovits:fix-3855
