Implement canRemain for the IndexBalanceAllocationDecider#141494

Merged
nicktindall merged 12 commits into elastic:main from nicktindall:implement_canRemain_index_balance_decider
Jan 31, 2026

@nicktindall nicktindall commented Jan 29, 2026

When the cluster grows, the ideal spread of shards may change, so moving some shards to the new nodes might improve index balance.

This change adds canRemain support to the IndexBalanceAllocationDecider, so when a reroute is performed in response to a new node joining the cluster, the allocator will make any necessary moves to maintain index balance.
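To illustrate the idea (not the actual decider code), here is a minimal sketch of how a canRemain-style check could react to a new node. The class `IndexBalanceSketch`, its simplified `Decision` enum, and the ceiling-based threshold are all assumptions made for this example; the real IndexBalanceAllocationDecider may compute its threshold differently.

```java
/** Illustrative stand-in only; not the Elasticsearch IndexBalanceAllocationDecider. */
final class IndexBalanceSketch {

    /** Simplified decision type (assumption); the real Decision.Type has more values. */
    enum Decision { YES, NOT_PREFERRED }

    /** Hypothetical threshold: the most shards of one index a node should hold, ceil(total / nodes). */
    static int idealMaxShardsPerNode(int totalShards, int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        return (totalShards + nodeCount - 1) / nodeCount;
    }

    /** A shard may stay unless its node holds more shards of the index than the threshold. */
    static Decision canRemain(int shardsOnNode, int totalShards, int nodeCount) {
        return shardsOnNode > idealMaxShardsPerNode(totalShards, nodeCount)
            ? Decision.NOT_PREFERRED
            : Decision.YES;
    }

    public static void main(String[] args) {
        // 6 shards over 2 nodes: holding 3 is balanced.
        System.out.println(canRemain(3, 6, 2));
        // A third node joins: the threshold drops to 2, so holding 3 is no longer preferred.
        System.out.println(canRemain(3, 6, 3));
    }
}
```

In this sketch, the same node goes from YES to NOT_PREFERRED purely because the node count changed, which is the situation a reroute after a node join would need to detect.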

There is an integration test in a linked serverless PR.

Relates: ES-13566

@nicktindall nicktindall added >enhancement :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) labels Jan 29, 2026
@elasticsearchmachine elasticsearchmachine added v9.4.0 Team:Distributed Meta label for distributed team. labels Jan 29, 2026
@elasticsearchmachine

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added the serverless-linked Added by automation, don't add manually label Jan 29, 2026
bestDecision = Type.NOT_PREFERRED;
if (allocationDecision.type().compareToBetweenDecisions(bestDecision) > 0) {
    bestDecision = Type.NOT_PREFERRED;
}

@nicktindall nicktindall Jan 29, 2026

This seems to break when we run decideMove with debug turned on. It finds a YES, then finds a NOT_PREFERRED and overwrites bestDecision=YES with NOT_PREFERRED.

This trips an assertion later that checks that canAllocate=NOT_PREFERRED and canRemain=NOT_PREFERRED come with no targetNode.

Happy to break this out as a separate bugfix PR? It doesn't seem to be biting us at the moment.
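The overwrite described above can be reproduced in isolation. The sketch below is a self-contained stand-in, not the BalancedShardsAllocator code: the `Type` enum, its worst-to-best ordering (NO, NOT_PREFERRED, YES), and the two helper methods are assumptions for illustration. Only the method name `compareToBetweenDecisions` is taken from the snippet under review, and its real semantics may differ.

```java
import java.util.List;

/** Illustrative stand-in only; not the Elasticsearch BalancedShardsAllocator. */
final class BestDecisionSketch {

    /** Hypothetical ordering from worst to best; the real Decision.Type has more values. */
    enum Type {
        NO, NOT_PREFERRED, YES;

        /** Positive when this decision is better than the other (assumed semantics). */
        int compareToBetweenDecisions(Type other) {
            return Integer.compare(ordinal(), other.ordinal());
        }
    }

    /** Unguarded variant: a later NOT_PREFERRED clobbers an earlier YES. */
    static Type unguarded(List<Type> perNodeDecisions) {
        Type bestDecision = Type.NO;
        for (Type decision : perNodeDecisions) {
            if (decision == Type.YES) {
                bestDecision = Type.YES;
            } else if (decision == Type.NOT_PREFERRED) {
                bestDecision = Type.NOT_PREFERRED; // overwrites unconditionally
            }
        }
        return bestDecision;
    }

    /** Guarded variant: NOT_PREFERRED is recorded only if it improves on the current best. */
    static Type guarded(List<Type> perNodeDecisions) {
        Type bestDecision = Type.NO;
        for (Type decision : perNodeDecisions) {
            if (decision == Type.YES) {
                bestDecision = Type.YES;
            } else if (decision == Type.NOT_PREFERRED
                && decision.compareToBetweenDecisions(bestDecision) > 0) {
                bestDecision = Type.NOT_PREFERRED;
            }
        }
        return bestDecision;
    }

    public static void main(String[] args) {
        // Debug mode keeps iterating after a YES, so the order YES then NOT_PREFERRED matters.
        List<Type> decisions = List.of(Type.YES, Type.NOT_PREFERRED);
        System.out.println("unguarded: " + unguarded(decisions));
        System.out.println("guarded:   " + guarded(decisions));
    }
}
```

With the input YES followed by NOT_PREFERRED, the unguarded loop ends on NOT_PREFERRED while the guarded loop keeps YES, which matches the symptom reported in this comment.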

Member

Yeah let's break it into a separate bug-fix PR.

Contributor Author

Done in #141565

Member

Please make sure the merge conflict is resolved here.

@ywangd ywangd left a comment

I have some minor comments on mostly the existing code.

@ywangd ywangd left a comment

LGTM

# Conflicts:
#	server/src/main/java/org/elasticsearch/cluster/routing/allocation/allocator/BalancedShardsAllocator.java
…e_decider' into implement_canRemain_index_balance_decider
@nicktindall nicktindall enabled auto-merge (squash) January 30, 2026 23:35
@nicktindall nicktindall merged commit a8e0467 into elastic:main Jan 31, 2026
35 checks passed
@nicktindall nicktindall deleted the implement_canRemain_index_balance_decider branch February 4, 2026 02:57
Labels

:Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes)
>enhancement
serverless-linked Added by automation, don't add manually
Team:Distributed Meta label for distributed team.
v9.4.0

3 participants