Skip to content

Don't overwrite decision with NOT_PREFERRED unless its an improvement#141565

Merged
nicktindall merged 10 commits intoelastic:mainfrom
nicktindall:fix_bug_in_balanced_shards_allocator
Jan 30, 2026
Merged

Don't overwrite decision with NOT_PREFERRED unless its an improvement#141565
nicktindall merged 10 commits intoelastic:mainfrom
nicktindall:fix_bug_in_balanced_shards_allocator

Conversation

@nicktindall
Copy link
Contributor

When the not-preferred logging is turned on we re-run decideMove with debug mode on. If we encounter a NOT_PREFERRED after we've seen a YES in debug mode, we were over-writing the YES decision with NOT_PREFERRED. This change checks first that NOT_PREFERRED is a better decision than the current one before overwriting it.

@nicktindall nicktindall added >bug :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) labels Jan 30, 2026
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team. label Jan 30, 2026
@elasticsearchmachine
Copy link
Collaborator

Hi @nicktindall, I've created a changelog YAML for you.

@nicktindall nicktindall changed the title Fix bug in BalancedShardsAllocator Jan 30, 2026
// Whether or not a relocation target node can be found, it's important to explain the canAllocate response as
// NOT_PREFERRED, as opposed to NO.
bestDecision = Type.NOT_PREFERRED;
if (allocationDecision.type().compareToBetweenNodes(bestDecision) > 0) {
Copy link
Contributor

@mhl-b mhl-b Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the whole parent if block become redundant.

if (allocationDecision.type() == Type.NOT_PREFERRED && remainDecision.type() == Type.NOT_PREFERRED) {

The NOT_PREFERRED else-if branch below should handle this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this make it right do you think?

d399aee

We still don't want to set targetNode if remainDecision == NOT_PREFERRED

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's easier to read and make sense to me: "dont move not-preferred to another not-preferred". Sounds straightforward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated comment and logic. I think the way you stated it is clearer.

@nicktindall nicktindall requested a review from mhl-b January 30, 2026 04:55
Copy link
Contributor

@mhl-b mhl-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@ywangd ywangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +1090 to +1093
// We will accept a NOT_PREFERRED allocation if canRemain = NO, but if canRemain = NOT_PREFERRED
// we will wait for a YES/THROTTLE. Either way we update bestDecision so we can distinguish betweem
// a NO and a NOT_PREFERRED in allocate-explain
if (remainDecision.type() != Type.NOT_PREFERRED) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an assertion like the follows in the beginning of the method

assert remainDecision.type() == Decision.Type.NO || remainDecision.type() == Decision.Type.NOT_PREFERRED : remainDecision;

?
It always took me a second to understand why we are not considering more types at this point.

@ywangd
Copy link
Member

ywangd commented Jan 30, 2026

I think we should backport to 9.3.1

@nicktindall nicktindall added auto-backport Automatically create backport pull requests when merged v9.3.1 labels Jan 30, 2026
@nicktindall nicktindall enabled auto-merge (squash) January 30, 2026 22:40
@nicktindall nicktindall merged commit fb0a4d5 into elastic:main Jan 30, 2026
35 checks passed
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
9.3
nicktindall added a commit to nicktindall/elasticsearch that referenced this pull request Jan 30, 2026
@nicktindall nicktindall deleted the fix_bug_in_balanced_shards_allocator branch January 30, 2026 23:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged >bug :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed Meta label for distributed team. v9.3.1 v9.4.0

4 participants