Do not recommend increasing max_shards_per_node #120458

DaveCTurner merged 3 commits into elastic:main
Today if the `shards_capacity` health indicator detects a problem then it recommends increasing the limit, which goes against the advice in the manual about not increasing these limits and also makes it rather pointless having a limit in the first place. This commit improves the recommendation to suggest either adding nodes or else reducing the shard count.
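To illustrate the check this indicator is based on, here is a minimal sketch of the cluster-wide shard-capacity rule: the per-node setting multiplied by the number of applicable nodes gives the cluster's shard budget. This is not the actual `ShardLimitValidator` code; the class and method names below are illustrative only.

```java
// Hypothetical sketch, not Elasticsearch source: the cluster-wide limit is
// cluster.max_shards_per_node (default 1000) times the number of matching nodes.
public class ShardCapacityCheck {

    static boolean exceedsCapacity(int totalShards, int nodeCount, int maxShardsPerNode) {
        // Use long arithmetic so large clusters don't overflow int.
        return totalShards > (long) nodeCount * maxShardsPerNode;
    }

    public static void main(String[] args) {
        // With 3 data nodes and the default limit, the budget is 3000 shards.
        System.out.println(exceedsCapacity(2900, 3, 1000)); // false: within capacity
        System.out.println(exceedsCapacity(3100, 3, 1000)); // true: add nodes or reduce shards
    }
}
```

When the check trips, the remediation this PR recommends is to grow `nodeCount` or shrink `totalShards`, rather than raising `maxShardsPerNode`.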
Pinging @elastic/es-data-management (Team:Data Management)
Hi @DaveCTurner, I've created a changelog YAML for you.
```diff
 static final Diagnosis SHARDS_MAX_CAPACITY_REACHED_DATA_NODES = SHARD_MAX_CAPACITY_REACHED_FN.apply(
-    "increase_max_shards_per_node",
+    "decrease_shards_per_non_frozen_node",
     ShardLimitValidator.SETTING_CLUSTER_MAX_SHARDS_PER_NODE,
-    "data"
+    "non-frozen"
 );
 static final Diagnosis SHARDS_MAX_CAPACITY_REACHED_FROZEN_NODES = SHARD_MAX_CAPACITY_REACHED_FN.apply(
-    "increase_max_shards_per_node_frozen",
+    "decrease_shards_per_frozen_node",
     ShardLimitValidator.SETTING_CLUSTER_MAX_SHARDS_PER_NODE_FROZEN,
```
The bad "increase the limit" advice was baked into the actual diagnosis IDs - fixed here. See also https://github.com/elastic/telemetry/pull/4362 for the corresponding change to the telemetry cluster.
Hey @DaveCTurner, you are bringing up a very good point here. I do have a concern though. If I am not mistaken, the current limit is quite low, so it is probable that it would make sense to first increase the limit before expanding the cluster or reducing the shards. So, I am thinking of 2 options to make this more useful to users:
Does this make sense?
The default of 1000 shards per node is still rather relaxed IMO, at least for high-segment-count or high-field-count indices, and we do want users to stick to it for now. We do get support cases involving egregiously high shard-per-node counts sometimes, and we need to be able to point at the guidance in the manual when telling users to scale up their clusters. It rather weakens that argument when the health API told them specifically to keep on relaxing the limit each time they got close. A better limit would be nice ofc, maybe one based on #111123, but that won't be a quick process and I don't think we can in good conscience block this change on that work.
gmarouli left a comment
LGTM! Thanks for raising and addressing this, @DaveCTurner
Thanks @gmarouli
💔 Backport failed
You can use sqren/backport to manually backport by running
Backported to 8.x in de5be24