Track shardStarted events for simulation in DesiredBalanceComputer#133630
Track shardStarted events for simulation in DesiredBalanceComputer#133630elasticsearchmachine merged 24 commits intoelastic:mainfrom
Conversation
|
Hi @ywangd, I've created a changelog YAML for you. |
|
I had some back-and-forth with the way to track shardStarted events. At the end, I decided to do it with mostly |
...main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputer.java
Outdated
Show resolved
Hide resolved
...main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputer.java
Outdated
Show resolved
Hide resolved
nicktindall
left a comment
There was a problem hiding this comment.
I don't mind the approach but I think it adds some complexity and state to an already quite complex/stateful bit of code
...main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputer.java
Outdated
Show resolved
Hide resolved
I think it will have to add some complexity. But if we track the real shard started events, the complexity might be a bit less in |
henningandersen
left a comment
There was a problem hiding this comment.
Left a few initial comments, did not get into the weeds of the started simulations yet
| return currentClusterInfo; | ||
| } | ||
|
|
||
| private void updateAndGetCurrentClusterInfo() { |
There was a problem hiding this comment.
The method name here hints that it should return the cluster info? That would seem nice to do, but I'd also be fine to just call it updateClusterInfo
There was a problem hiding this comment.
Yeah it was an oversight and was meant to return the value. Fixed in 001af0d
...main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputer.java
Outdated
Show resolved
Hide resolved
...main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputer.java
Outdated
Show resolved
Hide resolved
|
I got some new idea after talking to Henning. I'll rework this PR. Please hold on your reviews. Thanks! 😅 |
|
As discussed previously, I pushed 85d0089 to implement simulation for started shards by diffing between the current |
...main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputer.java
Show resolved
Hide resolved
|
This PR is now ready for review. Thanks! 🙏 |
henningandersen
left a comment
There was a problem hiding this comment.
LGTM.
Only skimmed through the tests, looks like adequate coverage but may be good to have a second review as well.
| return currentClusterInfo; | ||
| } | ||
|
|
||
| private ClusterInfo updateAndGetCurrentClusterInfo() { |
There was a problem hiding this comment.
As discussed in chat, the point of separating get and update is to remove the risk of someone calling get when a refresh is half-finished and receiving a ClusterInfo that was a mix of data from the current and previous refresh.
I wonder in light of this change whether we should make
leastAvailableSpaceUsages / mostAvailableSpaceUsages / nodeThreadPoolUsageStatsPerNode etc. all fields on the AsyncRefresh, so that state is kept private until the refresh was completed? Feels a bit safer than leaving it as fields on the enclosing class where it can be accessed whenever.
If not that we should at least put some doc here to indicate that using any of those fields directly isn't safe and the reason for the update/get split.
There was a problem hiding this comment.
It would mean rather than updateAndGet... living at the top level and accessing fields, it might need to change to being updateClusterInfo and accepting one that the AsyncRefresh had created from its private state.
There was a problem hiding this comment.
I'd have a separate PR for the refactoring. For now I added comments in 95ba02d
| // We use dataPath to find out whether a shard is allocated on a node. | ||
| // TODO: DataPath is sent with disk usages but thread pool usage is sent separately so that local shard allocation | ||
| // may change between the two calls. | ||
| if (clusterInfo.getDataPath(shardRouting) == null) { |
There was a problem hiding this comment.
NIt: I wonder if we could put this logic in the ClusterInfo and have the computer be more agnostic about how this is implemented.
i.e. if we implemented boolean ClusterInfo#shardHasMoved(ShardRouting) and then put this logic and an explanation of it in there. It might reduce the cognitive load for the reader of this method, and allow us to adapt the implementation of #shardHasMoved as the contents of ClusterInfo evolves.
| final var firstNodeUpdatedStats = updatedClusterInfo.getNodeUsageStatsForThreadPools() | ||
| .get(firstNode.getId()) | ||
| .threadPoolUsageStatsMap() | ||
| .get("write"); |
There was a problem hiding this comment.
Nit: can use ThreadPool.Names.WRITE here? (and above/below?)
nicktindall
left a comment
There was a problem hiding this comment.
LGTM, just some proposals around some restructuring, feel free to ignore.
…lastic#133630) If a shard starts on the target node before the next ClusterInfo polling, today we don't include it for the simulation. With this PR, we track shards that can potentially start within one ClusterInfo polling cycle so that they are always included in simulation. The tracking is reset when a new ClusterInfo arrives. Resolves: ES-12723
…lastic#133630) If a shard starts on the target node before the next ClusterInfo polling, today we don't include it for the simulation. With this PR, we track shards that can potentially start within one ClusterInfo polling cycle so that they are always included in simulation. The tracking is reset when a new ClusterInfo arrives. Resolves: ES-12723
Only the overall ClusterInfo is needed at the top level. This PR moves the individual intermediate stats fields onto AsyncRefresh to avoid potential misuses. Relates: elastic#133630 (comment)
Only the overall ClusterInfo is needed at the top level. This PR moves the individual intermediate stats fields onto AsyncRefresh to avoid potential misuses. Relates: #133630 (comment)
…puter (elastic#133630)" This reverts commit f248596.
…puter (elastic#133630)" This reverts commit f248596.
* Revert "Move individual stats fields to AsyncRefresh (#135052)" This reverts commit 2b0153b. * Revert "Track shardStarted events for simulation in DesiredBalanceComputer (#133630)" This reverts commit f248596. * [CI] Update transport version definitions * Revert "[CI] Update transport version definitions" This reverts commit 90f38b0. * Don't reset upper bounds (#135226) Transport version upper bounds are currently set to their values from upstream main whenever no new definition is detected. However, this is like a partial merge of upstream, and produces broken state files. This commit temporarily comments out resetting until a more robust solution is built. * Revert "Don't reset upper bounds (#135226)" This reverts commit ddbac68. --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co> Co-authored-by: Ryan Ernst <ryan@iernst.net>
This reverts commit e76d333.
…mputer" (#135597) This PR reapplies both #133630 and #135052 with a performance bug fix. The original PR #133630 had a severe impact on throughput for index creation. It was reverted with #135369. The flamegraph suggests the system spent a lot time to compute shard assignments on ClusterInfo instantiation time. But that is unnecessary since they do not change within a single polling interval. This PR fixes it by reuse the last value and avoid recomputation. Copying the original commit message here If a shard starts on the target node before the next ClusterInfo polling, today we don't include it for the simulation. With this PR, we track shards that can potentially start within one ClusterInfo polling cycle so that they are always included in simulation. The tracking is reset when a new ClusterInfo arrives. Resolves: ES-12723
If a shard starts on the target node before the next ClusterInfo polling, today we don't include it for the simulation. With this PR, we track shards that can potentially start within one ClusterInfo polling cycle so that they are always included in simulation. The tracking is reset when a new ClusterInfo arrives.
Resolves: ES-12723