[8.18] [ML] Fix double-counting of inference memory in the assignment rebalancer (#133919) #134053

Merged
elasticsearchmachine merged 1 commit into elastic:8.18 from
valeriy42:backport/8.18/pr-133919
Sep 3, 2025

@valeriy42
Contributor

Backports the following commits to 8.18:

…ncer (elastic#133919)

The static method TrainedModelAssignmentRebalancer.getNodeFreeMemoryExcludingPerNodeOverheadAndNativeInference subtracted load.getAssignedNativeInferenceMemory() from load.getFreeMemoryExcludingPerNodeOverhead(). However, NodeLoad.getFreeMemoryExcludingPerNodeOverhead() already subtracts native inference memory as part of the getAssignedJobMemoryExcludingPerNodeOverhead() calculation.

As a result, native inference memory was counted twice, so the rebalancer saw less free memory on each node than was actually available. With the extra subtraction gone, the private method getNodeFreeMemoryExcludingPerNodeOverheadAndNativeInference() is no longer needed and is removed entirely.
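
To illustrate the arithmetic, here is a minimal sketch of the double subtraction. It is not the Elasticsearch code: the class DoubleCountingSketch, the nested Load type, its fields, and the numbers are all hypothetical stand-ins for NodeLoad, and per-node overhead is left out for brevity.

```java
// Hypothetical, simplified model of the bug; not the real NodeLoad or rebalancer code.
public class DoubleCountingSketch {

    // Illustrative stand-in for NodeLoad (assumed fields, per-node overhead ignored).
    static class Load {
        final long maxMemory;
        final long assignedAnomalyMemory;
        final long assignedNativeInferenceMemory;

        Load(long maxMemory, long assignedAnomalyMemory, long assignedNativeInferenceMemory) {
            this.maxMemory = maxMemory;
            this.assignedAnomalyMemory = assignedAnomalyMemory;
            this.assignedNativeInferenceMemory = assignedNativeInferenceMemory;
        }

        // Assigned job memory already includes native inference memory.
        long assignedJobMemory() {
            return assignedAnomalyMemory + assignedNativeInferenceMemory;
        }

        // Free memory therefore already accounts for native inference memory.
        long freeMemory() {
            return maxMemory - assignedJobMemory();
        }
    }

    // Before the fix: native inference memory is subtracted a second time.
    static long freeMemoryBeforeFix(Load load) {
        return load.freeMemory() - load.assignedNativeInferenceMemory;
    }

    // After the fix: the free-memory value is used as-is.
    static long freeMemoryAfterFix(Load load) {
        return load.freeMemory();
    }

    public static void main(String[] args) {
        Load load = new Load(1000, 200, 300);
        System.out.println(freeMemoryBeforeFix(load)); // 1000 - 500 - 300 = 200
        System.out.println(freeMemoryAfterFix(load));  // 1000 - 500 = 500
    }
}
```

In this toy example the node appears to have 300 units less free memory before the fix, which is exactly the native inference memory counted a second time.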
@valeriy42 valeriy42 added :ml Machine learning >bug auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport Team:ML Meta label for the ML team labels Sep 3, 2025
@elasticsearchmachine elasticsearchmachine merged commit faed991 into elastic:8.18 Sep 3, 2025
16 checks passed
@valeriy42 valeriy42 deleted the backport/8.18/pr-133919 branch September 3, 2025 13:40
Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport >bug :ml Machine learning Team:ML Meta label for the ML team v8.18.7

2 participants