[ML] Low Priority models should not be updateable #111227

@davidkyle

Elasticsearch Version

8.14

Installed Plugins

No response

Java Version

bundled

OS Version

any

Problem Description

ML trained models can be deployed in a low priority mode, which should be limited to 1 allocation and 1 thread per allocation. When checking that there is sufficient CPU to deploy a model, the assignment planner allows multiple low priority deployments to share a single CPU, which allows low priority deployments to be over-allocated.

The problem is that the number of allocations can be updated to a much higher number, and because the assignment planner treats low priority deployments differently, it does not consider those extra allocations when calculating the available resources. Multiple low priority deployments can be created and then updated to use far more CPU than is available in the cluster. In Cloud, low priority deployments do not trigger a scale event, so the cluster will not grow and can become extremely over-allocated.
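
To make the accounting gap concrete, here is a minimal Python sketch. It is not the Elasticsearch assignment planner: the `Deployment` class, the function names, and the simplification that low priority deployments are charged zero cores are all assumptions made for illustration. The last function shows one possible guard that rejects an update raising a low priority deployment above 1 allocation.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical, simplified view of a trained model deployment."""
    deployment_id: str
    priority: str  # "normal" or "low"
    number_of_allocations: int
    threads_per_allocation: int

    def required_cores(self) -> int:
        # Each allocation uses threads_per_allocation CPU cores.
        return self.number_of_allocations * self.threads_per_allocation


def cores_counted_by_planner(deployments):
    # Assumption for illustration: low priority deployments share CPU,
    # so the planner does not charge them against the available cores.
    return sum(d.required_cores() for d in deployments if d.priority != "low")


def cores_actually_requested(deployments):
    # What the deployments would really use, regardless of priority.
    return sum(d.required_cores() for d in deployments)


def validate_allocation_update(deployment, new_allocations):
    # Hypothetical guard: low priority deployments stay at 1 allocation.
    if deployment.priority == "low" and new_allocations > 1:
        raise ValueError(
            f"deployment [{deployment.deployment_id}] is low priority; "
            "number_of_allocations cannot be updated above 1"
        )
    return new_allocations


if __name__ == "__main__":
    deployments = [
        Deployment("low-a", "low", 32, 1),
        Deployment("low-b", "low", 16, 1),
    ]
    print("counted by planner:", cores_counted_by_planner(deployments))
    print("actually requested:", cores_actually_requested(deployments))
```

Under these assumptions the two totals diverge as soon as a low priority deployment is updated past 1 allocation, which is the over-allocation described above.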

Screenshot 2024-07-24 at 10 11 26

Steps to Reproduce

  1. Start a deployment in low priority mode.
  2. Update the number of allocations to a high number.
     Screenshot 2024-07-24 at 10 12 51
  3. Optionally start more low priority deployments and increase their number of allocations. In the screenshot below, the total required CPU cores for all deployments far exceeds what is available on the system.
     Screenshot 2024-07-24 at 10 23 33
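
The steps above can be sketched as the REST requests they correspond to. This Python snippet only builds the method, path, and body for the start and update trained model deployment APIs; it does not contact a cluster. The model ID `my-model`, the allocation count, and the helper names are placeholders chosen for illustration, and the deployment ID is assumed to be the same as the model ID.

```python
import json
from urllib.parse import quote, urlencode


def start_low_priority_request(model_id):
    # Step 1: start the deployment with priority=low.
    query = urlencode({"priority": "low"})
    path = f"/_ml/trained_models/{quote(model_id, safe='')}/deployment/_start?{query}"
    return ("POST", path, None)


def update_allocations_request(deployment_id, number_of_allocations):
    # Step 2: raise number_of_allocations on the running deployment.
    path = f"/_ml/trained_models/{quote(deployment_id, safe='')}/deployment/_update"
    body = json.dumps({"number_of_allocations": number_of_allocations})
    return ("POST", path, body)


def reproduction_requests(model_id, number_of_allocations):
    # Assumes the deployment ID defaults to the model ID.
    return [
        start_low_priority_request(model_id),
        update_allocations_request(model_id, number_of_allocations),
    ]


if __name__ == "__main__":
    # "my-model" and 32 are placeholder values.
    for method, path, body in reproduction_requests("my-model", 32):
        print(method, path, body or "")
```

Repeating the same pair of requests for additional models corresponds to the optional third step.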

Logs (if relevant)

No response

Metadata

Labels

:ml (Machine learning), >bug, Team:ML (Meta label for the ML team)
