Add Minimal Service Settings to the Model Registry#120560
Add Minimal Service Settings to the Model Registry#120560jimczi merged 8 commits intoelastic:mainfrom
Conversation
This commit introduces minimal service settings in the model registry, accessible without querying the inference index. These settings are now available for the default models exposed by the inference service. The ability to access settings without an inference index query is needed for the semantic text field, as it would benefit from eager validation of configuration during field creation. This is not feasible currently because retrieving service settings relies on an asynchronous call to the inference index. ### Follow-Up Plans: 1. Extend this capability to include minimal service settings for all newly added models, making them accessible via the cluster state. 2. Update the semantic text field to eagerly retrieve service settings directly from the model registry.
|
Pinging @elastic/ml-core (Team:ML) |
davidkyle
left a comment
There was a problem hiding this comment.
LGTM
Left some minor comments mainly about using sometimes using minimalModelSettings instead of minimalServiceSettings
...n/inference/src/test/java/org/elasticsearch/xpack/inference/registry/ModelRegistryTests.java
Outdated
Show resolved
Hide resolved
...earch/xpack/inference/services/elasticsearch/MultilingualE5SmallInternalServiceSettings.java
Outdated
Show resolved
Hide resolved
| * If the {@code inferenceEntityId} is not found, the method behaves as follows: | ||
| * <ul> | ||
| * <li>Returns {@code null} if the id might exist but its configuration is not available locally.</li> | ||
| * <li>Throws a {@link ResourceNotFoundException} if it is certain that the id does not exist in the cluster.</li> | ||
| * </ul> |
There was a problem hiding this comment.
This comment appears to be out of date as the function does not throw
There was a problem hiding this comment.
It's just adding a note here to allow for throwing an exception if we ever determine that the inference ID cannot exist.
This isn't possible with the current implementation, but it ensures that future API consumers (e.g., semantic text field validation) handle this case upfront.
There was a problem hiding this comment.
We could throw from this method today for internal inference IDs (i.e. those that start with .) that we can't find in defaultConfigIds
Mikep86
left a comment
There was a problem hiding this comment.
LGTM, except I would like to see test coverage for an invalid task type added before we merge
| * If the {@code inferenceEntityId} is not found, the method behaves as follows: | ||
| * <ul> | ||
| * <li>Returns {@code null} if the id might exist but its configuration is not available locally.</li> | ||
| * <li>Throws a {@link ResourceNotFoundException} if it is certain that the id does not exist in the cluster.</li> | ||
| * </ul> |
There was a problem hiding this comment.
We could throw from this method today for internal inference IDs (i.e. those that start with .) that we can't find in defaultConfigIds
| new MinimalServiceSettings(TaskType.TEXT_EMBEDDING, 10, SimilarityMeasure.COSINE, null); | ||
| }); | ||
| assertThat(ex.getMessage(), containsString("required [element_type] field is missing")); | ||
| } |
There was a problem hiding this comment.
Can we add a test for an invalid task_type? That got removed in the refactoring.
...plugin/inference/src/main/java/org/elasticsearch/xpack/inference/registry/ModelRegistry.java
Show resolved
Hide resolved
kderusso
left a comment
There was a problem hiding this comment.
I like this approach!
A couple of high level comments:
- Would a name like
InferenceModelSettingsor something be more descriptive? It's not quite clear to me from the name thatMinimalServiceSettingsapplies to models. - As we add service settings in the future we may want to think about a class hierarchy here, but for now since text embedding models are the only models with settings attached to them, I think this is fine.
Thanks for adding this!
This commit introduces minimal service settings in the model registry, accessible without querying the inference index. These settings are now available for the default models exposed by the inference service. The ability to access settings without an inference index query is needed for the semantic text field, as it would benefit from eager validation of configuration during field creation. This is not feasible currently because retrieving service settings relies on an asynchronous call to the inference index. ### Follow-Up Plans: 1. Extend this capability to include minimal service settings for all newly added models, making them accessible via the cluster state. 2. Update the semantic text field to eagerly retrieve service settings directly from the model registry.
This commit introduces minimal service settings in the model registry, accessible without querying the inference index. These settings are now available for the default models exposed by the inference service. The ability to access settings without an inference index query is needed for the semantic text field, as it would benefit from eager validation of configuration during field creation. This is not feasible currently because retrieving service settings relies on an asynchronous call to the inference index. ### Follow-Up Plans: 1. Extend this capability to include minimal service settings for all newly added models, making them accessible via the cluster state. 2. Update the semantic text field to eagerly retrieve service settings directly from the model registry. Co-authored-by: Jonathan Buttner <56361221+jonathan-buttner@users.noreply.github.com>
This commit integrates `MinimalServiceSettings` (introduced in elastic#120560) into the cluster state for all registered models in the `ModelRegistry`. These settings allow consumers to access configuration details without requiring asynchronous calls to retrieve full model configurations. To ensure consistency, the cluster state metadata must remain synchronized with the models in the inference index. If a mismatch is detected during startup, the master node performs an upgrade to load all model settings from the index.
This commit integrates `MinimalServiceSettings` (introduced in elastic#120560) into the cluster state for all registered models in the `ModelRegistry`. These settings allow consumers to access configuration details without requiring asynchronous calls to retrieve full model configurations. To ensure consistency, the cluster state metadata must remain synchronized with the models in the inference index. If a mismatch is detected during startup, the master node performs an upgrade to load all model settings from the index.
This commit integrates `MinimalServiceSettings` (introduced in #120560) into the cluster state for all registered models in the `ModelRegistry`. These settings allow consumers to access configuration details without requiring asynchronous calls to retrieve full model configurations. To ensure consistency, the cluster state metadata must remain synchronized with the models in the inference index. If a mismatch is detected during startup, the master node performs an upgrade to load all model settings from the index.
* Add ModelRegistryMetadata to Cluster State (#121106) This commit integrates `MinimalServiceSettings` (introduced in #120560) into the cluster state for all registered models in the `ModelRegistry`. These settings allow consumers to access configuration details without requiring asynchronous calls to retrieve full model configurations. To ensure consistency, the cluster state metadata must remain synchronized with the models in the inference index. If a mismatch is detected during startup, the master node performs an upgrade to load all model settings from the index. * fix test compil * fix serialisation * Exclude Default Inference Endpoints from Cluster State Storage (#125242) When retrieving a default inference endpoint for the first time, the system automatically creates the endpoint. However, unlike the `put inference model` action, the `get` action does not redirect the request to the master node. Since #121106, we rely on the assumption that every model creation (`put model`) must run on the master node, as it modifies the cluster state. However, this assumption led to a bug where the get action tries to store default inference endpoints from a different node. This change resolves the issue by preventing default inference endpoints from being added to the cluster state. These endpoints are not strictly needed there, as they are already reported by inference services upon startup. **Note:** This bug did not prevent the default endpoints from being used, but it caused repeated attempts to store them in the index, resulting in logging errors on every usage.
This commit integrates `MinimalServiceSettings` (introduced in elastic#120560) into the cluster state for all registered models in the `ModelRegistry`. These settings allow consumers to access configuration details without requiring asynchronous calls to retrieve full model configurations. To ensure consistency, the cluster state metadata must remain synchronized with the models in the inference index. If a mismatch is detected during startup, the master node performs an upgrade to load all model settings from the index.
This commit introduces minimal service settings in the model registry, accessible without querying the inference index. These settings are now available for the default models exposed by the inference service.
The ability to access settings without an inference index query is needed for the semantic text field, as it would benefit from eager validation of configuration during field creation. This is not feasible currently because retrieving service settings relies on an asynchronous call to the inference index.
Follow-Up Plans: