[ML] Disable EIS rate limiting within the inference API #133845
jonathan-buttner merged 17 commits into elastic:main
Conversation
Pinging @elastic/ml-core (Team:ML)
DonalEvans left a comment:

Nothing major, just some clean-up suggestions. I do like the idea of returning something to the user to indicate that rate limiting is disabled, rather than just not returning any rate limit settings at all.
```java
private static final RateLimitSettings DEFAULT_RATE_LIMIT_SETTINGS = new RateLimitSettings(720L);
```

```java
public static ElasticInferenceServiceCompletionServiceSettings fromMap(Map<String, Object> map, ConfigurationParseContext context) {
```

DonalEvans: With these changes, the `context` argument is no longer used, so it can be removed. This also applies to the other *ServiceSettings classes.
jonathan-buttner: Yeah, great point. I'm going to remove them from the *Settings classes but leave them in the models, just in case we need the context for a settings class in the future. That way we don't have to plumb it through again.
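The refactor discussed above can be sketched with a simplified, hypothetical settings class (names and fields are illustrative, not Elasticsearch's real signatures): the parse-context parameter is dropped from the settings-level `fromMap`, which simply consumes the keys it understands from the source map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified sketch of a *ServiceSettings parser after the
// unused ConfigurationParseContext parameter has been removed.
class ServiceSettingsSketch {
    final String modelId;

    private ServiceSettingsSketch(String modelId) {
        this.modelId = modelId;
    }

    // After the change: the settings-level parser no longer takes a parse context;
    // the model layer can keep the context and decide whether to pass anything down.
    static ServiceSettingsSketch fromMap(Map<String, Object> map) {
        // Consuming (removing) the key mirrors the "remove then validate leftovers" style.
        return new ServiceSettingsSketch((String) map.remove("model_id"));
    }

    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();
        map.put("model_id", "elser-v2");
        ServiceSettingsSketch settings = ServiceSettingsSketch.fromMap(map);
        System.out.println(settings.modelId); // elser-v2
        System.out.println(map.isEmpty());    // true (consumed keys are removed)
    }
}
```

Keeping the context parameter only at the model layer means a future settings class that does need it can get it without re-threading the argument through every call site.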
```java
);
var serviceSettings = ElasticInferenceServiceSparseEmbeddingsServiceSettings.fromMap(map, ConfigurationParseContext.REQUEST);

assertThat(map, is(Map.of()));
```

DonalEvans: Nitpick, but the `anEmptyMap()` matcher is probably a better choice here. There are a few other places in this PR that make a similar assertion which could also be changed.
```java
assertThat(map, is(Map.of()));
assertThat(serviceSettings, is(new ElasticInferenceServiceSparseEmbeddingsServiceSettings(modelId, null)));
assertThat(serviceSettings.rateLimitSettings(), sameInstance(RateLimitSettings.DISABLED_INSTANCE));
assertThat(serviceSettings.rateLimitSettings().isEnabled(), is(false));
```

DonalEvans: Nitpick, but this assertion is redundant, since we already confirmed that the rate limit settings returned are `RateLimitSettings.DISABLED_INSTANCE`. If we want to verify that `RateLimitSettings.DISABLED_INSTANCE.isEnabled()` returns false, that test would probably be better placed in RateLimitSettingsTests, since it's testing the implementation of the RateLimitSettings class. There are a few other tests which make similar assertions, namely the ones added in *ServiceSettingsTests classes.
jonathan-buttner: Yeah, good point, and I believe I already have a test for it in RateLimitSettingsTests.
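The shape of the class under discussion can be sketched minimally (this is a hypothetical reduction; the real `RateLimitSettings` carries serialization and equality logic). It shows why a single `isEnabled()` test on the shared singleton, kept in RateLimitSettingsTests, makes the per-service `sameInstance(DISABLED_INSTANCE)` assertions sufficient on their own.

```java
// Hypothetical, minimal sketch of a rate-limit settings class with a shared
// "disabled" singleton. Names mirror the PR discussion; internals are guessed.
class RateLimitSettings {
    static final RateLimitSettings DISABLED_INSTANCE = new RateLimitSettings(0L, false);

    private final long requestsPerMinute;
    private final boolean enabled;

    RateLimitSettings(long requestsPerMinute) {
        this(requestsPerMinute, true);
    }

    private RateLimitSettings(long requestsPerMinute, boolean enabled) {
        this.requestsPerMinute = requestsPerMinute;
        this.enabled = enabled;
    }

    boolean isEnabled() {
        return enabled;
    }

    public static void main(String[] args) {
        // Verified once here, identity checks elsewhere imply disabled state.
        System.out.println(RateLimitSettings.DISABLED_INSTANCE.isEnabled()); // false
        System.out.println(new RateLimitSettings(720L).isEnabled());         // true
    }
}
```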
```java
assertThat(xContentResult, is(Strings.format("""
    {"model_id":"%s","rate_limit":{"requests_per_minute":1000}}""", modelId)));
assertThat(xContentResult, is(XContentHelper.stripWhitespace(Strings.format("""
```

DonalEvans: Is the `stripWhitespace()` needed here? The JSON string doesn't contain any whitespace.

jonathan-buttner: I'll format it so it's easier to read and leverages the helper.
```java
assertFalse(serviceSettings.rateLimitSettings().isEnabled());
}

public void testFromMap_RemovesRateLimitingField() {
```

DonalEvans: This test is missing an assertion that the field is removed from the map.
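The missing assertion the reviewer points out can be sketched as follows (a hypothetical, stripped-down stand-in for the real test; the actual parser and assertions live in the *ServiceSettingsTests classes): after parsing, the `rate_limit` entry should have been consumed from the source map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the parser discards the rate_limit field rather than
// reading it, and the test should assert the field is gone afterwards.
class RemovesRateLimitFieldSketch {
    static void fromMap(Map<String, Object> map) {
        // Drop the field without interpreting it.
        map.remove("rate_limit");
    }

    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();
        map.put("rate_limit", Map.of("requests_per_minute", 100));
        fromMap(map);
        // The assertion the review asks for: the field was removed from the map.
        System.out.println(map.containsKey("rate_limit")); // false
        System.out.println(map.isEmpty());                 // true
    }
}
```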
```java
var failureListener = getModelListenerForException(
    ElasticsearchStatusException.class,
    "Configuration contains settings [{rate_limit={requests_per_minute=100}}] unknown to the [elastic] service"
```

DonalEvans: Is there any way to have this message be more specific? It's not really accurate to say that `rate_limit` or `requests_per_minute` are unknown; they're just disabled in specific cases.

jonathan-buttner: Good idea 👍
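One possible direction for the wording (purely illustrative — the PR's actual message text is not shown here, and this helper name is invented): say that rate limiting is unsupported for this service instead of calling the field "unknown".

```java
// Hypothetical sketch of a more specific validation message for the case
// where rate_limit is recognized but deliberately disabled for a service.
class RateLimitValidationSketch {
    static String unsupportedRateLimitMessage(String serviceName) {
        return "The [" + serviceName + "] service does not support rate limit settings; "
            + "please remove the [rate_limit] field from the request";
    }

    public static void main(String[] args) {
        System.out.println(unsupportedRateLimitMessage("elastic"));
    }
}
```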
BASE=ae7bfd61c966081bb68dda052f4fb7bf81abbb2c HEAD=dfd9154dcdf6d90333625e1542220dce5faf4572 Branch=main
This PR disables the rate limiting settings within the inference API for EIS for all task types.

This PR does not change the default inference endpoints that were persisted to the `.inference` index. Instead, the changes make all default and new EIS inference endpoints have rate limiting disabled. The changes no longer parse the `rate_limit` field. If a PUT request includes the `rate_limit` field, we'll throw a validation exception indicating that it shouldn't be included. The field will be ignored if it comes from the `.inference` index.

Another option would be to return and store:

If we think that'd be clearer to the user.

I don't think many users are creating EIS endpoints because it isn't documented, so maybe this isn't an issue.

Example

The result will no longer include the `rate_limit` field since rate limiting is no longer supported.

Response
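The two-sided behavior described in the PR can be sketched as follows. This is a hypothetical simplification, not Elasticsearch's implementation: the enum and method names are invented, and the real code works through `ConfigurationParseContext` and the service's validation machinery. The point is the asymmetry — a `rate_limit` field in a PUT request is rejected, while one read back from the `.inference` index is silently ignored so previously persisted endpoints still load.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of context-dependent rate_limit handling for EIS.
class EisRateLimitParsingSketch {
    enum ParseContext { REQUEST, PERSISTENT }

    static void checkRateLimit(Map<String, Object> settings, ParseContext context) {
        if (settings.containsKey("rate_limit") && context == ParseContext.REQUEST) {
            // New PUT requests must not include the field: fail validation.
            throw new IllegalArgumentException(
                "[rate_limit] is not permitted; rate limiting is disabled for the [elastic] service");
        }
        // PERSISTENT: drop the field without error so old documents still load.
        settings.remove("rate_limit");
    }

    public static void main(String[] args) {
        var persisted = new HashMap<String, Object>(Map.of("rate_limit", Map.of("requests_per_minute", 1000)));
        checkRateLimit(persisted, ParseContext.PERSISTENT); // ignored, field dropped
        System.out.println(persisted.containsKey("rate_limit")); // false

        var request = new HashMap<String, Object>(Map.of("rate_limit", Map.of("requests_per_minute", 1000)));
        try {
            checkRateLimit(request, ParseContext.REQUEST);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```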