Skip to content

[Inference API] add service and task type aware rate limiting#125880

Closed
brendan-jugan-elastic wants to merge 10 commits intoelastic:mainfrom
brendan-jugan-elastic:dev/rate-limiting-follow-ups
Closed

[Inference API] add service and task type aware rate limiting#125880
brendan-jugan-elastic wants to merge 10 commits intoelastic:mainfrom
brendan-jugan-elastic:dev/rate-limiting-follow-ups

Conversation

@brendan-jugan-elastic
Copy link
Contributor

The goal of this PR is to address the rate-limiting follow-up TODOs introduced by this PR and tracked by this issue in order to support service and task type aware rate-limiting.

TreeMap<TaskType, MaxNodesPerGroupingStrategy> alibabaCloudSearchConfigs = new TreeMap<>();
var alibabaCloudSearchService = serviceRegistry.getService(AlibabaCloudSearchService.NAME);
if (alibabaCloudSearchService.isPresent()) {
var alibabaCloudSearchTaskTypes = alibabaCloudSearchService.get().supportedTaskTypes();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think eventually we'll want something like this but at the moment we don't support cross node streaming support so we'll definitely need to exclude the chat_completion task type.

alibabaCloudSearchConfigs.put(taskType, defaultStrategy);
}
}
serviceNodeLocalRateLimitConfigs.put(AlibabaCloudSearchService.NAME, alibabaCloudSearchConfigs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt this will ever happen but If the individual service is not present (isPresent() == false) do we still want to add the configs to the tree map?

public static DeepSeekRequestManager.RateLimitGrouping of(DeepSeekChatCompletionModel model) {
Objects.requireNonNull(model);

return new DeepSeekRequestManager.RateLimitGrouping(model.apiKey().hashCode());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it was intentional to limit the rate limit to the max allowed:

So I think we should revert the changes around the api key here.

cc: @prwhelan

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct - there is effectively no rate limiting for DeepSeek

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

4 participants