-
Notifications
You must be signed in to change notification settings - Fork 7.1k
[Serve][LLM] Added new /tokenize /detokenize endpoints #59787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces new /tokenize and /detokenize endpoints to the LLM serving layer, which is a great addition for exposing more of the underlying model's capabilities. The changes are well-structured and consistently applied across the different layers of the serving stack, from the API models to the vLLM engine implementation.
I have a couple of suggestions to improve the robustness of the new protocol methods and to maintain type safety. Specifically, I recommend raising a NotImplementedError in the base engine protocol for engines that don't support these new methods, and updating type hints in LLMServer to include the new request types.
Overall, this is a solid contribution that enhances the functionality of Ray Serve for LLMs.
| Returns: | ||
| None when the generator is done. | ||
| """ | ||
| yield # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current default implementation is an empty async generator. This means if an engine doesn't implement this method, it will silently do nothing and return an empty stream to the user, which can be confusing. It would be better to raise a NotImplementedError to make it explicit that the functionality is not supported by the engine. The yield statement is still needed to make this a generator function and satisfy the type hint.
| yield # type: ignore | |
| raise NotImplementedError("tokenize is not implemented for this engine") | |
| yield # type: ignore |
| Returns: | ||
| None when the generator is done. | ||
| """ | ||
| yield # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to tokenize, the default implementation for detokenize should raise a NotImplementedError to provide a clear error message when an engine does not support this functionality, instead of silently returning an empty stream.
| yield # type: ignore | |
| raise NotImplementedError("detokenize is not implemented for this engine") | |
| yield # type: ignore |
| An AsyncGenerator over the TokenizeResponse object. | ||
| """ | ||
| # NOTE: Tokenize does not need batching. | ||
| return await self._run_request( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type hint for the request parameter in _run_request does not include TokenizeRequest or DetokenizeRequest. This will cause a type error when running static analysis. Please update the signature of _run_request to include these new request types. The _maybe_add_request_id_to_request method, which is called within _run_request, also needs its type hint for request updated to include these new types, as they both support request_id.
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
kouroshHakha
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes looks good. please add tests and it’s good to go
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Description
Adding new /tokenizer and /detokenize endpoints
Usage
Testing