
Conversation

@ahao-anyscale (Contributor) commented Dec 31, 2025

Description

Adds new /tokenize and /detokenize endpoints.

Usage

curl -X POST http://localhost:8000/tokenize \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model-id", "prompt": "Hello, world!"}'

Testing

  • Unit tests included
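
A minimal sketch of the round-trip such a test can exercise (hypothetical: assumes a deployment already serving on localhost:8000, and vLLM-style "tokens"/"prompt" response fields):

import requests

BASE = "http://localhost:8000"  # assumes a running Serve LLM deployment

def test_tokenize_detokenize_roundtrip():
    prompt = "Hello, world!"
    tok = requests.post(
        f"{BASE}/tokenize",
        json={"model": "your-model-id", "prompt": prompt},
    )
    assert tok.status_code == 200
    token_ids = tok.json()["tokens"]  # field name follows vLLM's tokenize API
    assert token_ids

    detok = requests.post(
        f"{BASE}/detokenize",
        json={"model": "your-model-id", "tokens": token_ids},
    )
    assert detok.status_code == 200
    # The decoded text should round-trip back to the original prompt
    # ("prompt" is the field name vLLM's detokenize response uses).
    assert prompt in detok.json()["prompt"]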
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces new /tokenize and /detokenize endpoints to the LLM serving layer, which is a great addition for exposing more of the underlying model's capabilities. The changes are well-structured and consistently applied across the different layers of the serving stack, from the API models to the vLLM engine implementation.

I have a couple of suggestions to improve the robustness of the new protocol methods and to maintain type safety. Specifically, I recommend raising a NotImplementedError in the base engine protocol for engines that don't support these new methods, and updating type hints in LLMServer to include the new request types.

Overall, this is a solid contribution that enhances the functionality of Ray Serve for LLMs.

(context: the default tokenize implementation in the base engine protocol)

    Returns:
        None when the generator is done.
    """
    yield  # type: ignore

Severity: medium

The current default implementation is an empty async generator. This means if an engine doesn't implement this method, it will silently do nothing and return an empty stream to the user, which can be confusing. It would be better to raise a NotImplementedError to make it explicit that the functionality is not supported by the engine. The yield statement is still needed to make this a generator function and satisfy the type hint.

Suggested change:
- yield  # type: ignore
+ raise NotImplementedError("tokenize is not implemented for this engine")
+ yield  # type: ignore
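
For context, a sketch of what the full default stub would then look like (the signature and type names are inferred from the diff context, not copied from the source):

from typing import AsyncGenerator

async def tokenize(
    self, request: TokenizeRequest
) -> AsyncGenerator[TokenizeResponse, None]:
    """Default tokenize stub for engines that do not support it.

    Returns:
        None when the generator is done.
    """
    raise NotImplementedError("tokenize is not implemented for this engine")
    # Unreachable, but the bare yield keeps this a generator function,
    # which is what the AsyncGenerator return annotation requires.
    yield  # type: ignore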
(context: the default detokenize implementation in the base engine protocol)

    Returns:
        None when the generator is done.
    """
    yield  # type: ignore

Severity: medium

Similar to tokenize, the default implementation for detokenize should raise a NotImplementedError to provide a clear error message when an engine does not support this functionality, instead of silently returning an empty stream.

Suggested change:
- yield  # type: ignore
+ raise NotImplementedError("detokenize is not implemented for this engine")
+ yield  # type: ignore
(context: the tokenize handler in LLMServer)

    An AsyncGenerator over the TokenizeResponse object.
    """
    # NOTE: Tokenize does not need batching.
    return await self._run_request(

Severity: medium

The type hint for the request parameter in _run_request does not include TokenizeRequest or DetokenizeRequest. This will cause a type error when running static analysis. Please update the signature of _run_request to include these new request types. The _maybe_add_request_id_to_request method, which is called within _run_request, also needs its type hint for request updated to include these new types, as they both support request_id.
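
A sketch of the widened union (only TokenizeRequest and DetokenizeRequest come from this PR; the other members are assumptions about what LLMServer already accepts):

from typing import Union

# Hypothetical sketch of the updated _run_request signature.
ServerRequest = Union[
    ChatCompletionRequest,   # assumed pre-existing
    CompletionRequest,       # assumed pre-existing
    EmbeddingRequest,        # assumed pre-existing
    TokenizeRequest,         # new in this PR
    DetokenizeRequest,       # new in this PR
]

async def _run_request(self, request: ServerRequest, **kwargs):
    ...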

Signed-off-by: ahao-anyscale <ahao@anyscale.com>
@kouroshHakha (Contributor) left a comment


yes looks good. please add tests and it’s good to go

Signed-off-by: ahao-anyscale <ahao@anyscale.com>
@ahao-anyscale added the "go" label (add ONLY when ready to merge, run all tests) on Dec 31, 2025
@ahao-anyscale marked this pull request as ready for review on Dec 31, 2025, 19:18
@ahao-anyscale requested a review from a team as a code owner on Dec 31, 2025, 19:18
@ray-gardener (bot) added the "serve" (Ray Serve Related Issue) and "llm" labels on Jan 1, 2026