[Serve][LLM] Added new /tokenize /detokenize endpoints #59787

ahao-anyscale · 2025-12-31T18:08:48Z

Description

Adds new /tokenize and /detokenize endpoints.

Usage

curl -X POST http://localhost:8000/tokenize \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model-id", "prompt": "Hello, world!"}'
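The same call can be made from Python. The following is a minimal standard-library sketch. It assumes the server from the curl example is listening on localhost:8000. The /detokenize body shape (`model` plus a `tokens` list of integer IDs) is an assumption modeled on vLLM's OpenAI-compatible server, and the token IDs are placeholders.

```python
import json
import urllib.request

# Assumption: the Serve app from the curl example is reachable here.
BASE_URL = "http://localhost:8000"


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for an endpoint such as /tokenize."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict) -> dict:
    """Send the request and decode the JSON body (needs a running server)."""
    with urllib.request.urlopen(build_request(path, payload)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Same body as the curl example.
tokenize_req = build_request(
    "/tokenize", {"model": "your-model-id", "prompt": "Hello, world!"}
)

# Assumed /detokenize body: the model plus a list of token IDs.
# The IDs below are placeholders, not real output from any tokenizer.
detokenize_req = build_request(
    "/detokenize", {"model": "your-model-id", "tokens": [1, 2, 3]}
)
```

With a deployment running, `post_json("/tokenize", {...})` returns the decoded response. Its field names depend on the server and are not assumed here.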

Testing

Unit tests included

Signed-off-by: ahao-anyscale <ahao@anyscale.com>

gemini-code-assist

Code Review

This pull request introduces new /tokenize and /detokenize endpoints to the LLM serving layer, which is a great addition for exposing more of the underlying model's capabilities. The changes are well-structured and consistently applied across the different layers of the serving stack, from the API models to the vLLM engine implementation.

I have a couple of suggestions to improve the robustness of the new protocol methods and to maintain type safety. Specifically, I recommend raising a NotImplementedError in the base engine protocol for engines that don't support these new methods, and updating type hints in LLMServer to include the new request types.

Overall, this is a solid contribution that enhances the functionality of Ray Serve for LLMs.

gemini-code-assist · 2025-12-31T18:16:04Z

python/ray/llm/_internal/serve/core/engine/protocol.py

+        Returns:
+            None when the generator is done.
+        """
+        yield  # type: ignore


The current default implementation is an empty async generator. This means if an engine doesn't implement this method, it will silently do nothing and return an empty stream to the user, which can be confusing. It would be better to raise a NotImplementedError to make it explicit that the functionality is not supported by the engine. The yield statement is still needed to make this a generator function and satisfy the type hint.

Suggested change

-        yield  # type: ignore
+        raise NotImplementedError("tokenize is not implemented for this engine")
+        yield  # type: ignore
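The review notes that the trailing `yield` is still needed even though it is unreachable. A small check with the standard `inspect` module shows why. Without it, the `async def` becomes a plain coroutine function, so a caller iterating it with `async for` would get a `TypeError` instead of the intended `NotImplementedError`. The class names below are hypothetical stand-ins, and only the method bodies come from the suggestion.

```python
import inspect
from typing import Any, AsyncGenerator


class EngineWithYield:  # hypothetical: the suggested default body
    async def tokenize(self, request: Any) -> AsyncGenerator[Any, None]:
        raise NotImplementedError("tokenize is not implemented for this engine")
        yield  # type: ignore  # unreachable, but makes this an async generator


class EngineWithoutYield:  # hypothetical: same body minus the trailing yield
    async def tokenize(self, request: Any) -> AsyncGenerator[Any, None]:
        raise NotImplementedError("tokenize is not implemented for this engine")


# The presence of a yield anywhere in the body decides the function kind.
print(inspect.isasyncgenfunction(EngineWithYield.tokenize))      # True
print(inspect.isasyncgenfunction(EngineWithoutYield.tokenize))   # False
print(inspect.iscoroutinefunction(EngineWithoutYield.tokenize))  # True
```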

gemini-code-assist · 2025-12-31T18:16:04Z

python/ray/llm/_internal/serve/core/engine/protocol.py

+        Returns:
+            None when the generator is done.
+        """
+        yield  # type: ignore


Similar to tokenize, the default implementation for detokenize should raise a NotImplementedError to provide a clear error message when an engine does not support this functionality, instead of silently returning an empty stream.

Suggested change

-        yield  # type: ignore
+        raise NotImplementedError("detokenize is not implemented for this engine")
+        yield  # type: ignore
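The behavioral difference can be checked directly. Strictly speaking, the original bare `yield` does not produce an empty stream. It produces a single `None` item, which a consumer could mistake for a real response. The suggested body raises on the first iteration instead. The engine classes below are hypothetical stand-ins, and only the method bodies come from the diff.

```python
import asyncio
from typing import Any, AsyncGenerator, List


class OriginalDefault:
    """Hypothetical engine keeping the original default body."""

    async def detokenize(self, request: Any) -> AsyncGenerator[Any, None]:
        yield  # type: ignore  # a bare yield emits one None


class SuggestedDefault:
    """Hypothetical engine using the suggested default body."""

    async def detokenize(self, request: Any) -> AsyncGenerator[Any, None]:
        raise NotImplementedError("detokenize is not implemented for this engine")
        yield  # type: ignore  # unreachable; keeps this an async generator


async def collect(engine: Any) -> List[Any]:
    # Calling detokenize() only creates the generator; the body runs on iteration.
    return [item async for item in engine.detokenize(request=None)]


original_items = asyncio.run(collect(OriginalDefault()))
print(original_items)  # [None]

error_message = ""
try:
    asyncio.run(collect(SuggestedDefault()))
except NotImplementedError as exc:
    error_message = str(exc)
print(error_message)  # detokenize is not implemented for this engine
```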

gemini-code-assist · 2025-12-31T18:16:04Z

python/ray/llm/_internal/serve/core/server/llm_server.py

+            An AsyncGenerator over the TokenizeResponse object.
+        """
+        # NOTE: Tokenize does not need batching.
+        return await self._run_request(


The type hint for the request parameter in _run_request does not include TokenizeRequest or DetokenizeRequest. This will cause a type error when running static analysis. Please update the signature of _run_request to include these new request types. The _maybe_add_request_id_to_request method, which is called within _run_request, also needs its type hint for request updated to include these new types, as they both support request_id.
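One way to widen the hints is a shared `Union` alias that both `_run_request` and `_maybe_add_request_id_to_request` annotate against. The sketch below is hypothetical. `ExistingRequest` is a placeholder for whatever types the signature already accepts. The dataclasses only mimic the `request_id` field the review mentions, and `add_request_id_if_missing` is an illustrative helper, not Ray's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Union, get_args


# Hypothetical stand-ins for the request models; only request_id matters here.
@dataclass
class ExistingRequest:  # placeholder for the types _run_request already accepts
    model: str
    request_id: Optional[str] = None


@dataclass
class TokenizeRequest:
    model: str
    prompt: str
    request_id: Optional[str] = None


@dataclass
class DetokenizeRequest:
    model: str
    tokens: List[int]
    request_id: Optional[str] = None


# Widened alias that both methods could share, so the hints stay in sync.
RequestType = Union[ExistingRequest, TokenizeRequest, DetokenizeRequest]


def add_request_id_if_missing(request: RequestType, request_id: str) -> RequestType:
    """Hypothetical helper: fill request_id only when the caller left it unset."""
    if request.request_id is None:
        request.request_id = request_id
    return request


tok = add_request_id_if_missing(TokenizeRequest(model="m", prompt="hi"), "req-1")
print(tok.request_id)  # req-1
```

Keeping a single alias means the next request type only needs to be added in one place.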


kouroshHakha

Yes, looks good. Please add tests and it's good to go.


added tokenize endpoints

4297501

Signed-off-by: ahao-anyscale <ahao@anyscale.com>

gemini-code-assist bot reviewed Dec 31, 2025


ingress changes

7322c0f

Signed-off-by: ahao-anyscale <ahao@anyscale.com>

kouroshHakha reviewed Dec 31, 2025


added unit tests

6d2fe1a

Signed-off-by: ahao-anyscale <ahao@anyscale.com>

ahao-anyscale added the go label (add ONLY when ready to merge, run all tests) Dec 31, 2025

ahao-anyscale marked this pull request as ready for review December 31, 2025 19:18

ahao-anyscale requested a review from a team as a code owner December 31, 2025 19:18

ray-gardener bot added the serve (Ray Serve Related Issue) and llm labels Jan 1, 2026
