
Add cached tokens to Unified API response #136412

Merged
jaybcee merged 16 commits into main from add-cached-tokens-unified-api
Oct 22, 2025

Conversation

jaybcee (Member) commented Oct 10, 2025

This PR adds support for cached tokens in the Unified Chat Completion API response, allowing users to track prompt caching from EIS and OpenAI services.

Testing

  • OpenAiUnifiedStreamingProcessorTests.java: Added comprehensive tests for both scenarios (with and without cached tokens)
  • Tests verify correct parsing and backward compatibility

Response Format

When cached tokens are present, the response includes:

{
  "completion_tokens": 150,
  "prompt_tokens": 55,
  "total_tokens": 205,
  "prompt_tokens_details": {
    "cached_tokens": 20
  }
}

The prompt_tokens_details object is optional and only appears when cached token information is available from the EIS service, following the OpenAI specification.
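The conditional serialization described above can be sketched as follows. This is a simplified stand-in, not the actual Elasticsearch Usage record: the class shape and toJson method are illustrative only.

```java
// Sketch only: a simplified stand-in for the Usage record this PR extends.
// Names and JSON assembly are illustrative, not the real Elasticsearch code.
record Usage(int completionTokens, int promptTokens, int totalTokens, Integer cachedTokens) {

    // Emit the response shape shown above; prompt_tokens_details is omitted
    // entirely when no cached-token information is available (cachedTokens == null),
    // but is included even when the count is 0.
    String toJson() {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"completion_tokens\":").append(completionTokens)
          .append(",\"prompt_tokens\":").append(promptTokens)
          .append(",\"total_tokens\":").append(totalTokens);
        if (cachedTokens != null) {
            sb.append(",\"prompt_tokens_details\":{\"cached_tokens\":").append(cachedTokens).append('}');
        }
        return sb.append('}').toString();
    }
}
```

Note the distinction the later test commits rely on: a cachedTokens value of 0 still produces the nested object (caching available but unused), while null suppresses it entirely.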

I did not implement this for non-OpenAI providers. We don't strictly need this field at the moment; it mainly helps observability (O11y) tools like Phoenix.

elasticsearchmachine and others added 2 commits October 10, 2025 15:26
- Updated OpenAiUnifiedStreamingProcessor to parse optional cached_tokens from prompt_tokens_details
- Added ConstructingObjectParser for prompt_tokens_details nested object
- Added tests for both cached tokens present and absent scenarios
- Maintains backward compatibility with responses without cached tokens
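The optional-field parsing described in the commit above can be sketched like this. The real change uses Elasticsearch's ConstructingObjectParser for the nested prompt_tokens_details object; here a plain Map stands in for the parsed JSON, and the class and method names are hypothetical.

```java
import java.util.Map;

// Sketch only: illustrates the optional nested-field handling. The actual PR
// declares prompt_tokens_details via ConstructingObjectParser; this plain-Map
// version shows the same backward-compatible logic.
class UsageParser {

    // Returns the cached token count, or null when prompt_tokens_details is
    // absent (e.g. older responses or providers without caching info).
    static Integer parseCachedTokens(Map<String, Object> usage) {
        Object details = usage.get("prompt_tokens_details");
        if (details instanceof Map<?, ?> m) {
            Object cached = m.get("cached_tokens");
            if (cached instanceof Number n) {
                return n.intValue();
            }
        }
        return null;
    }
}
```

A response without the nested object simply yields null, which keeps parsing of pre-existing responses unchanged.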
@jaybcee jaybcee requested a review from Copilot October 10, 2025 16:47
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds support for cached tokens in the Unified Chat Completion API response to align with OpenAI's specification. The implementation adds a cachedTokens field to the Usage record and includes optional prompt_tokens_details serialization when cached token information is available.

  • Added cachedTokens field to the Usage record with proper serialization support
  • Updated JSON parsing to handle prompt_tokens_details.cached_tokens field
  • Added comprehensive test coverage for both scenarios with and without cached tokens

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

Changed files:
  • StreamingUnifiedChatCompletionResults.java: Added cachedTokens field to Usage record and updated serialization logic
  • OpenAiUnifiedStreamingProcessor.java: Updated parser to handle prompt_tokens_details with cached_tokens field
  • StreamingUnifiedChatCompletionResultsTests.java: Added test coverage for cached tokens serialization scenarios
  • OpenAiUnifiedStreamingProcessorTests.java: Added test coverage for usage parsing with and without cached tokens


@jaybcee jaybcee marked this pull request as ready for review October 10, 2025 16:53
elasticsearchmachine added the needs:triage (Requires assignment of a team area label) label Oct 10, 2025
jaybcee marked this pull request as draft October 10, 2025 16:54
jaybcee and others added 3 commits October 10, 2025 13:39
- Updated OpenAiServiceTests.testUnifiedCompletionInfer to expect cached_tokens:0 in response
- Updated HuggingFaceServiceTests.testUnifiedCompletionInfer to expect cached_tokens:0 in response
- Including cached_tokens:0 provides meaningful information (caching available but not used)
- Distinguishes from null (no caching information available)
jonathan-buttner added the :ml (Machine learning), Team:ML (Meta label for the ML team), and >enhancement labels and removed the needs:triage label Oct 10, 2025
elasticsearchmachine (Collaborator) commented:

Hi @jaybcee, I've created a changelog YAML for you.

jonathan-buttner (Contributor) left a comment

Looking good, left a few comments about the transport version check.

jaybcee and others added 3 commits October 21, 2025 17:19
Wrap cachedTokens read/write operations in transport version checks to ensure
backward compatibility with older nodes that haven't been upgraded yet.
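The version-gating pattern this commit applies can be sketched as below. Elasticsearch's StreamInput/StreamOutput and TransportVersion types are approximated with plain data streams and an int version; the version constant and class names are hypothetical.

```java
import java.io.*;

// Sketch only: shows the transport-version check pattern for an optional new
// wire field. Real code uses StreamInput/StreamOutput and TransportVersions
// constants; this stand-alone version mimics the same gating logic.
class WireFormat {
    static final int CACHED_TOKENS_VERSION = 2; // hypothetical version number

    static void writeCachedTokens(DataOutputStream out, int peerVersion, Integer cachedTokens) throws IOException {
        // Only peers at or above the new version understand the extra field;
        // older nodes never receive bytes they cannot parse.
        if (peerVersion >= CACHED_TOKENS_VERSION) {
            out.writeBoolean(cachedTokens != null); // optional-value marker
            if (cachedTokens != null) {
                out.writeInt(cachedTokens);
            }
        }
    }

    static Integer readCachedTokens(DataInputStream in, int peerVersion) throws IOException {
        // Symmetric check on the read side: data from an old node simply
        // yields null instead of a stream-corruption error.
        if (peerVersion >= CACHED_TOKENS_VERSION && in.readBoolean()) {
            return in.readInt();
        }
        return null;
    }
}
```

The key property is symmetry: both sides consult the same version so the byte stream never contains a field the reader does not expect.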
@jaybcee jaybcee marked this pull request as ready for review October 22, 2025 12:51
elasticsearchmachine (Collaborator) commented:

Pinging @elastic/ml-core (Team:ML)

jaybcee merged commit 9c422bb into main Oct 22, 2025
35 checks passed
jaybcee deleted the add-cached-tokens-unified-api branch October 22, 2025 18:29
fzowl pushed a commit to voyage-ai/elasticsearch that referenced this pull request Nov 3, 2025

Labels

>enhancement, :ml (Machine learning), Team:ML (Meta label for the ML team), v9.3.0

4 participants