Add cached tokens to Unified API response #136412
Merged
Conversation
- Updated `OpenAiUnifiedStreamingProcessor` to parse the optional `cached_tokens` value from `prompt_tokens_details`
- Added a `ConstructingObjectParser` for the nested `prompt_tokens_details` object (a sketch follows this list)
- Added tests for both the cached-tokens-present and cached-tokens-absent scenarios
- Maintains backward compatibility with responses that omit cached tokens
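For illustration, a parser for an optional nested object of this shape might look like the following minimal sketch; the record name and standalone wiring are assumptions, not the PR's actual code:

```java
import org.elasticsearch.xcontent.ConstructingObjectParser;
import org.elasticsearch.xcontent.ParseField;

// Hypothetical stand-in for the nested usage detail; the real PR wires this
// into OpenAiUnifiedStreamingProcessor's existing usage parsing.
record PromptTokensDetails(Integer cachedTokens) {

    static final ConstructingObjectParser<PromptTokensDetails, Void> PARSER =
        new ConstructingObjectParser<>(
            "prompt_tokens_details",
            true, // lenient: ignore unknown fields for forward compatibility
            args -> new PromptTokensDetails((Integer) args[0])
        );

    static {
        // cached_tokens is optional, so an absent field yields a null constructor arg
        PARSER.declareInt(
            ConstructingObjectParser.optionalConstructorArg(),
            new ParseField("cached_tokens")
        );
    }
}
```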
Contributor
Pull Request Overview
This PR adds support for cached tokens in the Unified Chat Completion API response to align with OpenAI's specification. The implementation adds a `cachedTokens` field to the `Usage` record and includes optional `prompt_tokens_details` serialization when cached token information is available.
- Added `cachedTokens` field to the `Usage` record with proper serialization support (a sketch of the extended record follows this list)
- Updated JSON parsing to handle the `prompt_tokens_details.cached_tokens` field
- Added comprehensive test coverage for both scenarios, with and without cached tokens
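As a rough illustration of the shape described above (names and structure are assumed for this sketch, not copied from the PR), the extended record can emit the nested object only when a value is present:

```java
import java.io.IOException;
import org.elasticsearch.xcontent.ToXContentObject;
import org.elasticsearch.xcontent.XContentBuilder;

// Illustrative shape only; the real Usage record lives in
// StreamingUnifiedChatCompletionResults.java.
record Usage(int completionTokens, int promptTokens, int totalTokens, Integer cachedTokens)
    implements ToXContentObject {

    @Override
    public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
        builder.startObject();
        builder.field("completion_tokens", completionTokens);
        builder.field("prompt_tokens", promptTokens);
        builder.field("total_tokens", totalTokens);
        if (cachedTokens != null) { // omit the nested object when no caching info exists
            builder.startObject("prompt_tokens_details");
            builder.field("cached_tokens", cachedTokens);
            builder.endObject();
        }
        return builder.endObject();
    }
}
```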
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `StreamingUnifiedChatCompletionResults.java` | Added `cachedTokens` field to `Usage` record and updated serialization logic |
| `OpenAiUnifiedStreamingProcessor.java` | Updated parser to handle `prompt_tokens_details` with `cached_tokens` field |
| `StreamingUnifiedChatCompletionResultsTests.java` | Added test coverage for cached tokens serialization scenarios |
| `OpenAiUnifiedStreamingProcessorTests.java` | Added test coverage for usage parsing with and without cached tokens |
- Updated `OpenAiServiceTests.testUnifiedCompletionInfer` to expect `cached_tokens: 0` in the response
- Updated `HuggingFaceServiceTests.testUnifiedCompletionInfer` to expect `cached_tokens: 0` in the response
- Including `cached_tokens: 0` provides meaningful information (caching is available but was not used)
- Distinguishes that case from `null` (no caching information available); the example after this list shows the two shapes
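Concretely, with made-up token counts, a provider that supports caching but served no cached tokens returns:

```json
{
  "completion_tokens": 10,
  "prompt_tokens": 5,
  "total_tokens": 15,
  "prompt_tokens_details": {
    "cached_tokens": 0
  }
}
```

whereas a provider with no caching information omits the `prompt_tokens_details` object entirely.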
Collaborator
Hi @jaybcee, I've created a changelog YAML for you.
Contributor
jonathan-buttner left a comment
Looking good, left a few comments about the transport version check.
Three resolved review comments (outdated) on ...va/org/elasticsearch/xpack/core/inference/results/StreamingUnifiedChatCompletionResults.java
Wrap cachedTokens read/write operations in transport version checks to ensure backward compatibility with older nodes that haven't been upgraded yet.
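A minimal sketch of that pattern, assuming a hypothetical `TransportVersions.ML_INFERENCE_CACHED_TOKENS` constant and illustrative helper names:

```java
import java.io.IOException;
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;

// Older nodes neither send nor expect the new field, so both the read and
// write paths gate it on the same transport version.
static Integer readCachedTokens(StreamInput in) throws IOException {
    return in.getTransportVersion().onOrAfter(TransportVersions.ML_INFERENCE_CACHED_TOKENS)
        ? in.readOptionalVInt()
        : null;
}

static void writeCachedTokens(StreamOutput out, Integer cachedTokens) throws IOException {
    if (out.getTransportVersion().onOrAfter(TransportVersions.ML_INFERENCE_CACHED_TOKENS)) {
        out.writeOptionalVInt(cachedTokens);
    }
}
```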
Collaborator
Pinging @elastic/ml-core (Team:ML)
jonathan-buttner approved these changes Oct 22, 2025
fzowl pushed a commit to voyage-ai/elasticsearch that referenced this pull request Nov 3, 2025
This PR adds support for cached tokens in the Unified Chat Completion API response, allowing users to track prompt caching from EIS and OpenAI services.
Testing
- `OpenAiUnifiedStreamingProcessorTests.java`: Added comprehensive tests for both scenarios, with and without cached tokens (a sketch follows)
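The two scenarios look roughly like this; the test method names and the `parseUsage` helper are assumptions for illustration, not the actual test code:

```java
import java.io.IOException;

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.equalTo;
import static org.hamcrest.Matchers.nullValue;

public void testParseUsageWithCachedTokens() throws IOException {
    String json = """
        {"completion_tokens": 150, "prompt_tokens": 55, "total_tokens": 205,
         "prompt_tokens_details": {"cached_tokens": 20}}""";
    var usage = parseUsage(json); // hypothetical helper wrapping the processor's parser
    assertThat(usage.cachedTokens(), equalTo(20));
}

public void testParseUsageWithoutCachedTokens() throws IOException {
    String json = """
        {"completion_tokens": 150, "prompt_tokens": 55, "total_tokens": 205}""";
    var usage = parseUsage(json); // field absent: parser yields a null cachedTokens
    assertThat(usage.cachedTokens(), nullValue());
}
```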
Response Format
When cached tokens are present, the response includes:
{ "completion_tokens": 150, "prompt_tokens": 55, "total_tokens": 205, "prompt_tokens_details": { "cached_tokens": 20 } }The
prompt_tokens_detailsobject is optional and only appears when cached token information is available from the EIS service, following the OpenAI specification.I did not implement this for non open-ai providers. We don't really need this field at the moment, it just helps with O11y tools like Phoenix.