[ML] Append all data to Chat Completion buffer by prwhelan · Pull Request #127658 · elastic/elasticsearch

prwhelan · 2025-05-02T18:58:54Z

Moved the Chat Completion buffer into StreamingUnifiedChatCompletionResults so that all Chat Completion responses can benefit from it. Chat Completions is meant to adhere to OpenAI as closely as possible, and OpenAI sends only one response chunk at a time, so all implementations of Chat Completions will now buffer.

This fixes a bug where chunks were dropped when a single item contained more than two of them; now they are all added to the buffer.
This fixes a bug where onComplete would omit trailing items in the buffer.
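The buffering described above can be sketched with `java.util.concurrent.Flow`. This is a minimal, hypothetical sketch, not the code in StreamingUnifiedChatCompletionResults: `BufferingProcessor` and `BufferingDemo` are illustrative names, each upstream item is assumed to be a `Deque` of chunks, the upstream is assumed to be subscribed before the downstream, and a single `synchronized` lock stands in for whatever concurrency control the real class uses. It appends every chunk of each item to the buffer, emits exactly one chunk per downstream `onNext`, and drains the remaining chunks before propagating `onComplete`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Flow;

// Hypothetical sketch, not the PR's code: each upstream item is a Deque of chunks,
// and each downstream item is a Deque holding exactly one chunk.
class BufferingProcessor<T> implements Flow.Processor<Deque<T>, Deque<T>> {
    private final Deque<T> buffer = new ArrayDeque<>(); // guarded by "this"
    private Flow.Subscription upstream;
    private Flow.Subscriber<? super Deque<T>> downstream;
    private long pending;             // downstream demand not yet satisfied
    private boolean awaitingUpstream; // an upstream request(1) is outstanding
    private boolean upstreamComplete;
    private boolean completed;

    @Override public void subscribe(Flow.Subscriber<? super Deque<T>> subscriber) {
        downstream = subscriber;
        subscriber.onSubscribe(new Flow.Subscription() {
            @Override public void request(long n) { onDemand(n); }
            @Override public void cancel() { upstream.cancel(); }
        });
    }

    @Override public void onSubscribe(Flow.Subscription subscription) { upstream = subscription; }

    @Override public synchronized void onNext(Deque<T> item) {
        awaitingUpstream = false;
        buffer.addAll(item); // keep every chunk, however many the item carries
        drain();
    }

    @Override public synchronized void onError(Throwable t) { downstream.onError(t); }

    @Override public synchronized void onComplete() {
        upstreamComplete = true;
        drain(); // flush chunks still in the buffer before completing downstream
    }

    private synchronized void onDemand(long n) {
        pending += n;
        drain();
    }

    private void drain() {
        while (pending > 0) {
            T next = buffer.poll(); // poll() + null check rather than isEmpty() + poll()
            if (next == null) break;
            pending--;
            Deque<T> single = new ArrayDeque<>();
            single.add(next);
            downstream.onNext(single);
        }
        if (!buffer.isEmpty()) return; // chunks remain: wait for more downstream demand
        if (upstreamComplete) {
            if (!completed) { completed = true; downstream.onComplete(); }
        } else if (pending > 0 && !awaitingUpstream) {
            awaitingUpstream = true;
            upstream.request(1); // buffer drained: ask upstream for its next item
        }
    }
}

class BufferingDemo {
    // Wires a synchronous in-memory upstream (one batch per request) and a downstream
    // that requests one item at a time, and records what the downstream receives.
    static List<String> run(List<? extends Deque<String>> batches) {
        BufferingProcessor<String> processor = new BufferingProcessor<>();
        Iterator<? extends Deque<String>> source = batches.iterator();
        processor.onSubscribe(new Flow.Subscription() {
            @Override public void request(long n) {
                if (source.hasNext()) processor.onNext(source.next());
                else processor.onComplete();
            }
            @Override public void cancel() {}
        });
        List<String> received = new ArrayList<>();
        processor.subscribe(new Flow.Subscriber<Deque<String>>() {
            private Flow.Subscription subscription;
            @Override public void onSubscribe(Flow.Subscription s) { subscription = s; s.request(1); }
            @Override public void onNext(Deque<String> item) { received.add(item.toString()); subscription.request(1); }
            @Override public void onError(Throwable t) { received.add("error: " + t); }
            @Override public void onComplete() { received.add("complete"); }
        });
        return received;
    }
}
```

With the batches `[a, b, c]` and `[d]`, `BufferingDemo.run` records `[a]`, `[b]`, `[c]`, `[d]` and then `complete`: the three-chunk item is not truncated, and completion is only signalled once the buffer is empty.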


elasticsearchmachine · 2025-05-02T18:59:18Z

Hi @prwhelan, I've created a changelog YAML for you.

elasticsearchmachine · 2025-05-02T21:22:27Z

Pinging @elastic/ml-core (Team:ML)

jonathan-buttner

Looks good, left some questions.

jonathan-buttner · 2025-05-05T14:14:38Z

...va/org/elasticsearch/xpack/core/inference/results/StreamingUnifiedChatCompletionResults.java

+                                    subscription.request(n);
+                                }
+                            } else {
+                                downstream.onNext(new Results(DequeUtils.of(buffer.poll())));


Is there only 1 thread accessing the buffer? Or is there a chance that we could check for isEmpty() and then some other thread picks up the item before this thread?

There is only one thread calling request, but to be safe we can change to buffer.poll() and check whether the result is null.
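A minimal sketch of the change discussed in this thread, assuming the buffer is a concurrent deque such as `ConcurrentLinkedDeque`; `BufferRead`, `checkThenPoll`, and `pollThenCheck` are hypothetical names, not code from the PR. Because these deques reject null elements, a null from `poll()` unambiguously means the buffer was empty, so a single `poll()` replaces the separate `isEmpty()` check.

```java
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical helper contrasting the two patterns; not the PR's actual code.
class BufferRead {
    // Check-then-act: another thread could take the last item between isEmpty() and poll(),
    // in which case poll() returns null and that null is handed to the consumer.
    static <T> void checkThenPoll(Deque<T> buffer, Consumer<T> emit, Runnable requestUpstream) {
        if (buffer.isEmpty()) {
            requestUpstream.run();
        } else {
            emit.accept(buffer.poll());
        }
    }

    // Safer: poll() once and branch on the result, so "is it empty?" and "take it"
    // happen in one call, which is atomic on a concurrent deque.
    static <T> void pollThenCheck(Deque<T> buffer, Consumer<T> emit, Runnable requestUpstream) {
        T next = buffer.poll();
        if (next == null) {
            requestUpstream.run(); // buffer empty: ask upstream for more
        } else {
            emit.accept(next);
        }
    }
}
```

With a single thread calling `request`, both methods behave the same; the second one simply stays correct if that assumption ever changes.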

jonathan-buttner · 2025-05-05T14:23:34Z

...g/elasticsearch/xpack/core/inference/results/StreamingUnifiedChatCompletionResultsTests.java

+            @Override
+            public void onComplete() {}
+        });
+        assertThat(counter.get(), equalTo(2));


Just to make sure I understand: does this test that we only get 1 result even if we have multiple in a single item?

Yes, because we'll only call onNext once per chunk, so if we send a chunk of 2 elements, the counter will equal 1. Let me change to Mockito spies so that's easier to read (I think).
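The question above hinges on whether the test counts `onNext` calls or the chunks those calls carry. A plain-Java stand-in for a Mockito spy can record both separately; this is a hypothetical sketch assuming each downstream item is a `Deque` of chunks, and `CountingSubscriber` and `OnePerChunk` are illustrative names that are not part of the PR or its tests.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Flow;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical spy-like subscriber: counts onNext calls and the chunks they carry.
class CountingSubscriber<T> implements Flow.Subscriber<Deque<T>> {
    final AtomicInteger onNextCalls = new AtomicInteger();
    final AtomicInteger chunksSeen = new AtomicInteger();

    @Override public void onSubscribe(Flow.Subscription subscription) {}
    @Override public void onNext(Deque<T> item) {
        onNextCalls.incrementAndGet();
        chunksSeen.addAndGet(item.size());
    }
    @Override public void onError(Throwable throwable) {}
    @Override public void onComplete() {}
}

class OnePerChunk {
    // Illustrative emitter: forwards each chunk of a multi-chunk item as its own
    // single-chunk onNext call.
    static <T> void emit(Deque<T> item, Flow.Subscriber<? super Deque<T>> downstream) {
        for (T chunk : item) {
            Deque<T> single = new ArrayDeque<>();
            single.add(chunk);
            downstream.onNext(single);
        }
    }
}
```

Forwarding a two-chunk item through `OnePerChunk.emit` yields two `onNext` calls carrying two chunks in total, whereas passing the same item directly to `onNext` yields one call carrying two chunks; asserting on both counters makes it explicit which behavior a test expects.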

elasticsearchmachine · 2025-05-05T21:09:45Z

💔 Backport failed

Status	Branch	Result
❌	8.19	Commit could not be cherrypicked due to conflicts
❌	8.18	Commit could not be cherrypicked due to conflicts
❌	9.0	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 127658



prwhelan added the >bug, :ml, Team:ML, auto-backport, v8.19.0, v9.1.0, v8.18.2, and v9.0.2 labels May 2, 2025

prwhelan and others added 3 commits May 2, 2025 14:59

Update docs/changelog/127658.yaml

9458112

[CI] Auto commit changes from spotless

d0ada04

Revert id change

9fe2c43

prwhelan marked this pull request as ready for review May 2, 2025 21:22

jonathan-buttner reviewed May 5, 2025


prwhelan added 2 commits May 5, 2025 10:44

Change to buffer.poll, use mockito

43bcce5

Merge branch 'main' into refactor/chat-completion-buffer

71c5ff9

jonathan-buttner approved these changes May 5, 2025


Merge branch 'main' into refactor/chat-completion-buffer

4c5696f

prwhelan enabled auto-merge (squash) May 5, 2025 19:04

Merge branch 'main' into refactor/chat-completion-buffer

c80d382

prwhelan merged commit b108e39 into elastic:main May 5, 2025
16 of 17 checks passed

elasticsearchmachine added the backport pending label May 5, 2025

prwhelan mentioned this pull request May 19, 2025

[ML] Append all data to Chat Completion buffer (#127658) #128134

Merged

prwhelan mentioned this pull request May 19, 2025

[ML] Append all data to Chat Completion buffer (#127658) #128136

Merged

prwhelan mentioned this pull request May 19, 2025

[ML] Append all data to Chat Completion buffer (#127658) (#128134) #128164

Merged

valeriy42 removed the backport pending label Feb 23, 2026


prwhelan merged 8 commits into elastic:main from prwhelan:refactor/chat-completion-buffer


4 participants
