[Inference API] Add unified api for chat completions #117589
jonathan-buttner merged 75 commits into main
…/elasticsearch into ml-inference-unified-api-elastic
Pinging @elastic/ml-core (Team:ML)
Hi @maxhniebergall, I've created a changelog YAML for you.
…ence-unified-api-elastic
Tests to add:
TODO
server/src/main/java/org/elasticsearch/inference/UnifiedCompletionRequest.java
    builder.field(MODEL_FIELD, model.getServiceSettings().modelId());
    if (unifiedRequest.maxCompletionTokens() != null) {
        builder.field(MAX_COMPLETION_TOKENS_FIELD, unifiedRequest.maxCompletionTokens());
I just realized that OpenAiChatCompletionServiceSettings has a similar field that isn't used (even prior to this PR). I wonder if we should sync those fields up like we do for the modelId 🤔. I think we've typically used the max tokens to do the truncation for text embedding, so I don't think we really used it for completions.
@maxhniebergall what do you think?
Hmm, yeah, it sounds like we should use the value from the service settings if it's available.
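One way the fallback discussed above could look, as a minimal sketch rather than the code in this PR. It assumes the request value wins when present and the service settings value is used otherwise, mirroring how the model ID override is described; that precedence is an assumption, and `MaxCompletionTokensSketch` and `resolve` are hypothetical names.

```java
// Hypothetical helper, not part of the Elasticsearch codebase.
final class MaxCompletionTokensSketch {

    // Assumed precedence: the per-request value overrides the endpoint's
    // service settings; null means "do not send the field at all".
    static Long resolve(Long fromRequest, Long fromServiceSettings) {
        return fromRequest != null ? fromRequest : fromServiceSettings;
    }

    public static void main(String[] args) {
        System.out.println(resolve(128L, 512L)); // request value is used
        System.out.println(resolve(null, 512L)); // falls back to service settings
        System.out.println(resolve(null, null)); // nothing to serialize
    }
}
```

The caller would then write the field only when the resolved value is non-null, the same way the existing null check guards `builder.field(...)`.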
    public void toXContent(XContentBuilder builder, ToXContent.Params params) throws IOException {
        builder.value(content);
    }
Is this function required? The class does not declare that it implements ToXContent.
Oops, yeah we can remove that. Thanks.
...ore/src/main/java/org/elasticsearch/xpack/core/inference/action/UnifiedCompletionAction.java
            return e;
        }

        if (taskType != TaskType.COMPLETION) {
Suggested change:
    - if (taskType != TaskType.COMPLETION) {
    + if (taskType.isAnyOrSame(TaskType.COMPLETION)) {
This covers the case where the task type is not set in the URL and defaults to ANY.
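To illustrate the suggestion, here is a minimal standalone sketch. `TaskTypeSketch` is a hypothetical stand-in for the real TaskType enum, and the body of `isAnyOrSame` is an assumption about its semantics (true when either side is ANY or both are equal). Note that the original condition rejects non-completion task types, so the sketch negates the call to keep that meaning; whether the final code does the same is not shown in this thread.

```java
// Hypothetical stand-in for TaskType; only the values needed here.
enum TaskTypeSketch {
    ANY, COMPLETION, TEXT_EMBEDDING;

    // Assumed semantics: ANY matches everything, otherwise require equality.
    boolean isAnyOrSame(TaskTypeSketch other) {
        return this == ANY || other == ANY || this == other;
    }
}

final class TaskTypeCheckSketch {

    // Reject the request only when the task type from the URL is neither
    // COMPLETION nor the ANY default used when the URL omits it.
    static boolean isRejected(TaskTypeSketch fromUrl) {
        return fromUrl.isAnyOrSame(TaskTypeSketch.COMPLETION) == false;
    }

    public static void main(String[] args) {
        for (TaskTypeSketch t : TaskTypeSketch.values()) {
            System.out.println(t + " rejected: " + isRejected(t));
        }
    }
}
```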
    private Deque<StreamingUnifiedChatCompletionResults.ChatCompletionChunk> singleItem(
        StreamingUnifiedChatCompletionResults.ChatCompletionChunk result
    ) {
        var deque = new ArrayDeque<StreamingUnifiedChatCompletionResults.ChatCompletionChunk>(2);
I'm not sure; it was in the code Pat sent me for this, and I also thought it was odd: prwhelan@4c573ba
    for (UnifiedCompletionRequest.Message message : unifiedRequest.messages()) {
        builder.startObject();
        {
            switch (message.content()) {
    private final Deque<StreamingUnifiedChatCompletionResults.ChatCompletionChunk> buffer = new LinkedBlockingDeque<>();

    @Override
    protected void onRequest(long n) {
It took me a while to grok that onRequest is not part of the Flow interface. Maybe call this upstreamRequest so it isn't confused with the various Flow on* methods.
Yep sounds good. I still don't quite understand what all that stuff is doing haha. I'll have Pat give an overview when he gets back maybe.
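For readers new to this part of the code, the naming point can be sketched with a small processor built on java.util.concurrent.Flow. This is an illustration under assumptions, not the DelegatingProcessor from the PR: `DelegatingProcessorSketch`, `transform`, and `UpstreamRequestDemo` are hypothetical names. The four on* methods are the real Flow.Subscriber callbacks, while `upstreamRequest` is a plain hook that forwards downstream demand to the upstream subscription.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;

// Hypothetical sketch: sits between an upstream publisher and one downstream subscriber.
abstract class DelegatingProcessorSketch<T, R> implements Flow.Processor<T, R> {
    private Flow.Subscription upstream;
    private Flow.Subscriber<? super R> downstream;

    @Override
    public void subscribe(Flow.Subscriber<? super R> subscriber) {
        downstream = subscriber;
        subscriber.onSubscribe(new Flow.Subscription() {
            @Override public void request(long n) { upstreamRequest(n); }
            @Override public void cancel() { if (upstream != null) upstream.cancel(); }
        });
    }

    // Not a Flow callback: called when downstream asks for more items.
    protected void upstreamRequest(long n) {
        if (upstream != null) upstream.request(n);
    }

    // The methods below are the actual Flow.Subscriber callbacks.
    @Override public void onSubscribe(Flow.Subscription subscription) { upstream = subscription; }
    @Override public void onNext(T item) { downstream.onNext(transform(item)); }
    @Override public void onError(Throwable throwable) { downstream.onError(throwable); }
    @Override public void onComplete() { downstream.onComplete(); }

    protected abstract R transform(T item);
}

final class UpstreamRequestDemo {
    static List<String> run() {
        List<String> events = new ArrayList<>();
        var processor = new DelegatingProcessorSketch<String, String>() {
            @Override protected String transform(String item) { return item.toUpperCase(); }
        };
        // Fake upstream that only records the demand it receives.
        processor.onSubscribe(new Flow.Subscription() {
            @Override public void request(long n) { events.add("upstream asked for " + n); }
            @Override public void cancel() { events.add("cancelled"); }
        });
        processor.subscribe(new Flow.Subscriber<String>() {
            @Override public void onSubscribe(Flow.Subscription s) { s.request(1); }
            @Override public void onNext(String item) { events.add("received " + item); }
            @Override public void onError(Throwable t) { events.add("error"); }
            @Override public void onComplete() { events.add("done"); }
        });
        processor.onNext("chunk");
        processor.onComplete();
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The sketch is single-threaded and ignores backpressure bookkeeping; it only shows why a name like upstreamRequest reads more clearly next to the on* callbacks.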
💚 All backports created successfully
Questions? Please refer to the Backport tool documentation.
* Adding some shell classes
* modeling the request objects
* Writeable changes to schema
* Working parsing tests
* Creating a new action
* Add outbound request writing (WIP)
* Improvements to request serialization
* Adding separate transport classes
* separate out unified request and combine inputs
* Reworking unified inputs
* Adding unsupported operation calls
* Fixing parsing logic
* get the build working
* Update docs/changelog/117589.yaml
* Fixing injection issue
* Allowing model to be overridden but not working yet
* Fixing issues
* Switch field name for tool
* Add support for toolCalls and refusal in streaming completion
* Working tool call response
* Separate unified and legacy code paths
* Updated the parser, but there are some class cast exceptions to fix
* Refactoring tests and request entities
* Parse response from OpenAI
* Removing unused request classes
* precommit
* Adding tests for UnifiedCompletionAction Request
* Refactoring stop to be a list of strings
* Testing for OpenAI response parsing
* Refactoring transport action tests to test unified validation code
* Fixing various tests
* Fixing license header
* Reformat streaming results
* Finalize response format
* remove debug logs
* remove changes for debugging
* Task type and base inference action tests
* Adding openai service tests
* Adding model tests
* tests for StreamingUnifiedChatCompletionResultsTests toXContentChunked
* Fixing change log and removing commented out code
* Switch usage to accept null
* Adding test for TestStreamingCompletionServiceExtension
* Avoid serializing empty lists + request entity tests
* Register named writeables from UnifiedCompletionRequest
* Removing commented code
* Clean up and add more of an explanation
* remove duplicate test
* remove old todos
* Refactoring some duplication
* Adding javadoc
* Addressing feedback

Co-authored-by: Jonathan Buttner <jonathan.buttner@elastic.co>
Co-authored-by: Jonathan Buttner <56361221+jonathan-buttner@users.noreply.github.com>
(cherry picked from commit 467fdb8)

Conflicts:
* x-pack/plugin/inference/qa/inference-service-tests/src/javaRestTest/java/org/elasticsearch/xpack/inference/InferenceCrudIT.java
* x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceAction.java
* x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/common/DelegatingProcessor.java
* x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/action/TransportInferenceActionTests.java
* Adding some shell classes
* modeling the request objects
* Writeable changes to schema
* Working parsing tests
* Creating a new action
* Add outbound request writing (WIP)
* Improvements to request serialization
* Adding separate transport classes
* separate out unified request and combine inputs
* Reworking unified inputs
* Adding unsupported operation calls
* Fixing parsing logic
* get the build working
* Update docs/changelog/117589.yaml
* Fixing injection issue
* Allowing model to be overridden but not working yet
* Fixing issues
* Switch field name for tool
* Add support for toolCalls and refusal in streaming completion
* Working tool call response
* Separate unified and legacy code paths
* Updated the parser, but there are some class cast exceptions to fix
* Refactoring tests and request entities
* Parse response from OpenAI
* Removing unused request classes
* precommit
* Adding tests for UnifiedCompletionAction Request
* Refactoring stop to be a list of strings
* Testing for OpenAI response parsing
* Refactoring transport action tests to test unified validation code
* Fixing various tests
* Fixing license header
* Reformat streaming results
* Finalize response format
* remove debug logs
* remove changes for debugging
* Task type and base inference action tests
* Adding openai service tests
* Adding model tests
* tests for StreamingUnifiedChatCompletionResultsTests toXContentChunked
* Fixing change log and removing commented out code
* Switch usage to accept null
* Adding test for TestStreamingCompletionServiceExtension
* Avoid serializing empty lists + request entity tests
* Register named writeables from UnifiedCompletionRequest
* Removing commented code
* Clean up and add more of an explanation
* remove duplicate test
* remove old todos
* Refactoring some duplication
* Adding javadoc
* Addressing feedback

Co-authored-by: Jonathan Buttner <jonathan.buttner@elastic.co>
Co-authored-by: Jonathan Buttner <56361221+jonathan-buttner@users.noreply.github.com>
(cherry picked from commit 467fdb8)

Conflicts:
* x-pack/plugin/inference/qa/inference-service-tests/src/javaRestTest/java/org/elasticsearch/xpack/inference/InferenceCrudIT.java
* x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceAction.java
…118772)
* [Inference API] Add unified api for chat completions (#117589)
* Adding some shell classes
* modeling the request objects
* Writeable changes to schema
* Working parsing tests
* Creating a new action
* Add outbound request writing (WIP)
* Improvements to request serialization
* Adding separate transport classes
* separate out unified request and combine inputs
* Reworking unified inputs
* Adding unsupported operation calls
* Fixing parsing logic
* get the build working
* Update docs/changelog/117589.yaml
* Fixing injection issue
* Allowing model to be overridden but not working yet
* Fixing issues
* Switch field name for tool
* Add support for toolCalls and refusal in streaming completion
* Working tool call response
* Separate unified and legacy code paths
* Updated the parser, but there are some class cast exceptions to fix
* Refactoring tests and request entities
* Parse response from OpenAI
* Removing unused request classes
* precommit
* Adding tests for UnifiedCompletionAction Request
* Refactoring stop to be a list of strings
* Testing for OpenAI response parsing
* Refactoring transport action tests to test unified validation code
* Fixing various tests
* Fixing license header
* Reformat streaming results
* Finalize response format
* remove debug logs
* remove changes for debugging
* Task type and base inference action tests
* Adding openai service tests
* Adding model tests
* tests for StreamingUnifiedChatCompletionResultsTests toXContentChunked
* Fixing change log and removing commented out code
* Switch usage to accept null
* Adding test for TestStreamingCompletionServiceExtension
* Avoid serializing empty lists + request entity tests
* Register named writeables from UnifiedCompletionRequest
* Removing commented code
* Clean up and add more of an explanation
* remove duplicate test
* remove old todos
* Refactoring some duplication
* Adding javadoc
* Addressing feedback

Co-authored-by: Jonathan Buttner <jonathan.buttner@elastic.co>
Co-authored-by: Jonathan Buttner <56361221+jonathan-buttner@users.noreply.github.com>
(cherry picked from commit 467fdb8)

Conflicts:
* x-pack/plugin/inference/qa/inference-service-tests/src/javaRestTest/java/org/elasticsearch/xpack/inference/InferenceCrudIT.java
* x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceAction.java

* fix merge conflicts
* formatting
* Remove tests - retain feature flag
* [CI] Auto commit changes from spotless

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Unified API communicating with OpenAI
Testing
Running ES
The _unified route is behind a feature flag, so to enable it, run ES like this:
Creating an endpoint and sending requests
Creating a completion endpoint
Completion request
Response format
A sequence of partial responses: