[Inference API] Add unified api for chat completions by maxhniebergall · Pull Request #117589 · elastic/elasticsearch

maxhniebergall · 2024-11-26T20:03:06Z

Unified API communicating with OpenAI

Testing

Running ES

The _unified route is behind a feature flag, so to enable it, run Elasticsearch like this:

./gradlew :run -Drun.license_type=trial -Des.inference_unified_feature_flag_enabled=true

Creating an endpoint and sending requests

Creating a completion endpoint

PUT http://localhost:9200/_inference/completion/test
{
    "service": "openai",
    "service_settings": {
        "api_key": "<api key>",
        "model_id": "gpt-4o"
    }
}
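For readers scripting this from Java, here is a minimal sketch of the same endpoint-creation call using the JDK's `java.net.http` client. It only builds the request. The host, port, and endpoint id `test` come from the example above, and `CreateCompletionEndpoint` is a hypothetical name. Authentication is omitted as in the example; a secured cluster would also need an `Authorization` header.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds (but does not send) the PUT request shown above.
final class CreateCompletionEndpoint {
    static HttpRequest build(String apiKey) {
        // Same body as the example; apiKey replaces the "<api key>" placeholder (no JSON escaping here).
        String body = "{"
            + "\"service\":\"openai\","
            + "\"service_settings\":{\"api_key\":\"" + apiKey + "\",\"model_id\":\"gpt-4o\"}"
            + "}";
        return HttpRequest.newBuilder(URI.create("http://localhost:9200/_inference/completion/test"))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }
}
```

Against a running cluster, the request can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.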

Completion request

POST http://localhost:9200/_inference/completion/test/_unified
{
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": "What is the weather like in Boston today?"
        }
    ],
    "stop": "none",
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        },
                        "unit": {
                            "type": "string",
                            "enum": [
                                "celsius",
                                "fahrenheit"
                            ]
                        }
                    },
                    "required": [
                        "location"
                    ]
                }
            }
        }
    ],
    "tool_choice": "auto"
}
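The following sketch sends this request from Java and reads the streamed reply line by line. It assumes server-sent-event framing, where each payload line starts with `data: ` and a `[DONE]` payload ends the stream. Verify this against the actual wire format before relying on it. `UnifiedCompletionCall` is a hypothetical name.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for the _unified call; the "data: " framing is an assumption.
final class UnifiedCompletionCall {
    private static final String DATA_PREFIX = "data: ";

    static HttpRequest build(String jsonBody) {
        return HttpRequest.newBuilder(URI.create("http://localhost:9200/_inference/completion/test/_unified"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
            .build();
    }

    // Keeps the JSON payloads, skips other lines (event names, blanks), and stops at [DONE].
    static List<String> payloads(Stream<String> lines) {
        return lines
            .filter(line -> line.startsWith(DATA_PREFIX))
            .map(line -> line.substring(DATA_PREFIX.length()).trim())
            .takeWhile(payload -> !payload.equals("[DONE]"))
            .collect(Collectors.toList());
    }

    // Sends the request and blocks until the stream ends (needs a running cluster).
    static List<String> send(HttpClient client, HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<Stream<String>> response = client.send(request, HttpResponse.BodyHandlers.ofLines());
        try (Stream<String> lines = response.body()) {
            return payloads(lines);
        }
    }
}
```

To handle chunks as they arrive instead of collecting them, replace the `collect` with a `forEach`.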

Response format

The response is streamed as a sequence of partial chunks, terminated by a [DONE] marker:

{
    "id": "chatcmpl-AamGq3bqfB5bhmDaBLYi5Z5kTUz5R",
    "choices": [
        {
            "delta": {
                "content": " ol"
            },
            "index": 0
        }
    ],
    "model": "gpt-3.5-turbo-0125",
    "object": "chat.completion.chunk"
}

{
    "id": "chatcmpl-AamGq3bqfB5bhmDaBLYi5Z5kTUz5R",
    "choices": [
        {
            "delta": {
                "content": "iv"
            },
            "index": 0
        }
    ],
    "model": "gpt-3.5-turbo-0125",
    "object": "chat.completion.chunk"
}

{
    "id": "chatcmpl-AamGq3bqfB5bhmDaBLYi5Z5kTUz5R",
    "choices": [
        {
            "delta": {
                "content": "ine"
            },
            "index": 0
        }
    ],
    "model": "gpt-3.5-turbo-0125",
    "object": "chat.completion.chunk"
}

{
    "id": "chatcmpl-AamGq3bqfB5bhmDaBLYi5Z5kTUz5R",
    "choices": [
        {
            "delta": {
                "content": "."
            },
            "index": 0
        }
    ],
    "model": "gpt-3.5-turbo-0125",
    "object": "chat.completion.chunk"
}

{
    "id": "chatcmpl-AamGq3bqfB5bhmDaBLYi5Z5kTUz5R",
    "choices": [
        {
            "delta": {},
            "finish_reason": "stop",
            "index": 0
        }
    ],
    "model": "gpt-3.5-turbo-0125",
    "object": "chat.completion.chunk"
}

[DONE]
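Each chunk carries only a fragment of the answer in `choices[].delta.content`. The last chunk has an empty `delta` plus a `finish_reason`. Below is a minimal sketch of reassembling the text on the client, using hypothetical records that mirror the JSON above (JSON parsing is left out).

```java
import java.util.List;

// Hypothetical records mirroring the chunk JSON above; not Elasticsearch or OpenAI SDK classes.
record ChunkDelta(String content) {}
record ChunkChoice(ChunkDelta delta, String finishReason, int index) {}
record CompletionChunk(String id, List<ChunkChoice> choices, String model, String object) {}
record AssembledReply(String text, String finishReason) {}

final class ChunkAssembler {
    // Concatenates delta.content for one choice index and records its finish_reason, if any.
    static AssembledReply assemble(List<CompletionChunk> chunks, int choiceIndex) {
        StringBuilder text = new StringBuilder();
        String finishReason = null;
        for (CompletionChunk chunk : chunks) {
            for (ChunkChoice choice : chunk.choices()) {
                if (choice.index() != choiceIndex) {
                    continue;
                }
                if (choice.delta() != null && choice.delta().content() != null) {
                    text.append(choice.delta().content());
                }
                if (choice.finishReason() != null) {
                    finishReason = choice.finishReason();
                }
            }
        }
        return new AssembledReply(text.toString(), finishReason);
    }
}
```

With the five chunks above, this yields the text " olive." and the finish reason `stop`.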

elasticsearchmachine · 2024-11-26T20:03:30Z

Pinging @elastic/ml-core (Team:ML)

elasticsearchmachine · 2024-11-26T20:03:32Z

Hi @maxhniebergall, I've created a changelog YAML for you.

maxhniebergall · 2024-11-26T20:16:36Z

Tests to add:

UnifiedCompletionAction (@jonathan-buttner) ✅
TestStreamingCompletionServiceExtension (@jonathan-buttner) ✅
Rolling update tests (We don't think we need these?)
TransportInferenceActionTests (@jonathan-buttner) ✅
InferenceInputs (@jonathan-buttner) ✅
- ~~we should double check the castTo method and add it to the other subclasses~~ (Let's do this later because the PR is large enough)
UnifiedChatInput (@jonathan-buttner) ✅
- for the conversions
OpenAiUnifiedCompletionRequestEntity (@maxhniebergall) ✅
BaseInferenceAction (@jonathan-buttner) ✅
RestUnifiedCompletionInferenceAction (@jonathan-buttner) ✅
OpenAiService (@jonathan-buttner) ✅
OpenAiChatCompletionModel (@jonathan-buttner) ✅
OpenAiUnifiedStreamingProcessor & StreamingUnifiedChatCompletionResults (@maxhniebergall) ✅

TODO

Create a list of all new named writables and add them to the registry ✅
Address outstanding TODOs ✅

server/src/main/java/org/elasticsearch/inference/UnifiedCompletionRequest.java

maxhniebergall

LGTM

jonathan-buttner · 2024-12-06T13:59:11Z

...search/xpack/inference/external/request/openai/OpenAiUnifiedChatCompletionRequestEntity.java

+
+        builder.field(MODEL_FIELD, model.getServiceSettings().modelId());
+        if (unifiedRequest.maxCompletionTokens() != null) {
+            builder.field(MAX_COMPLETION_TOKENS_FIELD, unifiedRequest.maxCompletionTokens());


I just realized that OpenAiChatCompletionServiceSettings has a similar field that isn't used (even before this PR). I wonder if we should sync those fields up like we do the modelId 🤔. I think we've typically used max tokens for truncation with text embeddings, so I don't think we've really used it for completions.

@maxhniebergall what do you think?

Hmm, yeah, it sounds like we should use the value from the service settings if it's available.
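Below is one way to apply that suggestion, sketched with hypothetical names. It prefers a per-request `max_completion_tokens`, falls back to the endpoint's service setting, and omits the field when neither is set, matching the null check in the snippet above. The precedence order (request over service settings) is an assumption, since the thread does not settle it. The outbound field name is assumed to be OpenAI's `max_completion_tokens`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the Elasticsearch request entity.
final class MaxTokensSketch {
    // Request value wins; otherwise use the service setting; null means "don't send the field".
    static Integer resolve(Integer requestMaxCompletionTokens, Integer serviceSettingsMaxTokens) {
        return requestMaxCompletionTokens != null ? requestMaxCompletionTokens : serviceSettingsMaxTokens;
    }

    // Builds the top-level fields of the outbound body, skipping max_completion_tokens when unset.
    static Map<String, Object> body(String modelId, Integer requestMax, Integer settingsMax) {
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("model", modelId);
        Integer max = resolve(requestMax, settingsMax);
        if (max != null) {
            body.put("max_completion_tokens", max);
        }
        return body;
    }
}
```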

davidkyle

LGTM

davidkyle · 2024-12-06T13:36:14Z

server/src/main/java/org/elasticsearch/inference/UnifiedCompletionRequest.java

+        public void toXContent(XContentBuilder builder, ToXContent.Params params) throws IOException {
+            builder.value(content);
+        }


Is this function required? The class does not declare that it implements ToXContent.

Oops, yeah we can remove that. Thanks.

davidkyle · 2024-12-06T14:31:38Z

...ore/src/main/java/org/elasticsearch/xpack/core/inference/action/UnifiedCompletionAction.java

+                return e;
+            }
+
+            if (taskType != TaskType.COMPLETION) {


Suggested change:

-            if (taskType != TaskType.COMPLETION) {
+            if (taskType.isAnyOrSame(TaskType.COMPLETION) == false) {

For the case where the task type is not set in the URL and defaults to ANY.

Ah good catch!
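Below is a self-contained sketch of that validation, using a trimmed-down, hypothetical task-type enum. It assumes `isAnyOrSame` returns true when either side is `ANY` or both match. Because true is the accepted case, the error is raised when the call returns false. The sketch writes this as `== false`, the form Elasticsearch code conventionally uses. The error message text is illustrative.

```java
// Hypothetical, trimmed-down stand-in for the real TaskType enum.
enum SketchTaskType {
    ANY, COMPLETION, TEXT_EMBEDDING;

    // Assumed semantics: true if either side is ANY or both are the same type.
    boolean isAnyOrSame(SketchTaskType other) {
        return this == ANY || other == ANY || this == other;
    }
}

final class UnifiedTaskTypeCheck {
    // Returns an error message, or null if the task type may use the _unified route.
    static String validate(SketchTaskType taskType) {
        if (taskType.isAnyOrSame(SketchTaskType.COMPLETION) == false) {
            return "Field [taskType] must be [completion]"; // illustrative message
        }
        return null;
    }
}
```

A URL without a task type (defaulting to `ANY`) and an explicit `completion` both pass, while other task types are rejected.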

davidkyle · 2024-12-06T15:55:11Z

.../java/org/elasticsearch/xpack/inference/external/openai/OpenAiUnifiedStreamingProcessor.java

+    private Deque<StreamingUnifiedChatCompletionResults.ChatCompletionChunk> singleItem(
+        StreamingUnifiedChatCompletionResults.ChatCompletionChunk result
+    ) {
+        var deque = new ArrayDeque<StreamingUnifiedChatCompletionResults.ChatCompletionChunk>(2);


Why size 2 and not 1?

I'm not sure; it was in the code Pat sent me for this. I also thought it was odd: prwhelan@4c573ba
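The `ArrayDeque(int)` argument is only a lower bound on the initial capacity, and the deque grows as needed. A capacity of 1 is therefore enough for a deque that starts with a single item. Below is a sketch with a hypothetical generic helper.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class SingleItemDequeSketch {
    // Capacity 1 suffices; ArrayDeque resizes itself if more elements are added later.
    // Note that ArrayDeque rejects null elements, so item must be non-null.
    static <T> Deque<T> singleItem(T item) {
        Deque<T> deque = new ArrayDeque<>(1);
        deque.offer(item);
        return deque;
    }
}
```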

davidkyle · 2024-12-06T15:56:08Z

...search/xpack/inference/external/request/openai/OpenAiUnifiedChatCompletionRequestEntity.java

+            for (UnifiedCompletionRequest.Message message : unifiedRequest.messages()) {
+                builder.startObject();
+                {
+                    switch (message.content()) {
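The switch above appears to dispatch on the type of the message content. In OpenAI's chat format, `content` is either a plain string or an array of typed parts such as `{"type": "text", "text": "..."}`. Below is a sketch of that dispatch with hypothetical sealed types (Java 17+). It uses `instanceof` patterns so it compiles on Java 17. A pattern-matching `switch` (Java 21), like the one in the diff, additionally checks exhaustiveness over the sealed hierarchy. String escaping here covers only backslashes and quotes.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical content model: a message's content is either plain text or a list of typed parts.
sealed interface SketchContent permits TextContent, PartsContent {}

record TextContent(String text) implements SketchContent {}

record ContentPart(String type, String text) {}

record PartsContent(List<ContentPart> parts) implements SketchContent {}

final class ContentFieldWriter {
    // Minimal JSON string quoting: escapes backslashes and double quotes only (no control characters).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Renders the "content" field either as a JSON string or as an array of {"text", "type"} objects.
    static String contentField(SketchContent content) {
        if (content instanceof TextContent text) {
            return "\"content\":" + quote(text.text());
        }
        if (content instanceof PartsContent parts) {
            return "\"content\":" + parts.parts().stream()
                .map(part -> "{\"text\":" + quote(part.text()) + ",\"type\":" + quote(part.type()) + "}")
                .collect(Collectors.joining(",", "[", "]"));
        }
        throw new IllegalArgumentException("Unsupported content: " + content);
    }
}
```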


davidkyle · 2024-12-06T16:13:52Z

.../java/org/elasticsearch/xpack/inference/external/openai/OpenAiUnifiedStreamingProcessor.java

+    private final Deque<StreamingUnifiedChatCompletionResults.ChatCompletionChunk> buffer = new LinkedBlockingDeque<>();
+
+    @Override
+    protected void onRequest(long n) {


It took me a while to grok that onRequest is not part of the Flow interface. Maybe call this upstreamRequest so as not to confuse it with the various Flow on* methods.

Yep, sounds good. I still don't quite understand what all that stuff is doing, haha. Maybe I'll have Pat give an overview when he gets back.
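For context, `java.util.concurrent.Flow.Subscriber` defines `onSubscribe`, `onNext`, `onError`, and `onComplete`, while `Flow.Subscription` defines `request(long)` and `cancel()`. A hook named `onRequest` therefore reads like one of the former, even though it reacts to the latter. Below is a simplified, single-threaded sketch of the buffering idea. It is not the Elasticsearch `DelegatingProcessor`, and every name is hypothetical. One upstream event may carry several chunks, so extras wait in a buffer and are released only as downstream demand allows. Upstream is asked for more only when the buffer runs dry.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.Flow;
import java.util.function.LongConsumer;

// Hypothetical sketch; not thread-safe, and upstream onError/onComplete handling is omitted.
final class BufferingSubscription<T> implements Flow.Subscription {
    private final Deque<T> buffer = new ArrayDeque<>();
    private final Flow.Subscriber<? super T> downstream;
    private final LongConsumer upstreamRequest; // asks the upstream publisher for n more events
    private long demand = 0;
    private boolean upstreamPending = false;
    private boolean cancelled = false;

    BufferingSubscription(Flow.Subscriber<? super T> downstream, LongConsumer upstreamRequest) {
        this.downstream = downstream;
        this.upstreamRequest = upstreamRequest;
    }

    // Called once per upstream event, which may have been parsed into several chunks.
    void upstreamDelivered(List<T> items) {
        upstreamPending = false;
        if (cancelled) {
            return;
        }
        buffer.addAll(items);
        drain();
    }

    @Override
    public void request(long n) {
        if (n <= 0) { // Reactive Streams rule 3.9: signal IllegalArgumentException via onError
            cancel();
            downstream.onError(new IllegalArgumentException("request must be positive: " + n));
            return;
        }
        // Saturate at Long.MAX_VALUE instead of overflowing.
        demand = Long.MAX_VALUE - demand < n ? Long.MAX_VALUE : demand + n;
        drain();
    }

    @Override
    public void cancel() {
        cancelled = true;
        buffer.clear();
    }

    // Emits buffered chunks while there is demand; asks upstream for one more event if the buffer is empty.
    private void drain() {
        while (!cancelled && demand > 0 && !buffer.isEmpty()) {
            demand--;
            downstream.onNext(buffer.poll());
        }
        if (!cancelled && demand > 0 && buffer.isEmpty() && !upstreamPending) {
            upstreamPending = true;
            upstreamRequest.accept(1);
        }
    }
}
```

A production processor would also need to guard against concurrent `request` calls and forward upstream completion and errors once the buffer is empty.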

maxhniebergall · 2024-12-11T20:48:16Z

💚 All backports created successfully

Status	Branch	Result
✅	8.x

Questions?

Please refer to the Backport tool documentation

* Adding some shell classes
* modeling the request objects
* Writeable changes to schema
* Working parsing tests
* Creating a new action
* Add outbound request writing (WIP)
* Improvements to request serialization
* Adding separate transport classes
* separate out unified request and combine inputs
* Reworking unified inputs
* Adding unsupported operation calls
* Fixing parsing logic
* get the build working
* Update docs/changelog/117589.yaml
* Fixing injection issue
* Allowing model to be overridden but not working yet
* Fixing issues
* Switch field name for tool
* Add support for toolCalls and refusal in streaming completion
* Working tool call response
* Separate unified and legacy code paths
* Updated the parser, but there are some class cast exceptions to fix
* Refactoring tests and request entities
* Parse response from OpenAI
* Removing unused request classes
* precommit
* Adding tests for UnifiedCompletionAction Request
* Refactoring stop to be a list of strings
* Testing for OpenAI response parsing
* Refactoring transport action tests to test unified validation code
* Fixing various tests
* Fixing license header
* Reformat streaming results
* Finalize response format
* remove debug logs
* remove changes for debugging
* Task type and base inference action tests
* Adding openai service tests
* Adding model tests
* tests for StreamingUnifiedChatCompletionResultsTests toXContentChunked
* Fixing change log and removing commented out code
* Switch usage to accept null
* Adding test for TestStreamingCompletionServiceExtension
* Avoid serializing empty lists + request entity tests
* Register named writeables from UnifiedCompletionRequest
* Removing commented code
* Clean up and add more of an explanation
* remove duplicate test
* remove old todos
* Refactoring some duplication
* Adding javadoc
* Addressing feedback

---------

Co-authored-by: Jonathan Buttner <jonathan.buttner@elastic.co>
Co-authored-by: Jonathan Buttner <56361221+jonathan-buttner@users.noreply.github.com>
(cherry picked from commit 467fdb8)

# Conflicts:
# x-pack/plugin/inference/qa/inference-service-tests/src/javaRestTest/java/org/elasticsearch/xpack/inference/InferenceCrudIT.java
# x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceAction.java
# x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/common/DelegatingProcessor.java
# x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/action/TransportInferenceActionTests.java

maxhniebergall · 2024-12-16T14:22:27Z

💚 All backports created successfully

Status	Branch	Result
✅	8.x

Questions?

Please refer to the Backport tool documentation

…118772)

* [Inference API] Add unified api for chat completions (#117589)

* Adding some shell classes
* modeling the request objects
* Writeable changes to schema
* Working parsing tests
* Creating a new action
* Add outbound request writing (WIP)
* Improvements to request serialization
* Adding separate transport classes
* separate out unified request and combine inputs
* Reworking unified inputs
* Adding unsupported operation calls
* Fixing parsing logic
* get the build working
* Update docs/changelog/117589.yaml
* Fixing injection issue
* Allowing model to be overridden but not working yet
* Fixing issues
* Switch field name for tool
* Add support for toolCalls and refusal in streaming completion
* Working tool call response
* Separate unified and legacy code paths
* Updated the parser, but there are some class cast exceptions to fix
* Refactoring tests and request entities
* Parse response from OpenAI
* Removing unused request classes
* precommit
* Adding tests for UnifiedCompletionAction Request
* Refactoring stop to be a list of strings
* Testing for OpenAI response parsing
* Refactoring transport action tests to test unified validation code
* Fixing various tests
* Fixing license header
* Reformat streaming results
* Finalize response format
* remove debug logs
* remove changes for debugging
* Task type and base inference action tests
* Adding openai service tests
* Adding model tests
* tests for StreamingUnifiedChatCompletionResultsTests toXContentChunked
* Fixing change log and removing commented out code
* Switch usage to accept null
* Adding test for TestStreamingCompletionServiceExtension
* Avoid serializing empty lists + request entity tests
* Register named writeables from UnifiedCompletionRequest
* Removing commented code
* Clean up and add more of an explanation
* remove duplicate test
* remove old todos
* Refactoring some duplication
* Adding javadoc
* Addressing feedback

---------

Co-authored-by: Jonathan Buttner <jonathan.buttner@elastic.co>
Co-authored-by: Jonathan Buttner <56361221+jonathan-buttner@users.noreply.github.com>
(cherry picked from commit 467fdb8)

# Conflicts:
# x-pack/plugin/inference/qa/inference-service-tests/src/javaRestTest/java/org/elasticsearch/xpack/inference/InferenceCrudIT.java
# x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceAction.java

* fix merge conflicts
* formatting
* Remove tests - retain feature flag
* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>

jonathan-buttner and others added 14 commits November 20, 2024 17:24

Adding some shell classes

bf507e7

modeling the request objects

705aa42

Writeable changes to schema

bd5df97

Working parsing tests

bd59543

Creating a new action

1e30c6d

Add outbound request writing (WIP)

2846942

Improvements to request serialization

9cb401c

Adding separate transport classes

1e0eb20

separate out unified request and combine inputs

d6cc223

Merge branch 'ml-inference-unified-api-elastic' of github.com:elastic…

7986c81

…/elasticsearch into ml-inference-unified-api-elastic

Reworking unified inputs

bf817d0

Adding unsupported operation calls

81a05b7

Fixing parsing logic

cb440e1

get the build working

86d477e

maxhniebergall added >enhancement :ml Machine learning v9.0.0 v8.18.0 labels Nov 26, 2024

elasticsearchmachine added the Team:ML Meta label for the ML team label Nov 26, 2024

Update docs/changelog/117589.yaml

359d305

Merge branch 'main' of github.com:elastic/elasticsearch into ml-infer…

4070231

…ence-unified-api-elastic

jonathan-buttner added 3 commits November 26, 2024 15:17

Merge branch 'ml-inference-unified-api-elastic' of github.com:elastic…

ce57bea

…/elasticsearch into ml-inference-unified-api-elastic

Fixing injection issue

834676d

Allowing model to be overridden but not working yet

5909a7d

maxhniebergall commented Nov 27, 2024

server/src/main/java/org/elasticsearch/inference/UnifiedCompletionRequest.java (outdated)

jonathan-buttner added 2 commits November 27, 2024 10:02

Fixing issues

315be2c

Switch field name for tool

657561e

Merge branch 'ml-inference-unified-api-elastic' of github.com:elastic…

e2ed5cc

…/elasticsearch into ml-inference-unified-api-elastic

maxhniebergall commented Dec 5, 2024

jonathan-buttner added 3 commits December 5, 2024 15:22

Refactoring some duplication

8f22f56

Adding javadoc

a9b44b5

Merge branch 'ml-inference-unified-api-elastic' of github.com:elastic…

fc173ff

…/elasticsearch into ml-inference-unified-api-elastic

jonathan-buttner requested a review from davidkyle December 5, 2024 20:37

jonathan-buttner and others added 2 commits December 5, 2024 15:40

Merge branch 'main' of github.com:elastic/elasticsearch into ml-infer…

4c2573e

…ence-unified-api-elastic

Merge branch 'main' into ml-inference-unified-api-elastic

e1decca

jonathan-buttner reviewed Dec 6, 2024

davidkyle approved these changes Dec 6, 2024

jonathan-buttner added 3 commits December 6, 2024 14:09

Addressing feedback

3c4428f

Merge branch 'main' of github.com:elastic/elasticsearch into ml-infer…

b16008f

…ence-unified-api-elastic

Merge branch 'ml-inference-unified-api-elastic' of github.com:elastic…

481aa90

…/elasticsearch into ml-inference-unified-api-elastic

jonathan-buttner enabled auto-merge (squash) December 6, 2024 19:16

Removing unused import

7fc36ce

jonathan-buttner merged commit 467fdb8 into main Dec 6, 2024

jonathan-buttner deleted the ml-inference-unified-api-elastic branch December 6, 2024 20:52

maxhniebergall mentioned this pull request Dec 11, 2024

[8.x] [Inference API] Add unified api for chat completions (#117589) #118506

Closed

maxhniebergall mentioned this pull request Dec 16, 2024

[8.x] [Inference API] Add unified api for chat completions (#117589) #118772

Merged

jonathan-buttner mentioned this pull request Dec 16, 2024

Add POST _unified for the inference API elastic/elasticsearch-specification#3313

Merged

YulNaumenko mentioned this pull request Jan 17, 2025

[Epic] Enabling inference AI Connector as a default experience for all Kibana GenAI functionality elastic/kibana#207140

Closed

7 tasks

pquentin mentioned this pull request Jan 20, 2025

Add rest-api-spec for unified inference API #120447

Merged

jonathan-buttner mentioned this pull request Feb 5, 2025

[REQUEST]: Inference API removing references to the _unified URL suffix elastic/docs-content#339

Closed

alvarezmelissa87 mentioned this pull request Apr 2, 2025

[ML][AI Connector] Add support for unified completion spec elastic/kibana#216942

Closed

jonathan-buttner merged 75 commits into main from ml-inference-unified-api-elastic

4 participants
