Add Google Model Garden's Anthropic support to Inference Plugin #134080
…ntegration # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/googlevertexai/action/GoogleVertexAiActionCreator.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/googlevertexai/request/completion/GoogleVertexAiUnifiedChatCompletionRequest.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/googlevertexai/action/GoogleVertexAiUnifiedChatCompletionActionTests.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/googlevertexai/completion/GoogleVertexAiChatCompletionModelTests.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/googlevertexai/request/completion/GoogleVertexAiUnifiedChatCompletionRequestTests.java
…ntegration # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
…al parameters based on transport version
…ntegration # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
… to support new content block types and improve parsing logic
… parser and add unit tests for response validation
…ate response parsing and error handling
…ity to validate serialization of user fields
…n model and update related tests
…ntegration # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
Hello @jonathan-buttner @dan-rubinstein
    public static final TransportVersion ESQL_DOCUMENTS_FOUND_AND_VALUES_LOADED_8_19 = def(8_841_0_61);
    public static final TransportVersion ESQL_PROFILE_INCLUDE_PLAN_8_19 = def(8_841_0_62);
    public static final TransportVersion INITIAL_ELASTICSEARCH_8_19_4 = def(8_841_0_68);
    public static final TransportVersion ML_INFERENCE_GOOGLE_MODEL_GARDEN_ADDED_8_19 = def(8_841_0_69);
Let me know if this needs to be removed. I haven't seen backports in a while, but Google Vertex AI has been available for quite some time, so we would probably need one.
Let's remove this; we won't be backporting the changes.
Removed.
…ntegration # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
…ntegration # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/googlevertexai/completion/GoogleVertexAiChatCompletionServiceSettings.java
@jonathan-buttner your comments are addressed. Could you please take a look at the PR once more?
…ntegration # Conflicts: # server/src/main/resources/transport/upper_bounds/9.2.csv
jonathan-buttner
left a comment
Looking good, couple more changes
    }

    public GoogleVertexAiChatCompletionTaskSettings(StreamInput in) throws IOException {
        thinkingConfig = new ThinkingConfig(in);
        TransportVersion version = in.getTransportVersion();
        if (GoogleVertexAiUtils.supportsModelGarden(version)) {
            maxTokens = Objects.requireNonNullElse(in.readOptionalInt(), DEFAULT_MAX_TOKENS);
            maxTokens = in.readOptionalInt();
Can we use readOptionalVInt?
Good thinking. Done.
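To illustrate why a variable-length encoding suits a small positive value such as max_tokens, here is a minimal, self-contained sketch of an optional VInt: one presence byte followed by the value in 7-bit groups. The class `OptionalVIntSketch` and its helpers are hypothetical and written against plain `java.io` streams; they mirror the general idea only and are not Elasticsearch's `StreamInput`/`StreamOutput` implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Hypothetical illustration of an optional variable-length int; not Elasticsearch code.
final class OptionalVIntSketch {

    // Writes a presence byte, then the value in 7-bit groups, low bits first.
    static void writeOptionalVInt(ByteArrayOutputStream out, Integer value) {
        if (value == null) {
            out.write(0); // absent
            return;
        }
        out.write(1); // present
        int v = value;
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80); // high bit set: more bytes follow
            v >>>= 7;
        }
        out.write(v);
    }

    // Reads what writeOptionalVInt produced; returns null when the value was absent.
    static Integer readOptionalVInt(ByteArrayInputStream in) {
        if (in.read() != 1) {
            return null;
        }
        int result = 0;
        int shift = 0;
        int b;
        do {
            b = in.read();
            if (b < 0) {
                throw new IllegalStateException("unexpected end of stream");
            }
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    static byte[] encode(Integer value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeOptionalVInt(out, value);
        return out.toByteArray();
    }

    static Integer decode(byte[] bytes) {
        return readOptionalVInt(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) {
        byte[] bytes = encode(1024);
        System.out.println("encoded length: " + bytes.length);
        System.out.println("decoded value: " + decode(bytes));
    }
}
```

In this sketch a value of 1024 takes three bytes (one presence byte plus two value bytes), whereas a presence byte followed by a fixed-width int would take five.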
@@ -124,7 +124,9 @@ public TransportVersion getMinimalSupportedVersion() {
    @Override
    public void writeTo(StreamOutput out) throws IOException {
        thinkingConfig.writeTo(out);
        out.writeOptionalInt(maxTokens);
        if (GoogleVertexAiUtils.supportsModelGarden(out.getTransportVersion())) {
            out.writeOptionalInt(maxTokens);
Let's use writeOptionalVInt
Done.
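The version check in the diff above matters because the reader and the writer must agree on whether the extra field is on the wire. A minimal sketch of that symmetry follows, assuming a hypothetical integer version id (`MAX_TOKENS_ADDED`) and a stand-in `thinkingBudget` field in place of the real thinking config. It uses plain `java.io` data streams with a fixed-width int for brevity and is not the Elasticsearch serialization code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration of version-gated serialization; not Elasticsearch code.
final class VersionGatedSettingsSketch {

    // Assumed version id at which max_tokens joined the wire format.
    static final int MAX_TOKENS_ADDED = 2;

    static boolean supportsMaxTokens(int peerVersion) {
        return peerVersion >= MAX_TOKENS_ADDED;
    }

    // Serializes the settings for a peer that speaks peerVersion.
    static byte[] write(int peerVersion, int thinkingBudget, Integer maxTokens) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(thinkingBudget); // always on the wire
            if (supportsMaxTokens(peerVersion)) {
                // Older peers would not expect these bytes, so they are skipped for them.
                out.writeBoolean(maxTokens != null);
                if (maxTokens != null) {
                    out.writeInt(maxTokens);
                }
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns {thinkingBudget, maxTokens}; maxTokens is null when absent or not on the wire.
    static Integer[] read(int peerVersion, byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            Integer thinkingBudget = in.readInt();
            Integer maxTokens = null;
            // The same gate as in write(), so both sides consume the same bytes.
            if (supportsMaxTokens(peerVersion) && in.readBoolean()) {
                maxTokens = in.readInt();
            }
            return new Integer[] { thinkingBudget, maxTokens };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("old peer bytes: " + write(1, 100, 512).length);
        System.out.println("new peer bytes: " + write(2, 100, 512).length);
    }
}
```

If only one side applied the gate, the other side would either read bytes that were never written or leave unread bytes in the stream, which is why the check appears in both the constructor and `writeTo`.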
    delta = new StreamingUnifiedChatCompletionResults.ChatCompletionChunk.Choice.Delta(
        null,
        null,
        null,
        List.of(
            new StreamingUnifiedChatCompletionResults.ChatCompletionChunk.Choice.Delta.ToolCall(
                0,
                id,
                new StreamingUnifiedChatCompletionResults.ChatCompletionChunk.Choice.Delta.ToolCall.Function(
                    input != null ? input.toString() : null,
                    name
                ),
                null
            )
        )
    );
For readability, this might be better as:
var function = new StreamingUnifiedChatCompletionResults.ChatCompletionChunk.Choice.Delta.ToolCall.Function(
input != null ? input.toString() : null,
name
);
var toolCall = new StreamingUnifiedChatCompletionResults.ChatCompletionChunk.Choice.Delta.ToolCall(0, id, function, null);
delta = new StreamingUnifiedChatCompletionResults.ChatCompletionChunk.Choice.Delta(null, null, null, List.of(toolCall));
Similar changes can be made in the parseContentBlockDelta() method.
Thanks. Done!
…ntegration # Conflicts: # server/src/main/resources/transport/upper_bounds/9.2.csv
…CompletionStreamingProcessor readability
…ntegration # Conflicts: # server/src/main/resources/transport/upper_bounds/9.2.csv
Your comments are addressed. Could you please review the fixes?
jonathan-buttner
left a comment
Thanks for the changes!
Create Completion Endpoint
No Provider No URLs:
Google Provider With URLs:
Google Provider No URLs:
No URLs:
Both URLs:
Only Non-Streaming URL:
Only Streaming URL:
No Task Parameters:
Not Found:
Perform Non-Streaming Completion
Non-Streaming Both URLs
Non-Streaming Only Non-Streaming URL
Non-Streaming Only Streaming URL
Non-Streaming Without Task Settings
Perform Streaming Completion
Streaming Both URLs
Streaming Only Non-Streaming URL
Streaming Only Streaming URL
Streaming Without Task Settings
Create Chat Completion Endpoint
No Provider No URLs:
Google Provider With URLs:
Google Provider No URLs:
No URLs:
Both URLs:
Only Non-Streaming URL:
Only Streaming URL:
No Task Parameters:
Not Found:
Testing of streaming chat completion is done and confirmed to be successful.
Perform Chat Completion
Both URLs
Both URLs With Max Tokens in RQ
Only Non-Streaming URL
Only Non-Streaming URL With Max Tokens in RQ
Only Streaming URL
Only Streaming URL With Max Tokens in RQ
Both URLs No task settings on creation
Both URLs No task settings on creation With Max Tokens in RQ
Regression Tests for Google Vertex AI.
Create Completion endpoint
Success
No model_id
Perform Non-Streaming Completion
Perform Streaming Completion
Create Chat Completion endpoint
Perform Chat Completion
@jonathan-buttner |
Updates the existing Google Vertex AI inference provider integration to support completion (both streaming and non-streaming) and chat_completion (streaming only) for Anthropic provider models within Google Model Garden.
The changes were tested locally against the following Anthropic models:
Create Completion Endpoint
Success:
With max_tokens in task settings:
Unknown Provider:
No Provider + No Google Vertex AI parameters:
No URL + No Streaming URL + No Google Vertex AI parameters:
URL + No Streaming URL (URL is default for both streaming/non-streaming):
No URL + Streaming URL (Streaming URL is default for both streaming/non-streaming):
Not Found:
Perform Completion
Success Non Streaming:
Success Streaming:
Success Non Streaming with task_settings max_tokens:
Success Streaming with task_settings max_tokens:
Create Chat Completion Endpoint
Success:
Success with task_settings max_tokens:
Unknown Provider:
No url/streaming_url:
Not found:
No streaming_url (url is default for both streaming/non-streaming):
No url (streaming_url is default for both streaming/non-streaming):
Perform Chat Completion
Basic:
Complex
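
The URL fallback noted in the endpoint-creation cases above (whichever of url and streaming_url is set serves both modes when the other is missing) can be sketched as follows. The class `EndpointUrlSketch`, the `resolve` method, and the example addresses are hypothetical and exist only for illustration. The sketch also assumes that a configuration with neither URL is rejected, which may differ from the real integration when Google Vertex AI parameters are supplied instead.

```java
// Hypothetical illustration of the url / streaming_url fallback; not the plugin's code.
final class EndpointUrlSketch {

    // Picks the URL for a request, falling back to the other URL when the preferred one is missing.
    static String resolve(String url, String streamingUrl, boolean streaming) {
        if (url == null && streamingUrl == null) {
            // Assumption: with no URL at all this sketch simply rejects the configuration.
            throw new IllegalArgumentException("either url or streaming_url must be set");
        }
        String preferred = streaming ? streamingUrl : url;
        String fallback = streaming ? url : streamingUrl;
        return preferred != null ? preferred : fallback;
    }

    public static void main(String[] args) {
        // Placeholder addresses, not real endpoints.
        String url = "https://example.invalid/rawPredict";
        String streamingUrl = "https://example.invalid/streamRawPredict";
        System.out.println(resolve(url, streamingUrl, true));
        System.out.println(resolve(url, null, true));
        System.out.println(resolve(null, streamingUrl, false));
    }
}
```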