Adding base64-encoded kNN query vectors #140796
Merged
ah89 merged 68 commits into elastic:main on Feb 17, 2026
Allow kNN query vectors as base64 strings as well as JSON arrays, easing use for clients that generate binary vectors. Update parsing, validation, and serialization; add errors for invalid base64 and dimension mismatches. Extend tests and REST API coverage for parity between both inputs. Relates to elastic#138190
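For clients that already hold a vector in binary form, producing the base64 string could look roughly like the sketch below. It assumes the big-endian float layout that a later commit in this PR describes; `QueryVectorEncoder` and `toBase64` are hypothetical names used only for illustration and are not part of Elasticsearch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;

// Hypothetical client-side helper for illustration; not part of Elasticsearch.
final class QueryVectorEncoder {
    private QueryVectorEncoder() {}

    // Packs each float as 4 big-endian bytes, then base64-encodes the result.
    // Assumption: the server expects big-endian IEEE 754 floats.
    static String toBase64(float[] vector) {
        ByteBuffer buffer = ByteBuffer.allocate(vector.length * Float.BYTES)
            .order(ByteOrder.BIG_ENDIAN);
        for (float value : vector) {
            buffer.putFloat(value);
        }
        return Base64.getEncoder().encodeToString(buffer.array());
    }
}
```

For example, `QueryVectorEncoder.toBase64(new float[] {1.0f})` yields `"P4AAAA=="`, because 1.0f is the byte sequence `3F 80 00 00` in big-endian order.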
Collaborator
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Skips the test on versions prior to 9.4.0 to prevent failures, since query_vector_base64 is only supported from 9.4.0 onwards. Relates to elastic#138190
Introduce a dedicated feature flag and transport version to gate base64-encoded kNN query vector support. This ensures backward compatibility in mixed-version clusters and makes feature behavior explicit in serialization and tests. Relates to elastic#138190
benwtrent
reviewed
Jan 16, 2026
Member
benwtrent
left a comment
this is an ok start, but way too much complexity for the user.
...bution/tools/server-cli/src/main/java/org/elasticsearch/server/cli/ServerProcessBuilder.java
Outdated
benchmarks/src/main/java/org/elasticsearch/benchmark/xcontent/KnnQueryVectorParseBenchmark.java
Outdated
...rc/yamlRestTest/resources/rest-api-spec/test/search.vectors/172_knn_query_base64_vectors.yml
server/src/main/java/org/elasticsearch/search/vectors/KnnSearchBuilder.java
Outdated
server/src/main/java/org/elasticsearch/search/vectors/KnnSearchBuilder.java
server/src/main/java/org/elasticsearch/search/vectors/KnnVectorQueryBuilder.java
Outdated
server/src/main/java/org/elasticsearch/search/vectors/KnnVectorQueryBuilder.java
Outdated
server/src/main/java/org/elasticsearch/search/vectors/KnnVectorQueryBuilder.java
Outdated
...er/src/test/java/org/elasticsearch/search/vectors/AbstractKnnVectorQueryBuilderTestCase.java
server/src/test/java/org/elasticsearch/search/vectors/KnnSearchBuilderTests.java
Update test cases and related configuration to use the correct base64 representation for float vectors in kNN searches. Ensure consistency between documented vector encoding and its usage in tests and transport definitions to prevent encoding mismatches and improve reliability. Relates to elastic#138190
...rc/yamlRestTest/resources/rest-api-spec/test/search.vectors/172_knn_query_base64_vectors.yml
Outdated
server/src/main/java/org/elasticsearch/search/SearchFeatures.java
Outdated
server/src/internalClusterTest/java/org/elasticsearch/search/KnnSearchIT.java
Outdated
...er/src/test/java/org/elasticsearch/search/vectors/AbstractKnnVectorQueryBuilderTestCase.java
Outdated
server/src/main/java/org/elasticsearch/search/vectors/KnnSearchBuilder.java
Outdated
server/src/main/java/org/elasticsearch/search/vectors/KnnSearchRequestParser.java
Remove the separate query_vector_base64 field and centralize base64/hex parsing and validation in VectorData.

- Parse VALUE_STRING as hex (if all-hex) or base64 in VectorData.parseXContent.
- Add String-only VectorData constructor and fromBase64 factory method.
- Decode base64 as BIG_ENDIAN floats/bytes and validate dimensions/bounds.
- Gate transport serialization with QUERY_VECTOR_BASE64 transport version.
- Add SearchCapabilities entry and register referable/upper-bounds defs.
- Rewire KnnSearchBuilder/RequestParser/QueryBuilder to use query_vector only.
- Throw on unsupported transport versions instead of silently nulling values.
- Update unit, internal-cluster and YAML REST tests to match new behavior.
- Fix rolling-upgrade/BWC failures by transport-gating and test adjustments.

Ensure hex strings still parse to byte vectors and preserve backwards compatibility via transport version checks.

Relates to elastic#138190
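The string-parsing rules in this commit message (hex if the string is all-hex, otherwise base64 decoded as big-endian floats, with a dimension check) can be sketched as follows. This is a simplified, standalone illustration and not the actual `VectorData` code: `EncodedVectorSketch` and its method names are hypothetical, and the even-length requirement for hex and the exact error messages are assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;

// Hypothetical sketch of the described parsing rules; not the real VectorData.
final class EncodedVectorSketch {
    private EncodedVectorSketch() {}

    // True when the string is non-empty, has even length, and contains only hex digits.
    static boolean isAllHex(String s) {
        if (s.isEmpty() || s.length() % 2 != 0) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean hex = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }

    // Decodes a hex string into a byte vector, two characters per byte.
    static byte[] decodeHex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Decodes base64 into big-endian floats and checks the expected dimension count.
    static float[] decodeBase64Floats(String s, int expectedDims) {
        // Base64.Decoder.decode throws IllegalArgumentException on invalid input.
        byte[] raw = Base64.getDecoder().decode(s);
        if (raw.length % Float.BYTES != 0) {
            throw new IllegalArgumentException("decoded length is not a multiple of 4 bytes");
        }
        int dims = raw.length / Float.BYTES;
        if (dims != expectedDims) {
            throw new IllegalArgumentException("expected " + expectedDims + " dimensions but got " + dims);
        }
        ByteBuffer buffer = ByteBuffer.wrap(raw).order(ByteOrder.BIG_ENDIAN);
        float[] vector = new float[dims];
        for (int i = 0; i < dims; i++) {
            vector[i] = buffer.getFloat();
        }
        return vector;
    }
}
```

One consequence of a hex-first rule is that a base64 string consisting only of hex digits would be read as hex, so a caller of this sketch would need to check `isAllHex` before falling back to `decodeBase64Floats`.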
Increments the version value to reflect updated or new kNN query vectors, ensuring downstream components use the latest configuration.
benwtrent
reviewed
Feb 3, 2026
server/src/main/java/org/elasticsearch/search/vectors/VectorData.java
Outdated
…ta.java Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
Refactors equality logic for clarity and correctness by restructuring instance checks. Enhances error messages for invalid encoded vectors. Simplifies base64 length validation and removes redundant helper methods for maintainability.
server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java
benwtrent
reviewed
Feb 4, 2026
...est/java/org/elasticsearch/search/diversification/DiversifyRetrieverBuilderParsingTests.java
Outdated
...est/java/org/elasticsearch/search/diversification/DiversifyRetrieverBuilderParsingTests.java
Outdated
...er/src/test/java/org/elasticsearch/search/vectors/AbstractKnnVectorQueryBuilderTestCase.java
Outdated
...er/src/test/java/org/elasticsearch/search/vectors/AbstractKnnVectorQueryBuilderTestCase.java
Outdated
benwtrent
reviewed
Feb 4, 2026
...er/src/test/java/org/elasticsearch/search/vectors/AbstractKnnVectorQueryBuilderTestCase.java
Outdated
Refactors vector equality to use a canonical string representation, improving consistency for byte and string vectors. Removes special handling and test logic for base64-encoded vectors across tests, streamlining test code. Adds explicit error for dimension mismatches in query vector decoding to improve error reporting.
benwtrent
reviewed
Feb 6, 2026
Switches serialization of byte vectors from hex strings to arrays for type stability and clarity. Refines equality and hash code implementations to directly compare vector contents rather than canonical strings. Updates tests and related data to support consistent round-trip handling of vectors. Relates to elastic#138190
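Comparing vector contents directly, rather than comparing a canonical string form, might look like the sketch below. `VectorValueSketch` is a hypothetical stand-in for illustration, and the real `VectorData` class may be structured differently.

```java
import java.util.Arrays;

// Hypothetical stand-in holding either a float vector or a byte vector.
final class VectorValueSketch {
    private final float[] floats; // null when this is a byte vector
    private final byte[] bytes;   // null when this is a float vector

    private VectorValueSketch(float[] floats, byte[] bytes) {
        this.floats = floats;
        this.bytes = bytes;
    }

    static VectorValueSketch ofFloats(float[] floats) {
        return new VectorValueSketch(floats.clone(), null);
    }

    static VectorValueSketch ofBytes(byte[] bytes) {
        return new VectorValueSketch(null, bytes.clone());
    }

    // Element-wise comparison; Arrays.equals treats two null arrays as equal.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof VectorValueSketch)) {
            return false;
        }
        VectorValueSketch that = (VectorValueSketch) other;
        return Arrays.equals(floats, that.floats) && Arrays.equals(bytes, that.bytes);
    }

    // Content-based hash that stays consistent with equals.
    @Override
    public int hashCode() {
        return 31 * Arrays.hashCode(floats) + Arrays.hashCode(bytes);
    }
}
```

With this shape, two instances built from equal arrays are equal and share a hash code, while a float vector and a byte vector never compare equal, which keeps round-trip tests type-stable.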
Increases the vector parameter to enhance query results or accommodate new requirements for similarity search.
benwtrent
approved these changes
Feb 13, 2026
Member
benwtrent
left a comment
I think we are there. @mayya-sharipova what do you think?
mayya-sharipova
approved these changes
Feb 17, 2026
Contributor
mayya-sharipova
left a comment
Thanks for iterating and persisting on this PR, @ah89!
This LGTM!
Contributor
Author
@benwtrent @mayya-sharipova Thanks for taking the time to review the PR and for the constructive feedback. I really appreciate it.