Adding base64-encoded kNN query vectors #140796

Merged
ah89 merged 68 commits into elastic:main from ah89:feature/knn-base64-query-vector on Feb 17, 2026

ah89 (Contributor) commented Jan 16, 2026

Allow kNN query vectors as base64 strings as well as JSON arrays, easing
use for clients that generate binary vectors. Update parsing,
validation, and serialization; add errors for invalid base64 and
dimension mismatches. Extend tests and REST API coverage for parity
between both inputs.

Relates to #138190
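
For readers producing such strings on the client side, here is a minimal sketch of the encoding step. It assumes each float is packed as 4 big-endian bytes before base64 encoding (a later commit message in this conversation mentions BIG_ENDIAN decoding); the class and method names are hypothetical and not part of Elasticsearch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;

/** Hypothetical client-side helper; not part of Elasticsearch or its clients. */
final class QueryVectorEncoder {
    private QueryVectorEncoder() {}

    /** Packs each float as 4 big-endian bytes, then base64-encodes the whole buffer. */
    static String toBase64(float[] vector) {
        ByteBuffer buffer = ByteBuffer.allocate(vector.length * Float.BYTES)
            .order(ByteOrder.BIG_ENDIAN); // assumed byte order, see lead-in
        for (float value : vector) {
            buffer.putFloat(value);
        }
        return Base64.getEncoder().encodeToString(buffer.array());
    }
}
```

With this packing, the one-dimensional vector `[1.0]` becomes `P4AAAA==`, which would be sent as a string in place of the JSON array.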

ah89 added the >feature label Jan 16, 2026
ah89 requested a review from a team as a code owner January 16, 2026 02:27
ah89 added the :Search Relevance/Vectors and Team:Search Relevance labels Jan 16, 2026
elasticsearchmachine (Collaborator)

Pinging @elastic/es-search-relevance (Team:Search Relevance)

ah89 and others added 5 commits January 15, 2026 20:13
Skips the test on versions prior to 9.4.0 to prevent failures,
since query_vector_base64 is only supported from 9.4.0 onwards.

Relates to elastic#138190
Introduce a dedicated feature flag and transport version to gate
base64-encoded kNN query vector support. This ensures backward
compatibility in mixed-version clusters and makes feature behavior
explicit in serialization and tests.

Relates to elastic#138190
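
The gating idea in the commit message above can be sketched roughly as follows. This is not Elasticsearch's transport code: the version constant, the wire layout, and the use of plain `DataOutputStream` are all assumptions made for illustration. The sketch fails loudly when a string-encoded vector would go to an older receiver, which is the behavior a later commit message in this PR also describes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical illustration of version-gated serialization; not Elasticsearch code. */
final class GatedVectorWriter {
    /** Made-up version id from which receivers understand string-encoded vectors. */
    static final int QUERY_VECTOR_BASE64_VERSION = 9_004_000;

    private GatedVectorWriter() {}

    /** Exactly one of {@code values} and {@code encoded} is expected to be non-null. */
    static byte[] write(int receiverVersion, float[] values, String encoded) {
        boolean receiverSupportsStrings = receiverVersion >= QUERY_VECTOR_BASE64_VERSION;
        if (encoded != null && !receiverSupportsStrings) {
            // An older node cannot read the new form, so refuse instead of dropping the vector.
            throw new IllegalArgumentException("receiver does not support string-encoded query vectors");
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (receiverSupportsStrings) {
                out.writeBoolean(encoded != null); // marker byte that only newer nodes expect
            }
            if (encoded != null) {
                out.writeUTF(encoded);
            } else {
                out.writeInt(values.length);
                for (float value : values) {
                    out.writeFloat(value);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }
}
```

Older receivers get exactly the bytes they got before the change, so float-array requests keep working in a mixed-version cluster.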
benwtrent (Member) left a comment

this is an ok start, but way too much complexity for the user.

ah89 and others added 2 commits January 16, 2026 09:42
Update test cases and related configuration to use the correct base64
representation for float vectors in kNN searches. Ensure consistency
between documented vector encoding and its usage in tests and transport
definitions to prevent encoding mismatches and improve reliability.

Relates to elastic#138190
ah89 and others added 4 commits January 20, 2026 06:52
Remove the separate query_vector_base64 field and centralize base64/hex
parsing and validation in VectorData.

- Parse VALUE_STRING as hex (if all-hex) or base64 in VectorData.parseXContent.
- Add String-only VectorData constructor and fromBase64 factory method.
- Decode base64 as BIG_ENDIAN floats/bytes and validate dimensions/bounds.
- Gate transport serialization with QUERY_VECTOR_BASE64 transport version.
- Add SearchCapabilities entry and register referable/upper-bounds defs.
- Rewire KnnSearchBuilder/RequestParser/QueryBuilder to use query_vector only.
- Throw on unsupported transport versions instead of silently nulling values.
- Update unit, internal-cluster and YAML REST tests to match new behavior.
- Fix rolling-upgrade/BWC failures by transport-gating and test adjustments.

Ensure hex strings still parse to byte vectors and preserve backwards
compatibility via transport version checks.

Relates to elastic#138190
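
The string-handling rules listed in this commit message (hex first if the string is all hex, otherwise base64 decoded as big-endian floats with a dimension check) might look roughly like the sketch below. It is not the actual `VectorData` code: the class name, method names, error messages, and the even-length requirement for hex are assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;

/** Hypothetical sketch of string-encoded vector parsing; not the real VectorData. */
final class EncodedVectorParser {
    private EncodedVectorParser() {}

    private static boolean isHexChar(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    /** True when the string is non-empty, even-length (assumption), and contains only hex digits. */
    static boolean isAllHex(String s) {
        if (s.isEmpty() || s.length() % 2 != 0) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!isHexChar(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** Decodes a hex string (already checked with isAllHex) into a byte vector. */
    static byte[] decodeHex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(s.charAt(2 * i), 16);
            int lo = Character.digit(s.charAt(2 * i + 1), 16);
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    /** Decodes base64 into big-endian floats and checks the dimension count. */
    static float[] decodeBase64Floats(String s, int expectedDims) {
        byte[] raw;
        try {
            raw = Base64.getDecoder().decode(s);
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("query vector is not valid base64", e);
        }
        if (raw.length % Float.BYTES != 0) {
            throw new IllegalArgumentException("decoded length " + raw.length + " is not a multiple of 4");
        }
        int dims = raw.length / Float.BYTES;
        if (dims != expectedDims) {
            throw new IllegalArgumentException("expected " + expectedDims + " dimensions but got " + dims);
        }
        float[] out = new float[dims];
        ByteBuffer.wrap(raw).order(ByteOrder.BIG_ENDIAN).asFloatBuffer().get(out);
        return out;
    }
}
```

A string such as `abcd1234` is both valid hex and valid base64. Checking hex first, as the commit message orders it, keeps existing hex-encoded byte vectors parsing the way they did before.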
Increments the version value to reflect updated or new kNN query
vectors, ensuring downstream components use the latest configuration.
ah89 and others added 3 commits February 3, 2026 11:33
…ta.java

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
Refactors equality logic for clarity and correctness by restructuring instance checks.
Enhances error messages for invalid encoded vector.
Simplifies base64 length validation and removes redundant helper methods for maintainability.
ah89 requested a review from benwtrent February 4, 2026 07:04
ah89 and others added 3 commits February 4, 2026 15:04
Refactors vector equality to use a canonical string representation,
improving consistency for byte and string vectors. Removes special
handling and test logic for base64-encoded vectors across tests,
streamlining test code. Adds explicit error for dimension mismatches
in query vector decoding to improve error reporting.
ah89 and others added 5 commits February 12, 2026 10:20
Switches serialization of byte vectors from hex strings to arrays for type stability and clarity.
Refines equality and hash code implementations to directly compare vector contents rather than canonical strings.
Updates tests and related data to support consistent round-trip handling of vectors.
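
Comparing vector contents directly, rather than through a canonical string, can be sketched as below. The class is a hypothetical stand-in with one float array and one byte array; the real `VectorData` may be structured differently.

```java
import java.util.Arrays;

/** Hypothetical value class holding either a float vector or a byte vector. */
final class VectorValue {
    private final float[] floats; // null when this is a byte vector
    private final byte[] bytes;   // null when this is a float vector

    private VectorValue(float[] floats, byte[] bytes) {
        this.floats = floats;
        this.bytes = bytes;
    }

    static VectorValue ofFloats(float[] values) {
        return new VectorValue(values.clone(), null);
    }

    static VectorValue ofBytes(byte[] values) {
        return new VectorValue(null, values.clone());
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof VectorValue)) {
            return false;
        }
        VectorValue other = (VectorValue) o;
        // Arrays.equals treats two null arrays as equal, so each side needs only one check.
        return Arrays.equals(floats, other.floats) && Arrays.equals(bytes, other.bytes);
    }

    @Override
    public int hashCode() {
        // Arrays.hashCode returns 0 for null, keeping hashCode consistent with equals.
        return 31 * Arrays.hashCode(floats) + Arrays.hashCode(bytes);
    }
}
```

In this sketch a byte vector and a float vector never compare equal, even when their numeric values match, because the two are held in different fields.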

Relates to elastic#138190
Increases the vector parameter to enhance query results
or accommodate new requirements for similarity search.
benwtrent (Member) left a comment

I think we are there. @mayya-sharipova what do you think?

mayya-sharipova (Contributor) left a comment

Thanks for iterating and persisting on this PR, @ah89!
This LGTM!

ah89 (Contributor, Author) commented Feb 17, 2026

@benwtrent @mayya-sharipova Thanks for taking the time to review the PR and for the constructive feedback—really appreciate it.

ah89 enabled auto-merge (squash) February 17, 2026 15:48
ah89 merged commit 3011ff6 into elastic:main Feb 17, 2026
35 checks passed

Labels

>feature, :Search Relevance/Vectors, Team:Search Relevance, v9.4.0

5 participants