ESQL - Add byte element support for dense_vector data type#131863
Conversation
…vector-byte-element-support # Conflicts: # x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/DenseVectorFieldTypeIT.java
| private final FloatVectorValues floatVectorValues; | ||
| private final KnnVectorValues.DocIndexIterator iterator; | ||
| private final int dimensions; | ||
| private abstract static class DenseVectorValuesBlockReader<T extends KnnVectorValues> extends BlockDocValuesReader { |
There was a problem hiding this comment.
Added abstract classes to deal with common code between float and byte vector reading
| * @param vectorBR - dense vector encoded in BytesRef | ||
| * @param vector - array of bytes where the decoded vector should be stored | ||
| */ | ||
| public static void decodeDenseVector(IndexVersion indexVersion, BytesRef vectorBR, byte[] vector) { |
There was a problem hiding this comment.
Needed a specific method for decoding dense vector of byte values - this is an adaptation of the existing float[] method (expand up to see it)
There was a problem hiding this comment.
A copy of the float vector tests, using the specific byte field
|
|
||
| FROM dense_vector | ||
| | KEEP id, vector | ||
| | KEEP id, float_vector |
There was a problem hiding this comment.
Renamed fields to avoid confusion to float_vector, byte_vector
| // Indexed field types | ||
| for (String indexType : DENSE_VECTOR_INDEX_TYPES) { | ||
| params.add(new Object[] { indexType, true, false }); | ||
| for (String indexType : ALL_DENSE_VECTOR_INDEX_TYPES) { |
There was a problem hiding this comment.
Check all types for float and bytes. Quantized types only make sense for floats.
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
ioanatia
left a comment
There was a problem hiding this comment.
the approach makes sense to me, but there are some failures in KnnFunctionIT that we should look into
| /** | ||
| * Byte elements dense vector field type support. | ||
| */ | ||
| DENSE_VECTOR_FIELD_TYPE_BYTE_ELEMENTS(EsqlCorePlugin.DENSE_VECTOR_FEATURE_FLAG); |
There was a problem hiding this comment.
this make sense to me for now - when we get to take these features out of snapshot, maybe we can create single capability to encompass all the features, from dense_vector support to the knn function and brute force functions and we can remove the individual ones we have created for each step.
x-pack/plugin/esql/qa/testFixtures/src/main/resources/data/dense_vector.csv
Show resolved
Hide resolved
tteofili
left a comment
There was a problem hiding this comment.
LGTM, great job Carlos!
…r-support-normalization' into non-issue/esql-dense-vector-support-normalization # Conflicts: # x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/DenseVectorFieldTypeIT.java
… non-issue/esql-dense-vector-byte-element-support # Conflicts: # server/src/main/java/org/elasticsearch/index/mapper/BlockDocValuesReader.java # x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/DenseVectorFieldTypeIT.java # x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/plugin/KnnFunctionIT.java
…r-byte-element-support' into non-issue/esql-dense-vector-byte-element-support # Conflicts: # x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/DenseVectorFieldTypeIT.java
…r-byte-element-support' into non-issue/esql-dense-vector-byte-element-support
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
ChrisHegarty
left a comment
There was a problem hiding this comment.
This seems like a good first step. LGTM
|
Merging as CI failures are unrelated |
* upstream/main: (278 commits) ESQL - dense vector support cosine normalization (elastic#132721) [ML] Add support for dimensions in google vertex ai request (elastic#132689) ESQL - Add byte element support for dense_vector data type (elastic#131863) ESQL: Fix async operator warnings not always sent when blocking (elastic#132744) Method not needed anymore (elastic#132912) [Test] Excercise shutdown more reliably in snapshot stress IT (elastic#132909) Update Gradle shadow plugin to 9.0.1 (elastic#132637) Mute org.elasticsearch.test.rest.yaml.CcsCommonYamlTestSuiteIT test {p0=search/410_named_queries/named_queries_with_score} elastic#132906 Update docker.elastic.co/wolfi/chainguard-base-fips:latest Docker digest to fa6cb69 (elastic#132735) Remove unnecessary calls to fold() (elastic#131870) Use consistent terminology for transport version resources/references (elastic#132882) Mute org.elasticsearch.test.rest.yaml.CcsCommonYamlTestSuiteIT test {p0=search.vectors/40_knn_search_cosine/kNN search only regular query} elastic#132890 Finalize release notes for v9.1.2 release (elastic#132745) Finalize release notes for v9.0.5 release (elastic#132718) Move inner records out of TransportVersionUtils (elastic#132872) Add support for Lookup Join on Multiple Fields (elastic#131559) Bootstrap PR-based benchmarks (elastic#132717) Refactor MetadataIndexTemplateService to use template maps instead of project metadata (elastic#132662) [Gradle] Update nebula ospackage plugin to 12.1.0 (elastic#132640) Mute org.elasticsearch.xpack.esql.CsvTests test {csv-spec:ip.CdirMatchEqualsInsOrs} elastic#132860 ...
Adds code and tests for supporting byte dense_vectors.
Internally, byte vectors get widened to floats. This is inefficient, but avoids having to create specific infrastructure for variable type blocks. This can be improved in the future.