Add support for knn vector queries on semantic_text fields#119011
Add support for knn vector queries on semantic_text fields#119011kderusso merged 22 commits intoelastic:mainfrom
Conversation
|
Hi @kderusso, I've created a changelog YAML for you. |
| */ | ||
|
|
||
| package org.elasticsearch.xpack.ml.vectors; | ||
| package org.elasticsearch.xpack.core.ml.vectors; |
There was a problem hiding this comment.
This had to move from the ml plugin in order to access it in the inference plugin. While I could have moved it to inference, core seemed like a better fit for this standalone class. I did keep the tests in the ml plugin however, because they rely on the ML test runner.
Mikep86
left a comment
There was a problem hiding this comment.
Partial review. Great start to this! As far as edge cases, I think we need some test cases for the following:
- Missing both query vector and query vector builder
- Querying multiple indices that use different inference IDs, particularly when each index has more than
kpotential matches
...e/src/main/java/org/elasticsearch/xpack/core/ml/vectors/TextEmbeddingQueryVectorBuilder.java
Show resolved
Hide resolved
...java/org/elasticsearch/xpack/inference/queries/SemanticKnnVectorQueryRewriteInterceptor.java
Outdated
Show resolved
Hide resolved
...java/org/elasticsearch/xpack/inference/queries/SemanticKnnVectorQueryRewriteInterceptor.java
Show resolved
Hide resolved
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
|
Pinging @elastic/search-eng (Team:SearchOrg) |
|
Pinging @elastic/search-relevance (Team:Search - Relevance) |
carlosdelest
left a comment
There was a problem hiding this comment.
LGTM 💯
Some questions about pulling up some common methods - can be done in follow up work
...e/src/main/java/org/elasticsearch/xpack/core/ml/vectors/TextEmbeddingQueryVectorBuilder.java
Show resolved
Hide resolved
...e/src/main/java/org/elasticsearch/xpack/core/ml/vectors/TextEmbeddingQueryVectorBuilder.java
Outdated
Show resolved
Hide resolved
...java/org/elasticsearch/xpack/inference/queries/SemanticKnnVectorQueryRewriteInterceptor.java
Show resolved
Hide resolved
| import java.io.IOException; | ||
| import java.util.Map; | ||
|
|
||
| public class SemanticKnnVectorQueryRewriteInterceptorTests extends ESTestCase { |
There was a problem hiding this comment.
We may find a common structure for these tests and create a base class for them in follow up work
There was a problem hiding this comment.
Yes, I would like to have this organized better in a followup. Unfortunately, since the interceptor is a QueryBuilder we can't take advantage of built in abstract query builder tests. I was trying to focus more on test functionality first, so we don't back ourselves into a corner. Right now I'm thinking this would be a better followup optimization if the test cases are complete though. WDYT?
There was a problem hiding this comment.
Yes, I'm totally fine with doing that as follow up work 👍
...n/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/47_semantic_text_knn.yml
Outdated
Show resolved
Hide resolved
...n/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/47_semantic_text_knn.yml
Show resolved
Hide resolved
Mikep86
left a comment
There was a problem hiding this comment.
Partial review of everything except for the tests. Good work on this, I have some suggestions about how we can make the intercept logic simpler and more efficient.
...e/src/main/java/org/elasticsearch/xpack/core/ml/vectors/TextEmbeddingQueryVectorBuilder.java
Show resolved
Hide resolved
| createSubQueryForIndices( | ||
| inferenceIdsIndices.get(inferenceId), | ||
| buildNestedQueryFromKnnVectorQuery(queryBuilder, inferenceId) | ||
| ) |
There was a problem hiding this comment.
I think we could update buildNestedQueryFromKnnVectorQuery to take the index list and build the nested knn query that filters on the indices directly in it. This would both simplify the code and be more efficient, as it looks like right now you're effectively using the query builder returned from buildNestedQueryFromKnnVectorQuery as a wrapper for data that you use to build the actual nested query in createSubQueryForIndices.
Another benefit of this approach is that we remove the index post-filtering (via the boolean query), which is no longer necessary since we are pre-filtering on index in the knn query.
There was a problem hiding this comment.
Good call on adding these as a prefilter, however the post filtering can't be removed without breakinng functionality for dense vector combined fields. I think your suggestions helped to clean it up a bit though.
...java/org/elasticsearch/xpack/inference/queries/SemanticKnnVectorQueryRewriteInterceptor.java
Outdated
Show resolved
Hide resolved
...java/org/elasticsearch/xpack/inference/queries/SemanticKnnVectorQueryRewriteInterceptor.java
Show resolved
Hide resolved
...java/org/elasticsearch/xpack/inference/queries/SemanticKnnVectorQueryRewriteInterceptor.java
Outdated
Show resolved
Hide resolved
...java/org/elasticsearch/xpack/inference/queries/SemanticKnnVectorQueryRewriteInterceptor.java
Show resolved
Hide resolved
...java/org/elasticsearch/xpack/inference/queries/SemanticKnnVectorQueryRewriteInterceptor.java
Outdated
Show resolved
Hide resolved
...java/org/elasticsearch/xpack/inference/queries/SemanticKnnVectorQueryRewriteInterceptor.java
Outdated
Show resolved
Hide resolved
...c/test/java/org/elasticsearch/index/query/SemanticKnnVectorQueryRewriteInterceptorTests.java
Outdated
Show resolved
Hide resolved
...c/test/java/org/elasticsearch/index/query/SemanticKnnVectorQueryRewriteInterceptorTests.java
Outdated
Show resolved
Hide resolved
...n/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/47_semantic_text_knn.yml
Outdated
Show resolved
Hide resolved
Mikep86
left a comment
There was a problem hiding this comment.
Thanks for making the changes! There's one more follow-up, I think we can avoid index post-filtering when not using nested queries. Please correct me if I'm wrong though.
...java/org/elasticsearch/xpack/inference/queries/SemanticKnnVectorQueryRewriteInterceptor.java
Outdated
Show resolved
Hide resolved
...java/org/elasticsearch/xpack/inference/queries/SemanticKnnVectorQueryRewriteInterceptor.java
Outdated
Show resolved
Hide resolved
Mikep86
left a comment
There was a problem hiding this comment.
LGTM, thanks for the iterations!
…19011) * First cut at KNN interceptor * Move TextEmbeddingQueryVectorBuilder to xpack-core * Infer model ID * Fix test compilation errors * Update docs/changelog/119011.yaml * Update changelog * Update test * Cleanup * PR feedback * Add yaml test * Cleanup tests * Update test * Minor PR feedback * refactor pre filter indices for knn * PR feedback
…119648) * First cut at KNN interceptor * Move TextEmbeddingQueryVectorBuilder to xpack-core * Infer model ID * Fix test compilation errors * Update docs/changelog/119011.yaml * Update changelog * Update test * Cleanup * PR feedback * Add yaml test * Cleanup tests * Update test * Minor PR feedback * refactor pre filter indices for knn * PR feedback
Supports
knnqueries in the query DSL againstsemantic_textfields. Note that top levelknnsearch using theknnsection is not supported.Example script: