Conversation
|
Hi @Mikep86, I've created a changelog YAML for you. |
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
|
Pinging @elastic/search-eng (Team:SearchOrg) |
|
Pinging @elastic/search-relevance (Team:Search - Relevance) |
server/src/main/java/org/elasticsearch/search/retriever/CompoundRetrieverBuilder.java
Outdated
Show resolved
Hide resolved
...lugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilder.java
Show resolved
Hide resolved
...lugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilder.java
Show resolved
Hide resolved
| RetrieverBuilder rewritten = this; | ||
|
|
||
| ResolvedIndices resolvedIndices = ctx.getResolvedIndices(); | ||
| if (resolvedIndices != null && query != null) { |
There was a problem hiding this comment.
We can still add a global prefilter on the top-level retriever, right? Should we also account for it when generating the expansions or is this something not to be supported?
There was a problem hiding this comment.
That should work as is since rewriting with pre-filters happens after this rewrite logic. I can add a test to confirm.
There was a problem hiding this comment.
A couple of things to follow up on here:
- There are a whole class of bugs in the retriever framework right now caused by incomplete copies during rewrite. We should adopt a copy constructor approach (similar to how queries handle copy generation) to address all of these issues. This is outside the scope of this PR though, so I did the quick fix for now.
- I would have loved to add a unit test that has more access to the retriever structure to verify that the filters are being propagated as pre-filters, but that is difficult to do right now with the current rewrite logic. We rewrite all the way to a
RankDocsRetrieverBuilder, with no stopping cue for when all the pre-search rewrite activities (such as pre-filter propagation) are complete. It would help a lot with testability if we could refactor the rewrite process to add such a stopping cue. In the meantime, I added a YAML test to cover this, but that doesn't give us visibility into whether the filters are applied as actual pre-filters.
...rrf/src/main/java/org/elasticsearch/xpack/rank/simplified/SimplifiedInnerRetrieverUtils.java
Show resolved
Hide resolved
|
Just finished a first pass, seems really nice :) Will we also include doc changes & additional examples in |
...lugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilder.java
Outdated
Show resolved
Hide resolved
...-rrf/src/yamlRestTest/resources/rest-api-spec/test/linear/20_linear_retriever_simplified.yml
Outdated
Show resolved
Hide resolved
...-rrf/src/yamlRestTest/resources/rest-api-spec/test/linear/20_linear_retriever_simplified.yml
Outdated
Show resolved
Hide resolved
...-rrf/src/yamlRestTest/resources/rest-api-spec/test/linear/20_linear_retriever_simplified.yml
Show resolved
Hide resolved
Yes, all that will come in a follow-up docs-focused PR :) |
jimczi
left a comment
There was a problem hiding this comment.
I like the approach, @Mikep86, it’s less invasive than I expected! 😉
I do have some concerns about the growing number of parameters in the linear retriever that aren't compatible with each other, but that's an inherent trade-off with the decision to add this functionality there, so I’m comfortable with it.
...rrf/src/main/java/org/elasticsearch/xpack/rank/simplified/SimplifiedInnerRetrieverUtils.java
Outdated
Show resolved
Hide resolved
| - match: { hits.hits.1._id: "2" } | ||
| - lte: { hits.hits.1._score: 2.0 } | ||
| - match: { hits.hits.2._id: "1" } | ||
| - lte: { hits.hits.2._score: 2.0 } |
There was a problem hiding this comment.
Can we also add a test with a filter? We need to make sure that the filter is propagated correctly.
There was a problem hiding this comment.
Yep, working on a unit test to verify pre-filter propagation now
There was a problem hiding this comment.
...rf/src/test/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilderParsingTests.java
Show resolved
Hide resolved
| @@ -0,0 +1,5 @@ | |||
| pr: 129200 | |||
| summary: Simplified Linear Retriever | |||
There was a problem hiding this comment.
Does this need to be updated? Probably something like Add simplified syntax and hybrid support to linear retriever.
There was a problem hiding this comment.
I think this description is still succinct and accurate, good as is
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
(cherry picked from commit fc77640) # Conflicts: # x-pack/plugin/inference/build.gradle # x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/RankRRFFeatures.java # x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/rrf/RRFRankBuilder.java
(cherry picked from commit fc77640) # Conflicts: # x-pack/plugin/inference/build.gradle # x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/RankRRFFeatures.java # x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/rrf/RRFRankBuilder.java
Adds a simplified syntax for the
linearretriever:fieldsis optional. If it is not provided, we query the fields defined by theindex.query.default_fieldindex setting (which is*by default).This syntax automatically handles querying a mix of lexical fields (i.e. fields that support lexical search via
match) andsemantic_textfields. The fields are divided into lexical and semantic groups to create a 50/50 weight distribution between the two in the final score. This is achieved by creating a retriever tree that looks like:The end result is a score that ranges between 0-2, with up to 1 coming from the lexical matches and up to 1 coming from the semantic matches.
Common logic for generating the retriever tree is in
SimplifiedInnerRetrieverUtils, which will also be used by the simplifiedrrfretriever (see #128633 for a preview of that).