Support querying multiple indices with the simplified RRF retriever#134822
Support querying multiple indices with the simplified RRF retriever#134822ioanatia merged 6 commits intoelastic:mainfrom
Conversation
|
Hi @ioanatia, I've created a changelog YAML for you. |
|
Pinging @elastic/search-relevance (Team:Search - Relevance) |
kderusso
left a comment
There was a problem hiding this comment.
Looks great, nice work! 👏
One question on tests.
|
|
||
| - match: { hits.total.value: 6 } | ||
| - length: { hits.hits: 6 } | ||
| - match: { hits.hits.0._id: "4" } |
There was a problem hiding this comment.
Have we tested this with different shard counts, to make sure it won't be flakey in serverless where the default shard count is different or with different randomized shard configurations?
There was a problem hiding this comment.
I have not - but this is just following the same pattern for linear, for which I have not seen any failures.
we are also not matching on the score values, which always has the potential for flakiness.
Mikep86
left a comment
There was a problem hiding this comment.
LGTM! I left one non-blocking comment about a small test change.
| // Non-default rank window size | ||
| retriever = new RRFRetrieverBuilder( | ||
| null, | ||
| List.of("field_1", "field_2", "semantic_field_1", "semantic_field_2"), | ||
| "foo2", | ||
| DEFAULT_RANK_WINDOW_SIZE * 2, | ||
| RRFRetrieverBuilder.DEFAULT_RANK_CONSTANT, | ||
| new float[0] | ||
| ); | ||
| assertMultiIndexMultiFieldsParamsRewrite( | ||
| retriever, | ||
| queryRewriteContext, | ||
| Map.of( | ||
| Map.of("field_1", 1.0f, "field_2", 1.0f), | ||
| List.of(indexName), | ||
| Map.of("field_1", 1.0f, "field_2", 1.0f, "semantic_field_1", 1.0f), | ||
| List.of(anotherIndexName) | ||
| ), | ||
| Map.of( | ||
| new Tuple<>("semantic_field_1", List.of(indexName)), | ||
| 1.0f, | ||
| new Tuple<>("semantic_field_2", List.of(indexName)), | ||
| 1.0f, | ||
| new Tuple<>("semantic_field_2", List.of(anotherIndexName)), | ||
| 1.0f | ||
| ), | ||
| "foo2", | ||
| null | ||
| ); |
There was a problem hiding this comment.
Could we use this to test non-default rank constant as well?
There was a problem hiding this comment.
promise to do it in a follow up - I don't want to keep @mridula-s109 waiting too much on this PR to continue the work on #132680 - this is the second PR that will introduce conflicts for her PR.
* upstream/main: (43 commits) Unmute testAckedIndexing to see if it still fails on main (elastic#134682) Silence time zone ID deprecation warning for JDK 25 due to log4j2 bug. (elastic#134719) Adding a getUnmodifiableSourceAndMetadata() method to IngestDocument (elastic#134816) Mark the create-index-from-source action as publicly available on Serverless (elastic#134953) ESQL: Rename command from INLINESTATS to INLINE STATS (elastic#134827) Document multi index query support for simplified retrievers (elastic#134980) [ML] Fix YAMl test to use correct query parameter type (elastic#134999) [Transform] Wait for PIT to close (elastic#134955) Add XPath to XmlUtils (elastic#134923) Fixing conditional processor mutability bugs (elastic#134936) [Transform] Lower loglevel of 3 transform-related error messages from ERROR to WARN (elastic#134985) Unmute pattern text tests. (elastic#134981) Integrate weights into simplified RRF retriever syntax (elastic#132680) Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:stats.CountDistinctWithConditions} elastic#134993 Update periodic java-ea build to test java 26 pre-release (elastic#134983) Mute org.elasticsearch.xpack.esql.ccq.MultiClusterSpecIT test {csv-spec:stats.CountDistinctWithConditions} elastic#134984 Fix and unmute testIndexSettingProviderPrivateSetting (elastic#134861) Add missing common cat params (elastic#134870) Support querying multiple indices with the simplified RRF retriever (elastic#134822) Allow including semantic field embeddings in _source (elastic#134717) ...
follows the same approach from #133720 where we add support for querying multiple indices for the simplified linear retriever.
#133720 already does most of it, here we just need to remove the validation check that disallowed querying multiple indices and add tests.
the tests are very similar to #133720, I just needed to adjust them slightly since specifying boosts for fields is disallowed for the simplified RRF retriever.