Support querying multiple indices with the simplified linear retriever by ioanatia · Pull Request #133720 · elastic/elasticsearch

ioanatia · 2025-08-28T14:11:21Z

This only handles the linear retriever, but the same approach can be used for the RRF retriever.
I just haven't removed the restriction for RRF yet; we can do it in a follow-up.
It should be as simple as removing the check we have for RRF and adding tests.

And once we do this change for both retrievers, we can adjust the docs too.

When we rewrite the linear retriever, we form two sub-retrievers:

linear
   standard retriever with lexical query, normalizer:<specified_normalizer>
   linear, normalizer:<specified_normalizer>
      standard retriever normalizer:<specified_normalizer>
          match on semantic_text field 1,
      standard retriever normalizer:<specified_normalizer>
          match on semantic_text field 2
      ...

The lexical query uses a multi_match query at the moment.
For the semantic part, we group by (inferenceId, field_name) and issue a match query for each group.
For each group, if the semantic_text field does not exist in all queried indices, we add an additional filter on the index names.
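A minimal Java sketch of the semantic grouping step. It assumes we already know, for every queried index, which fields are mapped as semantic_text and with which inference ID; the real rewrite works from whatever information is available at rewrite time, which this sketch does not model. `SemanticGrouping`, `SemanticGroup`, `groupSemanticFields`, `needsIndexFilter` and the inference IDs are hypothetical names for illustration only. Each group records the indices that contain it, and an `_index` filter is only needed when that set does not cover every queried index:

```java
import java.util.*;

class SemanticGrouping {
    // Hypothetical grouping key: one match query is issued per (inferenceId, fieldName) pair.
    record SemanticGroup(String inferenceId, String fieldName) {}

    /**
     * Groups semantic_text fields across indices.
     *
     * @param semanticFieldsPerIndex index name -> (semantic_text field name -> inference ID)
     * @return each (inferenceId, fieldName) group mapped to the indices that contain it
     */
    static Map<SemanticGroup, SortedSet<String>> groupSemanticFields(
            Map<String, Map<String, String>> semanticFieldsPerIndex) {
        Map<SemanticGroup, SortedSet<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> index : semanticFieldsPerIndex.entrySet()) {
            for (Map.Entry<String, String> field : index.getValue().entrySet()) {
                SemanticGroup key = new SemanticGroup(field.getValue(), field.getKey());
                groups.computeIfAbsent(key, k -> new TreeSet<>()).add(index.getKey());
            }
        }
        return groups;
    }

    /** An extra filter on _index is needed only when some queried index lacks the group. */
    static boolean needsIndexFilter(Set<String> groupIndices, Set<String> queriedIndices) {
        return !groupIndices.containsAll(queriedIndices);
    }

    public static void main(String[] args) {
        // Hypothetical inference IDs; only indexB maps semantic_field_3 as semantic_text.
        Map<String, Map<String, String>> mappings = new TreeMap<>();
        mappings.put("indexA", Map.of("semantic_field_1", "inference-1", "semantic_field_2", "inference-2"));
        mappings.put("indexB", Map.of("semantic_field_1", "inference-1", "semantic_field_2", "inference-2",
                "semantic_field_3", "inference-3"));
        Set<String> queried = mappings.keySet();
        groupSemanticFields(mappings).forEach((group, indices) -> System.out.println(
                group.fieldName() + " on " + indices
                        + (needsIndexFilter(indices, queried) ? " + filter on _index" : "")));
    }
}
```

Because the key includes the inference ID, a field that uses different inference IDs in different indices ends up in separate groups, each with its own index filter.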

Some examples:

Let's assume we have indexA and indexB, and both have the same semantic_text fields, semantic_field_1 and semantic_field_2, using the same inference IDs.
And that we query lexical_field_*, semantic_field_1, and semantic_field_2.

The linear retriever will be rewritten to:

linear
  // Lexical group
  standard retriever using multi_match on `lexical_field_*` normalizer:minmax
  // Semantic group
  linear normalizer:minmax
    standard { match on semantic_field_1}, weight:1 normalizer:minmax
    standard { match on semantic_field_2}, weight:1 normalizer:minmax

Let's now assume that indexB has an extra semantic_field_3 and that we query:
["lexical_field_*^3", "semantic_field_1^10", "semantic_field_2^20", "semantic_field_3^30"].

The linear retriever will be rewritten to:

linear
  // Lexical group
  standard retriever `lexical_field_*` normalizer:minmax
   bool:
      should:
         multi_match ["lexical_field_*^3", "semantic_field_3^30"] filter on indexA 
         multi_match ["lexical_field_*^3"] filter on indexB
  // Semantic group
  linear normalizer:minmax
    standard { match on semantic_field_1}, weight:1 normalizer:minmax
    standard { match on semantic_field_2}, weight:1 normalizer:minmax
    standard { match on semantic_field_3, filter on indexB}, weight:1 normalizer:minmax
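The lexical group can be sketched the same way. Assuming the same hypothetical per-index knowledge of semantic_text fields, each index's lexical field list is the requested list minus the fields that are semantic_text in that index (boosts are kept, and wildcard patterns such as `lexical_field_*` are deliberately not expanded). Indices with identical lists share one multi_match clause. `LexicalGrouping` and its methods are illustrative names, not the code in this PR:

```java
import java.util.*;

class LexicalGrouping {
    /** Strips an optional boost suffix, e.g. "semantic_field_3^30" -> "semantic_field_3". */
    static String fieldName(String fieldSpec) {
        int caret = fieldSpec.indexOf('^');
        return caret < 0 ? fieldSpec : fieldSpec.substring(0, caret);
    }

    /**
     * Groups indices by the lexical field list they should be queried with.
     * Assumption: a requested field is treated as semantic only when its exact name
     * is a semantic_text field in that index; wildcards are not expanded.
     */
    static Map<List<String>, SortedSet<String>> groupIndicesByLexicalFields(
            List<String> requestedFields, Map<String, Set<String>> semanticFieldsPerIndex) {
        Map<List<String>, SortedSet<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> index : semanticFieldsPerIndex.entrySet()) {
            List<String> lexical = new ArrayList<>();
            for (String spec : requestedFields) {
                if (!index.getValue().contains(fieldName(spec))) {
                    lexical.add(spec);
                }
            }
            if (lexical.isEmpty()) {
                continue; // nothing lexical to query in this index, so no clause for it
            }
            groups.computeIfAbsent(List.copyOf(lexical), k -> new TreeSet<>()).add(index.getKey());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> fields = List.of("lexical_field_*^3", "semantic_field_1^10", "semantic_field_2^20", "semantic_field_3^30");
        Map<String, Set<String>> semantic = new TreeMap<>();
        semantic.put("indexA", Set.of("semantic_field_1", "semantic_field_2"));
        semantic.put("indexB", Set.of("semantic_field_1", "semantic_field_2", "semantic_field_3"));
        Map<List<String>, SortedSet<String>> groups = groupIndicesByLexicalFields(fields, semantic);
        if (groups.size() == 1) {
            // Every index queries the same fields: one multi_match, no index filter.
            System.out.println("multi_match " + groups.keySet().iterator().next());
        } else {
            // Fields differ per index: one should clause per distinct list, filtered on _index.
            groups.forEach((lexical, indices) ->
                    System.out.println("should: multi_match " + lexical + " filter on _index " + indices));
        }
    }
}
```

With the inputs from the example above, indexA keeps `semantic_field_3^30` in its lexical list while indexB does not, so the sketch prints two should clauses, each filtered on `_index`, matching the rewrite shown.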

If only a single semantic_text field is queried, we don't need two levels of normalization, so we rewrite to:

linear
  standard retriever for lexical query, normalizer:minmax
  standard retriever, normalizer:minmax
     match on semantic_text field
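The decision between one and two levels of normalization can be sketched with a toy tree. `RewriteShape`, `Standard` and `Linear` are hypothetical records that only carry a query label and a normalizer name; they are not the real retriever builders. A single semantic retriever is attached directly to the top-level linear node, while several are first combined in a nested linear node:

```java
import java.util.*;

class RewriteShape {
    // Hypothetical, simplified nodes used only to illustrate the shape of the rewrite.
    interface Node {}
    record Standard(String query, String normalizer) implements Node {}
    record Linear(List<Node> children, String normalizer) implements Node {}

    /** Builds the top-level linear node; nests semantic retrievers only when there are several. */
    static Linear rewrite(Standard lexical, List<Standard> semantic, String normalizer) {
        List<Node> top = new ArrayList<>();
        if (lexical != null) {
            top.add(lexical);
        }
        if (semantic.size() == 1) {
            top.add(semantic.get(0)); // one semantic retriever: a single level of normalization
        } else if (semantic.size() > 1) {
            top.add(new Linear(List.<Node>copyOf(semantic), normalizer)); // normalize semantic scores together first
        }
        return new Linear(List.copyOf(top), null);
    }

    /** Number of linear levels on the deepest path of the tree. */
    static int linearDepth(Node node) {
        if (node instanceof Linear linear) {
            int max = 0;
            for (Node child : linear.children()) {
                max = Math.max(max, linearDepth(child));
            }
            return 1 + max;
        }
        return 0;
    }

    public static void main(String[] args) {
        Standard lexical = new Standard("multi_match lexical_field_*", "minmax");
        Standard s1 = new Standard("match semantic_field_1", "minmax");
        Standard s2 = new Standard("match semantic_field_2", "minmax");
        System.out.println(linearDepth(rewrite(lexical, List.of(s1), "minmax")));     // prints 1
        System.out.println(linearDepth(rewrite(lexical, List.of(s1, s2), "minmax"))); // prints 2
    }
}
```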

elasticsearchmachine · 2025-08-28T14:11:47Z

Hi @ioanatia, I've created a changelog YAML for you.

ioanatia · 2025-08-28T14:32:58Z

...ugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/MultiFieldsInnerRetrieverUtils.java

        }
-        return innerRetrievers;
+        // there are no lexical fields that need to be queried, no need to create a retriever
+        if (lexicalQueryBuilders.isEmpty()) {


We could be in this situation when:

No query fields are provided, so we fall back to the index.query.default_field index setting, and not all indices have the same value for this setting, resulting in a different list of fields to query per index. This case is uncommon.

One of the fields from the list that is provided to the linear retriever is a semantic_text field that is not present in all indices. This will be more common. It could be that a field with the same name does not even exist in all the other indices, or that it was mapped to another mapping type. We don't really know since we don't have access to the mappings.

From what I tested, the current approach, where we compose a boolean query and filter by index names, seems to mitigate both cases.
If we find it too complicated, I can change this to raise an exception for both of these cases, so that we always use a single multi match query.


Mikep86

Nice work! I reviewed just the core logic for now, not the tests. I have some suggestions about how we may be able to simplify and handle more edge cases at the same time.

...ugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/MultiFieldsInnerRetrieverUtils.java

Mikep86 · 2025-09-02T20:03:29Z

...ugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/MultiFieldsInnerRetrieverUtils.java

        }
-        return innerRetrievers;
+        // there are no lexical fields that need to be queried, no need to create a retriever
+        if (lexicalQueryBuilders.isEmpty()) {


One of the fields from the list that is provided to the linear retriever is a semantic_text field that is not present in all indices. This will be more common. It could be that a field with the same name does not even exist in all the other indices, or that it was mapped to another mapping type. We don't really know since we don't have access to the mappings.

Agreed, this will be more common; common enough that IMO we need to handle it. I think it's also worth pointing out that we can be in this case when differentNonInferenceFields == true and lexicalQueryBuilders is not empty.

...ugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/MultiFieldsInnerRetrieverUtils.java

elasticsearchmachine · 2025-09-09T10:08:55Z

Pinging @elastic/search-relevance (Team:Search - Relevance)

ioanatia · 2025-09-10T12:34:10Z

.../rank-rrf/src/test/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilderTests.java

+                assertFalse("the lexical retriever is only asserted once", assertedLexical);
+                assertFalse(expectedNonInferenceFields.isEmpty());
+
+                QueryBuilder topDocsQueryBuilder = standardRetrieverBuilder.topDocsQuery();


The reason I went through these hoops for the lexical query is that we are comparing boolean queries,
and AFAICS I can't guarantee the order of the should clauses, which are held in a List<QueryBuilder>.

When we compare two boolean queries, we check that they have the same list of should clauses:

elasticsearch/server/src/main/java/org/elasticsearch/index/query/BoolQueryBuilder.java

Lines 333 to 340 in 5a0636a

protected boolean doEquals(BoolQueryBuilder other) {
    return Objects.equals(adjustPureNegative, other.adjustPureNegative)
        && Objects.equals(minimumShouldMatch, other.minimumShouldMatch)
        && Objects.equals(mustClauses, other.mustClauses)
        && Objects.equals(shouldClauses, other.shouldClauses)
        && Objects.equals(mustNotClauses, other.mustNotClauses)
        && Objects.equals(filterClauses, other.filterClauses);
}

So here I couldn't just use a simple assertEquals check as we had before.
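As a plain-Java illustration of the problem: `Objects.equals` on two lists delegates to `List.equals`, which is order-sensitive, so the same should clauses in a different order compare as unequal. A minimal sketch of an order-insensitive check follows, using strings in place of `QueryBuilder` instances; `ClauseComparison.sameElementsIgnoringOrder` is a hypothetical helper, not the approach taken in the test above. It counts occurrences instead of converting to a `Set`, so duplicate clauses still matter, and it relies on the element type implementing `equals` and `hashCode` consistently:

```java
import java.util.*;

class ClauseComparison {
    /** True if both lists hold the same elements with the same multiplicities, in any order. */
    static <T> boolean sameElementsIgnoringOrder(List<T> left, List<T> right) {
        if (left.size() != right.size()) {
            return false;
        }
        Map<T, Integer> counts = new HashMap<>();
        for (T item : left) {
            counts.merge(item, 1, Integer::sum);
        }
        for (T item : right) {
            Integer remaining = counts.get(item);
            if (remaining == null) {
                return false; // item is missing from left, or right has more copies of it
            }
            if (remaining == 1) {
                counts.remove(item);
            } else {
                counts.put(item, remaining - 1);
            }
        }
        return counts.isEmpty();
    }

    public static void main(String[] args) {
        List<String> expected = List.of("multi_match [indexA fields]", "multi_match [indexB fields]");
        List<String> actual = List.of("multi_match [indexB fields]", "multi_match [indexA fields]");
        System.out.println(Objects.equals(expected, actual));            // prints false
        System.out.println(sameElementsIgnoringOrder(expected, actual)); // prints true
    }
}
```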

Mikep86

Looks good, awesome work 🚀 ! I left some minor comments about tests, they should be very easy to address.

Mikep86 · 2025-09-10T13:53:53Z

.../rank-rrf/src/test/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilderTests.java

+                queryBuilder = new BoolQueryBuilder().must(queryBuilder).filter(new TermsQueryBuilder("_index", indices));
+            }
+            return new InnerRetriever(new StandardRetrieverBuilder(queryBuilder), weight, expectedNormalizer);
+        }).collect(Collectors.toSet());


Can we check that rankWindowSize is propagated to the rewritten retriever as expected?

Mikep86 · 2025-09-10T15:12:39Z

.../rank-rrf/src/test/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilderTests.java

+            MinMaxScoreNormalizer.INSTANCE,
+            DEFAULT_RANK_WINDOW_SIZE,
+            new float[0],
+            new ScoreNormalizer[0]


Nit: Use indexName instead of a magic string here? Repeat where "test-index" is referenced below.

Mikep86 · 2025-09-10T15:13:07Z

.../rank-rrf/src/test/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilderTests.java

+            new float[0],
+            new ScoreNormalizer[0]
+        );
+        assertMultiIndexMultiFieldsParamsRewrite(


Nit: Use anotherIndexName instead of a magic string here? Repeat where "test-another-index" is referenced below.

Multi index support for simplified retrievers

f6b26ad

ioanatia added >enhancement, :SearchOrg/Relevance, Team:Search - Relevance, and v9.2.0 labels Aug 28, 2025

Update docs/changelog/133720.yaml

c97afae

ioanatia commented Aug 28, 2025

comment

6f7bcef

Mikep86 reviewed Sep 2, 2025

ioanatia and others added 4 commits September 4, 2025 12:42

Merge branch 'main' into multi_index_support_simplified

f3fa897

Add more tests and refactor the generation of the lexical group

22c4618

Merge branch 'main' into multi_index_support_simplified

5184425

Add comments

7e4a231

ioanatia marked this pull request as ready for review September 9, 2025 10:08

ioanatia requested a review from Mikep86 September 9, 2025 10:08

ioanatia commented Sep 10, 2025

Mikep86 approved these changes Sep 10, 2025

ioanatia and others added 3 commits September 11, 2025 09:42

Address feedback

207fc24

Merge branch 'main' into multi_index_support_simplified

1e961ce

Glad I merged main - fix syntax error

da3a3a8

ioanatia merged commit 54152ca into elastic:main Sep 11, 2025
34 checks passed

ioanatia deleted the multi_index_support_simplified branch September 11, 2025 10:09

This was referenced Sep 16, 2025

Support querying multiple indices with the simplified RRF retriever #134822

Merged

Document multi index query support for simplified retrievers #134980

Merged

phananh1010 mentioned this pull request Oct 23, 2025

Mirror upstream elastic/elasticsearch#134980 for AI review (snapshot of HEAD tree) phananh1010/elasticsearch#203

Closed


ioanatia merged 10 commits into elastic:main from ioanatia:multi_index_support_simplified

3 participants
