[ES|QL] Implementing rerank on multi values. #140672

Merged
ioanatia merged 17 commits into elastic:main from afoucret:esql-mv-rerank
Jan 28, 2026

@afoucret afoucret commented Jan 14, 2026

Summary

This PR changes how the ES|QL RERANK command handles input fields for reranking.
Multi-value fields are now processed natively, with each value sent individually to the reranking model.

Closes: #136865

Functional Changes

Multi-value field handling

When a rerank field contains multiple values (e.g., a multi-valued author field), each value is now sent separately to the inference service for scoring and the max score is returned.
Previously, the values of a multi-value field were combined into a single YAML document before being sent to the model.

Example:

FROM books
| WHERE title:"Leo Tolstoy"
| RERANK "Leo Tolstoy" ON author WITH { "inference_id" : "my_reranker" }

If a document has author: ["John Hockenberry", "Leo Tolstoy", "Pat Conroy"], each author value is now scored independently rather than being formatted as a YAML list.
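
To illustrate the "score each value, keep the max" behaviour described above, here is a minimal Java sketch. It is not the Elasticsearch implementation: the class `MultiValueRerankSketch`, the method `maxScore`, and the `scorer` callback (standing in for the call to the inference service) are all hypothetical names introduced for this example.

```java
import java.util.List;
import java.util.OptionalDouble;
import java.util.function.ToDoubleBiFunction;

// Hypothetical helper for illustration only, not part of Elasticsearch.
final class MultiValueRerankSketch {

    // Scores every value of a multi-valued field independently and keeps the best score.
    // `scorer` is an assumed stand-in for the reranking model: (query, value) -> score.
    // Returns an empty OptionalDouble when the field has no values.
    static OptionalDouble maxScore(
        String query,
        List<String> values,
        ToDoubleBiFunction<String, String> scorer
    ) {
        return values.stream()
            .mapToDouble(value -> scorer.applyAsDouble(query, value))
            .max();
    }
}
```

With a toy scorer that rewards exact matches, the three-author document above would get the score of its "Leo Tolstoy" value, regardless of how the other two authors score.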

Rerank fields restricted to string types only

The RERANK command now only accepts string fields. Numeric and boolean fields are no longer supported as rerank inputs.

Can use an expression without a name as a RERANK field

In previous versions, you had to name the computed field when using an expression as a rerank field:

| RERANK "my query" ON truncated_description = SUBSTRING(description, 0, 100)

Now that no intermediary YAML document is built, the name is no longer necessary and you can use the expression directly:

| RERANK "my query" ON SUBSTRING(description, 0, 100)
@elasticsearchmachine
Collaborator

Hi @afoucret, I've created a changelog YAML for you.

Comment on lines -67 to -86
reranker using a non text fields
required_capability: rerank
required_capability: match_operator_colon

FROM books METADATA _score
| WHERE title:"war and peace" AND author:"Tolstoy"
| RERANK "war and peace" ON ratings WITH { "inference_id" : "test_reranker" }
| EVAL _score=ROUND(_score, 2), ratings = ROUND(ratings, 2)
| SORT _score DESC, book_no ASC
| KEEP book_no, title, ratings, _score
;

book_no:keyword | title:text | ratings:double | _score:double
2776 | The Devil and Other Stories (Oxford World's Classics) | 5.0 | 0.33
4536 | War and Peace (Signet Classics) | 4.75 | 0.25
5327 | War and Peace | 3.84 | 0.06
9032 | War and Peace: A Novel (6 Volumes) | 3.81 | 0.06
;


Contributor Author

ℹ️ It is now impossible to rerank on a numeric field.


rerankCommand
-    : RERANK (targetField=qualifiedName ASSIGN)? queryText=constant ON rerankFields commandNamedParameters
+    : RERANK (targetField=qualifiedName ASSIGN)? queryText=constant ON rerankFields=fields commandNamedParameters
Contributor Author

Cleaning up the grammar: the dedicated rerankFields rule existed only to reject unnamed expressions, which is no longer required.

@afoucret afoucret marked this pull request as ready for review January 15, 2026 10:40
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance label Jan 15, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@github-actions
Contributor

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

@ioanatia ioanatia self-assigned this Jan 26, 2026
Contributor

@tteofili tteofili left a comment

LGTM

Comment on lines -79 to +82
- this.totalPositions = textBlock.getPositionCount();
+ this.totalPositions = inputBlocks[0].getPositionCount();
Contributor

maybe a null check here?

Contributor

I think this should "never" happen - so I added an assertion.

Contributor

@mridula-s109 mridula-s109 left a comment

LGTM.

I do have one question: would it be nice if the docs explicitly covered the breaking changes mentioned in the PR description?

  1. String fields only: Add clear statement that numeric/boolean fields are no longer supported
  2. Multi-value max score: Document that multi-value fields use max score (not YAML aggregation)

Suggested location: docs/reference/query-languages/esql/_snippets/commands/layout/rerank.md

Contributor

@pmpailis pmpailis left a comment

LGTM!

// Filter out empty and whitespace-only strings
if (Strings.hasText(inputText)) {
inputs.add(inputText);
if (textBlock.isNull(position)) {
Contributor

Similarly to the ctor, is it possible for textBlock to be null here?

Contributor

I think this should never happen - so I added an assertion instead

* Tests that when one block is null at a position but another block has a value,
* only the non-null block's values are included in the input.
*/
public void testMultipleInputBlocksWithPartialNulls() throws Exception {
Contributor

nice :)
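
For readers following the thread, the behaviour that test describes can be sketched in plain Java as below. This is a simplified model under stated assumptions, not the code under review: blocks are represented as nested lists rather than ES|QL compute-engine blocks, a `null` entry models a null position, the names `RerankInputSketch` and `inputsAt` are hypothetical, and blank strings are assumed to still be skipped as in the snippet quoted earlier in this review.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the request iterator logic; blocks are modeled as plain lists.
final class RerankInputSketch {

    // Collects the rerank inputs for one position across several input blocks.
    // blocks.get(b).get(position) holds the values of block b at that position,
    // or null when the block is null there.
    static List<String> inputsAt(int position, List<List<List<String>>> blocks) {
        List<String> inputs = new ArrayList<>();
        for (List<List<String>> block : blocks) {
            List<String> values = block.get(position);
            if (values == null) {
                continue; // this block is null here; other blocks may still contribute
            }
            for (String value : values) {
                // Assumption: null, empty, and whitespace-only strings are filtered out.
                if (value != null && !value.isBlank()) {
                    inputs.add(value);
                }
            }
        }
        return inputs;
    }
}
```

In this model, a position where the first block is null and the second holds `["Leo Tolstoy", "  "]` yields the single input `"Leo Tolstoy"`.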

);

// Verify factory is created correctly
assertNotNull(factory);
Contributor

@pmpailis pmpailis Jan 28, 2026

maybe we could update this to test actual input/output for 2 blocks? Something similar to RerankRequestIteratorTests#testMultipleInputBlocksWithPartialNulls

Contributor

good suggestion - I updated the test

@ioanatia
Contributor

@mridula-s109 I will update the docs separately when we also mark RERANK as GA - it's a good idea.

@ioanatia ioanatia merged commit 8974ffc into elastic:main Jan 28, 2026
35 checks passed

Labels

>enhancement, :Search Relevance/ES|QL, Team:Search Relevance, v9.4.0

6 participants