[ES|QL] Implementing rerank on multi values. #140672
…h instead of the string length.
Hi @afoucret, I've created a changelog YAML for you.
reranker using a non text fields
required_capability: rerank
required_capability: match_operator_colon

FROM books METADATA _score
| WHERE title:"war and peace" AND author:"Tolstoy"
| RERANK "war and peace" ON ratings WITH { "inference_id" : "test_reranker" }
| EVAL _score=ROUND(_score, 2), ratings = ROUND(ratings, 2)
| SORT _score DESC, book_no ASC
| KEEP book_no, title, ratings, _score
;

book_no:keyword | title:text | ratings:double | _score:double
2776 | The Devil and Other Stories (Oxford World's Classics) | 5.0 | 0.33
4536 | War and Peace (Signet Classics) | 4.75 | 0.25
5327 | War and Peace | 3.84 | 0.06
9032 | War and Peace: A Novel (6 Volumes) | 3.81 | 0.06
;
ℹ️ It is no longer possible to rerank on a numeric field.
rerankCommand
: RERANK (targetField=qualifiedName ASSIGN)? queryText=constant ON rerankFields commandNamedParameters
: RERANK (targetField=qualifiedName ASSIGN)? queryText=constant ON rerankFields=fields commandNamedParameters
Cleaning up the grammar: the rerankFields rule was only needed to enforce that expressions could not be nameless, which is no longer required.
.../plugin/esql/src/main/java/org/elasticsearch/xpack/esql/inference/rerank/RerankOperator.java
Pinging @elastic/es-search-relevance (Team:Search Relevance)
ℹ️ Important: Docs version tagging
👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes.
When to use applies_to tags:
✅ At the page level to indicate which products/deployments the content applies to (mandatory)
What NOT to do:
❌ Don't remove or replace information that applies to an older version
🤔 Need help?
this.totalPositions = textBlock.getPositionCount();
this.totalPositions = inputBlocks[0].getPositionCount();
I think this should "never" happen - so I added an assertion.
mridula-s109
left a comment
LGTM.
I do have one question: would it be nice if the docs explicitly covered the breaking changes mentioned in the PR description?
- String fields only: Add clear statement that numeric/boolean fields are no longer supported
- Multi-value max score: Document that multi-value fields use max score (not YAML aggregation)
Suggested location: docs/reference/query-languages/esql/_snippets/commands/layout/rerank.md
// Filter out empty and whitespace-only strings
if (Strings.hasText(inputText)) {
inputs.add(inputText);
if (textBlock.isNull(position)) {
Similarly to the ctor, is it possible for textBlock to be null here?
I think this should never happen - so I added an assertion instead
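The input collection discussed in this thread, together with the partial-null case exercised by testMultipleInputBlocksWithPartialNulls, might look roughly like the sketch below. It is only an illustration under stated assumptions: `RerankInputsSketch` and `inputsAt` are hypothetical names, plain Java lists stand in for the operator's input blocks, and it assumes blank strings are still skipped, which the diff excerpt does not confirm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: plain lists stand in for the operator's input blocks.
final class RerankInputsSketch {

    // Each "block" holds one entry per position. A null entry models a null position;
    // otherwise the entry is the list of values found at that position.
    static List<String> inputsAt(List<List<List<String>>> blocks, int position) {
        List<String> inputs = new ArrayList<>();
        for (List<List<String>> block : blocks) {
            List<String> values = block.get(position);
            if (values == null) {
                continue; // this block is null here; other blocks may still contribute
            }
            for (String value : values) {
                // Assumption: empty and whitespace-only strings are not sent to the model.
                if (value != null && !value.trim().isEmpty()) {
                    inputs.add(value);
                }
            }
        }
        return inputs;
    }
}
```

With two blocks where the first is null at a position and the second is not, only the second block's values end up in the returned list.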
 * Tests that when one block is null at a position but another block has a value,
 * only the non-null block's values are included in the input.
 */
public void testMultipleInputBlocksWithPartialNulls() throws Exception {
);

// Verify factory is created correctly
assertNotNull(factory);
maybe we could update this to test actual input/output for 2 blocks? Something similar to RerankRequestIteratorTests#testMultipleInputBlocksWithPartialNulls
good suggestion - I updated the test
@mridula-s109 I will update the docs separately when we also mark RERANK as GA - it's a good idea.
Summary
This PR changes how the ES|QL RERANK command handles input fields for reranking.
Multi-value fields are now processed natively, with each value sent individually to the reranking model.
Closes: #136865
Functional Changes
Multi-value field handling
When a rerank field contains multiple values (e.g., a multi-valued author field), each value is now sent separately to the inference service for scoring, and the max score is returned.
Previously, the values were combined into a single YAML document before being sent to the model.
Example:
If a document has author: ["John Hockenberry", "Leo Tolstoy", "Pat Conroy"], each author value is now scored independently rather than being formatted as a YAML list.
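The per-value scoring described above can be sketched as follows. This is a minimal illustration rather than the PR's implementation: `MaxScoreSketch`, `maxScore`, and the `scorer` function are hypothetical names, and the scorer merely stands in for the call to the inference service.

```java
import java.util.List;
import java.util.OptionalDouble;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch: names are illustrative and not taken from the PR.
final class MaxScoreSketch {

    // Scores each value of a multi-valued field independently and keeps the maximum.
    // `scorer` is an assumed stand-in for the reranking model call.
    static OptionalDouble maxScore(List<String> values, ToDoubleFunction<String> scorer) {
        return values.stream()
            .mapToDouble(scorer)
            .max(); // empty when the field has no values to score
    }
}
```

For the author example, the document's score would be whichever of the three per-author scores is highest, instead of a single score computed over a YAML list.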
Rerank fields restricted to string types only
The RERANK command now only accepts string fields. Numeric and boolean fields are no longer supported as rerank inputs.
Can use an expression without a name as a RERANK field
In previous versions, you had to specify the name of the computed field when using an expression as a rerank field:
Now that we do not use an intermediary YAML, this is no longer necessary and you can use the expression directly: