Skip to content

Implement comprehensive top N parameter handling for text similarity reranker#142039

Merged
Mikep86 merged 29 commits intoelastic:mainfrom
Mikep86:text-similarity-reranker_aioob-error
Feb 11, 2026
Merged

Implement comprehensive top N parameter handling for text similarity reranker#142039
Mikep86 merged 29 commits intoelastic:mainfrom
Mikep86:text-similarity-reranker_aioob-error

Conversation

@Mikep86
Copy link
Contributor

@Mikep86 Mikep86 commented Feb 6, 2026

Currently, the text similarity reranker has, at best, spotty support for rerank services that can have a top_n (or similar) parameter set. This parameter configures the rerank service to return, at most, N reranked documents. When N is less than rank_window_size, it can lead to ArrayIndexOutOfBoundsExceptions in the current implementation. This is why we check that N is greater than or equal to rank_window_size, however the problem is that that the current check misses a bunch of rerank services that support top_n.

This PR addresses the problem by implementing comprehensive support for the top_n parameter:

  • Adds the TopNProvider interface, a unified way to report the value of the top_n parameter.
  • Implements unified and more robust logic for extracting scores from ranked docs. This logic also implements a friendlier failure mode when the number of ranked docs returned by the rerank service does not match what is expected, which can happen when a top_n parameter value is applied, but not properly reported.

TextSimilarityRankFeaturePhaseRankCoordinatorContext and its tests have also been simplified. With the unified score extraction logic, we no longer need to resolve chunk scoring configuration while computing scores, so that logic (and its related tests) have been removed.

@Mikep86 Mikep86 added >bug :Search Relevance/Ranking Scoring, rescoring, rank evaluation. v9.4.0 labels Feb 6, 2026
@elasticsearchmachine
Copy link
Collaborator

Hi @Mikep86, I've created a changelog YAML for you.

return new RankedDocsResults(results.stream().sorted(Comparator.reverseOrder()).toList());

// RankedDoc's compareTo implementation already sorts by score descending, so we don't need to reverse the sort order
var sortedResultsStream = results.stream().sorted();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it turns out that our test reranker service has been sorting docs in reverse relevance this whole time 🫠 . We didn't catch it as an issue because ES sorts the results again (correctly) here. This all happens to work, as long as reranker reranks every doc sent to it. However, if the reranker truncates results (like if a top_n parameter is applied 😉 ), then the incorrect sort order becomes an issue.

Comment on lines +61 to +62
// This method relies on callers filtering out feature docs with null feature data
assert Arrays.stream(featureDocs).noneMatch(featureDoc -> featureDoc.featureData == null);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used to filter out feature docs with null featureData before generating the inference request, however there are two issues with that:

  • In all code paths, the caller already does this, so it's a redundant operation.
  • We don't consistently use the filtered feature doc array in computeScores. So, if a caller were to actually provide a featureDocs array with one or more docs with null featureData, we would still end up throwing an NPE somewhere in computeScores.

This assertion seems like a good balance between being explicit that callers must provide docs with non-null featureData and avoiding redundant operations.

@Mikep86 Mikep86 added auto-backport Automatically create backport pull requests when merged branch:9.2 branch:9.3 labels Feb 9, 2026
@Mikep86
Copy link
Contributor Author

Mikep86 commented Feb 9, 2026

@elasticmachine update branch

Copy link
Member

@kderusso kderusso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some concerns about backporting and immediately performing a followup but will defer to your team on that. Otherwise the changes look good to me, thanks for doing the cleanup!

chunkScorerConfig != null
? new ChunkScorerConfig(chunkScorerConfig.size, inferenceText, chunkScorerConfig.chunkingSettings())
: null
failuresAllowed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice cleanup here!

configuredTopN = topNProvider.getTopN();
}
}
if (configuredTopN != null && configuredTopN < rankWindowSize) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this suggestion. Talked about this a little with @Mikep86 offline, but I do have some concerns about backporting this change, and then immediately removing it for the better fix proposed here. That strategy will result in merge pain going forward for all future backports that touch this code.

I wonder if there's a way we could backport the change in a way that wouldn't cause a 5xx error in serverless but also not require the merge hell. I realize that manually enumerating top n supporting services isn't ideal but for the backport branches, now that we know the fix could we literally catch the OOB exception and return a friendlier message, then just go with this proposed fix going forward?

List<Integer> rankedDocToFeatureDoc = new ArrayList<>();
for (int i = 0; i < featureDocs.length; i++) {
RankFeatureDoc featureDoc = featureDocs[i];
for (int j = 0; j < featureDoc.featureData.size(); j++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to check anyway for safety?

Copy link
Member

@kderusso kderusso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some concerns about backporting and immediately performing a followup but will defer to your team on that. Otherwise the changes look good to me, thanks for doing the cleanup!

Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Mikep86 Mikep86 merged commit 628c78d into elastic:main Feb 11, 2026
35 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
9.3 Commit could not be cherrypicked due to conflicts
9.2 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 142039

Mikep86 added a commit to Mikep86/elasticsearch that referenced this pull request Feb 11, 2026
…reranker (elastic#142039)

Currently, the text similarity reranker has, at best, spotty support for rerank services that can have a top_n (or similar) parameter set. This parameter configures the rerank service to return, at most, N reranked documents. When N is less than rank_window_size, it can lead to ArrayIndexOutOfBoundsExceptions in the current implementation. This is why we check that N is greater than or equal to rank_window_size, however the problem is that that the current check misses a bunch of rerank services that support top_n.

This PR addresses the problem by implementing comprehensive support for the top_n parameter:

- Adds the TopNProvider interface, a unified way to report the value of the top_n parameter.
- Implements unified and more robust logic for extracting scores from ranked docs. This logic also implements a friendlier failure mode when the number of ranked docs returned by the rerank service does not match what is expected, which can happen when a top_n parameter value is applied, but not properly reported.

TextSimilarityRankFeaturePhaseRankCoordinatorContext and its tests have also been simplified. With the unified score extraction logic, we no longer need to resolve chunk scoring configuration while computing scores, so that logic (and its related tests) have been removed.

(cherry picked from commit 628c78d)

# Conflicts:
#	x-pack/plugin/inference/qa/test-service-plugin/src/main/java/org/elasticsearch/xpack/inference/mock/TestRerankingServiceExtension.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceFeatures.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/rank/textsimilarity/TextSimilarityRankFeaturePhaseRankCoordinatorContext.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/rank/textsimilarity/TextSimilarityRankFeaturePhaseRankCoordinatorContextTests.java
#	x-pack/plugin/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/70_text_similarity_rank_retriever.yml
@Mikep86
Copy link
Contributor Author

Mikep86 commented Feb 11, 2026

💚 All backports created successfully

Status Branch Result
9.3
9.2

Questions ?

Please refer to the Backport tool documentation

Mikep86 added a commit to Mikep86/elasticsearch that referenced this pull request Feb 11, 2026
…reranker (elastic#142039)

Currently, the text similarity reranker has, at best, spotty support for rerank services that can have a top_n (or similar) parameter set. This parameter configures the rerank service to return, at most, N reranked documents. When N is less than rank_window_size, it can lead to ArrayIndexOutOfBoundsExceptions in the current implementation. This is why we check that N is greater than or equal to rank_window_size, however the problem is that that the current check misses a bunch of rerank services that support top_n.

This PR addresses the problem by implementing comprehensive support for the top_n parameter:

- Adds the TopNProvider interface, a unified way to report the value of the top_n parameter.
- Implements unified and more robust logic for extracting scores from ranked docs. This logic also implements a friendlier failure mode when the number of ranked docs returned by the rerank service does not match what is expected, which can happen when a top_n parameter value is applied, but not properly reported.

TextSimilarityRankFeaturePhaseRankCoordinatorContext and its tests have also been simplified. With the unified score extraction logic, we no longer need to resolve chunk scoring configuration while computing scores, so that logic (and its related tests) have been removed.

(cherry picked from commit 628c78d)

# Conflicts:
#	x-pack/plugin/inference/qa/test-service-plugin/src/main/java/org/elasticsearch/xpack/inference/mock/TestRerankingServiceExtension.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceFeatures.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/rank/textsimilarity/TextSimilarityRankFeaturePhaseRankCoordinatorContext.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/openshiftai/rerank/OpenShiftAiRerankTaskSettings.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/rank/textsimilarity/TextSimilarityRankFeaturePhaseRankCoordinatorContextTests.java
#	x-pack/plugin/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/70_text_similarity_rank_retriever.yml
elasticsearchmachine pushed a commit that referenced this pull request Feb 11, 2026
…arity reranker (#142039) (#142314)

* Implement comprehensive top N parameter handling for text similarity reranker (#142039)

Currently, the text similarity reranker has, at best, spotty support for rerank services that can have a top_n (or similar) parameter set. This parameter configures the rerank service to return, at most, N reranked documents. When N is less than rank_window_size, it can lead to ArrayIndexOutOfBoundsExceptions in the current implementation. This is why we check that N is greater than or equal to rank_window_size, however the problem is that that the current check misses a bunch of rerank services that support top_n.

This PR addresses the problem by implementing comprehensive support for the top_n parameter:

- Adds the TopNProvider interface, a unified way to report the value of the top_n parameter.
- Implements unified and more robust logic for extracting scores from ranked docs. This logic also implements a friendlier failure mode when the number of ranked docs returned by the rerank service does not match what is expected, which can happen when a top_n parameter value is applied, but not properly reported.

TextSimilarityRankFeaturePhaseRankCoordinatorContext and its tests have also been simplified. With the unified score extraction logic, we no longer need to resolve chunk scoring configuration while computing scores, so that logic (and its related tests) have been removed.

(cherry picked from commit 628c78d)

# Conflicts:
#	x-pack/plugin/inference/qa/test-service-plugin/src/main/java/org/elasticsearch/xpack/inference/mock/TestRerankingServiceExtension.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceFeatures.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/rank/textsimilarity/TextSimilarityRankFeaturePhaseRankCoordinatorContext.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/rank/textsimilarity/TextSimilarityRankFeaturePhaseRankCoordinatorContextTests.java
#	x-pack/plugin/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/70_text_similarity_rank_retriever.yml

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
elasticsearchmachine pushed a commit that referenced this pull request Feb 11, 2026
…arity reranker (#142039) (#142317)

* Implement comprehensive top N parameter handling for text similarity reranker (#142039)

Currently, the text similarity reranker has, at best, spotty support for rerank services that can have a top_n (or similar) parameter set. This parameter configures the rerank service to return, at most, N reranked documents. When N is less than rank_window_size, it can lead to ArrayIndexOutOfBoundsExceptions in the current implementation. This is why we check that N is greater than or equal to rank_window_size, however the problem is that that the current check misses a bunch of rerank services that support top_n.

This PR addresses the problem by implementing comprehensive support for the top_n parameter:

- Adds the TopNProvider interface, a unified way to report the value of the top_n parameter.
- Implements unified and more robust logic for extracting scores from ranked docs. This logic also implements a friendlier failure mode when the number of ranked docs returned by the rerank service does not match what is expected, which can happen when a top_n parameter value is applied, but not properly reported.

TextSimilarityRankFeaturePhaseRankCoordinatorContext and its tests have also been simplified. With the unified score extraction logic, we no longer need to resolve chunk scoring configuration while computing scores, so that logic (and its related tests) have been removed.

(cherry picked from commit 628c78d)

# Conflicts:
#	x-pack/plugin/inference/qa/test-service-plugin/src/main/java/org/elasticsearch/xpack/inference/mock/TestRerankingServiceExtension.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceFeatures.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/rank/textsimilarity/TextSimilarityRankFeaturePhaseRankCoordinatorContext.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/openshiftai/rerank/OpenShiftAiRerankTaskSettings.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/rank/textsimilarity/TextSimilarityRankFeaturePhaseRankCoordinatorContextTests.java
#	x-pack/plugin/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/70_text_similarity_rank_retriever.yml

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
costin pushed a commit to costin/elasticsearch that referenced this pull request Feb 16, 2026
…reranker (elastic#142039)

Currently, the text similarity reranker has, at best, spotty support for rerank services that can have a top_n (or similar) parameter set. This parameter configures the rerank service to return, at most, N reranked documents. When N is less than rank_window_size, it can lead to ArrayIndexOutOfBoundsExceptions in the current implementation. This is why we check that N is greater than or equal to rank_window_size, however the problem is that that the current check misses a bunch of rerank services that support top_n.

This PR addresses the problem by implementing comprehensive support for the top_n parameter:

- Adds the TopNProvider interface, a unified way to report the value of the top_n parameter.
- Implements unified and more robust logic for extracting scores from ranked docs. This logic also implements a friendlier failure mode when the number of ranked docs returned by the rerank service does not match what is expected, which can happen when a top_n parameter value is applied, but not properly reported.

TextSimilarityRankFeaturePhaseRankCoordinatorContext and its tests have also been simplified. With the unified score extraction logic, we no longer need to resolve chunk scoring configuration while computing scores, so that logic (and its related tests) have been removed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged >bug :Search Relevance/Ranking Scoring, rescoring, rank evaluation. Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.2.6 v9.3.1 v9.4.0

8 participants