
Add a generic rescorer retriever based on the search request's rescore functionality #118585

Merged
jimczi merged 61 commits into elastic:main from jimczi:rescorer_retriever
Dec 18, 2024

Conversation

Contributor

@jimczi commented Dec 12, 2024

This pull request introduces a new retriever called rescorer, which leverages the rescore functionality of the search request.
The rescorer retriever re-scores only the top documents retrieved by its child retriever, offering fine-tuned scoring capabilities.

All rescorers supported in the rescore section of a search request are available in this retriever, and the same format is used to define the rescore configuration.

Example:
  - do:
      search:
        index: test
        body:
          retriever:
            rescorer:
              rescore:
                window_size: 10
                query:
                  rescore_query:
                    rank_feature:
                      field: "features.second_stage"
                      linear: { }
                  query_weight: 0
              retriever:
                standard:
                  query:
                    rank_feature:
                      field: "features.first_stage"
                      linear: { }
          size: 2
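
For comparison, the same second-stage scoring can be written with the search request's existing top-level `rescore` section, which is the format the retriever reuses. This is a sketch for illustration only, not taken from the pull request; it assumes the same `test` index and `features.*` fields as the example above.

```yaml
  - do:
      search:
        index: test
        body:
          # First stage: the query that the child retriever runs in the example above.
          query:
            rank_feature:
              field: "features.first_stage"
              linear: { }
          # Second stage: the same rescore definition, placed at the top level of the request.
          rescore:
            window_size: 10
            query:
              rescore_query:
                rank_feature:
                  field: "features.second_stage"
                  linear: { }
              query_weight: 0
          size: 2
```

The retriever form differs mainly in where the rescore definition lives: it is attached to a specific child retriever instead of the whole request.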

Key Changes

  1. Rescore Phase Adaptation:
    The original rescore phase was modified to support tie-breaking on the _shard_doc field. This ensures consistent sorting across all rounds of rescoring.
  2. CompoundRetrieverBuilder Integration:
    The implementation uses the CompoundRetrieverBuilder, ensuring the rescorer retriever can seamlessly integrate into any position within the retriever tree.
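
As an illustration of the second point, a `rescorer` retriever could be nested as one branch of another compound retriever such as `rrf`. The sketch below is an assumption about how such a request might look, not an example from the pull request: the `title` field and the `match` query are hypothetical, and the `features.*` fields are borrowed from the example above.

```yaml
  - do:
      search:
        index: test
        body:
          retriever:
            rrf:
              retrievers:
                # Branch 1: a plain lexical query (hypothetical field and text).
                - standard:
                    query:
                      match:
                        title: "elasticsearch"
                # Branch 2: a rescorer retriever wrapping its own child retriever.
                - rescorer:
                    rescore:
                      window_size: 10
                      query:
                        rescore_query:
                          rank_feature:
                            field: "features.second_stage"
                            linear: { }
                        query_weight: 0
                    retriever:
                      standard:
                        query:
                          rank_feature:
                            field: "features.first_stage"
                            linear: { }
              rank_window_size: 10
          size: 2
```

How `window_size` and `rank_window_size` interact is not described in this pull request, so the values here are placeholders and may need adjusting.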

Commit Structure

  • Commit 1: Adapts the rescore phase to handle _shard_doc as a tiebreaker.
  • Commit 2: Implements the rescorer retriever.

To facilitate review, I split the changes into two commits. If preferred, I can open separate pull requests for each commit to simplify the review process. However, I opted to include all changes in this PR to provide a complete overview.

Closes #118327

jimczi and others added 28 commits November 21, 2024 20:46
This commit introduces support for using the `_shard_doc` field as a sort tiebreaker during query rescoring.
This change is a prerequisite to add support for rescorers in retriever workflows.
This change adds a new `rescorer` retriever that re-scores only the top documents returned by its child retriever.
@jimczi added the >feature and :Search Relevance/Ranking (scoring, rescoring, rank evaluation) labels Dec 12, 2024
@jimczi requested a review from a team as a code owner December 18, 2024 13:39
@jimczi merged commit 6f26106 into elastic:main Dec 18, 2024
@jimczi deleted the rescorer_retriever branch December 18, 2024 19:47
@benwtrent
Member

Thank you for tackling this, @jimczi! I didn't fully review, but it looks nice!

jimczi added a commit to jimczi/elasticsearch that referenced this pull request Dec 18, 2024
…ore functionality (elastic#118585)


Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
elasticsearchmachine pushed a commit that referenced this pull request Dec 19, 2024
…s rescore functionality (#119023)

* Add a generic `rescorer` retriever based on the search request's rescore functionality   (#118585)


Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

* replace java21 only method

* fix compil

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

Labels

backport pending, >feature, :Search Relevance/Ranking (Scoring, rescoring, rank evaluation.), Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch), v8.18.0, v9.0.0

6 participants