Support semantic reranking using contextual snippets instead of entire field text#129369
Support semantic reranking using contextual snippets instead of entire field text#129369kderusso merged 68 commits intoelastic:mainfrom
Conversation
|
💚 CLA has been signed |
|
cla/check |
...k/inference/rank/textsimilarity/TextSimilarityRerankingRankFeaturePhaseRankShardContext.java
Show resolved
Hide resolved
|
|
||
| import static org.elasticsearch.search.rank.feature.RerankSnippetConfig.DEFAULT_NUM_SNIPPETS; | ||
|
|
||
| public class TextSimilarityRerankingRankFeaturePhaseRankShardContext extends RerankingRankFeaturePhaseRankShardContext { |
There was a problem hiding this comment.
Do we really need the separation here between TextSimilarityRerankingRankFeaturePhaseRankShardContext and RerankingRankFeaturePhaseRankShardContext? I don't see RerankingRankFeaturePhaseRankShardContext being used elsewhere so maybe we can just have TextSimilarityRerankingRankFeaturePhaseRankShardContext and implement the full logic there?
There was a problem hiding this comment.
I agree that it's a little extra, but it clearly conveys the fact that snippets only pertain to text similarity reranking and not other rerankers we may add in the future. If you feel strongly about it I will refactor back, but I don't think this is a bad thing per se.
There was a problem hiding this comment.
Can we just rename RerankingRankFeaturePhaseRankShardContext into TextSimilarityRerankingRankFeaturePhaseRankShardContext? I don't see what RerankingRankFeaturePhaseRankShardContext provides as an abstract class.
There was a problem hiding this comment.
Can we just rename RerankingRankFeaturePhaseRankShardContext into TextSimilarityRerankingRankFeaturePhaseRankShardContext? I don't see what RerankingRankFeaturePhaseRankShardContext provides as an abstract class.
@jimczi this class is referenced elsewhere including the random reranker. I personally think it's cleaner to have the separation, but if you feel strongly about it I will put snippet configuration in the parent class and remove the abstraction. I am commenting one more time instead of changing it, because I actually prefer this separation to putting things specific to text similarity that are used by the parent reranker. However, I won't die on that hill if you absolutely hate it.
server/src/main/java/org/elasticsearch/search/retriever/CompoundRetrieverBuilder.java
Outdated
Show resolved
Hide resolved
| */ | ||
| public class SnippetRankInput implements Writeable { | ||
|
|
||
| private final RerankSnippetConfig snippets; |
There was a problem hiding this comment.
Is the separation between SnippetRankInput and RerankSnippetConfig necessary? Why not injecting numSnippets and snippetQueryBuilder directly here?
There was a problem hiding this comment.
It's a little convoluted because we don't allow size of snippets in the API, I'll see if I can clean it up
...in/java/org/elasticsearch/xpack/inference/rank/textsimilarity/TextSimilarityRankBuilder.java
Outdated
Show resolved
Hide resolved
| private final SnippetRankInput snippetRankInput; | ||
|
|
||
| // Rough approximation of token size vs. characters in highlight fragments. | ||
| // TODO highlighter should be able to set fragment size by token not length |
There was a problem hiding this comment.
@jimczi Here is the TODO I referenced in my earlier response 🙂
jimczi
left a comment
There was a problem hiding this comment.
Thanks for adding the feature flag. I left some minor comments to address but the feature looks good to me.
|
|
||
| import static org.elasticsearch.search.rank.feature.RerankSnippetConfig.DEFAULT_NUM_SNIPPETS; | ||
|
|
||
| public class TextSimilarityRerankingRankFeaturePhaseRankShardContext extends RerankingRankFeaturePhaseRankShardContext { |
There was a problem hiding this comment.
Can we just rename RerankingRankFeaturePhaseRankShardContext into TextSimilarityRerankingRankFeaturePhaseRankShardContext? I don't see what RerankingRankFeaturePhaseRankShardContext provides as an abstract class.
...k/inference/rank/textsimilarity/TextSimilarityRerankingRankFeaturePhaseRankShardContext.java
Outdated
Show resolved
Hide resolved
...in/java/org/elasticsearch/xpack/inference/rank/textsimilarity/TextSimilarityRankBuilder.java
Outdated
Show resolved
Hide resolved
...in/java/org/elasticsearch/xpack/inference/rank/textsimilarity/TextSimilarityRankBuilder.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/search/rank/feature/RerankSnippetInput.java
Outdated
Show resolved
Hide resolved
…inference/rank/textsimilarity/TextSimilarityRankBuilder.java Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>
…inference/rank/textsimilarity/TextSimilarityRerankingRankFeaturePhaseRankShardContext.java Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>
Followup to the POC described in #128255
Adds the ability to rerank based on a smaller number of snippets.
Example, with a default of one snippet:
Example, specifying snippets:
Not specifying snippets will continue to send the entire field contents into the reranker model.