Provide better impacts for fields indexed with IndexOptions.DOCS #14511
gf2121 merged 13 commits into apache:main
Review threads on lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java (2, both resolved and outdated)
jpountz left a comment:
Sorry for only seeing this now. I left some suggestions on how to update your change.
Review threads on lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java (5, all resolved; 3 outdated)
jpountz left a comment:
LGTM. Can you add a CHANGES entry under 10.3 and undo the new line in SlowImpactsEnum?
Addressed comments.
This sounds safe enough for 10.2.1 to me. Can you move the CHANGES entry to 10.2.1, then? cc @ChrisHegarty
I hope you don't mind: I updated this PR's title and description to better reflect the change.
Not at all. Thanks for taking the time to explain the different pieces of this code. It was really fun debugging this, and I would definitely love to revisit this part of the code.
@ChrisHegarty @jpountz Moved the CHANGES entry to 10.2.1.
Eh! I think you moved it to 10.2.0, rather than 10.2.1. |
Signed-off-by: expani <anijainc@amazon.com>
Oops, I hadn't rebased on main. Fixed it now.
What am I missing? This is not applicable to 10.2.1, since the only changed file is Lucene103PostingsReader.java, which is not present in 10.2! Did the rebase mess something up?
I had updated Lucene103PostingsReader because the initial plan was not to backport. I have now updated Lucene101PostingsReader, which is used in 10.2.1. Should I also raise this against another branch?
Yes, we have not backported
I made the same change in
…che#14511) Co-Authored-by: expani <anijainc@amazon.com>
@expani could you resolve the conflicts so that I can merge?
Remove the use_default_lucene_postings_format feature flag and let the IndexMode decide whether to default to Lucene postings, instead of checking for the standard index mode. The `Lucene101PostingsFormat` has been used behind a feature flag for a while. Regressions were found, but were fixed via apache/lucene#14511. The `Lucene101PostingsFormat` is now a better trade-off when the index mode is standard.
Postings always return impacts with freq=Integer.MAX_VALUE and norm=1 when frequencies are not indexed (IndexOptions.DOCS). This significantly overestimates the score upper bound of term queries, since the similarity scorer is effectively called with freq=1 all the time in this case (and either norm=1 if norms are not indexed, or the number of terms in the field otherwise).
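To make the overestimation concrete, here is a small, self-contained Java sketch. It does not use Lucene's BM25Similarity; it uses a simplified BM25-style formula with hypothetical values (k1=1.2, b=0.75, idf=1, and every document having a field length of 1), and compares the upper bound derived from an impact with freq=Integer.MAX_VALUE against the one derived from freq=1.

```java
// Hypothetical sketch: a simplified BM25-style scorer, NOT Lucene's BM25Similarity.
public class ImpactUpperBound {
    static final double K1 = 1.2;  // assumed term-frequency saturation constant
    static final double B = 0.75;  // assumed length-normalization weight

    // Simplified per-term score: idf * freq / (freq + k1 * (1 - b + b * dl / avgdl)).
    static double bm25(double idf, int freq, long fieldLength, double avgFieldLength) {
        double lengthNorm = K1 * (1 - B + B * fieldLength / avgFieldLength);
        return idf * freq / (freq + lengthNorm);
    }

    public static void main(String[] args) {
        double idf = 1.0;            // hypothetical idf
        double avgFieldLength = 1.0; // hypothetical: every doc has a single-term field
        // Bound from the old impact (freq=Integer.MAX_VALUE, norm=1).
        double before = bm25(idf, Integer.MAX_VALUE, 1, avgFieldLength);
        // Score every DOCS-only match actually gets (freq=1, norm=1).
        double after = bm25(idf, 1, 1, avgFieldLength);
        System.out.printf("freq=MAX_VALUE bound: %.4f%n", before);
        System.out.printf("freq=1 bound:         %.4f%n", after);
        System.out.printf("overestimate factor:  %.2f%n", before / after);
    }
}
```

Under these assumptions, the bound computed with freq=Integer.MAX_VALUE is roughly 2.2 times the score any matching document can actually receive.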
This updates postings to always return impacts with freq=1 and norm=1 when frequencies are not indexed, which helps compute better score upper bounds and, in turn, makes dynamic pruning perform better.
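A minimal sketch of why the tighter impacts help dynamic pruning, assuming a block-max style check that skips a block when its score upper bound cannot beat the k-th best score collected so far (with ties lost because documents are collected in increasing doc ID order). The `Impact` record is a stand-in for Lucene's `org.apache.lucene.index.Impact`; `docsOnlyImpacts`, `maxScore` and `canSkipBlock` are hypothetical helpers, and scoring reuses a simplified BM25-style formula with the field length taken directly from the norm.

```java
import java.util.List;

// Hypothetical sketch of the idea; not the code in Lucene101/103PostingsReader.
public class DocsOnlyImpacts {
    // Minimal stand-in for org.apache.lucene.index.Impact (freq, norm).
    record Impact(int freq, long norm) {}

    // Impacts reported for a DOCS-only field before (false) and after (true) the change.
    static List<Impact> docsOnlyImpacts(boolean fixed) {
        return fixed
            ? List.of(new Impact(1, 1L))
            : List.of(new Impact(Integer.MAX_VALUE, 1L));
    }

    // Upper bound over a block's impacts using a simplified BM25 (k1=1.2, b=0.75).
    static double maxScore(List<Impact> impacts, double idf, double avgFieldLength) {
        double max = 0;
        for (Impact impact : impacts) {
            double lengthNorm = 1.2 * (0.25 + 0.75 * impact.norm() / avgFieldLength);
            max = Math.max(max, idf * impact.freq() / (impact.freq() + lengthNorm));
        }
        return max;
    }

    // Assumed rule: skip when no doc in the block can score above the k-th best score.
    static boolean canSkipBlock(double blockMaxScore, double kthBestScore) {
        return blockMaxScore <= kthBestScore;
    }

    public static void main(String[] args) {
        double idf = 1.0, avgFieldLength = 1.0; // hypothetical statistics
        // Once k docs are collected, each has the only possible score: freq=1, norm=1.
        double kthBestScore = maxScore(List.of(new Impact(1, 1L)), idf, avgFieldLength);
        for (boolean fixed : new boolean[] {false, true}) {
            double bound = maxScore(docsOnlyImpacts(fixed), idf, avgFieldLength);
            System.out.println((fixed ? "freq=1" : "freq=MAX_VALUE")
                + " impacts -> skip block: " + canSkipBlock(bound, kthBestScore));
        }
    }
}
```

With freq=Integer.MAX_VALUE impacts the block is never skippable for a DOCS-only field, whereas with freq=1 impacts its bound equals the score every matching document gets, so remaining blocks can be skipped once the top k are full.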
Closes #14445