Provide better impacts for fields indexed with IndexOptions.DOCS by expani · Pull Request #14511 · apache/lucene

expani · 2025-04-16T17:54:16Z

Postings always return impacts with freq=Integer.MAX_VALUE and norm=1 when frequencies are not indexed (IndexOptions.DOCS). This significantly overestimates the score upper bound of term queries, since the similarity scorer is effectively called with freq=1 all the time in this case (and either norm=1 if norms are not indexed, or the number of terms in the field otherwise).

This updates postings to always return impacts with freq=1 and norm=1 when frequencies are not indexed, which helps compute better score upper bounds and in turn makes dynamic pruning perform better.

Closes #14445
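To make the effect described above concrete, here is a minimal sketch of how the assumed frequency changes a score upper bound. It is not Lucene code: the `score` method is a simplified BM25-style saturation formula, and the class name `ImpactBoundSketch` and the values of `idf`, `k1`, and `lengthNorm` are all assumptions chosen for illustration.

```java
public class ImpactBoundSketch {

    // Simplified BM25-style term score (an assumption for illustration,
    // NOT Lucene's BM25Similarity implementation).
    // The score grows with freq and saturates towards idf.
    static double score(double idf, double k1, double lengthNorm, long freq) {
        return idf * freq / (freq + k1 * lengthNorm);
    }

    public static void main(String[] args) {
        // Hypothetical parameter values, chosen only for this example.
        double idf = 2.0;
        double k1 = 1.2;
        double lengthNorm = 1.0;

        // Upper bound derived from an impact with freq=Integer.MAX_VALUE.
        double oldBound = score(idf, k1, lengthNorm, Integer.MAX_VALUE);
        // Upper bound derived from an impact with freq=1, which matches the
        // frequency the scorer actually sees when frequencies are not indexed.
        double newBound = score(idf, k1, lengthNorm, 1);

        System.out.printf("bound with freq=Integer.MAX_VALUE: %.4f%n", oldBound);
        System.out.printf("bound with freq=1:                 %.4f%n", newBound);
    }
}
```

With this toy formula, the bound computed from freq=Integer.MAX_VALUE sits close to the saturation value `idf`, while the bound computed from freq=1 is much lower. A tighter bound lets dynamic pruning skip more documents whose best possible score cannot be competitive.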

lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java

jpountz

Sorry for only seeing this now. I left some suggestions on how to update your change.

lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java

jpountz

LGTM. Can you add a CHANGES entry under 10.3 and undo the new line in SlowImpactsEnum?

expani · 2025-04-25T11:32:52Z

Addressed comments.
I want to backport these to 9.12.x and 10.2.x as well. Will open separate PRs for the same.

jpountz · 2025-04-25T13:19:48Z

This sounds safe enough for 10.2.1 for me. Can you move the CHANGES entry to 10.2.1 then? cc @ChrisHegarty

jpountz · 2025-04-25T13:25:36Z

I hope you don't mind, I updated this PR title and description to better reflect the change.

expani · 2025-04-25T14:12:53Z

> I hope you don't mind, I updated this PR title and description to better reflect the change.

Not at all. Thanks for taking the time to explain the different pieces of this code.

It was really fun debugging this, and I would definitely love to visit this part of the code again.

expani · 2025-04-25T14:13:50Z

@ChrisHegarty @jpountz Moved the change log to 10.2.1

ChrisHegarty · 2025-04-25T14:27:34Z

> @ChrisHegarty @jpountz Moved the change log to 10.2.1

Eh! I think you moved it to 10.2.0, rather than 10.2.1.

expani · 2025-04-25T14:39:15Z

Oops, I hadn't rebased onto main. Fixed it now.

ChrisHegarty · 2025-04-25T14:43:22Z

> This sounds safe enough for 10.2.1 for me. Can you move the CHANGES entry to 10.2.1 then? cc @ChrisHegarty

What am I missing? This is not applicable to 10.2.1, since the only changed file is Lucene103PostingsReader.java, which is not present in 10.2! Did the rebase mess something up?

expani · 2025-04-25T14:49:07Z

> What am I missing? This is not applicable to 10.2.1, since the only changed file is Lucene103PostingsReader.java, which is not present in 10.2! Did the rebase mess something up?

I had updated 103PostingsReader as the initial plan was not to backport.

Updated 101PostingsReader which is used in 10.2.1

Should I also raise a PR against some other branch?

gf2121 · 2025-04-25T14:57:41Z

> Lucene103PostingsReader.java which is not present in 10.2

Yes, we have not backported Lucene103PostingsReader, see #14333 (comment). I think we will need to make the same change to Lucene101PostingsReader if we want to include this in 10.2.1.

expani · 2025-04-25T15:04:08Z

I made the same change in Lucene101PostingsReader.
Should I raise the PR against some other branch as well?

gf2121 · 2025-04-25T15:24:13Z

To be clear, I raised #14557 and #14558 for backporting. I plan to merge this now if no one objects.

gf2121 · 2025-04-25T15:25:49Z

@expani could you resolve the conflicts so that I can merge?

Remove the use_default_lucene_postings_format feature flag and let the IndexMode decide whether to use the default Lucene postings format instead of checking for standard index mode. The `Lucene101PostingsFormat` has now been used for a while behind a feature flag. Regressions were found but were fixed via apache/lucene#14511. The `Lucene101PostingsFormat` is now a better trade-off when the index mode is standard.

github-project-automation bot added this to OpenSearch Lucene & Core Performance Tracking Apr 16, 2025

github-project-automation bot moved this to Open in OpenSearch Lucene & Core Performance Tracking Apr 16, 2025

github-actions bot added the module:core/codecs label Apr 16, 2025

msfroh reviewed Apr 16, 2025

lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java (outdated)

lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java (outdated)

github-actions bot added the module:core/index label Apr 21, 2025

jpountz reviewed Apr 24, 2025

expani marked this pull request as ready for review April 25, 2025 08:24

jpountz approved these changes Apr 25, 2025

github-actions bot removed the module:core/index label Apr 25, 2025

jpountz changed the title ~~Ensuring skip list is read for fields indexed with only DOCS~~ Provide better impacts for fields indexed with IndexOptions.DOCS Apr 25, 2025

ChrisHegarty added this to the 10.2.1 milestone Apr 25, 2025

expani added 8 commits April 25, 2025 07:33

Ensuring skip list is read for fields indexed with only DOCS

aaaabf1

Signed-off-by: expani <anijainc@amazon.com>

refactor

4e0b60e

Signed-off-by: expani <anijainc@amazon.com>

Fixed failing checkIndex tests

51f6ae7

Signed-off-by: expani <anijainc@amazon.com>

Addressed comments

8c482aa

Signed-off-by: expani <anijainc@amazon.com>

Add changes entry

f048684

Signed-off-by: expani <anijainc@amazon.com>

Fixed formatting

ac2650e

Signed-off-by: expani <anijainc@amazon.com>

Moved change log to 10.2.1

cc387ec

Signed-off-by: expani <anijainc@amazon.com>

Rebased and moved to 10.2.1

5e26e62

Signed-off-by: expani <anijainc@amazon.com>

expani force-pushed the perf_term_14445 branch from bb2a980 to 5e26e62 Compare April 25, 2025 14:36

Rebased and moved to 10.2.1

26bc66b

Signed-off-by: expani <anijainc@amazon.com>

Same bug fix for 101PostingsReader

8af1bad

Signed-off-by: expani <anijainc@amazon.com>

Same bug fix for 101PostingsReader

f7b4ce1

Signed-off-by: expani <anijainc@amazon.com>

Formatting

acfbdb1

Signed-off-by: expani <anijainc@amazon.com>

gf2121 added a commit to gf2121/lucene that referenced this pull request Apr 25, 2025

Provide better impacts for fields indexed with IndexOptions.DOCS (apa…

3b06e53

…che#14511) Co-Authored-by: expani <anijainc@amazon.com>

gf2121 added a commit to gf2121/lucene that referenced this pull request Apr 25, 2025

Provide better impacts for fields indexed with IndexOptions.DOCS (apa…

6e922b9

…che#14511) Co-Authored-by: expani <anijainc@amazon.com>

gf2121 added a commit to gf2121/lucene that referenced this pull request Apr 25, 2025

Provide better impacts for fields indexed with IndexOptions.DOCS (apa…

3a13282

…che#14511) Co-Authored-by: expani <anijainc@amazon.com>

gf2121 mentioned this pull request Apr 25, 2025

[Backport] Provide better impacts for fields indexed with IndexOptions.DOCS #14557

Merged

gf2121 added a commit to gf2121/lucene that referenced this pull request Apr 25, 2025

Provide better impacts for fields indexed with IndexOptions.DOCS (apa…

b4dfba0

…che#14511) Co-Authored-by: expani <anijainc@amazon.com>

gf2121 mentioned this pull request Apr 25, 2025

[Backport] Provide better impacts for fields indexed with IndexOptions.DOCS #14558

Merged

Merge branch 'main' into perf_term_14445

7610d9c

gf2121 merged commit d05d6fe into apache:main Apr 25, 2025
2 checks passed

github-project-automation bot moved this from Open to Merged in OpenSearch Lucene & Core Performance Tracking Apr 25, 2025

ChrisHegarty mentioned this pull request Apr 27, 2025

Change in behaviour of totalHitsThreshold with #14511 #14561

Closed

martijnvg mentioned this pull request May 27, 2025

Remove use_default_lucene_postings_format feature flag elastic/elasticsearch#128509

Merged

msfroh mentioned this pull request Oct 1, 2025

[10.3] Fix returned Impacts when frequencies are not indexed #15263

Merged

gf2121 merged 13 commits into apache:main from expani:perf_term_14445

5 participants