Add DirectIO bulk rescoring #135380

Merged
elasticsearchmachine merged 13 commits into elastic:main from benwtrent:direct-io-bulk-scoring
Oct 13, 2025

@benwtrent
Member

This adds DirectIO bulk rescoring: vectors are prefetched in batches and then bulk scored with the random vector scorer.

Without #134803, this doesn't do much.

I haven't really optimized the batch sizes. I'm sure we can pick something better given knowledge of the IO capabilities of the underlying system.
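
For readers who want the shape of the idea, here is a minimal sketch of "prefetch a batch, then bulk score it". It is not the code in this PR: `BulkRescoreSketch`, `VectorSource`, `InMemoryVectorSource`, and the `batchSize` parameter are hypothetical names made up for illustration, the in-memory source only counts prefetch calls instead of issuing real direct IO reads, and a plain dot product stands in for the random vector scorer.

```java
import java.util.Arrays;

final class BulkRescoreSketch {

    /** Hypothetical stand-in for a vector store read with direct IO; not a Lucene or Elasticsearch type. */
    interface VectorSource {
        /** Hint that candidates[from, to) will be read soon, so the reads can be issued together. */
        void prefetch(int[] candidates, int from, int to);

        /** Returns the vector stored for one ordinal. */
        float[] vector(int ord);
    }

    /** In-memory source used only to make the sketch runnable; it just counts prefetch calls. */
    static final class InMemoryVectorSource implements VectorSource {
        private final float[][] vectors;
        int prefetchCalls = 0;

        InMemoryVectorSource(float[][] vectors) {
            this.vectors = vectors;
        }

        @Override
        public void prefetch(int[] candidates, int from, int to) {
            prefetchCalls++;
        }

        @Override
        public float[] vector(int ord) {
            return vectors[ord];
        }
    }

    /** Rescores the candidates against the query, one batch at a time. */
    static float[] rescore(VectorSource source, float[] query, int[] candidates, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        float[] scores = new float[candidates.length];
        for (int from = 0; from < candidates.length; from += batchSize) {
            int to = Math.min(from + batchSize, candidates.length);
            // Step 1: announce the whole batch up front so its reads can overlap.
            source.prefetch(candidates, from, to);
            // Step 2: score every vector of the batch in one call.
            bulkScore(source, query, candidates, from, to, scores);
        }
        return scores;
    }

    /** Dot product used as a simple placeholder for the real scorer. */
    private static void bulkScore(VectorSource source, float[] query, int[] candidates,
                                  int from, int to, float[] scores) {
        for (int i = from; i < to; i++) {
            float[] v = source.vector(candidates[i]);
            float dot = 0f;
            for (int d = 0; d < query.length; d++) {
                dot += query[d] * v[d];
            }
            scores[i] = dot;
        }
    }

    public static void main(String[] args) {
        float[][] vectors = {{1f, 0f}, {0f, 1f}, {2f, 2f}, {3f, -1f}, {0.5f, 0.5f}};
        InMemoryVectorSource source = new InMemoryVectorSource(vectors);
        float[] scores = rescore(source, new float[] {1f, 1f}, new int[] {4, 2, 0}, 2);
        System.out.println(Arrays.toString(scores) + " scored in " + source.prefetchCalls + " batches");
    }
}
```

In this sketch the batch size is just a caller-supplied number, which mirrors the open question above: a good value presumably depends on how many reads the underlying storage can serve concurrently.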

@elasticsearchmachine
Collaborator

Hi @benwtrent, I've created a changelog YAML for you.

@elasticsearchmachine added the Team:Search Relevance label (Meta label for the Search Relevance team in Elasticsearch) Sep 24, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

}
}

static class Lucene99FlatBulkScoringVectorsReader extends FlatVectorsReader {
Member

So much ceremony required...

Member Author

@thecoop yes, some of it should go away with Lucene 10.4. The nice thing is that the top-level format name remains unchanged, so it's an easy removal once the new Lucene is released.

Member

Is there a PR for the corresponding changes on lucene_snapshot?

Member Author

@thecoop no, not yet.

@benwtrent requested a review from a team as a code owner October 10, 2025 18:53
@benwtrent requested a review from thecoop October 10, 2025 18:53
@benwtrent added the auto-merge-without-approval label (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!) Oct 13, 2025
@elasticsearchmachine merged commit b98f8fa into elastic:main Oct 13, 2025
34 checks passed
@benwtrent deleted the direct-io-bulk-scoring branch October 13, 2025 21:53
<inspection_tool class="jol" enabled="false" level="WARNING" enabled_by_default="false" />
</profile>
</component>
</component> No newline at end of file
Contributor

revert this

Kubik42 pushed a commit to Kubik42/elasticsearch that referenced this pull request Oct 16, 2025

Labels

auto-merge-without-approval (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!)
>enhancement
:Search Relevance/Vectors (Vector search)
Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch)
v9.3.0

4 participants