Use the new merge executor for intra-merge parallelism #137853

Merged
elasticsearchmachine merged 36 commits into elastic:main from benwtrent:exp/multi-threaded-merging on Dec 16, 2025

Conversation

@benwtrent
Member

@benwtrent benwtrent commented Nov 10, 2025

Now that we have a nice shiny executor service to serve background merges, let's steal some threads for intra-merge parallelism...

For merge-isolated actions, throwing more threads at the HNSW graph merge gives an almost linear speed improvement. Of course, this uses more resources.

Benchmarks show that, because we now restrict the thread pool to be the same as the merging pool, there is no impact on resource utilization when indexing & querying on the same nodes.

However, if there aren't many merges occurring, and the user is force-merging, this sucker is a huge improvement. Maybe we should start giving more threads to merges... ;)
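
As a rough illustration of the idea above, here is a minimal sketch of splitting one merge's work into slices and running them on a shared executor. It uses only `java.util.concurrent`; the class `IntraMergeSketch`, the method `mergeInParallel`, and the summing "work" are hypothetical stand-ins, not the Elasticsearch or Lucene implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

// Hypothetical sketch: names are illustrative, not Elasticsearch or Lucene APIs.
class IntraMergeSketch {

    // Splits [0, totalItems) into up to `slices` ranges and processes them on `executor`.
    // Summing indices is a placeholder for real per-slice merge work.
    static long mergeInParallel(int totalItems, int slices, Executor executor)
            throws InterruptedException, ExecutionException {
        List<FutureTask<Long>> tasks = new ArrayList<>();
        int chunk = (totalItems + slices - 1) / slices; // ceiling division
        for (int start = 0; start < totalItems; start += chunk) {
            final int from = start;
            final int to = Math.min(totalItems, start + chunk);
            FutureTask<Long> task = new FutureTask<>(() -> {
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += i;
                }
                return sum;
            });
            tasks.add(task);
            executor.execute(task);
        }
        // Wait for every slice and combine the partial results.
        long total = 0;
        for (FutureTask<Long> task : tasks) {
            total += task.get();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a fixed pool of 4 threads plays the role of the merge pool.
        ExecutorService mergePool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(mergeInParallel(1_000, 4, mergePool));
        } finally {
            mergePool.shutdown();
        }
    }
}
```

With few concurrent merges, a single large merge in this model can occupy every thread in the pool, which is where the force-merge gain would come from.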

@elasticsearchmachine
Collaborator

Hi @benwtrent, I've created a changelog YAML for you.

@benwtrent
Member Author

Buildkite benchmark this with so-vector please

@elasticmachine
Collaborator

elasticmachine commented Nov 11, 2025

💚 Build Succeeded

This build ran two so-vector benchmarks to evaluate performance impact of this PR.

@benwtrent
Member Author

Benchmarks show nice improvement:

image

Need to benchmark with parallel search

@benwtrent
Member Author

OK, benchmarked again, just on OpenAI dataset. There seems to be pretty much no impact on things we are trying to measure (parallel search & indexing). I think this is because we are simply using the same threads as the merging thread pool, which allows things to be throttled and partitioned just like a regular merge. So, if just a single merge was happening, it would use all the merge threads.

candidate:
https://gist.github.com/benwtrent/5ba71f2017257779475b58c34336d9b3#file-multi_candidate-txt

baseline:
https://gist.githubusercontent.com/benwtrent/0c0c1e16bb82acab558c98e1863421e1/raw/7ce4f80fa204c5ef9625d86e031adb4eaa527cd8/baseline.txt
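
The throttling argument in this comment can be sketched with a toy model: when all work goes through one fixed-size pool, the number of tasks running at once cannot exceed the pool size, no matter how many tasks are queued. The class `SharedMergePoolSketch`, the method `peakConcurrency`, and the short sleep standing in for merge work are all hypothetical; this is not how Elasticsearch measures or schedules merges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a toy model of a shared, bounded merge pool.
class SharedMergePoolSketch {

    // Runs `taskCount` short tasks on a pool of `poolSize` threads and
    // returns the highest number of tasks observed running at the same time.
    static int peakConcurrency(int poolSize, int taskCount) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                futures.add(pool.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(5); // placeholder for a slice of merge work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            for (Future<?> future : futures) {
                future.get(); // wait for all tasks to finish
            }
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }
}
```

Calling `SharedMergePoolSketch.peakConcurrency(4, 32)` never reports more than 4, which mirrors why extra intra-merge tasks would not take threads beyond the merge pool's own budget.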

@benwtrent benwtrent marked this pull request as ready for review December 8, 2025 17:55
@benwtrent benwtrent requested a review from a team as a code owner December 8, 2025 17:55
@benwtrent benwtrent requested review from iverase and thecoop December 8, 2025 17:56
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance label (Meta label for the Search Relevance team in Elasticsearch) Dec 8, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Contributor

@iverase iverase left a comment

This PR gives a good overview of how we hook index settings to vector formats. It looks good to me; I am just wondering if we need a test to check that the setting is propagated correctly to the format.


public static final Setting<Boolean> INTRA_MERGE_PARALLELISM_ENABLED_SETTING = Setting.boolSetting(
"index.merge.intra_merge_parallelism_enabled",
false,
Contributor

Any thoughts on making true the default?

Member Author

@iverase we could. Let me see if distrib indexing has any opinion here.

Member Author

@iverase I am thinking we first release this setting as "preview" and choose its default value depending on whether the build is a "snapshot" build.

Contributor

sounds like a good plan to me.
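
The plan agreed in this thread could look roughly like the following self-contained sketch. It assumes the default is enabled on snapshot builds and disabled on release builds, which the thread does not state explicitly. The setting key comes from the diff excerpt above; the class `PreviewDefaultSketch`, the method `resolve`, and the plain `Map` of settings are hypothetical and do not use the Elasticsearch `Setting` API.

```java
import java.util.Map;

// Hypothetical sketch: models a build-dependent default with plain Java types.
class PreviewDefaultSketch {

    // Key taken from the diff excerpt in the review thread.
    static final String KEY = "index.merge.intra_merge_parallelism_enabled";

    // Returns the explicit value if the index sets one; otherwise falls back to
    // a default that is ASSUMED to be true on snapshot builds and false on releases.
    static boolean resolve(Map<String, String> indexSettings, boolean isSnapshotBuild) {
        String explicit = indexSettings.get(KEY);
        if (explicit != null) {
            return Boolean.parseBoolean(explicit);
        }
        return isSnapshotBuild;
    }
}
```

An explicit value always wins in this model, so a user on a release build could still opt in, and a user on a snapshot build could opt out.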

@github-actions
Contributor

github-actions bot commented Dec 9, 2025

🔍 Preview links for changed docs

@github-actions
Contributor

github-actions bot commented Dec 9, 2025

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

🤔 Need help?

@benwtrent benwtrent requested a review from iverase December 9, 2025 17:22
Contributor

@iverase iverase left a comment

LGTM

@benwtrent benwtrent added the auto-merge-without-approval label (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!) Dec 12, 2025
@elasticsearchmachine elasticsearchmachine added the serverless-linked label (Added by automation, don't add manually) Dec 15, 2025
@elasticsearchmachine elasticsearchmachine merged commit 75770b2 into elastic:main Dec 16, 2025
35 checks passed
@benwtrent benwtrent deleted the exp/multi-threaded-merging branch December 16, 2025 20:21
breskeby pushed a commit to breskeby/elasticsearch that referenced this pull request Feb 11, 2026
breskeby pushed a commit to breskeby/elasticsearch that referenced this pull request Feb 11, 2026

Labels

auto-merge-without-approval - Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!)
:Distributed/Engine - Anything around managing Lucene and the Translog in an open shard.
>enhancement
:Search Relevance/Vectors - Vector search
serverless-linked - Added by automation, don't add manually
Team:Distributed Indexing (obsolete) - Meta label for Distributed Indexing team. Obsolete. Please do not use.
Team:Search Relevance - Meta label for the Search Relevance team in Elasticsearch
v9.3.0

4 participants