Use the new merge executor for intra-merge parallelism #137853

elasticsearchmachine merged 36 commits into elastic:main from

Hi @benwtrent, I've created a changelog YAML for you.

Buildkite benchmark this with so-vector please

💚 Build Succeeded

This build ran two so-vector benchmarks to evaluate performance impact of this PR.

OK, benchmarked again, just on the OpenAI dataset. There seems to be pretty much no impact on the things we are trying to measure (parallel search and indexing). I think this is because we are simply using the same threads as the merge thread pool, which allows intra-merge work to be throttled and partitioned just like a regular merge. So, if only a single merge is happening, it can use all of the merge threads.
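A minimal, self-contained sketch of the idea in the comment above, using only `java.util.concurrent`: intra-merge workers are submitted to the same fixed-size pool that runs merges, so they compete for the same threads and total concurrency can never exceed the pool size. `SharedMergePoolSketch`, `MergeRun`, `runMerge`, and `runAlone` are hypothetical names, and the chunked work loop only stands in for HNSW graph merging; this is not the actual Elasticsearch merge executor or Lucene's intra-merge API, and it models the shared bounded pool only, not merge throttling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: intra-merge workers borrow threads from the same fixed-size
// pool that background merges use, so the pool (not the merge) caps concurrency.
final class SharedMergePoolSketch {

    /** Outcome of one simulated merge: chunks processed and peak concurrency observed. */
    static final class MergeRun {
        final int chunksProcessed;
        final int peakConcurrency;

        MergeRun(int chunksProcessed, int peakConcurrency) {
            this.chunksProcessed = chunksProcessed;
            this.peakConcurrency = peakConcurrency;
        }
    }

    /** Splits a merge into `chunks` work units and submits `requestedWorkers` tasks to the shared pool. */
    static MergeRun runMerge(ExecutorService mergePool, int requestedWorkers, int chunks) {
        AtomicInteger nextChunk = new AtomicInteger();
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<Integer>> workers = new ArrayList<>();
        for (int w = 0; w < requestedWorkers; w++) {
            workers.add(mergePool.submit(() -> {
                int done = 0;
                // Each worker pulls chunks until the merge's work is exhausted.
                while (nextChunk.getAndIncrement() < chunks) {
                    int now = active.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(1); // simulated unit of graph-merge work
                    active.decrementAndGet();
                    done++;
                }
                return done;
            }));
        }
        int total = 0;
        try {
            for (Future<Integer> f : workers) {
                total += f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return new MergeRun(total, peak.get());
    }

    /** Runs a single merge on a fresh pool of `poolSize` threads, then shuts the pool down. */
    static MergeRun runAlone(int poolSize, int requestedWorkers, int chunks) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            return runMerge(pool, requestedWorkers, chunks);
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch, asking a 4-thread pool for 16 workers still runs at most 4 at a time: a pool thread only frees up once its task has found the work exhausted, so the queued extra tasks start late and process nothing. When a merge is the only work in the pool, every pool thread is available to it.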

Pinging @elastic/es-search-relevance (Team:Search Relevance)

```java
public static final Setting<Boolean> INTRA_MERGE_PARALLELISM_ENABLED_SETTING = Setting.boolSetting(
    "index.merge.intra_merge_parallelism_enabled",
    false,
```

Any thoughts on making `true` the default?

@iverase we could. Let me see if the distributed indexing team has any opinion here.

@iverase I am thinking we first release this setting as "preview" and default its value based on whether or not the build is a snapshot.

Sounds like a good plan to me.
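The plan agreed in this thread (ship the setting as a preview, with its default tied to whether the build is a snapshot) can be sketched in plain Java. This is a hypothetical illustration, not the Elasticsearch `Setting` API: `IntraMergeParallelismDefault`, `defaultValue`, and `resolve` are made-up names, the snapshot flag is passed in rather than read from the build, and the strict `"true"`/`"false"` parsing is this sketch's own choice. The diff above still shows `false` as the default.

```java
import java.util.Map;

// Hypothetical sketch of the proposed preview behaviour: the default follows the
// build type (enabled on snapshot builds, disabled on release builds), while an
// explicit index setting always wins. Not the Elasticsearch Setting API.
final class IntraMergeParallelismDefault {

    static final String KEY = "index.merge.intra_merge_parallelism_enabled";

    /** Default under the proposed plan: enabled only on snapshot builds. */
    static boolean defaultValue(boolean isSnapshotBuild) {
        return isSnapshotBuild;
    }

    /** Effective value: an explicitly configured setting overrides the build-dependent default. */
    static boolean resolve(Map<String, String> indexSettings, boolean isSnapshotBuild) {
        String raw = indexSettings.get(KEY);
        if (raw == null) {
            return defaultValue(isSnapshotBuild);
        }
        // This sketch accepts only the exact strings "true" and "false".
        switch (raw) {
            case "true":
                return true;
            case "false":
                return false;
            default:
                throw new IllegalArgumentException("invalid boolean [" + raw + "] for [" + KEY + "]");
        }
    }
}
```

Keeping the default in one small function means flipping it for a release build is a one-line change once the preview period ends.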

Now that we have a nice shiny executor service to serve background merges, let's steal some threads for intra-merge parallelism...
For merges measured in isolation, throwing more threads at the HNSW graph merge gives an almost linear speedup. Of course, this uses more resources.
Benchmarks show that, because we now restrict this work to the same thread pool as merging, there is no impact on resource utilization when indexing and querying on the same nodes.
However, if few merges are running and the user is force-merging, this sucker is a huge improvement. Maybe we should start giving more threads to merges... ;)
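One standard way to reason about the "almost linear" speedup is Amdahl's law, used here purely as an illustrative model rather than anything measured in this PR. If a fraction $p$ of the HNSW graph merge parallelizes across $k$ threads, the speedup $S(k)$ is bounded as below; the value $p = 0.95$ is an assumption chosen only to show the arithmetic:

$$
S(k) = \frac{1}{(1 - p) + \frac{p}{k}}, \qquad
S(4)\Big|_{p = 0.95} = \frac{1}{0.05 + 0.2375} = \frac{1}{0.2875} \approx 3.48
$$

In this model, when many merges share the same pool each merge effectively gets a smaller $k$, so the gain shows up mostly when few merges are running, as in a force-merge.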