Semantic Text Chunking Indexing Pressure #125517
Merged
Mikep86 merged 54 commits into elastic:main, Apr 14, 2025
Conversation
Mikep86 commented on Mar 24, 2025 (review comment, truncated): …emitted to XContent
x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceException.java
Mikep86 (Contributor, Author) commented: @elasticmachine update branch
davidkyle approved these changes on Apr 10, 2025
Collaborator commented: Hi @Mikep86, I've created a changelog YAML for you.
Collaborator commented: Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)
Collaborator commented: 💔 Backport failed. You can use sqren/backport to backport manually.
Mikep86 (Contributor, Author) commented: 💚 All backports created successfully. Questions? Please refer to the Backport tool documentation.
Mikep86 added a commit to Mikep86/elasticsearch that referenced this pull request on Apr 28, 2025:
We have observed many OOMs due to the memory required to inject chunked inference results for `semantic_text` fields. This PR uses coordinating indexing pressure to account for this memory usage. When indexing pressure memory usage exceeds the threshold set by `indexing_pressure.memory.limit`, chunked inference result injection will be suspended to prevent OOMs.

(cherry picked from commit 85713f7)

# Conflicts:
#	server/src/main/java/org/elasticsearch/node/NodeConstruction.java
#	server/src/main/java/org/elasticsearch/node/PluginServiceInstances.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferencePlugin.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilterTests.java
elasticsearchmachine pushed a commit that referenced this pull request on Apr 28, 2025:
* Semantic Text Chunking Indexing Pressure (#125517)

We have observed many OOMs due to the memory required to inject chunked inference results for `semantic_text` fields. This PR uses coordinating indexing pressure to account for this memory usage. When indexing pressure memory usage exceeds the threshold set by `indexing_pressure.memory.limit`, chunked inference result injection will be suspended to prevent OOMs.

(cherry picked from commit 85713f7)

# Conflicts:
#	server/src/main/java/org/elasticsearch/node/NodeConstruction.java
#	server/src/main/java/org/elasticsearch/node/PluginServiceInstances.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferencePlugin.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilterTests.java

* [CI] Auto commit changes from spotless

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
We have observed many OOMs due to the memory required to inject chunked inference results for `semantic_text` fields. This PR uses coordinating indexing pressure to account for this memory usage. When indexing pressure memory usage exceeds the threshold set by `indexing_pressure.memory.limit`, chunked inference result injection will be suspended to prevent OOMs.
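The accounting scheme described above can be sketched as a simple byte budget: each chunked inference result reserves its estimated size before injection, and once the reserved total would exceed the configured limit, new injections are refused (suspended) instead of allocating further memory. This is a minimal, hypothetical model — the class and method names below are illustrative and do not match the actual Elasticsearch indexing-pressure API.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of coordinating memory accounting for chunked
// inference results. The limit plays the role of the threshold that
// indexing_pressure.memory.limit configures in the real system.
class InferencePressureSketch {
    private final long memoryLimitBytes;
    private final AtomicLong inFlightBytes = new AtomicLong();

    InferencePressureSketch(long memoryLimitBytes) {
        this.memoryLimitBytes = memoryLimitBytes;
    }

    // Try to reserve memory for one chunked inference result.
    // Returns false when the reservation would exceed the limit,
    // signaling that injection should be suspended.
    boolean tryReserve(long estimatedBytes) {
        long updated = inFlightBytes.addAndGet(estimatedBytes);
        if (updated > memoryLimitBytes) {
            inFlightBytes.addAndGet(-estimatedBytes); // roll back the reservation
            return false;
        }
        return true;
    }

    // Release the reservation once the result has been written into the document.
    void release(long estimatedBytes) {
        inFlightBytes.addAndGet(-estimatedBytes);
    }

    long inFlightBytes() {
        return inFlightBytes.get();
    }
}
```

Under this model, a coordinating node admits injections while the budget holds, rejects them at the threshold, and admits them again as completed injections release their reservations — which is the suspend-and-resume behavior the PR description implies.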