Optimize memory usage in ShardBulkInferenceActionFilter #124313

Merged

jimczi merged 15 commits into elastic:main from jimczi:shard_bulk_inference_filter_memory

Mar 14, 2025

Contributor

jimczi commented Mar 7, 2025

This refactor improves memory efficiency by processing inference requests in batches, capped by a max input length.
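A minimal Java sketch of byte-capped batching follows. It assumes a simple threshold policy: inputs are appended to the current batch until the batch's accumulated UTF-8 size reaches the cap, so the input that crosses the threshold stays in that batch. `InferenceBatcher`, `batch` and `updateMaxBatchBytes` are hypothetical names, not the filter's real API, and the `AtomicLong` only stands in for the dynamic operator setting this PR introduces.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not the ShardBulkInferenceActionFilter implementation:
// groups inference inputs into batches whose accumulated UTF-8 size is capped.
final class InferenceBatcher {
    // Stand-in for a dynamic setting: can be updated while the node is running.
    private final AtomicLong maxBatchBytes;

    InferenceBatcher(long initialMaxBatchBytes) {
        this.maxBatchBytes = new AtomicLong(initialMaxBatchBytes);
    }

    // Would be wired to a settings-update listener in a real system (assumption).
    void updateMaxBatchBytes(long newValue) {
        maxBatchBytes.set(newValue);
    }

    // Threshold policy: a batch is closed once its size reaches the limit,
    // so the input that crosses the limit stays in that batch.
    List<List<String>> batch(List<String> inputs) {
        long limit = maxBatchBytes.get(); // read once for a consistent limit per call
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String input : inputs) {
            current.add(input);
            currentBytes += input.getBytes(StandardCharsets.UTF_8).length;
            if (currentBytes >= limit) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
        }
        if (!current.isEmpty()) {
            batches.add(current); // flush the final partial batch
        }
        return batches;
    }
}
```

Reading the limit once per call means a setting update takes effect on the next call rather than in the middle of building a batch.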

Changes include:

- A new dynamic operator setting to control the maximum batch size in bytes.
- Dropping input data from inference responses when the legacy semantic text format isn’t used, saving memory.
- Clearing inference results dynamically after each bulk item to free up memory sooner.

This is a step toward enabling circuit breakers to better handle memory usage when dealing with large inputs.
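The second and third changes listed above can be pictured with the following hypothetical sketch. `ChunkResult`, `trim` and `applyAndClear` are illustrative names rather than the classes the PR touches, and the embedding is modeled as a plain `float[]` for simplicity. Input text is dropped unless the legacy format still needs it, and each item's results are released as soon as they have been applied.

```java
import java.util.List;

// Hypothetical sketch of two memory-saving steps; ChunkResult, trim and
// applyAndClear are illustrative names, not classes from the PR.
final class ResultTrimmingSketch {

    // One inference chunk: the original input text plus its embedding.
    record ChunkResult(String input, float[] embedding) {}

    // Keep the input text only when the legacy semantic_text format needs it.
    static ChunkResult trim(ChunkResult result, boolean useLegacyFormat) {
        return useLegacyFormat ? result : new ChunkResult(null, result.embedding());
    }

    // Apply results item by item, nulling each slot once it has been used so
    // the results of item i can be garbage-collected before item i + 1 runs.
    // The list must be mutable and accept nulls (e.g. an ArrayList).
    static int applyAndClear(List<List<ChunkResult>> perItemResults) {
        int applied = 0;
        for (int i = 0; i < perItemResults.size(); i++) {
            List<ChunkResult> results = perItemResults.get(i);
            if (results == null) {
                continue; // item had no inference results
            }
            applied += results.size(); // stand-in for writing results into the item
            perItemResults.set(i, null); // drop the reference early
        }
        return applied;
    }
}
```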


          Optimize memory usage in ShardBulkInferenceActionFilter

8b47105


jimczi added >enhancement :SearchOrg/Relevance :SearchOrg/Inference v8.19.0 v9.1.0 labels

jimczi requested review from Mikep86 and jan-elastic

March 7, 2025 12:34

Collaborator

elasticsearchmachine commented Mar 7, 2025

Pinging @elastic/search-inference-team (Team:Search - Inference)

Collaborator

elasticsearchmachine commented Mar 7, 2025

Pinging @elastic/search-eng (Team:SearchOrg)

Collaborator

elasticsearchmachine commented Mar 7, 2025

Pinging @elastic/search-relevance (Team:Search - Relevance)

Collaborator

elasticsearchmachine commented Mar 7, 2025

Hi @jimczi, I've created a changelog YAML for you.

jimczi mentioned this pull request

Prevent duplicate source parsing in ShardBulkInferenceActionFilter. #124186

Closed

jimczi added the :Search Relevance/Search label

Collaborator

elasticsearchmachine commented Mar 7, 2025

Pinging @elastic/es-search-relevance (Team:Search Relevance)

kderusso approved these changes

Member

kderusso left a comment

Overall changes LGTM

...ain/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilter.java (resolved)

jan-elastic approved these changes

Contributor

jan-elastic left a comment

LGTM. A few small code comments, and I think there's still an inefficiency (though it predates this PR).

...ain/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilter.java (outdated, resolved)

...ain/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilter.java (outdated, resolved)

...ain/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilter.java (outdated, resolved)

...ain/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilter.java

-                                  // ignore delete request
-                                  continue;
+                          if (useLegacyFormat) {
+                              var newDocMap = indexRequest.sourceAsMap();

Contributor

jan-elastic Mar 7, 2025

I'll close my PR; it conflicts badly with this.

I'll check whether this resolves the inefficiency I spotted and let you know.

...ain/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilter.java (outdated, resolved)

...ain/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilter.java (resolved)

Mikep86 approved these changes

Contributor

Mikep86 left a comment

Looks good overall; I left a few non-blocking comments.

...ain/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilter.java (resolved)

...ain/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilter.java (outdated, resolved)

...ain/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilter.java (outdated, resolved)

jimczi added 4 commits

March 7, 2025 16:29


          handle batch size as a real maximum

6cb165b


          Merge remote-tracking branch 'upstream/main' into shard_bulk_inference_filter_memory

76e2e1a


          Revert "handle batch size as a real maximum"

381a85a

This reverts commit 6cb165b.
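The commit above reverts an earlier change titled "handle batch size as a real maximum". The titles alone don't show which policy the merged code uses, but the distinction they draw can be illustrated with a hypothetical sketch: a soft cap closes a batch after the item that reaches the limit, while a hard cap closes it before an item that would exceed the limit. Sizes are passed directly as byte counts, and `BatchCapPolicies` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative comparison of two readings of a byte cap; sizes are given
// directly in bytes. BatchCapPolicies is a hypothetical name.
final class BatchCapPolicies {

    // Soft cap: close the batch after the item that reaches or crosses the limit.
    static List<List<Integer>> softCap(List<Integer> sizes, long limit) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long currentBytes = 0;
        for (int size : sizes) {
            current.add(size);
            currentBytes += size;
            if (currentBytes >= limit) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    // Hard cap: close the batch before an item that would exceed the limit.
    // A single item larger than the limit still gets a batch of its own.
    static List<List<Integer>> hardCap(List<Integer> sizes, long limit) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long currentBytes = 0;
        for (int size : sizes) {
            if (!current.isEmpty() && currentBytes + size > limit) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

With a 10-byte limit and item sizes 4, 7, 2, 1, the soft cap yields [[4, 7], [2, 1]] (the first batch holds 11 bytes), while the hard cap yields [[4], [7, 2, 1]].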


          Address review comments

06a96e9

jimczi force-pushed the shard_bulk_inference_filter_memory branch from 7cb386f to 06a96e9

March 7, 2025 16:55

jimczi and others added 7 commits

March 7, 2025 16:57


          fix max batch_size value

df4c2b9


          address review comment (2)

6b015bd


          remove unused limit

c08051d


          Merge remote-tracking branch 'upstream/main' into shard_bulk_inference_filter_memory

a7aa1d8


          [CI] Auto commit changes from spotless

b902c6a


          Merge branch 'main' into shard_bulk_inference_filter_memory


          fix test compil

038cbaa


          Merge branch 'main' into shard_bulk_inference_filter_memory

e4bd138

jimczi added the auto-backport label


          Update docs/changelog/124313.yaml

883cfeb

Collaborator

elasticsearchmachine commented Mar 14, 2025

Hi @jimczi, I've created a changelog YAML for you.


          Fix changelog

8225c91

jimczi merged commit 361b51d into elastic:main

17 checks passed

jimczi deleted the shard_bulk_inference_filter_memory branch

March 14, 2025 09:51

jimczi mentioned this pull request

[8.x] Optimize memory usage in ShardBulkInferenceActionFilter (#124313) #124863

Merged

Collaborator

elasticsearchmachine commented Mar 14, 2025

💚 Backport successful

Status	Branch	Result
✅	8.x

elasticsearchmachine pushed a commit that referenced this pull request


          Optimize memory usage in ShardBulkInferenceActionFilter (#124313) (#124863)

17e2721


Labels

auto-backport >enhancement :Search Relevance/Search :SearchOrg/Inference :SearchOrg/Relevance v8.19.0 v9.1.0