[ML] Implementing latency improvements for EIS integration #133861
jonathan-buttner merged 12 commits into elastic:main
Conversation
Hi @jonathan-buttner, I've created a changelog YAML for you.

Pinging @elastic/ml-core (Team:ML)
Review threads (resolved):
- ...c/main/java/org/elasticsearch/xpack/core/inference/action/GetInferenceDiagnosticsAction.java
- ...lasticsearch/xpack/core/inference/action/GetInferenceDiagnosticsActionNodeResponseTests.java
- ...ugin/inference/src/main/java/org/elasticsearch/xpack/inference/external/http/HttpClient.java (two threads)
```diff
 public static final Setting<Integer> MAX_ROUTE_CONNECTIONS = Setting.intSetting(
     "xpack.inference.http.max_route_connections",
-    20, // default
+    200, // default
```
10x the default value is quite a step. Can we explore changing this with overrides in the environments where EIS is available?

After doing some more research: allowing more connections will result in more memory and file descriptors being used.

> Can we explore changing this with overrides in the environments where EIS is available?

I suspect that would mean putting in a lot of manual overrides. Maybe we leave these defaults as is for now and add metrics to get a better idea of what typical usage looks like.

When the cluster is located in the same region and provider as EIS, I typically saw ~20 connections in use once connections already existed in the pool. So the first spike of traffic will likely be limited by the 20-connection cap here, and usage should hopefully go down after that.
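The per-environment override idea from this thread could look like the following `elasticsearch.yml` fragment. The setting names come from this PR; applying them only where EIS is co-located (rather than raising global defaults) is the hypothetical alternative being discussed, not what the PR ended up doing:

```yaml
# Hypothetical per-environment override, applied only in deployments
# where EIS is available, instead of changing the shipped defaults:
xpack.inference.http.max_total_connections: 500
xpack.inference.http.max_route_connections: 200
```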
```diff
 static final Setting<TimeValue> RETRY_INITIAL_DELAY_SETTING = Setting.timeSetting(
     "xpack.inference.http.retry.initial_delay",
-    TimeValue.timeValueSeconds(1),
+    TimeValue.timeValueMillis(20),
```
1 second is too slow and a bad default value, but I don't know what a good default is. 20ms is a very short delay; perhaps 100ms? My understanding is that the latency was due to the connection pool configuration and retries weren't really happening, so it would be good to limit the scope of the changes in this PR if possible.

Yeah, I can switch to 100.
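To see why the initial delay matters, here is a minimal sketch of a capped exponential backoff where the first retry waits `initial_delay` and each subsequent retry doubles it. The helper name `delayForAttempt`, the doubling policy, and the 30s cap are assumptions for illustration; only the setting `xpack.inference.http.retry.initial_delay` and the 100ms value come from this discussion:

```java
import java.time.Duration;

public class RetryDelaySketch {
    // Hypothetical backoff: delay(n) = initialDelay * 2^n, capped at maxDelay.
    static Duration delayForAttempt(Duration initialDelay, Duration maxDelay, int attempt) {
        long candidate = initialDelay.toMillis() << attempt; // double per retry
        return Duration.ofMillis(Math.min(candidate, maxDelay.toMillis()));
    }

    public static void main(String[] args) {
        Duration initial = Duration.ofMillis(100); // the 100ms suggested in review
        Duration max = Duration.ofSeconds(30);     // assumed cap for the sketch
        for (int attempt = 0; attempt < 5; attempt++) {
            System.out.println("attempt " + attempt + " waits "
                + delayForAttempt(initial, max, attempt).toMillis() + "ms");
        }
    }
}
```

With a 1s initial delay the first retry alone adds a full second of latency; at 100ms the first few retries stay well under a second combined.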
```java
        timeToWait = TimeValue.min(endpoint.executeEnqueuedTask(), timeToWait);
    }
    // if we execute a task the timeToWait will be 0 so we'll immediately look for more work
} while (timeToWait.compareTo(TimeValue.ZERO) <= 0);
```
Nice, that was a lot easier than we thought it would be.
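The drain-loop pattern above can be sketched in plain Java. `executeEnqueuedTask` here is a stand-in for the real `RequestExecutorService` method: it returns 0 when a task was run (so the loop immediately looks for more work) and a positive wait time when the queue is empty, avoiding the old approach of rescheduling a fresh thread with a 0ms delay:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class DrainLoopSketch {
    // Hypothetical stand-in for endpoint.executeEnqueuedTask(): runs one queued
    // task and returns 0, or returns the default wait when the queue is empty.
    static long executeEnqueuedTask(Deque<Runnable> queue, long defaultWaitMillis) {
        Runnable task = queue.poll();
        if (task == null) {
            return defaultWaitMillis;
        }
        task.run();
        return 0L;
    }

    public static void main(String[] args) {
        Deque<Runnable> queue = new ArrayDeque<>();
        for (int i = 0; i < 3; i++) {
            int n = i;
            queue.add(() -> System.out.println("ran task " + n));
        }
        long timeToWait;
        do {
            // Keep draining as long as work was done (timeToWait == 0),
            // instead of scheduling a new thread for 0 ms.
            timeToWait = executeEnqueuedTask(queue, 50L);
        } while (timeToWait <= 0);
        System.out.println("sleeping " + timeToWait + "ms before next poll");
    }
}
```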
…33861)
* Adding latency improvements
* Update docs/changelog/133861.yaml
* [CI] Auto commit changes from spotless
* Renaming test executor getter and adding response executor
* [CI] Auto commit changes from spotless
* Address feedback

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
This PR implements some of the improvements from here: #133263
Notably:
- Added an `inference_response_thread_pool`
- `clientBuilder.disableConnectionState();`
- `EntityUtils.consumeQuietly(response.getEntity());`
- Increased `max_total_connections` to 500 and `max_route_connections` to 200
- Reduced `xpack.inference.http.retry.initial_delay` to 20ms from 1 second
- Looping in `RequestExecutorService` when work was accomplished, instead of scheduling a new thread for 0 ms
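The dedicated response thread pool can be illustrated with a small stdlib-only sketch: response parsing is handed off to a named pool so the HTTP client's IO threads are freed immediately. Everything here (`ResponseThreadPoolSketch`, `parseOnResponsePool`, the pool size of 4) is hypothetical; only the pool name `inference_response_thread_pool` comes from the PR description:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ResponseThreadPoolSketch {
    // Hypothetical stand-in for the new dedicated response pool: parsing runs
    // here instead of on the HTTP client's IO reactor threads.
    static String parseOnResponsePool(String rawBody) throws Exception {
        ExecutorService responsePool = Executors.newFixedThreadPool(4, r -> {
            Thread t = new Thread(r, "inference_response_thread_pool");
            t.setDaemon(true);
            return t;
        });
        try {
            return CompletableFuture
                .supplyAsync(() -> rawBody.trim() + " (parsed on " + Thread.currentThread().getName() + ")", responsePool)
                .get();
        } finally {
            responsePool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseOnResponsePool(" {\"result\": \"ok\"} "));
    }
}
```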