Skip to content

Add node scoped vectors.indexing.use_gpu setting#138738

Merged
mayya-sharipova merged 8 commits intoelastic:mainfrom
mayya-sharipova:gpu_node_setting
Dec 2, 2025
Merged

Add node scoped vectors.indexing.use_gpu setting#138738
mayya-sharipova merged 8 commits intoelastic:mainfrom
mayya-sharipova:gpu_node_setting

Conversation

@mayya-sharipova
Copy link
Contributor

Replace per-index GPU setting with a static node-scoped
vectors.indexing.use_gpu setting supporting true/false/auto modes
(default: auto).

  • Node will fail to start if GPU vector indexing is required (true), but unavailable.
  • GPU will be used for indexing on supported types if setting is true or auto and GPU is available.
Replace per-index GPU setting with a static node-scoped vectors.indexing.use_gpu
setting supporting true/false/auto modes (default: auto). Node will fail to start
if GPU vector indexing is required but unavailable.
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Copy link
Collaborator

Hi @mayya-sharipova, I've created a changelog YAML for you.

@mayya-sharipova mayya-sharipova added the test-gpu Run tests using a GPU label Nov 27, 2025
Copy link
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mayya-sharipova mayya-sharipova removed the test-gpu Run tests using a GPU label Dec 1, 2025
@mayya-sharipova mayya-sharipova added auto-backport Automatically create backport pull requests when merged auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) labels Dec 1, 2025
Copy link
Contributor

@ldematte ldematte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks! Left some questions, mainly about testing.

}
}

public void testSearchWithoutGPU() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want/need to replace this test? (search on the CPU works when GPU disabled?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point.
this test is good to have but may be not in GPUIndexIT, as the cluster in this test is started with GpuMode.TRUE.name.

I will follow up with another PR to add this test in a separate module.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In alternative, it maybe can be a different YAML test (see below). OK to add it as a follow-up.

Settings settings = context.settings();
GpuMode gpuMode = VECTORS_INDEXING_USE_GPU_NODE_SETTING.get(settings);

if (gpuMode == GpuMode.TRUE && GPUSupport.isSupported() == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to check license here too? (isGpuIndexingFeatureAllowed()?)

Copy link
Contributor Author

@mayya-sharipova mayya-sharipova Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this, but decided against it for these reasons:

  • Looks like Boostrap check is designed for infrastructure check, not business logic. I checked all other its implementers and all they only do infrastructure-focused checks (JVM heap, memory, file size etc)
  • License are mutable and updatable through API. Thus we should not fail to start a node without license, because a license can be later updated through an API to enable the GPu feature.

WDYT, @ldematte ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit torn; your points are very good, but I wonder if it's a better user experience if we tell the user right away if they want to use the GPU and don't have a valid license. True, the license can be updated afterwards, but this would be a first check that everything (or most things) are in the right place.

Since I'm not 100% sure what's best, we can leave it this way.

Copy link
Contributor Author

@mayya-sharipova mayya-sharipova Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will raise this for our sync discussion.

.module("gpu")
.setting("xpack.license.self_generated.type", "trial")
.setting("xpack.security.enabled", "false")
.setting("vectors.indexing.use_gpu", "true")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want/need tests for false too?

Copy link
Contributor Author

@mayya-sharipova mayya-sharipova Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am interested to know why you think we need to test for false?
For false we default to CPU and we already have a lot of yml tests outside of GPU plugin for testing how hsnw and int8_hnws work on CPU.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about the test above, "(search on the CPU works when GPU disabled)". Instead of a IT test, it could be a YAML test. (Either one or the other, no need to have both).
It's OK to add either in a follow-up though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add the follow up.

@mayya-sharipova mayya-sharipova removed the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Dec 1, 2025
Copy link
Contributor

@ldematte ldematte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@mayya-sharipova mayya-sharipova merged commit 38de567 into elastic:main Dec 2, 2025
34 checks passed
@mayya-sharipova mayya-sharipova deleted the gpu_node_setting branch December 2, 2025 11:48
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
9.2
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request Dec 2, 2025
Replace per-index GPU setting with a static node-scoped vectors.indexing.use_gpu
setting supporting true/false/auto modes (default: auto). Node will fail to start
if GPU vector indexing is required but unavailable.
elasticsearchmachine pushed a commit that referenced this pull request Dec 2, 2025
Replace per-index GPU setting with a static node-scoped vectors.indexing.use_gpu
setting supporting true/false/auto modes (default: auto). Node will fail to start
if GPU vector indexing is required but unavailable.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged >enhancement :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.2.3 v9.3.0

4 participants