Reload the keystore for Elasticsearch 9.3 without node restarts #8922

@pebrc

Description

The changes in elastic/elasticsearch#138052 allow us to revisit the keystore implementation.

The POST /_nodes/reload_secure_settings API now returns:

{
  "cluster_name": "my-cluster",
  "nodes": {
    "node-id-1": {
      "name": "node-1",
      "secure_setting_names": ["s3.client.default.access_key", "..."],
      "keystore_path": "/usr/share/elasticsearch/config/elasticsearch.keystore",
      "keystore_digest": "a3f2e9...",  ← SHA-256 hash for verification
      "keystore_last_modified_time": "2025-11-20T10:30:00Z"
    }
  }
}
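On the operator side, decoding this response could look roughly like the sketch below. The Go type and function names (`ReloadResponse`, `NodeReload`, `ParseReloadResponse`) are hypothetical; only the JSON keys are taken from the example above, and the digest value in `main` is a placeholder. The timestamp is kept as a string because the exact format is not confirmed here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ReloadResponse mirrors the response shape shown in the issue.
// Type and field names are hypothetical; only the JSON keys come from the example.
type ReloadResponse struct {
	ClusterName string                `json:"cluster_name"`
	Nodes       map[string]NodeReload `json:"nodes"`
}

// NodeReload holds the per-node keystore information.
type NodeReload struct {
	Name                     string   `json:"name"`
	SecureSettingNames       []string `json:"secure_setting_names"`
	KeystorePath             string   `json:"keystore_path"`
	KeystoreDigest           string   `json:"keystore_digest"`
	KeystoreLastModifiedTime string   `json:"keystore_last_modified_time"`
}

// ParseReloadResponse decodes the body of a reload_secure_settings call.
func ParseReloadResponse(body []byte) (ReloadResponse, error) {
	var r ReloadResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	// Sample body; "placeholder-digest" stands in for a real SHA-256 value.
	body := []byte(`{
	  "cluster_name": "my-cluster",
	  "nodes": {
	    "node-id-1": {
	      "name": "node-1",
	      "secure_setting_names": ["s3.client.default.access_key"],
	      "keystore_path": "/usr/share/elasticsearch/config/elasticsearch.keystore",
	      "keystore_digest": "placeholder-digest",
	      "keystore_last_modified_time": "2025-11-20T10:30:00Z"
	    }
	  }
	}`)

	resp, err := ParseReloadResponse(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for id, node := range resp.Nodes {
		fmt.Printf("%s (%s): digest=%s\n", node.Name, id, node.KeystoreDigest)
	}
}
```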

The Problem

Currently, updating keystore secrets triggers a rolling restart of all Elasticsearch pods:

  1. Secret hash is included in pod annotations
  2. When secret changes → hash changes → pod restart triggered
  3. This can cause temporary unavailability and is slow for large clusters
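
For context, the hash-in-annotation mechanism can be illustrated roughly as follows. This is a simplified sketch and not the operator's actual code: `HashSecretData` and the annotation key are hypothetical names, and the real hashing scheme may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// SecureSettingsHashAnnotation is a hypothetical annotation key used for illustration.
const SecureSettingsHashAnnotation = "example.org/secure-settings-hash"

// HashSecretData returns a deterministic SHA-256 hash over the secret's
// keys and values. Keys are sorted so map iteration order does not matter.
func HashSecretData(data map[string][]byte) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0}) // separator between key and value
		h.Write(data[k])
		h.Write([]byte{0}) // separator between entries
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	before := map[string][]byte{"s3.client.default.access_key": []byte("old-value")}
	after := map[string][]byte{"s3.client.default.access_key": []byte("new-value")}

	// The hash stored on the pod template was computed from the old secret.
	podAnnotations := map[string]string{SecureSettingsHashAnnotation: HashSecretData(before)}

	// A changed secret yields a different hash, so the pod template changes
	// and the pods are restarted one by one.
	restart := podAnnotations[SecureSettingsHashAnnotation] != HashSecretData(after)
	fmt.Println("pod template changed, rolling restart needed:", restart)
}
```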

The reason for this implementation was that the operator had no way of knowing the current state of the keystore inside the pod. If we had had a way to continuously update the keystore, we would have needed a way to know when to call the reload API so that all nodes had a consistent view. The 9.3 API change now makes this possible.

The Idea

For Elasticsearch 9.3+:

  1. Move away from the init container based keystore creation
  2. Use a k8s job to create a keystore file that is then shared via a secret mount with all Elasticsearch nodes
  3. Use the enhanced reload_secure_settings API that returns keystore digests
  4. Retry/requeue the reload call until all nodes report the expected digest (convergence)
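
The convergence step (4) could be sketched as below, assuming the operator already knows the digest of the keystore it built. Everything here is hypothetical: `Reloader` stands in for a call to the reload API that returns the digest reported by each node, and a real implementation would also take a context and an Elasticsearch client. Nodes missing from the response are treated as not yet converged.

```go
package main

import (
	"fmt"
	"sort"
)

// Reloader calls the reload API and returns the keystore digest reported by
// each node, keyed by node name. Hypothetical signature for illustration.
type Reloader func() (map[string]string, error)

// Result describes the outcome of one reconciliation attempt.
type Result struct {
	Requeue bool     // true if the reload should be retried later
	Pending []string // nodes that do not yet report the expected digest
}

// ReloadOnce triggers one reload and checks every expected node against the
// expected digest.
func ReloadOnce(reload Reloader, expectedNodes []string, expectedDigest string) (Result, error) {
	observed, err := reload()
	if err != nil {
		return Result{Requeue: true}, err
	}
	var pending []string
	for _, node := range expectedNodes {
		// A node absent from the response yields "" and counts as pending.
		if observed[node] != expectedDigest {
			pending = append(pending, node)
		}
	}
	sort.Strings(pending)
	return Result{Requeue: len(pending) > 0, Pending: pending}, nil
}

func main() {
	// Fake reloader: node-2 only sees the new keystore on the second call.
	calls := 0
	fake := func() (map[string]string, error) {
		calls++
		digests := map[string]string{"node-1": "digest-new", "node-2": "digest-old"}
		if calls > 1 {
			digests["node-2"] = "digest-new"
		}
		return digests, nil
	}

	nodes := []string{"node-1", "node-2"}
	for attempt := 1; attempt <= 3; attempt++ {
		res, err := ReloadOnce(fake, nodes, "digest-new")
		if err != nil {
			fmt.Println("reload failed:", err)
			continue
		}
		fmt.Printf("attempt %d: requeue=%v pending=%v\n", attempt, res.Requeue, res.Pending)
		if !res.Requeue {
			break
		}
	}
}
```

In a controller the loop in `main` would not exist; a reconcile would call something like `ReloadOnce` and requeue itself while `Requeue` is true.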

Open questions

  1. How to extract the keystore file from the k8s job (exec into the job's pod, or give the job write permissions for a secret?)
  2. Tracking: we probably need to track the last seen digest in an annotation or status to avoid having to reload the keystore continuously
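
For the tracking question, one possible shape is a digest stored in an annotation and compared before each reload. This is only a sketch: the annotation key and the helper names `NeedsReload` and `RecordDigest` are made up for illustration, and a status field would work the same way.

```go
package main

import "fmt"

// KeystoreDigestAnnotation is a hypothetical key; the issue does not name one.
const KeystoreDigestAnnotation = "example.org/last-reloaded-keystore-digest"

// NeedsReload reports whether the desired keystore digest differs from the
// digest recorded after the last successful, fully converged reload.
func NeedsReload(annotations map[string]string, desiredDigest string) bool {
	return annotations[KeystoreDigestAnnotation] != desiredDigest
}

// RecordDigest returns a copy of the annotations with the converged digest
// recorded, leaving the input map untouched.
func RecordDigest(annotations map[string]string, digest string) map[string]string {
	out := make(map[string]string, len(annotations)+1)
	for k, v := range annotations {
		out[k] = v
	}
	out[KeystoreDigestAnnotation] = digest
	return out
}

func main() {
	annotations := map[string]string{}

	// First reconcile: nothing recorded yet, so a reload is needed.
	fmt.Println("needs reload:", NeedsReload(annotations, "digest-1"))

	// After all nodes converged on digest-1, record it.
	annotations = RecordDigest(annotations, "digest-1")
	fmt.Println("needs reload:", NeedsReload(annotations, "digest-1"))

	// A new keystore build produces a new digest and triggers a reload again.
	fmt.Println("needs reload:", NeedsReload(annotations, "digest-2"))
}
```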

Alternatives

  1. Reimplement the keystore builder in Go in the operator (not recommended by the ES team because the format changes over time)
  2. Implement a sidecar to reload the keystore (extra resources for each node, idle most of the time, and it must be present even when no secure settings are configured to avoid a restart when they are added later)

Labels

>enhancement: Enhancement of existing functionality
