
[8.19] [Failure store] Introduce dedicated failure store lifecycle configuration (#127314) #127577

Merged
elasticsearchmachine merged 4 commits into elastic:8.19 from
gmarouli:backport/8.19/pr-127314
May 3, 2025

Conversation

@gmarouli
Contributor

Backport

This will backport the following commits from main to 8.19:

Questions?

Please refer to the Backport tool documentation

…tion (elastic#127314)

The failure store is a set of data stream indices used to store certain types of ingestion failures. Until now, these indices shared the configuration of the backing indices. We understand that the two data sets have different lifecycle needs.

We believe that failures will typically need to be retained for much less time than the data. Given this, we believe the lifecycle needs of the failures are also more limited, and that they fit the simplicity of the data stream lifecycle feature better.

This allows the user to set only the desired retention, and we perform the rollover and other maintenance tasks without the user having to think about them. Furthermore, having only one lifecycle management feature allows us to ensure that this data is managed by default.

This PR introduces the following:

Configuration

We extend the failure store configuration to also allow lifecycle configuration. The data stream options APIs reflect only the user's configuration, as shown below:

PUT _data_stream/*/options
{
  "failure_store": {
     "lifecycle": {
       "data_retention": "5d"
     }
  }
}

GET _data_stream/*/options

{
  "data_streams": [
    {
      "name": "my-ds",
      "options": {
        "failure_store": {
          "lifecycle": {
            "data_retention": "5d"
          }
        }
      }
    }
  ]
}
To retrieve the effective configuration you need to use the GET data streams API, see elastic#126668
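
As an illustration of the request shape above, here is a minimal Python sketch that builds the method, path, and body for setting the failure store retention on a data stream. The helper name `failure_store_lifecycle_request` is hypothetical and not part of any Elasticsearch client; the sketch only assembles the request and does not send it.

```python
import json


def failure_store_lifecycle_request(data_stream: str, data_retention: str):
    """Build (method, path, body) for a failure store lifecycle update.

    Hypothetical helper: it mirrors the PUT _data_stream/<name>/options
    request shown in this PR and performs no network call.
    """
    body = {
        "failure_store": {
            "lifecycle": {
                "data_retention": data_retention,
            }
        }
    }
    path = f"_data_stream/{data_stream}/options"
    return "PUT", path, json.dumps(body, indent=2)


method, path, body = failure_store_lifecycle_request("my-ds", "5d")
print(method, path)
print(body)
```

The returned triple can be handed to whichever HTTP client is in use; the data stream name `my-ds` and the retention `5d` are taken from the example above.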

Functionality

The data stream lifecycle (DLM) will manage the failure indices regardless of whether the failure store is enabled. This ensures that if the failure store gets disabled, we will not be left with stagnant data.
The data stream options APIs reflect only the user's configuration.
The GET data stream API should be used to check the current state of the effective failure store configuration.
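
To illustrate why the user's configuration and the effective configuration can differ, the following Python sketch resolves an effective retention from three inputs. It assumes the failure store follows the resolution rules documented for the data stream lifecycle (use the configured retention if present, otherwise the global default, and cap the result at the global maximum); this PR description does not spell those rules out, so treat the function as an assumption rather than the actual implementation. The name `effective_retention_millis` is hypothetical.

```python
from typing import Optional

DAY_MILLIS = 24 * 60 * 60 * 1000


def effective_retention_millis(
    configured: Optional[int],
    global_default: Optional[int],
    global_max: Optional[int],
) -> Optional[int]:
    """Resolve an effective retention in milliseconds (assumed rules).

    - Start from the user's configured retention, if any.
    - Otherwise fall back to the global default, if defined.
    - If a global max is defined, it caps the result; it also applies
      when nothing else is set.
    Returns None when no retention applies at all.
    """
    candidate = configured if configured is not None else global_default
    if global_max is not None and (candidate is None or candidate > global_max):
        return global_max
    return candidate


# The user configured 5d; a 30d default exists and no max is defined.
print(effective_retention_millis(5 * DAY_MILLIS, 30 * DAY_MILLIS, None))
# Nothing configured: the default applies.
print(effective_retention_millis(None, 30 * DAY_MILLIS, None))
```

Under these assumptions, the options API would still show only what the user set (or nothing), while the GET data stream API would report the resolved value.
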

Telemetry

We extend the data stream failure store telemetry to also include the lifecycle telemetry.

{
  "data_streams": {
    "available": true,
    "enabled": true,
    "data_streams": 10,
    "indices_count": 50,
    "failure_store": {
      "explicitly_enabled_count": 1,
      "effectively_enabled_count": 15,
      "failure_indices_count": 30,
      "lifecycle": {
        "explicitly_enabled_count": 5,
        "effectively_enabled_count": 20,
        "data_retention": {
          "configured_data_streams": 5,
          "minimum_millis": X,
          "maximum_millis": Y,
          "average_millis": Z
        },
        "effective_retention": {
          "retained_data_streams": 20,
          "minimum_millis": X,
          "maximum_millis": Y,
          "average_millis": Z
        },
        "global_retention": {
          "max": {
            "defined": false
          },
          "default": {
            "defined": true,  <------ this is the default value applicable for the failure store
            "millis": X
          }
        }
      }
    }
  }
}
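
The `data_retention` block in the telemetry above summarizes retention across data streams. A minimal Python sketch of that kind of aggregation follows; the function name `retention_stats` is hypothetical, and how Elasticsearch actually computes or rounds `average_millis` is not stated in this PR, so the plain arithmetic mean used here is an assumption.

```python
from typing import Dict, List, Union

DAY_MS = 24 * 60 * 60 * 1000


def retention_stats(retentions_millis: List[int]) -> Dict[str, Union[int, float]]:
    """Summarize configured retentions, one entry per data stream.

    Produces the same field names as the telemetry example. With no
    configured data streams, only the count is reported.
    """
    count = len(retentions_millis)
    if count == 0:
        return {"configured_data_streams": 0}
    return {
        "configured_data_streams": count,
        "minimum_millis": min(retentions_millis),
        "maximum_millis": max(retentions_millis),
        # Assumption: plain arithmetic mean, no rounding applied.
        "average_millis": sum(retentions_millis) / count,
    }


# Three data streams with failure store retentions of 1d, 5d and 30d.
stats = retention_stats([1 * DAY_MS, 5 * DAY_MS, 30 * DAY_MS])
print(stats)
```

The same helper could be applied to effective retentions to fill the `effective_retention` block, with `retained_data_streams` in place of `configured_data_streams`.
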

Implementation details

We ensure that a partially reset failure store still results in a valid failure store configuration.
We ensure that when a node communicates with a node on a previous version, it does not send an invalid failure store configuration (enabled: null).

(cherry picked from commit 03d7781)

# Conflicts:
#	modules/data-streams/src/main/java/org/elasticsearch/datastreams/DataStreamsPlugin.java
#	modules/data-streams/src/main/java/org/elasticsearch/datastreams/lifecycle/DataStreamLifecycleService.java
#	modules/data-streams/src/test/java/org/elasticsearch/datastreams/lifecycle/DataStreamLifecycleServiceTests.java
#	server/src/main/java/org/elasticsearch/TransportVersions.java
#	server/src/main/java/org/elasticsearch/cluster/metadata/MetadataDataStreamsService.java
#	server/src/test/java/org/elasticsearch/cluster/metadata/DataStreamTests.java
@gmarouli gmarouli added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label May 3, 2025
@elasticsearchmachine elasticsearchmachine merged commit a2e1925 into elastic:8.19 May 3, 2025
15 checks passed
@gmarouli gmarouli deleted the backport/8.19/pr-127314 branch May 3, 2025 11:36

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport v8.19.0

2 participants