ILM can get Stuck in an Infinite Snapshot Create and Delete Loop if Snapshotting Partially Fails

The ILM create snapshot step can become stuck in an endless loop of snapshot create and delete if the snapshot is only partially created.
This seems to result from two behaviors combining: org.elasticsearch.xpack.ilm.IndexLifecycleRunner#maybeRunAsyncAction always executes the next async step when an async step completes, and org.elasticsearch.xpack.core.ilm.CreateSnapshotStep triggers a delete of any partial snapshot it creates. When that delete completes, it triggers the create step again. Even stopping ILM completely does not break this loop, because nothing in the loop checks the ILM execution state.

I think this loop should be broken at least if ILM has been stopped. Maybe in addition to that it would make sense to have a maximum retry count here?
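
To make the proposal concrete, here is a minimal, self-contained sketch of the create/delete cycle with both suggested guards in place. It is a simplified model, not Elasticsearch code: `SnapshotRetryLoopSketch`, `run`, `Outcome`, `Result`, and the `maxAttempts` parameter are all hypothetical names made up for illustration, and the real step chaining in `IndexLifecycleRunner` is asynchronous rather than a plain `while` loop.

```java
import java.util.function.BooleanSupplier;

// Hypothetical model of the loop described above; none of these names exist in Elasticsearch.
final class SnapshotRetryLoopSketch {

    enum Outcome { SUCCEEDED, STOPPED, RETRIES_EXHAUSTED }

    static final class Result {
        final Outcome outcome;
        final int attempts;

        Result(Outcome outcome, int attempts) {
            this.outcome = outcome;
            this.attempts = attempts;
        }

        @Override
        public String toString() {
            return outcome + " after " + attempts + " attempt(s)";
        }
    }

    /**
     * @param createSnapshotFully returns true if the snapshot was created completely,
     *                            false if it was only partially created
     * @param ilmStopped          returns true once ILM has been stopped
     * @param maxAttempts         assumed upper bound on create attempts
     */
    static Result run(BooleanSupplier createSnapshotFully, BooleanSupplier ilmStopped, int maxAttempts) {
        int attempts = 0;
        while (true) {
            // Guard 1: break the loop if ILM has been stopped.
            if (ilmStopped.getAsBoolean()) {
                return new Result(Outcome.STOPPED, attempts);
            }
            // Guard 2: give up after a maximum number of create attempts.
            if (attempts >= maxAttempts) {
                return new Result(Outcome.RETRIES_EXHAUSTED, attempts);
            }
            attempts++;
            if (createSnapshotFully.getAsBoolean()) {
                return new Result(Outcome.SUCCEEDED, attempts);
            }
            // Partial snapshot: the cleanup (delete) would happen here, and its
            // completion sends control back to the create step. Without the two
            // guards above, a persistently partial snapshot never leaves this loop.
        }
    }

    public static void main(String[] args) {
        // Snapshot always ends up partial, ILM keeps running: bounded by maxAttempts.
        System.out.println(run(() -> false, () -> false, 3));
        // Snapshot always ends up partial, but ILM has been stopped: exits immediately.
        System.out.println(run(() -> false, () -> true, 3));
    }
}
```

Removing both guards from `run` reproduces the reported behavior: whenever `createSnapshotFully` keeps returning `false`, the loop never terminates, regardless of whether ILM is stopped.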


ILM can get Stuck in an Infinite Snapshot Create and Delete Loop if Snapshotting Partially Fails #85097
