ILM can get Stuck in an Infinite Snapshot Create and Delete Loop if Snapshotting Partially Fails #85097

@original-brownbear

Description

The ILM create snapshot step can become stuck in an endless loop of snapshot create and delete if the snapshot create only partially succeeds.
This seems to be the result of two behaviors combining: org.elasticsearch.xpack.ilm.IndexLifecycleRunner#maybeRunAsyncAction always executes the next async step when an async step completes, and org.elasticsearch.xpack.core.ilm.CreateSnapshotStep triggers a delete of any partial snapshot it creates. When that delete completes, it triggers the create step again. Stopping ILM completely does not break this loop either, because there is no ILM execution state check anywhere in the loop.

I think this loop should be broken at least when ILM has been stopped. In addition to that, it might make sense to have a maximum retry count here.
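To make the two proposed guards concrete, here is a minimal sketch of a create/delete cycle that checks the ILM state before every attempt and gives up after a fixed number of attempts. It is not Elasticsearch code: the class, the enums, and the method are hypothetical names invented for illustration, the real steps are asynchronous rather than a simple loop, and treating every mode other than RUNNING as a reason to abort is an assumption.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch only; none of these names exist in Elasticsearch. */
final class SnapshotRetryLoopSketch {

    /** Simplified stand-in for the ILM operation mode (assumption). */
    public enum Mode { RUNNING, STOPPING, STOPPED }

    /** How the bounded create/delete cycle ended. */
    public enum Outcome { SUCCESS, ABORTED_ILM_NOT_RUNNING, RETRIES_EXHAUSTED }

    /**
     * Runs the create step at most maxAttempts times.
     *
     * @param currentMode    supplies the current ILM mode, re-read before every attempt
     * @param createSnapshot returns true if the snapshot fully succeeded, false if it was partial
     * @param deletePartial  cleans up the partial snapshot left behind by a failed attempt
     * @param maxAttempts    upper bound on create attempts (the proposed maximum retry count)
     */
    public static Outcome runCreateSnapshot(Supplier<Mode> currentMode,
                                            BooleanSupplier createSnapshot,
                                            Runnable deletePartial,
                                            int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Guard 1: stop cycling as soon as ILM is no longer running.
            if (currentMode.get() != Mode.RUNNING) {
                return Outcome.ABORTED_ILM_NOT_RUNNING;
            }
            if (createSnapshot.getAsBoolean()) {
                return Outcome.SUCCESS;
            }
            // Partial snapshot: delete it, then fall through to the next attempt.
            deletePartial.run();
        }
        // Guard 2: the attempt budget is used up, so surface an error instead of looping.
        return Outcome.RETRIES_EXHAUSTED;
    }
}
```

With a create step that always produces a partial snapshot, this cycle ends after maxAttempts deletes instead of running forever, and it ends immediately once the mode supplier reports anything other than RUNNING.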

Metadata

Assignees

No one assigned

Labels

:Data Management/ILM+SLM, >bug, Team:Data Management (obsolete)
