Straightforward bug:
These 3 maps:
private final SortedMap<String, LifecyclePolicyMetadata> lifecyclePolicyMap;
// keeps track of what the first step in a policy is, the key is policy name
private final Map<String, Step> firstStepMap;
// keeps track of a mapping from policy/step-name to respective Step, the key is policy name
private final Map<String, Map<Step.StepKey, Step>> stepMap;
are all plain hash or tree maps and get updated on every cluster state update on the cluster state applier thread, but are accessed concurrently from at least the CS threads and the ILM scheduler threads.
We need to do something to synchronize them correctly, or else at least the periodic execution will break in subtle ways here and there.
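
One possible direction, as a rough sketch only and not the actual registry code: build all three maps into a single immutable snapshot on the applier thread and publish it through a `volatile` field, so readers on other threads always see a consistent set. The names `PolicyRegistrySketch`, `Snapshot`, `update` and `firstStep` are hypothetical, and plain `String` values stand in for `LifecyclePolicyMetadata`, `Step` and `Step.StepKey`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch, not the real registry class. String is a placeholder
// for LifecyclePolicyMetadata, Step and Step.StepKey.
final class PolicyRegistrySketch {

    // Immutable copy of all three maps, built together so a reader never
    // observes one map updated and another still stale.
    static final class Snapshot {
        final SortedMap<String, String> lifecyclePolicyMap;
        final Map<String, String> firstStepMap;
        final Map<String, Map<String, String>> stepMap;

        Snapshot(SortedMap<String, String> policies,
                 Map<String, String> firstSteps,
                 Map<String, Map<String, String>> steps) {
            // Defensive copies: later changes to the caller's maps do not leak in.
            this.lifecyclePolicyMap = Collections.unmodifiableSortedMap(new TreeMap<>(policies));
            this.firstStepMap = Collections.unmodifiableMap(new HashMap<>(firstSteps));
            Map<String, Map<String, String>> copy = new HashMap<>();
            for (Map.Entry<String, Map<String, String>> e : steps.entrySet()) {
                copy.put(e.getKey(), Collections.unmodifiableMap(new HashMap<>(e.getValue())));
            }
            this.stepMap = Collections.unmodifiableMap(copy);
        }
    }

    // The volatile write in update() makes the fully built snapshot visible to reader threads.
    private volatile Snapshot current =
        new Snapshot(new TreeMap<>(), new HashMap<>(), new HashMap<>());

    // Assumed to be called from a single writer thread (the applier thread in this scenario).
    void update(SortedMap<String, String> policies,
                Map<String, String> firstSteps,
                Map<String, Map<String, String>> steps) {
        current = new Snapshot(policies, firstSteps, steps);
    }

    // Readers that need several lookups should take the snapshot once and reuse it.
    Snapshot snapshot() {
        return current;
    }

    // Single lookup against whatever snapshot is current at the time of the call.
    String firstStep(String policyName) {
        return current.firstStepMap.get(policyName);
    }
}
```

Swapping each map for a `ConcurrentHashMap` would make the individual maps safe, but a reader could still see the three maps at different versions, which is why the sketch publishes them as one unit.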