[Transform] Retry when failing to start indexer #132048
Merged
prwhelan merged 10 commits into elastic:main on Aug 12, 2025
Bug: when the indexer state fails to save during a cluster state update, the Transform is stuck in STOPPING and cannot be restarted unless the user force stops to delete the task.

Fix: the task will continuously retry starting the indexer until the cluster state update can succeed.

Notes:
- users can cancel the retry by force stopping the transform
- the retry is displayed in the UI as "degraded", with a message explaining why the transform is restarting
- the transform now displays as STARTING rather than STOPPING until it successfully starts
- the retry is audited, so it displays in the Messages tab of the UI
- the retry delay is randomly selected between 45s and 90s; this should help during rolling restarts for clusters that have a large number of transforms

Fix elastic#128221
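For readers skimming the description, the retry behaviour in the notes above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the class `StartRetrySketch`, the `Scheduler` interface, the `tryStart` supplier, and the `forceStop` method are all hypothetical names invented for the example. Only the 45s to 90s randomized delay and the "force stop cancels the retry" rule come from the description.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: retry a start attempt with a randomized delay until it
// succeeds or the caller force stops. Not the actual Transform task code.
final class StartRetrySketch {
    static final long MIN_DELAY_SECONDS = 45;
    static final long MAX_DELAY_SECONDS = 90;

    // Assumed abstraction over a timer so the sketch can be exercised without waiting.
    interface Scheduler {
        void schedule(Runnable task, long delaySeconds);
    }

    private final Scheduler scheduler;
    private final BooleanSupplier tryStart; // returns true once the start succeeds
    private final AtomicBoolean cancelled = new AtomicBoolean(false);
    private final AtomicInteger attempts = new AtomicInteger(0);

    StartRetrySketch(Scheduler scheduler, BooleanSupplier tryStart) {
        this.scheduler = scheduler;
        this.tryStart = tryStart;
    }

    // Picks a delay uniformly in [45, 90] seconds so that many tasks restarting
    // at once do not all retry at the same moment.
    static long nextDelaySeconds() {
        return ThreadLocalRandom.current().nextLong(MIN_DELAY_SECONDS, MAX_DELAY_SECONDS + 1);
    }

    // Attempts to start; on failure, schedules another attempt after a random delay.
    void start() {
        if (cancelled.get()) {
            return; // a force stop cancels any pending retry
        }
        attempts.incrementAndGet();
        if (tryStart.getAsBoolean()) {
            return; // started successfully, no further retries
        }
        scheduler.schedule(this::start, nextDelaySeconds());
    }

    // Models the user force stopping: later scheduled attempts become no-ops.
    void forceStop() {
        cancelled.set(true);
    }

    int attempts() {
        return attempts.get();
    }
}
```

In a real service the `Scheduler` could be backed by a `java.util.concurrent.ScheduledExecutorService`, for example `(task, d) -> executor.schedule(task, d, TimeUnit.SECONDS)`. Injecting it keeps the jitter and cancellation logic easy to exercise without real timers.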
Collaborator
Hi @prwhelan, I've created a changelog YAML for you.
Collaborator
Pinging @elastic/ml-core (Team:ML)
jonathan-buttner
approved these changes
Aug 11, 2025
    params.getId(),
    Strings.format(
        "Failed while starting Transform. Automatically retrying every [%s] seconds. "
            + "To cancel retries, force stop this transform. Failure: [%s]",
Contributor
What do you think about including the force stop command in the message here? Or is it expected that the user would do that via a UI button (I'm imagining it being done from the dev console)?