
[Backport release-1.34] Remove retry cap when waiting for Helm CRDs #7259

Merged
twz123 merged 1 commit into release-1.34 from backport-7252-to-release-1.34
Mar 12, 2026

Conversation

@k0s-bot
Contributor

@k0s-bot k0s-bot commented Mar 11, 2026

Automated backport to release-1.34, triggered by a label in #7252.
See .

@k0s-bot k0s-bot requested review from a team as code owners March 11, 2026 12:54
@k0s-bot k0s-bot requested review from kke and ncopa March 11, 2026 12:54
@twz123 twz123 changed the title [Backport release-1.34] [Backport release-1.35] Remove retry cap when waiting for Helm CRDs Mar 11, 2026
@twz123 twz123 added the bug, area/controlplane, and backport/release-1.33 labels Mar 11, 2026
The extensions controller component relies on the availability of the
Helm CRD. The component's startup is delayed until the CRD becomes
available. The availability check is repeated using retry-go, which
defaults to ten attempts with an exponential backoff. Overall, this will
usually bail out in under a minute, resulting in a k0s restart.

This is problematic if it just takes longer for the CRD to become
available. During initial cluster bootstrapping, the API server
may receive a high volume of traffic, which takes time to process,
especially for less powerful controllers. In these cases, k0s exits
too early.

Another example is a borked leader lease: client-go's leader election
waits for at least the lease duration before trying to acquire an
abandoned lease, even if the last renewal time was way in the past. In
such situations, bailing out too early effectively causes k0s to enter a
crash loop.

Replace the capped retry/backoff with a context-cancellable endless
loop that checks the CRD every two seconds and occasionally issues a
log statement.

See: 05f7867 ("Remove timeout from dynamic cluster config initializer")
Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com>
(cherry picked from commit 0cf4813)
(cherry picked from commit 53079d3)
@twz123 twz123 force-pushed the backport-7252-to-release-1.34 branch from f219a74 to d404cbb Compare March 11, 2026 13:11
@twz123 twz123 enabled auto-merge March 11, 2026 17:26
@twz123 twz123 merged commit 68d22c6 into release-1.34 Mar 12, 2026
198 of 202 checks passed
@twz123 twz123 deleted the backport-7252-to-release-1.34 branch March 12, 2026 07:49
@k0s-bot
Contributor Author

k0s-bot commented Mar 12, 2026

Successfully created backport PR for release-1.33:


Labels

area/controlplane, backport/release-1.33, bug

3 participants