[Fleet] use long expiration for upgrade agents#243443
[Fleet] use long expiration for upgrade agents#243443juliaElastic merged 5 commits intoelastic:mainfrom
Conversation
|
Pinging @elastic/fleet (Team:Fleet) |
| MINIMUM_EXECUTION_DURATION_SECONDS, | ||
| upgradeDurationSeconds | ||
| ); | ||
| // expiration will not be taken into account with Fleet Server version >=8.7, it is kept for BWC |
There was a problem hiding this comment.
expiration will not be taken into account with Fleet Server
It is really the case?
There was a problem hiding this comment.
I think it was never changed, probably that comment is no longer true.
I think we should just use 1 month expiration everywhere to avoid early expiration for scheduled upgrade actions too.
Related issue: #170879
Originally fleet-server used expiration to calculate the schedule upgrades, but this was changed here: elastic/fleet-server#2243
💛 Build succeeded, but was flaky
Failed CI StepsMetrics [docs]
History
|
|
Starting backport for target branches: 9.1, 9.2 |
## Summary Relates elastic/ingest-dev#6421 (comment) Auto upgrade actions currently use a short expiration (20m), which is too short if some of the agents can't checkin for some time to receive the action. If this window is missed, the updating state becomes stuck and not retried ever. Instead using a 1 month expiration like we do for manual upgrades. Also made the change to use 1 month expiration for scheduled upgrades. Expiration is no longer used to calculate scheduled upgrades offset on [fleet-server](elastic/fleet-server#2243) side for a long time, so there is no need to use a different expiration. ### Checklist Check the PR satisfies following conditions. Reviewers should verify this PR satisfies this list as well. - [ ] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md) - [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [ ] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios - [ ] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the [docker list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker) - [ ] This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The `release_note:breaking` label should be applied in these situations. - [ ] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed - [ ] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process) - [ ] Review the [backport guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing) and apply applicable `backport:*` labels. ### Identify risks Does this PR introduce any risks? For example, consider risks like hard to test bugs, performance regression, potential of data loss. Describe the risk, its severity, and mitigation for each identified risk. Invite stakeholders and evaluate how to proceed before merging. - [ ] [See some risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) - [ ] ... --------- Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com> (cherry picked from commit 022344d)
💔 Some backports could not be created
Note: Successful backport PRs will be merged automatically after passing CI. Manual backportTo create the backport manually run: Questions ?Please refer to the Backport tool documentation |
Relates elastic/ingest-dev#6421 (comment) Auto upgrade actions currently use a short expiration (20m), which is too short if some of the agents can't checkin for some time to receive the action. If this window is missed, the updating state becomes stuck and not retried ever. Instead using a 1 month expiration like we do for manual upgrades. Also made the change to use 1 month expiration for scheduled upgrades. Expiration is no longer used to calculate scheduled upgrades offset on [fleet-server](elastic/fleet-server#2243) side for a long time, so there is no need to use a different expiration. Check the PR satisfies following conditions. Reviewers should verify this PR satisfies this list as well. - [ ] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md) - [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [ ] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios - [ ] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the [docker list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker) - [ ] This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The `release_note:breaking` label should be applied in these situations. - [ ] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed - [ ] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process) - [ ] Review the [backport guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing) and apply applicable `backport:*` labels. Does this PR introduce any risks? For example, consider risks like hard to test bugs, performance regression, potential of data loss. Describe the risk, its severity, and mitigation for each identified risk. Invite stakeholders and evaluate how to proceed before merging. - [ ] [See some risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) - [ ] ... --------- Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
# Backport This will backport the following commits from `main` to `9.2`: - [[Fleet] use long expiration for upgrade agents (#243443)](#243443) <!--- Backport version: 9.6.6 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sorenlouv/backport) <!--BACKPORT [{"author":{"name":"Julia Bardi","email":"90178898+juliaElastic@users.noreply.github.com"},"sourceCommit":{"committedDate":"2025-11-20T07:38:36Z","message":"[Fleet] use long expiration for upgrade agents (#243443)\n\n## Summary\n\nRelates\nhttps://github.com/elastic/ingest-dev/issues/6421#issuecomment-3551318678\n\nAuto upgrade actions currently use a short expiration (20m), which is\ntoo short if some of the agents can't checkin for some time to receive\nthe action. If this window is missed, the updating state becomes stuck\nand not retried ever.\n\nInstead using a 1 month expiration like we do for manual upgrades.\n\nAlso made the change to use 1 month expiration for scheduled upgrades.\nExpiration is no longer used to calculate scheduled upgrades offset on\n[fleet-server](elastic/fleet-server#2243) side\nfor a long time, so there is no need to use a different expiration.\n\n\n### Checklist\n\nCheck the PR satisfies following conditions. \n\nReviewers should verify this PR satisfies this list as well.\n\n- [ ] Any text added follows [EUI's writing\nguidelines](https://elastic.github.io/eui/#/guidelines/writing), uses\nsentence case text and includes [i18n\nsupport](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)\n- [ ]\n[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)\nwas added for features that require explanation or tutorials\n- [ ] [Unit or functional\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\nwere updated or added to match the most common scenarios\n- [ ] If a plugin configuration key changed, check if it needs to be\nallowlisted in the cloud and added to the [docker\nlist](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)\n- [ ] This was checked for breaking HTTP API changes, and any breaking\nchanges have been approved by the breaking-change committee. The\n`release_note:breaking` label should be applied in these situations.\n- [ ] [Flaky Test\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was\nused on any tests changed\n- [ ] The PR description includes the appropriate Release Notes section,\nand the correct `release_note:*` label is applied per the\n[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)\n- [ ] Review the [backport\nguidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)\nand apply applicable `backport:*` labels.\n\n### Identify risks\n\nDoes this PR introduce any risks? For example, consider risks like hard\nto test bugs, performance regression, potential of data loss.\n\nDescribe the risk, its severity, and mitigation for each identified\nrisk. Invite stakeholders and evaluate how to proceed before merging.\n\n- [ ] [See some risk\nexamples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)\n- [ ] ...\n\n---------\n\nCo-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>","sha":"022344dd2066c5d2a02a506cfd531d12c5ebedae","branchLabelMapping":{"^v9.3.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:fix","Team:Fleet","ci:build-cloud-image","backport:version","v9.3.0","v9.2.2","v9.1.8"],"title":"[Fleet] use long expiration for upgrade agents","number":243443,"url":"https://github.com/elastic/kibana/pull/243443","mergeCommit":{"message":"[Fleet] use long expiration for upgrade agents (#243443)\n\n## Summary\n\nRelates\nhttps://github.com/elastic/ingest-dev/issues/6421#issuecomment-3551318678\n\nAuto upgrade actions currently use a short expiration (20m), which is\ntoo short if some of the agents can't checkin for some time to receive\nthe action. If this window is missed, the updating state becomes stuck\nand not retried ever.\n\nInstead using a 1 month expiration like we do for manual upgrades.\n\nAlso made the change to use 1 month expiration for scheduled upgrades.\nExpiration is no longer used to calculate scheduled upgrades offset on\n[fleet-server](elastic/fleet-server#2243) side\nfor a long time, so there is no need to use a different expiration.\n\n\n### Checklist\n\nCheck the PR satisfies following conditions. \n\nReviewers should verify this PR satisfies this list as well.\n\n- [ ] Any text added follows [EUI's writing\nguidelines](https://elastic.github.io/eui/#/guidelines/writing), uses\nsentence case text and includes [i18n\nsupport](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)\n- [ ]\n[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)\nwas added for features that require explanation or tutorials\n- [ ] [Unit or functional\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\nwere updated or added to match the most common scenarios\n- [ ] If a plugin configuration key changed, check if it needs to be\nallowlisted in the cloud and added to the [docker\nlist](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)\n- [ ] This was checked for breaking HTTP API changes, and any breaking\nchanges have been approved by the breaking-change committee. The\n`release_note:breaking` label should be applied in these situations.\n- [ ] [Flaky Test\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was\nused on any tests changed\n- [ ] The PR description includes the appropriate Release Notes section,\nand the correct `release_note:*` label is applied per the\n[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)\n- [ ] Review the [backport\nguidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)\nand apply applicable `backport:*` labels.\n\n### Identify risks\n\nDoes this PR introduce any risks? For example, consider risks like hard\nto test bugs, performance regression, potential of data loss.\n\nDescribe the risk, its severity, and mitigation for each identified\nrisk. Invite stakeholders and evaluate how to proceed before merging.\n\n- [ ] [See some risk\nexamples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)\n- [ ] ...\n\n---------\n\nCo-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>","sha":"022344dd2066c5d2a02a506cfd531d12c5ebedae"}},"sourceBranch":"main","suggestedTargetBranches":["9.2","9.1"],"targetPullRequestStates":[{"branch":"main","label":"v9.3.0","branchLabelMappingKey":"^v9.3.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/243443","number":243443,"mergeCommit":{"message":"[Fleet] use long expiration for upgrade agents (#243443)\n\n## Summary\n\nRelates\nhttps://github.com/elastic/ingest-dev/issues/6421#issuecomment-3551318678\n\nAuto upgrade actions currently use a short expiration (20m), which is\ntoo short if some of the agents can't checkin for some time to receive\nthe action. If this window is missed, the updating state becomes stuck\nand not retried ever.\n\nInstead using a 1 month expiration like we do for manual upgrades.\n\nAlso made the change to use 1 month expiration for scheduled upgrades.\nExpiration is no longer used to calculate scheduled upgrades offset on\n[fleet-server](elastic/fleet-server#2243) side\nfor a long time, so there is no need to use a different expiration.\n\n\n### Checklist\n\nCheck the PR satisfies following conditions. \n\nReviewers should verify this PR satisfies this list as well.\n\n- [ ] Any text added follows [EUI's writing\nguidelines](https://elastic.github.io/eui/#/guidelines/writing), uses\nsentence case text and includes [i18n\nsupport](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)\n- [ ]\n[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)\nwas added for features that require explanation or tutorials\n- [ ] [Unit or functional\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\nwere updated or added to match the most common scenarios\n- [ ] If a plugin configuration key changed, check if it needs to be\nallowlisted in the cloud and added to the [docker\nlist](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)\n- [ ] This was checked for breaking HTTP API changes, and any breaking\nchanges have been approved by the breaking-change committee. The\n`release_note:breaking` label should be applied in these situations.\n- [ ] [Flaky Test\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was\nused on any tests changed\n- [ ] The PR description includes the appropriate Release Notes section,\nand the correct `release_note:*` label is applied per the\n[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)\n- [ ] Review the [backport\nguidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)\nand apply applicable `backport:*` labels.\n\n### Identify risks\n\nDoes this PR introduce any risks? For example, consider risks like hard\nto test bugs, performance regression, potential of data loss.\n\nDescribe the risk, its severity, and mitigation for each identified\nrisk. Invite stakeholders and evaluate how to proceed before merging.\n\n- [ ] [See some risk\nexamples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)\n- [ ] ...\n\n---------\n\nCo-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>","sha":"022344dd2066c5d2a02a506cfd531d12c5ebedae"}},{"branch":"9.2","label":"v9.2.2","branchLabelMappingKey":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"9.1","label":"v9.1.8","branchLabelMappingKey":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"}]}] BACKPORT--> Co-authored-by: Julia Bardi <90178898+juliaElastic@users.noreply.github.com>
## Summary Relates elastic/ingest-dev#6421 (comment) Auto upgrade actions currently use a short expiration (20m), which is too short if some of the agents can't checkin for some time to receive the action. If this window is missed, the updating state becomes stuck and not retried ever. Instead using a 1 month expiration like we do for manual upgrades. Also made the change to use 1 month expiration for scheduled upgrades. Expiration is no longer used to calculate scheduled upgrades offset on [fleet-server](elastic/fleet-server#2243) side for a long time, so there is no need to use a different expiration. ### Checklist Check the PR satisfies following conditions. Reviewers should verify this PR satisfies this list as well. - [ ] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md) - [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [ ] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios - [ ] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the [docker list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker) - [ ] This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The `release_note:breaking` label should be applied in these situations. - [ ] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed - [ ] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process) - [ ] Review the [backport guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing) and apply applicable `backport:*` labels. ### Identify risks Does this PR introduce any risks? For example, consider risks like hard to test bugs, performance regression, potential of data loss. Describe the risk, its severity, and mitigation for each identified risk. Invite stakeholders and evaluate how to proceed before merging. - [ ] [See some risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) - [ ] ... --------- Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
## Summary Relates elastic/ingest-dev#6421 (comment) Auto upgrade actions currently use a short expiration (20m), which is too short if some of the agents can't checkin for some time to receive the action. If this window is missed, the updating state becomes stuck and not retried ever. Instead using a 1 month expiration like we do for manual upgrades. Also made the change to use 1 month expiration for scheduled upgrades. Expiration is no longer used to calculate scheduled upgrades offset on [fleet-server](elastic/fleet-server#2243) side for a long time, so there is no need to use a different expiration. ### Checklist Check the PR satisfies following conditions. Reviewers should verify this PR satisfies this list as well. - [ ] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md) - [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [ ] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios - [ ] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the [docker list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker) - [ ] This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The `release_note:breaking` label should be applied in these situations. - [ ] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed - [ ] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process) - [ ] Review the [backport guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing) and apply applicable `backport:*` labels. ### Identify risks Does this PR introduce any risks? For example, consider risks like hard to test bugs, performance regression, potential of data loss. Describe the risk, its severity, and mitigation for each identified risk. Invite stakeholders and evaluate how to proceed before merging. - [ ] [See some risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) - [ ] ... --------- Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Summary
Relates https://github.com/elastic/ingest-dev/issues/6421#issuecomment-3551318678
Auto upgrade actions currently use a short expiration (20m), which is too short if some of the agents can't checkin for some time to receive the action. If this window is missed, the updating state becomes stuck and not retried ever.
Instead using a 1 month expiration like we do for manual upgrades.
Also made the change to use 1 month expiration for scheduled upgrades. Expiration is no longer used to calculate scheduled upgrades offset on fleet-server side for a long time, so there is no need to use a different expiration.
Checklist
Check the PR satisfies following conditions.
Reviewers should verify this PR satisfies this list as well.
release_note:breakinglabel should be applied in these situations.release_note:*label is applied per the guidelinesbackport:*labels.Identify risks
Does this PR introduce any risks? For example, consider risks like hard to test bugs, performance regression, potential of data loss.
Describe the risk, its severity, and mitigation for each identified risk. Invite stakeholders and evaluate how to proceed before merging.