[Fleet] fix auto upgrade bug when upgrading agents in other policies interfered with calculation #258387
Merged
juliaElastic merged 5 commits into elastic:main on Mar 23, 2026
Contributor
Pinging @elastic/fleet (Team:Fleet)
Contributor
⏳ Build in-progress, with failures
Contributor
Starting backport for target branches: 8.19, 9.2, 9.3
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request on Mar 23, 2026
…interfered with calculation (elastic#258387)

### The "target percentage already reached" false positive: bug in the counting query

In `getVersionAndCounts` ([automatic_agent_upgrade_task.ts:284](https://github.com/elastic/kibana/blob/d0b55d1acafd637019f831c5d4a9e9d7bb0eab53/x-pack/platform/plugins/shared/fleet/server/tasks/automatic_agent_upgrade_task.ts#L284)), the count of agents already on or upgrading to the target version uses this query:

`((policy_id:${agentPolicy.id} AND agent.version:9.2.6) OR (upgrade_details.target_version:9.2.6 AND NOT upgrade_details.state:UPG_FAILED)) AND activeAgentsKuery`

The second OR clause, `upgrade_details.target_version:9.2.6 AND NOT upgrade_details.state:UPG_FAILED`, has no `policy_id` filter. It counts any active agent across the entire fleet that is currently upgrading to 9.2.6, regardless of which policy it belongs to.

With only 1 active agent in this policy (`totalActiveAgents = 1`), if even one agent from any other policy has `upgrade_details.target_version = 9.2.6` and `state ≠ UPG_FAILED` (e.g. `UPG_SCHEDULED`, `UPG_WATCHING`, `UPG_REPLACING`), the query returns a count of 1 and then:

`numberOfAgentsForUpgrade = Math.round(1 * 100 / 100) - 1 = 0`

→ "target percentage 100 already reached", and the task returns early without touching the 8.16.3 agent.

### Checklist

Check that the PR satisfies the following conditions. Reviewers should verify that this PR satisfies this list as well.

- [ ] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
- [ ] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [ ] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the [docker list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The `release_note:breaking` label should be applied in these situations.
- [ ] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed
- [ ] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [ ] Review the [backport guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing) and apply applicable `backport:*` labels.

### Identify risks

Does this PR introduce any risks? For example, consider risks like hard-to-test bugs, performance regressions, or potential data loss.

Describe the risk, its severity, and mitigation for each identified risk. Invite stakeholders and evaluate how to proceed before merging.

- [ ] [See some risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)
- [ ] ...

(cherry picked from commit f3db700)
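The false positive described in the commit message can be reproduced with a small in-memory model. The sketch below is illustrative only: the `Agent` shape, the function names, and the policy ids are hypothetical, every agent is assumed to be active (standing in for `activeAgentsKuery`), and `countScoped` is just one plausible way to scope the count to a policy. The merged change itself is not shown in this conversation.

```typescript
// Hypothetical, simplified agent model. Only the fields mentioned in the
// query above are represented; all agents are assumed to be active.
interface Agent {
  policyId: string;
  version: string;
  upgradeTargetVersion?: string;
  upgradeState?: string; // e.g. "UPG_SCHEDULED", "UPG_WATCHING", "UPG_FAILED"
}

// Mirrors the query quoted above: the "upgrading" clause has no policy filter.
function countUnscoped(agents: Agent[], policyId: string, target: string): number {
  return agents.filter(
    (a) =>
      (a.policyId === policyId && a.version === target) ||
      (a.upgradeTargetVersion === target && a.upgradeState !== "UPG_FAILED")
  ).length;
}

// Assumption: one way to scope both clauses to the policy being processed.
function countScoped(agents: Agent[], policyId: string, target: string): number {
  return agents.filter(
    (a) =>
      a.policyId === policyId &&
      (a.version === target ||
        (a.upgradeTargetVersion === target && a.upgradeState !== "UPG_FAILED"))
  ).length;
}

// The calculation as quoted: round(total * percentage / 100) - already counted.
function numberOfAgentsForUpgrade(
  totalActiveAgents: number,
  percentage: number,
  alreadyOnOrUpgrading: number
): number {
  return Math.round((totalActiveAgents * percentage) / 100) - alreadyOnOrUpgrading;
}

const agents: Agent[] = [
  // The only agent in the policy being processed, still on 8.16.3.
  { policyId: "policy-a", version: "8.16.3" },
  // An agent in a different policy that is upgrading to the target version.
  {
    policyId: "policy-b",
    version: "8.16.3",
    upgradeTargetVersion: "9.2.6",
    upgradeState: "UPG_SCHEDULED",
  },
];

const unscoped = countUnscoped(agents, "policy-a", "9.2.6");
const scoped = countScoped(agents, "policy-a", "9.2.6");
const buggyResult = numberOfAgentsForUpgrade(1, 100, unscoped);
const scopedResult = numberOfAgentsForUpgrade(1, 100, scoped);

console.log({ unscoped, scoped, buggyResult, scopedResult });
```

With the unscoped count, the agent from `policy-b` is subtracted from `policy-a`'s target and nothing is selected for upgrade. With the scoped count, the single 8.16.3 agent remains eligible.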
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request on Mar 23, 2026
…interfered with calculation (elastic#258387) (cherry picked from commit f3db700)
Contributor
💔 Some backports could not be created
Note: Successful backport PRs will be merged automatically after passing CI.

Manual backport

To create the backport manually run:

Questions? Please refer to the Backport tool documentation.
Contributor
Starting backport for target branches: 9.2, 9.3
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request on Mar 23, 2026
…interfered with calculation (elastic#258387) (cherry picked from commit f3db700)
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request on Mar 23, 2026
…interfered with calculation (elastic#258387) (cherry picked from commit f3db700)
Contributor
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI.

Questions? Please refer to the Backport tool documentation.
kibanamachine added a commit that referenced this pull request on Mar 24, 2026
…icies interfered with calculation (#258387) (#259033)

# Backport

This will backport the following commits from `main` to `9.3`:
- [[Fleet] fix auto upgrade bug when upgrading agents in other policies interfered with calculation (#258387)](#258387)

<!--- Backport version: 9.6.6 -->

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sorenlouv/backport)

---------

Co-authored-by: Julia Bardi <90178898+juliaElastic@users.noreply.github.com>
Co-authored-by: Julia Bardi <julia.bardi@elastic.co>
Contributor
Looks like this PR has backport PRs but they still haven't been merged. Please merge them ASAP to keep the branches relatively in sync.
kibanamachine added a commit that referenced this pull request on Mar 24, 2026
…icies interfered with calculation (#258387) (#259032) # Backport This will backport the following commits from `main` to `9.2`: - [[Fleet] fix auto upgrade bug when upgrading agents in other policies interfered with calculation (#258387)](#258387) <!--- Backport version: 9.6.6 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sorenlouv/backport) <!--BACKPORT [{"author":{"name":"Julia Bardi","email":"90178898+juliaElastic@users.noreply.github.com"},"sourceCommit":{"committedDate":"2026-03-23T09:50:40Z","message":"[Fleet] fix auto upgrade bug when upgrading agents in other policies interfered with calculation (#258387)\n\n### The \"target percentage already reached\" false positive — bug in the\ncounting query\nIn `getVersionAndCounts`\n([automatic_agent_upgrade_task.ts:284](https://github.com/elastic/kibana/blob/d0b55d1acafd637019f831c5d4a9e9d7bb0eab53/x-pack/platform/plugins/shared/fleet/server/tasks/automatic_agent_upgrade_task.ts#L284)),\nthe count of agents already on or upgrading to the target version uses\nthis query:\n\n`((policy_id:${agentPolicy.id} AND agent.version:9.2.6)\nOR (upgrade_details.target_version:9.2.6 AND NOT\nupgrade_details.state:UPG_FAILED))\nAND activeAgentsKuery`\nThe second OR clause — `upgrade_details.target_version:9.2.6 AND NOT\nupgrade_details.state:UPG_FAILED` — has no `policy_id` filter. It counts\nany active agent across the entire fleet that is currently upgrading to\n9.2.6, regardless of which policy they belong to.\n\nWith only 1 active agent in this policy (`totalActiveAgents = 1`), if\neven one agent from any other policy has `upgrade_details.target_version\n= 9.2.6` and `state ≠ UPG_FAILED` (e.g. `UPG_SCHEDULED`, `UPG_WATCHING`,\n`UPG_REPLACING`), then:\n\n`numberOfAgentsForUpgrade = Math.round(1 * 100 / 100) - 1 = 0`\n→ \"target percentage 100 already reached\", returns early without\ntouching the 8.16.3 agent.\n\n\n### Checklist\n\nCheck the PR satisfies following conditions. 
\n\nReviewers should verify this PR satisfies this list as well.\n\n- [ ] Any text added follows [EUI's writing\nguidelines](https://elastic.github.io/eui/#/guidelines/writing), uses\nsentence case text and includes [i18n\nsupport](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)\n- [ ]\n[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)\nwas added for features that require explanation or tutorials\n- [ ] [Unit or functional\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\nwere updated or added to match the most common scenarios\n- [ ] If a plugin configuration key changed, check if it needs to be\nallowlisted in the cloud and added to the [docker\nlist](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)\n- [ ] This was checked for breaking HTTP API changes, and any breaking\nchanges have been approved by the breaking-change committee. The\n`release_note:breaking` label should be applied in these situations.\n- [ ] [Flaky Test\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was\nused on any tests changed\n- [ ] The PR description includes the appropriate Release Notes section,\nand the correct `release_note:*` label is applied per the\n[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)\n- [ ] Review the [backport\nguidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)\nand apply applicable `backport:*` labels.\n\n### Identify risks\n\nDoes this PR introduce any risks? For example, consider risks like hard\nto test bugs, performance regression, potential of data loss.\n\nDescribe the risk, its severity, and mitigation for each identified\nrisk. 
Invite stakeholders and evaluate how to proceed before merging.\n\n- [ ] [See some risk\nexamples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)\n- [ ] ...","sha":"f3db70085cc8fe0332c85e7683b3e3d5d66ccb6c","branchLabelMapping":{"^v9.4.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:fix","Team:Fleet","backport:all-open","v9.4.0"],"title":"[Fleet] fix auto upgrade bug when upgrading agents in other policies interfered with calculation","number":258387,"url":"https://github.com/elastic/kibana/pull/258387","mergeCommit":{"message":"[Fleet] fix auto upgrade bug when upgrading agents in other policies interfered with calculation (#258387)\n\n### The \"target percentage already reached\" false positive — bug in the\ncounting query\nIn `getVersionAndCounts`\n([automatic_agent_upgrade_task.ts:284](https://github.com/elastic/kibana/blob/d0b55d1acafd637019f831c5d4a9e9d7bb0eab53/x-pack/platform/plugins/shared/fleet/server/tasks/automatic_agent_upgrade_task.ts#L284)),\nthe count of agents already on or upgrading to the target version uses\nthis query:\n\n`((policy_id:${agentPolicy.id} AND agent.version:9.2.6)\nOR (upgrade_details.target_version:9.2.6 AND NOT\nupgrade_details.state:UPG_FAILED))\nAND activeAgentsKuery`\nThe second OR clause — `upgrade_details.target_version:9.2.6 AND NOT\nupgrade_details.state:UPG_FAILED` — has no `policy_id` filter. It counts\nany active agent across the entire fleet that is currently upgrading to\n9.2.6, regardless of which policy they belong to.\n\nWith only 1 active agent in this policy (`totalActiveAgents = 1`), if\neven one agent from any other policy has `upgrade_details.target_version\n= 9.2.6` and `state ≠ UPG_FAILED` (e.g. 
`UPG_SCHEDULED`, `UPG_WATCHING`,\n`UPG_REPLACING`), then:\n\n`numberOfAgentsForUpgrade = Math.round(1 * 100 / 100) - 1 = 0`\n→ \"target percentage 100 already reached\", returns early without\ntouching the 8.16.3 agent.\n\n\n### Checklist\n\nCheck the PR satisfies following conditions. \n\nReviewers should verify this PR satisfies this list as well.\n\n- [ ] Any text added follows [EUI's writing\nguidelines](https://elastic.github.io/eui/#/guidelines/writing), uses\nsentence case text and includes [i18n\nsupport](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)\n- [ ]\n[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)\nwas added for features that require explanation or tutorials\n- [ ] [Unit or functional\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\nwere updated or added to match the most common scenarios\n- [ ] If a plugin configuration key changed, check if it needs to be\nallowlisted in the cloud and added to the [docker\nlist](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)\n- [ ] This was checked for breaking HTTP API changes, and any breaking\nchanges have been approved by the breaking-change committee. The\n`release_note:breaking` label should be applied in these situations.\n- [ ] [Flaky Test\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was\nused on any tests changed\n- [ ] The PR description includes the appropriate Release Notes section,\nand the correct `release_note:*` label is applied per the\n[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)\n- [ ] Review the [backport\nguidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)\nand apply applicable `backport:*` labels.\n\n### Identify risks\n\nDoes this PR introduce any risks? 
For example, consider risks like hard\nto test bugs, performance regression, potential of data loss.\n\nDescribe the risk, its severity, and mitigation for each identified\nrisk. Invite stakeholders and evaluate how to proceed before merging.\n\n- [ ] [See some risk\nexamples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)\n- [ ] ...","sha":"f3db70085cc8fe0332c85e7683b3e3d5d66ccb6c"}},"sourceBranch":"main","suggestedTargetBranches":[],"targetPullRequestStates":[{"branch":"main","label":"v9.4.0","branchLabelMappingKey":"^v9.4.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/258387","number":258387,"mergeCommit":{"message":"[Fleet] fix auto upgrade bug when upgrading agents in other policies interfered with calculation (#258387)\n\n### The \"target percentage already reached\" false positive — bug in the\ncounting query\nIn `getVersionAndCounts`\n([automatic_agent_upgrade_task.ts:284](https://github.com/elastic/kibana/blob/d0b55d1acafd637019f831c5d4a9e9d7bb0eab53/x-pack/platform/plugins/shared/fleet/server/tasks/automatic_agent_upgrade_task.ts#L284)),\nthe count of agents already on or upgrading to the target version uses\nthis query:\n\n`((policy_id:${agentPolicy.id} AND agent.version:9.2.6)\nOR (upgrade_details.target_version:9.2.6 AND NOT\nupgrade_details.state:UPG_FAILED))\nAND activeAgentsKuery`\nThe second OR clause — `upgrade_details.target_version:9.2.6 AND NOT\nupgrade_details.state:UPG_FAILED` — has no `policy_id` filter. It counts\nany active agent across the entire fleet that is currently upgrading to\n9.2.6, regardless of which policy they belong to.\n\nWith only 1 active agent in this policy (`totalActiveAgents = 1`), if\neven one agent from any other policy has `upgrade_details.target_version\n= 9.2.6` and `state ≠ UPG_FAILED` (e.g. 
`UPG_SCHEDULED`, `UPG_WATCHING`,\n`UPG_REPLACING`), then:\n\n`numberOfAgentsForUpgrade = Math.round(1 * 100 / 100) - 1 = 0`\n→ \"target percentage 100 already reached\", returns early without\ntouching the 8.16.3 agent.\n\n\n### Checklist\n\nCheck the PR satisfies following conditions. \n\nReviewers should verify this PR satisfies this list as well.\n\n- [ ] Any text added follows [EUI's writing\nguidelines](https://elastic.github.io/eui/#/guidelines/writing), uses\nsentence case text and includes [i18n\nsupport](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)\n- [ ]\n[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)\nwas added for features that require explanation or tutorials\n- [ ] [Unit or functional\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\nwere updated or added to match the most common scenarios\n- [ ] If a plugin configuration key changed, check if it needs to be\nallowlisted in the cloud and added to the [docker\nlist](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)\n- [ ] This was checked for breaking HTTP API changes, and any breaking\nchanges have been approved by the breaking-change committee. The\n`release_note:breaking` label should be applied in these situations.\n- [ ] [Flaky Test\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was\nused on any tests changed\n- [ ] The PR description includes the appropriate Release Notes section,\nand the correct `release_note:*` label is applied per the\n[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)\n- [ ] Review the [backport\nguidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)\nand apply applicable `backport:*` labels.\n\n### Identify risks\n\nDoes this PR introduce any risks? 
For example, consider risks like hard\nto test bugs, performance regression, potential of data loss.\n\nDescribe the risk, its severity, and mitigation for each identified\nrisk. Invite stakeholders and evaluate how to proceed before merging.\n\n- [ ] [See some risk\nexamples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)\n- [ ] ...","sha":"f3db70085cc8fe0332c85e7683b3e3d5d66ccb6c"}}]}] BACKPORT--> --------- Co-authored-by: Julia Bardi <90178898+juliaElastic@users.noreply.github.com> Co-authored-by: Julia Bardi <julia.bardi@elastic.co>
jeramysoucy
pushed a commit
to jeramysoucy/kibana
that referenced
this pull request
Mar 26, 2026
…interfered with calculation (elastic#258387)
### The "target percentage already reached" false positive: bug in the counting query

In `getVersionAndCounts` ([automatic_agent_upgrade_task.ts:284](https://github.com/elastic/kibana/blob/d0b55d1acafd637019f831c5d4a9e9d7bb0eab53/x-pack/platform/plugins/shared/fleet/server/tasks/automatic_agent_upgrade_task.ts#L284)), the count of agents already on or upgrading to the target version uses this query:

`((policy_id:${agentPolicy.id} AND agent.version:9.2.6) OR (upgrade_details.target_version:9.2.6 AND NOT upgrade_details.state:UPG_FAILED)) AND activeAgentsKuery`

The second OR clause, `upgrade_details.target_version:9.2.6 AND NOT upgrade_details.state:UPG_FAILED`, has no `policy_id` filter. It counts any active agent across the entire fleet that is currently upgrading to 9.2.6, regardless of which policy it belongs to.

With only 1 active agent in this policy (`totalActiveAgents = 1`), if even one agent from any other policy has `upgrade_details.target_version = 9.2.6` and `state ≠ UPG_FAILED` (e.g. `UPG_SCHEDULED`, `UPG_WATCHING`, `UPG_REPLACING`), that agent is counted toward this policy. With exactly one such agent:

`numberOfAgentsForUpgrade = Math.round(1 * 100 / 100) - 1 = 0`

→ "target percentage 100 already reached", so the task returns early without touching the 8.16.3 agent.
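The miscount can be reproduced with a small in-memory model. This is only a sketch: the `Agent` shape, the helper names (`countUnscoped`, `countScoped`, `numberOfAgentsForUpgrade` as a standalone function) and the policy IDs are hypothetical, every agent is assumed to be active (standing in for `activeAgentsKuery`), and `countScoped` shows one plausible correction (scoping both clauses to the policy), not necessarily the change made in this PR.

```typescript
// Hypothetical in-memory stand-in for agent documents; fields mirror the query above.
interface Agent {
  policyId: string;
  version: string;
  upgradeTargetVersion?: string;
  upgradeState?: string;
}

// Mirrors the query as described: the upgrade_details clause is NOT scoped to the policy.
function countUnscoped(agents: Agent[], policyId: string, target: string): number {
  return agents.filter(
    (a) =>
      (a.policyId === policyId && a.version === target) ||
      (a.upgradeTargetVersion === target && a.upgradeState !== 'UPG_FAILED')
  ).length;
}

// Assumed correction: both clauses are restricted to the policy.
function countScoped(agents: Agent[], policyId: string, target: string): number {
  return agents.filter(
    (a) =>
      a.policyId === policyId &&
      (a.version === target ||
        (a.upgradeTargetVersion === target && a.upgradeState !== 'UPG_FAILED'))
  ).length;
}

// Formula from the description: round(total * percentage / 100) minus agents already counted.
function numberOfAgentsForUpgrade(total: number, percentage: number, already: number): number {
  return Math.round((total * percentage) / 100) - already;
}

const agents: Agent[] = [
  // The only active agent in the policy under evaluation, still on 8.16.3.
  { policyId: 'policy-a', version: '8.16.3' },
  // An agent in a different policy that is mid-upgrade to the same target version.
  {
    policyId: 'policy-b',
    version: '8.16.3',
    upgradeTargetVersion: '9.2.6',
    upgradeState: 'UPG_SCHEDULED',
  },
];

const buggy = numberOfAgentsForUpgrade(1, 100, countUnscoped(agents, 'policy-a', '9.2.6'));
const fixed = numberOfAgentsForUpgrade(1, 100, countScoped(agents, 'policy-a', '9.2.6'));
console.log(buggy, fixed); // 0 1
```

With the unscoped count, the agent from `policy-b` cancels out the single upgrade slot for `policy-a`; with the scoped count, the 8.16.3 agent is selected for upgrade.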
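### Checklist

Check that the PR satisfies the following conditions.

Reviewers should verify this PR satisfies this list as well.

- [ ] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
- [ ] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [ ] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the [docker list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The `release_note:breaking` label should be applied in these situations.
- [ ] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed
- [ ] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [ ] Review the [backport guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing) and apply applicable `backport:*` labels.

### Identify risks

Does this PR introduce any risks? For example, consider risks like hard-to-test bugs, performance regressions, or potential data loss.

Describe the risk, its severity, and mitigation for each identified risk. Invite stakeholders and evaluate how to proceed before merging.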