[Fleet] Add package rollback API #226754

Merged
jillguyonnet merged 24 commits into elastic:main from jillguyonnet:fleet/add-package-rollback-api on Jul 18, 2025

@jillguyonnet
Member

@jillguyonnet jillguyonnet commented Jul 7, 2025

Summary

Closes https://github.com/elastic/ingest-dev/issues/5445

This PR adds a public API endpoint POST kbn:/api/fleet/epm/packages/{packageName}/rollback that rolls back an integration to its previously installed version, provided that the previous version was saved and that all integration policies (in all spaces) have corresponding saved previous revisions.

This endpoint updates integration policies across all spaces. For example, given the following:

  • The system integration was installed on 2.2.0 with integration policies (also on 2.2.0) in multiple spaces.
  • The integration was upgraded to 2.3.2 and all integration policies were also upgraded.

Then rolling back the integration will install 2.2.0 and update integration policies in all spaces to the latest revision they were on for 2.2.0.

Rollback will fail if:

  • The integration has no previous_version property set (set on upgrade, see [Fleet] Save package policy previous revision on package upgrade #222779).
  • At least one integration policy (in any space) doesn't have a saved previous revision (SO with id {id}:prev).
  • At least one integration policy (in any space) has a saved previous revision with a different previous version.
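
The three failure conditions above can be sketched as a single validation function. This is a minimal illustration, not the code added by this PR: the `InstalledPackage` and `PolicyRevision` shapes, the `packageVersion` field, and the function name are hypothetical, and only `previous_version`, the `{id}:prev` id convention, and the error wording come from this description.

```typescript
// Hypothetical shapes for illustration; only `previous_version` and the
// `{id}:prev` id convention are taken from the PR description.
interface InstalledPackage {
  name: string;
  version: string;
  previous_version?: string | null;
}

interface PolicyRevision {
  id: string; // saved previous revisions use the id `${policyId}:prev`
  packageVersion: string;
}

// Returns an error message if rollback must be refused, or undefined if all
// preconditions hold.
function checkRollbackPreconditions(
  pkg: InstalledPackage,
  currentPolicies: PolicyRevision[],
  previousRevisions: PolicyRevision[]
): string | undefined {
  // Condition 1: the package must have a recorded previous version.
  if (!pkg.previous_version) {
    return `No previous version found for package ${pkg.name}`;
  }

  const prevById = new Map<string, PolicyRevision>(
    previousRevisions.map((p): [string, PolicyRevision] => [p.id, p])
  );

  // Condition 2: every policy needs a saved previous revision.
  const missing = currentPolicies
    .filter((p) => !prevById.has(`${p.id}:prev`))
    .map((p) => p.id);
  if (missing.length > 0) {
    return `No previous version found for package policies: [${missing.join(', ')}]`;
  }

  // Condition 3: each saved revision must match the package's previous version.
  const wrong = currentPolicies
    .filter((p) => prevById.get(`${p.id}:prev`)?.packageVersion !== pkg.previous_version)
    .map((p) => p.id);
  if (wrong.length > 0) {
    return `Wrong previous version for package policies: [${wrong.join(', ')}]`;
  }

  return undefined;
}
```

Checking the package first and the policies second mirrors the order in which the errors show up in the testing steps of this description.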

Implementation details

  1. Package policies rollback step 1
    1. Create temporary SO copies in order to reverse the rollback in case of failure.
    2. Update SO with data from saved previous revisions.
  2. Package rollback
  3. Package policies rollback step 2
    • If package rollback succeeded, delete temporary SO copies and previous revision SO + bump agent policy revisions.
    • If package rollback failed, restore SO to pre-rollback (using temporary SO copies) and delete these copies.
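
The flow above can be sketched against an in-memory map standing in for the saved objects store. Everything here is an assumption made for illustration (the `Store` type, the `PolicySO` fields, the synchronous `rollbackPackage` callback); the real implementation works on Kibana saved objects across spaces and also bumps agent policy revisions, which this sketch leaves out.

```typescript
// Hypothetical in-memory stand-in for package policy saved objects.
type PolicySO = { version: string; revision: number };
type Store = Map<string, PolicySO>;

// Returns true if the rollback was applied, false if it was reverted.
function rollbackPolicies(
  store: Store,
  policyIds: string[],
  rollbackPackage: () => void // assumed to throw on failure
): boolean {
  // Step 1.1: keep temporary copies so the rollback can be reversed.
  for (const id of policyIds) {
    store.set(`${id}:copy`, { ...store.get(id)! });
  }

  // Step 1.2: overwrite each policy with its saved previous revision,
  // bumping the revision number by 1.
  for (const id of policyIds) {
    const prev = store.get(`${id}:prev`)!;
    const current = store.get(id)!;
    store.set(id, { version: prev.version, revision: current.revision + 1 });
  }

  try {
    // Step 2: roll back the package itself.
    rollbackPackage();
  } catch {
    // Step 3 (failure): restore the pre-rollback state and drop the copies.
    for (const id of policyIds) {
      store.set(id, store.get(`${id}:copy`)!);
      store.delete(`${id}:copy`);
    }
    return false;
  }

  // Step 3 (success): delete temporary copies and previous revisions.
  for (const id of policyIds) {
    store.delete(`${id}:copy`);
    store.delete(`${id}:prev`);
  }
  return true;
}
```

On failure the `:prev` objects are left in place, so a later rollback attempt can still find them.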

Testing

  1. Create a custom space with default accesses.
  2. In the default space install a package on an older version and make sure to have a package policy (edit the version in the browser URL and add the package). Choose a version old enough to have a few more recent versions to test with.
  3. In the custom space, add the package on the same older version in order to create a package policy. You should now have a package policy on this older version in both spaces.
  4. Go to Dev Tools and issue a rollback request: it should fail with No previous version found for package {pkgName}.
  5. Force upgrade the package to a newer (not the latest) version with
    POST kbn:/api/fleet/epm/packages/{pkgName}/{pkgVersion}
    {
      "force": true
    }
    
    This doesn't upgrade the package policies.
  6. Issue a rollback request: it should fail with No previous version found for package policies: [policyId1, policyId2].
  7. Force upgrade the package again to a newer version.
  8. In the UI, upgrade both package policies.
  9. Go to Dev Tools and issue a rollback request: it should fail with Wrong previous version for package policies: [policyId1, policyId2]. This is because the package was upgraded from v1 to v2 to v3, but policies were directly upgraded from v1 to v3.
  10. Force upgrade the package again to a newer version.
  11. In the UI, upgrade both package policies.
  12. Take note of the revisions the agent policies are on.
  13. Go to Dev Tools and issue a rollback request: this time, it should succeed and roll back the package and its policies in both spaces to the previous version.
  14. Check in the UI that the rollback looks correct (agent policy revisions should have been bumped). For completeness, you can check the saved objects of the package and the package policies:
    • the package SO should have the correct (older) version and previous_version: null
    • the package policy SO storing the revision with the previous version (with id ending in :prev) should have been deleted
    • the current package policy SO should have the correct (older) version, latest_revision: true and revision number bumped by 1
    • there should be no temporary SO (with id ending in :copy)
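
Outside Dev Tools, where the kbn: prefix is not available, the rollback request in the steps above is a plain HTTP POST. The helper below is a minimal sketch: the function name, the base URL handling, and API key authentication are assumptions, while the path comes from this PR and the kbn-xsrf header is what Kibana expects on non-GET API requests.

```typescript
interface RollbackRequest {
  url: string;
  init: { method: string; headers: Record<string, string> };
}

// Builds (but does not send) the rollback request for a given package.
// `kibanaUrl` and `apiKey` are placeholders supplied by the caller.
function buildRollbackRequest(
  kibanaUrl: string,
  packageName: string,
  apiKey: string
): RollbackRequest {
  const base = kibanaUrl.replace(/\/+$/, ''); // drop trailing slashes
  return {
    url: `${base}/api/fleet/epm/packages/${encodeURIComponent(packageName)}/rollback`,
    init: {
      method: 'POST',
      headers: {
        'kbn-xsrf': 'true',
        'Content-Type': 'application/json',
        Authorization: `ApiKey ${apiKey}`,
      },
    },
  };
}
```

The result can be passed to `fetch(req.url, req.init)`. Keeping request construction separate from sending makes the helper easy to check without a running Kibana instance.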

Checklist

Identify risks

Risk of incorrect integration policy data after rolling back the integration to its previous version.

@jillguyonnet jillguyonnet self-assigned this Jul 7, 2025
@jillguyonnet jillguyonnet added the release_note:enhancement, backport:skip (This PR does not require backporting), and Team:Fleet (Team label for Observability Data Collection Fleet team) labels Jul 7, 2025
kibanamachine and others added 2 commits July 7, 2025 10:08
…t --include-path /api/status --include-path /api/alerting/rule/ --include-path /api/alerting/rules --include-path /api/actions --include-path /api/security/role --include-path /api/spaces --include-path /api/streams --include-path /api/fleet --include-path /api/dashboards --include-path /api/saved_objects/_import --include-path /api/saved_objects/_export --include-path /api/maintenance_window --update'
@jillguyonnet jillguyonnet marked this pull request as ready for review July 7, 2025 12:45
@jillguyonnet jillguyonnet requested a review from a team as a code owner July 7, 2025 12:45
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

Comment thread x-pack/platform/plugins/shared/fleet/server/routes/epm/handlers.ts Outdated
Comment thread x-pack/platform/plugins/shared/fleet/server/services/package_policy.ts Outdated
Comment thread x-pack/platform/plugins/shared/fleet/server/routes/epm/handlers.ts Outdated
Comment thread x-pack/platform/plugins/shared/fleet/server/services/epm/packages/rollback.ts Outdated
Contributor

It would be great to add integration tests for the new rollback API and do manual testing with agents using the package policies and ingesting data to data streams, to make sure the rollback doesn't disrupt data ingestion.

Member

Adding to this, it would be good to test with input packages as well to make sure it doesn't break them.

Comment thread x-pack/platform/plugins/shared/fleet/server/routes/epm/index.ts Outdated
Comment thread x-pack/platform/plugins/shared/fleet/server/services/epm/packages/rollback.ts Outdated
Comment thread x-pack/platform/plugins/shared/fleet/server/services/epm/packages/rollback.ts Outdated
Comment thread x-pack/platform/plugins/shared/fleet/server/services/epm/packages/rollback.ts Outdated
Comment thread x-pack/platform/plugins/shared/fleet/server/services/epm/packages/rollback.ts Outdated
@jillguyonnet
Member Author

@elasticmachine merge upstream

elasticmachine and others added 2 commits July 10, 2025 08:40
…t --include-path /api/status --include-path /api/alerting/rule/ --include-path /api/alerting/rules --include-path /api/actions --include-path /api/security/role --include-path /api/spaces --include-path /api/streams --include-path /api/fleet --include-path /api/dashboards --include-path /api/saved_objects/_import --include-path /api/saved_objects/_export --include-path /api/maintenance_window --update'
@jillguyonnet
Member Author

@juliaElastic @criamico thank you for your feedback. I'm still working on functional tests, but I pushed a bunch of improvements and updated the PR description with additional test cases. Please feel free to try and break this. 🙂

Comment thread x-pack/platform/plugins/shared/fleet/server/services/epm/packages/rollback.ts Outdated
@jillguyonnet
Member Author

@elasticmachine merge upstream

@kibanamachine
Contributor

Flaky Test Runner Stats

🟠 Some tests failed. - kibana-flaky-test-suite-runner#8644

[❌] x-pack/test/fleet_api_integration/config.epm.ts: 75/100 tests passed.

see run history

Contributor

@juliaElastic juliaElastic left a comment

LGTM

@jillguyonnet
Member Author

@elasticmachine merge upstream

timeout: 30_000,
retryCount: 25,
retryDelay: 10000,
timeout: 60_000,
Member Author

Thanks for the review @juliaElastic - FYI I had to increase the retries and timeout in this test as the extra logic caused it to be too slow. It made me realise that we don't exclude rollback logic from managed policies, which should perhaps be addressed as a followup, WDYT?

Contributor

Why is setup trying to run rollback?

I agree that we probably shouldn't allow rolling back managed policies.

Member Author

@jillguyonnet jillguyonnet Jul 16, 2025

> Why is setup trying to run rollback?

It isn't - it's the saving previous revisions logic here that adds extra time. Not sure why this didn't come up in the original PR.
Edit: I've kicked off a flaky test runner build to check if it's robust enough with these settings.

> I agree that we probably shouldn't allow rolling back managed policies.

OK if I open a followup issue for this?

Contributor

I see, it makes sense. A follow up issue sounds good.

@jillguyonnet
Member Author

@elastic/core-docs @elastic/experience-docs Could I please get your review on this?

@kibanamachine
Contributor

Flaky Test Runner Stats

🎉 All tests passed! - kibana-flaky-test-suite-runner#8669

[✅] x-pack/platform/test/fleet_api_integration/config.epm.ts: 100/100 tests passed.

see run history

@kibanamachine
Contributor

Flaky Test Runner Stats

🎉 All tests passed! - kibana-flaky-test-suite-runner#8679

[✅] x-pack/platform/test/fleet_api_integration/config.fleet.ts: 100/100 tests passed.

see run history

Member

@florent-leborgne florent-leborgne left a comment

Hi! Looks like the API docs output files are no longer part of this PR. In case you re-add them, I mainly have one comment about making sure that the API's version availability is correctly set so that it appears clearly in the docs:

If there is a version requirement to use this API, it should be specified with an availability attribute like this I believe (see example):

options: {
  availability: {
    since: '9.1.0',
  },
},

Member

@florent-leborgne florent-leborgne left a comment

Thanks for specifying the availability options 🙏 LGTM for docs

@elasticmachine
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] Fleet Cypress Tests #2 / View agents list Agent status filter should filter on healthy and unhealthy
  • [job] [logs] Fleet Cypress Tests #2 / View agents list Bulk actions should allow to bulk upgrade agents and cancel that upgrade

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| fleet | 1438 | 1455 | +17 |

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| fleet | 98 | 101 | +3 |

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

| id | before | after | diff |
| --- | --- | --- | --- |
| fleet | 167.0KB | 167.1KB | +43.0B |

Unknown metric groups

API count

| id | before | after | diff |
| --- | --- | --- | --- |
| fleet | 1569 | 1586 | +17 |

History

cc @jillguyonnet

@jillguyonnet jillguyonnet merged commit 4e76740 into elastic:main Jul 18, 2025
12 checks passed
@jillguyonnet jillguyonnet deleted the fleet/add-package-rollback-api branch July 18, 2025 18:38
Bluefinger pushed a commit to Bluefinger/kibana that referenced this pull request Jul 22, 2025
## Summary

Closes elastic/ingest-dev#5445

This PR adds a public API endpoint `POST
kbn:/api/fleet/epm/packages/{packageName}/rollback` that rolls back an
integration to the previous installed version, provided that previous
version had been saved and that all integration policies (in all spaces)
have corresponding saved previous revisions.

This endpoint updates integration policies across all spaces. For
example, given the following:
* The `system` integration was installed on 2.2.0 with integration
policies (also on 2.2.0) in multiple spaces.
* The integration was upgraded to 2.3.2 and all integration policies
were also upgraded.
Then rolling back the integration will install 2.2.0 and update
integration policies in all spaces to the latest revision they were on
for 2.2.0.

Rollback will fail if:
* The integration has no `previous_version` property set (set on
upgrade, see elastic#222779).
* At least one integration policy (in any space) doesn't have a saved
previous revision (SO with id `{id}:prev`).
* At least one integration policy (in any space) has a saved previous
revision with a different previous version.

### Implementation details

1. Package policies rollback step 1
   1. Create temporary SO copies in order to reverse the rollback in case
      of failure.
   2. Update SO with data from saved previous revisions.
2. Package rollback
3. Package policies rollback step 2
   * If package rollback succeeded, delete temporary SO copies and previous
     revision SO + bump agent policy revisions.
   * If package rollback failed, restore SO to pre-rollback (using
     temporary SO copies) and delete these copies.

### Testing

1. Create a custom space with default accesses.
5. In the default space install a package on an older version and make
sure to have a package policy (edit the version in the browser URL and
add the package). Choose a version old enough to have a few more recent
versions to test with.
6. In the custom space, add the package on the same older version in
order to create a package policy. You should now have a package policy
on this older version on both spaces.
7. Go to Dev Tools and issue a rollback request: it should fail with `No
previous version found for package {pkgName}`.
8. Force upgrade the package to a newer (not the latest) version with
   ```
   POST kbn:/api/fleet/epm/packages/{pkgName}/{pkgVersion}
   {
     "force": true
   }
   ```
   This doesn't upgrade the package policies.
9. Issue a rollback request: it should fail with `No previous version
found for package policies: [policyId1, policyId2]`.
10.  Force upgrade the package again to a newer version.
11. In the UI, upgrade both package policies.
12. Go to Dev Tools and issue a rollback request: it should fail with
`Wrong previous version for package policies: [policyId1, policyId2]`.
This is because the package was upgraded from v1 to v2 to v3, but
policies were directly upgraded from v1 to v3.
13. Force upgrade the package again to a newer version.
14. In the UI, upgrade both package policies.
15. Take note of the revisions the agent policies are on.
16. Go to Dev Tools and issue a rollback request: this time, it should
succeed and roll back the package and its policies in both spaces to the
previous version.
17. Check in the UI that the rollback looks correct (agent policy
revisions should have been bumped). For completeness, you can check the
saved objects of the package and the package policies:
* the package SO should have the correct (older) version and
`previous_version: null`
* the package policy SO storing the revision with the previous version
(with id ending in `:prev`) should have been deleted
* the current package policy SO should have the correct (older) version,
`latest_revision: true` and revision number bumped by 1
   * there should be no temporary SO (with id ending in `:copy`)

### Checklist

- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [x] Review the [backport
guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)
and apply applicable `backport:*` labels.

### Identify risks

Risk of incorrect integration policy data after rolling back the
integration to its previous version.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
kertal pushed a commit to kertal/kibana that referenced this pull request Jul 25, 2025

Labels

backport:skip (This PR does not require backporting), release_note:enhancement, Team:Fleet (Team label for Observability Data Collection Fleet team), v9.2.0

6 participants