Cleanup available rollbacks#11562

Merged
ebeahan merged 24 commits into elastic:main from
pchila:cleanup-rollbacks
Jan 12, 2026

Conversation

@pchila
Member

@pchila pchila commented Dec 3, 2025

What does this PR do?

This PR cleans up available rollbacks:

  • when a new upgrade is initiated, so that the disk space needed does not grow to 3x the size of an agent installation
  • when an available rollback expires, by running a goroutine that periodically checks for expired rollbacks and cleans them up.

Why is it important?

To avoid having to clean up manually when upgrading an agent again while it is still within the rollback window, and to avoid waiting until the agent restarts to clean up obsolete installs.

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test

Disruptive User Impact

How to test this PR locally

Related issues

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...
@pchila pchila self-assigned this Dec 3, 2025
@pchila pchila added Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team Team:Elastic-Agent Label for the Agent team labels Dec 3, 2025
@mergify
Contributor

mergify bot commented Dec 3, 2025

This pull request does not have a backport label. Could you fix it @pchila? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label that automatically backports to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.
@pchila pchila added the enhancement New feature or request label Dec 11, 2025
@pchila pchila marked this pull request as ready for review December 11, 2025 12:31
@pchila pchila requested a review from a team as a code owner December 11, 2025 12:31
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

Contributor

@blakerouse blakerouse left a comment

This overall looks good, but the part that really prevents me from giving this a +1 is the lack of an integration test. I think having this in an integration test is critical. We should fully observe in the test that the available rollback is removed once the upgrade is complete.

@pchila
Member Author

pchila commented Dec 12, 2025

This overall looks good, but the part that really prevents me from giving this a +1 is the lack of an integration test. I think having this in an integration test is critical. We should fully observe in the test that the available rollback is removed once the upgrade is complete.

@blakerouse have a look at 3d6063a and fea6eb1

Member

@cmacknz cmacknz left a comment

Generally looks good after latest changes, just need CI to pass.

…lear_expired_rollback_after_upgrading_to_a_repackaged_version on windows
@ebeahan ebeahan added backport-9.3 Automated backport to the 9.3 branch and removed backport-skip labels Jan 5, 2026
@elasticmachine
Contributor

elasticmachine commented Jan 8, 2026

💔 Build Failed

Failed CI Steps

History

cc @pchila

@ebeahan ebeahan merged commit 5e3c865 into elastic:main Jan 12, 2026
22 checks passed
mergify bot pushed a commit that referenced this pull request Jan 12, 2026
* Cleanup rollbacks when triggering a new upgrade

* refactor available rollback normalization at startup

* WIP - scheduled rollback cleanup

* WIP - wire scheduled rollback cleanup at agent startup

* remove appDone handling from PeriodicallyCleanRollbacks

* Pass the correct relative path to the rollback cleanup goroutine

* refactor from commit and repackaged fixtures for rollback tests

* Add Hash() to agent integration test fixture

* Define a minimum cleanup interval for available rollbacks

* create ttl marker files without world-readable permissions

* introduce integration test for automatic cleanup of expired rollbacks

* fixup! Define a minimum cleanup interval for available rollbacks

* fixup! fixup! Define a minimum cleanup interval for available rollbacks

* fixup! Add Hash() to agent integration test fixture

* Add cleanup rollback test for multiple upgrades within the window

* Use an additional subcontext for cleanup goroutine

* add debug logging and skip TestCleanupRollbacks/agent_should_clear_expired_rollback_after_upgrading_to_a_repackaged_version on windows

* fixup! add debug logging and skip TestCleanupRollbacks/agent_should_clear_expired_rollback_after_upgrading_to_a_repackaged_version on windows

* Check upgrade details status more frequently during upgrade integration tests

---------

Co-authored-by: Eric Beahan <eric.beahan@elastic.co>
(cherry picked from commit 5e3c865)
ebeahan added a commit that referenced this pull request Jan 14, 2026
(cherry picked from commit 5e3c865)

Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>
Co-authored-by: Eric Beahan <eric.beahan@elastic.co>
swiatekm pushed a commit that referenced this pull request Jan 14, 2026
Co-authored-by: Eric Beahan <eric.beahan@elastic.co>
Labels

  • backport-9.3: Automated backport to the 9.3 branch
  • enhancement: New feature or request
  • skip-changelog
  • Team:Elastic-Agent: Label for the Agent team
  • Team:Elastic-Agent-Control-Plane: Label for the Agent Control Plane team

5 participants