Ensure Agent configuration and state persist across restarts in Fleet mode #8856
rhr323 merged 11 commits into elastic:main
Consistently mount CONFIG_PATH and STATE_PATH to same directory in Fleet mode.
Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
✅ Snyk checks have passed. No issues have been found so far.
buildkite test this
On 8.13.0 I ran into:
Also, we need to verify that advanced configuration still works.
Another issue with 9.1.2 when upgrading from ECK 3.1 to this PR:
The errors I highlighted above all come from the migration from the status quo in 3.1, where …

The guidance from @pkoutsovasilis is to force re-enrollment with:

```shell
# One-time cleanup: the marker file ensures this runs only once per state volume.
if [[ ! -f "/usr/share/elastic-agent/state/eck.config_migrated" ]]; then
  echo "Attempting to remove fleet.enc and fleet.enc.lock from state path (ignore if not present)"
  # Removing the encrypted Fleet config forces the agent to re-enroll.
  rm -f "/usr/share/elastic-agent/state/fleet.enc" "/usr/share/elastic-agent/state/fleet.enc.lock" 2>/dev/null || true
  echo "Creating eck.config_migrated marker"
  touch "/usr/share/elastic-agent/state/eck.config_migrated"
fi
```
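To try the one-time cleanup against a scratch directory before putting it into an init container, the same logic can be wrapped in a function that takes the state directory as a parameter. This is a minimal sketch: the function name `migrate_fleet_state` and its argument are hypothetical and not part of ECK or Elastic Agent; only the file names come from the script in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: migrate_fleet_state and its state-directory argument are
# hypothetical; the file names match the migration script in this thread.
migrate_fleet_state() {
  local state_dir="${1:-/usr/share/elastic-agent/state}"
  local marker="$state_dir/eck.config_migrated"

  # The marker file makes the cleanup run at most once per state volume.
  if [[ -f "$marker" ]]; then
    return 0
  fi

  echo "Removing fleet.enc and fleet.enc.lock from $state_dir (ignore if not present)"
  # rm -f succeeds even when the files do not exist.
  rm -f "$state_dir/fleet.enc" "$state_dir/fleet.enc.lock"
  touch "$marker"
}
```

Called without an argument it falls back to `/usr/share/elastic-agent/state`. A second call is a no-op because the marker already exists, so a fleet.enc written by a fresh enrollment is left alone.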
Update: I suspect the migration is only necessary if a user went back and forth between …
8.13 in Fleet mode throws errors with this setup, specifically with the Fleet Server agent:
I haven't investigated the issue yet. 7.17, 8.18, and 9.x in Fleet mode don't have the issue.
Testing advanced configuration (9.2.0 snapshot) doesn't seem to work. Is this what you wanted applied, @pebrc? I'm not terribly familiar with this advanced config.
I think you are right. The implementation from this PR seems to break advanced config mode. The previous implementation shipped with 3.1 declared …
After speaking with @pkoutsovasilis: the relevant code in Agent is https://github.com/elastic/elastic-agent/blob/6186951dfbad2a7e0e1a37c26097d5b4d9d38dba/internal/pkg/agent/cmd/container.go#L853-L858. It seems there is a bug: this code only checks whether the file exists and does not take its contents into account. We could look into an init container that removes the file, if it exists, to force Agent to copy it again.
This might not be necessary after all. It seems that …
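The difference between an existence-only check and a content-aware one can be sketched with three small helpers. These are hypothetical illustrations of the behavior discussed above, not the Agent's actual Go code (that is in the linked container.go); the function names and file arguments are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the behavior discussed above.

# Existence-only check: copy src to dst only if dst is missing,
# even when an existing dst has stale contents.
copy_if_missing() {
  local src="$1" dst="$2"
  [[ -f "$dst" ]] || cp "$src" "$dst"
}

# Content-aware check: copy src to dst whenever their contents differ
# (cmp also fails when dst does not exist yet).
copy_if_changed() {
  local src="$1" dst="$2"
  cmp -s "$src" "$dst" 2>/dev/null || cp "$src" "$dst"
}

# Init-container style workaround: delete the existing copy so that a later
# existence-only check copies the current file again.
remove_stale_copy() {
  rm -f "$1"
}
```

Running `remove_stale_copy` before the agent starts, followed by the agent's existence-only copy, has the same net effect as `copy_if_changed`, which is why an init container was considered as a workaround.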
pebrc left a comment
LGTM, I did extensive testing on this one. @rhr323, we need a known-issue entry for this. If users downgrade to 3.1 or earlier and then upgrade again to 3.2, they might run into errors like:
```
Error: fail to read state store '/usr/share/elastic-agent/state/data/state.enc': failed migrating YAML store JSON store: could not parse YAML fail to decode bytes: cipher: message authentication failed
```
or
```
Error: fail to read action store '/usr/share/elastic-agent/state/data/action_store.yml': yaml: input error: fail to decode bytes: cipher: message authentication failed
```
In these cases, they should add the FLEET_FORCE=true environment variable to their manifest to force Agent to enroll anew (it can be removed once the agent has re-enrolled).
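To spot this condition before adding FLEET_FORCE, the agent logs can be checked for the decryption failure that both errors above end with. A minimal sketch: `needs_fleet_force` is a hypothetical helper name, and matching only the shared `cipher: message authentication failed` suffix is an assumption that keeps it robust to the line wrapping seen in the quoted output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if the log text on stdin contains the decryption
# failure that both the state store and action store errors end with.
needs_fleet_force() {
  grep -Fq "cipher: message authentication failed"
}
```

For example, `kubectl logs <agent-pod> | needs_fleet_force && echo "set FLEET_FORCE=true"`, where `<agent-pod>` is a placeholder. In an ECK Agent resource the variable would typically go into the env list of the agent container in the pod template.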
…eet mode (elastic#8856)

* Consistently mount CONFIG_PATH and STATE_PATH to same directory in Fleet mode.
* Add comment
* Maybe add init container.
* Fix to be deployment specific
* remove unneeded call
* Add some logging
* Fix bug
* Add check for existing init container
* Add volume mount
* Remove test code to add init container.
* Adjust test expectations to new command

---------

Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
Co-authored-by: Peter Brachwitz <peter.brachwitz@elastic.co>
(cherry picked from commit 63bff02)
💚 All backports created successfully
…eet mode (#8856) (#8859)

* Consistently mount CONFIG_PATH and STATE_PATH to same directory in Fleet mode.
* Add comment
* Maybe add init container.
* Fix to be deployment specific
* remove unneeded call
* Add some logging
* Fix bug
* Add check for existing init container
* Add volume mount
* Remove test code to add init container.
* Adjust test expectations to new command

---------

(cherry picked from commit 63bff02)

Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
Co-authored-by: Michael Montgomery <mmontg1@gmail.com>
Co-authored-by: Peter Brachwitz <peter.brachwitz@elastic.co>
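The first commit in the list above mounts CONFIG_PATH and STATE_PATH to the same directory in Fleet mode. A quick way to confirm that layout is a small check that both paths are existing directories resolving to the same physical location. This is a sketch under assumptions: `same_config_and_state_dir` is hypothetical, and it takes the two paths as arguments, falling back to the CONFIG_PATH and STATE_PATH environment variables.

```shell
#!/usr/bin/env bash
# Hypothetical check: exit 0 only if the config and state paths (arguments,
# defaulting to $CONFIG_PATH and $STATE_PATH) are existing directories that
# resolve to the same physical location.
same_config_and_state_dir() {
  local cfg="${1:-${CONFIG_PATH:-}}" state="${2:-${STATE_PATH:-}}"
  # -d is false for empty or missing paths, so both cases fail here.
  [[ -d "$cfg" && -d "$state" ]] || return 1
  # pwd -P resolves symlinks, so a symlinked path still counts as the same dir.
  [[ "$(cd "$cfg" && pwd -P)" == "$(cd "$state" && pwd -P)" ]]
}
```

The two values can be read from a running pod with `kubectl exec <agent-pod> -- printenv CONFIG_PATH STATE_PATH` (`<agent-pod>` is a placeholder).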
We ran into issues with Fleet Server no longer enrolling with the changes from #8856. This proposes to version-gate the functionality to Elastic Agent versions that actually support advanced configuration.
…astic#8869) We ran into issues with Fleet Server no longer enrolling with the changes from elastic#8856. This proposes to version-gate the functionality to Elastic Agent versions that actually support advanced configuration.
(cherry picked from commit 70d91eb)
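The version gate can be sketched in shell as below. The 8.13 threshold comes from the 3.2.0 release notes ("Gate advanced Fleet config logic to Agent v8.13 and later"); the function name and the choice to ignore pre-release suffixes such as -SNAPSHOT are assumptions, and ECK's real check is in its Go code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if an Agent version is 8.13 or later.
# Pre-release suffixes such as "-SNAPSHOT" are ignored (an assumption).
supports_advanced_fleet_config() {
  local version="${1%%-*}"   # "9.2.0-SNAPSHOT" -> "9.2.0"
  local major minor rest
  IFS=. read -r major minor rest <<< "$version"
  # Reject anything that does not start with numeric MAJOR.MINOR.
  [[ $major =~ ^[0-9]+$ && $minor =~ ^[0-9]+$ ]] || return 1
  # 10# forces base 10 so a leading zero is not read as octal.
  (( 10#$major > 8 || (10#$major == 8 && 10#$minor >= 13) ))
}
```

Comparing numeric components avoids the pitfall of a plain string comparison, where "8.9" would sort after "8.13".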
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [eck-operator](https://github.com/elastic/cloud-on-k8s) | minor | `3.1.0` -> `3.2.0` |

---

### Release Notes

<details>
<summary>elastic/cloud-on-k8s (eck-operator)</summary>

### [`v3.2.0`](https://github.com/elastic/cloud-on-k8s/releases/tag/v3.2.0)

[Compare Source](elastic/cloud-on-k8s@v3.1.0...v3.2.0)

### Elastic Cloud on Kubernetes 3.2.0

- [Quickstart guide](https://www.elastic.co/docs/deploy-manage/deploy/cloud-on-k8s#eck-quickstart)

##### Release Highlights

##### Automatic pod disruption budget (Enterprise feature)

ECK now offers better out-of-the-box PodDisruptionBudgets that automatically keep your cluster available as Pods move across nodes. The new policy calculates the number of Pods per tier that can sustain replacement and automatically generates a PodDisruptionBudget for each tier, enabling the Elasticsearch cluster to vacate Kubernetes nodes more quickly, while considering cluster health, without interruption.

##### User Password Generation (Enterprise feature)

ECK will now generate longer passwords by default for the administrative user of each Elasticsearch cluster. The password is 24 characters in length by default (can be configured to a maximum of 72 characters), incorporating alphabetic and numeric characters, to make password complexity stronger.

##### Features and enhancements

- Enable certificate reloading for stack monitoring Beats [#8833](elastic/cloud-on-k8s#8833) (issue: [#5448](elastic/cloud-on-k8s#5448))
- Allow configuration of file-based password character set and length [#8817](elastic/cloud-on-k8s#8817) (issues: [#2795](elastic/cloud-on-k8s#2795), [#8693](elastic/cloud-on-k8s#8693))
- Automatically set GOMEMLIMIT based on cgroups memory limits [#8814](elastic/cloud-on-k8s#8814) (issue: [#8790](elastic/cloud-on-k8s#8790))
- Introduce granular PodDisruptionBudgets based on node roles [#8780](elastic/cloud-on-k8s#8780) (issue: [#2936](elastic/cloud-on-k8s#2936))

##### Fixes

- Gate advanced Fleet config logic to Agent v8.13 and later [#8869](elastic/cloud-on-k8s#8869)
- Ensure Agent configuration and state persist across restarts in Fleet mode [#8856](elastic/cloud-on-k8s#8856) (issue: [#8819](elastic/cloud-on-k8s#8819))
- Do not set credentials label on Kibana config secret [#8852](elastic/cloud-on-k8s#8852) (issue: [#8839](elastic/cloud-on-k8s#8839))
- Allow elasticsearchRef.secretName in Kibana helm validation [#8822](elastic/cloud-on-k8s#8822) (issue: [#8816](elastic/cloud-on-k8s#8816))

##### Documentation improvements

- Update Logstash recipes from to filestream input [#8801](elastic/cloud-on-k8s#8801)
- Recipe for exposing Fleet server to outside of the Kubernetes cluster [#8788](elastic/cloud-on-k8s#8788)
- Clarify secretName restrictions [#8782](elastic/cloud-on-k8s#8782)
- Update ES\_JAVA\_OPTS comments and explain auto-heap behavior [#8753](elastic/cloud-on-k8s#8753)

##### Dependency updates

- github.com/gkampitakis/go-snaps v0.5.13 => v0.5.15
- github.com/hashicorp/vault/api v1.20.0 => v1.22.0
- github.com/KimMachineGun/automemlimit => v0.7.4
- github.com/prometheus/client\_golang v1.22.0 => v1.23.2
- github.com/prometheus/common v0.65.0 => v0.67.1
- github.com/sethvargo/go-password v0.3.1 => REMOVED
- github.com/spf13/cobra v1.9.1 => v1.10.1
- github.com/spf13/pflag v1.0.6 => v1.0.10
- github.com/spf13/viper v1.20.1 => v1.21.0
- github.com/stretchr/testify v1.10.0 => v1.11.1
- golang.org/x/crypto v0.40.0 => v0.43.0
- k8s.io/api v0.33.2 => v0.34.1
- k8s.io/apimachinery v0.33.2 => v0.34.1
- k8s.io/client-go v0.33.2 => v0.34.1
- k8s.io/utils v0.0.0-20241104100929-3ea5e8cea738 => v0.0.0-20250604170112-4c0f3b243397
- sigs.k8s.io/controller-runtime v0.21.0 => v0.22.2
- sigs.k8s.io/controller-tools v0.18.0 => v0.19.0

</details>

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://gitea.alexlebens.dev/alexlebens/infrastructure/pulls/1911
Co-authored-by: Renovate Bot <renovate-bot@alexlebens.net>
Co-committed-by: Renovate Bot <renovate-bot@alexlebens.net>
Resolves #8819
Related: elastic/elastic-agent#5185
Testing procedure
Brought Elasticsearch, Kibana, and Agents online. Verified the Fleet page in Kibana in Fleet mode, and verified the logs of the agents themselves in non-Fleet mode. Restarted/killed Agent pods and ensured that the agents shown in the Fleet UI in Kibana didn't change names and that no additional agents were added.
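The "no renamed or additional agents" check can be made mechanical by comparing agent names captured before and after the restart. A minimal sketch: `new_agents_after_restart` is hypothetical, and how the names are collected (for example from the Fleet UI or Kibana's Fleet agents API, `GET /api/fleet/agents`) is left open.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print agent names present in the "after" list but not in
# the "before" list (one name per line in each file). Empty output means no
# agent was renamed or added by the restart.
new_agents_after_restart() {
  local before="$1" after="$2"
  # comm needs sorted input; -13 keeps only lines unique to the second file.
  comm -13 <(sort -u "$before") <(sort -u "$after")
}
```

A renamed agent shows up under its new name, so both failure modes from the testing procedure surface as non-empty output.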
For advanced configuration:
Add the following to the Agent manifest:
Exec into a running Agent pod and execute:
elastic-agent inspect
Verify that the following is present in the output:
Tested
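The verification step for advanced configuration can be scripted by piping the inspect output into a small checker. A sketch only: `inspect_output_contains` is a hypothetical helper, and the expected strings have to be taken from the advanced-configuration settings added to the manifest, which are not reproduced here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `elastic-agent inspect` output on stdin and exit 0
# only if every expected string passed as an argument appears in it.
inspect_output_contains() {
  local output expected
  output="$(cat)"
  for expected in "$@"; do
    # -F: fixed-string match; --: allow expected strings that start with "-".
    if ! grep -Fq -- "$expected" <<< "$output"; then
      echo "missing from inspect output: $expected" >&2
      return 1
    fi
  done
}
```

For example, `kubectl exec <agent-pod> -- elastic-agent inspect | inspect_output_contains '<expected setting>'`, where both angle-bracket values are placeholders.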