fix: ensure OTel collector receives persisted service.telemetry config #12736
This pull request does not have a backport label. Could you fix it @rubenruizdegauna? 🙏
Fix timing issue where c.otelCfg was set after refreshComponentModel was called, causing the OTel manager to receive nil configuration instead of the merged persisted + Fleet config. The fix moves the c.otelCfg assignment to occur after applyPersistedConfig but before refreshComponentModel, ensuring the OTel manager receives the complete merged configuration including service.telemetry settings. This enables agentless deployments to properly instrument the OTel collector with custom telemetry configuration. Co-authored-by: Cursor <cursoragent@cursor.com>
cb56474 to 52510b6
I set the Reviewers - feel free to adjust backport targets if that's unnecessary.
Edit: had a typo in my go test -run syntax, see update below.
Turns out I had a typo in my This aligns with what I see when reading the test implementation, that likely
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
💛 Build succeeded, but was flaky
LGTM, thanks!
@Mergifyio backport 8.19 9.2 9.3
✅ Backports have been created
Cherry-pick of 6e8efaf has failed: To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally
#12736) * fix: ensure OTel collector receives persisted service.telemetry config Fix timing issue where c.otelCfg was set after refreshComponentModel was called, causing the OTel manager to receive nil configuration instead of the merged persisted + Fleet config. The fix moves the c.otelCfg assignment to occur after applyPersistedConfig but before refreshComponentModel, ensuring the OTel manager receives the complete merged configuration including service.telemetry settings. This enables agentless deployments to properly instrument the OTel collector with custom telemetry configuration. Co-authored-by: Cursor <cursoragent@cursor.com> * Add changelog fragment Co-authored-by: Cursor <cursoragent@cursor.com> * remove unnecessary tests --------- Co-authored-by: Cursor <cursoragent@cursor.com> (cherry picked from commit 6e8efaf) # Conflicts: # internal/pkg/agent/application/coordinator/coordinator.go # internal/pkg/agent/application/coordinator/coordinator_test.go
#12736) * fix: ensure OTel collector receives persisted service.telemetry config Fix timing issue where c.otelCfg was set after refreshComponentModel was called, causing the OTel manager to receive nil configuration instead of the merged persisted + Fleet config. The fix moves the c.otelCfg assignment to occur after applyPersistedConfig but before refreshComponentModel, ensuring the OTel manager receives the complete merged configuration including service.telemetry settings. This enables agentless deployments to properly instrument the OTel collector with custom telemetry configuration. Co-authored-by: Cursor <cursoragent@cursor.com> * Add changelog fragment Co-authored-by: Cursor <cursoragent@cursor.com> * remove unnecessary tests --------- Co-authored-by: Cursor <cursoragent@cursor.com> (cherry picked from commit 6e8efaf)
#12736) (#12831) * fix: ensure OTel collector receives persisted service.telemetry config Fix timing issue where c.otelCfg was set after refreshComponentModel was called, causing the OTel manager to receive nil configuration instead of the merged persisted + Fleet config. The fix moves the c.otelCfg assignment to occur after applyPersistedConfig but before refreshComponentModel, ensuring the OTel manager receives the complete merged configuration including service.telemetry settings. This enables agentless deployments to properly instrument the OTel collector with custom telemetry configuration. * Add changelog fragment * remove unnecessary tests --------- (cherry picked from commit 6e8efaf) Co-authored-by: Ruben Ruiz de Gauna <rubenruizdegauna@proton.me> Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
Fixes a timing bug where the OTel collector was not receiving service.telemetry configuration from the persisted config file (elastic-agent.yml). This affected agentless deployments in Kubernetes, where the initial configuration should override Fleet settings.
Changes:
Moved the c.otelCfg assignment to occur after applyPersistedConfig() but before refreshComponentModel()
Problem
The persisted configuration containing service.telemetry settings was successfully merged but never reached the OTel manager, because c.otelCfg was assigned only after refreshComponentModel() had already run.
Before Fix
sequenceDiagram
    participant PC as processConfig
    participant PCA as processConfigAgent
    participant APC as applyPersistedConfig
    participant RCM as refreshComponentModel
    participant UMC as updateManagersWithConfig
    participant OM as OTelManager
    PC->>PCA: call with cfg (from Fleet)
    PCA->>APC: merge persisted config
    Note over APC: cfg.OTel now has service.telemetry ✅
    PCA->>RCM: generate components
    RCM->>UMC: update managers
    Note over UMC: reads c.otelCfg (still nil!) ❌
    UMC->>OM: Update(nil config)
    Note over OM: receives nil, service.telemetry lost ❌
    PC->>PC: c.otelCfg = cfg.OTel (too late!)
After Fix
sequenceDiagram
    participant PC as processConfig
    participant PCA as processConfigAgent
    participant APC as applyPersistedConfig
    participant RCM as refreshComponentModel
    participant UMC as updateManagersWithConfig
    participant OM as OTelManager
    PC->>PCA: call with cfg
    PCA->>APC: merge persisted config
    Note over APC: cfg.OTel has service.telemetry ✅
    PCA->>PCA: c.otelCfg = cfg.OTel ✅
    PCA->>RCM: generate components
    RCM->>UMC: update managers
    Note over UMC: reads c.otelCfg (has service.telemetry!) ✅
    UMC->>OM: Update(c.otelCfg with service.telemetry)
    Note over OM: receives correct config! ✅
Testing
Added Test_Coordinator_OTelManagerReceivesPersistedConfig with three test cases. All tests verify that the OTel manager receives the correct merged configuration at the right time.
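To make the ordering issue concrete, here is a minimal, self-contained Go sketch of the behaviour this PR describes. It is not the coordinator's actual code: the Config, fakeManager and coordinator types, the flat "service.telemetry" key, the shallow merge, and the fixed flag are all simplifications invented for illustration. Only the step names (applyPersistedConfig, refreshComponentModel, c.otelCfg) come from the PR.

```go
package main

import "fmt"

// Config is a hypothetical stand-in for the agent's parsed configuration.
// The real coordinator uses its own config types, not a plain map.
type Config struct {
	OTel map[string]any
}

// fakeManager records the last config passed to Update, the way a test
// double for the OTel manager might.
type fakeManager struct {
	got map[string]any
}

func (m *fakeManager) Update(cfg map[string]any) { m.got = cfg }

// coordinator models only the fields relevant to the ordering issue.
type coordinator struct {
	otelCfg   map[string]any
	persisted map[string]any
	mgr       *fakeManager
}

// applyPersistedConfig overlays the persisted settings onto cfg.OTel.
// Assumption: a shallow key-by-key merge is enough for this illustration.
func (c *coordinator) applyPersistedConfig(cfg *Config) {
	if cfg.OTel == nil {
		cfg.OTel = map[string]any{}
	}
	for k, v := range c.persisted {
		cfg.OTel[k] = v
	}
}

// refreshComponentModel hands the manager whatever c.otelCfg holds
// at the moment it is called.
func (c *coordinator) refreshComponentModel() {
	c.mgr.Update(c.otelCfg)
}

// processConfig runs the steps in the old order (fixed == false) or in
// the order described by this PR (fixed == true).
func (c *coordinator) processConfig(cfg *Config, fixed bool) {
	c.applyPersistedConfig(cfg)
	if fixed {
		c.otelCfg = cfg.OTel // assigned before the managers are updated
	}
	c.refreshComponentModel()
	if !fixed {
		c.otelCfg = cfg.OTel // too late: the manager was already updated
	}
}

// received returns what the manager saw for the chosen ordering.
func received(fixed bool) map[string]any {
	c := &coordinator{
		persisted: map[string]any{"service.telemetry": "custom"},
		mgr:       &fakeManager{},
	}
	c.processConfig(&Config{}, fixed)
	return c.mgr.got
}

func main() {
	fmt.Println("old order delivers nil:", received(false) == nil)
	fmt.Println("fixed order delivers telemetry:", received(true)["service.telemetry"] == "custom")
}
```

A test in the spirit of the one described above would assert on the value captured by the manager double rather than on the coordinator's own field, since in the old ordering the field is eventually correct while the manager has already been handed nil.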
Test Results
Impact
Files Changed
internal/pkg/agent/application/coordinator/coordinator.go: Fixed timing of c.otelCfg assignment
internal/pkg/agent/application/coordinator/coordinator_test.go: Added comprehensive unit test
internal/pkg/otel/manager/manager.go: Removed debug logging
internal/pkg/otel/manager/execution_subprocess.go: Removed debug logging
Made with Cursor
Fixes: #12737