
fix: ensure OTel collector receives persisted service.telemetry config#12736

Merged
rubenruizdegauna merged 3 commits into elastic:main from
rubenruizdegauna:fix/otel-collector-config-merge
Feb 18, 2026

Conversation

@rubenruizdegauna
Member

@rubenruizdegauna rubenruizdegauna commented Feb 12, 2026

Summary

Fixes a timing bug where the OTel collector was not receiving service.telemetry configuration from the persisted config file (elastic-agent.yml). This affected agentless deployments in Kubernetes where the initial configuration should override Fleet settings.

Changes:

  • Move c.otelCfg assignment to occur after applyPersistedConfig() but before refreshComponentModel()
  • Add comprehensive unit test covering three merge scenarios
  • Remove debug logging added during investigation

Problem

The persisted configuration containing service.telemetry settings was successfully merged but never reached the OTel manager due to incorrect timing of when c.otelCfg was set.

Before Fix

sequenceDiagram
    participant PC as processConfig
    participant PCA as processConfigAgent
    participant APC as applyPersistedConfig
    participant RCM as refreshComponentModel
    participant UMC as updateManagersWithConfig
    participant OM as OTelManager

    PC->>PCA: call with cfg (from Fleet)
    PCA->>APC: merge persisted config
    Note over APC: cfg.OTel now has service.telemetry ✅
    PCA->>RCM: generate components
    RCM->>UMC: update managers
    Note over UMC: reads c.otelCfg (still nil!) ❌
    UMC->>OM: Update(nil config)
    Note over OM: receives nil, service.telemetry lost ❌
    PC->>PC: c.otelCfg = cfg.OTel (too late!)

After Fix

sequenceDiagram
    participant PC as processConfig
    participant PCA as processConfigAgent
    participant APC as applyPersistedConfig
    participant RCM as refreshComponentModel
    participant UMC as updateManagersWithConfig
    participant OM as OTelManager

    PC->>PCA: call with cfg
    PCA->>APC: merge persisted config
    Note over APC: cfg.OTel has service.telemetry ✅
    PCA->>PCA: c.otelCfg = cfg.OTel ✅
    PCA->>RCM: generate components
    RCM->>UMC: update managers
    Note over UMC: reads c.otelCfg (has service.telemetry!) ✅
    UMC->>OM: Update(c.otelCfg with service.telemetry)
    Note over OM: receives correct config! ✅
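The ordering problem in the two diagrams can be reproduced in a few lines of Go. This is only a minimal sketch, not the real coordinator code: the coordinator and otelManager types, the map-based config, and the processConfigBefore/processConfigAfter helpers are all hypothetical simplifications that borrow the names from the diagrams.

```go
package main

import "fmt"

// otelManager is a hypothetical stand-in for the OTel manager. It only
// records the last configuration passed to Update.
type otelManager struct {
	received map[string]any
}

func (m *otelManager) Update(cfg map[string]any) { m.received = cfg }

// coordinator is a simplified, hypothetical model of the Coordinator.
type coordinator struct {
	otelCfg   map[string]any // what refreshComponentModel hands to the manager
	persisted map[string]any // settings loaded from the persisted config file
	otelMgr   *otelManager
}

// applyPersistedConfig overlays the persisted settings on the Fleet config.
func (c *coordinator) applyPersistedConfig(fleet map[string]any) map[string]any {
	merged := map[string]any{}
	for k, v := range fleet {
		merged[k] = v
	}
	for k, v := range c.persisted {
		merged[k] = v
	}
	if len(merged) == 0 {
		return nil
	}
	return merged
}

// refreshComponentModel pushes whatever c.otelCfg holds right now.
func (c *coordinator) refreshComponentModel() { c.otelMgr.Update(c.otelCfg) }

// processConfigBefore mirrors the buggy ordering: the manager is updated
// before c.otelCfg is assigned.
func (c *coordinator) processConfigBefore(fleet map[string]any) {
	merged := c.applyPersistedConfig(fleet)
	c.refreshComponentModel() // reads c.otelCfg, which is still nil
	c.otelCfg = merged        // assigned too late
}

// processConfigAfter mirrors the fixed ordering: assign first, then refresh.
func (c *coordinator) processConfigAfter(fleet map[string]any) {
	c.otelCfg = c.applyPersistedConfig(fleet)
	c.refreshComponentModel()
}

// newCoordinator builds a coordinator whose persisted config carries a
// placeholder service.telemetry entry.
func newCoordinator() *coordinator {
	return &coordinator{
		persisted: map[string]any{"service": map[string]any{"telemetry": "persisted"}},
		otelMgr:   &otelManager{},
	}
}

func main() {
	before := newCoordinator()
	before.processConfigBefore(nil)
	// prints: before fix, manager received nil: true
	fmt.Println("before fix, manager received nil:", before.otelMgr.received == nil)

	after := newCoordinator()
	after.processConfigAfter(nil)
	// prints: after fix, manager received nil: false
	fmt.Println("after fix, manager received nil:", after.otelMgr.received == nil)
}
```

In this model the merge itself is identical in both variants; only the position of the assignment relative to refreshComponentModel decides what the manager sees.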

Testing

Added Test_Coordinator_OTelManagerReceivesPersistedConfig with three test cases:

  1. Local OTel collector + Fleet with OTel config (merged): Verifies both configs merge correctly
  2. Local OTel collector, no Fleet OTel config: Verifies persisted config is used entirely
  3. No local collector config, OTel config in Fleet: Verifies Fleet config is used when no persisted config

All tests verify that the OTel manager receives the correct merged configuration at the right time.

Test Results

✅ Test_Coordinator_OTelManagerReceivesPersistedConfig - PASS (all 3 cases)
✅ Test_Coordinator_ProcessConfig - PASS (existing test)
✅ Test_ApplyPersistedConfig_OTelService - PASS (existing test)

Impact

  • Agentless deployments can now properly instrument the OTel collector with custom telemetry configuration
  • service.telemetry settings from initial config are now correctly applied to the collector
  • Enables proper observability of the OTel collector itself via OTLP endpoints

Files Changed

  • internal/pkg/agent/application/coordinator/coordinator.go: Fixed timing of c.otelCfg assignment
  • internal/pkg/agent/application/coordinator/coordinator_test.go: Added comprehensive unit test
  • internal/pkg/otel/manager/manager.go: Removed debug logging
  • internal/pkg/otel/manager/execution_subprocess.go: Removed debug logging

Made with Cursor

Fixes: #12737

@mergify
Contributor

mergify bot commented Feb 12, 2026

This pull request does not have a backport label. Could you fix it @rubenruizdegauna? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label that automatically backports to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.
rubenruizdegauna and others added 2 commits February 12, 2026 11:51
Fix timing issue where c.otelCfg was set after refreshComponentModel was
called, causing the OTel manager to receive nil configuration instead of
the merged persisted + Fleet config.

The fix moves the c.otelCfg assignment to occur after applyPersistedConfig
but before refreshComponentModel, ensuring the OTel manager receives the
complete merged configuration including service.telemetry settings.

This enables agentless deployments to properly instrument the OTel
collector with custom telemetry configuration.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@rubenruizdegauna rubenruizdegauna force-pushed the fix/otel-collector-config-merge branch from cb56474 to 52510b6 February 12, 2026 10:51
@ebeahan ebeahan added the backport-active-all Automated backport with mergify to all the active branches label Feb 12, 2026
@ebeahan
Member

ebeahan commented Feb 12, 2026

I set the backport label to all active - 8.19, 9.2, and 9.3.

Reviewers - feel free to adjust backport targets if that's unnecessary.

@cmacknz
Member

cmacknz commented Feb 13, 2026

Fix itself looks reasonable, but I think the tests need some work. If I move the c.otelCfg = cfg.OTel assignment back to its original location, all the new tests still pass.

Edit: had a typo in my go test -run syntax, see update below.

@cmacknz
Member

cmacknz commented Feb 13, 2026

Turns out I had a typo in my go test -run syntax when I tried this. If I undo your change here a test does fail:

❯ go test ./internal/pkg/agent/application/coordinator/... -run 'Test_ApplyPersistedConfig_OTelService|Test_Coordinator_ProcessConfig|Test_Coordinator_OTelManagerReceivesPersistedConfig'
--- FAIL: Test_Coordinator_ProcessConfig (0.01s)
    --- FAIL: Test_Coordinator_ProcessConfig/fleet_has_no_otel,_persisted_adds_otel_service (0.00s)
        coordinator_test.go:1546:
                Error Trace:    /Users/cmackenzie/go/src/github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator/coordinator_test.go:1546
                Error:          Expected value not to be nil.
                Test:           Test_Coordinator_ProcessConfig/fleet_has_no_otel,_persisted_adds_otel_service
                Messages:       c.otelCfg should be set when OTel config exists
--- FAIL: Test_Coordinator_OTelManagerReceivesPersistedConfig (0.01s)
    --- FAIL: Test_Coordinator_OTelManagerReceivesPersistedConfig/local_otel_collector_but_no_otel_collector_config_from_fleet (0.00s)
        coordinator_test.go:1757:
                Error Trace:    /Users/cmackenzie/go/src/github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator/coordinator_test.go:1757
                Error:          Expected value not to be nil.
                Test:           Test_Coordinator_OTelManagerReceivesPersistedConfig/local_otel_collector_but_no_otel_collector_config_from_fleet
                Messages:       OTel manager should have received a configuration
FAIL
FAIL    github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator     0.406s
?       github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator/mocks       [no test files]
FAIL

This aligns with what I see when reading the test implementation, that likely Test_Coordinator_OTelManagerReceivesPersistedConfig is all you need to keep out of the three cases.
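For context on why a mistyped -run pattern can make it look as if every test passes: go test treats the -run value as a regular expression and runs only the tests whose names match, so a pattern that matches nothing runs no tests and reports no failures. The sketch below models that selection for top-level test names only (it ignores the slash-separated subtest matching that go test also does), and the mistyped pattern is a made-up example, not the actual typo.

```go
package main

import (
	"fmt"
	"regexp"
)

// testNames lists the three top-level tests named in this conversation.
var testNames = []string{
	"Test_ApplyPersistedConfig_OTelService",
	"Test_Coordinator_ProcessConfig",
	"Test_Coordinator_OTelManagerReceivesPersistedConfig",
}

// selected returns the names matched by an unanchored regular expression,
// approximating how a -run pattern picks top-level tests.
func selected(pattern string, names []string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, n := range names {
		if re.MatchString(n) {
			out = append(out, n)
		}
	}
	return out, nil
}

func main() {
	// The alternation used in the command above selects all three tests.
	good := "Test_ApplyPersistedConfig_OTelService|Test_Coordinator_ProcessConfig|Test_Coordinator_OTelManagerReceivesPersistedConfig"
	// Hypothetical typo: lowercase "t" in "Otel" matches none of the names.
	bad := "Test_Coordinator_OtelManagerReceivesPersistedConfig"

	for _, p := range []string{good, bad} {
		got, err := selected(p, testNames)
		if err != nil {
			fmt.Println("invalid pattern:", err)
			continue
		}
		fmt.Printf("pattern selects %d test(s)\n", len(got))
	}
}
```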

@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Feb 14, 2026
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@elasticmachine
Contributor

💛 Build succeeded, but was flaky

cc @rubenruizdegauna

@cmacknz
Member

cmacknz commented Feb 17, 2026

LGTM, thanks!

@rubenruizdegauna rubenruizdegauna merged commit 6e8efaf into elastic:main Feb 18, 2026
22 checks passed
@github-actions
Contributor

@Mergifyio backport 8.19 9.2 9.3

@mergify
Contributor

mergify bot commented Feb 18, 2026

backport 8.19 9.2 9.3

✅ Backports have been created


Cherry-pick of 6e8efaf has failed:

On branch mergify/bp/8.19/pr-12736
Your branch is up to date with 'origin/8.19'.

You are currently cherry-picking commit 6e8efafe5.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	new file:   changelog/fragments/1770892325-fix-otel-collector-config-merge.yaml

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   internal/pkg/agent/application/coordinator/coordinator.go
	both modified:   internal/pkg/agent/application/coordinator/coordinator_test.go

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

Cherry-pick of 6e8efaf has failed:

On branch mergify/bp/9.2/pr-12736
Your branch is up to date with 'origin/9.2'.

You are currently cherry-picking commit 6e8efafe5.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	new file:   changelog/fragments/1770892325-fix-otel-collector-config-merge.yaml

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   internal/pkg/agent/application/coordinator/coordinator.go
	both modified:   internal/pkg/agent/application/coordinator/coordinator_test.go

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

mergify bot pushed a commit that referenced this pull request Feb 18, 2026
#12736)

* fix: ensure OTel collector receives persisted service.telemetry config

Fix timing issue where c.otelCfg was set after refreshComponentModel was
called, causing the OTel manager to receive nil configuration instead of
the merged persisted + Fleet config.

The fix moves the c.otelCfg assignment to occur after applyPersistedConfig
but before refreshComponentModel, ensuring the OTel manager receives the
complete merged configuration including service.telemetry settings.

This enables agentless deployments to properly instrument the OTel
collector with custom telemetry configuration.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Add changelog fragment

Co-authored-by: Cursor <cursoragent@cursor.com>

* remove unnecessary tests

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
(cherry picked from commit 6e8efaf)

# Conflicts:
#	internal/pkg/agent/application/coordinator/coordinator.go
#	internal/pkg/agent/application/coordinator/coordinator_test.go
mergify bot pushed a commit that referenced this pull request Feb 18, 2026
#12736)

* fix: ensure OTel collector receives persisted service.telemetry config

Fix timing issue where c.otelCfg was set after refreshComponentModel was
called, causing the OTel manager to receive nil configuration instead of
the merged persisted + Fleet config.

The fix moves the c.otelCfg assignment to occur after applyPersistedConfig
but before refreshComponentModel, ensuring the OTel manager receives the
complete merged configuration including service.telemetry settings.

This enables agentless deployments to properly instrument the OTel
collector with custom telemetry configuration.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Add changelog fragment

Co-authored-by: Cursor <cursoragent@cursor.com>

* remove unnecessary tests

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
(cherry picked from commit 6e8efaf)

# Conflicts:
#	internal/pkg/agent/application/coordinator/coordinator.go
#	internal/pkg/agent/application/coordinator/coordinator_test.go
mergify bot pushed a commit that referenced this pull request Feb 18, 2026
#12736)

* fix: ensure OTel collector receives persisted service.telemetry config

Fix timing issue where c.otelCfg was set after refreshComponentModel was
called, causing the OTel manager to receive nil configuration instead of
the merged persisted + Fleet config.

The fix moves the c.otelCfg assignment to occur after applyPersistedConfig
but before refreshComponentModel, ensuring the OTel manager receives the
complete merged configuration including service.telemetry settings.

This enables agentless deployments to properly instrument the OTel
collector with custom telemetry configuration.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Add changelog fragment

Co-authored-by: Cursor <cursoragent@cursor.com>

* remove unnecessary tests

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
(cherry picked from commit 6e8efaf)
rubenruizdegauna added a commit that referenced this pull request Feb 18, 2026
#12736) (#12831)

* fix: ensure OTel collector receives persisted service.telemetry config

Fix timing issue where c.otelCfg was set after refreshComponentModel was
called, causing the OTel manager to receive nil configuration instead of
the merged persisted + Fleet config.

The fix moves the c.otelCfg assignment to occur after applyPersistedConfig
but before refreshComponentModel, ensuring the OTel manager receives the
complete merged configuration including service.telemetry settings.

This enables agentless deployments to properly instrument the OTel
collector with custom telemetry configuration.



* Add changelog fragment



* remove unnecessary tests

---------


(cherry picked from commit 6e8efaf)

Co-authored-by: Ruben Ruiz de Gauna <rubenruizdegauna@proton.me>
Co-authored-by: Cursor <cursoragent@cursor.com>

Labels

backport-active-all Automated backport with mergify to all the active branches Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team

5 participants