Ingest collector internal telemetry via in-process hooks by faec · Pull Request #11813 · elastic/elastic-agent

faec · 2025-12-15T02:28:14Z

What does this PR do?

Add a custom telemetry factory and receiver to the OTel collector build to support backwards-compatible ingestion of Collector metrics as ECS metrics that can be viewed in existing Agent dashboards (see #10220 for context on the prometheus-based approach this replaces).
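As a rough illustration of the translation step described above, the sketch below maps a few Collector metric names onto dotted Beats-style field paths and builds a nested document from them. It does not show the telemetry factory or the receiver wiring. The `sample` type, the `translate` and `setNested` helpers, and in particular the pairing of each Collector metric with a `libbeat.*` path are assumptions made for this example, not the mapping this PR implements.

```go
package main

import (
	"fmt"
	"strings"
)

// sample is a hypothetical flattened view of one Collector internal metric.
type sample struct {
	Name  string
	Value int64
}

// translation pairs Collector metric names with dotted Beats-style paths.
// The pairing is illustrative only; the real mapping lives in the PR's code.
var translation = map[string]string{
	"otelcol_exporter_sent_log_records":        "libbeat.output.events.acked",
	"otelcol_exporter_send_failed_log_records": "libbeat.output.events.failed",
}

// translate builds a nested document from the samples that have a known
// mapping and silently skips the rest.
func translate(samples []sample) map[string]any {
	doc := map[string]any{}
	for _, s := range samples {
		path, ok := translation[s.Name]
		if !ok {
			continue
		}
		setNested(doc, strings.Split(path, "."), s.Value)
	}
	return doc
}

// setNested walks (and creates) intermediate maps for every key except the
// last one, then stores the value under the last key.
func setNested(doc map[string]any, keys []string, v int64) {
	for _, k := range keys[:len(keys)-1] {
		child, ok := doc[k].(map[string]any)
		if !ok {
			child = map[string]any{}
			doc[k] = child
		}
		doc = child
	}
	doc[keys[len(keys)-1]] = v
}

func main() {
	doc := translate([]sample{
		{Name: "otelcol_exporter_sent_log_records", Value: 7},
		{Name: "some_unmapped_metric", Value: 1},
	})
	fmt.Println(doc)
}
```

Keeping the mapping in a single table makes it easy to see which Collector metrics feed existing dashboards and which are dropped.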

Checklist

I have read and understood the pull request guidelines of this project.
My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding changes to the default configuration files
I have added tests that prove my fix is effective or that my feature works
I have added an entry in ./changelog/fragments using the changelog tool
I have added an integration test or an E2E test

How to test this PR locally

Run Agent with metrics monitoring enabled and at least one component using the OTel runtime. Collector metric data should then appear in the metrics.elastic_agent* datastreams.

Related issues

Closes [beats receivers] Replace OTel collector internal telemetry monitoring with a scalable approach #10220

elasticmachine · 2025-12-15T02:28:18Z

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

mergify · 2025-12-15T02:28:56Z

This pull request does not have a backport label. Could you fix it @faec? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-./d./d is the label that automatically backports to the 8./d branch. /d is the digit
backport-active-all is the label that automatically backports to all active branches.
backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

elasticmachine · 2025-12-15T07:13:39Z

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

mergify · 2025-12-15T07:43:27Z

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b telemetry-monitoring upstream/telemetry-monitoring
git merge upstream/main
git push upstream telemetry-monitoring

cmacknz · 2025-12-19T22:25:37Z

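In the test failure, the test logs the wrong error value. It logs the err from _, err = fixture.InstallWithoutEnroll(ctx, &installOpts), which is nil, rather than the err from fixture.IsHealthy that keeps the Eventually from succeeding. The err argument to require.Eventuallyf is evaluated when the call is made, before the closure has reassigned it.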

elastic-agent/testing/integration/ess/restrict_upgrade_deb_test.go

Lines 49 to 58 in c06e8ca

    
    _, err = fixture.InstallWithoutEnroll(ctx, &installOpts)
    require.NoError(t, err)
    require.Eventuallyf(t, func() bool {
    	err = fixture.IsHealthy(ctx)
    	return err == nil
    }, 5*time.Minute, time.Second,
    	"Elastic-Agent did not report healthy. Agent status error: \"%v\"",
    	err,
    )

elasticmachine · 2025-12-20T05:39:19Z

💛 Build succeeded, but was flaky

Buildkite Build
Commit: e2acc27

History

💔 Build #32509 failed 8f9a016
💔 Build #32507 failed 153d972
💔 Build #32476 failed 74b0fa9
💔 Build #32420 failed 31e5677
💔 Build #32386 failed af2ad82

cc @faec

* internal telemetry drafting * monitoring config plumbing * Cleaning up / commenting for review * removing no-longer-used files * cleanup, testing * cleanup, pass through remaining config * remove debug placeholders * clean up receiver code layout * remove debug startup * Make check-ci * Fix formatting in go.mod for pdata dependency * review comments * lint * fix config field name * Move receiver package * Use more translate helpers * white space * add changelog * Disable the previous (prometheus) internal telemetry ingestion component * Lint, log errors * Add remaining translated metrics * fix typo * update tests * lint * add error log * mage check * mage notice * review comment * Stop integration tests from expecting a prometheus monitoring component * Move internal telemetry factory to same module as telemetry receiver * mage check * mage notice * test fixes * adjust default poll interval * add allowable monitoring errors * Adjust expected component count * Only include the otel monitoring receiver if the otel monitoring output is present * remove checks for prometheus-related log warnings * Replace empty test error with verbose failure details * remove duplicated lines * fix error log order --------- Co-authored-by: Pierre HILBERT <pierre.hilbert@elastic.co> (cherry picked from commit 7c84fc3)

…1982) * internal telemetry drafting * monitoring config plumbing * Cleaning up / commenting for review * removing no-longer-used files * cleanup, testing * cleanup, pass through remaining config * remove debug placeholders * clean up receiver code layout * remove debug startup * Make check-ci * Fix formatting in go.mod for pdata dependency * review comments * lint * fix config field name * Move receiver package * Use more translate helpers * white space * add changelog * Disable the previous (prometheus) internal telemetry ingestion component * Lint, log errors * Add remaining translated metrics * fix typo * update tests * lint * add error log * mage check * mage notice * review comment * Stop integration tests from expecting a prometheus monitoring component * Move internal telemetry factory to same module as telemetry receiver * mage check * mage notice * test fixes * adjust default poll interval * add allowable monitoring errors * Adjust expected component count * Only include the otel monitoring receiver if the otel monitoring output is present * remove checks for prometheus-related log warnings * Replace empty test error with verbose failure details * remove duplicated lines * fix error log order --------- (cherry picked from commit 7c84fc3) Co-authored-by: Fae Charlton <fae.charlton@elastic.co> Co-authored-by: Pierre HILBERT <pierre.hilbert@elastic.co> Co-authored-by: Khushi Jain <khushi.jain@elastic.co> Co-authored-by: Michel Laterman <82832767+michel-laterman@users.noreply.github.com> Co-authored-by: Vihas Makwana <121151420+VihasMakwana@users.noreply.github.com>

faec added 10 commits December 12, 2025 11:49

internal telemetry drafting

b6ecbab

Merge branch 'main' of github.com:elastic/elastic-agent into telemetry-monitoring

fde8d91

monitoring config plumbing

bcbc9f2

Cleaning up / commenting for review

1e99156

removing no-longer-used files

fdede54

cleanup, testing

4d53c00

cleanup, pass through remaining config

d737afb

remove debug placeholders

fd32369

clean up receiver code layout

47e66a1

remove debug startup

b973845

faec self-assigned this Dec 15, 2025

faec requested a review from a team as a code owner December 15, 2025 02:28

faec added the enhancement and Team:Elastic-Agent-Data-Plane labels Dec 15, 2025

faec requested review from pchila and ycombinator December 15, 2025 02:28

pierrehilbert added the Team:Elastic-Agent-Control-Plane label Dec 15, 2025

pierrehilbert requested review from blakerouse and swiatekm and removed request for pchila and ycombinator December 15, 2025 07:13

Make check-ci

f97683f

Merge branch 'main' into telemetry-monitoring

0366a5e

pierrehilbert requested review from cmacknz and leehinman December 15, 2025 07:54

Fix formatting in go.mod for pdata dependency

9aee313

test fixes

6780180

faec dismissed cmacknz’s stale review via 6780180 December 17, 2025 21:30

faec added 7 commits December 17, 2025 18:05

Merge branch 'main' of github.com:elastic/elastic-agent into telemetry-monitoring

e3fdc46

Merge branch 'main' of github.com:elastic/elastic-agent into telemetry-monitoring

293c42d

adjust default poll interval

4478a6b

add allowable monitoring errors

af2ad82

Adjust expected component count

66e251c

Only include the otel monitoring receiver if the otel monitoring output is present

31e5677

remove checks for prometheus-related log warnings

74b0fa9

faec added 4 commits December 19, 2025 17:45

Merge branch 'main' of github.com:elastic/elastic-agent into telemetry-monitoring

d731529

Replace empty test error with verbose failure details

153d972

remove duplicated lines

8f9a016

fix error log order

e2acc27

swiatekm approved these changes Dec 22, 2025

swiatekm requested review from cmacknz and leehinman December 22, 2025 14:24

leehinman approved these changes Dec 22, 2025

pierrehilbert merged commit 7c84fc3 into elastic:main Dec 22, 2025
23 checks passed

mergify bot mentioned this pull request Dec 22, 2025

[9.3] (backport #11813) Ingest collector internal telemetry via in-process hooks #11982

Merged

8 tasks

This was referenced Jan 5, 2026

Add tests for translation from OTel internal telemetry to Beats metrics #12089

Closed

Metrics derived from Collector internal telemetry should be labelled with policy IDs #12114

Closed

swiatekm mentioned this pull request Jan 21, 2026

Remove unused self-monitoring code #12359

Merged

mergify bot mentioned this pull request Jan 22, 2026

[9.3] (backport #12359) Remove unused self-monitoring code #12376

Merged

faec mentioned this pull request Jan 30, 2026

[Meta] Metrics and retry issues in the OTel Elasticsearch exporter #12525

Open

swiatekm mentioned this pull request Feb 9, 2026

Setting status_reporting.enabled: false mutes component level errors but marks agent as degraded anyway #12601

Closed

pierrehilbert merged 50 commits into elastic:main from faec:telemetry-monitoring


6 participants
