
Ingest collector internal telemetry via in-process hooks#11813

Merged
pierrehilbert merged 50 commits into elastic:main from faec:telemetry-monitoring Dec 22, 2025


@faec
Contributor

@faec faec commented Dec 15, 2025

What does this PR do?

Add a custom telemetry factory and receiver to the OTel Collector build so that the Collector's internal metrics are ingested as ECS metrics, in a backwards-compatible form that the existing Agent dashboards can display. This replaces the Prometheus-based approach (see #10220 for context on it).
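The in-process idea can be illustrated with a minimal Go sketch. It assumes a hypothetical store (`metricStore`) that a telemetry factory writes Collector metrics into, and a receiver-side `translate` function that maps Collector metric names onto ECS-style fields on each poll. The store, the function names, and the name-to-field mapping are all illustrative assumptions, not the code in this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// metricStore is a HYPOTHETICAL in-process store: a custom telemetry factory
// could record Collector metrics here so that a receiver in the same process
// reads them directly, with no Prometheus endpoint in between.
type metricStore struct {
	mu     sync.Mutex
	values map[string]float64
}

func newMetricStore() *metricStore {
	return &metricStore{values: make(map[string]float64)}
}

// Add accumulates a counter value under a Collector-style metric name.
func (s *metricStore) Add(name string, delta float64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.values[name] += delta
}

// Snapshot copies the current values so translation runs without holding the lock.
func (s *metricStore) Snapshot() map[string]float64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make(map[string]float64, len(s.values))
	for k, v := range s.values {
		out[k] = v
	}
	return out
}

// ecsField maps Collector metric names to ECS-style field paths.
// Both columns are ILLUSTRATIVE assumptions, not the mapping used in this PR.
var ecsField = map[string]string{
	"otelcol_exporter_sent_log_records":        "beat.stats.libbeat.output.events.acked",
	"otelcol_exporter_send_failed_log_records": "beat.stats.libbeat.output.events.failed",
}

// translate turns one snapshot (one poll) into a flat ECS-style document.
// Metrics without a mapping are dropped rather than passed through.
func translate(snap map[string]float64) map[string]float64 {
	doc := make(map[string]float64)
	for name, v := range snap {
		if field, ok := ecsField[name]; ok {
			doc[field] = v
		}
	}
	return doc
}

func main() {
	store := newMetricStore()
	store.Add("otelcol_exporter_sent_log_records", 10)
	store.Add("otelcol_exporter_sent_log_records", 5)
	store.Add("otelcol_process_uptime", 42) // no mapping, so it is dropped
	fmt.Println(translate(store.Snapshot()))
}
```

In a real receiver, `translate` would run on each poll interval and the resulting document would be emitted into the monitoring pipeline; dropping unmapped names keeps the output limited to fields the dashboards already understand.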

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

How to test this PR locally

Run Agent with metrics monitoring enabled and at least one component using the OTel runtime. Collector metric data should then appear in the metrics.elastic_agent* data streams.

Related issues

@faec faec self-assigned this Dec 15, 2025
@faec faec requested a review from a team as a code owner December 15, 2025 02:28
@faec faec added the enhancement and Team:Elastic-Agent-Data-Plane labels Dec 15, 2025
@faec faec requested review from pchila and ycombinator December 15, 2025 02:28
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@mergify
Contributor

mergify bot commented Dec 15, 2025

This pull request does not have a backport label. Could you fix it, @faec? 🙏
To fix up this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-\d.\d is the label that automatically backports to the 8.\d branch. \d is the digit.
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.
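The label convention above can be sketched as a small Go helper. It only illustrates how the label names map to branches, assuming a hypothetical, caller-supplied list of active branches; it is not how the bot itself is implemented.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// versionLabel matches backport-<major>.<minor> labels such as backport-9.3.
// Allowing several digits (\d+) is an assumption so minors like 8.19 also match.
var versionLabel = regexp.MustCompile(`^backport-(\d+\.\d+)$`)

// backportTarget returns the single branch a backport-X.Y label targets.
func backportTarget(label string) (string, bool) {
	m := versionLabel.FindStringSubmatch(label)
	if m == nil {
		return "", false
	}
	return m[1], true
}

// expandActive resolves backport-active-all and backport-active-<major>
// against a list of active branches (a HYPOTHETICAL input for this sketch).
func expandActive(label string, active []string) []string {
	if label == "backport-active-all" {
		return append([]string(nil), active...)
	}
	const prefix = "backport-active-"
	if !strings.HasPrefix(label, prefix) {
		return nil
	}
	major := strings.TrimPrefix(label, prefix)
	var out []string
	for _, b := range active {
		if strings.HasPrefix(b, major+".") {
			out = append(out, b)
		}
	}
	return out
}

func main() {
	active := []string{"8.19", "9.2", "9.3"} // hypothetical active branches
	fmt.Println(backportTarget("backport-9.3"))
	fmt.Println(expandActive("backport-active-9", active))
}
```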
@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Control-Plane label Dec 15, 2025
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@pierrehilbert pierrehilbert requested review from blakerouse and swiatekm and removed request for pchila and ycombinator December 15, 2025 07:13
@mergify
Contributor

mergify bot commented Dec 15, 2025

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See the documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b telemetry-monitoring upstream/telemetry-monitoring
git merge upstream/main
git push upstream telemetry-monitoring
@cmacknz
Member

cmacknz commented Dec 19, 2025

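In the test failure, the test logs the wrong error value. The format arguments to require.Eventuallyf are evaluated once, when the call is made, so the message reports the err from _, err = fixture.InstallWithoutEnroll(ctx, &installOpts), which is nil, rather than the error from fixture.IsHealthy that keeps the Eventually condition from ever succeeding.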

_, err = fixture.InstallWithoutEnroll(ctx, &installOpts)
require.NoError(t, err)
require.Eventuallyf(t, func() bool {
	err = fixture.IsHealthy(ctx)
	return err == nil
}, 5*time.Minute, time.Second,
	"Elastic-Agent did not report healthy. Agent status error: \"%v\"",
	err, // evaluated once, at call time: still the nil error from InstallWithoutEnroll
)
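A minimal, stdlib-only sketch of why the message shows nil, and one way to report the real error. `eventuallyf` is a simplified, hypothetical stand-in for testify's require.Eventuallyf (it returns the failure message instead of failing a test), and `checkHealth` is a hypothetical stand-in for fixture.IsHealthy that never succeeds.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// eventuallyf is a simplified stand-in for testify's Eventuallyf: it polls cond
// every tick until cond returns true or waitFor elapses. Like the real helper,
// it receives its format arguments once, at the moment it is called.
func eventuallyf(cond func() bool, waitFor, tick time.Duration, format string, args ...any) (bool, string) {
	deadline := time.Now().Add(waitFor)
	for {
		if cond() {
			return true, ""
		}
		if !time.Now().Before(deadline) {
			return false, fmt.Sprintf(format, args...)
		}
		time.Sleep(tick)
	}
}

// checkHealth is a hypothetical stand-in for fixture.IsHealthy that always fails.
func checkHealth() error {
	return errors.New("otel component degraded")
}

// buggyMessage mirrors the original test: err is still nil when it is passed
// as a format argument, so assignments inside cond never reach the message.
func buggyMessage() string {
	var err error // nil, as after a successful InstallWithoutEnroll
	_, msg := eventuallyf(func() bool {
		err = checkHealth()
		return err == nil
	}, 20*time.Millisecond, 5*time.Millisecond, "Agent status error: %v", err)
	return msg
}

// fixedMessage keeps the most recent error from cond and formats it only
// after polling has finished.
func fixedMessage() string {
	var lastErr error
	ok, _ := eventuallyf(func() bool {
		lastErr = checkHealth()
		return lastErr == nil
	}, 20*time.Millisecond, 5*time.Millisecond, "Agent did not report healthy")
	if ok {
		return ""
	}
	return fmt.Sprintf("Agent status error: %v", lastErr)
}

func main() {
	fmt.Println(buggyMessage())
	fmt.Println(fixedMessage())
}
```

The same principle applies to the real test: build the failure message from the last observed health error after polling ends, not from a variable captured before it starts. testify's EventuallyWithT, which copies the assertion failures from the last tick to the test, is one way to get that behaviour without extra bookkeeping.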

@elasticmachine
Copy link
Contributor

@pierrehilbert pierrehilbert merged commit 7c84fc3 into elastic:main Dec 22, 2025
23 checks passed
mergify bot pushed a commit that referenced this pull request Dec 22, 2025
* internal telemetry drafting

* monitoring config plumbing

* Cleaning up / commenting for review

* removing no-longer-used files

* cleanup, testing

* cleanup, pass through remaining config

* remove debug placeholders

* clean up receiver code layout

* remove debug startup

* Make check-ci

* Fix formatting in go.mod for pdata dependency

* review comments

* lint

* fix config field name

* Move receiver package

* Use more translate helpers

* white space

* add changelog

* Disable the previous (prometheus) internal telemetry ingestion component

* Lint, log errors

* Add remaining translated metrics

* fix typo

* update tests

* lint

* add error log

* mage check

* mage notice

* review comment

* Stop integration tests from expecting a prometheus monitoring component

* Move internal telemetry factory to same module as telemetry receiver

* mage check

* mage notice

* test fixes

* adjust default poll interval

* add allowable monitoring errors

* Adjust expected component count

* Only include the otel monitoring receiver if the otel monitoring output is present

* remove checks for prometheus-related log warnings

* Replace empty test error with verbose failure details

* remove duplicated lines

* fix error log order

---------

Co-authored-by: Pierre HILBERT <pierre.hilbert@elastic.co>
(cherry picked from commit 7c84fc3)
VihasMakwana pushed a commit to VihasMakwana/elastic-agent that referenced this pull request Dec 23, 2025
VihasMakwana added a commit that referenced this pull request Dec 29, 2025
…1982)

(cherry picked from commit 7c84fc3)

Co-authored-by: Fae Charlton <fae.charlton@elastic.co>
Co-authored-by: Pierre HILBERT <pierre.hilbert@elastic.co>
Co-authored-by: Khushi Jain <khushi.jain@elastic.co>
Co-authored-by: Michel Laterman <82832767+michel-laterman@users.noreply.github.com>
Co-authored-by: Vihas Makwana <121151420+VihasMakwana@users.noreply.github.com>

Labels

  • backport-9.3: Automated backport to the 9.3 branch
  • enhancement: New feature or request
  • Team:Elastic-Agent-Control-Plane: Label for the Agent Control Plane team
  • Team:Elastic-Agent-Data-Plane: Label for the Agent Data Plane team

6 participants