Skip to content

feat: refactors kafka/client to improve metric collection #17308

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Apr 21, 2025

Conversation

aarogoss
Copy link
Contributor

@aarogoss aarogoss commented Apr 17, 2025

What this PR does / why we need it:
The pkg/kafka code base has some inconsistencies in how prometheus metrics are identified, described, and initialized. Reader and writer clients initialize kprom.Metrics structs in different ways, even though they ultimately build the same basic struct.

This change refactors the pkg/kafka module to refine how we're initializing Kafka clients and configuring metrics. This set of changes removes functionality for calling services to inject kprom.Metrics given all services are relying on the base set, and requires services which use the clients to specify a component string. This string is included as a label in the metrics.

In doing so, this set of changes improves the metric names so they represent the following:

  • metrics collected by the Kafka client (loki_kafka_client_)
  • metrics collected by a reader/writer (label: "service_role: reader|writer")
  • metrics collected by a particular component (label: "component: ...")

This set of changes also adds a few metrics to improve our coverage, and alters one metric description to help better explain what we're collecting.

In the future, we can easily extend this to allow services to inject customized kprom.Metrics structs if desired.

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR
Updates the `pkg/kafka/*.go` code base to refine how we're initializing
Kafka clients and configuring metrics.  This set of changes removes
functionality for calling services to inject `kprom.Metrics` given all
services are relying on the base set.

In doing so, this set of changes improves the metric names so they
represent the following:
* The metrics are collected by the Kafka client (`loki_kafka_client_`)
* The metrics are collected by a reader/writer (label: "service_role: reader|writer")
* The metrics are collected by a particular component (label:
  "component: ...")

This set of changes also adds a few metrics to improve our coverage, and
it alters one metric description to help better explain what we're
collecting.

In the future, we can easily extend this to allow services to inject
customized `kprom.Metrics` structs if needed.
@aarogoss aarogoss self-assigned this Apr 17, 2025
@aarogoss aarogoss added the operational Used internally by Grafana Labs to help organize, added to tasks we put in our on-call workqueue. label Apr 17, 2025
aarogoss and others added 14 commits April 17, 2025 15:01
I just realized the tests in kafka/ use the standard Prometheus
registry.  Changing from NewPedanticRegistry() to NewRegistry() for
consistency's sake
Fixed variable name and removed unneeded boolean
Minor change to a few metrics.  Rather than naming them
'ingest_storage_reader', this changes them to 'partition_reader' to make
it clear what they are acting on.  This way we don't tie the metric name
to ingestion or storage.
The distributor code was missing additional identification for the
service and its role.  This commit adds an additional function to the
kafka writer to wrap a prometheus registerer with the appropriate
labels.  These labels are applied to the code the distributor uses to
create a new producer client.

Also added two constants for the `service_role` label.
To make things consistent, updating the partition reader metrics in
kafka so they have the same prefix as other kafka client metrics.  This
helps us quickly identify where these metrics are coming from.
enableKafkaHistogramMetrics(enableKafkaHistograms),
)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I relocated this and enableKafkaHistogramMetrics to the writer_client.go file to be consistent w/the reader_client.go's use of functions (e.g. commonKafkaClientOptions) in the writer_client.go file.

@aarogoss aarogoss marked this pull request as ready for review April 18, 2025 18:10
@aarogoss aarogoss requested a review from a team as a code owner April 18, 2025 18:10
aarogoss and others added 9 commits April 18, 2025 12:21
Minor change: don't use pedantic registry -- just use NewRegistry() in
client_test.go
Thinking about this over the weekend, I don't know that this label
provides us any real additional insight into the metrics/services which
are using the Kafka client.

Removing it -- it can always be reintroduced if we need additional
insight.
@aarogoss aarogoss merged commit ab65ac5 into main Apr 21, 2025
61 checks passed
@aarogoss aarogoss deleted the agoss/kafka-metric-adjustments branch April 21, 2025 21:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
operational Used internally by Grafana Labs to help organize, added to tasks we put in our on-call workqueue. size/L
2 participants