Add support for topk and bottomk functions for TraceQL metrics #4646

electron0zero · 2025-01-31T13:31:12Z

What this PR does:

Add support for topk and bottomk functions on top of TraceQL Metrics.

You can now write TraceQL queries like {} | rate() by (span.client_ip) | topk(10) or {} | rate() by (span.client_ip) | bottomk(10), and get only the top or bottom k series from the underlying TraceQL Metrics queries.

topk and bottomk behave like the Prometheus topk and bottomk functions.

Check out this demo video to see it in action:

topk_bottomk_demo_final.mov

topk and bottomk are implemented as the first set of second stage functions (functions that operate on series generated by TraceQL Metrics). The code in this PR lays the groundwork for second stage functions and opens the door for others, such as min, max and avg, on top of TraceQL Metrics.

Which issue(s) this PR fixes:
Fixes #4217
Fixes https://github.com/grafana/tempo-squad/issues/601

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

mdisibio

Looking good so far. I can review again after the process functions are implemented. To make sure we have the same understanding: I believe the result of our discussion was that they need to work like Prometheus and operate on each timestamp interval independently. For example, topk(10) can return 10 different series at each interval. In addition to the design, there were the performance concerns: this lets us push down some of the top/bottom filtering to the job/block level. Each job can return its top 10, and the frontend then chooses the final top 10. If it were designed the other way (the 10 series with the highest values anywhere in the query), then nothing could be pushed down, and all data would have to be gathered up in the frontend.
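To make the per-interval semantics concrete, here is a minimal sketch in Go. It is not the code from this PR: the `Series` type and the `topKPerInterval` function are hypothetical stand-ins, and blanking non-selected samples with NaN is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Series is a hypothetical stand-in for one labelled time series:
// Values[i] is the sample for the i-th interval of the query range.
type Series struct {
	Labels string
	Values []float64
}

// topKPerInterval picks, at every interval independently, the k largest
// samples and blanks out all other samples with NaN. Series that are never
// selected at any interval are dropped from the result.
func topKPerInterval(input []Series, k int) []Series {
	if k <= 0 || len(input) == 0 {
		return nil
	}
	out := make([]Series, len(input))
	intervals := 0
	for i, s := range input {
		vals := make([]float64, len(s.Values))
		for j := range vals {
			vals[j] = math.NaN()
		}
		out[i] = Series{Labels: s.Labels, Values: vals}
		if len(s.Values) > intervals {
			intervals = len(s.Values)
		}
	}
	kept := make([]bool, len(input))
	for t := 0; t < intervals; t++ {
		// Indexes of the series that have a real sample at interval t.
		var idx []int
		for i, s := range input {
			if t < len(s.Values) && !math.IsNaN(s.Values[t]) {
				idx = append(idx, i)
			}
		}
		// Largest value first; ties keep their input order.
		sort.SliceStable(idx, func(a, b int) bool {
			return input[idx[a]].Values[t] > input[idx[b]].Values[t]
		})
		if len(idx) > k {
			idx = idx[:k]
		}
		for _, i := range idx {
			out[i].Values[t] = input[i].Values[t]
			kept[i] = true
		}
	}
	var result []Series
	for i, s := range out {
		if kept[i] {
			result = append(result, s)
		}
	}
	return result
}

func main() {
	input := []Series{
		{Labels: `{span.client_ip="10.0.0.1"}`, Values: []float64{10, 1}},
		{Labels: `{span.client_ip="10.0.0.2"}`, Values: []float64{5, 20}},
		{Labels: `{span.client_ip="10.0.0.3"}`, Values: []float64{1, 2}},
	}
	for _, s := range topKPerInterval(input, 1) {
		fmt.Println(s.Labels, s.Values)
	}
}
```

With k = 1 this sketch returns two series, because a different series wins at each interval. That is the same reason a topk(k) query can show more than k series over a time range.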

pkg/traceql/ast_metrics.go

pkg/traceql/engine.go

pkg/traceql/engine_metrics.go

knylander-grafana · 2025-03-13T19:14:27Z

~~Should we add doc for this?~~ I added docs for this

javiermolinar · 2025-03-18T09:22:31Z

pkg/traceql/engine_metrics.go

@@ -1484,3 +1505,119 @@ func FloatizeAttribute(s Span, a Attribute) (float64, StaticType) {
 	}
 	return f, v.Type
 }
+
+// processTopK implements TopKBottomK topk method
+func processTopK(input SeriesSet, limit int) SeriesSet {


Sadly, this breaks avg_over_time.

avg_over_time works by producing two series per match: one with the current average and another with the count. That way the incremental average can be calculated in the next aggregation layers:

{"span.foo"="baz"} {__meta_type="__count", "span.foo"="baz"}

Therefore topk here ends up mixing count series with average series, which is not correct.
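A small Go sketch of why the count has to travel alongside the average. The `partialAvg` type and the `mergeAvg` function are hypothetical and only illustrate the arithmetic. They are not Tempo's implementation.

```go
package main

import "fmt"

// partialAvg is a hypothetical pair of values one aggregation layer reports
// for a single interval: the average it saw and how many samples produced it.
type partialAvg struct {
	Avg   float64
	Count float64
}

// mergeAvg folds partial results into one count-weighted average.
// Without the counts, the only option would be a plain mean of the
// averages, which is wrong whenever the counts differ.
func mergeAvg(parts []partialAvg) partialAvg {
	var total partialAvg
	for _, p := range parts {
		if p.Count == 0 {
			continue // nothing to contribute
		}
		n := total.Count + p.Count
		// Incremental update: move the running average towards p.Avg
		// in proportion to the weight of the new samples.
		total.Avg += (p.Avg - total.Avg) * p.Count / n
		total.Count = n
	}
	return total
}

func main() {
	parts := []partialAvg{
		{Avg: 10, Count: 1},
		{Avg: 20, Count: 3},
	}
	merged := mergeAvg(parts)
	fmt.Println(merged.Avg, merged.Count)
}
```

If a topk step treats the count series as just another series, it ranks counts against averages and can drop one half of such a pair, after which the merge above is no longer possible.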

Another concern is whether we should apply topk at the first layer at all. We could drop series that we need in the next aggregation layer. Maybe this should be done only in the last aggregation layer?

mdisibio · 2025-03-25

In this case it means that avg_over_time() is not shardable and must be computed at the query-frontend. Whatever follows an unshardable element is also unshardable. So for { } | avg_over_time(...) | topk(...), the topk must be done only at the query-frontend.

Some queries are shardable, for example min/max_over_time, and for those the top/bottomk can also be pushed down to the earliest level.

Starting with always making the secondStage unshardable and computed only in the frontend sounds good. That is more straightforward and we can prioritize correctness first. Adding the shardable/unshardable aspect to the AST will be more involved.
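The following Go sketch illustrates the pushdown concern for a single interval. Everything in it is hypothetical: the `shard` type, the `topK` helper, and the assumption that partial results are merged either by summation or by taking the maximum.

```go
package main

import (
	"fmt"
	"sort"
)

// shard maps a series label to that shard's partial value for one interval.
type shard map[string]float64

// topK keeps the k entries with the largest values (ties broken by label).
func topK(in shard, k int) shard {
	if k < 0 {
		k = 0
	}
	keys := make([]string, 0, len(in))
	for key := range in {
		keys = append(keys, key)
	}
	sort.Slice(keys, func(i, j int) bool {
		if in[keys[i]] != in[keys[j]] {
			return in[keys[i]] > in[keys[j]]
		}
		return keys[i] < keys[j]
	})
	if len(keys) > k {
		keys = keys[:k]
	}
	out := shard{}
	for _, key := range keys {
		out[key] = in[key]
	}
	return out
}

// mergeSum combines partial results by adding values per label.
func mergeSum(shards ...shard) shard {
	out := shard{}
	for _, s := range shards {
		for label, v := range s {
			out[label] += v
		}
	}
	return out
}

// mergeMax combines partial results by keeping the largest value per label.
func mergeMax(shards ...shard) shard {
	out := shard{}
	for _, s := range shards {
		for label, v := range s {
			if cur, ok := out[label]; !ok || v > cur {
				out[label] = v
			}
		}
	}
	return out
}

func main() {
	s1 := shard{"a": 10, "b": 9}
	s2 := shard{"c": 10, "b": 9}

	// Sum-style merge: "b" wins overall but is second in every shard,
	// so a shard-level topk(1) drops it before the merge.
	fmt.Println("sum, topk last:  ", topK(mergeSum(s1, s2), 1))
	fmt.Println("sum, topk pushed:", topK(mergeSum(topK(s1, 1), topK(s2, 1)), 1))

	// Max-style merge: the overall winner is also a winner in its shard.
	fmt.Println("max, topk last:  ", topK(mergeMax(s1, s2), 1))
	fmt.Println("max, topk pushed:", topK(mergeMax(topK(s1, 1), topK(s2, 1)), 1))
}
```

Under these assumptions the sum-style merge gives a different answer when topk is pushed down, while the max-style merge does not. That matches the idea of running the second stage only in the frontend first and adding pushdown later for the cases where it is safe.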

electron0zero · 2025-04-07

I had a chat with @javiermolinar about this and updated the code to handle it.

For now I am only handling topk and bottomk in the frontend, and we can handle pushdown later when the first stage is shardable.

pkg/traceql/engine_metrics_test.go

pkg/traceql/engine_metrics.go

pkg/traceql/ast_metrics.go

pkg/traceql/engine_metrics.go

docs/sources/tempo/traceql/metrics-queries/_index.md

docs/sources/tempo/traceql/metrics-queries/functions.md

electron0zero · 2025-04-08T12:21:21Z

pkg/traceql/ast_metrics.go

+	result() SeriesSet
+}
+
+type getExemplar func(Span) (float64, uint64)


Most of the code here is moved from ast.go, to clean up the ast.go file and extract the metrics-related AST into its own file.

electron0zero · 2025-04-08T12:21:59Z

pkg/traceql/ast_metrics.go

+
+var _ firstStageElement = (*MetricsAggregate)(nil)
+
+// secondStageElement represents operations that are performed


new code for second stage is from here.
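For readers unfamiliar with the `var _ firstStageElement = (*MetricsAggregate)(nil)` line in the diff: it is the usual Go compile-time check that a type implements an interface. The sketch below shows the same idiom applied to a second stage. The `secondStage` interface, its `process` method, the `dropEmpty` type and the simplified `SeriesSet` are all hypothetical and are not the definitions from this PR.

```go
package main

import "fmt"

// SeriesSet is a simplified, hypothetical stand-in: labels -> values.
type SeriesSet map[string][]float64

// secondStage is a hypothetical interface for operations that run on the
// series a first-stage function has already produced.
type secondStage interface {
	process(input SeriesSet) SeriesSet
}

// dropEmpty is a toy second stage that removes series without samples.
type dropEmpty struct{}

// Compile-time assertion: the build fails if *dropEmpty ever stops
// satisfying secondStage. No value is allocated at runtime.
var _ secondStage = (*dropEmpty)(nil)

func (*dropEmpty) process(input SeriesSet) SeriesSet {
	out := SeriesSet{}
	for labels, values := range input {
		if len(values) > 0 {
			out[labels] = values
		}
	}
	return out
}

// runSecondStages feeds the first-stage result through each stage in order.
func runSecondStages(firstStageResult SeriesSet, stages ...secondStage) SeriesSet {
	result := firstStageResult
	for _, st := range stages {
		result = st.process(result)
	}
	return result
}

func main() {
	in := SeriesSet{
		`{span.foo="bar"}`: {1, 2, 3},
		`{span.foo="baz"}`: {},
	}
	fmt.Println(len(runSecondStages(in, &dropEmpty{})))
}
```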

docs/sources/tempo/traceql/metrics-queries/functions.md

knylander-grafana

Thank you for updating docs. :) I have two minor suggestions to correct a missing word and remove an extra space.

Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>

modules/querier/querier_query_range.go

alexbikfalvi · 2025-04-11T08:49:46Z

@electron0zero @knylander-grafana I think the documentation didn't publish to next:
https://github.com/grafana/tempo/actions/runs/14386797134/job/40344098510

Is there a way to retrigger?

electron0zero · 2025-04-11T09:36:36Z

@alexbikfalvi the next run, triggered by another PR, took care of this, so it's published now: https://grafana.com/docs/tempo/next/traceql/metrics-queries/functions/#topk-and-bottomk-functions

So we don't need to re-trigger it now.

We should still look into this failure, though. cc @knylander-grafana @jdbaldry

09jvilla · 2025-05-05T21:26:29Z

@electron0zero nice work on this. I checked out the demo video since you mentioned it in your presentation and enjoyed watching it :) My only question from that was why we were seeing more than 5 series for a 'topk(5)' query but your docs answered that question exactly when I checked them!

electron0zero changed the title ~~try 1 to get the grammer working~~ Jan 31, 2025

electron0zero force-pushed the topk_bottomk branch from b935091 to 8c6bcf5 Compare February 11, 2025 15:01

electron0zero force-pushed the topk_bottomk branch 5 times, most recently from bff2f87 to ac57f89 Compare February 26, 2025 10:25

xoan-grafana assigned electron0zero Feb 26, 2025

electron0zero force-pushed the topk_bottomk branch 2 times, most recently from eead50b to 3c21a76 Compare February 26, 2025 20:25

mdisibio reviewed Mar 10, 2025

pkg/traceql/ast_metrics.go (outdated)

pkg/traceql/engine.go

pkg/traceql/engine_metrics.go

pkg/traceql/engine_metrics.go (outdated)

electron0zero force-pushed the topk_bottomk branch 3 times, most recently from 2a1dfdf to 7c857c2 Compare March 17, 2025 17:19

electron0zero marked this pull request as ready for review March 17, 2025 20:35

electron0zero requested review from joe-elliott, mapno, yvrhdn, zalegrala, ie-pham, stoewer, javiermolinar and carles-grafana as code owners March 17, 2025 20:35

electron0zero requested a review from mdisibio March 17, 2025 20:35

electron0zero requested a review from knylander-grafana as a code owner March 17, 2025 21:00

javiermolinar reviewed Mar 18, 2025

knylander-grafana reviewed Mar 18, 2025

docs/sources/tempo/traceql/metrics-queries/_index.md (outdated)

knylander-grafana reviewed Mar 20, 2025

docs/sources/tempo/traceql/metrics-queries/functions.md (outdated)

knylander-grafana added the type/docs Improvements or additions to documentation label Mar 20, 2025

electron0zero requested review from knylander-grafana, mdisibio and javiermolinar April 7, 2025 20:43

address review comments

57adcec

electron0zero commented Apr 8, 2025

knylander-grafana reviewed Apr 8, 2025

docs/sources/tempo/traceql/metrics-queries/functions.md (outdated)

knylander-grafana reviewed Apr 8, 2025

docs/sources/tempo/traceql/metrics-queries/functions.md (outdated)

knylander-grafana approved these changes Apr 8, 2025

electron0zero and others added 4 commits April 9, 2025 16:53

Apply docs suggestions from code review

de9023c

Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>

TODO to disallow compare with second stage

3843771

skip NaN exampler and only run second stage on AggregateModeFinal + c…

eb5d075

…leanup

handle and fix flaky test for ties

a95deff

electron0zero force-pushed the topk_bottomk branch from cf63269 to a95deff Compare April 9, 2025 17:58

electron0zero mentioned this pull request Apr 9, 2025

metrics: cleanup duplicate code in QueryRange endpoint #4978

Merged

3 tasks

add exampler fix in queryRangeTraceQLToProto

678c153

electron0zero commented Apr 9, 2025

modules/querier/querier_query_range.go

electron0zero mentioned this pull request Apr 9, 2025

[TraceQL] Add topk/bottomk to TraceQL Metrics #4217

Closed

disallow compare() with second stage functions

9210e6b

electron0zero force-pushed the topk_bottomk branch from b4169be to 6303d81 Compare April 10, 2025 15:06

add bounds check where we skip exemplars with NaN value

abe64f3

electron0zero force-pushed the topk_bottomk branch from 6303d81 to abe64f3 Compare April 10, 2025 15:14

mdisibio approved these changes Apr 10, 2025

electron0zero merged commit ef2cee7 into grafana:main Apr 10, 2025
15 checks passed

electron0zero deleted the topk_bottomk branch April 10, 2025 17:39

electron0zero added a commit that referenced this pull request Apr 11, 2025

metrics: cleanup duplicate code in QueryRange endpoint (#4978)

9b6b6ed

noticed this duplicate code while working on #4646, so sending this standalone PR to cleanup this code.

electron0zero mentioned this pull request Apr 14, 2025

Tempo: Add support for second stage TraceQL Metrics functions grafana/grafana#103991

Open
