chore(engine): adds AST conversion and result builders for metric queries #18166

Open
wants to merge 10 commits into base: main

ashwanthgoli
Contributor

@ashwanthgoli commented Jun 19, 2025

What this PR does / why we need it:

  • Converts the AST to a logical plan for metric queries, mainly sum by (group) (count_over_time({...}[$range]))
  • Adds a result builder that converts Arrow records to promql.Vector for instant metric queries
  • Updates predicates and catalog lookup to also consider [$range]

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@ashwanthgoli requested a review from a team as a code owner June 19, 2025 13:42

func buildPlanForSampleQuery(e syntax.SampleExpr, params logql.Params) (*Builder, error) {
	if params.Step() > 0 {
		return nil, fmt.Errorf("only instant metric queries are supported")
Contributor

Should this return errUnimplemented? AFAIK, the HTTP handler checks for that error to decide whether we need to fall back to the old engine.

Contributor Author

added in 8997274

I think all error branches need to either log more details about the specific unsupported operation or return an error with more context. I can handle that in a follow-up.
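
The two points above (a sentinel the handler can check, plus more context per error branch) can be combined by wrapping the sentinel with %w. The following is a minimal sketch, not the engine's code: errUnimplemented is redefined locally as a stand-in, and buildPlan and shouldFallBack are hypothetical helper names.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnimplemented is a local stand-in for the sentinel named in the review;
// its real definition in the engine is not shown in this thread.
var errUnimplemented = errors.New("query plan unimplemented")

// buildPlan is a hypothetical, simplified planner entry point. It rejects
// range queries (step > 0) and wraps the sentinel with %w, so the message
// carries context while errors.Is still matches the sentinel.
func buildPlan(step time.Duration) error {
	if step > 0 {
		return fmt.Errorf("range metric query with step %s: %w", step, errUnimplemented)
	}
	return nil
}

// shouldFallBack reports whether a caller (for example an HTTP handler)
// should retry the query on the old engine.
func shouldFallBack(err error) bool {
	return errors.Is(err, errUnimplemented)
}

func main() {
	err := buildPlan(30 * time.Second)
	fmt.Println(err)
	fmt.Println("fall back:", shouldFallBack(err))
}
```

With this shape, each unsupported branch can describe what it rejected without the handler needing to know about the individual messages.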

@@ -27,7 +27,7 @@ var (
// providing this information (e.g. pg_catalog, ...) whereas in Loki there
// is the Metastore.
type Catalog interface {
- ResolveDataObj(Expression) ([]DataObjLocation, [][]int64, error)
+ ResolveDataObj(Expression, time.Duration) ([]DataObjLocation, [][]int64, error)
Contributor

This feels like a weird change to the Catalog API. Either we should pass no time information, or the full time information - otherwise it will get confusing.

Contributor Author

I agree :)

  1. We could add $range to the catalog state when scanning the AST. But as you mentioned in the other comment, each branch of a binary operation can have a different range, so this does not feel like the right place.
  2. I like the idea of passing both start and end time to the ResolveDataObj API.

The second approach would require either the physical planner state or the MakeTable logical plan node to hold the start and end time. Putting this information in the planner state sounds like a good start to me, since it will be used for both MakeTable and TimeRange predicate creation. WDYT?
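
A rough sketch of the second option, where the planner state holds start, end, and the range, and the catalog receives a full time window. Everything here is an assumption for illustration: plannerState, lookupWindow, and the simplified Catalog signature (string selector, []string result) are hypothetical and do not match the real types.

```go
package main

import (
	"fmt"
	"time"
)

// plannerState is a hypothetical stand-in for the physical planner state
// discussed above; the field names are assumptions.
type plannerState struct {
	start, end    time.Time
	rangeInterval time.Duration // for queries with [$range]
}

// lookupWindow returns the full window a catalog lookup has to cover.
// A range aggregation evaluated at start reads data back to
// start - rangeInterval, so the lower bound moves earlier by that amount.
func (s plannerState) lookupWindow() (from, through time.Time) {
	return s.start.Add(-s.rangeInterval), s.end
}

// Catalog sketches the "full time information" variant. The string selector
// and []string result are simplifications, not the real signature.
type Catalog interface {
	ResolveDataObj(selector string, from, through time.Time) ([]string, error)
}

func main() {
	now := time.Date(2025, time.June, 27, 12, 0, 0, 0, time.UTC)
	s := plannerState{start: now, end: now, rangeInterval: 5 * time.Minute}
	from, through := s.lookupWindow()
	fmt.Println(from.Format(time.RFC3339), through.Format(time.RFC3339))
}
```

The same window could then feed both the MakeTable lookup and the TimeRange predicate, so the two cannot drift apart.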


"github.com/grafana/loki/v3/pkg/engine/planner/logical"
)

// Internal state of the planner
type state struct {
- direction SortOrder
+ direction     SortOrder
+ rangeInterval time.Duration // for queries with [$range]
Contributor

How do we deal with binop queries that have different ranges on their left and right branches? I guess it would work out of the box, because the state will refer to the most recently visited branch.

Contributor Author

@ashwanthgoli Jun 27, 2025

Thinking about it now, this approach might not work with the current instruction traversal (it looks like BFS?).
If it were DFS, the $range from an aggregation would be picked up correctly by its child MakeTable.

Contributor Author

It should work, I think, since we are doing DFS. This could be error-prone though. Should I add a TODO comment so we can revisit this when we add more operations that use the planner state?
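
To illustrate why a depth-first walk lets each MakeTable see the range of its own aggregation, here is a small self-contained sketch. It is not the planner's code: node, state, walk, examplePlan, and rangesPerTable are hypothetical names, and the save-and-restore of the range on the way out is an added assumption about how the state could be made less error-prone.

```go
package main

import (
	"fmt"
	"time"
)

// node is a hypothetical, minimal plan node; it is not the engine's type.
type node struct {
	name     string
	rng      time.Duration // non-zero only on range aggregations
	children []*node
}

// state mirrors the idea of the planner state holding the current [$range].
type state struct {
	rangeInterval time.Duration
}

// walk visits the plan depth-first. A range aggregation sets rangeInterval
// for its own subtree and restores the previous value on the way out, so a
// sibling branch never sees a stale range.
func walk(n *node, st *state, out map[string]time.Duration) {
	if n.rng > 0 {
		prev := st.rangeInterval
		st.rangeInterval = n.rng
		defer func() { st.rangeInterval = prev }()
	}
	if len(n.children) == 0 {
		// Leaf: stands in for MakeTable, which reads the range from state.
		out[n.name] = st.rangeInterval
		return
	}
	for _, c := range n.children {
		walk(c, st, out)
	}
}

// examplePlan models a binary operation whose branches use [5m] and [1h].
func examplePlan() *node {
	left := &node{name: "count_over_time[5m]", rng: 5 * time.Minute,
		children: []*node{{name: "left_table"}}}
	right := &node{name: "count_over_time[1h]", rng: time.Hour,
		children: []*node{{name: "right_table"}}}
	return &node{name: "binop", children: []*node{left, right}}
}

// rangesPerTable returns the range each leaf table observed during the walk.
func rangesPerTable(root *node) map[string]time.Duration {
	out := map[string]time.Duration{}
	walk(root, &state{}, out)
	return out
}

func main() {
	for table, r := range rangesPerTable(examplePlan()) {
		fmt.Println(table, r)
	}
}
```

In a breadth-first walk both aggregations would be visited before either table, so both tables would read whichever range was written last.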

Comment on lines +177 to +182
case types.ColumnNameGeneratedValue:
	if col.IsNull(i) {
		return promql.Sample{}, false
	}

	sample.F = float64(col.(*array.Int64).Value(i))
Contributor

Should we enforce that a generated column can only be of type float when creating the column?
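
Until the column type is enforced at creation, the read side could guard against unexpected types instead of relying on a single unchecked type assertion. A minimal sketch of that idea follows; int64Column and float64Column are hypothetical stand-ins for Arrow arrays, and sampleValue is an illustrative helper, not part of the PR.

```go
package main

import "fmt"

// int64Column and float64Column are hypothetical stand-ins for Arrow
// Int64 and Float64 arrays; they only exist to keep the sketch runnable.
type int64Column []int64
type float64Column []float64

// sampleValue converts row i of a generated-value column to float64.
// Unsupported column types return false instead of panicking, which an
// unchecked type assertion would do.
func sampleValue(col any, i int) (float64, bool) {
	switch c := col.(type) {
	case int64Column:
		return float64(c[i]), true
	case float64Column:
		return c[i], true
	default:
		return 0, false
	}
}

func main() {
	v, ok := sampleValue(int64Column{3, 7}, 1)
	fmt.Println(v, ok)

	_, ok = sampleValue([]string{"x"}, 0)
	fmt.Println("string column supported:", ok)
}
```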


func (b *vectorResultBuilder) collectRow(rec arrow.Record, i int) (promql.Sample, bool) {
	var sample promql.Sample
	lbls := labels.NewBuilder(labels.EmptyLabels())
Contributor

Can the builder be re-used?

2 participants