feat(logical planning): Planner playground #16403

owen-d · 2025-02-21T20:32:46Z

Logical Query Planner Structure

Messing around with logical planning; opening a draft PR to showcase & spark conversation. Much inspired by https://howqueryengineswork.com & datafusion

edit: I've refactored this to abandon the interface-heavy first implementation to prefer signal types and concrete implementations, similar to what's used in the dataobj columnar format.

Core (plan.go, expr.go)

Plan - Schema provider + child traversal
Expr - Column/computation abstraction

Plan Types

MakeTable (table.go) - Leaf node, reads from DataSource
Filter (filter.go) - Row filtering via predicates
Project (project.go) - Column selection/transformation
Aggregate (aggregate_plan.go) - Grouping + aggregations
BinOp (binop.go) - Binary operations between plans
Limit (limit.go) - limits (offset and number of rows to return)
Sort (sort_plan.go) - Sorting results

Expression Types

ColumnExpr (column.go) - Column references
LiteralExpr (literal.go) - Constants (int64, float64, string)
AggregateExpr (aggregate_expr.go) - sum, avg, min, max, count
Binary comparisons in binop.go (>, <, ==, etc)
SortExpr - sorting and related options (ordering, null handling) used by sort planning.

Query Building

DataFrame (dataframe.go) - Ergonomic API for plan construction

SSA (Static Single Assignment) mappings

Easily construct SSA forms for easier from the initial logical plan trees.

Debug Formatting (format/)

TreeFormatter - Tree-style output (├── child)
SSA formatting

All components use schema propagation

Example

The following construction yields the following plan.

	// Calculate the sum of sales per region for the year 2020
	ds := &testDataSource{
		name: "orders",
		schema: schema.Schema{
			Columns: []schema.ColumnSchema{
				{Name: "region", Type: datasetmd.VALUE_TYPE_STRING},
				{Name: "sales", Type: datasetmd.VALUE_TYPE_UINT64},
				{Name: "year", Type: datasetmd.VALUE_TYPE_UINT64},
			},
		},
	}

	df := NewDataFrame(
		NewScan(ds, nil),
	).Filter(
		Eq("year_2020", Col("year"), LitI64(2020)),
	).Project(
		[]Expr{
			Col("region"),
			Col("sales"),
			Col("year"),
		},
	).Aggregate(
		[]Expr{Col("region")},
		[]AggregateExpr{
			Sum("total_sales", Col("sales")),
		},
	)

Aggregate groupings=(region) aggregates=(total_sales)
├── GroupExprs
│   └── Column #region
├── AggregateExprs
│   └── AggregateExpr op=sum
│       └── Column #sales
└── Projection region=VALUE_TYPE_STRING sales=VALUE_TYPE_UINT64 year=VALUE_TYPE_UINT64
    ├── Column #region
    ├── Column #sales
    ├── Column #year
    └── Filter expr=year_2020
        ├── BooleanCmpExpr op=(==) name=year_2020
        │   ├── Column #year
        │   └── LiteralI64 value=2020
        └── Scan data_source=orders

edit: I've also added SSA mappings. The same plan generates the following:

%1 = MakeTable [name=orders]
%2 = ColumnRef [name=year, type=VALUE_TYPE_UINT64]
%3 = Literal [val=2020, type=VALUE_TYPE_INT64]
%4 = BinaryOp [op=(==), name=year_2020, left=%2, right=%3]
%5 = Filter [name=year_2020, predicate=%4, plan=%1]
%6 = ColumnRef [name=region, type=VALUE_TYPE_STRING]
%7 = ColumnRef [name=sales, type=VALUE_TYPE_UINT64]
%8 = ColumnRef [name=year, type=VALUE_TYPE_UINT64]
%9 = Project [region=%6, sales=%7, year=%8]
%10 = ColumnRef [name=region, type=VALUE_TYPE_STRING]
%11 = ColumnRef [name=sales, type=VALUE_TYPE_UINT64]
%12 = AggregationExpr [name=total_sales, op=sum]
%13 = AggregatePlan [aggregations=[%12], groupings=[%10]]

chaudum

This is looking good to me!
However, I think I would not place the code inside pkg/dataobj but rather in pkg/engine.

chaudum · 2025-02-27T08:15:07Z

pkg/dataobj/planner/logical/format_tree_test.go

+├── GroupExprs
+│   └── Column #region
+├── AggregateExprs
+│   └── AggregateExpr op=sum


nit: I would have expected the the AggregateExpr would also be printed with its name, like BinaryOp:

Suggested change

│ └── AggregateExpr op=sum

│ └── AggregateExpr op=sum name=total_sales

owen-d · 2025-02-27T15:36:42Z

pkg/dataobj/planner/logical/ifc.go

+2. Type Safety - Compile-time guarantees through interfaces and type assertions
+3. API Surface Reduction - Exposing minimal interfaces to external consumers
+
+The pattern works as follows:


Not sure about this level of indirection. WDYT

Don't have a strong opinion. But I think it's not strictly necessary to have a Type() function, since you can use the type switch statement instead.

initial SSA mapping from logical plans

This reverts commit 16cb270.

…ataobj internals)

chaudum · 2025-03-05T14:02:46Z

pkg/dataobj/planner/logical/sort_plan.go

+//	sort := newSort(inputPlan, []SortExpr{
+//		NewSortExpr("sort_by_age", Col("age"), true, false),
+//		NewSortExpr("sort_by_name", Col("name"), false, true),
+//	})
+func newSort(input Plan, expr SortExpr) *Sort {


newSort only accepts a single sort expression as argument, not a slice as documented.

chaudum · 2025-03-05T14:07:17Z

pkg/dataobj/planner/logical/plan.go

+// Plan is the core plan type in the logical package.
+// It wraps a concrete plan type (MakeTable, Filter, etc.)
+// and provides methods to safely access the underlying value.
+// This approach replaces the previous interface-based design to reduce
+// indirection and improve code clarity.
+type Plan struct {
+	ty  PlanType // The type of plan
+	val any      // The concrete plan value
+}


I am not fully convinced with this approach, but also not against it.

While it removes the complexity of the interface indirections, it adds a decent amount of boilderplate code (see Schema() function etc). Maybe changing the type of val from any to something more specific, like a minimalistic interface

type interface PlanImplementation { isPlanImplementation() // make sure it's only implemented within this package Schema() Schema }

However, I think the approach shines when you have multiple implementations of the same type.

chaudum · 2025-03-05T14:09:43Z

pkg/dataobj/planner/logical/plan.go

+func (p Plan) Type() PlanType {
+	return p.ty
+}


This makes all the Type() PlanType functions on the concrete plan "implementations" obsolete.

chaudum · 2025-03-05T14:58:57Z

pkg/dataobj/planner/logical/format_tree.go

+
+// writePlan writes a plan node to the tree formatter.
+// It dispatches to the appropriate write method based on the plan type.
+func (t *treeFormatter) writePlan(ast Plan) {


Eventually, the treeFormatter should be tree type agnostic, since we would use it for the physical plan as well.

A PlanVisitor could traverse the AST to create the treeFormatter nodes.

I was about to mentioned the benefits of a visitor.

chaudum · 2025-03-05T20:12:45Z

pkg/dataobj/planner/logical/binop.go

+type BinOpType int
+
+const (
+	BinOpTypeInvalid BinOpType = iota // Invalid or uninitialized binary operation


nit: Since you use an int as BinOpType, you could start the iota value at -1 (for "invalid type"). However, what do we need the BinOpTypeInvalid type for?

You have this for other enums as well.

Ah, I understand now: The empty value of BinOpExpr{} has the BinOpTypeInvalid type.

chaudum · 2025-03-05T20:22:48Z

pkg/dataobj/planner/logical/expr.go

+	case ExprTypeSort:
+		return "Sort"
+	default:
+		return "Unknown"


nit

Suggested change

return "Unknown"

return UNKNOWN

chaudum · 2025-03-05T20:25:41Z

pkg/dataobj/planner/logical/filter.go

+// Type implements the Plan interface
+func (f *Filter) Type() PlanType {
+	return PlanTypeFilter
+}


Type() is now implemented on the Plan struct

chaudum · 2025-03-05T20:28:10Z

pkg/dataobj/planner/logical/table.go

+// TableSchema returns the schema of the table.
+// This is a convenience method that returns the same value as Schema().
+func (t *MakeTable) TableSchema() schema.Schema {
+	return t.schema
+}


This function is unused. Also, it's not a convenience method, but a duplicate ;-)

chaudum · 2025-03-05T20:30:42Z

pkg/dataobj/planner/logical/aggregate_expr.go

+// AggregateOp represents the type of aggregation operation to perform.
+// It is a string-based enum that identifies different aggregation functions
+// that can be applied to expressions.
+type AggregateOp string
+
+const (
+	// AggregateOpSum represents a sum aggregation
+	AggregateOpSum AggregateOp = "sum"
+	// AggregateOpAvg represents an average aggregation
+	AggregateOpAvg AggregateOp = "avg"
+	// AggregateOpMin represents a minimum value aggregation
+	AggregateOpMin AggregateOp = "min"
+	// AggregateOpMax represents a maximum value aggregation
+	AggregateOpMax AggregateOp = "max"
+	// AggregateOpCount represents a count aggregation
+	AggregateOpCount AggregateOp = "count"
+)


I think we should use the same enum pattern (int type + iota + String() function) as in the rest of the PR.

rfratto

I think this is a good starting point 🎉 I'm approving and merging immediately so we can keep momentum on the other parts of the query engine while you're on PTO. I have a few high-level thoughts for future improvements:

I wonder if there's more we can do to have our types more directly represent an SSA logical plan. At the moment, the package API itself is forming a hierarchical tree, with the SSA code mainly focusing on representing that tree as a string.

If you compare to something like ssa.BasicBlock from Go, you can see how their package API more naturally fits the pattern of a sequence of instructions instead of a hierarchy.
I like that we're learning lessons from other query engines (like DataFusion) and trying to apply it to our own, but I wonder if we'll actually use the builder pattern defined here in practice.

Specifically, the builder pattern is used to construct a plan top-down, with each chained function call being the next operation to do after all previous chained function calls.

However, if our logical plans were built with depth-first traversal of the AST, it would be constructing nodes bottom-up, which might reduce how often we use the builder pattern in our planning code.
Hiding fields behind getters may be increasing the size of the mental model for the package API. Exposing struct fields publicly like we do for the AST may help cut down the amount of code and files needed. As a bonus, this may also give the caller more choices for how to construct plans without needing to use the builder pattern.

rfratto · 2025-03-06T19:13:09Z

Merging as mentioned above, but @chaudum's comments are also worth addressing once you're back from PTO (and I agree that pkg/dataobj is probably not the best home for this storage-agnostic engine)

chaudum · 2025-03-07T07:05:04Z

Merging as mentioned above, but @chaudum's comments are also worth addressing once you're back from PTO (and I agree that pkg/dataobj is probably not the best home for this storage-agnostic engine)

The reason for the package to be in pkg/dataobj is the import of the internal package for datatypes.

rfratto · 2025-03-07T17:53:46Z

The reason for the package to be in pkg/dataobj is the import of the internal package for datatypes.

Thanks, I noticed that, but I don't think it needs to be that way. It would be reasonable to have duplicate definitions of data types specifically for the query engine. Even if datasetmd wasn't internal, I'd still recommend doing that to minimize the intersections where the query engine uses dataobj code ("a little copying is better than a little dependency"). This typically helps with code organization and compartmentalizing the mental model.

pull-request-size bot added the size/XXL label Feb 21, 2025

owen-d changed the title ~~Planner playground~~ Feb 21, 2025

chaudum reviewed Feb 27, 2025

View reviewed changes

owen-d commented Feb 27, 2025

View reviewed changes

owen-d added 16 commits March 4, 2025 08:32

playing with a planner description

033b06c

autogen comments

dc76102

more progress, inc scan & project

2550bb5

aggregate plans & dataframe init

60a3b49

format pkg

e0c05a6

formatting

7b03abb

typo

e5b5a06

lint

979c22a

remove datasource, s/scan/maketable/

18a77e2

[wip] SSA progress

9620b6e

plan interfaces vs impls; reimplements tree formatter

3e1e1df

[wip] more ssa hacking

c761b93

initial SSA mapping from logical plans

rename

1196ed4

doc work

3e91374

lint

80959b2

nit: s/type/category/

16cb270

owen-d force-pushed the planner-playground branch from a276ec2 to 16cb270 Compare March 4, 2025 16:40

owen-d added 4 commits March 4, 2025 08:43

Revert " nit: s/type/category/"

8318bef

This reverts commit 16cb270.

abandons interface approach for variant signal approach (similar to d…

ad48cf9

…ataobj internals)

doc

c22d7b9

adds limit plan

0739f41

owen-d marked this pull request as ready for review March 4, 2025 19:21

owen-d requested a review from a team as a code owner March 4, 2025 19:21

sort impl

ac15f25

chaudum reviewed Mar 5, 2025

View reviewed changes

rfratto approved these changes Mar 6, 2025

View reviewed changes

rfratto merged commit 89f0ed7 into grafana:main Mar 6, 2025
58 checks passed

JoaoBraveCoding pushed a commit to JoaoBraveCoding/loki that referenced this pull request Mar 12, 2025

feat(logical planning): Planner playground (grafana#16403)

db8de97

This was referenced Mar 20, 2025

chore(k246): release 3.5.0 #16856

Open

chore(k247): release 3.5.0 #16872

Open

This was referenced Mar 31, 2025

chore(k248): release 3.5.0 #16954

Merged

chore(k249): release 3.5.0 #17046

Open

This was referenced Apr 14, 2025

chore(k250): release 3.5.0 #17211

Open

chore(k251): release 3.5.0 #17359

Open

loki-gh-app bot mentioned this pull request Apr 22, 2025

chore(k248): release 3.5.0 (backport main) #17396

Open

This was referenced Apr 23, 2025

logcli 3.5.0 Homebrew/homebrew-core#221118

Merged

loki 3.5.0 Homebrew/homebrew-core#221119

Merged

promtail 3.5.0 Homebrew/homebrew-core#221123

Merged

This was referenced May 5, 2025

chore(k253): release 3.5.0 #17570

Open

chore(k254): release 3.5.0 #17658

Open

This was referenced May 19, 2025

chore(k255): release 3.5.0 #17732

Open

chore(release-3.5.x): release 3.5.2 #17801

Open

This was referenced May 26, 2025

chore(k256): release 3.5.0 #17828

Open

chore(k257): release 3.5.0 #17926

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(logical planning): Planner playground #16403

feat(logical planning): Planner playground #16403

Uh oh!

owen-d commented Feb 21, 2025 •

edited

Loading

chaudum left a comment

chaudum Feb 27, 2025

owen-d Feb 27, 2025

chaudum Feb 28, 2025

chaudum Mar 5, 2025

chaudum Mar 5, 2025

chaudum Mar 5, 2025

chaudum Mar 5, 2025

jeschkies Mar 6, 2025

chaudum Mar 5, 2025

chaudum Mar 6, 2025

chaudum Mar 5, 2025

chaudum Mar 5, 2025

chaudum Mar 5, 2025

chaudum Mar 5, 2025

rfratto left a comment •

edited

Loading

rfratto commented Mar 6, 2025 •

edited

Loading

Uh oh!

chaudum commented Mar 7, 2025

rfratto commented Mar 7, 2025 •

edited

Loading

	│ └── AggregateExpr op=sum
	│ └── AggregateExpr op=sum name=total_sales

feat(logical planning): Planner playground #16403

feat(logical planning): Planner playground #16403

Uh oh!

Conversation

owen-d commented Feb 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Logical Query Planner Structure

Core (plan.go, expr.go)

Plan Types

Expression Types

Query Building

SSA (Static Single Assignment) mappings

Debug Formatting (format/)

Example

chaudum left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rfratto left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

rfratto commented Mar 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chaudum commented Mar 7, 2025

rfratto commented Mar 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

owen-d commented Feb 21, 2025 •

edited

Loading

rfratto left a comment •

edited

Loading

rfratto commented Mar 6, 2025 •

edited

Loading

rfratto commented Mar 7, 2025 •

edited

Loading