Skip to content

ESQL: Pull OrderBy followed by InlineJoin on top of it#137648

Merged
bpintea merged 10 commits intoelastic:mainfrom
bpintea:enh/sort_before_ijoin
Dec 9, 2025
Merged

ESQL: Pull OrderBy followed by InlineJoin on top of it#137648
bpintea merged 10 commits intoelastic:mainfrom
bpintea:enh/sort_before_ijoin

Conversation

@bpintea
Copy link
Contributor

@bpintea bpintea commented Nov 5, 2025

Add optimisation rule to pull OrderBy above InlineJoin. This is required since otherwise the OrderBy won't be moved out of the left hand-side of the join, ending up as a stand-alone node that can't be turned into an executable -- it'll just be rejected by the verification as an "unbounded sort". Since the InlineJoin is sort agnostic, the OrderBy can be moved upwards past it, ending up as a top TopN.

This doesn't entirely solve the issue of unbounded sorts, however: some nodes, such as MV_EXPAND or LOOKUP JOIN break the inbound sort, so pulling a SORT on top of them isn't possible. In this cases the query will still fail. This is not INLINE JOIN specific, though.

This is a revival of #132417, but with a new approach on how to deal with the attributes used by SORT, but dropped or overshadowed by the time INLINE JOIN is reached. In this case, we'll add temporary internal attributes kept until the INLINE JOIN and dropped afterwards.

Related #124715 (#113727)
Related #124721 (#133120)

Add optimisation rule to pull OrderBy above InlineJoin. This is required
since otherwise the SORT won't be moved out of the left
hand-side of the join, ending up as a stand-alone node that can't be turned
into an executable -- it'll be rejected by verification already as an
"unbounded sort". Since the InlineJoin is sort agnostic, the OrderBy can
be moved upwards past it, ending up as a top TopN.
@elasticsearchmachine
Copy link
Collaborator

Hi @bpintea, I've created a changelog YAML for you.

@bpintea bpintea marked this pull request as ready for review November 6, 2025 20:23
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Nov 6, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the tests, and I always like more tests :-).

The fact that this PR pulls up OrderBy passed InlineJoin without "using" or be made aware by the existence of SortAgnostic (as opposed to the original PR which considered SortAgnostic for something related to OrderBy) shows that the presence of SortAgnostic is not well enforced in code (for the developer's visibility and awareness), something that (reading between lines) has been made clear here. In other words, SortAgnostic defies half of its purpose - to be a tool developers are aware of (or be made aware of reliably) when they develop new commands but also (my own addition here) when developers (new or seasoned ones) touch any planning code (not only related to the new command).

The other half of SortAgnostics purpose is well defined: TRYING to reveal issues with using a command that doesn't fully considers all the subtleties and implication of a sort before that command that might silently give wrong results. But, this holds true considering the following:

  • the new command PR has good and, as much as possible, enough tests that reveal what SortAgnostic is trying to guard
  • the new command PR is being reviewed by people who are aware of sort tweaks and SortAgnostic presence and can point out that sort can be something to look into further, something that is not always true these days

IMHO, just like my previous review to the original PR, is that this PR brings a much needed improvement to the intricacies of inline stats + sort usage: an irrelevant sort should be pulled up passed inline stats because inline stats right-hand side should use the "original" relevant set of data (and an unbounded sorted set of data is irrelevant) and should be considered for adoption.

I still need to dig through the surgical take from rewriteMidProjections method and see why it's so surgical :-), but this is for the second round of review.

* <p>
* See also {@link PruneRedundantOrderBy}.
*/
public final class PullUpOrderByBeforeInlineJoin extends OptimizerRules.OptimizerRule<LogicalPlan> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense for this rule to be OptimizerRules.CoordinatorOnly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can't find a scenario where this wouldn't need to run on coordinator, but could run on the data node. Will update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

*/
public void testInlineJoinPrunedAfterSortAndLookupJoin() {
var query = """
FROM airports
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is interesting in that if I add count to the keep I start getting Unbounded SORT not supported yet [SORT abbrev DESC] please add a LIMIT\nline 31:15: INLINE STATS [INLINE STATS count=COUNT(*) BY scalerank] cannot yet have an unbounded SORT [SORT abbrev DESC] before it : either move the SORT after it, or add a LIMIT before the SORT which might be correct, but it is confusing to the user. SORT abbrev was there before when count was not kept.

We eliminate the inline stats because we don't really need (and the sort is irrelevant from the point of view of inline stats) it but the outcome is definitely confusing to the user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left some suggestions, more or less optional.

10029 |4 |null |74999
;

mixedShadowingInlineStatsAfterSort
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a variant of this test, as well:

FROM employees
| KEEP salary, emp_no, first_name, gender
| EVAL old_salary = salary
| SORT old_salary, emp_no
| DROP old_salary
| INLINE STATS salary = COUNT(*) BY gender
| LIMIT 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And another one:

FROM employees
| KEEP salary, emp_no, first_name, gender
| EVAL old_salary = salary
| SORT old_salary, emp_no
| DROP old_salary
| EVAL salary = salary / 1000
| RENAME salary AS s
| SORT s, first_name
| INLINE STATS s = COUNT(*) BY gender
| LIMIT 5
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added both.

Comment on lines 44 to 64
protected LogicalPlan rule(LogicalPlan plan) {
return plan.transformUp(LogicalPlan.class, PullUpOrderByBeforeInlineJoin::pullUpOrderByBeforeInlineJoin);
}

private static LogicalPlan pullUpOrderByBeforeInlineJoin(LogicalPlan plan) {
if (plan instanceof InlineJoin inlineJoin) {
Holder<OrderBy> orderByHolder = new Holder<>();
inlineJoin.forEachDownMayReturnEarly((node, breakEarly) -> {
if (node instanceof OrderBy orderBy) {
orderByHolder.set(orderBy);
breakEarly.set(true);
} else {
breakEarly.set(isSortBreaker(node));
}
});

OrderBy orderBy = orderByHolder.get();
plan = orderBy == null ? plan : pullUpOrderByBeforeInlineJoin(inlineJoin, orderBy);
}
return plan;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
protected LogicalPlan rule(LogicalPlan plan) {
return plan.transformUp(LogicalPlan.class, PullUpOrderByBeforeInlineJoin::pullUpOrderByBeforeInlineJoin);
}
private static LogicalPlan pullUpOrderByBeforeInlineJoin(LogicalPlan plan) {
if (plan instanceof InlineJoin inlineJoin) {
Holder<OrderBy> orderByHolder = new Holder<>();
inlineJoin.forEachDownMayReturnEarly((node, breakEarly) -> {
if (node instanceof OrderBy orderBy) {
orderByHolder.set(orderBy);
breakEarly.set(true);
} else {
breakEarly.set(isSortBreaker(node));
}
});
OrderBy orderBy = orderByHolder.get();
plan = orderBy == null ? plan : pullUpOrderByBeforeInlineJoin(inlineJoin, orderBy);
}
return plan;
}
protected LogicalPlan rule(LogicalPlan plan) {
return plan.transformUp(InlineJoin.class, ij -> pullUpOrderByBeforeInlineJoin(ij));
}
private static LogicalPlan pullUpOrderByBeforeInlineJoin(InlineJoin inlineJoin) {
Holder<OrderBy> orderByHolder = new Holder<>();
inlineJoin.forEachDownMayReturnEarly((node, breakEarly) -> {
if (node instanceof OrderBy orderBy) {
orderByHolder.set(orderBy);
breakEarly.set(true);
} else {
breakEarly.set(isSortBreaker(node));
}
});
OrderBy orderBy = orderByHolder.get();
return orderBy == null ? inlineJoin : pullUpOrderByBeforeInlineJoin(inlineJoin, orderBy);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeh, fixed, thanks.

return plan.transformUp(lp -> {
if (lp == eval) {
evalVisited.set(true);
} else if (lp instanceof Project project && evalVisited.get()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find a problem, but I wanted to check the aggregation (since it's considered to be a variant of projection) and I would like to propose unit test/csv test with (I know you have one unit test with sort ... stats ... inline stats):

FROM employees
| KEEP salary, emp_no, first_name, gender
| SORT salary
| STATS salary = MAX(salary) BY gender
| SORT salary DESC
| INLINE STATS s = COUNT(*) BY gender
| LIMIT 5

and a slightly different variant:

FROM employees
| KEEP salary, emp_no, first_name, gender
| SORT salary
| STATS salary = MAX(salary) BY gender
| INLINE STATS s = COUNT(*) BY gender
| LIMIT 5
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, added them (though the last one needs a trailing sort for stability in CSV tests (only)).

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly superficial review, but I think the approach makes sense. There's very likely a hidden problem with projects that's worth looking into, even though it may not affect current production queries at this time. Update: There was no problem, my mistake.

And we need to align on opt-in vs. opt-out optimizer rules and the use of interfaces. For now, I'd really like us to use interfaces to opt-in to optimizer rules so that correctness is guaranteed rather than finding out later, when more commands get added to ESQL, that we ran queries that always returned wrong results to the user because a new command silently opted into an optimization (like hoisting a sort before inline stats while the new command is in between) that it doesn't actually work with.
At the very least, let's be consistent: we've already started using opt-in interfaces and have 3 (or more) rules that use them; having an opt-out list of commands as part of the optimizer rule just for 1 rule would be confusing and breaking the pattern.

* <p>
* See also {@link PruneRedundantOrderBy}.
*/
public final class PullUpOrderByBeforeInlineJoin extends OptimizerRules.OptimizerRule<LogicalPlan> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we've started calling "push up" "hoist" in other rules.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed; not a fan. :)
We have preexisting "push down", not "lower" or "drop" and when listing the rules one would thus look up and grasp quicker "pull up" than "hoist" (which is also not "plain English").

But OK, since we have committed "hoist" already, I'm aligning this to it, but would still campaign on changing to "pull up" (and thus renaming the existing "hoist" rules, plus this one).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing against renaming "hoist" to "pullUp" or so. As long as we're consistent :)

/**
* Returns `true` if the {@code plan}'s position cannot be swapped with a SORT, `false` otherwise.
*/
private static boolean isSortBreaker(LogicalPlan plan) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to become an interface; a list of instanceofs is too easy to go stale. It's also inconsistent that multiple optimizer rules already use interfaces that allow logical plan nodes to opt into the optimizations, but here we have an instanceof list instead.

E.g. Aggregate will become a sort breaker most likely once we support window functions; when someone who's unaware of this rule works on window functions, they are almost guaranteed to miss this and the problem will only surface through sufficient test coverage by humans - so the reviewer would have to keep this in mind and say "hey, but what about inline stats with window functions when there is an agg before the inline stats and before that there's a sort"? That makes reviews pretty hard.

I also think the interface should be one that opts into this optimizer rule, not one that opts out of it.

If we're worried about the zoo of interfaces being too large to manage by non-expert devs, we can have an abstract super class StandardLogicalPlan (probably with a much better name - RowStreamingLogicalPlan or so, for commands like Eval, Grok, Enrich etc.?) that extends all the opt-in interfaces that most plans would normally work with; and then new commands would generally just extend this class unless they're special enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to become an interface

I agree. :) This is why the discussion around SortAgnostic (which I was hoping to be able to use in this logic's 1st attempt and then in a reshape of that) and I suppose Andrei's comment above.

If we're worried about the zoo of interfaces being too large to manage by non-expert devs

I just hope we don't end up introducing interfaces that are only relevant for just one rule. I think that's not a great design and it'll have unpleasant consequences for most folks. (There's more to touch on here, we can do that in #137543.)
I'm adding a new interface here, but I hope we can fuse this somehow with SortAgnostic.

I also think the interface should be one that opts into this optimizer rule, not one that opts out of it.

These interfaces are basically boolean-stand-ins, so they're considered either way, but you probably suggest to make it "omission" safe (i.e. fail execution rather than wrong result). I've thus added a SortPreserving (in case one sort breaking new command wouldn't implement it, the rule wouldn't advance past such a command to fetch an OrderBy to pull up above InlineJoin and thus leave the query unexecutable, so that'd be safe w.r.t. correctness).

But I can also imagine the usages of these interfaces might end up having "conflicting" safety in different rules (i.e. one rule could be safe as an opt-in in one rule, but opt-out-safe in another). So we might need to put some additional guard-rails in place (possibly à la https://github.com/elastic/elasticsearch/pull/137543/files#diff-92b2b661404e647b0d14884daff5d4b666e17c994a02f3cae08f44303ed0dd7f).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just hope we don't end up introducing interfaces that are only relevant for just one rule.

I don't think there's an issue with interfaces that work with just one rule. But sure, if we can simplify by merging two interfaces together if they serve similar purposes, let's go :)

"omission" safe (i.e. fail execution rather than wrong result).

Yes, exactly! Thank you :)

i.e. one rule could be safe as an opt-in in one rule, but opt-out-safe in another

That's fair. I guess the general guidance should be "the optimizer rule takes a valid plan and produces a valid plan, even when ESQL evolves and we don't know all plan nodes in the query". Silently wrong results are just too dangerous, and we've seen with e.g. remote enrich planning how we can accidentally provide wrong results for queries that couldn't actually be executed correctly.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the iteration, Bogdan! Looking even nicer now!

@bpintea
Copy link
Contributor Author

bpintea commented Dec 9, 2025

Thanks Andrei and Alex!

@bpintea bpintea merged commit 467bc2a into elastic:main Dec 9, 2025
34 checks passed
@bpintea bpintea deleted the enh/sort_before_ijoin branch December 9, 2025 12:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.3.0

4 participants