ESQL: Pull `OrderBy` followed by `InlineJoin` on top of it by bpintea · Pull Request #137648 · elastic/elasticsearch

bpintea · 2025-11-05T21:19:16Z

Add optimisation rule to pull OrderBy above InlineJoin. This is required since otherwise the OrderBy won't be moved out of the left hand-side of the join, ending up as a stand-alone node that can't be turned into an executable -- it'll just be rejected by the verification as an "unbounded sort". Since the InlineJoin is sort agnostic, the OrderBy can be moved upwards past it, ending up as a top TopN.

This doesn't entirely solve the issue of unbounded sorts, however: some nodes, such as MV_EXPAND or LOOKUP JOIN break the inbound sort, so pulling a SORT on top of them isn't possible. In this cases the query will still fail. This is not INLINE JOIN specific, though.

This is a revival of #132417, but with a new approach on how to deal with the attributes used by SORT, but dropped or overshadowed by the time INLINE JOIN is reached. In this case, we'll add temporary internal attributes kept until the INLINE JOIN and dropped afterwards.

Related #124715 (#113727)
Related #124721 (#133120)

Add optimisation rule to pull OrderBy above InlineJoin. This is required since otherwise the SORT won't be moved out of the left hand-side of the join, ending up as a stand-alone node that can't be turned into an executable -- it'll be rejected by verification already as an "unbounded sort". Since the InlineJoin is sort agnostic, the OrderBy can be moved upwards past it, ending up as a top TopN.

elasticsearchmachine · 2025-11-05T21:19:42Z

Hi @bpintea, I've created a changelog YAML for you.

elasticsearchmachine · 2025-11-06T20:24:08Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

astefan

I like the tests, and I always like more tests :-).

The fact that this PR pulls up OrderBy passed InlineJoin without "using" or be made aware by the existence of SortAgnostic (as opposed to the original PR which considered SortAgnostic for something related to OrderBy) shows that the presence of SortAgnostic is not well enforced in code (for the developer's visibility and awareness), something that (reading between lines) has been made clear here. In other words, SortAgnostic defies half of its purpose - to be a tool developers are aware of (or be made aware of reliably) when they develop new commands but also (my own addition here) when developers (new or seasoned ones) touch any planning code (not only related to the new command).

The other half of SortAgnostics purpose is well defined: TRYING to reveal issues with using a command that doesn't fully considers all the subtleties and implication of a sort before that command that might silently give wrong results. But, this holds true considering the following:

the new command PR has good and, as much as possible, enough tests that reveal what SortAgnostic is trying to guard
the new command PR is being reviewed by people who are aware of sort tweaks and SortAgnostic presence and can point out that sort can be something to look into further, something that is not always true these days

IMHO, just like my previous review to the original PR, is that this PR brings a much needed improvement to the intricacies of inline stats + sort usage: an irrelevant sort should be pulled up passed inline stats because inline stats right-hand side should use the "original" relevant set of data (and an unbounded sorted set of data is irrelevant) and should be considered for adoption.

I still need to dig through the surgical take from rewriteMidProjections method and see why it's so surgical :-), but this is for the second round of review.

astefan · 2025-11-07T17:40:36Z

...java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PullUpOrderByBeforeInlineJoin.java

+ * <p>
+ * See also {@link PruneRedundantOrderBy}.
+ */
+public final class PullUpOrderByBeforeInlineJoin extends OptimizerRules.OptimizerRule<LogicalPlan> {


I think it makes sense for this rule to be OptimizerRules.CoordinatorOnly.

Yes, I can't find a scenario where this wouldn't need to run on coordinator, but could run on the data node. Will update.

astefan · 2025-11-07T17:57:52Z

.../java/org/elasticsearch/xpack/esql/optimizer/HoistOrderByBeforeInlineJoinOptimizerTests.java

+     */
+    public void testInlineJoinPrunedAfterSortAndLookupJoin() {
+        var query = """
+            FROM airports


This one is interesting in that if I add count to the keep I start getting Unbounded SORT not supported yet [SORT abbrev DESC] please add a LIMIT\nline 31:15: INLINE STATS [INLINE STATS count=COUNT(*) BY scalerank] cannot yet have an unbounded SORT [SORT abbrev DESC] before it : either move the SORT after it, or add a LIMIT before the SORT which might be correct, but it is confusing to the user. SORT abbrev was there before when count was not kept.

We eliminate the inline stats because we don't really need (and the sort is irrelevant from the point of view of inline stats) it but the outcome is definitely confusing to the user.

astefan

LGTM. Left some suggestions, more or less optional.

astefan · 2025-11-21T08:59:32Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/inlinestats.csv-spec

+10029          |4              |null             |74999
+;
+
+mixedShadowingInlineStatsAfterSort


Added both.

astefan · 2025-11-21T09:09:22Z

...java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PullUpOrderByBeforeInlineJoin.java

+    protected LogicalPlan rule(LogicalPlan plan) {
+        return plan.transformUp(LogicalPlan.class, PullUpOrderByBeforeInlineJoin::pullUpOrderByBeforeInlineJoin);
+    }
+
+    private static LogicalPlan pullUpOrderByBeforeInlineJoin(LogicalPlan plan) {
+        if (plan instanceof InlineJoin inlineJoin) {
+            Holder<OrderBy> orderByHolder = new Holder<>();
+            inlineJoin.forEachDownMayReturnEarly((node, breakEarly) -> {
+                if (node instanceof OrderBy orderBy) {
+                    orderByHolder.set(orderBy);
+                    breakEarly.set(true);
+                } else {
+                    breakEarly.set(isSortBreaker(node));
+                }
+            });
+
+            OrderBy orderBy = orderByHolder.get();
+            plan = orderBy == null ? plan : pullUpOrderByBeforeInlineJoin(inlineJoin, orderBy);
+        }
+        return plan;
+    }


Suggested change

protected LogicalPlan rule(LogicalPlan plan) {

return plan.transformUp(LogicalPlan.class, PullUpOrderByBeforeInlineJoin::pullUpOrderByBeforeInlineJoin);

}

private static LogicalPlan pullUpOrderByBeforeInlineJoin(LogicalPlan plan) {

if (plan instanceof InlineJoin inlineJoin) {

Holder<OrderBy> orderByHolder = new Holder<>();

inlineJoin.forEachDownMayReturnEarly((node, breakEarly) -> {

if (node instanceof OrderBy orderBy) {

orderByHolder.set(orderBy);

breakEarly.set(true);

} else {

breakEarly.set(isSortBreaker(node));

}

});

OrderBy orderBy = orderByHolder.get();

plan = orderBy == null ? plan : pullUpOrderByBeforeInlineJoin(inlineJoin, orderBy);

}

return plan;

}

protected LogicalPlan rule(LogicalPlan plan) {

return plan.transformUp(InlineJoin.class, ij -> pullUpOrderByBeforeInlineJoin(ij));

}

private static LogicalPlan pullUpOrderByBeforeInlineJoin(InlineJoin inlineJoin) {

Holder<OrderBy> orderByHolder = new Holder<>();

inlineJoin.forEachDownMayReturnEarly((node, breakEarly) -> {

if (node instanceof OrderBy orderBy) {

orderByHolder.set(orderBy);

breakEarly.set(true);

} else {

breakEarly.set(isSortBreaker(node));

}

});

OrderBy orderBy = orderByHolder.get();

return orderBy == null ? inlineJoin : pullUpOrderByBeforeInlineJoin(inlineJoin, orderBy);

}

Oh yeh, fixed, thanks.

astefan · 2025-11-21T13:45:44Z

.../java/org/elasticsearch/xpack/esql/optimizer/rules/logical/HoistOrderByBeforeInlineJoin.java

+        return plan.transformUp(lp -> {
+            if (lp == eval) {
+                evalVisited.set(true);
+            } else if (lp instanceof Project project && evalVisited.get()) {


I didn't find a problem, but I wanted to check the aggregation (since it's considered to be a variant of projection) and I would like to propose unit test/csv test with (I know you have one unit test with sort ... stats ... inline stats):

FROM employees | KEEP salary, emp_no, first_name, gender | SORT salary | STATS salary = MAX(salary) BY gender | SORT salary DESC | INLINE STATS s = COUNT(*) BY gender | LIMIT 5

and a slightly different variant:

FROM employees | KEEP salary, emp_no, first_name, gender | SORT salary | STATS salary = MAX(salary) BY gender | INLINE STATS s = COUNT(*) BY gender | LIMIT 5

Sure, added them (though the last one needs a trailing sort for stability in CSV tests (only)).

alex-spies

Slightly superficial review, but I think the approach makes sense. ~~There's very likely a hidden problem with projects that's worth looking into, even though it may not affect current production queries at this time.~~ Update: There was no problem, my mistake.

And we need to align on opt-in vs. opt-out optimizer rules and the use of interfaces. For now, I'd really like us to use interfaces to opt-in to optimizer rules so that correctness is guaranteed rather than finding out later, when more commands get added to ESQL, that we ran queries that always returned wrong results to the user because a new command silently opted into an optimization (like hoisting a sort before inline stats while the new command is in between) that it doesn't actually work with.
At the very least, let's be consistent: we've already started using opt-in interfaces and have 3 (or more) rules that use them; having an opt-out list of commands as part of the optimizer rule just for 1 rule would be confusing and breaking the pattern.

alex-spies · 2025-12-03T11:17:43Z

...java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PullUpOrderByBeforeInlineJoin.java

+ * <p>
+ * See also {@link PruneRedundantOrderBy}.
+ */
+public final class PullUpOrderByBeforeInlineJoin extends OptimizerRules.OptimizerRule<LogicalPlan> {


nit: we've started calling "push up" "hoist" in other rules.

Noticed; not a fan. :)
We have preexisting "push down", not "lower" or "drop" and when listing the rules one would thus look up and grasp quicker "pull up" than "hoist" (which is also not "plain English").

But OK, since we have committed "hoist" already, I'm aligning this to it, but would still campaign on changing to "pull up" (and thus renaming the existing "hoist" rules, plus this one).

Nothing against renaming "hoist" to "pullUp" or so. As long as we're consistent :)

alex-spies · 2025-12-03T11:29:51Z

...java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PullUpOrderByBeforeInlineJoin.java

+    /**
+     * Returns `true` if the {@code plan}'s position cannot be swapped with a SORT, `false` otherwise.
+     */
+    private static boolean isSortBreaker(LogicalPlan plan) {


I think this needs to become an interface; a list of instanceofs is too easy to go stale. It's also inconsistent that multiple optimizer rules already use interfaces that allow logical plan nodes to opt into the optimizations, but here we have an instanceof list instead.

E.g. Aggregate will become a sort breaker most likely once we support window functions; when someone who's unaware of this rule works on window functions, they are almost guaranteed to miss this and the problem will only surface through sufficient test coverage by humans - so the reviewer would have to keep this in mind and say "hey, but what about inline stats with window functions when there is an agg before the inline stats and before that there's a sort"? That makes reviews pretty hard.

I also think the interface should be one that opts into this optimizer rule, not one that opts out of it.

If we're worried about the zoo of interfaces being too large to manage by non-expert devs, we can have an abstract super class StandardLogicalPlan (probably with a much better name - RowStreamingLogicalPlan or so, for commands like Eval, Grok, Enrich etc.?) that extends all the opt-in interfaces that most plans would normally work with; and then new commands would generally just extend this class unless they're special enough.

I think this needs to become an interface

I agree. :) This is why the discussion around SortAgnostic (which I was hoping to be able to use in this logic's 1st attempt and then in a reshape of that) and I suppose Andrei's comment above.

If we're worried about the zoo of interfaces being too large to manage by non-expert devs

I just hope we don't end up introducing interfaces that are only relevant for just one rule. I think that's not a great design and it'll have unpleasant consequences for most folks. (There's more to touch on here, we can do that in #137543.)
I'm adding a new interface here, but I hope we can fuse this somehow with SortAgnostic.

I also think the interface should be one that opts into this optimizer rule, not one that opts out of it.

These interfaces are basically boolean-stand-ins, so they're considered either way, but you probably suggest to make it "omission" safe (i.e. fail execution rather than wrong result). I've thus added a SortPreserving (in case one sort breaking new command wouldn't implement it, the rule wouldn't advance past such a command to fetch an OrderBy to pull up above InlineJoin and thus leave the query unexecutable, so that'd be safe w.r.t. correctness).

But I can also imagine the usages of these interfaces might end up having "conflicting" safety in different rules (i.e. one rule could be safe as an opt-in in one rule, but opt-out-safe in another). So we might need to put some additional guard-rails in place (possibly à la https://github.com/elastic/elasticsearch/pull/137543/files#diff-92b2b661404e647b0d14884daff5d4b666e17c994a02f3cae08f44303ed0dd7f).

I just hope we don't end up introducing interfaces that are only relevant for just one rule.

I don't think there's an issue with interfaces that work with just one rule. But sure, if we can simplify by merging two interfaces together if they serve similar purposes, let's go :)

"omission" safe (i.e. fail execution rather than wrong result).

Yes, exactly! Thank you :)

i.e. one rule could be safe as an opt-in in one rule, but opt-out-safe in another

That's fair. I guess the general guidance should be "the optimizer rule takes a valid plan and produces a valid plan, even when ESQL evolves and we don't know all plan nodes in the query". Silently wrong results are just too dangerous, and we've seen with e.g. remote enrich planning how we can accidentally provide wrong results for queries that couldn't actually be executed correctly.

...java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PullUpOrderByBeforeInlineJoin.java

alex-spies

Thanks for the iteration, Bogdan! Looking even nicer now!

bpintea · 2025-12-09T12:41:41Z

Thanks Andrei and Alex!

bpintea added >enhancement :Analytics/ES|QL AKA ESQL v9.3.0 labels Nov 5, 2025

Update docs/changelog/137648.yaml

e6e2fe7

bpintea added 5 commits November 6, 2025 16:29

adjust mid-Projects patching

0f1fb2c

Merge remote-tracking branch 'upstream/main' into enh/sort_before_ijoin

dac7465

adjust one test

e0407f8

Merge remote-tracking branch 'upstream/main' into enh/sort_before_ijoin

7904057

adjust one more test

7fe8f79

bpintea requested review from alex-spies, astefan and fang-xing-esql November 6, 2025 20:23

bpintea marked this pull request as ready for review November 6, 2025 20:23

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Nov 6, 2025

astefan reviewed Nov 7, 2025

View reviewed changes

astefan approved these changes Nov 21, 2025

View reviewed changes

alex-spies approved these changes Dec 3, 2025

View reviewed changes

bpintea added 3 commits December 8, 2025 19:31

Address review comments

a36c204

Merge remote-tracking branch 'upstream/main' into enh/sort_before_ijoin

6ab6c9e

Introduce a SortPreserving interface

291f342

alex-spies approved these changes Dec 9, 2025

View reviewed changes

bpintea merged commit 467bc2a into elastic:main Dec 9, 2025
34 checks passed

bpintea deleted the enh/sort_before_ijoin branch December 9, 2025 12:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ESQL: Pull `OrderBy` followed by `InlineJoin` on top of it#137648

ESQL: Pull `OrderBy` followed by `InlineJoin` on top of it#137648
bpintea merged 10 commits intoelastic:mainfrom
bpintea:enh/sort_before_ijoin

bpintea commented Nov 5, 2025 •

edited by astefan

Loading

elasticsearchmachine commented Nov 5, 2025

elasticsearchmachine commented Nov 6, 2025

astefan left a comment

astefan Nov 7, 2025

bpintea Nov 10, 2025

bpintea Dec 8, 2025

astefan Nov 7, 2025

bpintea Nov 10, 2025

astefan left a comment

astefan Nov 21, 2025

astefan Nov 21, 2025

bpintea Dec 8, 2025

astefan Nov 21, 2025

bpintea Dec 8, 2025

astefan Nov 21, 2025

bpintea Dec 8, 2025

alex-spies left a comment •

edited

Loading

alex-spies Dec 3, 2025

bpintea Dec 8, 2025

alex-spies Dec 9, 2025

alex-spies Dec 3, 2025

bpintea Dec 8, 2025

alex-spies Dec 9, 2025

Uh oh!

Uh oh!

alex-spies left a comment

bpintea commented Dec 9, 2025

Uh oh!

Labels

4 participants

Conversation

bpintea commented Nov 5, 2025 • edited by astefan Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

elasticsearchmachine commented Nov 5, 2025

elasticsearchmachine commented Nov 6, 2025

astefan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

astefan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alex-spies left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

alex-spies left a comment

Choose a reason for hiding this comment

bpintea commented Dec 9, 2025

Uh oh!

Labels

4 participants

bpintea commented Nov 5, 2025 •

edited by astefan

Loading

alex-spies left a comment •

edited

Loading