Skip to content

ESQL: MV_EXPAND and Sort: wrong optimizations or unplannable with "unknown physical plan node [OrderExec]" #120803

@alex-spies

Description

@alex-spies

Because MV_EXPAND generally cannot be pushed down past a SORT, it can cause situations where a SORT/OrderBy node cannot be combined with a downstream LIMIT node. When we cannot combine an OrderBy with a Limit, it gets mapped to a OrderExec - a physical plan node that we cannot map to an operator because we don't implement unbounded sorts (sorts without a built-in limit, i.e. TopNs).

In some cases, queries with MV_EXPAND and SORT just fail with the cryptic message unknown physical plan node [OrderExec]. In other cases, the query is optimized to make it run somehow, but I believe this is not actually correct for all cases.

I think this requires:

  • A better error message. In fact, I think we shouldn't ever map to OrderExec. We probably should implement PostOptimizationVerificationAware on the OrderBy class and have the verification fail if a node is even present after optimization, ideally with a suggestion on what a user can try to do.
  • Double checking the rule AddDefaultTopN for correctness.

In the following examples, consider a simple index test with fields field1 and field2.

Example 1

from test* | sort field1 | mv_expand field1 | sort field2 | limit 1

This should be impossible to run, but the query does not fail. That's because AddDefaultTopN turns the OrderBy into a TopN with limit 10000:

[2025-01-24T15:33:30,464][TRACE][o.e.x.e.o.LogicalPlanOptimizer] [runTask-0] Rule logical.AddDefaultTopN applied
TopN[[Order[field2{f}#26,ASC,LAST]],2[INTEGER]]     = TopN[[Order[field2{f}#26,ASC,LAST]],2[INTEGER]]
\_MvExpand[field1{f}#25,field1{r}#27,null]          = \_MvExpand[field1{f}#25,field1{r}#27,null]
  \_OrderBy[[Order[field1{f}#25,ASC,LAST]]]         !   \_TopN[[Order[field1{f}#25,ASC,LAST]],10000[INTEGER]]
    \_EsRelation[test*][field1{f}#25, field2{f}#26] =     \_EsRelation[test*][field1{f}#25, field2{f}#26]

This is simply wrong in cases where the top 10k documents by field1 do not include the single top 1 document by field2. However, it would be correct to eliminate the sort field1 altogether, because after the mv_expand field1 we sort again, anyway. It'd also be correct to exchange the positions of the mv_expand field1 with sort field2 because sorting by a non-expanded field is unaffected by the expansion. That means the query is equivalent to

from test* | sort field1 | sort field2 | mv_expand field1| limit 1

where the two sorts next to each other would be pruned by PushDownAndCombineOrderBy. This would end up performing the TopN operation before the mv_expand, allowing us to push this down to Lucene and making the query fast.

Example 2

from test* | sort field1 | mv_expand field2 | sort field2 | limit 1

Again, AddDefaultTopN wrongly discards anything other than the top 10k documents by field1:

[2025-01-24T15:40:45,503][TRACE][o.e.x.e.o.LogicalPlanOptimizer] [runTask-0] Rule logical.AddDefaultTopN applied
TopN[[Order[field2{r}#35,ASC,LAST]],2[INTEGER]]     = TopN[[Order[field2{r}#35,ASC,LAST]],2[INTEGER]]
\_MvExpand[field2{f}#34,field2{r}#35,null]          = \_MvExpand[field2{f}#34,field2{r}#35,null]
  \_OrderBy[[Order[field1{f}#33,ASC,LAST]]]         !   \_TopN[[Order[field1{f}#33,ASC,LAST]],10000[INTEGER]]
    \_EsRelation[test*][field1{f}#33, field2{f}#34] =     \_EsRelation[test*][field1{f}#33, field2{f}#34]

In this case, it'd be correct to push down mv_expand field2 past sort field1. If we did that, we'd have two sorts next to each other, essentially as if we had written

from test* | mv_expand field2 | sort field1 | sort field2 | limit 1

and in this case, the sort field1 gets correctly eliminated:

[2025-01-24T15:45:03,381][TRACE][o.e.x.e.o.LogicalPlanOptimizer] [runTask-0] Rule logical.PushDownAndCombineOrderBy applied
Limit[1[INTEGER]]                                     = Limit[1[INTEGER]]
\_OrderBy[[Order[field2{r}#43,ASC,LAST]]]             = \_OrderBy[[Order[field2{r}#43,ASC,LAST]]]
  \_OrderBy[[Order[field1{f}#41,ASC,LAST]]]           !   \_MvExpand[field2{f}#42,field2{r}#43,null]
    \_MvExpand[field2{f}#42,field2{r}#43,null]        !     \_EsRelation[test*][field1{f}#41, field2{f}#42]
      \_EsRelation[test*][field1{f}#41, field2{f}#42] ! 

Example 3

from test* | sort field1 | mv_expand field1 | sort field1 | limit 1

It'd be correct to eliminate the first sort field1. It'd be incorrect per se to exchange the mv_expand field1 with any of the sorts left or right of it to achieve that.

Example 4

from test* | sort field1 | mv_expand field2 | sort field1 | limit 1

It'd be correct to eliminate the first or second sort field1. It'd be correct per se to exchange the mv_expand field1 with any of the sorts left or right of it to achieve that. But eliminating/pushgin down the rightmost sort field1 would lead to a potentially more optimial plan due to being able to push down a Top1 to Lucene.

Example 5

from test* | limit 10000 | sort field1 | mv_expand field2 | sort field2 | limit 2

In this case, there's not even AddDefaultTopN jumping in to the rescue (even though it did so wrongly in Example 2):

"error":{"root_cause":[{"type":"esql_illegal_argument_exception","reason":"unknown physical plan node [OrderExec]"}],"type":"esql_illegal_argument_exception","reason":"unknown physical plan node [OrderExec]"},"status":500}

Similarly to Example 2, the first sort field1 is actually fine to eliminate.

Example 6

from test* | limit 10000 | sort field1 | mv_expand field1 | sort field2 | limit 2

The same as in Example 5, we get an illegal argument exception with a 500. But again, the sort field1 is redundant.

Metadata

Metadata

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions