ESQL: INLINESTATS topN optimization on the left-hand side only

Description

From https://github.com/elastic/elasticsearch/pull/132417/files#r2284843141

For LOOKUP JOIN, we push down limits via duplication, rather than pulling up order by.

Pushing down limits has the advantage that a lot fewer rows need to be joined on. Without pushing down, the limit becomes less meaningful, because it turns into a "run the query on the whole data, and then throw most of it away" at the last moment.

The problem is, of course, that StubRelation stands for the inline join's whole left hand side, and this prevents duplicating limits past the inline join as it would also include the limit in the computed stats.

I think this can be solved by making it more explicit which logical plan the stubrelation stands for, actually. We could add a StubRelationTarget plan node that identifies the target of the stub relation via an id. Then we could have
     * EsqlProject[[emp_no{f}#11, avg{r}#5, languages{f}#14, gender{f}#13]]
     * \_Limit[5]
     *   \_InlineJoin[LEFT,[languages{f}#14],[languages{f}#14],[languages{r}#14]]
     *     |_TopN[[Order[emp_no{f}#11,ASC,LAST]],5[INTEGER]]
     *     |  \_StubRelationTarget[#1]
     *     |    \_EsRelation[test][_meta_field{f}#17, emp_no{f}#11, first_name{f}#12, ..]
     *     \_Project[[avg{r}#5, languages{f}#14]]
     *       \_Eval[[$$SUM$avg$0{r$}#22 / $$COUNT$avg$1{r$}#23 AS avg#5]]
     *         \_Aggregate[[languages{f}#14],[SUM(salary{f}#16,true[BOOLEAN]) AS $$SUM$avg$0#22, COUNT(salary{f}#16,true[BOOLEAN]) AS
     *              $$COUNT$avg$1#23, languages{f}#14]]
     *           \_StubRelation[target=#1][[_meta_field{f}#17, emp_no{f}#11, first_name{f}#12, gender{f}#13, hire_date{f}#18, job{f}#19, job.raw{f}#20,
     *                  languages{f}#14, last_name{f}#15, long_noidx{f}#21, salary{f}#16]]
In particular, the stub relation target can be ignored during optimization of the second phase of the inline stats, pushing the topn 5 all the way to Lucene.

Another approach is to let go of the StubRelation and explicitly replace it by the plan that the StubRelation is standing in for. That would be "honest" in the sense that we admit that the left and right hand sides of the joins need different optimizations and can be computed separately, sometimes.

Implementation-wise, it means that the InlineJoin needs to be replaced by a simpler left join node that will be planned as hash join.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ESQL: INLINESTATS topN optimization on the left-hand side only #133120

Description

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

ESQL: INLINESTATS topN optimization on the left-hand side only #133120

Description

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions