Description
From https://github.com/elastic/elasticsearch/pull/132417/files#r2284843141
For LOOKUP JOIN, we push down limits via duplication, rather than pulling up order by.
Pushing down limits has the advantage that far fewer rows need to be joined. Without pushing down, the limit becomes less meaningful, because the query turns into "run on the whole data, and then throw most of it away at the last moment".
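To make the benefit concrete, here is a minimal Python sketch of limit duplication below a left join. It is a toy model, not Elasticsearch code: `left_join`, `limit`, and the sample rows are all hypothetical, and it assumes the lookup side has at most one row per key, so every left row produces exactly one output row.

```python
def left_join(left, right, key):
    """Naive left join over lists of dicts (toy model, not ES|QL code)."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    out = []
    for l in left:
        matches = index.get(l[key])
        if matches:
            out.extend({**l, **m} for m in matches)
        else:
            # Unmatched left rows are kept, as in any left join.
            out.append(dict(l))
    return out


def limit(rows, n):
    return rows[:n]


# Hypothetical data: 1000 left rows, a tiny lookup table with unique keys.
left = [{"emp_no": i, "languages": i % 3} for i in range(1000)]
right = [{"languages": 0, "name": "en"}, {"languages": 1, "name": "fr"}]

# Limit applied only at the end: all 1000 rows are joined first.
late = limit(left_join(left, right, "languages"), 5)

# Limit duplicated below the join: only 5 rows are joined.
early = limit(left_join(limit(left, 5), right, "languages"), 5)

assert late == early
```

The outer limit stays in place because, in general, a left join can emit more than one row per left row; only the inner copy is new.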
The problem is, of course, that StubRelation stands for the inline join's whole left hand side, and this prevents duplicating limits past the inline join as it would also include the limit in the computed stats.

I think this can be solved by making it more explicit which logical plan the stubrelation stands for, actually. We could add a StubRelationTarget plan node that identifies the target of the stub relation via an id. Then we could have

    EsqlProject[[emp_no{f}#11, avg{r}#5, languages{f}#14, gender{f}#13]]
    \_Limit[5]
      \_InlineJoin[LEFT,[languages{f}#14],[languages{f}#14],[languages{r}#14]]
        |_TopN[[Order[emp_no{f}#11,ASC,LAST]],5[INTEGER]]
        | \_StubRelationTarget[#1]
        |   \_EsRelation[test][_meta_field{f}#17, emp_no{f}#11, first_name{f}#12, ..]
        \_Project[[avg{r}#5, languages{f}#14]]
          \_Eval[[$$SUM$avg$0{r$}#22 / $$COUNT$avg$1{r$}#23 AS avg#5]]
            \_Aggregate[[languages{f}#14],[SUM(salary{f}#16,true[BOOLEAN]) AS $$SUM$avg$0#22, COUNT(salary{f}#16,true[BOOLEAN]) AS $$COUNT$avg$1#23, languages{f}#14]]
              \_StubRelation[target=#1][[_meta_field{f}#17, emp_no{f}#11, first_name{f}#12, gender{f}#13, hire_date{f}#18, job{f}#19, job.raw{f}#20, languages{f}#14, last_name{f}#15, long_noidx{f}#21, salary{f}#16]]

In particular, the stub relation target can be ignored during optimization of the second phase of the inline stats, pushing the topn 5 all the way to Lucene.
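A rough Python sketch of how resolving a stub by id could work. This is only an illustration of the idea, not the Elasticsearch plan classes: `Node`, `find_target`, and `resolve_stubs` are hypothetical names, and the plan is reduced to node names plus an optional target id.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    name: str
    children: tuple = ()
    target_id: Optional[int] = None  # only set on the two stub node kinds


def find_target(plan, target_id):
    """Depth-first search for the StubRelationTarget carrying target_id."""
    if plan.name == "StubRelationTarget" and plan.target_id == target_id:
        return plan
    for child in plan.children:
        found = find_target(child, target_id)
        if found is not None:
            return found
    return None


def resolve_stubs(plan, root):
    """Replace each StubRelation by the subplan *below* its target node."""
    if plan.name == "StubRelation":
        target = find_target(root, plan.target_id)
        if target is None:
            raise ValueError(f"no StubRelationTarget with id {plan.target_id}")
        return target.children[0]
    resolved = tuple(resolve_stubs(c, root) for c in plan.children)
    return Node(plan.name, resolved, plan.target_id)


# Simplified version of the plan above: the TopN sits above the target.
es = Node("EsRelation[test]")
left = Node("TopN[5]", (Node("StubRelationTarget", (es,), 1),))
right = Node("Aggregate", (Node("StubRelation", (), 1),))
join = Node("InlineJoin", (left, right))

resolved_right = resolve_stubs(right, join)
assert resolved_right == Node("Aggregate", (es,))
```

Because the stub resolves to whatever is below the target, the TopN above it never leaks into the aggregate's input, while the left branch keeps the TopN for its own optimization.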
Another approach is to let go of the StubRelation and explicitly replace it by the plan that the StubRelation is standing in for. That would be "honest" in the sense that we admit that the left and right hand sides of the joins need different optimizations and can be computed separately, sometimes.
Implementation-wise, it means that the InlineJoin needs to be replaced by a simpler left join node that will be planned as a hash join.
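As a toy illustration of this second approach, the Python sketch below computes the two sides separately and combines them with a hash-based left join. Everything in it is an assumption made for the example (the function names, the sample rows, the `avg` output field); it only mirrors the shape of the query above, with the stats computed over the whole data and the TopN applied to the left side only.

```python
from collections import defaultdict


def avg_salary_by_language(rows):
    """Right side: aggregate over the whole relation (no limit applied)."""
    acc = defaultdict(lambda: [0, 0])  # language -> [sum, count]
    for row in rows:
        entry = acc[row["languages"]]
        entry[0] += row["salary"]
        entry[1] += 1
    return {lang: total / count for lang, (total, count) in acc.items()}


def hash_left_join(left_rows, build_side, key, out_field):
    """Probe a prebuilt hash table; unmatched left rows get None."""
    return [{**row, out_field: build_side.get(row[key])} for row in left_rows]


# Hypothetical sample data.
employees = [
    {"emp_no": 3, "languages": 1, "salary": 100},
    {"emp_no": 1, "languages": 2, "salary": 300},
    {"emp_no": 2, "languages": 1, "salary": 200},
    {"emp_no": 4, "languages": 2, "salary": 500},
]

# Right side sees all rows; left side is reduced by a TopN first.
stats = avg_salary_by_language(employees)
top2 = sorted(employees, key=lambda r: r["emp_no"])[:2]
result = hash_left_join(top2, stats, "languages", "avg")
```

Here the averages still reflect all four employees even though only two rows are joined.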