ESQL: Fix INLINE STATS GROUP BY null being incorrectly pruned#140027
ESQL: Fix INLINE STATS GROUP BY null being incorrectly pruned#140027elasticsearchmachine merged 14 commits intoelastic:mainfrom
INLINE STATS GROUP BY null being incorrectly pruned#140027Conversation
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
7061c46 to
e1d44f8
Compare
# Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/inlinestats.csv-spec # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
| // side can be Aggregate or LocalRelation (when optimized), and the join key is always the grouping. Since | ||
| // STATS supports GROUP BY null, pruning the join when the join key (grouping) is null would incorrectly | ||
| // change the query results. | ||
| if (join.config().type() == LEFT && join.right() instanceof EsRelation) { |
There was a problem hiding this comment.
@kanoshiou thank you for adding the code comment here.
If this line is essential to this bug's fix, I feel that the check about the type of the join (lookup, inline) should be stronger than instanceof EsRelation. Why not join instanceof LookupJoin or join instaceof InlineStats == false?
There was a problem hiding this comment.
You're absolutely right! instanceof InlineJoin == false is more precise than my original instanceof EsRelation check. Updated in 71f07f0 - thanks for catching this!
| /** | ||
| * Fix folding of coalesce function | ||
| * https://github.com/elastic/elasticsearch/issues/139887 | ||
| * https://github.com/elastic/elasticsearch/issues/139344 | ||
| */ | ||
| FIX_FOLD_COALESCE, |
There was a problem hiding this comment.
I believe @luigidellaquila referenced the wrong issue. #140028 (comment)
|
buildkite test this |
|
CC @bpintea on the test failure above: |
|
@astefan My bad, I missed adding the new capability to this test. I’ll update the branch once the CI finishes, just in case any other tests fail. |
|
buildkite test this |
|
Thanks @astefan. Regarding the failed test in elasticsearch-ci/part-4, I’m unable to reproduce it locally, which suggests it might be an issue specific to the CI runner. Let's see how this run goes. |
|
Thanks for following up and for the effort you put into this PR, @astefan! Just a friendly reminder to merge this whenever you're ready, as this PR isn't tagged with |
|
@kanoshiou thank you for the reminder. I want also @bpintea's opinion on this PR as he's the original author of |
bpintea
left a comment
There was a problem hiding this comment.
Thanks for the fix, LGTM.
Looking at the code, these following cases are covered too, but for good measure, do you think you could please add two more tests, more for future proofing than this change check:
- one where we have another grouping key, that's
null(likeBY c = null, d = null) - one where we have another grouping key, that's not null (like
BY c = null, emp_no)
|
Thank you for the review, @bpintea! I’ve added the tests you suggested. 😊 |
|
buildkite test this |
…tic#140027) This PR fixes the issue where `INLINE STATS GROUP BY null` was being incorrectly pruned by `PruneLeftJoinOnNullMatchingField`. Fixes elastic#139887 ## Problem For query: ``` FROM employees | INLINE STATS c = COUNT(*) BY n = null | KEEP c, n | LIMIT 3 ``` During `LogicalPlanOptimizer`: ``` Limit[3[INTEGER],false,false] \_EsqlProject[[c{r}elastic#2, n{r}elastic#4]] \_InlineJoin[LEFT,[n{r}elastic#4],[n{r}elastic#4]] |_Eval[[null[NULL] AS n#4]] | \_EsRelation[employees][<no-fields>{r$}elastic#7] \_Aggregate[[n{r}elastic#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}elastic#4]] \_StubRelation[[<no-fields>{r$}elastic#7, n{r}elastic#4]] ``` The following join node: ``` InlineJoin[LEFT,[n{r}elastic#4],[n{r}elastic#4]] |_Eval[[null[NULL] AS n#4]] | \_EsRelation[employees][<no-fields>{r$}elastic#7] \_Aggregate[[n{r}elastic#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}elastic#4]] \_StubRelation[[<no-fields>{r$}elastic#7, n{r}elastic#4]] ``` should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the right side is an `Aggregate` (originating from `INLINE STATS`). Since `STATS` supports `GROUP BY null`, the join key being null is a valid use case. Pruning this join would incorrectly eliminate the aggregation results, changing the query semantics. During `LocalLogicalPlanOptimizer`: ``` ProjectExec[[c{r}elastic#2, n{r}elastic#4]] \_LimitExec[3[INTEGER],null] \_ExchangeExec[[c{r}elastic#2, n{r}elastic#4],false] \_FragmentExec[filter=null, estimatedRowSize=0, reducer=[], fragment=[<> Project[[c{r}elastic#2, n{r}elastic#4]] \_Limit[3[INTEGER],false,false] \_InlineJoin[LEFT,[n{r}elastic#4],[n{r}elastic#4]] |_Eval[[null[NULL] AS n#4]] | \_EsRelation[employees][<no-fields>{r$}elastic#7] \_LocalRelation[[c{r}elastic#2, n{r}elastic#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}]<>]] ``` The following join node: ``` InlineJoin[LEFT,[n{r}elastic#4],[n{r}elastic#4]] |_Eval[[null[NULL] AS n#4]] | \_EsRelation[employees][<no-fields>{r$}elastic#7] \_LocalRelation[[c{r}elastic#2, n{r}elastic#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}] ``` should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the right side is a `LocalRelation` (the `Aggregate` was optimized into a `LocalRelation` containing the pre-computed aggregation results). Pruning this join when the join key is null would discard the valid aggregation results stored in the `LocalRelation`, incorrectly producing null values instead of the expected count. ## Solution The fix ensures that `PruneLeftJoinOnNullMatchingField` only applies to `LOOKUP JOIN` nodes, where `join.right()` is an `EsRelation`. For `INLINE STATS` joins, the right side can be: - `Aggregate` (before optimization), or - `LocalRelation` (after the aggregate is optimized) By checking `join.right() instanceof EsRelation`, we correctly skip the pruning optimization for `INLINE STATS` joins, preserving the expected query results when grouping by null. (cherry picked from commit f3ccb70)
) (#141095) This PR fixes the issue where `INLINE STATS GROUP BY null` was being incorrectly pruned by `PruneLeftJoinOnNullMatchingField`. Fixes #139887 ## Problem For query: ``` FROM employees | INLINE STATS c = COUNT(*) BY n = null | KEEP c, n | LIMIT 3 ``` During `LogicalPlanOptimizer`: ``` Limit[3[INTEGER],false,false] \_EsqlProject[[c{r}#2, n{r}#4]] \_InlineJoin[LEFT,[n{r}#4],[n{r}#4]] |_Eval[[null[NULL] AS n#4]] | \_EsRelation[employees][<no-fields>{r$}#7] \_Aggregate[[n{r}#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}#4]] \_StubRelation[[<no-fields>{r$}#7, n{r}#4]] ``` The following join node: ``` InlineJoin[LEFT,[n{r}#4],[n{r}#4]] |_Eval[[null[NULL] AS n#4]] | \_EsRelation[employees][<no-fields>{r$}#7] \_Aggregate[[n{r}#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}#4]] \_StubRelation[[<no-fields>{r$}#7, n{r}#4]] ``` should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the right side is an `Aggregate` (originating from `INLINE STATS`). Since `STATS` supports `GROUP BY null`, the join key being null is a valid use case. Pruning this join would incorrectly eliminate the aggregation results, changing the query semantics. During `LocalLogicalPlanOptimizer`: ``` ProjectExec[[c{r}#2, n{r}#4]] \_LimitExec[3[INTEGER],null] \_ExchangeExec[[c{r}#2, n{r}#4],false] \_FragmentExec[filter=null, estimatedRowSize=0, reducer=[], fragment=[<> Project[[c{r}#2, n{r}#4]] \_Limit[3[INTEGER],false,false] \_InlineJoin[LEFT,[n{r}#4],[n{r}#4]] |_Eval[[null[NULL] AS n#4]] | \_EsRelation[employees][<no-fields>{r$}#7] \_LocalRelation[[c{r}#2, n{r}#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}]<>]] ``` The following join node: ``` InlineJoin[LEFT,[n{r}#4],[n{r}#4]] |_Eval[[null[NULL] AS n#4]] | \_EsRelation[employees][<no-fields>{r$}#7] \_LocalRelation[[c{r}#2, n{r}#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}] ``` should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the right side is a `LocalRelation` (the `Aggregate` was optimized into a `LocalRelation` containing the pre-computed aggregation results). Pruning this join when the join key is null would discard the valid aggregation results stored in the `LocalRelation`, incorrectly producing null values instead of the expected count. ## Solution The fix ensures that `PruneLeftJoinOnNullMatchingField` only applies to `LOOKUP JOIN` nodes, where `join.right()` is an `EsRelation`. For `INLINE STATS` joins, the right side can be: - `Aggregate` (before optimization), or - `LocalRelation` (after the aggregate is optimized) By checking `join.right() instanceof EsRelation`, we correctly skip the pruning optimization for `INLINE STATS` joins, preserving the expected query results when grouping by null. (cherry picked from commit f3ccb70) Co-authored-by: kanoshiou <uiaao@tuta.io>
This PR fixes the issue where
INLINE STATS GROUP BY nullwas being incorrectly pruned byPruneLeftJoinOnNullMatchingField.Fixes #139887
Problem
For query:
During
LogicalPlanOptimizer:The following join node:
should NOT have
PruneLeftJoinOnNullMatchingFieldapplied, because the right side is anAggregate(originating fromINLINE STATS). SinceSTATSsupportsGROUP BY null, the join key being null is a valid use case. Pruning this join would incorrectly eliminate the aggregation results, changing the query semantics.During
LocalLogicalPlanOptimizer:The following join node:
should NOT have
PruneLeftJoinOnNullMatchingFieldapplied, because the right side is aLocalRelation(theAggregatewas optimized into aLocalRelationcontaining the pre-computed aggregation results). Pruning this join when the join key is null would discard the valid aggregation results stored in theLocalRelation, incorrectly producing null values instead of the expected count.Solution
The fix ensures that
PruneLeftJoinOnNullMatchingFieldonly applies toLOOKUP JOINnodes, wherejoin.right()is anEsRelation. ForINLINE STATSjoins, the right side can be:Aggregate(before optimization), orLocalRelation(after the aggregate is optimized)By checking
join.right() instanceof EsRelation, we correctly skip the pruning optimization forINLINE STATSjoins, preserving the expected query results when grouping by null.