Skip to content

ESQL: Fix INLINE STATS GROUP BY null being incorrectly pruned#140027

Merged
elasticsearchmachine merged 14 commits intoelastic:mainfrom
kanoshiou:fix-inline-stats-null-grouping
Jan 22, 2026
Merged

ESQL: Fix INLINE STATS GROUP BY null being incorrectly pruned#140027
elasticsearchmachine merged 14 commits intoelastic:mainfrom
kanoshiou:fix-inline-stats-null-grouping

Conversation

@kanoshiou
Copy link
Contributor

@kanoshiou kanoshiou commented Dec 29, 2025

This PR fixes the issue where INLINE STATS GROUP BY null was being incorrectly pruned by PruneLeftJoinOnNullMatchingField.

Fixes #139887

Problem

For query:

FROM employees
| INLINE STATS c = COUNT(*) BY n = null
| KEEP c, n
| LIMIT 3

During LogicalPlanOptimizer:

Limit[3[INTEGER],false,false]
\_EsqlProject[[c{r}#2, n{r}#4]]
  \_InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
    |_Eval[[null[NULL] AS n#4]]
    | \_EsRelation[employees][<no-fields>{r$}#7]
    \_Aggregate[[n{r}#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}#4]]
      \_StubRelation[[<no-fields>{r$}#7, n{r}#4]]

The following join node:

InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
|_Eval[[null[NULL] AS n#4]]
| \_EsRelation[employees][<no-fields>{r$}#7]
\_Aggregate[[n{r}#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}#4]]
  \_StubRelation[[<no-fields>{r$}#7, n{r}#4]]

should NOT have PruneLeftJoinOnNullMatchingField applied, because the right side is an Aggregate (originating from INLINE STATS). Since STATS supports GROUP BY null, the join key being null is a valid use case. Pruning this join would incorrectly eliminate the aggregation results, changing the query semantics.

During LocalLogicalPlanOptimizer:

ProjectExec[[c{r}#2, n{r}#4]]
\_LimitExec[3[INTEGER],null]
  \_ExchangeExec[[c{r}#2, n{r}#4],false]
    \_FragmentExec[filter=null, estimatedRowSize=0, reducer=[], fragment=[<>
Project[[c{r}#2, n{r}#4]]
\_Limit[3[INTEGER],false,false]
  \_InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
    |_Eval[[null[NULL] AS n#4]]
    | \_EsRelation[employees][<no-fields>{r$}#7]
    \_LocalRelation[[c{r}#2, n{r}#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}]<>]]

The following join node:

InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
|_Eval[[null[NULL] AS n#4]]
| \_EsRelation[employees][<no-fields>{r$}#7]
\_LocalRelation[[c{r}#2, n{r}#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}]

should NOT have PruneLeftJoinOnNullMatchingField applied, because the right side is a LocalRelation (the Aggregate was optimized into a LocalRelation containing the pre-computed aggregation results). Pruning this join when the join key is null would discard the valid aggregation results stored in the LocalRelation, incorrectly producing null values instead of the expected count.

Solution

The fix ensures that PruneLeftJoinOnNullMatchingField only applies to LOOKUP JOIN nodes, where join.right() is an EsRelation. For INLINE STATS joins, the right side can be:

  • Aggregate (before optimization), or
  • LocalRelation (after the aggregate is optimized)

By checking join.right() instanceof EsRelation, we correctly skip the pruning optimization for INLINE STATS joins, preserving the expected query results when grouping by null.

@elasticsearchmachine elasticsearchmachine added external-contributor Pull request authored by a developer outside the Elasticsearch team needs:triage Requires assignment of a team area label v9.4.0 labels Dec 29, 2025
@gbanasiak gbanasiak added the :Analytics/ES|QL AKA ESQL label Dec 30, 2025
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) and removed needs:triage Requires assignment of a team area label labels Dec 30, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@astefan astefan requested review from astefan and bpintea December 30, 2025 08:49
# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
@kanoshiou kanoshiou force-pushed the fix-inline-stats-null-grouping branch from 7061c46 to e1d44f8 Compare January 10, 2026 11:58
kanoshiou and others added 2 commits January 14, 2026 23:12
# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/inlinestats.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
@astefan astefan added the >bug label Jan 19, 2026
@astefan astefan self-assigned this Jan 19, 2026
// side can be Aggregate or LocalRelation (when optimized), and the join key is always the grouping. Since
// STATS supports GROUP BY null, pruning the join when the join key (grouping) is null would incorrectly
// change the query results.
if (join.config().type() == LEFT && join.right() instanceof EsRelation) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kanoshiou thank you for adding the code comment here.
If this line is essential to this bug's fix, I feel that the check about the type of the join (lookup, inline) should be stronger than instanceof EsRelation. Why not join instanceof LookupJoin or join instaceof InlineStats == false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right! instanceof InlineJoin == false is more precise than my original instanceof EsRelation check. Updated in 71f07f0 - thanks for catching this!

Comment on lines 1886 to 1890
/**
* Fix folding of coalesce function
* https://github.com/elastic/elasticsearch/issues/139887
* https://github.com/elastic/elasticsearch/issues/139344
*/
FIX_FOLD_COALESCE,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe @luigidellaquila referenced the wrong issue. #140028 (comment)

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@astefan
Copy link
Contributor

astefan commented Jan 20, 2026

buildkite test this

@astefan
Copy link
Contributor

astefan commented Jan 20, 2026

CC @bpintea on the test failure above:

REPRODUCE WITH: ./gradlew ":x-pack:plugin:esql:qa:server:mixed-cluster:v9.3.0#javaRestTest" -Dtests.class="org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT" -Dtests.method="test {csv-spec:unmapped-nullify.InlinestatsCount}" -Dtests.seed=48F92E2B2C099573 -Dtests.bwc=true -Dtests.locale=blo-Latn-BJ -Dtests.timezone=Indian/Mahe -Druntime.java=25
--
 
MixedClusterEsqlSpecIT > test {csv-spec:unmapped-nullify.InlinestatsCount} FAILED
java.lang.AssertionError:
Data mismatch:
row 0 column 2:a list containing
row 0 column 2:0: expected <1L> but was "null"
Actual:
x:integer \| does_not_exist:null \| c:long \| s:double \| d:null
1         \| null                \| null   \| null     \| null
 
Expected:
x:integer \| does_not_exist:null \| c:long \| s:double \| d:null
1         \| null                \| 1      \| null     \| null
at __randomizedtesting.SeedInfo.seed([48F92E2B2C099573:C0AD11F182F5F88B]:0)
at org.junit.Assert.fail(Assert.java:89)
at org.elasticsearch.xpack.esql.CsvAssert.dataFailure(CsvAssert.java:317)
at org.elasticsearch.xpack.esql.CsvAssert.assertData(CsvAssert.java:271)
at org.elasticsearch.xpack.esql.qa.rest.EsqlSpecTestCase.assertResults(EsqlSpecTestCase.java:423)
at org.elasticsearch.xpack.esql.qa.rest.EsqlSpecTestCase.doTest(EsqlSpecTestCase.java:379)
at org.elasticsearch.xpack.esql.qa.rest.EsqlSpecTestCase.doTest(EsqlSpecTestCase.java:344)
at org.elasticsearch.xpack.esql.qa.rest.EsqlSpecTestCase.test(EsqlSpecTestCase.java:229)
@kanoshiou
Copy link
Contributor Author

kanoshiou commented Jan 20, 2026

@astefan My bad, I missed adding the new capability to this test. I’ll update the branch once the CI finishes, just in case any other tests fail.

@astefan
Copy link
Contributor

astefan commented Jan 20, 2026

buildkite test this

@kanoshiou
Copy link
Contributor Author

Thanks @astefan.

Regarding the failed test in elasticsearch-ci/part-4, I’m unable to reproduce it locally, which suggests it might be an issue specific to the CI runner. Let's see how this run goes.

@astefan astefan added v9.3.1 v9.2.5 auto-backport Automatically create backport pull requests when merged labels Jan 20, 2026
@kanoshiou
Copy link
Contributor Author

Thanks for following up and for the effort you put into this PR, @astefan!

Just a friendly reminder to merge this whenever you're ready, as this PR isn't tagged with auto-merge-without-approval.

@astefan
Copy link
Contributor

astefan commented Jan 21, 2026

@kanoshiou thank you for the reminder. I want also @bpintea's opinion on this PR as he's the original author of null pruning, that's the main reason I didn't yet merge this PR. Thank you for your patience and understanding.

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix, LGTM.

Looking at the code, these following cases are covered too, but for good measure, do you think you could please add two more tests, more for future proofing than this change check:

  • one where we have another grouping key, that's null (like BY c = null, d = null)
  • one where we have another grouping key, that's not null (like BY c = null, emp_no)
@kanoshiou
Copy link
Contributor Author

Thank you for the review, @bpintea! I’ve added the tests you suggested. 😊

@bpintea
Copy link
Contributor

bpintea commented Jan 22, 2026

buildkite test this

@bpintea bpintea added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jan 22, 2026
@elasticsearchmachine elasticsearchmachine merged commit f3ccb70 into elastic:main Jan 22, 2026
38 checks passed
astefan pushed a commit to astefan/elasticsearch that referenced this pull request Jan 22, 2026
…tic#140027)

This PR fixes the issue where `INLINE STATS GROUP BY null` was being
incorrectly pruned by `PruneLeftJoinOnNullMatchingField`.

Fixes elastic#139887

## Problem For query:

```
FROM employees
| INLINE STATS c = COUNT(*) BY n = null
| KEEP c, n
| LIMIT 3
```

During `LogicalPlanOptimizer`:

```
Limit[3[INTEGER],false,false]
\_EsqlProject[[c{r}elastic#2, n{r}elastic#4]]
  \_InlineJoin[LEFT,[n{r}elastic#4],[n{r}elastic#4]]
    |_Eval[[null[NULL] AS n#4]]
    | \_EsRelation[employees][<no-fields>{r$}elastic#7]
    \_Aggregate[[n{r}elastic#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}elastic#4]]
      \_StubRelation[[<no-fields>{r$}elastic#7, n{r}elastic#4]]
```

The following join node:

```
InlineJoin[LEFT,[n{r}elastic#4],[n{r}elastic#4]]
|_Eval[[null[NULL] AS n#4]]
| \_EsRelation[employees][<no-fields>{r$}elastic#7]
\_Aggregate[[n{r}elastic#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}elastic#4]]
  \_StubRelation[[<no-fields>{r$}elastic#7, n{r}elastic#4]]
```

should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the
right side is an `Aggregate` (originating from `INLINE STATS`). Since
`STATS` supports `GROUP BY null`, the join key being null is a valid use
case. Pruning this join would incorrectly eliminate the aggregation
results, changing the query semantics.

During `LocalLogicalPlanOptimizer`:

```
ProjectExec[[c{r}elastic#2, n{r}elastic#4]]
\_LimitExec[3[INTEGER],null]
  \_ExchangeExec[[c{r}elastic#2, n{r}elastic#4],false]
    \_FragmentExec[filter=null, estimatedRowSize=0, reducer=[], fragment=[<>
Project[[c{r}elastic#2, n{r}elastic#4]]
\_Limit[3[INTEGER],false,false]
  \_InlineJoin[LEFT,[n{r}elastic#4],[n{r}elastic#4]]
    |_Eval[[null[NULL] AS n#4]]
    | \_EsRelation[employees][<no-fields>{r$}elastic#7]
    \_LocalRelation[[c{r}elastic#2, n{r}elastic#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}]<>]]
```

The following join node:

```
InlineJoin[LEFT,[n{r}elastic#4],[n{r}elastic#4]]
|_Eval[[null[NULL] AS n#4]]
| \_EsRelation[employees][<no-fields>{r$}elastic#7]
\_LocalRelation[[c{r}elastic#2, n{r}elastic#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}]
```

should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the
right side is a `LocalRelation` (the `Aggregate` was optimized into a
`LocalRelation` containing the pre-computed aggregation results).
Pruning this join when the join key is null would discard the valid
aggregation results stored in the `LocalRelation`, incorrectly producing
null values instead of the expected count.

## Solution The fix ensures that `PruneLeftJoinOnNullMatchingField` only
applies to `LOOKUP JOIN` nodes, where `join.right()` is an `EsRelation`.
For `INLINE STATS` joins, the right side can be:

 - `Aggregate` (before optimization), or
 - `LocalRelation` (after the aggregate is optimized)

By checking `join.right() instanceof EsRelation`, we correctly skip the
pruning optimization for `INLINE STATS` joins, preserving the expected
query results when grouping by null.

(cherry picked from commit f3ccb70)
@astefan astefan removed the v9.2.5 label Jan 22, 2026
elasticsearchmachine pushed a commit that referenced this pull request Jan 22, 2026
) (#141095)

This PR fixes the issue where `INLINE STATS GROUP BY null` was being
incorrectly pruned by `PruneLeftJoinOnNullMatchingField`.

Fixes #139887

## Problem For query:

```
FROM employees
| INLINE STATS c = COUNT(*) BY n = null
| KEEP c, n
| LIMIT 3
```

During `LogicalPlanOptimizer`:

```
Limit[3[INTEGER],false,false]
\_EsqlProject[[c{r}#2, n{r}#4]]
  \_InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
    |_Eval[[null[NULL] AS n#4]]
    | \_EsRelation[employees][<no-fields>{r$}#7]
    \_Aggregate[[n{r}#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}#4]]
      \_StubRelation[[<no-fields>{r$}#7, n{r}#4]]
```

The following join node:

```
InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
|_Eval[[null[NULL] AS n#4]]
| \_EsRelation[employees][<no-fields>{r$}#7]
\_Aggregate[[n{r}#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}#4]]
  \_StubRelation[[<no-fields>{r$}#7, n{r}#4]]
```

should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the
right side is an `Aggregate` (originating from `INLINE STATS`). Since
`STATS` supports `GROUP BY null`, the join key being null is a valid use
case. Pruning this join would incorrectly eliminate the aggregation
results, changing the query semantics.

During `LocalLogicalPlanOptimizer`:

```
ProjectExec[[c{r}#2, n{r}#4]]
\_LimitExec[3[INTEGER],null]
  \_ExchangeExec[[c{r}#2, n{r}#4],false]
    \_FragmentExec[filter=null, estimatedRowSize=0, reducer=[], fragment=[<>
Project[[c{r}#2, n{r}#4]]
\_Limit[3[INTEGER],false,false]
  \_InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
    |_Eval[[null[NULL] AS n#4]]
    | \_EsRelation[employees][<no-fields>{r$}#7]
    \_LocalRelation[[c{r}#2, n{r}#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}]<>]]
```

The following join node:

```
InlineJoin[LEFT,[n{r}#4],[n{r}#4]]
|_Eval[[null[NULL] AS n#4]]
| \_EsRelation[employees][<no-fields>{r$}#7]
\_LocalRelation[[c{r}#2, n{r}#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}]
```

should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the
right side is a `LocalRelation` (the `Aggregate` was optimized into a
`LocalRelation` containing the pre-computed aggregation results).
Pruning this join when the join key is null would discard the valid
aggregation results stored in the `LocalRelation`, incorrectly producing
null values instead of the expected count.

## Solution The fix ensures that `PruneLeftJoinOnNullMatchingField` only
applies to `LOOKUP JOIN` nodes, where `join.right()` is an `EsRelation`.
For `INLINE STATS` joins, the right side can be:

 - `Aggregate` (before optimization), or
 - `LocalRelation` (after the aggregate is optimized)

By checking `join.right() instanceof EsRelation`, we correctly skip the
pruning optimization for `INLINE STATS` joins, preserving the expected
query results when grouping by null.

(cherry picked from commit f3ccb70)

Co-authored-by: kanoshiou <uiaao@tuta.io>
@astefan astefan removed backport pending auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) auto-backport Automatically create backport pull requests when merged labels Jan 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >bug external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.3.1 v9.4.0

5 participants