ESQL: Fix ReplaceMissingFieldsWithNull#125764
Merged
alex-spies merged 13 commits intoelastic:mainfrom Apr 3, 2025
Merged
Conversation
The change in 80125a4 is a quick fix and allows breaking an invariant of Layout. Revert that.
When encountering projections, it tries to do the job of field extraction for missing fields by injecting an Eval that creates a literal null with the same name id as the field attribute for the missing field. This is wrong: 1. We only insert an Eval in case that a Project relies on the missing attribute. There could be other plan nodes that rely on the missing attribute. 2. Even for Projects, we only insert an Eval in case we squarely project for the field - in case of aliases (e.g. from RENAME), we do nothing. 3. In case of multiple Projects that use this attribute, we create multiple attributes with the original field attribute's id, causing a wrong Layout. This triggered elastic#121754.
Collaborator
|
Hi @alex-spies, I've created a changelog YAML for you. |
Member
fang-xing-esql
left a comment
There was a problem hiding this comment.
Thank you @alex-spies, I added a couple of things that I can think of.
.../org/elasticsearch/xpack/esql/optimizer/rules/logical/local/ReplaceMissingFieldWithNull.java
Show resolved
Hide resolved
.../org/elasticsearch/xpack/esql/optimizer/rules/logical/local/ReplaceMissingFieldWithNull.java
Show resolved
Hide resolved
Contributor
Author
|
Thanks for the remarks @fang-xing-esql . I'll add a couple of comments to the code so that we don't have to scratch our heads with the same questions in the future. |
This can lead to empty output, which leads to the EsRelation being replaced by a LocalRelation with 0 rows.
This was referenced Apr 1, 2025
Collaborator
|
Hi @alex-spies, I've updated the changelog YAML for you. |
Collaborator
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
fang-xing-esql
approved these changes
Apr 2, 2025
Member
fang-xing-esql
left a comment
There was a problem hiding this comment.
LGTM, thank you @alex-spies !
…-fields-with-null
This was referenced Apr 2, 2025
…-fields-with-null
Contributor
Author
|
Last CI run was green except for elasticsearch-ci/part-1 timing out: Restarted CI as there was a merge conflict now. |
alex-spies
added a commit
that referenced
this pull request
Apr 3, 2025
* Revert changes to Layout.java The change in 80125a4 is a quick fix and allows breaking an invariant of Layout. Revert that. * Simplify ReplaceMissingFieldWithNull When encountering projections, it tries to do the job of field extraction for missing fields by injecting an Eval that creates a literal null with the same name id as the field attribute for the missing field. This is wrong: 1. We only insert an Eval in case that a Project relies on the missing attribute. There could be other plan nodes that rely on the missing attribute. 2. Even for Projects, we only insert an Eval in case we squarely project for the field - in case of aliases (e.g. from RENAME), we do nothing. 3. In case of multiple Projects that use this attribute, we create multiple attributes with the original field attribute's id, causing a wrong Layout. This triggered #121754. * Revive logic for EsRelation instead of Project * Update LocalLogicalPlanOptimizerTests * Update test expectations * Do not prune attributes from EsRelation This can lead to empty output, which leads to the EsRelation being replaced by a LocalRelation with 0 rows. * Add tests + capability * Add comments * Update docs/changelog/125764.yaml
alex-spies
added a commit
that referenced
this pull request
Apr 3, 2025
* Revert changes to Layout.java The change in 80125a4 is a quick fix and allows breaking an invariant of Layout. Revert that. * Simplify ReplaceMissingFieldWithNull When encountering projections, it tries to do the job of field extraction for missing fields by injecting an Eval that creates a literal null with the same name id as the field attribute for the missing field. This is wrong: 1. We only insert an Eval in case that a Project relies on the missing attribute. There could be other plan nodes that rely on the missing attribute. 2. Even for Projects, we only insert an Eval in case we squarely project for the field - in case of aliases (e.g. from RENAME), we do nothing. 3. In case of multiple Projects that use this attribute, we create multiple attributes with the original field attribute's id, causing a wrong Layout. This triggered #121754. * Revive logic for EsRelation instead of Project * Update LocalLogicalPlanOptimizerTests * Update test expectations * Do not prune attributes from EsRelation This can lead to empty output, which leads to the EsRelation being replaced by a LocalRelation with 0 rows. * Add tests + capability * Add comments * [CI] Auto commit changes from spotless * Update docs/changelog/125764.yaml --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Contributor
Author
Contributor
Author
Contributor
Author
|
Ah yeah, publishing the build scan failed: |
alex-spies
added a commit
to alex-spies/elasticsearch
that referenced
this pull request
Apr 3, 2025
…lastic#126166) * Revert changes to Layout.java The change in 80125a4 is a quick fix and allows breaking an invariant of Layout. Revert that. * Simplify ReplaceMissingFieldWithNull When encountering projections, it tries to do the job of field extraction for missing fields by injecting an Eval that creates a literal null with the same name id as the field attribute for the missing field. This is wrong: 1. We only insert an Eval in case that a Project relies on the missing attribute. There could be other plan nodes that rely on the missing attribute. 2. Even for Projects, we only insert an Eval in case we squarely project for the field - in case of aliases (e.g. from RENAME), we do nothing. 3. In case of multiple Projects that use this attribute, we create multiple attributes with the original field attribute's id, causing a wrong Layout. This triggered elastic#121754. * Revive logic for EsRelation instead of Project * Update LocalLogicalPlanOptimizerTests * Update test expectations * Do not prune attributes from EsRelation This can lead to empty output, which leads to the EsRelation being replaced by a LocalRelation with 0 rows. * Add tests + capability * Add comments * [CI] Auto commit changes from spotless * Update docs/changelog/125764.yaml --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co> (cherry picked from commit 96ca13a) # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
alex-spies
added a commit
to alex-spies/elasticsearch
that referenced
this pull request
Apr 3, 2025
…lastic#126166) * Revert changes to Layout.java The change in 80125a4 is a quick fix and allows breaking an invariant of Layout. Revert that. * Simplify ReplaceMissingFieldWithNull When encountering projections, it tries to do the job of field extraction for missing fields by injecting an Eval that creates a literal null with the same name id as the field attribute for the missing field. This is wrong: 1. We only insert an Eval in case that a Project relies on the missing attribute. There could be other plan nodes that rely on the missing attribute. 2. Even for Projects, we only insert an Eval in case we squarely project for the field - in case of aliases (e.g. from RENAME), we do nothing. 3. In case of multiple Projects that use this attribute, we create multiple attributes with the original field attribute's id, causing a wrong Layout. This triggered elastic#121754. * Revive logic for EsRelation instead of Project * Update LocalLogicalPlanOptimizerTests * Update test expectations * Do not prune attributes from EsRelation This can lead to empty output, which leads to the EsRelation being replaced by a LocalRelation with 0 rows. * Add tests + capability * Add comments * [CI] Auto commit changes from spotless * Update docs/changelog/125764.yaml --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co> (cherry picked from commit 96ca13a) # Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/local/ReplaceMissingFieldWithNull.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/Layout.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalLogicalPlanOptimizerTests.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java
elasticsearchmachine
pushed a commit
that referenced
this pull request
Apr 3, 2025
… (#126186) * Revert changes to Layout.java The change in 80125a4 is a quick fix and allows breaking an invariant of Layout. Revert that. * Simplify ReplaceMissingFieldWithNull When encountering projections, it tries to do the job of field extraction for missing fields by injecting an Eval that creates a literal null with the same name id as the field attribute for the missing field. This is wrong: 1. We only insert an Eval in case that a Project relies on the missing attribute. There could be other plan nodes that rely on the missing attribute. 2. Even for Projects, we only insert an Eval in case we squarely project for the field - in case of aliases (e.g. from RENAME), we do nothing. 3. In case of multiple Projects that use this attribute, we create multiple attributes with the original field attribute's id, causing a wrong Layout. This triggered #121754. * Revive logic for EsRelation instead of Project * Update LocalLogicalPlanOptimizerTests * Update test expectations * Do not prune attributes from EsRelation This can lead to empty output, which leads to the EsRelation being replaced by a LocalRelation with 0 rows. * Add tests + capability * Add comments * [CI] Auto commit changes from spotless * Update docs/changelog/125764.yaml --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co> (cherry picked from commit 96ca13a) # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
elasticsearchmachine
pushed a commit
that referenced
this pull request
Apr 3, 2025
* [8.18] ESQL: ESQL: Fix ReplaceMissingFieldsWithNull (#125764) (#126166) * Revert changes to Layout.java The change in 80125a4 is a quick fix and allows breaking an invariant of Layout. Revert that. * Simplify ReplaceMissingFieldWithNull When encountering projections, it tries to do the job of field extraction for missing fields by injecting an Eval that creates a literal null with the same name id as the field attribute for the missing field. This is wrong: 1. We only insert an Eval in case that a Project relies on the missing attribute. There could be other plan nodes that rely on the missing attribute. 2. Even for Projects, we only insert an Eval in case we squarely project for the field - in case of aliases (e.g. from RENAME), we do nothing. 3. In case of multiple Projects that use this attribute, we create multiple attributes with the original field attribute's id, causing a wrong Layout. This triggered #121754. * Revive logic for EsRelation instead of Project * Update LocalLogicalPlanOptimizerTests * Update test expectations * Do not prune attributes from EsRelation This can lead to empty output, which leads to the EsRelation being replaced by a LocalRelation with 0 rows. * Add tests + capability * Add comments * [CI] Auto commit changes from spotless * Update docs/changelog/125764.yaml --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co> (cherry picked from commit 96ca13a) # Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/local/ReplaceMissingFieldWithNull.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/Layout.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalLogicalPlanOptimizerTests.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java * Re-instate fix for LOOKUP JOIN, update tests
andreidan
pushed a commit
to andreidan/elasticsearch
that referenced
this pull request
Apr 9, 2025
* Revert changes to Layout.java The change in 80125a4 is a quick fix and allows breaking an invariant of Layout. Revert that. * Simplify ReplaceMissingFieldWithNull When encountering projections, it tries to do the job of field extraction for missing fields by injecting an Eval that creates a literal null with the same name id as the field attribute for the missing field. This is wrong: 1. We only insert an Eval in case that a Project relies on the missing attribute. There could be other plan nodes that rely on the missing attribute. 2. Even for Projects, we only insert an Eval in case we squarely project for the field - in case of aliases (e.g. from RENAME), we do nothing. 3. In case of multiple Projects that use this attribute, we create multiple attributes with the original field attribute's id, causing a wrong Layout. This triggered elastic#121754. * Revive logic for EsRelation instead of Project * Update LocalLogicalPlanOptimizerTests * Update docs/changelog/125764.yaml * Update test expectations * Do not prune attributes from EsRelation This can lead to empty output, which leads to the EsRelation being replaced by a LocalRelation with 0 rows. * Add tests + capability * Update docs/changelog/125764.yaml * Add comments
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.


Fix #126030
Fix #126036
Fix #121754
There are multiple problems with
ReplaceMissingFieldsWithNull:When encountering projections, it tries to do the job of field extraction for missing fields by injecting an Eval that creates a literal null with the same name id as the field attribute for the missing field. But:
There are multiple ways of dealing with this:
InsertFieldExtraction. This will be correct and simple, but we will miss out on important logical optimizations, namely turning theEsRelationinto aLocalRelationin case all fields are missing, or propagating the literal nulls created in place of the missing field (enabling more constant folding etc.).InsertFieldExtraction, but in the local logical optimizer. In addition to the duplicated logic, this has a risk of accidentally messing up the ordering of the output fields (Evals place their attributes always on the right hand side), which could cause serious problems unless there's a Project/Stats somewhere downstream (which is probably always given, but still).This PR uses approach 1.ii. (Approach 1.i would also be possible and is arguably more natural. It would require double checking that we don't miss out on some optimizations, though.)