ESQL: Fix alias removal in regex extraction with JOIN#127687
ESQL: Fix alias removal in regex extraction with JOIN#127687astefan merged 16 commits intoelastic:mainfrom
JOIN#127687Conversation
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
buildkite test this |
There was a problem hiding this comment.
The solution is not optimal. Maybe it fixes the problem, but the canRemoveAliases references in multiple places makes the code hard to follow and reason about.
Instead, look at the code that is tied to lookup join and enrich where, after seeing this issue with dissect, it is clear that this code - p.forEachExpressionDown(Alias.class, alias -> { shouldn't only apply to Alias (which comes from eval commands), but also to whatever comes out from dissect (meaning, any user-defined columns).
Rather than using canRemoveAliases[0] in RegexExtract check, look into adjusting the code I mentioned above.
|
Thank you for your review and detailed guidance @astefan! Your keen eye for code goes far beyond mine, I still have a long journey ahead in learning and improving. I've updated the branch, please feel free to review it again at your convenience. If there are any further adjustments needed, I'm more than happy to make them. |
|
buildkite test this |
|
buildkite test this |
| AttributeSet planRefs = p.references(); | ||
| Set<String> fieldNames = planRefs.names(); | ||
| p.forEachExpressionDown(Alias.class, alias -> { | ||
| p.forEachExpressionDown(NamedExpression.class, expression -> { |
There was a problem hiding this comment.
Just a naming preference towards NamedExpression - ne that's used in other places in code.
| p.forEachExpressionDown(NamedExpression.class, expression -> { | |
| p.forEachExpressionDown(NamedExpression.class, ne-> { |
| ; | ||
|
|
||
|
|
||
| joinMaskingRegex |
There was a problem hiding this comment.
I am not a big fan of having such test queries here. At least, add a comment with the link to the original bug report.
| joinMaskingDissect | ||
| required_capability: join_lookup_v12 | ||
| required_capability: fix_join_masking_regex_extract | ||
| from sample_data |
There was a problem hiding this comment.
Add this test and the one below in IndexResolverFieldNamesTests as well.
There was a problem hiding this comment.
I think you missed this request here.
There was a problem hiding this comment.
Oh, sorry! I misunderstood what you meant. I was actually thinking about adding comments of links to the original issue.
| } | ||
| referencesBuilder.removeIf(attr -> matchByName(attr, alias.name(), keepCommandRefsBuilder.contains(attr))); | ||
| referencesBuilder.removeIf( | ||
| attr -> matchByName(attr, expression.name(), (expression instanceof Alias) && keepCommandRefsBuilder.contains(attr)) |
There was a problem hiding this comment.
Why this check here? (expression instanceof Alias)
There was a problem hiding this comment.
I didn't give it much thought since skipIfPattern was set to false for RegexExtract. After considering it briefly, I think removing this check wouldn't hurt too much.
|
buildkite test this |
|
buildkite test this |
…emoval # Conflicts: # x-pack/plugin/esql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/generative/GenerativeRestTest.java # x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
| /** | ||
| * During resolution (pre-analysis) we have to consider that joins can override regex extracted values | ||
| * see <a href="https://github.com/elastic/elasticsearch/issues/127467"> ES|QL: pruning of JOINs leads to missing fields #127467 </a> | ||
| */ | ||
| FIX_JOIN_MASKING_REGEX_EXTRACT, | ||
|
|
There was a problem hiding this comment.
Please, add this capability at the end of the list of capabilities. It will be slightly easier for me to manually backport the PR, if the automatic backport fails.
|
When you get the chance, please integrate new changes from |
|
Thanks @astefan. The branch has been updated. |
|
buildkite test this |
1 similar comment
|
buildkite test this |
💔 Backport failed
You can use sqren/backport to manually backport by running |
|
Thank you @astefan! |
* Disallow removal of regex extracted fields --------- Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com> Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co> (cherry picked from commit 557f1f1)
* Disallow removal of regex extracted fields --------- Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com> Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co> (cherry picked from commit 557f1f1)
…28204) * ESQL: Fix alias removal in regex extraction with `JOIN` (#127687) * Disallow removal of regex extracted fields --------- Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com> Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co> (cherry picked from commit 557f1f1) * Checkstyle --------- Co-authored-by: kanoshiou <73424326+kanoshiou@users.noreply.github.com> Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
…) (elastic#128204) * Disallow removal of regex extracted fields
Because we aim to minimize the number of attributes required from
field_capsat pre-analysis time, a removal check helps eliminate fields defined by users (such aseval). However, this removal logic does not apply toReferenceAttributeproduced by regex extraction commandsgrokanddissect, even though these fields are also user-defined.Closes #127467