Propagate converted fields through projections#137923
Conversation
|
Hi @MattAlp, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Thank you @MattAlp ! Can we add some LogicalPlanOptimizerTests and CsvTests to validate this change? The original issue errors out in logical plan's verifier, it will be helpful to validate the plan after logical planner is as expected and we get expected results. I tried this query below, it failed on main, and run to completion on the branch, however it seems like we get an extra field returned.
+ curl -u elastic:password -v -H 'Content-Type: application/json' '127.0.0.1:9200/_query?format=txt' -d '{
"query": "from sample_data_* | keep @timestamp, message | mv_expand @timestamp | eval message = message::keyword"
}
'
@timestamp |$$message$converted_to$keyword| message
------------------------+------------------------------+---------------------
2023-10-23T12:15:03.360Z|Connected to 10.1.0.3 |Connected to 10.1.0.3
2023-10-23T12:27:28.948Z|Connected to 10.1.0.2 |Connected to 10.1.0.2
2023-10-23T13:33:34.937Z|Disconnected |Disconnected
2023-10-23T13:51:54.732Z|Connection error |Connection error
2023-10-23T13:52:55.015Z|Connection error |Connection error
2023-10-23T13:53:55.832Z|Connection error |Connection error
2023-10-23T13:55:01.543Z|Connected to 10.1.0.1 |Connected to 10.1.0.1
2023-10-23T12:15:03.360Z|Connected to 10.1.0.3 |Connected to 10.1.0.3
2023-10-23T12:27:28.948Z|Connected to 10.1.0.2 |Connected to 10.1.0.2
2023-10-23T13:33:34.937Z|Disconnected |Disconnected
2023-10-23T13:51:54.732Z|Connection error |Connection error
2023-10-23T13:52:55.015Z|Connection error |Connection error
2023-10-23T13:53:55.832Z|Connection error |Connection error
2023-10-23T13:55:01.543Z|Connected to 10.1.0.1 |Connected to 10.1.0.1
bpintea
left a comment
There was a problem hiding this comment.
Once we get the right solution, we should add integration tests too: adding to union-types.csv-spec might be the easiest (there are also yaml ones in 160_union_types.yml).
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTests.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTests.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTests.java
Show resolved
Hide resolved
| } | ||
| return esr; | ||
| }); | ||
| res = res.transformUp(Project.class, p -> { |
There was a problem hiding this comment.
It seems that just adding the missing synthetic attribute won't be enough.
What happens is that the Project still contains FieldAttribute(s) containing an InvalidMappedField that the UnionTypesCleanup rule converts to an UnsupportedAttribute. Because of this, the resulting plan is not resolved(). Which means that the rule will not add a top Project to drop the synthetic attributes. So we end up outputting more attributes than we need to.
I think we will need to:
- [either] not only add the synthetic attributes to the Project (as done so far), but drop the original attribute corresponding to to the synthetic one. This should be safe, since no other command is allowed to work with this attribute before conversion; but I'm not sure if the verification of the plan will remain safe then and we'll put out the correct error message to the user, in case the query does touch the attribute before conversion (the existing tests should reveal that); an alternative would be to:
- [or] add a
Projectbefore the convertingEVALthat drops the original attribute corresponding to to the synthetic one.
There was a problem hiding this comment.
Because of this, the resulting plan is not resolved()
Wow, that'd be horrific! But actually, it's not the case, but it's rightfully confusing. The only plan where an UnsupportedAttribute can occur in the expressions should be EsqlProject; and that one specifically considers UnsupportedAttributes okay:
There was a problem hiding this comment.
not only add the synthetic attributes to the Project (as done so far), but drop the original attribute corresponding to to the synthetic one
Hm, we originally left the original attributes with the mapping conflicts untouched, so we don't mess with the validation in the verifier; removing the attribute will give an inconsistent plan to the verifier, which is why adding the projection on top should be safer and easier.
I think just adding the synth attribute to the projection should work.
There was a problem hiding this comment.
Leaving a note here regarding some test cases in which one does want to retain unsupported values: #135547
There was a problem hiding this comment.
@bpintea introduced the super nice ResolvingProjects in a recent PR. These represent projects that aren't finalized yet, and keep track of patterns; e.g. KEEP foo* may right now resolve to Project[foo1, foo2], but adding a foo3 upstream would not have an effect because foo3 would be thrown away. A ResolvingProject[foo*], however, gets resolved at the end of the analyzer run and becomes a Project[foo1, foo2, foo3] if foo3 was added upstream.
These ResolvingProjects could also solve this problem here if we ever end up being unhappy with the current solution. A Project[mapping_conflict_field] could become a ResolvingProject[mapping_conflict_field, $$mapping_conflict_field$converted_to$*].
There was a problem hiding this comment.
Solution looks correct to me, thanks @MattAlp !
You got some suggestions on added tests - I agree these make sense adding before merging!
@bpintea 's comment here is important. I'm quite convinced that we're fine, but that's a great rabbit hole to dig into to understand why we're, indeed, fine. It'd be cool if you could have a look at this, and add some comments in places that would make this easier to understand for our future selves. Also, there are existing csv tests for union types that ensure that the synthetic attributes don't slip through - maybe have a look at them if you want to run the debugger and confirm what's happening.
What I can say so far is:
- plans must be resolved after passing the verifier, otherwise we cannot e.g. call
.datatype()on expressions and that blows up the optimizer. - UnsupportedAttribute is actually Unresolvable. Generally, that means that plans and expressions containing an UnsupportedAttribute will be consider unresolved.
This makes it so that using it in expressions will generally trigger an error message from the verifier; that's what we want. - Using UnsupportedAttributes is okay in EsqlProject, and the verifier has an exception to allow it. That's because it's fine to use a field with mapping conflicts in KEEP and DROP (but not elsewhere, and not in RENAME).
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
| IndexResolution resolution = IndexResolution.valid(index); | ||
| Analyzer analyzer = analyzer(resolution); | ||
|
|
||
| String query = "FROM union_index_combined | KEEP id, foo | MV_EXPAND foo | EVAL id = id::keyword"; |
There was a problem hiding this comment.
note: the analyzer doesn't push down projections, yet, so the MV_EXPAND shouldn't be required for making a test.
There was a problem hiding this comment.
I kept it in to be faithful to the original issue & for the future, but good shout!
…analysis/AnalyzerTests.java Co-authored-by: Bogdan Pintea <sig11@mailbox.org>
…analysis/AnalyzerTests.java Co-authored-by: Bogdan Pintea <sig11@mailbox.org>
Keep synthetics if they aren't aliased, keep unsupported attributed if they're explicitly retained, drop either if shadowing (with optional casting) occurs
…ields-thru-projections
Minor divergence due to ~1mo of staleness
ℹ️ Important: Docs version tagging👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes. Expand for a quick overviewWhen to use applies_to tags:✅ At the page level to indicate which products/deployments the content applies to (mandatory) What NOT to do:❌ Don't remove or replace information that applies to an older version 🤔 Need help?
|
|
I haven't added many comments inline as part of this PR - I'd actually like to add them to the internal & external docs alongside some other notes I've been taking while working on planner PRs incl. this one. |
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
|
...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
Show resolved
Hide resolved
...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
Show resolved
Hide resolved
julian-elastic
left a comment
There was a problem hiding this comment.
Overall looks good, thank you for taking the time to explain the fix to me! I left a few comments about additional tests to make sure we have handled all cases.
| } | ||
| return esr; | ||
| }); | ||
| res = res.transformUp(Project.class, p -> { |
There was a problem hiding this comment.
@bpintea introduced the super nice ResolvingProjects in a recent PR. These represent projects that aren't finalized yet, and keep track of patterns; e.g. KEEP foo* may right now resolve to Project[foo1, foo2], but adding a foo3 upstream would not have an effect because foo3 would be thrown away. A ResolvingProject[foo*], however, gets resolved at the end of the analyzer run and becomes a Project[foo1, foo2, foo3] if foo3 was added upstream.
These ResolvingProjects could also solve this problem here if we ever end up being unhappy with the current solution. A Project[mapping_conflict_field] could become a ResolvingProject[mapping_conflict_field, $$mapping_conflict_field$converted_to$*].
| } | ||
| return esr; | ||
| }); | ||
| res = res.transformUp(Project.class, p -> { |
There was a problem hiding this comment.
nit: we could check if this transformUp call is even necessary. If no fields were added to the EsRelation, then we don't have any Projects that would need fixing.
Addresses #137187.
Reviewers should check