ESQL: Enable physical plan verification#118114
ESQL: Enable physical plan verification#118114elasticsearchmachine merged 9 commits intoelastic:mainfrom
Conversation
|
Hi @bpintea, I've created a changelog YAML for you. |
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Yes, will "fixes" it here, thanks. (I actually started with verifying plans from fragments after being replanned. But the spacial aggregations makes it tricky, so stripped that out for now and what remains could address this issue, true.) |
costin
left a comment
There was a problem hiding this comment.
Nice - thanks for looking into this and cleaning this up, especially the AggregateMapper.
LGTM
| if (p instanceof AggregateExec agg) { | ||
| var ordinalAttributes = agg.ordinalAttributes(); | ||
| missing.removeAll(Expressions.references(ordinalAttributes)); |
| List<Attribute> orginalAttributs = new ArrayList<>(groupings.size()); | ||
| if (groupings().size() == 1) { | ||
| // CATEGORIZE requires the standard hash aggregator as well. | ||
| if (groupings.get(0).anyMatch(e -> e instanceof Categorize) == false) { |
There was a problem hiding this comment.
| if (groupings.get(0).anyMatch(e -> e instanceof Categorize) == false) { | |
| if (groupings.get(0).noneMatch(e -> e instanceof Categorize)) { |
I think this could also be merged with above if to minimize nesting.
There was a problem hiding this comment.
groupings is a list of Expression (not a stream), so won't have noneMatch(). But I think refuting a predicate is quite common, adding a noneMatch() would actually make sense. Will leave this as is for now, but will follow up.
I think this could also be merged with above if to minimize nesting.
I've only grafted this patch, so I can only guess that the intention might have been to allow for more exceptions (or easier comments). But maybe this can be now merged and split again when needed.
| if (groupings.get(0).anyMatch(e -> e instanceof Categorize) == false) { | ||
| var leaves = new LinkedList<>(); | ||
| // TODO: this seems out of place | ||
| aggregates.stream().filter(a -> groupings.contains(a) == false).forEach(a -> leaves.addAll(a.collectLeaves())); |
There was a problem hiding this comment.
Lets use collect instead of addAll in forEach.
Since it is used for .contains check below, should it collect to set instead of the list?
There was a problem hiding this comment.
Good point, turned it into a Set.
Lets use collect instead of addAll in forEach.
Yeh, that'd be an alternative, but with stream chaining and the line breaking, it wouldn't be more succint than current format, so I've left as is.
💚 Backport successful
|
This enables the physical plan verification. For it, a couple of changes needed to be applied/corrected: * AggregateMapper creates attributes with unique names; * AggregateExec's verification needs not consider ordinal attribute(s); * LookupJoinExec needs to merge attributes of same name at output, "winning" the right child; * ExchangeExec does no input referencing, since it only outputs all synthetic attributes, "sourced" from remote exchanges; * FieldExtractExec doesn't reference the attributes it "produces".
| var addedFieldsNames = addedFields.stream().map(Attribute::name).toList(); | ||
| lazyOutput.removeIf(a -> addedFieldsNames.contains(a.name())); | ||
| lazyOutput.addAll(addedFields); |
There was a problem hiding this comment.
Oh, this is very interesting. Thanks for catching this.
I'm wondering if enforcing unique names for PhysicalPlan outputs is actually the right way to go. It is what we assumed so far, so I don't wanna depart from here, but there's also a reason to be more lenient:
In PhysicalPlan land, this output method was technically correct, because it represented exactly what the physical operators will do: they're gonna append the blocks for the added fields to the pages that are processed. The blocks corresponding to shadowed attributes/channels will not be stripped from incoming pages.
This means that the output layout technically has duplicate attribute names. This change just hides them and makes them inaccessible during the physical planning stage.
FWIW, EvalExec has the exact same situation. Eval performs shadowing, but EvalOperator can only ever append blocks, so the output pages actually still contain the blocks corresponding to shadowed attributes.
alex-spies
left a comment
There was a problem hiding this comment.
Late to the party, but had to take a look anyway because this will interact with LOOKUP JOIN a little.
Just wanted to say thank you again and LGTM!
) * ESQL: Enable physical plan verification (#118114) This enables the physical plan verification. For it, a couple of changes needed to be applied/corrected: * AggregateMapper creates attributes with unique names; * AggregateExec's verification needs not consider ordinal attribute(s); * LookupJoinExec needs to merge attributes of same name at output, "winning" the right child; * ExchangeExec does no input referencing, since it only outputs all synthetic attributes, "sourced" from remote exchanges; * FieldExtractExec doesn't reference the attributes it "produces". * ESQL: Disable remote enrich verification (#118534) This disables verifying the plans generated for remote ENRICHing. It also re-enables corresponding failing test. Related: #118531 Fixes #118307. (cherry picked from commit e7a4436)
…#118534) (elastic#118302) * ESQL: Enable physical plan verification (elastic#118114) This enables the physical plan verification. For it, a couple of changes needed to be applied/corrected: * AggregateMapper creates attributes with unique names; * AggregateExec's verification needs not consider ordinal attribute(s); * LookupJoinExec needs to merge attributes of same name at output, "winning" the right child; * ExchangeExec does no input referencing, since it only outputs all synthetic attributes, "sourced" from remote exchanges; * FieldExtractExec doesn't reference the attributes it "produces". * ESQL: Disable remote enrich verification (elastic#118534) This disables verifying the plans generated for remote ENRICHing. It also re-enables corresponding failing test. Related: elastic#118531 Fixes elastic#118307. (cherry picked from commit e7a4436)
…#118534) (elastic#118302) * ESQL: Enable physical plan verification (elastic#118114) This enables the physical plan verification. For it, a couple of changes needed to be applied/corrected: * AggregateMapper creates attributes with unique names; * AggregateExec's verification needs not consider ordinal attribute(s); * LookupJoinExec needs to merge attributes of same name at output, "winning" the right child; * ExchangeExec does no input referencing, since it only outputs all synthetic attributes, "sourced" from remote exchanges; * FieldExtractExec doesn't reference the attributes it "produces". * ESQL: Disable remote enrich verification (elastic#118534) This disables verifying the plans generated for remote ENRICHing. It also re-enables corresponding failing test. Related: elastic#118531 Fixes elastic#118307. (cherry picked from commit e7a4436)
This enables the physical plan verification.
For it, a couple of changes needed to be applied/corrected:
Fixes #105436.