ES|QL - Avoid retrieving unnecessary fields on node-reduce phase#137920
Conversation
… and original top level projection
GalLalouche
left a comment
There was a problem hiding this comment.
Hey @carlosdelest, thanks for taking the time to fix this! I left a couple of small comments but other LGTM. But you might one of the planning folks to take a look :)
| return Optional.empty(); | ||
| } | ||
|
|
||
| // Calculate the expected output attributes for the data driver plan. |
There was a problem hiding this comment.
Nit: redundant comment (You can rename the variable to expectedDataDriverOutputAttrs if you wanted to, or extract this code to a method if you think it warrants a header.)
...lugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/LateMaterializationPlanner.java
Outdated
Show resolved
Hide resolved
...lugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/LateMaterializationPlanner.java
Outdated
Show resolved
Hide resolved
...lugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/LateMaterializationPlanner.java
Outdated
Show resolved
Hide resolved
|
Thanks for reviewing @GalLalouche ! I'll definitely incorporate your suggestions, add some testing, and open up to more folks. 👍 |
…-materialization-unnecessary-fields
|
Hi @carlosdelest, I've created a changelog YAML for you. |
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
|
Hi @GalLalouche ! I've addressed your comments and added some testing. Can you please give this another review? |
| return Optional.empty(); | ||
| } | ||
|
|
||
| // Calculate the expected output attributes for the data driver plan. |
...lugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/LateMaterializationPlanner.java
Outdated
Show resolved
Hide resolved
LGTM, but I think it might warrant a look from someone from planning as well (I know @alex-spies had a lot of great comments on the original PR!). |
bpintea
left a comment
There was a problem hiding this comment.
Do we have any tests like suggested in the issue or the PR description?
| List<Attribute> expectedDataOutput = toPhysical(topN, context).output(); | ||
| Attribute doc = expectedDataOutput.stream().filter(EsQueryExec::isDocAttribute).findFirst().orElse(null); | ||
|
|
||
| List<Attribute> physicalDataOutput = toPhysical(topN, context).output(); |
There was a problem hiding this comment.
Nit/optional: no "data" is being involved so far, just it's characterisation.
| List<Attribute> physicalDataOutput = toPhysical(topN, context).output(); | |
| List<Attribute> physicalPlanOutput = toPhysical(topN, context).output(); |
or smth like "..dataNode..".
| AttributeSet.Builder expectedDataOutputAttrs = AttributeSet.builder(); | ||
| // We need to add the doc attribute to the project since otherwise when the fragment is converted to a physical plan for the data | ||
| // driver, the resulting ProjectExec won't have the doc attribute in its output, which is needed by the reduce driver. | ||
| expectedDataOutputAttrs.add(doc); |
There was a problem hiding this comment.
I'd drop this line and add the isDocAttribute condition on the filter below, since it's already present into the physicalDataOutput. (This might also simply a bit the constructs here.)
| AttributeSet orderRefsSet = AttributeSet.of(topN.order().stream().flatMap(o -> o.references().stream()).toList()); | ||
| // Get the output from the physical plan below the TopN, and filter it to only the attributes needed for the final output (either | ||
| // because they are in the top-level Project's output, or because they are needed for ordering) | ||
| expectedDataOutputAttrs.addAll( | ||
| physicalDataOutput.stream().filter(a -> topLevelProject.outputSet().contains(a) || orderRefsSet.contains(a)).toList() | ||
| ); | ||
| List<Attribute> expectedDataOutput = expectedDataOutputAttrs.build().stream().toList(); |
There was a problem hiding this comment.
Optional: this section is very streams-heavy and while they might make the code a bit more compact(?), for the few attributes that they'll likely usually handle, I'd just iterate on lists instead.
…-materialization-unnecessary-fields
…rialization-unnecessary-fields' into enhancement/esql-late-materialization-unnecessary-fields
@bpintea we have tests like that:
Are there any other tests you're missing? |
|
@elasticmachine run elasticsearch-ci/part-2 |
bpintea
left a comment
There was a problem hiding this comment.
Thanks Carlos.
Wrt test, I was thinking of something like adding a filter that's not projected (/projected away), like from test | where filtered > 0 | sort sorted | limit 42 | stats sum(read) and checking that in EsqlReductionLateMaterializationTestCase. If there's one like that already, all good.
alex-spies
left a comment
There was a problem hiding this comment.
This is great, thanks Carlos! It's essentially a specialized ProjectAwayColumns for the updated data node plan, which is nice and consistent with the planning that the coordinator does.
I agree with Bogdan: a test case or two in EsqlReductionLateMaterializationTestCase.java where only the updated data node plan requires a field, but it's correctly projected away at the end of the data node plan because the reduce plan doesn't need it - that'd be nice.
…-materialization-unnecessary-fields
|
Got it, thanks @bpintea and @alex-spies for explaining! I added more tests in c387814, LMKWYT! |
Closes #134363
Avoids retrieving unnecessary fields in
FieldExtractorExecon the node-reduce phase.This PR checks that the fields retrieved are either:
This way, a query like:
will not retrieve the
vectorfield on the node-reduce phase and will avoid loading it completely from source.