Skip to content

ES|QL - Avoid retrieving unnecessary fields on node-reduce phase#137920

Merged
carlosdelest merged 17 commits intoelastic:mainfrom
carlosdelest:enhancement/esql-late-materialization-unnecessary-fields
Nov 27, 2025
Merged

ES|QL - Avoid retrieving unnecessary fields on node-reduce phase#137920
carlosdelest merged 17 commits intoelastic:mainfrom
carlosdelest:enhancement/esql-late-materialization-unnecessary-fields

Conversation

@carlosdelest
Copy link
Member

@carlosdelest carlosdelest commented Nov 11, 2025

Closes #134363

Avoids retrieving unnecessary fields in FieldExtractorExec on the node-reduce phase.

This PR checks that the fields retrieved are either:

  • Present on the TopN operator on the node-reduce phase, or
  • Needed on the top level Project for the node-reduce phase

This way, a query like:

FROM test METADATA _score
| WHERE knn(vector, [0, 1, 2])
| SORT _score DESC
| LIMIT 10
| KEEP id, _score

will not retrieve the vector field on the node-reduce phase and will avoid loading it completely from source.

@carlosdelest carlosdelest added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch :Search Relevance/ES|QL Search functionality in ES|QL labels Nov 11, 2025
Copy link
Contributor

@GalLalouche GalLalouche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @carlosdelest, thanks for taking the time to fix this! I left a couple of small comments but other LGTM. But you might one of the planning folks to take a look :)

return Optional.empty();
}

// Calculate the expected output attributes for the data driver plan.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: redundant comment (You can rename the variable to expectedDataDriverOutputAttrs if you wanted to, or extract this code to a method if you think it warrants a header.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unaddressed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did that as part of Addressed in d033d14

@carlosdelest
Copy link
Member Author

Thanks for reviewing @GalLalouche ! I'll definitely incorporate your suggestions, add some testing, and open up to more folks. 👍

elasticsearchmachine added 2 commits November 18, 2025 07:43
@carlosdelest carlosdelest changed the title PoC - Avoid retrieving unnecessary fields on node-reduce phase Nov 20, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @carlosdelest, I've created a changelog YAML for you.

@carlosdelest carlosdelest marked this pull request as ready for review November 21, 2025 06:33
@carlosdelest carlosdelest changed the title Avoid retrieving unnecessary fields on node-reduce phase Nov 21, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@carlosdelest
Copy link
Member Author

Hi @GalLalouche ! I've addressed your comments and added some testing. Can you please give this another review?

@carlosdelest carlosdelest requested a review from a team November 24, 2025 08:19
return Optional.empty();
}

// Calculate the expected output attributes for the data driver plan.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unaddressed.

@GalLalouche
Copy link
Contributor

Hi @GalLalouche ! I've addressed your comments and added some testing. Can you please give this another review?

LGTM, but I think it might warrant a look from someone from planning as well (I know @alex-spies had a lot of great comments on the original PR!).

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any tests like suggested in the issue or the PR description?

List<Attribute> expectedDataOutput = toPhysical(topN, context).output();
Attribute doc = expectedDataOutput.stream().filter(EsQueryExec::isDocAttribute).findFirst().orElse(null);

List<Attribute> physicalDataOutput = toPhysical(topN, context).output();
Copy link
Contributor

@bpintea bpintea Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit/optional: no "data" is being involved so far, just it's characterisation.

Suggested change
List<Attribute> physicalDataOutput = toPhysical(topN, context).output();
List<Attribute> physicalPlanOutput = toPhysical(topN, context).output();

or smth like "..dataNode..".

Copy link
Member Author

@carlosdelest carlosdelest Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in d033d14

AttributeSet.Builder expectedDataOutputAttrs = AttributeSet.builder();
// We need to add the doc attribute to the project since otherwise when the fragment is converted to a physical plan for the data
// driver, the resulting ProjectExec won't have the doc attribute in its output, which is needed by the reduce driver.
expectedDataOutputAttrs.add(doc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd drop this line and add the isDocAttribute condition on the filter below, since it's already present into the physicalDataOutput. (This might also simply a bit the constructs here.)

Copy link
Member Author

@carlosdelest carlosdelest Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, addressed in d033d14

Comment on lines 125 to 131
AttributeSet orderRefsSet = AttributeSet.of(topN.order().stream().flatMap(o -> o.references().stream()).toList());
// Get the output from the physical plan below the TopN, and filter it to only the attributes needed for the final output (either
// because they are in the top-level Project's output, or because they are needed for ordering)
expectedDataOutputAttrs.addAll(
physicalDataOutput.stream().filter(a -> topLevelProject.outputSet().contains(a) || orderRefsSet.contains(a)).toList()
);
List<Attribute> expectedDataOutput = expectedDataOutputAttrs.build().stream().toList();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: this section is very streams-heavy and while they might make the code a bit more compact(?), for the few attributes that they'll likely usually handle, I'd just iterate on lists instead.

Copy link
Member Author

@carlosdelest carlosdelest Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++, ab1c94b

elasticsearchmachine added 5 commits November 25, 2025 18:13
…rialization-unnecessary-fields' into enhancement/esql-late-materialization-unnecessary-fields
@carlosdelest
Copy link
Member Author

Do we have any tests like suggested in the issue or the PR description?

@bpintea we have tests like that:

Are there any other tests you're missing?

@carlosdelest carlosdelest requested a review from bpintea November 25, 2025 17:27
@carlosdelest
Copy link
Member Author

@elasticmachine run elasticsearch-ci/part-2

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Carlos.
Wrt test, I was thinking of something like adding a filter that's not projected (/projected away), like from test | where filtered > 0 | sort sorted | limit 42 | stats sum(read) and checking that in EsqlReductionLateMaterializationTestCase. If there's one like that already, all good.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thanks Carlos! It's essentially a specialized ProjectAwayColumns for the updated data node plan, which is nice and consistent with the planning that the coordinator does.

I agree with Bogdan: a test case or two in EsqlReductionLateMaterializationTestCase.java where only the updated data node plan requires a field, but it's correctly projected away at the end of the data node plan because the reduce plan doesn't need it - that'd be nice.

@carlosdelest
Copy link
Member Author

Got it, thanks @bpintea and @alex-spies for explaining!

I added more tests in c387814, LMKWYT!

@carlosdelest carlosdelest merged commit 0955a82 into elastic:main Nov 27, 2025
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >enhancement :Search Relevance/ES|QL Search functionality in ES|QL Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.3.0

5 participants