Skip to content

BlockSourceReader should always apply source filtering#136438

Merged
martijnvg merged 26 commits intoelastic:mainfrom
martijnvg:BlockSourceReader_source_filtering
Oct 16, 2025
Merged

BlockSourceReader should always apply source filtering#136438
martijnvg merged 26 commits intoelastic:mainfrom
martijnvg:BlockSourceReader_source_filtering

Conversation

@martijnvg
Copy link
Member

@martijnvg martijnvg commented Oct 10, 2025

In order to avoid a performance bottleneck when loading values for synthetic source source based block loaders should use source filtering.

This change is mostly mechanical and pushes field names and ignore source format to StoredFieldSpec.

Main changes:

  • Added sourcePaths to StoredFieldsSpec. This allows source loaders to apply a SourceFilter.
  • Update compute engine ShardContext#newSourceLoader(...) method signature to accepts a set of fields to include.
  • Update the EsPhysicalOperationProviders.DefaultShardContext class to create a SourceFilter and push that down to SearchExecutionContext#newSourceLoader(...).
  • Update the BlockSourceReader.SourceBlockLoader#rowStrideStoredFieldSpec(...) implementations to delegate to ValueFetcher#storedFieldsSpec(...) to fetch the stored field specification.
  • Make ArraySourceValueFetcher and SourceValueFetcher return stored field spec with sourcePaths set to its own configured sourcePaths.

The main advantage of delegating to ValueFetcher#storedFieldsSpec(...) is that the source paths are already know there. Only ignored source format needed to be added as field to ArraySourceValueFetcher and SourceValueFetcher. Additionally in a follow up change, the search APIs can now also opt into use source loader to uses a SourceFilter.

The StoredFieldsSpec#sourcePaths is only used in compute engine to build a SourceFilter and not elsewhere. So other usages of ArraySourceValueFetcher and SourceValueFetcher will just ignore the source paths.

Set<String> requiredStoredFields,
IgnoredFieldsSpec ignoredFieldsSpec
IgnoredFieldsSpec ignoredFieldsSpec,
Set<String> sourcePaths
Copy link
Member Author

@martijnvg martijnvg Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is overlap with IgnoredFieldsSpec.requiredIgnoredFields and sourcePaths and could be unified. But IgnoredFieldsSpec is only used with synthetic source and this change also works for stored source.

In a followup we should try to merge IgnoredFieldsSpec ignoredFieldsSpec with StoredFieldSpec. Not doing this in this PR now, as it would make this PR larger.

@martijnvg martijnvg changed the title BlockSourceReader should apply source filtering Oct 14, 2025
@martijnvg martijnvg added :Analytics/Compute Engine Analytics in ES|QL :StorageEngine/Mapping The storage related side of mappings >enhancement labels Oct 14, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @martijnvg, I've created a changelog YAML for you.

@martijnvg martijnvg force-pushed the BlockSourceReader_source_filtering branch from 410b9d9 to faf9b86 Compare October 14, 2025 11:19
@martijnvg martijnvg marked this pull request as ready for review October 14, 2025 12:55
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Oct 14, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor suggestion, but LGTM otherwise.

) {
public StoredFieldsSpec(boolean requiresSource, boolean requiresMetadata, Set<String> requiredStoredFields) {
this(requiresSource, requiresMetadata, requiredStoredFields, IgnoredFieldsSpec.NONE);
this(requiresSource, requiresMetadata, requiredStoredFields, IgnoredFieldsSpec.NONE, null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe pass Set.of() here instead? Needing to do a null check every time we read this field in future feels like a recipe for NPEs in annoying to track down places. And it brings it into line with the existing requiredStoredFields behaviour.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, I will update this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed: 43ad340

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@martijnvg martijnvg merged commit 756bc9d into elastic:main Oct 16, 2025
34 checks passed
Kubik42 pushed a commit to Kubik42/elasticsearch that referenced this pull request Oct 16, 2025
In order to avoid a performance bottleneck when loading values for synthetic source source based block loaders should use source filtering.
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Jan 14, 2026
In order to avoid a performance bottleneck when loading values for synthetic source source based block loaders should use source filtering.
martijnvg added a commit that referenced this pull request Jan 15, 2026
* BlockSourceReader should always apply source filtering (#136438)

In order to avoid a performance bottleneck when loading values for synthetic source source based block loaders should use source filtering.

* Combine ignored fields spec with source paths (#136771)

With the recent addition of StoredFieldsSpec.sourcePaths, it is
redundant to have a separate property for ignored source spec.

This PR removes IgnoredSourceSpec and uses StoredFieldsSpec.sourcePaths
instead.

* Fix StoredFieldsSpec#merge(...) issue (#138306)

If StoredFieldsSpec requires source and has no sourcePaths, then it requires the whole source. For example, in case of `SourceFieldBlockLoader`. When merging StoredFieldsSpec then omit the sourcePaths of the other StoredFieldsSpec.

Co-authored-by: Jordan Powers <jordan.powers@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/Compute Engine Analytics in ES|QL >enhancement :StorageEngine/Mapping The storage related side of mappings Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine v9.2.5 v9.3.0

3 participants