BlockSourceReader should always apply source filtering#136438
BlockSourceReader should always apply source filtering#136438martijnvg merged 26 commits intoelastic:mainfrom
Conversation
| Set<String> requiredStoredFields, | ||
| IgnoredFieldsSpec ignoredFieldsSpec | ||
| IgnoredFieldsSpec ignoredFieldsSpec, | ||
| Set<String> sourcePaths |
There was a problem hiding this comment.
There is overlap with IgnoredFieldsSpec.requiredIgnoredFields and sourcePaths and could be unified. But IgnoredFieldsSpec is only used with synthetic source and this change also works for stored source.
In a followup we should try to merge IgnoredFieldsSpec ignoredFieldsSpec with StoredFieldSpec. Not doing this in this PR now, as it would make this PR larger.
|
Hi @martijnvg, I've created a changelog YAML for you. |
410b9d9 to
faf9b86
Compare
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
romseygeek
left a comment
There was a problem hiding this comment.
One minor suggestion, but LGTM otherwise.
| ) { | ||
| public StoredFieldsSpec(boolean requiresSource, boolean requiresMetadata, Set<String> requiredStoredFields) { | ||
| this(requiresSource, requiresMetadata, requiredStoredFields, IgnoredFieldsSpec.NONE); | ||
| this(requiresSource, requiresMetadata, requiredStoredFields, IgnoredFieldsSpec.NONE, null); |
There was a problem hiding this comment.
Maybe pass Set.of() here instead? Needing to do a null check every time we read this field in future feels like a recipe for NPEs in annoying to track down places. And it brings it into line with the existing requiredStoredFields behaviour.
There was a problem hiding this comment.
This makes sense, I will update this PR.
In order to avoid a performance bottleneck when loading values for synthetic source source based block loaders should use source filtering.
In order to avoid a performance bottleneck when loading values for synthetic source source based block loaders should use source filtering.
* BlockSourceReader should always apply source filtering (#136438) In order to avoid a performance bottleneck when loading values for synthetic source source based block loaders should use source filtering. * Combine ignored fields spec with source paths (#136771) With the recent addition of StoredFieldsSpec.sourcePaths, it is redundant to have a separate property for ignored source spec. This PR removes IgnoredSourceSpec and uses StoredFieldsSpec.sourcePaths instead. * Fix StoredFieldsSpec#merge(...) issue (#138306) If StoredFieldsSpec requires source and has no sourcePaths, then it requires the whole source. For example, in case of `SourceFieldBlockLoader`. When merging StoredFieldsSpec then omit the sourcePaths of the other StoredFieldsSpec. Co-authored-by: Jordan Powers <jordan.powers@elastic.co>
In order to avoid a performance bottleneck when loading values for synthetic source source based block loaders should use source filtering.
This change is mostly mechanical and pushes field names and ignore source format to
StoredFieldSpec.Main changes:
sourcePathstoStoredFieldsSpec. This allows source loaders to apply aSourceFilter.ShardContext#newSourceLoader(...)method signature to accepts a set of fields to include.EsPhysicalOperationProviders.DefaultShardContextclass to create aSourceFilterand push that down toSearchExecutionContext#newSourceLoader(...).BlockSourceReader.SourceBlockLoader#rowStrideStoredFieldSpec(...)implementations to delegate toValueFetcher#storedFieldsSpec(...)to fetch the stored field specification.ArraySourceValueFetcherandSourceValueFetcherreturn stored field spec with sourcePaths set to its own configured sourcePaths.The main advantage of delegating to
ValueFetcher#storedFieldsSpec(...)is that the source paths are already know there. Only ignored source format needed to be added as field toArraySourceValueFetcherandSourceValueFetcher. Additionally in a follow up change, the search APIs can now also opt into use source loader to uses aSourceFilter.The
StoredFieldsSpec#sourcePathsis only used in compute engine to build aSourceFilterand not elsewhere. So other usages ofArraySourceValueFetcherandSourceValueFetcherwill just ignore the source paths.