Add Optional Source Filtering to Source Loaders#113827
Conversation
Spinoff of elastic#113036. This change introduces optional source filtering directly within source loaders (both synthetic and stored). The main benefit is seen in synthetic source loaders, as synthetic fields are stored independently. By filtering while loading the synthetic source, generating the source becomes linear in the number of fields that match the filter. This update also modifies the get document API to apply source filters earlier—directly through the source loader. The search API, however, is not affected in this change, since the loaded source is still used by other features (e.g., highlighting, fields, nested hits), and source filtering is always applied as the final step. A follow-up will be required to ensure careful handling of all search-related scenarios.
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
|
Hi @jimczi, I've created a changelog YAML for you. |
.../impl/src/main/java/org/elasticsearch/xcontent/provider/XContentParserConfigurationImpl.java
Outdated
Show resolved
Hide resolved
| protected SourceLoader.SyntheticFieldLoader syntheticFieldLoader(Stream<Mapper> mappers, boolean isFragment) { | ||
| var fields = mappers.sorted(Comparator.comparing(Mapper::fullPath)) | ||
| .map(Mapper::syntheticFieldLoader) | ||
| SourceLoader.SyntheticFieldLoader syntheticFieldLoader(SourceFilter filter, Collection<Mapper> mappers, boolean isFragment) { |
There was a problem hiding this comment.
Nit: keep passing the stream to avoid reconverting the list to a stream?
I also wonder about the overhead here, shall we skip streams altogether and use good-old collections?
server/src/main/java/org/elasticsearch/index/mapper/ObjectMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/ObjectMapper.java
Outdated
Show resolved
Hide resolved
|
Looks good overall, added Martijn and Sasha to get another look, since this is a core part of synthetic source. Do we have any benchmark results quantifying the impact of this? |
libs/x-content/src/main/java/org/elasticsearch/xcontent/XContentParserConfiguration.java
Outdated
Show resolved
Hide resolved
lkts
left a comment
There was a problem hiding this comment.
This is great, synthetic source changes look good to me. I wonder if there is any performance impact when filtering is not used compared to current state.
server/src/main/java/org/elasticsearch/index/mapper/XContentDataHelper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/XContentDataHelper.java
Outdated
Show resolved
Hide resolved
|
@jimczi any chance you can get back to submitting this? It should also help with fetching fields from synthetic source in ES|QL, with proper block loader adjustments. |
|
@kkrik-es I have updated the branch and can proceed with merging this PR as is, provided there are no objections. Please let me know if you have any concerns or additional feedback. |
|
I think we've all approved, very excited to see it submitted. |
💔 Backport failed
You can use sqren/backport to manually backport by running |
This change introduces optional source filtering directly within source loaders (both synthetic and stored). The main benefit is seen in synthetic source loaders, as synthetic fields are stored independently. By filtering while loading the synthetic source, generating the source becomes linear in the number of fields that match the filter. This update also modifies the get document API to apply source filters earlier—directly through the source loader. The search API, however, is not affected in this change, since the loaded source is still used by other features (e.g., highlighting, fields, nested hits), and source filtering is always applied as the final step. A follow-up will be required to ensure careful handling of all search-related scenarios.
This change introduces optional source filtering directly within source loaders (both synthetic and stored). The main benefit is seen in synthetic source loaders, as synthetic fields are stored independently. By filtering while loading the synthetic source, generating the source becomes linear in the number of fields that match the filter. This update also modifies the get document API to apply source filters earlier—directly through the source loader. The search API, however, is not affected in this change, since the loaded source is still used by other features (e.g., highlighting, fields, nested hits), and source filtering is always applied as the final step. A follow-up will be required to ensure careful handling of all search-related scenarios.
Spinoff of #113036.
This change introduces optional source filtering directly within source loaders (both synthetic and stored). The main benefit is seen in synthetic source loaders, as synthetic fields are stored independently. By filtering while loading the synthetic source, generating the source becomes linear in the number of fields that match the filter.
This update also modifies the get document API to apply source filters earlier/directly through the source loader. The search API, however, is not affected in this change, since the loaded source is still used by other features (e.g., highlighting, fields, nested hits), and source filtering is always applied as the final step. A follow-up will be required to ensure careful handling of all search-related scenarios.