Do not respect synthetic_source_keep=arrays if type parses arrays#127796
Conversation
|
Hi @parkertimmins, I've created a changelog YAML for you. |
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
|
Should we add the |
|
Hi @parkertimmins, I've updated the changelog YAML for you. |
There was a problem hiding this comment.
I think offset-based implementation would definitely be a better solution than "do nothing" approach we currently have. We can still do this later though and this PR does not block that work. So i am in favour of merging this.
Should we add a note about this here https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/geo-point#geo-point-synthetic-source?
| && (fieldMapper.syntheticSourceMode() == FieldMapper.SyntheticSourceMode.FALLBACK | ||
| || sourceKeepMode == Mapper.SourceKeepMode.ALL | ||
| || (sourceKeepMode == Mapper.SourceKeepMode.ARRAYS && context.inArrayScope()) | ||
| || (sourceKeepMode == Mapper.SourceKeepMode.ARRAYS && context.inArrayScope() && parsesArrayValue(mapper) == false) |
There was a problem hiding this comment.
This is a correct fix in my opinion. But now that you mention offsets i actually wonder how does this work if offsets are enabled. It seems like (sourceKeepMode == Mapper.SourceKeepMode.ARRAYS && context.inArrayScope() is true and we'll proceed with storing the value in ignored source. But that seems wrong since we should be using offsets?
There was a problem hiding this comment.
If we enable offsets, parsesArrayValue will still be true for geo_point, and we won't use ignored source, right? I'm not totally sure if this is safe though. Maybe by adding offsets, geo_point will be subject to the same issue that other types have when they are in an array context. In which case we would want to use ignored_source.
There was a problem hiding this comment.
Right, i was asking about existing fields that implement offsets logic like ip. Does it not have the same problem?
There was a problem hiding this comment.
I think existing fields with offsets do have this issue, and need to be use stored values. For example, if
The followings:
curl -XPUT localhost:9200/idx7 -H 'Content-Type: application/json' -d'
{
"settings": {
"index": {
"mapping": {
"source": {
"mode": "synthetic"
}
}
}
},
"mappings": {
"properties": {
"obj": {
"properties": {
"str": {
"synthetic_source_keep": "arrays",
"type": "keyword"
}
}
}
}
}
}
'
curl -XPOST localhost:9200/idx7/_doc -H 'Content-Type: application/json' -d'
{
"obj": [
{
"str": ["cat", "dog"]
},
{
"str": "cat"
}
]
}
'
Results in:
"_source": {
"obj": {
"str": [
"cat",
"dog"
]
}
}
Meaning we do need the stored source. Or is that not what you're talking about?
martijnvg
left a comment
There was a problem hiding this comment.
LGTM - Let's also backport this to 8.19 branch?
| "point": { "type": "geo_point" } | ||
| "point": { | ||
| "type": "geo_point", | ||
| "synthetic_source_keep": "arrays" |
There was a problem hiding this comment.
Do you want to flip the points here to emphasize that order is different? Right now it happens to be a correct order.
Co-authored-by: Oleksandr Kolomiiets <olkolomiiets@gmail.com>
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
…astic#127796) Types that parse arrays directly should not need to store values in _ignored_source if synthetic_source_keep=arrays. Since they have custom handling of arrays, it provides no benefit to store in _ignored_source when there are multiple values of the type. (cherry picked from commit c04a956)
…astic#127796) Types that parse arrays directly should not need to store values in _ignored_source if synthetic_source_keep=arrays. Since they have custom handling of arrays, it provides no benefit to store in _ignored_source when there are multiple values of the type.
…ays (#127796) (#127999) * Do not respect synthetic_source_keep=arrays if type parses arrays (#127796) Types that parse arrays directly should not need to store values in _ignored_source if synthetic_source_keep=arrays. Since they have custom handling of arrays, it provides no benefit to store in _ignored_source when there are multiple values of the type. (cherry picked from commit c04a956) * Broke some stuff while merging
Types that parse arrays directly should not need to store values in
_ignored_sourceifsynthetic_source_keep=arrays. Since they have custom handling of arrays, it provides no benefit to store in_ignored_sourcewhen there are multiple values of the type.For example, consider the document with
pointfield of typegeo_point:This set of points is not subject to
synthetic_source_keep=arrayssince thegeo_pointmapper does custom array parsing. Since this is the case, it does not make sense to require that the following equivalent document use_ignored_source:Currently, this second document does use
_ignored_source, which is a waste of space. This PR causes the second document to behave the same as the first, not using_ignored_source.Fixes #126155