Bugfix to doc-values and multi-value support in spatial functions (ST_ENVELOPE, ST_XMAX, etc.) #139932
* Made ST_ENVELOPE, ST_XMAX, etc. multi-value aware
* Refactored doc-values support (so ST_ENVELOPE, ST_XMAX, etc. and all geogrid functions are similar)

This is a partial back-port of elastic#139618, which fixed the mv and doc-values issues in main, but also added ST_NPOINTS. This is now a bug-fix only PR, with all support for ST_NPOINTS and ST_SIMPLIFY removed, so 9.3.0 only gets the bugfix and no new features.
The optimization of doc-values extraction for geo-grid functions in #138917 expanded the range of scenarios in which doc-values can be extracted, and exposed a pre-existing bug that was previously unlikely to be found but is now much more likely to be noticed. That PR did fix the issue, but did not cover all possible cases. The issue was that if the original `geo_point` field was returned to the coordinator, it would be in a `LongBlock` and not understood by any coordinator-level functions (since the doc-values optimization only works on data nodes). Previously this was very unlikely to be encountered by a user, because the doc-values optimization would only trigger with spatial aggregations, like `ST_EXTENT_AGG` or `ST_CENTROID_AGG`, which consume the original point and do not return it unless it is used in the `BY` clause:

Such a query is not useful and would likely never be typed by a user. However, since the geo-grid functions are not aggregating functions, the geo-grid optimization expanded the cases where doc-values extraction can occur to a much wider set, including before `SORT` commands. This makes it much easier for a user to create a failing query, for example:
The geo-grid optimization in #138917 did fix this scenario by requiring that the original `geo_point` field is not returned, so the optimization would only work if the user dropped that field:
At first, this looks sufficient. However, it overlooked that other spatial functions could be included in the `EVAL`, and those functions could not handle the extracted doc-values.
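The mismatch can be illustrated with a minimal Python sketch. This is not the Elasticsearch implementation: it assumes a toy encoding in which a point is packed into one 64-bit integer (latitude in the high 32 bits, longitude in the low 32 bits) to stand in for the doc-values form, and a plain `(x, y)` tuple to stand in for the ordinary geometry form. The names `encode_point`, `decode_point`, `st_x_source_only`, and `st_x_aware` are hypothetical.

```python
import math

# Assumed toy scale factors: map degrees onto the signed 32-bit integer range.
LAT_SCALE = (1 << 32) / 180.0
LON_SCALE = (1 << 32) / 360.0
INT32_MAX = (1 << 31) - 1


def encode_point(x, y):
    """Pack longitude x and latitude y into one integer (toy doc-values form)."""
    lat = min(math.floor(y * LAT_SCALE), INT32_MAX)
    lon = min(math.floor(x * LON_SCALE), INT32_MAX)
    return ((lat & 0xFFFFFFFF) << 32) | (lon & 0xFFFFFFFF)


def _to_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed one."""
    return value - (1 << 32) if value >= (1 << 31) else value


def decode_point(encoded):
    """Unpack the toy doc-values integer back into an (x, y) tuple."""
    lon = _to_signed32(encoded & 0xFFFFFFFF)
    lat = _to_signed32((encoded >> 32) & 0xFFFFFFFF)
    return (lon / LON_SCALE, lat / LAT_SCALE)


def st_x_source_only(point):
    """A function that only understands the (x, y) form and rejects encoded longs."""
    if not isinstance(point, tuple):
        raise TypeError("expected an (x, y) point, got an encoded long")
    return point[0]


def st_x_aware(point):
    """A doc-values aware variant: decode first when handed an encoded long."""
    if isinstance(point, int):
        point = decode_point(point)
    return point[0]
```

In this sketch, passing the result of `encode_point` to `st_x_source_only` raises, while `st_x_aware` accepts both forms. That is the kind of gap the paragraph above describes for spatial functions that receive extracted doc-values.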
This bug-fix makes sure that all spatial functions can be made doc-values aware and work consistently with multi-valued fields. It is based on the work in #139618, but with the `ST_NPOINTS` and `ST_SIMPLIFY` features removed, leaving only bug-fixes:
This is a partial back-port of #139618, which fixed the mv and doc-values issues in main, but also added ST_NPOINTS. This is now a bug-fix only PR, with all support for ST_NPOINTS and ST_SIMPLIFY removed, so 9.3.0 only gets the bugfix and no new features.
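As a rough illustration of what "multi-value aware" and "doc-values aware" could mean together, here is a minimal Python sketch. It is not the code in this PR. It assumes the same toy integer encoding as a stand-in for doc-values, and it assumes, for illustration only, that a multi-valued field is reduced to the maximum x over all of its values; the PR text does not spell out these semantics. The name `st_xmax` mirrors the ES|QL function, but the body and the helper `as_points` are hypothetical.

```python
from typing import Optional

# Assumed toy scale factors, standing in for the real doc-values encoding.
LAT_SCALE = (1 << 32) / 180.0
LON_SCALE = (1 << 32) / 360.0


def _to_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed one."""
    return value - (1 << 32) if value >= (1 << 31) else value


def decode_point(encoded):
    """Unpack a toy doc-values integer into an (x, y) tuple."""
    lon = _to_signed32(encoded & 0xFFFFFFFF)
    lat = _to_signed32((encoded >> 32) & 0xFFFFFFFF)
    return (lon / LON_SCALE, lat / LAT_SCALE)


def as_points(value):
    """Normalize null, single, or multi-valued input into a list of (x, y) tuples."""
    if value is None:
        return []
    if isinstance(value, (int, tuple)):
        value = [value]  # a single value is treated as a one-element multi-value
    return [decode_point(v) if isinstance(v, int) else v for v in value]


def st_xmax(value) -> Optional[float]:
    """Largest x over all values, regardless of how each value is encoded."""
    points = as_points(value)
    return max(x for x, _ in points) if points else None
```

The point of the design is that normalization happens in one place (`as_points`), so every function built on top of it treats single values, multi-values, and encoded values the same way.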