Optimize geogrid functions to read points from doc-values#138917
craigtaverner merged 14 commits into elastic:main
Conversation
Pinging @elastic/es-analytical-engine (Team:Analytics)
Hi @craigtaverner, I've created a changelog YAML for you. |
Older nodes don't know about the optimization, so when running in mixed clusters they will get errors like:

`IllegalStateException: Expected [BYTES_REF] but was [LONG]`
Until now we only supported loading from doc-values when the spatial type was consumed by an aggregation, so this was not needed. But now the geogrid function can be used even without an aggregation. The new rule is that as long as the original spatial type is not returned in the final output (which would require the original source), we can load from doc-values. But TopNExec has an issue with the ElementType in this regard.
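A minimal sketch of the rule described above (all names here are hypothetical stand-ins, not the actual ES|QL planner API): a spatial field is eligible for doc-values loading when it is consumed by a spatial function or aggregation and the original field is not part of the final output.

```java
import java.util.Set;

public class DocValuesEligibility {
    /**
     * Hypothetical sketch: a spatial field may be loaded from doc-values when
     * it is used by a spatial function (e.g. ST_GEOTILE) or aggregation, and
     * the original field is not returned in the final output (which would
     * require reading the exact values from source).
     */
    static boolean canUseDocValues(String field, Set<String> usedBySpatialFunctions, Set<String> finalOutputFields) {
        return usedBySpatialFunctions.contains(field) && !finalOutputFields.contains(field);
    }

    public static void main(String[] args) {
        // EVAL grid = ST_GEOTILE(location, 2) | DROP location | SORT abbrev
        System.out.println(canUseDocValues("location", Set.of("location"), Set.of("grid", "abbrev"))); // true
        // EVAL grid = ST_GEOTILE(location, 2) | SORT abbrev (location still returned)
        System.out.println(canUseDocValues("location", Set.of("location"), Set.of("grid", "abbrev", "location"))); // false
    }
}
```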
It is entirely unneeded since this optimization runs on the data nodes only.
ncordon left a comment
I left a couple of questions and a suggestion to improve the test coverage
```diff
  */
 public class SpatialDocValuesExtraction extends PhysicalOptimizerRules.ParameterizedOptimizerRule<
-    AggregateExec,
+    UnaryExec,
```
Is UnaryExec a physical plan wrapping a single child?
I just needed a parent class in common with both AggregateExec and TopNExec, which are the two types I really care about. I could invent a new class just to bind these two "pipeline breakers" together, but thought it probably worked well enough to use the common parent class. This does mean we will look into this rule a lot more than needed, but I'm assuming the consequences of that are negligible.
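The trade-off described above can be sketched with stand-in classes (none of these are the actual planner classes): the rule is parameterized on the shared supertype, so it is offered every unary node, and it filters down to the two pipeline breakers itself.

```java
// Stand-in classes illustrating matching on a common parent type.
abstract class UnaryPlan {}              // stand-in for UnaryExec
class AggregatePlan extends UnaryPlan {} // stand-in for AggregateExec
class TopNPlan extends UnaryPlan {}      // stand-in for TopNExec
class EvalPlan extends UnaryPlan {}      // some other unary node

public class RuleMatching {
    /** The rule is offered every UnaryPlan, but only acts on the two pipeline breakers. */
    static boolean ruleApplies(UnaryPlan plan) {
        return plan instanceof AggregatePlan || plan instanceof TopNPlan;
    }

    public static void main(String[] args) {
        System.out.println(ruleApplies(new AggregatePlan())); // true
        System.out.println(ruleApplies(new TopNPlan()));      // true
        System.out.println(ruleApplies(new EvalPlan()));      // false: visited but skipped
    }
}
```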
```java
List<Layout.ChannelSet> inverse = source.layout.inverse();
for (int channel = 0; channel < inverse.size(); channel++) {
    elementTypes[channel] = PlannerUtils.toElementType(inverse.get(channel).type());
    var fieldExtractPreference = fieldExtractPreference(topNExec, inverse.get(channel).nameIds());
```
Probably I'm misreading this, but we could have several attributes here in the TopN. Are we going to treat them all as doc-values if any of them are? Is that correct?
The decision should be per-channel, so if one requires doc-values, only that one will get that ElementType change. There should be no leaking between channels.
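A simplified sketch of that per-channel behaviour (types and names here are illustrative, not the actual planner code): each channel's element type is decided on its own, so one channel preferring doc-values does not affect the others.

```java
import java.util.List;

public class PerChannelTypes {
    enum ElementType { BYTES_REF, LONG }

    /** Each channel is mapped independently; a doc-values preference only changes that channel. */
    static ElementType[] elementTypes(List<Boolean> prefersDocValues) {
        ElementType[] types = new ElementType[prefersDocValues.size()];
        for (int channel = 0; channel < types.length; channel++) {
            // Spatial points read from doc-values arrive as encoded longs;
            // points read from source arrive as WKB bytes.
            types[channel] = prefersDocValues.get(channel) ? ElementType.LONG : ElementType.BYTES_REF;
        }
        return types;
    }

    public static void main(String[] args) {
        // Only channel 0 prefers doc-values; channels 1 and 2 stay BYTES_REF.
        ElementType[] types = elementTypes(List.of(true, false, false));
        System.out.println(java.util.Arrays.toString(types)); // [LONG, BYTES_REF, BYTES_REF]
    }
}
```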
```java
if (aggExpr instanceof Alias as && as.child() instanceof SpatialAggregateFunction af) {
    if (af.field() instanceof FieldAttribute fieldAttribute
        && allowedForDocValues(fieldAttribute, ctx.searchStats(), agg, foundAttributes)) {
        foundAttributes.add(fieldAttribute);
```
We are missing tests with geospatial aggregations I think, so this code path is never hit. Could we add one for the csv spec and one for the physical planning tests please?
Actually, this was the previously supported case (before this PR), so there should be lots of tests for this, in both PhysicalPlanOptimizerTests and csv-spec tests. Only the former asserts that doc-values are used, while the latter asserts results are correct. In PhysicalPlanOptimizerTests I see 11 tests using ST_EXTENT_AGG and 9 tests using ST_CENTROID_AGG. These focused on this code path. The new behaviour added in this PR is that we now also use SORT (topN) and non-spatial aggregations to trigger this, and we only added three new tests:
- testSpatialTypesAndSortGeoGridUseDocValues (tests geogrid with SORT)
- testSpatialTypesAndStatsGeoGridUseDocValues (tests geogrid with COUNT aggregation)
- testSpatialTypesAndSortGeoGridUseDocValues2 (enhanced test with sort to test case where we drop or not drop the original spatial column, which affects the choice)
There were also lots of older csv-spec tests that covered spatial aggregations, as well as non-spatial aggregations (usually COUNT). This PR only added tests for the geogrid cases, since those were not covered yet.
Are there specific combinations that you think I've missed? Or perhaps you would like to see more COUNT cases, since I only added one physical planner test for that scenario?
…8917)

The ability to read `geo_point` from doc-values is a per-data-node decision, based on SearchStats for the shards on that data node. As such, we can only perform FieldExtraction from doc-values if the geogrid function is run on the data node. This is currently only ensured when the function occurs before a pipeline breaker (like STATS or SORT), so this optimization only applies to queries where the geogrid function occurs before those commands. We specifically test queries like:

```
FROM airports | EVAL grid = ST_GEOTILE(location, 2) | STATS count=COUNT() BY grid
```

This will be optimized to read from doc-values.

```
FROM airports | EVAL grid = ST_GEOTILE(location, 2) | DROP location | SORT abbrev
```

This will also be optimized to read from doc-values. While a query like this will not be optimized:

```
FROM airports | EVAL grid = ST_GEOTILE(location, 2)
```

And neither will this one, because the original `location` is not dropped:

```
FROM airports | EVAL grid = ST_GEOTILE(location, 2) | SORT abbrev
```

Optimizing these last two requires ensuring the geogrid function is run on the data node. One way to do that would be to push the function into the field loader itself, which is something we plan to do when optimizing support for geo_shape loading, so it will not be attempted in this PR.
Recent [optimizations in geo-grid functions](#138917) exposed some inconsistencies in other spatial functions that can return results either quantized (from doc-values or the Lucene index) or not (from source).

ES|QL typically returns fields from doc-values, for performance reasons. However, for geospatial point data this means a slight loss of precision, because geo_point and cartesian_point are quantized from two doubles (128 bits) down to one long (64 bits), in both doc-values and the Lucene index (but not in stored fields or source). For all real-world use cases the remaining precision is fine, and something most users are willing to trade for the performance advantages. However, that willingness usually only extends to analytics; if the user simply returns the original field, they usually want to see the exact original values. For this reason geospatial data is always returned from source in ES|QL, at a huge performance cost. To compensate, we have implemented a number of optimizations that try to make use of doc-values whenever possible, that is, whenever the user does not return the original points and so will not see the precision loss.

As we've expanded the scope of these optimizations, we've encountered a BWC issue with two particular functions, ST_X and ST_Y (both GA), and a less concerning issue with a group of related functions (tech preview): ST_ENVELOPE, ST_XMAX, ST_XMIN, ST_YMAX and ST_YMIN. This PR fixes the inconsistency, making ST_X, ST_Y and the envelope functions all produce quantized results, so optimizations do not result in different values. Users can still get the original values with full precision simply by not dropping the original geometry field, which will still be read from source. This gives users control over the precision-vs-performance lever: drop the original field to maximize performance, or keep it to see the original precision.
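The quantization trade-off can be illustrated with a simplified encoding (this is not Lucene's exact GeoEncodingUtils formula, just the same idea): each coordinate is scaled to a signed 32-bit integer and both are packed into one 64-bit long, so the round-trip error stays below roughly 1e-7 degrees.

```java
public class PointQuantization {
    // Simplified quantization: latitude in [-90, 90] and longitude in
    // [-180, 180] each scaled to the signed 32-bit integer range.
    static int encodeLat(double lat) { return (int) Math.round(lat / 90.0 * Integer.MAX_VALUE); }
    static int encodeLon(double lon) { return (int) Math.round(lon / 180.0 * Integer.MAX_VALUE); }
    static double decodeLat(int enc) { return enc * 90.0 / Integer.MAX_VALUE; }
    static double decodeLon(int enc) { return enc * 180.0 / Integer.MAX_VALUE; }

    public static void main(String[] args) {
        double lat = 59.914, lon = 10.752;  // two doubles: 128 bits
        // Pack both 32-bit encodings into one long: 64 bits total.
        long packed = ((long) encodeLat(lat) << 32) | (encodeLon(lon) & 0xFFFFFFFFL);
        double lat2 = decodeLat((int) (packed >> 32));
        double lon2 = decodeLon((int) packed);
        // The round-trip error is tiny (sub-centimetre) but generally nonzero,
        // which is why exact values must come from source.
        System.out.println(Math.abs(lat - lat2) < 1e-7); // true
        System.out.println(Math.abs(lon - lon2) < 1e-7); // true
    }
}
```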