Quantize ST_X, ST_Y and related functions by craigtaverner · Pull Request #140963 · elastic/elasticsearch

craigtaverner · 2026-01-20T11:53:38Z

Recent optimizations in geo-grid functions (#138917) exposed some inconsistencies in other spatial functions, which can return results that are either quantized (read from doc-values or the Lucene index) or not (read from source).

ES|QL typically returns fields from doc-values, for performance reasons. For geospatial point data, however, this means a slight loss of precision, because geo_point and cartesian_point are quantized from two doubles (128 bits) down to one long (64 bits), in both doc-values and the Lucene index (but not in stored fields or source). For all real-world use cases the remaining precision is fine, and it is something most users are willing to trade for the performance advantages. That willingness usually only extends to analytics, though: a user who simply returns the original field usually wants to see the exact original values. For this reason geospatial data is always returned from source in ES|QL, at a huge performance cost. We have, however, implemented a number of optimizations that make use of doc-values whenever possible, that is, whenever the user does not return the original points and so will not see the precision loss. As we've expanded the scope of these optimizations, we've encountered a BWC issue with two particular functions, ST_X and ST_Y (both GA), and a less concerning issue with a group of related functions (tech-preview): ST_ENVELOPE, ST_XMAX, ST_XMIN, ST_YMAX and ST_YMIN.

This PR fixes the inconsistency, making ST_X, ST_Y and the envelope functions all produce quantized results, so optimizations do not result in different values.

Users can still get the original values at full precision simply by not dropping the original geometry field, which will still be read from source. This gives users control over the precision-vs-performance lever: drop the original field to maximize performance, or keep it to see the original precision.
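To make the "two doubles down to one long" quantization concrete, here is a minimal, self-contained sketch for geo_point. It is an illustration under assumptions, not the Elasticsearch implementation: it assumes a grid of 32 bits per dimension (the scheme Lucene's `GeoEncodingUtils` is understood to use) and packs latitude into the high half of the long. The class and method names are hypothetical. Cartesian points are not covered here.

```java
// Hypothetical sketch: snap a (lat, lon) pair to an assumed 32-bit-per-dimension grid.
final class GeoQuantizationSketch {
    // Assumed grid resolution: the full coordinate range divided into 2^32 cells.
    static final double LAT_STEP = 180.0 / (1L << 32);
    static final double LON_STEP = 360.0 / (1L << 32);

    static int encodeLat(double lat) {
        // The maximum value cannot be encoded without overflowing 32 bits.
        if (lat == 90.0) {
            lat = Math.nextDown(lat);
        }
        return (int) Math.floor(lat / LAT_STEP);
    }

    static int encodeLon(double lon) {
        if (lon == 180.0) {
            lon = Math.nextDown(lon);
        }
        return (int) Math.floor(lon / LON_STEP);
    }

    static double decodeLat(int encoded) {
        return encoded * LAT_STEP;
    }

    static double decodeLon(int encoded) {
        return encoded * LON_STEP;
    }

    // Two 32-bit cells packed into one 64-bit long (latitude in the high bits, by assumption).
    static long pack(int latEncoded, int lonEncoded) {
        return ((long) latEncoded << 32) | (lonEncoded & 0xFFFFFFFFL);
    }

    static int unpackLat(long packed) {
        return (int) (packed >>> 32);
    }

    static int unpackLon(long packed) {
        return (int) packed;
    }

    // Quantizing means encoding to the grid and decoding back to a double.
    static double quantizeLat(double lat) {
        return decodeLat(encodeLat(lat));
    }

    static double quantizeLon(double lon) {
        return decodeLon(encodeLon(lon));
    }

    public static void main(String[] args) {
        double lat = 48.8566;
        double lon = 2.3522;
        System.out.println("source:    " + lat + ", " + lon);
        System.out.println("quantized: " + quantizeLat(lat) + ", " + quantizeLon(lon));
    }
}
```

Under these assumptions the quantized value differs from the source value by less than one grid step (below 1e-7 degrees), and quantizing an already quantized value leaves it unchanged, which is the property that lets source-based and doc-values-based evaluation agree once both are quantized.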

Fixes #139943

elasticsearchmachine · 2026-01-20T11:54:04Z

Hi @craigtaverner, I've created a changelog YAML for you.

github-actions · 2026-01-20T18:49:10Z

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.


When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

🤔 Need help?

Check out the cumulative docs guidelines
Reach out in the #docs Slack channel

elasticsearchmachine · 2026-01-21T13:21:58Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

github-actions · 2026-01-21T14:50:39Z

🔍 Preview links for changed docs

iverase · 2026-01-21T15:28:00Z

...rc/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/spatial/StEnvelope.java

@@ -137,6 +137,24 @@ protected NodeInfo<? extends Expression> info() {
    }

    static void buildEnvelopeResults(BytesRefBlock.Builder results, Rectangle rectangle) {


Can we call this method buildCartesianEnvelopeResults? It is confusing as it is now. And we can probably add a method that takes an encoder, to avoid duplication.

OK. The missing Cartesian problem was in many places (notably evaluators), so I fixed them all by doing a general cleanup making sure all evaluators have the correct Geo/Cartesian naming, as well as putting it always before the WKB/DocValues (some were after). I'll push this, and then work on reducing code duplication, which is also in more places than just this one method.

OK. I've entirely removed duplicated code, by also using the pattern we used for sharing state in StGeotile. We create the visitor once per thread, instead of once per page, making fewer objects (the same optimization we used in the geo-grid functions). And I made sure we used the embedded SpatialCoordinateTypes inside the resultsBuilder, so we did not need multiple methods.

And I was therefore also able to reduce the total number of generated evaluator classes. I imagine I could use this trick elsewhere to cut the generated code down a bit. But for now I think this is enough for this PR!
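The refactoring discussed in this thread (one results builder that carries its coordinate type, instead of separate Geo and Cartesian methods) can be sketched roughly as follows. This is a hypothetical illustration, not the code in the PR: `CoordinateType` stands in for the real `SpatialCoordinateTypes`, the geo grid is assumed to have 32 bits per dimension, and Cartesian quantization is assumed to be a round trip through `float`.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch: one envelope builder parameterized by coordinate type.
final class EnvelopeResultsSketch {
    // Stand-in for the real coordinate-type abstraction; both quantizers are assumptions.
    enum CoordinateType {
        GEO(x -> snapToGrid(x, 360.0), y -> snapToGrid(y, 180.0)),
        CARTESIAN(x -> (double) (float) x, y -> (double) (float) y);

        final DoubleUnaryOperator quantizeX;
        final DoubleUnaryOperator quantizeY;

        CoordinateType(DoubleUnaryOperator quantizeX, DoubleUnaryOperator quantizeY) {
            this.quantizeX = quantizeX;
            this.quantizeY = quantizeY;
        }
    }

    // Snap a value down to a grid that divides the given range into 2^32 cells.
    static double snapToGrid(double value, double range) {
        double step = range / (1L << 32);
        return Math.floor(value / step) * step;
    }

    // A single method serves both coordinate systems; returns {minX, maxX, minY, maxY}.
    static double[] buildEnvelope(CoordinateType type, double minX, double maxX, double minY, double maxY) {
        return new double[] {
            type.quantizeX.applyAsDouble(minX),
            type.quantizeX.applyAsDouble(maxX),
            type.quantizeY.applyAsDouble(minY),
            type.quantizeY.applyAsDouble(maxY) };
    }

    public static void main(String[] args) {
        double[] geo = buildEnvelope(CoordinateType.GEO, 2.3522, 13.405, 48.8566, 52.52);
        double[] xy = buildEnvelope(CoordinateType.CARTESIAN, 0.1, 0.7, 0.2, 0.9);
        System.out.println(java.util.Arrays.toString(geo));
        System.out.println(java.util.Arrays.toString(xy));
    }
}
```

Because the coordinate type travels with the builder, the caller chooses the quantization once, which is the kind of change that would allow fewer generated evaluator variants.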


ncordon

LGTM just left a couple of questions

ncordon · 2026-01-22T18:14:30Z

docs/reference/query-languages/esql/limitations.md

+The spatial types `geo_point`, `geo_shape`, `cartesian_point` and `cartesian_shape` are maintained at source precision in the original documents,
+but indexed at reduced precision by Lucene, for performance reasons.
+To ensure this optimization is available in the widest context, all [spatial functions](/reference/query-languages/esql/functions-operators/spatial-functions.md) will produce results
+at this reduced precision, aligned with the underlying Lucene index grid.


Are we explaining what's the Lucene grid in the docs somewhere before this?

Good point. I understood that we describe this in the Query DSL and ingest docs, but I should confirm this and add links. I've discussed this briefly with the docs team, and we want to re-review these docs next week and do a followup PR with any touch-ups.

ncordon · 2026-01-23T08:56:18Z

...rnalClusterTest/java/org/elasticsearch/xpack/esql/spatial/SpatialPushDownPointsTestCase.java

+                    greaterThan(0)
+                );
+                for (int column = 1; column < 8; column++) {
+                    if (index > 0) {


I couldn't understand why this condition is here so I commented it out and tests pass. Why?

The test is mostly comparing all indexes to index 0 (which is the fully indexed case). So commenting out that condition will compare index 0 to itself, which is fine, but redundant. See line 186 for where we get index 0 from as the baseline to compare against.

But I see your point. This test was based on logic in other tests that have only one column and compare all indexes to the first index. But this test also compares other columns to column 0, and that comparison is valid for index 0 also. So that line should be removed, as you suggest. Pity this PR is already merged! A followup would make sense.

Since ST_X and ST_Y return slightly different precision in older versions, mixed clusters will return no results due to precision differences in the WHERE clause. We could consider relaxing equality for spatial types, but that is expensive (needs to convert WKB to Points, quantize and then back to WKB).

Recent [optimizations in geo-grid functions](elastic#138917) exposed some inconsistencies in other spatial functions, which can return results that are either quantized (read from doc-values or the Lucene index) or not (read from source). ES|QL typically returns fields from doc-values, for performance reasons. For geospatial point data, however, this means a slight loss of precision, because geo_point and cartesian_point are quantized from two doubles (128 bits) down to one long (64 bits), in both doc-values and the Lucene index (but not in stored fields or source). For all real-world use cases the remaining precision is fine, and it is something most users are willing to trade for the performance advantages. That willingness usually only extends to analytics, though: a user who simply returns the original field usually wants to see the exact original values. For this reason geospatial data is always returned from source in ES|QL, at a huge performance cost. We have, however, implemented a number of optimizations that make use of doc-values whenever possible, that is, whenever the user does not return the original points and so will not see the precision loss. As we've expanded the scope of these optimizations, we've encountered a BWC issue with two particular functions, ST_X and ST_Y (both GA), and a less concerning issue with a group of related functions (tech-preview): ST_ENVELOPE, ST_XMAX, ST_XMIN, ST_YMAX and ST_YMIN. This PR fixes the inconsistency, making ST_X, ST_Y and the envelope functions all produce quantized results, so optimizations do not result in different values. Users can still get the original values at full precision simply by not dropping the original geometry field, which will still be read from source. This gives users control over the precision-vs-performance lever: drop the original field to maximize performance, or keep it to see the original precision.


craigtaverner added the >bug, :Analytics/Geo, Team:Analytics, :Analytics/ES|QL, branch:9.2, v9.4.0 and branch:9.3 labels Jan 20, 2026

elasticsearchmachine added v9.3.1 v9.2.5 and removed branch:9.2 branch:9.3 labels Jan 20, 2026

Quantize ST_X, ST_Y and related functions

2e5ca74

craigtaverner force-pushed the quantize_st_xy branch from d33a5f7 to 2e5ca74 Compare January 21, 2026 13:20

craigtaverner marked this pull request as ready for review January 21, 2026 13:21

craigtaverner requested review from iverase and ncordon January 21, 2026 13:21

Add more information to the docs concerning spatial precision

f1be07c

iverase reviewed Jan 21, 2026


craigtaverner added the auto-backport label Jan 21, 2026

craigtaverner and others added 7 commits January 22, 2026 12:50

Refactored all evaluator names to include Cartesian/Geo before WKB/Do…

fe35d37

…cValues This increases similarity and consistency and reduces confusion

Merge remote-tracking branch 'origin/main' into quantize_st_xy

dcebb30

Fixed mistake in quantizing geo-envelope using CARTESIAN

0b73ff2

Also removed incorrect multiple quantization for doc-values results

Make envelope points accumulator thread-local for memory efficiency

62896b1

Reduce code duplication using spatial coordinate type in builder

c9025bf

Fix multi-threading issue with PointVisitor in ST_ENVELOPE et al

6ae5b5f

We were using a supplier for the results builder, but mistakenly passing in a shared points visitor, instead of making a new one when the builder was initiated.
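The bug described in this commit message can be illustrated with a small sketch. It is a hypothetical reconstruction of the pattern, not the PR's code: `PointVisitor` and `ResultsBuilder` here are simplified stand-ins with invented fields. The point is that a supplier only isolates state if everything mutable is created inside it.

```java
import java.util.function.Supplier;

// Hypothetical sketch: a supplier that leaks shared mutable state versus one that does not.
final class PerBuilderVisitorSketch {
    // Mutable accumulator; unsafe to share between builders running on different threads.
    static final class PointVisitor {
        double minX = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY;

        void reset() {
            minX = Double.POSITIVE_INFINITY;
            maxX = Double.NEGATIVE_INFINITY;
        }

        void visit(double x) {
            minX = Math.min(minX, x);
            maxX = Math.max(maxX, x);
        }
    }

    static final class ResultsBuilder {
        final PointVisitor visitor;

        ResultsBuilder(PointVisitor visitor) {
            this.visitor = visitor;
        }

        // Returns {minX, maxX} for the given x coordinates.
        double[] extent(double... xs) {
            visitor.reset();
            for (double x : xs) {
                visitor.visit(x);
            }
            return new double[] { visitor.minX, visitor.maxX };
        }
    }

    // Buggy shape: the visitor is created once, outside the supplier, so every builder shares it.
    static Supplier<ResultsBuilder> sharedVisitorFactory() {
        PointVisitor shared = new PointVisitor();
        return () -> new ResultsBuilder(shared);
    }

    // Fixed shape: each call to get() creates a builder with its own visitor.
    static Supplier<ResultsBuilder> perBuilderFactory() {
        return () -> new ResultsBuilder(new PointVisitor());
    }

    public static void main(String[] args) {
        Supplier<ResultsBuilder> factory = perBuilderFactory();
        ResultsBuilder a = factory.get();
        ResultsBuilder b = factory.get();
        System.out.println("distinct visitors: " + (a.visitor != b.visitor));
    }
}
```

With the fixed factory, a builder obtained once per thread still reuses its visitor across pages, so the allocation saving is kept without sharing state between threads.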

[CI] Auto commit changes from spotless

4d2a4ec

craigtaverner added 3 commits January 22, 2026 18:58

Reduce the number of evaluators since we've moved GEO/Cartesian into …

1d08572

…the results builders

Merge branch 'quantize_st_xy' of github.com:craigtaverner/elasticsear…

93fa6ea

…ch into quantize_st_xy

Merge branch 'main' into quantize_st_xy

0aedd14

iverase approved these changes Jan 22, 2026


ncordon approved these changes Jan 23, 2026


craigtaverner added 2 commits January 23, 2026 10:59

Merge remote-tracking branch 'origin/main' into quantize_st_xy

42477fe

craigtaverner merged commit a380117 into elastic:main Jan 23, 2026
34 of 35 checks passed

elasticsearchmachine added the backport pending label Jan 23, 2026

craigtaverner mentioned this pull request Jan 23, 2026

[9.3] Quantize ST_X, ST_Y and related functions (#140963) #141201

Merged

craigtaverner merged 14 commits into elastic:main from craigtaverner:quantize_st_xy
