Skip to content

Fix ST_DISTANCE handling of invalid geometry literals that fold to null#140116

Merged
craigtaverner merged 12 commits intoelastic:mainfrom
craigtaverner:geometry_literal_npe
Jan 6, 2026
Merged

Fix ST_DISTANCE handling of invalid geometry literals that fold to null#140116
craigtaverner merged 12 commits intoelastic:mainfrom
craigtaverner:geometry_literal_npe

Conversation

@craigtaverner
Copy link
Contributor

@craigtaverner craigtaverner commented Jan 2, 2026

Occasionally on serverless we see a NullPointerException in parsing a geometry literal within ST_DISTANCE functions. This turned out to be hard to reproduce since we cannot see the original query causing this problem. It has been reported in #138594, including a full stack track confirming that it occurs during planning, while folding literals.

It turned out to be reproducible with a query like:

ROW point = MV_SLICE(["POINT(0)"], 0, 0)::geo_point
| EVAL distance = ST_DISTANCE(point, point)

Here the WKT is invalid, and the fold over the MV_SLICE would return literal null values, since the inner TO_GEOPOINT function was throwing exceptions (since the WKT was invalid). It turned out that MvSlice was using EvaluatorMapper.fold for folding the children, which returns literal java null values, and any function using these values needs to also handle literal nulls. So, fold() is expected to explicitly return null values when input is null, and the implementation of fold in StDistance did not do this.

This PR now covers all descendents of BinarySpatialFunction:

  • ST_DISTANCE
  • ST_INTERSECTS
  • ST_DISJOINT
  • ST_CONTAINS
  • ST_WITHIN

Fixes #138594

@craigtaverner craigtaverner added >bug :Analytics/Geo Indexing, search aggregations of geo points and shapes backport Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL branch:9.3 labels Jan 2, 2026
"Unsupported combination of literal [" + value.getClass().getSimpleName() + "] of type [" + dataType + "]"
);
var objType = (value == null ? "null" : value.getClass().getSimpleName());
throw new IllegalArgumentException("Unsupported combination of literal [" + objType + "] of type [" + dataType + "]");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there a difference between the Java null value, and an object of type null (i.e., ESQL's NULL)? Or is the former the way we represent the latter in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the java null should never appear here, which is why I've been unable to reproduce this. I think the actual bug lies elsewhere and is creating this java null during the fold operation, somewhere in another function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it seems my st_distance.fold function was not correctly following null folding rules. We need explicit support for nulls here, and I've changed this code to return null instead of throwing an exception.

Copy link
Contributor

@ivancea ivancea Jan 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got to reproduce this with:

ROW x = MV_SLICE(["1"], 0, 0)::geo_point
| EVAL y = st_distance(x, x)

I had to add that MV_SLICE part, that ends up being null, but probably avoids some other null folding rules

Copy link
Contributor Author

@craigtaverner craigtaverner Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Ivan, that helped me track down and fix the real bug. It appears that fold needs to explicitly handle nulls. The default implementation of fold in EvaluatorMapper returns a raw java null, and all overriding implementations appear to need to do that too.

@craigtaverner craigtaverner changed the title Geometry Literal: Change NullPointerException to IllegalArgumentException Jan 5, 2026
@craigtaverner craigtaverner added branch:9.2 auto-backport Automatically create backport pull requests when merged and removed backport labels Jan 5, 2026
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @craigtaverner, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Hi @craigtaverner, I've updated the changelog YAML for you.

throw new IllegalArgumentException("Failed to fold constant fields: " + e.getMessage(), e);
}
protected SpatialRelations getSpatialRelations() {
return (crsType() == SpatialCrsType.GEO) ? GEO : CARTESIAN;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant parens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this kind of ternary expression easier to read when the condition is within parentheses. But your point is valid, I've been using parenthesis so long I started thinking they were required...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I did this anyway, since the expression is so simple.

return foldGeoGrid(ctx, left(), right(), right().dataType());
}
}
GeometryDocValueReader docValueReader = asGeometryDocValueReader(ctx, crsType(), left());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, you could shave a few cycles if you early exit here when docValuesReader is already null (though IIUC folding is done during planning so is probably not on the hot path).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, not on the hot path. I think the code is simpler with a single if.

long gridId = (Long) valueOf(ctx, gridExp);
return GEO.compareGeometryAndGrid(makeGeometryFromLiteral(ctx, spatialExp), gridId, gridType);
protected SpatialRelations getSpatialRelations() {
return (crsType() == SpatialCrsType.GEO) ? GEO : CARTESIAN;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant parens

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

long gridId = (Long) valueOf(ctx, gridExp);
return GEO.compareGeometryAndGrid(makeGeometryFromLiteral(ctx, spatialExp), gridId, gridType);
protected SpatialRelations getSpatialRelations() {
return (crsType() == SpatialCrsType.GEO) ? GEO : CARTESIAN;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant parens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

/** Spatial relates functions always use Lucene GeometryDocValueReader and Component2D instead of geometries */
protected Object fold(Geometry leftGeom, Geometry rightGeom) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing a @Override.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

throw new IllegalArgumentException("Failed to fold constant fields: " + e.getMessage(), e);
}
protected SpatialRelations getSpatialRelations() {
return (crsType() == SpatialCrsType.GEO) ? GEO : CARTESIAN;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant parens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


@Override
public Object fold(FoldContext ctx) {
var leftGeom = makeGeometryFromLiteral(ctx, left());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, you could shave a few cycles if you early exit here when leftGeom is already null (though IIUC folding is done during planning so is probably not on the hot path).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll stick with a single if for simplicity.

ST_CONTAINS needs a custom version of fold() due to the need to split all components into separate components. We could consider doing the same for all other functions, but that would require larger refactoring and could incut performance costs.
@craigtaverner craigtaverner merged commit db5f41f into elastic:main Jan 6, 2026
35 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
9.2 Commit could not be cherrypicked due to conflicts
9.3 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 140116

craigtaverner added a commit to craigtaverner/elasticsearch that referenced this pull request Jan 7, 2026
…ll (elastic#140116)

Occasionally on serverless we see a NullPointerException in parsing a geometry literal within ST_DISTANCE functions. This turned out to be hard to reproduce since we cannot see the original query causing this problem. It has been reported in elastic#138594, including a full stack track confirming that it occurs during planning, while folding literals.

It turned out to be reproducible with a query like:

```
ROW point = MV_SLICE(["POINT(0)"], 0, 0)::geo_point
| EVAL distance = ST_DISTANCE(point, point)
```

Here the WKT is invalid, and the fold over the MV_SLICE would return literal `null` values, since the inner TO_GEOPOINT function was throwing exceptions (since the WKT was invalid). It turned out that MvSlice was using EvaluatorMapper.fold for folding the children, which returns literal java `null` values, and any function using these values needs to also handle literal nulls. So, `fold()` is expected to explicitly return null values when input is null, and the implementation of fold in StDistance did not do this.

This PR now covers all descendents of BinarySpatialFunction:
* ST_DISTANCE
* ST_INTERSECTS
* ST_DISJOINT
* ST_CONTAINS
* ST_WITHIN
@craigtaverner
Copy link
Contributor Author

Manual backport in #140265

elasticsearchmachine pushed a commit that referenced this pull request Jan 7, 2026
…ll (#140116) (#140265)

Occasionally on serverless we see a NullPointerException in parsing a geometry literal within ST_DISTANCE functions. This turned out to be hard to reproduce since we cannot see the original query causing this problem. It has been reported in #138594, including a full stack track confirming that it occurs during planning, while folding literals.

It turned out to be reproducible with a query like:

```
ROW point = MV_SLICE(["POINT(0)"], 0, 0)::geo_point
| EVAL distance = ST_DISTANCE(point, point)
```

Here the WKT is invalid, and the fold over the MV_SLICE would return literal `null` values, since the inner TO_GEOPOINT function was throwing exceptions (since the WKT was invalid). It turned out that MvSlice was using EvaluatorMapper.fold for folding the children, which returns literal java `null` values, and any function using these values needs to also handle literal nulls. So, `fold()` is expected to explicitly return null values when input is null, and the implementation of fold in StDistance did not do this.

This PR now covers all descendents of BinarySpatialFunction:
* ST_DISTANCE
* ST_INTERSECTS
* ST_DISJOINT
* ST_CONTAINS
* ST_WITHIN
craigtaverner added a commit to craigtaverner/elasticsearch that referenced this pull request Jan 7, 2026
…ll (elastic#140116) (elastic#140265)

Occasionally on serverless we see a NullPointerException in parsing a geometry literal within ST_DISTANCE functions. This turned out to be hard to reproduce since we cannot see the original query causing this problem. It has been reported in elastic#138594, including a full stack track confirming that it occurs during planning, while folding literals.

It turned out to be reproducible with a query like:

```
ROW point = MV_SLICE(["POINT(0)"], 0, 0)::geo_point
| EVAL distance = ST_DISTANCE(point, point)
```

Here the WKT is invalid, and the fold over the MV_SLICE would return literal `null` values, since the inner TO_GEOPOINT function was throwing exceptions (since the WKT was invalid). It turned out that MvSlice was using EvaluatorMapper.fold for folding the children, which returns literal java `null` values, and any function using these values needs to also handle literal nulls. So, `fold()` is expected to explicitly return null values when input is null, and the implementation of fold in StDistance did not do this.

This PR now covers all descendents of BinarySpatialFunction:
* ST_DISTANCE
* ST_INTERSECTS
* ST_DISJOINT
* ST_CONTAINS
* ST_WITHIN
elasticsearchmachine pushed a commit that referenced this pull request Jan 7, 2026
…ll (#140116) (#140265) (#140276)

Occasionally on serverless we see a NullPointerException in parsing a geometry literal within ST_DISTANCE functions. This turned out to be hard to reproduce since we cannot see the original query causing this problem. It has been reported in #138594, including a full stack track confirming that it occurs during planning, while folding literals.

It turned out to be reproducible with a query like:

```
ROW point = MV_SLICE(["POINT(0)"], 0, 0)::geo_point
| EVAL distance = ST_DISTANCE(point, point)
```

Here the WKT is invalid, and the fold over the MV_SLICE would return literal `null` values, since the inner TO_GEOPOINT function was throwing exceptions (since the WKT was invalid). It turned out that MvSlice was using EvaluatorMapper.fold for folding the children, which returns literal java `null` values, and any function using these values needs to also handle literal nulls. So, `fold()` is expected to explicitly return null values when input is null, and the implementation of fold in StDistance did not do this.

This PR now covers all descendents of BinarySpatialFunction:
* ST_DISTANCE
* ST_INTERSECTS
* ST_DISJOINT
* ST_CONTAINS
* ST_WITHIN
sidosera pushed a commit to sidosera/elasticsearch that referenced this pull request Jan 7, 2026
…ll (elastic#140116)

Occasionally on serverless we see a NullPointerException in parsing a geometry literal within ST_DISTANCE functions. This turned out to be hard to reproduce since we cannot see the original query causing this problem. It has been reported in elastic#138594, including a full stack track confirming that it occurs during planning, while folding literals.

It turned out to be reproducible with a query like:

```
ROW point = MV_SLICE(["POINT(0)"], 0, 0)::geo_point
| EVAL distance = ST_DISTANCE(point, point)
```

Here the WKT is invalid, and the fold over the MV_SLICE would return literal `null` values, since the inner TO_GEOPOINT function was throwing exceptions (since the WKT was invalid). It turned out that MvSlice was using EvaluatorMapper.fold for folding the children, which returns literal java `null` values, and any function using these values needs to also handle literal nulls. So, `fold()` is expected to explicitly return null values when input is null, and the implementation of fold in StDistance did not do this.

This PR now covers all descendents of BinarySpatialFunction:
* ST_DISTANCE
* ST_INTERSECTS
* ST_DISJOINT
* ST_CONTAINS
* ST_WITHIN
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL :Analytics/Geo Indexing, search aggregations of geo points and shapes auto-backport Automatically create backport pull requests when merged >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.2.4 v9.3.1 v9.4.0

4 participants