Support ST_ENVELOPE and related ST_XMIN, etc.#116964
Support ST_ENVELOPE and related ST_XMIN, etc.#116964craigtaverner merged 24 commits intoelastic:mainfrom
Conversation
|
Documentation preview: |
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
Hi @craigtaverner, I've created a changelog YAML for you. |
|
Hi @craigtaverner, I've updated the changelog YAML for you. |
This will allow wrapping longitude around the dateline. TODO: Write tests and use different Evaluator for the Geo version.
Asserting in particular that the geo-wrapping over the dateline works.
| * Code that wishes to modify the behaviour of the visitor can implement the <code>PointVisitor</code> interface, | ||
| * or extend the existing implementations. | ||
| */ | ||
| public class SpatialEnvelopeVisitor implements GeometryVisitor<Boolean, RuntimeException> { |
There was a problem hiding this comment.
This class is in lib/geo because it is needed by both ESQL for the ST_ENVELOPE function as well as in the compute engine for the ST_EXTENT aggregation. It seems generic enough for libs/geo, but another choice would be to put it in the compute engine. I did not do that because there was no obvious place for generic spatial utils there. We could make one, of course.
There was a problem hiding this comment.
I think this implementation might evolve quite a bit so let's leave it here for now
| import static org.elasticsearch.xpack.esql.core.util.SpatialCoordinateTypes.UNSPECIFIED; | ||
|
|
||
| @FunctionName("st_xmax") | ||
| public class StXMaxTests extends AbstractScalarFunctionTestCase { |
There was a problem hiding this comment.
There's a lot of copy paste between all 4 X/Y Min/Max tests. Although this is test code, it could easily be refactored out using an abstract test template.
There was a problem hiding this comment.
I looked through the differences between two tests and the the number of lines that are actually the same is quite low. Mostly they are very similar, but not identical. I think any effort to reduce duplication will add complexity and make the tests harder to follow.
| } | ||
|
|
||
| @ConvertEvaluator(extraName = "FromWKB", warnExceptions = { IllegalArgumentException.class }) | ||
| static double fromWellKnownBinary(BytesRef wkb) { |
There was a problem hiding this comment.
This 4x copy paste with 2 line differences could easily be factored out, and the JIT should take care of any performance issues.
There was a problem hiding this comment.
I thought about common code, where we pass in functions for the two unique calls (eg. getY and getYMax), but that creates much harder to read code, and saves only a few lines of code duplication, so I thought it was not worth it.
| * The function `st_envelope` is defined in the <a href="https://www.ogc.org/standard/sfs/">OGC Simple Feature Access</a> standard. | ||
| * Alternatively it is well described in PostGIS documentation at | ||
| * <a href="https://postgis.net/docs/ST_ENVELOPE.html">PostGIS:ST_ENVELOPE</a>. | ||
| * |
There was a problem hiding this comment.
Nit: remove empty line
| /** | ||
| * Determine the BBOX assuming the CRS is geographic (eg WGS84) and optionally wrapping the longitude around the dateline. | ||
| */ | ||
| public static Optional<Rectangle> visit(Geometry geometry, boolean wrapLongitude) { |
There was a problem hiding this comment.
- Avoid overloading here: it isn't clear which
visitrefers to Cartesian and which to geo coordinates. - Replace
booleanwith an enum or an overload (between two geo shape versions) to avoid Boolean Blindness.
There was a problem hiding this comment.
I would use different names to specify what is geo and what is cartesian. no so fuzzy about the use of boolean here, enum is fine too.
There was a problem hiding this comment.
Used different names, by copying the version in ST_EXTENT PR. If we want to use an enum, we can make that change in that PR.
|
|
||
| boolean isValid(); | ||
|
|
||
| Rectangle getResult(); |
There was a problem hiding this comment.
Since this can return null, it should return Optional<Rectangle> (Or at the very least be marked with @Nullable, but Optional is better)
There was a problem hiding this comment.
It should never return null. I removed the one case where it does.
| visitLongitude(maxX); | ||
| } | ||
|
|
||
| private void visitLongitude(double x) { |
There was a problem hiding this comment.
While mathematically correct, I find this more confusing than it should be, since you're comparing both min and maxes with both min and maxes :)
|
|
||
| @Override | ||
| public Rectangle getResult() { | ||
| if (Double.isInfinite(maxY)) { |
There was a problem hiding this comment.
For consistency's sake, consider using isValid instead of this check.
There was a problem hiding this comment.
Removed this check. The SpatialEnvelopeVisitor already calls isValid before this, and this check is an unnecessary secondary call.
| } | ||
|
|
||
| @Override | ||
| public Rectangle getResult() { |
There was a problem hiding this comment.
To allow this to be used outside of this class, consider extracting this to a static method, maybe in another class.
iverase
left a comment
There was a problem hiding this comment.
We need to consider quantization of the coordinates but I spoke to Craig offline and we will do it once we push things down to lucene,
| * It also disallows invalid rectangles where minX > maxX. | ||
| */ | ||
| public static class CartesianPointVisitor implements PointVisitor { | ||
| private double minX = Double.POSITIVE_INFINITY; |
There was a problem hiding this comment.
These should be either be protected or exposed via getters so they can be used by its clients, e.g., during intermediate serialization. For my money, I prefer getters, due to the whole "prefer composition to inheritance".
There was a problem hiding this comment.
Got this from the changes in ST_EXTENT PR.
Support ST_ENVELOPE and related ST_XMIN, etc. Based on the PostGIS equivalents: https://postgis.net/docs/ST_Envelope.html https://postgis.net/docs/ST_XMin.html https://postgis.net/docs/ST_XMax.html https://postgis.net/docs/ST_YMin.html https://postgis.net/docs/ST_YMax.html
* Support ST_ENVELOPE and related ST_XMIN, etc. (#116964) Support ST_ENVELOPE and related ST_XMIN, etc. Based on the PostGIS equivalents: https://postgis.net/docs/ST_Envelope.html https://postgis.net/docs/ST_XMin.html https://postgis.net/docs/ST_XMax.html https://postgis.net/docs/ST_YMin.html https://postgis.net/docs/ST_YMax.html * Fix off-by-one error reported in #118051
…lastic#118743) * Support ST_ENVELOPE and related ST_XMIN, etc. (elastic#116964) Support ST_ENVELOPE and related ST_XMIN, etc. Based on the PostGIS equivalents: https://postgis.net/docs/ST_Envelope.html https://postgis.net/docs/ST_XMin.html https://postgis.net/docs/ST_XMax.html https://postgis.net/docs/ST_YMin.html https://postgis.net/docs/ST_YMax.html * Fix off-by-one error reported in elastic#118051
|
Manual backport in #118743 |
Support ST_ENVELOPE and related ST_XMIN, etc.
Based on the PostGIS equivalents:
Using ST_XMIN and similar can be done directly to the geometry, or by first calculating the envelope so the same geometry does not need to be iterated through multiple times:
Note that when the incoming data is Geo (geo_point and geo_shape) the bounding box can wrap the dateline, in which case the final result will have
xMin > xMax. This is the same as the default behaviour of the existinggeo_boundsaggregation. However, in that aggregation it is possible to control this behaviour with thewrap_longitudeboolean flag. In the current work we do not yet support that flag, and always wrap the longitude for geographic data.Fixes #104875