[ES|QL] Support first(), last() functions#137408
Conversation
...ugin/esql/compute/gen/src/main/java/org/elasticsearch/compute/gen/AggregatorImplementer.java
Show resolved
Hide resolved
...ugin/esql/compute/gen/src/main/java/org/elasticsearch/compute/gen/AggregatorImplementer.java
Outdated
Show resolved
Hide resolved
fd1fe7e to
bff4ea4
Compare
...ugin/esql/compute/gen/src/main/java/org/elasticsearch/compute/gen/AggregatorImplementer.java
Show resolved
Hide resolved
c2ead19 to
9ccbfd1
Compare
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
| builder.beginControlFlow("if ($LValueCount == 0)", a.name()); | ||
| builder.addStatement("continue"); | ||
| builder.endControlFlow(); | ||
| a.addContinueIfPositionHasNoValueBlock(builder); |
There was a problem hiding this comment.
The affect seems to be to remove a check on empty blocks. Can I presume this is because that is now being done earlier in the stack, so the generated code will no longer see empty blocks?
There was a problem hiding this comment.
The idea was to let the *Aggregator.combine(...) method handle the null/zero blocks according to its own logic, and that there's nothing special about them that warrants skipping. At least that is the logic behind it when it comes to supporting nulls for first/last.
| builder.addStatement("int valuesPosition = groupPosition + positionOffset"); | ||
| if (valuesAreVector == false) { | ||
|
|
||
| // TBD - affects 2 geo classes |
There was a problem hiding this comment.
The effect seems to be to remove a check on null blocks in the generated code. Can I assume this means that the null check is being done higher in the stack?
There was a problem hiding this comment.
If you are receiving a Block argument you should check for null yourself, yeah. At least, I declare that's what we should do from now on.
Mostly because the function @mouhc1ne needs to make a different decision with regards to null.
So! That means for the geo aggs that work this way they should check for null and skip. And we should make sure we have a test that fails for this!
There was a problem hiding this comment.
The zero/null checks that I removed from the the aggregator function, that used to short-circuit the call to combine, are now placed inside the combine method itself. The outcome should be equivalent to what we had before.
I do agree that function that receive block arguments should get all of them, unfiltered, including zero/null so they can handle them accordingly.
There was a problem hiding this comment.
The last change allows the final step of SpatialExtentGroupingState to treat infinite values the same way it treats missing ones, by appending null. This allows us to skip a failed attempt at rectangle creation which results in an IllegalArgException if no values have been seen for a particular group. Btw, this only happens after inserting rows with null shapes in the 2 dataset files, which in itself is only included to exercise the logic inside the 4 combine methods and verify if it indeed filters out nulls. At this stage, the csv tests hit 3 out of 4 cases. The missing case is single (no-grouping) for SpatialExtentGeoShapeDocValuesAggregator which I couldn't hit yet. Simply removing the group by and adjusting the query doesn't exercise that path for some reason.
craigtaverner
left a comment
There was a problem hiding this comment.
null input should produce null results
x-pack/plugin/esql/qa/testFixtures/src/main/resources/spatial.csv-spec
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/spatial.csv-spec
Outdated
Show resolved
Hide resolved
.../src/main/java/org/elasticsearch/compute/aggregation/spatial/SpatialExtentGroupingState.java
Outdated
Show resolved
Hide resolved
c27db10 to
9f2d7b6
Compare
Create copies of the following classes into their respective, and temporary, "All*" equivalent: - FirstLongByTimestampAggregator - LongLongState - First
- Register ALL_FIRST as snapshot function, and restrict it to the long type only, for now. - Exceptionally allow ALL_FIRST agg function to be passed down null values by exempting it from the corresponding rule. - Change agg and grouping agg implementers, to pass all values down to the aggregator which must handle null values according to its internal logic.
- Sample csv tests for illustrating purposes only. - Will be removed and substituted with tests that read from one of the existing datasets instead of creating new one.
nik9000
left a comment
There was a problem hiding this comment.
Code looks good to me. The tests need another look I think.
- Filter-out null rows from existing spatial CSV tests that are affected by the new rows containing null shapes. - Add equivalent CSV tests that operate only the rows (ones with null shapes), and assert that we get nulls. - Allow SpatialExtentState to return null when a row was seen but X/Y values remain in their default state.
nik9000
left a comment
There was a problem hiding this comment.
LGTM.
@craigtaverner should have a look too.
| // ignore literals (e.g. COUNT(1)) | ||
| // make sure the field exists at the source and is indexed (not runtime) | ||
|
|
||
| if (af instanceof AllFirst) { |
There was a problem hiding this comment.
👍
Not a fan of this, but we'll be less hacky when we have more aggs like this.
This reverts commit 92e3bfb.
craigtaverner
left a comment
There was a problem hiding this comment.
Looks great. Fantastic that you also found a fixed a bug in ST_EXTENT_AGG on null handling! I had some minor comments and questions, but generally like the solution.
...lasticsearch/compute/aggregation/spatial/SpatialExtentCartesianShapeDocValuesAggregator.java
Show resolved
Hide resolved
...lasticsearch/compute/aggregation/spatial/SpatialExtentCartesianShapeDocValuesAggregator.java
Show resolved
Hide resolved
| int maxY = maxYs.get(si); | ||
| int minY = minYs.get(si); | ||
|
|
||
| boolean hasInfinity = minX == Integer.MAX_VALUE |
There was a problem hiding this comment.
This infinity check is new logic. Does this represent a bug-fix covering some cases that were not properly tested before?
| var factory = driverContext.blockFactory(); | ||
| return seen ? factory.newConstantBytesRefBlockWith(new BytesRef(toWKB()), 1) : factory.newConstantNullBlock(1); | ||
|
|
||
| boolean hasInfinity = minX == Integer.MAX_VALUE |
There was a problem hiding this comment.
This infinity check is new logic. Does this represent a bug-fix covering some cases that were not properly tested before?
There was a problem hiding this comment.
Was seen insufficient to cover this case?
There was a problem hiding this comment.
The existing dataset just didn't have rows that would exercise the path where we have seen a value so the seen flag is on, but its corresponding min(x/y) and max(x./y values still point to defaults (-/+) inf. In that case, the attempt the create the rectangle fails with an IllegalArgExc. This can happen in both SpatialExtentState and SpatialExtentGroupingState.
max y cannot be less than min y
java.lang.IllegalArgumentException: max y cannot be less than min y
at org.elasticsearch.geometry.Rectangle.<init>(Rectangle.java:75)
at org.elasticsearch.geometry.Rectangle.<init>(Rectangle.java:60)
at org.elasticsearch.compute.aggregation.spatial.SpatialExtentGroupingState.toBlock(SpatialExtentGroupingState.java:170)
at org.elasticsearch.compute.aggregation.spatial.SpatialExtentAggregator.evaluateFinal(SpatialExtentAggregator.java:31)
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/spatial.csv-spec
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/spatial.csv-spec
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/spatial.csv-spec
Outdated
Show resolved
Hide resolved
| 1 | Mis | null | ||
| ; | ||
|
|
||
| stExtentCartesianShapesNonGroupingOnNulls |
There was a problem hiding this comment.
I noticed all tests are on cartesian data, never geo data. This could be sufficient, since the changes are identical across the two types, but ideally tests on geo would also be nice.
- Minor refactoring in "build.gradle" to allow the generation of the two variants of state classes (e.g.: LongLongState and AllLongLongState) - Generate the new two state files using "All-X-2State.java.st", for all types. - Generate the new timestamp aggregator files using "", for all types. - Register ALL_LAST as snapshot function - Augment existing csv tests to exercise ALL_LAST as well, and to check for all supported types.
- Minor refactoring in "build.gradle" to allow the generation of the two variants of state classes (e.g.: LongLongState and AllLongLongState) - Generate the new two state files using "All-X-2State.java.st", for all types. - Generate the new timestamp aggregator files using "All-X-ValueByTimestampAggregator.java.st", for all types. - Register ALL_LAST as snapshot function - Augment existing csv tests to exercise ALL_LAST as well, and to check for all supported types.
- Minor refactoring in "build.gradle" to allow the generation of the two variants of state classes (e.g.: LongLongState and AllLongLongState) - Generate the new two state files using "All-X-2State.java.st", for all types. - Generate the new timestamp aggregator files using "All-X-ValueByTimestampAggregator.java.st", for all types. - Register ALL_LAST as snapshot function - Augment existing csv tests to exercise ALL_LAST as well, and to check for all supported types.
- Minor refactoring in "build.gradle" to allow the generation of the two variants of state classes (e.g.: LongLongState and AllLongLongState) - Generate the new two state files using "All-X-2State.java.st", for all types. - Generate the new timestamp aggregator files using "All-X-ValueByTimestampAggregator.java.st", for all types. - Register ALL_LAST as snapshot function - Augment existing csv tests to exercise ALL_LAST as well, and to check for all supported types.
- Skip vector optimization for block arguments when grouping, to prevent grouping aggs from executing the no-op path.
- Skip vector optimization for block arguments when grouping, to prevent grouping aggs from executing the no-op path.
- Skip vector optimization for block arguments when grouping, to prevent grouping aggs from executing the no-op path. - This path will be enabled back once an actual implementation is in place.
…elastic#138066) - Minor refactoring in "build.gradle" to allow the generation of the two variants of state classes (e.g.: LongLongState and AllLongLongState) - Generate the new two state files using "All-X-2State.java.st", for all types. - Generate the new timestamp aggregator files using "All-X-ValueByTimestampAggregator.java.st", for all types. - Register ALL_LAST as snapshot function - Augment existing csv tests to exercise ALL_LAST as well, and to check for all supported types.
- Skip vector optimization for block arguments when grouping, to prevent grouping aggs from executing the no-op path. - This path will be enabled back once an actual implementation is in place.
Commit 1
Create copies of the following classes into their respective, and temporary,
All*equivalent:FirstLongByTimestampAggregatorLongLongStateFirstCommit 2
ALL_FIRSTas snapshot function, and restrict it to the long type only, for now.ALL_FIRSTagg function to be passed down null values by exempting it from the corresponding rule.Commit 3