Skip to content

[ES|QL] Support first(), last() functions#137408

Merged
mouhc1ne merged 25 commits intoelastic:mainfrom
mouhc1ne:all-first
Nov 13, 2025
Merged

[ES|QL] Support first(), last() functions#137408
mouhc1ne merged 25 commits intoelastic:mainfrom
mouhc1ne:all-first

Conversation

@mouhc1ne
Copy link
Contributor

@mouhc1ne mouhc1ne commented Oct 30, 2025

Commit 1

Create copies of the following classes into their respective, and temporary, All* equivalent:

  • FirstLongByTimestampAggregator
  • LongLongState
  • First

Commit 2

  • Register ALL_FIRST as snapshot function, and restrict it to the long type only, for now.
  • Exceptionally allow ALL_FIRST agg function to be passed down null values by exempting it from the corresponding rule.
  • Change agg and grouping agg implementers, to pass all values down to the aggregator which must handle null values according to its internal logic.

Commit 3

  • Sample csv tests for illustrating purposes only.
  • Will be removed and substituted with tests that read from one of the existing datasets instead of creating new one.
@mouhc1ne mouhc1ne requested a review from nik9000 October 30, 2025 21:27
@mouhc1ne mouhc1ne self-assigned this Oct 30, 2025
@mouhc1ne mouhc1ne added >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Oct 30, 2025
@mouhc1ne mouhc1ne changed the title Exceptionally allow this agg function to be passed down null values Oct 31, 2025
@mouhc1ne mouhc1ne force-pushed the all-first branch 2 times, most recently from fd1fe7e to bff4ea4 Compare November 4, 2025 11:53
@mouhc1ne mouhc1ne requested a review from nik9000 November 4, 2025 12:04
@mouhc1ne mouhc1ne force-pushed the all-first branch 2 times, most recently from c2ead19 to 9ccbfd1 Compare November 5, 2025 20:13
@mouhc1ne mouhc1ne marked this pull request as ready for review November 5, 2025 20:14
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

builder.beginControlFlow("if ($LValueCount == 0)", a.name());
builder.addStatement("continue");
builder.endControlFlow();
a.addContinueIfPositionHasNoValueBlock(builder);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The affect seems to be to remove a check on empty blocks. Can I presume this is because that is now being done earlier in the stack, so the generated code will no longer see empty blocks?

Copy link
Contributor Author

@mouhc1ne mouhc1ne Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to let the *Aggregator.combine(...) method handle the null/zero blocks according to its own logic, and that there's nothing special about them that warrants skipping. At least that is the logic behind it when it comes to supporting nulls for first/last.

builder.addStatement("int valuesPosition = groupPosition + positionOffset");
if (valuesAreVector == false) {

// TBD - affects 2 geo classes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The effect seems to be to remove a check on null blocks in the generated code. Can I assume this means that the null check is being done higher in the stack?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are receiving a Block argument you should check for null yourself, yeah. At least, I declare that's what we should do from now on.

Mostly because the function @mouhc1ne needs to make a different decision with regards to null.

So! That means for the geo aggs that work this way they should check for null and skip. And we should make sure we have a test that fails for this!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The zero/null checks that I removed from the the aggregator function, that used to short-circuit the call to combine, are now placed inside the combine method itself. The outcome should be equivalent to what we had before.

I do agree that function that receive block arguments should get all of them, unfiltered, including zero/null so they can handle them accordingly.

Copy link
Contributor Author

@mouhc1ne mouhc1ne Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last change allows the final step of SpatialExtentGroupingState to treat infinite values the same way it treats missing ones, by appending null. This allows us to skip a failed attempt at rectangle creation which results in an IllegalArgException if no values have been seen for a particular group. Btw, this only happens after inserting rows with null shapes in the 2 dataset files, which in itself is only included to exercise the logic inside the 4 combine methods and verify if it indeed filters out nulls. At this stage, the csv tests hit 3 out of 4 cases. The missing case is single (no-grouping) for SpatialExtentGeoShapeDocValuesAggregator which I couldn't hit yet. Simply removing the group by and adjusting the query doesn't exercise that path for some reason.

Copy link
Contributor

@craigtaverner craigtaverner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null input should produce null results

@mouhc1ne mouhc1ne force-pushed the all-first branch 2 times, most recently from c27db10 to 9f2d7b6 Compare November 7, 2025 21:01
Create copies of the following classes into their respective, and temporary, "All*" equivalent:
- FirstLongByTimestampAggregator
- LongLongState
- First
- Register ALL_FIRST as snapshot function, and restrict it to the long type only, for now.
- Exceptionally allow ALL_FIRST agg function to be passed down null values by exempting it from the corresponding rule.
- Change agg and grouping agg implementers, to pass all values down to the aggregator which must handle null values according to its internal logic.
- Sample csv tests for illustrating purposes only.
- Will be removed and substituted with tests that read from one of the existing datasets instead of creating new one.
Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me. The tests need another look I think.

- Filter-out null rows from existing spatial CSV tests that are affected by the new rows containing null shapes.
- Add equivalent CSV tests that operate only the rows (ones with null shapes), and assert that we get nulls.
- Allow SpatialExtentState to return null when a row was seen but X/Y values remain in their default state.
@mouhc1ne mouhc1ne requested a review from nik9000 November 10, 2025 17:01
Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@craigtaverner should have a look too.

// ignore literals (e.g. COUNT(1))
// make sure the field exists at the source and is indexed (not runtime)

if (af instanceof AllFirst) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Not a fan of this, but we'll be less hacky when we have more aggs like this.

Copy link
Contributor

@craigtaverner craigtaverner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Fantastic that you also found a fixed a bug in ST_EXTENT_AGG on null handling! I had some minor comments and questions, but generally like the solution.

int maxY = maxYs.get(si);
int minY = minYs.get(si);

boolean hasInfinity = minX == Integer.MAX_VALUE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This infinity check is new logic. Does this represent a bug-fix covering some cases that were not properly tested before?

var factory = driverContext.blockFactory();
return seen ? factory.newConstantBytesRefBlockWith(new BytesRef(toWKB()), 1) : factory.newConstantNullBlock(1);

boolean hasInfinity = minX == Integer.MAX_VALUE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This infinity check is new logic. Does this represent a bug-fix covering some cases that were not properly tested before?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was seen insufficient to cover this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing dataset just didn't have rows that would exercise the path where we have seen a value so the seen flag is on, but its corresponding min(x/y) and max(x./y values still point to defaults (-/+) inf. In that case, the attempt the create the rectangle fails with an IllegalArgExc. This can happen in both SpatialExtentState and SpatialExtentGroupingState.

max y cannot be less than min y
java.lang.IllegalArgumentException: max y cannot be less than min y
	at org.elasticsearch.geometry.Rectangle.<init>(Rectangle.java:75)
	at org.elasticsearch.geometry.Rectangle.<init>(Rectangle.java:60)
	at org.elasticsearch.compute.aggregation.spatial.SpatialExtentGroupingState.toBlock(SpatialExtentGroupingState.java:170)
	at org.elasticsearch.compute.aggregation.spatial.SpatialExtentAggregator.evaluateFinal(SpatialExtentAggregator.java:31)
1 | Mis | null
;

stExtentCartesianShapesNonGroupingOnNulls
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed all tests are on cartesian data, never geo data. This could be sufficient, since the changes are identical across the two types, but ideally tests on geo would also be nice.

@mouhc1ne mouhc1ne merged commit 85a4782 into elastic:main Nov 13, 2025
34 checks passed
mouhc1ne added a commit to mouhc1ne/elasticsearch that referenced this pull request Nov 13, 2025
- Minor refactoring in "build.gradle" to allow the generation of the two variants of state classes (e.g.: LongLongState and AllLongLongState)
- Generate the new two state files using "All-X-2State.java.st", for all types.
- Generate the new timestamp aggregator files using "", for all types.
- Register ALL_LAST as snapshot function
- Augment existing csv tests to exercise ALL_LAST as well, and to check for all supported types.
mouhc1ne added a commit to mouhc1ne/elasticsearch that referenced this pull request Nov 13, 2025
- Minor refactoring in "build.gradle" to allow the generation of the two variants of state classes (e.g.: LongLongState and AllLongLongState)
- Generate the new two state files using "All-X-2State.java.st", for all types.
- Generate the new timestamp aggregator files using "All-X-ValueByTimestampAggregator.java.st", for all types.
- Register ALL_LAST as snapshot function
- Augment existing csv tests to exercise ALL_LAST as well, and to check for all supported types.
mouhc1ne added a commit to mouhc1ne/elasticsearch that referenced this pull request Nov 14, 2025
- Minor refactoring in "build.gradle" to allow the generation of the two variants of state classes (e.g.: LongLongState and AllLongLongState)
- Generate the new two state files using "All-X-2State.java.st", for all types.
- Generate the new timestamp aggregator files using "All-X-ValueByTimestampAggregator.java.st", for all types.
- Register ALL_LAST as snapshot function
- Augment existing csv tests to exercise ALL_LAST as well, and to check for all supported types.
mouhc1ne added a commit that referenced this pull request Nov 19, 2025
- Minor refactoring in "build.gradle" to allow the generation of the two variants of state classes (e.g.: LongLongState and AllLongLongState)
- Generate the new two state files using "All-X-2State.java.st", for all types.
- Generate the new timestamp aggregator files using "All-X-ValueByTimestampAggregator.java.st", for all types.
- Register ALL_LAST as snapshot function
- Augment existing csv tests to exercise ALL_LAST as well, and to check for all supported types.
mouhc1ne added a commit to mouhc1ne/elasticsearch that referenced this pull request Nov 19, 2025
- Skip vector optimization for block arguments when grouping, to prevent grouping aggs from executing the no-op path.
mouhc1ne added a commit to mouhc1ne/elasticsearch that referenced this pull request Nov 20, 2025
- Skip vector optimization for block arguments when grouping, to prevent grouping aggs from executing the no-op path.
mouhc1ne added a commit that referenced this pull request Nov 24, 2025
- Skip vector optimization for block arguments when grouping, to prevent grouping aggs from executing the no-op path.
- This path will be enabled back once an actual implementation is in place.
afoucret pushed a commit to afoucret/elasticsearch that referenced this pull request Nov 26, 2025
- Skip vector optimization for block arguments when grouping, to prevent grouping aggs from executing the no-op path.
- This path will be enabled back once an actual implementation is in place.
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Nov 26, 2025
…elastic#138066)

- Minor refactoring in "build.gradle" to allow the generation of the two variants of state classes (e.g.: LongLongState and AllLongLongState)
- Generate the new two state files using "All-X-2State.java.st", for all types.
- Generate the new timestamp aggregator files using "All-X-ValueByTimestampAggregator.java.st", for all types.
- Register ALL_LAST as snapshot function
- Augment existing csv tests to exercise ALL_LAST as well, and to check for all supported types.
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Nov 26, 2025
- Skip vector optimization for block arguments when grouping, to prevent grouping aggs from executing the no-op path.
- This path will be enabled back once an actual implementation is in place.
@mouhc1ne mouhc1ne deleted the all-first branch February 27, 2026 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.3.0

4 participants