Skip to content

ES|QL: Push down COUNT(*) BY DATE_TRUNC#138023

Merged
GalLalouche merged 27 commits intoelastic:mainfrom
GalLalouche:feature/count_by_trunc
Nov 25, 2025
Merged

ES|QL: Push down COUNT(*) BY DATE_TRUNC#138023
GalLalouche merged 27 commits intoelastic:mainfrom
GalLalouche:feature/count_by_trunc

Conversation

@GalLalouche
Copy link
Contributor

@GalLalouche GalLalouche commented Nov 13, 2025

Pushes down count(*) by round_to to Lucene.
Example query:

FROM employees
| STATS COUNT(*) BY DATE_TRUNC(1 YEAR, hire_date)

This is actually a culmination of several rules:

  1. ReplaceDateTruncBucketWithRoundTo Replaces the DATE_TRUNC with a ROUND_TO
  2. ReplaceRoundToWithQueryAndTags Replaces the ROUND_TO with query and tags.
  3. PushCountQueryAndTagsToSource (This PR) Pushes the aggregation down to Lucene.

Note that a query with a filter is not yet supported, but will be done a follow-up PR.

FROM employees
| STATS COUNT(*) WHERE hire_date > "1985-01-01" BY d=DATE_TRUNC(1 YEAR, hire_date) 
private final Expression limit;
private final List<Attribute> attrs;
private final List<Stat> stats;
private final Stat stat;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've refactored this since we don't support multiple aggregates right now anyway. When we do, we can turn this back into a list.

public class ReplaceRoundToWithQueryAndTagsTests extends AbstractLocalPhysicalPlanOptimizerTests {

public ReplaceRoundToWithQueryAndTagsTests(String name, Configuration config) {
public class SubtituteRoundToTests extends AbstractLocalPhysicalPlanOptimizerTests {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Renamed since it now tests for both rewrites in the same rule batch.
  • I've also refactored this to reduce some of the duplication.
  • This could have really benefited from the planned golden test feature!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some tests for INLINE STATS with count + date_histogram as well, if there isn't yet? Having some CsvTests for them will be great. Just to make sure the new filter(>0) added does not give us troubles for the inline join after the aggregation.

fork and subquery may also have aggregation inside the branches, having some additional tests for them will give us extra confidence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added 3 spec tests (inline stats, fork, subquery) and two plan tests to SubstituteRoundToTests (fork and inline stats; as discussed offline, we can't test inline stats right now).

@GalLalouche GalLalouche marked this pull request as ready for review November 14, 2025 13:13
@GalLalouche GalLalouche requested a review from nik9000 November 14, 2025 13:13
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Nov 14, 2025
@GalLalouche GalLalouche added >feature Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Nov 14, 2025
@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Nov 14, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @GalLalouche, I've created a changelog YAML for you.

Copy link
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @GalLalouche , the new rule added makes sense to me. I added comments around additional tests, they will give us extra confidence of this change.

I'm just curious if there are any early performance results of this change yet? It will be really exciting to see the improvements.

I'll leave the review of the changes in operators to Nik.

public class ReplaceRoundToWithQueryAndTagsTests extends AbstractLocalPhysicalPlanOptimizerTests {

public ReplaceRoundToWithQueryAndTagsTests(String name, Configuration config) {
public class SubtituteRoundToTests extends AbstractLocalPhysicalPlanOptimizerTests {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some tests for INLINE STATS with count + date_histogram as well, if there isn't yet? Having some CsvTests for them will be great. Just to make sure the new filter(>0) added does not give us troubles for the inline join after the aggregation.

fork and subquery may also have aggregation inside the branches, having some additional tests for them will give us extra confidence.

public List<ElementType> tagTypes() {
return List.of(switch (queryBuilderAndTags.getFirst().tags().getFirst()) {
case Integer i -> ElementType.INT;
case Long l -> ElementType.LONG;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason that double is not supported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we only support COUNT pushdown at the moment. I can simplify this by removing StatsType altogether, so it's clearer, but I wanted to avoid overly complicating this PR.

@GalLalouche
Copy link
Contributor Author

Thanks for the review @fang-xing-esql! I also added logic and test to verify we don't push ES Filters either (or any filter for that matter), after @nik9000 mentioned that it's another edge case we best avoid at the moment.

Copy link
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @GalLalouche! LGTM, only a couple of minor comments.

@GalLalouche GalLalouche changed the title ES|QL: Push down count Nov 25, 2025
@GalLalouche GalLalouche merged commit 96e799a into elastic:main Nov 25, 2025
34 checks passed
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Nov 26, 2025
Pushes down `count(*) by round_to` to Lucene.
Example query:
```
FROM employees
| STATS COUNT(*) BY DATE_TRUNC(1 YEAR, hire_date)
```

This is actually a culmination of several rules:
1. `ReplaceDateTruncBucketWithRoundTo` Replaces the `DATE_TRUNC` with a `ROUND_TO`
2. `ReplaceRoundToWithQueryAndTags` Replaces the `ROUND_TO` with query and tags.
3. `PushCountQueryAndTagsToSource` (This PR) Pushes the aggregation down to Lucene.

Note that a query with a filter is not yet supported, but will be done a follow-up PR.
```
FROM employees
| STATS COUNT(*) WHERE hire_date > "1985-01-01" BY d=DATE_TRUNC(1 YEAR, hire_date) 
```
GalLalouche added a commit that referenced this pull request Nov 26, 2025
GalLalouche added a commit that referenced this pull request Dec 11, 2025
This PR adds filter support for pushing down `COUNT(*) BY DATE_TRUNC` introduced in #138023, i.e.,
```
FROM idx 
| WHERE date > NOW() - 90 DAYS 
| STATS COUNT(*) BY DATE_TRUNC(1 DAY, date)
```
Note that we still won't push down queries where the filter is *on* the `COUNT(*)` since that case is a lot more complicated due to having to keep zeros for filtered buckets:
```
FROM idx
| STATS COUNT(*) WHERE date > NOW() - 90 DAYS BY DATE_TRUNC(1 DAY, date)
```
GalLalouche added a commit that referenced this pull request Jan 19, 2026
This PR adds golden tests to ES|QL!

● I added a GoldenTestsReadme.md with instructions on how to add new golden tests.
● I tried to keep this (relatively) minimal for the first PR. In the future we can add more features, configurations, stages (e.g., local node_reduce), etc.
● As a proof of concept, I've re-implemented (most) of the tests in SubstitudeRoundToTests using the golden testing framework.
○ When I recently worked on these tests (#138023,#138765), the majority of my time was spent mechanically fixing the expected output in the tests.
○ This was a good use case since I needed to add more support for test nesting due to the loops in the original tests.
○ While more text is being produced when using golden tests (70k lines!), fixing the expected output is a lot simpler, since you can just bulldoze the output! In this case, eyeballing the difference is enough to be convinced that the change is correct. So the maintenance cost goes down.
○ One of the tests (testRoundToWithTimeSeriesIndices) was skipped because its output isn't actually consistent across runs. The original test only verifies a very specific subset of the AST, so it's not a good candidate for golden tests.
spinscale pushed a commit to spinscale/elasticsearch that referenced this pull request Jan 21, 2026
This PR adds golden tests to ES|QL!

● I added a GoldenTestsReadme.md with instructions on how to add new golden tests.
● I tried to keep this (relatively) minimal for the first PR. In the future we can add more features, configurations, stages (e.g., local node_reduce), etc.
● As a proof of concept, I've re-implemented (most) of the tests in SubstitudeRoundToTests using the golden testing framework.
○ When I recently worked on these tests (elastic#138023,elastic#138765), the majority of my time was spent mechanically fixing the expected output in the tests.
○ This was a good use case since I needed to add more support for test nesting due to the loops in the original tests.
○ While more text is being produced when using golden tests (70k lines!), fixing the expected output is a lot simpler, since you can just bulldoze the output! In this case, eyeballing the difference is enough to be convinced that the change is correct. So the maintenance cost goes down.
○ One of the tests (testRoundToWithTimeSeriesIndices) was skipped because its output isn't actually consistent across runs. The original test only verifies a very specific subset of the AST, so it's not a good candidate for golden tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >feature Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.3.0

3 participants