ES|QL: Push down COUNT(*) BY DATE_TRUNC#138023
Conversation
| private final Expression limit; | ||
| private final List<Attribute> attrs; | ||
| private final List<Stat> stats; | ||
| private final Stat stat; |
There was a problem hiding this comment.
I've refactored this since we don't support multiple aggregates right now anyway. When we do, we can turn this back into a list.
| public class ReplaceRoundToWithQueryAndTagsTests extends AbstractLocalPhysicalPlanOptimizerTests { | ||
|
|
||
| public ReplaceRoundToWithQueryAndTagsTests(String name, Configuration config) { | ||
| public class SubtituteRoundToTests extends AbstractLocalPhysicalPlanOptimizerTests { |
There was a problem hiding this comment.
- Renamed since it now tests for both rewrites in the same rule batch.
- I've also refactored this to reduce some of the duplication.
- This could have really benefited from the planned golden test feature!
There was a problem hiding this comment.
Can we add some tests for INLINE STATS with count + date_histogram as well, if there isn't yet? Having some CsvTests for them will be great. Just to make sure the new filter(>0) added does not give us troubles for the inline join after the aggregation.
fork and subquery may also have aggregation inside the branches, having some additional tests for them will give us extra confidence.
There was a problem hiding this comment.
I've added 3 spec tests (inline stats, fork, subquery) and two plan tests to SubstituteRoundToTests (fork and inline stats; as discussed offline, we can't test inline stats right now).
…icsearch into feature/count_by_trunc
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
Hi @GalLalouche, I've created a changelog YAML for you. |
fang-xing-esql
left a comment
There was a problem hiding this comment.
Thank you @GalLalouche , the new rule added makes sense to me. I added comments around additional tests, they will give us extra confidence of this change.
I'm just curious if there are any early performance results of this change yet? It will be really exciting to see the improvements.
I'll leave the review of the changes in operators to Nik.
...ugin/esql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/EsqlSpecTestCase.java
Outdated
Show resolved
Hide resolved
...g/elasticsearch/xpack/esql/optimizer/rules/physical/local/PushCountQueryAndTagsToSource.java
Outdated
Show resolved
Hide resolved
| public class ReplaceRoundToWithQueryAndTagsTests extends AbstractLocalPhysicalPlanOptimizerTests { | ||
|
|
||
| public ReplaceRoundToWithQueryAndTagsTests(String name, Configuration config) { | ||
| public class SubtituteRoundToTests extends AbstractLocalPhysicalPlanOptimizerTests { |
There was a problem hiding this comment.
Can we add some tests for INLINE STATS with count + date_histogram as well, if there isn't yet? Having some CsvTests for them will be great. Just to make sure the new filter(>0) added does not give us troubles for the inline join after the aggregation.
fork and subquery may also have aggregation inside the branches, having some additional tests for them will give us extra confidence.
| public List<ElementType> tagTypes() { | ||
| return List.of(switch (queryBuilderAndTags.getFirst().tags().getFirst()) { | ||
| case Integer i -> ElementType.INT; | ||
| case Long l -> ElementType.LONG; |
There was a problem hiding this comment.
Is there a particular reason that double is not supported?
There was a problem hiding this comment.
Since we only support COUNT pushdown at the moment. I can simplify this by removing StatsType altogether, so it's clearer, but I wanted to avoid overly complicating this PR.
|
Thanks for the review @fang-xing-esql! I also added logic and test to verify we don't push |
fang-xing-esql
left a comment
There was a problem hiding this comment.
Thank you @GalLalouche! LGTM, only a couple of minor comments.
...java/org/elasticsearch/xpack/esql/optimizer/rules/physical/local/SubstituteRoundToTests.java
Outdated
Show resolved
Hide resolved
...ugin/esql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/EsqlSpecTestCase.java
Outdated
Show resolved
Hide resolved
…icsearch into feature/count_by_trunc
Pushes down `count(*) by round_to` to Lucene. Example query: ``` FROM employees | STATS COUNT(*) BY DATE_TRUNC(1 YEAR, hire_date) ``` This is actually a culmination of several rules: 1. `ReplaceDateTruncBucketWithRoundTo` Replaces the `DATE_TRUNC` with a `ROUND_TO` 2. `ReplaceRoundToWithQueryAndTags` Replaces the `ROUND_TO` with query and tags. 3. `PushCountQueryAndTagsToSource` (This PR) Pushes the aggregation down to Lucene. Note that a query with a filter is not yet supported, but will be done a follow-up PR. ``` FROM employees | STATS COUNT(*) WHERE hire_date > "1985-01-01" BY d=DATE_TRUNC(1 YEAR, hire_date) ```
This PR adds filter support for pushing down `COUNT(*) BY DATE_TRUNC` introduced in #138023, i.e., ``` FROM idx | WHERE date > NOW() - 90 DAYS | STATS COUNT(*) BY DATE_TRUNC(1 DAY, date) ``` Note that we still won't push down queries where the filter is *on* the `COUNT(*)` since that case is a lot more complicated due to having to keep zeros for filtered buckets: ``` FROM idx | STATS COUNT(*) WHERE date > NOW() - 90 DAYS BY DATE_TRUNC(1 DAY, date) ```
This PR adds golden tests to ES|QL! ● I added a GoldenTestsReadme.md with instructions on how to add new golden tests. ● I tried to keep this (relatively) minimal for the first PR. In the future we can add more features, configurations, stages (e.g., local node_reduce), etc. ● As a proof of concept, I've re-implemented (most) of the tests in SubstitudeRoundToTests using the golden testing framework. ○ When I recently worked on these tests (#138023,#138765), the majority of my time was spent mechanically fixing the expected output in the tests. ○ This was a good use case since I needed to add more support for test nesting due to the loops in the original tests. ○ While more text is being produced when using golden tests (70k lines!), fixing the expected output is a lot simpler, since you can just bulldoze the output! In this case, eyeballing the difference is enough to be convinced that the change is correct. So the maintenance cost goes down. ○ One of the tests (testRoundToWithTimeSeriesIndices) was skipped because its output isn't actually consistent across runs. The original test only verifies a very specific subset of the AST, so it's not a good candidate for golden tests.
This PR adds golden tests to ES|QL! ● I added a GoldenTestsReadme.md with instructions on how to add new golden tests. ● I tried to keep this (relatively) minimal for the first PR. In the future we can add more features, configurations, stages (e.g., local node_reduce), etc. ● As a proof of concept, I've re-implemented (most) of the tests in SubstitudeRoundToTests using the golden testing framework. ○ When I recently worked on these tests (elastic#138023,elastic#138765), the majority of my time was spent mechanically fixing the expected output in the tests. ○ This was a good use case since I needed to add more support for test nesting due to the loops in the original tests. ○ While more text is being produced when using golden tests (70k lines!), fixing the expected output is a lot simpler, since you can just bulldoze the output! In this case, eyeballing the difference is enough to be convinced that the change is correct. So the maintenance cost goes down. ○ One of the tests (testRoundToWithTimeSeriesIndices) was skipped because its output isn't actually consistent across runs. The original test only verifies a very specific subset of the AST, so it's not a good candidate for golden tests.
Pushes down
count(*) by round_toto Lucene.Example query:
This is actually a culmination of several rules:
ReplaceDateTruncBucketWithRoundToReplaces theDATE_TRUNCwith aROUND_TOReplaceRoundToWithQueryAndTagsReplaces theROUND_TOwith query and tags.PushCountQueryAndTagsToSource(This PR) Pushes the aggregation down to Lucene.Note that a query with a filter is not yet supported, but will be done a follow-up PR.