Skip to content

ESQL: GROUP BY ALL#137367

Merged
leontyevdv merged 66 commits intoelastic:mainfrom
leontyevdv:feature/esql-group-by-all
Nov 25, 2025
Merged

ESQL: GROUP BY ALL#137367
leontyevdv merged 66 commits intoelastic:mainfrom
leontyevdv:feature/esql-group-by-all

Conversation

@leontyevdv
Copy link
Contributor

@leontyevdv leontyevdv commented Oct 30, 2025

The final goal of the task #136253 is to output a list of dimensions for the time-series group.

We split this work into the three steps:

  1. Support bare time-series aggregation functions and output the value of _tsid in the new _timeseries column:
    Example:
TS k8s
| STATS avg = avg_over_time(network.cost) BY tbucket = TBUCKET(1 hour)

avg:double         | tbucket:datetime         | _timeseries:keyword
7.262931034482759  | 2024-05-10T00:00:00.000Z | KBGrBhmEnziumfgfOq1dn9NyLSzHbdLj0kfy_m-tXh-yxVR6b3E17ss
6.870689655172414  | 2024-05-10T00:00:00.000Z | KBGrBhmEnziumfgfOq1dn9NyLSzHck3A2MQiFrxrlvTQFP7YxbJ0rYg
6.635416666666667  | 2024-05-10T00:00:00.000Z | KBGrBhmEnziumfgfOq1dn9NyLSzHmDfU38IhLHmDYPRNCcqXvvDolM4
  1. Load a list of dimensions and output them in the _timeseries column instead of _tsid (here is a doc about it).

  2. Make the optimization to load a doc per _tsid.


This PR adds support for bare time-series aggregate functions and the _timeseries column into the output (step 1).

Part of #136253

not-napoleon and others added 30 commits August 27, 2025 10:51
 Conflicts:
	x-pack/plugin/esql/qa/testFixtures/src/main/resources/k8s-timeseries.csv-spec
 Conflicts:
	x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/LoadMapping.java
	x-pack/plugin/esql/qa/testFixtures/src/main/resources/k8s-timeseries.csv-spec
…-all-over-time' into esql-group-by-all-over-time
# Conflicts:
#	server/src/main/resources/transport/upper_bounds/8.18.csv
#	server/src/main/resources/transport/upper_bounds/8.19.csv
#	server/src/main/resources/transport/upper_bounds/9.0.csv
#	server/src/main/resources/transport/upper_bounds/9.1.csv
#	server/src/main/resources/transport/upper_bounds/9.2.csv
#	server/src/main/resources/transport/upper_bounds/9.3.csv
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/k8s-timeseries.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/TranslateTimeSeriesAggregate.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Drop.java
Ignore GroupByAll in GenerativeForkRestTest

Part of elastic#136253
@leontyevdv leontyevdv marked this pull request as ready for review November 19, 2025 17:22
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Copy link
Collaborator

Hi @leontyevdv, I've updated the changelog YAML for you.

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach looks good. Thanks Dima! I played with the PR and think we can simplify the translation and dimension loading. Can you take a look at this patch to see if I missed anything?

* DROP commands are parsed into Drop objects, which the {@link org.elasticsearch.xpack.esql.analysis.Analyzer.ResolveRefs} rule then
* rewrites into {@link org.elasticsearch.xpack.esql.plan.logical.local.EsqlProject} plans, along with other projection-like commands.
* As such, Drop is neither serializable nor able to be mapped to a corresponding physical plan.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we revert this change? Adding Javadoc is great, but it should be done in a separate change.

boolean ignoreError = hasProjectAwayColumns || hasLookupJoinExec || hasTextGroupingInTimeSeries;

// TranslateTimeSeriesAggregate may add a _timeseries attribute into the projection
boolean hasTimeseriesReplacingTsid = expectedOutputAttributes.stream()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we check TimeSeriesAggregate only? This could slow down simple queries like FROM x when the output has many fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done!

Map.entry(GEOHEX, (source, fieldEval) -> new ToStringFromGeoGridEvaluator.Factory(source, fieldEval, GEOHEX)),
Map.entry(AGGREGATE_METRIC_DOUBLE, ToStringFromAggregateMetricDoubleEvaluator.Factory::new)
Map.entry(AGGREGATE_METRIC_DOUBLE, ToStringFromAggregateMetricDoubleEvaluator.Factory::new),
Map.entry(TSID_DATA_TYPE, ToStringFromTsidEvaluator.Factory::new)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we support tsid in Base64 for and surrogate this to Base64 instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this will be better. I reverted toString and adjusted Base64 to support _tsid. Thank you!


public LogicalPlan rule(TimeSeriesAggregate aggregate) {
boolean hasTopLevelOverTimeAggs = false;
List<NamedExpression> newAggregateFunctions = new ArrayList<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: init with the size form aggregate.aggregates()

newAggregateFunctions.add(new Alias(alias.source(), alias.name(), new Values(tsAgg.source(), tsAgg)));
} else {
// Preserve non-time-series aggregates
newAggregateFunctions.add(agg);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should fail if we have mixed aggregates: with and without outer aggregation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a check for this below. If there are mixed aggs, we throw an exception now.

@dnhatn dnhatn self-requested a review November 20, 2025 00:17
Remove Drop JavaDoc

Part of elastic#136253
Address the PR's comments

Part of elastic#136253
Address the PR's comments

Part of elastic#136253
# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/k8s-timeseries.csv-spec
Address the PR's comments

Part of elastic#136253
Address the PR's comments

Part of elastic#136253
Address the PR's comments

Part of elastic#136253
);
boolean ignoreError = hasProjectAwayColumns || hasLookupJoinExec || hasTextGroupingInTimeSeries;

// TranslateTimeSeriesAggregate may add a _timeseries attribute into the projection
Copy link
Contributor Author

@leontyevdv leontyevdv Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need this because in the LogicalPlanOptimizer.optimize() line 123 we take an expected output from the verified plan and the _timeseries attribute is not there. It appears in the optimized plan.

# Conflicts:
#	x-pack/plugin/esql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/generative/GenerativeForkRestTest.java
Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thanks Dima!

);
List<Expression> groupings = new ArrayList<>();
groupings.add(timeSeries);
groupings.addAll(aggregate.groupings());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kkrik-es Should we allow or reject TS .. | STATS rate(x) BY cluster. I am okay with either option.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My 2c is that this is a bit confusing. It suggests that we're just grouping by cluster but there's no outer aggregation for it. I understand that it groups by all and then also adds the cluster column but I don't think that's self-evident.

Maybe it's something we should leave out for now, it's something we can easily add later as well. Maybe it makes more sense in combination with an explicit BY DIMENSIONS(), cluster. But I'm skeptical if it's an important use case to support both an implicit and explicit grouping at the same time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I'd rather keep the semantics simple and reject this. We either provide no grouping attribute, and group by all, or provide one or more grouping attributes along with a reduction function (outer agg).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a check that rejects such cases. It allows to use GroupingFunction (Bucket, TBucket, Categorize) but rejects attributes. This applies to the time-series aggregates only.

@leontyevdv leontyevdv merged commit ef8a008 into elastic:main Nov 25, 2025
34 checks passed
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Nov 26, 2025
Adds support for bare time-series aggregate functions and a _timeseries column into the output.

The _timeseries column contain a BASE64 representation of a _tsid value.

Part of elastic#136253

---------

Co-authored-by: Mark Tozzi <mark.tozzi@gmail.com>
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >enhancement :StorageEngine/ES|QL Timeseries / metrics / PromQL / logsdb capabilities in ES|QL :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine v9.3.0

6 participants