Skip to content

ESQL: GROUP BY ALL with the dimensions output#138595

Merged
leontyevdv merged 96 commits intoelastic:mainfrom
leontyevdv:feature/esql-group-by-all-dimensions
Dec 2, 2025
Merged

ESQL: GROUP BY ALL with the dimensions output#138595
leontyevdv merged 96 commits intoelastic:mainfrom
leontyevdv:feature/esql-group-by-all-dimensions

Conversation

@leontyevdv
Copy link
Contributor

@leontyevdv leontyevdv commented Nov 25, 2025

The second part of the #136253.

Load a list of dimensions and output them in the _timeseries column instead of _tsid (here is a doc about it).

The first part is here: #137367

Part of #136253

not-napoleon and others added 30 commits August 27, 2025 10:51
 Conflicts:
	x-pack/plugin/esql/qa/testFixtures/src/main/resources/k8s-timeseries.csv-spec
 Conflicts:
	x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/LoadMapping.java
	x-pack/plugin/esql/qa/testFixtures/src/main/resources/k8s-timeseries.csv-spec
…-all-over-time' into esql-group-by-all-over-time
# Conflicts:
#	server/src/main/resources/transport/upper_bounds/8.18.csv
#	server/src/main/resources/transport/upper_bounds/8.19.csv
#	server/src/main/resources/transport/upper_bounds/9.0.csv
#	server/src/main/resources/transport/upper_bounds/9.1.csv
#	server/src/main/resources/transport/upper_bounds/9.2.csv
#	server/src/main/resources/transport/upper_bounds/9.3.csv
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/k8s-timeseries.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/TranslateTimeSeriesAggregate.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Drop.java
Copy link
Contributor

@kkrik-es kkrik-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment, but this looks good. Thanks Dima!

}

MappingLookup mappingLookup = ctx.getMappingLookup();
Set<String> dimensionFields = new HashSet<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we return an ordered collection to ensure consistent _timeseries values?

Copy link
Contributor Author

@leontyevdv leontyevdv Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to using LinkedHashMap 👍🏼

Add output of the dimension list into the _timeseries column.

Part of elastic#136253
Add output of the dimension list into the _timeseries column.

Part of elastic#136253
…ions' into feature/esql-group-by-all-dimensions
Copy link
Contributor

@kkrik-es kkrik-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done Dima. Will leave it to Nhat to stamp.

@leontyevdv leontyevdv requested a review from dnhatn December 1, 2025 15:57
@leontyevdv leontyevdv marked this pull request as ready for review December 1, 2025 15:59
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment, but this looks great. Thanks Dima!

* Returns a list of dimension field names from a MappingLookup.
*/
@Nullable
default Set<String> dimensionFields() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove this method and add MappingLookup to BlockLoaderContext and compute dimensionFields inside SourceFieldMapper#blockLoader or TimeSeriesMetadataFieldBlockLoader instead. This can be done in a follow-up.

Copy link
Contributor Author

@leontyevdv leontyevdv Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Nhat! This is a great idea which makes the code cleaner. I implemented it in this PR 👍🏼

Move the dimensions extraction into the TimeSeriesMetadataFieldBlockLoader

Part of elastic#136253
…ions' into feature/esql-group-by-all-dimensions
private static class TimeSeries extends BlockStoredFieldsReader {
@Override
public void read(int docId, StoredFields storedFields, Builder builder) throws IOException {
// TODO support appending BytesReference
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe check for empty dimension set? What should we be doing in that case, @dnhatn?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we allow cases where all dimensions have no value? If so, I think we should append an empty json {}.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a test for this case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot create such index. What I've tried:

  1. Creating an index without a routing_path leads to
    "type": "illegal_argument_exception",
    "reason": "[index.mode=time_series] requires a non-empty [index.routing_path]"
  1. Creating an index with a routing_path ["cluster", "pod"] but without time_series_dimension in the mapping leads to
    "type": "illegal_argument_exception",
    "reason": "All fields that match routing_path must be configured with [time_series_dimension: true] or flattened fields with a list of dimensions in [time_series_dimensions] and without the [script] parameter. [cluster] was not a dimension."

Creation of an index without dimensions is impossible (correct me if I'm wrong) so having the empty set of the dimensions is impossible as well.

Another thing I've tried is to index a document without dimensions. This led to the routing hash exception:
It failed while indexing the doc with the exception:

[1:49] failed to parse: Input byte[] should at least have 2 bytes for base64 bytes
org.elasticsearch.index.mapper.DocumentParsingException: [1:49] failed to parse: Input byte[] should at least have 2 bytes for base64 bytes
	at __randomizedtesting.SeedInfo.seed([730F72C82C3D6654:ECFB9F4DA02BB689]:0)
	at org.elasticsearch.index.mapper.DocumentParser.wrapInDocumentParsingException(DocumentParser.java:272)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd probably need to have a time-series index with hardcoded routing path that includes no dimension fields. This is not very interesting, admittedly, but can happen I guess. A graceful failure is fine, so long as we don't produce a cryptic error.

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Dima for another iteration!

}
return dimensionFields;
}
return null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think we should throw an IllegalStateException here, as we should not reach this line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@leontyevdv leontyevdv added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL :StorageEngine/ES|QL Timeseries / metrics / PromQL / logsdb capabilities in ES|QL labels Dec 2, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@leontyevdv leontyevdv merged commit fa98de5 into elastic:main Dec 2, 2025
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >enhancement :StorageEngine/ES|QL Timeseries / metrics / PromQL / logsdb capabilities in ES|QL :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine v9.3.0

6 participants