ESQL: GROUP BY ALL with the dimensions output #138595
leontyevdv merged 96 commits into elastic:main
Conversation
Conflicts: x-pack/plugin/esql/qa/testFixtures/src/main/resources/k8s-timeseries.csv-spec
Conflicts:
x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/LoadMapping.java
x-pack/plugin/esql/qa/testFixtures/src/main/resources/k8s-timeseries.csv-spec
…-all-over-time' into esql-group-by-all-over-time
# Conflicts:
#	server/src/main/resources/transport/upper_bounds/8.18.csv
#	server/src/main/resources/transport/upper_bounds/8.19.csv
#	server/src/main/resources/transport/upper_bounds/9.0.csv
#	server/src/main/resources/transport/upper_bounds/9.1.csv
#	server/src/main/resources/transport/upper_bounds/9.2.csv
#	server/src/main/resources/transport/upper_bounds/9.3.csv
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/k8s-timeseries.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/TranslateTimeSeriesAggregate.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Drop.java
Add output of the dimension list into the _timeseries column. Part of elastic#136253
dnhatn
left a comment
One comment, but this looks good. Thanks Dima!
}

MappingLookup mappingLookup = ctx.getMappingLookup();
Set<String> dimensionFields = new HashSet<>();
Can we return an ordered collection to ensure consistent _timeseries values?
I changed it to use a LinkedHashMap 👍🏼
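For context on the ordered-collection suggestion, here is a standalone sketch (not the PR's actual code; the field names are illustrative): LinkedHashSet and LinkedHashMap iterate in insertion order, so the dimension names serialize in a stable order on every run, whereas plain HashSet/HashMap iteration order can vary.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class DimensionOrder {
    // Illustration only: an insertion-ordered set of dimension field names.
    // A LinkedHashSet always iterates in the order elements were added,
    // which makes the serialized _timeseries values deterministic.
    static Set<String> orderedDimensions() {
        Set<String> dims = new LinkedHashSet<>();
        dims.add("cluster");
        dims.add("pod");
        dims.add("node");
        return dims;
    }

    public static void main(String[] args) {
        // Prints cluster,pod,node on every run.
        System.out.println(String.join(",", orderedDimensions()));
    }
}
```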
Add output of the dimension list into the _timeseries column. Part of elastic#136253
Fix a comment. Part of elastic#136253
…ions' into feature/esql-group-by-all-dimensions
kkrik-es
left a comment
Well done Dima. Will leave it to Nhat to stamp.
Pinging @elastic/es-storage-engine (Team:StorageEngine)
dnhatn
left a comment
One comment, but this looks great. Thanks Dima!
 * Returns a list of dimension field names from a MappingLookup.
 */
@Nullable
default Set<String> dimensionFields() {
I think we should remove this method and add MappingLookup to BlockLoaderContext and compute dimensionFields inside SourceFieldMapper#blockLoader or TimeSeriesMetadataFieldBlockLoader instead. This can be done in a follow-up.
Thanks Nhat! This is a great idea which makes the code cleaner. I implemented it in this PR 👍🏼
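A hedged sketch of the refactor agreed on above. All types here are simplified stand-ins for the real MappingLookup / BlockLoaderContext / TimeSeriesMetadataFieldBlockLoader plumbing: instead of exposing a dimensionFields() hook on the lookup interface, the block-loader context carries the MappingLookup and the loader derives the dimension fields itself.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class BlockLoaderSketch {
    // Stand-in for MappingLookup: knows the mapped fields and which of
    // them are time-series dimensions.
    record MappingLookup(List<String> fieldNames, Set<String> dimensionNames) {}

    // Stand-in for BlockLoaderContext, now carrying the MappingLookup so
    // individual loaders can compute what they need themselves.
    record BlockLoaderContext(MappingLookup mappingLookup) {}

    // Stand-in for the dimension computation moved into the block loader:
    // collect dimension fields in mapping order into an ordered set.
    static Set<String> dimensionFields(BlockLoaderContext ctx) {
        Set<String> dims = new LinkedHashSet<>();
        for (String field : ctx.mappingLookup().fieldNames()) {
            if (ctx.mappingLookup().dimensionNames().contains(field)) {
                dims.add(field);
            }
        }
        return dims;
    }

    public static void main(String[] args) {
        MappingLookup lookup = new MappingLookup(
            List.of("cluster", "pod", "cpu_usage"),
            Set.of("cluster", "pod")
        );
        // Prints [cluster, pod]: the metric field is not a dimension.
        System.out.println(dimensionFields(new BlockLoaderContext(lookup)));
    }
}
```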
Move the dimensions extraction into the TimeSeriesMetadataFieldBlockLoader. Part of elastic#136253
…ions' into feature/esql-group-by-all-dimensions
Add tests. Part of elastic#136253
private static class TimeSeries extends BlockStoredFieldsReader {
    @Override
    public void read(int docId, StoredFields storedFields, Builder builder) throws IOException {
        // TODO support appending BytesReference
Nit: maybe check for empty dimension set? What should we be doing in that case, @dnhatn?
Do we allow cases where all dimensions have no value? If so, I think we should append an empty JSON object {}.
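A minimal sketch of the behavior under discussion (hand-rolled serialization with illustrative names; the real loader would go through Elasticsearch's XContent machinery): a document whose dimension map is empty serializes to the empty JSON object {} instead of failing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TimeseriesJson {
    // Illustration only: render an ordered dimension map as the JSON value
    // for the _timeseries column. An empty map yields "{}", matching the
    // suggestion to append an empty JSON object when no dimension has a value.
    static String toJson(Map<String, String> dimensions) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : dimensions.entrySet()) {
            if (first == false) {
                sb.append(',');
            }
            sb.append('"').append(e.getKey()).append("\":\"").append(e.getValue()).append('"');
            first = false;
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        System.out.println(toJson(new LinkedHashMap<>())); // prints {}
        Map<String, String> dims = new LinkedHashMap<>();
        dims.put("cluster", "prod");
        dims.put("pod", "p1");
        System.out.println(toJson(dims)); // prints {"cluster":"prod","pod":"p1"}
    }
}
```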
I'll add a test for this case
I cannot create such an index. What I've tried:
- Creating an index without a routing_path leads to
"type": "illegal_argument_exception",
"reason": "[index.mode=time_series] requires a non-empty [index.routing_path]"
- Creating an index with a routing_path ["cluster", "pod"] but without time_series_dimension in the mapping leads to
"type": "illegal_argument_exception",
"reason": "All fields that match routing_path must be configured with [time_series_dimension: true] or flattened fields with a list of dimensions in [time_series_dimensions] and without the [script] parameter. [cluster] was not a dimension."
Creating an index without dimensions is impossible (correct me if I'm wrong), so an empty dimension set is impossible as well.
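For reference, a minimal time-series index definition that satisfies both constraints above (index and field names are illustrative). Dropping index.routing_path, or dropping time_series_dimension: true from a field listed in it, produces exactly the two errors quoted:

```json
PUT /tsdb-test
{
  "settings": {
    "index": {
      "mode": "time_series",
      "routing_path": ["cluster", "pod"]
    }
  },
  "mappings": {
    "properties": {
      "@timestamp": { "type": "date" },
      "cluster":    { "type": "keyword", "time_series_dimension": true },
      "pod":        { "type": "keyword", "time_series_dimension": true },
      "cpu_usage":  { "type": "double",  "time_series_metric": "gauge" }
    }
  }
}
```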
Another thing I've tried is to index a document without dimensions. This led to the routing hash exception:
It failed while indexing the doc with the exception:
[1:49] failed to parse: Input byte[] should at least have 2 bytes for base64 bytes
org.elasticsearch.index.mapper.DocumentParsingException: [1:49] failed to parse: Input byte[] should at least have 2 bytes for base64 bytes
at __randomizedtesting.SeedInfo.seed([730F72C82C3D6654:ECFB9F4DA02BB689]:0)
at org.elasticsearch.index.mapper.DocumentParser.wrapInDocumentParsingException(DocumentParser.java:272)
You'd probably need to have a time-series index with hardcoded routing path that includes no dimension fields. This is not very interesting, admittedly, but can happen I guess. A graceful failure is fine, so long as we don't produce a cryptic error.
Add tests Part of elastic#136253
dnhatn
left a comment
Thanks Dima for another iteration!
    }
    return dimensionFields;
}
return null;
nit: I think we should throw an IllegalStateException here, as we should not reach this line.
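The nit in sketch form (illustrative names, not the PR's actual code): throwing IllegalStateException on the branch that callers should never reach fails fast with a clear message, instead of handing back a null that surfaces later as a NullPointerException.

```java
import java.util.Set;

public class DimensionLookup {
    // Illustration only: this method is only meaningful for time-series
    // indices, so reaching the non-time-series branch indicates a bug in
    // the caller rather than a valid "no result" case.
    static Set<String> dimensionFields(boolean isTimeSeriesIndex) {
        if (isTimeSeriesIndex) {
            return Set.of("cluster", "pod");
        }
        // Previously `return null;` on this line; the explicit exception
        // pinpoints the bug at the call site instead of an NPE downstream.
        throw new IllegalStateException("dimensionFields() called on a non-time-series index");
    }

    public static void main(String[] args) {
        System.out.println(dimensionFields(true));
    }
}
```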
Polish code. Part of elastic#136253
Pinging @elastic/es-analytical-engine (Team:Analytics)
The second part of #136253.
Load the list of dimensions and output it in the _timeseries column instead of _tsid (here is a doc about it).
The first part is here: #137367
Part of #136253