ESQL: introduce a new interface to declare functions depending on the @timestamp attribute#137040
ESQL: introduce a new interface to declare functions depending on the @timestamp attribute#137040bpintea merged 15 commits intoelastic:mainfrom
@timestamp attribute#137040Conversation
|
Hi @bpintea, I've created a changelog YAML for you. |
| if (plan instanceof Rerank r) { | ||
| return resolveRerank(r, childrenOutput); | ||
| } | ||
| return switch (plan) { |
costin
left a comment
There was a problem hiding this comment.
Thanks for looking into this. Left a first round of comments.
There was a problem hiding this comment.
The alternative to this interface is the TimestampAware interface with a timestamp() getter-like method. This would act as a marker interface which can be used by the FunctionRegistry to know if the timestamp needs to be injected or not.
To avoid compiler ambiguities, potentially add the tcab wrapper.
| public TBucket(Source source, Expression buckets, TimestampAttributeSupplier timestampSupplier) { | ||
| this(source, buckets, timestampSupplier.timestampAttribute()); | ||
| } |
There was a problem hiding this comment.
This constructor could be removed and instead the function would rely only on the TBucket constructor with 2 parameters where the users supplies the bucket while the timestamp by the ESQL infra.
There was a problem hiding this comment.
Adding a timestamp() method would be useful.
There was a problem hiding this comment.
Added. It's not absolutely needed now, but having it available for later. It fits well with some of the implementing methods' implicit timestamp() existing method.
| if (arg.foldable() && ((arg instanceof EsqlScalarFunction) == false)) { | ||
| if (i < targetDataTypes.size()) { | ||
| targetDataType = targetDataTypes.get(i); | ||
| int targetDataTypesIdx = f instanceof TimestampAware ? i - 1 : i; |
There was a problem hiding this comment.
How about using TimestampAware.timestamp() ?
| /** | ||
| * Marker interface to identify classes of functions that operate on the {code @timestamp} field of an index. | ||
| * Implementations of this interface need to expect the associated {@code Attribute} to be passed as the following argument after the | ||
| * {@code Source} one, which is always the first one. | ||
| */ | ||
| public interface TimestampAware {} |
There was a problem hiding this comment.
What's the reason for having the Timestamp expression right after Source and not the last Expression, that is:
Class(Source, Expression field, Expression timestamp)
Class(Source, Expression field, Expression timestamp, Configuration config)
Class(Source, Expression field, Expression timestamp, List)
In case of ambiguity, this can be sorted out inside the function registry.
There was a problem hiding this comment.
IMO it makes the most sense to have it following source:
- it's always there, just like source and always on same position for all TimestampAware functions;
- it's consistent to report (verification issues) about its position, if needed;
- it's clearer to apply positional transformations, like implicit casting (example).
But I've reverted this change, leaving the code inline with the other TS functions.
|
Hi @bpintea, I've updated the changelog YAML for you. |
| import static org.hamcrest.Matchers.startsWith; | ||
|
|
||
| //@TestLogging(value = "org.elasticsearch.xpack.esql:TRACE,org.elasticsearch.compute:TRACE", reason = "debug") | ||
| @TestLogging(value = "org.elasticsearch.xpack.esql:TRACE,org.elasticsearch.compute:TRACE", reason = "debug") |
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
| defTS(FirstOverTime.class, FirstOverTime::new, "first_over_time"), | ||
| def(PercentileOverTime.class, bi(PercentileOverTime::new), "percentile_over_time"), | ||
| // dense vector function | ||
| def(TextEmbedding.class, bi(TextEmbedding::new), "text_embedding") } }; |
There was a problem hiding this comment.
I wonder if First/Last functions should be updated as well?
Conceptually they need @timestamp although (if I remember correctly) it is possible to supply it or another date/time field explicitly.
There was a problem hiding this comment.
The sort can be anything (as long as dt -> dt == DataType.LONG || dt == DataType.DATETIME || dt == DataType.DATE_NANOS), @timestamp included, but not restricted to it.
This change only applies to functions that only work with the @timestamp field and which assume it's there, i.e. it's an implicit dependency (which this change provides/injects).
This updates the way the
@timestampfield is injected into the functions that require it implicitly: these functions no longer need to declare an attribute themselves, the function registry will do it for them.Followingly, this can be traced from the source and eventually wired into the functions (so that renames no longer be problematic).
Closes #136772