[ES|QL] Making the Expression Suggestor More Semantically Intelligent#241081
[ES|QL] Making the Expression Suggestor More Semantically Intelligent#241081bartoval merged 6 commits intoelastic:mainfrom
Conversation
|
Pinging @elastic/kibana-esql (Team:ESQL) |
|
One of the purposes of this PR is also to manually test to see if something has been forgotten or not totally correct |
stratoula
left a comment
There was a problem hiding this comment.
Looks great and I could not find any bug, only better experience ❤️
I just have some questions that I would like you to answer first
| }; | ||
|
|
||
| // Re-export operator groups for use in tests (avoid hardcoding operator names) | ||
| export { |
There was a problem hiding this comment.
Why are you exporting here again? We have them already here ../definitions/all_operators. Why dont you use the existings exports for the tests?
There was a problem hiding this comment.
Thanks,
At the beginning it was hardcoded and I had all the references to this structure, then I changed it but I forgot to delete it and update
| ], | ||
| mockCallbacks | ||
| ); | ||
| await evalExpectSuggestions('from a | eval a=round(doubleField, ', [], mockCallbacks); |
|
|
||
| test('suggests operators after initial column based on type', async () => { | ||
| // case( field ) suggests all appropriate operators for that field type | ||
| // Note: CASE is expression-heavy, comma is not automatically suggested after fields |
| expect(suggestions).toContainEqual(expectedCompletionItem); | ||
| }); | ||
|
|
||
| test('should NOT appear inside BUCKET function', async () => { |
There was a problem hiding this comment.
It took me some time to understand. This is a suggestion, change it if you dont like it but we need to adjust it
| test('should NOT appear inside BUCKET function', async () => { | |
| test('BUCKET constant arguments should not trigger function suggestions', async () => { |
| expect(duplicates).toEqual([]); | ||
| }); | ||
|
|
||
| test('BUCKET with numeric field should NOT show col0 or date histogram at second param', async () => { |
| }); | ||
|
|
||
| it('conditional with text field: suggests comparison/pattern/IN/null operators in STATS', async () => { | ||
| it.skip('conditional with text field: suggests pattern/IN/null operators (not comparison) in STATS', async () => { |
There was a problem hiding this comment.
Why do you skip this poor test?
There was a problem hiding this comment.
Because he failed, I'm touchy and I punished him. :D
Now I have forgiven him
sddonne
left a comment
There was a problem hiding this comment.
Looks amazing! these are really nice improvements ❤️
There was a problem hiding this comment.
FROM kibana_sample_data_logs | STATS COUNT_DISTINCT(agent) BY BUCKETS(timestamp, 3, // Here the second parameter is a constant, so the rules change. I unlock parameters 3 and 4 with the possible (constant) values: ?start . ?end, select from datepicker
Not sure if it's simple enough to achieve, but would be nice that BUCKET suggests date functions for the last 2 args. To be able to write querys like this:
FROM ebt-kibana-browser | STATS BY BUCKET(context.cloudTrialEndDate, 30, "2012-10-15T11:27:38.230Z", NOW())
There was a problem hiding this comment.
do you mean when you press Add date histogram?
There was a problem hiding this comment.
ah ok you are referring to NOW()
5a89150 to
cb6092c
Compare
💔 Build Failed
Failed CI StepsTest Failures
Metrics [docs]Async chunks
Page load bundle
History
cc @bartoval |
f9359e4 to
9d9cebb
Compare
…elastic#241081) ## Summary elastic#239507 This PR has a multiple purpose: - Improve the quality of context-based suggestions, especially for multiple, conditional, and homogeneous signatures. (CASE, COALESCE, BUCKETS) - prediction of upcoming fields based on signature combinations (already present but improved) - Resolve old ambiguities present in old versions that cause errors in queries (such as suggesting Integer fields when requesting constant integers) - Strengthen the suggest-for-expression mechanism. - Improve the quality of suggestion tests. **Important notes:** - We've currently limited TEXT/KEYWORDS suggestions (only inside functions) to operators like is (not) null, like, etc., removing comparison operators. This point need a bit clarification because theoretically it does make sense suggest comparison operators. - **STATS** doesn't have a location for those operators either, so for example` STATS COUNT_DISTINCT(agent` only shows the comma and not operators like is (not) null - We now distinguish between integers (long, double) and constants, which means that in some cases **we don't suggest** anything. For example, ROUNT_TO has an optional precision field that must be a numeric constant. Previously, we also suggested numeric fields (like bytes), which made the Elastic query invalid. This case can work well with PR elastic#180528 **some examples** **Only Constant and variadic tests ** `FROM kibana_sample_data_logs | EVAL ROUND_TO(bytes,` // we don't suggest nothing because we expect numeric constants `FROM kibana_sample_data_logs | EVAL ROUND_TO(bytes, 3` // we know that this function has max 2 parameters and we suggest only math opearators, because they are the only ones that make sense `FROM kibana_sample_data_logs | EVAL CONCAT(agent, agent.keywords` // CONCAT has min 2 params but it can continue, so we suggest a comma **Homogenity tests** `FROM kibana_sample_data_logs | EVAL COALESCE(bytes` // Here we suggest numeric, comma and comparison operators (because we can transform it into boolean `FROM kibana_sample_data_logs | EVAL COALESCE(bytes,` // After the comma we know that the subsequent types must be numeric and we suggest only these (and subsequently the operators that make sense) **Conditional cases** `FROM kibana_sample_data_logs | EVAL CASE(bytes` // The first parameter must be a complete Boolean expression, so I don't have to suggest a comma. I suggest a comma only when the right operand also exists. **Multisignatures** `FROM kibana_sample_data_logs | STATS COUNT_DISTINCT(agent) BY BUCKETS(bytes,` // It's a number and I only expect a second constant parameter, so I don't suggest anything `FROM kibana_sample_data_logs | STATS COUNT_DISTINCT(agent) BY BUCKETS(timestamp,` // Here I suggest 1 month, 1 year... because it could be a date or a numeric constant (no suggestion). If I select 1 month, then I don't suggest anything because the signature says I have a maximum of 2 parameters. `FROM kibana_sample_data_logs | STATS COUNT_DISTINCT(agent) BY BUCKETS(timestamp, 3, ` // Here the second parameter is a constant, so the rules change. I unlock parameters 3 and 4 with the possible (constant) values: ?start . ?end, select from datepicker
…elastic#241081) ## Summary elastic#239507 This PR has a multiple purpose: - Improve the quality of context-based suggestions, especially for multiple, conditional, and homogeneous signatures. (CASE, COALESCE, BUCKETS) - prediction of upcoming fields based on signature combinations (already present but improved) - Resolve old ambiguities present in old versions that cause errors in queries (such as suggesting Integer fields when requesting constant integers) - Strengthen the suggest-for-expression mechanism. - Improve the quality of suggestion tests. **Important notes:** - We've currently limited TEXT/KEYWORDS suggestions (only inside functions) to operators like is (not) null, like, etc., removing comparison operators. This point need a bit clarification because theoretically it does make sense suggest comparison operators. - **STATS** doesn't have a location for those operators either, so for example` STATS COUNT_DISTINCT(agent` only shows the comma and not operators like is (not) null - We now distinguish between integers (long, double) and constants, which means that in some cases **we don't suggest** anything. For example, ROUNT_TO has an optional precision field that must be a numeric constant. Previously, we also suggested numeric fields (like bytes), which made the Elastic query invalid. This case can work well with PR elastic#180528 **some examples** **Only Constant and variadic tests ** `FROM kibana_sample_data_logs | EVAL ROUND_TO(bytes,` // we don't suggest nothing because we expect numeric constants `FROM kibana_sample_data_logs | EVAL ROUND_TO(bytes, 3` // we know that this function has max 2 parameters and we suggest only math opearators, because they are the only ones that make sense `FROM kibana_sample_data_logs | EVAL CONCAT(agent, agent.keywords` // CONCAT has min 2 params but it can continue, so we suggest a comma **Homogenity tests** `FROM kibana_sample_data_logs | EVAL COALESCE(bytes` // Here we suggest numeric, comma and comparison operators (because we can transform it into boolean `FROM kibana_sample_data_logs | EVAL COALESCE(bytes,` // After the comma we know that the subsequent types must be numeric and we suggest only these (and subsequently the operators that make sense) **Conditional cases** `FROM kibana_sample_data_logs | EVAL CASE(bytes` // The first parameter must be a complete Boolean expression, so I don't have to suggest a comma. I suggest a comma only when the right operand also exists. **Multisignatures** `FROM kibana_sample_data_logs | STATS COUNT_DISTINCT(agent) BY BUCKETS(bytes,` // It's a number and I only expect a second constant parameter, so I don't suggest anything `FROM kibana_sample_data_logs | STATS COUNT_DISTINCT(agent) BY BUCKETS(timestamp,` // Here I suggest 1 month, 1 year... because it could be a date or a numeric constant (no suggestion). If I select 1 month, then I don't suggest anything because the signature says I have a maximum of 2 parameters. `FROM kibana_sample_data_logs | STATS COUNT_DISTINCT(agent) BY BUCKETS(timestamp, 3, ` // Here the second parameter is a constant, so the rules change. I unlock parameters 3 and 4 with the possible (constant) values: ?start . ?end, select from datepicker
…elastic#241081) ## Summary elastic#239507 This PR has a multiple purpose: - Improve the quality of context-based suggestions, especially for multiple, conditional, and homogeneous signatures. (CASE, COALESCE, BUCKETS) - prediction of upcoming fields based on signature combinations (already present but improved) - Resolve old ambiguities present in old versions that cause errors in queries (such as suggesting Integer fields when requesting constant integers) - Strengthen the suggest-for-expression mechanism. - Improve the quality of suggestion tests. **Important notes:** - We've currently limited TEXT/KEYWORDS suggestions (only inside functions) to operators like is (not) null, like, etc., removing comparison operators. This point need a bit clarification because theoretically it does make sense suggest comparison operators. - **STATS** doesn't have a location for those operators either, so for example` STATS COUNT_DISTINCT(agent` only shows the comma and not operators like is (not) null - We now distinguish between integers (long, double) and constants, which means that in some cases **we don't suggest** anything. For example, ROUNT_TO has an optional precision field that must be a numeric constant. Previously, we also suggested numeric fields (like bytes), which made the Elastic query invalid. This case can work well with PR elastic#180528 **some examples** **Only Constant and variadic tests ** `FROM kibana_sample_data_logs | EVAL ROUND_TO(bytes,` // we don't suggest nothing because we expect numeric constants `FROM kibana_sample_data_logs | EVAL ROUND_TO(bytes, 3` // we know that this function has max 2 parameters and we suggest only math opearators, because they are the only ones that make sense `FROM kibana_sample_data_logs | EVAL CONCAT(agent, agent.keywords` // CONCAT has min 2 params but it can continue, so we suggest a comma **Homogenity tests** `FROM kibana_sample_data_logs | EVAL COALESCE(bytes` // Here we suggest numeric, comma and comparison operators (because we can transform it into boolean `FROM kibana_sample_data_logs | EVAL COALESCE(bytes,` // After the comma we know that the subsequent types must be numeric and we suggest only these (and subsequently the operators that make sense) **Conditional cases** `FROM kibana_sample_data_logs | EVAL CASE(bytes` // The first parameter must be a complete Boolean expression, so I don't have to suggest a comma. I suggest a comma only when the right operand also exists. **Multisignatures** `FROM kibana_sample_data_logs | STATS COUNT_DISTINCT(agent) BY BUCKETS(bytes,` // It's a number and I only expect a second constant parameter, so I don't suggest anything `FROM kibana_sample_data_logs | STATS COUNT_DISTINCT(agent) BY BUCKETS(timestamp,` // Here I suggest 1 month, 1 year... because it could be a date or a numeric constant (no suggestion). If I select 1 month, then I don't suggest anything because the signature says I have a maximum of 2 parameters. `FROM kibana_sample_data_logs | STATS COUNT_DISTINCT(agent) BY BUCKETS(timestamp, 3, ` // Here the second parameter is a constant, so the rules change. I unlock parameters 3 and 4 with the possible (constant) values: ?start . ?end, select from datepicker

Summary
#239507
This PR has a multiple purpose:
Important notes:
STATS COUNT_DISTINCT(agentonly shows the comma and not operators like is (not) nullsome examples
**Only Constant and variadic tests **
FROM kibana_sample_data_logs | EVAL ROUND_TO(bytes,// we don't suggest nothing because we expect numeric constantsFROM kibana_sample_data_logs | EVAL ROUND_TO(bytes, 3// we know that this function has max 2 parameters and we suggest only math opearators, because they are the only ones that make senseFROM kibana_sample_data_logs | EVAL CONCAT(agent, agent.keywords// CONCAT has min 2 params but it can continue, so we suggest a commaHomogenity tests
FROM kibana_sample_data_logs | EVAL COALESCE(bytes// Here we suggest numeric, comma and comparison operators (because we can transform it into booleanFROM kibana_sample_data_logs | EVAL COALESCE(bytes,// After the comma we know that the subsequent types must be numeric and we suggest only these (and subsequently the operators that make sense)Conditional cases
FROM kibana_sample_data_logs | EVAL CASE(bytes// The first parameter must be a complete Boolean expression, so I don't have to suggest a comma. I suggest a comma only when the right operand also exists.Multisignatures
FROM kibana_sample_data_logs | STATS COUNT_DISTINCT(agent) BY BUCKETS(bytes,// It's a number and I only expect a second constant parameter, so I don't suggest anythingFROM kibana_sample_data_logs | STATS COUNT_DISTINCT(agent) BY BUCKETS(timestamp,// Here I suggest 1 month, 1 year... because it could be a date or a numeric constant (no suggestion). If I select 1 month, then I don't suggest anything because the signature says I have a maximum of 2 parameters.FROM kibana_sample_data_logs | STATS COUNT_DISTINCT(agent) BY BUCKETS(timestamp, 3,// Here the second parameter is a constant, so the rules change. I unlock parameters 3 and 4 with the possible (constant) values: ?start . ?end, select from datepicker