ESQL - date nanos range bug? #125345
Conversation
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
Hi @not-napoleon, I've created a changelog YAML for you. |
|
Hi @not-napoleon, I've updated the changelog YAML for you. |
nik9000
left a comment
There was a problem hiding this comment.
I left some small stuff about the trace logs.
| */ | ||
| DATE_NANOS_DATE_DIFF(), | ||
| /** | ||
| * Indicates that https://github.com/elastic/elasticsearch/issues/125439 is fixed |
There was a problem hiding this comment.
I'd prefer if we added a short description so I don't have to open github when reading the code.
| + attribute | ||
| + "<" | ||
| + attribute.dataType() | ||
| + ">]" |
There was a problem hiding this comment.
Could you do this with {} so we don't build the string every time?
| DataType dataType = value.dataType(); | ||
| if (DataType.isDateTime(dataType) && DataType.isDateTime(lower.dataType()) && DataType.isDateTime(upper.dataType())) { | ||
| logger.trace( | ||
| "Translating Range into lucene query. dataType is [" + dataType + "] upper is [" + upper + "] lower is [" + lower + "]" |
There was a problem hiding this comment.
Could you do this with {} so we don't build the string every time?
| @Override | ||
| public final PhysicalOperation sourcePhysicalOperation(EsQueryExec esQueryExec, LocalExecutionPlannerContext context) { | ||
| final LuceneOperator.Factory luceneFactory; | ||
| logger.trace("Query Exec is " + esQueryExec); |
fang-xing-esql
left a comment
There was a problem hiding this comment.
LGTM, thank you for fixing it quickly @not-napoleon! I added a couple of comments, they can be addressed as follow ups.
| -18489600 | -18489599 | 2023-03-23T12:15:03.360103847Z | ||
| ; | ||
|
|
||
| Regression out of bounds in where clause |
There was a problem hiding this comment.
Some more tests that I think could be helpful are having both lower and upper bounds with literals that can disqualify rows, more combinations of >, >=, < and <=, for example
WHERE nanos > to_datenanos("2023-10-23T12:15:00") AND nanos <= to_datenanos("2023-10-23T13:53:55.832987654Z")
WHERE nanos >= to_datenanos("2023-10-23T12:15:00") AND nanos <= now()
| import static org.elasticsearch.xpack.esql.type.EsqlDataTypeConverter.versionToString; | ||
|
|
||
| // BETWEEN or range - is a mix of gt(e) AND lt(e) | ||
| public class Range extends ScalarFunction implements TranslationAware.SingleValueTranslationAware { |
There was a problem hiding this comment.
There is a method areBoundariesInvalid to validate the lower and upper bounds, date_nanos can be added in there too, this can be a follow up.
…os-range-query-bug' into esql-date-nanos-range-query-bug
|
@elasticmachine run elasticsearch-ci/part-2 |
Fixes elastic#125439 We were incorrectly formatting nanosecond dates when building lucene queries. We had missed this in our testing because none of the CSV tests were running against Lucene. This happened because the date nanos test data includes multivalue fields. Our warning behavior for multivalue fields is inconsistent between queries run in Lucene and queries run in pure ES|QL without pushdown. Our warning tests, however, require that the specified warnings be present in all execution paths. When we first built the date nanos CSV tests, we worked around this by always using an MV function to unpack the multivalue fields. But we forgot that using an MV function prevents the entire query from being pushed down to Lucene, and thus that path wasn't being tested. In this PR, I've duplicated many of the tests to have a version that doesn't use the MV function, and uses warningRegex instead of warning. The regex version does not fail if the header is absent, so it's safe to use in both modes. Rewriting the tests this way revealed several situations in which this bug can manifest, all of which are fixed in this PR. I cannot be confidant that there aren't more paths that can trigger this bug and aren't covered by these tests, but I haven't found any yet. I've left some trace level logging that I found helpful while debugging this. --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Fixes elastic#125439 We were incorrectly formatting nanosecond dates when building lucene queries. We had missed this in our testing because none of the CSV tests were running against Lucene. This happened because the date nanos test data includes multivalue fields. Our warning behavior for multivalue fields is inconsistent between queries run in Lucene and queries run in pure ES|QL without pushdown. Our warning tests, however, require that the specified warnings be present in all execution paths. When we first built the date nanos CSV tests, we worked around this by always using an MV function to unpack the multivalue fields. But we forgot that using an MV function prevents the entire query from being pushed down to Lucene, and thus that path wasn't being tested. In this PR, I've duplicated many of the tests to have a version that doesn't use the MV function, and uses warningRegex instead of warning. The regex version does not fail if the header is absent, so it's safe to use in both modes. Rewriting the tests this way revealed several situations in which this bug can manifest, all of which are fixed in this PR. I cannot be confidant that there aren't more paths that can trigger this bug and aren't covered by these tests, but I haven't found any yet. I've left some trace level logging that I found helpful while debugging this. --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Fixes elastic#125439 We were incorrectly formatting nanosecond dates when building lucene queries. We had missed this in our testing because none of the CSV tests were running against Lucene. This happened because the date nanos test data includes multivalue fields. Our warning behavior for multivalue fields is inconsistent between queries run in Lucene and queries run in pure ES|QL without pushdown. Our warning tests, however, require that the specified warnings be present in all execution paths. When we first built the date nanos CSV tests, we worked around this by always using an MV function to unpack the multivalue fields. But we forgot that using an MV function prevents the entire query from being pushed down to Lucene, and thus that path wasn't being tested. In this PR, I've duplicated many of the tests to have a version that doesn't use the MV function, and uses warningRegex instead of warning. The regex version does not fail if the header is absent, so it's safe to use in both modes. Rewriting the tests this way revealed several situations in which this bug can manifest, all of which are fixed in this PR. I cannot be confidant that there aren't more paths that can trigger this bug and aren't covered by these tests, but I haven't found any yet. I've left some trace level logging that I found helpful while debugging this. --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Fixes #125439 We were incorrectly formatting nanosecond dates when building lucene queries. We had missed this in our testing because none of the CSV tests were running against Lucene. This happened because the date nanos test data includes multivalue fields. Our warning behavior for multivalue fields is inconsistent between queries run in Lucene and queries run in pure ES|QL without pushdown. Our warning tests, however, require that the specified warnings be present in all execution paths. When we first built the date nanos CSV tests, we worked around this by always using an MV function to unpack the multivalue fields. But we forgot that using an MV function prevents the entire query from being pushed down to Lucene, and thus that path wasn't being tested. In this PR, I've duplicated many of the tests to have a version that doesn't use the MV function, and uses warningRegex instead of warning. The regex version does not fail if the header is absent, so it's safe to use in both modes. Rewriting the tests this way revealed several situations in which this bug can manifest, all of which are fixed in this PR. I cannot be confidant that there aren't more paths that can trigger this bug and aren't covered by these tests, but I haven't found any yet. I've left some trace level logging that I found helpful while debugging this. --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Fixes #125439 We were incorrectly formatting nanosecond dates when building lucene queries. We had missed this in our testing because none of the CSV tests were running against Lucene. This happened because the date nanos test data includes multivalue fields. Our warning behavior for multivalue fields is inconsistent between queries run in Lucene and queries run in pure ES|QL without pushdown. Our warning tests, however, require that the specified warnings be present in all execution paths. When we first built the date nanos CSV tests, we worked around this by always using an MV function to unpack the multivalue fields. But we forgot that using an MV function prevents the entire query from being pushed down to Lucene, and thus that path wasn't being tested. In this PR, I've duplicated many of the tests to have a version that doesn't use the MV function, and uses warningRegex instead of warning. The regex version does not fail if the header is absent, so it's safe to use in both modes. Rewriting the tests this way revealed several situations in which this bug can manifest, all of which are fixed in this PR. I cannot be confidant that there aren't more paths that can trigger this bug and aren't covered by these tests, but I haven't found any yet. I've left some trace level logging that I found helpful while debugging this. --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Fixes #125439 We were incorrectly formatting nanosecond dates when building lucene queries. We had missed this in our testing because none of the CSV tests were running against Lucene. This happened because the date nanos test data includes multivalue fields. Our warning behavior for multivalue fields is inconsistent between queries run in Lucene and queries run in pure ES|QL without pushdown. Our warning tests, however, require that the specified warnings be present in all execution paths. When we first built the date nanos CSV tests, we worked around this by always using an MV function to unpack the multivalue fields. But we forgot that using an MV function prevents the entire query from being pushed down to Lucene, and thus that path wasn't being tested. In this PR, I've duplicated many of the tests to have a version that doesn't use the MV function, and uses warningRegex instead of warning. The regex version does not fail if the header is absent, so it's safe to use in both modes. Rewriting the tests this way revealed several situations in which this bug can manifest, all of which are fixed in this PR. I cannot be confidant that there aren't more paths that can trigger this bug and aren't covered by these tests, but I haven't found any yet. I've left some trace level logging that I found helpful while debugging this. --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
…millis (#125595) Follow up to #125345. If the query contained both a nanos and a millis comparison, we were formatting the dates incorrectly for the lucene push down. This PR adds a test and a fix for that case. --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
…millis (elastic#125595) Follow up to elastic#125345. If the query contained both a nanos and a millis comparison, we were formatting the dates incorrectly for the lucene push down. This PR adds a test and a fix for that case. --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
…millis (elastic#125595) Follow up to elastic#125345. If the query contained both a nanos and a millis comparison, we were formatting the dates incorrectly for the lucene push down. This PR adds a test and a fix for that case. --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
…millis (elastic#125595) Follow up to elastic#125345. If the query contained both a nanos and a millis comparison, we were formatting the dates incorrectly for the lucene push down. This PR adds a test and a fix for that case. --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
…millis (#125595) (#125617) Follow up to #125345. If the query contained both a nanos and a millis comparison, we were formatting the dates incorrectly for the lucene push down. This PR adds a test and a fix for that case. --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
…millis (#125595) (#125618) Follow up to #125345. If the query contained both a nanos and a millis comparison, we were formatting the dates incorrectly for the lucene push down. This PR adds a test and a fix for that case. --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
…millis (#125595) (#125619) Follow up to #125345. If the query contained both a nanos and a millis comparison, we were formatting the dates incorrectly for the lucene push down. This PR adds a test and a fix for that case. --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Fixes elastic#125439 We were incorrectly formatting nanosecond dates when building lucene queries. We had missed this in our testing because none of the CSV tests were running against Lucene. This happened because the date nanos test data includes multivalue fields. Our warning behavior for multivalue fields is inconsistent between queries run in Lucene and queries run in pure ES|QL without pushdown. Our warning tests, however, require that the specified warnings be present in all execution paths. When we first built the date nanos CSV tests, we worked around this by always using an MV function to unpack the multivalue fields. But we forgot that using an MV function prevents the entire query from being pushed down to Lucene, and thus that path wasn't being tested. In this PR, I've duplicated many of the tests to have a version that doesn't use the MV function, and uses warningRegex instead of warning. The regex version does not fail if the header is absent, so it's safe to use in both modes. Rewriting the tests this way revealed several situations in which this bug can manifest, all of which are fixed in this PR. I cannot be confidant that there aren't more paths that can trigger this bug and aren't covered by these tests, but I haven't found any yet. I've left some trace level logging that I found helpful while debugging this. --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
…millis (elastic#125595) Follow up to elastic#125345. If the query contained both a nanos and a millis comparison, we were formatting the dates incorrectly for the lucene push down. This PR adds a test and a fix for that case. --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Fixes #125439
We were incorrectly formatting nanosecond dates when building lucene queries. We had missed this in our testing because none of the CSV tests were running against Lucene. This happened because the date nanos test data includes multivalue fields. Our warning behavior for multivalue fields is inconsistent between queries run in Lucene and queries run in pure ES|QL without pushdown. Our warning tests, however, require that the specified warnings be present in all execution paths. When we first built the date nanos CSV tests, we worked around this by always using an MV function to unpack the multivalue fields. But we forgot that using an MV function prevents the entire query from being pushed down to Lucene, and thus that path wasn't being tested.
In this PR, I've duplicated many of the tests to have a version that doesn't use the MV function, and uses
warningRegexinstead ofwarning. The regex version does not fail if the header is absent, so it's safe to use in both modes. Rewriting the tests this way revealed several situations in which this bug can manifest, all of which are fixed in this PR. I cannot be confidant that there aren't more paths that can trigger this bug and aren't covered by these tests, but I haven't found any yet.I've left some trace level logging that I found helpful while debugging this.