Conversation
…lest/esql-qstr-function # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizer.java
# Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/meta.csv-spec # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
|
@elasticsearchmachine run buildkite/docs-build-pr |
# Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/meta.csv-spec # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
fang-xing-esql
left a comment
There was a problem hiding this comment.
Thanks for putting these together @carlosdelest, my comments are added.
| var expected = QueryBuilders.boolQuery().must(queryString).must(range); | ||
| assertThat(query.query().toString(), is(expected.toString())); | ||
| } | ||
|
|
There was a problem hiding this comment.
It will be great if there are tests with more combinations of functions and operators, to capture the current behaviors. For example:
qstr("last_name: Smith") OR emp_no > 10010 ===> disjunctive
qstr("last_name: Smith") and CIDR_MATCH(ip, "127.0.0.1/32") === both functions can be pushed down into Lucene
qstr(...) and/or qstr(...) ===> it looks like multiple qstrs are allowed but not combined for now, perhaps it can be a candidate for further optimization.
qsrt("last_name: Smith") and/or length(first_name) <= 8 ===> the other side of and/or is not pushed down to Lucene
There was a problem hiding this comment.
Thanks @fang-xing-esql ! I've added these tests in c5b45ec.
Are there any other combinations that you can think of, or any rule of thumb that I could use to understand what cases should I be implementing? I haven't found more examples in LocalPhysicalPlanOptimizerTests that could guide me here.
it looks like multiple qstrs are allowed but not combined for now, perhaps it can be a candidate for further optimization.
Probably so - I wonder if queries pushed down to Lucene should be left to Lucene to optimize? 🤔
There was a problem hiding this comment.
Thanks for adding more tests!
Are there any other combinations that you can think of, or any rule of thumb that I could use to understand what cases should I be implementing? I haven't found more examples in
LocalPhysicalPlanOptimizerTeststhat could guide me here.
It could be challenging to write effective testcases. We always try to improve test coverage, but it is not guaranteed that everything is covered, there could be many edge cases, I found PR reviews help me a lot to improve test coverage. There are many existing test cases, and it is normal not be able to remember all of them :). These are the things that come to my mind first when writing tests for new functions:
- Verify resolutions - function and type resolution. This is done(majority of them) by
Analyzer, and these tests are usually covered byAnalyzerTestsandVerifierTests. - Verify optimizer builds an optimal/desired plan. These are usually covered by
LogicalPlanOptimizerTests,PhysicalPlanOptimizerTests,QueryTranslatorTests, and their local version if exist. For the functions and operators that can appear under where clause, I would consider test compound predicates - conjunctive and disjunctive. For this function in particular, because it got pushed down into Lucene, we could get some combinations of bool, should, must queries, which makes me think that we may want to cover conjunctive and disjunctive, and perhaps some nested cases. Most of the ES|QL functions have their own compute engine implementation, and this doesn't, which makes me think that we may want to cover some combinations of them. I think these are not edge cases, they might be quite common use cases. I'll try to think about more scenarios, and update it here, just don't want to throw all of them at one time :-). - Verify the query returns correct results. These are mainly done in CsvTests, REST and yml tests. I would add more complicated(conjunctive and disjunctive) tests to verify we get correct results.
it looks like multiple qstrs are allowed but not combined for now, perhaps it can be a candidate for further optimization.
Probably so - I wonder if queries pushed down to Lucene should be left to Lucene to optimize? 🤔
Yeah, Lucene might be able to do optimizations on multiple queries pushed down to it. Honestly I'm not an expert of it, still learning. There is something we can do in optimizer, an example of it is CombineBinaryComparisons, if Lucene doesn't do it or doesn't do it well, this rule can be a reference.
| /** | ||
| * Full text function that performs a {@link QueryStringQuery} . | ||
| */ | ||
| public class QueryStringFunction extends FullTextFunction { |
There was a problem hiding this comment.
Usually an ES|QL function has its own AbstractFunctionTestCase, an important usage is to generate documents(some of them are used by Kibana) automatically. This function doesn't have its compute engine implementation, I'm not quite sure how to make an AbstractFunctionTestCase for it yet, need to figure out how to generate the docs automatically, or manually create them.
There was a problem hiding this comment.
QueryStringFunction takes only string literal, however if a numeric type is provided, it passes Analyzer, if a field is provided as input, it fails at ExpressionTranslator, these could be candidates for VerifierTests.
Override resolveType() is the usual way to validate input data types and foldable for functions during Analyzer phase, perhaps refer to an ES|QL string function as an example.
There was a problem hiding this comment.
Good point on AbstractFunctionTestCase, but we don't want to generate documentation yet since the function is only available in snapshots. Also I might be looking at the wrong place, but we have another function Rate that is only available in snapshots and it also seems to be missing its own AbstractFunctionTestCase.
So I wonder if we can defer the need for a AbstractFunctionTestCase for now, since it seems it requires a bit of rework so that AbstractFunctionTestCase can also work with functions that do not have a compute engine implementation.
There was a problem hiding this comment.
QueryStringFunction takes only string literal, however if a numeric type is provided, it passes Analyzer, if a field is provided as input, it fails at ExpressionTranslator, these could be candidates for VerifierTests.
Override resolveType() is the usual way to validate input data types and foldable for functions during Analyzer phase, perhaps refer to an ES|QL string function as an example.
I've tried to address this in 7b2cbaa
There was a problem hiding this comment.
++ Ioana on documentation and AbstractFunctionTestCase - we can probably defer this until we plan to move it out of snapshot builds
There was a problem hiding this comment.
I added a ui label, so that Kibana is aware of this. As far as I know AbstractFunctionTestCase generates docs for Kibana, we can figure out the best option to move forward.
There was a problem hiding this comment.
++ Ioana on documentation and
AbstractFunctionTestCase- we can probably defer this until we plan to move it out of snapshot builds
Can an issue be created to keep track of AbstractFunctionTestCase if it won't be included in this PR? It is important when this function is taken out of snapshot, we may not want to loose track of it. Thank you!
| | limit 5; | ||
| // end::qstr-with-field[] | ||
|
|
||
| book_no:keyword | author:text |
There was a problem hiding this comment.
A result tag(qstr-with-field-result) is missing, the tag is referenced in docs/reference/esql/functions/examples/qstr.asciidoc
|
|
||
| import static org.hamcrest.Matchers.equalTo; | ||
|
|
||
| @FunctionName("qstr") |
There was a problem hiding this comment.
Thanks for including the generated docs.
Don't forget to commit the changes to AbstractFunctionTestCase, otherwise you may get these messages the next time ./gradlew -p x-pack/plugin/esql/ test runs.
Changes not staged for commit:
(use "git add/rm <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
deleted: docs/reference/esql/functions/description/qstr.asciidoc
deleted: docs/reference/esql/functions/examples/qstr.asciidoc
deleted: docs/reference/esql/functions/kibana/definition/qstr.json
deleted: docs/reference/esql/functions/kibana/docs/qstr.md
deleted: docs/reference/esql/functions/layout/qstr.asciidoc
deleted: docs/reference/esql/functions/parameters/qstr.asciidoc
deleted: docs/reference/esql/functions/signature/qstr.svg
deleted: docs/reference/esql/functions/types/qstr.asciidoc
There was a problem hiding this comment.
I did that but reverted the change, as this implies changes from the Categorize function. I've opened this PR for dealing with that specific change. Can you please review?
There was a problem hiding this comment.
I guess we are pending on the PR to change AbstractFunctionTestCase to generate docs for snapshot functions, without it if just checking in QSTR related docs, it will be a surprise the next time people run ./gradlew -p x-pack/plugin/esql/ test before pushing PRs, these docs will be marked as deleted.
Just added some more comments related to docs, we are almost there. |
| "examples" : [ | ||
| "from books \n| where qstr(\"author: Faulkner\")\n| keep book_no, author \n| sort book_no \n| limit 5;" | ||
| ] | ||
| } |
There was a problem hiding this comment.
Somehow the "preview" : true is not here, I'll update here if I find out why.
There was a problem hiding this comment.
This was due to the docs being generated without the merged version from main - fixed in 28b72f9
| ---- | ||
| [%header.monospaced.styled,format=dsv,separator=|] | ||
| |=== | ||
| include::{esql-specs}/qstr-function.csv-spec[tag=qstr-with-field-result] |
There was a problem hiding this comment.
This qstr-with-field-result tag is not in qstr-function.csv-spec.
| [[esql-qstr]] | ||
| === `QSTR` | ||
|
|
||
| preview::["Do not use `VALUES` on production environments. This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features."] |
There was a problem hiding this comment.
It is weird VALUES is mentioned here.
There was a problem hiding this comment.
This was due to the docs being generated without the merged version from main - fixed in 28b72f9
|
@fang-xing-esql , I just merged #113080 so docs generation can be done for Can you please check if there are any other pending concerns, or anything we should track for a follow-up PR, so we can merge this one? Thanks! |
fang-xing-esql
left a comment
There was a problem hiding this comment.
LGTM! Thank you for enabling full text search in ES|QL @carlosdelest !
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
|
Thanks for your help @fang-xing-esql ! |
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
(cherry picked from commit 8d1b22e) # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/EsqlFunctionRegistry.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/EsqlPlugin.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/SerializationTestUtils.java
Adds a QSTR function, behind snapshot builds.
This is very much WIP for sharing ideas and early feedback.
Current approach:
FullTextFunctionbase class for full text query functionsQueryStringFunctionreturns a boolean - the type was done just to fit as part of WHERE boolean expressions, conceptually it returns a query to be executed at Lucene. This may need its own data type 🤔FullTextFunctions are able to be pushed down to Lucene as part of the local physical plan optimizing, similar to MATCH operatorMATCHcommand worked, we're limiting the commands that can be used before the function, so we don't allow commands that may alter the existing columns (DROP,KEEP,EVAL).