Skip to content

ESQL QSTR function#112590

Merged
carlosdelest merged 47 commits intoelastic:mainfrom
carlosdelest:carlosdelest/esql-qstr-function
Sep 19, 2024
Merged

ESQL QSTR function#112590
carlosdelest merged 47 commits intoelastic:mainfrom
carlosdelest:carlosdelest/esql-qstr-function

Conversation

@carlosdelest
Copy link
Member

@carlosdelest carlosdelest commented Sep 6, 2024

Adds a QSTR function, behind snapshot builds.

This is very much WIP for sharing ideas and early feedback.

Current approach:

  • Use a FullTextFunction base class for full text query functions
  • QueryStringFunction returns a boolean - the type was done just to fit as part of WHERE boolean expressions, conceptually it returns a query to be executed at Lucene. This may need its own data type 🤔
  • FullTextFunctions are able to be pushed down to Lucene as part of the local physical plan optimizing, similar to MATCH operator
  • Similar to how the MATCH command worked, we're limiting the commands that can be used before the function, so we don't allow commands that may alter the existing columns (DROP, KEEP, EVAL).
# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/meta.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
@carlosdelest
Copy link
Member Author

@elasticsearchmachine run buildkite/docs-build-pr

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/meta.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
@carlosdelest carlosdelest marked this pull request as ready for review September 10, 2024 17:41
@carlosdelest carlosdelest requested a review from a team as a code owner September 10, 2024 17:41
@carlosdelest carlosdelest added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Sep 10, 2024
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Sep 10, 2024
@carlosdelest carlosdelest added :Analytics/ES|QL AKA ESQL and removed needs:triage Requires assignment of a team area label labels Sep 10, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Copy link
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting these together @carlosdelest, my comments are added.

var expected = QueryBuilders.boolQuery().must(queryString).must(range);
assertThat(query.query().toString(), is(expected.toString()));
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be great if there are tests with more combinations of functions and operators, to capture the current behaviors. For example:

qstr("last_name: Smith") OR emp_no > 10010 ===> disjunctive
qstr("last_name: Smith") and CIDR_MATCH(ip, "127.0.0.1/32") === both functions can be pushed down into Lucene
qstr(...) and/or qstr(...) ===> it looks like multiple qstrs are allowed but not combined for now, perhaps it can be a candidate for further optimization.
qsrt("last_name: Smith") and/or length(first_name) <= 8 ===> the other side of and/or is not pushed down to Lucene

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fang-xing-esql ! I've added these tests in c5b45ec.

Are there any other combinations that you can think of, or any rule of thumb that I could use to understand what cases should I be implementing? I haven't found more examples in LocalPhysicalPlanOptimizerTests that could guide me here.

it looks like multiple qstrs are allowed but not combined for now, perhaps it can be a candidate for further optimization.

Probably so - I wonder if queries pushed down to Lucene should be left to Lucene to optimize? 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding more tests!

Are there any other combinations that you can think of, or any rule of thumb that I could use to understand what cases should I be implementing? I haven't found more examples in LocalPhysicalPlanOptimizerTests that could guide me here.

It could be challenging to write effective testcases. We always try to improve test coverage, but it is not guaranteed that everything is covered, there could be many edge cases, I found PR reviews help me a lot to improve test coverage. There are many existing test cases, and it is normal not be able to remember all of them :). These are the things that come to my mind first when writing tests for new functions:

  1. Verify resolutions - function and type resolution. This is done(majority of them) by Analyzer, and these tests are usually covered by AnalyzerTests and VerifierTests.
  2. Verify optimizer builds an optimal/desired plan. These are usually covered by LogicalPlanOptimizerTests, PhysicalPlanOptimizerTests, QueryTranslatorTests, and their local version if exist. For the functions and operators that can appear under where clause, I would consider test compound predicates - conjunctive and disjunctive. For this function in particular, because it got pushed down into Lucene, we could get some combinations of bool, should, must queries, which makes me think that we may want to cover conjunctive and disjunctive, and perhaps some nested cases. Most of the ES|QL functions have their own compute engine implementation, and this doesn't, which makes me think that we may want to cover some combinations of them. I think these are not edge cases, they might be quite common use cases. I'll try to think about more scenarios, and update it here, just don't want to throw all of them at one time :-).
  3. Verify the query returns correct results. These are mainly done in CsvTests, REST and yml tests. I would add more complicated(conjunctive and disjunctive) tests to verify we get correct results.

it looks like multiple qstrs are allowed but not combined for now, perhaps it can be a candidate for further optimization.

Probably so - I wonder if queries pushed down to Lucene should be left to Lucene to optimize? 🤔

Yeah, Lucene might be able to do optimizations on multiple queries pushed down to it. Honestly I'm not an expert of it, still learning. There is something we can do in optimizer, an example of it is CombineBinaryComparisons, if Lucene doesn't do it or doesn't do it well, this rule can be a reference.

/**
* Full text function that performs a {@link QueryStringQuery} .
*/
public class QueryStringFunction extends FullTextFunction {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually an ES|QL function has its own AbstractFunctionTestCase, an important usage is to generate documents(some of them are used by Kibana) automatically. This function doesn't have its compute engine implementation, I'm not quite sure how to make an AbstractFunctionTestCase for it yet, need to figure out how to generate the docs automatically, or manually create them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QueryStringFunction takes only string literal, however if a numeric type is provided, it passes Analyzer, if a field is provided as input, it fails at ExpressionTranslator, these could be candidates for VerifierTests.

Override resolveType() is the usual way to validate input data types and foldable for functions during Analyzer phase, perhaps refer to an ES|QL string function as an example.

Copy link
Contributor

@ioanatia ioanatia Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point on AbstractFunctionTestCase, but we don't want to generate documentation yet since the function is only available in snapshots. Also I might be looking at the wrong place, but we have another function Rate that is only available in snapshots and it also seems to be missing its own AbstractFunctionTestCase.
So I wonder if we can defer the need for a AbstractFunctionTestCase for now, since it seems it requires a bit of rework so that AbstractFunctionTestCase can also work with functions that do not have a compute engine implementation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QueryStringFunction takes only string literal, however if a numeric type is provided, it passes Analyzer, if a field is provided as input, it fails at ExpressionTranslator, these could be candidates for VerifierTests.

Override resolveType() is the usual way to validate input data types and foldable for functions during Analyzer phase, perhaps refer to an ES|QL string function as an example.

I've tried to address this in 7b2cbaa

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ Ioana on documentation and AbstractFunctionTestCase - we can probably defer this until we plan to move it out of snapshot builds

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a ui label, so that Kibana is aware of this. As far as I know AbstractFunctionTestCase generates docs for Kibana, we can figure out the best option to move forward.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ Ioana on documentation and AbstractFunctionTestCase - we can probably defer this until we plan to move it out of snapshot builds

Can an issue be created to keep track of AbstractFunctionTestCase if it won't be included in this PR? It is important when this function is taken out of snapshot, we may not want to loose track of it. Thank you!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've actually given it a go here in 4b97e6d - it performs type checking and folding.

| limit 5;
// end::qstr-with-field[]

book_no:keyword | author:text
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A result tag(qstr-with-field-result) is missing, the tag is referenced in docs/reference/esql/functions/examples/qstr.asciidoc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I added that in 70dd5d2


import static org.hamcrest.Matchers.equalTo;

@FunctionName("qstr")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for including the generated docs.

Don't forget to commit the changes to AbstractFunctionTestCase, otherwise you may get these messages the next time ./gradlew -p x-pack/plugin/esql/ test runs.

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	deleted:    docs/reference/esql/functions/description/qstr.asciidoc
	deleted:    docs/reference/esql/functions/examples/qstr.asciidoc
	deleted:    docs/reference/esql/functions/kibana/definition/qstr.json
	deleted:    docs/reference/esql/functions/kibana/docs/qstr.md
	deleted:    docs/reference/esql/functions/layout/qstr.asciidoc
	deleted:    docs/reference/esql/functions/parameters/qstr.asciidoc
	deleted:    docs/reference/esql/functions/signature/qstr.svg
	deleted:    docs/reference/esql/functions/types/qstr.asciidoc
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that but reverted the change, as this implies changes from the Categorize function. I've opened this PR for dealing with that specific change. Can you please review?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we are pending on the PR to change AbstractFunctionTestCase to generate docs for snapshot functions, without it if just checking in QSTR related docs, it will be a surprise the next time people run ./gradlew -p x-pack/plugin/esql/ test before pushing PRs, these docs will be marked as deleted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just merged #113080 👍

@fang-xing-esql
Copy link
Member

fang-xing-esql commented Sep 17, 2024

Let's figure out the technical difficulty to have the docs in place in this PR, it is preferable to have it here so that the downstream components can use them more smoothly, but if it is too much effort we can separate them, just don't want to lose track of it.

I think we have a strategy as the docs can be generated for Kibana but not included ES docs. LMKWYT!

Just added some more comments related to docs, we are almost there.

"examples" : [
"from books \n| where qstr(\"author: Faulkner\")\n| keep book_no, author \n| sort book_no \n| limit 5;"
]
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow the "preview" : true is not here, I'll update here if I find out why.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was due to the docs being generated without the merged version from main - fixed in 28b72f9

----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/qstr-function.csv-spec[tag=qstr-with-field-result]
Copy link
Member

@fang-xing-esql fang-xing-esql Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This qstr-with-field-result tag is not in qstr-function.csv-spec.

[[esql-qstr]]
=== `QSTR`

preview::["Do not use `VALUES` on production environments. This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features."]
Copy link
Member

@fang-xing-esql fang-xing-esql Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is weird VALUES is mentioned here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was due to the docs being generated without the merged version from main - fixed in 28b72f9

@carlosdelest
Copy link
Member Author

@fang-xing-esql , I just merged #113080 so docs generation can be done for QSTR.

Can you please check if there are any other pending concerns, or anything we should track for a follow-up PR, so we can merge this one?

Thanks!

Copy link
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you for enabling full text search in ES|QL @carlosdelest !

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
@carlosdelest carlosdelest added v8.16.0 auto-backport Automatically create backport pull requests when merged labels Sep 19, 2024
@carlosdelest carlosdelest merged commit 8d1b22e into elastic:main Sep 19, 2024
@carlosdelest
Copy link
Member Author

Thanks for your help @fang-xing-esql !

@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 112590

@carlosdelest
Copy link
Member Author

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

carlosdelest added a commit to carlosdelest/elasticsearch that referenced this pull request Sep 19, 2024
(cherry picked from commit 8d1b22e)

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/EsqlFunctionRegistry.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/EsqlPlugin.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/SerializationTestUtils.java
@ioanatia ioanatia mentioned this pull request Sep 23, 2024
@ioanatia ioanatia mentioned this pull request Sep 10, 2024
14 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged ES|QL-ui Impacts ES|QL UI >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.16.0 v9.0.0

8 participants