Conversation
|
Documentation preview: |
|
@abatsakis please enable the option "Allow edits and access to secrets by maintainers" on your PR. For more information, see the documentation. |
|
Pinging @elastic/es-ql (Team:QL) |
|
Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL) |
luigidellaquila
left a comment
There was a problem hiding this comment.
I added a comment regarding a possible significant optimization.
| } | ||
|
|
||
| @Evaluator | ||
| static BytesRef process(BytesRef str, BytesRef regex, BytesRef newStr) { |
There was a problem hiding this comment.
The regex is a very good candidate to be optimized as a @Fixed parameter.
You can have an additional process method like
@Evaluator(extraName = "Constant")
static BytesRef process(BytesRef str, @Fixed Pattern regex, BytesRef newStr) { ... }
(not sure it works in this order, maybe you'll have to switch second and third parameter)
and compile the regex Pattern only once in case the expression is foldable (that is likely the most common use case). It will make the execution much more efficient.
For a practical example see https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/date/DateFormat.java#L87
abdonpijpelink
left a comment
There was a problem hiding this comment.
Left a small docs suggestion (every command on a new line; remove the semicolon).
| //tag::replaceString[] | ||
| ROW str = "Hello World" | EVAL str = REPLACE(str, "World", "Universe") | KEEP str; | ||
| // end::replaceString[] |
There was a problem hiding this comment.
| //tag::replaceString[] | |
| ROW str = "Hello World" | EVAL str = REPLACE(str, "World", "Universe") | KEEP str; | |
| // end::replaceString[] | |
| //tag::replaceString[] | |
| ROW str = "Hello World" | |
| | EVAL str = REPLACE(str, "World", "Universe") | |
| | KEEP str | |
| // end::replaceString[] | |
| ; |
| } | ||
|
|
||
| try { | ||
| return new BytesRef(str.utf8ToString().replaceAll(regex.pattern(), newStr.utf8ToString())); |
There was a problem hiding this comment.
Now that you have the compiled Pattern, you can use it for a more efficient replaceAll():
return new BytesRef(regex.matcher(str.utf8ToString()).replaceAll(newStr.utf8ToString()));
luigidellaquila
left a comment
There was a problem hiding this comment.
LGTM!
Thanks for your patience @abatsakis, I just left one last small comment about tests, but for the rest the PR is 👍 👍 👍
| ; | ||
|
|
||
| replace complex | ||
| from employees | where emp_no <= 10010 | eval f_l = replace(replace(substring(last_name, 1, 20), "al", "AB"), "a", "Z") | keep emp_no, last_name, f_l; |
There was a problem hiding this comment.
Please add a | sort emp_no somewhere in these two queries, otherwise they'll randomly fail in multi-node
|
@abatsakis I'll polish this one and push it over the finishing line, if that's ok with you. |
| * and click {@code Build->Recompile 'FunctionName.java'} in IntelliJ. This should generate | ||
| * an {@link org.elasticsearch.compute.operator.EvalOperator.ExpressionEvaluator} implementation | ||
| * and click {@code Build->Recompile 'FunctionName.java'} in IntelliJ or run the | ||
| * {@code CsvTests} again.. This should generate an |
There was a problem hiding this comment.
| * {@code CsvTests} again.. This should generate an | |
| * {@code CsvTests} again. This should generate an |
| ; | ||
|
|
||
| replaceRegex | ||
| from hosts | where host == "epsilon" | eval l1=replace(host_group, "\\s+", "") | sort l1 | keep l1; |
There was a problem hiding this comment.
just in case any other change might be applied: keep host_group too.
| @@ -0,0 +1,16 @@ | |||
| [[esql-replace]] | |||
| === `REPLACE` | |||
| This function substitutes the replacement string (3rd argument) for every occurrence of the regular expression (2nd argument) in the string (1st argument). | |||
There was a problem hiding this comment.
Optional alternative: The function substitutes in the string (1st argument) any match of the regular expression (2nd argument) with the replacement string (3rd argument).
I'd find it easier to "read the function" as it flows.
| regexPattern = Pattern.compile(((BytesRef) regex.fold()).utf8ToString()); | ||
| } catch (PatternSyntaxException pse) { | ||
| // TODO this is not right (inconsistent). See also https://github.com/elastic/elasticsearch/issues/100038 | ||
| // this should generate a header warning and return null (as do the rest of this functionality in evaluators), |
There was a problem hiding this comment.
IMO throwing here (as it now happens) is what we should align to: the argument (the reg. expression) isn't correct and we should inform the user before execution; less so, but somewhat similar to a syntax error.
We could let that happen maybe even earlier (like when folding). So the try-catch could be dropped, IMO.
(But we can go ahead with merging this as is and revisiting when deciding on #100038).
There was a problem hiding this comment.
I am letting this one as is for now. I reckon we'll have a discussion about this general approach (not only here) and whatever we decide then we'll be applied everywhere. I don't like the inconsistent behavior, but I do also see a reasoning in throwing an exception here. I'm very much in favor of the consistency approach, though.
| assertThat(process("I love cats and cats are amazing.", "\\bcats\\b", "dogs"), equalTo("I love dogs and dogs are amazing.")); | ||
| } | ||
|
|
||
| public void testUnicode() { |
…esql/replaceFunction
|
@elasticmachine update branch |
Co-authored-by: Alexandros Batsakis <abatsakis@splunk.com> Co-authored-by: Andrei Stefan <andrei@elastic.co>
Co-authored-by: Alexandros Batsakis <abatsakis@splunk.com> Co-authored-by: Andrei Stefan <andrei@elastic.co>
The EVAL replace function that substitutes the
new stringfor every occurrence of theregexin thesourcestring.