[ES|QL] pushing down eval expression when it requires data access #136610
[ES|QL] pushing down eval expression when it requires data access #136610pmpailis merged 33 commits intoelastic:mainfrom
Conversation
…re_function_error_no_shards
…thub.com:pmpailis/elasticsearch into fix_133462_fixing_score_function_error_no_shards
…re_function_error_no_shards
|
Hi @pmpailis, I've created a changelog YAML for you. |
…thub.com:pmpailis/elasticsearch into fix_133462_fixing_score_function_error_no_shards
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
| required_capability: pushing_down_eval_with_score | ||
|
|
||
| FROM books | ||
| | EVAL title_match = SCORE(MATCH(title, "rings")) |
There was a problem hiding this comment.
This also works with the fork command, e.g.
FROM index
| FORK
(EVAL score1 = SCORE(MATCH(TITLE, "LOTR")))
(EVAL score2 = SCORE(MATCH(TITLE, "ring")))Should I add an explicit test-case for fork as well?
There was a problem hiding this comment.
not sure that's necessary, as other functions are not explicitly tested with FORK. @ioanatia do you think this is necessary?
There was a problem hiding this comment.
Thanks Carlos! Will proceed with merging and we can add additional tests in a follow up if needed!
carlosdelest
left a comment
There was a problem hiding this comment.
LGTM. A couple of comments on testing but looks 💯
|
|
||
| FROM books | ||
| | EVAL title_match = SCORE(MATCH(title, "rings")) | ||
| | LIMIT 2 |
There was a problem hiding this comment.
I wonder if this test is reproducible if we don't have a SORT 🤔 , as the order is undefined and we may get some score back...
There was a problem hiding this comment.
Yeap, fair point. Tried this with 1000 iterations and got no failures but will change the query term to aardvark to make sure!
| assertThat(query.limit(), is(l(42))); | ||
| assertThat(query.query(), is(unscore(existsQuery("last_name")))); | ||
| } | ||
|
|
There was a problem hiding this comment.
I think it would be interesting to add a test case for using the EVAL result in the Top N, something like:
FROM test
| EVAL s = SCORE(MATCH(first_name, "foo"))
| WHERE last_name IS NOT NULL
| SORT s
| LIMIT 42
| SORT last_name
| LIMIT 2| * This is used to prevent pushing down limits past operations that need to evaluate expressions using document data. | ||
| */ | ||
| private boolean evalAliasNeedsData(Alias alias) { | ||
| ArrayDeque<Expression> exprStack = new ArrayDeque<>(); |
There was a problem hiding this comment.
Nit - you're using a Dequeue for traversing the tree. I think it would be more idiomatic if you do something like:
Holder<Boolean> hasScore = new Holder<>(false);
p.forEachExpression(expr -> {
if (expr instanceof Score) {
hasScore.set(true);
}
});
return hasScore.get();There was a problem hiding this comment.
Yeap, makes sense! Will update as suggested !
…thub.com:pmpailis/elasticsearch into fix_133462_fixing_score_function_error_no_shards
We want to ensure that the
SCOREfunction will be executed on data nodes even when dealing with evalFUNCTION. To that end, in this PR we update thePushDownAndCombineLimitsrule so thatLIMITis not pushed further from anEVALexpression if:SCOREBy doing so, the
evalis pushed to the data node, where the check above does not hold, so that the limit is pushed and reaches the underlying LuceneOperator.Closes #133462.