ES|QL: Add initial grammar and planning for RRF (snapshot) by ioanatia · Pull Request #123396 · elastic/elasticsearch

ioanatia · 2025-02-25T16:57:28Z

Part of #123391 where we will keep track of any follow ups.

RRF is split into 3 parts:

1. RrfScoreEval receives a discriminator column. It will assign a score for each row based on the position in the subset.
2. Dedup is a SurrogateLogicalPlan that expands into
   STATS _score = SUM(_score), field1 = VALUES(field1), field2 = VALUES(field2), ... BY _id, _index, where:
   - _score = SUM(_score) gives us the final RRF score
   - we dedup by grouping by _id and _index
   - field1, field2 ... are the rest of the available columns that are not _score, _id, _index and that we want to carry over
3. SORT BY _score, _id, _index DESC - so that we return the sorted results; we use _id and _index as a way to ensure the result order is deterministic (similar to what we do for _search).
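The three parts above can be sketched in plain Java. This is a toy model over in-memory rows, not the actual ES|QL operators: the class name `RrfPipelineSketch`, the `Row` and `Scored` records, and the method `rrf` are hypothetical, and the sort direction for the tiebreakers (ascending `_id`, then `_index`) is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the three RRF stages; hypothetical names, not the ES|QL classes. */
final class RrfPipelineSketch {
    /** One input row: the fork (discriminator) it came from plus its document identity. */
    record Row(String fork, String id, String index) {}

    /** One output row after scoring, dedup and sort. */
    record Scored(String id, String index, double score) {}

    static final int RANK_CONSTANT = 60; // hardcoded, as in the PR

    static List<Scored> rrf(List<Row> rows) {
        Map<String, Integer> rankPerFork = new HashMap<>();
        Map<List<String>, Double> summed = new LinkedHashMap<>();
        for (Row row : rows) {
            // Part 1 (RrfScoreEval): the score depends on the row's position within its fork.
            int rank = rankPerFork.merge(row.fork(), 1, Integer::sum);
            double score = 1.0 / (RANK_CONSTANT + rank);
            // Part 2 (Dedup): group by (_id, _index) and sum the per-fork scores.
            summed.merge(List.of(row.id(), row.index()), score, Double::sum);
        }
        List<Scored> out = new ArrayList<>();
        summed.forEach((key, score) -> out.add(new Scored(key.get(0), key.get(1), score)));
        // Part 3 (Sort): highest score first, then _id and _index as tiebreakers.
        out.sort(Comparator.comparingDouble(Scored::score).reversed()
            .thenComparing(Scored::id)
            .thenComparing(Scored::index));
        return out;
    }
}
```

With two forks that return (A, B) and (B, C) in that order, B appears in both forks and ends up first because its two scores are summed.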

The Dedup step is the one that needs more consideration - at this stage I grouped by _id and _index since it was the easiest ATM, but ideally we might want to use an internal search ID (composed of the shard ID + doc ID).
The other annoying aspect of grouping by _id and _index in the current implementation is that we require having METADATA _id, _index. It would be nice to evolve RRF to a place where this is not needed.

elasticsearchmachine · 2025-02-25T16:57:53Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2025-02-25T16:57:54Z

Hi @ioanatia, I've created a changelog YAML for you.

ChrisHegarty

Overall this change looks very concise and clean. I left a few small comments.

...ugin/esql/compute/src/main/java/org/elasticsearch/compute/operator/RrfScoreEvalOperator.java

x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/StatementParserTests.java

tteofili

very clean impl, LGTM!

tteofili · 2025-03-04T11:15:03Z

...ugin/esql/compute/src/main/java/org/elasticsearch/compute/operator/RrfScoreEvalOperator.java

+
+            int rank = counters.getOrDefault(fork, 1);
+            counters.put(fork, rank + 1);
+            scores.appendDouble(1.0 / (60 + rank));


minor: this is currently configurable in _search, so we probably need to expose it as an option in the future here too

You are right, we need to make the rank constant configurable.
This is added as a separate feature in the meta issue #123391
It will require a syntax change for RRF, so I'd like to keep it separate for now.
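For illustration only, a minimal sketch of what a configurable rank constant could look like around the counter logic in the diff above. The class `ConfigurableRrfScorer` and its constructor parameter are hypothetical; the lower bound of 1 is an assumption of this sketch, and no RRF syntax for passing the value exists in this PR.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical scorer: same per-fork counter as the diff, but the constant is a parameter. */
final class ConfigurableRrfScorer {
    private final int rankConstant;
    private final Map<String, Integer> counters = new HashMap<>();

    ConfigurableRrfScorer(int rankConstant) {
        // Assumption of this sketch: reject values below 1.
        if (rankConstant < 1) {
            throw new IllegalArgumentException("rank constant must be >= 1, got " + rankConstant);
        }
        this.rankConstant = rankConstant;
    }

    /** Returns the score for the next row seen from the given fork (ranks start at 1). */
    double nextScore(String fork) {
        int rank = counters.getOrDefault(fork, 1);
        counters.put(fork, rank + 1);
        return 1.0 / (rankConstant + rank);
    }
}
```

`new ConfigurableRrfScorer(60)` reproduces the hardcoded behaviour; a smaller constant gives the top ranks of each fork more weight relative to lower ranks.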

...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java

carlosdelest

Looks really good - I like the separation into individual pieces (dedup, operator, order).

I have some minor questions, and some error messages I think could be better

...ugin/esql/compute/src/main/java/org/elasticsearch/compute/operator/RrfScoreEvalOperator.java

carlosdelest · 2025-03-04T10:40:59Z

x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTests.java

+            from test
+            | rrf
+            """));
+        assertThat(e.getMessage(), containsString("Unknown column [_score]"));


I think we should provide a better error message for missing metadata attrs, something like "_score is needed for using RRF. Please add METADATA _score to your FROM command".

I looked into this - looks simple enough at a first glance. We can just modify this to have a custom error message when MetadataAttribute.isSupported(name) is true:

elasticsearch/x-pack/plugin/ql/src/main/java/org/elasticsearch/xpack/ql/expression/UnresolvedAttribute.java

Lines 113 to 121 in a5e0423

    public static String errorMessage(String name, List<String> potentialMatches) {
        String msg = "Unknown column [" + name + "]";
        if (CollectionUtils.isEmpty(potentialMatches) == false) {
            msg += ", did you mean "
                + (potentialMatches.size() == 1 ? "[" + potentialMatches.get(0) + "]" : "any of " + potentialMatches.toString())
                + "?";
        }
        return msg;
    }

However we would return an error message like "Please add METADATA _score to your FROM command" even if you use ROW:

ROW a = 1, b = "two", c = null | WHERE _score > 1

I know this is a very narrow corner case, but it would be an unintended behaviour.
It's not straightforward to get the context when we call UnresolvedAttribute.errorMessage whether the source command supports metadata attributes or not. So I think at most, we can look into this separately and not make the change here.
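To make the corner case concrete, here is a hedged sketch of the idea, not a proposed patch. The class `ErrorMessageSketch`, the `METADATA_NAMES` set (a stand-in for `MetadataAttribute.isSupported(name)`), the extra `sourceSupportsMetadata` parameter, and the hint wording are all hypothetical. The extra parameter is exactly the context that is hard to obtain at the real call site.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical variant of errorMessage that appends a METADATA hint when it applies. */
final class ErrorMessageSketch {
    // Stand-in for MetadataAttribute.isSupported(name); the contents are an assumption.
    private static final Set<String> METADATA_NAMES = Set.of("_id", "_index", "_score");

    static String errorMessage(String name, List<String> potentialMatches, boolean sourceSupportsMetadata) {
        String msg = "Unknown column [" + name + "]";
        if (potentialMatches != null && potentialMatches.isEmpty() == false) {
            msg += ", did you mean "
                + (potentialMatches.size() == 1 ? "[" + potentialMatches.get(0) + "]" : "any of " + potentialMatches)
                + "?";
        }
        // Only suggest METADATA when the source command (FROM, not ROW) can provide it.
        if (sourceSupportsMetadata && METADATA_NAMES.contains(name)) {
            msg += "; add METADATA " + name + " to your FROM command";
        }
        return msg;
    }
}
```

For a `ROW ... | WHERE _score > 1` query the caller would pass `false` and the message stays as it is today.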

I think it would be OK to error with "_score is needed for using RRF. Use FROM ... METADATA _score".

We can assume that full text search needs FROM, as FTFs need an index attribute to operate on?

We can refine this in a follow up, but it will be very confusing for users to receive "unknown column _score" - because it is a metadata attribute, users won't understand where it comes from without referring to the docs

agreed - added as a follow up in #123391

carlosdelest · 2025-03-04T10:42:37Z

x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTests.java

+    public void testRrfError() {
+        assumeTrue("requires RRF capability", EsqlCapabilities.Cap.FORK.isEnabled());
+
+        var e = expectThrows(VerificationException.class, () -> analyze("""


Shouldn't an explicit message like "FORK is needed before RRF" be added here so users have a clear understanding of the RRF usage?

Added that in RrfScoreEval by implementing PostAnalysisVerificationAware.
However the check for unresolved attributes is done before the PostAnalysisVerificationAware checks.
I don't want to add a check just for RRF in the Verifier before we do the unresolved attributes check:

elasticsearch/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Verifier.java

Lines 75 to 88 in b663616

    Collection<Failure> verify(LogicalPlan plan, BitSet partialMetrics) {
        assert partialMetrics != null;
        Failures failures = new Failures();

        // quick verification for unresolved attributes
        checkUnresolvedAttributes(plan, failures);

        // in case of failures bail-out as all other checks will be redundant
        if (failures.hasFailures()) {
            return failures.failures();
        }

        // collect plan checkers
        var planCheckers = planCheckers(plan);

(planCheckers is looking for plans that implement PostAnalysisVerificationAware).

This deserves a bit more thought, so I am adding it as a follow up #123391
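The ordering problem can be shown with a small toy model. Everything here is hypothetical: `VerifierOrderSketch`, the flattened `Plan` record, and the "FORK is needed before RRF" wording (taken from the review suggestion) are illustration only, and the sketch assumes RRF must be directly preceded by FORK.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy two-phase verifier: unresolved-attribute failures hide the later plan checks. */
final class VerifierOrderSketch {
    /** Hypothetical flattened plan: command names plus the columns that failed to resolve. */
    record Plan(List<String> commands, List<String> unresolvedColumns) {}

    static List<String> verify(Plan plan) {
        List<String> failures = new ArrayList<>();
        // Phase 1: quick verification for unresolved attributes.
        for (String column : plan.unresolvedColumns()) {
            failures.add("Unknown column [" + column + "]");
        }
        // Bail out early, so the RRF-specific check below never runs in that case.
        if (failures.isEmpty() == false) {
            return failures;
        }
        // Phase 2: post-analysis checks (stand-in for PostAnalysisVerificationAware).
        int rrf = plan.commands().indexOf("RRF");
        if (rrf >= 0 && (rrf == 0 || plan.commands().get(rrf - 1).equals("FORK") == false)) {
            failures.add("FORK is needed before RRF");
        }
        return failures;
    }
}
```

For `from test | rrf`, `_score` and `_fork` are unresolved, so the user only sees the two "Unknown column" failures and never the FORK hint.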

That's reasonable. Thanks for looking into this.

x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTests.java

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/LogicalPlanBuilder.java

fang-xing-esql

Thank you @ioanatia, I added some questions about the usage of RRF that I can think of.

x-pack/plugin/esql/qa/testFixtures/src/main/resources/rrf.csv-spec

fang-xing-esql · 2025-03-05T22:41:30Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/LogicalPlanBuilder.java

+
+            List<Order> order = List.of(
+                new Order(source, scoreAttr, Order.OrderDirection.DESC, Order.NullsPosition.LAST),
+                new Order(source, idAttr, Order.OrderDirection.ASC, Order.NullsPosition.LAST),


I wonder if there is a specific reason that we decide to sort on _id before _index? Or the order of these two fields doesn't matter?

I don't think it matters - we just need a tiebreaker.

I was wondering if the two extra sort keys (_id and _index) are necessary, as a longer sort key may affect performance. ES|QL does not guarantee the order of the results unless an explicit sort is coded in the query, similar to SQL. This could be a potential performance-related follow up, in case we see performance issues with RRF.
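A small sketch of what the tiebreakers buy, using plain Java comparators rather than ES|QL `Order` objects. The names `TieBreakSketch`, `Hit`, `SCORE_ONLY` and `WITH_TIEBREAKERS` are hypothetical. With a stable in-memory sort, score-only ordering keeps tied rows in arrival order, which in a distributed query can differ between runs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Compares score-only ordering with score plus (_id, _index) tiebreakers. */
final class TieBreakSketch {
    record Hit(double score, String id, String index) {}

    // Highest score first; ties are left in whatever order the rows arrived.
    static final Comparator<Hit> SCORE_ONLY = Comparator.comparingDouble(Hit::score).reversed();

    // Same primary key, but ties are resolved by _id and then _index.
    static final Comparator<Hit> WITH_TIEBREAKERS = SCORE_ONLY.thenComparing(Hit::id).thenComparing(Hit::index);

    static List<Hit> sorted(List<Hit> hits, Comparator<Hit> order) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(order); // List.sort is stable
        return copy;
    }
}
```

The cost raised in the comment is real: every tie-breaking comparison touches two extra keys, so dropping them trades determinism for a shorter sort key.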

fang-xing-esql · 2025-03-05T22:45:19Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/rrf.csv-spec

+required_capability: match_operator_colon
+
+FROM books METADATA _id, _index, _score
+| FORK ( WHERE title:"Tolkien" | SORT _score DESC | LIMIT 3 )


It would be great to have some queries with disjunctions in the WHERE clause of each fork leg, just to add a bit more complexity and make sure it works as expected. There are some queries with disjunctions in the match function and operator csv tests that can be used as a reference.

fang-xing-esql · 2025-03-05T22:50:26Z

x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTests.java

+        assertThat(e.getMessage(), containsString("Unknown column [_score]"));
+        assertThat(e.getMessage(), containsString("Unknown column [_fork]"));
+
+        e = expectThrows(VerificationException.class, () -> analyze("""


I wonder if the sequence between FORK and RRF matters? For example, if the sequence of FORK and RRF is reversed, do we recognize it as a valid query?

| RRF | FORK (WHERE a:"x") (WHERE a:"y")

Do we allow multiple FORK or RRF commands, like below? Do they make sense? ES|QL does not prevent multiple occurrences of the same processing command; commands like WHERE, EVAL etc. can be used multiple times in the same query. Is this also true for RRF and FORK?

| FORK (WHERE a:"x") (WHERE a:"y") | RRF | RRF

or

| FORK (WHERE a:"x") (WHERE a:"y") | RRF | FORK (WHERE b:"x") (WHERE b:"y") | RRF

I have put a validation for RrfScoreEval such that we only allow RRF after a FORK command.
It might seem a bit extreme, but it makes sense in practice because while we might be able to execute the following queries, they don't make a lot of sense:

| RRF | FORK (WHERE a:"x") (WHERE a:"y")

or

| FORK (WHERE a:"x") (WHERE a:"y") | RRF | RRF

Another thing to note is that we currently have a restriction for FORK where it's possible to only have a single FORK command in a query, so the following is not something we can do atm:

| FORK (WHERE a:"x") (WHERE a:"y") | RRF | FORK (WHERE b:"x") (WHERE b:"y") | RRF

I did see one thing that was concerning when I tried to do:

| FORK (WHERE a:"x") (WHERE a:"y") | RRF | RRF

this would lead to an unexecutable query because when we do the RRF planning this expands to:

| FORK (WHERE a:"x") (WHERE a:"y") | RrfScoreEval | Dedup | Sort | RrfScoreEval | Dedup | Sort

The first SORT does not have a LIMIT so it cannot be translated to a TOP N.
I need to think more about this, not about supporting the case where we do RRF after RRF, but how to avoid this case of having unexecutable queries - I added it as a follow up in #123391
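The expansion described above can be mimicked on a list of command names. This is a deliberately simplified sketch: `RrfExpansionSketch` and both methods are hypothetical, and the rule "a Sort that is followed by another command must be immediately followed by a Limit" is an assumption standing in for the real TOP N translation logic.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy expansion of RRF into its three plan nodes, plus a check for unbounded mid-plan sorts. */
final class RrfExpansionSketch {
    /** Replaces every RRF command with RrfScoreEval, Dedup, Sort; other commands pass through. */
    static List<String> expand(List<String> commands) {
        List<String> plan = new ArrayList<>();
        for (String command : commands) {
            if (command.equals("RRF")) {
                plan.addAll(List.of("RrfScoreEval", "Dedup", "Sort"));
            } else {
                plan.add(command);
            }
        }
        return plan;
    }

    /** True if a Sort has a following command that is not a Limit (simplified assumption). */
    static boolean hasUnboundedIntermediateSort(List<String> plan) {
        for (int i = 0; i < plan.size() - 1; i++) {
            if (plan.get(i).equals("Sort") && plan.get(i + 1).equals("Limit") == false) {
                return true;
            }
        }
        return false;
    }
}
```

A check along these lines, run before execution, is one possible way to reject `| RRF | RRF` with a clear message instead of producing a plan that cannot run.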

carlosdelest · 2025-03-07T14:38:02Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/rrf.csv-spec

 | FORK ( WHERE emp_no:10001 )
       ( WHERE emp_no:10002 )
 | RRF
+| EVAL _score = round(_score, 4)


Nice trick! ❤️
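Why the rounding helps in a csv-spec test: RRF scores are sums of terms like 1/(60 + rank), which are long fractional doubles that are awkward to write as expected values. A minimal sketch of the arithmetic, assuming half-up rounding; `RoundScoreSketch` is a hypothetical helper, not the ES|QL `round` implementation.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Rounds an RRF score to a fixed number of decimals so expected test values stay short. */
final class RoundScoreSketch {
    static double round(double value, int decimals) {
        // Assumption: half-up rounding on the decimal representation of the double.
        return BigDecimal.valueOf(value).setScale(decimals, RoundingMode.HALF_UP).doubleValue();
    }
}
```

A document that is ranked first in exactly one fork scores 1/61, which is roughly 0.016393 and rounds to 0.0164 at four decimals.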

carlosdelest

LGTM! 💯

fang-xing-esql

LGTM, thank you!

fang-xing-esql · 2025-03-10T13:07:05Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/LogicalPlanBuilder.java

+
+            List<Order> order = List.of(
+                new Order(source, scoreAttr, Order.OrderDirection.DESC, Order.NullsPosition.LAST),
+                new Order(source, idAttr, Order.OrderDirection.ASC, Order.NullsPosition.LAST),


I was wondering if the two extra sort keys (_id and _index) are necessary, as a longer sort key may affect performance. ES|QL does not guarantee the order of the results unless an explicit sort is coded in the query, similar to SQL. This could be a potential performance-related follow up, in case we see performance issues with RRF.

ioanatia added 6 commits February 25, 2025 17:00

Add initial grammar and planning for RRF

33eb3fe

Move logic from analyzer to parser

9b00645

Fix references and serialization

406b3ea

Add csv test

f08abb7

Group by index and id - collect the values for the rest of the columns

f8c2654

Add statement parser and analyzer tests

c6cbfd0

ioanatia added the >feature, Team:Analytics, :Analytics/ES|QL, and v9.1.0 labels Feb 25, 2025

Update docs/changelog/123396.yaml

587ef71

ioanatia marked this pull request as draft February 25, 2025 16:58

ioanatia requested a review from ChrisHegarty February 25, 2025 16:58

ioanatia mentioned this pull request Feb 25, 2025

ES|QL: Simple RRF with no score customization #123391

Closed

[CI] Auto commit changes from spotless

ee50a80

ChrisHegarty reviewed Feb 27, 2025

ioanatia marked this pull request as ready for review February 27, 2025 10:31

ioanatia and others added 3 commits February 27, 2025 15:12

Add javadoc

281da1d

Make integration test more resilient

0a45f00

Merge branch 'main' into esql_rrf

e401c23

tteofili approved these changes Mar 4, 2025

carlosdelest reviewed Mar 4, 2025

fang-xing-esql reviewed Mar 5, 2025

ioanatia and others added 3 commits March 7, 2025 14:17

Address review feedback

40168dc

[CI] Auto commit changes from spotless

4b3a8d7

Merge branch 'main' into esql_rrf

d56d280

ioanatia requested review from ChrisHegarty and fang-xing-esql March 7, 2025 13:51

ioanatia requested a review from carlosdelest March 7, 2025 13:52

ChrisHegarty approved these changes Mar 7, 2025

carlosdelest reviewed Mar 7, 2025

carlosdelest approved these changes Mar 7, 2025

ioanatia added 2 commits March 7, 2025 15:53

Fix forbidden APIs check

b6b1a84

Fix EsqlNodeSubclassTests

d1de323

fang-xing-esql approved these changes Mar 10, 2025

ioanatia merged commit cda8255 into elastic:main Mar 11, 2025
17 checks passed

ioanatia deleted the esql_rrf branch March 11, 2025 09:18

albertzaharovits pushed a commit to albertzaharovits/elasticsearch that referenced this pull request Mar 13, 2025

ES|QL: Add initial grammar and planning for RRF (snapshot) (elastic#1…

39640ad

…23396)

jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request Mar 13, 2025

ES|QL: Add initial grammar and planning for RRF (snapshot) (elastic#1…

11e9a8f

…23396)

stratoula added the ES|QL-ui label Mar 19, 2025

stratoula mentioned this pull request Mar 19, 2025

[ES|QL] Support the new RRF command elastic/kibana#215092

Closed
