Remote Lookup Join implementation#129013
Conversation
cceb86d to
bb3d64b
Compare
29df38f to
59d98b2
Compare
cb90a8b to
d8ec53d
Compare
|
Hi @smalyshev, I've created a changelog YAML for you. |
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
Outdated
Show resolved
Hide resolved
alex-spies
left a comment
There was a problem hiding this comment.
All done now :) This is very nice!
I think this can nearly be merged, I would only make sure that we don't accidentally skip fwc tests as mentioned here. Other than that, my remarks are mostly on the minor side and also you already addressed a bunch of them :)
I mentioned some thoughts for follow-ups here. In addition, I saw many FIXMEs in the tests that I agree with. But that's for future work :)
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/LookupJoin.java
Outdated
Show resolved
Hide resolved
...ulti-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java
Outdated
Show resolved
Hide resolved
...ulti-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java
Outdated
Show resolved
Hide resolved
| // If the query contains lookup indices, use only remotes to avoid duplication | ||
| onlyRemotes = true; |
There was a problem hiding this comment.
Err, doesn't this mean this suite doesn't at all test the case where a lookup happens both on remote and the local cluster?
There was a problem hiding this comment.
This one doesn't because it wouldn't match the results - we'd have more rows than the local case. We have IT tests for that case. I haven't found any easy way to make these tests work in "both" case without substantially changing them.
...ulti-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java
Show resolved
Hide resolved
| ZonedDateTime nowUtc = ZonedDateTime.now(ZoneOffset.UTC); | ||
| ZonedDateTime nextMidnight = nowUtc.plusDays(1).withHour(0).withMinute(0).withSecond(0).withNano(0); | ||
| // If we're too close to midnight, we could create index with one day and query with another, and it'd fail. | ||
| assumeTrue("Skip if too close to midnight", Duration.between(nowUtc, nextMidnight).toMinutes() >= 5); |
| assertThat(columns, hasItems("lookup_key", "lookup_name", "lookup_tag", "v", "tag")); | ||
|
|
||
| List<List<Object>> values = getValuesList(resp); | ||
| assertThat(values, hasSize(10)); |
There was a problem hiding this comment.
Yeah, I really don't know if a missing remote lookup index should lead to skipping the whole remote. That's a behavior we need to double check in the future.
...c/internalClusterTest/java/org/elasticsearch/xpack/esql/action/CrossClusterLookupJoinIT.java
Show resolved
Hide resolved
...c/internalClusterTest/java/org/elasticsearch/xpack/esql/action/CrossClusterLookupJoinIT.java
Show resolved
Hide resolved
...c/internalClusterTest/java/org/elasticsearch/xpack/esql/action/CrossClusterLookupJoinIT.java
Outdated
Show resolved
Hide resolved
alex-spies
left a comment
There was a problem hiding this comment.
Alright, this looks mighty good already. Let's ![]()
I'm not sure anymore if we already have a test that confirms that remote lookup joins remain disabled on non-SNAPSHOT builds. (This PR adds so many tests, I lost track. Nice test coverage, though!)
Could you please double check that we have such a test before merging? As a failsafe against accidentally enabling/releasing the feature prematurely.
We don't have a specific test for that, how would I test that? |
Hm, we could add an IT with |
Remote lookup join implementation This patch enables using LOOKUP JOIN with cross-cluster queries. Example: FROM logs-*, remote:logs-* | LOOKUP JOIN clients on ip | SORT timestamp | LIMIT 100
Remote lookup join implementation This patch enables using LOOKUP JOIN with cross-cluster queries. Example: FROM logs-*, remote:logs-* | LOOKUP JOIN clients on ip | SORT timestamp | LIMIT 100
This patch enables using
LOOKUP JOINwith cross-cluster queries. Example:Currently some constructs are unsupported before
LOOKUP JOINwhen remote indices are involved:SORTLIMITSTATSENRICH _coordinator:...FORKThe LOOKUP JOIN with remote indices automatically assumes that the joins are executed remotely - i.e. LOOKUP JOIN with remote indices is forced to execute on the remote side of the query (inside
FragmentExec) - this is the source of the most exclusions above.