[ES|QL] Allow lookup join on mixed numeric fields #128263

Merged
fang-xing-esql merged 9 commits into elastic:main from fang-xing-esql:lookup-join-mixed-numeric-fields
May 25, 2025

Conversation

@fang-xing-esql
Member

Previously, lookup joins on numeric fields required the left-hand side (LHS) and right-hand side (RHS) to have exactly the same ES|QL data type: integer, long, or double. The following joins were permitted:

  • integer join integer: in ES|QL, the ES types byte, short, and integer are all treated as integer, so joins between any two of them were allowed.
  • long join long: joins between longs were allowed.
  • double join double: the ES types half_float, scaled_float, float, and double are all treated as double in ES|QL, so joins between any two of them were allowed.

This PR relaxes some of those strict type-matching requirements to support joins on mixed numeric types. Additional test cases, including edge cases, have been added to CsvTests to validate this behavior. With this change, the LHS and RHS no longer need to share the exact same ES|QL data type, which aligns join behavior more closely with that of the == operator.

Examples of now-permitted joins include:

  • Joins between any two whole-number types (byte, short, integer, long) are allowed, so integer and long can be joined against each other.
  • Joins between whole-number types (byte, short, integer, long) and rational-number types (half_float, scaled_float, float, double) are allowed, so integer/long and double can be joined against each other.
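
As a rough illustration of the rule change described above, here is a small Java model. It is not the Elasticsearch implementation: the class `NumericJoinTypes` and the methods `joinableBefore` and `joinableAfter` are hypothetical names, and the model only covers the field types named in this description.

```java
import java.util.Map;

// Hypothetical model of the join type check; not Elasticsearch code.
class NumericJoinTypes {
    // ES field type -> ES|QL data type, as listed in the PR description.
    static final Map<String, String> ESQL_TYPE = Map.of(
        "byte", "integer",
        "short", "integer",
        "integer", "integer",
        "long", "long",
        "half_float", "double",
        "scaled_float", "double",
        "float", "double",
        "double", "double"
    );

    // Old rule: both sides must resolve to the same ES|QL data type.
    public static boolean joinableBefore(String lhs, String rhs) {
        String l = ESQL_TYPE.get(lhs);
        String r = ESQL_TYPE.get(rhs);
        return l != null && l.equals(r);
    }

    // New rule (simplified assumption): any two of the numeric types above.
    public static boolean joinableAfter(String lhs, String rhs) {
        return ESQL_TYPE.containsKey(lhs) && ESQL_TYPE.containsKey(rhs);
    }

    public static void main(String[] args) {
        System.out.println(joinableBefore("short", "integer"));  // true
        System.out.println(joinableBefore("integer", "long"));   // false
        System.out.println(joinableAfter("integer", "long"));    // true
        System.out.println(joinableAfter("long", "half_float")); // true
    }
}
```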

Observations from validating joins on mixed numeric fields:

  • Joins between any two of the whole-number types (byte, short, integer, and long) work as expected, including the edge cases. No unexpected results were observed, and the behavior is consistent with ==, so joins between two whole numbers can be allowed safely.
  • Joins between a whole number and a rational number exposed some unexpected behavior, mainly because precision can be lost when converting an integer/long to a float/double, specifically with large integer/long values.

Here is an example from one of the added tests that shows this behavior. Many of the tests that join a whole number with a rational number behave similarly.

This is the current behavior of ==: 9223372036854775806 and 9223372036854775807 are two different long values, but they convert to the same double value.

FROM languages_mixed_numerics
| WHERE language_code_long == language_code_double
| SORT language_code_long, language_code_double, language_name
| KEEP language_code_long, language_code_double, language_name
;

language_code_long:long | language_code_double:double | language_name:keyword
-9223372036854775808    | -9.223372036854776E18       | min_long
-2147483649             | -2.147483649E9              | min_int_minus_1
-2147483648             | -2.147483648E9              | min_int
...
2147483646              | 2.147483646E9               | max_int_minus_1
2147483647              | 2.147483647E9               | max_int
2147483648              | 2.147483648E9               | max_int_plus_1
9223372036854775806     | 9.223372036854776E18        | max_long_minus_1   ===> different longs map to the same double
9223372036854775807     | 9.223372036854776E18        | max_long   ===>
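
The collision in the last two rows can be reproduced in plain Java. This is a minimal sketch that assumes the comparison widens the long to a double before comparing, as Java does; `LongToDoublePrecision` and `equalsAsDouble` are hypothetical names, not part of ES|QL.

```java
// Hypothetical demo of long-to-double precision loss; not ES|QL code.
class LongToDoublePrecision {
    // Mimics a long == double comparison by widening the long to double first.
    public static boolean equalsAsDouble(long l, double d) {
        return (double) l == d;
    }

    public static void main(String[] args) {
        long maxLongMinus1 = 9223372036854775806L;
        long maxLong = Long.MAX_VALUE; // 9223372036854775807
        double d = 9.223372036854776E18;

        // The two longs differ...
        System.out.println(maxLongMinus1 == maxLong);          // false
        // ...but both round to the same double (2^63) when widened.
        System.out.println(equalsAsDouble(maxLongMinus1, d));  // true
        System.out.println(equalsAsDouble(maxLong, d));        // true
        // Values within the int range convert to double exactly.
        System.out.println(equalsAsDouble(2147483647L, 2.147483647E9)); // true
    }
}
```

A double has a 53-bit significand, so long values beyond 2^53 in magnitude are not all exactly representable, and neighboring longs can collapse to one double.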

When long and double are joined, duplicates can be returned. These duplicates follow from the behavior of ==, because the match between a whole number and a rational number is not always one-to-one, but it is still a bit surprising to see them in the join results. Lookup join is a left join, so we don't expect the same results after swapping the join order. However, we need to be careful with inner joins in the future, since those are expected to return the same results after the join order is swapped.

FROM languages_mixed_numerics
| WHERE language_code_long is not null
| EVAL language_code_double = language_code_long
| LOOKUP JOIN languages_mixed_numerics ON language_code_double
| SORT language_code_double, language_name
| KEEP language_code_double, language_name
;

language_code_double:long | language_name:keyword
-9223372036854775808      | min_long                    ===> duplicates
-9223372036854775808      | min_long_minus_1
-2147483649               | min_int_minus_1
-2147483648               | min_int
...
2147483646                | max_int_minus_1
2147483647                | max_int
2147483648                | max_int_plus_1
9223372036854775806       | max_long                     ===> duplicates
9223372036854775806       | max_long_minus_1
9223372036854775806       | max_long_plus_1
9223372036854775807       | max_long
9223372036854775807       | max_long_minus_1
9223372036854775807       | max_long_plus_1
;

FROM languages_mixed_numerics
| WHERE language_code_double is not null
| EVAL language_code_long = language_code_double
| LOOKUP JOIN languages_mixed_numerics ON language_code_long
| SORT language_code_long, language_name
| KEEP language_code_long, language_name
;

language_code_long:double | language_name:keyword
-3.4028234663852886E38    | null
-9.223372036854776E18     | null
-9.223372036854776E18     | null
-2.147483649E9            | min_int_minus_1
-2.147483648E9            | min_int
...
2.147483646E9             | max_int_minus_1
2.147483647E9             | max_int
2.147483648E9             | max_int_plus_1
9.223372036854776E18      | null                      ===> duplicates
9.223372036854776E18      | null
9.223372036854776E18      | null
3.4028234663852886E38     | null
3.4028234663852886E39     | null
;
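
To see why the first join query above returns extra rows, here is a small Java model of a left join whose keys are compared as doubles. It is only a sketch under that assumption and not how ES|QL executes a lookup join; `MixedNumericLeftJoin` and `leftJoin` are hypothetical names, and the lookup rows are a reduced sample.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a left join on long vs. double keys; not ES|QL code.
class MixedNumericLeftJoin {
    // Keeps every left key; a lookup row matches when the widened long equals its double key.
    public static List<String> leftJoin(long[] leftKeys, double[] lookupKeys, String[] lookupNames) {
        List<String> out = new ArrayList<>();
        for (long key : leftKeys) {
            boolean matched = false;
            for (int i = 0; i < lookupKeys.length; i++) {
                if ((double) key == lookupKeys[i]) {
                    out.add(key + " | " + lookupNames[i]);
                    matched = true;
                }
            }
            if (!matched) {
                out.add(key + " | null"); // left join: unmatched rows are kept
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] left = {5L, 2147483647L, 9223372036854775806L, 9223372036854775807L};
        double[] keys = {2.147483647E9, 9.223372036854776E18, 9.223372036854776E18};
        String[] names = {"max_int", "max_long_minus_1", "max_long"};
        // Each of the two large longs matches both large lookup rows,
        // so 4 left rows produce 6 output rows.
        leftJoin(left, keys, names).forEach(System.out::println);
    }
}
```

Because the match is many-to-many near the ends of the long range, the number of output rows depends on which side is the left side, which is why the join order matters here.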
@elasticsearchmachine
Collaborator

Hi @fang-xing-esql, I've created a changelog YAML for you.

| lookup join message_types_lookup on message
| rename type as message
| lookup join message_types_lookup on message
from languag*, -languages_mixed_numerics
Member Author

The changes to existing tests are:

  • Remove trailing spaces
  • Exclude the new index from an existing test to keep its results unchanged.
@fang-xing-esql marked this pull request as ready for review May 21, 2025 21:00
@fang-xing-esql requested review from alex-spies, costin and craigtaverner and removed request for alex-spies May 21, 2025 21:00
@elasticsearchmachine added the Team:Analytics label (Meta label for analytical engine team (ESQL/Aggs/Geo)) May 21, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@costin left a comment

LGTM

@fang-xing-esql added the auto-backport label (Automatically create backport pull requests when merged) May 23, 2025
@fang-xing-esql
Member Author

LGTM

Thanks for reviewing!

@fang-xing-esql enabled auto-merge (squash) May 23, 2025 19:35
auto-merge was automatically disabled May 23, 2025 19:39

Pull Request is not mergeable

@fang-xing-esql merged commit dfe1357 into elastic:main May 25, 2025
18 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 128263

@fang-xing-esql
Member Author

💚 All backports created successfully

Status Branch Result
8.19

Questions ?

Please refer to the Backport tool documentation

elasticsearchmachine pushed a commit that referenced this pull request Jun 3, 2025
* allow lookup join on mixed numeric fields

(cherry picked from commit dfe1357)

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/CsvTestsDataLoader.java
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
@fang-xing-esql deleted the lookup-join-mixed-numeric-fields branch January 30, 2026 14:40

Labels

  • :Analytics/ES|QL (AKA ESQL)
  • auto-backport (Automatically create backport pull requests when merged)
  • >enhancement
  • Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo))
  • v8.19.0
  • v9.1.0

3 participants