ESQL: Make field fusion generic by nik9000 · Pull Request #137382 · elastic/elasticsearch

nik9000 · 2025-10-30T14:38:57Z

Speeds up queries like

FROM foo
| STATS SUM(LENGTH(field))

by fusing the LENGTH into the loading of the field if it has doc values. Running a fairly simple test:
https://gist.github.com/nik9000/9dac067f8ce29875a4fb0f0359a75091 I'm seeing that query drop from 48ms to 28ms. So, like, 40% faster.

More importantly, this makes the mechanism for fusing functions into field loading generic. All you have to do is implement BlockLoaderExpression on your expression and return non-null from tryFuse.
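A minimal sketch of that contract, for orientation only. Every type here (`FieldInfo`, `FusedLoader`, `Fusable`, `Length`) is a hypothetical stand-in and not the real Elasticsearch API; the real `tryFuse` takes a `SearchStats` and returns a `Fuse`. The sketch shows only the control flow (fuse if the expression agrees, otherwise load and then evaluate), not the performance win, which comes from the real loader not having to materialize the strings.

```java
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT the real Elasticsearch types.
final class FuseSketch {
    /** What the planner knows about a field (loosely playing the role of SearchStats). */
    record FieldInfo(String name, boolean hasDocValues) {}

    /** Loads a field and applies the function in the same pass. */
    interface FusedLoader {
        int[] load(List<String> docValues);
    }

    /** Loosely modeled on BlockLoaderExpression: a null result means "cannot fuse". */
    interface Fusable {
        FusedLoader tryFuse(FieldInfo field);

        /** Ordinary evaluation of one already-loaded value. */
        int evaluate(String value);
    }

    /** LENGTH(field), counted in code points. */
    static final class Length implements Fusable {
        @Override
        public FusedLoader tryFuse(FieldInfo field) {
            if (field.hasDocValues() == false) {
                return null; // no doc values: the caller falls back to load-then-evaluate
            }
            return docValues -> docValues.stream().mapToInt(this::evaluate).toArray();
        }

        @Override
        public int evaluate(String value) {
            return value.codePointCount(0, value.length());
        }
    }

    /** Generic planner step: nothing in here is specific to LENGTH. */
    static int[] loadAndEvaluate(Fusable expr, FieldInfo field, List<String> docValues) {
        FusedLoader fused = expr.tryFuse(field);
        if (fused != null) {
            return fused.load(docValues); // fused path
        }
        int[] out = new int[docValues.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = expr.evaluate(docValues.get(i)); // load first, evaluate afterwards
        }
        return out;
    }
}
```

The point of the design is that `loadAndEvaluate` never mentions `Length`: a new function opts in by implementing the interface.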


elasticsearchmachine · 2025-10-30T14:39:21Z

Hi @nik9000, I've created a changelog YAML for you.

carlosdelest · 2025-10-30T16:15:33Z

...java/org/elasticsearch/xpack/esql/expression/function/blockloader/BlockLoaderExpression.java

+     * "fusing" the expression into the load. Or null if the fusion isn't possible.
+     */
+    @Nullable
+    Fuse tryFuse(SearchStats stats);


Let's try to find another name - we already have Fuse as a command. ExpressionFieldLoader?

Is FusedExpression ok? Or still too indicative?

Naming... 😅

I come from staring at FUSE enough that it carries a lot of weight.

For me, this feature involves BlockLoaders. And Expressions that are applied to them. I understand that fuse means getting together those two, but it's not something I would think of immediately without more context.

I'd prefer to be overly explicit here, and call this BlockLoaderExpression or something similar that helps me bridge those two concepts together. But, naming...

carlosdelest · 2025-10-30T16:18:14Z

...lasticsearch/xpack/esql/optimizer/rules/logical/local/PushDownVectorSimilarityFunctions.java

+        BlockLoaderExpression.Fuse fuse
    ) {
-        // Only replace if exactly one side is a literal and the other a field attribute
-        if ((similarityFunction.left() instanceof Literal ^ similarityFunction.right() instanceof Literal) == false) {


Nice! It's much better to let the Expression deal with the details and make this generic 👍


nik9000 · 2025-10-30T19:46:11Z

x-pack/plugin/esql-core/src/main/java/org/elasticsearch/xpack/esql/core/type/EsField.java

+     */
+    public boolean pushable() {
+        return true;
+    }


This bothers me. I needed this because without it we'd try to push this:

FROM foo | WHERE LENGTH(kwd) < 10

to the index. Now, we might be able to do that with a specialized Lucene query. But we don't have one of those. Without this change, what happens instead is:

LENGTH(kwd) becomes $$kwd$length$hash$.

We identify $$kwd$length$hash$ < 10 as pushable.

This tells us we can't push it. But it's kind of picky. If SearchStats took EsField it could check this easy enough. That might be a good solution to this.
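A rough sketch of the `pushable()` variant discussed at this point in the thread, assuming a planner that asks the field itself whether a filter on it may go to the index. `Field`, `FunctionField`, `LessThan`, and `canPushToIndex` are hypothetical names, not the real `EsField` hierarchy or optimizer rule.

```java
// Hypothetical stand-ins; not the real EsField classes or pushdown rules.
final class PushableSketch {
    /** An ordinary mapped field: it exists in the index, so filters on it can be pushed. */
    static class Field {
        final String name;

        Field(String name) {
            this.name = name;
        }

        boolean pushable() {
            return true;
        }
    }

    /** Synthetic field produced by fusing a function into the load; it is not in the index. */
    static final class FunctionField extends Field {
        FunctionField(String name) {
            super(name);
        }

        @Override
        boolean pushable() {
            return false;
        }
    }

    /** A `field < literal` filter. */
    record LessThan(Field field, int literal) {}

    /** The pushdown check defers to the field instead of guessing from its name. */
    static boolean canPushToIndex(LessThan filter) {
        return filter.field().pushable();
    }
}
```

With this shape, `kwd < 10` stays pushable while `$$kwd$length$hash$ < 10` is kept in the compute engine.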

The MultiTypeEsField is created with aggregatable=false, so that predicates on it don't get pushed down incorrectly.

Adding pushable should also work.


I'm going to see if I can do aggregatable=false

Just setting aggregatable to false doesn't do it. But I can return false from getExactInfo which seems to do the trick. I'm not entirely sure it's the best solution, but it doesn't invent a new thing.

But! I'm not sure that's right either. exact seems to be a concept we use at type resolution time - but I'm not sure why. It's a left-over from old QL that had a more useful meaning there.

I wonder if it'd be better to keep pushable and maybe rename to existsInEsIndex or something.

I've flipped this to using exact and that does seem to work. Not sure if I like it more.

...sql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalLogicalPlanOptimizerTests.java

Adds special purpose `BlockLoader` implementations for the `MV_MIN` and `MV_MAX` functions for `keyword` fields with doc values. These are a noop for single valued keywords but should be *much* faster for multivalued keywords. These aren't plugged in yet. We can plug them in and performance test them in elastic#137382. And they give us two more functions we can use to demonstrate elastic#137382.
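One way to see why this can be faster, sketched under the assumption that a document's keyword values arrive already sorted and de-duplicated (as sorted-set doc values provide them). `MvMinMaxSketch` and its methods are hypothetical and are not the block loaders added by that PR.

```java
import java.util.List;

// Hypothetical illustration; the real loaders work on doc values ordinals, not Java lists.
final class MvMinMaxSketch {
    /** With ascending input the minimum is the first value; null when the document has none. */
    static String mvMin(List<String> sortedValues) {
        return sortedValues.isEmpty() ? null : sortedValues.get(0);
    }

    /** With ascending input the maximum is the last value; null when the document has none. */
    static String mvMax(List<String> sortedValues) {
        return sortedValues.isEmpty() ? null : sortedValues.get(sortedValues.size() - 1);
    }
}
```

For a single-valued field both methods return that one value, which matches the "noop" remark above; for multivalued fields only one value per document needs to be materialized.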


nik9000 · 2025-10-31T19:52:07Z

...sql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalLogicalPlanOptimizerTests.java

+    }
+
+    public void testLengthInWhereAndEval() {
+        assumeFalse("fix me", true);


QL friends: This one looks fun!

The reason that we get duplicated reference attributes here is that when PushExpressionsToFieldLoad creates a new FunctionEsField in EsRelation, it is generated under a specific command context and doesn't look at the whole query plan. So when the same LENGTH(last_name) is referenced in multiple commands in the query, duplicated FunctionEsFields are added into EsRelation.

ResolveUnionTypes has a very similar workflow. It iterates through the entire query plan to prepare the attributes added into EsRelation
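A small sketch of the difference between the two strategies, using a hypothetical plan model (`Command`, `FunctionCall`) rather than the real ESQL plan classes. The synthetic name format is also made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of a query plan; not the real ESQL plan or rule classes.
final class DedupSketch {
    record FunctionCall(String function, String field) {}

    record Command(String name, List<FunctionCall> calls) {}

    /** Made-up naming scheme for the synthetic field. */
    static String syntheticName(FunctionCall call) {
        return "$$" + call.field() + "$" + call.function().toLowerCase(Locale.ROOT);
    }

    /** Per-command rewriting: every reference adds its own field, so repeats show up twice. */
    static List<String> perCommand(List<Command> plan) {
        List<String> added = new ArrayList<>();
        for (Command command : plan) {
            for (FunctionCall call : command.calls()) {
                added.add(syntheticName(call));
            }
        }
        return added;
    }

    /** Whole-plan rewriting: collect all calls first, keyed by (function, field). */
    static List<String> wholePlan(List<Command> plan) {
        Map<FunctionCall, String> byCall = new LinkedHashMap<>();
        for (Command command : plan) {
            for (FunctionCall call : command.calls()) {
                byCall.computeIfAbsent(call, DedupSketch::syntheticName);
            }
        }
        return new ArrayList<>(byCall.values());
    }
}
```

For a plan with `WHERE LENGTH(last_name) > 1` followed by `EVAL l = LENGTH(last_name)`, the first method adds two fields and the second adds one.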

++, I'm rewriting this to look more like ResolveUnionTypes in #137392


Should I wait for you to do that rewrite before merging this PR? Or should I merge first and then you'll fix it?

Up to you! I'm addressing in #137564, but it still has to be reviewed. Feel free to merge this and I'll deal with integrating it.

server/src/main/java/org/elasticsearch/index/mapper/blockloader/BlockLoaderFunctionConfig.java

julian-elastic · 2025-11-05T19:36:30Z

I am done with my first round of code review, overall looks pretty good!
I think we should also add some csv cases to actually verify the data is correct with the push down.
And maybe a nightly performance test for both length and dense vector to demonstrate the improvement.

...sql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalLogicalPlanOptimizerTests.java

nik9000 · 2025-11-06T12:52:03Z

I think we should also add some csv cases to actually verify the data is correct with the push down.

I believe we already have these tests, but I'll double check and add a few more out of paranoia.

And maybe a nightly performance test for both length and dense vector to demonstrate the improvement.

We have it for vectors. I'll look at it for string length.

We'll want it for aggregate_metric_double which we're going to use this on soon. And MV_MIN and MV_MAX, I think.


nik9000 · 2025-11-06T18:49:15Z

Hey folks. This is ready for another round. I'm going to add some more csv-spec tests this afternoon.

I decided to go with @julian-elastic's first approach using the enum. It's probably the wrong approach, but I think some kind of big interface or strings are worse. But only marginally. I think we won't know a nice approach until we have a dozen of these things and we start to hate the enum. But we can change it then.

nik9000 · 2025-11-07T13:44:48Z

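| Metric | Task | Baseline | Contender | Diff | Unit | Diff % |
| --- | --- | --- | --- | --- | --- | --- |
| 90th percentile service time | esql-avg-message-length | 11078 | 6670 | -4407.95 | ms | -39.79% |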


In elastic/elasticsearch#137382 we're pushing functions into field loading and using LENGTH as an example. This adds a rally track to demonstrate the performance difference: ``` | 90th | esql-avg-message-length | 11078 | 6670 | -4407.95 | ms | -39.79% | ```

julian-elastic · 2025-11-10T15:52:42Z

...sql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalLogicalPlanOptimizerTests.java

+
+        Filter filter = as(eval2.child(), Filter.class);
+        And and = as(filter.condition(), And.class);
+        GreaterThan left = as(and.left(), GreaterThan.class);


I think it should be 2? I made one of the pushdowns on first name :) Just letting you know so you don't spend extra time debugging when working on #137679. You don't need to change it to 2 right now for this PR.

julian-elastic · 2025-11-10T16:04:04Z

...rc/javaRestTest/java/org/elasticsearch/xpack/esql/qa/single_node/PushExpressionToLoadIT.java

+        );
+    }
+
+    public void testLengthNotPushedToText() throws IOException {


Why can't this optimization work with Text?

I'll push a comment to explain it, but the short version is that we haven't written the code yet. Text fields are loaded from _source and we've only implemented this optimization for loading from doc values. Worse, we've only implemented it for the particular kind of doc values that keyword uses. wildcard fields don't use the same encoding. We'd have to write another push down implementation for those.
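The eligibility rule described in that reply can be written down as a tiny decision function. This is a sketch only: `FieldType`, `Plan`, and `planLength` are hypothetical names, and the real decision is made by the expression's `tryFuse` and the field's block loader.

```java
// Hypothetical decision table; not the real mapper or optimizer code.
final class FusionEligibilitySketch {
    enum FieldType { KEYWORD, TEXT, WILDCARD }

    enum Plan { FUSE_INTO_LOAD, LOAD_THEN_EVALUATE }

    /** LENGTH is fused only for keyword fields with doc values; everything else falls back. */
    static Plan planLength(FieldType type, boolean hasDocValues) {
        if (type == FieldType.KEYWORD && hasDocValues) {
            return Plan.FUSE_INTO_LOAD;
        }
        // text is loaded from _source; wildcard uses a different doc values encoding
        return Plan.LOAD_THEN_EVALUATE;
    }
}
```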

julian-elastic

Looks good! Thank you for addressing my concerns! I left a few more small comments, but they can be addressed in the next PR.

nik9000 · 2025-11-10T16:31:25Z

I'll merge this now and open a follow up with some instructions and explanations based on @julian-elastic's last comments.

Implements most remaining block loaders for MV_MIN and MV_MAX. Once #137382 is in we can push MV_MIN and MV_MAX into the block loaders for most field types. This is compelling because it significantly reduces the amount of data loaded when using MV_MIN and MV_MAX.

nik9000 added the >enhancement, :Analytics/ES|QL, and v9.3.0 labels Oct 30, 2025

nik9000 and others added 3 commits October 30, 2025 10:39

Update docs/changelog/137382.yaml

7758ab9

[CI] Auto commit changes from spotless

47c874e

More tests

6b0fead

carlosdelest reviewed Oct 30, 2025


nik9000 added 2 commits October 30, 2025 15:42

Tests

1dd6d96

Merge remote-tracking branch 'nik9000/esql_fuse_length' into esql_fuse_length

ce8e6aa

nik9000 commented Oct 30, 2025


elasticsearchmachine and others added 5 commits October 30, 2025 19:49

[CI] Auto commit changes from spotless

d50a74b

Add names back

8746cfa

Merge branch 'main' into esql_fuse_length

15eb5e9

Renam

d01184b

[CI] Auto commit changes from spotless

d6897d8

nik9000 mentioned this pull request Oct 31, 2025

ESQL: improve performance - Merge functions into loaders (sometimes) #103636

Closed

4 tasks

nik9000 requested review from alex-spies, fang-xing-esql and julian-elastic October 31, 2025 14:05

nik9000 mentioned this pull request Oct 31, 2025

Block loaders for MV_MIN and MV_MAX for keywords #137473

Merged

nik9000 added 3 commits October 31, 2025 13:45

Merge branch 'main' into esql_fuse_length

e091312

More tests

538b72b

Merge remote-tracking branch 'nik9000/esql_fuse_length' into esql_fuse_length

2fc8fd5

nik9000 commented Oct 31, 2025


nik9000 marked this pull request as ready for review October 31, 2025 19:52

nik9000 requested a review from carlosdelest October 31, 2025 19:52

elasticsearchmachine added the Team:Analytics label Oct 31, 2025

julian-elastic reviewed Nov 5, 2025


server/src/main/java/org/elasticsearch/index/mapper/blockloader/BlockLoaderFunctionConfig.java

julian-elastic reviewed Nov 5, 2025


...sql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalLogicalPlanOptimizerTests.java

nik9000 mentioned this pull request Nov 6, 2025

ES|QL - Improve performance of V_MAGNITUDE function #137535

Open

nik9000 mentioned this pull request Nov 6, 2025

ESQL: Leftovers from making field pushing generic #137679

Open

16 tasks

More tests

05af853

phananh1010 added a commit to phananh1010/elasticsearch that referenced this pull request Nov 6, 2025

Mirror upstream elastic#137382 as single snapshot commit for AI review

27689c9

BASE=d657f7bef51da69d79134325ab5c3c5352ddf264 HEAD=05af8536e27b1e0c2d03d418fa19dc43f13b01e6 Branch=main

nik9000 added 3 commits November 6, 2025 20:35

one more

5428f0b

Merge branch 'main' into esql_fuse_length

78e7742

Merge branch 'main' into esql_fuse_length

8504ed0

phananh1010 added a commit to phananh1010/elasticsearch that referenced this pull request Nov 7, 2025

Mirror upstream elastic#137382 as single snapshot commit for AI review

5b04163

BASE=f08e7317360562458eec6fc609df81184ae53a9a HEAD=8504ed04897b23fa6781f37dd80f059965c6cd14 Branch=main

nik9000 mentioned this pull request Nov 7, 2025

Add a benchmark for an ESQL agg on LENGTH elastic/rally-tracks#902

Merged

nik9000 added 2 commits November 7, 2025 16:08

Merge branch 'main' into esql_fuse_length

9c19496

Merge branch 'main' into esql_fuse_length

6297ae1

nik9000 requested review from carlosdelest, fang-xing-esql and julian-elastic November 9, 2025 12:48

nik9000 mentioned this pull request Nov 10, 2025

ESQL: Most remaining block loads from MV_MIN and MV_MAX #137820

Merged

julian-elastic reviewed Nov 10, 2025


julian-elastic approved these changes Nov 10, 2025


nik9000 merged commit 97f96b4 into elastic:main Nov 10, 2025
34 checks passed

carlosdelest mentioned this pull request Nov 11, 2025

ES|QL - vector similarity pushdown follow up #137564

Merged

ESQL: Make field fusion generic #137382
nik9000 merged 37 commits into elastic:main from nik9000:esql_fuse_length
