Speed up block serialization by dnhatn · Pull Request #124394 · elastic/elasticsearch

dnhatn · 2025-03-08T01:30:53Z

Currently, we use NamedWriteable for serializing blocks. While convenient, it incurs a noticeable performance penalty when pages contain thousands of blocks. Since the set of block types is small and already centralized in ElementType, we can safely switch from NamedWriteable to a type code. For example, the NamedWriteable overhead alone for a small page with 10K fields would be 180KB, whereas the new method reduces it to 10KB. Below are the serialization improvements with FROM idx | LIMIT 10000, where the target index has 10K fields:

write_exchange_response executed 173 times took: 73.2ms -> 26.7ms
read_exchange_response executed 173 times took: 49.4ms -> 25.8ms

I might open another PR to avoid serializing positionCount as we should already have it from the page.
We might need to do the same thing for plan serialization.
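
To make the size argument in the description concrete, here is a minimal sketch that counts the header bytes for 10,000 blocks when each block is prefixed with a name string versus a single-byte code. It is an illustration under stated assumptions, not the PR's code: `BlockHeaderSize` and its methods are hypothetical, the name `"LongVectorBlock"` is only an example value, and `DataOutputStream.writeUTF` (2-byte length prefix) stands in for Elasticsearch's own string encoding, so the total will not match the 180KB figure exactly.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Elasticsearch.
final class BlockHeaderSize {

    // Header bytes when every block is prefixed with its name as a string.
    static int namedHeaderBytes(String name, int blocks) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int i = 0; i < blocks; i++) {
                out.writeUTF(name); // 2-byte length prefix + encoded characters
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Header bytes when every block is prefixed with a one-byte type code.
    static int codedHeaderBytes(int code, int blocks) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int i = 0; i < blocks; i++) {
                out.writeByte(code);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        int blocks = 10_000;
        System.out.println("named: " + namedHeaderBytes("LongVectorBlock", blocks) + " bytes");
        System.out.println("coded: " + codedHeaderBytes(2, blocks) + " bytes");
    }
}
```

With a 15-character name, the string prefix costs 17 bytes per block in this sketch, against 1 byte for the code, which is the same order of magnitude as the 180KB versus 10KB reported above.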

elasticsearchmachine · 2025-03-08T18:37:50Z

Hi @dnhatn, I've created a changelog YAML for you.

elasticsearchmachine · 2025-03-08T19:39:46Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

costin

👍

costin · 2025-03-10T07:25:09Z

x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/data/ElementType.java

+    BOOLEAN(0, "Boolean", BlockFactory::newBooleanBlockBuilder, BooleanBlock::readFrom),
+    INT(1, "Int", BlockFactory::newIntBlockBuilder, IntBlock::readFrom),
+    LONG(2, "Long", BlockFactory::newLongBlockBuilder, LongBlock::readFrom),
+    FLOAT(3, "Float", BlockFactory::newFloatBlockBuilder, FloatBlock::readFrom),
+    DOUBLE(4, "Double", BlockFactory::newDoubleBlockBuilder, DoubleBlock::readFrom),
    /**
     * Blocks containing only null values.
     */
-    NULL("Null", (blockFactory, estimatedSize) -> new ConstantNullBlock.Builder(blockFactory)),
+    NULL(5, "Null", (blockFactory, estimatedSize) -> new ConstantNullBlock.Builder(blockFactory), BlockStreamInput::readConstantNullBlock),

-    BYTES_REF("BytesRef", BlockFactory::newBytesRefBlockBuilder),
+    BYTES_REF(6, "BytesRef", BlockFactory::newBytesRefBlockBuilder, BytesRefBlock::readFrom),

    /**
     * Blocks that reference individual lucene documents.
     */
-    DOC("Doc", DocBlock::newBlockBuilder),
+    DOC(7, "Doc", DocBlock::newBlockBuilder, in -> { throw new UnsupportedOperationException("can't read doc blocks"); }),

    /**
     * Composite blocks which contain array of sub-blocks.
     */
-    COMPOSITE("Composite", BlockFactory::newAggregateMetricDoubleBlockBuilder),
+    COMPOSITE(8, "Composite", BlockFactory::newAggregateMetricDoubleBlockBuilder, CompositeBlock::readFrom),

    /**
     * Intermediate blocks which don't support retrieving elements.
     */
-    UNKNOWN("Unknown", (blockFactory, estimatedSize) -> { throw new UnsupportedOperationException("can't build null blocks"); });
+    UNKNOWN(9, "Unknown", (blockFactory, estimatedSize) -> { throw new UnsupportedOperationException("can't build null blocks"); }, in -> {


Minute point: for future extensibility, maybe space out the writeable codes (in multiples of two) instead of using consecutive numbers, e.g.:
0 - null
1 - unknown
2-3 - unused
4-15 - Java primitives (including those not supported yet, such as byte)
16-32 - rest of the objects (doc, composite, etc.)

I think we can add new element types with the next ids.
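
The pattern discussed in this thread, an enum whose constants carry an explicit wire code that is independent of declaration order, can be sketched as follows. This is a simplified illustration, not the PR's implementation: `SketchElementType`, `toBytes`, and `fromBytes` are hypothetical names, and the real `ElementType` also carries a name, a builder, and a reader per constant. The codes mirror the ones in the diff above. Because lookup goes through the code rather than the ordinal, a new type can take the next free id or any spaced-out id without changing the existing ones.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for ElementType, reduced to the wire code only.
enum SketchElementType {
    BOOLEAN(0), INT(1), LONG(2), FLOAT(3), DOUBLE(4),
    NULL(5), BYTES_REF(6), DOC(7), COMPOSITE(8), UNKNOWN(9);

    private final byte code;

    // Lookup table indexed by wire code; gaps (if any) stay null.
    private static final SketchElementType[] BY_CODE;

    static {
        int max = 0;
        for (SketchElementType t : values()) {
            max = Math.max(max, t.code);
        }
        BY_CODE = new SketchElementType[max + 1];
        for (SketchElementType t : values()) {
            if (BY_CODE[t.code] != null) {
                throw new IllegalStateException("duplicate code " + t.code);
            }
            BY_CODE[t.code] = t;
        }
    }

    SketchElementType(int code) {
        this.code = (byte) code;
    }

    // Serializes only the one-byte type code.
    byte[] toBytes() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(code);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads a one-byte code and resolves it through the lookup table.
    static SketchElementType fromBytes(byte[] wire) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire))) {
            int code = in.readUnsignedByte();
            if (code >= BY_CODE.length || BY_CODE[code] == null) {
                throw new IllegalArgumentException("unknown element type code " + code);
            }
            return BY_CODE[code];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Checking for duplicate codes at class initialization is a design choice of this sketch: it turns an accidental id collision into an immediate failure rather than a silent misread on the wire.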

luigidellaquila

LGTM, thanks Nhat!

dnhatn · 2025-03-10T18:53:53Z

Thanks everyone!

dnhatn · 2025-03-13T23:05:13Z

💚 All backports created successfully

Status	Branch	Result
✅	8.x

Currently, we use NamedWriteable for serializing blocks. While convenient, it incurs a noticeable performance penalty when pages contain thousands of blocks. Since the set of block types is small and already centralized in ElementType, we can safely switch from NamedWriteable to a type code. For example, the NamedWriteable overhead alone for a small page with 10K fields would be 180KB, whereas the new method reduces it to 10KB. Below are the serialization improvements with FROM idx | LIMIT 10000, where the target index has 10K fields:

- write_exchange_response executed 173 times took: 73.2ms -> 26.7ms
- read_exchange_response executed 173 times took: 49.4ms -> 25.8ms

(cherry picked from commit 79a1626)

Adjust wire version after backporting to 8.x. Relates #124394

elasticsearchmachine added the v9.1.0 label Mar 8, 2025

dnhatn force-pushed the serialize-block-code branch 11 times, most recently from 8fa9212 to 53f0fb8 March 8, 2025 16:41

Compact serialization of blocks

502d522

dnhatn force-pushed the serialize-block-code branch from 53f0fb8 to 502d522 March 8, 2025 18:15

dnhatn changed the title ~~Serialize block type code~~ Mar 8, 2025

dnhatn added the v8.19.0, :Analytics/ES|QL (AKA ESQL), >enhancement, and auto-backport (Automatically create backport pull requests when merged) labels Mar 8, 2025

Update docs/changelog/124394.yaml

54755f2

dnhatn marked this pull request as ready for review March 8, 2025 19:39

dnhatn requested a review from nik9000 March 8, 2025 19:39

elasticsearchmachine added the Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)) label Mar 8, 2025

dnhatn requested review from costin and luigidellaquila March 8, 2025 19:39

dnhatn changed the title ~~Avoid NamedWritable in block serialization~~ Mar 9, 2025

costin approved these changes Mar 10, 2025

idegtiarenko reviewed Mar 10, 2025

server/src/main/java/org/elasticsearch/TransportVersions.java (resolved)

idegtiarenko reviewed Mar 10, 2025

x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/data/ElementType.java (resolved)

idegtiarenko approved these changes Mar 10, 2025

luigidellaquila approved these changes Mar 10, 2025

server/src/main/java/org/elasticsearch/TransportVersions.java (resolved)

dnhatn added 2 commits March 10, 2025 10:50

immutable

9d7f55b

Merge remote-tracking branch 'elastic/main' into serialize-block-code

e666c6b

dnhatn merged commit 79a1626 into elastic:main Mar 10, 2025
17 checks passed

dnhatn deleted the serialize-block-code branch March 10, 2025 18:54

elasticsearchmachine added the backport pending label Mar 10, 2025

dnhatn mentioned this pull request Mar 13, 2025

[8.x] Speed up block serialization (#124394) #124840

Merged

elastic deleted a comment from elasticsearchmachine Mar 20, 2025

dnhatn mentioned this pull request Mar 20, 2025

Adjust BWC for block serialization #125276

Merged

dnhatn added a commit that referenced this pull request Mar 20, 2025

Adjust bwc for block serialization (#125276)

44a1190

Adjust wire version after backporting to 8.x. Relates #124394

dnhatn removed the backport pending label Mar 20, 2025

smalyshev pushed a commit to smalyshev/elasticsearch that referenced this pull request Mar 21, 2025

Adjust bwc for block serialization (elastic#125276)

0940cc7

Adjust wire version after backporting to 8.x. Relates elastic#124394

omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025

Adjust bwc for block serialization (elastic#125276)

1c2c4fe

Adjust wire version after backporting to 8.x. Relates elastic#124394

Speed up block serialization #124394
dnhatn merged 4 commits into elastic:main from dnhatn:serialize-block-code

5 participants
