ESQL: Speed up TO_IP by nik9000 · Pull Request #126338 · elastic/elasticsearch

nik9000 · 2025-04-04T17:40:25Z

Speed up the TO_IP method by converting directly from utf-8 encoded strings to the ip encoding. Previously we did:

utf-8 -> String -> InetAddress -> ip encoding
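
For illustration, the old chain can be sketched as below. This is a minimal sketch, not the Elasticsearch code: the class `LegacyToIp` and its method names are hypothetical, `InetAddress.getByName` stands in for whatever literal-only parser the real code used, and the 16-byte IPv4-mapped layout (`::ffff:a.b.c.d`) is an assumption about what "ip encoding" means here.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the old path: utf-8 -> String -> InetAddress -> ip encoding.
final class LegacyToIp {
    // Assumption: IPs are stored as 16 bytes, with IPv4 written as IPv4-mapped IPv6.
    static byte[] encode(InetAddress address) {
        byte[] raw = address.getAddress();
        if (raw.length == 16) {
            return raw; // already an IPv6 address
        }
        byte[] mapped = new byte[16];
        mapped[10] = (byte) 0xff;
        mapped[11] = (byte) 0xff;
        System.arraycopy(raw, 0, mapped, 12, 4);
        return mapped;
    }

    static byte[] toIp(byte[] utf8) {
        // Intermediate object 1: decode the bytes into a String.
        String text = new String(utf8, StandardCharsets.UTF_8);
        try {
            // Intermediate object 2: an InetAddress. Note that getByName resolves
            // host names for non-literal input, so it is only a stand-in here.
            InetAddress address = InetAddress.getByName(text);
            return encode(address);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP: " + text, e);
        }
    }
}
```

Each call decodes a `String` and builds an `InetAddress` before producing the final bytes, which is the intermediate work the direct conversion avoids.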

In a step towards solving #125460, this creates three IP parsing functions: one that rejects leading zeros, one that interprets leading zeros as decimal numbers, and one that interprets leading zeros as octal numbers. IPs have historically been parsed in all three of those ways.

This plugs the "rejects leading zeros" parser into TO_IP because that's the behavior it had before.
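
To make the three leading-zero behaviors concrete, here is a rough sketch of parsing an IPv4 address straight from UTF-8 bytes. It is not the code in this PR: the class `Ipv4FromUtf8`, the enum `LeadingZeros`, the choice to return `null` on invalid input, and the 16-byte IPv4-mapped output are all assumptions made for the example, and IPv6 is left out entirely.

```java
// Hypothetical sketch: parse dotted-quad IPv4 directly from UTF-8 bytes,
// with a switch for how a leading zero in an octet is treated.
final class Ipv4FromUtf8 {
    enum LeadingZeros { REJECT, DECIMAL, OCTAL }

    static boolean isDigit(byte b) {
        return b >= '0' && b <= '9';
    }

    /** Returns 16 bytes in IPv4-mapped form (an assumption), or null if the input is invalid. */
    static byte[] parse(byte[] utf8, LeadingZeros mode) {
        byte[] out = new byte[16];
        out[10] = (byte) 0xff;
        out[11] = (byte) 0xff;
        int pos = 0;
        for (int octet = 0; octet < 4; octet++) {
            int start = pos;
            // A leading zero is a '0' that is followed by another digit.
            boolean leadingZero = pos + 1 < utf8.length && utf8[pos] == '0' && isDigit(utf8[pos + 1]);
            if (leadingZero && mode == LeadingZeros.REJECT) {
                return null;
            }
            int radix = leadingZero && mode == LeadingZeros.OCTAL ? 8 : 10;
            int value = 0;
            while (pos < utf8.length && isDigit(utf8[pos])) {
                int digit = utf8[pos] - '0';
                if (digit >= radix) {
                    return null; // e.g. "09" is not valid octal
                }
                value = value * radix + digit;
                if (value > 255) {
                    return null; // octet out of range
                }
                pos++;
            }
            if (pos == start) {
                return null; // empty octet
            }
            out[12 + octet] = (byte) value;
            if (octet < 3) {
                if (pos >= utf8.length || utf8[pos] != '.') {
                    return null; // missing separator
                }
                pos++;
            }
        }
        return pos == utf8.length ? out : null; // reject trailing bytes
    }
}
```

With this sketch, `192.168.010.1` is rejected under `REJECT`, parses with a third octet of 10 under `DECIMAL`, and parses with a third octet of 8 under `OCTAL`. No `String` or `InetAddress` is created along the way.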

Here is the performance:

Benchmark               Score    Error  Units
leadingZerosAreDecimal  14.007 ± 0.093  ns/op
leadingZerosAreOctal    15.020 ± 0.373  ns/op
leadingZerosRejected    14.176 ± 3.861  ns/op
original                32.950 ± 1.062  ns/op

So this takes roughly 43% of the time of what we had, a bit over 2x faster.

elasticsearchmachine · 2025-04-04T17:40:49Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2025-04-04T17:40:49Z

Hi @nik9000, I've created a changelog YAML for you.

elasticsearchmachine · 2025-04-07T13:36:25Z

💔 Backport failed

Status	Branch	Result
❌	9.0	Commit could not be cherrypicked due to conflicts
❌	8.x	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 126338`

nik9000 · 2025-04-07T13:40:50Z

Note so we don't lose it: There's some talk about using SIMD to parse ipv4 addresses. They fit in a single SIMD register after all. The reading I did doesn't discuss leading zeros so that'd be fun! Anyway, I wanted something I could backport easily and our ways of directly accessing SIMD code didn't look super easy to backport at the moment, so I left it for later.

nik9000 · 2025-04-07T13:41:02Z

Commit could not be cherrypicked due to conflicts

Betrayal!

nik9000 · 2025-04-07T21:31:57Z

9.0: #126431
8.x: #126433

This includes a big chunk of elastic#124676, but not the behavior change, just the code that allowed it.

nik9000 · 2025-04-08T12:40:44Z

Backports in.

I wrote an `&&` when I meant an `||` in elastic#126338 and that caused some impressive looking line noise to parse as valid ipv4 addresses. Randomized tests caught it eventually.
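
The note above doesn't say which condition had the wrong operator, so the following is purely a hypothetical illustration of how this class of bug lets garbage through. It assumes the slip was in a "is this byte outside the digit range?" check; the class `DigitCheck` and both method names are made up for the example.

```java
// Hypothetical illustration only: the real condition in the PR may have been different.
final class DigitCheck {
    // Intended check: a byte is not a digit if it is below '0' OR above '9'.
    static boolean isNotDigit(byte b) {
        return b < '0' || b > '9';
    }

    // Slip: no byte is both below '0' AND above '9', so this is always false
    // and a parser relying on it would never reject anything.
    static boolean isNotDigitBuggy(byte b) {
        return b < '0' && b > '9';
    }
}
```

A check like the buggy one never fires, so arbitrary bytes flow into the numeric accumulation. Hand-written test cases with well-formed inputs tend to miss this, which fits with randomized tests being what eventually caught it.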

nik9000 added >enhancement :Analytics/ES|QL AKA ESQL v9.0.0 v8.19.0 v9.1.0 labels Apr 4, 2025

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 4, 2025

Update docs/changelog/126338.yaml

079e273

nik9000 added 2 commits April 4, 2025 15:34

Update instructions more

faf9b2f

Merge remote-tracking branch 'nik9000/benchmark_instructions_jdk24' i…

ca2cd63

…nto benchmark_instructions_jdk24

idegtiarenko approved these changes Apr 7, 2025

nik9000 added the auto-backport Automatically create backport pull requests when merged label Apr 7, 2025

nik9000 merged commit 7e1e45e into elastic:main Apr 7, 2025
17 checks passed

elasticsearchmachine added the backport pending label Apr 7, 2025

nik9000 mentioned this pull request Apr 29, 2025

ESQL: No, line noise isn't a valid ip #127527

Merged

nik9000 removed the backport pending label Apr 29, 2025

nik9000 merged 4 commits into elastic:main from nik9000:benchmark_instructions_jdk24
