
Speed up OptimizedScalarQuantizer #131599

Merged
iverase merged 7 commits into elastic:main from iverase:speed_osq
Jul 22, 2025

Conversation

@iverase
Contributor

@iverase iverase commented Jul 21, 2025

While reviewing the code in OptimizedScalarQuantizer, I noticed that we are quantizing the same vector a few times: once when computing the loss and then again when computing the next grid points. I wondered if we could reuse the value between those two calls and avoid that repeated computation.

This PR does that: it uses the destination array to keep the quantized values during the loss computation and passes it to the method that computes the grid points. In addition, we can skip the final quantization of the vector if the method that optimizes the intervals finishes without finding a better loss.
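The caching pattern described above can be sketched roughly as follows. This is an illustrative, self-contained example, not the actual OptimizedScalarQuantizer code; the class, method, and parameter names are hypothetical, and the interval/grid arithmetic is a simplified stand-in for the real loss function:

```java
// Illustrative sketch of the idea in this PR: quantize each component once
// while computing the loss, store the quantized values in the destination
// array, and let the grid-point step reuse them instead of re-quantizing.
// All names here are hypothetical, not Elasticsearch's actual API.
final class QuantizeOnce {

    /**
     * Quantizes {@code vector} into {@code dest} against the interval
     * [low, high] with {@code points} grid points, and returns the squared
     * reconstruction error. Callers can then reuse {@code dest} directly.
     */
    static double quantizeAndLoss(float[] vector, int[] dest, float low, float high, int points) {
        float step = (high - low) / (points - 1);
        double loss = 0;
        for (int i = 0; i < vector.length; i++) {
            // clamp to the interval, then round to the nearest grid point
            float clamped = Math.min(high, Math.max(low, vector[i]));
            int q = Math.round((clamped - low) / step); // single rounding pass
            dest[i] = q;                                // cached for the grid-point step
            double diff = vector[i] - (low + q * step); // reconstruction error
            loss += diff * diff;
        }
        return loss;
    }
}
```

The key point is that `dest` now carries the rounded values out of the loss computation, so the caller never has to quantize the same vector a second time.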

The only side effect is that we need to remove the legacy method on OSQ. That's OK, as it was only used for benchmark comparison.

The results show a clear speed-up in both the scalar and vector variants.

Current values with a 128-bit preferred size:

Benchmark                                 (bits)  (dims)   Mode  Cnt    Score    Error   Units
OptimizedScalarQuantizerBenchmark.scalar       1     384  thrpt   15  139.486 ± 23.817  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       1     702  thrpt   15   79.059 ± 14.286  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       1    1024  thrpt   15   50.415 ±  7.558  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       4     384  thrpt   15  136.449 ± 21.873  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       4     702  thrpt   15   69.242 ± 15.013  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       4    1024  thrpt   15   43.425 ±  1.643  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       7     384  thrpt   15  149.420 ± 16.853  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       7     702  thrpt   15   77.437 ±  6.671  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       7    1024  thrpt   15   53.494 ±  7.536  ops/ms
OptimizedScalarQuantizerBenchmark.vector       1     384  thrpt   15  562.416 ± 46.832  ops/ms
OptimizedScalarQuantizerBenchmark.vector       1     702  thrpt   15  306.875 ± 47.434  ops/ms
OptimizedScalarQuantizerBenchmark.vector       1    1024  thrpt   15  216.386 ± 26.207  ops/ms
OptimizedScalarQuantizerBenchmark.vector       4     384  thrpt   15  509.608 ± 85.495  ops/ms
OptimizedScalarQuantizerBenchmark.vector       4     702  thrpt   15  292.796 ± 55.263  ops/ms
OptimizedScalarQuantizerBenchmark.vector       4    1024  thrpt   15  187.569 ± 15.714  ops/ms
OptimizedScalarQuantizerBenchmark.vector       7     384  thrpt   15  539.447 ± 42.931  ops/ms
OptimizedScalarQuantizerBenchmark.vector       7     702  thrpt   15  309.357 ± 27.685  ops/ms
OptimizedScalarQuantizerBenchmark.vector       7    1024  thrpt   15  114.017 ± 71.001  ops/ms

With this PR:

Benchmark                                 (bits)  (dims)   Mode  Cnt    Score     Error   Units
OptimizedScalarQuantizerBenchmark.scalar       1     384  thrpt   15  169.414 ±  23.188  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       1     702  thrpt   15   87.899 ±   9.614  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       1    1024  thrpt   15   62.872 ±  10.971  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       4     384  thrpt   15  161.959 ±  31.947  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       4     702  thrpt   15   81.247 ±   6.511  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       4    1024  thrpt   15   58.583 ±  17.166  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       7     384  thrpt   15  181.835 ±  21.244  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       7     702  thrpt   15   97.614 ±  15.205  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       7    1024  thrpt   15   65.772 ±   9.829  ops/ms
OptimizedScalarQuantizerBenchmark.vector       1     384  thrpt   15  638.882 ±  80.574  ops/ms
OptimizedScalarQuantizerBenchmark.vector       1     702  thrpt   15  369.157 ±  44.456  ops/ms
OptimizedScalarQuantizerBenchmark.vector       1    1024  thrpt   15  245.174 ±  31.757  ops/ms
OptimizedScalarQuantizerBenchmark.vector       4     384  thrpt   15  615.784 ± 110.064  ops/ms
OptimizedScalarQuantizerBenchmark.vector       4     702  thrpt   15  363.637 ±  82.684  ops/ms
OptimizedScalarQuantizerBenchmark.vector       4    1024  thrpt   15  211.976 ±  12.900  ops/ms
OptimizedScalarQuantizerBenchmark.vector       7     384  thrpt   15  686.756 ±  64.638  ops/ms
OptimizedScalarQuantizerBenchmark.vector       7     702  thrpt   15  356.240 ±  37.930  ops/ms
OptimizedScalarQuantizerBenchmark.vector       7    1024  thrpt   15  245.471 ±   6.831  ops/ms
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Jul 21, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Collaborator

Hi @iverase, I've created a changelog YAML for you.

Member

@benwtrent benwtrent left a comment


This optimization makes sense to me.

We don't need to keep the legacy interface.

My only concern is making sure recall is unchanged. Looking at the code, all the paths already did a `Math.round`, except that now some of the paths use ints instead of rounding floats, which is fine.

The speed ups are hilarious!

@iverase
Contributor Author

iverase commented Jul 22, 2025

My only concern is making sure recall is unchanged.

I am pretty sure the new code is equivalent to the old one; we are just caching the results of Math.round between function calls.
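The equivalence claim rests on `Math.round` being deterministic: a value cached during the loss computation is exactly what the old code obtained by rounding the same component again. A minimal illustration, with a hypothetical helper name:

```java
// Minimal check of the equivalence claim: Math.round is a pure function, so
// caching its result between calls yields the same value the legacy path got
// by recomputing it. The helper name is illustrative only.
final class RoundCacheEquivalence {
    static boolean cachedMatchesRecomputed(float component) {
        int cached = Math.round(component);     // stored once, e.g. in the destination array
        int recomputed = Math.round(component); // what the legacy path computed a second time
        return cached == recomputed;
    }
}
```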

@iverase iverase merged commit 4468239 into elastic:main Jul 22, 2025
33 checks passed
@iverase iverase deleted the speed_osq branch July 22, 2025 13:25

Labels

>enhancement :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.2.0

3 participants