
Speed up OptimizedScalarQuantizer #131599

Merged
iverase merged 7 commits into elastic:main from iverase:speed_osq
Jul 22, 2025

Conversation

@iverase
Contributor

@iverase iverase commented Jul 21, 2025

While reviewing the code in OptimizedScalarQuantizer, I noticed that we are quantizing the same vector a few times: once when computing the loss and then again when computing the next grid points. I wondered if we could reuse the value between those two calls and avoid that repeated computation.

This PR does that: it uses the destination array to keep the quantized values during the loss computation and passes it to the method that computes the grid points. In addition, we can skip the final quantization of the vector if the method that optimizes the intervals finishes without finding a better loss.
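The caching pattern described above can be sketched roughly as follows. This is an illustrative, self-contained example, not the actual OptimizedScalarQuantizer code; the class, method, and parameter names are hypothetical, and the interval/grid arithmetic is a simplified stand-in for the real loss function:

```java
// Illustrative sketch of the idea in this PR: quantize each component once
// while computing the loss, store the quantized values in the destination
// array, and let the grid-point step reuse them instead of re-quantizing.
// All names here are hypothetical, not Elasticsearch's actual API.
final class QuantizeOnce {

    /**
     * Quantizes {@code vector} into {@code dest} against the interval
     * [low, high] with {@code points} grid points, and returns the squared
     * reconstruction error. Callers can then reuse {@code dest} directly.
     */
    static double quantizeAndLoss(float[] vector, int[] dest, float low, float high, int points) {
        float step = (high - low) / (points - 1);
        double loss = 0;
        for (int i = 0; i < vector.length; i++) {
            // clamp to the interval, then round to the nearest grid point
            float clamped = Math.min(high, Math.max(low, vector[i]));
            int q = Math.round((clamped - low) / step); // single rounding pass
            dest[i] = q;                                // cached for the grid-point step
            double diff = vector[i] - (low + q * step); // reconstruction error
            loss += diff * diff;
        }
        return loss;
    }
}
```

The key point is that `dest` now carries the rounded values out of the loss computation, so the caller never has to quantize the same vector a second time.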

The only side effect is that we need to remove the legacy method on OSQ. That's OK, as it was only used for benchmark comparison.

The results show a clear speed-up in both the scalar and vector variants.

Current values with a 128-bit preferred size:

Benchmark                                 (bits)  (dims)   Mode  Cnt    Score    Error   Units
OptimizedScalarQuantizerBenchmark.scalar       1     384  thrpt   15  139.486 ± 23.817  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       1     702  thrpt   15   79.059 ± 14.286  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       1    1024  thrpt   15   50.415 ±  7.558  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       4     384  thrpt   15  136.449 ± 21.873  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       4     702  thrpt   15   69.242 ± 15.013  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       4    1024  thrpt   15   43.425 ±  1.643  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       7     384  thrpt   15  149.420 ± 16.853  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       7     702  thrpt   15   77.437 ±  6.671  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       7    1024  thrpt   15   53.494 ±  7.536  ops/ms
OptimizedScalarQuantizerBenchmark.vector       1     384  thrpt   15  562.416 ± 46.832  ops/ms
OptimizedScalarQuantizerBenchmark.vector       1     702  thrpt   15  306.875 ± 47.434  ops/ms
OptimizedScalarQuantizerBenchmark.vector       1    1024  thrpt   15  216.386 ± 26.207  ops/ms
OptimizedScalarQuantizerBenchmark.vector       4     384  thrpt   15  509.608 ± 85.495  ops/ms
OptimizedScalarQuantizerBenchmark.vector       4     702  thrpt   15  292.796 ± 55.263  ops/ms
OptimizedScalarQuantizerBenchmark.vector       4    1024  thrpt   15  187.569 ± 15.714  ops/ms
OptimizedScalarQuantizerBenchmark.vector       7     384  thrpt   15  539.447 ± 42.931  ops/ms
OptimizedScalarQuantizerBenchmark.vector       7     702  thrpt   15  309.357 ± 27.685  ops/ms
OptimizedScalarQuantizerBenchmark.vector       7    1024  thrpt   15  114.017 ± 71.001  ops/ms

With this PR:

Benchmark                                 (bits)  (dims)   Mode  Cnt    Score     Error   Units
OptimizedScalarQuantizerBenchmark.scalar       1     384  thrpt   15  169.414 ±  23.188  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       1     702  thrpt   15   87.899 ±   9.614  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       1    1024  thrpt   15   62.872 ±  10.971  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       4     384  thrpt   15  161.959 ±  31.947  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       4     702  thrpt   15   81.247 ±   6.511  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       4    1024  thrpt   15   58.583 ±  17.166  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       7     384  thrpt   15  181.835 ±  21.244  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       7     702  thrpt   15   97.614 ±  15.205  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       7    1024  thrpt   15   65.772 ±   9.829  ops/ms
OptimizedScalarQuantizerBenchmark.vector       1     384  thrpt   15  638.882 ±  80.574  ops/ms
OptimizedScalarQuantizerBenchmark.vector       1     702  thrpt   15  369.157 ±  44.456  ops/ms
OptimizedScalarQuantizerBenchmark.vector       1    1024  thrpt   15  245.174 ±  31.757  ops/ms
OptimizedScalarQuantizerBenchmark.vector       4     384  thrpt   15  615.784 ± 110.064  ops/ms
OptimizedScalarQuantizerBenchmark.vector       4     702  thrpt   15  363.637 ±  82.684  ops/ms
OptimizedScalarQuantizerBenchmark.vector       4    1024  thrpt   15  211.976 ±  12.900  ops/ms
OptimizedScalarQuantizerBenchmark.vector       7     384  thrpt   15  686.756 ±  64.638  ops/ms
OptimizedScalarQuantizerBenchmark.vector       7     702  thrpt   15  356.240 ±  37.930  ops/ms
OptimizedScalarQuantizerBenchmark.vector       7    1024  thrpt   15  245.471 ±   6.831  ops/ms
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Jul 21, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Collaborator

Hi @iverase, I've created a changelog YAML for you.

Member

@benwtrent benwtrent left a comment


This optimization makes sense to me.

We don't need to keep the legacy interface.

My only concern is making sure recall is unchanged. Looking at the code, all the paths already did a `Math.round`, except that now some of the paths use ints instead of rounding floats, which is fine.

The speed ups are hilarious!

@iverase
Contributor Author

iverase commented Jul 22, 2025

My only concern is making sure recall is unchanged.

I am pretty sure the new code is equivalent to the old one; we are just caching the results of Math.round between function calls.
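The equivalence claim rests on `Math.round` being deterministic: a value cached during the loss computation is exactly what the old code obtained by rounding the same component again. A minimal illustration, with a hypothetical helper name:

```java
// Minimal check of the equivalence claim: Math.round is a pure function, so
// caching its result between calls yields the same value the legacy path got
// by recomputing it. The helper name is illustrative only.
final class RoundCacheEquivalence {
    static boolean cachedMatchesRecomputed(float component) {
        int cached = Math.round(component);     // stored once, e.g. in the destination array
        int recomputed = Math.round(component); // what the legacy path computed a second time
        return cached == recomputed;
    }
}
```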

@iverase iverase merged commit 4468239 into elastic:main Jul 22, 2025
33 checks passed
@iverase iverase deleted the speed_osq branch July 22, 2025 13:25

Labels

>enhancement :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.2.0

3 participants