Introduce an adaptive HNSW Patience collector #138685

Merged
tteofili merged 29 commits into elastic:main from
tteofili:hnsw_adaptive_patience_collector
Dec 12, 2025

Conversation

@tteofili
Contributor

@tteofili tteofili commented Nov 26, 2025

This introduces an extension of Lucene's HnswQueueSaturationCollector that avoids any static parameters for patience and saturation threshold.
The patience parameter of HnswQueueSaturationCollector depends on k, which our query API also manipulates through num_candidates, so a static value for it is hard to control.
Instead of a static queue saturation and patience setting, this collector accumulates a smoothed discovery rate and derives an adaptive saturation threshold from the mean and standard deviation of that rate.
This is likely to work better across different doc-to-doc and query-to-vector distributions.
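
To make the idea in the description concrete, here is a minimal sketch of a tracker that smooths a discovery rate and compares it against a threshold built from the running mean and standard deviation. It is not the code in this PR: the class name, the exponential smoothing, the "mean minus one standard deviation" rule, and the fixed `patience` of 3 are all assumptions chosen for illustration (the real collector avoids a static patience).

```python
import math


class AdaptiveSaturationTracker:
    """Toy early-termination tracker; every constant and formula is an assumption."""

    def __init__(self, alpha=0.5, patience=3):
        self.alpha = alpha        # smoothing factor of the moving average (assumed)
        self.patience = patience  # stalled steps tolerated (assumed, fixed for brevity)
        self.smoothed = None      # smoothed discovery rate
        self.count = 0            # number of observed steps
        self.mean = 0.0           # running mean of the smoothed rate
        self.m2 = 0.0             # running sum of squared deviations (Welford)
        self.stalled = 0          # consecutive steps below the threshold

    def observe(self, visited, inserted):
        """Record one step where `inserted` of `visited` nodes entered the top-k queue.

        Returns True when the search should stop.
        """
        rate = inserted / visited if visited else 0.0
        if self.smoothed is None:
            self.smoothed = rate
        else:
            self.smoothed = self.alpha * rate + (1.0 - self.alpha) * self.smoothed

        # Welford's online update of the mean and variance of the smoothed rate.
        self.count += 1
        delta = self.smoothed - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (self.smoothed - self.mean)
        std = math.sqrt(max(0.0, self.m2 / self.count))

        # Adaptive threshold: one standard deviation below the running mean.
        threshold = self.mean - std
        if self.smoothed < threshold:
            self.stalled += 1
        else:
            self.stalled = 0
        return self.stalled >= self.patience


def steps_until_stop(observations, **kwargs):
    """Return the 1-based step at which the tracker stops, or None if it never does."""
    tracker = AdaptiveSaturationTracker(**kwargs)
    for step, (visited, inserted) in enumerate(observations, start=1):
        if tracker.observe(visited, inserted):
            return step
    return None
```

Feeding it a steady stream such as `[(10, 5)] * 50` never triggers a stop, while a stream whose discoveries dry up (`[(10, 5)] * 4 + [(10, 0)] * 20`) stops a few steps after the drop. One limitation of this toy version: a search that never discovers anything never falls below its own mean, so a real collector would need an extra guard for that case.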

@tteofili
Contributor Author

Buildkite benchmark this with so-vector please

@tteofili
Contributor Author

tteofili commented Nov 26, 2025

baseline (early_termination=false)

index_name       index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited  filter_selectivity  filter_cached  oversampling_factor
---------------  ----------  -------------------  -----------  ----------------  -------------  ------  ------  --------  ------------------  -------------  -------------------  
wiki1024en.docs        hnsw                0.000         1.47              5.67           3.86  680.27    0.97  40448.13                1.00           true                 3.00
fiqa-en.docs           hnsw                0.000         0.25              0.67           2.62  3937.01    0.93  8683.86                1.00           true                 3.00
quora-E5-small         hnsw                0.000         1.14              4.16           3.65  877.19    0.97  35952.75                1.00           true                 3.00

baseline (early_termination=true with Lucene's defaults)

index_name       index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited  filter_selectivity  filter_cached  oversampling_factor
---------------  ----------  -------------------  -----------  ----------------  -------------  ------  ------  --------  ------------------  -------------  -------------------  
wiki1024en.docs        hnsw                0.000         1.50              5.67           3.78  666.67    0.97  40448.13                1.00           true                 3.00
fiqa-en.docs           hnsw                0.000         0.26              0.71           2.74  3846.15    0.93  8608.64                1.00           true                 3.00
quora-E5-small         hnsw                0.000         1.21              4.33           3.58  826.45    0.97  35952.75                1.00           true                 3.00

baseline (early_termination=true with p=max(7,k*0.1), s=0.995, see #130564 (comment))

index_name       index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited  filter_selectivity  filter_cached  oversampling_factor
---------------  ----------  -------------------  -----------  ----------------  -------------  ------  ------  --------  ------------------  -------------  -------------------  
wiki1024en.docs        hnsw                0.000         1.26              4.55           3.61  793.65    0.97  31300.27                1.00           true                 3.00
fiqa-en.docs           hnsw                0.000         0.23              0.59           2.54  4310.34    0.93  7431.86                1.00           true                 3.00
quora-E5-small         hnsw                0.000         1.04              3.56           3.42  961.54    0.97  27188.93                1.00           true                 3.00

candidate

index_name       index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count      QPS  recall   visited  filter_selectivity  filter_cached  oversampling_factor
---------------  ----------  -------------------  -----------  ----------------  -------------  -------  ------  --------  ------------------  -------------  -------------------  
wiki1024en.docs        hnsw                0.000         0.85              2.46           2.89  1176.47    0.97  16543.96                1.00           true                 3.00
fiqa-en.docs           hnsw                0.000         0.24              0.57           2.40  4201.68    0.92  6755.88                1.00           true                 3.00
quora-E5-small         hnsw                0.000         1.80              2.52           1.40  555.56    0.96  14069.28                1.00           true                 3.00

The candidate is much faster and more lightweight (far fewer vectors visited) than the current collector with tweaked params, at the cost of about 1% recall (with 3x oversampling) on bbq_hnsw (the current default).
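
For a per-dataset read of the last two tables, the relative changes can be computed directly. The sketch below copies the latency, net CPU time, recall and visited columns from the tuned baseline (p=max(7,k*0.1), s=0.995) and candidate rows above; the dictionary layout and the function name `relative_changes` are just illustrative choices.

```python
# Columns copied from the tables above: latency(ms), net_cpu_time(ms), recall, visited.
METRICS = ("latency_ms", "net_cpu_ms", "recall", "visited")

tuned_baseline = {
    "wiki1024en.docs": (1.26, 4.55, 0.97, 31300.27),
    "fiqa-en.docs": (0.23, 0.59, 0.93, 7431.86),
    "quora-E5-small": (1.04, 3.56, 0.97, 27188.93),
}
candidate = {
    "wiki1024en.docs": (0.85, 2.46, 0.97, 16543.96),
    "fiqa-en.docs": (0.24, 0.57, 0.92, 6755.88),
    "quora-E5-small": (1.80, 2.52, 0.96, 14069.28),
}


def relative_changes(base, cand):
    """Percentage change of each metric, candidate relative to baseline."""
    report = {}
    for name, base_row in base.items():
        cand_row = cand[name]
        report[name] = {
            metric: 100.0 * (new - old) / old
            for metric, old, new in zip(METRICS, base_row, cand_row)
        }
    return report


if __name__ == "__main__":
    for name, row in relative_changes(tuned_baseline, candidate).items():
        print(name, {metric: round(value, 1) for metric, value in row.items()})
```

Reading the output per dataset matters here: the visited and net CPU columns drop on all three datasets, while the latency column does not move in the same direction everywhere (quora-E5-small is higher for the candidate in this run).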

@benwtrent
Member

@tteofili I wonder if the adaptive collection is impacted by quantization loss. As we lose information, the distances might become harder to distinguish.

An adaptive collector makes much more sense to me than a static value, and the performance impact here (with just a 1% recall change) is very interesting indeed! Looks like a worthwhile investigation.

@tteofili
Contributor Author

tteofili commented Dec 3, 2025

Adding some more experiments for less quantized / unquantized HNSW.

hnsw

baseline

index_name       index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited  filter_selectivity  filter_cached  oversampling_factor
---------------  ----------  -------------------  -----------  ----------------  -------------  ------  ------  --------  ------------------  -------------  -------------------  
wiki1024en.docs        hnsw                0.000         3.32             20.77           6.26  301.20    1.00  38185.25                1.00           true                 3.00
fiqa-en.docs           hnsw                0.000         0.70              3.94           5.61  1424.50    1.00  8525.93                1.00           true                 3.00
quora-E5-small         hnsw                0.000         1.95             10.30           5.28  512.82    0.99  34262.71                1.00           true                 3.00

baseline (early_termination=true with Lucene's defaults)

index_name       index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited  filter_selectivity  filter_cached  oversampling_factor
---------------  ----------  -------------------  -----------  ----------------  -------------  ------  ------  --------  ------------------  -------------  -------------------  
wiki1024en.docs        hnsw                0.000         2.47             13.94           5.64  404.86    1.00  26590.62                1.00           true                 3.00
fiqa-en.docs           hnsw                0.000         0.65              3.57           5.51  1543.21    1.00  8090.53                1.00           true                 3.00
quora-E5-small         hnsw                0.000         1.81              8.06           4.45  552.49    0.99  23787.85                1.00           true                 3.00

baseline (early_termination=true with p=max(7,k*0.1), s=0.995)

index_name       index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited  filter_selectivity  filter_cached  oversampling_factor
---------------  ----------  -------------------  -----------  ----------------  -------------  ------  ------  --------  ------------------  -------------  -------------------  
wiki1024en.docs        hnsw                0.000         1.84              9.65           5.24  543.48    1.00  18529.00                1.00           true                 3.00
fiqa-en.docs           hnsw                0.000         0.59              2.94           4.99  1694.92    0.99  6469.23                1.00           true                 3.00
quora-E5-small         hnsw                0.000         1.31              5.11           3.90  763.36    0.99  16568.72                1.00           true                 3.00

candidate

index_name       index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited  filter_selectivity  filter_cached  oversampling_factor
---------------  ----------  -------------------  -----------  ----------------  -------------  ------  ------  --------  ------------------  -------------  -------------------  
wiki1024en.docs        hnsw                0.000         1.62              8.01           4.94  617.28    1.00  15601.90                1.00           true                 3.00
fiqa-en.docs           hnsw                0.000         0.77              3.36           4.36  1298.70    0.99  6544.63                1.00           true                 3.00
quora-E5-small         hnsw                0.000         1.44              5.50           3.82  694.44    0.98  13227.69                1.00           true                 3.00

int8_hnsw

baseline

index_name       index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited  filter_selectivity  filter_cached  oversampling_factor
---------------  ----------  -------------------  -----------  ----------------  -------------  ------  ------  --------  ------------------  -------------  -------------------  
wiki1024en.docs        hnsw                0.000         2.45             12.50           5.10  408.16    1.00  38092.98                1.00           true                 3.00
fiqa-en.docs           hnsw                0.000         0.57              2.34           4.13  1760.56    1.00  8517.96                1.00           true                 3.00
quora-E5-small         hnsw                0.000         1.75              6.68           3.82  571.43    1.00  34263.00                1.00           true                 3.00

baseline (early_termination=true with Lucene's defaults)

index_name       index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited  filter_selectivity  filter_cached  oversampling_factor
---------------  ----------  -------------------  -----------  ----------------  -------------  ------  ------  --------  ------------------  -------------  -------------------  
wiki1024en.docs        hnsw                0.000         1.92              8.64           4.50  520.83    1.00  26505.97                1.00           true                 3.00
fiqa-en.docs           hnsw                0.000         0.49              2.06           4.20  2040.82    1.00  8093.01                1.00           true                 3.00
quora-E5-small         hnsw                0.000         1.25              4.38           3.50  800.00    0.99  16385.81                1.00           true                 3.00

baseline (early_termination=true with p=max(7,k*0.1), s=0.995)

index_name       index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited  filter_selectivity  filter_cached  oversampling_factor
---------------  ----------  -------------------  -----------  ----------------  -------------  ------  ------  --------  ------------------  -------------  -------------------  
wiki1024en.docs        hnsw                0.000         1.49              6.02           4.04  671.14    1.00  18397.55                1.00           true                 3.00
fiqa-en.docs           hnsw                0.000         0.45              1.32           2.94  2232.14    0.99  6476.45                1.00           true                 3.00
quora-E5-small         hnsw                0.000         1.06              3.03           2.86  943.40    0.99  16385.81                1.00           true                 3.00

candidate

index_name       index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited  filter_selectivity  filter_cached  oversampling_factor
---------------  ----------  -------------------  -----------  ----------------  -------------  ------  ------  --------  ------------------  -------------  -------------------  
wiki1024en.docs        hnsw                0.000         1.29              4.84           3.75  775.19    0.99  15550.11                1.00           true                 3.00
fiqa-en.docs           hnsw                0.000         0.44              1.64           3.72  2272.73    0.98  6402.80                1.00           true                 3.00
quora-E5-small         hnsw                0.000         0.81              2.30           2.84  1234.57    0.98  13018.14                1.00           true                 3.00

int4_hnsw

baseline

index_name       index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited  filter_selectivity  filter_cached  oversampling_factor
---------------  ----------  -------------------  -----------  ----------------  -------------  ------  ------  --------  ------------------  -------------  -------------------  
wiki1024en.docs        hnsw                0.000         3.93             20.09           5.11  254.45    1.00  38296.08                1.00           true                 3.00
fiqa-en.docs           hnsw                0.000         0.77              3.49           4.54  1302.08    0.99  8568.21                1.00           true                 3.00
quora-E5-small         hnsw                0.000         2.32             10.03           4.32  431.03    0.99  34442.25                1.00           true                 3.00

baseline (early_termination=true with Lucene's defaults)

index_name       index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited  filter_selectivity  filter_cached  oversampling_factor
---------------  ----------  -------------------  -----------  ----------------  -------------  ------  ------  --------  ------------------  -------------  -------------------  
wiki1024en.docs        hnsw                0.000         2.80             13.98           4.99  357.14    1.00  26799.29                1.00           true                 3.00
fiqa-en.docs           hnsw                0.000         0.72              2.99           4.16  1388.89    0.99  8138.12                1.00           true                 3.00
quora-E5-small         hnsw                0.000         2.18              8.73           4.00  458.72    0.99  24107.82                1.00           true                 3.00

baseline (early_termination=true with p=max(7,k*0.1), s=0.995)

index_name       index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited  filter_selectivity  filter_cached  oversampling_factor
---------------  ----------  -------------------  -----------  ----------------  -------------  ------  ------  --------  ------------------  -------------  -------------------  
wiki1024en.docs        hnsw                0.000         2.81             11.41           4.06  355.87    1.00  18696.71                1.00           true                 3.00
fiqa-en.docs           hnsw                0.000         0.58              2.31           3.97  1718.21    0.99  6541.55                1.00           true                 3.00
quora-E5-small         hnsw                0.000         1.32              4.79           3.63  757.58    0.99  16651.42                1.00           true                 3.00

candidate

index_name       index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited  filter_selectivity  filter_cached  oversampling_factor
---------------  ----------  -------------------  -----------  ----------------  -------------  ------  ------  --------  ------------------  -------------  -------------------  
wiki1024en.docs        hnsw                0.000         1.74              7.96           4.57  574.71    0.99  15695.13                1.00           true                 3.00
fiqa-en.docs           hnsw                0.000         0.59              2.36           3.97  1683.50    0.98  6500.19                1.00           true                 3.00
quora-E5-small         hnsw                0.000         1.15              3.96           3.44  869.57    0.98  13306.62                1.00           true                 3.00
@benwtrent
Member

It's frankly pretty amazing how recall is almost exactly the same with more than 2x fewer vectors visited.

@tteofili
Contributor Author

tteofili commented Dec 3, 2025

I'm also running some experiments to see how this behaves across filtering selectivity.

@tteofili tteofili marked this pull request as ready for review December 5, 2025 14:10
@elasticsearchmachine elasticsearchmachine added the needs:triage label Dec 5, 2025
@tteofili
Contributor Author

tteofili commented Dec 5, 2025

The filtered results are in line with the non-filtered experiments above (at most 2% recall loss at ~2x fewer vectors visited).

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance label and removed the needs:triage label Dec 5, 2025
@elasticsearchmachine
Collaborator

Hi @tteofili, I've created a changelog YAML for you.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@benwtrent
Member

I want to run a benchmark first. Once I am done with that, I will respond.

The numbers are very encouraging. It seems like a no-brainer to me :)

@benwtrent
Member

For larger scoring vectors (e.g. max inner product), I am seeing a recall drop of 5% fairly consistently across the board (the story is the same with various quantization levels):

index_name                      index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited  filter_selectivity  filter_cached  oversampling_factor
------------------------------  ----------  -------------------  -----------  ----------------  -------------  ------  ------  --------  ------------------  -------------  -------------------
cohere-wikipedia-docs-768d.vec        hnsw                0.000         4.86              0.00           0.00  205.76    0.92  14215.24                1.00           true                 0.00
cohere-wikipedia-docs-768d.vec        hnsw                0.000         9.54              0.00           0.00  104.82    0.95  21314.09                1.00           true                 0.00
cohere-wikipedia-docs-768d.vec        hnsw                0.000         9.61              0.00           0.00  104.06    0.97  30024.02                1.00           true                 0.00

vs

index_name                      index_type  quantized_bits  num_candidates  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited  filter_selectivity  filter_cached  oversampling_factor
------------------------------  ----------  --------------  --------------  -------------------  -----------  ----------------  -------------  ------  ------  --------  ------------------  -------------  -------------------
cohere-wikipedia-docs-768d.vec        hnsw              32             100                0.000         3.82              0.00           0.00  261.78    0.88  11822.33                1.00           true                 0.00
cohere-wikipedia-docs-768d.vec        hnsw              32             250                0.000         4.50              0.00           0.00  222.22    0.90  14721.81                1.00           true                 0.00
cohere-wikipedia-docs-768d.vec        hnsw              32             500                0.000         5.15              0.00           0.00  194.17    0.92  17172.22                1.00           true                 0.00
@benwtrent
Member

benwtrent commented Dec 6, 2025

Here are all the runs in a row. 4M vectors over 16 segments.

I am not sure about the graph density or anything, let me know if I can provide additional info.

index_name                      index_type  quantized_bits  num_candidates  early_termination  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited  oversampling_factor
------------------------------  ----------  --------------  --------------  -----------------  -----------  ----------------  -------------  ------  ------  --------  -------------------
cohere-wikipedia-docs-768d.vec        hnsw              32             100              false         4.42              0.00           0.00  226.24    0.92  14215.24                 0.00
cohere-wikipedia-docs-768d.vec        hnsw              32             100               true         3.50              0.00           0.00  285.71    0.88  11822.33                 0.00
cohere-wikipedia-docs-768d.vec        hnsw              32             250              false         6.55              0.00           0.00  152.67    0.95  21314.09                 0.00
cohere-wikipedia-docs-768d.vec        hnsw              32             250               true         4.27              0.00           0.00  234.19    0.90  14721.81                 0.00
cohere-wikipedia-docs-768d.vec        hnsw              32             500              false         8.43              0.00           0.00  118.62    0.97  30024.02                 0.00
cohere-wikipedia-docs-768d.vec        hnsw              32             500               true         4.93              0.00           0.00  202.84    0.92  17172.22                 0.00
cohere-wikipedia-docs-768d.vec        hnsw               7             100              false         3.55              0.00           0.00  281.69    0.86  14343.65                 0.00
cohere-wikipedia-docs-768d.vec        hnsw               7             100               true         2.84              0.00           0.00  352.11    0.83  11934.26                 0.00
cohere-wikipedia-docs-768d.vec        hnsw               7             250              false         5.25              0.00           0.00  190.48    0.88  21521.75                 0.00
cohere-wikipedia-docs-768d.vec        hnsw               7             250               true         3.59              0.00           0.00  278.55    0.85  14926.82                 0.00
cohere-wikipedia-docs-768d.vec        hnsw               7             500              false         7.38              0.00           0.00  135.50    0.89  30337.36                 0.00
cohere-wikipedia-docs-768d.vec        hnsw               7             500               true         4.02              0.00           0.00  248.76    0.86  17309.01                 0.00
cohere-wikipedia-docs-768d.vec        hnsw               4             100              false         8.35              0.00           0.00  119.76    0.59  20520.01                 0.00
cohere-wikipedia-docs-768d.vec        hnsw               4             100               true         7.43              0.00           0.00  134.59    0.59  18509.20                 0.00
cohere-wikipedia-docs-768d.vec        hnsw               4             250              false        12.47              0.00           0.00   80.19    0.60  30289.52                 0.00
cohere-wikipedia-docs-768d.vec        hnsw               4             250               true         9.37              0.00           0.00  106.72    0.59  23279.30                 0.00
cohere-wikipedia-docs-768d.vec        hnsw               4             500              false        17.41              0.00           0.00   57.44    0.60  41979.97                 0.00
cohere-wikipedia-docs-768d.vec        hnsw               4             500               true        10.99              0.00           0.00   90.99    0.59  26711.21                 0.00
cohere-wikipedia-docs-768d.vec        hnsw               1             100              false         4.41              0.00           0.00  226.76    0.91  24878.19                 3.00
cohere-wikipedia-docs-768d.vec        hnsw               1             100               true         3.20              0.00           0.00  312.50    0.87  16579.68                 3.00
cohere-wikipedia-docs-768d.vec        hnsw               1             250              false         4.69              0.00           0.00  213.22    0.91  24878.19                 3.00
cohere-wikipedia-docs-768d.vec        hnsw               1             250               true         3.14              0.00           0.00  318.47    0.87  16579.68                 3.00
cohere-wikipedia-docs-768d.vec        hnsw               1             500              false         5.70              0.00           0.00  175.44    0.92  31921.15                 3.00
cohere-wikipedia-docs-768d.vec        hnsw               1             500               true         3.38              0.00           0.00  295.86    0.88  18435.73                 3.00
cohere-wikipedia-docs-768d.vec        hnsw               1             100              false         2.46              0.00           0.00  406.50    0.69  15067.47                 0.00
cohere-wikipedia-docs-768d.vec        hnsw               1             100               true         2.17              0.00           0.00  460.83    0.68  12525.45                 0.00
cohere-wikipedia-docs-768d.vec        hnsw               1             250              false         3.97              0.00           0.00  251.89    0.70  22534.66                 0.00
cohere-wikipedia-docs-768d.vec        hnsw               1             250               true         2.62              0.00           0.00  381.68    0.69  15655.45                 0.00
cohere-wikipedia-docs-768d.vec        hnsw               1             500              false         5.48              0.00           0.00  182.48    0.70  31621.15                 0.00
cohere-wikipedia-docs-768d.vec        hnsw               1             500               true         3.18              0.00           0.00  314.47    0.69  18135.73                 0.00

EDIT:

I also ran this same dataset after a force-merge to test an extreme case. The good news is that recall in multi-segment is still higher than in single segment, even with the more aggressive early termination checks. However, it does seem to indicate to me that maybe we shouldn't do early termination when there are very few segments, or a single segment.

Single segment recall (obviously, ignore qps and latency...focus on visited vs. recall):

index_name                      index_type  quantized_bits  num_candidates  early_termination  latency(ms)  net_cpu_time(ms)  avg_cpu_count      QPS  recall  visited  oversampling_factor
------------------------------  ----------  --------------  --------------  -----------------  -----------  ----------------  -------------  -------  ------  -------  -------------------
cohere-wikipedia-docs-768d.vec        hnsw              32             100              false         0.55              0.00           0.00  1818.18    0.82  1737.21                 0.00
cohere-wikipedia-docs-768d.vec        hnsw              32             100               true         0.35              0.00           0.00  2857.14    0.73  1055.95                 0.00
cohere-wikipedia-docs-768d.vec        hnsw              32             250              false         0.97              0.00           0.00  1030.93    0.90  3649.02                 0.00
cohere-wikipedia-docs-768d.vec        hnsw              32             250               true         0.39              0.00           0.00  2564.10    0.78  1363.25                 0.00
cohere-wikipedia-docs-768d.vec        hnsw              32             500              false         1.80              0.00           0.00   555.56    0.93  6600.57                 0.00
cohere-wikipedia-docs-768d.vec        hnsw              32             500               true         0.50              0.00           0.00  2000.00    0.83  1788.34                 0.00
@benwtrent
Member

One other thing I am SLIGHTLY concerned about is whether our "visited/recall" curve is actually improved by the change. My concern is that we are just visiting less and, in visiting less, we simply get the same relative recall drop. In that case, this wouldn't help significantly more than what we are doing now.
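
One way to check this concern is to compare recall at equal visited counts rather than at equal num_candidates. The sketch below does that with linear interpolation, using the unquantized (quantized_bits=32) rows of the multi-segment table reported earlier in this thread. The function names are hypothetical, and linear interpolation between three points is a rough assumption, not how the benchmark tool works.

```python
# (visited, recall) pairs copied from the quantized_bits=32 rows of the
# multi-segment table earlier in the thread.
no_early_termination = [(14215.24, 0.92), (21314.09, 0.95), (30024.02, 0.97)]
early_termination = [(11822.33, 0.88), (14721.81, 0.90), (17172.22, 0.92)]


def interpolate_recall(curve, visited):
    """Linearly interpolate recall at `visited`; None when outside the curve."""
    points = sorted(curve)
    for (v0, r0), (v1, r1) in zip(points, points[1:]):
        if v0 <= visited <= v1 and v1 > v0:
            return r0 + (r1 - r0) * (visited - v0) / (v1 - v0)
    return None


def recall_gap(baseline, candidate):
    """For each candidate point, recall minus the baseline recall at the same visited count."""
    gaps = []
    for visited, recall in sorted(candidate):
        expected = interpolate_recall(baseline, visited)
        if expected is not None:  # skip points the baseline curve does not cover
            gaps.append((visited, recall - expected))
    return gaps


if __name__ == "__main__":
    for visited, gap in recall_gap(no_early_termination, early_termination):
        print(f"visited={visited:.0f} recall gap={gap:+.3f}")
```

A negative gap means the early-terminated run sits below the baseline curve at the same amount of work; a gap near zero or above means the curve itself improved. With only three points per curve, this is a sanity check rather than a verdict.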

Contributor

@pmpailis pmpailis left a comment

LGTM 🚀 - running some benchmarks as well, but seems a reasonable change.

@benwtrent
Member

I reran with the current adaptive approach utilizing k, and my numbers are way better. Of course, latency is not the same, but the recall is much better and within the ballpark:

index_name                                                      index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count      QPS  recall   visited  filter_selectivity  filter_cached  oversampling_factor
--------------------------------------------------------------  ----------  -------------------  -----------  ----------------  -------------  -------  ------  --------  ------------------  -------------  -------------------
target/knn_index/cohere-wikipedia-docs-768d.vec-16-200.index          hnsw                0.000         4.15              0.00           0.00   240.96    0.91  13566.30                1.00           true                 0.00
target/knn_index/cohere-wikipedia-docs-768d.vec-16-200.index          hnsw                0.000         6.82              0.00           0.00   146.63    0.95  19569.17                1.00           true                 0.00
target/knn_index/cohere-wikipedia-docs-768d.vec-16-200.index          hnsw                0.000         7.73              0.00           0.00   129.37    0.96  25971.29                1.00           true                 0.00
target/knn_index/cohere-wikipedia-docs-768d.vec-16-200-4.index        hnsw                0.000         8.89              0.00           0.00   112.49    0.59  20049.69                1.00           true                 0.00
target/knn_index/cohere-wikipedia-docs-768d.vec-16-200-4.index        hnsw                0.000        12.90              0.00           0.00    77.52    0.60  29012.23                1.00           true                 0.00
target/knn_index/cohere-wikipedia-docs-768d.vec-16-200-4.index        hnsw                0.000        17.07              0.00           0.00    58.58    0.59  38889.21                1.00           true                 0.00
target/knn_index/cohere-wikipedia-docs-768d.vec-16-200-1.index        hnsw                0.000         4.71              0.00           0.00   212.31    0.90  22960.01                1.00           true                 3.00
target/knn_index/cohere-wikipedia-docs-768d.vec-16-200-1.index        hnsw                0.000         4.61              0.00           0.00   216.92    0.90  22960.01                1.00           true                 3.00
target/knn_index/cohere-wikipedia-docs-768d.vec-16-200-1.index        hnsw                0.000         5.68              0.00           0.00   176.06    0.91  28407.71                1.00           true                 3.00
target/knn_index/cohere-wikipedia-docs-768d.vec-16-200-1.index        hnsw                0.000         2.85              0.00           0.00   350.88    0.69  14535.57                1.00           true                 0.00
target/knn_index/cohere-wikipedia-docs-768d.vec-16-200-1.index        hnsw                0.000         4.18              0.00           0.00   239.23    0.70  20868.47                1.00           true                 0.00
target/knn_index/cohere-wikipedia-docs-768d.vec-16-200-1.index        hnsw                0.000         5.70              0.00           0.00   175.44    0.70  28107.71                1.00           true                 0.00
Member

@benwtrent benwtrent left a comment

My tests show that recall change is 0-1% over all quantization levels.

Additionally, for certain data distributions, the num visited improves significantly. I am all for this.

:shipit:
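
The 0-1% figure can be cross-checked against the tables in this thread. The sketch below pairs the rerun rows with the early_termination=false rows of the earlier multi-segment table. Two assumptions are baked in: the rerun rows are taken to be in num_candidates order (100, 250, 500), since that table has no num_candidates column, and the 7-bit configuration is left out because it does not appear in the rerun. Recall is stored in whole percentage points to keep the arithmetic exact.

```python
# Each entry: (recall in percentage points, visited), in assumed num_candidates
# order 100, 250, 500. Keys are (quantized_bits, oversampling_factor).
baseline = {
    (32, 0): [(92, 14215.24), (95, 21314.09), (97, 30024.02)],
    (4, 0): [(59, 20520.01), (60, 30289.52), (60, 41979.97)],
    (1, 3): [(91, 24878.19), (91, 24878.19), (92, 31921.15)],
    (1, 0): [(69, 15067.47), (70, 22534.66), (70, 31621.15)],
}
rerun = {
    (32, 0): [(91, 13566.30), (95, 19569.17), (96, 25971.29)],
    (4, 0): [(59, 20049.69), (60, 29012.23), (59, 38889.21)],
    (1, 3): [(90, 22960.01), (90, 22960.01), (91, 28407.71)],
    (1, 0): [(69, 14535.57), (70, 20868.47), (70, 28107.71)],
}


def compare_runs(base, new):
    """Return (config, recall drop in points, visited ratio) for each paired row."""
    rows = []
    for config, base_rows in base.items():
        for (base_recall, base_visited), (new_recall, new_visited) in zip(
            base_rows, new[config]
        ):
            rows.append((config, base_recall - new_recall, new_visited / base_visited))
    return rows


if __name__ == "__main__":
    for config, drop, ratio in compare_runs(baseline, rerun):
        print(config, f"recall drop={drop} pt", f"visited ratio={ratio:.2f}")
```

Under those assumptions, every paired row shows a recall drop of at most one point together with a lower visited count.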

@tteofili tteofili enabled auto-merge (squash) December 12, 2025 16:36
@tteofili tteofili disabled auto-merge December 12, 2025 16:44
@tteofili tteofili enabled auto-merge (squash) December 12, 2025 16:44
@benwtrent benwtrent self-assigned this Dec 12, 2025
@tteofili tteofili merged commit fd9f0ff into elastic:main Dec 12, 2025
35 checks passed
parkertimmins pushed a commit to parkertimmins/elasticsearch that referenced this pull request Dec 17, 2025

Labels

>enhancement, :Search Relevance/Vectors, Team:Search Relevance, v9.3.0

4 participants