Introduce an adaptive HNSW Patience collector #138685
Conversation
Buildkite benchmark this with so-vector please
Benchmark configurations (result tables attached as images):

- baseline (early_termination=false)
- baseline (early_termination=true with Lucene's defaults)
- baseline (early_termination=true with p=max(7,k*0.1), s=0.995, see #130564 (comment))
- candidate

the candidate is much faster and more lightweight (much less
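The static configuration referenced above (p=max(7,k*0.1), s=0.995) can be sketched as follows. The class and method names are illustrative, not Lucene's or Elasticsearch's actual code; only the two constants come from the benchmark description:

```java
// Hypothetical sketch of the static patience settings benchmarked above.
// Names are illustrative; the constants are from the PR discussion.
public class StaticPatienceSettings {
    // "s" in the runs above: stop once the candidate queue stops improving
    // for `patience` consecutive steps at >= this saturation level
    static final double SATURATION_THRESHOLD = 0.995;

    // p = max(7, k * 0.1): patience grows with k, so larger candidate queues
    // get proportionally more non-improving hops before terminating early
    static int patience(int k) {
        return Math.max(7, (int) (k * 0.1));
    }

    public static void main(String[] args) {
        System.out.println(patience(50));  // prints 7 (floor applies)
        System.out.println(patience(100)); // prints 10
    }
}
```

The point the thread makes about this style of setting: because `k` is itself manipulated by the query API (via `num_candidates`), any fixed formula over `k` is only indirectly controllable by the user.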
@tteofili I wonder if the adaptive collection is impacted by quantization loss. Consider: as we lose information, the distances might become harder to distinguish. An adaptive collector makes much more sense to me than a static value, and the performance impact here (with just a 1% recall change) is very interesting indeed! Looks like a worthwhile investigation.
adding some more experiments for less-quantized / unquantized HNSW (result tables attached as images):

hnsw:
- baseline
- baseline (early_termination=true with Lucene's defaults)
- baseline (early_termination=true with p=max(7,k*0.1), s=0.995)
- candidate

int8_hnsw: same four configurations

int4_hnsw: same four configurations
It's frankly pretty amazing how recall stays almost exactly the same with more than 2x fewer vectors visited.
I'm also running some experiments to see how this behaves across filtering selectivity.
The filter results are in line with the non-filtered experiments above (at most a 2% recall drop with ~2x fewer vectors visited).
Hi @tteofili, I've created a changelog YAML for you.
Pinging @elastic/es-search-relevance (Team:Search Relevance)
I want to run a benchmark, and once I am done with that, I will report back. The numbers are very encouraging. It seems like a no-brainer to me :)
For larger-scoring vectors (e.g. max inner product), I am seeing a recall drop of 5% fairly consistently across the board; the story is the same at various quantization levels (comparison results attached as images).
Here are all the runs in a row: 4M vectors over 16 segments. I am not sure about the graph density or anything; let me know if I can provide additional info.

EDIT: I also ran this same dataset with a force-merge to test an extreme case. Good news is that recall in multi-segment is still higher than single-segment, even with the more aggressive early-termination checks. However, it does seem to indicate to me that maybe we shouldn't do early termination when there are very few segments, or a single segment.

Single-segment recall (obviously, ignore qps and latency; focus on visited vs. recall):
One other thing I am SLIGHTLY concerned about is making sure our "visited/recall" curve actually improves with this change. My concern is that we are just visiting less, and in visiting less, we get the same relative recall drop. Ultimately, that wouldn't help things any more than what we are doing now.
server/src/main/java/org/elasticsearch/search/vectors/AdaptiveHnswQueueSaturationCollector.java
I reran with the current adaptive approach utilizing |
benwtrent left a comment:
My tests show that recall change is 0-1% over all quantization levels.
Additionally, for certain data distributions, the num visited improves significantly. I am all for this.
This introduces an extension of Lucene's `HnswQueueSaturationCollector` that avoids any static parameters for patience and saturation threshold. `HnswQueueSaturationCollector`'s patience parameter depends on the `k` param, which is also manipulated by our query API (because of `num_candidates`), making such a static param less controllable. Instead of a static queue saturation and patience setting, this collector accumulates a smoothed discovery rate and an adaptive saturation threshold based on the discovery rate's mean and stdDev.
This is likely to work better with different doc to doc and query to vector distributions.
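As a rough illustration of that idea — a sketch under my own assumptions about the smoothing and the termination rule, not the PR's actual `AdaptiveHnswQueueSaturationCollector` — one can keep an exponentially smoothed discovery rate alongside a Welford-style running mean and stdDev, and report saturation once the smoothed rate falls notably below the historical mean:

```java
// Illustrative sketch of an adaptive saturation check. The smoothing factor,
// warm-up length, and "mean - stdDev" rule are assumptions for illustration,
// not Elasticsearch's implementation.
public class AdaptiveSaturationSketch {
    private static final double ALPHA = 0.2; // smoothing factor for the rate
    private static final int WARM_UP = 10;   // steps before trusting the stats

    private double smoothedRate = Double.NaN;
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // Welford accumulator for variance

    /** Feed the per-step discovery rate (fraction of visits that improved the queue). */
    void onStep(double discoveryRate) {
        smoothedRate = Double.isNaN(smoothedRate)
                ? discoveryRate
                : ALPHA * discoveryRate + (1 - ALPHA) * smoothedRate;
        count++;
        double delta = discoveryRate - mean;
        mean += delta / count;
        m2 += delta * (discoveryRate - mean); // running sum of squared deviations
    }

    /** Saturated once the smoothed rate drops below mean - stdDev: an adaptive threshold. */
    boolean saturated() {
        if (count < WARM_UP) return false;
        double stdDev = Math.sqrt(m2 / count);
        return smoothedRate < mean - stdDev;
    }

    public static void main(String[] args) {
        AdaptiveSaturationSketch c = new AdaptiveSaturationSketch();
        for (int i = 0; i < 20; i++) c.onStep(1.0); // steady discovery: no exit
        c.onStep(0.5); c.onStep(0.5); c.onStep(0.5); // discovery slows down
        System.out.println(c.saturated()); // prints true once rate falls below mean - stdDev
    }
}
```

Because both the mean and the deviation are learned from the query's own traversal, the exit point adapts to the data and query distribution rather than to a fixed `s`/`p` pair, which matches the claim that this should work better across different doc-to-doc and query-to-vector distributions.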