Improved CAGRA build parameter heuristics #1448

Merged
rapids-bot[bot] merged 20 commits into rapidsai:main from achirkin:fea-cagra-hnsw-heuristics
Nov 3, 2025

Conversation

Contributor

@achirkin achirkin commented Oct 22, 2025

Changes to the build parameter heuristics:

  • Move the code from HNSW namespace to CAGRA namespace to avoid depending on HNSW target
  • Add one more variant of the heuristics: allow generating smaller graph to better match the performance of the HNSW-generated graph
  • Implement automatic switch between NN-Descent and IVF-PQ as the graph-build algorithms depending on the dataset size: NN-Descent tends to perform better on smaller-scale datasets

The PR also includes C and Java bindings.
Resolves #1265
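A minimal sketch of the automatic switch described in the list above, assuming that "dataset size" is measured as a row count. The enum, the function name, and the threshold value are hypothetical placeholders for illustration; the description does not state the actual cutoff or the exact criterion used in the PR.

```cpp
#include <cstdint>

// Hypothetical names for illustration; these are not the cuVS API.
enum class GraphBuildAlgo { NnDescent, IvfPq };

// Hypothetical cutoff: the PR description does not give the real value.
constexpr std::int64_t kHypotheticalRowThreshold = 1'000'000;

// Pick NN-Descent for smaller datasets and IVF-PQ otherwise.
// Assumption: the size criterion is the number of rows only.
constexpr GraphBuildAlgo pick_graph_build_algo(
  std::int64_t n_rows, std::int64_t threshold = kHypotheticalRowThreshold)
{
  return n_rows < threshold ? GraphBuildAlgo::NnDescent : GraphBuildAlgo::IvfPq;
}
```

With the placeholder threshold, `pick_graph_build_algo(50'000)` selects NN-Descent and `pick_graph_build_algo(100'000'000)` selects IVF-PQ.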

@achirkin achirkin self-assigned this Oct 22, 2025
@achirkin achirkin requested a review from a team as a code owner October 22, 2025 14:01
@achirkin achirkin added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels Oct 22, 2025
@achirkin achirkin requested a review from a team as a code owner October 23, 2025 14:20
@achirkin achirkin force-pushed the fea-cagra-hnsw-heuristics branch from ef02185 to 582db6f Compare October 23, 2025 14:24
@rapidsai rapidsai deleted a comment from copy-pr-bot bot Oct 23, 2025
@achirkin achirkin requested a review from a team as a code owner October 23, 2025 15:19
Member

@KyleFromNVIDIA KyleFromNVIDIA left a comment

Approved trivial CMake changes

cuvs::distance::DistanceType metric)
{
  cagra::index_params params;
  params.graph_degree = 2 + M * 2 / 3;
Contributor

The hard-M variant sets graph_degree = 2 * M, so it is surprising to see that the soft variant can lead to similar search performance with graph_degree < M. The benchmarks for 768 and 1536 dimensions looked good. Was it also tested on lower-dimensional datasets?
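For reference, the two graph_degree formulas mentioned in this comment can be compared side by side. This is a sketch only: the helper names are hypothetical, and M is assumed to be a positive integer, so `M * 2 / 3` uses integer division and rounds down.

```cpp
#include <cstdint>

// Hypothetical helper names; only the two formulas come from the discussion.
// Assumption: M is a positive integer, so M * 2 / 3 is evaluated as
// (M * 2) / 3 with integer division (rounding down).

// Formula from the reviewed line: graph_degree = 2 + M * 2 / 3
constexpr std::int64_t soft_graph_degree(std::int64_t M) { return 2 + M * 2 / 3; }

// Formula quoted for the hard-M variant: graph_degree = 2 * M
constexpr std::int64_t hard_graph_degree(std::int64_t M) { return 2 * M; }
```

For M = 16 this gives 12 (soft) versus 32 (hard), and for M = 32 it gives 23 versus 64, which illustrates the graph_degree < M observation for the soft variant.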

Contributor Author

Yes, I've tested it on the DEEP-100M and GloVe datasets. The hard-M variant actually shows much higher recall and lower throughput for the same search 'ef' parameter (the QPS-recall curve is close to HNSW, but all points on it are 'shifted' towards higher recall and lower throughput).

Contributor

@tfeher tfeher left a comment

Thanks Artem for the PR, it looks good to me!

Contributor

@mythrocks mythrocks left a comment

Looks good to my eyes. Minor nit regarding javadoc for the new function.

@achirkin achirkin force-pushed the fea-cagra-hnsw-heuristics branch from e601477 to eeb41ac Compare November 3, 2025 10:56
@achirkin achirkin force-pushed the fea-cagra-hnsw-heuristics branch from eeb41ac to 6904106 Compare November 3, 2025 10:56
Contributor Author

achirkin commented Nov 3, 2025

/merge

@rapids-bot rapids-bot bot merged commit d8fdd7d into rapidsai:main Nov 3, 2025
162 of 164 checks passed

Labels

improvement (Improves an existing functionality), non-breaking (Introduces a non-breaking change)

8 participants