@philippgille philippgille commented Mar 16, 2024

We added benchmarks in #46.

We then used them, together with the CPU and memory profiles gathered from them, to improve performance.

⚠️ This PR only addresses the vector similarity search, not the metadata or full text filtering.

Each individual improvement is a separate commit. Overall, query duration drops by 60-80% and allocated memory (B/op) by more than 99%:

goos: linux
goarch: amd64
pkg: github.com/philippgille/chromem-go
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
                                    │    before     │                after                │
                                    │    sec/op     │    sec/op     vs base               │
Collection_Query_NoContent_100-8       413.7µ ±  4%   109.9µ ±  1%  -73.44% (p=0.002 n=6)
Collection_Query_NoContent_1000-8     2759.4µ ±  0%   536.8µ ±  1%  -80.55% (p=0.002 n=6)
Collection_Query_NoContent_5000-8     12.980m ±  1%   4.985m ± 15%  -61.60% (p=0.002 n=6)
Collection_Query_NoContent_25000-8     66.56m ±  1%   14.97m ± 10%  -77.51% (p=0.002 n=6)
Collection_Query_NoContent_100000-8   282.41m ±  3%   56.50m ± 11%  -79.99% (p=0.002 n=6)
Collection_Query_100-8                 416.7µ ±  2%   110.0µ ±  0%  -73.61% (p=0.002 n=6)
Collection_Query_1000-8               2792.8µ ± 23%   536.8µ ±  0%  -80.78% (p=0.002 n=6)
Collection_Query_5000-8               15.643m ±  1%   4.869m ±  5%  -68.88% (p=0.002 n=6)
Collection_Query_25000-8               78.29m ±  1%   15.01m ±  3%  -80.82% (p=0.002 n=6)
Collection_Query_100000-8             338.54m ±  5%   56.48m ±  4%  -83.32% (p=0.002 n=6)
geomean                                12.97m         3.008m        -76.81%

                                    │     before      │                after                │
                                    │      B/op       │     B/op      vs base               │
Collection_Query_NoContent_100-8      1211.007Ki ± 0%   6.330Ki ± 0%  -99.48% (p=0.002 n=6)
Collection_Query_NoContent_1000-8     12082.16Ki ± 0%   34.83Ki ± 0%  -99.71% (p=0.002 n=6)
Collection_Query_NoContent_5000-8      60394.2Ki ± 0%   162.8Ki ± 0%  -99.73% (p=0.002 n=6)
Collection_Query_NoContent_25000-8    301962.1Ki ± 0%   794.8Ki ± 0%  -99.74% (p=0.002 n=6)
Collection_Query_NoContent_100000-8   1179.510Mi ± 0%   3.057Mi ± 0%  -99.74% (p=0.002 n=6)
Collection_Query_100-8                1211.006Ki ± 0%   6.329Ki ± 0%  -99.48% (p=0.002 n=6)
Collection_Query_1000-8               12082.11Ki ± 0%   34.83Ki ± 0%  -99.71% (p=0.002 n=6)
Collection_Query_5000-8                60394.1Ki ± 0%   162.8Ki ± 0%  -99.73% (p=0.002 n=6)
Collection_Query_25000-8              301962.1Ki ± 0%   794.8Ki ± 0%  -99.74% (p=0.002 n=6)
Collection_Query_100000-8             1179.510Mi ± 0%   3.057Mi ± 0%  -99.74% (p=0.002 n=6)
geomean                                  49.13Mi        155.0Ki       -99.69%

                                    │     before     │               after               │
                                    │   allocs/op    │ allocs/op   vs base               │
Collection_Query_NoContent_100-8         238.00 ± 0%   44.00 ± 0%  -81.51% (p=0.002 n=6)
Collection_Query_NoContent_1000-8       2038.50 ± 0%   44.00 ± 0%  -97.84% (p=0.002 n=6)
Collection_Query_NoContent_5000-8      10039.00 ± 0%   44.00 ± 0%  -99.56% (p=0.002 n=6)
Collection_Query_NoContent_25000-8     50038.00 ± 0%   44.00 ± 0%  -99.91% (p=0.002 n=6)
Collection_Query_NoContent_100000-8   200038.00 ± 0%   44.00 ± 0%  -99.98% (p=0.002 n=6)
Collection_Query_100-8                   238.00 ± 0%   44.00 ± 0%  -81.51% (p=0.002 n=6)
Collection_Query_1000-8                 2038.00 ± 0%   44.00 ± 0%  -97.84% (p=0.002 n=6)
Collection_Query_5000-8                10038.00 ± 0%   44.00 ± 0%  -99.56% (p=0.002 n=6)
Collection_Query_25000-8               50038.00 ± 0%   44.00 ± 0%  -99.91% (p=0.002 n=6)
Collection_Query_100000-8             200038.50 ± 0%   44.00 ± 0%  -99.98% (p=0.002 n=6)
geomean                                  8.661k        44.00       -99.49%

Benchmarked on Framework Laptop 13 (first generation).

Benchmarked before the first commit of this PR, and after.

Benchmarked with: go test -benchmem -run=^$ -count 6 -bench . (a count of 6 because benchstat, which printed the diff shown ⬆️, asks for it).

Not relevant for a single query, but for concurrent ones.
For now we check whether a vector is normalized by computing its length.
In the future we could pass a flag if this is already known, which is
the case for many embedding models.
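
A minimal sketch of such a length-based check, for illustration only. The function name `isNormalized` and the tolerance of 1e-6 are assumptions made for this example, not taken from the chromem-go source:

```go
package main

import (
	"fmt"
	"math"
)

// isNormalized reports whether v has approximately unit length.
// Hypothetical helper; the 1e-6 tolerance is an assumption of this sketch.
func isNormalized(v []float32) bool {
	var sqSum float64
	for _, x := range v {
		sqSum += float64(x) * float64(x)
	}
	// The Euclidean length is the square root of the sum of squares.
	length := math.Sqrt(sqSum)
	return math.Abs(length-1) < 1e-6
}

func main() {
	fmt.Println(isNormalized([]float32{0.6, 0.8})) // true: length is 1
	fmt.Println(isNormalized([]float32{3, 4}))     // false: length is 5
}
```

The check costs one pass over the vector, which is why a caller-supplied flag could save work when the embedding model is known to return unit-length vectors.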
Greatly reduces the number of allocations: for a query over 5,000
documents, from ~5000 allocations to ~50.
The number of allocations is now also constant, i.e. 50 for querying
100,000 documents.
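
The text above does not say which change made the allocation count constant. One common way to get there, shown here as a sketch and not as the PR's actual code, is to allocate the result slice once, sized to the number of documents, and to store results as values instead of pointers. The `result` type and `scoreAll` function are hypothetical names:

```go
package main

import "fmt"

// result pairs a document index with its similarity score.
// Hypothetical type for this sketch.
type result struct {
	index      int
	similarity float32
}

// scoreAll scores every document against the query. It assumes each
// document vector has at least as many elements as the query.
func scoreAll(query []float32, docs [][]float32) []result {
	// One allocation for all results, regardless of len(docs).
	results := make([]result, len(docs))
	for i, d := range docs {
		var sum float32
		for j := range query {
			sum += query[j] * d[j]
		}
		// Writing a struct value into the slice does not allocate.
		results[i] = result{index: i, similarity: sum}
	}
	return results
}

func main() {
	docs := [][]float32{{1, 0}, {0, 1}}
	fmt.Println(scoreAll([]float32{1, 0}, docs))
}
```

With this shape, the number of allocations per query no longer depends on the number of documents; only the size of the single allocation does.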
- Normalizes only once instead of each time
- Embedding creation takes time anyway, while query should be as fast as possible
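
The idea behind the two points above can be sketched as follows: if every stored vector is normalized once when it is added, and the query vector once per query, cosine similarity reduces to a plain dot product. The names `normalize` and `dot` are assumptions for this illustration, and chromem-go's actual implementation may differ:

```go
package main

import (
	"fmt"
	"math"
)

// normalize returns a unit-length copy of v.
// A zero vector is returned as an unchanged copy.
func normalize(v []float32) []float32 {
	var sqSum float64
	for _, x := range v {
		sqSum += float64(x) * float64(x)
	}
	out := make([]float32, len(v))
	if sqSum == 0 {
		copy(out, v)
		return out
	}
	norm := float32(math.Sqrt(sqSum))
	for i, x := range v {
		out[i] = x / norm
	}
	return out
}

// dot returns the dot product of a and b, which equals their cosine
// similarity when both are unit length. It assumes len(b) >= len(a).
func dot(a, b []float32) float32 {
	var sum float32
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

func main() {
	// Normalize once, when the document embedding is stored.
	stored := normalize([]float32{3, 4})
	// Normalize the query embedding once per query.
	query := normalize([]float32{6, 8})
	// Per-document work at query time is then only the dot product.
	fmt.Printf("%.3f\n", dot(stored, query))
}
```

This moves the square root and division out of the per-document loop at query time and into the insertion path, where embedding creation already dominates the cost.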
@philippgille philippgille merged commit acb1e3f into main Mar 16, 2024
@philippgille philippgille deleted the query-perf branch March 16, 2024 18:29