Standalone, reproducible benchmark suite for comparing Basic Memory against competitor memory systems.
- Deterministic retrieval benchmarks (Recall@5/10, MRR, Precision@5, content-hit, latency)
- Optional LLM-as-judge scoring (Pydantic Evals)
- Public artifacts with provenance and reproducibility metadata
- Clean dependency isolation from the core basic-memory repository
- Providers:
  - bm-local (warm bm mcp stdio session)
  - bm-cloud (optional, credential-gated)
  - mem0-local
  - zep-reference (reference-only in v1)
- Datasets:
  - LoCoMo (primary)
  - LongMemEval scaffold (placeholder)
  - Built-in synthetic smoke corpus
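The retrieval metrics listed above follow standard definitions. A minimal sketch of Recall@k, Precision@k, and MRR is shown here for orientation; the function names are illustrative, and the suite's own implementation (for example, how it treats duplicate results or empty relevance sets) is not shown in this README and may differ.

```python
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    # Deduplicate the top-k so a repeated hit is not counted twice.
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the k result slots that hold a relevant document."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranked_results, relevant_ids) pairs."""
    scores = [reciprocal_rank(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0
```

Because these metrics depend only on ranked document ids and a fixed relevance set, repeated runs over the same corpus and queries yield the same scores.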
uv sync --group dev
Optional judge dependencies:
uv sync --group dev --extra judge
uv run bm-bench datasets fetch --dataset locomo
uv run bm-bench convert locomo
uv run bm-bench run retrieval \
  --providers bm-local,mem0-local \
  --corpus-dir benchmarks/generated/locomo/docs \
  --queries-path benchmarks/generated/locomo/queries.json
uv run bm-bench run judge --run-dir benchmarks/runs/<run-id>
uv run bm-bench publish --run-dir benchmarks/runs/<run-id>
By default this project tracks Basic Memory from main.
Each run manifest stores:
- BM source (github main or local path override)
- resolved BM commit SHA
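As a rough illustration of the provenance fields above, the sketch below resolves a commit SHA with `git rev-parse HEAD` and records it next to the source. The key names `bm_source` and `bm_commit_sha` and the function names are hypothetical; the actual layout of manifest.json is not documented here.

```python
import subprocess
from typing import Callable, Dict


def git_head_sha(repo_path: str) -> str:
    """Return the commit SHA currently checked out in repo_path."""
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def build_provenance(
    repo_path: str,
    local_override: bool,
    resolve_sha: Callable[[str], str] = git_head_sha,
) -> Dict[str, str]:
    """Build the provenance part of a run manifest (hypothetical key names)."""
    return {
        # Record the local path only when an override was requested.
        "bm_source": repo_path if local_override else "github main",
        "bm_commit_sha": resolve_sha(repo_path),
    }
```

Passing `resolve_sha` as a parameter keeps the function testable without a real checkout.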
Local override:
uv run bm-bench run retrieval \
  --bm-local-path /Users/phernandez/dev/basicmachines/basic-memory
mem0-local requires model credentials to be available in the environment.
At minimum, set:
export OPENAI_API_KEY=...
If unavailable, provider status will be recorded as SKIPPED(reason).
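The credential gate can be pictured as a simple environment check. This is a sketch under assumptions: the `REQUIRED_ENV` table, the `provider_status` function, and the `"OK"` value are hypothetical, and only the `SKIPPED(reason)` form and the `OPENAI_API_KEY` requirement come from the text above.

```python
import os
from typing import Dict, List, Mapping, Optional

# Hypothetical table of required variables per provider.
REQUIRED_ENV: Dict[str, List[str]] = {
    "mem0-local": ["OPENAI_API_KEY"],
}


def provider_status(provider: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return "OK" or "SKIPPED(<reason>)" depending on available credentials."""
    env = os.environ if env is None else env
    # An unset or empty variable counts as missing.
    missing = [name for name in REQUIRED_ENV.get(provider, []) if not env.get(name)]
    if missing:
        return f"SKIPPED(missing {', '.join(missing)})"
    return "OK"
```

Recording a skip instead of raising lets a multi-provider run finish and still explain why one provider has no results.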
bm-local verifies index readiness before querying.
- If the installed bm supports bm status --json, readiness is polled from that output.
- If --json is not available in the installed bm, the benchmark proceeds after reindex.
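The two readiness paths above could be sketched as a poll loop with a fallback. The JSON schema of `bm status --json` is not described in this README, so the readiness predicate is supplied by the caller; `read_bm_status`, `wait_until_ready`, and the timeout values are illustrative assumptions, not the suite's actual code.

```python
import json
import subprocess
import time
from typing import Any, Callable, Optional


def read_bm_status() -> Optional[Any]:
    """Run `bm status --json`; return parsed output, or None if unusable."""
    try:
        result = subprocess.run(
            ["bm", "status", "--json"], capture_output=True, text=True
        )
    except FileNotFoundError:
        return None  # bm is not installed or not on PATH
    if result.returncode != 0:
        return None  # e.g. the installed bm does not accept --json
    try:
        return json.loads(result.stdout)
    except json.JSONDecodeError:
        return None


def wait_until_ready(
    read_status: Callable[[], Optional[Any]],
    is_ready: Callable[[Any], bool],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until is_ready(status) holds; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = read_status()
        if status is None:
            # Fallback path: no JSON status available, proceed after reindex.
            return True
        if is_ready(status):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

A caller would pass `read_bm_status` together with a predicate matching whatever fields the installed bm actually reports.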
Per run (benchmarks/runs/<run-id>/):
- manifest.json
- provider-status.json
- per-query-retrieval.jsonl
- retrieval-summary.json
- per-query-judge.jsonl (optional)
- judge-summary.json (optional)
- summary.md
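A quick way to sanity-check a run directory against the artifact list above is sketched here. It assumes every file not marked optional is required, which the README implies but does not state; `check_run_dir` is an illustrative helper, not part of bm-bench.

```python
from pathlib import Path
from typing import Dict, List

# File names taken from the artifact list; the required/optional split is an assumption.
REQUIRED_ARTIFACTS = [
    "manifest.json",
    "provider-status.json",
    "per-query-retrieval.jsonl",
    "retrieval-summary.json",
    "summary.md",
]
OPTIONAL_ARTIFACTS = ["per-query-judge.jsonl", "judge-summary.json"]


def check_run_dir(run_dir: str) -> Dict[str, List[str]]:
    """Report missing required artifacts and present optional ones."""
    root = Path(run_dir)
    return {
        "missing_required": [
            name for name in REQUIRED_ARTIFACTS if not (root / name).is_file()
        ],
        "present_optional": [
            name for name in OPTIONAL_ARTIFACTS if (root / name).is_file()
        ],
    }
```

An empty `missing_required` list indicates the retrieval stage wrote everything expected; the judge files appear only if the judge step was run.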
just bench-smoke
just bench-fetch-locomo
just bench-convert-locomo
just bench-run-bm-local
just bench-run-mem0-local
just bench-run-full
just bench-judge
just bench-publish RUN_DIR=benchmarks/runs/<run-id>
Dataset publication follows licensing constraints:
- If redistribution is permitted: snapshot + checksum may be published.
- If not: canonical source links + downloader + checksum verification are published.
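Checksum verification of a downloaded dataset can be sketched as follows. SHA-256 is assumed here because the README does not name the hash algorithm, and the function names are illustrative.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published checksum (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use flat for large dataset archives.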