recall

In ANN search, the fraction of the true top-k nearest neighbors an approximate algorithm returns; the quality axis that trades off against latency and memory.

also known as recall@k · recall-at-k

stack application

Recall (specifically, recall@k) is the quality metric for approximate nearest-neighbor search. Given a ground-truth exact top-k set for a query and the top-k set returned by the ANN algorithm, recall is the fraction of the ground-truth set that appears in the returned set. Recall@10 = 0.95 means the ANN algorithm found 9.5 of the 10 true nearest neighbors on average across the eval set.
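The computation is a one-liner; a minimal sketch (names are illustrative, not from any library):

```python
def recall_at_k(ground_truth: list[int], retrieved: list[int], k: int) -> float:
    """Fraction of the exact top-k ids that appear in the ANN's top-k."""
    return len(set(ground_truth[:k]) & set(retrieved[:k])) / k

exact = [3, 7, 1, 9, 4, 0, 8, 2, 6, 5]    # true top-10 ids for one query
approx = [3, 7, 1, 9, 4, 0, 8, 2, 6, 11]  # ANN missed id 5, returned 11 instead
recall_at_k(exact, approx, 10)            # -> 0.9
```

Averaging this per-query score over every query in the eval set gives the reported recall@k.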

Recall is the quality axis on which ANN methods live or die. Exact brute force has recall 1.0 but is prohibitively slow at scale. Any approximation trades some recall for enormous speed and memory wins. Production RAG workflows typically target recall@10 in the 0.90–0.98 range: anything lower visibly degrades downstream relevance; anything higher costs far more latency or memory for diminishing returns.

How recall depends on index parameters:

  • HNSW: recall grows with ef (search beam width). Higher ef = more graph nodes visited per query = higher recall and latency.
  • IVF flat: recall grows with nprobe (number of inverted lists visited).
  • Quantization: lossy compression caps achievable recall below the unquantized index no matter how much search effort is spent; the ceiling depends on the technique and bit budget.
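The nprobe effect can be demonstrated end to end with a toy IVF-flat index in NumPy — a few Lloyd steps of k-means to build the inverted lists, then scan only the nprobe closest cells. This is a sketch on synthetic data; real libraries (e.g. FAISS) implement this far more efficiently:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n, k, nlist = 32, 2000, 10, 16
corpus = rng.normal(size=(n, dim)).astype(np.float32)
queries = rng.normal(size=(50, dim)).astype(np.float32)

# Exact top-k per query by brute force (the ground truth).
d2 = ((queries[:, None, :] - corpus[None, :, :]) ** 2).sum(-1)
gt = np.argsort(d2, axis=1)[:, :k]

# Crude k-means to partition the corpus into nlist inverted lists.
centroids = corpus[rng.choice(n, nlist, replace=False)].copy()
for _ in range(5):
    assign = ((corpus[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(1)
    for c in range(nlist):
        if (assign == c).any():
            centroids[c] = corpus[assign == c].mean(0)
assign = ((corpus[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(1)
lists = [np.flatnonzero(assign == c) for c in range(nlist)]

def ivf_search(q, nprobe):
    # Scan only the nprobe cells whose centroids are closest to q.
    cells = ((centroids - q) ** 2).sum(-1).argsort()[:nprobe]
    cand = np.concatenate([lists[c] for c in cells])
    return cand[((corpus[cand] - q) ** 2).sum(-1).argsort()[:k]]

def mean_recall(nprobe):
    return sum(len(set(ivf_search(q, nprobe)) & set(g)) / k
               for q, g in zip(queries, gt)) / len(queries)

# mean_recall(1) < mean_recall(4) < mean_recall(16): more cells probed,
# more candidates scanned -> higher recall, and higher latency.
```

At nprobe = nlist every cell is scanned, so the search is exhaustive and recall returns to 1.0 — the approximation is entirely in how many cells are skipped.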

How it’s measured: on a held-out ground-truth set (true top-k computed by brute force), run the ANN algorithm and compute set intersection. ANN-Benchmarks standardizes this across datasets and indexes; its recall-vs-QPS curves are the canonical way to compare methods.
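The measurement loop itself is small. A sketch of the harness shape (function names are illustrative; `ann_search` stands in for whatever index is under test and is assumed to return candidate ids for a query):

```python
import numpy as np

def exact_topk(corpus, queries, k):
    """Ground truth: full pairwise squared L2 distances, then argpartition."""
    d2 = ((queries[:, None, :] - corpus[None, :, :]) ** 2).sum(-1)
    return np.argpartition(d2, k - 1, axis=1)[:, :k]

def eval_recall(ann_search, corpus, queries, k):
    """Mean recall@k of ann_search over the held-out query set."""
    gt = exact_topk(corpus, queries, k)
    hits = [len(set(ann_search(q, k)) & set(g)) / k
            for q, g in zip(queries, gt)]
    return float(np.mean(hits))
```

Plugging an exact brute-force search in as `ann_search` should return 1.0, which is a useful sanity check before benchmarking a real index.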

The trap to avoid: reporting a single recall number without the latency or memory it came with. “HNSW gets 0.98 recall” is meaningless without the ef it took to get there and the p99 latency at that ef. Always report the three-axis trade-off curve.

sources