ANN search

Approximate Nearest Neighbor search — algorithms that return near-best matches from a vector set while trading recall for latency and memory; the backbone of modern semantic search and RAG.

also known as ann · approximate-nearest-neighbor · vector-search

stack application · memory

Approximate Nearest Neighbor (ANN) search is the class of algorithms that find close-enough nearest neighbors in a set of high-dimensional vectors much faster than exact brute-force search. Exact nearest-neighbor search over a billion 768-dim vectors is hopeless at query time; ANN search trades a small amount of recall for orders-of-magnitude speedup, which is what makes semantic search, retrieval-augmented generation (RAG), recommender systems, and image/audio similarity practical.
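To make the baseline concrete, here is a minimal brute-force exact k-NN sketch (NumPy assumed; `exact_knn` is an illustrative name, not a library call). Every ANN index exists to avoid this O(queries × vectors × dims) scan:

```python
import numpy as np

def exact_knn(queries, database, k=5):
    """Brute-force exact k nearest neighbors over squared Euclidean distance.

    Cost is O(n_queries * n_vectors * dim) — the scan ANN indexes avoid.
    Cosine similarity reduces to this on L2-normalized vectors.
    """
    # (q, n) squared distances via the expansion |a-b|^2 = |a|^2 - 2ab + |b|^2
    d2 = (
        (queries ** 2).sum(axis=1, keepdims=True)
        - 2.0 * queries @ database.T
        + (database ** 2).sum(axis=1)
    )
    # argpartition pulls out the k smallest without a full sort...
    idx = np.argpartition(d2, k, axis=1)[:, :k]
    # ...then only the k candidates per query are sorted by distance
    row = np.arange(len(queries))[:, None]
    order = np.argsort(d2[row, idx], axis=1)
    return idx[row, order]

rng = np.random.default_rng(0)
db = rng.standard_normal((10_000, 64)).astype(np.float32)
q = rng.standard_normal((8, 64)).astype(np.float32)
neighbors = exact_knn(q, db, k=5)  # shape (8, 5), nearest first
```

At 10,000 × 64 this is instant; at a billion × 768 the same matrix product is the query-time bottleneck the rest of this entry is about.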

The main families:

  • Graph-based: HNSW, NSG, Vamana (the algorithm behind DiskANN). The 2026 default for in-memory indexes.
  • Tree-based: Annoy (random projection trees), FLANN. Less common today.
  • Inverted-file / clustering: IVF-Flat, IVF-PQ. Often hybridized with HNSW.
  • Hash-based: LSH. Still useful for specific distance metrics, rarely first-line.
  • Quantization + flat: PQ, SQ, binary quantization used alongside any structural index to shrink memory.
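Of these families, the inverted-file idea is the easiest to show end to end. A minimal IVF-Flat sketch, assuming NumPy and a toy Lloyd's k-means (real systems use FAISS or similar; all names here are illustrative): cluster the database into cells at build time, then at query time scan only the `nprobe` cells whose centroids are closest to the query.

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.standard_normal((20_000, 32)).astype(np.float32)

def sq_dists(x, c):
    # pairwise squared Euclidean distances, (len(x), len(c))
    return (x ** 2).sum(1, keepdims=True) - 2.0 * x @ c.T + (c ** 2).sum(1)

# --- build: partition the database into nlist cells (a few Lloyd iterations) ---
nlist = 64
centroids = db[rng.choice(len(db), nlist, replace=False)].copy()
for _ in range(5):
    assign = sq_dists(db, centroids).argmin(1)
    for j in range(nlist):
        members = db[assign == j]
        if len(members):
            centroids[j] = members.mean(0)
assign = sq_dists(db, centroids).argmin(1)

# inverted lists: the vector ids stored in each cell
lists = [np.where(assign == j)[0] for j in range(nlist)]

# --- search: rank cells by centroid distance, scan only the nprobe closest ---
def ivf_search(q, k=5, nprobe=8):
    cells = sq_dists(q[None], centroids)[0].argsort()[:nprobe]
    cand = np.concatenate([lists[c] for c in cells])
    d2 = ((db[cand] - q) ** 2).sum(1)
    return cand[d2.argsort()[:k]]

q = rng.standard_normal(32).astype(np.float32)
approx = ivf_search(q, k=5, nprobe=8)  # scans ~1/8 of the database
```

`nprobe` is the recall/latency knob: at `nprobe = nlist` the search degenerates to an exact scan, and small values miss true neighbors that fall just outside the probed cells. IVF-PQ adds product quantization to the stored vectors so the scanned cells also fit in far less memory.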

ANN systems are evaluated on three axes; in practice you can optimize at most two at the expense of the third:

  • Recall — fraction of the true top-k returned (target: 0.9+ for production RAG).
  • Latency — p50/p99 query time (target: sub-10 ms for interactive).
  • Memory — bytes per vector including index overhead (target: small enough to keep the index RAM-resident).
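Recall is the one axis that needs a ground truth to measure: run exact search offline, then compare. A small sketch (`recall_at_k` is an illustrative helper, not a standard API; order within the top-k does not matter):

```python
import numpy as np

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that the ANN index returned.

    Both arguments are (n_queries, k) arrays of neighbor ids; recall is
    averaged over all queries, ignoring rank order within each top-k.
    """
    hits = [len(set(a) & set(e)) for a, e in zip(approx_ids, exact_ids)]
    return sum(hits) / approx_ids.size

exact = np.array([[1, 2, 3, 4, 5], [10, 11, 12, 13, 14]])   # ground truth
approx = np.array([[1, 2, 3, 9, 5], [10, 11, 12, 13, 14]])  # ANN results
recall_at_k(approx, exact)  # → 0.9 (9 of 10 true neighbors found)
```

This is the recall@k metric ANN-Benchmarks plots against queries-per-second when comparing indexes.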

Operational truth: there is no single best ANN index. The right choice depends on dataset size, recall SLO, query rate, and whether you can afford DRAM residency. ANN-Benchmarks is the canonical reproducible comparison site; the FAISS wiki has the best practical advice for choosing between IVF variants and HNSW.

sources