ANN search
Approximate Nearest Neighbor search — algorithms that return near-best matches from a vector set while trading recall for latency and memory; the backbone of modern semantic search and RAG.
Approximate Nearest Neighbor (ANN) search is the class of algorithms that find close-enough nearest neighbors in a set of high-dimensional vectors far faster than exact brute-force search. Exact nearest-neighbor search over a billion 768-dimensional vectors is infeasible at interactive latencies; ANN search trades a small amount of recall for orders-of-magnitude speedups, which is what makes semantic search, retrieval-augmented generation (RAG), recommender systems, and image/audio similarity practical.
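To make the cost argument concrete, here is what the exact baseline looks like: a minimal NumPy brute-force k-NN sketch (toy sizes, Euclidean distance assumed) that computes one distance per database vector, O(N·d) work per query. This per-query full scan is exactly what ANN indexes avoid.

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.standard_normal((10_000, 64)).astype(np.float32)  # toy vector set
q = rng.standard_normal(64).astype(np.float32)             # query vector

# Exact search touches every vector: O(N * d) work per query.
# At N = 1e9 and d = 768 that is on the order of 10^12 float ops per query.
dists = np.linalg.norm(db - q, axis=1)
top5 = np.argsort(dists)[:5]  # indices of the 5 true nearest neighbors
```

The `argsort` over all N distances is what "exact" buys you: guaranteed recall of 1.0, at a cost that scales linearly with the corpus.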
The main families:
- Graph-based: HNSW, NSG, Vamana (the algorithm behind DiskANN). The default choice for in-memory indexes as of 2026.
- Tree-based: Annoy (random projection trees), FLANN. Less common today.
- Inverted-file / clustering: IVF flat, IVF-PQ. Often hybridized with HNSW.
- Hash-based: LSH. Still useful for specific distance metrics, rarely first-line.
- Quantization + flat: product quantization (PQ), scalar quantization (SQ), and binary quantization, used alongside any structural index to shrink memory.
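As a concrete instance of the inverted-file family, here is a minimal pure-NumPy IVF-flat sketch (toy sizes, hypothetical names like `ivf_search`, no quantization): a few k-means iterations produce coarse centroids, and a query then scans only the `nprobe` nearest clusters instead of all N vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, k = 5_000, 32, 32            # toy sizes: vectors, dims, clusters
db = rng.standard_normal((N, d)).astype(np.float32)

# Train: a few Lloyd iterations of k-means give the coarse centroids.
cent = db[rng.choice(N, k, replace=False)].copy()
for _ in range(5):
    assign = np.argmin(((db[:, None, :] - cent[None]) ** 2).sum(-1), axis=1)
    for c in range(k):
        members = db[assign == c]
        if len(members):
            cent[c] = members.mean(0)
# Final assignment against the final centroids.
assign = np.argmin(((db[:, None, :] - cent[None]) ** 2).sum(-1), axis=1)

# Build the inverted lists: vector ids grouped by nearest centroid.
inv_lists = [np.where(assign == c)[0] for c in range(k)]

def ivf_search(q, nprobe=4, topk=5):
    """Scan only the nprobe clusters closest to q, not the whole database."""
    probe = np.argsort(((cent - q) ** 2).sum(-1))[:nprobe]
    cand = np.concatenate([inv_lists[c] for c in probe])
    d2 = ((db[cand] - q) ** 2).sum(-1)
    return cand[np.argsort(d2)[:topk]]
```

Raising `nprobe` trades latency for recall, which is the same knob FAISS exposes on its IVF indexes; real systems add PQ compression inside each list.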
ANN systems are evaluated on three axes, and in practice you can optimize at most two at once:
- Recall — fraction of true-top-k returned (target: 0.9+ for production RAG).
- Latency — p50/p99 query time (target: sub-10 ms for interactive).
- Memory — bytes per vector including index overhead (target: small enough to keep the index RAM-resident).
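Recall, the first axis above, is always measured against exact ground truth. A minimal sketch of recall@k (toy data with planted near neighbors; a 16-dim random projection stands in for any approximate method):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 128, 10
db = rng.standard_normal((5_000, d)).astype(np.float32)
q = rng.standard_normal(d).astype(np.float32)
# Plant 20 genuine near neighbors so the toy data has real structure.
db[:20] = q + 0.1 * rng.standard_normal((20, d)).astype(np.float32)

# Ground truth: exact top-k by brute force in the full space.
true_ids = set(np.argsort(((db - q) ** 2).sum(-1))[:k])

# Stand-in "ANN": brute force in a cheap 16-dim random projection.
P = rng.standard_normal((d, 16)).astype(np.float32) / np.sqrt(16)
approx_ids = set(np.argsort(((db @ P - q @ P) ** 2).sum(-1))[:k])

# recall@k: fraction of the true top-k that the approximate method returned.
recall = len(true_ids & approx_ids) / k
```

Benchmarks such as ANN-Benchmarks sweep an index parameter (e.g. `nprobe` or HNSW's `ef`) and plot this recall against queries per second.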
Operational truth: there is no single best ANN index. The right choice depends on dataset size, recall SLO, query rate, and whether you can afford to keep the index DRAM-resident. ANN-Benchmarks is the canonical reproducible comparison site; the FAISS wiki has the best practical advice for choosing between IVF variants and HNSW.