DiskANN: Fast accurate billion-point nearest neighbor search on a single node


Intro & Problem Statement

  • KNN:

    • dataset P, target k, query q. Extract KNN of q from P quickly.

    • Application domains: embeddings

    • Curse of dimensionality, almost impossible to find exact search result (motivation for ANN)

  • ANN:

    • Goal: maximize recall while retrieving the results as quickly as possible

    • Trade-off: recall v.s latency (and/or indexing time)

    • Algorithms

      • k-d trees

      • Locality Sensitive Hashing (LSH)

      • Graph-based: HNSW, NSG

Indexing large datasets: two approaches

  1. Based on Inverted Index + Data Compression and includes methods such as FAISS and IVFOADC+G+P

    1. Clustered + Compressed (e.x. PQ)

    2. Benefit : small memory footprint, and low latency of retrieving result using accelerator

    3. Problem : Recall is low because compression is lossy

  2. Divide the dataset into disjoint shards, and build in-memory index for each shard

    1. Problem : increased search latency and reduced throughput since the query needs to be routed to several shards


  • SOTA approximate nearest neighbor search (ANNS) algorithms generate indices that must be stored in main memory for fast high-recall search.

    • Expensive, Limit the size of the dataset

    • E.x. FAISS: supports searching only from RAM

  • Search throughput of an SSD-resident index is limited by the number of random disk accesses/query and latency is limited by the round-trips (each round-trip can consist of multiple reads) to the disk.

    • SSD: a few hundred microseconds to serve a random read and can service about ~300K random reads per second

      • But: search applications require mean latencies of a few milliseconds for KNN search.

    • Main challenges for SSD-resident index are

      • reducing the number of random SSD accesses to a few dozen

      • reducing the number of round trip requests to disk to under ten, preferably five

    • Naively mapping indices to SSDs does not work! (Generate several hundreds of disk reads per query)


  • DiskANN: SSD-resident ANNS system based on new graph-based indexing algorithm called Vamana

    • DiskANN: index and serve 1B dataset (d = 100s) using 64GB RAM, with low latency and high precision

    • Vamana:

      • generate graph indices with smaller diameter to minimize the number of sequential disk reads

      • Used in memory, outperform SOTA algorithms like HNSW and NSG

      • Partitions and merge

      • Compression: cache in RAM

      • Note: what's the contribution of last two...?

Vamana Graph Construction Algorithm

Comparison to HNSW and NSG

  • HNSW & NSG have no tunable parameter α (default value is 1). This is the main factor that Vamana achieves a better trade-off between graph degree and diameter.

  • Some features that help Vamana and NSG add long-range edges, while HNSW has an additional step of constructing a hierarchy of graphs over a nested sequence of samples of the dataset

  • Pertains to the initial graph: HNSW and Vamana have simpler initializations over NSG

  • Vamana makes two passes over the dataset to improve graph quality

DiskANN System Design

Key questions to address

  1. How do we build a graph over a billion points?

  2. How do we do distance comparisons between the query point and points in our candidate list at search time, if we cannot even store the vector data?

Index construction algorithms

  • Address question 1

  • Key: send each base point to multiple nearby centers to obtain overlapping clusters

    • partition dataset into K clusters using K-means

    • Assign each base point to the l-closest centers (typically l = 2)

    • Build Vamana indices for the points assigned to each of the clusters

    • Merge all different graphs into a single graph by taking union of edges

  • Why good: overlapping nature of clusters provide connectivity for the algorithm to succeed even if the query's NN are split across multiple shards

  • Note: Use full-precision coordinates to build the graph index. Graph with full-precision vectors on SSD

Search algorithms

  • Use PQ data at search time. Store compressed vectors of all the data points in memory.

    • Beam Search

      • Natural way to search is to use Algorithm 1, fetching the neighborhood information from the SSD as needed. Use compressed vectors to guide the best vertices (and neighbors) to read from disk

      • To reduce the number of round trips to SSD (to fetch neighborhoods sequentially) without increasing compute (distance calculations) excessively, fetch the neighborhoods of a small number, W (say 4,8) of the closest points in L\V in one shot, and update L to be the top L candidates in L along with all the neighbors retrieved in this step.

      • Beam width: to strike a balance between latency and throughput


  • Cache frequently visited vertices

    • Either known query distribution or all vertices that are 3/4 hops from the starting point

  • Fetching and re-ranking full-precision coordinates stored on the SSD

    • Full precision coordinates essentially piggyback on the cost of expanding the neighborhoods

Metric for success

  • High recall

  • Low query latency


With HNSW, NSG for In-memory search performance

  • Dataset: SIFT1M, GIST1M, DEEP1M

  • Vamana matches or outperforms the current best ANNS methods on both hundred and thousand-dimensional datasets obtained from different sources

With HNSW, NSG for number of hops

  • More suitable for SSD-based serving than other graph-based algorithms as it makes 2-3 times fewer hops for search to converge on large datasets compared to HNSW and NSG

    • Hops = number of rounds of disk reads on the critical path of the search

Billion-scale datasets: one-shot Vamana v.s Merged Vamana

  • Dataset: ANN_SIFT1B

  • Take away

    • Partition and merging are fast and can be done directly on disk, so the entire build process consumes under 64 GB main memory

    • Single index outperforms merged index, which traverses more links to reach the same neighborhoods, thus increasing search latency

    • Merged index is still a good choice for Billion-scale k-ANN indexing

      • Require no more than 20% extra latency and target recall when compared to single index

Billion-scale datasets: DiskANN v.s IVF-based Methods


    • Uses inverted indexing and PQ to develop indicies with low-memory footprint and serve queries with high-throughput and good 1-recall@100

    • Not compared with FAISS: because need GPUs might not be available and https://arxiv.org/abs/1802.02422 demonstrate superior recall over FAISS (??)

  • DiskANN: matches memory footprint, achieve significantly higher recall at same latency


  • Single machine setting. Are these meaningful experimental comparisons in our case? It's a single node and it hasn't been compared with FAISS

  • Apply this ...

Last updated