DiskANN: Fast accurate billion-point nearest neighbor search on a single node

https://papers.nips.cc/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf

KNN:
- dataset P, target k, query q. Extract KNN of q from P quickly.
- Application domains: embeddings
- Curse of dimensionality, almost impossible to find exact search result (motivation for ANN)
ANN:
- Goal: maximize recall while retrieving the results as quickly as possible
- Trade-off: recall v.s latency (and/or indexing time)
- Algorithms
  - k-d trees
  - Locality Sensitive Hashing (LSH)
  - Graph-based: HNSW, NSG

Based on Inverted Index + Data Compression and includes methods such as FAISS and IVFOADC+G+P
1. Clustered + Compressed (e.x. PQ)
2. Benefit : small memory footprint, and low latency of retrieving result using accelerator
3. Problem : Recall is low because compression is lossy
Divide the dataset into disjoint shards, and build in-memory index for each shard
1. Problem : increased search latency and reduced throughput since the query needs to be routed to several shards

SOTA approximate nearest neighbor search (ANNS) algorithms generate indices that must be stored in main memory for fast high-recall search.
- Expensive, Limit the size of the dataset
- E.x. FAISS: supports searching only from RAM
Search throughput of an SSD-resident index is limited by the number of random disk accesses/query and latency is limited by the round-trips (each round-trip can consist of multiple reads) to the disk.
- SSD: a few hundred microseconds to serve a random read and can service about ~300K random reads per second
  - But: search applications require mean latencies of a few milliseconds for KNN search.
- Main challenges for SSD-resident index are
  - reducing the number of random SSD accesses to a few dozen
  - reducing the number of round trip requests to disk to under ten, preferably five
- Naively mapping indices to SSDs does not work! (Generate several hundreds of disk reads per query)

HNSW & NSG have no tunable parameter α (default value is 1). This is the main factor that Vamana achieves a better trade-off between graph degree and diameter.
Some features that help Vamana and NSG add long-range edges, while HNSW has an additional step of constructing a hierarchy of graphs over a nested sequence of samples of the dataset
Pertains to the initial graph: HNSW and Vamana have simpler initializations over NSG
Vamana makes two passes over the dataset to improve graph quality

How do we build a graph over a billion points?
How do we do distance comparisons between the query point and points in our candidate list at search time, if we cannot even store the vector data?

Address question 1
Key: send each base point to multiple nearby centers to obtain overlapping clusters
- partition dataset into K clusters using K-means
- Assign each base point to the l-closest centers (typically l = 2)
- Build Vamana indices for the points assigned to each of the clusters
- Merge all different graphs into a single graph by taking union of edges
Why good: overlapping nature of clusters provide connectivity for the algorithm to succeed even if the query's NN are split across multiple shards
Note: Use full-precision coordinates to build the graph index. Graph with full-precision vectors on SSD

Use PQ data at search time. Store compressed vectors of all the data points in memory.
- Beam Search
  - Natural way to search is to use Algorithm 1, fetching the neighborhood information from the SSD as needed. Use compressed vectors to guide the best vertices (and neighbors) to read from disk
  - To reduce the number of round trips to SSD (to fetch neighborhoods sequentially) without increasing compute (distance calculations) excessively, fetch the neighborhoods of a small number, W (say 4,8) of the closest points in L\V in one shot, and update L to be the top L candidates in L along with all the neighbors retrieved in this step.
  - Beam width: to strike a balance between latency and throughput

Cache frequently visited vertices
- Either known query distribution or all vertices that are 3/4 hops from the starting point
Fetching and re-ranking full-precision coordinates stored on the SSD
- Full precision coordinates essentially piggyback on the cost of expanding the neighborhoods

Dataset: SIFT1M, GIST1M, DEEP1M
Vamana matches or outperforms the current best ANNS methods on both hundred and thousand-dimensional datasets obtained from different sources

More suitable for SSD-based serving than other graph-based algorithms as it makes 2-3 times fewer hops for search to converge on large datasets compared to HNSW and NSG
- Hops = number of rounds of disk reads on the critical path of the search

Dataset: ANN_SIFT1B
Take away
- Partition and merging are fast and can be done directly on disk, so the entire build process consumes under 64 GB main memory
- Single index outperforms merged index, which traverses more links to reach the same neighborhoods, thus increasing search latency
- Merged index is still a good choice for Billion-scale k-ANN indexing
  - Require no more than 20% extra latency and target recall when compared to single index

IVFOADC+G+P
- Uses inverted indexing and PQ to develop indicies with low-memory footprint and serve queries with high-throughput and good 1-recall@100
- Not compared with FAISS: because need GPUs might not be available and https://arxiv.org/abs/1802.02422 demonstrate superior recall over FAISS (??)
DiskANN: matches memory footprint, achieve significantly higher recall at same latency

Thoughts

Single machine setting. Are these meaningful experimental comparisons in our case? It's a single node and it hasn't been compared with FAISS
Apply this ...

Last updated 4 years ago

Was this helpful?