🐣
Reading List
  • Starting point
  • Reference list
  • PhD application guidelines
  • Big Data System
    • Index
      • Architecture
        • Storage
          • Sun's Network File System (NFS)
      • Execution Engine, Resource Negotiator, Schedulers
        • Execution Engines
        • Resource Negotiator
        • Schedulers
      • Machine Learning
      • SQL Framework
      • Stream Processing
      • Graph Processing
      • Potpourri: Hardware, Serverless and Approximation
  • Operating System
    • Index
      • OSTEP
        • Virtualization
          • CPU Abstraction: the Process
          • Interlude: Process API
          • Mechanism: Limited Direct Execution
        • Intro
  • Networking
    • Index
      • CS 294 (Distributed System)
        • Week 1 - Global State and Clocks
          • Distributed Snapshots: Determining Global States of Distributed Systems
          • Time, Clocks, and the Ordering of Events in a Distributed System
        • Weak 5 - Weak Consistency
          • Dynamo: Amazon's Highly Available Key-value Store
          • Replicating Data Consistency Explained Through Baseball
          • Managing update conflicts in Bayou, a weakly connected replicated storage system
      • CS 268 (Adv Network)
        • Intro
        • Internet Architecture
          • Towards an Active Network Architecture
          • The Design Philosophy of the DARPA Internet Protocols
        • Beyond best-effort/Unicast
          • Core Based Trees (CBT)
          • Multicast Routing in Internetworks and Extended LANs
        • Congestion Control
        • SDN
          • ONIX: A Distributed Control Platform for Large-scale Production Networks
          • B4: Experience with a Globally-Deployed Software Defined WAN
          • How SDN will shape networking
          • The Future of Networking, and the Past of Protocols
        • Datacenter Networking
          • Fat tree
          • Jellyfish
        • BGP
          • The Case for Separating Routing from Routers
        • Programmable Network
          • NetCache
          • RMT
        • Datacenter Congestion Control
          • Swift
          • pFabric
        • WAN CC
          • Starvation (Sigcomm 22)
        • P2P
          • Design and Evaluation of IPFS: A Storage Layer for the Decentralized Web
          • The Impact of DHT Routing Geometry on Resilience and Proximity
        • Net SW
          • mTCP
          • The Click modular router
        • NFV
          • Performance Interfaces for Network Functions
          • Making Middleboxes Someone Else's Problem: Network Processing as a Cloud Service
        • Ethics
          • On the morals of network research and beyond
          • The collateral damage of internet censorship by DNS injection
          • Encore: Lightweight Measurement of Web Censorship with Cross-Origin Requests
        • Low Latency
          • Aquila: A unified, low-latency fabric for datacenter networks
          • cISP: A Speed-of-Light Internet Service Provider
        • Disaggregation
          • Network Requirements for Resource Disaggregation
        • Tenant Networking
          • Invisinets
          • NetHint: While-Box Networking for Multi-Tenant Data Centers
        • Verification
          • A General Approach to Network Configuration Verification
          • Header Space Analysis: Static Checking for Networks
        • ML
          • SwitchML
          • Fast Distributed Deep Learning over RDMA
      • Computer Networking: A Top-Down Approach
        • Chapter 1. Computer Network and the Internet
          • 1.1 What Is the Internet?
          • 1.2 The Network Edge
          • 1.3 The Network Core
        • Stanford CS144
          • Chapter 1
            • 1.1 A Day in the Life of an Application
            • 1.2 The 4-Layer Internet Model
            • 1.3 The IP Service Model
            • 1.4 A Day in the Life of a Packet
            • 1.6 Layering Principle
            • 1.7 Encapsulation Principle
            • 1.8 Memory layout and Endianness
            • 1.9 IPv4 Addresses
            • 1.10 Longest Prefix Match
            • 1.11 Address Resolution Protocol (ARP)
            • 1.12 The Internet and IP Recap
      • Reading list
        • Elastic hyperparameter tuning on the cloud
        • Rethinking Networking Abstractions for Cloud Tenants
        • Democratizing Cellular Access with AnyCell
        • Dagger: Efficient and Fast RPCs in Cloud Microservices in Near-Memory Reconfigurable NICs
        • Sage: Practical & Scalable ML-Driven Performance Debugging in Microservices
        • Faster and Cheaper Serverless Computing on Harvested Resources
        • Network-accelerated Distributed Machine Learning for Multi-Tenant Settings
        • User-Defined Cloud
        • LegoOS: A Disseminated Distributed OS for Hardware Resource Disaggregation
        • Beyond Jain's Fairness Index: Setting the Bar For The Deployment of Congestion Control Algorithms
        • IncBricks: Toward In-Network Computation with an In-Network Cache
  • Persistence
    • Index
      • Hardware
        • Enhancing Lifetime and Security of PCM-Based Main Memory with Start-Gap Wear Leveling
        • An Empirical Guide to the Behavior and Use of Scalable Persistent Memory
  • Database
    • Index
  • Group
    • WISR Group
      • Group
        • Offloading distributed applications onto smartNICs using iPipe
        • Semeru: A memory-disaggregated managed runtime
      • Cache
        • Index
          • TACK: Improving Wireless Transport Performance by Taming Acknowledgements
          • LHD: Improving Cache Hit Rate by Maximizing Hit Density
          • AdaptSize: Orchestrating the Hot Object Memory Cache in a Content Delivery Network
          • Clustered Bandits
          • Important Sampling
          • Contexual Bandits and Reinforcement Learning
          • Reinforcement Learning for Caching with Space-Time Popularity Dynamics
          • Hyperbolic Caching: Flexible Caching for Web Applications
          • Learning Cache Replacement with CACHEUS
          • Footprint Descriptors: Theory and Practice of Cache Provisioning in a Global CDN
      • Hyperparam Exploration
        • Bayesian optimization in cloud machine learning engine
    • Shivaram's Group
      • Tools
      • Group papers
        • PushdownDB: Accelerating a DBMS using S3 Computation
        • Declarative Machine Learning Systems
        • P3: Distributed Deep Graph Learning at Scale
        • Accelerating Graph Sampling for Graph Machine Learning using GPUs
        • Unicorn: A System for Searching the Social Graph
        • Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless
        • Garaph: Efficient GPU-accelerated GraphProcessing on a Single Machine with Balanced Replication
        • MOSAIC: Processing a Trillion-Edge Graph on a Single Machine
        • Fluid: Resource-aware Hyperparameter Tuning Engine
        • Lists
          • Wavelet: Efficient DNN Training with Tick-Tock Scheduling
          • GPU Lifetimes on Titan Supercomputer: Survival Analysis and Reliability
          • ZeRO-Infinity and DeepSpeed: Unlocking unprecedented model scale for deep learning training
          • ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning
          • KungFu: Making Training inDistributed Machine Learning Adaptive
        • Disk ANN
      • Queries Processing
        • Building An Elastic Query Engine on Disaggregated Storage
        • GRIP: Multi-Store Capacity-Optimized High-Performance NN Search
        • Milvus: A Purpose-Built Vector Data Management System
        • Query2box: Reasoning over Knowledge Graphs in Vector Space using Box Embeddings
        • Billion-scale Approximate Nearest Neighbor Search
        • DiskANN: Fast accurate billion-point nearest neighbor search on a single node
        • KGvec2go - Knowledge Graph Embeddings as a Service
    • Seminar & Talk
      • Berkeley System Seminar
        • RR: Engineering Record and Replay for Deployability
        • Immortal Threads: Multithreaded Event-driven Intermittent Computing on Ultra-Low-Power Microcontroll
      • Berkeley DB Seminar
        • TAOBench: An End-to-End Benchmark for Social Network Workloads
      • PS2
      • Sky Seminar Series
        • Spring 23
          • Next-Generation Optical Networks for Emerging ML Workloads
      • Reading List
        • Confluo: Distributed Monitoring and Diagnosis Stack for High-speed Networks
        • Rearchitecting Linux Storage Stack for µs Latency and High Throughput
        • eBPF: rethinking the linux kernel
        • BPF for Storage: An Exokernel-Inspired Approach
        • High Velocity Kernel File Systems with Bento
        • Incremental Path Towards a Safe OS Kernel
        • Toward Reconfigurable Kernel Datapaths with Learned Optimizations
        • A Vision for Runtime Programmable Networks
        • The Demikernel and the future of kernal-bypass systems
        • Floem: A programming system for NIC-accelerated network applications
        • High Performance Data Center Operating Systems
        • Leveraging Service Meshes as a New Network Layer
        • Automatically Discovering Machine Learning Optimizations
        • Beyond Data and Model Parallelism for Deep Neural Networks
        • IOS: Inter-Operator Scheduler for CNN Acceleration
        • Building An Elastic Query Engine on Disaggregated Storage
        • Sundial: Fault-tolerant Clock Synchronization for Datacenters
        • MIND: In-Network Memory Management for Disaggregated Data Centers
        • Understanding host network stack overheads
        • From Laptop to Lambda: Outsourcing Everyday Jobs to Thousands of Transient Functional Containers
        • Redesigning Storage Systems for Future Workloads Hardware and Performance Requirements
        • Are Machine Learning Cloud APIs Used Correctly?
        • Fault-tolerant and transactional stateful serverless workflows
      • Reading Groups
        • Network reading group
          • Recap
          • ML & Networking
            • Video Streaming
              • Overview
              • Reducto: On-Camera Filtering for Resource Efficient Real-Time Video Analytics
              • Learning in situ: a randomized experiment in video streaming
              • SENSEI: Aligning Video Streaming Quality with Dynamic User Sensitivity
              • Neural Adaptive Video Streaming with Pensieve
              • Server-Driven Video Streaming for Deep Learning Inference
            • Congestion Control
              • ABC: A Simple Explicit Congestion Controller for Wireless Networks
              • TCP Congestion Control: A Systems Approach
                • Chapter 1: Introduction
              • A Deep Reinforcement Learning Perspective on Internet Congestion Control
              • Pantheon: the training ground for Internet congestion-control research
            • Other
              • On the Use of ML for Blackbox System Performance Prediction
              • Marauder: Synergized Caching and Prefetching for Low-Risk Mobile App Acceleration
              • Horcrux: Automatic JavaScript Parallelism for Resource-Efficient Web Computation
              • Snicket: Query-Driven Distributed Tracing
            • Workshop
          • Homa: A Receiver-Driven Low-Latency Transport Protocol Using Network Priorities
        • DB reading group
          • CliqueMap: Productionizing an RMA-Based Distributed Caching System
          • Hash maps overview
          • Dark Silicon and the End of Multicore Scaling
        • WISR
          • pFabric: Minimal Near-Optimal Datacenter Transport
          • Scaling Distributed Machine Learning within-Network Aggregation
          • WCMP: Weighted Cost Multipathing for Improved Fairness in Data Centers
          • Data center TCP (DCTCP)
      • Wisconsin Seminar
        • Enabling Hyperscale Web Services
        • The Lottery Ticket Hypothesis
        • External Merge Sort for Top-K Queries: Eager input filtering guided by histograms
      • Stanford MLSys Seminar
        • Episode 17
        • Episode 18
  • Cloud Computing
    • Index
      • Cloud Reading Group
        • Owl: Scale and Flexibility in Distribution of Hot Contents
        • RubberBand: cloud-based hyperparameter tuning
  • Distributed System
    • Distributed Systems Lecture Series
      • 1.1 Introduction
  • Conference
    • Index
      • Stanford Graph Learning Workshop
        • Overview of Graph Representation Learning
      • NSDI 2022
      • OSDI 21
        • Graph Embeddings and Neural Networks
        • Data Management
        • Storage
        • Preview
        • Optimizations and Scheduling for ML
          • Oort: Efficient Federated Learning via Guided Participant Selection
          • PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections
      • HotOS 21
        • FlexOS: Making OS Isolation Flexible
      • NSDI 21
        • Distributed System
          • Fault-Tolerant Replication with Pull-Based Consensus in MongoDB
          • Ownership: A Distributed Futures System for Fine-Grained Tasks
          • Caerus: NIMBLE Task Scheduling for Serverless Analytics
          • Ship Computer or Data? Why not both?
          • EPaxos Revisited
          • MilliSort and MilliQuery: Large-Scale Data-Intensive Computing in Milliseconds
        • TEGRA: Efficient Ad-Hoc Analytics on Evolving Graphs
        • GAIA: A System for Interactive Analysis on Distributed Graphs Using a High-Level Language
      • CIDR 21
        • Cerebro: A Layered Data Platform for Scalable Deep Learning
        • Magpie: Python at Speed and Scale using Cloud Backends
        • Lightweight Inspection of Data Preprocessingin Native Machine Learning Pipelines
        • Lakehouse: A New Generation of Open Platforms that UnifyData Warehousing and Advanced Analytics
      • MLSys 21
        • Chips and Compilers Symposium
        • Support sparse computations in ML
      • SOSP 21
        • SmartNic
          • LineFS: Efficient SmartNIC offload of a distributed file system with pipeline parallelism
          • Xenic: SmartNIC-accelerated distributed transacitions
        • Graphs
          • Mycelium: Large-Scale Distributed Graph Queries with Differential Privacy
          • dSpace: Composable Abstractions for Smart Spaces
        • Consistency
          • Efficient and Scalable Thread-Safety Violation Detection
          • Understanding and Detecting Software Upgrade Failures in Distributed Systems
        • NVM
          • HeMem: Scalable Tiered Memory Management for Big Data Applications and Real NVM
        • Learning
          • Bladerunner: Stream Processing at Scale for a Live View of Backend Data Mutations at the Edge
          • Faster and Cheaper Serverless Computing on Harvested Resources
  • Random
    • Reading List
      • Random Thoughts
      • Hesse
      • Anxiety
  • Grad School
    • Index
      • Resources for undergraduate students
Powered by GitBook
On this page
  • Intro & Problem Statement
  • Motivation
  • Contribution
  • Vamana Graph Construction Algorithm
  • DiskANN System Design
  • Metric for success
  • Result

Was this helpful?

  1. Group
  2. Shivaram's Group
  3. Queries Processing

DiskANN: Fast accurate billion-point nearest neighbor search on a single node

https://papers.nips.cc/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf

Intro & Problem Statement

  • KNN:

    • dataset P, target k, query q. Extract KNN of q from P quickly.

    • Application domains: embeddings

    • Curse of dimensionality, almost impossible to find exact search result (motivation for ANN)

  • ANN:

    • Goal: maximize recall while retrieving the results as quickly as possible

    • Trade-off: recall v.s latency (and/or indexing time)

    • Algorithms

      • k-d trees

      • Locality Sensitive Hashing (LSH)

      • Graph-based: HNSW, NSG

Indexing large datasets: two approaches

  1. Based on Inverted Index + Data Compression and includes methods such as FAISS and IVFOADC+G+P

    1. Clustered + Compressed (e.x. PQ)

    2. Benefit : small memory footprint, and low latency of retrieving result using accelerator

    3. Problem : Recall is low because compression is lossy

  2. Divide the dataset into disjoint shards, and build in-memory index for each shard

    1. Problem : increased search latency and reduced throughput since the query needs to be routed to several shards

Motivation

  • SOTA approximate nearest neighbor search (ANNS) algorithms generate indices that must be stored in main memory for fast high-recall search.

    • Expensive, Limit the size of the dataset

    • E.x. FAISS: supports searching only from RAM

  • Search throughput of an SSD-resident index is limited by the number of random disk accesses/query and latency is limited by the round-trips (each round-trip can consist of multiple reads) to the disk.

    • SSD: a few hundred microseconds to serve a random read and can service about ~300K random reads per second

      • But: search applications require mean latencies of a few milliseconds for KNN search.

    • Main challenges for SSD-resident index are

      • reducing the number of random SSD accesses to a few dozen

      • reducing the number of round trip requests to disk to under ten, preferably five

    • Naively mapping indices to SSDs does not work! (Generate several hundreds of disk reads per query)

Contribution

  • DiskANN: SSD-resident ANNS system based on new graph-based indexing algorithm called Vamana

    • DiskANN: index and serve 1B dataset (d = 100s) using 64GB RAM, with low latency and high precision

    • Vamana:

      • generate graph indices with smaller diameter to minimize the number of sequential disk reads

      • Used in memory, outperform SOTA algorithms like HNSW and NSG

      • Partitions and merge

      • Compression: cache in RAM

      • Note: what's the contribution of last two...?

Vamana Graph Construction Algorithm

Comparison to HNSW and NSG

  • HNSW & NSG have no tunable parameter α (default value is 1). This is the main factor that Vamana achieves a better trade-off between graph degree and diameter.

  • Some features that help Vamana and NSG add long-range edges, while HNSW has an additional step of constructing a hierarchy of graphs over a nested sequence of samples of the dataset

  • Pertains to the initial graph: HNSW and Vamana have simpler initializations over NSG

  • Vamana makes two passes over the dataset to improve graph quality

DiskANN System Design

Key questions to address

  1. How do we build a graph over a billion points?

  2. How do we do distance comparisons between the query point and points in our candidate list at search time, if we cannot even store the vector data?

Index construction algorithms

  • Address question 1

  • Key: send each base point to multiple nearby centers to obtain overlapping clusters

    • partition dataset into K clusters using K-means

    • Assign each base point to the l-closest centers (typically l = 2)

    • Build Vamana indices for the points assigned to each of the clusters

    • Merge all different graphs into a single graph by taking union of edges

  • Why good: overlapping nature of clusters provide connectivity for the algorithm to succeed even if the query's NN are split across multiple shards

  • Note: Use full-precision coordinates to build the graph index. Graph with full-precision vectors on SSD

Search algorithms

  • Use PQ data at search time. Store compressed vectors of all the data points in memory.

    • Beam Search

      • Natural way to search is to use Algorithm 1, fetching the neighborhood information from the SSD as needed. Use compressed vectors to guide the best vertices (and neighbors) to read from disk

      • To reduce the number of round trips to SSD (to fetch neighborhoods sequentially) without increasing compute (distance calculations) excessively, fetch the neighborhoods of a small number, W (say 4,8) of the closest points in L\V in one shot, and update L to be the top L candidates in L along with all the neighbors retrieved in this step.

      • Beam width: to strike a balance between latency and throughput

Other

  • Cache frequently visited vertices

    • Either known query distribution or all vertices that are 3/4 hops from the starting point

  • Fetching and re-ranking full-precision coordinates stored on the SSD

    • Full precision coordinates essentially piggyback on the cost of expanding the neighborhoods

Metric for success

  • High recall

  • Low query latency

Result

With HNSW, NSG for In-memory search performance

  • Dataset: SIFT1M, GIST1M, DEEP1M

  • Vamana matches or outperforms the current best ANNS methods on both hundred and thousand-dimensional datasets obtained from different sources

With HNSW, NSG for number of hops

  • More suitable for SSD-based serving than other graph-based algorithms as it makes 2-3 times fewer hops for search to converge on large datasets compared to HNSW and NSG

    • Hops = number of rounds of disk reads on the critical path of the search

Billion-scale datasets: one-shot Vamana v.s Merged Vamana

  • Dataset: ANN_SIFT1B

  • Take away

    • Partition and merging are fast and can be done directly on disk, so the entire build process consumes under 64 GB main memory

    • Single index outperforms merged index, which traverses more links to reach the same neighborhoods, thus increasing search latency

    • Merged index is still a good choice for Billion-scale k-ANN indexing

      • Require no more than 20% extra latency and target recall when compared to single index

Billion-scale datasets: DiskANN v.s IVF-based Methods

  • IVFOADC+G+P

    • Uses inverted indexing and PQ to develop indicies with low-memory footprint and serve queries with high-throughput and good 1-recall@100

  • DiskANN: matches memory footprint, achieve significantly higher recall at same latency

Thoughts

  • Single machine setting. Are these meaningful experimental comparisons in our case? It's a single node and it hasn't been compared with FAISS

  • Apply this ...

PreviousBillion-scale Approximate Nearest Neighbor SearchNextKGvec2go - Knowledge Graph Embeddings as a Service

Last updated 4 years ago

Was this helpful?

Not compared with FAISS: because need GPUs might not be available and demonstrate superior recall over FAISS (??)

https://arxiv.org/abs/1802.02422
Greedy Search
RobustPrune used in Vamana Indexing Algorithm
Vamana Indexing algorithm