Index

Data Models and Languages

  • Codd, E. F., A Relational Model of Data for Large Shared Data Banks, Communications of the ACM, 1970

  • J. Ullman. Database and Knowledge Base Systems, vol. I. Chapter 3 (Logic as a Data Model).

  • Stonebraker, M., Inclusion of New Types In Relational Data Base Systems, Proceedings of the International Conference on Data Engineering, 1986.

  • Stonebraker M., and Hellerstein J., What Goes Around Comes Around.

Database Theory

  • S. Abiteboul, R. Hull, V. Vianu. Foundations of Databases. Available for free from: http://webdam.inria.fr/Alice/

    • Conjunctive Queries (Chapters 3,4)

    • Chapter 6, Sections 6.2 and 6.4

    • Datalog (Chapter 12, Sections 12.1-12.3; Chapter 13, Sections 13.1-13.3)

  • T. J. Green, S. Huang, B. T. Loo, W. Zhou, Datalog and Recursive Query Processing, Foundations and Trends in Databases, Vol. 5, 2012.

  • H. Ngo, C. Re, A. Rudra, Skew Strikes Back: New Developments in the Theory of Join Algorithms, SIGMOD Record, 2013.

  • Cheney, Chiticariu, Tan, Provenance in Databases: Why, How and Where, Foundations and Trends in Databases, 2009.

  • C. Dwork, A Firm Foundation for Private Data Analysis, Communications of the ACM, 2011.

DBMS Architecture

  • Chamberlin, D. D., Astrahan, M. M., Blasgen, M. W., Gray, J. N., King, W. F., Lindsay, B. G., Lorie, R., Mehl, J. W., Price, T. G., Putzolu, F., Selinger, P. G., Schkolnick, M., Slutz, D. R., Traiger, I. L., Wade, B. W. and Yost, R. A. A History and Evaluation of System R, Communications of the ACM 24(10), 1981.

  • Stonebraker, M., Wong E., Kreps P., and Held G., The Design and Implementation of INGRES, ACM Transactions on Database Systems 1(3), 1976.

  • Hellerstein J. M., Stonebraker M. and Hamilton, J. R., Architecture of a Database System, Foundations and Trends in Databases 1(2), 2007.

Operating System Issues

  • Chou, H., and DeWitt, D., An Evaluation of Buffer Management Strategies for Relational Database Systems, Proceedings of the International Conference on Very Large Data Bases (VLDB), 1985.

  • O’Neil, E. J., O’Neil, P. E., Weikum G., The LRU-K Page Replacement Algorithm For Database Disk Buffering, Proceedings of the ACM-SIGMOD International Conference on Management of Data, 1993.

  • Stonebraker, M., Operating System Support for Database Management, Communications of the ACM 24(7), 1981.

File Organizations and Access Methods

  • Comer, D., The Ubiquitous B-Tree, ACM Computing Surveys 11(2), June 1979.

  • Guttman, A., R-Trees: A Dynamic Index Structure for Spatial Searching, SIGMOD Conference, 1984.

  • Patrick E. O’Neil, Dallan Quass: Improved Query Performance with Variant Indexes. SIGMOD Conference, 1997: 38-49.

Query Complexity

  • Optimal implementation of conjunctive queries in relational databases, Chandra, Merlin, STOC 1977

  • The Complexity of Relational Query Languages, Vardi, STOC 1982

  • Algorithms for acyclic database schemes, Yannakakis, VLDB 1981.

  • Size bounds and query plans for relational joins, Atserias, Grohe, Marx, FOCS 2008

  • Hypertree Decompositions and Tractable Queries, Gottlob, Leone, Scarcello, JCSS 2002

  • Leapfrog Triejoin: a worst-case optimal join algorithm, Veldhuizen, ICDT 2014

  • Skew Strikes Back: New Developments in the Theory of Join Algorithms, Ngo, Re, Rudra, SIGMOD Record 2013

Query Processing and Optimization

  • Shapiro, L. D., Join Processing in Database Systems with Large Main Memories, ACM Transactions on Database Systems 11(3), 1986.

  • Selinger, P., et al., Access Path Selection in a Relational Database Management System, Proceedings of the ACM-SIGMOD International Conference on Management of Data, 1979.

  • Surajit Chaudhuri: An Overview of Query Optimization in Relational Systems. PODS 1998: 34-43

  • Goetz Graefe: Query Evaluation Techniques for Large Databases. ACM Comput. Surv. 25(2): 73-170 (1993)

Concurrency Control and Recovery

  • Bernstein, P.A., Hadzilacos, V., and Goodman, N., Concurrency Control and Recovery in Database Systems, Addison-Wesley, 1987; can be freely downloaded from Bernstein’s webpage. (Chapters 1 and 2)

  • Gray, J., Lorie, R. A., Putzolu, G. R., Traiger, I. L., Granularity of Locks and Degrees of Consistency in a Shared Data Base, Proceedings of the IFIP Working Conference on Modeling of Data Base Management Systems, 1979.

  • Kung, H., and Robinson, J., On Optimistic Methods for Concurrency Control, ACM Transactions on Database Systems 6(2), June 1981.

  • Berenson, H., Bernstein, P. A., Gray, J., Melton, J., O’Neil, E. J., O’Neil, P. E., A Critique of ANSI SQL Isolation Levels, Proceedings of the ACM-SIGMOD International Conference on Management of Data, 1995.

  • Lehman, P. and Yao, S., Efficient Locking for Concurrent Operations on B-Trees, ACM Transactions on Database Systems, 6(4): 650-670.

  • Mohan, C., Haderle, D., Lindsay, B., Pirahesh, H., Schwarz, P., ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging, ACM Transactions on Database Systems, 17(1): 94-162.

Distributed and Parallel Data Processing

  • MapReduce: simplified data processing on large clusters, Dean, Ghemawat, OSDI 2004

  • MapReduce and parallel DBMSs: friends or foes?, Stonebraker et al., CACM 2010

  • Optimizing Joins in a Map-Reduce Environment, Afrati, Ullman, EDBT 2010

  • A Guide to Formal Analysis of Join Processing in Massively Parallel Systems, Koutris, Suciu, SIGMOD Record 2016

  • Streaming

    • Models and issues in data stream systems, Babcock, Babu, Datar, Motwani, Widom, PODS 2002

    • The space complexity of approximating the frequency moments, Alon, Matias, Szegedy, STOC 1996

Data Extraction and Integration

  • Uncertain Data

    • Probabilistic Databases, Suciu, Olteanu, Re, Koch

    • Probabilistic Databases: Diamonds in the Dirt, Dalvi, Re, Suciu, CACM 2008

    • The dichotomy of probabilistic inference for unions of conjunctive queries, Dalvi, Suciu, JACM 2012

    • Consistent Query Answering: Five Easy Pieces, Chomicki, ICDT 2007

    • Consistent Query Answers in Inconsistent Databases, Arenas, Bertossi, Chomicki, PODS 1999

  • S. Sarawagi, Information Extraction, Foundations and Trends in Databases. Vol. 1, No. 3 (2007): read only Chapters 1-3.

  • Levy, Alon, Logic-based Techniques in Data Integration, available on the Web at http://homes.cs.washington.edu/~alon/site/files/levy-di00.ps: read up to and including Section 5.1.

  • Doan, A., Halevy, A., Ives, Z., Principles of Data Integration. Chapter 1; Chapter 5; Chapter 4: read 4.1, 4.2.1 (only Edit Distance), 4.2.2 (only Overlap, Jaccard, and TF/IDF), 4.2.4, and 4.3 (only Inverted Index and Size Filtering); Chapter 7: up to and including 7.5.3; Chapter 9: read 9.1, 9.2, and 9.3.1. Chapters available from courses/dibook-chapters (http://pages.cs.wisc.edu/~anhai/).

Data Analysis and Decision Support

  • Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, D., Venkatrao, M., Pellow, F., and Pirahesh, H., Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. In Data Mining and Knowledge Discovery, 1(1): 29-53.

  • Agrawal, R., and R. Srikant, Fast Algorithms for Mining Association Rules. In Proceedings of the 20th International Conference on Very Large Data Bases, 487-499.

  • Zhang, T., Ramakrishnan R., and Livny M., BIRCH: A Clustering Algorithm for Large Multidimensional Datasets, Proceedings of the ACM SIGMOD International Conference on Management of Data, 1996

  • Graham Cormode, Minos Garofalakis, Peter J. Haas and Chris Jermaine, Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches, Foundations and Trends in Databases, Vol. 4, 2011.

DBMS and Search Engines

  • Brin, S. and Page, L., The Anatomy of a Large-Scale Hypertextual Web Search Engine, Proceedings of Computer Networks and ISDN Systems, 1998.

  • Singhal, A., Modern Information Retrieval: a Brief Overview. IEEE Data Engineering Bulletin, 24(4), 35-43, 2001.

  • Page, L. and Brin, S. and Motwani, R. and Winograd, T., The Pagerank Citation Ranking: Bringing Order to the Web, Technical Report, 1999.

Emerging Topics

  • The Case for Learned Index Structures