Index
Reference: CS744 (UW-Madison) and CS 494 (UIC)
Architecture
Compute + Overall
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines , L.A. Barroso, U. Holzle, Synthesis Lectures on Computer Architecture, 2009. Chapter 1 and 2.
Networks
VL2: A Scalable and Flexible Data Center Network, Greenberg et al., SIGCOMM 2009.
Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network, Singh et al., SIGCOMM 2015.
Storage (in a bit detailed fashion)
The Hadoop Distributed File System, Schvachko et al, MSST, 2010
The Google File System, Ghemawat et al, SOSP, 2003.
Flat Datacenter Storage. Nightingale et. al, OSDI, 2012.
EC-Cache: Load-balanced, Low-latency Cluster Caching with Online Erasure Coding. Rashmi et. al, OSDI, 2016
f4: Facebook’s Warm BLOB Storage System. Muralidhar et. al, OSDI, 2014.
Bigtable: A Distributed Storage System for Structured Data. Chang et. al, OSDI, 2006.
Dynamo: Amazon’s Highly Available Key-value Store. DeCandia et. al, SOSP, 2007.
Spanner: Google’s Globally-Distributed Database. Corbett et. al, OSDI, 2012.
An Analysis of Facebook Photo Caching. Huang et. al, SOSP, 2013.
Scaling Memcache at Facebook. Nishtala et. al, NSDI, 2013.
The Chubby lock service for loosely-coupled distributed systems. Mike Burrows, OSDI, 2006.
Execution Engines, Resource Negotiators, Schedulers
Execution Engines
Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. Isard et. al, EuroSys, 2007.
CIEL: a universal execution engine for distributed data-flow computing. Murray et. al, NSDI, 2011.
Reining in the Outliers in Map-Reduce Clusters using Mantri, Ananthanarayanan et al, OSDI, 2010.
Encapsulation of parallelism in the Volcano query processing system. Goetz Graefe, SIGMOD, 1990.
PACMan: Coordinated Memory Caching for Parallel Jobs, Ananthanarayanan et. al, NSDI, 2012.
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, Zaharia et al, NSDI, 2012.
Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications, Saha et al, SIGMOD, 2015.
Flare: Optimizing Apache Spark with Native Compilation for Scale-Up Architectures and Medium-Size Data. Essertel et. al, OSDI, 2018.
Transaction: Obladi: Oblivious Serializable Transactions in the Cloud. Crooks et. al, OSDI, 2018.
Load balancing
Ananta: Cloud Scale Load Balancing. Patel et. al, SIGCOMM, 2013.
Duet: Cloud Scale Load Balancing with Hardware and Software. Gandhi et. al, SIGCOMM, 2014.
SilkRoad: Making Stateful Layer-4 Load Balancing Fast and Cheap Using Switching ASICs, Miao et. al, SIGCOMM, 2017.
Resource Negotiator
Apache Hadoop YARN: Yet Another Resource Negotiator, Vavilapalli et al, SOCC, 2013.
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center, Hindman et al, NSDI, 2011.
Dominant Resource Fairness: Fair Allocation of Multiple Resource Types, Ghodsi et al, NSDI, 2011.
Borg: Large-scale cluster management at Google with Borg. Verma et. al, EuroSys, 2015.
Scheduling
Packing
Altruistic Scheduling in Multi-Resource Clusters. Grandl et. al, OSDI, 2016.
Multi-Resource Packing for Cluster Schedulers, Grandl et. al, SIGCOMM, 2014.
Quincy: Fair Scheduling for Distributed Computing Clusters. Isard et. al, SOSP, 2009.
Re-Planning
Dynamic Query Re-Planning using QOOP. Mahajan et. al, OSDI, 2018.
Threads
Arachne: Core-Aware Thread Management. Qin et. al, OSDI, 2018.
Cache
RobinHood: Tail Latency Aware Caching – Dynamic Reallocation from Cache-Rich to Cache-Poor. Berger et. al, OSDI, 2018.
Applications: Machine Learning
Scaling Distributed Machine Learning with the Parameter Server, Li et al, OSDI, 2014.
STRADS: A Distributed Framework for Scheduled Model Parallel Machine Learning, Kim et al, EuroSys, 2016.
SLAQ: Quality-Driven Scheduling for Distributed Machine Learning, Zhang et al, SoCC, 2017.
TensorFlow: A System for Large-Scale Machine Learning, Abadi et al, OSDI, 2016.
Pytorch Distributed: Experiences on Accelerating Data Parallel Training, Shen et al, VLDB, 2020
Gandiva: Introspective Cluster Scheduling for Deep Learning, Xiao et al, OSDI, 2018.
Clipper: A Low-Latency Online Prediction Serving System, Crankshaw et al, NSDI, 2017.
PipeDream: Generalized Pipeline Parallelism for DNN Training. Narayanan et al, SOSP 2019.
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, Chen et al, OSDI, 2018
Ray: A Distributed Framework for Emerging AI Applicationss, Moritz et al, OSDI, 2018.
Towards a Unified Architecture for in-RDBMS Analytics, Feng et al, SIGMOD, 2012
DeepCPU: Serving RNN-based Deep Learning Models 10x Faster. Zhang et. al, USENIX ATC, 2018.
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems. Lee et. al, OSDI, 2018.
Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective, Hazelwood et. al, HPCA, 2018.
MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. Chen et. al, Neural Information Processing Systems, Workshop on Machine Learning Systems, 2015.
Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud, Low et al, VLDB, 2012.
Optimus: An Efficient Dynamic Resource Scheduler for Deep Learning Clusters. Peng et. al, EuroSys, 2018.
Tiresias: A GPU Cluster Manager for Distributed Deep Learning. Gu et. al, NSDI, 2019.
KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analystics, Sparks et al, ICDE, 2017.
Project Adam: Building an Efficient and Scalable Deep Learning Training System, Chilimbi et al, OSDI, 2014.
DimmWitted: A Study of Main-Memory Statistical Analytics. Zhang and Re, VLDB, 2014.
Applications: Batch Analytics and SQL Frameworks
Spark SQL: Relational Data Processing in Spark, Armburst et al, SIGMOD, 2015.
Major technical advancements in Apache Hive, Huai et al, SIGMOD, 2014.
Clarinet: WAN-Aware Optimization for Analytics Queries, Viswanathan et al, OSDI, 2016.
Global Analytics in the Face of Bandwidth and Regulatory Constraints, Vulimiri et al, NSDI, 2015.
SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets. Chaiken et al, VLDB
The Snowflake Elastic Data Warehouse. Dageville et al, SIGMOD 2016.
Building an Elastic Query Engine on Disaggregated Storage. Vuppalapati et al, NSDI 2020.
Impala: A Modern, Open-Source SQL Engine for Hadoop. Kornacker et. al, CIDR, 2015.
Dremel: Interactive Analysis of Web-Scale Datasets. Melnik et. al, VLDB, 2010.
Trill: A High-Performance Incremental Query Processor for Diverse Analytics. Chandramouli et. al, VLDB, 2014.
Rethinking SIMD Vectorization for In-Memory Databases. Polychroniou et. al, SIGMOD, 2015.
Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited. Balkesen et. al, VLDB, 2013.
TAG: a Tiny AGgregation Service for Ad-Hoc Sensor Networks. Madden et. al, OSDI, 2002.
Applications: Stream Processing
Storm @Twitter , Toshniwal et al, SIGMOD, 2014.
Twitter Heron: Stream Processing at Scale, Kulkarni et al, SIGMOD, 2015.
Realtime Data Processing at Facebook. Chen et. al, SIGMOD, 2016.
Discretized Streams: Fault-Tolerant Streaming Computation at Scale, Zaharia et al, SOSP, 2013.
Reading: Spark Structured Streaming
Apache Flink: Stream and Batch Processing in a Single Engine, Carbone et al, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2015.
Kafka Distributed Messaging System for Log Processing, Kreps et al, NetDB Workshop, 2011.
Also this document of comparison of widely used Queuing Messaging Processing Systems.
StreamScope: Continuous Reliable Distributed Processing of Big Data Streams, Lin et al, NSDI, 2016.
Drizzle: Fast and Adaptable Stream Processing at Scale. Venkataraman et. al, SOSP, 2017.
Chi: A Scalable and Programmable Control Plane for Distributed Stream Processing Systems. Mai et. al, PVLDB, 2018.
Gloss: Seamless Live Reconfiguration and Reoptimization of Stream Programs. Rajadurai et. al, ASPLOS, 2018.
Aurora: a new model and architecture for data stream management. Abadi et. al, VLDB, 2003.
Three steps is all you need: fast, accurate, automatic scaling decisions for distributed streaming dataflows. Kalavri et. al, OSDI, 2018.
Naiad: A Timely Dataflow System, Murray et al, SOSP, 2013.
Applications: Graph Processing
Pregel: A System for Large-Scale Graph Processing, Malewicz et al, SIGMOD, 2010.
TAO: Facebook’s Distributed Data Store for the Social Graph. Bronson et. al, USENIX ATC, 2013.
PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs, Gonzalez et al, OSDI, 2012.
GraphX: Graph Processing in a Distributed Dataflow Framework, Gonzalez et al, OSDI, 2014.
PyTorch-BigGraph: A Large-Scale Graph Embedding System. Lerer et al, Proceedings of the 2nd SysML Conference, 2019.
Scalability! But at what COST? McSherry et al, HOTOS 2015.
Arabesque: A System for Distributed Graph Mining. Teixeira et. al, SOSP, 2015.
Fast and Concurrent RDF Queries with RDMA-based Distributed Graph Exploration. Shi et. al, OSDI, 2016.
ASAP: Fast, Approximate Pattern Mining at Scale. Iyer et. al, OSDI, 2018.
Grappa: A Latency-Tolerant Runtime for Large-Scale Irregular Applications. Nelson et. al, USENIX ATC, 2015.
One Trillion Edges: Graph Processing at Facebook-Scale. Ching et. al, VLDB, 2015.
Potpourri: Runtime, New Hardware Models, Serverless, and Approximation
Runtime
Weld: A Commom Runtime for High Performance Data Analytics, Palkar et al, CIDR, 2017.
Hardware
In-Datacenter Performance Analysis of a Tensor Processing Unit, Jouppi et al, CIDR, 2017.
A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services. Putnam et. al, ISCA, 2014.
Strata: A Cross Media File System. Kwon et. al, SOSP, 2017.
Serverless
Occupy the Cloud: Distributed Computing for the 99%, Jonas et al, SoCC, 2017.
Serverless Computation with OpenLambda. Hendrickson et. al, HotCloud, 2016.
Pocket: Elastic Ephemeral Storage for Serverless Analytics. Klimovic et. al, OSDI, 2018.
Peeking Behind the Curtains of Serverless Platforms, Wang et. al, USENIX ATC, 2018.
SOCK: Rapid Task Provisioning with Serverless-Optimized Containers, Oakes et. al, USENIX ATC, 2018.
Approximation
BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data, Agarwal et al, Eurosys, 2013.
BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees. Park et. al, SIGMOD, 2019.
Quickr: Lazily Approximating Complex AdHoc Queries in BigData Clusters. Kandula et. al, SIGMOD, 2016.
Other: RDMA
FaRM: Fast Remote Memory. Dragojevic et. al, NSDI, 2014.
No compromises: distributed transactions with consistency, availability, and performance. Dragojevic et. al, SOSP, 2015.
FaSST: Fast, Scalable and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs. Kalia et. al, OSDI, 2016.
Datacenter RPCs can be General and Fast. Kalia et. al, NSDI, 2019.
Distributed Lock Management with RDMA: Decentralization without Starvation. Yoon et. al, SIGMOD, 2018.
Efficient Memory Disaggregation with Infiniswap. Gu et. al, NSDI, 2017.
Accelerating Relational Databases by Leveraging Remote Memory and RDMA. Li et. al, SIGMOD, 2016.
Remote Memory in the Age of Fast Networks. Aguilera et. al, SoCC, 2017.
Other: Offload
Floem: A Programming System for NIC-Accelerated Network Applications. Phothilimthana et. al, OSDI, 2018.
Direct Universal Access: Making Data Center Resources Available to FPGA. Shu et. al, NSDI, 2019.
Last updated
Was this helpful?