🐣
Reading List
  • Starting point
  • Reference list
  • PhD application guidelines
  • Big Data System
    • Index
      • Architecture
        • Storage
          • Sun's Network File System (NFS)
      • Execution Engine, Resource Negotiator, Schedulers
        • Execution Engines
        • Resource Negotiator
        • Schedulers
      • Machine Learning
      • SQL Framework
      • Stream Processing
      • Graph Processing
      • Potpourri: Hardware, Serverless and Approximation
  • Operating System
    • Index
      • OSTEP
        • Virtualization
          • CPU Abstraction: the Process
          • Interlude: Process API
          • Mechanism: Limited Direct Execution
        • Intro
  • Networking
    • Index
      • CS 294 (Distributed System)
        • Week 1 - Global State and Clocks
          • Distributed Snapshots: Determining Global States of Distributed Systems
          • Time, Clocks, and the Ordering of Events in a Distributed System
        • Weak 5 - Weak Consistency
          • Dynamo: Amazon's Highly Available Key-value Store
          • Replicating Data Consistency Explained Through Baseball
          • Managing update conflicts in Bayou, a weakly connected replicated storage system
      • CS 268 (Adv Network)
        • Intro
        • Internet Architecture
          • Towards an Active Network Architecture
          • The Design Philosophy of the DARPA Internet Protocols
        • Beyond best-effort/Unicast
          • Core Based Trees (CBT)
          • Multicast Routing in Internetworks and Extended LANs
        • Congestion Control
        • SDN
          • ONIX: A Distributed Control Platform for Large-scale Production Networks
          • B4: Experience with a Globally-Deployed Software Defined WAN
          • How SDN will shape networking
          • The Future of Networking, and the Past of Protocols
        • Datacenter Networking
          • Fat tree
          • Jellyfish
        • BGP
          • The Case for Separating Routing from Routers
        • Programmable Network
          • NetCache
          • RMT
        • Datacenter Congestion Control
          • Swift
          • pFabric
        • WAN CC
          • Starvation (Sigcomm 22)
        • P2P
          • Design and Evaluation of IPFS: A Storage Layer for the Decentralized Web
          • The Impact of DHT Routing Geometry on Resilience and Proximity
        • Net SW
          • mTCP
          • The Click modular router
        • NFV
          • Performance Interfaces for Network Functions
          • Making Middleboxes Someone Else's Problem: Network Processing as a Cloud Service
        • Ethics
          • On the morals of network research and beyond
          • The collateral damage of internet censorship by DNS injection
          • Encore: Lightweight Measurement of Web Censorship with Cross-Origin Requests
        • Low Latency
          • Aquila: A unified, low-latency fabric for datacenter networks
          • cISP: A Speed-of-Light Internet Service Provider
        • Disaggregation
          • Network Requirements for Resource Disaggregation
        • Tenant Networking
          • Invisinets
          • NetHint: While-Box Networking for Multi-Tenant Data Centers
        • Verification
          • A General Approach to Network Configuration Verification
          • Header Space Analysis: Static Checking for Networks
        • ML
          • SwitchML
          • Fast Distributed Deep Learning over RDMA
      • Computer Networking: A Top-Down Approach
        • Chapter 1. Computer Network and the Internet
          • 1.1 What Is the Internet?
          • 1.2 The Network Edge
          • 1.3 The Network Core
        • Stanford CS144
          • Chapter 1
            • 1.1 A Day in the Life of an Application
            • 1.2 The 4-Layer Internet Model
            • 1.3 The IP Service Model
            • 1.4 A Day in the Life of a Packet
            • 1.6 Layering Principle
            • 1.7 Encapsulation Principle
            • 1.8 Memory layout and Endianness
            • 1.9 IPv4 Addresses
            • 1.10 Longest Prefix Match
            • 1.11 Address Resolution Protocol (ARP)
            • 1.12 The Internet and IP Recap
      • Reading list
        • Elastic hyperparameter tuning on the cloud
        • Rethinking Networking Abstractions for Cloud Tenants
        • Democratizing Cellular Access with AnyCell
        • Dagger: Efficient and Fast RPCs in Cloud Microservices in Near-Memory Reconfigurable NICs
        • Sage: Practical & Scalable ML-Driven Performance Debugging in Microservices
        • Faster and Cheaper Serverless Computing on Harvested Resources
        • Network-accelerated Distributed Machine Learning for Multi-Tenant Settings
        • User-Defined Cloud
        • LegoOS: A Disseminated Distributed OS for Hardware Resource Disaggregation
        • Beyond Jain's Fairness Index: Setting the Bar For The Deployment of Congestion Control Algorithms
        • IncBricks: Toward In-Network Computation with an In-Network Cache
  • Persistence
    • Index
      • Hardware
        • Enhancing Lifetime and Security of PCM-Based Main Memory with Start-Gap Wear Leveling
        • An Empirical Guide to the Behavior and Use of Scalable Persistent Memory
  • Database
    • Index
  • Group
    • WISR Group
      • Group
        • Offloading distributed applications onto smartNICs using iPipe
        • Semeru: A memory-disaggregated managed runtime
      • Cache
        • Index
          • TACK: Improving Wireless Transport Performance by Taming Acknowledgements
          • LHD: Improving Cache Hit Rate by Maximizing Hit Density
          • AdaptSize: Orchestrating the Hot Object Memory Cache in a Content Delivery Network
          • Clustered Bandits
          • Important Sampling
          • Contexual Bandits and Reinforcement Learning
          • Reinforcement Learning for Caching with Space-Time Popularity Dynamics
          • Hyperbolic Caching: Flexible Caching for Web Applications
          • Learning Cache Replacement with CACHEUS
          • Footprint Descriptors: Theory and Practice of Cache Provisioning in a Global CDN
      • Hyperparam Exploration
        • Bayesian optimization in cloud machine learning engine
    • Shivaram's Group
      • Tools
      • Group papers
        • PushdownDB: Accelerating a DBMS using S3 Computation
        • Declarative Machine Learning Systems
        • P3: Distributed Deep Graph Learning at Scale
        • Accelerating Graph Sampling for Graph Machine Learning using GPUs
        • Unicorn: A System for Searching the Social Graph
        • Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless
        • Garaph: Efficient GPU-accelerated GraphProcessing on a Single Machine with Balanced Replication
        • MOSAIC: Processing a Trillion-Edge Graph on a Single Machine
        • Fluid: Resource-aware Hyperparameter Tuning Engine
        • Lists
          • Wavelet: Efficient DNN Training with Tick-Tock Scheduling
          • GPU Lifetimes on Titan Supercomputer: Survival Analysis and Reliability
          • ZeRO-Infinity and DeepSpeed: Unlocking unprecedented model scale for deep learning training
          • ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning
          • KungFu: Making Training inDistributed Machine Learning Adaptive
        • Disk ANN
      • Queries Processing
        • Building An Elastic Query Engine on Disaggregated Storage
        • GRIP: Multi-Store Capacity-Optimized High-Performance NN Search
        • Milvus: A Purpose-Built Vector Data Management System
        • Query2box: Reasoning over Knowledge Graphs in Vector Space using Box Embeddings
        • Billion-scale Approximate Nearest Neighbor Search
        • DiskANN: Fast accurate billion-point nearest neighbor search on a single node
        • KGvec2go - Knowledge Graph Embeddings as a Service
    • Seminar & Talk
      • Berkeley System Seminar
        • RR: Engineering Record and Replay for Deployability
        • Immortal Threads: Multithreaded Event-driven Intermittent Computing on Ultra-Low-Power Microcontroll
      • Berkeley DB Seminar
        • TAOBench: An End-to-End Benchmark for Social Network Workloads
      • PS2
      • Sky Seminar Series
        • Spring 23
          • Next-Generation Optical Networks for Emerging ML Workloads
      • Reading List
        • Confluo: Distributed Monitoring and Diagnosis Stack for High-speed Networks
        • Rearchitecting Linux Storage Stack for µs Latency and High Throughput
        • eBPF: rethinking the linux kernel
        • BPF for Storage: An Exokernel-Inspired Approach
        • High Velocity Kernel File Systems with Bento
        • Incremental Path Towards a Safe OS Kernel
        • Toward Reconfigurable Kernel Datapaths with Learned Optimizations
        • A Vision for Runtime Programmable Networks
        • The Demikernel and the future of kernal-bypass systems
        • Floem: A programming system for NIC-accelerated network applications
        • High Performance Data Center Operating Systems
        • Leveraging Service Meshes as a New Network Layer
        • Automatically Discovering Machine Learning Optimizations
        • Beyond Data and Model Parallelism for Deep Neural Networks
        • IOS: Inter-Operator Scheduler for CNN Acceleration
        • Building An Elastic Query Engine on Disaggregated Storage
        • Sundial: Fault-tolerant Clock Synchronization for Datacenters
        • MIND: In-Network Memory Management for Disaggregated Data Centers
        • Understanding host network stack overheads
        • From Laptop to Lambda: Outsourcing Everyday Jobs to Thousands of Transient Functional Containers
        • Redesigning Storage Systems for Future Workloads Hardware and Performance Requirements
        • Are Machine Learning Cloud APIs Used Correctly?
        • Fault-tolerant and transactional stateful serverless workflows
      • Reading Groups
        • Network reading group
          • Recap
          • ML & Networking
            • Video Streaming
              • Overview
              • Reducto: On-Camera Filtering for Resource Efficient Real-Time Video Analytics
              • Learning in situ: a randomized experiment in video streaming
              • SENSEI: Aligning Video Streaming Quality with Dynamic User Sensitivity
              • Neural Adaptive Video Streaming with Pensieve
              • Server-Driven Video Streaming for Deep Learning Inference
            • Congestion Control
              • ABC: A Simple Explicit Congestion Controller for Wireless Networks
              • TCP Congestion Control: A Systems Approach
                • Chapter 1: Introduction
              • A Deep Reinforcement Learning Perspective on Internet Congestion Control
              • Pantheon: the training ground for Internet congestion-control research
            • Other
              • On the Use of ML for Blackbox System Performance Prediction
              • Marauder: Synergized Caching and Prefetching for Low-Risk Mobile App Acceleration
              • Horcrux: Automatic JavaScript Parallelism for Resource-Efficient Web Computation
              • Snicket: Query-Driven Distributed Tracing
            • Workshop
          • Homa: A Receiver-Driven Low-Latency Transport Protocol Using Network Priorities
        • DB reading group
          • CliqueMap: Productionizing an RMA-Based Distributed Caching System
          • Hash maps overview
          • Dark Silicon and the End of Multicore Scaling
        • WISR
          • pFabric: Minimal Near-Optimal Datacenter Transport
          • Scaling Distributed Machine Learning within-Network Aggregation
          • WCMP: Weighted Cost Multipathing for Improved Fairness in Data Centers
          • Data center TCP (DCTCP)
      • Wisconsin Seminar
        • Enabling Hyperscale Web Services
        • The Lottery Ticket Hypothesis
        • External Merge Sort for Top-K Queries: Eager input filtering guided by histograms
      • Stanford MLSys Seminar
        • Episode 17
        • Episode 18
  • Cloud Computing
    • Index
      • Cloud Reading Group
        • Owl: Scale and Flexibility in Distribution of Hot Contents
        • RubberBand: cloud-based hyperparameter tuning
  • Distributed System
    • Distributed Systems Lecture Series
      • 1.1 Introduction
  • Conference
    • Index
      • Stanford Graph Learning Workshop
        • Overview of Graph Representation Learning
      • NSDI 2022
      • OSDI 21
        • Graph Embeddings and Neural Networks
        • Data Management
        • Storage
        • Preview
        • Optimizations and Scheduling for ML
          • Oort: Efficient Federated Learning via Guided Participant Selection
          • PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections
      • HotOS 21
        • FlexOS: Making OS Isolation Flexible
      • NSDI 21
        • Distributed System
          • Fault-Tolerant Replication with Pull-Based Consensus in MongoDB
          • Ownership: A Distributed Futures System for Fine-Grained Tasks
          • Caerus: NIMBLE Task Scheduling for Serverless Analytics
          • Ship Computer or Data? Why not both?
          • EPaxos Revisited
          • MilliSort and MilliQuery: Large-Scale Data-Intensive Computing in Milliseconds
        • TEGRA: Efficient Ad-Hoc Analytics on Evolving Graphs
        • GAIA: A System for Interactive Analysis on Distributed Graphs Using a High-Level Language
      • CIDR 21
        • Cerebro: A Layered Data Platform for Scalable Deep Learning
        • Magpie: Python at Speed and Scale using Cloud Backends
        • Lightweight Inspection of Data Preprocessingin Native Machine Learning Pipelines
        • Lakehouse: A New Generation of Open Platforms that UnifyData Warehousing and Advanced Analytics
      • MLSys 21
        • Chips and Compilers Symposium
        • Support sparse computations in ML
      • SOSP 21
        • SmartNic
          • LineFS: Efficient SmartNIC offload of a distributed file system with pipeline parallelism
          • Xenic: SmartNIC-accelerated distributed transacitions
        • Graphs
          • Mycelium: Large-Scale Distributed Graph Queries with Differential Privacy
          • dSpace: Composable Abstractions for Smart Spaces
        • Consistency
          • Efficient and Scalable Thread-Safety Violation Detection
          • Understanding and Detecting Software Upgrade Failures in Distributed Systems
        • NVM
          • HeMem: Scalable Tiered Memory Management for Big Data Applications and Real NVM
        • Learning
          • Bladerunner: Stream Processing at Scale for a Live View of Backend Data Mutations at the Edge
          • Faster and Cheaper Serverless Computing on Harvested Resources
  • Random
    • Reading List
      • Random Thoughts
      • Hesse
      • Anxiety
  • Grad School
    • Index
      • Resources for undergraduate students
Powered by GitBook
On this page
  • Main Idea
  • Intro
  • Consistency provided by cloud storage today
  • Consistency models proposed from research community
  • Questions to cope with
  • Read Consistency Guarantees
  • Model
  • Baseball as a Sample Application
  • Read Requirements for Participants
  • Conclusion

Was this helpful?

  1. Networking
  2. Index
  3. CS 294 (Distributed System)
  4. Weak 5 - Weak Consistency

Replicating Data Consistency Explained Through Baseball

Main Idea

  • Replication protocols often involve complex trade-offs between consistency, performance, and availability

  • Shed lights on the consistency models offered by cloud providers, what might be offered in the future, and why vendors are offering consistency choices

Intro

Consistency provided by cloud storage today

  • Azure: strongly consistent storage

    • Always see the latest value that was written for a data object

    • Reasonable to provide within a DC

    • But raises concern for geo-replicated storage

  • AWS S3: designed with weak consistency

    • Better performance and availability: read might be with stale data

    • Eventually consistent: read operation is directed to a replica that has not yet received all the writes that were accepted by some other replica

  • AWS DynamoDB: both eventually consistent reads and strongly consistent reads

  • AWS SimpleDB: same choices for clients that read data

  • Google App Engine Datastore: eventually consistent reads + default strong consistency

  • Yahoo's PNUTS: read-any, read-critical, read-latest

  • Quorum-based storage systems: eventual + strong consistency [by selecting different read and write quorums)

Consistency models proposed from research community

  • Trade-offs between consistency, performance, and availability

    • Availability is the likelihood of a read operation successfully returning suitably consistent data in the presence of server failures

    • Performance refers to the time it takes to complete a read operation, that is, the read latency

  • Two ends: strong and eventual

    • Stronger: lower performance but reduced availability for reads or writes or both

Questions to cope with

  • Are different consistencies useful in practice?

  • Can application developers cope with eventual consistency?

  • Should cloud storage systems offer an even greater choice of consistency than the consistent and eventually consistent reads offered by some of today’s services?

Read Consistency Guarantees

Model

  • Multiple clients perform read and write operations to a data store, concurrently access shared information

  • Data is replicated among a set of servers, but the details of the replication protocol are hidden from clients

  • Write: any operation that updates one or more data objects

    • Eventually received at all servers and performed in the same order

    • Order is consistent with order in which write operations are submitted by clients

    • In practice

      • the order could be enforced, even for concurrent writers

      • by performing all writes at a master server or by having servers run a consensus protocol to reach agreement on the global order

  • Read: return values of one or more data objects that were previously written

    • Not necessarily the latest values

    • Can request a consistency guarantee, which dictates the set of allowable return values

    • Each guarantee is defined by the set of previous writes whose results are visible to a read operation

Strong Consistency

  • Guarantees that a read operation returns the value that was last written for a given object

  • Append? applying all writes to that object

Eventual Consistency

  • The weakest of the guarantees

  • Allows the greatest set of possible return values

  • The term derives from:

    • Each replica eventually receives each write operation

    • If clients stopped performing writes then read operations would eventually return an object’s latest value

Consistent Prefix or Similar to Snapshot Isolation

  • A reader is guaranteed to observe an ordered sequence of writes starting with the first write to a data store

  • E.x. the read may be answered by a replica that receives writes in order from a master replica but has not yet received some recent writes

    • Reader sees a version of the data store that existed at the master at some time in the past

    • Similar to "snapshot isolation" consistency offered by many DBMS

  • For reads to a single data object in a system where write operations completely overwrite previous values of an object

    • Eventual consistency reads observe a consistent prefix

  • Main benefit

    • When reading multiple data objects [Q: why?]

    • Or when write operations incrementally update the objects

Bounded Staleness

  • Ensures that read results are not too out-of-date

  • Staleness defined by a period T, e.x. 5 mins or # of missing writes or the amount of inaccuracy in a data value

  • Guarantees that a read operation will return any values written more than T minutes ago or more recently written values

  • "Most natural concept for application developers"

Monotonic Reads / "Session Guarantee"

  • Applies to a sequence of read operations that are performed by a given storage system client

  • A client can read arbitrarily stale data, as with eventual consistency, but is guaranteed to observe a data store that is increasingly up-to-date over time

  • E.x. client issues a read operation and then later issues another read to the same object(s)

    • the second read will return the same value(s) or a more recently written value

Read My Writes

  • Also applies to a sequence of operations performed by a single client

  • Guarantees that the effects of all writes that were performed by the client are visible to the client’s subsequent reads

  • E.x. If a client writes a new value for a data object and then reads this object

    • the read will return the value that was last written by the client (or some other value that was later written by a different client)

  • For clients that issue no writes, the guarantees = eventual consistency

Summary

  • Last four read guarantees are all a form of eventual consistency, but stronger than what is typically provided in cloud storage system

    • None is stronger than any of the others

    • Applications can request multiple of these guarantees

    • E.x. a client could request both monotonic reads and read my writes so that it observes a data store that is consistent with its own actions

  • Strong consistency: desirable from consistency view point, but worst performance and availability

    • Generally requires reading from a designated primary site or from a majority of replicas

  • Eventual consistency: weakest consistency, but better performance and availability

    • Allows the clients to read from any replicas

  • Others

    • The “strength” of a consistency guarantee does not depend on when and how writes propagate between servers, but rather is defined by the size of the set of allowable results for a read operation

    • Smaller sets of possible read results indicate stronger consistency

Baseball as a Sample Application

  • Two teams' scores

    • Many of the scores returned by eventually consistent reads were never the actual score

    • Consistent prefix limits the result to scores the actually existed at some time

    • Bounded staleness depend on the desired bound

      • For bounded of 7 innings or more, result is the same for eventual consistency

    • Monotonic reads: possible values depends on what has been read in the past

    • Read my writes: depend on who is writing to the key-value store

Read Requirements for Participants

Official Scorekeeper

  • Task: responsible for maintaining the score of the game by writing it to the persistent key- value store

  • What consistency?

    • Read the most up-to-date previous score before adding one to produce the new score

      • Otherwise could write an incorrect score and undermining the game

    • He needs strongly consistent data, but he does not need to perform strong consistency reads

      • If there were multiple people playing the role of scorekeeper and taking turns updating the score, then they would need to perform reads that request strong consistency

      • The scorekeeper is the only person who updates the score

        • can request the read my writes guarantee and receive the same effect as a strong read

      • If scorekeeper changes in the middle of the game

        • the new scorekeeper could perform a strong read when he first updates the score

        • but can use read my writes for subsequent reads

      • Uses application-specific knowledge to obtain the benefits of a weaker consistency read without actually giving up any consistency

  • In real life

    • Strong consistency read

      • pessimistically assume that some client, anywhere in the world, may have just updated the data

      • system therefore must access a majority of servers (or a fixed set of servers) in order to ensure that the most recently written data is accessed by the submitted read operation

    • Read my writes

      • Simply needs to record the set of writes that were previously performed by the client and find some server that has seen all of these writes

      • In baseball games where the previous write was performed by the scorekeeper may have happened many minutes or even hours ago

        • Almost any server will have received the previous write and be able to answer the next read that requests the read my writes guarantee

        • [Q: essentially eventual consistency?]

Umpire

  • Task: officiates a baseball game from behind home plate

  • For the most part, does not really care about the current score of the game

  • Only care during the top half of the 9th inning (i.e. after the visiting team has batted and the home team is about to bat)

  • Never writes, only reads the values written by official scorekeeper

  • In order to receive up-to-date info, the umpire must perform strong consistency reads

    • Otherwise might end the game early

Radio Reporter

  • Task: periodically announce the scores of games that are in progress or have completed

    • E.g. every 30 mins

  • Broadcasts scores that are not completely up-to-date is okay

    • But don't want to report wrong scores

    • The reporter wants both his reads to be performed on a snapshot that hold a consistent prefix of the writes that were performed by the scorekeeper

    • But this is still not sufficient!

      • Could read a score of 2-5, the current score, then 30 mins later, read a score of 1-3

        • If reporter reads from a primary server and later reads from another server

      • [Q: need to understand this]

    • This can be avoided if the reporter requests the monotonic reads guarantee in addition to requesting a consistent prefix

      • [Q: why do I still need consistent prefix in this case?]

    • Or obtain the same effect as a monotonic read by requesting bounded staleness with a bound of less than 30 minutes

      • Since the reporter only reads data every 30 minutes, he must receive scores that are increasingly up-to-date

Sportswriter

  • Task: watches the game and later writes an article that appears in the morning paper or that is posted on some web site

  • Want the effect of strong consistency read (to report the final correct score), but he does not need to pay the cost

    • a bounded staleness read with a bound of one hour is sufficient to ensure that the sportswriter reads the final score

    • In practice, any server should be able to answer such a read

      • Eventual consistency: likely to be correct after an hour

      • But request bounded staleness is the only way to be 100% certain about it

Statistician

  • Task: keeping track of the season-long statistics for the team and for individual players

    • Sometime after each game has ended, adds the runs scored to the previous season total and writes this new value back into the data store

  • Requirements

    • Obtain the final score --> strong consistency read, or waits with bounded staleness

    • For reading the statistics for the reason, also wants strong consistency

      • Since she is the only person who writes statistics to the data store, she could use "read my writes" guarantee to get the latest value

Stat Watcher

  • Task: periodically check on the team’s season statistics

  • Requirements: are usually content with eventual consistency

    • Updated once per day, and numbers that are slightly out-of-date is okay

Conclusion

Take

  • All presented consistency guarantees are useful

  • Different clients may want different consistencies even when accessing the same data

  • Even simple databases may have diverse users with different consistency needs

  • Clients should be able to choose their desired consistency

    • The preferred consistency often depends on how the data is being used

    • knowledge of who writes data or when data was last written can sometimes allow clients to perform a relaxed consistency read

  • Others

    • Cloud providers should offer a wider range of read consistencies for better resource utilization and cost savings

PreviousDynamo: Amazon's Highly Available Key-value StoreNextManaging update conflicts in Bayou, a weakly connected replicated storage system

Last updated 2 years ago

Was this helpful?