Replicated Data Consistency Explained Through Baseball

Main Idea

  • Replication protocols often involve complex trade-offs between consistency, performance, and availability

  • Sheds light on the consistency models offered by cloud providers, what might be offered in the future, and why vendors are offering consistency choices


Consistency provided by cloud storage today

  • Azure: strongly consistent storage

    • Always see the latest value that was written for a data object

    • Reasonable to provide within a datacenter

    • But raises concern for geo-replicated storage

  • AWS S3: designed with weak consistency

    • Better performance and availability, but reads may return stale data

    • Eventually consistent: a read may be directed to a replica that has not yet received all the writes that were accepted by some other replica

  • AWS DynamoDB: both eventually consistent reads and strongly consistent reads

  • AWS SimpleDB: same choices for clients that read data

  • Google App Engine Datastore: eventually consistent reads + default strong consistency

  • Yahoo's PNUTS: read-any, read-critical, read-latest

  • Quorum-based storage systems: eventual + strong consistency (by selecting different read and write quorums)
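The quorum remark can be made concrete with a small sketch. With N replicas, a read quorum of size R and a write quorum of size W, every read quorum intersects every write quorum when R + W > N, which is the usual condition for strong reads; smaller quorums give eventually consistent reads. The function name and parameters below are illustrative and not taken from any particular system.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True if every read quorum of size r intersects every write quorum of size w."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("quorum sizes must be between 1 and n")
    return r + w > n


# Three replicas: majority reads and majority writes always overlap,
# so a read is guaranteed to reach a replica holding the latest write.
strong = quorums_overlap(n=3, r=2, w=2)

# Reading a single replica after a majority write may miss the latest write.
eventual = quorums_overlap(n=3, r=1, w=2)
```

Choosing R = 1 keeps reads fast and available at the cost of possibly stale results, which is the same trade-off the services above expose as a per-read choice.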

Consistency models proposed from research community

  • Trade-offs between consistency, performance, and availability

    • Availability is the likelihood of a read operation successfully returning suitably consistent data in the presence of server failures

    • Performance refers to the time it takes to complete a read operation, that is, the read latency

  • Two ends: strong and eventual

    • Stronger consistency: lower performance and reduced availability for reads or writes or both

Questions to cope with

  • Are different consistencies useful in practice?

  • Can application developers cope with eventual consistency?

  • Should cloud storage systems offer an even greater choice of consistency than the consistent and eventually consistent reads offered by some of today’s services?

Read Consistency Guarantees


  • Multiple clients concurrently access shared information by performing read and write operations on a data store

  • Data is replicated among a set of servers, but the details of the replication protocol are hidden from clients

  • Write: any operation that updates one or more data objects

    • Eventually received at all servers and performed in the same order

    • Order is consistent with order in which write operations are submitted by clients

    • In practice

      • the order could be enforced, even for concurrent writers

      • by performing all writes at a master server or by having servers run a consensus protocol to reach agreement on the global order

  • Read: return values of one or more data objects that were previously written

    • Not necessarily the latest values

    • Can request a consistency guarantee, which dictates the set of allowable return values

    • Each guarantee is defined by the set of previous writes whose results are visible to a read operation
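The write-ordering assumption above (every server eventually applies the same writes in the same order) can be sketched with a master that hands out sequence numbers. This is a minimal illustration under that assumption; the `Master` and `Replica` classes are hypothetical and stand in for whatever replication protocol the store actually hides from clients.

```python
class Master:
    """Assigns a global sequence number to every write (hypothetical helper)."""

    def __init__(self):
        self.next_seq = 1

    def order(self, key, value):
        seq = self.next_seq
        self.next_seq += 1
        return seq, key, value


class Replica:
    """Applies writes strictly in sequence order, buffering any that arrive early."""

    def __init__(self):
        self.data = {}
        self.applied = 0      # sequence number of the last write applied
        self.pending = {}     # seq -> (key, value) for writes received out of order

    def receive(self, seq, key, value):
        self.pending[seq] = (key, value)
        # Apply as many consecutive writes as are now available.
        while self.applied + 1 in self.pending:
            k, v = self.pending.pop(self.applied + 1)
            self.data[k] = v
            self.applied += 1


master = Master()
w1 = master.order("home", 1)
w2 = master.order("home", 2)

replica = Replica()
replica.receive(*w2)              # arrives early, so it is buffered rather than applied
early_state = dict(replica.data)  # still empty
replica.receive(*w1)              # now both writes are applied, in sequence order
```

Because a replica never skips a sequence number, its state always corresponds to some prefix of the global write order, which is what the read guarantees below build on.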

Strong Consistency

  • Guarantees that a read operation returns the value that was last written for a given object

  • If writes can modify or extend part of an object (e.g. appending to a log), the read returns the result of applying all writes to that object

Eventual Consistency

  • The weakest of the guarantees

  • Allows the greatest set of possible return values

  • The term derives from:

    • Each replica eventually receives each write operation

    • If clients stopped performing writes then read operations would eventually return an object’s latest value

Consistent Prefix (Similar to Snapshot Isolation)

  • A reader is guaranteed to observe an ordered sequence of writes starting with the first write to a data store

  • E.g. the read may be answered by a replica that receives writes in order from a master replica but has not yet received some recent writes

    • Reader sees a version of the data store that existed at the master at some time in the past

    • Similar to "snapshot isolation" consistency offered by many DBMS

  • For reads to a single data object in a system where write operations completely overwrite previous values of an object

    • Eventual consistency reads observe a consistent prefix

  • Main benefit

    • When reading multiple data objects [Q: why?]

    • Or when write operations incrementally update the objects

Bounded Staleness

  • Ensures that read results are not too out-of-date

  • Staleness is usually defined by a time period T (e.g. 5 minutes), but could also be defined by the number of missing writes or the amount of inaccuracy in a data value

  • Guarantees that a read operation will return any values written more than T minutes ago or more recently written values

  • "Most natural concept for application developers"
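One way to picture a time-based bound is as a filter over replicas. The sketch below assumes each replica can report the time up to which it has received all writes; the `synced_through` field and the replica names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    synced_through: float  # hypothetical: time (seconds) up to which all writes have arrived


def eligible_replicas(replicas, now, bound_seconds):
    """Names of replicas that hold every write older than the staleness bound."""
    return [r.name for r in replicas if now - r.synced_through <= bound_seconds]


replicas = [
    Replica("primary", synced_through=1000.0),
    Replica("nearby", synced_through=880.0),   # 2 minutes behind
    Replica("remote", synced_through=400.0),   # 10 minutes behind
]

within_5_min = eligible_replicas(replicas, now=1000.0, bound_seconds=300)
within_1_min = eligible_replicas(replicas, now=1000.0, bound_seconds=60)
```

A looser bound leaves more replicas eligible to answer, which is where the performance and availability benefit comes from.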

Monotonic Reads / "Session Guarantee"

  • Applies to a sequence of read operations that are performed by a given storage system client

  • A client can read arbitrarily stale data, as with eventual consistency, but is guaranteed to observe a data store that is increasingly up-to-date over time

  • E.g. a client issues a read operation and then later issues another read to the same object(s)

    • the second read will return the same value(s) or a more recently written value
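A client library could enforce monotonic reads by remembering the newest version it has returned so far. The sketch below assumes replicas apply writes in order, so a single version number summarizes how up-to-date a replica is; the `MonotonicSession` class and the `(version, value)` pairs are hypothetical.

```python
class MonotonicSession:
    """Tracks the newest version this client has read (hypothetical client-side helper)."""

    def __init__(self):
        self.last_seen_version = 0

    def read(self, replicas):
        """Return (version, value) from the first replica that is not older than past reads."""
        for version, value in replicas:
            if version >= self.last_seen_version:
                self.last_seen_version = version
                return version, value
        raise RuntimeError("no replica is fresh enough for this session")


session = MonotonicSession()

# The first read may be arbitrarily stale, as with eventual consistency.
first = session.read([(3, "v3")])

# Later reads skip any replica that is older than what was already observed.
second = session.read([(2, "v2"), (5, "v5")])
```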

Read My Writes

  • Also applies to a sequence of operations performed by a single client

  • Guarantees that the effects of all writes that were performed by the client are visible to the client’s subsequent reads

  • E.g. if a client writes a new value for a data object and then reads this object

    • the read will return the value that was last written by the client (or some other value that was later written by a different client)

  • For clients that issue no writes, the guarantee is the same as eventual consistency
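Read my writes can be sketched the same way: the session records the version of its own latest write and only accepts replicas that have reached it. As before, this assumes writes are applied in order so that a version number captures what a replica has seen; the class and method names are hypothetical.

```python
class ReadMyWritesSession:
    """Remembers the client's own writes (hypothetical client-side helper)."""

    def __init__(self):
        self.last_written_version = 0

    def record_write(self, version):
        self.last_written_version = max(self.last_written_version, version)

    def can_serve(self, replica_version):
        """A replica may answer only if it has seen all of this client's writes."""
        return replica_version >= self.last_written_version


writer = ReadMyWritesSession()
writer.record_write(7)
stale_ok_for_writer = writer.can_serve(6)   # replica has not seen write 7
fresh_ok_for_writer = writer.can_serve(7)

# A client that never writes places no constraint on the replica,
# which is why the guarantee degenerates to eventual consistency.
reader = ReadMyWritesSession()
any_replica_ok_for_reader = reader.can_serve(0)
```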


  • The last four read guarantees are all a form of eventual consistency, but stronger than what is typically provided in cloud storage systems

    • None is stronger than any of the others

    • Applications can request multiple of these guarantees

    • E.g. a client could request both monotonic reads and read my writes so that it observes a data store that is consistent with its own actions

  • Strong consistency: desirable from a consistency viewpoint, but worst performance and availability

    • Generally requires reading from a designated primary site or from a majority of replicas

  • Eventual consistency: weakest consistency, but better performance and availability

    • Allows the clients to read from any replicas

  • Others

    • The “strength” of a consistency guarantee does not depend on when and how writes propagate between servers, but rather is defined by the size of the set of allowable results for a read operation

    • Smaller sets of possible read results indicate stronger consistency

Baseball as a Sample Application

  • Two teams' scores

    • Many of the scores returned by eventually consistent reads were never the actual score

    • Consistent prefix limits the result to scores that actually existed at some time

    • Bounded staleness depends on the desired bound

      • For a bound of 7 innings or more, the result is the same as for eventual consistency

    • Monotonic reads: possible values depend on what has been read in the past

    • Read my writes: depends on who is writing to the key-value store
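The sets of scores each guarantee allows can be enumerated directly. The sketch below uses a hypothetical write log for a game that currently stands at 2-5; the log itself is an assumption made for illustration, and the two keys `"visitors"` and `"home"` are assumed to hold the two teams' run totals.

```python
from itertools import product

# Hypothetical sequence of writes by the scorekeeper (visitors 2, home 5 so far).
writes = [("visitors", 0), ("home", 0), ("home", 1), ("visitors", 1),
          ("home", 2), ("home", 3), ("visitors", 2), ("home", 4), ("home", 5)]


def state_after(prefix):
    """Score (visitors, home) after applying a prefix of the log in order."""
    state = {}
    for key, value in prefix:
        state[key] = value
    return (state["visitors"], state["home"])


# Strong consistency: only the state after every write.
strong = {state_after(writes)}

# Consistent prefix: any state that existed once both keys had been written.
prefix = {state_after(writes[:n]) for n in range(2, len(writes) + 1)}

# Eventual consistency: each key may independently return any value ever written to it.
visitor_values = {v for k, v in writes if k == "visitors"}
home_values = {v for k, v in writes if k == "home"}
eventual = set(product(visitor_values, home_values))

# Monotonic reads after previously reading 1-3: neither key may go backwards.
monotonic_after_1_3 = {(v, h) for v, h in eventual if v >= 1 and h >= 3}
```

Under this log, eventual consistency allows 18 combinations, including 2-0, which was never the score, while consistent prefix allows only the 8 scores that actually occurred. The nesting `strong <= prefix <= eventual` shows smaller result sets corresponding to stronger guarantees.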

Read Requirements for Participants

Official Scorekeeper

  • Task: responsible for maintaining the score of the game by writing it to the persistent key-value store

  • What consistency?

    • Read the most up-to-date previous score before adding one to produce the new score

      • Otherwise he could write an incorrect score and undermine the game

    • He needs strongly consistent data, but he does not need to perform strong consistency reads

      • If there were multiple people playing the role of scorekeeper and taking turns updating the score, then they would need to perform reads that request strong consistency

      • The scorekeeper is the only person who updates the score

        • can request the read my writes guarantee and receive the same effect as a strong read

      • If scorekeeper changes in the middle of the game

        • the new scorekeeper could perform a strong read when he first updates the score

        • but can use read my writes for subsequent reads

      • Uses application-specific knowledge to obtain the benefits of a weaker consistency read without actually giving up any consistency

  • In real life

    • Strong consistency read

      • pessimistically assume that some client, anywhere in the world, may have just updated the data

      • system therefore must access a majority of servers (or a fixed set of servers) in order to ensure that the most recently written data is accessed by the submitted read operation

    • Read my writes

      • Simply needs to record the set of writes that were previously performed by the client and find some server that has seen all of these writes

      • In a baseball game, the scorekeeper's previous write may have happened many minutes or even hours ago

        • Almost any server will have received the previous write and be able to answer the next read that requests the read my writes guarantee

        • [Q: essentially eventual consistency?]
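The scorekeeper's read-then-increment step shows why his read must see the latest score. The sketch below models the store as plain dictionaries; the `record_run` helper and the two copies of the data are hypothetical and only illustrate the failure mode.

```python
def record_run(read, write, team):
    """Scorekeeper's update: read the previous score, add one, write it back."""
    write(team, read(team) + 1)


primary = {"home": 3}
stale_replica = {"home": 2}   # has not yet received the latest write

# Reading an up-to-date copy produces the correct new score.
record_run(primary.get, primary.__setitem__, "home")
correct = primary["home"]

# Reading a stale copy would overwrite the score with a wrong value.
primary_after_stale_read = {"home": 3}
record_run(stale_replica.get, primary_after_stale_read.__setitem__, "home")
wrong = primary_after_stale_read["home"]
```

Since the scorekeeper is the only writer, a read that is guaranteed to include his own writes is always an up-to-date read, which is why read my writes is enough here.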


Umpire

  • Task: officiates a baseball game from behind home plate

  • For the most part, does not really care about the current score of the game

  • Only cares after the top half of the 9th inning (i.e. after the visiting team has batted and the home team is about to bat): if the home team is ahead, the game is over

  • Never writes, only reads the values written by official scorekeeper

  • In order to receive up-to-date info, the umpire must perform strong consistency reads

    • Otherwise might end the game early
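The umpire's check is small enough to write out. The scores below are made up for illustration; the point is only that the same decision function gives a wrong answer when fed a stale read.

```python
def home_team_has_won(visitors: int, home: int) -> bool:
    """After the top of the 9th, the game ends if the home team is ahead."""
    return home > visitors


# Hypothetical situation: the visitors have just scored four runs in the top of the 9th.
actual = {"visitors": 6, "home": 5}
stale = {"visitors": 2, "home": 5}    # a replica that missed those writes

ends_with_strong_read = home_team_has_won(actual["visitors"], actual["home"])
ends_with_stale_read = home_team_has_won(stale["visitors"], stale["home"])
```

With the stale values the umpire would end the game even though the home team is actually behind and still has to bat.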

Radio Reporter

  • Task: periodically announce the scores of games that are in progress or have completed

    • E.g. every 30 mins

  • Broadcasting scores that are not completely up-to-date is okay

    • But don't want to report wrong scores

    • The reporter wants both of his reads (one per team's score) to be performed on a snapshot that holds a consistent prefix of the writes that were performed by the scorekeeper

    • But this is still not sufficient!

      • He could read a score of 2-5 (the current score) and then, 30 minutes later, read a score of 1-3

        • This can happen if the reporter first reads from the primary server and later reads from another server that lags behind

      • [Q: need to understand this]

    • This can be avoided if the reporter requests the monotonic reads guarantee in addition to requesting a consistent prefix

      • [Q: why do I still need consistent prefix in this case?]

    • Or obtain the same effect as a monotonic read by requesting bounded staleness with a bound of less than 30 minutes

      • Since the reporter only reads data every 30 minutes, he must receive scores that are increasingly up-to-date
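The argument that a staleness bound shorter than the read interval gives the effect of monotonic reads is simple arithmetic. The function below is an illustrative sketch of that reasoning, with times in minutes; its name and parameters are not from the source.

```python
def staleness_implies_monotonic(read_interval_min: float, bound_min: float) -> bool:
    """Sufficient condition: the bound is strictly shorter than the gap between reads.

    A read at time t returns data no newer than t. The next read, at
    t + read_interval_min, returns data no older than
    t + read_interval_min - bound_min, which is later than t when this holds.
    """
    return bound_min < read_interval_min


# Reporter reads every 30 minutes.
ok = staleness_implies_monotonic(read_interval_min=30, bound_min=29)
not_guaranteed = staleness_implies_monotonic(read_interval_min=30, bound_min=45)
```

With a 45-minute bound the second read could legally return a snapshot older than the one already broadcast, so the bound alone no longer rules out the 2-5 then 1-3 anomaly.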


Sportswriter

  • Task: watches the game and later writes an article that appears in the morning paper or that is posted on some web site

  • Wants the effect of a strong consistency read (to report the correct final score), but does not need to pay the cost

    • a bounded staleness read with a bound of one hour is sufficient to ensure that the sportswriter reads the final score

    • In practice, any server should be able to answer such a read

      • Eventual consistency: likely to be correct after an hour

      • But requesting bounded staleness is the only way to be 100% certain about it


Statistician

  • Task: keeping track of the season-long statistics for the team and for individual players

    • Sometime after each game has ended, adds the runs scored to the previous season total and writes this new value back into the data store

  • Requirements

    • To obtain the final score: a strong consistency read, or wait and use a bounded staleness read

    • For reading the statistics for the season, also wants strong consistency

      • Since she is the only person who writes statistics to the data store, she could use "read my writes" guarantee to get the latest value

Stat Watcher

  • Task: periodically check on the team’s season statistics

  • Requirements: usually content with eventual consistency

    • Statistics are updated once per day, and numbers that are slightly out-of-date are okay



Conclusions

  • All presented consistency guarantees are useful

  • Different clients may want different consistencies even when accessing the same data

  • Even simple databases may have diverse users with different consistency needs

  • Clients should be able to choose their desired consistency

    • The preferred consistency often depends on how the data is being used

    • Knowledge of who writes data or when data was last written can sometimes allow clients to perform a relaxed consistency read

  • Others

    • Cloud providers should offer a wider range of read consistencies for better resource utilization and cost savings
