Managing update conflicts in Bayou, a weakly connected replicated storage system

https://people.cs.umass.edu/~mcorner/courses/691M/papers/terry.pdf

Main Insight

  • Bayou is a replicated, weakly consistent storage system designed for a mobile computing environment with less than ideal network connectivity

    • Model: read and write to any replica without the need for explicit coordination with other replicas

    • Exploit domain-specific knowledge to achieve automatic conflict resolution at the granularity of individual update operations

Target applications

  • Non-real-time collaborative applications, such as shared calendar, mail, and bibliographic databases

Basic system models

  • Data collection: replicated in full at a number of servers

  • Client: applications running as clients interact with the servers through Bayou API

  • Two basic operations: read and write

    • Read: queries over a data collection

    • Write: insert, modify, and delete a number of data items in a collection

      • Contains a typical FS write or DB update, and carries info that let server decide if there is a conflict, how to fix it

      • Contains a globally unique writeID

  • Weakly consistent replication model: read-any / write-any style of access

  • Session guarantees: switching between servers is possible, reduce client-observed inconsistencies when accessing different servers

    • Basically this ensures that the effects of any writes made within a session are visible to reads within that session

  • Storage system at each Bayou server

    • Ordered log of writes

    • Data resulting from execution of these writes

    • Servers propagate writes among themselves during pair-wise contacts, called anti-entropy sessions (i.e. servers agree on set of Bayou writes they have seen, and the order in which to perform them)

    • Will eventually converge (by epidemic algorithms), but rate depends on network connectivity, frequency of anti-entropy, and policies by which servers select anti-entropy partners

Conflict detection and resolution

  • Accomodate application semantics

    • Goal: support application-specific notion of conflict, and policy for resolving conflicts

    • Two mechanisms (general and flexible)

      • Dependency checks

        • Each write operation includes a dependency check consisting of an application-supplied query and its expected result; it is the pre-condiction for performing the update

          • [Q: wouldn't this result in a very large overhead? sometimes not all applications know what they want; are there classifications on application tasks, or auto-generation on the dependencies?]

        • Can be used for detecting W-W conflicts and R-W conflicts

        • Can enforce arbitrary, multi-item integrity constraints on the data

      • Merge procedures

        • Once a conflict is detected, a merge procedure is run by Bayou server in attempt for resolving conflict

        • Logic written by application programmers

        • With each write operation: detect, merge, apply revised updates are performed atomically at each server

          • [Q: again, the question of overhead in terms of having to specifying this per write operation?]

        • When automatic resolution is not possible, the merge procedure will still run to completion and produce a revised update that log the detected conflicts

        • Allow replicas to always remain accessible (continue to read previously written data and issue new writes)

Replica Consistency

  • Eventual consistency

    • All servers eventually receive all writes via pair-wise anti-entropy process and two servers holding the same set of writes will have the same data contents

  • Two features

    • Writes are performed in the same, well-defined order at all servers

      • As write is accpeted by Bayou server, it is deemed tentative

      • Tentative writes are ordered according to timestamps assigned to them by their accepting servers

        • Timestamp: monotonically increasing at each server

        • Logical clocks to timestamp new writes

          • Generally synchronized with its real-time system clock, special circumstances which it advances its logical clock when writes are received during anti-entropy

      • Eventually each write is committed

        • committed writes are ordered according to the times at which they commit and before any tentative writes

    • Conflict detection and merge procedures are deterministic so that servers resolve the same conflicts in the same manner

      • Maintains a log of all write operations, sorted by committed / tentative timestamps

      • Merge procedures produce consistent update, fail deterministically

Write Stability and Commitment

  • A write is said to be stable at a server when it has been executed for the last time by that server

    • When the set of Writes that procede it in the server's Write log is fixed (i.e. the server has already received and executed any Writes that could possibly be ordered before the given Write)

  • API for inquiring about the stability of a specific Write

  • How to detect stable on the server?

    • A1: when it has a lower timestamp than all servers' clocks

      • Con: a server that remains disconnected can prevent Writes from stablizing

    • Bayou: commit procedure

      • A write becomes stable when it explicitly commits

      • Primary commit scheme: one server designated as the primary takes responsibility for committing updates

        • Which Writes have committed + in which order they were committed are propagated to other servers during anti-entropy

        • Better than 2-phase-commit: it alleviates the need to gather a majority quorum of servers

      • Readily accomodate primary unavailability

      • Cannot ensure that order in which the writes are committed is consistent with the tentative order indicated by their timestamps

Storage System Implementation Issues

  • Space-efficient write logging, efficient undo/redo of write operations, separate views of committed and tentative data, support for server-to-server anti-entropy

  • Main components: the Write Log, the Tuple Store, and the Undo Log

  • Security

    • Execute each merge procedure within a secure environment in which the only allowable external actions are reading and writing data using the access crednetials of the user who submitted the conflicting Write

Some reads

  • How does Bayou agree on the total order of committed writes?

    • One designated "primary replica"

    • It marks each write it receives with a permanent CSN (commit sequence number), that write is cimmitted, so a complete timestamp is <CSN, local-TS, node-id>

    • CSN notifications propagate with updates using the anti-entropy algorithm

    • The CSNs define a total order for committed writes

      • All nodes will eventually agree on it

      • Uncommitted writes come after all committed writes (infinite CSN)

  • Can primary replica choose any order to commit?

    • Total order must preserve order of writes originated at each node, but not necessarily order among different nodes' writes

Last updated