Managing update conflicts in Bayou, a weakly connected replicated storage system

https://people.cs.umass.edu/~mcorner/courses/691M/papers/terry.pdf

Bayou is a replicated, weakly consistent storage system designed for a mobile computing environment with less than ideal network connectivity
- Model: read and write to any replica without the need for explicit coordination with other replicas
- Exploit domain-specific knowledge to achieve automatic conflict resolution at the granularity of individual update operations

Non-real-time collaborative applications, such as shared calendar, mail, and bibliographic databases

Data collection: replicated in full at a number of servers
Client: applications running as clients interact with the servers through Bayou API
Two basic operations: read and write
- Read: queries over a data collection
- Write: insert, modify, and delete a number of data items in a collection
  - Contains a typical FS write or DB update, and carries info that let server decide if there is a conflict, how to fix it
  - Contains a globally unique writeID
Weakly consistent replication model: read-any / write-any style of access
Session guarantees: switching between servers is possible, reduce client-observed inconsistencies when accessing different servers
- Basically this ensures that the effects of any writes made within a session are visible to reads within that session
Storage system at each Bayou server
- Ordered log of writes
- Data resulting from execution of these writes
- Servers propagate writes among themselves during pair-wise contacts, called anti-entropy sessions (i.e. servers agree on set of Bayou writes they have seen, and the order in which to perform them)
- Will eventually converge (by epidemic algorithms), but rate depends on network connectivity, frequency of anti-entropy, and policies by which servers select anti-entropy partners

Eventual consistency
- All servers eventually receive all writes via pair-wise anti-entropy process and two servers holding the same set of writes will have the same data contents
Two features
- Writes are performed in the same, well-defined order at all servers
  - As write is accpeted by Bayou server, it is deemed tentative
  - Tentative writes are ordered according to timestamps assigned to them by their accepting servers
    Timestamp: monotonically increasing at each server
    Logical clocks to timestamp new writes
    Generally synchronized with its real-time system clock, special circumstances which it advances its logical clock when writes are received during anti-entropy
  - Eventually each write is committed
    committed writes are ordered according to the times at which they commit and before any tentative writes
- Conflict detection and merge procedures are deterministic so that servers resolve the same conflicts in the same manner
  - Maintains a log of all write operations, sorted by committed / tentative timestamps
  - Merge procedures produce consistent update, fail deterministically

A write is said to be stable at a server when it has been executed for the last time by that server
- When the set of Writes that procede it in the server's Write log is fixed (i.e. the server has already received and executed any Writes that could possibly be ordered before the given Write)
API for inquiring about the stability of a specific Write
How to detect stable on the server?
- A1: when it has a lower timestamp than all servers' clocks
  - Con: a server that remains disconnected can prevent Writes from stablizing
- Bayou: commit procedure
  - A write becomes stable when it explicitly commits
  - Primary commit scheme: one server designated as the primary takes responsibility for committing updates
    Which Writes have committed + in which order they were committed are propagated to other servers during anti-entropy
    Better than 2-phase-commit: it alleviates the need to gather a majority quorum of servers
  - Readily accomodate primary unavailability
  - Cannot ensure that order in which the writes are committed is consistent with the tentative order indicated by their timestamps

Space-efficient write logging, efficient undo/redo of write operations, separate views of committed and tentative data, support for server-to-server anti-entropy
Main components: the Write Log, the Tuple Store, and the Undo Log
Security
- Execute each merge procedure within a secure environment in which the only allowable external actions are reading and writing data using the access crednetials of the user who submitted the conflicting Write

Ref: https://www.scs.stanford.edu/nyu/01fa/notes/l16d.txt
How does Bayou agree on the total order of committed writes?
- One designated "primary replica"
- It marks each write it receives with a permanent CSN (commit sequence number), that write is cimmitted, so a complete timestamp is <CSN, local-TS, node-id>
- CSN notifications propagate with updates using the anti-entropy algorithm
- The CSNs define a total order for committed writes
  - All nodes will eventually agree on it
  - Uncommitted writes come after all committed writes (infinite CSN)
Can primary replica choose any order to commit?
- Total order must preserve order of writes originated at each node, but not necessarily order among different nodes' writes

Last updated 2 years ago

Was this helpful?