Managing update conflicts in Bayou, a weakly connected replicated storage system
https://people.cs.umass.edu/~mcorner/courses/691M/papers/terry.pdf
Main Insight
Bayou is a replicated, weakly consistent storage system designed for a mobile computing environment with less than ideal network connectivity
Model: read and write to any replica without the need for explicit coordination with other replicas
Exploit domain-specific knowledge to achieve automatic conflict resolution at the granularity of individual update operations
Target applications
Non-real-time collaborative applications, such as shared calendar, mail, and bibliographic databases
Basic system models
Data collection: replicated in full at a number of servers
Client: applications running as clients interact with the servers through Bayou API
Two basic operations: read and write
Read: queries over a data collection
Write: insert, modify, and delete a number of data items in a collection
Contains a typical FS write or DB update, and carries info that let server decide if there is a conflict, how to fix it
Contains a globally unique writeID
Weakly consistent replication model: read-any / write-any style of access
Session guarantees: switching between servers is possible, reduce client-observed inconsistencies when accessing different servers
Basically this ensures that the effects of any writes made within a session are visible to reads within that session
Storage system at each Bayou server
Ordered log of writes
Data resulting from execution of these writes
Servers propagate writes among themselves during pair-wise contacts, called anti-entropy sessions (i.e. servers agree on set of Bayou writes they have seen, and the order in which to perform them)
Will eventually converge (by epidemic algorithms), but rate depends on network connectivity, frequency of anti-entropy, and policies by which servers select anti-entropy partners
Conflict detection and resolution
Accomodate application semantics
Goal: support application-specific notion of conflict, and policy for resolving conflicts
Two mechanisms (general and flexible)
Dependency checks
Each write operation includes a dependency check consisting of an application-supplied query and its expected result; it is the pre-condiction for performing the update
[Q: wouldn't this result in a very large overhead? sometimes not all applications know what they want; are there classifications on application tasks, or auto-generation on the dependencies?]
Can be used for detecting W-W conflicts and R-W conflicts
Can enforce arbitrary, multi-item integrity constraints on the data
Merge procedures
Once a conflict is detected, a merge procedure is run by Bayou server in attempt for resolving conflict
Logic written by application programmers
With each write operation: detect, merge, apply revised updates are performed atomically at each server
[Q: again, the question of overhead in terms of having to specifying this per write operation?]
When automatic resolution is not possible, the merge procedure will still run to completion and produce a revised update that log the detected conflicts
Allow replicas to always remain accessible (continue to read previously written data and issue new writes)
Replica Consistency
Eventual consistency
All servers eventually receive all writes via pair-wise anti-entropy process and two servers holding the same set of writes will have the same data contents
Two features
Writes are performed in the same, well-defined order at all servers
As write is accpeted by Bayou server, it is deemed tentative
Tentative writes are ordered according to timestamps assigned to them by their accepting servers
Timestamp: monotonically increasing at each server
Logical clocks to timestamp new writes
Generally synchronized with its real-time system clock, special circumstances which it advances its logical clock when writes are received during anti-entropy
Eventually each write is committed
committed writes are ordered according to the times at which they commit and before any tentative writes
Conflict detection and merge procedures are deterministic so that servers resolve the same conflicts in the same manner
Maintains a log of all write operations, sorted by committed / tentative timestamps
Merge procedures produce consistent update, fail deterministically
Write Stability and Commitment
A write is said to be stable at a server when it has been executed for the last time by that server
When the set of Writes that procede it in the server's Write log is fixed (i.e. the server has already received and executed any Writes that could possibly be ordered before the given Write)
API for inquiring about the stability of a specific Write
How to detect stable on the server?
A1: when it has a lower timestamp than all servers' clocks
Con: a server that remains disconnected can prevent Writes from stablizing
Bayou: commit procedure
A write becomes stable when it explicitly commits
Primary commit scheme: one server designated as the primary takes responsibility for committing updates
Which Writes have committed + in which order they were committed are propagated to other servers during anti-entropy
Better than 2-phase-commit: it alleviates the need to gather a majority quorum of servers
Readily accomodate primary unavailability
Cannot ensure that order in which the writes are committed is consistent with the tentative order indicated by their timestamps
Storage System Implementation Issues
Space-efficient write logging, efficient undo/redo of write operations, separate views of committed and tentative data, support for server-to-server anti-entropy
Main components: the Write Log, the Tuple Store, and the Undo Log
Security
Execute each merge procedure within a secure environment in which the only allowable external actions are reading and writing data using the access crednetials of the user who submitted the conflicting Write
Some reads
How does Bayou agree on the total order of committed writes?
One designated "primary replica"
It marks each write it receives with a permanent CSN (commit sequence number), that write is cimmitted, so a complete timestamp is <CSN, local-TS, node-id>
CSN notifications propagate with updates using the anti-entropy algorithm
The CSNs define a total order for committed writes
All nodes will eventually agree on it
Uncommitted writes come after all committed writes (infinite CSN)
Can primary replica choose any order to commit?
Total order must preserve order of writes originated at each node, but not necessarily order among different nodes' writes
Last updated