# Distributed Snapshots: Determining Global States of Distributed Systems

### Reference review:

* <https://matt33.com/2019/10/27/paper-chandy-lamport/>
* Some examples that worth visit again: <https://www.cs.princeton.edu/courses/archive/fall16/cos418/docs/P8-chandy-lamport.pdf>&#x20;

### Main idea

* Problem: detecting global states&#x20;
  * Which in terms help solve: "stable property detection" problem, e.g. "computation has terminated", "system is deadlocked", "all tokens in a token ring has disappeared"&#x20;
  * Can also be used for checkpointing&#x20;
* Propose an algorithm by which a process in a distributed system determines a global state of the system during a computation&#x20;

### Problem setup&#x20;

* A process can record its own state and the messages it sends and receives, nothing else&#x20;
  * No share clocks or memory&#x20;
* To determine a global system state, a process p must enlist the cooperation of other processes that must record their own local states and send the recorded local states to p&#x20;
* Problem: algorithms by which processes record their own states and the states of communication channels so that the set of processes and channel states recorded from a global system state&#x20;
  * MUST run concurrently with, but not alter the underlying computation&#x20;
    * Don't stop sending the messages, don't stop the application&#x20;
* Example: taking photographies of migrating birds&#x20;
* Define a class of problem&#x20;

  * $$y(S)= true / false$$ for a global state S of D (distributed system)&#x20;
    * y is a stable property of D if $$y(S)$$ implies $$y(S')$$for all global states $$S'$$ of D reachable from global state $$S$$ of D&#x20;
    * i.e. if y is a stable property and y is true at a point in a computation of D, then y is true at all later points in that computation&#x20;

  <figure><img src="https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FI72m9Mpk6MrYeA0iixb2%2Fimage.png?alt=media&#x26;token=e1c49d8e-3d63-43dd-8be2-f1b41966e19e" alt=""><figcaption><p>A stable property </p></figcaption></figure>
* The problem of initiating the next phase of computation is not considered here&#x20;

### Model of a distributed system

* Labeled, directed graphs&#x20;
  * Vertices: processes&#x20;
    * Defined by a set of states, an initial state, and a set of events&#x20;
    * An event $$e$$ in a process $$p$$ is an atomic action that may change the state of $$p$$ itself and the state of at most one channel $$c$$ incident on $$p$$
      * Event $$e$$ is defined by <img src="https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FY6BAxiBQ7W6pS7TXS19B%2Fimage.png?alt=media&#x26;token=5bcf2a7b-f1a7-4feb-9a1f-859a43770230" alt="" data-size="line">
        * 1\) The process p in which the event occurs&#x20;
        * 2\) The state s of p immediately before the event&#x20;
        * 3\) The state s' of p immediately after the evetn&#x20;
        * 4\) The channel c (if any) whose state is altered by the event&#x20;
        * 5\) The message M, if any, sent along c (if c is a channel directed away from p) or received along c (if c is directed towards p)&#x20;
      * M and c are a special symbol, $$null$$, if the occurrence of e does not change the state of any channel&#x20;
      * Event can occur in global state $$S$$ if and only if&#x20;
        * 1\) The state of process p in global state S is s&#x20;
        * and 2) if c is a channel directed towards p, then the state of c in global state S is a sequence of messages with M at its head&#x20;
  * Edges: channels&#x20;
    * State = sequence of messages sent along the channel, excluding the messages received along the channel&#x20;
* Assumption&#x20;
  * Channel: infinite buffers, error-free, delivery messages in the order sent&#x20;

    <figure><img src="https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FypNZCVFuJiSk0DXN2T9B%2Fimage.png?alt=media&#x26;token=6409ee31-fd86-41ce-9745-9e77a5c6349d" alt=""><figcaption></figcaption></figure>
* **A global state is a set of component process and channel states**&#x20;
  * Initial global state: state of each process is its initial state and the state of each channel is the empty sequence&#x20;
  * $$next(S, e)$$= global state immediately after the occurrence of event e in global state S&#x20;
  * <img src="https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FNmK8GhqC8sF47lVLgCXU%2Fimage.png?alt=media&#x26;token=969b6ea5-4a54-4b87-bdbb-2cb7b6169224" alt="" data-size="line">= a sequence of events in component processes&#x20;
    * ***computation of the system*** iff $$e\_i$$ can occur in global state $$S$$
      * ![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FoYePLLFAmcb859pKNZZG%2Fimage.png?alt=media\&token=224d86db-324c-4109-828e-73f352cc0764)
* Example 1&#x20;
  * ![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FJhTf4hwD8hu0jqUDAcY7%2Fimage.png?alt=media\&token=3a210544-19fd-4342-851f-09ee24aec663)
  * State transition illustration (four states: in-p, in-c, in-q, in-c')&#x20;

    <figure><img src="https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2Fusli82X7t4HhMp7X4duj%2Fimage.png?alt=media&#x26;token=c503a734-a576-4590-98fb-3143e68356b9" alt=""><figcaption></figcaption></figure>

### Algorithm&#x20;

* Motivation, outline, termination&#x20;

#### Motivation&#x20;

* How do global state recording algorithms work?&#x20;
  * **Each process**: record its state
  * Two processes that **a channel** is incident on cooperate in recording the channel state&#x20;
  * Require the recorded process and channel states form **a "meaningful" global system state**&#x20;
* **Example of transition**&#x20;
  * One scenario: inconsistency arises because the state of p is recorded before p sent a message along c and the state of c is recorded after p sent the message&#x20;
    * <mark style="color:purple;">n = number of msgs sent along c before p's state is recorded</mark>
      * <mark style="color:purple;">在 p 的状态记录之前，p 发往channel c的msg数</mark>
    * <mark style="color:purple;">n' = number of msgs sent along c before c's state is recorded</mark>&#x20;
      * <mark style="color:purple;">在 c 的状态记录之前，p 发往channel c的msg数</mark>
    * the recorded global state may be inconsistent if n < n'&#x20;
      * 两个token的情况: p的状态记录在in-p中，其他的状态记录在in-c中
      * n = 0, n' = 1?&#x20;
  * Another scenario: state of c is recorded in global state in-p, the system then transits to global state in-c, and the states of c', p, and q are recorded in global state in-c
    * The recorded global state shows no token in the system&#x20;
    * Example suggests that the recorded global state may be inconsistent if the state of c is recorded before p sends a message along c and the state of p is recorded after p sends a message along c, that is, n > n'&#x20;
  * <mark style="color:red;">**Thus, a consistent global state requires (1)**</mark> $$n=n'$$
* Definition&#x20;
  * <mark style="color:green;">Let m be the number of messages received along c before q's state is recorded.</mark>&#x20;
  * <mark style="color:green;">Let m' be the number of messages received along c before c's state is recorded</mark>&#x20;
  * (2) $$m=m'$$: 与之前的分析相同&#x20;
  * (3) $$n'\geq m'$$
    * I.e. # of msgs received along a channel cannot exceed the # of msgs sent along that channel in every state&#x20;
  * (4) $$n\ge m$$
  * 理解
    * Channel要记录的state = list of msgs sent along the channel before the sender's state is recorded - list of msgs received along the channel before the receiver's state is recorded&#x20;
      * 需要example理解这个内容
* <mark style="background-color:red;">(1) - (4) suggests a simple algorithm by which</mark> <mark style="background-color:red;"></mark><mark style="background-color:red;">**q can record the state of channel c**</mark>&#x20;
  * Process p sends a special message, called a **marker**, after the nth message it sends along c&#x20;
  * **The state of c** is <mark style="color:orange;">the sequence of messages received by q after q records its own state and before q receives the marker along c</mark>&#x20;
  * To ensure (4), q must record its state, if it has not done so already, after receiving a marker along c and before q receives further messages along c&#x20;

#### Algorithm&#x20;

* <mark style="color:red;">Initiation</mark>&#x20;
  * can be done by one or more processes, each of which records its state spontaneously, without receiving markers from other processes&#x20;
* <mark style="color:red;">Marker-sending rule</mark> for a process p: for each channel c, incident on, and directed away from p
  * p sends one marker along c after p records its state and before p sends further messages along c&#x20;
  * Or&#x20;
    * p records its state and prepares a special marker message (differ from the application message)
    * p sends the marker message to all other processes (using N-1 outbound channels), in this case, only q
    * Start recording all incoming messages from channels $$C\_{ji}$$for j not equal to i
  * <mark style="color:red;">Marker-receiving rule</mark> for a process q: on receiving a marker along a channel c

    <figure><img src="https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FTLeViNQMCaXDtZ1FlCJE%2Fimage.png?alt=media&#x26;token=050af98e-5c94-4e80-89c6-585c26bc8509" alt=""><figcaption></figcaption></figure>

    * If not receive the marker message for the first time&#x20;
      * Add all messages from inbound channels since we began recording to their states&#x20;
* Termination of the algorithm&#x20;
  * Need to ensure that&#x20;
    * (L1) no marker remains forever in an incident input channel&#x20;
    * (L2) it records its states within finite time of initiation of the algorithm&#x20;
  * Rules&#x20;
    * All processes have received a marker (and recorded their own state)
    * All processes have received a marker on the N - 1 incoming channels (and recorded their states)
    * Later, a central server can gather the partial state to build a global snapshot&#x20;

#### Proof and properties ref. paper

* Casual consistency&#x20;
  * Related to the Lamport clock partial ordering
  * An event is presnapshot if it occurs before the local snapshot on a process
  * Postsnapshot if afterwards
  * If event A happens casually before event B, and B is pre-snapshot, then A is too&#x20;

#### Some questions

* Also assume no failures on the channels? All messages arrive, no duplicate, in order and such?&#x20;
  * Too strong assumptions could reduce the deployability of the algorithm? &#x20;
* What if the markers are lost? Retransmission? In-correct ordering?&#x20;
* What if the topology is dynamically changing&#x20;
* How fast does it take to form a global state? (with all the mechanisms of snapshotting, collecting, and putting together states from different components)&#x20;
* Centralized controller seem to be more easy to implement and reason about&#x20;
