# Recap

### Distributed Systems

* Definition: more than one machine working together to solve a problem
* Examples:
  * client / server: web server and web client
  * cluster: PageRank computation
* Why go distributed?
  * More computing power, more storage capacity, fault tolerance, data sharing
* New challenges
  * System failures: need to worry about partial failure
  * Communication failure: links unreliable
    * bit errors
    * packet loss
    * node/link failure
  * Why are network sockets less reliable than pipes?
* Communication overview
  * Raw messages: UDP
  * Reliable messages: TCP
  * Remote procedure call: RPC
* **Raw messages: UDP**
  * UDP: user datagram protocol
  * API:
    * Reads and writes over socket file descriptors
      * Socket: a communication endpoint that you can name and address in a network
    * Messages are sent from / to ports to target a specific process on a machine
  * Provides minimal reliability features
    * Messages may be lost
    * Messages may be reordered
    * Messages may be duplicated
  * Only protection: checksums to detect corrupted data
  * Advantages
    * Lightweight
    * Some applications make better reliability decisions themselves (e.g., video conferencing programs)
  * Disadvantages
    * More difficult to write applications correctly
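
The UDP API described above can be sketched with Python's standard `socket` module. This is a minimal illustration, not a production pattern: the helper name `send_datagram_locally`, the loopback address, and the one-second timeout are assumptions chosen for the example. It sends a single datagram to a port on the same machine and shows that the application, not UDP, has to decide what to do when nothing arrives.

```python
import socket
from typing import Optional


def send_datagram_locally(payload: bytes, timeout_s: float = 1.0) -> Optional[bytes]:
    """Send one UDP datagram over loopback and return what the receiver got.

    Returns None if nothing arrives before the timeout. UDP never reports a
    lost datagram, so noticing the loss is the application's job.
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 asks the OS for any free port. The (host, port) pair is what
        # targets a specific process on the machine.
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(timeout_s)
        address = receiver.getsockname()

        # No connection setup and no ACK: sendto() returns as soon as the
        # datagram has been handed to the OS.
        sender.sendto(payload, address)

        try:
            data, _sender_address = receiver.recvfrom(65535)
            return data
        except socket.timeout:
            return None
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(send_datagram_locally(b"hello"))
```

On loopback the datagram will almost always arrive, but nothing in the API guarantees it; over a real network the `None` branch is the case the application must handle.
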
* Reliable messages: layering strategy
  * **TCP: Transmission Control Protocol**
  * Using software to build reliable logical connections over unreliable physical connections
  * Techniques
    * ACK: sender knows the message was received
    * TIMEOUT: how long to wait?
      * Too long: system feels unresponsive
      * Too short: messages are re-sent needlessly. If the original was dropped because the server is overloaded, resending makes the overload worse!
    * Sequence numbers
      * Sender gives each message a unique, increasing sequence number
      * Receiver knows when it has seen all messages before N
    * Buffer messages so they are delivered in order; make timeouts adaptive
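
The techniques above (ACKs, retransmission after a timeout, sequence numbers, and buffering for in-order delivery) can be sketched as a small in-memory simulation. This is not how TCP is implemented; it is a simplified stop-and-wait model. The names `InOrderReceiver`, `send_reliably`, and `drop_first_attempt` are hypothetical, and a "timeout" is modeled as an attempt whose message never reaches the receiver.

```python
class InOrderReceiver:
    """Delivers messages in sequence-number order and drops duplicates."""

    def __init__(self):
        self.next_expected = 0  # every message before this number has been seen
        self.buffer = {}        # out-of-order messages waiting for a gap to fill
        self.delivered = []     # what the application sees, in order

    def on_message(self, seq, payload):
        """Handle one arriving message and return the cumulative ACK number."""
        if seq >= self.next_expected:
            # setdefault keeps the first copy, so duplicates are ignored.
            self.buffer.setdefault(seq, payload)
        # Release every buffered message that is now contiguous.
        while self.next_expected in self.buffer:
            self.delivered.append(self.buffer.pop(self.next_expected))
            self.next_expected += 1
        return self.next_expected


def send_reliably(messages, receiver, drop_first_attempt=frozenset(), max_attempts=5):
    """Stop-and-wait sender: resend each message until an ACK covers it.

    drop_first_attempt lists sequence numbers whose first transmission is
    'lost', which the sender experiences as a timeout followed by a resend.
    Returns the number of attempts each message needed.
    """
    attempts_used = []
    for seq, payload in enumerate(messages):
        for attempt in range(1, max_attempts + 1):
            lost = attempt == 1 and seq in drop_first_attempt
            if not lost and receiver.on_message(seq, payload) > seq:
                break  # the ACK covers this message; move on to the next one
        else:
            raise TimeoutError(f"message {seq} was never acknowledged")
        attempts_used.append(attempt)
    return attempts_used


if __name__ == "__main__":
    receiver = InOrderReceiver()
    print(send_reliably(["x", "y", "z"], receiver, drop_first_attempt={1}))
    print(receiver.delivered)
```

The receiver also handles reordering on its own: if message 1 arrives before message 0, it is held in `buffer` and released only after message 0 fills the gap.
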
* **RPC: remote procedure call**
  * Approach: create wrappers so calling a function on another machine feels just like calling a local function! (simplifies application development)
  * Tooling helps with two components
    * Runtime library
      * Thread pool
      * Socket listeners call functions on server
    * Stub generation
      * Create wrappers automatically
        * Wrappers must do conversions
          * Client arguments to message
          * Message to server arguments
          * Server return value to message
          * Message to client return value
          * Called marshaling / unmarshaling, or serializing / deserializing
      * Many tools available (rpcgen, thrift, protobufs)
  * RPC over UDP:
    * Use function return as implicit ACK
    * If function takes a long time, then send a separate ACK
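
The four conversions that RPC stubs perform can be sketched by hand. Real tools such as rpcgen, thrift, or protobufs generate these wrappers; the version below is only an illustration under stated assumptions: JSON is used as the wire format, the names `server_handle`, `make_client_stub`, and `SERVER_FUNCTIONS` are hypothetical, and the "network" is a direct function call standing in for a socket.

```python
import json


# --- server side ---------------------------------------------------------
def add(a, b):
    """An ordinary function that lives on the 'server'."""
    return a + b


# Table of functions the server is willing to run on behalf of clients.
SERVER_FUNCTIONS = {"add": add}


def server_handle(request_bytes: bytes) -> bytes:
    """Server stub: message -> server arguments, then return value -> message."""
    request = json.loads(request_bytes.decode("utf-8"))        # unmarshal
    result = SERVER_FUNCTIONS[request["function"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")      # marshal


# --- client side ---------------------------------------------------------
def make_client_stub(function_name, transport):
    """Build a wrapper that looks like a local function to the caller."""

    def stub(*args):
        # Client arguments -> message.
        request_bytes = json.dumps(
            {"function": function_name, "args": list(args)}
        ).encode("utf-8")
        # transport stands in for sending over a socket and waiting for a reply.
        reply_bytes = transport(request_bytes)
        # Message -> client return value.
        return json.loads(reply_bytes.decode("utf-8"))["result"]

    return stub


# The caller uses remote_add exactly like a local function.
remote_add = make_client_stub("add", transport=server_handle)

if __name__ == "__main__":
    print(remote_add(2, 3))
```

Because the reply carries the return value, its arrival also tells the client that the request was received, which is the implicit-ACK idea used when running RPC over UDP.
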
