Recap
Distributed Systems
Definition: more than one machine working together to solve a problem
Examples:
client / server: web server and web client
cluster: PageRank computation
Why go distributed?
More computing power, more storage capacity, fault tolerance, data sharing
New challenges
System failures: need to worry about partial failure
Communication failures: links are unreliable
bit errors
packet loss
node/link failure
Why are network sockets less reliable than pipes?
Communication overview
Raw messages: UDP
Reliable messages: TCP
Remote procedure call: RPC
Raw messages: UDP
UDP: user datagram protocol
API:
Reads and writes over socket file descriptors
Socket: a communication connection point (endpoint) that you can name and address in a network
Messages are sent from and to ports; a port targets a specific process on a machine
Provides minimal reliability features
Messages may be lost
Messages may be reordered
Messages may be duplicated
Only protection: checksums to detect corrupted data
Advantages
Lightweight
Some applications make better reliability decisions themselves (e.g., video conferencing programs)
Disadvantages
More difficult to write applications correctly
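A minimal sketch of the UDP socket API described above, written in Python and run entirely over the loopback interface. The function name udp_roundtrip and the 2-second timeout are illustrative assumptions, not part of the notes. Because UDP may drop a message, the receiver sets a timeout instead of blocking forever.

```python
import socket

def udp_roundtrip(payload, timeout=2.0):
    """Send one datagram to a receiver on 127.0.0.1.

    Returns (data_received, sender_port). Raises a timeout error if the
    datagram never arrives, which UDP permits.
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 asks the OS for any free port; the (ip, port) pair is the
        # address that names this endpoint.
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(timeout)
        dest = receiver.getsockname()

        # No connection setup: each sendto() is an independent message.
        sender.sendto(payload, dest)

        # recvfrom() returns the payload and the sender's (ip, port).
        data, (src_ip, src_port) = receiver.recvfrom(4096)
        return data, src_port
    finally:
        sender.close()
        receiver.close()

if __name__ == "__main__":
    data, port = udp_roundtrip(b"hello")
    print(data, "arrived from port", port)
```

Nothing here acknowledges or retransmits; an application that needs those guarantees has to build them itself or use TCP.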
Reliable messages: layering strategy
TCP: Transmission Control Protocol
Using software to build reliable logical connections over unreliable physical connections
Techniques
ACK: sender knows the message was received
TIMEOUT: how long to wait?
Too long: system feels unresponsive
Too short: messages are needlessly re-sent. The original may have been dropped because the server is overloaded, and resending makes the overload worse!
Sequence numbers
Sender gives each message a unique, increasing sequence number
Receiver can tell when it has seen all messages before N
Receiver buffers messages so they are delivered in order; timeouts are adaptive
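The three techniques above can be sketched together in a small simulation. This is an illustrative model, not TCP itself: the Receiver class, the send_all function, and the lost_data / lost_acks parameters (sets of attempt numbers that the simulated link drops) are all assumptions made for the example. The sender is a simple stop-and-wait loop, and a "timeout" is modeled as simply not seeing an ACK for an attempt.

```python
class Receiver:
    """Delivers messages in order and exactly once, using sequence numbers."""

    def __init__(self):
        self.next_seq = 0    # every message with seq < next_seq was delivered
        self.buffer = {}     # out-of-order messages waiting for a gap to fill
        self.delivered = []

    def on_message(self, seq, data):
        # Old sequence numbers are duplicates of delivered messages: ignore.
        if seq >= self.next_seq:
            self.buffer[seq] = data
        # Deliver any run of consecutive messages that is now complete.
        while self.next_seq in self.buffer:
            self.delivered.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1
        # Cumulative ACK: "I have seen all messages before next_seq".
        return self.next_seq


def send_all(messages, receiver, lost_data=frozenset(),
             lost_acks=frozenset(), max_tries=5):
    """Stop-and-wait sender: resend each message until its ACK arrives.

    Returns the total number of transmissions.
    """
    attempt = 0
    for seq, data in enumerate(messages):
        for _ in range(max_tries):
            attempt += 1
            if attempt in lost_data:
                continue              # message lost: timeout, then resend
            ack = receiver.on_message(seq, data)
            if attempt in lost_acks:
                continue              # ACK lost: sender resends a duplicate
            if ack > seq:
                break                 # ACK covers this message: move on
        else:
            raise TimeoutError(f"message {seq} was never acknowledged")
    return attempt


if __name__ == "__main__":
    r = Receiver()
    sent = send_all(["a", "b", "c"], r, lost_data={2}, lost_acks={4})
    print(r.delivered, "after", sent, "transmissions")
```

When an ACK is lost, the receiver sees the same message twice; the sequence number is what lets it discard the duplicate rather than deliver it again.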
RPC: remote procedure call
Approach: create wrappers so calling a function on another machine feels just like calling a local function! (simplify application development)
Help with two components
Runtime library
Thread pool
Socket listeners call functions on server
Stub generation
Create wrappers automatically
Wrappers must do conversions
Client arguments to message
Message to server arguments
Convert server return value to message
Convert message to client return value
Marshaling / unmarshaling, or serializing / deserializing
Many tools available (rpcgen, thrift, protobufs)
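The four conversions that the wrappers perform can be sketched by hand. Tools such as rpcgen generate this kind of code automatically; the version below is a hand-written illustration that uses JSON as the message format. The names PROCEDURES, server_handle, and make_client_stub are hypothetical, and the "network" is just a direct function call so the example stays self-contained.

```python
import json

# --- server side ---
def add(a, b):
    """The real procedure, which lives on the server."""
    return a + b

PROCEDURES = {"add": add}  # hypothetical table mapping names to functions

def server_handle(request_bytes):
    """Server stub: message -> server arguments, return value -> message."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")

# --- client side ---
def make_client_stub(proc_name, transport):
    """Build a wrapper that looks like a local function to the caller."""
    def stub(*args):
        # Client arguments -> message (marshaling).
        request = {"proc": proc_name, "args": list(args)}
        request_bytes = json.dumps(request).encode("utf-8")
        # A real stub would send these bytes over a socket.
        reply_bytes = transport(request_bytes)
        # Message -> client return value (unmarshaling).
        return json.loads(reply_bytes.decode("utf-8"))["result"]
    return stub

# Here the transport is a direct call standing in for the network.
remote_add = make_client_stub("add", transport=server_handle)

if __name__ == "__main__":
    print(remote_add(2, 3))
```

The caller of remote_add never sees bytes or messages, which is the point of stub generation.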
RPC over UDP:
Use function return as implicit ACK
If function takes a long time, then send a separate ACK
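A rough simulation of the two rules above, under stated assumptions: the Server class, its slow flag, and the lost_requests parameter (attempt numbers dropped by the simulated link) are invented for illustration, and no real sockets are used. The client resends the request until it hears something back; a reply counts as the acknowledgment, and a slow server sends an explicit ACK first so the client knows to stop retransmitting.

```python
class Server:
    def __init__(self, func, slow=False):
        self.func = func
        self.slow = slow     # True models a function that takes a long time
        self.calls = 0       # how many times the function actually ran

    def handle(self, args):
        """Return the list of messages sent back for one request."""
        self.calls += 1
        result = self.func(*args)
        if self.slow:
            # Long-running call: acknowledge first, send the result later.
            return [("ACK", None), ("REPLY", result)]
        # Fast call: the reply itself serves as the ACK.
        return [("REPLY", result)]


def rpc_call(server, args, lost_requests=frozenset(), max_tries=5):
    """Client side: resend the request until a response arrives."""
    log = []
    for attempt in range(1, max_tries + 1):
        log.append(f"send request (try {attempt})")
        if attempt in lost_requests:
            log.append("timeout, no reply or ACK")
            continue
        for kind, value in server.handle(args):
            if kind == "ACK":
                log.append("ACK received: stop retransmitting, keep waiting")
            else:
                log.append("reply received (implicit ACK)")
                return value, log
    raise TimeoutError("no response from server")


if __name__ == "__main__":
    value, log = rpc_call(Server(lambda x: x * x), (7,), lost_requests={1})
    print(value)
    for line in log:
        print(" ", line)
```

The sketch only models lost requests. If a reply were lost instead, the retry would run the function a second time, so a real system also has to decide how to handle duplicate requests.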