1.1 Introduction

A distributed system is

  • "... a system in which the failure of a computer you didn't even know existed can render your own computer unusable" -- Leslie Lamport

  • ... multiple computers communicating via a network

  • ... trying to achieve some task together

  • Consists of "nodes" (computer, phone, car, robot, ...)


What makes a system distributed?

  • It's inherently distributed: e.g., sending a message from your mobile phone to your friend's phone

  • For better reliability: even if one node fails, the system as a whole keeps functioning

  • For better performance: get data from a nearby node rather than one halfway round the world

    • Putting data closer to where its users are

  • To solve bigger problems: e.g., huge amounts of data that can't fit on one machine

Why NOT make a system distributed?

  • The trouble with distributed systems

    • Communication may fail (and we might not even know it has failed)

    • Processes may crash (and we might not know)

    • All of this may happen nondeterministically

  • Fault tolerance: we want the system as a whole to continue working, even when some parts are faulty
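A minimal sketch of why these failures are hard to detect, and of one simple way to tolerate them. The hypothetical `send_request` function models a request/response exchange over an unreliable network with a node that may have crashed. Whether the request is lost, the node has crashed, or the reply is lost, the sender observes the same thing: no reply (a timeout). The hypothetical `fault_tolerant_call` keeps the system as a whole working by trying other replicas. The names `Node`, `send_request`, `fault_tolerant_call`, the `loss_prob` parameter, and the random-loss model are illustrative assumptions, not part of the original notes.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    """A hypothetical node; a crashed node never replies (crash-stop model)."""
    name: str
    crashed: bool = False


def send_request(node: Node, rng: random.Random, loss_prob: float):
    """Simulate one request/response round trip.

    Returns the reply, or None if nothing arrived before the timeout.
    The caller cannot tell WHICH of the three failure cases occurred.
    """
    if rng.random() < loss_prob:   # case 1: request lost in the network
        return None
    if node.crashed:               # case 2: node has crashed, never replies
        return None
    reply = f"ok from {node.name}"  # node has now processed the request
    if rng.random() < loss_prob:   # case 3: reply lost after processing
        return None
    return reply


def fault_tolerant_call(replicas, rng: random.Random, loss_prob: float = 0.0):
    """Try each replica in turn; succeed if any one of them answers."""
    for node in replicas:
        reply = send_request(node, rng, loss_prob)
        if reply is not None:
            return reply
    return None  # every replica appeared faulty


rng = random.Random(42)
replicas = [Node("a", crashed=True), Node("b"), Node("c")]
print(fault_tolerant_call(replicas, rng))  # ok from b
```

Note that in case 3 the node did process the request, so retrying on another replica may cause the operation to happen twice; the sender has no way of knowing this from the timeout alone.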
