From Laptop to Lambda: Outsourcing Everyday Jobs to Thousands of Transient Functional Containers

ATC 19

Presentation

  • Motivation: an occasional task that could use 10,000 cores in the cloud

  • Many others share this dream

    • Outsourcing computation

    • Cluster-computing framework

    • Burst-parallel cloud functions

  • Existing approaches: limited speed-ups, high costs, limited applicability

  • gg: a framework and toolkit that makes it practical to outsource everyday applications to thousands of parallel threads on cloud services

  • Challenges of outsourcing applications to the cloud

    • Software dependencies must be managed

      • With data flow frameworks like Spark, Hadoop, and Dryad, the software dependencies remain unmanaged

      • Need a warm cluster with everything preloaded

      • Not amenable to occasional one-off tasks

      • A 10,000-core cluster on EC2 is expensive!

      • gg's answer: the thunk abstraction

        • A lightweight container

        • Identifies an executable, along with its arguments, environment, and input data

        • Data is named by the hash of its content

          • Captures the full functional footprint of the computation
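A minimal sketch of the thunk idea: every piece of a computation (program, arguments, environment, inputs) is recorded by value or by content hash, and the thunk itself is named by hashing that description. SHA-256, the JSON encoding, and the field names are assumptions for illustration, not gg's actual on-disk format.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Tuple


def content_hash(data: bytes) -> str:
    """Name a blob by the SHA-256 digest of its content."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Thunk:
    executable: str                 # content hash of the program binary
    args: Tuple[str, ...] = ()      # command-line arguments
    env: Tuple[str, ...] = ()       # environment as "KEY=VALUE" strings
    inputs: Tuple[str, ...] = ()    # content hashes of the input data

    def serialize(self) -> bytes:
        # Canonical encoding (sorted keys), so equal thunks give equal bytes.
        return json.dumps(
            {"executable": self.executable, "args": list(self.args),
             "env": list(self.env), "inputs": list(self.inputs)},
            sort_keys=True,
        ).encode()

    def name(self) -> str:
        # A thunk is itself named by the hash of its full description.
        return content_hash(self.serialize())


cc = content_hash(b"<compiler binary bytes>")
src = content_hash(b"int main(void) { return 0; }\n")
t = Thunk(cc, ("cc", "-c", "main.c", "-o", "main.o"), ("LANG=C",), (src,))
print(t.name())
```

Because the name depends only on content, the same computation always gets the same name, and changing any input (even one byte of source) yields a different thunk.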

    • Roundtrips to the cloud hurt performance

      • Existing application-specific outsourcing tools (distcc, icecc, UCop) perform best over fast networks

      • The laptop is in the driver's seat! gg wants to take the laptop out of the loop and minimize the communication between the laptop and the cloud

      • Containers can reference each other's outputs: linked containers (gg IR)

      • The IR can express dependency graphs, including dynamic dependency graphs
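A toy, in-process illustration of linked thunks and a dynamic graph. Python callables stand in for executables and a dict stands in for a content-addressed store; `Thunk`, `force`, `put`, and `shorten` are hypothetical names for this sketch. A thunk's input may be another thunk, whose output is computed first, and a thunk may return a new thunk, so the graph's shape can depend on the data.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union


def h(data: bytes) -> str:
    """Content-address a blob by its SHA-256 digest."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Thunk:
    # `fn` stands in for an executable. Each input is either the hash of a
    # blob already in the store or another Thunk whose output is consumed.
    fn: Callable[..., object]
    inputs: Tuple[Union[str, "Thunk"], ...]


def force(t: Thunk, store: Dict[str, bytes], cache: Dict[Thunk, str]) -> str:
    """Evaluate `t` and return the content hash of its output."""
    if t in cache:                      # each thunk runs at most once
        return cache[t]
    args = [store[force(i, store, cache)] if isinstance(i, Thunk) else store[i]
            for i in t.inputs]
    out = t.fn(*args)
    if isinstance(out, Thunk):          # dynamic graph: a thunk may return a thunk
        out = store[force(out, store, cache)]
    key = h(out)
    store[key] = out
    cache[t] = key
    return key


def put(store: Dict[str, bytes], data: bytes) -> str:
    key = h(data)
    store[key] = data
    return key


store: Dict[str, bytes] = {}
cache: Dict[Thunk, str] = {}
a = put(store, b"hello ")
b = put(store, b"world")
concat = Thunk(lambda x, y: x + y, (a, b))
shout = Thunk(lambda x: x.upper(), (concat,))  # linked: consumes concat's output


def shorten(x: bytes):
    # The graph's shape depends on the data: long inputs spawn a new thunk.
    if len(x) > 5:
        return Thunk(lambda y: y[:5], (put(store, x),))
    return x


short = Thunk(shorten, (shout,))
print(store[force(shout, store, cache)])  # b'HELLO WORLD'
print(store[force(short, store, cache)])  # b'HELLO'
```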

    • Cloud functions are promising, but hard to use well

      • The dream: renting a supercomputer by the second

      • Warm clusters are expensive, cold clusters are slow to start

      • 10,000 workers for 10 seconds on AWS Lambda costs ~$5!

        • Existing systems built on cloud functions: PyWren, Sprocket, Serverless MapReduce
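A back-of-the-envelope check of the ~$5 figure, assuming each worker is configured with 3 GB of memory (close to the 3,008 MB maximum Lambda offered in 2019) and Lambda's standard list price of $0.0000166667 per GB-second. The per-request charge of $0.20 per million invocations adds only about $0.002 for 10,000 workers.

```latex
\begin{aligned}
\text{compute} &= 10{,}000\ \text{workers} \times 10\,\text{s} \times 3\,\text{GB}
               = 300{,}000\ \text{GB}\cdot\text{s} \\
\text{cost}    &\approx 300{,}000\ \text{GB}\cdot\text{s} \times \$0.0000166667/(\text{GB}\cdot\text{s})
               \approx \$5.00
\end{aligned}
```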

      • Using cloud functions is challenging

      • gg on Lambda

        • Greater speedup

        • Getting data to the cloud is faster (HTTP pipelining, multi-threading)

      • Many applications require inter-function communication

        • Current systems communicate indirectly through shared storage (e.g., S3)

      • Using off-the-shelf NAT-traversal techniques, the Lambdas can talk to each other directly at speeds of up to 600 Mbps
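One ingredient behind faster data transfer to the cloud is keeping many uploads in flight at once. A minimal sketch of the multi-threaded part, assuming a hypothetical `InMemoryBlobStore` that stands in for remote storage such as S3 (HTTP pipelining is not modeled). Skipping blobs the store already holds is an assumption of this sketch that follows from naming data by content hash, not a documented gg behavior.

```python
import hashlib
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List


class InMemoryBlobStore:
    """Hypothetical stand-in for remote storage such as an S3 bucket."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._lock = threading.Lock()
        self.put_count = 0

    def exists(self, key: str) -> bool:
        with self._lock:
            return key in self._objects

    def put(self, key: str, data: bytes) -> None:
        # A real client would issue an HTTP PUT here; with several worker
        # threads, several requests are in flight at the same time.
        with self._lock:
            self._objects[key] = data
            self.put_count += 1


def upload_missing(store: InMemoryBlobStore, blobs: Iterable[bytes],
                   max_workers: int = 16) -> List[str]:
    """Upload blobs in parallel, keyed by content hash; return uploaded keys."""
    named = {hashlib.sha256(b).hexdigest(): b for b in blobs}
    # Content naming makes it cheap to skip blobs the store already has.
    missing = [k for k in named if not store.exists(k)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every upload and re-raises any exception.
        list(pool.map(lambda k: store.put(k, named[k]), missing))
    return missing


store = InMemoryBlobStore()
first = upload_missing(store, [b"a.o", b"b.o", b"a.o"])
second = upload_missing(store, [b"a.o", b"c.o"])
print(len(first), len(second), store.put_count)  # 2 1 3
```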

  • Example application: software compilation

    • Build systems are often large and complicated; rewriting them by hand in gg IR is very difficult

    • We need a system that works with existing build systems, like make, CMake, Ninja, etc.

    • Technique: model substitution, which extracts gg IR from existing applications

      • Idea: run the original build system, but replace every stage with a 'model' program that produces a thunk, instead of the actual output
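A minimal sketch of model substitution, assuming a hypothetical `model_tool` that replaces both the compiler and the linker. It parses only the `-o` flag, treats any argument that names an existing file as an input, and records the tool name rather than a hash of its binary; gg's real models understand each tool's full command line. Each model writes a JSON thunk where the real output would have gone, so a later stage that reads that file picks up the earlier thunk's hash and the two stages become linked.

```python
import hashlib
import json
import os
import tempfile
from typing import Dict, List


def model_tool(argv: List[str]) -> str:
    """Hypothetical model of a build step such as `cc -c main.c -o main.o`.

    Instead of running the real tool, it writes a thunk (a JSON description
    of the step) to the path where the real output would have gone.
    """
    out = argv[argv.index("-o") + 1]
    inputs: Dict[str, str] = {}
    for arg in argv[1:]:
        if arg != out and os.path.isfile(arg):
            # If `arg` is a placeholder written by an earlier model step, its
            # hash names that thunk, which links the two stages together.
            with open(arg, "rb") as f:
                inputs[arg] = hashlib.sha256(f.read()).hexdigest()
    # A real model would record a content hash of the tool's binary.
    thunk = {"executable": argv[0], "args": argv, "inputs": inputs}
    with open(out, "w") as f:
        json.dump(thunk, f, sort_keys=True)
    return out


with tempfile.TemporaryDirectory() as d:
    src, obj, exe = (os.path.join(d, n) for n in ("main.c", "main.o", "app"))
    with open(src, "w") as f:
        f.write("int main(void) { return 0; }\n")
    model_tool(["cc", "-c", src, "-o", obj])  # compile stage -> thunk file
    model_tool(["ld", obj, "-o", exe])        # link stage reads that thunk
    with open(obj, "rb") as f:
        obj_hash = hashlib.sha256(f.read()).hexdigest()
    with open(exe) as f:
        link_thunk = json.load(f)

print(link_thunk["inputs"][obj] == obj_hash)  # True
```

Running the unmodified build with models in place of the real tools leaves behind a graph of thunk files instead of binaries, which can then be executed in the cloud.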

    • gg on AWS Lambda is 2-5x faster than icecc outsourcing to a 384-core cluster

      • With icecc, performance stops improving as more cores are added, because the laptop becomes the bottleneck

      • And much cheaper

    • Google Chrome: 18 mins...
