# From Laptop to Lambda: Outsourcing Everyday Jobs to Thousands of Transient Functional Containers

### Presentation

* Occasional task that needs 10,000 cores (in the cloud)
* Many others share this dream
  * Outsourcing computation
  * Cluster-computing framework
  * Burst-parallel cloud functions
* Existing approaches: limited speed-ups, high costs, limited applicability
* gg: a framework and toolkit that makes it practical to outsource everyday applications using thousands of parallel threads in cloud services
* Challenges of outsourcing applications to the cloud
  * **Software dependencies must be managed**
    * With data flow frameworks like Spark, Hadoop, and Dryad, the software dependencies remain unmanaged
    * Need a warm cluster with everything preloaded
    * Not amenable to occasional one-off tasks
    * A 10,000-core cluster on EC2 is expensive!
    * **Thunk** abstractions
      * **Lightweight container**
      * Identifies an executable, along with its arguments, environment, and input data
      * Data is named by the hash of its content
      * ![](/files/9byHgjJiN5ueWM3OyxJU)
        * Full functional footprint
  * **Roundtrips to the cloud hurt performance**
    * Current application-specific outsourcing tools (distcc, icecc, UCop) perform best over fast networks
    * With these tools, the laptop is in the driver's seat! The goal is to take the laptop out of the loop and minimize communication between it and the cloud
    * ![](/files/0zBYsfMlGR0Joa5vvXv5)
    * Containers can reference each other's outputs: **linked containers (gg IR)**
    * The IR can express dependency graphs, including dynamic ones
  * **Cloud functions are promising, but hard to use well**
    * The dream: renting a supercomputer by the second
    * Warm clusters are expensive, cold clusters are slow to start
    * 10,000 workers for 10 seconds on AWS Lambda costs \~$5!
      * PyWren, Sprocket, Serverless MapReduce
    * Using cloud functions is challenging
    * gg on Lambda
      * Higher speedups
      * Getting data to the cloud is faster (HTTP pipelining, multi-threading)
    * Many applications require inter-function communication
      * Current systems communicate indirectly, for example through shared storage (e.g., S3)
    * Using off-the-shelf **NAT-traversal techniques**, the Lambdas can talk to each other at speeds of up to 600 Mbps
* Example application: software compilation
  * Build systems are often large and complicated, so manually rewriting them in gg IR is very difficult.
  * We need a system that works with existing build systems, like make, CMake, Ninja, etc.
  * Technique: model substitution, which extracts gg IR from existing applications
    * Idea: run the original build system, but replace every stage with a 'model' program that produces a thunk instead of the actual output
  * gg on AWS Lambda is 2-5x faster than icecc outsourcing to a 384-core cluster
    * With icecc, performance doesn't increase as you add more cores, because the laptop becomes the bottleneck
    * gg is also much cheaper
  * Google Chrome: 18 mins...
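
To make the thunk and linked-container ideas above concrete, here is a minimal sketch in Python. It is not gg's actual format or API: the `Thunk` class, the `evaluate` function, and the toy `programs` table are hypothetical names chosen for illustration, and executables are looked up by plain name rather than by content hash to keep the example short.

```python
import hashlib
import json
from dataclasses import dataclass


def content_hash(data: bytes) -> str:
    """Name a blob by the SHA-256 hash of its content."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Thunk:
    """Hypothetical stand-in for a thunk: a pure description of one step."""
    executable: str      # toy program name (a real system would use a hash)
    args: tuple = ()     # command-line arguments
    env: tuple = ()      # (name, value) pairs
    inputs: tuple = ()   # hashes of input blobs or of other thunks

    def hash(self) -> str:
        # A thunk is itself data, so it can be named by its content too.
        doc = json.dumps(
            {"exe": self.executable, "args": self.args,
             "env": self.env, "inputs": self.inputs},
            sort_keys=True,
        )
        return content_hash(doc.encode())


def evaluate(thunk_hash, thunks, blobs, programs):
    """Reduce a thunk to an output blob hash, evaluating dependencies first."""
    thunk = thunks[thunk_hash]
    resolved = []
    for ref in thunk.inputs:
        if ref in thunks:  # this input is another thunk's output
            ref = evaluate(ref, thunks, blobs, programs)
        resolved.append(blobs[ref])
    output = programs[thunk.executable](thunk.args, resolved)
    out_hash = content_hash(output)
    blobs[out_hash] = output
    return out_hash


# Toy "executables": pure functions from (args, input blobs) to an output blob.
programs = {
    "upper": lambda args, ins: ins[0].upper(),
    "repeat": lambda args, ins: ins[0] * int(args[0]),
}

blobs = {}
src = content_hash(b"hello")
blobs[src] = b"hello"

t1 = Thunk("upper", inputs=(src,))
t2 = Thunk("repeat", args=("2",), inputs=(t1.hash(),))  # links to t1's output
thunks = {t.hash(): t for t in (t1, t2)}

result = blobs[evaluate(t2.hash(), thunks, blobs, programs)]
print(result)
```

Because `t2` names `t1` only by hash, the whole graph can be shipped to the cloud and evaluated there without a round trip to the laptop between steps.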

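Model substitution, as described in the compilation example, can be sketched the same way. The code below is an illustration under stated assumptions, not gg's implementation: the `model` function, the dictionary standing in for the file system, and the three-command build are all hypothetical. Each stage is "run" through the model, which records a thunk and leaves a placeholder at the output path instead of producing the real output.

```python
import hashlib
import json


def thunk_name(thunk):
    """Name a thunk description by the hash of its canonical JSON form."""
    return hashlib.sha256(json.dumps(thunk, sort_keys=True).encode()).hexdigest()


def model(argv, fs):
    """Record a `cc ... -o OUT` stage as a thunk instead of running it."""
    out = argv[argv.index("-o") + 1]
    inputs = {}
    for arg in argv[1:]:
        if arg in fs and arg != out:
            data = fs[arg]
            if isinstance(data, dict):
                # Placeholder left by an earlier model: reference that thunk.
                inputs[arg] = data["thunk"]
            else:
                # Ordinary file: reference it by the hash of its content.
                inputs[arg] = hashlib.sha256(data).hexdigest()
    thunk = {"argv": argv, "inputs": inputs, "output": out}
    name = thunk_name(thunk)
    fs[out] = {"thunk": name}  # placeholder, not a real object file
    return name, thunk


# Hypothetical in-memory file system and build commands.
fs = {
    "a.c": b"int f(){return 1;}",
    "main.c": b"int f(); int main(){return f();}",
}
build = [
    ["cc", "-c", "a.c", "-o", "a.o"],
    ["cc", "-c", "main.c", "-o", "main.o"],
    ["cc", "a.o", "main.o", "-o", "app"],
]

# The "build system" runs unchanged; only the stage program is swapped.
graph = dict(model(argv, fs) for argv in build)
link = graph[fs["app"]["thunk"]]
print(len(graph), sorted(link["inputs"]))
```

The build system still decides which stages run and in what order; the models only turn that run into a dependency graph that can be executed elsewhere.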

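The "\~$5 for 10,000 workers for 10 seconds" figure can be sanity-checked with a short calculation. The prices and the 3 GB memory size below are assumptions that do not come from these notes (they reflect AWS Lambda's published per-GB-second and per-request list prices as commonly quoted), so check current pricing before relying on the result.

```python
# Assumed AWS Lambda list prices; not stated in the notes above.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_REQUEST = 0.20 / 1_000_000


def lambda_cost(workers, seconds, memory_gb):
    """Rough cost in USD of running `workers` functions for `seconds` each."""
    compute = workers * seconds * memory_gb * PRICE_PER_GB_SECOND
    requests = workers * PRICE_PER_REQUEST
    return compute + requests


# Assumption: each worker is configured with about 3 GB of memory.
cost = lambda_cost(workers=10_000, seconds=10, memory_gb=3.0)
print(f"${cost:.2f}")
```

Under these assumptions the total comes out at roughly five dollars, and almost all of it is compute time rather than request fees.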