# Faster and Cheaper Serverless Computing on Harvested Resources

### Presentation

* Serverless computing
  * User's angle
    * Pay per use
    * No worry about VMs
    * Most popular offering - function as a service (FaaS)
      * Bounded completion time with no persistent state between invocations
  * FaaS provider's angle
    * Provision, manage, and pay IaaS providers for hosting VMs
* Harvested Resources in Datacenters
  * IaaS providers offer surplus resources as VMs
    * Harvest VM
      * Allocated with a minimum size
      * Grows and shrinks to harvest all unallocated resources
      * Only evicted if minimum size is needed for a regular VM
      * Grace period before eviction (e.g., 30 seconds in Azure)
* Characterization - Harvest VMs
  * Eviction:
    * \>90% of Harvest VMs live longer than 1 day
    * \>60% of Harvest VMs live longer than 1 month
    * Harvest VM evictions are generally infrequent
  * Resource variation:
    * \>70% of intervals between resource changes are longer than 10 minutes
    * \~35% of intervals are longer than 1 hour
    * Resource variation is much more frequent than evictions
* Characterization - FaaS workloads
  * Invocation duration & VM grace period
    * Long invocation - longer than 30 seconds (the VM eviction grace period)
    * Long application - at least 1 invocation longer than 30 seconds
    * Short invocations can always finish within the VM grace period
  * 96% of invocations are shorter than 30 seconds
  * Long invocations account for over 82% of the total execution time
  * Application's angle
    * 48.7% of applications are long applications
    * Long applications account for over 99.7% of the total execution time
    * Nearly half of the applications are long, and long applications account for almost all of the execution time
* Take-aways
  * FaaS workloads are a good fit for Harvest VMs
    * Short invocation durations
    * Relatively long Harvest VM lifetimes
  * Harvest VM resource variation is much more common than evictions
  * Long applications account for the vast majority of the total execution time
* Handling Evictions
  * How to eliminate or minimize invocation failures caused by Harvest VM evictions?
    * Use a mix of Harvest VMs and regular VMs
    * Strategies with different tradeoffs between reliability and efficiency
      * No failure
        * Guarantee no invocation failure caused by Harvest VM evictions
        * Assign all long applications to regular VMs
        * Efficiency?
          * Measured as the fraction of computation hosted by cheap Harvest VMs
          * 12% of computation capacity hosted by Harvest VMs
      * Bounded failure
        * Upper bound of (100 - x)% on the per-application failure rate caused by evictions
        * Assign an application to regular VMs if its xth duration percentile is longer than 30 seconds
          * Worst case: all long invocations on Harvest VMs fail --> (100 - x)% failure rate
        * How about efficiency?
          * Failure rate < 1% --> 45.7% of computation hosted by Harvest VMs
          * Failure rate < 0.1% --> 28% of computation hosted by Harvest VMs
      * Live and Let Die
        * No guarantee on failure rate
        * All applications on Harvest VMs
        * How about reliability?
          * Invocation failure rate
            * Worst: 99.99% success rate
            * 7 nines of reliability
          * A failure requires two low-probability events to happen simultaneously
            * A Harvest VM gets evicted while it is running a long invocation
        * Best efficiency and a low failure rate in practice
* Handling Resource Variability
  * Join-the-shortest-queue (JSQ) leads to a high cold start rate
    * Invocations of the same application are distributed across all VMs
    * Inter-arrival time on each VM becomes longer than the container keep-alive
  * Min-worker-set (MWS)
    * Consolidates each application onto a minimal set of backend VMs
    * Shorter inter-arrival time --> warm starts
    * Consistent hashing to minimize reshuffling of home VMs
* Implementation on OpenWhisk
  * ![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FnbUn8YQuFSAZX2Trs8vO%2Fimage.png?alt=media\&token=7be68160-6b6b-4a43-bf91-cebcad9d44b1)
  * Harvest Monitor
    * Collects resource information & eviction signal
  * Controller
    * Maintains data from Harvest Monitor
    * Implements MWS
  * Resource Monitor
    * Tracks resource variation in the system
    * Spins up new VMs to maintain available resources
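
The bounded-failure strategy from the Handling Evictions notes can be illustrated with a short sketch. It assumes each application comes with a list of observed invocation durations in seconds and uses a nearest-rank percentile; the function names (`percentile`, `place_application`, `worst_case_failure_rate`) are hypothetical and are not taken from the paper or from OpenWhisk.

```python
import math

# Harvest VM eviction grace period quoted in the notes (30 seconds in Azure).
GRACE_PERIOD_S = 30.0


def percentile(durations, x):
    """Nearest-rank x-th percentile (0 < x <= 100) of a non-empty list."""
    ordered = sorted(durations)
    rank = max(1, math.ceil(x * len(ordered) / 100))
    return ordered[rank - 1]


def place_application(durations, x):
    """Pick a VM type for one application (hypothetical helper).

    The application goes to regular VMs when its x-th duration percentile
    exceeds the grace period; otherwise it may run on Harvest VMs.
    """
    if percentile(durations, x) > GRACE_PERIOD_S:
        return "regular"
    return "harvest"


def worst_case_failure_rate(durations):
    """Fraction of invocations that fail if every long one hits an eviction."""
    long_count = sum(1 for d in durations if d > GRACE_PERIOD_S)
    return long_count / len(durations)


# Example: 99 short invocations and 1 long one, with x = 99.
history = [1.0] * 99 + [60.0]
placement = place_application(history, 99)
bound = worst_case_failure_rate(history)
```

With nearest-rank percentiles, an application placed on Harvest VMs has at least x% of its invocations within the grace period, so at most (100 - x)% of them can fail even if every long invocation coincides with an eviction.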
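
To make the min-worker-set idea from the Handling Resource Variability notes concrete, here is a minimal sketch of one way consistent hashing could keep an application's home VMs stable. It is an assumption-laden illustration, not the paper's implementation: the class `MinWorkerSetRing`, its methods, and the rule "take the first `size` VMs clockwise from the application's hash" are all hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable hash so ring positions do not change between runs."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)


class MinWorkerSetRing:
    """Hypothetical consistent-hash ring mapping applications to VMs."""

    def __init__(self, vms):
        # Sorted list of (ring position, VM name).
        self._ring = sorted((_hash(vm), vm) for vm in vms)

    def add_vm(self, vm):
        bisect.insort(self._ring, (_hash(vm), vm))

    def remove_vm(self, vm):
        # For example, when a Harvest VM is evicted.
        self._ring.remove((_hash(vm), vm))

    def worker_set(self, app, size):
        """Return the first `size` VMs clockwise from the app's position."""
        if not self._ring:
            return []
        size = min(size, len(self._ring))
        start = bisect.bisect_left(self._ring, (_hash(app), ""))
        return [
            self._ring[(start + i) % len(self._ring)][1] for i in range(size)
        ]


vm_names = [f"vm-{i}" for i in range(8)]
ring = MinWorkerSetRing(vm_names)
home_vms = ring.worker_set("app-a", 2)
```

Keeping `size` small concentrates an application's invocations on few VMs, which shortens per-VM inter-arrival times and favors warm starts. Removing a VM outside an application's worker set leaves that set unchanged, which is the reshuffling property the notes attribute to consistent hashing.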

### Evaluation

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2F33y39rz0jMPKNWCnCVza%2Fimage.png?alt=media\&token=052e8aec-edb4-4419-a0fe-56fb3acb74af)

### Conclusion

* Goal: host serverless platforms on harvested resources
* Quantify the challenges of using harvested resources for serverless invocations, including Harvest VM evictions and resource variation
* Demonstrate the reliability of hosting serverless workloads on harvested resources
* Demonstrate the performance and economic benefits of hosting serverless platforms on harvested resources

