# Faster and Cheaper Serverless Computing on Harvested Resources

### Presentation

* Serverless computing
  * User's angle
    * Pay per use
    * No need to manage VMs
    * Most popular offering - function as a service (FaaS)
      * Bounded completion time, with no persistent state between invocations
  * FaaS provider's angle
    * Provision and manage VMs, and pay IaaS providers for hosting them
* Harvested Resources in Datacenters
  * IaaS providers offer surplus resources as VMs
    * Harvest VM
      * Allocated with a minimum size
      * Grows and shrinks to harvest all unallocated resources
      * Only evicted if minimum size is needed for a regular VM
      * Grace period before eviction (e.g., 30 seconds in Azure)
* Characterization - Harvest VMs
  * Eviction:
    * \>90% of Harvest VMs live longer than 1 day
    * \>60% of Harvest VMs live longer than 1 month
    * Harvest VM evictions are generally infrequent
  * Resource variation:
    * \>70% of intervals between resource changes are longer than 10 minutes
    * \~35% of intervals are longer than 1 hour
    * Resource variation is much more frequent than evictions
* Characterization - FaaS workloads
  * Invocation duration & VM grace period
    * Long invocation - longer than 30 seconds (the VM eviction grace period)
    * Long application - at least 1 invocation longer than 30 seconds
    * Short invocations can always finish within the VM grace period
  * 96% of invocations are shorter than 30 seconds
  * Long invocations account for over 82% of the total execution time
  * Application's angle
    * 48.7% of applications are long applications
    * Long applications account for over 99.7% of the total execution time
    * Nearly half of the applications are long, and long applications account for almost all of the execution time
* Take-aways
  * FaaS workloads are a good fit for Harvest VMs
    * Short invocation durations
    * Relatively long Harvest VM lifetimes
  * Harvest VM resource variation is much more common than evictions
  * Long applications account for the vast majority of the total execution time
* Handling Evictions
  * How to eliminate or minimize invocation failures caused by Harvest VM evictions?
    * Use a mix of Harvest VMs and regular VMs
    * Strategies with different tradeoffs between reliability and efficiency
      * No failure
        * Guarantee no invocation failure caused by Harvest VM evictions
        * Assign all long applications to regular VMs
        * Efficiency?
          * Measured as the fraction of computation hosted by cheap Harvest VMs
          * 12% of computation capacity hosted by Harvest VMs
      * Bounded failure
        * Upper-bounds the per-application eviction failure rate at (100 - x)%
        * Assign an application to regular VMs if the x-th percentile of its invocation durations is longer than 30s
          * Worst case: all long invocations on Harvest VMs fail --> (100 - x)% failure rate
        * How about efficiency?
          * Failure rate < 1% --> 45.7% of computation hosted by Harvest VMs
          * Failure rate < 0.1% --> 28% of computation hosted by Harvest VMs
      * Live and Let Die
        * No guarantee on failure rate
        * All applications on Harvest VMs
        * How about reliability?
          * Invocation failure rate
            * Worst: 99.99% success rate
            * 7 nines of reliability
          * Failure requires two low-probability events to happen simultaneously
            * A Harvest VM gets evicted while it is running a long invocation
        * Best efficiency, with a low failure rate in practice
* Handling Resource Variability
  * Join-the-shortest-queue (JSQ) leads to a high cold start rate
    * Invocations of the same application are distributed across all VMs
    * Inter-arrival time on each VM becomes longer than the container keep-alive
  * Min-worker-set (MWS)
    * Consolidates each application onto a minimal set of backends
    * Shorter inter-arrival time --> warm starts
    * Consistent hashing to minimize reshuffling of home VMs
* Implementation on OpenWhisk
  * ![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FnbUn8YQuFSAZX2Trs8vO%2Fimage.png?alt=media\&token=7be68160-6b6b-4a43-bf91-cebcad9d44b1)
  * Harvest Monitor
    * Collects resource information & eviction signal
  * Controller
    * Maintains data from Harvest Monitor
    * Implements MWS
  * Resource monitor
    * Tracks resource variation in the system
    * Spins up new VMs to maintain available resources
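
The bounded-failure placement rule from "Handling Evictions" can be sketched in a few lines. This is only an illustration under stated assumptions: the function names (`percentile`, `place_application`), the nearest-rank percentile definition, and the use of a plain list of past invocation durations are all hypothetical and are not taken from the paper or from OpenWhisk. The 30-second grace period is the Azure example quoted in the notes.

```python
import math

# Eviction grace period quoted in the notes (Azure example), in seconds.
GRACE_PERIOD_S = 30.0


def percentile(durations, x):
    """Nearest-rank x-th percentile (0 < x <= 100) of a non-empty list.

    Assumption: the notes do not say which percentile definition is used;
    nearest-rank is chosen here for simplicity.
    """
    ordered = sorted(durations)
    rank = max(1, math.ceil(x * len(ordered) / 100))
    return ordered[rank - 1]


def place_application(durations, x, grace_period_s=GRACE_PERIOD_S):
    """Return "regular" or "harvest" for one application.

    An application goes to regular VMs when the x-th percentile of its
    observed invocation durations exceeds the grace period. Otherwise at
    most (100 - x)% of its invocations are long, which bounds the fraction
    that an eviction could kill.
    """
    if percentile(durations, x) > grace_period_s:
        return "regular"
    return "harvest"


# One long invocation out of 100: tolerated at x = 99, rejected at x = 100.
history = [1.0] * 99 + [120.0]
print(place_application(history, 99))   # harvest
print(place_application(history, 100))  # regular
```

Under this sketch, setting `x = 100` sends every application with at least one long invocation to regular VMs, which matches the "No failure" strategy in the notes.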

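The notes say MWS uses consistent hashing to minimize reshuffling of home VMs. The sketch below shows only that one ingredient: a generic consistent-hash ring that maps an application to a home VM and keeps most mappings stable when a VM is evicted. The class `HashRing`, its `replicas` parameter, and the VM and application names are hypothetical. It does not model how MWS sizes or grows an application's worker set.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash of a string (unlike Python's salted built-in hash)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Generic consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, vms, replicas=64):
        self._replicas = replicas
        self._points = []  # sorted list of (hash, vm) ring positions
        for vm in vms:
            self.add(vm)

    def add(self, vm):
        # Each VM owns several points on the ring to even out the load.
        for i in range(self._replicas):
            bisect.insort(self._points, (_hash(f"{vm}#{i}"), vm))

    def remove(self, vm):
        # For example, after the VM receives an eviction signal.
        self._points = [p for p in self._points if p[1] != vm]

    def home_vm(self, app):
        """Return the VM owning the first ring point at or after the app's hash."""
        if not self._points:
            raise ValueError("no VMs in ring")
        idx = bisect.bisect_right(self._points, (_hash(app), ""))
        return self._points[idx % len(self._points)][1]


ring = HashRing(["vm-a", "vm-b", "vm-c"])
apps = [f"app-{i}" for i in range(200)]
before = {a: ring.home_vm(a) for a in apps}
ring.remove("vm-b")
after = {a: ring.home_vm(a) for a in apps}
moved = [a for a in apps if before[a] != after[a]]
# Only applications whose home was the removed VM are reassigned.
print(all(before[a] == "vm-b" for a in moved))
```

Keeping an application's invocations on the same home VM is what shortens the per-VM inter-arrival time and makes warm starts more likely.
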
### Evaluation

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2F33y39rz0jMPKNWCnCVza%2Fimage.png?alt=media\&token=052e8aec-edb4-4419-a0fe-56fb3acb74af)

### Conclusion

* Propose hosting serverless platforms on harvested resources
* Quantify the challenges of using harvested resources for serverless invocations, including Harvest VM evictions and resource variation
* Demonstrate the reliability of hosting serverless workloads on harvested resources
* Demonstrate the performance and economic benefits of hosting serverless platforms on harvested resources
