Faster and Cheaper Serverless Computing on Harvested Resources
https://dl.acm.org/doi/pdf/10.1145/3477132.3483580
Serverless computing
User's angle
Pay per use
No worry about VMs
Most popular offering - function as a service (FaaS)
Bounded completion time, with no persistent state between invocations
FaaS Provider's angle
Provision, manage, and pay IaaS providers for hosting VMs
Harvested Resources in Datacenters
IaaS providers offer surplus resources as VMs
Harvest VM
Allocated with a minimum size
Grows and shrinks to harvest all unallocated resources
Only evicted if minimum size is needed for a regular VM
Grace period before eviction (e.g., 30 seconds in Azure)
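The grow/shrink/evict behavior listed above can be modeled in a few lines. This is only a toy sketch for intuition: the class name `HarvestVM`, the `resize` method, and the per-host CPU accounting are assumptions made for illustration, not the provider's actual mechanism.

```python
from dataclasses import dataclass


@dataclass
class HarvestVM:
    """Toy model of a Harvest VM living on a single host (hypothetical)."""
    min_cpus: int           # minimum size the VM is allocated with
    cpus: int = 0           # current size
    evicted: bool = False

    def resize(self, host_cpus: int, regular_cpus: int) -> int:
        """Recompute the VM size after regular-VM allocations on the host change."""
        if self.evicted:
            return 0
        spare = host_cpus - regular_cpus
        if spare < self.min_cpus:
            # Regular VMs now need part of the minimum size: the Harvest VM
            # is evicted (in practice after a grace period, e.g. 30 seconds).
            self.evicted = True
            self.cpus = 0
        else:
            # Otherwise the VM harvests every unallocated CPU on the host.
            self.cpus = spare
        return self.cpus


vm = HarvestVM(min_cpus=2)
# Regular VMs on a 32-CPU host use 8, then 28, then 31, then 8 CPUs again.
sizes = [vm.resize(32, used) for used in (8, 28, 31, 8)]
```

In this toy run the VM shrinks from 24 to 4 CPUs without being evicted, and is only evicted once fewer than `min_cpus` CPUs remain.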
Characterization - Harvest VMs
Eviction:
>90% of Harvest VMs live longer than 1 day
>60% of Harvest VMs live longer than 1 month
Harvest VM evictions are generally infrequent
Resource variation:
>70% of intervals between resource changes are longer than 10 minutes
~35% of intervals between resource changes are longer than 1 hour
Resource variation much more frequent than evictions
Characterization - FaaS workloads
Invocation duration & VM grace period
Long invocation - longer than 30s (VM eviction grace period)
Long application - at least 1 invocation longer than 30 seconds
Short invocations can always finish within the VM grace period
96% of invocations are shorter than 30 seconds
Long invocations account for over 82% of the total execution time
Application's angle
48.7% of applications are long applications
Long applications account for over 99.7% of the total execution time
Nearly half of the applications are long, and long applications account for almost all of the execution time
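The long/short definitions above can be applied to any invocation trace. A minimal sketch, assuming a hypothetical trace format of `(app_id, duration_s)` pairs and a non-empty trace; the function names and the toy data are illustrative and are not the numbers from the paper.

```python
from collections import defaultdict

GRACE_PERIOD_S = 30.0  # Harvest VM eviction grace period used in the notes


def is_long_invocation(duration_s, grace_s=GRACE_PERIOD_S):
    """An invocation is long if it runs longer than the grace period."""
    return duration_s > grace_s


def summarize(trace, grace_s=GRACE_PERIOD_S):
    """trace: iterable of (app_id, duration_s) pairs (hypothetical format)."""
    per_app = defaultdict(list)
    for app_id, duration_s in trace:
        per_app[app_id].append(duration_s)

    # A long application has at least one long invocation.
    long_apps = {
        app for app, ds in per_app.items()
        if any(is_long_invocation(d, grace_s) for d in ds)
    }
    all_durations = [d for ds in per_app.values() for d in ds]
    long_durations = [d for d in all_durations if is_long_invocation(d, grace_s)]
    total_time = sum(all_durations)

    return {
        "long_invocation_fraction": len(long_durations) / len(all_durations),
        "long_invocation_time_share": sum(long_durations) / total_time,
        "long_app_fraction": len(long_apps) / len(per_app),
        "long_app_time_share": sum(sum(per_app[a]) for a in long_apps) / total_time,
    }


# Toy trace: app "a" has one 97-second invocation, so it is a long application.
toy_trace = [("a", 1.0), ("a", 2.0), ("a", 97.0), ("b", 0.5), ("b", 0.5), ("c", 4.0)]
stats = summarize(toy_trace)
```

Even in this toy trace the pattern from the characterization shows up: one invocation in six is long, yet the single long application carries most of the execution time.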
Take-aways
FaaS workloads are a good fit for Harvest VMs
Short invocation durations
Relatively long harvest VM lifetime
Harvest VM resource variation much more common than evictions
Long applications account for the vast majority of the total execution time
Handling Evictions
How to eliminate or minimize invocation failure caused by Harvest VM evictions?
Use a mix of Harvest VMs and regular VMs
Three strategies with different tradeoffs between reliability and efficiency
No failure
Guarantee no invocation failure caused by Harvest VM evictions
Assign all long applications to regular VMs
Efficiency?
Fraction of computation hosted by cheap Harvest VMs
12% of computation capacity hosted by Harvest VMs
Bounded failure
Upper-bounds each application's eviction-induced failure rate at (100 - x)%
Assign an application to regular VMs if the x-th percentile of its invocation durations is longer than 30 seconds
Worst case: every long invocation on a Harvest VM fails --> (100 - x)% failure rate
How about efficiency?
Failure rate < 1% --> 45.7% computation hosted by Harvest VMs
Failure rate < 0.1% --> 28% computation hosted by Harvest VMs
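The No failure and Bounded failure rules can be written as one placement function, since No failure is the special case x = 100 (the longest observed invocation must fit in the grace period). A minimal sketch, assuming a nearest-rank percentile over an application's historical durations; the function names and sample data are hypothetical.

```python
import math

GRACE_PERIOD_S = 30.0  # Harvest VM eviction grace period used in the notes


def percentile(durations, x):
    """Nearest-rank x-th percentile of a non-empty list, for 0 < x <= 100."""
    ordered = sorted(durations)
    rank = max(1, math.ceil(x * len(ordered) / 100))
    return ordered[rank - 1]


def place(durations, x, grace_s=GRACE_PERIOD_S):
    """Return 'regular' if the x-th duration percentile exceeds the grace period."""
    return "regular" if percentile(durations, x) > grace_s else "harvest"


def worst_case_failure_rate(durations, grace_s=GRACE_PERIOD_S):
    """Failure rate if every long invocation were hit by an eviction."""
    long_count = sum(1 for d in durations if d > grace_s)
    return long_count / len(durations)


# Hypothetical application: 99 one-second invocations and one 60-second one.
durations = [1.0] * 99 + [60.0]
strict = place(durations, 100)   # No failure: the max duration decides
relaxed = place(durations, 99)   # Bounded failure with x = 99
bound = worst_case_failure_rate(durations)
```

With x = 99 the application is allowed on Harvest VMs because at most (100 - 99)% = 1% of its invocations are long, which is exactly its worst-case failure rate here; with x = 100 the same application is pinned to regular VMs.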
Live and Let Die
No guarantee on failure rate
All applications on Harvest VMs
How about reliability?
Invocation failure rate
Worst: 99.99% success rate
7 nines of reliability
A failure requires two low-probability events to happen simultaneously
A Harvest VM gets evicted while it is running a long invocation
Best efficiency, and a low failure rate in practice
Handling Resource Variability
Join-the-shortest-queue (JSQ) leads to a high cold-start rate
Invocations of the same application are distributed across all VMs
Per-VM inter-arrival time becomes longer than the container keep-alive
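The cold-start effect of JSQ can be reproduced with a small simulation. This is a rough sketch under stated assumptions: ties between equally short queues are broken by rotating through the VMs (a stand-in for background load shifting which queue is shortest), keep-alive is measured from the end of the previous invocation, and the function name and parameters are hypothetical.

```python
def simulate_jsq(arrivals, num_vms, duration_s, keep_alive_s):
    """Count cold starts under JSQ.

    arrivals: list of (time_s, app_id), sorted by time.
    """
    in_flight = [[] for _ in range(num_vms)]    # finish times per VM
    last_used = [dict() for _ in range(num_vms)]  # app_id -> container idle since
    next_tie = 0
    cold_starts = 0

    for t, app in arrivals:
        # Drop invocations that finished before this arrival.
        for vm in range(num_vms):
            in_flight[vm] = [f for f in in_flight[vm] if f > t]

        # Join the shortest queue; rotate among ties (assumption).
        shortest = min(len(q) for q in in_flight)
        candidates = [vm for vm in range(num_vms) if len(in_flight[vm]) == shortest]
        vm = min(candidates, key=lambda v: (v - next_tie) % num_vms)
        next_tie = (vm + 1) % num_vms

        # Cold start if this VM has no container for the app within keep-alive.
        idle_since = last_used[vm].get(app)
        if idle_since is None or t - idle_since > keep_alive_s:
            cold_starts += 1

        finish = t + duration_s
        in_flight[vm].append(finish)
        last_used[vm][app] = finish

    return cold_starts


# One application invoked every 60 s, 1 s per invocation, 120 s keep-alive.
arrivals = [(60.0 * i, "app") for i in range(10)]
spread = simulate_jsq(arrivals, num_vms=4, duration_s=1.0, keep_alive_s=120.0)
packed = simulate_jsq(arrivals, num_vms=1, duration_s=1.0, keep_alive_s=120.0)
```

Spread over four VMs, each VM sees the application only every 240 seconds, so every invocation is a cold start; on a single VM only the first one is.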
Min-worker-set (MWS)
Consolidates each application onto a minimal set of backend VMs
Shorter inter-arrival time --> warm starts
Consistent hashing to minimize reshuffling of home VMs
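The idea of a home VM chosen by consistent hashing, plus a worker set that grows only as far as needed, can be sketched as follows. This is an illustration, not the paper's implementation: the `Ring` class, the SHA-256 based hash, and the rule "walk clockwise from the home VM until the combined available CPUs cover the application's demand" are all assumptions.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a point on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, vm_capacity):
        """vm_capacity: dict of vm_id -> currently available CPUs."""
        self.capacity = dict(vm_capacity)
        self._points = sorted((_hash(vm), vm) for vm in self.capacity)

    def walk(self, app_id):
        """Yield VMs clockwise, starting from the application's home VM."""
        start = bisect.bisect_left(self._points, (_hash(app_id), ""))
        n = len(self._points)
        for i in range(n):
            yield self._points[(start + i) % n][1]

    def home(self, app_id):
        return next(self.walk(app_id))

    def worker_set(self, app_id, demand_cpus):
        """Smallest prefix of the walk whose capacity covers the demand."""
        chosen, total = [], 0.0
        for vm in self.walk(app_id):
            chosen.append(vm)
            total += self.capacity[vm]
            if total >= demand_cpus:
                break
        return chosen


ring = Ring({f"vm-{i}": 4.0 for i in range(8)})
home = ring.home("app-0")
before = ring.worker_set("app-0", demand_cpus=4.0)  # fits on the home VM
ring.capacity[home] = 1.0                           # the home Harvest VM shrinks
after = ring.worker_set("app-0", demand_cpus=4.0)   # spills to the next VM
```

Because only the ring position decides the home VM, a VM leaving the ring moves just the applications that were homed on it, and a shrinking VM only extends the affected worker sets by their next neighbor.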
Implementation on OpenWhisk
Harvest Monitor
Collects resource information & eviction signal
Controller
Maintains data from Harvest Monitor
Implements MWS
Resource monitor
Tracks resource variation in the system
Spins up new VMs to maintain available resources
Evaluation
Conclusion
To host serverless platforms on harvested resources
Quantify the challenges of using harvested resources for serverless invocations, including Harvest VM evictions and resource variation
Demonstrate the reliability of hosting serverless workloads on harvested resources
Demonstrate the performance and economic benefits of hosting serverless platforms on harvested resources