Building An Elastic Query Engine on Disaggregated Storage
https://www.usenix.org/conference/nsdi20/presentation/vuppalapati
Traditional shared-nothing architectures
Data partitioned across servers
Each server handles its own partition
Problems
Hardware-workload mismatch
Data re-shuffle during elasticity
Fundamental issue: tight coupling of compute & storage
Query Engines on Disaggregated Storage
Decouple compute and persistent storage
Independent scaling of resources
Work explores
Design aspects
Data-driven insights
Future directions
System in focus: Snowflake
Data warehousing as a service, in production for over 5 years, 1000s of customers, millions of queries per day
Statistics from 70 million queries over a 14-day period
Diversity of Queries
Read only --> 28%
Write only --> 13%
R/W --> 59%
Persistent data read/written varies over several orders of magnitude within each class
Query distribution over time
Read-only query load varies significantly over time
High-level architecture
Virtual warehouse: abstraction for computational resources, under the hood --> set of VMs, distributed execution of queries
Customers --> different warehouses
Decouples compute from persistent storage (S3)
Intermediate data (e.g., from distributed joins or group-bys) --> ephemeral storage
Ephemeral storage system key features
Co-located with compute in VWs
Opportunistic caching of persistent data (hide access latency)
Elasticity without data re-shuffle
Intermediate data characteristics
Data size: variation over 5 orders of magnitude
Difficult to predict intermediate data sizes upfront
Decouple compute & ephemeral storage?
Persistent data caching
Intermediate data volume fluctuates between peak and average --> spare ephemeral capacity usable for caching
Opportunistic caching of persistent data in ephemeral storage system, hides latency of S3 access
How to ensure consistency?
Each file assigned to unique node, consistent hashing
Write-through caching
Analysis, future directions in paper
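The consistency scheme above can be sketched in code. A minimal, illustrative sketch (not Snowflake's implementation): a consistent-hash ring assigns each persistent-data file to exactly one node, and that node's cache is write-through, so the cached copy never diverges from persistent storage. A plain dict stands in for S3; class and method names are invented for illustration.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable hash for both node identifiers and file names.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Maps each file to a unique node; virtual nodes smooth the distribution."""
    def __init__(self, nodes, vnodes=64):
        self._ring = sorted(
            (_hash(f"{n}#{v}"), n) for n in nodes for v in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def node_for(self, file_name: str) -> str:
        # Walk clockwise to the first virtual node at or after the file's hash.
        i = bisect.bisect(self._keys, _hash(file_name)) % len(self._ring)
        return self._ring[i][1]

class WriteThroughCache:
    """Node-local cache; writes update cache and persistent storage together."""
    def __init__(self, persistent_store: dict):
        self.cache = {}
        self.store = persistent_store  # stand-in for S3

    def read(self, file_name):
        if file_name not in self.cache:          # cache miss: fetch from S3
            self.cache[file_name] = self.store[file_name]
        return self.cache[file_name]

    def write(self, file_name, data):
        self.cache[file_name] = data             # update cache...
        self.store[file_name] = data             # ...and persistent storage
```

Because every read and write for a given file is routed to `node_for(file)`, at most one cached copy exists, and write-through keeps it consistent with S3 without any cross-node invalidation protocol.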
Elasticity
Persistent storage: easy, offloaded to S3
Compute: easy, pre-warmed pool of VMs
Ephemeral storage - challenging, due to co-location with compute
Back to shared-nothing architecture problem (data re-shuffle)
Lazy consistent hashing
Locality-aware task scheduling
Elasticity without data re-shuffle
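The lazy part can be made concrete with a small self-contained simulation (an illustrative sketch under assumed names, not Snowflake's code): when a node joins, the hash ring is rebuilt but no data is copied; only the files that remap to the new node miss on their next access and get refetched from S3, while stale copies on old owners simply age out.

```python
import bisect
import hashlib

def h(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes=64):
    # Consistent-hash ring: (hash, node) pairs for each virtual node.
    return sorted((h(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))

def owner(ring, file_name):
    keys = [k for k, _ in ring]
    i = bisect.bisect(keys, h(file_name)) % len(ring)
    return ring[i][1]

# Node-local caches: node -> set of cached files.
caches = {f"node{i}": set() for i in range(4)}
files = [f"part-{i}.parquet" for i in range(1000)]

ring = build_ring(caches)
for f in files:                      # warm: each file cached at its owner
    caches[owner(ring, f)].add(f)

# Scale out: add a node, rebuild the ring, but move NO data eagerly.
caches["node4"] = set()
ring = build_ring(caches)

# A remapped file now misses in its new owner's cache and would be
# fetched from S3 on next access; unmoved files stay cached in place.
moved = sum(1 for f in files if f not in caches[owner(ring, f)])
print(f"{moved} of {len(files)} files remapped")
```

Only the small fraction of files whose ring segment lands on the new node remap; everything else keeps its cache locality, which is why scale-out needs no re-shuffle. Locality-aware scheduling then just means running each task on `owner(ring, file)`.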
Elasticity
Resource scaling by up to 100x needed at times
What time-scale?
Granularity of warehouse elasticity >> changes in query load
Take-away: need finer-grained elasticity in order to better match demand
Resource utilization
System-wide resource utilization: significant room for improvement
Virtual Warehouse abstraction
Good performance isolation
Trade-off: low resource utilization
Solution: finer-grained elasticity with current design, or move to resource shared model
Alternate design: resource sharing
Pricing model: moving from per-hour to per-second billing for warehouses
Under per-second pricing, keeping pre-warmed VM pools becomes less cost-effective
Solution: move to resource shared model
Better resource utilization
Helps support elasticity
Challenges
Maintaining isolation guarantees
Shared ephemeral storage system
Sharing cache
No pre-determined lifetime
Co-existence with intermediate data
Elasticity without violating isolation
Possible cross-tenant interference
Need private address-spaces for tenants