Building An Elastic Query Engine on Disaggregated Storage

https://www.usenix.org/conference/nsdi20/presentation/vuppalapati

Traditional shared-nothing architectures
- Data partitioned across servers
- Each server handles its own partition
- Problems
  - Hardware-workload mismatch
  - Data re-shuffle during elasticity
  - Fundamental issue: tight coupling of compute & storage
Query Engines on Disaggregated Storage
- Decouple compute and persistent storage
- Independent scaling of resources
Work explores
- Design aspects
- Data-driven insights
- Future directions
System focus: Snowflake
- Warehousing as a service, in production for over 5 years, 1000s of customers, millions of queries per day
- Statistics from 70 million queries over 14 days
Diversity of Queries
- Read only --> 28%
- Write only --> 13%
- R/W --> 59%
- Persistent data read/written varies over several orders of magnitude within each class
Query distribution over time
- Read-only query load varies significantly over time
High-level architecture
- Virtual warehouse: abstraction for computational resources, under the hood --> set of VMs, distributed execution of queries
- Customers --> different warehouses
- Decouples compute from persistent storage (S3)
  - Intermediate data: distributed join, or group by --> ephemeral storage
    - Ephemeral storage system key features
      - Co-located with compute in virtual warehouses (VWs)
      - Opportunistic caching of persistent data (hides access latency)
      - Elasticity without data re-shuffle
Intermediate data characteristics
- Data size: variation over 5 orders of magnitude
- Difficult to predict intermediate data sizes upfront
- Decouple compute & ephemeral storage?
Persistent data caching
- Intermediate data volume: peak, average
- Opportunistic caching of persistent data in ephemeral storage system, hides latency of S3 access
- How to ensure consistency?
  - Each file assigned to unique node, consistent hashing
  - Write-through caching
  - Analysis, future directions in paper
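
The two consistency mechanisms above (one owning node per file via consistent hashing, plus write-through caching) can be sketched roughly as follows. This is a minimal illustration, not Snowflake's implementation: `HashRing`, `WriteThroughCache`, the virtual-node count, and the plain dict standing in for S3 are all hypothetical names and assumptions.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Stable 64-bit hash (the built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring: every file name maps to exactly one node."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes  # assumed number of ring points per node
        self.points = []      # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.points, (stable_hash(f"{node}#{i}"), node))

    def owner(self, file_name):
        # First ring point at or after the file's hash, wrapping around.
        i = bisect.bisect_left(self.points, (stable_hash(file_name), ""))
        return self.points[i % len(self.points)][1]


class WriteThroughCache:
    """Per-node caches in front of a persistent store (a dict stands in for S3)."""

    def __init__(self, ring, persistent):
        self.ring = ring
        self.persistent = persistent
        self.caches = {}  # node -> {file_name: data}

    def write(self, file_name, data):
        node = self.ring.owner(file_name)
        # Write-through: the persistent copy is updated together with the cache.
        self.persistent[file_name] = data
        self.caches.setdefault(node, {})[file_name] = data
        return node

    def read(self, file_name):
        node = self.ring.owner(file_name)
        cache = self.caches.setdefault(node, {})
        if file_name not in cache:
            # Cache miss: fall back to persistent storage, then cache the file.
            cache[file_name] = self.persistent[file_name]
        return cache[file_name]
```

Because each file has a single owner, at most one node caches it at a time, and because writes go through to the persistent store, a cache miss can always be served from there.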
Elasticity
- Persistent storage: easy, offloaded to S3
- Compute: easy, pre-warmed pool of VMs
- Ephemeral storage - challenging, due to co-location with compute
  - Back to shared-nothing architecture problem (data re-shuffle)
Lazy consistent hashing
- Locality aware task scheduling
- Elasticity without data re-shuffle
- Lazy consistent hashing
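
A rough sketch of the lazy idea, under stated assumptions: when a node is added, only the hash ring changes and no cached file is copied between nodes. A file whose owner changed is fetched from persistent storage by its new owner on first access, and the stale copy on the old node is left to be evicted later. `LazyRingCache`, its counters, and the dict standing in for S3 are hypothetical, not taken from the paper.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Stable 64-bit hash (the built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class LazyRingCache:
    """Consistent hashing where resizing never moves cached data."""

    def __init__(self, nodes, persistent, vnodes=64):
        self.persistent = persistent  # dict standing in for S3
        self.vnodes = vnodes          # assumed number of ring points per node
        self.points = []              # sorted list of (hash, node)
        self.caches = {}              # node -> {file_name: data}
        self.persistent_reads = 0     # counts fetches from persistent storage
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Lazy resize: update the ring only; the new node starts with an
        # empty cache and nothing is shuffled from the existing nodes.
        self.caches[node] = {}
        for i in range(self.vnodes):
            bisect.insort(self.points, (stable_hash(f"{node}#{i}"), node))

    def owner(self, file_name):
        i = bisect.bisect_left(self.points, (stable_hash(file_name), ""))
        return self.points[i % len(self.points)][1]

    def read(self, file_name):
        node = self.owner(file_name)
        cache = self.caches[node]
        if file_name not in cache:
            # The (possibly new) owner pulls the file from persistent storage.
            cache[file_name] = self.persistent[file_name]
            self.persistent_reads += 1
        return node, cache[file_name]
```

In this sketch the cost of scaling out is paid per file and only when that file is next read, instead of as an up-front re-shuffle.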
Elasticity
- Resource scaling by up to 100x needed at times
- What time-scale?
  - Warehouse elasticity operates at a much coarser time-scale than changes in query load
  - Take-away: need finer-grained elasticity in order to better match demand
Resource utilization
- System-wide resource utilization: significant room for improvement
- Virtual Warehouse abstraction
  - Good performance isolation
  - Trade-off: low resource utilization
- Solution: finer-grained elasticity with the current design, or move to a resource-shared model
Alternate design: resource sharing
- Pricing model: per-hour usage for warehouses
- Move to per-second pricing --> keeping a pre-warmed pool of VMs becomes less cost-effective
- Solution: move to a resource-shared model
  - Better resource utilization
  - Helps support elasticity
  - Challenges
    - Maintaining isolation guarantees
    - Shared ephemeral storage system
    - Sharing cache
    - No pre-determined lifetime
    - Co-existence with intermediate data
    - Elasticity without violating isolation
    - Possible cross-tenant interference
    - Need private address-spaces for tenants
