# Building An Elastic Query Engine on Disaggregated Storage

* Presents the Snowflake design and implementation, along with a discussion of how recent changes in cloud infrastructure (emerging hardware, fine-grained billing, etc.) have altered many of its original design assumptions

### Presentation

* Traditional shared-nothing architectures
  * Data partitioned across servers
  * Each server handles its own partition
* Hardware-workload mismatch: each node ships a fixed compute-to-storage ratio
* Data re-shuffle during elasticity
* Fundamental issue in shared-nothing architectures: tight coupling of compute and storage
* Solution: decouple compute and persistent storage
  * Independent scaling of resources
  * Analyze: query engine on disaggregated storage
    * Design aspects
    * Data-driven insights
    * Future directions
  * System: Snowflake --> statistics from ~70 million queries over a 14-day period
* Characteristics
  * Diversity of queries
    * ![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FzqJO8tLvL1VtQG9UcL4s%2Fimage.png?alt=media\&token=d53c9719-54ff-4395-8de2-6566c2031e21)
      * Read-only: 28%
      * Write-only: 13%
      * Read-write: 59%
      * Three distinct query classes (varying by orders of magnitude)
  * Query distribution over time
    * Read-only query load varies significantly over time
    * ![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FP6zUbiu2jsr9oDLCbkcx%2Fimage.png?alt=media\&token=3f9aefb6-d404-4fb1-8b98-4c54fc1741ea)
* High-level architecture
  * Virtual warehouse (VW)
    * Abstraction for computational resources
    * Under the hood --> set of VMs
    * Distributed execution of queries
  * Different customers --> different warehouses
    * ![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FDm3gQKDPyVHPPb6LIbvk%2Fimage.png?alt=media\&token=cdce6aa1-29f4-4917-8120-1a6bcbc9563d)
  * ![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FWVX21uZzJ4C7k5iSblns%2Fimage.png?alt=media\&token=146e45ce-1266-49bd-b9b4-3db0ea73a982)
    * Ephemeral storage system key features
      * Co-located with compute in VWs
        * Use both DRAM and local SSDs
      * Opportunistic caching of persistent data
        * Hide S3 access latency
      * Elasticity without data re-shuffle
  * Intermediate data characteristics
    * Intermediate data sizes --> variation over 5 orders of magnitude
      * ![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FgEMLBxKaExiLdv1B3gHq%2Fimage.png?alt=media\&token=b803f611-d5cb-40d1-baf8-6e3dd97cc2ae)
    * Difficult to predict intermediate data sizes upfront
      * A bit different from ours
    * Possibility: decouple compute & ephemeral storage
  * Persistent data caching
    * Ephemeral storage provisioned for peak intermediate data volume --> spare capacity available for caching persistent data
    * Opportunistic caching
      * How to ensure consistency?
        * Each file assigned to a unique node
          * Consistent hashing
        * Write-through caching
  * Elasticity
    * Persistent storage - easy, offloaded to S3
    * Compute - easy, pre-warmed pool of VMs
      * Request --> take VMs from the pool
    * Ephemeral storage - challenging, due to co-location with compute
      * Back to the shared-nothing architecture problem (data re-shuffle)
      * Resize changes the hash ring --> naive consistent hashing would re-shuffle cached data
    * Lazy consistent hashing
      * **Locality-aware task** scheduling
      * Elasticity without data re-shuffle
        * A file is loaded onto its new owner node only when a task actually reads it
  * Do customers exploit elasticity in the wild?
    * Resource scaling by up to 100x needed at times
    * At what time-scales are warehouses resized?
      * Granularity of warehouse elasticity >> time-scale of changes in query load
      * Need finer-grained elasticity to better match demand
    * Resource utilization
      * System-wide resource utilization: CPU, memory, network-tx, network-rx
      * Virtual warehouse abstraction
        * Good performance isolation
        * Trade-off: low resource utilization
      * Significant room for improvement in resource utilization
      * Solution #1: finer-grained elasticity with current design
      * Solution #2: resource-shared model
  * Resource sharing
    * Per-second pricing --> pre-warmed pool no longer cost-effective!
    * Solution #2: move to a resource-shared model
      * Statistical multiplexing
        * Better resource utilization
        * Helps support elasticity
      * Challenges
        * Maintaining isolation guarantees
        * Shared ephemeral storage system
        * Sharing cache
          * No pre-determined lifetime
          * Co-existence with intermediate data
        * Elasticity without violating isolation
          * Cross-tenant interference
          * Need private address spaces for tenants
  * Findings are generally applicable to other distributed systems built on top of disaggregated storage
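The caching scheme in the notes above (one owner node per file via consistent hashing, plus write-through on updates) can be sketched as follows. `ConsistentHashRing` and the in-memory `WriteThroughCache` are hypothetical stand-ins for illustration, not Snowflake's actual code; a plain dict plays the role of S3:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable hash (md5 here) so file-to-node assignment is deterministic.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Assigns each persistent file to a unique owner node."""

    def __init__(self, nodes, vnodes=64):
        self._ring = []  # sorted list of (hash, node) virtual points
        for node in nodes:
            self.add_node(node, vnodes)

    def add_node(self, node, vnodes=64):
        for i in range(vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def owner(self, file_name: str) -> str:
        # The first virtual point clockwise of the file's hash owns the file.
        idx = bisect.bisect(self._ring, (_hash(file_name), ""))
        return self._ring[idx % len(self._ring)][1]

class WriteThroughCache:
    """Local cache of persistent files on an owner node; every write also
    reaches S3, so cached data never diverges from persistent storage."""

    def __init__(self, s3: dict):
        self.local = {}
        self.s3 = s3  # dict stands in for the S3 object store

    def write(self, file_name, data):
        self.local[file_name] = data
        self.s3[file_name] = data  # write-through keeps S3 authoritative

    def read(self, file_name):
        if file_name in self.local:
            return self.local[file_name]   # cache hit hides S3 latency
        data = self.s3[file_name]          # miss --> fetch from S3
        self.local[file_name] = data       # opportunistically cache
        return data
```

Because each file has exactly one owner and writes go through to persistent storage, cached copies can be served without any cross-node invalidation protocol.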
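The lazy consistent hashing idea noted above (a resize updates only the file-to-node mapping; cached data moves when a task next reads it) can be sketched roughly like this. `LazyElasticCluster` is hypothetical, and rendezvous hashing stands in for a full consistent-hash ring to keep the sketch short:

```python
import hashlib

def owner(nodes, file_name: str) -> str:
    # Rendezvous (highest-random-weight) hashing: a simple stand-in for
    # the hash ring -- each file still gets exactly one owner node.
    return max(nodes,
               key=lambda n: hashlib.md5(f"{n}/{file_name}".encode()).hexdigest())

class LazyElasticCluster:
    def __init__(self, nodes, s3: dict):
        self.nodes = list(nodes)
        self.caches = {n: {} for n in self.nodes}  # per-node ephemeral cache
        self.s3 = s3  # dict stands in for the S3 object store

    def resize(self, new_node):
        # Elasticity without re-shuffle: only the mapping changes here;
        # no cached file is copied at resize time.
        self.nodes.append(new_node)
        self.caches[new_node] = {}

    def run_scan_task(self, file_name):
        # Locality-aware scheduling: run the task on the file's owner node.
        node = owner(self.nodes, file_name)
        cache = self.caches[node]
        if file_name not in cache:
            cache[file_name] = self.s3[file_name]  # lazy pull on first read
        return node, cache[file_name]
```

After a resize the new node's cache starts empty; files whose ownership moved migrate one at a time, each on the first task that actually reads it.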
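The statistical-multiplexing argument for the resource-shared model can be made concrete with a toy example (the demand numbers are illustrative, not from the paper): isolated virtual warehouses must each be provisioned for their own peak, while a shared pool only needs the peak of the combined load:

```python
def provisioning(tenant_demands):
    """tenant_demands: per-tenant lists of demand (VM-units) per time slot.
    Returns (isolated, shared) capacity required."""
    # Isolated warehouses: sum of each tenant's individual peak.
    isolated = sum(max(d) for d in tenant_demands)
    # Shared pool: peak of the summed demand across tenants.
    shared = max(map(sum, zip(*tenant_demands)))
    return isolated, shared

# Two tenants with anti-correlated load over eight time slots.
tenant_a = [8, 6, 2, 1, 1, 2, 6, 8]
tenant_b = [1, 2, 6, 8, 8, 6, 2, 1]
print(provisioning([tenant_a, tenant_b]))  # (16, 9)
```

The gap between the two numbers is the utilization headroom the notes point at: the less correlated the tenants' peaks, the bigger the win from sharing.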
