Building An Elastic Query Engine on Disaggregated Storage

https://www.usenix.org/conference/nsdi20/presentation/vuppalapati

  • Traditional shared-nothing architectures

    • Data partitioned across servers

    • Each server handles its own partition

    • Problems

      • Hardware-workload mismatch

      • Data re-shuffle during elasticity

      • Fundamental issue: tight coupling of compute & storage

  • Query Engines on Disaggregated Storage

    • Decouple compute and persistent storage

    • Independent scaling of resources

  • Work explores

    • Design aspects

    • Data-driven insights

    • Future directions

  • System in focus: Snowflake

    • Data warehousing as a service; in production for over 5 years, with 1000s of customers running millions of queries per day

    • Statistics from 70 million queries over a 14-day period

  • Diversity of Queries

    • Read-only --> 28%

    • Write-only --> 13%

    • Read-write --> 59%

    • Persistent data read/written varies over several orders of magnitude within each class

  • Query distribution over time

    • Read-only query load varies significantly over time

  • High-level architecture

    • Virtual Warehouse (VW): abstraction for compute resources; under the hood --> a set of VMs executing queries in a distributed fashion

    • Different customers --> different virtual warehouses

    • Decouples compute from persistent storage (S3)

      • Intermediate data (e.g., from distributed joins or group-bys) --> ephemeral storage

        • Ephemeral storage system key features

          • Co-located with compute in VWs

          • Opportunistic caching of persistent data (hides S3 access latency; see the sketch after this list)

          • Elasticity without data re-shuffle
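
A minimal sketch of the opportunistic-caching idea, in Python. `s3_get` and `LocalCache` are illustrative names, not Snowflake's actual API: each node serves reads from a local LRU cache when it can and falls back to S3 on a miss, which is what hides the S3 access latency.

```python
from collections import OrderedDict

def s3_get(file_id: str) -> bytes:
    """Placeholder for a (slow) read from persistent storage (S3)."""
    raise NotImplementedError

class LocalCache:
    """LRU cache over a node's local memory/SSD. Capacity is whatever
    is left over after intermediate data, hence 'opportunistic'."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()

    def read(self, file_id: str) -> bytes:
        if file_id in self.entries:             # hit: S3 round-trip avoided
            self.entries.move_to_end(file_id)
            return self.entries[file_id]
        data = s3_get(file_id)                  # miss: fall back to S3
        self.entries[file_id] = data
        if len(self.entries) > self.capacity:   # evict least-recently-used
            self.entries.popitem(last=False)
        return data
```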

  • Intermediate data characteristics

    • Data size: variation over 5 orders of magnitude

    • Difficult to predict intermediate data sizes upfront

    • Decouple compute & ephemeral storage?

  • Persistent data caching

    • Intermediate data volume is bursty: average << peak, so the ephemeral storage system usually has spare capacity

    • Opportunistic caching of persistent data in ephemeral storage system, hides latency of S3 access

    • How to ensure consistency?

      • Each file is assigned to a unique node via consistent hashing (see the sketch after this list)

      • Write-through caching

      • Analysis and future directions in the paper
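
A sketch of how the two bullets above might compose. The paper names the techniques; this ring implementation and `s3_put` are generic stand-ins, not Snowflake's code. Assigning each file to exactly one node means there is only one cached copy to keep consistent, and write-through keeps that copy fresh.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def s3_put(file_id: str, data: bytes) -> None:
    """Placeholder for a write to persistent storage (S3)."""
    raise NotImplementedError

class ConsistentHashRing:
    """Maps each persistent file to exactly one node, so at most one
    node caches any given file (no duplicate copies to reconcile)."""
    def __init__(self, nodes, vnodes: int = 64):
        self.nodes = list(nodes)
        self.ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in self.nodes for i in range(vnodes)
        )

    def node_for(self, file_id: str):
        idx = bisect.bisect(self.ring, (_hash(file_id),)) % len(self.ring)
        return self.ring[idx][1]

def write_through(ring, caches, file_id: str, data: bytes) -> None:
    """Write-through: update the owning node's cached copy AND S3,
    so the single cached copy never goes stale."""
    owner = ring.node_for(file_id)
    caches.setdefault(owner, {})[file_id] = data  # refresh the cache...
    s3_put(file_id, data)                         # ...and persist
```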

  • Elasticity

    • Persistent storage: easy, offloaded to S3

    • Compute: easy, thanks to a pre-warmed pool of VMs (see the sketch after this list)

    • Ephemeral storage: challenging, due to co-location with compute

      • Back to shared-nothing architecture problem (data re-shuffle)
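
A toy sketch of why the compute side is "easy"; `boot_vm` is a hypothetical slow provisioning call, and the pool logic here is an illustration, not Snowflake's implementation. Keeping already-booted VMs idle takes VM startup off the critical path of resizing a warehouse. (The hard ephemeral-storage case is handled next, via lazy consistent hashing.)

```python
from collections import deque

def boot_vm():
    """Hypothetical slow path: actually provisioning and booting a VM."""
    raise NotImplementedError

class PrewarmedPool:
    """Booted-but-idle VMs kept ready, so attaching compute to a
    virtual warehouse never waits on a VM boot."""
    def __init__(self, target_size: int):
        self.target = target_size
        self.idle = deque(boot_vm() for _ in range(target_size))

    def acquire(self):
        vm = self.idle.popleft() if self.idle else boot_vm()  # fast path
        self.refill()  # simplified: a real system refills in the background
        return vm

    def release(self, vm) -> None:
        self.idle.append(vm)  # returned VMs rejoin the pool

    def refill(self) -> None:
        while len(self.idle) < self.target:
            self.idle.append(boot_vm())
```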

  • Lazy consistent hashing

    • Locality-aware task scheduling: run each task on the node that currently owns (per the hash ring) the files it reads

    • Elasticity without data re-shuffle (a minimal sketch follows this list)
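
A hedged sketch of the lazy part, reusing `ConsistentHashRing` and `s3_get` from the sketches above: on scale-out the ring is updated immediately, but no cached data moves. Tasks simply follow the new ownership, each new owner fills its cache from S3 on first access, and stranded entries on old owners age out of their caches.

```python
def run_task(ring, caches, file_id: str) -> bytes:
    """Locality-aware scheduling: execute on the file's current ring
    owner; repopulate its cache lazily on the first miss after a resize."""
    owner = ring.node_for(file_id)
    cache = caches.setdefault(owner, {})
    if file_id not in cache:
        cache[file_id] = s3_get(file_id)  # lazy fill; no eager copying
    return cache[file_id]

def scale_out(ring, new_nodes):
    """Grow the ring only: ownership changes take effect immediately for
    scheduling, but cached data is never re-shuffled. Entries left on old
    owners are simply never read again and get evicted over time."""
    return ConsistentHashRing(ring.nodes + list(new_nodes))
```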

  • Elasticity

    • Resource scaling by up to 100x needed at times

    • What time-scale?

      • Timescale of warehouse elasticity >> timescale of changes in query load

      • Take-away: need finer-grained elasticity in order to better match demand

  • Resource utilization

    • System-wide resource utilization: significant room for improvement

    • Virtual Warehouse abstraction

      • Good performance isolation

      • Trade-off: low resource utilization

    • Solutions: finer-grained elasticity within the current design, or a move to a resource-shared model

  • Alternate design: resource sharing

    • Pricing model: per-hour usage for warehouses

    • Move to per-second pricing --> maintaining pre-warmed VM pools becomes less cost-effective

    • Solution: move to a resource-shared model

      • Better resource utilization

      • Helps support elasticity

      • Challenges

        • Maintaining isolation guarantees

        • Shared ephemeral storage system

          • Sharing cache

            • No pre-determined lifetime

            • Co-existence with intermediate data

          • Elasticity without violating isolation

            • Possible cross-tenant interference

            • Need private address spaces per tenant (see the toy sketch below)
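
To make the isolation point concrete, a toy sketch (not a design from the paper) of per-tenant namespacing in a shared ephemeral store: keys live under a tenant-private namespace with a tenant-private quota, so one tenant's working set cannot read, overwrite, or evict another's.

```python
from collections import OrderedDict

class SharedEphemeralStore:
    """One store shared by all tenants; each tenant gets a private
    address space and a private capacity budget."""
    def __init__(self, quota_per_tenant: int):
        self.quota = quota_per_tenant
        self.spaces = {}  # tenant_id -> private LRU namespace

    def put(self, tenant_id: str, key: str, value: bytes) -> None:
        space = self.spaces.setdefault(tenant_id, OrderedDict())
        space[key] = value
        space.move_to_end(key)
        if len(space) > self.quota:   # eviction stays within this tenant
            space.popitem(last=False)

    def get(self, tenant_id: str, key: str) -> bytes:
        space = self.spaces.setdefault(tenant_id, OrderedDict())
        value = space[key]            # KeyError on miss; fetch path omitted
        space.move_to_end(key)
        return value
```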
