> For the complete documentation index, see [llms.txt](https://sliu583.gitbook.io/blog/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://sliu583.gitbook.io/blog/specific-work/seminar-and-talk/fall-21-reading-list/building-an-elastic-query-engine-on-disaggregated-storage.md).

# Building An Elastic Query Engine on Disaggregated Storage

* Traditional shared-nothing architectures&#x20;
  * Data partitioned across servers
  * Each server handles its own partition&#x20;
  * Problems
    * Hardware-workload mismatch
    * Data re-shuffle during elasticity&#x20;
    * Fundamental issue: tight coupling of compute & storage&#x20;
* Query Engines on Disaggregated Storage&#x20;
  * Decouple compute and persistent storage
  * Independent scaling of resources&#x20;
* Work explores&#x20;
  * Design aspects&#x20;
  * Data-driven insights&#x20;
  * Future directions&#x20;
* System focus on: snowflake
  * Warehousing as a service, in production for over 5 years, 1000s of customers, millions of queries per day&#x20;
  * Statistics from 70 million queries over 14 day&#x20;
* Diversity of Queries&#x20;
  * Read only --> 28%&#x20;
  * Write only --> 13%
  * R/W --> 59%&#x20;
  * Persistent data read/written varies over several orders of magnitude within each class&#x20;
* Query distribution over time&#x20;
  * Read-only query load varies significantly over time&#x20;
* High-level architecture&#x20;
  * Virtual warehouse: abstraction for computational resources, under the hood --> set of VMs, distributed execution of queries &#x20;
  * Customers --> different warehouses&#x20;
  * Decouples compute from persistent storage (S3)&#x20;
    * Intermediate data: distributed join, or group by --> ephemeral storage&#x20;
      * Ephemeral storage system key features&#x20;
        * Co-located with compute in VWs
        * Opportunistic caching of persistent data (hide access latency)
        * Elasticity without data re-shuffle&#x20;
* Intermediate data characteristics&#x20;
  * Data size: variation over 5 orders of magnitude&#x20;
  * Difficult to predict intermediate data sizes upfront&#x20;
  * Decouple compute & ephemeral storage?&#x20;
* Persistent data caching&#x20;
  * Intermediate data volume: peak, average&#x20;
  * Opportunistic caching of persistent data in ephemeral storage system, hides latency of S3 access
  * How to ensure consistency?&#x20;
    * Each file assigned to unique node, consistent hashing&#x20;
    * Write-through caching&#x20;
    * Analysis, future directions in paper&#x20;
* Elasticity&#x20;
  * Persistent storage: easy, offloaded to S3&#x20;
  * Compute: easy, pre-warmed pool of VMs&#x20;
  * Ephemeral storage - challenging, due to co-location with compute&#x20;
    * Back to shared-nothing architecture problem (data re-shuffle)&#x20;
* Lazy consistent hashing&#x20;
  * Locality aware task scheduling&#x20;
  * Elasticity without data re-shuffle&#x20;
  * Lazy consistent hashing&#x20;
* Elasticity&#x20;
  * Resource scaling by up to 100x needed at times&#x20;
  * What time-scale?&#x20;
    * Granularity of warehouse elasticity >> changes in query load&#x20;
    * Take-away: need finer-grained elasticity in order to better match demand&#x20;
* Resource utilization&#x20;
  * System-wide resource utilizations: significant room for improvement in resource utilizaiton&#x20;
  * Virtual Warehouse abstraction
    * Good performance isolation
    * Trade-off: low resource utilization
  * Solution: finer-grained elasticity with current design, or move to resource shared model&#x20;
* Alternate design: resource sharing&#x20;
  * Pricing model: per-hour usage for warehouses&#x20;
  * Move to per-second pricing --> less cost-effective&#x20;
  * Solution: move to resource shared model
    * Better resource utilization&#x20;
    * Helps support elasticity&#x20;
    * Challenges&#x20;
      * Maintaining isolation guarantees&#x20;
      * Shared ephemeral storage system
        * Sharing cache&#x20;
          * No pre-determined lifetime&#x20;
          * Co-existence with int. data
        * &#x20;Elasticity without violating isolation
          * Possible cross-tenant interference
          * Need private address-spaces for tenants &#x20;
