
Redesigning Storage Systems for Future Workloads, Hardware, and Performance Requirements

https://www.youtube.com/watch?v=UFxS2fepBLk

Redesigning Storage Systems

New: applications, hardware, performance requirements
Applications
- Billions of impressions collected by most web services for analytics
- Past workloads: read mostly, rarely updated
- Future workloads: IoT as an example
  - Mixed workload
  - Read / write ratio
  - Reads: read large volumes of sensor data for control and analytics
  - Writes: new sensor data, continuously recorded
- We produce huge amounts of data, and the volume grows exponentially
- Need to figure out a way to serve data in persistent storage
- Nature of the data is quite diverse: unstructured data
  - KVs (key-value stores): map a key to a large, unstructured value
  - API: put(), get(), and scan()
  - Examples: RocksDB, Redis, Cassandra
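
The put(), get(), and scan() API above can be illustrated with a minimal in-memory sketch. The class name `TinyKV` and the `scan(start_key, count)` signature are assumptions made for illustration; they are not the API of RocksDB, Redis, or Cassandra, and a real KV store would persist data to storage rather than keep it in memory.

```python
from bisect import bisect_left, insort


class TinyKV:
    """Hypothetical in-memory stand-in for a KV store (no persistence)."""

    def __init__(self):
        self._keys = []    # keys kept in sorted order so scan() can walk a range
        self._values = {}  # key -> opaque value (e.g. bytes)

    def put(self, key, value):
        # Insert a new key in sorted position; overwrite the value otherwise.
        if key not in self._values:
            insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        # Return the value for key, or None if the key is absent.
        return self._values.get(key)

    def scan(self, start_key, count):
        # Return up to `count` (key, value) pairs with key >= start_key, in key order.
        i = bisect_left(self._keys, start_key)
        return [(k, self._values[k]) for k in self._keys[i:i + count]]


kv = TinyKV()
kv.put("sensor:2", b"21.5C")
kv.put("sensor:1", b"19.0C")
kv.put("sensor:3", b"22.1C")
readings = kv.scan("sensor:2", 10)  # range read starting at sensor:2
```

The sorted key list is what makes scan() cheap here; a store that only hashed its keys could serve put() and get() but would have to sort on every range read.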
Hardware
- Landscape: RAM, HDD, Tape (archive)
- Now: NVM and SSD form new layers between RAM and HDD; also Glass, DNA, ...
- Focus: SSD
  - New SSDs are much faster
  - Random accesses almost as fast as sequential accesses
Performance requirements
- Throughput
  - reads & writes / second
  - higher is better
  - should be steady
- Latency
  - time it takes to read or write
  - lower is better
  - tail latency (e.g., 99th percentile)
  - High tail latency in KVs --> unpredictable performance, which makes it difficult to provide QoS guarantees
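
To make the tail-latency metric concrete, here is a small sketch that computes a percentile with the nearest-rank method. The `percentile` helper and the latency samples are hypothetical, chosen only to show how a few slow requests dominate the 99th percentile while leaving the median untouched.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample such that at least p% of samples are <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds: 98 fast requests, 2 slow ones.
latencies_ms = [1.0] * 98 + [50.0] * 2

median_ms = percentile(latencies_ms, 50)  # typical request
p99_ms = percentile(latencies_ms, 99)     # tail request
```

With these samples the median stays at the fast-request latency while the 99th percentile lands on a slow request, which is why the median (or the mean) alone can hide latency spikes.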

Contributions

Applications: cluster metadata management. Mixed write-intensive workload.
- TRIAD [ATC '17]: decrease maintenance I/O to increase client throughput.
  - HyperLogLog-driven compaction
  - WAL repurposing
  - Hot/Cold key separation
Hardware: new servers with ample memory (100s of GB)
- FloDB: Scale KVs with memory size
  - New 2-level data structure
  - Mostly O(1) insert
  - Concurrent reads and updates
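
The notes do not say what the two levels are, so the following is only a sketch under an assumption: a small hash-table buffer that absorbs writes in O(1), sitting on top of a larger sorted level that is rebuilt when the buffer drains. The class name `TwoLevelBuffer`, the capacity parameter, and the drain policy are all hypothetical, and the sketch is single-threaded, so it does not show the concurrent reads and updates mentioned above.

```python
from bisect import bisect_left


class TwoLevelBuffer:
    """Hypothetical two-level in-memory structure: hash buffer over a sorted level."""

    def __init__(self, buffer_capacity=4):
        self.buffer_capacity = buffer_capacity
        self.buffer = {}       # level 1: unsorted hash table, O(1) insert
        self.sorted_keys = []  # level 2: keys in sorted order
        self.sorted_vals = {}  # level 2: key -> value

    def put(self, key, value):
        # The common case touches only the hash table.
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_capacity:
            self._drain()

    def _drain(self):
        # Move buffered entries into the sorted level (the occasional slow path).
        self.sorted_vals.update(self.buffer)
        self.sorted_keys = sorted(self.sorted_vals)
        self.buffer.clear()

    def get(self, key):
        # Newest data lives in the buffer, so check it first.
        if key in self.buffer:
            return self.buffer[key]
        return self.sorted_vals.get(key)

    def scan(self, start_key, count):
        # Range reads need sorted data, so drain the buffer first.
        self._drain()
        i = bisect_left(self.sorted_keys, start_key)
        return [(k, self.sorted_vals[k]) for k in self.sorted_keys[i:i + count]]


store = TwoLevelBuffer(buffer_capacity=3)
store.put("b", 1)
store.put("a", 2)  # still only in the hash buffer
store.put("c", 3)  # buffer is full: entries drain to the sorted level
```

In this sketch inserts are "mostly O(1)" because only the put() that fills the buffer pays for the drain.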
Performance requirements
- Need low, steady tail latency. No latency spikes.
- SILK: KVs I/O scheduling
  - Opportunistic I/O bandwidth allocation for KV maintenance ops
  - Prioritize client work
  - No tail latency spikes
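
As a rough illustration of opportunistic bandwidth allocation, the sketch below gives maintenance work whatever I/O bandwidth client traffic is not using at the moment. The function name, the `floor_mb_s` parameter (a small minimum so maintenance never stalls completely), and all numbers are assumptions made for this example; the notes do not describe SILK's actual policy at this level of detail.

```python
def maintenance_bandwidth(device_mb_s, client_mb_s, floor_mb_s=10.0):
    """Bandwidth (MB/s) to grant maintenance ops: the leftover after client I/O,
    but never less than a small floor (hypothetical policy)."""
    leftover = device_mb_s - client_mb_s
    return max(floor_mb_s, leftover)


# Hypothetical device that sustains 200 MB/s.
quiet_period = maintenance_bandwidth(200.0, 50.0)   # light client load
peak_period = maintenance_bandwidth(200.0, 195.0)   # heavy client load
```

When client load is light, maintenance gets most of the device; at peak it is throttled to the floor, so client requests are prioritized and maintenance catches up later.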
KVell: fast KV for NVMe SSDs [SOSP '19]
- RocksDB: low, unstable throughput; CPU-bound on fast SSDs.
  - Can't saturate the I/O bandwidth that comes with the device
- Existing KVs internals
