Learning Cache Replacement with CACHEUS

https://www.usenix.org/conference/fast21/presentation/rodriguez

Talk

  • Workloads:

    • LFU-friendly

    • LRU-friendly

    • scan

    • churn

  • CACHEUS: a new class of fully adaptive, machine-learned caching algorithms that utilize a combination of experts designed to address these workload primitive types

    • Experts: SOTA ARC, LIRS, LFU, plus the new SR-LRU and CR-LFU

  • 17,766 simulation experiments on 329 workloads run against 6 different cache configurations

Cache: Fast but relatively small in capacity

Cache management + ML: improves performance and decision-making

Cache replacement algorithms

  • Non-adaptive:

    • LRU

    • LFU

    • LIRS (low inter-reference recency set)

  • Adaptive

    • ARC

    • Dynamic LIRS

  • ML-based adaptive

    • Adaptive caching using multiple experts (ACME)

    • RL on Cache Replacement (LeCaR)

    • Reinforcement learning (this work)

Workload Primitives

LeCaR

  • ML-Based

    • Simple: LRU, LFU as experts

    • Adaptive: update weights

    • Outperforms the state of the art at small cache sizes

Limitation of LeCaR

  • Fixed learning rate: empirically chosen

  • Cannot handle scan-type workloads

Improving LeCaR

  • Adaptive learning rate

  • Improving experts

    • Introduce scan resistance

      • Replace LRU with

        • ARC (unable to handle a scan followed by churn)

        • LIRS (not adaptive, limited ability to handle LRU-friendly workloads)

        • DLIRS (does not adapt well empirically)

      • Scan resistant LRU: SR-LRU

    • Improve churn resistance

      • Churn resistant LFU (CR-LFU)

CACHEUS: Learning Rate Adaptation

  • Learning rate changed

    • Performance change

      • Using the gradient information

        • Positive: reinforce latest direction, update the learning rate in the same direction for the next time

        • Negative: reverse the latest direction

  • Learning rate unchanged

    • Performance change

      • Positive: no update

      • Negative: random jump

  • Performance low for 10 intervals

    • Restart learning rate
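
A minimal sketch of this learning-rate adaptation loop, driven by a per-interval hit-rate signal; the class name, step multiplier, random-jump range, and low-performance threshold below are illustrative assumptions, not values from the paper:

```python
import random

class AdaptiveLearningRate:
    """Sketch of CACHEUS-style learning-rate adaptation (constants are illustrative)."""

    def __init__(self, initial_lr=0.1, step=1.1, low_threshold=0.01):
        self.initial_lr = initial_lr
        self.lr = initial_lr
        self.step = step                # multiplicative step when moving the learning rate
        self.low_threshold = low_threshold
        self.direction = +1             # +1: last change increased lr, -1: decreased it
        self.lr_changed = False         # did the learning rate change in the last interval?
        self.low_intervals = 0

    def end_of_interval(self, hit_rate, prev_hit_rate):
        delta = hit_rate - prev_hit_rate
        if self.lr_changed:
            # Gradient-style rule: keep the latest direction if performance improved,
            # otherwise reverse it; either way, update the learning rate again.
            if delta < 0:
                self.direction = -self.direction
            self.lr *= self.step if self.direction > 0 else 1 / self.step
            self.lr_changed = True
        else:
            if delta >= 0:
                pass                                   # performance improved: no update
            else:
                self.lr = random.uniform(1e-3, 1.0)    # performance dropped: random jump
                self.lr_changed = True

        # If performance stays low for 10 intervals, restart the learning rate.
        self.low_intervals = self.low_intervals + 1 if hit_rate < self.low_threshold else 0
        if self.low_intervals >= 10:
            self.lr = self.initial_lr
            self.low_intervals = 0
        return self.lr
```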

SR-LRU

  • On a miss with no history entry: insert x into the MRU (most recently used) position of the scan-resistant (SR) portion of the cache

  • On a miss in the cache but a hit in history: insert x into the MRU position of the reuse portion of the cache instead of the SR portion

  • On a hit in the reuse portion: x simply stays there, moved to its MRU position

  • On a hit in the SR portion: move x into the MRU position of the reuse portion of the cache
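
A minimal sketch of the SR-LRU bookkeeping above, using Python OrderedDicts for the scan-resistant (SR) and reuse partitions plus a ghost history; the partition sizing and demotion rules of the actual algorithm are omitted, and the victim-selection detail below is a simplifying assumption:

```python
from collections import OrderedDict

class SRLRU:
    """Sketch of SR-LRU's two-partition bookkeeping (partition sizing/demotion omitted)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.sr = OrderedDict()       # scan-resistant partition: new items land here
        self.r = OrderedDict()        # reuse partition: items with demonstrated reuse
        self.history = OrderedDict()  # ghost entries of recently evicted items

    def request(self, x):
        if x in self.r:                # hit in reuse portion: refresh its MRU position
            self.r.move_to_end(x)
        elif x in self.sr:             # hit in SR: promote to MRU of the reuse portion
            del self.sr[x]
            self.r[x] = True
        elif x in self.history:        # miss in cache, hit in history: insert into reuse portion
            del self.history[x]
            self._make_room()
            self.r[x] = True
        else:                          # cold miss: insert at MRU of the SR portion
            self._make_room()
            self.sr[x] = True

    def _make_room(self):
        if len(self.sr) + len(self.r) < self.capacity:
            return
        # Simplifying assumption: evict from the SR portion first so scans never
        # displace items that have already shown reuse.
        victim_list = self.sr if self.sr else self.r
        victim, _ = victim_list.popitem(last=False)    # pop the LRU end
        self.history[victim] = True
        if len(self.history) > self.capacity:          # bound the ghost history
            self.history.popitem(last=False)
```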

CR-LFU

  • On eviction, evict the item at the MRU position of the LFU (least frequently used) portion of the cache, i.e. break frequency ties by evicting the most recently used among the least frequently used items

  • On a hit, move the requested item x to the MRU position of the MFU (most frequently used) portion of the cache
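
A minimal sketch of the CR-LFU eviction rule above; the key behavior is that frequency ties are broken by evicting the most recently used of the least frequently used items, so an older subset of a churning working set stays cached (class and field names are illustrative):

```python
class CRLFU:
    """Sketch of churn-resistant LFU: break frequency ties by evicting the MRU item."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.freq = {}        # item -> access frequency
        self.last_used = {}   # item -> logical time of last access
        self.clock = 0

    def request(self, x):
        self.clock += 1
        if x not in self.freq and len(self.freq) >= self.capacity:
            self._evict()
        self.freq[x] = self.freq.get(x, 0) + 1    # hit or insert: bump frequency (move toward MFU)
        self.last_used[x] = self.clock

    def _evict(self):
        min_freq = min(self.freq.values())
        # Among the least frequently used items, evict the most recently used one.
        candidates = [k for k, f in self.freq.items() if f == min_freq]
        victim = max(candidates, key=lambda k: self.last_used[k])
        del self.freq[victim]
        del self.last_used[victim]
```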

Experiments

  • Datasets: 5 sources

    • FIU

    • MSR

    • CloudPhysics

    • CloudVPS

    • CloudCache

  • 6 Cache sizes

  • 6+1 algorithms compared

  • Total experiments: 17,766

Paper

Motivation

  • Caching algorithms that do well for certain workloads do not perform well for others

    • ARC, LIRS, DLIRS, ML-based LeCaR ...

    • The production storage workloads of today are significantly diverse in their characteristic features, and these features can vary over time even within a single workload

  • Caching algorithms that do well for certain cache sizes do not necessarily perform well for other cache sizes

    • As cache size changes

      • workload-induced dynamic cache state, the cache-relevant workload features, and thereby the most effective strategies, can all vary

Contribution

  1. Identify the cache-relevant features that inform workload primitive types

  2. CACHEUS: inspired by LeCaR but overcomes an important shortcoming by being completely adaptive, with the elimination of all statically chosen hyperparameters

  3. Design of two lightweight experts: CR-LFU and SR-LRU

    1. CR: churn resistance

    2. SR: scan resistance

Understand the workloads

  1. Workload Primitive Types

    1. LRU-friendly: defined by an access sequence that is best handled by the least recently used (LRU) caching algorithm.

    2. LFU-friendly: defined by an access sequence that is best handled by the least frequently used (LFU) caching algorithm.

    3. Scan: defined by an access sequence where a subset of stored items are accessed exactly once.

    4. Churn: defined by repeated accesses to a subset of stored items, with each item being accessed with equal probability.

  2. Composing Workloads

    1. Modern storage workloads are typically a composition of the above workload primitive types.

    2. As cache size changes, a single workload's primitive type may vary.

      1. E.g., an LRU-friendly workload at cache size C1 may transform into a Churn type at a cache size C2 < C1; this can occur when items in the workload's LRU-friendly working set start getting evicted from the cache prior to being reused (see the synthetic-trace sketch below).
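
As an illustration of the scan and churn primitives, and of how a composed workload can be built from them, a small synthetic-trace sketch (item counts and access counts are arbitrary, not from the paper):

```python
import random

def scan_trace(start, n_items):
    """Scan: each of a set of items is accessed exactly once, in sequence."""
    return list(range(start, start + n_items))

def churn_trace(working_set, n_accesses):
    """Churn: repeated accesses over a fixed subset, each item equally likely."""
    return [random.randrange(working_set) for _ in range(n_accesses)]

# A composed workload: a churn phase, a one-time scan over cold items, then churn again.
trace = churn_trace(50, 2000) + scan_trace(1000, 5000) + churn_trace(50, 2000)
```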

Caching Algorithms

  • Adaptive Replacement Cache (ARC)

    • recency, frequency

    • Use two LRU lists

    • Able to:

      • Scan: limits the size of its T1 list used to identify and cache newly accessed items to preserve reused items in T2

        • But, when a scan is followed by a churn, ARC continues to evict from T1 and behaves similar to LRU

    • Unable to:

      • LFU-friendly: unable to capture the full frequency distribution of the workload, so it does not perform well for LFU-friendly workloads

      • Churn: inability to distinguish between items that are equally important --> continuous cache replacement

  • Low Inter-reference Recency Set (LIRS)

    • SOTA: based on reuse distance

    • Well for

      • Scan workloads: routing one-time accesses via its short filtering list

        • But the size of Q is fixed to 1% of the cache, which cannot adapt to dynamic working sets

    • Not well for

      • LFU-friendly workloads

      • Unable to recognize reuse quickly enough for items with low overall reuse

  • Dynamic LIRS (DLIRS)

    • Incorporates adaptation in LIRS. Dynamically adjust the cache partitions assigned to high and low reuse-distance items.

    • Well for

      • Scan

      • LRU-friendly

    • Not well for

      • LFU-friendly workloads

    • But: does not perform as well as LIRS in practice

  • Learning Cache Replacement (LeCaR)

    • ML based technique that uses reinforcement learning and regret minimization to control dynamic use of two cache replacement policies, LRU and LFU

LeCaR

  • On each eviction, an expert is chosen randomly with probabilities proportional to the weights w(LRU) and w(LFU). LeCaR dynamically learns these weights by assigning penalties for wrongful evictions.

  • Learning rate parameter: sets the magnitude of change when the algorithm makes a poor decision.

    • Larger: quicker learning, but needs larger corrections when the learning is flawed

  • Discount rate parameter: decides how quickly to stop learning (see the weight-update sketch below)
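
A simplified sketch of the LeCaR weight-update idea described above, assuming the penalty is applied multiplicatively to the expert responsible for a wrongful eviction and discounted by the time since that eviction; the constants below (initial learning rate, discount exponent) are illustrative:

```python
import math
import random

class LeCaRWeights:
    """Sketch of LeCaR-style regret-based weight learning for two experts (LRU, LFU)."""

    def __init__(self, learning_rate=0.45, discount_rate=0.005 ** (1 / 100_000)):
        self.w = {"LRU": 0.5, "LFU": 0.5}
        self.learning_rate = learning_rate    # magnitude of change on a poor decision
        self.discount_rate = discount_rate    # how quickly old mistakes stop mattering

    def choose_expert(self):
        # On each eviction, pick an expert randomly, proportional to the current weights.
        return random.choices(["LRU", "LFU"], weights=[self.w["LRU"], self.w["LFU"]])[0]

    def penalize(self, expert, time_since_eviction):
        # The expert that evicted an item we needed again is penalized; the penalty
        # decays with how long ago the wrongful eviction happened.
        regret = self.discount_rate ** time_since_eviction
        self.w[expert] *= math.exp(-self.learning_rate * regret)
        total = self.w["LRU"] + self.w["LFU"]
        self.w["LRU"] /= total
        self.w["LFU"] /= total
```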

CACHEUS

Note:

  • Think about: Like LeCaR, CACHEUS uses exactly two experts. The usage of more than two experts was considered for early CACHEUS versions. Interestingly, the performance with more than two experts was significantly worse than when using only LRU and LFU. Having multiple experts is generally not beneficial unless the selected experts are orthogonal in nature, and operate based on completely different and complementary strategies. The intuition here is that multiple experts will overlap in their eviction decisions thereby affecting learning outcomes and deteriorating the performance. We demonstrate in this paper that with two well-chosen experts CACHEUS is able to best the state-of-the-art with statistical significance

    • No way of saying which one is better

Evaluation

Setup

  • 17,766 simulation experiments

  • 329 workloads

    • For each workload, evaluate against 6 different cache configs that are sized relative to the workload's footprint

  • 5 different production storage I/O datasets
