Learning Cache Replacement with CACHEUS

https://www.usenix.org/conference/fast21/presentation/rodriguez




Talk

  • Workloads:

    • LFU-friendly

    • LRU-friendly

    • scan

    • churn

  • CACHEUS: a new class of fully adaptive, machine-learned caching algorithms that utilize a combination of experts designed to address these workload primitive types

    • Experts: SOTA ARC, LIRS, LFU, SR-LRU, CR-LFU

  • 17,766 simulation experiments: 329 workloads run against 6 different cache configurations

Cache: Fast but relatively small in capacity

Cache management + ML: improved performance and better decision processes

Cache replacement algorithms

  • Non-adaptive:

    • LRU

    • LFU

    • LIRS (low inter-reference recency set)

  • Adaptive

    • ARC

    • Dynamic LIRS

  • ML-based adaptive

    • Adaptive caching using multiple experts (ACME)

    • RL on Cache Replacement (LeCaR)

    • Reinforcement learning (this work)

Workload Primitives

LeCaR

  • ML-Based

    • Simple: LRU, LFU as experts

    • Adaptive: update weights

    • Outperforms the state of the art at small cache sizes

Limitation of LeCaR

  • Fixed learning rate: empirically chosen

  • Can't handle Scan type

Improving LeCaR

  • Adaptive learning rate

  • Improving experts

    • Introduce scan resistance

      • Replace LRU with

        • ARC (unable to handle a scan followed by churn)

        • LIRS (not adaptive; limited ability to handle LRU-friendly workloads)

        • DLIRS (does not adapt well empirically)

      • Scan-resistant LRU: SR-LRU

    • Improve churn resistance

      • Churn resistant LFU (CR-LFU)

CACHEUS: Learning Rate Adaptation

  • Learning rate changed

    • Performance change

      • Using the gradient information

        • Positive: reinforce latest direction, update the learning rate in the same direction for the next time

        • Negative: reverse the latest direction

  • Learning rate unchanged

    • Performance change

      • Positive, no update

      • Negative, random jump

  • Performance low for 10 intervals

    • Restart learning rate
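A minimal sketch of these adaptation rules, assuming an interval-based update (the step size, learning-rate bounds, and random-jump range are illustrative guesses, not the paper's constants):

```python
import random

def adapt_learning_rate(lr, lr_change, perf_change, step=0.05, rng=None):
    """One interval of gradient-style learning-rate adaptation.

    lr_change:   signed change applied to the learning rate last interval (0 if unchanged)
    perf_change: signed change in hit rate over the same interval
    """
    rng = rng or random.Random(0)
    if lr_change != 0:                      # learning rate changed last interval
        direction = 1 if lr_change > 0 else -1
        if perf_change > 0:
            return lr + direction * step    # positive: reinforce the latest direction
        if perf_change < 0:
            return lr - direction * step    # negative: reverse the latest direction
        return lr
    if perf_change < 0:                     # rate unchanged but performance dropped
        return rng.uniform(0.001, 1.0)      # random jump out of the plateau
    return lr                               # rate unchanged, performance fine: no update
```

The 10-intervals-of-low-performance restart rule would sit on top of this loop, resetting `lr` to its initial value.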

SR-LRU

  • Miss in both the cache and history: insert x at the MRU (most recently used) position of the Scan-Resistant (SR) portion of the cache

  • Miss in the cache but hit in history: x was evicted before being reused, so insert it at the MRU position of the Reuse portion instead of the SR portion

  • Hit in the SR portion: x has proven reuse, so move it to the MRU position of the Reuse portion

  • Hit in the Reuse portion: simply refresh x at the MRU position of the Reuse portion
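These rules can be sketched as a toy two-partition cache; the class name, the set-based history, and the evict-from-SR-first policy are simplifications of my own, not the paper's full algorithm (which also adapts the partition sizes):

```python
from collections import OrderedDict

class SRLRUSketch:
    """Toy sketch of the SR-LRU decision rules (no adaptive partition sizing)."""

    def __init__(self, capacity):
        self.sr = OrderedDict()      # scan-resistant partition, MRU at the end
        self.reuse = OrderedDict()   # reuse partition, MRU at the end
        self.history = set()         # keys of recently evicted items
        self.capacity = capacity

    def access(self, key):
        if key in self.reuse:            # hit in reuse: refresh at MRU
            self.reuse.move_to_end(key)
        elif key in self.sr:             # hit in SR: proven reuse, promote
            del self.sr[key]
            self.reuse[key] = True
        elif key in self.history:        # miss in cache, hit in history
            self.history.discard(key)
            self.reuse[key] = True       # insert straight into the reuse partition
        else:                            # cold miss
            self.sr[key] = True          # insert at MRU of the SR partition
        while len(self.sr) + len(self.reuse) > self.capacity:
            victims = self.sr if self.sr else self.reuse
            evicted, _ = victims.popitem(last=False)   # evict from the LRU end
            self.history.add(evicted)
```

With this split, a scan of one-time keys only churns the SR partition and leaves items with proven reuse untouched.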

CR-LFU

  • On eviction, choose the victim from the MRU position of the LFU-ordered list: among the least frequently used items, evict the most recently used one

  • Move the requested item to the MRU position of the MFU portion of the cache; breaking LFU ties at the MRU end keeps the rest of the churning working set resident
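The MRU tie-breaking can be sketched as a victim-selection function; the `(frequency, last_access_time)` encoding and the function name are illustrative:

```python
def cr_lfu_victim(cache):
    """Choose a victim the CR-LFU way: among the items with the lowest access
    frequency, evict the *most* recently used one, so the bulk of a churning
    working set stays resident.

    `cache` maps key -> (frequency, last_access_time).
    """
    min_freq = min(freq for freq, _ in cache.values())
    candidates = [k for k, (freq, _) in cache.items() if freq == min_freq]
    return max(candidates, key=lambda k: cache[k][1])   # MRU among the LFU ties
```

Plain LFU with LRU tie-breaking would instead evict the oldest of the tied items, cycling through an equal-probability churn set and missing on every access.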

Experiments

  • Datasets: 5 sources

    • FIU

    • MSR

    • CloudPhysics

    • CloudVPS

    • CloudCache

  • 6 Cache sizes

  • 6+1 algorithms compared

  • Total experiments: 17,766

Paper

Motivation

  • Caching algorithms that do well for certain workloads do not perform well for others

    • ARC, LIRS, DLIRS, ML-based LeCaR ...

    • The production storage workloads of today are significantly diverse in their characteristic features, and these features can vary over time even within a single workload

  • Caching algorithms that do well for certain cache sizes do not necessarily perform well for other cache sizes

    • As cache size changes

      • workload-induced dynamic cache state, the cache-relevant workload features, and thereby the most effective strategies, can all vary

Contribution

  1. Identify the cache-relevant features that inform workload primitive types

  2. CACHEUS: inspired by LeCaR but overcomes an important shortcoming by being completely adaptive, with the elimination of all statically chosen hyperparameters

  3. Design of two lightweight experts: CR-LFU and SR-LRU

    1. CR: churn resistance

    2. SR: scan resistance

Understand the workloads

  1. Workload Primitive Types

    1. LRU-friendly: defined by an access sequence that is best handled by the least recently used (LRU) caching algorithm.

    2. LFU-friendly: defined by an access sequence that is best handled by the least frequently used (LFU) caching algorithm.

    3. Scan: defined by an access sequence where a subset of stored items are accessed exactly once.

    4. Churn: defined by repeated accesses to a subset of stored items with each item being accessed with equal probability

  2. Composing Workloads

    1. Modern storage workloads are typically a composition of the above workload primitive types.

    2. As cache size changes, a single workload's primitive type may vary.

      1. E.g., an LRU-friendly workload at cache size C1 may transform into a Churn type at a cache size C2 < C1; this can occur when items in the workload's LRU-friendly working set start getting evicted from the cache prior to being reused.
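For intuition, the Scan and Churn primitives (and a composition of them) can be generated as toy access traces; the helper names are illustrative:

```python
import random

def scan_trace(items):
    """Scan: each item in the subset is accessed exactly once, in order."""
    return list(items)

def churn_trace(items, n_accesses, rng=None):
    """Churn: repeated accesses to a fixed subset, each item equally likely."""
    rng = rng or random.Random(42)
    return [rng.choice(items) for _ in range(n_accesses)]

def scan_then_churn(scan_items, hot_items, n_churn):
    """The composition that trips up ARC: a one-time scan followed by churn."""
    return scan_trace(scan_items) + churn_trace(hot_items, n_churn)
```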

Caching Algorithms

  • Adaptive Replacement Cache (ARC)

    • recency, frequency

    • Use two LRU lists

    • Able to:

      • Scan: limits the size of its T1 list used to identify and cache newly accessed items to preserve reused items in T2

        • But, when a scan is followed by a churn, ARC continues to evict from T1 and behaves similarly to LRU

    • Unable to:

      • LFU-friendly: unable to capture the full frequency distribution of the workload, so it performs poorly for LFU-friendly workloads

      • Churn: inability to distinguish between items that are equally important → continuous cache replacement

  • Low Inter-reference Recency Set (LIRS)

    • SOTA: based on reuse distance

    • Well for

      • Scan workloads: routing one-time accesses via its short filtering list

        • But the size of its filtering list Q is fixed at 1% of the cache, which cannot adapt to dynamic working sets

    • Not well for

      • LFU-friendly workloads

      • Unable to recognize reuse quickly enough for items with low overall reuse

  • Dynamic LIRS (DLIRS)

    • Incorporates adaptation into LIRS, dynamically adjusting the cache partitions assigned to high and low reuse-distance items

    • Well for

      • Scan

      • LRU-friendly

    • Not well for

      • LFU-friendly workloads

    • But: not perform as well as LIRS in practice

  • Learning Cache Replacement (LeCaR)

    • ML based technique that uses reinforcement learning and regret minimization to control dynamic use of two cache replacement policies, LRU and LFU

LeCaR

  • On each eviction, an expert is chosen randomly with probabilities proportional to the weights w(LRU) and w(LFU). LeCaR dynamically learns these weights by assigning penalties for wrongful evictions.

  • Learning rate parameter: set the magnitude of change when the algorithm makes a poor decision.

    • Larger: quicker learning, but needs larger corrections when the learning is flawed

  • Discount rate parameter: decides how quickly to stop learning
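A minimal sketch of LeCaR's selection and penalty steps, assuming the commonly cited defaults (initial learning rate 0.45, discount rate of the form 0.005^(1/N)); treat the constants as illustrative:

```python
import math
import random

def choose_expert(w_lru, w_lfu, rng=None):
    """Pick an eviction policy with probability proportional to its weight."""
    rng = rng or random.Random(0)
    return "LRU" if rng.random() < w_lru / (w_lru + w_lfu) else "LFU"

def penalize(w_lru, w_lfu, erring_expert, learning_rate=0.45,
             time_in_history=1, discount=0.005 ** (1 / 100)):
    """Penalize the expert whose past eviction caused the current miss.

    The regret decays with how long the wrongly evicted item sat in history
    (discount rate); the penalty is exponential in the learning rate.
    """
    regret = discount ** time_in_history
    if erring_expert == "LRU":
        w_lru *= math.exp(-learning_rate * regret)
    else:
        w_lfu *= math.exp(-learning_rate * regret)
    total = w_lru + w_lfu        # renormalize so the weights stay a distribution
    return w_lru / total, w_lfu / total
```

Each wrongful eviction thus shifts probability mass toward the expert that would not have made it, which is the entire learning signal LeCaR needs.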

CACHEUS

Note:

  • Think about: Like LeCaR, CACHEUS uses exactly two experts. The usage of more than two experts was considered for early CACHEUS versions. Interestingly, the performance with more than two experts was significantly worse than when using only LRU and LFU. Having multiple experts is generally not beneficial unless the selected experts are orthogonal in nature, and operate based on completely different and complementary strategies. The intuition here is that multiple experts will overlap in their eviction decisions thereby affecting learning outcomes and deteriorating the performance. We demonstrate in this paper that with two well-chosen experts CACHEUS is able to best the state-of-the-art with statistical significance

    • No way of saying which one is better

Evaluation

Setup

  • 17,766 simulation experiments

  • 329 workloads

    • For each workload, evaluate against 6 different cache configs that are sized relative to the workload's footprint

  • 5 different production storage I/O datasets

None of the evaluated algorithms handles all workload primitive types.