Learning Cache Replacement with CACHEUS
https://www.usenix.org/conference/fast21/presentation/rodriguez
Workloads:
LFU-friendly
LRU-friendly
scan
churn
CACHEUS: a new class of fully adaptive, machine-learned caching algorithms that utilize a combination of experts designed to address these workload primitive types
Experts considered: state-of-the-art ARC, LIRS, LFU, plus the new SR-LRU and CR-LFU
17,766 simulation experiments on 329 workloads run against 6 different cache configurations
Cache: Fast but relatively small in capacity
Cache management + ML: improves performance and the decision process
Non-adaptive:
LRU
LFU
LIRS (low inter-reference recency set)
Adaptive
ARC
Dynamic LIRS
ML-based adaptive
Adaptive caching using multiple experts (ACME)
RL on Cache Replacement (LeCaR)
Reinforcement learning (this work)
ML-Based
Simple: LRU, LFU as experts
Adaptive: update weights
Outperforms the state of the art at small cache sizes
Fixed learning rate: empirically chosen
Can't handle Scan type
Adaptive learning rate
Improving experts
Introduce scan resistance
Replace LRU with
ARC (unable to handle a scan followed by churn)
LIRS (not adaptive, limited ability to handle LRU-friendly workloads)
DLIRS (does not adapt well empirically)
Scan resistant LRU: SR-LRU
Improve churn resistance
Churn resistant LFU (CR-LFU)
Learning rate changed:
Performance change (using the gradient information):
Positive: reinforce the latest direction; update the learning rate in the same direction next time
Negative: reverse the latest direction
Learning rate unchanged:
Performance change:
Positive: no update
Negative: random jump
Performance low for 10 intervals: restart the learning rate (see the sketch below)
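A rough Python sketch of this learning-rate adaptation loop; the window-based hit-rate signal, the constants, and names such as `AdaptiveLearningRate` and `end_of_window` are illustrative assumptions, not the paper's exact procedure:

```python
import random

class AdaptiveLearningRate:
    """Sketch of gradient-style learning-rate adaptation: compare hit rate
    across request windows and use the direction of the last rate change."""

    def __init__(self, initial_lr=0.1, low_perf_limit=10):
        self.initial_lr = initial_lr
        self.lr = initial_lr
        self.prev_lr = initial_lr
        self.prev_hit_rate = 0.0
        self.low_perf_count = 0
        self.low_perf_limit = low_perf_limit   # "performance low for 10 intervals"

    def end_of_window(self, hit_rate):
        perf_delta = hit_rate - self.prev_hit_rate   # performance change (gradient signal)
        lr_delta = self.lr - self.prev_lr            # latest learning-rate move
        self.prev_lr = self.lr

        if lr_delta != 0:
            # Learning rate changed last window
            if perf_delta > 0:
                self.lr += lr_delta                  # positive: reinforce the latest direction
            else:
                self.lr -= lr_delta                  # negative: reverse the latest direction
        elif perf_delta < 0:
            # Learning rate unchanged but performance dropped: random jump
            self.lr = random.uniform(0.001, 1.0)
        # (learning rate unchanged and performance improved: no update)

        # Restart the learning rate after sustained poor performance (criterion assumed)
        self.low_perf_count = self.low_perf_count + 1 if perf_delta <= 0 else 0
        if self.low_perf_count >= self.low_perf_limit:
            self.lr = self.initial_lr
            self.low_perf_count = 0

        self.lr = min(max(self.lr, 0.001), 1.0)      # keep the rate in a sane range
        self.prev_hit_rate = hit_rate
        return self.lr
```

The resulting rate would feed a LeCaR-style weight update, sketched further below.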
On a cache and history miss: insert x into the MRU (most recently used) position of the Scan-Resistant (SR) portion of the cache
If miss in cache, hit in history: insert x into the MRU position of the Reuse (R) portion of the cache instead of the SR portion
If hit in the cache, the history is not involved
Hit in the SR portion: move x into the MRU position of the Reuse portion of the cache (see the sketch below)
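A minimal Python sketch of these SR-LRU insertion and promotion rules; the fixed history bound, eviction from the LRU end of SR, and the omission of SR-LRU's adaptive partition sizing are simplifying assumptions for illustration, not the paper's implementation:

```python
from collections import OrderedDict

class SRLRUSketch:
    """Two-part cache: SR (scan-resistant) holds newly inserted items,
    R (reuse) holds items with demonstrated reuse. MRU is the dict end."""

    def __init__(self, cache_size):
        self.capacity = cache_size
        self.sr = OrderedDict()       # scan-resistant portion
        self.r = OrderedDict()        # reuse portion
        self.history = OrderedDict()  # recently evicted items (metadata only)

    def request(self, x):
        if x in self.r:                        # hit in the Reuse portion
            self.r.move_to_end(x)
            return True
        if x in self.sr:                       # hit in SR: promote to MRU of Reuse
            del self.sr[x]
            self.r[x] = True
            return True
        if len(self.sr) + len(self.r) >= self.capacity:
            self._evict()
        if x in self.history:                  # miss in cache, hit in history
            del self.history[x]
            self.r[x] = True                   # insert at MRU of the Reuse portion
        else:                                  # cold miss
            self.sr[x] = True                  # insert at MRU of the SR portion
        return False

    def _evict(self):
        # Evict from the LRU end of SR when possible (assumed here), and
        # remember the victim so a quick re-reference is recognized as reuse.
        source = self.sr if self.sr else self.r
        victim, _ = source.popitem(last=False)
        self.history[victim] = True
        if len(self.history) > self.capacity:  # bound the history (assumed size)
            self.history.popitem(last=False)
```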
CR-LFU eviction: evict the item at the MRU position of the LFU (lowest-frequency) portion of the cache, i.e., frequency ties are broken by evicting the most recently used victim
The requested item x is then moved to the MRU position of the MFU portion of the cache (see the sketch below)
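A minimal sketch of the churn-resistant tie-breaking idea: plain LFU would evict the least recently used item among the least-frequent ones, while CR-LFU evicts the most recently used one so the rest of a churning working set stays cached. Names and structure are illustrative, not the paper's code:

```python
class CRLFUSketch:
    """LFU with churn-resistant tie-breaking among minimum-frequency items."""

    def __init__(self, cache_size):
        self.capacity = cache_size
        self.freq = {}        # item -> access frequency
        self.last_use = {}    # item -> logical time of last access
        self.clock = 0

    def request(self, x):
        self.clock += 1
        if x in self.freq:                      # hit: bump frequency and recency
            self.freq[x] += 1
            self.last_use[x] = self.clock
            return True
        if len(self.freq) >= self.capacity:     # miss on a full cache: evict
            min_freq = min(self.freq.values())
            ties = [k for k, f in self.freq.items() if f == min_freq]
            # Churn resistance: evict the MRU item among the least frequent
            victim = max(ties, key=lambda k: self.last_use[k])
            del self.freq[victim]
            del self.last_use[victim]
        self.freq[x] = 1                        # insert the new item
        self.last_use[x] = self.clock
        return False
```

Under a pure churn pattern every item ends up with the same frequency, so the tie-break is what decides whether a stable subset is retained.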
Datasets: 5 sources
FIU
MSR
CloudPhysics
CloudVPS
CloudCache
6 Cache sizes
6+1 algorithms compared
Total experiments: 17,766
Caching algorithms that do well for certain workloads do not necessarily perform well for others
ARC, LIRS, DLIRS, ML-based LeCaR ...
The production storage workloads of today are significantly diverse in their characteristic features, and these features can vary over time even within a single workload
Caching algorithms that do well for certain cache sizes do not necessarily perform well for other cache sizes
As cache size changes
the workload-induced dynamic cache state, the cache-relevant workload features, and thereby the most effective strategies can all vary
Identify the cache-relevant features that inform workload primitive types
CACHEUS: inspired by LeCaR but overcomes an important shortcoming by being completely adaptive, with the elimination of all statically chosen hyperparameters
Design of two lightweight experts: CR-LFU and SR-LRU
CR: churn resistance
SR: scan resistance
Workload Primitive Types
LRU-friendly: defined by an access sequence that is best handled by the least recently used (LRU) caching algorithm.
LFU-friendly: defined by an access sequence that is best handled by the least frequently used (LFU) caching algorithm.
Scan: defined by an access sequence where a subset of stored items are accessed exactly once.
Churn: defined by repeated accesses to a subset of stored items with each item being accessed with equal probability
Composing Workloads
Modern storage workloads are typically a composition of the above workload primitive types.
As cache size changes, a single workload's primitive type may vary.
E.g., an LRU-friendly workload at cache size C1 may transform into a Churn type at a smaller cache size C2 < C1; this can occur when items in the workload's LRU-friendly working set start getting evicted from the cache prior to being reused (see the illustrative generators below).
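One way to make these definitions concrete is to synthesize small traces for each primitive type; the generators, parameters, and the composition example below are illustrative assumptions, not the datasets used in the paper:

```python
import random

def scan(items):
    """Scan: every item in the subset is accessed exactly once."""
    return list(items)

def churn(items, n):
    """Churn: repeated accesses to a subset, each item equally likely."""
    return [random.choice(items) for _ in range(n)]

def lfu_friendly(items, n, skew=1.2):
    """LFU-friendly: Zipf-like popularity, so long-term access frequency
    is the best predictor of reuse."""
    weights = [1.0 / (rank + 1) ** skew for rank in range(len(items))]
    return random.choices(items, weights=weights, k=n)

def lru_friendly(items, n, locality=0.9):
    """LRU-friendly: strong temporal locality; most accesses re-reference
    one of the few most recently used items."""
    recent, trace = [], []
    for _ in range(n):
        if recent and random.random() < locality:
            x = random.choice(recent[-8:])   # reuse something touched recently
        else:
            x = random.choice(items)
        trace.append(x)
        recent.append(x)
    return trace

# Modern workloads compose these primitives, e.g. a churning working set
# interrupted by a one-time scan over cold data:
hot = list(range(100))
workload = churn(hot, 5000) + scan(range(1000, 4000)) + churn(hot, 5000)
```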
Adaptive Replacement Cache (ARC)
Balances recency and frequency
Uses two LRU lists: T1 for newly accessed items, T2 for reused items (see the sketch after this list)
Able to:
Scan: limits the size of its T1 list used to identify and cache newly accessed items to preserve reused items in T2
But, when a scan is followed by a churn, ARC continues to evict from T1 and behaves similar to LRU
Unable to:
LFU-friendly: unable to capture the full frequency distribution of the workload, so it cannot perform well for LFU-friendly workloads
Churn: inability to distinguish between items that are equally important --> continuous cache replacement
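For reference, a compact Python sketch of ARC's structure following the pseudocode in the original ARC paper (T1/T2 hold cached items, B1/B2 are ghost lists of recently evicted items, p is the adaptive target size of T1). This is an illustrative reimplementation, not code from CACHEUS:

```python
from collections import OrderedDict

class ARCSketch:
    """Adaptive Replacement Cache: T1 captures recency (items seen once),
    T2 captures reuse; ghost lists B1/B2 steer the adaptive target p."""

    def __init__(self, c):
        self.c, self.p = c, 0
        self.t1, self.t2 = OrderedDict(), OrderedDict()   # cached items (MRU at end)
        self.b1, self.b2 = OrderedDict(), OrderedDict()   # ghost entries (metadata only)

    def request(self, x):
        if x in self.t1 or x in self.t2:                  # hit: promote to MRU of T2
            self.t1.pop(x, None)
            self.t2.pop(x, None)
            self.t2[x] = True
            return True
        if x in self.b1:                                  # ghost hit: grow the recency side
            self.p = min(self.c, self.p + max(len(self.b2) / len(self.b1), 1))
            self._replace(x)
            del self.b1[x]
            self.t2[x] = True
            return False
        if x in self.b2:                                  # ghost hit: grow the frequency side
            self.p = max(0, self.p - max(len(self.b1) / len(self.b2), 1))
            self._replace(x)
            del self.b2[x]
            self.t2[x] = True
            return False
        # Complete miss: make room, then admit into T1 only. This is what
        # keeps a one-time scan from displacing the reused items held in T2.
        if len(self.t1) + len(self.b1) == self.c:
            if len(self.t1) < self.c:
                self.b1.popitem(last=False)
                self._replace(x)
            else:
                self.t1.popitem(last=False)
        elif len(self.t1) + len(self.b1) < self.c:
            total = len(self.t1) + len(self.t2) + len(self.b1) + len(self.b2)
            if total >= self.c:
                if total == 2 * self.c:
                    self.b2.popitem(last=False)
                self._replace(x)
        self.t1[x] = True
        return False

    def _replace(self, x):
        # Evict from T1 while it exceeds its target size p, otherwise from T2
        from_t1 = self.t1 and (len(self.t1) > self.p or
                               (x in self.b2 and len(self.t1) == self.p) or
                               not self.t2)               # guard: T2 may be empty early on
        if from_t1:
            victim, _ = self.t1.popitem(last=False)
            self.b1[victim] = True
        else:
            victim, _ = self.t2.popitem(last=False)
            self.b2[victim] = True
```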
Low Inter-reference Recency Set (LIRS)
SOTA: based on reuse distance
Well for
Scan workloads: routing one-time accesses via its short filtering list
But the size of Q is fixed to 1% of the cache, which cannot adapt to dynamic working sets
Not well for
LFU-friendly workloads
Unable to recognize reuse quickly enough for items with low overall reuse
Dynamic LIRS (DLIRS)
Incorporates adaptation into LIRS: dynamically adjusts the cache partitions assigned to high and low reuse-distance items.
Well for
Scan
LRU-friendly
Not well for
LFU-friendly workloads (DLIRS remains LFU-unfriendly)
Also, it does not perform as well as LIRS in practice
Learning Cache Replacement (LeCaR)
ML based technique that uses reinforcement learning and regret minimization to control dynamic use of two cache replacement policies, LRU and LFU
LeCaR
On each eviction, an expert is chosen randomly with probabilities proportional to the weights w(LRU) and w(LFU). LeCaR dynamically learns these weights by assigning penalties for wrongful evictions.
Learning rate parameter: sets the magnitude of change when the algorithm makes a poor decision.
Larger: quicker learning, but needs larger corrections when the learning is flawed
Discount rate parameter: decides how quickly to stop learning (both parameters appear in the weight-update sketch below)
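A sketch of LeCaR's weight machinery in Python: expert selection proportional to the weights, and a multiplicative update with discounted regret r = d^t when an expert's past eviction turns out to be wrong. The default constants (λ = 0.45, d = 0.005^(1/N)) and the class/method names are assumptions here, not taken verbatim from the paper:

```python
import math
import random

class LeCaRWeights:
    """Regret-minimization weights over two experts (LRU and LFU)."""

    def __init__(self, cache_size, learning_rate=0.45, discount_rate=None):
        self.lr = learning_rate
        # Discount chosen so regret decays over roughly one cache's worth of requests
        self.d = discount_rate if discount_rate is not None else 0.005 ** (1 / cache_size)
        self.w = {"LRU": 0.5, "LFU": 0.5}

    def choose_expert(self):
        """Pick the expert that decides the next eviction, with probability
        proportional to the current weights."""
        return random.choices(["LRU", "LFU"],
                              weights=[self.w["LRU"], self.w["LFU"]])[0]

    def penalize(self, expert_at_fault, time_since_eviction):
        """Called when a miss hits an item found in one expert's eviction
        history: that expert's past decision was wrongful, so weight shifts
        toward the other expert, scaled by how recent the mistake was."""
        regret = self.d ** time_since_eviction
        other = "LFU" if expert_at_fault == "LRU" else "LRU"
        self.w[other] *= math.exp(self.lr * regret)   # reward the other expert
        total = self.w["LRU"] + self.w["LFU"]
        self.w["LRU"] /= total                        # renormalize to sum to 1
        self.w["LFU"] /= total
```

CACHEUS keeps this structure but swaps the experts for SR-LRU and CR-LFU and replaces the fixed λ with the adaptive learning rate sketched earlier.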
Note:
Think about: Like LeCaR, CACHEUS uses exactly two experts. The usage of more than two experts was considered for early CACHEUS versions. Interestingly, the performance with more than two experts was significantly worse than when using only LRU and LFU. Having multiple experts is generally not beneficial unless the selected experts are orthogonal in nature, and operate based on completely different and complementary strategies. The intuition here is that multiple experts will overlap in their eviction decisions thereby affecting learning outcomes and deteriorating the performance. We demonstrate in this paper that with two well-chosen experts CACHEUS is able to best the state-of-the-art with statistical significance
There is no a priori way of saying which expert is better for a given workload (hence the adaptive weighting)
17,766 simulation experiments
329 workloads
For each workload, evaluate against 6 different cache configs that are sized relative to the workload's footprint
5 different production storage I/O datasets