# Learning Cache Replacement with CACHEUS

## Talk

* Workloads:
  * LFU-friendly
  * LRU-friendly
  * scan
  * churn
* CACHEUS: a new class of fully adaptive, machine-learned caching algorithms that utilize a combination of experts designed to address these workload primitive types
  * Experts: state-of-the-art ARC, LIRS, and LFU, plus SR-LRU and CR-LFU
* 17,766 simulation experiments on 329 workloads run against 6 different cache configurations

Cache: fast but relatively small in capacity

![](/files/-MWaoyDUNpJZ0A0g7gCG)

Cache management + ML: improved performance and better decision processes

#### Cache replacement algorithms

* Non-adaptive:
  * LRU
  * LFU
  * LIRS (low inter-reference recency set)
* Adaptive
  * ARC
  * Dynamic LIRS
* ML-based adaptive
  * Adaptive caching using multiple experts (ACME)
  * Learning Cache Replacement (LeCaR)
  * CACHEUS: reinforcement learning (this work)

#### Workload Primitives

![](/files/-MWapSqRcfpX-SqNpTWb)

![None handles all primitive types](/files/-MWappncJyeWEHCvthv_)

#### LeCaR

* ML-based
  * Simple: uses LRU and LFU as experts
  * Adaptive: updates the expert weights
  * Outperforms the state of the art at small cache sizes

![](/files/-MWaq3mIt7tWtYFbxjNV)

#### Limitation of LeCaR

* Fixed learning rate: empirically chosen
* Cannot handle scan-type workloads

#### Improving LeCaR

* Adaptive learning rate
* Improving experts
  * Introduce scan resistance
    * Candidates for replacing LRU:
      * ARC (unable to handle a scan followed by churn)
      * LIRS (not adaptive, limited ability to handle LRU-friendly workloads)
      * DLIRS (does not adapt well empirically)
    * New expert: scan-resistant LRU (SR-LRU)
  * Improve churn resistance
    * Churn-resistant LFU (CR-LFU)

![](/files/-MWaqmYVw6LAzMWoLX8X)

#### CACHEUS: Learning Rate Adaptation

* Learning rate changed
  * Performance change
    * Using the gradient information
      * Positive: reinforce the latest direction, i.e. move the learning rate in the same direction next time
      * Negative: reverse the latest direction

![](/files/-MWar5BKw7T9Oca8xP9v)

* Learning rate unchanged
  * Performance change
    * Positive: no update
    * Negative: random jump
* Performance low for 10 intervals
  * Restart the learning rate
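The rules above can be sketched as a small update function. This is a minimal sketch rather than the authors' implementation: the function name, the `MIN_LR` bound, the step size `abs(lr * d_lr)`, and the choice to fold the "random jump" and the "restart after 10 intervals" bullets into a single counter are all assumptions made for illustration.

```python
import random

MIN_LR = 1e-3        # assumed lower bound on the learning rate
RESTART_AFTER = 10   # intervals of poor performance before a restart (from the notes)

def update_learning_rate(lr, prev_lr, hit_rate, prev_hit_rate, low_count, rng=random):
    """Return (new_lr, new_low_count) for the next interval (hypothetical helper)."""
    d_lr = lr - prev_lr                # how the learning rate moved last time
    d_hr = hit_rate - prev_hit_rate    # how performance moved in response
    if d_lr != 0:
        # Learning rate changed: follow the sign of the gradient d_hr / d_lr.
        # Positive keeps the latest direction, negative reverses it.
        direction = 1 if d_hr / d_lr > 0 else -1
        step = abs(lr * d_lr)          # assumed step size
        return max(lr + direction * step, MIN_LR), 0
    if d_hr > 0:
        # Learning rate unchanged and performance improved: no update.
        return lr, 0
    # Learning rate unchanged and performance did not improve.
    low_count += 1
    if low_count >= RESTART_AFTER:
        # Restart with a random learning rate (assumed range).
        return rng.uniform(MIN_LR, 1.0), 0
    return lr, low_count

# Learning rate went up and the hit rate followed, so keep increasing.
new_lr, count = update_learning_rate(0.5, 0.4, 0.4, 0.3, low_count=3)
```

The caller would invoke this once per interval and carry `low_count` forward between calls.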

#### SR-LRU

![](/files/-MWarZrM1_O9tG-VLBNT)

* Miss in both the cache and the history: insert x into the MRU (most recently used) position of the scan-resistant (SR) portion of the cache
* Miss in the cache but hit in the history: insert x into the MRU position of the reuse portion of the cache instead of the SR portion
* Hit in the cache: no special handling is needed, with one exception
* Hit in the SR portion: move x into the MRU position of the reuse portion of the cache
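A minimal sketch of these four cases, assuming a fixed split between the SR and reuse portions. The class name `SRLRUSketch`, the fixed `sr_size`, the demotion of the reuse portion's LRU item into SR when the reuse portion overflows, and plain LRU handling of hits in the reuse portion are assumptions of this sketch; it is not the algorithm as published.

```python
from collections import OrderedDict

class SRLRUSketch:
    """Illustrative scan-resistant LRU with SR and reuse (R) portions plus a history."""

    def __init__(self, cache_size, sr_size, history_size):
        self.cache_size = cache_size
        self.sr_size = sr_size            # fixed here; an assumption of this sketch
        self.history_size = history_size
        self.sr = OrderedDict()           # scan-resistant portion, last entry = MRU
        self.r = OrderedDict()            # reuse portion, last entry = MRU
        self.history = OrderedDict()      # recently evicted keys

    def _evict(self):
        # Evict from the LRU end of SR so that scans only displace other new items.
        source = self.sr if self.sr else self.r
        victim, _ = source.popitem(last=False)
        self.history[victim] = True
        if len(self.history) > self.history_size:
            self.history.popitem(last=False)

    def _insert_reuse(self, x):
        # Place x at the MRU position of R; demote R's LRU items into SR on overflow.
        self.r[x] = True
        while len(self.r) > self.cache_size - self.sr_size:
            demoted, _ = self.r.popitem(last=False)
            self.sr[demoted] = True

    def request(self, x):
        """Process one access and return True on a cache hit."""
        if x in self.r:                   # hit in R: plain LRU update
            self.r.move_to_end(x)
            return True
        if x in self.sr:                  # hit in SR: move to MRU of R
            del self.sr[x]
            self._insert_reuse(x)
            return True
        was_in_history = self.history.pop(x, None) is not None
        if len(self.sr) + len(self.r) >= self.cache_size:
            self._evict()
        if was_in_history:                # miss, but seen recently: goes to R
            self._insert_reuse(x)
        else:                             # brand-new item: goes to MRU of SR
            self.sr[x] = True
        return False

cache = SRLRUSketch(cache_size=4, sr_size=2, history_size=4)
for key in ["a", "b", "a", "b"]:          # the second accesses promote a and b into R
    cache.request(key)
for key in range(10, 20):                 # a scan of one-time accesses
    cache.request(key)
survived = cache.request("a") and cache.request("b")
```

In this run the scan only cycles through the SR portion, so the reused items `a` and `b` are still cached afterwards.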

#### CR-LFU

![](/files/-MWauOIgzmvAlOfXz9qo)

* Evict an item x from the MRU position of the LFU portion of the cache
* Evict an item from the MRU position of the LFU portion of the cache, and move the requested item to the MRU position of the MFU portion
* Move x into the MRU position of the MFU portion of the cache
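The core idea, evicting the most recently used item among the least frequently used ones, can be sketched as follows. The class name `CRLFUSketch` and its bookkeeping are assumptions for illustration: frequencies are forgotten on eviction and the victim search is a linear scan, both simplifications.

```python
class CRLFUSketch:
    """Illustrative LFU that breaks frequency ties by evicting the MRU item."""

    def __init__(self, cache_size):
        self.cache_size = cache_size
        self.freq = {}        # key -> access count while cached
        self.last_used = {}   # key -> logical time of last access
        self.clock = 0

    def request(self, x):
        """Process one access and return True on a cache hit."""
        self.clock += 1
        hit = x in self.freq
        if not hit and len(self.freq) >= self.cache_size:
            # Lowest frequency first; among ties, the MOST recently used item.
            victim = min(self.freq, key=lambda k: (self.freq[k], -self.last_used[k]))
            del self.freq[victim]
            del self.last_used[victim]
        self.freq[x] = self.freq.get(x, 0) + 1
        self.last_used[x] = self.clock
        return hit

# Churn-like loop over 6 items with room for only 4.
cache = CRLFUSketch(cache_size=4)
hits = sum(cache.request(key) for _ in range(5) for key in range(6))
```

On this loop a plain LRU cache would miss on every access, while the MRU tie-break keeps part of the working set resident, so items 0, 1 and 2 hit in every round after the first.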

#### Experiments

* Datasets: 5 sources
  * FIU
  * MSR
  * CloudPhysics
  * CloudVPS
  * CloudCache
* 6 cache sizes
* 6+1 algorithms compared
* Total experiments: 17,766

## Paper

### Motivation

* Caching algorithms that do well for certain **workloads** do not necessarily perform well for others
  * ARC, LIRS, DLIRS, ML-based LeCaR ...
  * Today's production storage workloads are highly diverse in their characteristic features, and these features can vary over time even within a single workload
* Caching algorithms that do well for certain cache sizes do not necessarily perform well for other **cache sizes**
  * As cache size changes
    * The workload-induced dynamic cache state, the cache-relevant workload features, and thereby the most effective strategies can all vary

### Contribution

1. Identify the cache-relevant features that inform **workload primitive types**
2. **CACHEUS**: inspired by LeCaR but overcomes an important shortcoming by being completely adaptive, with the elimination of all statically chosen hyperparameters
3. **Design of two lightweight experts**: CR-LFU and SR-LRU
   1. CR: churn resistance
   2. SR: scan resistance

#### Understand the workloads

1. **Workload Primitive Types**

   1. *LRU-friendly:* defined by an access sequence that is best handled by the least recently used (LRU) caching algorithm.
   2. *LFU-friendly:* defined by an access sequence that is best handled by the least frequently used (LFU) caching algorithm.
   3. *Scan:* defined by an access sequence where a subset of stored items are accessed exactly once.
   4. *Churn:* defined by repeated accesses to a subset of stored items with each item being accessed with equal probability.

2. **Composing Workloads**
   1. Modern storage workloads are typically a composition of the above workload primitive types.
   2. As cache size changes, a single workload's primitive type may vary.
      1. E.g., an LRU-friendly workload at cache size C1 may turn into a churn workload at a cache size C2 < C1. This can occur when items in the workload's LRU-friendly working set start getting removed from the cache before they are reused.
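The scan and churn definitions, and the effect of shrinking the cache, can be illustrated with a small simulation. This is a sketch under stated assumptions: the helper names `scan`, `churn`, and `lru_hit_rate` are hypothetical, and a simple loop over 100 items stands in for a working set that LRU handles well only while the whole set fits in the cache.

```python
from collections import OrderedDict
import random

def scan(start, length):
    """Scan: every item in a range is accessed exactly once."""
    return list(range(start, start + length))

def churn(items, n_requests, seed=0):
    """Churn: repeated accesses to a fixed subset, each item equally likely."""
    rng = random.Random(seed)
    return [rng.choice(items) for _ in range(n_requests)]

def lru_hit_rate(trace, cache_size):
    """Simulate a plain LRU cache and return its hit rate on the trace."""
    cache = OrderedDict()
    hits = 0
    for x in trace:
        if x in cache:
            hits += 1
            cache.move_to_end(x)            # x becomes most recently used
        else:
            if len(cache) >= cache_size:
                cache.popitem(last=False)   # evict the least recently used item
            cache[x] = True
    return hits / len(trace)

# The same trace at two cache sizes: C1 = 100 and C2 = 50 < C1.
loop_trace = list(range(100)) * 20
rate_c1 = lru_hit_rate(loop_trace, cache_size=100)  # 0.95: only the first pass misses
rate_c2 = lru_hit_rate(loop_trace, cache_size=50)   # 0.0: items are evicted before reuse
```

At the smaller size every item is evicted before it is requested again, which is the situation the note above describes.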

#### Caching Algorithms

![](/files/-MX8FiiN27leuq0RdDMd)

* Adaptive Replacement Cache (ARC)
  * Balances recency and frequency
  * Uses two LRU lists
  * Able to handle:
    * Scan: limits the size of its T1 list, which is used to identify and cache newly accessed items, in order to preserve reused items in T2
      * But when a scan is followed by churn, ARC continues to evict from T1 and behaves similarly to LRU
  * Unable to handle:
    * LFU-friendly: cannot capture the full frequency distribution of the workload
    * Churn: cannot distinguish between items that are equally important, which leads to continuous cache replacement
* Low Inter-reference Recency Set (LIRS)
  * State of the art; based on reuse distance
  * Does well for
    * Scan workloads: routes one-time accesses through its short filtering list Q
      * But the size of Q is fixed to 1% of the cache, so it cannot adapt to dynamic working sets
  * Does not do well for
    * LFU-friendly workloads
    * Items with low overall reuse: it cannot recognize their reuse quickly enough
* Dynamic LIRS (DLIRS)
  * Incorporates adaptation into LIRS: dynamically adjusts the cache partitions assigned to high and low reuse-distance items
  * Does well for
    * Scan
    * LRU-friendly
  * Does not do well for
    * LFU-friendly workloads
  * But: does not perform as well as LIRS in practice
* Learning Cache Replacement (LeCaR)
  * ML-based technique that uses reinforcement learning and regret minimization to control the dynamic use of two cache replacement policies, LRU and LFU

#### LeCaR

* On each eviction, an expert is chosen randomly with probabilities proportional to the weights w(LRU) and w(LFU). LeCaR dynamically learns these weights by assigning penalties for wrongful evictions.
* Learning rate parameter: sets the magnitude of the weight change when the algorithm makes a poor decision.
  * Larger: quicker learning, but larger corrections are needed when the learning is flawed
* Discount rate parameter: decides how quickly the algorithm stops learning
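The weighted choice and the penalty step can be sketched as below. This is a minimal sketch under assumptions, not LeCaR's reference code: the function names, the multiplicative `exp(learning_rate * reward)` update, and the reward `discount_rate ** time_in_history` are illustrative choices, and the numeric values in the usage lines are arbitrary.

```python
import math
import random

def choose_expert(w_lru, w_lfu, rng=random):
    """Pick the expert for the next eviction with probability proportional to its weight."""
    return "LRU" if rng.random() < w_lru / (w_lru + w_lfu) else "LFU"

def penalize(w_lru, w_lfu, wrong_expert, time_in_history, learning_rate, discount_rate):
    """Shift weight away from the expert whose earlier eviction turned out to be wrong."""
    # Older mistakes count for less: the reward decays with time spent in the history.
    reward = discount_rate ** time_in_history
    boost = math.exp(learning_rate * reward)
    if wrong_expert == "LRU":
        w_lfu *= boost      # LRU evicted an item that was requested again
    else:
        w_lru *= boost      # LFU evicted an item that was requested again
    total = w_lru + w_lfu
    return w_lru / total, w_lfu / total   # keep the weights normalized

# An item evicted by LRU is requested again right away: LFU gains weight.
w_lru, w_lfu = penalize(0.5, 0.5, "LRU", time_in_history=0,
                        learning_rate=0.45, discount_rate=0.99)
next_expert = choose_expert(w_lru, w_lfu, rng=random.Random(0))
```

A larger `learning_rate` makes each penalty move the weights further, which matches the trade-off noted above: faster learning, but bigger corrections after a bad update.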

### CACHEUS

Note:

* Think about: Like LeCaR, CACHEUS uses exactly two experts. The usage of more than two experts was considered for early CACHEUS versions. Interestingly, the performance with more than two experts was significantly worse than when using only LRU and LFU. Having multiple experts is generally not beneficial unless the selected experts are orthogonal in nature, and operate based on completely different and complementary strategies. The intuition here is that multiple experts will overlap in their eviction decisions thereby affecting learning outcomes and deteriorating the performance. We demonstrate in this paper that with two well-chosen experts CACHEUS is able to best the state-of-the-art with statistical significance.
  * No way of saying which one is better

### Evaluation

#### Setup

* 17,766 simulation experiments
* 329 workloads
  * For each workload, evaluate against 6 different cache configurations that are sized relative to the workload's footprint
* 5 different production storage I/O datasets


