# LHD: Improving Cache Hit Rate by Maximizing Hit Density

### Presentation&#x20;

* Key-value cache is 100x faster than database&#x20;
  * Crucial in modern web-application, web-server --> cache (sit at a separate cluster, DRAM, MemCache), faster than underlying database&#x20;
* At 98% cache hit rate
* +1% hit rate --> 35% speedup
* Even small hit rate improvements cause significant speedup&#x20;

Choosing the right eviction policy is hard

* Key-value caches have unique challenges&#x20;
  * Variable object sizes
  * Variable workloads&#x20;
* Prior policies are heuristics that combine regency and frequency&#x20;
  * No theoretical foundation&#x20;
  * Require hand-tuning --> fragile to workload changes&#x20;
* No policy works for all workloads&#x20;
  * Prior system simulates many cache policy configurations to find right one per workload&#x20;

Goal: Auto-tuning eviction policy across workloads

* &#x20;The "big picture" of key-value caching
  * Goal: maximizing cache hit rate
  * Constraint: limited cache space&#x20;
  * Uncertainty: in practice, don't know what is accessed again
  * Difficulty: variable sizes&#x20;
* Where does cache space go?&#x20;

![](/files/8SffZ6NArl8HlfdBChis)

* Hit density (HD)
  * Hit density combines hit probability and expected cost&#x20;
    * Q: Don't understand, if the cost incorporates the size, why isn't that being considered as the final objective that we want to achieve?&#x20;
  * **object's hit probability / (object size \* object's expected lifetime)**&#x20;
* Least hit density (LHD) policy: Trying to evict object with smallest hit density&#x20;
* But how do we predict these quantities?&#x20;
  * Estimating hit density (HD)&#x20;
    * Age - # accesses since object was last requested
    * Random variables&#x20;
      * H - hit age: P \[H = 100] = probability an object hits after 100 accesses (101 access)&#x20;
      * L - life time: P\[L = 100] = probability an object hits or is evicted after 100 accesses&#x20;
    * Easy to estimate HD from these quantities&#x20;
      * ![](/files/wgeU7h7aezpVyLAb3msM)
      * Hit probability&#x20;
      * Expected lifetime&#x20;
* Example: estimating HD from object age&#x20;
  * Estimate HD using conditional probability&#x20;
  * Monitor distribution of H & L online&#x20;
  * By definition, object of age a wasn't requested at age <= a&#x20;
  * Ignore all events before a&#x20;

![](/files/emr2M8kq4mZIewvyX1XK)

* LHD by example
  * Users ask repeated for common objects and some user-specific objects&#x20;
  * Best hand-tuned policy for this app: cache common media + as much user-specific as fits&#x20;
* Probability of referencing object again
  * Common object modeled as scan, user-specific object modeled as Zipf&#x20;
  * ![](/files/cseLbeLXI30hVVdKDjGG)
  * ![](/files/31roDp3hdnCynKTbhyRK)
  * Guard the objects up to the peak (slightly older), and after the peak, eviction policy to perform somethings like LRU&#x20;
* Improving LHD using additional object features&#x20;
  * Conditional probability lets us easily add information&#x20;
  * Condition H & L upon additional informative object features, e.g.,&#x20;
    * Which app requested this object?
    * How long has this object taken to hit in the past?
  * Features inform decisions --> LHD learns the "right" policy
    * No hard-code heuristics&#x20;
* LHD gets more hits across many traces, and needs much less space&#x20;
* RANKCACHE: translate theory to practice&#x20;
  * The problem&#x20;
    * Prior complex policies require complex data structures&#x20;
    * Synchronization --> poor scalability --> unacceptable request throughput&#x20;
  * Policies like GDSF require O(log N) heaps&#x20;
  * Even O(1) LRU sometimes too slow because of synchronization&#x20;
  * Many key-value systems approximate LRU with CLOCK / FIFO
  * Can LHD achieve similar request throughput to production system?&#x20;
* RankCache makes LHD fast
  * Track information approximately&#x20;
  * Precompute HD as table indexed by age & app id & etc
  * Randomly sample objects to find victim&#x20;
    * Similar to Redis, Memshare
  * Tolerate rare races in eviction policy&#x20;
* Related work
  * Using conditional probabilities for eviction policies in CPU caches&#x20;
    * EVA
    * Fixed object sizes&#x20;
    * Different ranking function&#x20;
  * Prior replacement policies&#x20;
    * Key-value: Hyperbolic, simulations, AdaptSize, Cliffhanger
    * Non key-value: ARC, SLRU, LRU-K
    * Heuristic based
    * Require tuning or simulation&#x20;
* Future directions&#x20;
  * Dynamic latency / bandwidth optimization&#x20;
    * Smoothly and dynamically switch between optimized hit ratio and byte hit ratio&#x20;
  * Optimizing end-to-end response latency
    * App touches multiple objects per request
    * One such object evicted --> others should be evicted too
  * Modeling cost, e.g., to maximize write endurance in FLASH / NVM
    * Predict which objects are worth writing to 2nd tier storage from memory&#x20;


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://sliu583.gitbook.io/blog/specific-work/wisr-group/cache/index/lhd-improving-cache-hit-rate-by-maximizing-hit-density.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
