LHD: Improving Cache Hit Rate by Maximizing Hit Density

https://www.usenix.org/conference/nsdi18/presentation/beckmann

Presentation

  • Key-value cache is 100x faster than database

    • Crucial in modern web applications: the web server queries a cache (a separate cluster of DRAM servers, e.g., Memcache) that is much faster than the underlying database

  • At a 98% cache hit rate, +1% hit rate --> 35% speedup

  • Even small hit rate improvements cause significant speedup

Choosing the right eviction policy is hard

  • Key-value caches have unique challenges

    • Variable object sizes

    • Variable workloads

  • Prior policies are heuristics that combine recency and frequency

    • No theoretical foundation

    • Require hand-tuning --> fragile to workload changes

  • No policy works for all workloads

    • Prior systems simulate many cache policy configurations to find the right one per workload

Goal: Auto-tuning eviction policy across workloads

  • The "big picture" of key-value caching

    • Goal: maximizing cache hit rate

    • Constraint: limited cache space

    • Uncertainty: in practice, don't know what is accessed again

    • Difficulty: variable sizes

  • Where does cache space go?

  • Hit density (HD)

    • Hit density combines hit probability and expected cost

      • Q: Don't understand: if the cost already incorporates the size, why isn't it the final objective we want to achieve?

    • object's hit probability / (object size * object's expected lifetime)

  • Least hit density (LHD) policy: evict the object with the smallest hit density
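The ranking above can be sketched in a few lines of Python (the object values and field names here are illustrative, not from the paper):

```python
def hit_density(hit_probability, size, expected_lifetime):
    """Hit density = expected hits per byte of cache space per unit time."""
    return hit_probability / (size * expected_lifetime)

def choose_victim(objects):
    """LHD evicts the object with the smallest hit density."""
    return min(objects, key=lambda o: hit_density(o["p_hit"], o["size"], o["lifetime"]))

cache = [
    {"id": "a", "p_hit": 0.9, "size": 100, "lifetime": 50},    # hot, small
    {"id": "b", "p_hit": 0.2, "size": 4000, "lifetime": 200},  # cold, large
]
print(choose_victim(cache)["id"])  # "b": the large, cold object ranks lowest
```

Dividing by size is what lets LHD trade off one large object against many small ones without a hand-tuned size heuristic.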

  • But how do we predict these quantities?

    • Estimating hit density (HD)

      • Age - # accesses since object was last requested

      • Random variables

        • H - hit age: P[H = 100] = probability an object hits at age 100 (i.e., on the 101st access since it was last requested)

        • L - lifetime: P[L = 100] = probability an object's lifetime ends (hit or eviction) at age 100

      • Easy to estimate HD from these quantities

        • Hit probability

        • Expected lifetime

  • Example: estimating HD from object age

    • Estimate HD using conditional probability

    • Monitor distribution of H & L online

    • By definition, an object of age a was not hit or evicted at any age <= a

    • So ignore all events at ages <= a
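A minimal sketch of this age-conditioned estimate, assuming the monitored H and L distributions are available as dicts mapping age to probability (the toy numbers are illustrative, not from any trace):

```python
def hit_density_at_age(a, size, H, L):
    """Estimate hit density for an object of age a.

    An object of age a has not hit or been evicted at any age <= a,
    so condition both distributions on the lifetime exceeding a.
    """
    p_outlive = sum(p for age, p in L.items() if age > a)
    if p_outlive == 0:
        return 0.0  # no recorded event beyond this age: rank for eviction
    p_hit = sum(p for age, p in H.items() if age > a) / p_outlive
    # Expected *remaining* lifetime, given the object already survived to age a.
    exp_life = sum((age - a) * p for age, p in L.items() if age > a) / p_outlive
    return p_hit / (size * exp_life)

# Toy distributions: most hits come early; everything is gone by age 300.
H = {10: 0.5, 100: 0.2}            # P[H = age] (hits)
L = {10: 0.5, 100: 0.2, 300: 0.3}  # P[L = age] (hits or evictions)

young = hit_density_at_age(5, 100, H, L)    # still likely to hit soon
old = hit_density_at_age(150, 100, H, L)    # only eviction mass remains
print(young > old)  # True: the older object has lower hit density here
```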

  • LHD by example

    • Users repeatedly ask for common objects and some user-specific objects

    • Best hand-tuned policy for this app: cache common media + as much user-specific as fits

  • Probability of referencing object again

    • Common objects modeled as a scan; user-specific objects modeled as Zipf

    • Protect objects up to (slightly beyond) the peak in re-reference probability; past the peak, the eviction policy behaves like LRU

  • Improving LHD using additional object features

    • Conditional probability lets us easily add information

    • Condition H & L upon additional informative object features, e.g.,

      • Which app requested this object?

      • How long has this object taken to hit in the past?

    • Features inform decisions --> LHD learns the "right" policy

      • No hard-coded heuristics
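One way to picture the conditioning is to keep separate H and L counters per feature class; the class layout and names below are hypothetical, not RankCache's actual data structures:

```python
from collections import defaultdict

class ClassedStats:
    """Per-feature-class histograms of hit age (H) and lifetime (L)."""

    def __init__(self):
        self.hits = defaultdict(lambda: defaultdict(int))       # H counts
        self.lifetimes = defaultdict(lambda: defaultdict(int))  # L counts

    def record_hit(self, app_id, age):
        self.hits[app_id][age] += 1
        self.lifetimes[app_id][age] += 1   # a hit ends a lifetime too

    def record_eviction(self, app_id, age):
        self.lifetimes[app_id][age] += 1   # an eviction ends a lifetime

    def hit_probability(self, app_id, min_age=0):
        """P[hit | class, age > min_age], from the class's own counters."""
        h = sum(c for a, c in self.hits[app_id].items() if a > min_age)
        l = sum(c for a, c in self.lifetimes[app_id].items() if a > min_age)
        return h / l if l else 0.0
```

Because each class's estimates come from its own observed behavior, an app whose objects rarely re-hit is ranked down automatically, with no per-app tuning knob.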

  • LHD gets more hits across many traces, and needs much less space

  • RANKCACHE: translate theory to practice

    • The problem

      • Prior complex policies require complex data structures

      • Synchronization --> poor scalability --> unacceptable request throughput

    • Policies like GDSF require O(log N) heaps

    • Even O(1) LRU sometimes too slow because of synchronization

    • Many key-value systems approximate LRU with CLOCK / FIFO

    • Can LHD achieve similar request throughput to production system?

  • RankCache makes LHD fast

    • Track information approximately

    • Precompute HD as a table indexed by age, app id, etc.

    • Randomly sample objects to find victim

      • Similar to Redis, Memshare

    • Tolerate rare races in eviction policy

  • Related work

    • Using conditional probabilities for eviction policies in CPU caches

      • EVA

      • Fixed object sizes

      • Different ranking function

    • Prior replacement policies

      • Key-value: Hyperbolic, simulations, AdaptSize, Cliffhanger

      • Non key-value: ARC, SLRU, LRU-K

      • Heuristic based

      • Require tuning or simulation

  • Future directions

    • Dynamic latency / bandwidth optimization

      • Smoothly and dynamically switch between optimizing hit ratio and byte hit ratio

    • Optimizing end-to-end response latency

      • App touches multiple objects per request

      • If one such object is evicted, the others should be evicted too

    • Modeling cost, e.g., to maximize write endurance in flash / NVM

      • Predict which objects are worth writing to 2nd tier storage from memory
