LHD: Improving Cache Hit Rate by Maximizing Hit Density

https://www.usenix.org/conference/nsdi18/presentation/beckmann

Presentation

  • Key-value cache is 100x faster than database

    • Crucial in modern web applications: the web server queries a cache (a separate cluster of DRAM servers, e.g., Memcache) that is much faster than the underlying database

  • At a 98% cache hit rate, +1% hit rate --> 35% speedup

  • Even small hit rate improvements cause significant speedup

Choosing the right eviction policy is hard

  • Key-value caches have unique challenges

    • Variable object sizes

    • Variable workloads

  • Prior policies are heuristics that combine recency and frequency

    • No theoretical foundation

    • Require hand-tuning --> fragile to workload changes

  • No policy works for all workloads

    • Prior systems simulate many cache policy configurations to find the right one per workload

Goal: Auto-tuning eviction policy across workloads

  • The "big picture" of key-value caching

    • Goal: maximizing cache hit rate

    • Constraint: limited cache space

    • Uncertainty: in practice, don't know what is accessed again

    • Difficulty: variable sizes

  • Where does cache space go?

  • Hit density (HD)

    • Hit density combines hit probability and expected cost

      • Q: Don't understand: if the cost already incorporates the size, why isn't it the final objective we want to achieve?

    • object's hit probability / (object size * object's expected lifetime)

  • Least hit density (LHD) policy: evict the object with the smallest hit density
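The ranking above can be sketched in a few lines of Python (the object values and field names here are illustrative, not from the paper):

```python
def hit_density(hit_probability, size, expected_lifetime):
    """Hit density = expected hits per byte of cache space per unit time."""
    return hit_probability / (size * expected_lifetime)

def choose_victim(objects):
    """LHD evicts the object with the smallest hit density."""
    return min(objects, key=lambda o: hit_density(o["p_hit"], o["size"], o["lifetime"]))

cache = [
    {"id": "a", "p_hit": 0.9, "size": 100, "lifetime": 50},    # hot, small
    {"id": "b", "p_hit": 0.2, "size": 4000, "lifetime": 200},  # cold, large
]
print(choose_victim(cache)["id"])  # "b": the large, cold object ranks lowest
```

Dividing by size is what lets LHD trade off one large object against many small ones without a hand-tuned size heuristic.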

  • But how do we predict these quantities?

    • Estimating hit density (HD)

      • Age - # accesses since object was last requested

      • Random variables

        • H - hit age: P[H = 100] = probability an object hits at age 100 (i.e., on the 101st access since it was last requested)

        • L - lifetime: P[L = 100] = probability an object's lifetime ends (hit or eviction) at age 100

      • Easy to estimate HD from these quantities

        • Hit probability

        • Expected lifetime

  • Example: estimating HD from object age

    • Estimate HD using conditional probability

    • Monitor distribution of H & L online

    • By definition, an object of age a was not hit or evicted at any age <= a

    • So ignore all events at ages <= a
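A minimal sketch of this age-conditioned estimate, assuming the monitored H and L distributions are available as dicts mapping age to probability (the toy numbers are illustrative, not from any trace):

```python
def hit_density_at_age(a, size, H, L):
    """Estimate hit density for an object of age a.

    An object of age a has not hit or been evicted at any age <= a,
    so condition both distributions on the lifetime exceeding a.
    """
    p_outlive = sum(p for age, p in L.items() if age > a)
    if p_outlive == 0:
        return 0.0  # no recorded event beyond this age: rank for eviction
    p_hit = sum(p for age, p in H.items() if age > a) / p_outlive
    # Expected *remaining* lifetime, given the object already survived to age a.
    exp_life = sum((age - a) * p for age, p in L.items() if age > a) / p_outlive
    return p_hit / (size * exp_life)

# Toy distributions: most hits come early; everything is gone by age 300.
H = {10: 0.5, 100: 0.2}            # P[H = age] (hits)
L = {10: 0.5, 100: 0.2, 300: 0.3}  # P[L = age] (hits or evictions)

young = hit_density_at_age(5, 100, H, L)    # still likely to hit soon
old = hit_density_at_age(150, 100, H, L)    # only eviction mass remains
print(young > old)  # True: the older object has lower hit density here
```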

  • LHD by example

    • Users repeatedly ask for common objects and some user-specific objects

    • Best hand-tuned policy for this app: cache common media + as much user-specific as fits

  • Probability of referencing object again

    • Common objects modeled as a scan; user-specific objects modeled as Zipf

    • Protect objects up to (slightly beyond) the peak in re-reference probability; past the peak, the eviction policy behaves like LRU

  • Improving LHD using additional object features

    • Conditional probability lets us easily add information

    • Condition H & L upon additional informative object features, e.g.,

      • Which app requested this object?

      • How long has this object taken to hit in the past?

    • Features inform decisions --> LHD learns the "right" policy

      • No hard-coded heuristics
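One way to picture the conditioning is to keep separate H and L counters per feature class; the class layout and names below are hypothetical, not RankCache's actual data structures:

```python
from collections import defaultdict

class ClassedStats:
    """Per-feature-class histograms of hit age (H) and lifetime (L)."""

    def __init__(self):
        self.hits = defaultdict(lambda: defaultdict(int))       # H counts
        self.lifetimes = defaultdict(lambda: defaultdict(int))  # L counts

    def record_hit(self, app_id, age):
        self.hits[app_id][age] += 1
        self.lifetimes[app_id][age] += 1   # a hit ends a lifetime too

    def record_eviction(self, app_id, age):
        self.lifetimes[app_id][age] += 1   # an eviction ends a lifetime

    def hit_probability(self, app_id, min_age=0):
        """P[hit | class, age > min_age], from the class's own counters."""
        h = sum(c for a, c in self.hits[app_id].items() if a > min_age)
        l = sum(c for a, c in self.lifetimes[app_id].items() if a > min_age)
        return h / l if l else 0.0
```

Because each class's estimates come from its own observed behavior, an app whose objects rarely re-hit is ranked down automatically, with no per-app tuning knob.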

  • LHD gets more hits across many traces, and needs much less space

  • RANKCACHE: translate theory to practice

    • The problem

      • Prior complex policies require complex data structures

      • Synchronization --> poor scalability --> unacceptable request throughput

    • Policies like GDSF require O(log N) heaps

    • Even O(1) LRU sometimes too slow because of synchronization

    • Many key-value systems approximate LRU with CLOCK / FIFO

    • Can LHD achieve similar request throughput to production system?

  • RankCache makes LHD fast

    • Track information approximately

    • Precompute HD as a table indexed by age, app id, etc.

    • Randomly sample objects to find victim

      • Similar to Redis, Memshare

    • Tolerate rare races in eviction policy

  • Related work

    • Using conditional probabilities for eviction policies in CPU caches

      • EVA

      • Fixed object sizes

      • Different ranking function

    • Prior replacement policies

      • Key-value: Hyperbolic, simulations, AdaptSize, Cliffhanger

      • Non key-value: ARC, SLRU, LRU-K

      • Heuristic based

      • Require tuning or simulation

  • Future directions

    • Dynamic latency / bandwidth optimization

      • Smoothly and dynamically switch between optimizing hit ratio and byte hit ratio

    • Optimizing end-to-end response latency

      • App touches multiple objects per request

      • If one such object is evicted, the others should be evicted too

    • Modeling cost, e.g., to maximize write endurance in flash / NVM

      • Predict which objects are worth writing to 2nd tier storage from memory
