# Neural Adaptive Video Streaming with Pensieve

* Motivation
  * Users start leaving if a video doesn't play within 2 seconds
  * Dynamic Adaptive Streaming over HTTP (DASH)
    * ![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2F2zhGFAcdkoQF029BMW9M%2Fimage.png?alt=media\&token=126abba8-d2f6-4731-b440-b75881c0406c)
    * ![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FXdQdpL9Zx1nOsgE4xe4f%2Fimage.png?alt=media\&token=f06463c2-d817-43f0-b3a3-250442a9c7c1)
      * Higher bitrate --> higher quality
      * Video bitrate > network capacity --> playback buffer drains and the video rebuffers
  * Adaptive Bitrate (ABR) algorithms
    * Vary the bitrate adaptively based on network and playback-buffer conditions
    * Large impact on quality of experience (QoE)
  * Why is ABR challenging?
    * Network throughput is variable & uncertain
    * Conflicting QoE goals
      * Bitrate
      * Rebuffering time
      * Smoothness
    * Cascading effects of decisions
* Contribution: Pensieve
  * Learns an ABR algorithm automatically through experience
  * Contributions
    * First network control system to use modern "deep" reinforcement learning
    * Delivers 12-25% better QoE, with 10-30% less rebuffering, than previous ABR algorithms
    * Tailors ABR decisions to different network conditions in a data-driven way
* Previous fixed ABR algorithms
  * Rate-based: pick bitrate based on predicted throughput
    * FESTIVE, PANDA, CS2P
  * Buffer-based: pick bitrate based on buffer occupancy
    * BBA, BOLA
  * Hybrid: use both throughput prediction & buffer occupancy
    * PBA, MPC
      * MPC: maximize QoE(t, t+T) subject to system dynamics
        * Problem: needs an accurate throughput model
        * Uses a simplified (conservative) throughput prediction instead
  * Simplified, inaccurate models lead to suboptimal performance
* Solution: learn from video streaming sessions under actual network conditions
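The fixed heuristics above can be sketched roughly as follows. This is a minimal illustration: the bitrate ladder, function names, and thresholds are stand-ins of my own, not the actual parameters of FESTIVE, BBA, or the other cited systems.

```python
# Illustrative bitrate ladder in kbps (not from any specific system).
BITRATES = [300, 750, 1200, 1850, 2850, 4300]

def rate_based(predicted_throughput_kbps):
    """Rate-based ABR: pick the highest bitrate that fits the
    predicted throughput, falling back to the lowest rung."""
    feasible = [b for b in BITRATES if b <= predicted_throughput_kbps]
    return max(feasible) if feasible else BITRATES[0]

def buffer_based(buffer_sec, reservoir=5.0, cushion=10.0):
    """Buffer-based ABR (BBA-style): linearly map buffer occupancy
    between a reservoir and a cushion onto the bitrate ladder."""
    if buffer_sec <= reservoir:
        return BITRATES[0]
    if buffer_sec >= reservoir + cushion:
        return BITRATES[-1]
    frac = (buffer_sec - reservoir) / cushion
    return BITRATES[int(frac * (len(BITRATES) - 1))]
```

A hybrid scheme like MPC would combine both signals, searching over bitrate choices for the next few chunks under a predicted-throughput model.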

#### Reinforcement Learning

* Learning agent interacts with the environment
* ![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FzwzgSk8fJnNRRZonw0x8%2Fimage.png?alt=media\&token=f4c2c22c-ff99-4af7-bcb3-933d8759886e)
* Goal: maximize the cumulative reward
* Application
  * Finite-horizon control problem
  * ![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FIW6CsoW7V66eDPmXMM5H%2Fimage.png?alt=media\&token=6cff7769-c3e9-4f5d-9f25-f9d194a8682b)
  * Collect experience data: trajectories of [state, action, reward]
  * Train: estimate the policy from empirical data
* Good at
  * Learning the dynamics directly from experience
  * Optimizing the high-level QoE objective end-to-end
  * Extracting control rules from raw high-dimensional signals
* Training system
  * Large corpus of network traces
  * Video playback
  * Model update
* Trace-driven evaluation
  * Dataset, video, video player, video server
  * ![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FZtGXKaWrqcEgNypw7I19%2Fimage.png?alt=media\&token=74bc397a-7e12-4e2a-9def-1e22e6809546)
  * Improves on the best previous scheme by 12-25% and is within 9-14% of the offline optimal
* QoE breakdown
  * Achieves the best overall QoE
  * Reduces rebuffering by 10-32% over the second-best algorithm
* Can this learning generalize?
  * Training and testing data are currently drawn from the same dataset
  * Real video streaming may run over very different kinds of networks
  * Generate synthetic traces from a hidden Markov model
    * Do they cover the space of network conditions?
    * Covers a wide range of average throughput and network variation
  * Only ~5% degradation compared with Pensieve trained on real network traces
* Lessons learned
  * Build a fast experimentation / simulation platform
    * Doesn't model TCP or packet-level network behavior
    * Coarse-grained chunk-level simulator
  * Data diversity is more important than "accuracy"
    * Want the agent to experience a diverse mix of network variation
  * Think carefully about the controller state space (observation signals)
    * Too large a state space --> slow and difficult learning
    * Too small a state space --> loss of information
    * When in doubt, include the signal rather than cut it
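Tying the pieces together, the three conflicting QoE goals (bitrate, rebuffering, smoothness) become a single per-chunk reward that a policy-gradient learner maximizes cumulatively. This is a hedged sketch assuming a linear QoE form; the rebuffer weight `mu` and the Mbps-scale quality term are illustrative choices, not necessarily the paper's exact values.

```python
def qoe_reward(bitrate_kbps, prev_bitrate_kbps, rebuffer_sec, mu=4.3):
    """Per-chunk linear QoE: bitrate utility minus rebuffering and
    smoothness penalties (mu is an illustrative rebuffer weight)."""
    quality = bitrate_kbps / 1000.0                          # utility in Mbps
    rebuf_penalty = mu * rebuffer_sec                        # stall time hurts
    smooth_penalty = abs(bitrate_kbps - prev_bitrate_kbps) / 1000.0
    return quality - rebuf_penalty - smooth_penalty

def discounted_returns(rewards, gamma=0.99):
    """Cumulative discounted reward G_t for each step of a trajectory,
    the quantity a policy-gradient update weights actions by."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return list(reversed(out))
```

During training, each chunk download yields one reward, and the discounted returns over the trajectory of [state, action, reward] tuples drive the update toward bitrate policies with higher cumulative QoE.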
