Neural Adaptive Video Streaming with Pensieve

https://dl.acm.org/doi/abs/10.1145/3098822.3098843

  • Motivation

    • Users start leaving if the video doesn't start playing within 2 seconds

    • Dynamic Adaptive Streaming over HTTP (DASH)

        • Bitrate: higher --> higher quality

        • Video bitrate > link capacity --> playback buffer drains (rebuffering)

    • Adaptive Bitrate (ABR) Algorithms

      • Vary the bitrate adaptively, depending on network and playback buffer conditions

      • Large impact on QoE

    • Why is ABR challenging?

      • Network throughput is variable & uncertain

      • Conflicting QoE goals

        • Bitrate

        • Rebuffering time

        • Smoothness

      • Cascading effects of decisions
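The three conflicting QoE goals above are commonly combined into a linear per-chunk score (this is the form used in the MPC and Pensieve papers; the rebuffer weight mu=4.3 below is one of the paper's settings, assumed here for illustration):

```python
# Sketch of the linear per-chunk QoE:
#   QoE_n = bitrate_n - mu * rebuffer_n - |bitrate_n - bitrate_{n-1}|
# rewarding quality while penalizing stalls and bitrate switches.

def chunk_qoe(bitrate, prev_bitrate, rebuffer_s, mu=4.3):
    """Per-chunk QoE: reward bitrate (Mbps), penalize stall seconds and switches."""
    return bitrate - mu * rebuffer_s - abs(bitrate - prev_bitrate)

def session_qoe(bitrates, rebuffers, mu=4.3):
    """Sum per-chunk QoE over a session; the first chunk has no smoothness term."""
    total, prev = 0.0, bitrates[0]
    for r, stall in zip(bitrates, rebuffers):
        total += chunk_qoe(r, prev, stall, mu)
        prev = r
    return total
```

The conflict is visible in the formula: raising the bitrate increases the first term but risks stalls (second term) and switches (third term).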

  • Contribution: Pensieve

    • Learns ABR algorithm automatically through experience

    • Contributions

      • First network control system to use modern "deep" reinforcement learning

      • Delivers 12-25% better QoE, with 10-30% less rebuffering than previous ABR algorithms

      • Tailors ABR decisions for different network conditions in a data-driven way

  • Previous Fixed ABR algorithms

    • Rate-based: pick bitrate based on predicted throughput

      • FESTIVE, PANDA, CS2P

    • Buffer-based: pick bitrate based on buffer occupancy

      • BBA, BOLA

    • Hybrid: use both throughput prediction & buffer occupancy

      • PBA, MPC

        • MPC: maximize QoE(t, t+T) subject to system dynamics

          • Problem: needs accurate throughput model

          • Workaround: use a simplified, conservative throughput prediction

    • Simplified inaccurate model leads to suboptimal performance

  • Solution: learn from video streaming sessions in actual network conditions
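To make the MPC idea concrete, here is an illustrative lookahead sketch (not the paper's implementation): enumerate bitrate plans over a short horizon, simulate buffer dynamics under a conservative throughput prediction, and commit to the first bitrate of the best plan. The bitrate ladder, 4-second chunks, and mu=4.3 are assumptions for illustration.

```python
from itertools import product

BITRATES = [0.3, 0.75, 1.2, 1.85, 2.85, 4.3]   # Mbps ladder (assumed)
CHUNK_S = 4.0                                   # seconds of video per chunk

def mpc_choose(buffer_s, prev_bitrate, pred_tput, horizon=3, mu=4.3):
    """Maximize QoE(t, t+T) subject to buffer dynamics; return next bitrate."""
    best_plan, best_score = None, float("-inf")
    for plan in product(BITRATES, repeat=horizon):
        buf, prev, score = buffer_s, prev_bitrate, 0.0
        for r in plan:
            dl = r * CHUNK_S / pred_tput          # predicted download time (s)
            rebuf = max(0.0, dl - buf)            # stall if the buffer drains
            buf = max(0.0, buf - dl) + CHUNK_S    # buffer after the download
            score += r - mu * rebuf - abs(r - prev)
            prev = r
        if score > best_score:
            best_score, best_plan = score, plan
    return best_plan[0]
```

The sketch also shows the fragility the notes point to: `pred_tput` must be accurate, and a conservative estimate pushes every plan toward lower bitrates.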

Reinforcement Learning

  • Learning agent interacts with environment

  • Goal: maximize the cumulative reward

  • Application

    • Finite horizon control problem

    • Collect experience data: trajectory of [state, action, reward]

    • Train: estimate from empirical data

  • Good at

    • Learn the dynamics directly from experience

    • Optimize the high level QoE objective end-to-end

    • Extract control rules from raw high-dimensional signals

  • Training system

    • Large corpus of network traces

    • Video playback

    • Model update
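The collect-experience / update loop above can be sketched with minimal REINFORCE on a toy problem (Pensieve itself uses A3C with a neural policy; the 2-state environment, linear softmax policy, and learning rate here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
W = np.zeros((2, 2))                     # policy weights: state -> action scores

def policy(state):
    """Softmax over the linear scores for this state."""
    z = W[state] - W[state].max()
    p = np.exp(z)
    return p / p.sum()

def rollout(steps=10):
    """Collect a trajectory of (state, action, reward) tuples."""
    traj, state = [], 0
    for _ in range(steps):
        a = rng.choice(2, p=policy(state))
        r = 1.0 if a == state else 0.0   # toy reward: match action to state
        traj.append((state, a, r))
        state = int(rng.integers(2))     # environment transitions randomly
    return traj

def update(traj, lr=0.1):
    """REINFORCE: scale each action's log-prob gradient by its return-to-go."""
    G = 0.0
    for state, a, r in reversed(traj):
        G = r + G                        # undiscounted return-to-go
        p = policy(state)
        grad = -p
        grad[a] += 1.0                   # d log pi(a|s) / d scores
        W[state] += lr * G * grad

for _ in range(200):                     # train: estimate from empirical data
    update(rollout())
```

The same loop structure scales up to Pensieve: richer observations in `state`, bitrates as actions, and the QoE objective as the reward.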

  • Trace-driven Evaluation

    • Data set, video, video player, video server

    • Improves the best previous scheme by 12-25% and is within 9-14% of the offline optimal

  • QoE breakdown

    • Achieve the best collective QoE

    • Reduce rebuffering by 10-32% over second best algorithm

  • Can this learning generalize?

    • Training and testing data are currently drawn from the same dataset

    • Video streaming might be in different kinds of network

    • Generate synthetic trace from a Hidden Markov model

      • Does the synthetic trace cover real network conditions?

      • Covers a wide range of average throughput and network variation

    • 5% degradation compared with Pensieve trained on real network traces
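A hedged sketch of HMM-style trace generation: hidden states are coarse network regimes, each emitting noisy throughput samples, with a Markov transition matrix controlling regime switches. The regime means, noise level, and transition probabilities below are assumptions, not the paper's fitted parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
MEANS = np.array([0.5, 2.0, 5.0])        # Mbps per hidden regime (assumed)
TRANS = np.array([[0.90, 0.10, 0.00],    # row-stochastic transition matrix
                  [0.05, 0.90, 0.05],
                  [0.00, 0.10, 0.90]])

def synth_trace(n=100):
    """Generate n throughput samples (Mbps) from the 3-state HMM."""
    state, out = 1, []
    for _ in range(n):
        # Noisy emission around the regime mean, clamped to stay positive.
        tput = max(0.05, rng.normal(MEANS[state], 0.3 * MEANS[state]))
        out.append(tput)
        state = rng.choice(3, p=TRANS[state])
    return out
```

Sweeping the regime means and transition probabilities is what lets the synthetic corpus cover a wide range of average throughputs and variation.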

  • Lessons we learned

    • Build a fast experimentation / simulation platform

      • Doesn't model TCP or the underlying packet-level network behavior

      • Coarse-grain chunk simulator

    • Data diversity is more important than "accuracy"

      • Want the agent to experience a diverse mix of network variation

    • Think carefully about controller state space (observation signals)

      • Too large a state space --> slow and difficult learning

      • Too small a state space --> loss of information

      • When in doubt, include rather than cut the signal
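A coarse-grain chunk simulator in the spirit the lessons describe: it skips TCP and packet dynamics entirely and advances time one chunk download at a time. The 4-second chunks and the Mbps trace format are assumptions.

```python
CHUNK_S = 4.0  # seconds of video per chunk (assumed)

def simulate(trace, bitrates):
    """Play chunks at the given bitrates (Mbps) over a throughput trace (Mbps).

    Returns (total_rebuffer_s, final_buffer_s)."""
    buf, rebuf = 0.0, 0.0
    for tput, r in zip(trace, bitrates):
        dl = r * CHUNK_S / tput          # download time for this chunk (s)
        rebuf += max(0.0, dl - buf)      # stall while the buffer is empty
        buf = max(0.0, buf - dl) + CHUNK_S
    return rebuf, buf
```

Because each step is one arithmetic update rather than a packet-level simulation, an RL agent can experience thousands of streaming sessions per second of wall-clock time.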
