Neural Adaptive Video Streaming with Pensieve
https://dl.acm.org/doi/abs/10.1145/3098822.3098843
Motivation
Users start leaving if the video doesn't start playing within 2 seconds
Dynamic Adaptive Streaming over HTTP (DASH)
Bitrate: higher --> higher quality
Video bitrate > network capacity --> playback buffer drains and empties (rebuffering)
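A chunk-level view of these dynamics (notation mine, following the standard model in ABR work): let $B_n$ be the buffer level in seconds before fetching chunk $n$, $L$ the chunk duration, $d_n(R_n)$ the size of chunk $n$ at bitrate $R_n$, and $C_n$ the mean throughput during its download:

```latex
t_n = \frac{d_n(R_n)}{C_n}, \qquad
\text{rebuffer}_n = \max(t_n - B_n,\, 0), \qquad
B_{n+1} = \max(B_n - t_n,\, 0) + L
```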
Adaptive Bitrate (ABR) Algorithms
Vary the bitrate adaptively, depending on network and playback-buffer conditions
Large impact on QoE
Why is ABR challenging?
Network throughput is variable & uncertain
Conflicting QoE goals (combined into a single metric below)
Bitrate
Rebuffering time
Smoothness
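The paper combines the three into the linear QoE metric of Yin et al. (MPC), summed over the $N$ chunks of a video:

```latex
\mathrm{QoE} = \sum_{n=1}^{N} q(R_n) \;-\; \mu \sum_{n=1}^{N} T_n \;-\; \sum_{n=1}^{N-1} \left| q(R_{n+1}) - q(R_n) \right|
```

where $R_n$ is the bitrate of chunk $n$, $q(\cdot)$ maps bitrate to perceived quality, $T_n$ is the rebuffering time incurred fetching chunk $n$, and $\mu$ weights the rebuffering penalty.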
Cascading effects of decisions
Contribution: Pensieve
Learns ABR algorithm automatically through experience
Contributions
First network control system to use modern "deep" reinforcement learning
Delivers 12-25% better QoE, with 10-30% less rebuffering than previous ABR algorithms
Tailors ABR decisions for different network conditions in a data-driven way
Previous fixed ABR algorithms
Rate-based: pick bitrate based on predicted throughput
FESTIVE, PANDA, CS2P
Buffer-based: pick bitrate based on buffer occupancy
BBA, BOLA
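A minimal sketch of the rate-based and buffer-based rules above (the harmonic-mean predictor is common in rate-based schemes; the thresholds and bitrate ladder here are illustrative, not from any particular paper):

```python
def harmonic_mean(tputs):
    """Throughput prediction used by many rate-based schemes:
    harmonic mean of recent measured throughputs (Mbit/s)."""
    return len(tputs) / sum(1.0 / t for t in tputs)

def rate_based(past_tputs, bitrates):
    """Pick the highest bitrate below the predicted throughput."""
    pred = harmonic_mean(past_tputs)
    feasible = [r for r in bitrates if r <= pred]
    return max(feasible) if feasible else min(bitrates)

def buffer_based(buffer_s, bitrates, low=5.0, high=15.0):
    """BBA-style: map buffer occupancy (seconds) linearly onto the
    bitrate ladder between a low and a high reservoir."""
    if buffer_s <= low:
        return min(bitrates)
    if buffer_s >= high:
        return max(bitrates)
    frac = (buffer_s - low) / (high - low)
    idx = int(frac * (len(bitrates) - 1))
    return sorted(bitrates)[idx]

bitrates = [0.3, 0.75, 1.2, 1.85, 2.85, 4.3]  # Mbit/s ladder (illustrative)
print(rate_based([1.1, 0.9, 1.4], bitrates))  # -> 0.75
print(buffer_based(9.0, bitrates))            # -> 1.2 (mid-ladder)
```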
Hybrid: use both throughput prediction & buffer occupancy
PBA, MPC
MPC: maximize QoE over the horizon [t, t+T] subject to system dynamics
Problem: needs an accurate throughput model
In practice, falls back to simplified, conservative throughput prediction
A simplified, inaccurate model leads to suboptimal performance
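A toy version of the MPC loop, assuming a fixed throughput prediction and the linear QoE above; the horizon, penalties, and bitrate ladder are illustrative:

```python
import itertools

def mpc_pick(buffer_s, last_bitrate, pred_tput, bitrates,
             horizon=3, chunk_s=4.0, rebuf_penalty=4.3):
    """Enumerate bitrate sequences over the horizon, simulate buffer
    dynamics under the predicted throughput, and return the first
    bitrate of the best-scoring sequence."""
    best_score, best_first = float("-inf"), min(bitrates)
    for plan in itertools.product(bitrates, repeat=horizon):
        buf, prev, score = buffer_s, last_bitrate, 0.0
        for r in plan:
            dl_time = r * chunk_s / pred_tput   # seconds to fetch chunk
            rebuf = max(dl_time - buf, 0.0)
            buf = max(buf - dl_time, 0.0) + chunk_s
            score += r - rebuf_penalty * rebuf - abs(r - prev)
            prev = r
        if score > best_score:
            best_score, best_first = score, plan[0]
    return best_first

bitrates = [0.3, 0.75, 1.2, 1.85, 2.85, 4.3]
print(mpc_pick(buffer_s=8.0, last_bitrate=1.2, pred_tput=2.0, bitrates=bitrates))
```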
Solution: learn from video streaming sessions under actual network conditions
Learning agent interacts with environment
Goal: maximize the cumulative reward
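In standard RL notation (not spelled out in the talk): a policy $\pi_\theta(a_t \mid s_t)$ picks the bitrate $a_t$ from the observed state $s_t$, the per-chunk QoE is the reward $r_t$, and training ascends the policy gradient with some advantage estimate $A(s_t, a_t)$; Pensieve uses A3C, an actor-critic estimator of this gradient:

```latex
\max_{\theta}\; J(\theta) = \mathbb{E}_{\pi_\theta}\!\left[\sum_{t=0}^{T} \gamma^{t} r_t\right],
\qquad
\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\!\left[\sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, A(s_t, a_t)\right]
```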
Application
Finite-horizon control problem
Collect experience data: trajectories of [state, action, reward]
Train: estimate the policy gradient from empirical data
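A stripped-down REINFORCE sketch of that loop (Pensieve itself trains a neural policy with A3C; here a linear softmax policy and a made-up environment stand in, just to show trajectories of [state, action, reward] driving the update):

```python
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS, STATE_DIM = 3, 4          # e.g. 3 bitrates, 4 observation signals
theta = np.zeros((STATE_DIM, N_ACTIONS))

def policy(state):
    """Softmax-linear policy: probabilities over bitrate actions."""
    z = state @ theta
    p = np.exp(z - z.max())
    return p / p.sum()

def rollout(steps=20):
    """Collect one trajectory of [state, action, reward]. The
    'environment' here is random; a real one would be the chunk-level
    streaming simulator."""
    traj = []
    for _ in range(steps):
        s = rng.normal(size=STATE_DIM)
        a = rng.choice(N_ACTIONS, p=policy(s))
        r = float(a) - 0.1 * rng.random()  # toy reward: higher index pays more
        traj.append((s, a, r))
    return traj

def update(traj, lr=0.01, gamma=0.99):
    """REINFORCE: move theta along grad log pi(a|s) * discounted return."""
    global theta
    G = 0.0
    for s, a, r in reversed(traj):
        G = r + gamma * G                  # return from this step onward
        p = policy(s)
        grad_logp = -np.outer(s, p)        # d log pi / d theta, all actions
        grad_logp[:, a] += s               # plus the chosen-action term
        theta += lr * grad_logp * G

for episode in range(50):
    update(rollout())
```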
Good at
Learn the dynamics directly from experience
Optimize the high level QoE objective end-to-end
Extract control rules from raw high-dimensional signals
Training system
Large corpus of network traces
Video playback
Model update
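Roughly how the three pieces wire together (all function names are hypothetical stand-ins):

```python
import random

def simulate_playback(trace, policy):
    """Hypothetical stand-in for the chunk-level playback simulator:
    returns one session's [state, action, reward] trajectory."""
    return [(None, 0, random.random()) for _ in range(48)]

def update_policy(policy, trajectory):
    """Hypothetical stand-in for the RL update (A3C in the paper)."""
    return policy

traces = [f"trace_{i}" for i in range(1000)]       # large corpus of network traces
policy = None
for step in range(100):
    trace = random.choice(traces)                  # sample a network condition
    trajectory = simulate_playback(trace, policy)  # video playback
    policy = update_policy(policy, trajectory)     # model update
```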
Trace-driven Evaluation
Data set, video, video player, video server
Improves QoE over the best previous scheme by 12-25% and is within 9-14% of the offline optimal
QoE breakdown
Achieve the best collective QoE
Reduces rebuffering by 10-32% over the second-best algorithm
Can this learning generalize?
So far, training and testing data are drawn from the same dataset
Real deployments may run over different kinds of networks
Generate synthetic traces from a hidden Markov model (sketched below)
Do the synthetic traces cover realistic network conditions?
Covers a wide range of average throughput and network variation
Only 5% degradation compared with Pensieve trained on real network traces
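A minimal sketch of HMM-style trace generation; the regimes, transition matrix, and noise level below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hidden states = coarse network regimes, each with a mean throughput (Mbit/s).
state_means = np.array([0.5, 1.5, 3.0, 6.0])
# Row-stochastic transition matrix: mostly stay, occasionally jump.
P = np.array([[0.90, 0.08, 0.01, 0.01],
              [0.05, 0.90, 0.04, 0.01],
              [0.01, 0.05, 0.90, 0.04],
              [0.01, 0.01, 0.08, 0.90]])

def synthetic_trace(n_steps=300):
    """Sample a throughput trace: follow the Markov chain over regimes,
    emit noisy throughput around each regime's mean."""
    s = rng.integers(len(state_means))
    trace = []
    for _ in range(n_steps):
        s = rng.choice(len(state_means), p=P[s])
        trace.append(max(0.1, rng.normal(state_means[s], 0.3)))
    return trace

trace = synthetic_trace()
print(f"mean {np.mean(trace):.2f} Mbit/s, std {np.std(trace):.2f}")
```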
Lessons we learned
Build a fast experimentation / simulation platform
Doesn't model TCP or the underlying network packets
Coarse-grain chunk simulator
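A coarse-grain chunk simulator in that spirit (my sketch, not the paper's code): each step is one chunk, download time is just chunk size over trace throughput, and the rebuffering penalty 4.3 follows the linear QoE metric above:

```python
def simulate_session(trace, policy, bitrates, chunk_s=4.0, n_chunks=48):
    """Chunk-level simulator: no TCP or packet model, just
    download_time = chunk_bits / trace_throughput plus buffer bookkeeping."""
    buffer_s, t, total_rebuf, qoe = 0.0, 0, 0.0, 0.0
    last_r = min(bitrates)
    for _ in range(n_chunks):
        r = policy(buffer_s, last_r)
        tput = trace[t % len(trace)]              # Mbit/s during this chunk
        dl_time = r * chunk_s / tput              # seconds to fetch the chunk
        rebuf = max(dl_time - buffer_s, 0.0)
        buffer_s = max(buffer_s - dl_time, 0.0) + chunk_s
        qoe += r - 4.3 * rebuf - abs(r - last_r)  # linear QoE per chunk
        total_rebuf += rebuf
        last_r = r
        t += 1
    return qoe, total_rebuf

bitrates = [0.3, 0.75, 1.2, 1.85, 2.85, 4.3]
naive = lambda buf, last: 0.75                    # placeholder fixed policy
print(simulate_session([1.0, 1.2, 0.8] * 16, naive, bitrates))
```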
Data diversity is more important than "accuracy"
Want the agent to experience a diverse mix of network variation
Think carefully about controller state space (observation signals)
Too large a state space --> slow and difficult learning
Too small a state space --> loss of information
When in doubt, include rather than cut the signal
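For reference, the observation Pensieve feeds its policy network is roughly the following set of signals (my paraphrase of the paper; field names invented):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    """Roughly the signals Pensieve's controller observes per decision."""
    past_throughputs: List[float]     # measured throughput of last k chunks
    past_download_times: List[float]  # download time of last k chunks (s)
    next_chunk_sizes: List[float]     # size of the next chunk at each bitrate
    buffer_level: float               # seconds of video currently buffered
    chunks_remaining: int             # chunks left in the video
    last_bitrate: float               # bitrate of the previous chunk
```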