Neural Adaptive Video Streaming with Pensieve

https://dl.acm.org/doi/abs/10.1145/3098822.3098843

  • Motivation

    • Users start leaving if the video doesn't start playing within 2 seconds

    • Dynamic Adaptive Streaming over HTTP (DASH)

        • Bitrate: higher --> higher quality

        • Video bitrate > link capacity --> playback buffer drains (rebuffering)

    • Adaptive Bitrate (ABR) Algorithms

      • Vary the bitrate adaptively, depending on network and playback buffer conditions

      • Large impact on QoE

    • Why is ABR challenging?

      • Network throughput is variable & uncertain

      • Conflicting QoE goals

        • Bitrate

        • Rebuffering time

        • Smoothness

      • Cascading effects of decisions
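The three conflicting QoE goals above are commonly combined into a linear per-chunk score (this is the form used in the MPC and Pensieve papers; the rebuffer weight mu=4.3 below is one of the paper's settings, assumed here for illustration):

```python
# Sketch of the linear per-chunk QoE:
#   QoE_n = bitrate_n - mu * rebuffer_n - |bitrate_n - bitrate_{n-1}|
# rewarding quality while penalizing stalls and bitrate switches.

def chunk_qoe(bitrate, prev_bitrate, rebuffer_s, mu=4.3):
    """Per-chunk QoE: reward bitrate (Mbps), penalize stall seconds and switches."""
    return bitrate - mu * rebuffer_s - abs(bitrate - prev_bitrate)

def session_qoe(bitrates, rebuffers, mu=4.3):
    """Sum per-chunk QoE over a session; the first chunk has no smoothness term."""
    total, prev = 0.0, bitrates[0]
    for r, stall in zip(bitrates, rebuffers):
        total += chunk_qoe(r, prev, stall, mu)
        prev = r
    return total
```

The conflict is visible in the formula: raising the bitrate increases the first term but risks stalls (second term) and switches (third term).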

  • Contribution: Pensieve

    • Learns ABR algorithm automatically through experience

    • Contributions

      • First network control system to use modern "deep" reinforcement learning

      • Delivers 12-25% better QoE, with 10-30% less rebuffering than previous ABR algorithms

      • Tailors ABR decisions for different network conditions in a data-driven way

  • Previous Fixed ABR algorithms

    • Rate-based: pick bitrate based on predicted throughput

      • FESTIVE, PANDA, CS2P

    • Buffer-based: pick bitrate based on buffer occupancy

      • BBA, BOLA

    • Hybrid: use both throughput prediction & buffer occupancy

      • PBA, MPC

        • MPC: maximize QoE(t, t+T) subject to system dynamics

          • Problem: needs accurate throughput model

          • Workaround: use a simplified, conservative throughput prediction

    • Simplified inaccurate model leads to suboptimal performance

  • Solution: learn from video streaming sessions in actual network conditions
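To make the MPC idea concrete, here is an illustrative lookahead sketch (not the paper's implementation): enumerate bitrate plans over a short horizon, simulate buffer dynamics under a conservative throughput prediction, and commit to the first bitrate of the best plan. The bitrate ladder, 4-second chunks, and mu=4.3 are assumptions for illustration.

```python
from itertools import product

BITRATES = [0.3, 0.75, 1.2, 1.85, 2.85, 4.3]   # Mbps ladder (assumed)
CHUNK_S = 4.0                                   # seconds of video per chunk

def mpc_choose(buffer_s, prev_bitrate, pred_tput, horizon=3, mu=4.3):
    """Maximize QoE(t, t+T) subject to buffer dynamics; return next bitrate."""
    best_plan, best_score = None, float("-inf")
    for plan in product(BITRATES, repeat=horizon):
        buf, prev, score = buffer_s, prev_bitrate, 0.0
        for r in plan:
            dl = r * CHUNK_S / pred_tput          # predicted download time (s)
            rebuf = max(0.0, dl - buf)            # stall if the buffer drains
            buf = max(0.0, buf - dl) + CHUNK_S    # buffer after the download
            score += r - mu * rebuf - abs(r - prev)
            prev = r
        if score > best_score:
            best_score, best_plan = score, plan
    return best_plan[0]
```

The sketch also shows the fragility the notes point to: `pred_tput` must be accurate, and a conservative estimate pushes every plan toward lower bitrates.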

Reinforcement Learning

  • Learning agent interacts with environment

  • Goal: maximize the cumulative reward

  • Application

    • Finite horizon control problem

    • Collect experience data: trajectory of [state, action, reward]

    • Train: estimate from empirical data

  • Good at

    • Learn the dynamics directly from experience

    • Optimize the high level QoE objective end-to-end

    • Extract control rules from raw high-dimensional signals

  • Training system

    • Large corpus of network traces

    • Video playback

    • Model update
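The collect-experience / update loop above can be sketched with minimal REINFORCE on a toy problem (Pensieve itself uses A3C with a neural policy; the 2-state environment, linear softmax policy, and learning rate here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
W = np.zeros((2, 2))                     # policy weights: state -> action scores

def policy(state):
    """Softmax over the linear scores for this state."""
    z = W[state] - W[state].max()
    p = np.exp(z)
    return p / p.sum()

def rollout(steps=10):
    """Collect a trajectory of (state, action, reward) tuples."""
    traj, state = [], 0
    for _ in range(steps):
        a = rng.choice(2, p=policy(state))
        r = 1.0 if a == state else 0.0   # toy reward: match action to state
        traj.append((state, a, r))
        state = int(rng.integers(2))     # environment transitions randomly
    return traj

def update(traj, lr=0.1):
    """REINFORCE: scale each action's log-prob gradient by its return-to-go."""
    G = 0.0
    for state, a, r in reversed(traj):
        G = r + G                        # undiscounted return-to-go
        p = policy(state)
        grad = -p
        grad[a] += 1.0                   # d log pi(a|s) / d scores
        W[state] += lr * G * grad

for _ in range(200):                     # train: estimate from empirical data
    update(rollout())
```

The same loop structure scales up to Pensieve: richer observations in `state`, bitrates as actions, and the QoE objective as the reward.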

  • Trace-driven Evaluation

    • Data set, video, video player, video server

    • Improves the best previous scheme by 12-25% and is within 9-14% of the offline optimal

  • QoE breakdown

    • Achieve the best collective QoE

    • Reduce rebuffering by 10-32% over second best algorithm

  • Can this learning generalize?

    • Training and testing data are currently drawn from the same dataset

    • Video streaming might be in different kinds of network

    • Generate synthetic trace from a Hidden Markov model

      • Does the synthetic trace cover real network conditions?

      • Covers a wide range of average throughput and network variation

    • 5% degradation compared with Pensieve trained on real network traces
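A hedged sketch of HMM-style trace generation: hidden states are coarse network regimes, each emitting noisy throughput samples, with a Markov transition matrix controlling regime switches. The regime means, noise level, and transition probabilities below are assumptions, not the paper's fitted parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
MEANS = np.array([0.5, 2.0, 5.0])        # Mbps per hidden regime (assumed)
TRANS = np.array([[0.90, 0.10, 0.00],    # row-stochastic transition matrix
                  [0.05, 0.90, 0.05],
                  [0.00, 0.10, 0.90]])

def synth_trace(n=100):
    """Generate n throughput samples (Mbps) from the 3-state HMM."""
    state, out = 1, []
    for _ in range(n):
        # Noisy emission around the regime mean, clamped to stay positive.
        tput = max(0.05, rng.normal(MEANS[state], 0.3 * MEANS[state]))
        out.append(tput)
        state = rng.choice(3, p=TRANS[state])
    return out
```

Sweeping the regime means and transition probabilities is what lets the synthetic corpus cover a wide range of average throughputs and variation.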

  • Lessons we learned

    • Build a fast experimentation / simulation platform

      • Doesn't model TCP or the underlying packet-level network behavior

      • Coarse-grain chunk simulator

    • Data diversity is more important than "accuracy"

      • Want the agent to experience a diverse mix of network variation

    • Think carefully about controller state space (observation signals)

      • Too large a state space --> slow and difficult learning

      • Too small a state space --> loss of information

      • When in doubt, include rather than cut the signal
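A coarse-grain chunk simulator in the spirit the lessons describe: it skips TCP and packet dynamics entirely and advances time one chunk download at a time. The 4-second chunks and the Mbps trace format are assumptions.

```python
CHUNK_S = 4.0  # seconds of video per chunk (assumed)

def simulate(trace, bitrates):
    """Play chunks at the given bitrates (Mbps) over a throughput trace (Mbps).

    Returns (total_rebuffer_s, final_buffer_s)."""
    buf, rebuf = 0.0, 0.0
    for tput, r in zip(trace, bitrates):
        dl = r * CHUNK_S / tput          # download time for this chunk (s)
        rebuf += max(0.0, dl - buf)      # stall while the buffer is empty
        buf = max(0.0, buf - dl) + CHUNK_S
    return rebuf, buf
```

Because each step is one arithmetic update rather than a packet-level simulation, an RL agent can experience thousands of streaming sessions per second of wall-clock time.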
