Learning in situ: a randomized experiment in video streaming

https://www.usenix.org/system/files/nsdi20-paper-yan.pdf

Presentation

  • Video streaming dominates internet traffic

  • Adaptive bitrate (ABR) algorithms try to optimize users' quality of experience (QoE)

    • Decides the quality level of each video chunk to send

    • Primary goals: higher video quality, fewer stalls

    • Prior work: BBA, MPC, CS2P, Pensieve, Oboe

  • What does it take to create a learned ABR algorithm that robustly performs well over the wild internet?

    • Confidence intervals in video streaming are bigger than expected

      • Puffer: a live streaming platform running a randomized experiment

      • Randomized experiment: each session is randomly assigned one of the ABR schemes being tested

      • Prior ABR work reported gains of 10-20%, based on experiments lasting hours or days between a few network nodes

      • About 2 years of data per scheme are needed to measure stall ratio with 20% precision

        • Want higher video quality: y axis

        • And fewer stalls: x axis

        • Better QoE: up and to the right

        • Most schemes are statistically indistinguishable (noise)

      • Reason: Internet is way more noisy and heavy-tailed than we thought

        • Only 4% of the 637,189 streams had any stalls

        • Distributions of throughputs and watch times are highly skewed
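As a rough illustration of why rare stalls demand so much data: even a naive binomial sample-size bound, which ignores the heavy tails that make the real requirement far larger, already needs thousands of streams to pin down a ~4% stall rate within ±20%. This is an illustrative sketch, not a calculation from the paper:

```python
import math

def samples_needed(p, rel_err, z=1.96):
    """Naive binomial sample size to estimate proportion p within
    +/- rel_err * p at ~95% confidence. Real streams are worse:
    heavy-tailed watch times and throughputs inflate the variance
    well beyond this bound."""
    e = rel_err * p
    return math.ceil(z * z * p * (1 - p) / (e * e))

# ~4% of streams stall; estimate that rate within +/-20% relative error
print(samples_needed(0.04, 0.20))  # -> 2305
```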

    • A simple (buffer-based) ABR algorithm performs better than expected

      • BBA [SIGCOMM '14]: simple buffer-based ABR algorithm

        • Buffer: holds pre-downloaded video chunks in a temporary cache on the client, so playback can continue while later chunks are still being fetched

      • Research baseline
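The buffer-based idea can be sketched as a simple threshold rule over the client's buffer level (a minimal sketch; the reservoir thresholds and bitrate ladder below are hypothetical, not BBA's published parameters):

```python
def bba_select(buffer_s, bitrates, low_s=5.0, high_s=15.0):
    """Pick a bitrate from the buffer level alone (no throughput estimate).

    Below the low reservoir, play it safe with the lowest bitrate;
    above the high mark, use the highest; in between, interpolate
    linearly across the bitrate ladder.
    """
    if buffer_s <= low_s:
        return bitrates[0]
    if buffer_s >= high_s:
        return bitrates[-1]
    frac = (buffer_s - low_s) / (high_s - low_s)
    return bitrates[int(frac * (len(bitrates) - 1))]

ladder = [300, 750, 1200, 2400, 4800]  # kbps, hypothetical ladder
print(bba_select(3.0, ladder))   # low buffer  -> 300
print(bba_select(10.0, ladder))  # mid buffer  -> 1200
print(bba_select(20.0, ladder))  # full buffer -> 4800
```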

      • Other work: MPC-HM [SIGCOMM '15]

        • Predicts throughput using the harmonic mean (HM) of past throughputs

          • assumes throughput can be modeled with HM

          • assumes transmission time = chunk size / predicted throughput

        • But observed throughput actually varies with chunk size, due to congestion control and varying bandwidth
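A minimal sketch of the MPC-HM-style prediction (the throughput history and chunk size below are hypothetical). The harmonic mean is dominated by the slowest samples, making it a conservative estimate, and predicted transmission time is chunk size divided by predicted throughput:

```python
def harmonic_mean(xs):
    """Harmonic mean of past throughput samples; a single slow
    sample pulls the estimate down sharply."""
    return len(xs) / sum(1.0 / x for x in xs)

past_tput = [4.0, 5.0, 1.0, 6.0]   # Mbps, hypothetical history
pred = harmonic_mean(past_tput)     # dragged down by the 1 Mbps sample
chunk_bits = 8.0                    # Mbit, hypothetical chunk size
est_time = chunk_bits / pred        # predicted transmission time (s)
print(round(est_time, 2))           # -> 3.23
```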

      • Other work: Pensieve [SIGCOMM '17]

        • Reinforcement learning

          • Requires network simulators as training environments

          • Assumes training in simulation generalizes to wild Internet?

      • Comparison

        • No prior algorithm performs better than BBA on both axes (quality and stalls) at once

      • Algorithms that make fewer assumptions are perhaps more general

    • Our way of outperforming existing schemes is learning in situ (i.e. in place on the actual deployment environment)

      • Fugu uses classical model predictive control

      • Fugu replaces the throughput predictor in MPC-HM with a transmission time predictor

        • NN-based: predict how long it takes for a client to receive a given chunk. "How long would each chunk take?"

          • Input:

            • Size and transmission times of past chunks

            • Size of a chunk to be transmitted (not a throughput predictor)

            • Low-level TCP statistics (min RTT, RTT, CWND, packets in flight, delivery rate)

          • Output:

            • probability distribution over transmission time (not a point estimate)

              • Useful for maximizing expected QoE

        • Training: supervised learning in situ (in place) on real data from deployment environment

          • Chunk-by-chunk series of each individual video stream

          • Chunk i: size, timestamp sent, timestamp acknowledged, TCP statistics right before sending

        • Learning in situ does not replay throughput traces or require network simulators

          • We don't know how to faithfully simulate the Internet
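Why a distribution beats a point estimate: with a per-chunk distribution over transmission times, the controller can weigh a small chance of a long stall against a quality gain. The sketch below is illustrative only, assuming a toy QoE of quality minus a linear stall penalty; it is not Fugu's actual objective or controller:

```python
def pick_chunk(buffer_s, candidates, stall_penalty=10.0):
    """Pick the chunk with the highest *expected* QoE, integrating over
    a distribution of transmission times instead of a point estimate.

    candidates: list of (quality, [(prob, transmit_time_s), ...]) pairs.
    A transmission longer than the current buffer causes a stall.
    """
    best = None
    for quality, dist in candidates:
        exp_stall = sum(p * max(0.0, t - buffer_s) for p, t in dist)
        score = quality - stall_penalty * exp_stall
        if best is None or score > best[0]:
            best = (score, quality)
    return best[1]

# Hypothetical: the high-quality chunk has a heavy-tailed transfer time.
cands = [
    (3.0, [(0.9, 1.0), (0.1, 8.0)]),  # high quality, 10% chance of 8 s
    (2.0, [(1.0, 0.5)]),              # safe choice
]
print(pick_chunk(2.0, cands))   # -> 2.0 (short buffer: risky chunk loses)
print(pick_chunk(10.0, cands))  # -> 3.0 (long buffer absorbs the tail)
```

With a short buffer the risky high-quality chunk loses in expectation; with a long buffer the tail risk is absorbed and it wins.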

  • Refinement: Pensieve (retrained with Puffer traces)

  • Learning in situ: learn directly from the real environment (no simulator needed)
