# Pantheon: the training ground for Internet congestion-control research

### Talk

* Congestion control
  * Cornerstone problem in computer networking
    * Avoids congestion collapse
    * Allocates resources among users
    * Affects every application that uses a TCP socket
  * Recent schemes: BBR, Sprout, PCC

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FpbfNeekhXEWruZqg2yja%2Fimage.png?alt=media\&token=ca119b82-204c-489f-9d82-56051bbfcef5)

* Every new algorithm claims to be the state of the art (SOTA)
  * Compared against other algorithms that the authors picked
    * Must acquire, compile, and execute prior algorithms
  * Evaluated on the authors' own testbed
    * Large service operators: risky to deploy, long turnaround time
    * Researchers: much smaller scale, so results may not generalize
  * On simulators / emulators with the authors' own settings
    * How should the settings be configured?
  * Based on the specific results the authors collected
    * The Internet is diverse and evolving

Other fields:

* Databases: TPC
* Computer systems: SPEC
* CV: ImageNet
* Lesson: shared, reproducible benchmarks can lead to huge leaps in performance and transform technologies by making them scientific

Pantheon: a community resource

* A common language in CC
  * Benchmark algorithms
  * Shared testbeds
  * Public data
* A training ground for congestion control
  * Enables faster innovation and more reproducible research
  * e.g., Vivace (NSDI '18), Copa (NSDI '18), and Indigo, an ML-based congestion-control scheme
* 15+ algorithms
* Common testing interface
* Measures performance faithfully without modifying the schemes
  * Performance varies across types of network path, path direction, and time
* Limitations
  * Only tests schemes at full throttle
  * Nodes are not necessarily representative
  * Does not measure interactions between different schemes (fairness, TCP-friendliness)
* Calibrated emulators and pathological emulators
  * Simulator / emulator: reproducible and allows rapid experimentation
  * Open problem: which parameter values make an emulator faithfully reproduce a particular target network?
  * Replication error
    * Emulator model with five parameters: a bottleneck link rate, a constant propagation delay, a DropTail threshold for the sender's queue, a loss rate, and a bit that selects either a constant rate or a Poisson-governed rate
  * Steps
    * Collect a set of results over a particular network path on Pantheon
      * Average throughput and 95th-percentile delay of a dozen algorithms
    * Run Bayesian optimization
      * Run twice: once with a constant rate and once with a Poisson-governed rate
      * Objective function f(x): mean replication error
      * Prior: Gaussian process
      * Acquisition function: expected improvement
  * Pathological emulators
    * Very small buffer sizes
    * Severe ACK aggregation
    * Token-bucket policers
* Ongoing projects: Vivace, Copa, Indigo, and more
  * Vivace: validating a new scheme in the real world
  * Copa: iterative design with measurements
  * Indigo: a machine-learning design enabled by Pantheon
    * Models congestion control as a sequential decision-making problem
    * At every step, the sender observes congestion signals and then takes an action that adjusts the congestion window
    * Goal: learn the mapping from state to action, and encode the mapping into a model
    * Design
      * State: queueing delay, sending rate, receiving rate, window size, previous action
      * Model: 1-layer LSTM network (to capture history)
    * CC-oracle
      * Outputs the action that brings the congestion window closest to the ideal size
      * Ideal size
        * Only known in emulators (which have a global view of the network)
        * Simple emulated links with a fixed bandwidth and min RTT: BDP
          * Compute the bandwidth-delay product and use it as the congestion window
        * Otherwise, search around the BDP
      * Trained by imitation learning with DAgger
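
The objective that the Bayesian optimization step minimizes, f(x), is the mean replication error. The notes do not define the per-scheme error, so the sketch below assumes one plausible definition: the average of the relative differences in mean throughput and in 95th-percentile delay between the emulator and the real path. The function names, the dictionary layout, and the numbers are illustrative and are not taken from Pantheon's code or data.

```python
from statistics import mean


def replication_error(real, emulated):
    """Error for one scheme; inputs are (throughput_mbps, p95_delay_ms) tuples.

    Assumed definition: average of the two relative differences.
    """
    real_tput, real_delay = real
    emu_tput, emu_delay = emulated
    tput_err = abs(emu_tput - real_tput) / real_tput
    delay_err = abs(emu_delay - real_delay) / real_delay
    return (tput_err + delay_err) / 2


def mean_replication_error(real_results, emulated_results):
    """Objective f(x): mean error over every scheme measured on the real path."""
    return mean(
        replication_error(real_results[scheme], emulated_results[scheme])
        for scheme in real_results
    )


# Made-up measurements for two schemes, for illustration only.
real = {"cubic": (10.0, 100.0), "bbr": (20.0, 50.0)}
emulated = {"cubic": (9.0, 110.0), "bbr": (22.0, 50.0)}
error = mean_replication_error(real, emulated)
```

In a calibration loop, `emulated` would be regenerated for each candidate setting x of the five emulator parameters, and the optimizer would propose the next x based on the errors seen so far.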
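
The CC-oracle idea can be sketched for the simple case of an emulated link with a fixed bandwidth and minimum RTT, where the ideal window is the BDP. The action set below (halve, subtract 10, hold, add 10, double) and the 1500-byte packet size are assumptions made for illustration; the notes do not list Indigo's actual actions.

```python
def bdp_packets(bandwidth_mbps, min_rtt_ms, packet_bytes=1500):
    """Bandwidth-delay product of a fixed-rate link, in packets."""
    # 1 Mbit/s sustained for 1 ms carries 1e3 bits.
    bits_in_flight = bandwidth_mbps * 1e3 * min_rtt_ms
    return bits_in_flight / (8 * packet_bytes)


# Hypothetical action set: each action maps the current window to a new one.
ACTIONS = {
    "halve": lambda w: w / 2,
    "minus_10": lambda w: w - 10,
    "hold": lambda w: w,
    "plus_10": lambda w: w + 10,
    "double": lambda w: w * 2,
}


def oracle_action(cwnd, ideal_cwnd):
    """Return the action whose resulting window is closest to the ideal size."""
    return min(ACTIONS, key=lambda name: abs(ACTIONS[name](cwnd) - ideal_cwnd))


# A 12 Mbit/s link with a 100 ms minimum RTT.
ideal = bdp_packets(bandwidth_mbps=12, min_rtt_ms=100)
label = oracle_action(cwnd=40, ideal_cwnd=ideal)
```

During imitation learning, labels like `label` would serve as the expert's answer for the states the learner visits, which is the role an expert plays in DAgger.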
