# Pantheon: the training ground for Internet congestion-control research

### Talk

* Congestion control
  * Cornerstone problem in computer networking
    * Avoids congestion collapse
    * Allocates resources among users
    * Affects every application that uses a TCP socket
  * Recent algorithms: BBR, Sprout, PCC

![](/files/3atY6LrtcjJ46YDbCNGC)

* Every new algorithm claims to be state of the art, but each one is
  * Compared against other algorithms that the authors picked
    * Authors must acquire, compile, and execute prior algorithms
  * Evaluated on the authors' own testbed
    * Large service operators: risky to deploy, long turnaround time
    * Researchers: much smaller scale, so results may not generalize
  * Run on simulators / emulators with the authors' own settings
    * How should the settings be configured?
  * Judged on the specific results the authors collected
    * The Internet is diverse and evolving

Other fields:

* Databases: TPC
* Computer systems: SPEC
* CV: ImageNet
* Lesson: shared, reproducible benchmarks can lead to huge leaps in performance and transform technologies by making them scientific

Pantheon: a community resource

* A common language in CC
  * Benchmark algorithms
  * Shared testbeds
  * Public data
* A training ground for congestion control
  * Enables faster innovation and more reproducible research
  * e.g. Vivace (NSDI '18), Copa (NSDI '18), and Indigo, an ML-based congestion-control scheme
* 15+ algorithms
* Common testing interface
* Measures the performance of each scheme faithfully, without modifications
  * Performance varies across types of network path, path direction, and time
* Limitations
  * Only tests schemes at full throttle
  * Nodes are not necessarily representative
  * Does not measure interactions between different schemes (fairness, TCP-friendliness)
* Calibrated emulators and pathological emulators
  * Simulator / emulator: reproducible and allows rapid experimentation
  * Open problem: which parameter values make an emulator faithfully reproduce a particular target network?
  * Replication error: how far the results on the emulator are from the results on the real path
    * Five emulator parameters: a bottleneck link rate, a constant propagation delay, a DropTail threshold for the sender's queue, a loss rate, and a bit that selects a constant rate or a Poisson-governed rate
  * Steps
    * Collect a set of results over a particular network path on Pantheon
      * Average throughput and 95th-percentile delay of a dozen algorithms
    * Run Bayesian optimization
      * Run twice: once with a constant rate and once with a Poisson-governed rate
      * Objective function f(x): mean replication error
      * Prior: Gaussian process
      * Acquisition function: expected improvement
  * Pathological emulators
    * Very small buffer sizes
    * Severe ACK aggregation
    * Token-bucket policers
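The objective f(x) used in the calibration steps can be sketched as follows. This is a minimal illustration, assuming the replication error is the mean relative difference in average throughput and 95th-percentile delay across schemes. The function name, the dictionary layout, and the numbers are hypothetical, not taken from the talk.

```python
from statistics import mean


def replication_error(real, emulated):
    """Mean relative difference between real-path and emulator results.

    Both arguments map a scheme name to a tuple
    (average throughput, 95th-percentile delay). This exact definition
    is an assumption made for illustration.
    """
    errors = []
    for scheme, (real_tput, real_delay) in real.items():
        emu_tput, emu_delay = emulated[scheme]
        errors.append(abs(emu_tput - real_tput) / real_tput)
        errors.append(abs(emu_delay - real_delay) / real_delay)
    return mean(errors)


# Hypothetical measurements: throughput in Mbit/s, delay in ms.
real = {"cubic": (10.0, 100.0), "bbr": (20.0, 50.0)}
emulated = {"cubic": (9.0, 110.0), "bbr": (22.0, 45.0)}
print(replication_error(real, emulated))
```

In a calibration loop, the optimizer would propose a setting x of the five emulator parameters, run every scheme on the emulator configured with x, and use this value as f(x) to minimize.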
* Ongoing projects: Vivace, Copa, Indigo, and more
  * Vivace: validating a new scheme in the real world
  * Copa: iterative design with measurements
  * Indigo: a machine learning design enabled by Pantheon
    * Model the problem as a sequential decision-making problem
    * At every step, the sender observes congestion signals and then takes an action that adjusts the congestion window
    * Goal: learn the mapping from state to action, and encode the mapping into a model
    * Design
      * State: queueing delay, sending rate, receiving rate, window size, previous action
      * Model: 1-layer LSTM network (to capture history)
    * CC-oracle
      * Outputs the action that brings the congestion window closest to the ideal size
      * Ideal size
        * Only exists in emulators (global view of the network)
        * Simple emulated links with a fixed bandwidth and min RTT
          * Compute the bandwidth-delay product (BDP) and use it as the congestion window
        * Otherwise, search around the BDP
      * Imitation learning with DAgger
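The CC-oracle idea for a simple emulated link can be sketched as follows: compute the BDP as the ideal window, then pick the action whose resulting window lands closest to it. This is a minimal illustration, not Indigo's implementation. The action set, the 1500-byte packet size, and all names are assumptions.

```python
def bdp_packets(bandwidth_mbps, min_rtt_ms, packet_bytes=1500):
    """Bandwidth-delay product expressed in packets."""
    bits_in_flight = bandwidth_mbps * 1e6 * (min_rtt_ms / 1e3)
    return bits_in_flight / (8 * packet_bytes)


# Hypothetical discrete actions on the congestion window (in packets).
ACTIONS = {
    "halve": lambda w: w / 2,
    "minus10": lambda w: w - 10,
    "hold": lambda w: w,
    "plus10": lambda w: w + 10,
    "double": lambda w: w * 2,
}


def oracle_action(cwnd, ideal_cwnd):
    """Return the action whose resulting window is closest to the ideal size."""
    return min(ACTIONS, key=lambda name: abs(ACTIONS[name](cwnd) - ideal_cwnd))


# A 12 Mbit/s link with a 100 ms min RTT holds about 100 packets in flight.
ideal = bdp_packets(12, 100)
print(ideal, oracle_action(40, ideal))
```

During DAgger-style training, the labels produced by such an oracle for the states the learner visits would serve as the supervised targets for the model.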

