Congestion control must be designed for low-latency operation at datacenter scale. It must also meet key operational needs: deploying and maintaining protocols while the datacenter evolves rapidly with technology trends, isolating tenant traffic in a shared fabric, using host CPU and NIC resources efficiently, and handling a range of traffic patterns including incast. Existing protocols such as DCTCP fall short because they experience milliseconds of tail latency at scale.

Main insights

1) Delay is an effective congestion signal: it delivers excellent performance, and its simplicity greatly eases operational issues.

2) Fabric and host congestion both matter across a range of latency-sensitive, IOPS-intensive, and byte-intensive workloads. Delay should therefore be decomposed into fabric delay and host delay, because each calls for a different congestion response. Modern NICs let Swift separate these components of delay through features such as hardware timestamps.

3) A wider range of traffic patterns needs to be supported, including large-scale incast, where thousands of flows converge on a single host simultaneously.
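A minimal sketch of the fabric/host split in insight 2, assuming hypothetical per-packet timestamps in microseconds. It also assumes the receiver reports its own delay (NIC receive to ACK send) as a duration carried in the ACK, so the two hosts' clocks need not be synchronized. The field names and the exact set of timestamps are illustrative assumptions, not Swift's actual interface.

```python
from dataclasses import dataclass


@dataclass
class AckTimestamps:
    """Hypothetical per-packet timestamps seen by the sender, in microseconds."""
    sent: float               # NIC timestamp when the data packet leaves the sender
    ack_rx_nic: float         # sender NIC hardware timestamp when the ACK arrives
    ack_processed: float      # sender stack finishes processing the ACK
    remote_host_delay: float  # receiver-side delay (NIC rx -> ACK tx), echoed in the ACK


def decompose_delay(ts: AckTimestamps) -> tuple[float, float]:
    """Split the measured RTT into (fabric_delay, host_delay)."""
    rtt = ts.ack_processed - ts.sent                    # RTT from NIC transmit to ACK processing
    local_rx_delay = ts.ack_processed - ts.ack_rx_nic   # time the ACK waited in the sender host
    host_delay = ts.remote_host_delay + local_rx_delay  # both endpoints' contributions
    fabric_delay = rtt - host_delay                     # the remainder was spent in the network
    return fabric_delay, host_delay


fabric, host = decompose_delay(
    AckTimestamps(sent=0.0, ack_rx_nic=18.0, ack_processed=21.0, remote_host_delay=5.0))
print(fabric, host)  # 13.0 8.0
```

Because host delay is separated out, a host-side controller can react to a slow receiver or a busy sender stack without forcing the fabric-side controller to back off, which is the kind of distinct response insight 2 argues for.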

Key strengths

  1. Identify key trends in storage workloads and hardware that motivate the design decisions:

    1. new applications with low-latency requirements: cluster-wide storage systems have moved to faster media

    2. new host stacks and congestion sources: newer host networking stacks enable key features such as NIC timestamps and fine-grained pacing; host congestion is no longer negligible given increasing line rates and new application patterns

    3. hardware characteristics: the heterogeneity of datacenter switches creates operational challenges for congestion control, which calls for a simpler design

  2. Show solid performance results: Swift delivers close to 100 Gbps (100% load) while maintaining low delay and near-zero loss, and provides short RPC completion times for the intensive storage and analytics workloads that make up a substantial fraction of Google's network traffic.

  3. Simplicity and deployability in design: Swift has been used in production at Google as the congestion control algorithm for over 5 years.

    1. It uses a simple AIMD-based design around a target delay, measures delay using hardware- and software-provided features such as timestamps, and transitions seamlessly between a congestion window (cwnd) and a pacing rate.

    2. These decisions enable simple and wide deployment at scale (compared with ECN-based, explicit-feedback-based, receiver/credit-based, and packet-scheduling-based approaches).
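The AIMD-on-target-delay design can be sketched as follows. This is a simplified illustration, not Swift's production code: the class `DelayAimd`, its parameter names and values (`target_us`, `ai`, `beta`, `max_mdf`, the window bounds), and the once-per-RTT decrease rule are assumptions. Following insight 2, it keeps one window per delay component (fabric and host) and sends with the smaller one; when that window falls below one packet, it switches from window-limited sending to pacing one packet every `rtt / cwnd`.

```python
from dataclasses import dataclass


@dataclass
class DelayAimd:
    """AIMD congestion window driven by a delay target (illustrative parameters)."""
    target_us: float              # target delay for this component (assumed value)
    cwnd: float = 10.0            # window in packets; allowed to fall below 1
    ai: float = 1.0               # additive increase, roughly packets per RTT
    beta: float = 0.8             # multiplicative-decrease gain (assumed)
    max_mdf: float = 0.5          # largest single decrease: at most halve cwnd
    min_cwnd: float = 0.001
    max_cwnd: float = 1000.0
    last_decrease_us: float = float("-inf")

    def on_ack(self, delay_us: float, rtt_us: float, now_us: float, num_acked: int = 1) -> None:
        if delay_us < self.target_us:
            # Additive increase: spread about `ai` packets over one window of ACKs.
            if self.cwnd >= 1:
                self.cwnd += self.ai / self.cwnd * num_acked
            else:
                self.cwnd += self.ai * num_acked
        elif now_us - self.last_decrease_us >= rtt_us:
            # Multiplicative decrease scaled by how far delay exceeds the target,
            # applied at most once per RTT and capped at max_mdf.
            factor = max(1 - self.beta * (delay_us - self.target_us) / delay_us,
                         1 - self.max_mdf)
            self.cwnd *= factor
            self.last_decrease_us = now_us
        self.cwnd = min(max(self.cwnd, self.min_cwnd), self.max_cwnd)


def effective_window(fabric: DelayAimd, host: DelayAimd, rtt_us: float) -> tuple[float, float]:
    """Return (cwnd, pacing gap in us); a gap of 0 means purely window-limited."""
    cwnd = min(fabric.cwnd, host.cwnd)
    if cwnd >= 1:
        return cwnd, 0.0
    # Below one packet: pace one packet every rtt / cwnd microseconds.
    return cwnd, rtt_us / cwnd


fabric = DelayAimd(target_us=20.0)   # fabric target (assumed)
host = DelayAimd(target_us=10.0)     # host target (assumed)
fabric.on_ack(delay_us=40.0, rtt_us=30.0, now_us=100.0)  # over target: cut to about 6 packets
host.on_ack(delay_us=5.0, rtt_us=30.0, now_us=100.0)     # under target: grow to about 10.1
print(effective_window(fabric, host, rtt_us=30.0))
```

The sub-one-packet window is what makes the large-scale incast of insight 3 tractable. With assumed numbers of 5,000 flows sharing a 100 Gbps link and a 10 µs base RTT, the link's bandwidth-delay product is 100 Gb/s × 10 µs = 1,000,000 bits = 125 KB, or 25 bytes per flow, far less than one packet. A window-only scheme with a one-packet floor would overrun the receiver's link, while pacing lets each flow send less than one packet per RTT.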

Some interesting notes

  1. The Q&A session from SIGCOMM is very interesting: https://www.youtube.com/watch?v=NsyK-55tmhk
