pFabric: Minimal Near-Optimal Datacenter Transport
https://web.stanford.edu/~skatti/pubs/sigcomm13-pfabric.pdf
Abstract
pFabric: minimalistic datacenter transport design
Provides near theoretically optimal flow completion times even at the 99th percentile for short flows
While still minimizing average flow completion time for long flows
Key: datacenter transport should decouple flow scheduling from rate control
Introduction
Problem
Interactive soft real-time workloads demand low latency for each of their short request/response flows
But currently, flow completion time (FCT) for these short flows is high
Reason: short flows get queued up behind bursts of packets from large flows of co-existing workloads
Existing solutions: rate control
Implicit: keep queues nearly empty through mechanisms like adaptive congestion control, ECN-based feedback, pacing, etc.
Con: cannot precisely determine the right flow rates to optimally schedule flows
Explicit: compute and assign rates from the network to each flow in order to schedule the flows based on sizes or deadlines
Con: rather complex to implement because it requires detailed flow state at switches and coordination among switches to identify the bottleneck for each flow
Key:
Priority
Switches: decide which packet to send and which to drop based only on packet priority
Priority scheduling: when a port is idle, the packet with the highest priority buffered at the port is dequeued and sent out
Starvation prevention: dequeue the earliest packet from the flow that has the highest priority packet in the queue
Priority dropping: when a packet arrives to a port with a full buffer, if it has priority less than or equal to the lowest priority packet in the buffer, it is dropped. Otherwise, the packet with the lowest priority is dropped to make room.
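The three switch rules above can be sketched as a toy model of a single output port. This is only an illustration, not the paper's implementation: the names `Packet` and `PFabricPort` are hypothetical, the convention that a smaller number means higher priority is an assumption, and a linear scan over the buffer stands in for whatever a real switch would do.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    flow_id: str
    seq: int       # position within its flow (lower = earlier)
    priority: int  # assumption: smaller number = higher priority


class PFabricPort:
    """Toy model of one switch output port with a small buffer."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.buffer: List[Packet] = []

    def enqueue(self, pkt: Packet) -> Optional[Packet]:
        """Priority dropping: buffer pkt and return the dropped packet, if any."""
        if len(self.buffer) < self.capacity:
            self.buffer.append(pkt)
            return None
        # Buffer full: locate the lowest-priority packet (largest number).
        worst = max(self.buffer, key=lambda p: p.priority)
        if pkt.priority >= worst.priority:
            return pkt  # arrival has priority <= the lowest buffered: drop it
        self.buffer.remove(worst)  # otherwise evict the lowest to make room
        self.buffer.append(pkt)
        return worst

    def dequeue(self) -> Optional[Packet]:
        """Priority scheduling with starvation prevention."""
        if not self.buffer:
            return None
        # Find the highest-priority packet, then send the EARLIEST
        # buffered packet of that packet's flow.
        best = min(self.buffer, key=lambda p: p.priority)
        same_flow = [p for p in self.buffer if p.flow_id == best.flow_id]
        earliest = min(same_flow, key=lambda p: p.seq)
        self.buffer.remove(earliest)
        return earliest


port = PFabricPort(capacity=3)
port.enqueue(Packet("A", seq=0, priority=5))
port.enqueue(Packet("A", seq=1, priority=1))
port.enqueue(Packet("B", seq=0, priority=3))
dropped = port.enqueue(Packet("C", seq=0, priority=9))  # buffer full: C is dropped
first = port.dequeue()  # A's seq 0, although A's seq 1 holds the top priority
```

In the usage lines, flow A's later packet carries a higher priority than its earlier one. That is the case starvation prevention addresses: without it, the earlier packet could wait behind newer packets of its own flow.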
Lazy rate control
Flows start at line rate
Uses SACKs; for every packet ACK, do additive increase as in standard TCP
No fast retransmits. Packet drops are only detected by timeouts.
If a fixed threshold number of consecutive timeouts occurs, the flow enters probe mode, where it periodically retransmits a min-sized packet with a 1-byte payload, and re-enters slow start once it receives an ACK.
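The sender-side rules above can be sketched as a small state machine. This is a minimal sketch under stated assumptions, not the paper's code: `LazySender`, `init_window`, and `timeout_threshold` are hypothetical names, the default threshold of 5 is only a placeholder, and the reaction to a single timeout (halve `ssthresh`, restart from one packet) is assumed to follow standard TCP, since the notes do not spell it out. Sending probe packets is left out; only the mode flag is modeled.

```python
class LazySender:
    """Toy per-flow window logic; window sizes are in packets."""

    def __init__(self, init_window: float, timeout_threshold: int = 5) -> None:
        # Flows start at line rate: init_window is assumed to be the
        # window that fills the link.
        self.cwnd = init_window
        self.ssthresh = init_window
        self.timeout_threshold = timeout_threshold  # placeholder value
        self.consecutive_timeouts = 0
        self.probe_mode = False

    def on_ack(self) -> None:
        self.consecutive_timeouts = 0
        if self.probe_mode:
            # A probe was acknowledged: leave probe mode, re-enter slow start.
            self.probe_mode = False
            self.cwnd = 1.0
        elif self.cwnd < self.ssthresh:
            self.cwnd += 1.0              # slow start
        else:
            self.cwnd += 1.0 / self.cwnd  # additive increase, as in TCP

    def on_timeout(self) -> None:
        # Timeouts are the only loss signal (no fast retransmit).
        self.consecutive_timeouts += 1
        if self.consecutive_timeouts >= self.timeout_threshold:
            self.probe_mode = True        # only send tiny probes from now on
            return
        # Assumption: TCP-like reaction to a single timeout.
        self.ssthresh = max(self.cwnd / 2.0, 1.0)
        self.cwnd = 1.0


sender = LazySender(init_window=12.0, timeout_threshold=3)
sender.on_ack()          # additive increase from the line-rate window
for _ in range(3):
    sender.on_timeout()  # third consecutive timeout triggers probe mode
```

The logic can stay this simple because the switch rules already protect the packets of high-priority flows; the sender only needs to back off when losses persist.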