Jellyfish: Networking data centers randomly


What problem?

  • The problem of incremental and heterogeneous network expansion (i.e. adding servers and network capacity incrementally to the data center) is hard. Current high-bandwidth data center network proposals are not amenable to incremental growth without compromises on bandwidth or cost.

Was / is this problem important?

  • It was important, and it probably remains important today. Expansion is necessitated by the growth of the user base, which requires more servers, or by the deployment of more bandwidth-hungry applications.

  • The paper also gives a series of examples from industry to demonstrate this point. For instance, Facebook's data center server population was roughly 30,000 in 2009 and was expected to grow to more than 60,000 by 2010; much of this growth involved incrementally expanding existing facilities by "adding capacity on a daily basis". In 2011, 84% of firms said they would probably or definitely expand their data centers in 2012. In addition, several industry products advertise incremental expandability of the server pool (e.g. "pay-as-you-go").

  • Thus finding an incrementally-expandable, high-bandwidth data center interconnect is very important.

Main insight

  • Existing network proposals such as the fat-tree interconnect limit the network to very coarse design points (e.g. full-bisection-bandwidth fat-trees can only be built at a few fixed sizes) and make it hard to preserve the structure during incremental expansion.

  • Workarounds to accommodate incremental growth sacrifice either bandwidth or cost.

  • The two key goals of this paper are flexibility and high bandwidth. The Jellyfish approach is to construct a degree-bounded random graph topology at the top-of-rack (ToR) switch layer.

    • Intuition for flexibility: with a random regular graph (RRG), network capacity becomes "fluid": it is easy to wire up any number of switches, to support heterogeneous degree distributions, and to incorporate newly added switches with a few random link swaps.

    • Intuition for high bandwidth: the end-to-end throughput of a topology depends on both the capacity of the network and the amount of network capacity consumed to deliver each byte (i.e. average path length). RRG's diverse random connections lead to lower mean path length.
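The two intuitions above can be tried out on a toy graph. The following is a minimal sketch, not the paper's code: the function names, the switch count (40), and the number of switch-to-switch ports per switch (6) are all illustrative assumptions. It joins random pairs of switches that still have free ports, splices any switch left with two or more free ports into a random existing link, adds one new switch with the same link-swap step, and measures mean switch-to-switch path length with BFS.

```python
import random
from collections import deque


def add_random_links(adj, r, rng):
    """Join random non-adjacent switch pairs that both have free ports, until stuck."""
    while True:
        free = [v for v in adj if len(adj[v]) < r]
        pairs = [(u, v) for i, u in enumerate(free)
                 for v in free[i + 1:] if v not in adj[u]]
        if not pairs:
            return
        u, v = rng.choice(pairs)
        adj[u].add(v)
        adj[v].add(u)


def absorb_free_ports(adj, r, rng):
    """While a switch p has >= 2 free ports, remove a random link (x, y)
    and add links (p, x) and (p, y). Degrees of x and y are unchanged."""
    for p in list(adj):
        while r - len(adj[p]) >= 2:
            links = [(x, y) for x in adj for y in adj[x]
                     if x < y and p not in (x, y)
                     and x not in adj[p] and y not in adj[p]]
            if not links:
                break
            x, y = rng.choice(links)
            adj[x].discard(y)
            adj[y].discard(x)
            adj[p] |= {x, y}
            adj[x].add(p)
            adj[y].add(p)


def add_switch(adj, new, r, rng):
    """Incremental expansion: the new switch is spliced into random existing links."""
    adj[new] = set()
    absorb_free_ports(adj, r, rng)
    add_random_links(adj, r, rng)


def mean_path_length(adj):
    """Average shortest-path length (in hops) over all reachable switch pairs."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Hypothetical toy parameters: 40 switches, 6 network-facing ports each.
rng = random.Random(0)
R = 6
adj = {v: set() for v in range(40)}
add_random_links(adj, R, rng)
absorb_free_ports(adj, R, rng)
before = mean_path_length(adj)

add_switch(adj, 40, R, rng)  # grow the network by one switch
after = mean_path_length(adj)
print(f"mean path length: {before:.2f} -> {after:.2f}")
```

Adding the switch touches only a handful of links (three link swaps for six ports here) and leaves every other switch's degree unchanged, which is the sense in which expansion is cheap in this sketch.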

Compared to fat-tree

1) Flexibility: Unlike the fat-tree, whose structure limits the network to very coarse design points, Jellyfish is more flexible. Additional components can be incorporated with a few random edge swaps. It also allows construction of arbitrary-size networks and natively supports heterogeneity.

2) Efficiency: Jellyfish can support 27% more servers at full capacity than a fat-tree built from the same switching equipment.

3) Path length: The average path length in Jellyfish is much smaller than in the fat-tree. For an RRG with 38,400 servers, the average path length is <2.7, while the fat-tree's average is 3.71 at its smallest size and 3.96 at 27,648 servers. Low average path length allows the network to support more flows at high throughput, assuming the network's full capacity is utilized.

4) Failure resilience: Both topologies are highly resilient to failures; the normalized throughput per server decreases more gracefully for Jellyfish than for a same-equipment fat-tree as the percentage of failed links increases.
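To make the "coarse design points" in item 1 concrete, here is a small arithmetic sketch. It assumes the standard 3-level fat-tree built entirely from identical k-port switches, which hosts k^3/4 servers and uses 5k^2/4 switches; the function name and the port counts chosen below are illustrative. Under that assumption, the 27,648-server size quoted in item 3 corresponds to k = 48.

```python
def fat_tree_size(k):
    """Servers and switches in a 3-level fat-tree of k-port switches (k even)."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even port count")
    return {"ports": k, "servers": k ** 3 // 4, "switches": 5 * k ** 2 // 4}


# Illustrative port counts; each one gives exactly one full-bandwidth size.
sizes = [fat_tree_size(k) for k in (24, 32, 48, 64)]
for s in sizes:
    print(s)
```

Between consecutive port counts there is no intermediate full-bisection-bandwidth size, so growing from one design point to the next means replacing or rewiring the switching fabric rather than adding a few racks.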

Comments / Thoughts

  • Most of the results presented in the paper are based on either explicit calculations or simulations. Were DC-scale experiments too hard to run back then?

  • Why specifically did the paper choose k-shortest-path routing and MPTCP rather than ECMP and TCP? Why are they better? (some intuitions)

  • Randomness may reduce understandability of the network and make it harder to reason about performance.

  • What's the difference between bisection bandwidth and throughput introduced in this paper? Which one is better? How hard is it to evaluate these two metrics in random real-world networks?

  • For edges in a fat-tree that do not help reduce path length, can we just prune them?


Yes! I liked the benchmarks and the presented results, and I also enjoyed the evaluation methodology section, which introduces some interesting metrics for characterizing a particular topology.
