Flat-tree: A Convertible Data Center Network Architecture from Clos to Random Graph
Yiting Xia, T. S. Eugene Ng (Rice University)
Clos Topology
3-stage folded Clos: the standard data center network architecture
[figure: core switches, aggregation switches, edge switches, grouped into Pods]
Clos Topology
Implementation friendly
- central wiring
- flexible scale and oversubscription
- modular Pod design
Suboptimal performance
- long paths
- congested network core
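For a concrete point of reference (not part of the talk), the canonical no-oversubscription instance of the 3-stage folded Clos is the k-ary fat-tree; a minimal sketch of its element counts, with an illustrative k:

```python
# Sketch: element counts of a k-ary fat-tree, the canonical 3-stage folded
# Clos instance built from k-port switches with no oversubscription.
# The choice k = 8 below is illustrative, not taken from the talk.
def fat_tree_sizes(k: int) -> dict:
    return {
        "pods": k,
        "core_switches": (k // 2) ** 2,
        "aggregation_switches": k * (k // 2),
        "edge_switches": k * (k // 2),
        "servers": (k ** 3) // 4,
    }

print(fat_tree_sizes(8))
# {'pods': 8, 'core_switches': 16, 'aggregation_switches': 32,
#  'edge_switches': 32, 'servers': 128}
```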
Random Graph [Jellyfish, NSDI'12]
Good performance
- low average path length
- rich bandwidth
- optimal throughput for uniform traffic
Hard to implement
- complicated neighbor-to-neighbor wiring
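A Jellyfish-style flat network can be illustrated as a random regular graph over the switches. The following minimal sketch (sizes and library usage are my own assumptions, not from the talk) also checks the low average path length claim directly:

```python
# Sketch: sample a Jellyfish-style random regular graph of switches and
# measure its average path length. All sizes are illustrative only.
import networkx as nx

num_switches = 64    # hypothetical number of switches
switch_degree = 8    # ports per switch used for switch-to-switch links

g = nx.random_regular_graph(switch_degree, num_switches, seed=1)
print(nx.average_shortest_path_length(g))  # small average path length
```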
Tree Network vs. Flat Network
- tree network: easy implementation
- flat network: good performance
Can we combine the best of both worlds?
Why a Fixed Topology?
[examples: Fat-tree (SIGCOMM'08), BCube (SIGCOMM'09), DCell (SIGCOMM'08), HyperX (SC'09)]
Fluid data center traffic
- each topology has its sweet spots
- a one-size-fits-all topology is impossible
Cloud services are constantly changing
- a fixed topology cannot adapt to new demands
Convertible Network
Flat-tree: converts between a tree network and a flat network
Design Highlights
Flat-tree starts from a Clos network and converts the topology to approximate random graphs.
Challenges:
- relocate servers from edge switches to aggregation and core switches
- connect edge and core switches directly
- easy peer-wise wiring between switches
- random graphs of different scales
- combinations of different topologies
- packaging in Pods
Converter Switch
Small port count, low cost, physical-layer device
- as a packet switch
  * simple switching logic
  * no bandwidth contention
  * no expensive processor or buffering
- as a circuit switch
  * not sensitive to delay
  * small scale
[figure: physical-layer devices with ports A, B, C (and D)]
Converter Switch Configurations
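The configurations themselves are shown in the talk's figure. As a hedged illustration of the idea only: when the converter switch acts as a circuit switch, each configuration is simply a static port-to-port patch selected per topology mode. The mode names and concrete mappings below are illustrative assumptions, not the actual configurations from the talk.

```python
# Sketch: a converter switch acting as a circuit switch is a static
# port-to-port patch chosen per topology mode. Port labels A-D follow the
# earlier slide; the mappings themselves are illustrative, not the talk's.
CONVERTER_CONFIGS = {
    "clos":                {"A": "B", "B": "A"},
    "global_random_graph": {"A": "C", "C": "A", "B": "D", "D": "B"},
    "local_random_graph":  {"A": "D", "D": "A", "B": "C", "C": "B"},
}

def patched_port(mode: str, in_port: str) -> str:
    """Output port that a signal entering `in_port` is patched to."""
    return CONVERTER_CONFIGS[mode][in_port]

print(patched_port("global_random_graph", "A"))  # -> C
```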
Flat-tree Example: Clos Pod
[figure legend: core switch, aggregation switch, edge switch, server]
Flat-tree Example: Flat-tree Pod
[figure legend: core switch, aggregation switch, edge switch, converter switch, server]
Clos Network
[figure legend: core switch, aggregation switch, edge switch, converter switch, server]
Approximate Random Graph
[figure legend: core switch, aggregation switch, edge switch, converter switch, server]
Approximate Local Random Graph
[figure legend: core switch, aggregation switch, edge switch, converter switch, server]
Flat-tree Pod
[figure: Pod packaging, blade layout]
Pod-Core Wiring
Server Distribution
Choice of m and n
- how many servers per switch of each type
- flat-tree maintains structure, not purely random
  * Clos connections between edge and aggregation switches
  * Pod-core connections
  * peer-wise connections between adjacent Pods
- place servers to leverage the shorter paths
Network profiling (a sketch follows below)
- vary m and n
- minimize average path length
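The profiling step can be pictured as a simple sweep: build the flat-tree graph for each candidate (m, n) server placement and keep the one with the smallest average path length. The sketch below assumes a hypothetical constructor `build_flat_tree(m, n)` returning a networkx graph; it is not the authors' tooling.

```python
# Sketch of the (m, n) profiling sweep: pick the server placement that
# minimizes average path length. `build_flat_tree` is a hypothetical
# constructor returning a networkx graph of switches and servers.
import networkx as nx

def profile(candidates, build_flat_tree):
    best = None
    for m, n in candidates:
        g = build_flat_tree(m, n)
        apl = nx.average_shortest_path_length(g)
        if best is None or apl < best[0]:
            best = (apl, m, n)
    return best  # (average path length, m, n) of the best placement
```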
Inter-Pod Wiring
Simple shifting wiring pattern
- port <i, j> in Pod p connects to port <i, (d/2 - 1 - j + i) % (d/2)> in Pod p+1 (see the sketch below)
- no repeated connections
- same number of "side" and "cross" connections
Multi-link connectors
- streamline the connection between adjacent Pods
- hide wiring complexity
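A minimal sketch of the shifting pattern above, assuming d is the switch port count so that each index ranges over 0 .. d/2 - 1; the assertion checks the no-repeated-connection property:

```python
# Sketch of the shifting inter-Pod wiring pattern: port <i, j> of Pod p is
# wired to port <i, (d/2 - 1 - j + i) % (d/2)> of Pod p+1, where d is
# assumed to be the switch port count.
def peer_port(i: int, j: int, d: int):
    half = d // 2
    return (i, (half - 1 - j + i) % half)

d = 8  # illustrative port count
pairs = {(i, j): peer_port(i, j, d)
         for i in range(d // 2) for j in range(d // 2)}
# Every <i, j> maps to a distinct peer port, so no connection is repeated.
assert len(set(pairs.values())) == len(pairs)
```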
Evaluation
Compared networks
- fat-tree
- random graph
- two-level random graph
- flat-tree global (approximated global random graph)
- flat-tree local (approximated Pod-level random graph)
- flat-tree hybrid (part flat-tree global, part flat-tree local)
Metrics
- average path length
- throughput
  * optimal routing
  * server links unbounded
  * linear programming solution (sketched below)
Evaluation
Traffic patterns
- hot spots: broadcast/incast traffic in 1000-server clusters
- clusters: all-to-all traffic in 20-server clusters
Locality
- (strong) locality: workload placed on contiguous servers
- weak locality: workload placed randomly within Pods
- no locality: workload placed randomly across the entire network
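The throughput metric above is obtained with optimal routing via linear programming. One standard maximum concurrent flow formulation of this kind (the exact LP solved in the talk may differ) is:

```latex
% Maximize the concurrent throughput fraction \theta for demands D_{st}
% over a network with link capacities c(e); f_{st}(e) is the flow of
% demand (s,t) on directed link e.
\begin{align*}
\max\; & \theta \\
\text{s.t.}\;
 & \sum_{e \in \delta^{+}(v)} f_{st}(e) - \sum_{e \in \delta^{-}(v)} f_{st}(e)
   = \begin{cases}
       \theta\, D_{st}  & v = s \\
       -\theta\, D_{st} & v = t \\
       0                & \text{otherwise}
     \end{cases}
   && \forall (s,t),\ \forall v \\
 & \sum_{(s,t)} f_{st}(e) \le c(e) && \forall e \\
 & f_{st}(e) \ge 0, \quad \theta \ge 0
\end{align*}
```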
Summary of Simulation Results
Global average path length: Flat-tree ~4.75, Random graph ~4.6, Clos ~5.9
Pod-level average path length: Flat-tree ~3.4, Two-level random graph ~3.6, Random graph ~4.6, Clos ~3.9
Summary of Simulation Results
Throughput of hot-spot traffic
- flat-tree ≈ random graph
- flat-tree = 1.5x Clos
Throughput of small-clustered traffic
- flat-tree > two-level random graph in 1/3 of cases
- flat-tree >= 91% of two-level random graph
- flat-tree = 1.15x random graph
- flat-tree = 1.6x Clos
Global Average Path Length
Pod-Level Average Path Length
Throughput of Hot-Spot Traffic
Throughput of Clustered Traffic
Conclusion
Flat-tree converts between the Clos topology and random graphs of different scales
Low cost
- inexpensive converter switches
Easy implementation
- changes packaged in Pods
- regular Pod-core wiring patterns
- multi-links between adjacent Pods
Hybrid mode
- network zones with different topologies
Performance similar to random graphs
- < 5% longer average path length
- < 9% lower throughput
Impact and Inspiration
Flat-tree is one design point in the space of convertible networks
Motivates further study of the relationships between different topologies
Traffic optimization
- joint optimization with routing and workload placement
Network management
- self-recovery from failures
- automatically scale the network up/down at busy/idle times