Data Center Architectures
CIS 700/005 – Lecture 2
Includes material from lectures by Hakim Weatherspoon and Jennifer Rexford
Traditional Data Centers
A tree topology connecting the data center to the Internet:
- Core: Layer-3 routers
- Aggregation: Layer-2/3 switches
- Access: Layer-2 switches, connecting down to the servers
Limitation 1: Cost (Oversubscription)
Oversubscription: the ratio of the worst-case achievable aggregate bandwidth among the end hosts to the total bisection bandwidth of a particular communication topology
Oversubscribing lowers the total cost of the design
Typical designs: factor of 2.5:1 to 8:1
Often much higher in practice! [VL2]
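A tiny worked example of the ratio, using hypothetical numbers (not from the slides): 40 servers with 10 Gbps NICs behind 4 x 40 Gbps uplinks gives 2.5:1, the low end of the typical range.

```python
# Oversubscription at a single top-of-rack switch.
# All numbers are hypothetical, chosen to land on the 2.5:1 figure above.
servers_per_rack = 40
server_link_gbps = 10          # each server NIC
uplinks = 4
uplink_gbps = 40               # each uplink toward the aggregation layer

host_facing_bw = servers_per_rack * server_link_gbps   # 400 Gbps offered
uplink_bw = uplinks * uplink_gbps                      # 160 Gbps available

print(f"oversubscription = {host_facing_bw / uplink_bw:.1f}:1")   # 2.5:1
```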
Limitation 2: Fault tolerance
Oversubscription + bigger routers means fewer routers at the top of the tree, so a core router failure has a high blast radius
Most data centers used 1+1 redundancy: a dedicated backup switch and links
Limitation 3: Multi-path routing
Traditional data centers use static load balancing such as ECMP
- Can use bandwidth inefficiently
- Limited ECMP group size
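A minimal sketch of static ECMP, assuming the usual hash-the-5-tuple scheme; all identifiers and values here are illustrative:

```python
import hashlib

def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, proto, paths):
    """Pick one of `paths` by hashing the 5-tuple. Static: every packet
    of a flow takes the same path, regardless of current link load."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.md5(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(paths)
    return paths[index]

paths = ["aggr-1", "aggr-2", "aggr-3", "aggr-4"]
print(ecmp_next_hop("10.0.1.2", "10.2.0.3", 51000, 80, "tcp", paths))
```

Because the hash ignores load, two long-lived flows can collide on one link while a parallel link sits idle; that is the inefficiency the slide points at.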
A Scalable, Commodity Data Center Network Architecture
Mohammad Al-Fares, Alexander Loukissas, Amin Vahdat
Goals:
- Scalable interconnection bandwidth: 1:1 oversubscription
- Economies of scale
- Backwards compatibility
History Lesson: Clos Networks (1953)
Emulate a single huge switch with many smaller switches
Add more layers to scale out
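Not on the slide, but the classical result behind "emulate a single huge switch": Clos showed that a 3-stage network with n inputs per ingress switch is strictly non-blocking when it has m >= 2n - 1 middle-stage switches (and rearrangeably non-blocking when m >= n). A one-liner to check the condition:

```python
def strictly_nonblocking(n, m):
    """Clos (1953): a 3-stage network with n inputs per ingress switch and
    m middle-stage switches is strictly non-blocking iff m >= 2n - 1."""
    return m >= 2 * n - 1

print(strictly_nonblocking(n=4, m=7))   # True: 7 >= 2*4 - 1
print(strictly_nonblocking(n=4, m=4))   # False: only rearrangeable (m >= n)
```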
Fat-tree Architecture
K-ary fat tree: three-layer topology (edge, aggregation, and core)
- Each pod consists of (k/2)^2 servers and 2 layers of k/2 k-port switches
- Each edge switch connects to k/2 servers and k/2 aggregation switches
- Each aggregation switch connects to k/2 edge and k/2 core switches
- (k/2)^2 core switches: each connects to k pods
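A small sketch deriving the component counts from the rules above (the function name and dict layout are ours, the formulas follow directly from the bullets):

```python
def fat_tree_sizes(k):
    """Component counts for a k-ary fat tree built from k-port switches."""
    assert k % 2 == 0, "k must be even"
    half = k // 2
    return {
        "pods": k,
        "servers": k * half * half,        # (k/2)^2 per pod, k pods = k^3/4
        "edge_switches": k * half,         # k/2 per pod
        "aggregation_switches": k * half,  # k/2 per pod
        "core_switches": half * half,      # (k/2)^2
    }

print(fat_tree_sizes(4))    # the paper's running example: 16 servers
print(fat_tree_sizes(48))   # commodity 48-port switches: 27,648 servers
```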
Obligatory Network Questions
How do I address destinations?
- Hierarchical IP addresses for scalability: [PodNumber].[SwitchNumber].[Endhost]
How does a switch route packets? (see the sketch below)
- Assumption: every routing table entry has one output port
- Route downward using the address prefix (for scalability)
- Route upward using the address suffix (for load balancing)
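A minimal sketch of the downward-prefix / upward-suffix lookup described above, assuming the paper's 10.pod.switch.id addressing; the table contents and helper name are illustrative:

```python
import ipaddress

def route(dst_ip, prefix_table, suffix_table):
    """Two-level lookup: prefixes route traffic down toward a pod/subnet;
    if no prefix matches, a suffix match on the host ID spreads the
    remaining (upward) traffic across uplinks. Returns an output port."""
    addr = ipaddress.ip_address(dst_ip)
    for prefix, port in prefix_table:          # longest prefixes first
        if addr in ipaddress.ip_network(prefix):
            return port
    last_octet = int(dst_ip.split(".")[-1])
    for suffix, port in suffix_table:          # match on host ID suffix
        if last_octet == suffix:
            return port
    raise LookupError("no matching entry")

# Example tables for an aggregation switch in pod 2 of a k=4 fat tree.
prefixes = [("10.2.0.0/24", 0), ("10.2.1.0/24", 1)]   # down to edge switches
suffixes = [(2, 2), (3, 3)]                            # up to core switches
print(route("10.2.0.3", prefixes, suffixes))   # -> 0 (downward)
print(route("10.0.1.2", prefixes, suffixes))   # -> 2 (upward, via suffix)
```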
Routing Optimizations
Flow classification
- Classify flows (e.g., by source, destination, port numbers)
- Move around a small set of flows as needed
Flow scheduling
- Keep track of large, long-lived flows at the edge switches
- Assign them to different links
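A toy sketch of the flow-scheduling idea: detect large ("elephant") flows and greedily pin each one to the currently least-loaded uplink. The threshold and the greedy policy are illustrative, not the paper's exact mechanism:

```python
def schedule_flows(flows, uplinks, elephant_threshold=100_000_000):
    """Greedily place large flows on the least-loaded uplink.
    `flows` maps flow-id -> bytes sent; threshold (100 MB) is illustrative."""
    load = {u: 0 for u in uplinks}
    placement = {}
    # Place the biggest flows first so they spread across links.
    for flow_id, size in sorted(flows.items(), key=lambda kv: -kv[1]):
        if size < elephant_threshold:
            continue                       # small flows stay on static ECMP
        link = min(load, key=load.get)     # least-loaded uplink
        placement[flow_id] = link
        load[link] += size
    return placement

flows = {"f1": 5e8, "f2": 3e8, "f3": 1e4, "f4": 2e8}
print(schedule_flows(flows, ["up0", "up1"]))
```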
FatTree Summary
Motivation: data center networks are expensive
- Limitation 1: Cost (oversubscription) -> use commodity hardware to keep costs down; use Clos networks
- Limitation 2: Fault tolerance -> stop caring about individual components
- Limitation 3: Multi-path routing -> schedule everything
Data centers today: "Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network" (Google, SIGCOMM 2015)
https://storage.googleapis.com/pub-tools-public-publication-data/pdf/43837.pdf
Things they didn’t think about
What did they get right?
Motivation: data center networks are expensive
- Limitation 1: Cost (oversubscription) -> use commodity hardware to keep costs down; use Clos networks
- Limitation 2: Fault tolerance -> stop caring about individual components
- Limitation 3: Multi-path routing -> schedule everything
What did they get wrong?
Motivation: data center networks are expensive
- Limitation 1: Cost (oversubscription) -> use commodity hardware to keep costs down; use Clos networks
- Limitation 2: Fault tolerance -> stop caring about individual components
- Limitation 3: Multi-path routing -> schedule everything