SPAIN: High BW Data-Center Ethernet with Unmodified Switches
Jayaram Mudigonda (HP Labs), Praveen Yalagandula (HP Labs), Mohammad Al-Fares (UCSD), Jeff Mogul (HP Labs)
© Copyright 2010 Hewlett-Packard Development Company, L.P.
Traditional datacenter: a datacenter fabric connecting Internet-facing applications (web servers, etc.) to the Internet.
DC trends: information explosion; application consolidation; virtualization; HPC applications.
DC trends: the shuffle phase of MapReduce (mappers M sending data to reducers R across the datacenter fabric) requires high bisection bandwidth.
DC trends: a flat network across the datacenter fabric.
DC fabric goals: high bisection BW; flat network; low cost.
Ethernet: a good choice. Commodity; inexpensive; speeds: 10G is here, 40G/100G soon; flat addressing; self-configuring.
But wait…
Spanning Tree Protocol (STP) makes Ethernet hard to scale!
Spanning Tree Protocol (STP): the root becomes a bandwidth bottleneck, and redundant links go unused.
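STP's effect can be illustrated with a simplified model: a breadth-first tree grown from the root keeps only tree links active and blocks the rest. Real STP (IEEE 802.1D) elects the root by bridge ID and uses per-port path costs; this sketch ignores both, and the two-core, three-edge topology is hypothetical.

```python
from collections import deque


def spanning_tree(links, root):
    """Return the links kept by a BFS spanning tree rooted at `root`.

    Simplified model of STP: every link has equal cost, and the root is given.
    """
    adj = {}
    for u, v in links:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    seen = {root}
    kept = set()
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in seen:          # first time we reach v: keep this link
                seen.add(v)
                kept.add(frozenset((u, v)))
                queue.append(v)
    return kept


# Hypothetical topology: two core switches, each wired to three edge switches.
links = [(c, e) for c in ("C1", "C2") for e in ("E1", "E2", "E3")]
tree = spanning_tree(links, root="C1")
blocked = [l for l in links if frozenset(l) not in tree]
print(len(tree), "active,", len(blocked), "blocked:", blocked)
# 4 active, 2 blocked: [('C2', 'E2'), ('C2', 'E3')]
```

Only one of C2's three uplinks stays active, so nearly all traffic crosses C1, the root.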
Proposal 1: a high-port-count core switch (a common current approach).
Expensive core switch, with high-BW or multiple links to it.
Proposal 2: L3 IP subnetting (VL2 [SIGCOMM’09]).
L3 routers: expensive; no non-IP protocols (e.g., FCoE).
Proposal 3: modify switches (HW/SW): TRILL [IETF], SEATTLE [SIGCOMM’08], PortLand [SIGCOMM’09]. Not deployable today!
SPAIN: unmodified L2 switches; multi-pathing; arbitrary topologies.
SPAIN approach: multi-pathing via VLANs + an end-host driver to spread load.
Multi-pathing via VLANs (example with nodes A, B, C, D): the default VLAN.
SPAIN: unmodified L2 switches; multi-pathing via VLANs; arbitrary topologies; minor end-host modifications. A low-cost, high-BW DC fabric today!
Outline: Introduction; SPAIN components (offline computation, end-host driver); Evaluation; Summary.
Offline computation steps: 1. Discover the topology. 2. Compute paths. 3. Lay out the paths as VLANs.
Discover topology: SPAIN queries the switches via SNMP.
Compute paths. Goal: leverage redundancy to improve reliability. Challenges: large graphs; more paths need more resources.
Compute paths: consider only paths between edge switches; use a modified Dijkstra's algorithm that prefers edge-disjoint paths.
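One simple way to bias a shortest-path search toward edge-disjoint paths is to rerun Dijkstra and add a penalty to every link an earlier path used. This is a hypothetical sketch, not necessarily SPAIN's modified Dijkstra; the penalty value, the helper names, and the two-edge, two-core topology are assumptions.

```python
import heapq


def dijkstra(adj, weight, src, dst):
    """Plain Dijkstra on an undirected graph; `weight` is keyed by frozenset({u, v})."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v in adj[u]:
            nd = d + weight[frozenset((u, v))]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def k_paths_prefer_disjoint(links, src, dst, k, penalty=100):
    """Up to k distinct src->dst paths; links already used are penalized
    so that later searches prefer edge-disjoint alternatives."""
    adj = {}
    for u, v in links:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    weight = {frozenset(l): 1 for l in links}
    paths = []
    for _ in range(k):
        p = dijkstra(adj, weight, src, dst)
        if p is None or p in paths:
            break  # no new path found
        paths.append(p)
        for u, v in zip(p, p[1:]):
            weight[frozenset((u, v))] += penalty
    return paths


links = [("E1", "C1"), ("E1", "C2"), ("E2", "C1"), ("E2", "C2")]
paths = k_paths_prefer_disjoint(links, "E1", "E2", k=3)
print(paths)  # [['E1', 'C1', 'E2'], ['E1', 'C2', 'E2']]
```

Asking for three paths yields only two here, because the only other candidate repeats an already-penalized path.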
VLAN layout, simple scheme: each path as a VLAN.
But… IEEE 802.1Q: the VLAN ID is 12 bits, so at most 4096 VLANs (4094 usable, since IDs 0 and 4095 are reserved)!
VLAN layout, simple scheme (each path as a VLAN): scales to only a few switches.
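The limit of the per-path scheme is simple arithmetic. The 12-bit VLAN ID gives 4096 values, of which 0 and 4095 are reserved, leaving 4094. Assuming, hypothetically, k paths between every unordered pair of n edge switches and one VLAN per path, the scheme needs k·n(n−1)/2 VLANs:

```python
USABLE_VLAN_IDS = 4094  # 2**12 = 4096 IDs, minus reserved IDs 0 and 4095


def max_edge_switches(paths_per_pair, budget=USABLE_VLAN_IDS):
    """Largest n such that one VLAN per path, with `paths_per_pair` paths
    between every unordered pair of n edge switches, fits in the budget."""
    n = 1
    # Grow n while n + 1 switches, with (n + 1) * n / 2 pairs, still fit.
    while paths_per_pair * (n + 1) * n // 2 <= budget:
        n += 1
    return n


for k in (1, 4):
    print(k, max_edge_switches(k))
```

With one path per pair this caps out at 90 edge switches, and at 45 with four.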
VLAN layout, our approach: one VLAN for a set of paths.
Challenge: minimize the number of VLANs; this is NP-hard for arbitrary topologies.
VLAN layout heuristics: 1. Greedy path packing. 2. Parallel graph coloring.
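Greedy path packing can be sketched as first-fit bin packing: each path goes into the first VLAN whose links stay loop-free once the path is added, otherwise into a new VLAN. The loop-free constraint and first-fit order are assumptions for illustration; the actual heuristic may use other constraints and orderings. A union-find structure detects whether a new link would close a loop.

```python
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def stays_loop_free(vlan_edges, path):
    """True if adding `path` to the VLAN's undirected edge set leaves it a forest."""
    uf = UnionFind()
    for edge in vlan_edges:
        u, v = tuple(edge)
        uf.union(u, v)
    for u, v in zip(path, path[1:]):
        if frozenset((u, v)) in vlan_edges:
            continue  # link already in this VLAN; reusing it adds no loop
        if uf.find(u) == uf.find(v):
            return False  # u and v already connected: this link would close a loop
        uf.union(u, v)
    return True


def greedy_pack(paths):
    """First-fit: put each path into the first VLAN that stays loop-free."""
    vlans = []       # each VLAN is a set of undirected links (frozensets)
    assignment = []  # assignment[i] = VLAN index of paths[i]
    for path in paths:
        for i, edges in enumerate(vlans):
            if stays_loop_free(edges, path):
                break
        else:
            vlans.append(set())
            i = len(vlans) - 1
        vlans[i].update(frozenset(e) for e in zip(path, path[1:]))
        assignment.append(i)
    return vlans, assignment


paths = [["E1", "C1", "E2"], ["E1", "C2", "E2"], ["E2", "C1", "E3"], ["E2", "C2", "E3"]]
vlans, assignment = greedy_pack(paths)
print(len(vlans), assignment)  # 2 [0, 1, 0, 1]
```

In this hypothetical example the four paths collapse into two VLANs, one tree through each core switch.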
VLAN layout example: # VLANs = 4.
SPAIN end-host driver: hosts A and B each run the SPAIN driver.
SPAIN end-host driver: the topology and VLAN information from SPAIN's offline computation is delivered to the drivers on hosts A and B.
SPAIN end-host driver: A's flow table maps flow 1 (A→B) to RED and flow 2 (A→B) to BLUE.
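The flow table can be modeled as a map from (destination, flow) to VLAN: the first packet of a flow picks a VLAN that reaches the destination, and later packets reuse it, so a single flow is not spread across paths and reordered. The round-robin choice, class name, and integer flow IDs are hypothetical; the real driver's selection policy may differ.

```python
import itertools


class FlowTable:
    """Hypothetical per-host flow table: each new flow to a destination is
    pinned to one of the VLANs that reach it, chosen round-robin."""

    def __init__(self, vlans_to):
        # dst -> endless round-robin iterator over the VLANs that reach dst
        self.rr = {dst: itertools.cycle(v) for dst, v in vlans_to.items()}
        self.table = {}  # (dst, flow_id) -> VLAN

    def vlan_for(self, dst, flow_id):
        key = (dst, flow_id)
        if key not in self.table:          # first packet of this flow
            self.table[key] = next(self.rr[dst])
        return self.table[key]             # later packets stay on the same VLAN


ft = FlowTable({"B": ["RED", "BLUE"]})
print(ft.vlan_for("B", 1), ft.vlan_for("B", 2), ft.vlan_for("B", 1))  # RED BLUE RED
```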
Challenges: link and switch failures; pathological flooding; interoperability; host mobility; load balance; end-host state.
Failures: A's flow table maps A→B to RED; when a link or switch on RED's path fails, the flow must move to another VLAN.
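Failure handling can be sketched in the same spirit: when a VLAN stops working for a destination, the flows pinned to it move to a VLAN that still reaches it. How the driver detects the failure is not modeled here; `FailoverTable` and `vlan_failed` are hypothetical names, and choosing the first surviving VLAN is an assumption.

```python
class FailoverTable:
    """Hypothetical sketch: flows are pinned to VLANs, and when a VLAN is
    reported unusable for a destination, its flows move to a surviving VLAN."""

    def __init__(self, vlans_to):
        # dst -> ordered list of VLANs that currently reach dst
        self.vlans_to = {dst: list(v) for dst, v in vlans_to.items()}
        self.table = {}  # (dst, flow_id) -> VLAN

    def vlan_for(self, dst, flow_id):
        key = (dst, flow_id)
        if key not in self.table:
            self.table[key] = self.vlans_to[dst][0]  # simplest choice: first usable VLAN
        return self.table[key]

    def vlan_failed(self, dst, vlan):
        """Stop using `vlan` for `dst` and re-pin every flow that was on it."""
        remaining = [v for v in self.vlans_to[dst] if v != vlan]
        if not remaining:
            raise RuntimeError(f"no usable VLAN left to reach {dst}")
        self.vlans_to[dst] = remaining
        for (d, flow_id), v in list(self.table.items()):
            if d == dst and v == vlan:
                self.table[(d, flow_id)] = remaining[0]


t = FailoverTable({"B": ["RED", "BLUE", "GREEN"]})
t.vlan_for("B", 1)           # flow 1 is pinned to RED
t.vlan_failed("B", "RED")    # RED's path to B fails
print(t.vlan_for("B", 1))    # BLUE
```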
Pathological flooding: A's flow table maps A→B to RED, but B's flow table maps B→A to GREEN. Because B never sends on RED, the switches on RED do not learn the location of B and keep flooding A's frames to B.
Solution: chirping.
Chirping: A's flow table maps A→B to RED; B's flow table maps B→A to GREEN. One switch does not know the location of B; C knows the location of B.
Chirping: A's flow table maps A→B to RED, BLUE; B's flow table maps B→A to GREEN.
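The flooding problem and the chirping fix follow from standard Ethernet MAC learning: a switch learns a host's port only from frames that host sends, and it floods frames addressed to unknown destinations. The model below tracks one switch on RED. Host names stand in for MAC addresses, a chirp is modeled (as an assumption) as a broadcast frame that B sends on RED, and entry aging is not modeled.

```python
class LearningSwitch:
    """Minimal model of one Ethernet switch's MAC learning within one VLAN."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # MAC -> port

    def receive(self, src, dst, in_port):
        """Learn src's port, then return the set of output ports for the frame."""
        self.mac_table[src] = in_port
        if dst in self.mac_table:
            return {self.mac_table[dst]}
        return self.ports - {in_port}  # unknown destination: flood


red = LearningSwitch(ports=[1, 2, 3])  # port 1 toward A, port 2 toward B
# B's replies travel on GREEN, so this RED switch never learns B: every frame floods.
before = [red.receive("A", "B", in_port=1) for _ in range(2)]
red.receive("B", "ff:ff:ff:ff:ff:ff", in_port=2)  # B chirps (broadcasts) on RED
after = red.receive("A", "B", in_port=1)
print(before, after)  # [{2, 3}, {2, 3}] {2}
```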
Evaluation: simulations; real testbed.
Simulations. Topologies: CiscoDC (core switches; m = 2 aggregation modules; a = 2 access switches per module); Fat-Tree [Al-Fares et al. SIGCOMM’08] (p = 4 ports per switch); 2D HyperX [Ahn et al. SC’09] (k = 4); BCube [Guo et al. SIGCOMM’09] (p = 2 ports per switch; l = 2 levels). Metrics: #VLANs; link coverage; reliability; throughput.
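For reference, the Fat-Tree topology of Al-Fares et al. can be generated from its port count p: (p/2)² core switches and p pods, each with p/2 aggregation and p/2 edge switches, and p/2 hosts per edge switch. The node naming below is our own.

```python
def fat_tree(p):
    """Switch-to-switch links of a Fat-Tree built from p-port switches (p even)."""
    assert p % 2 == 0, "port count must be even"
    half = p // 2
    core = [f"core{i}" for i in range(half * half)]
    links = []
    for pod in range(p):
        aggs = [f"agg{pod}.{i}" for i in range(half)]
        edges = [f"edge{pod}.{i}" for i in range(half)]
        # Inside a pod, every edge switch connects to every aggregation switch.
        links += [(e, a) for e in edges for a in aggs]
        # Aggregation switch i connects to core switches i*half .. i*half + half - 1.
        for i, a in enumerate(aggs):
            links += [(a, core[i * half + j]) for j in range(half)]
    switches = len(core) + p * p  # core switches plus p switches in each of p pods
    hosts = p * half * half       # p/2 hosts on each of the p * p/2 edge switches
    return switches, hosts, links


switches, hosts, links = fat_tree(4)
print(switches, hosts, len(links))  # 20 16 32
```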
[Figure: number of VLANs (#VLANs vs. #switches) for CiscoDC (8,8), Fat-Tree (48), HyperX (16), BCube (48,2).]
Throughput improvement over STP: CiscoDC 2x; Fat-Tree 24x; HyperX 10.5x; BCube 1.6x.
OpenCirrus Experiments
OpenCirrus testbed: 80 blades; rack switches (RS) and core switches (CS); 1G and 10G links.
OpenCirrus testbed: 80 blades; two core switches (CS) and switches S1, S2, S3.
OpenCirrus testbed (switches CS, CS, S1, S2, S3): 10G links that we added.
OpenCirrus testbed (switches CS, CS, S1, S2, S3): 4 VLANs.
Shuffle-like experiment: every server transfers 500 MB of data to all other servers.
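The shuffle-like workload is an all-to-all pattern. A minimal generator follows, assuming that all 80 blades take part and that 500 MB (with 1 MB = 10^6 bytes) goes to each other server; the slide does not pin down either detail, and the blade names are hypothetical.

```python
def shuffle_transfers(servers, bytes_per_pair):
    """All-to-all shuffle: one transfer for every ordered (src, dst) pair."""
    return [(s, d, bytes_per_pair) for s in servers for d in servers if s != d]


servers = [f"blade{i}" for i in range(80)]
transfers = shuffle_transfers(servers, 500 * 10**6)
total_bytes = sum(b for _, _, b in transfers)
print(len(transfers), total_bytes)
```

Under these assumptions the run moves 80 × 79 = 6320 transfers, about 3.16 TB in total.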
Spanning Tree Protocol (STP) on the testbed (switches CS, CS, S1, S2, S3).
[Figure: link utilization in each direction over time, from 0% to 100%.]
STP on the testbed (switches CS, CS, S1, S2, S3): some links overloaded, others unused.
SPAIN on the testbed (switches CS, CS, S1, S2, S3): no bottlenecks.
Completion times: ~50% reduction.
Aggregate goodput (Gbps): 87% improvement.
Incremental deployability. [Figure: aggregate goodput (Gbps) vs. % of SPAIN hosts.]
Single shortest path (SSP), as in SEATTLE/TRILL, on the testbed (switches CS, CS, S1, S2, S3): all flows on RED, all flows on GREEN, or all flows on GRAY. SEATTLE/TRILL-style SSP is emulated on unmodified switches with SPAIN.
Comparison with SSP: 16% better; 7% better.
SPAIN take-away: unmodified L2 switches; multi-pathing via VLANs; arbitrary topologies; minor end-host modifications. A low-cost, high-BW DC fabric today!
Q&A