Generic and Automatic Address Configuration for Data Center Networks 1Kai Chen, 2Chuanxiong Guo, 2Haitao Wu, 3Jing Yuan, 4Zhenqian Feng, 1Yan Chen, 5Songwu Lu, 6Wenfei Wu 1Northwestern University, 2Microsoft Research Asia, 3Tsinghua, 4NUDT, 5UCLA, 6BUAA SIGCOMM 2010, New Delhi, India
Motivation Address autoconfiguration is desirable in networked systems Manual configuration is error-prone 50%-80% of network outages are due to manual configuration DHCP for layer-2 Ethernet autoconfiguration Address autoconfiguration in data centers (DC) has become a problem Applications need locality information for computation New DC designs encode topology information for routing DHCP is not enough: it carries no such locality/topology information
Research Problem Given a new/generic DC, how to autoconfigure the addresses for all the devices in the network? DAC: data center address autoconfiguration
Outline Motivation Research Problem DAC Implementation and Experiments Simulations Conclusion
DAC Input Blueprint Graph (Gb) A DC graph with logical IDs (e.g., 10.0.0.3) Logical ID can be in any format Available earlier and can be automatically generated Physical Topology Graph (Gp) A DC graph with device IDs (e.g., 00:19:B9:FA:88:E2) Device ID can be a MAC address Not available until the DC is built and the topology is collected
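A minimal sketch of the two inputs, assuming each graph is stored as a Python dict of adjacency sets (a hypothetical representation, not the authors' data format). All IDs other than 10.0.0.3 and 00:19:B9:FA:88:E2 are made up for illustration, and `may_match` is only a cheap necessary condition for a device-to-logical ID mapping to exist, not the mapping itself.

```python
# Blueprint graph Gb: nodes are logical IDs (illustrative values).
GB = {
    "10.0.0.1": {"10.0.0.2", "10.0.0.3"},
    "10.0.0.2": {"10.0.0.1"},
    "10.0.0.3": {"10.0.0.1"},
}

# Physical topology graph Gp: nodes are device IDs (here MAC addresses).
GP = {
    "00:19:B9:FA:88:E2": {"00:19:B9:FA:88:E0"},
    "00:19:B9:FA:88:E0": {"00:19:B9:FA:88:E2", "00:19:B9:FA:88:E1"},
    "00:19:B9:FA:88:E1": {"00:19:B9:FA:88:E0"},
}


def degree_sequence(graph):
    """Sorted list of node degrees; identical for isomorphic graphs."""
    return sorted(len(neighbors) for neighbors in graph.values())


def may_match(gb, gp):
    """Necessary (not sufficient) condition for a one-to-one mapping."""
    return len(gb) == len(gp) and degree_sequence(gb) == degree_sequence(gp)


print(may_match(GB, GP))
```

If this check fails, no mapping can exist and the physical topology must differ from the blueprint; if it passes, a full isomorphism search is still required.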
DAC System Framework Malfunction Detection Physical Topology Collection Device-to-logical ID Mapping Logical ID Dissemination
Two Main Challenges Challenge 1: Device-to-logical ID Mapping Assign a logical ID to each device, preserving the topological relationship between devices Challenge 2: Malfunction Detection Detect the malfunctioning devices if the physical topology is not the same as the blueprint (NP-complete and even APX-hard)
Malfunction Detection Roadmap Malfunction Detection Physical Topology Collection Device-to-logical ID Mapping Logical ID Dissemination
Device-to-logical ID Mapping How to preserve the topological relationship? Abstract DAC mapping into the Graph Isomorphism (GI) problem The GI problem is hard: complexity (P or NPC) is unknown Introduce O2: a one-to-one mapping for DAC O2 Base Algorithm and O2 Optimization Algorithm Adopt and improve techniques from graph theory
O2 Base Algorithm Gb: {l1 l2 l3 l4 l5 l6 l7 l8} Gp: {d1 d2 d3 d4 d5 d6 d7 d8} Decomposition Gb: {l1} {l2 l3 l4 l5 l6 l7 l8} Gp: {d1} {d2 d3 d4 d5 d6 d7 d8} Refinement Gb: {l1} {l5} {l2 l3 l4 l6 l7 l8} Gp: {d1} {d2 d3 d5 d7} {d4 d6 d8}
O2 Base Algorithm Gb: {l1 l2 l3 l4 l5 l6 l7 l8} Gp: {d1 d2 d3 d4 d5 d6 d7 d8} Decomposition Gb: {l5} {l1 l2 l3 l4 l6 l7 l8} Gp: {d1} {d2 d3 d4 d5 d6 d7 d8} Refinement Gb: {l5} {l1 l2 l7 l8} {l3 l4 l6 } Gp: {d1} {d2 d3 d5 d7} {d4 d6 d8} Refinement Gb: {l5} {l1 l2 l7 l8} {l6} {l3 l4} Gp: {d1} {d2 d3 d5 d7} {d6} {d4 d8}
O2 Base Algorithm Refinement Gb: {l5} {l6} {l1 l2} {l7 l8} {l3 l4} Gp: {d1} {d6} {d2 d7} {d3 d5} {d4 d8} Decomposition Gb: {l5} {l6} {l1} {l2} {l7 l8} {l3 l4} Gp: {d1} {d6} {d2} {d7} {d3 d5} {d4 d8} Decomposition & Refinement Gb: {l5} {l6} {l1} {l2} {l7} {l8} {l3} {l4} Gp: {d1} {d6} {d2} {d7} {d3} {d5} {d4} {d8}
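The decomposition and refinement steps walked through above can be sketched as a small backtracking search. This is a simplified illustration under stated assumptions, not the authors' O2 code: graphs are dicts of adjacency sets, the function names `refine` and `find_mapping` are hypothetical, and none of the O2 optimizations are included. A final edge-by-edge check guards the returned mapping.

```python
def refine(graph, cells):
    """Refinement: split cells by neighbor counts until no cell splits."""
    cells = [list(c) for c in cells]
    changed = True
    while changed:
        changed = False
        for splitter in cells:
            members = set(splitter)
            for idx, cell in enumerate(cells):
                if len(cell) == 1:
                    continue
                groups = {}
                for node in cell:
                    groups.setdefault(len(graph[node] & members), []).append(node)
                if len(groups) > 1:
                    # Replace the cell by its sub-cells, ordered by count.
                    cells[idx:idx + 1] = [groups[k] for k in sorted(groups)]
                    changed = True
                    break
            if changed:
                break
    return cells


def find_mapping(gb, gp, cb=None, cp=None):
    """Return a dict logical ID -> device ID, or None if no mapping exists."""
    if cb is None:
        if len(gb) != len(gp):
            return None
        if not gb:
            return {}
        cb, cp = [sorted(gb)], [sorted(gp)]
    cb, cp = refine(gb, cb), refine(gp, cp)
    if [len(c) for c in cb] != [len(c) for c in cp]:
        return None  # partitions diverged: this branch cannot succeed
    for i, cell in enumerate(cb):
        if len(cell) > 1:
            # Decomposition: fix u in Gb, try each candidate v in Gp.
            u, rest_b = cell[0], cell[1:]
            for v in cp[i]:
                rest_p = [x for x in cp[i] if x != v]
                found = find_mapping(
                    gb, gp,
                    cb[:i] + [[u], rest_b] + cb[i + 1:],
                    cp[:i] + [[v], rest_p] + cp[i + 1:],
                )
                if found is not None:
                    return found
            return None
    # All cells are singletons: read off the mapping and verify every edge.
    mapping = {c[0]: d[0] for c, d in zip(cb, cp)}
    ok = all({mapping[n] for n in gb[u]} == gp[mapping[u]] for u in gb)
    return mapping if ok else None


# Usage on two 4-node paths with different labels (illustrative IDs).
PATH_B = {"l1": {"l2"}, "l2": {"l1", "l3"}, "l3": {"l2", "l4"}, "l4": {"l3"}}
PATH_P = {"d1": {"d3"}, "d3": {"d1", "d4"}, "d4": {"d3", "d2"}, "d2": {"d4"}}
print(find_mapping(PATH_B, PATH_P))
```

Trying each candidate in turn when a mapping fails is the iterative behavior the base algorithm is described as having; the sketch makes no attempt to avoid it.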
O2 Base Algorithm The O2 base algorithm is very slow because of 3 problems: P1: Iterative splitting in Refinement: it tries to use each cell to split every other cell iteratively (Gp: π1 π2 π3 … πn-1 πn) P2: Iterative mapping in Decomposition: when the current mapping fails, it iteratively selects the next node as a candidate for mapping P3: Random selection of mapping candidate: no explicit hint for how to select a candidate for mapping
O2 Optimization Algorithm Heuristics based on DC topology features Sparse => Selective Splitting (for Problem 1) R1: A cell cannot split another cell that is disjoint from it. Symmetric => Candidate Filtering via Orbit (for Problem 2) R2: If u in Gb cannot be mapped to v in Gp, then no node in the same orbit as u can be mapped to v either. Asymmetric => Candidate Selection via SPLD (Shortest Path Length Distribution) (for Problem 3) R3: Two nodes u in Gb and v in Gp cannot be mapped to each other if they have different SPLDs. We propose the last one and adopt the first two from graph theory
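Candidate selection via SPLD can be sketched as follows. This is a hedged illustration, assuming unweighted graphs stored as dicts of adjacency sets and taking the SPLD of a node to be the tuple of node counts at each hop distance (distance 0 included); the names `spld` and `spld_candidates` are hypothetical.

```python
from collections import Counter, deque


def spld(graph, source):
    """Shortest path length distribution of `source`, computed by BFS.

    Entry d of the result is the number of nodes at hop distance d.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    counts = Counter(dist.values())
    return tuple(counts[d] for d in range(max(counts) + 1))


def spld_candidates(gb, gp, u):
    """Devices in Gp whose SPLD equals that of logical node u in Gb."""
    target = spld(gb, u)
    return sorted(v for v in gp if spld(gp, v) == target)


# Usage on two 4-node paths with different labels (illustrative IDs).
PATH_B = {"l1": {"l2"}, "l2": {"l1", "l3"}, "l3": {"l2", "l4"}, "l4": {"l3"}}
PATH_P = {"d1": {"d3"}, "d3": {"d1", "d4"}, "d4": {"d3", "d2"}, "d2": {"d4"}}
print(spld_candidates(PATH_B, PATH_P, "l1"))
```

Only devices that survive this filter need to be tried during decomposition, which is how the heuristic narrows the search.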
Speed of O2 Mapping [Figure: mapping time; callouts: 12.4 hours, 8.9 seconds]
Malfunction Detection Roadmap Malfunction Detection Physical Topology Collection Device-to-logical ID Mapping Logical ID Dissemination
Malfunction Detection Types of Malfunctions Node failure, Link failure, Miswiring Effects of Malfunctions O2 cannot find device-to-logical ID mapping Our Goal Detect malfunctioning devices Problem Complexity An ideal solution Find Maximum Common Subgraph (MCS) between Gb and Gp say Gmcs Remove Gmcs from Gp => the rest are malfunctions MCS is NP-complete and even APX-hard
Practical Solution Observations Most node/link failures and miswirings cause node degree change Special, rare miswirings happen without degree change Our Idea Degree change case: exploit the degree regularity in DC Devices in DC have regular degrees (common sense) No degree change case: probe sub-graphs derived from anchor points, and correlate the miswired devices using majority voting Select anchor point pairs from the 2 graphs Probe sub-graphs iteratively; stop when the k-hop subgraphs are isomorphic but the (k+1)-hop subgraphs are not, then increase the counters for the k- and (k+1)-hop nodes Output node counter list: high counter => more likely to be miswired [Figure: subgraphs probed from an anchor pair; hops 1, 2, …, k isomorphic; hop k+1 non-isomorphic]
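The degree-change case can be illustrated with a very small check. This is a simplified sketch under an assumption of ours, namely that a device is suspicious when its degree does not occur anywhere in the blueprint; the talk does not spell out the exact rule, and `degree_suspects` is a hypothetical name. The anchor-point probing used for the no-degree-change case is not covered here.

```python
def degree_suspects(gb, gp):
    """Devices in Gp whose degree never occurs in the blueprint Gb."""
    expected = {len(neighbors) for neighbors in gb.values()}
    return sorted(
        device for device, neighbors in gp.items()
        if len(neighbors) not in expected
    )


# Blueprint: one switch with three servers (degrees 3 and 1).
BLUEPRINT = {
    "sw": {"h1", "h2", "h3"},
    "h1": {"sw"}, "h2": {"sw"}, "h3": {"sw"},
}
# Physical topology with one link missing (illustrative device IDs).
PHYSICAL = {
    "d0": {"d1", "d2"},
    "d1": {"d0"}, "d2": {"d0"}, "d3": set(),
}
print(degree_suspects(BLUEPRINT, PHYSICAL))
```

Both endpoints of the missing link are reported, since the switch drops to degree 2 and the disconnected server to degree 0.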
Simulations on Miswiring Detection Over data centers with tens of thousands of devices, with 1.5% of nodes as anchor points to identify all hardest-to-detect miswirings
Malfunction Detection Roadmap Malfunction Detection Physical Topology Collection Device-to-logical ID Mapping Logical ID Dissemination
Basic DAC Protocols CBP: Communication Channel Building Protocol Top-Down, from root to leaves PCP: Physical Topology Collection Protocol Bottom-Up, from leaves to root LDP: Logical ID Dissemination Protocol DAC manager: handles all the intelligence; can be any server in the network
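The top-down and bottom-up directions of CBP and PCP can be sketched in a single-process simulation. This is only an illustration under assumptions: the real protocols exchange messages between devices, whereas here the links are a dict of adjacency sets, the tree is built by BFS from the DAC manager, and the names `build_channel_tree` and `collect_topology` are hypothetical.

```python
from collections import deque


def build_channel_tree(links, manager):
    """CBP-like step: build a spanning tree top-down from the manager."""
    parent = {manager: None}
    order = [manager]  # BFS order: every parent precedes its children
    queue = deque([manager])
    while queue:
        node = queue.popleft()
        for neighbor in sorted(links[node]):
            if neighbor not in parent:
                parent[neighbor] = node
                order.append(neighbor)
                queue.append(neighbor)
    return parent, order


def collect_topology(links, parent, order):
    """PCP-like step: merge local neighbor lists bottom-up to the root."""
    reports = {node: {node: set(links[node])} for node in order}
    for node in reversed(order):  # children are visited before parents
        if parent[node] is not None:
            reports[parent[node]].update(reports[node])
    return reports[order[0]]


# Usage: server "m" acts as the DAC manager (illustrative device IDs).
LINKS = {
    "m": {"s1"},
    "s1": {"m", "a", "b"},
    "a": {"s1"},
    "b": {"s1"},
}
parent, order = build_channel_tree(LINKS, "m")
print(collect_topology(LINKS, parent, order) == LINKS)
```

The collected dict is the physical topology graph Gp that the manager would then feed into the mapping step.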
Implementation and Experiments Over a BCube(8,1) network with 64 servers Communication Channel Building (CCB) Transition time Physical Topology Collection (TC) Device-to-logical ID Mapping Logical IDs Dissemination (LD) The total time used: 275 milliseconds
Simulations Over large-scale data centers (in milliseconds) 46 seconds for the DCell(6, 3) with 3.8+ million devices
Summary DAC: address autoconfiguration for generic data center networks, especially when the address is topology-aware Graph isomorphism for address configuration 275 ms for a 64-server BCube, and 46 s for a DCell with 3.8+ million devices Anchor point probing for malfunction detection with 1.5% of nodes as anchor points to identify all hardest-to-detect miswirings DAC is a small step towards the more ambitious goal of automanagement of whole data centers
Q & A? Thanks!