Masking Failures from Application Performance in Data Center Networks with Shareable Backup Dingming Wu+, Yiting Xia+*, Xiaoye Steven Sun+, Xin Sunny Huang+, Simbarashe Dzinamarira+, T. S. Eugene Ng+ +Rice University, *Facebook, Inc. 11/16/2018

Data Center Networks Should Be Reliable, but…

Network Failures are Disruptive
--Median case of failures: 10% less traffic delivered
--Worst 20% of failures: 40% less traffic delivered
(Gill et al., SIGCOMM 2011)

Today’s Failure Handling---Rerouting
--Fast local rerouting → inflated path length
--Global optimal rerouting → high latency of route updates
--Impacts flows not traveling through the failure location

Impact on Coflow Completion Time (CCT)
--Facebook coflow trace, k = 16 fat-tree network, global optimal rerouting

Do We Have Other Options?
--Restore network capacity immediately after failure
--Be cost efficient: small pool of backup switches
How do we achieve that?

Circuit Switches
--Physical-layer device; circuits controlled by software
--Examples: optical 2D-MEMS switch (40us, $10 per-port cost); electrical cross-point switch (70ns, $3 per-port cost)
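A software-controlled circuit switch boils down to a rewritable port-to-port mapping. The toy model below (an illustration, not code from the talk) shows the essential operation: reconfiguration simply rewrites which pairs of ports are cross-connected.

```python
class CircuitSwitch:
    """Toy model of a software-controlled circuit switch."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.peer = {}  # port -> port it is currently cross-connected to

    def connect(self, a, b):
        """Set up a bidirectional circuit between ports a and b,
        tearing down any circuit either port was part of."""
        for p in (a, b):
            if p in self.peer:
                del self.peer[self.peer[p]]
        self.peer[a] = b
        self.peer[b] = a

    def other_end(self, port):
        """Return the port this port is circuit-connected to, if any."""
        return self.peer.get(port)


# A 4-port switch initially connects ports 0-1; on a failure event it is
# reconfigured to 0-2 (e.g. steering traffic toward a backup switch).
sw = CircuitSwitch(4)
sw.connect(0, 1)
sw.connect(0, 2)  # port 1 is disconnected; 0 now maps to 2
```

The reconfiguration delay of the real device (40us optical, 70ns electrical, per the slide) is what bounds failover speed in this design.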

Ideal Architecture
--Entire network shares one backup switch, which replaces any failed switch when necessary
--But: unreasonably high port count for the circuit switch, and a single point of failure

How to Make It Practical
Feasibility
-small port-count circuit switches
Scalability
-partition network into failure groups
-distribute circuit switches across the network
Low cost
-small backup pool
-share backup switches within each failure group

ShareBackup Architecture
--An original fat-tree with k = 6: edge layer, aggregation layer, core layer
--Partition the switches into failure groups, each with k/2 switches
--Add backup switches per failure group
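The partition above is simple arithmetic over a k-ary fat-tree. The sketch below (my own back-of-the-envelope calculation, assuming groups of k/2 switches in every layer and one backup switch per group, as the cost slide later suggests) counts the resulting groups and backups:

```python
def failure_groups(k):
    """Count switches, failure groups, and backups in a k-ary fat-tree,
    assuming groups of k/2 switches and one backup per group."""
    edge = k * k // 2   # k pods, k/2 edge switches each
    agg = k * k // 2    # k pods, k/2 aggregation switches each
    core = k * k // 4
    group_size = k // 2
    groups = edge // group_size + agg // group_size + core // group_size
    return {
        "switches": edge + agg + core,
        "groups": groups,
        "backups": groups,  # one backup per failure group
    }


print(failure_groups(6))  # the k = 6 fat-tree shown on the slide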

Edge Layer
[Figure: the edge switches of a failure group and their shared backup switch connect to the servers through circuit switches]

Aggregation Layer
[Figure: the aggregation switches of a failure group and their backup switch connect to the edge layer through circuit switches]

Core Layer
[Figure: the core switches of a failure group and their backup switch connect to the aggregation layer through circuit switches]

Recover First, Diagnose Later
Failure recovery
--switch failure: the switch is replaced by a backup via circuit reconfiguration
--link failure: the switches on both sides are replaced
Automatic failure diagnosis is performed offline
--details in the paper
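The recover-first policy can be sketched as a small event handler (illustrative names and data structures, not the paper's implementation): on any failure report the controller immediately swaps the suspect switch(es) to backups via circuit reconfiguration, and only afterwards queues them for offline diagnosis.

```python
from collections import deque


class RecoveryController:
    """Sketch of 'recover first, diagnose later'."""

    def __init__(self, backups):
        self.free_backups = list(backups)
        self.replacement = {}           # failed switch -> backup impersonating it
        self.diagnosis_queue = deque()  # examined offline, off the critical path

    def _replace(self, switch):
        if switch in self.replacement or not self.free_backups:
            return
        backup = self.free_backups.pop()
        # In the real system, circuit switches are reconfigured here so the
        # backup takes over the failed switch's links.
        self.replacement[switch] = backup
        self.diagnosis_queue.append(switch)

    def on_switch_failure(self, switch):
        self._replace(switch)

    def on_link_failure(self, a, b):
        # The faulty end of the link is unknown, so both endpoints are replaced.
        self._replace(a)
        self._replace(b)


ctrl = RecoveryController(backups=["B0", "B1"])
ctrl.on_link_failure("E0", "E1")  # both ends swapped out; diagnosed offline later
```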

Live Impersonation of Failed Switch
[Figure: edge switches and the backup switch, each holding Routing Table 0/VLAN 0, Routing Table 1/VLAN 1, and Routing Table 2/VLAN 2]
Originally, each switch has a different routing table: switch 0 has routing table 0, switch 1 has routing table 1, and so on. In ShareBackup, every switch in a failure group carries all of the group's routing tables and uses VLAN ids to differentiate them, so the backup can act as any failed switch without a routing update.
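In data-structure terms, impersonation means every group member indexes a set of preloaded routing tables by VLAN id. The sketch below uses a hypothetical data layout to make the idea concrete: traffic steered to the backup arrives tagged with the failed switch's VLAN, and the backup forwards exactly as that switch would have.

```python
class GroupSwitch:
    """A switch in a failure group: holds all of the group's routing
    tables, keyed by VLAN id (hypothetical layout for illustration)."""

    def __init__(self, tables):
        self.tables = tables  # tables[vlan_id] = routing table of switch vlan_id

    def next_hop(self, vlan_id, dst):
        # Select the routing table of the switch being impersonated,
        # then do an ordinary destination lookup.
        return self.tables[vlan_id][dst]


tables = {
    0: {"10.0.0.0/24": "port1"},  # routing table of switch 0
    1: {"10.0.0.0/24": "port2"},  # routing table of switch 1
}
backup = GroupSwitch(tables)
# After switch 1 fails, the circuit switches steer its traffic (VLAN 1)
# to the backup, which now forwards as switch 1 did.
```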

What Does the Control System Do?
--Collects keep-alive messages & link status reports from switches
--Reconfigures circuit switches under failures
--Performs offline failure diagnosis
Implications
-needs to talk to many circuit switches and packet switches
-keeps a large amount of circuit/switch/link state

Distributed Control System
One controller per failure group of k/2 switches
--configures the circuit switches adjacent to the switches in its group
Maintains only the local circuit configurations of its group
--does not share state with other controllers
Talks to circuit switches over an out-of-band control network
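Because each controller watches only its own k/2 switches, failure detection stays small and local. The sketch below (assumed timeout value and names; the paper's controller details differ) shows a per-group controller flagging switches that miss keep-alives:

```python
TIMEOUT = 3.0  # assumed: seconds without a keep-alive before a switch is suspect


class GroupController:
    """Per-failure-group controller: tracks keep-alives for only the
    k/2 switches in its own group (local state, nothing shared)."""

    def __init__(self, switches):
        self.last_seen = {s: 0.0 for s in switches}

    def keep_alive(self, switch, now):
        self.last_seen[switch] = now

    def suspects(self, now):
        """Switches whose last keep-alive is older than TIMEOUT."""
        return [s for s, t in self.last_seen.items() if now - t > TIMEOUT]


gc = GroupController(["A0", "A1", "A2"])
gc.keep_alive("A0", now=10.0)
gc.keep_alive("A1", now=12.0)
# A2 never reported; by t = 14, A0 is also past the 3 s timeout.
```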

Summary
Fast failure recovery
--as fast as the underlying circuit switching technology
Live impersonation
--traffic is redirected to the backups in the physical layer
--switches in a failure group have the same routing tables, using VLAN ids for differentiation
--regular switches recovered from failures become backup switches themselves
Fast failure recovery, no path dilation, no routing disturbance

Evaluation
Bandwidth advantage
--iPerf throughput on testbed
Application performance
--MapReduce job completion time

Bandwidth Advantage
--4 racks, 8 servers, 12 switches; 8 iPerf flows saturate the network core
--ShareBackup restores the network to full capacity regardless of failure locations

Application Performance
[Chart: job completion time for MapReduce Sort w/ 100GB input data; labeled speedups of 4.2X and 1.2X]
ShareBackup preserves application performance under failures!

Extra Cost
Small port-count circuit switches: very inexpensive
--e.g. $3 per-port cost for cross-point switches
Small backup switch pool
--1 backup per failure group is usually enough
--k = 48 fat-tree with 27648 servers → ~6.7% extra network cost
Partial deployment
--failures are more destructive at the edge layer
--employ backups only for ToR failures
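The deployment numbers above are easy to sanity-check. The snippet below is my own back-of-the-envelope arithmetic: a k = 48 fat-tree hosts k³/4 servers, and one backup per failure group of k/2 switches adds only 1/(k/2) extra packet switches. (The slide's ~6.7% figure is the paper's total, which also counts circuit-switch ports and other costs.)

```python
# Back-of-the-envelope check for a k = 48 fat-tree with one backup
# switch per failure group of k/2 switches (my arithmetic, not the
# paper's full cost model).
k = 48
servers = k**3 // 4            # servers in a k-ary fat-tree
switches = 5 * k * k // 4      # edge + aggregation + core switches
backups = switches // (k // 2) # one backup per group of k/2

print(servers, switches, backups, backups / switches)
```

This confirms the 27648-server figure and shows the backup pool itself is only ~4% of the packet switches; circuit switches make up the rest of the quoted overhead.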

Conclusion
ShareBackup: an architectural solution for failure recovery in DCNs
--uses circuit switching for fast failover
--is an economical approach to using backups in networks
--preserves application performance under failures
Key takeaways
--rerouting is not the only approach to failure recovery
--fast, transparent failure recovery is possible through careful backup placement & fast circuit switching

Backup---Control System Failures
Since ShareBackup uses a separate control network for failure recovery, it must handle potential failures in the control system itself.
Circuit switch software failure/control channel failure
--circuit switches become unresponsive
--keep existing circuit configurations; the data plane is not impacted
--fall back to rerouting
Hardware/power failure
--controller will receive lots of failure reports in a short time
--call for human intervention
Controller failure
--state replication on shadow controllers

Backup---Offline Failure Diagnosis
[Figure: suspect aggregation and edge switches chained up through the circuit switches for testing]
--Chain up circuit switches using side ports
--Recycle the healthy switch: if only one switch has failed, the other is back to normal after reboot
