Per-packet Load-balanced, Low-Latency Routing for Clos-based Data Center Networks. Jiaxin Cao, Rui Xia, Pengkun Yang, Chuanxiong Guo, Guohan Lu, Lihua Yuan, Yixin Zheng, Haitao Wu, Yongqiang Xiong, Dave Maltz. December 10, 2013, Santa Barbara, California
Outline: Background; DRB for load balancing and low latency; DRB for 100% bandwidth utilization; DRB latency modeling; Routing design and failure handling; Evaluations; Related work; Conclusion
Clos-based DCN: background. Topology; routing with equal-cost multi-path (ECMP). Given a spine switch, there is only one path from a source to a destination in a fat-tree.
Clos-based DCN: issues. Low network utilization, due to flow-based hash collisions in ECMP. High network latency: the latency tail results in high user-perceived latency, and many DC applications use thousands or more TCP connections.
Network latency measurement. [Measurement figure; annotated latencies: 400us, 1.5ms, 2ms.] Network latency has a long tail; busy servers do not contribute to the long latency tail; the server network stack increases latency by several hundred us.
Where the latency tail comes from: a (temporarily) congested switch port can use several MB for packet buffering; a 1MB buffer introduces about 1ms of latency on a 10G link; for a three-layer DCN, intra-DC communications take up to 5 hops.
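As a back-of-the-envelope check of the 1MB / 1ms figure (my own arithmetic, not from the deck): $\frac{1\,\mathrm{MB}}{10\,\mathrm{Gbps}} = \frac{8\times10^{6}\,\mathrm{bits}}{10^{10}\,\mathrm{bits/s}} = 0.8\,\mathrm{ms} \approx 1\,\mathrm{ms}$ per congested port, so a packet crossing up to 5 hops can in the worst case accumulate several milliseconds of queuing.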
The challenge: given a full bisection bandwidth Clos network, achieve 100% bandwidth utilization and 0 in-network latency. There are many ways to improve, but none addresses the challenge fully, e.g., traffic engineering for better bandwidth utilization, or ECN for latency mitigation. Our answer: DRB.
Digit-reversal bouncing (DRB). It is the right time for per-packet routing: regular Clos topologies, server software stacks under our control, and switches becoming open and programmable. DRB achieves 100% bandwidth utilization by per-packet routing, achieves small queuing delay through its "digit-reversal" algorithm, and can be readily implemented.
Achieve 100% bandwidth utilization. Sufficient condition for 100% utilization: in a fat-tree network, given an arbitrary feasible traffic matrix, if a routing algorithm evenly spreads the traffic $a_{i,j}$ from server $i$ to server $j$ among all the possible uplinks at every layer, then no link, including any downlink, is overloaded. The condition implies: oblivious load-balancing (no need for the traffic matrix); packet bouncing (only uplinks need to be load-balanced); load-balancing per source-destination pair instead of per flow.
DRB for fat-tree. Many ways to meet the sufficient condition: RB (random bouncing), RRB (round-robin bouncing), and DRB. DRB bouncing switch selection (the digit-reversed packet sequence number picks the spine switch):
Seq  Digit-reversal  Spine switch
00   00              3.0
01   10              3.2
10   01              3.1
11   11              3.3
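A minimal sketch of the digit-reversal selection, reproducing the table above (illustrative code, not from the deck; the per source-destination packet counter and the base/digit parameters are assumptions drawn from the example with 4 spine switches):

```python
def digit_reverse(seq, base, num_digits):
    """Reverse the base-`base` digits of `seq` (e.g. binary 01 -> 10)."""
    rev = 0
    for _ in range(num_digits):
        seq, digit = divmod(seq, base)
        rev = rev * base + digit
    return rev

class DrbSelector:
    """Keeps a per source-destination packet counter; the digit-reversed
    counter value picks the spine (bouncing) switch for the next packet."""
    def __init__(self, base, num_digits):
        self.base = base
        self.num_digits = num_digits
        self.num_spines = base ** num_digits
        self.counters = {}  # (src, dst) -> next packet sequence number

    def next_spine(self, src, dst):
        seq = self.counters.get((src, dst), 0)
        self.counters[(src, dst)] = (seq + 1) % self.num_spines
        return digit_reverse(seq, self.base, self.num_digits)

# Example matching the table (4 spine switches, 2 binary digits):
sel = DrbSelector(base=2, num_digits=2)
print([sel.next_spine("A", "B") for _ in range(4)])  # [0, 2, 1, 3] -> spines 3.0, 3.2, 3.1, 3.3
```

The digit reversal spreads consecutive packets across spine switches as far apart as possible, which is what keeps the per-port queues short compared with plain round-robin.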
Queuing latency modeling. [Figures: first-hop queue length vs. traffic load with 24-port switches; first-hop queue length vs. switch port number at traffic load 0.95.] DRB and RRB achieve bounded queue lengths as the load approaches 100%. The queue length of RRB grows in proportion to $n^2$ (where $n$ is the switch port number), while the queue length of DRB stays very small (2-3 packets).
DRB for VL2. Given a spine switch, there are multiple paths between a source and a destination in VL2, so DRB splits each spine switch into multiple "virtual spine switches", each corresponding to a single path, and applies digit-reversal bouncing over the virtual spine switches.
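A tiny sketch of the splitting idea (my own illustration of what a "virtual spine switch" could be, assuming one virtual spine per distinct upward path through a physical spine; names are hypothetical):

```python
def virtual_spines(spine_to_aggs):
    """One virtual spine per (physical spine, aggregation switch) pair,
    i.e. per distinct upward path to that spine."""
    return [(spine, agg) for spine, aggs in spine_to_aggs.items() for agg in aggs]

# Hypothetical VL2 slice: two intermediate switches, each reachable via two aggregation switches.
print(virtual_spines({"I1": ["A1", "A2"], "I2": ["A1", "A2"]}))
# -> [('I1', 'A1'), ('I1', 'A2'), ('I2', 'A1'), ('I2', 'A2')]: four virtual spines for DRB to bounce through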
DRB routing and failure handling. Servers choose the bouncing switch for each packet; switches use static routing. Switches are programmed to maintain an up-to-date network topology, and they leverage the topology to minimize broadcast messages.
Simulation: network utilization. Simulation setup: packet-level simulation with NS3; three-layer fat-tree and VL2 topologies with 3000+ servers; permutation traffic pattern; TCP as the transport protocol with a 256KB buffer size; a resequencing buffer for out-of-order packet arrivals.
Simulation: queuing delay. RRB results in large queuing delay at the first and fourth hops. DRB achieves the smallest queuing delay even though its throughput is the highest.
Simulations: out-of-order arrivals. Resequencing delay is defined as the time a packet stays in the resequencing buffer. RB's resequencing delay is the worst; resequencing delay is not directly related to queuing delay. DRB achieves a very small number of out-of-order packet arrivals.
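A minimal sketch of a receive-side resequencing buffer, to make the delay definition concrete (illustrative only, assuming each packet carries a per-source sequence number; not the deck's actual implementation). Packets are held until all lower sequence numbers have been delivered, and the resequencing delay is the time spent waiting in the buffer:

```python
import heapq

class ResequencingBuffer:
    """Holds out-of-order packets and releases them in sequence-number order."""
    def __init__(self):
        self.expected = 0   # next sequence number to deliver
        self.heap = []      # (seq, arrival_time, packet) for packets that arrived early

    def on_arrival(self, seq, packet, now):
        delivered = []
        if seq == self.expected:
            delivered.append((packet, 0.0))          # in order: zero resequencing delay
            self.expected += 1
            # Drain any buffered packets that are now in order.
            while self.heap and self.heap[0][0] == self.expected:
                _, t_arr, p = heapq.heappop(self.heap)
                delivered.append((p, now - t_arr))   # resequencing delay = wait in buffer
                self.expected += 1
        elif seq > self.expected:
            heapq.heappush(self.heap, (seq, now, packet))
        return delivered

buf = ResequencingBuffer()
print(buf.on_arrival(1, "p1", now=0.0))  # out of order -> held, nothing delivered
print(buf.on_arrival(0, "p0", now=0.3))  # delivers p0 (delay 0.0) and p1 (delay 0.3)
```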
Implementation and testbed. Servers: perform IP-in-IP packet encapsulation for each source-destination pair at the sending side, and packet re-sequencing at the receiving side. Switches: IP-in-IP packet decapsulation and topology maintenance. Testbed: a three-layer fat-tree with 54 servers; each switch has 6 ports.
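A rough illustration of the server-side IP-in-IP encapsulation using Scapy (an assumption-based sketch, not the deck's kernel implementation; all addresses are hypothetical): the outer destination is the bouncing switch chosen by DRB for this packet, and the inner header carries the real destination. The switch then strips the outer header and forwards the inner packet.

```python
from scapy.all import IP, TCP, send

def drb_encap(inner_pkt, bounce_switch_ip):
    """Wrap the original packet in an outer IP header addressed to the bouncing switch (IP-in-IP)."""
    return IP(dst=bounce_switch_ip) / inner_pkt

inner = IP(src="10.0.1.2", dst="10.0.3.4") / TCP(dport=80)   # hypothetical source and destination
send(drb_encap(inner, bounce_switch_ip="10.3.0.1"))          # bounce via a chosen spine switch (hypothetical address)
```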
Experiments: queuing delay. RB results in large queue lengths (250KB per port). DRB and RRB perform similarly since each switch has only 3 uplinks. DRB's queue length is only 2-3 packets, consistent with the queue modeling and simulation results.
Related work. Random-based per-packet routing: Random Packet Spraying (RPS), random per-packet VLB. Flowlet-based approaches; LocalFlow. DeTail (lossless link layer + per-packet adaptive routing). Flow-level deadline-based approaches: D3, D2TCP, PDQ.
Conclusion. DRB achieves 100% bandwidth utilization, almost 0 queuing delay, and few out-of-order packet arrivals. DRB can be readily implemented: servers perform packet encapsulation and switches perform packet decapsulation.
Q & A. This is the end of my presentation. Thank you. Any questions?