LocalFlow: Simple, Local Flow Routing in Data Centers. Siddhartha Sen, DIMACS 2011. Joint work with Sunghwan Ihm, Kay Ousterhout, and Mike Freedman, Princeton University.

Presentation transcript:

LocalFlow: Simple, Local Flow Routing in Data Centers. Siddhartha Sen, DIMACS 2011. Joint work with Sunghwan Ihm, Kay Ousterhout, and Mike Freedman. Princeton University.

Routing flows in data center networks

Network utilization suffers when flows collide… [ECMP, VLB]

… but there is available capacity!

Must compute routes repeatedly: real workloads are dynamic (~ms timescales)!

Multi-commodity flow problem
Input: a network G = (V, E) of switches and links, and flows K = {(s_i, t_i, d_i)} of (source, target, demand) tuples
Goal: compute a flow that maximizes the minimum fraction of any d_i that is routed
Requires fractionally splitting flows; otherwise no O(1)-factor approximation is possible
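For concreteness, this objective is the standard maximum concurrent flow problem. The path-based LP below is a textbook restatement, not taken from the slides; the path sets P_i and link capacities c(e) are notation I am adding (capacities are implicit in the slide's problem statement):

```latex
\begin{align*}
\text{maximize}\quad   & \lambda \\
\text{subject to}\quad & \textstyle\sum_{p \in P_i} f_p \ \ge\ \lambda\, d_i
                         && \forall (s_i, t_i, d_i) \in K \\
                       & \textstyle\sum_{i} \sum_{p \in P_i \,:\, e \in p} f_p \ \le\ c(e)
                         && \forall e \in E \\
                       & f_p \ \ge\ 0 && \forall p
\end{align*}
```

Here P_i is the set of s_i-t_i paths, f_p is the (fractional) flow placed on path p, and the optimal lambda is the largest minimum routed fraction achievable.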

Prior solutions (from most centralized to most decentralized)
Sequential model
– Theory: [Vaidya89, PlotkinST95, GargK07, …]
– Practice: [BertsekasG87, BurnsOKM03, Hedera10, …]
Billboard model
– Theory: [AwerbuchKR07, AwerbuchK09, …]
– Practice: [MATE01, TeXCP05, MPTCP11, …]
Routers model
– Theory: [AwerbuchL93, AwerbuchL94, AwerbuchK07, …]
– Practice: [REPLEX06, COPE06, FLARE07, …]

Theory-practice gap:
1. Models unsuitable for dynamic workloads
2. Splitting flows is difficult in practice

Goal: Provably optimal + practical multi-commodity flow routing

Problems
1. Dynamic workloads
2. Fractionally splitting flows
3. Switch ≠ end host
– Limited processing, high-speed matching on packet headers

Solutions
1. Routers Plus Preprocessing (RPP) model
– Poly-time preprocessing is free
– In-band messages are free
2. Splitting technique
– Group flows by target, split the aggregate flow
– Group contiguous packets into flowlets to reduce reordering
3. Add forwarding table rules to programmable switches
– Match on the TCP seq num header, use bit tricks to create flowlets

Sequential solutions don't scale: a central controller [Hedera10]

Billboard solutions require link utilization information… in-band messages (ECN, 3-dup ACK) [MPTCP11]

… and react to congestion optimistically… [MPTCP11]

… or model paths explicitly (exponentially many)

Routers solutions are local and scalable… [REPLEX06]

… but lack global knowledge. But in practice we can:
– Compute valid routes via preprocessing (e.g., RIP); see the sketch below
– Get congestion info via in-band messages (like the Billboard model)
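As an illustration of the first point, here is a minimal sketch of what precomputing "valid routes" could look like at each switch: a BFS from each destination yields every switch's set of shortest-path next hops toward it. The topology, names, and code are hypothetical, not the talk's implementation:

```python
from collections import deque, defaultdict

def shortest_path_next_hops(adj, dst):
    """For one destination, find every neighbor of each node that lies on
    some shortest path to dst, i.e., the multipath next-hop set a switch
    could install during preprocessing. Toy sketch only."""
    # BFS from the destination gives each node's hop distance to dst.
    dist = {dst: 0}
    queue = deque([dst])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # A neighbor v is a valid next hop of u iff it is strictly closer to dst.
    next_hops = defaultdict(list)
    for u in adj:
        if u == dst:
            continue
        for v in adj[u]:
            if v in dist and dist[v] == dist.get(u, float("inf")) - 1:
                next_hops[u].append(v)
    return dict(next_hops)

# Hypothetical 4-node topology: hosts A and B, each dual-homed to switches S1 and S2.
adj = {"A": ["S1", "S2"], "B": ["S1", "S2"], "S1": ["A", "B"], "S2": ["A", "B"]}
print(shortest_path_next_hops(adj, "B"))
# {'A': ['S1', 'S2'], 'S1': ['B'], 'S2': ['B']}  -> A has two equal-cost next hops
```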

RPP model: Embrace locality…

… by proactively splitting flows

Problems: Split every flow? What granularity to split at?

Frequency of splitting: flows through a single switch

Frequency of splitting: only one flow is split!
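To make the "one flow split" point concrete, here is a toy sketch of target-aggregated splitting at a single switch: flows headed to the same target are treated as one aggregate and packed evenly across the outgoing links, so at most one flow straddles each link boundary. This is my illustration of the idea under the assumption of equal-capacity links, not LocalFlow's actual algorithm:

```python
def split_aggregate(flow_sizes, num_links):
    """Spread an aggregate of same-target flows evenly over num_links links.
    Flows are packed onto links in order, so at most one flow per link
    boundary gets split. Toy sketch; assumes equal-capacity links."""
    share = sum(flow_sizes) / num_links       # target load per outgoing link
    assignment = [[] for _ in range(num_links)]
    link, room = 0, share
    for i, remaining in enumerate(flow_sizes):
        while remaining > 1e-9:
            take = min(remaining, room)
            assignment[link].append((i, take))   # flow i sends `take` units on this link
            remaining -= take
            room -= take
            if room <= 1e-9 and link + 1 < num_links:
                link, room = link + 1, share
    return assignment

# Three equal flows to the same target over two links: only flow 1 is split.
print(split_aggregate([4, 4, 4], 2))
# [[(0, 4), (1, 2)], [(1, 2), (2, 4)]]
```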

Granularity of splitting
– Per-packet: optimal routing, high reordering
– Per-flow: suboptimal routing, low reordering
– Flowlets: in between

Problems
1. Dynamic workloads
2. Splitting flows
3. Switch ≠ end host
– Limited processing, high-speed matching on packet headers

Solutions
1. Routers Plus Preprocessing (RPP) model
– Poly-time preprocessing is free
– In-band messages are free
2. Splitting technique
– Group flows by target, split the aggregate flow
– Group contiguous packets into flowlets to reduce reordering
3. Add forwarding table rules to programmable switches
– Match on the TCP seq num header, use bit tricks to create flowlets

Line-rate splitting (simplified)

Flow     TCP seq num    Link    Fraction of traffic
A → B    *…0*****       1       1/2
A → B    *…10****       2       1/4
A → B    *…11****       3       1/4

flowlet = 16 packets
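A rough reading of this table in code: the rules inspect two bits of the TCP sequence number and wildcard the low four bits, which yields the 1/2, 1/4, 1/4 split and 16-packet flowlets. The exact bit positions below are inferred from the patterns shown, and real sequence numbers count bytes rather than packets, so treat this as an illustration rather than the actual switch rules:

```python
def pick_link(seq: int) -> int:
    """Toy version of the slide's wildcard rules on the TCP sequence number:
      bit 5 == 0       -> link 1  (matches half of the sequence space)
      bits 5,4 == 1,0  -> link 2  (one quarter)
      bits 5,4 == 1,1  -> link 3  (one quarter)
    The low 4 bits are never examined, so the chosen link can only change at
    16-value boundaries; each such run of packets is a flowlet."""
    if (seq >> 5) & 1 == 0:
        return 1
    return 2 if (seq >> 4) & 1 == 0 else 3

# Over a long run of sequence numbers, link 1 carries ~1/2 of the packets
# and links 2 and 3 carry ~1/4 each.
counts = {1: 0, 2: 0, 3: 0}
for seq in range(256):
    counts[pick_link(seq)] += 1
print(counts)  # {1: 128, 2: 64, 3: 64}
```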

Summary
LocalFlow is simple and local
– No central control (unlike Hedera) or per-source control (unlike MPTCP)
– No avoidable collisions (unlike ECMP/VLB/MPTCP)
– Relies on the symmetry of data center networks (unlike MPTCP)
The RPP model bridges the theory-practice gap

Preliminary simulations
Ran LocalFlow on a packet trace from a university data center switch
– 3914 secs, ~260,000 unique flows
– Measured the effect of grouping flows by target on the frequency of splitting
Ran LocalFlow on a simulated 16-host fat-tree network running TCP
– Delayed 5% of packets at each switch by norm(x,x)
– Measured the effect of flowlet size on reordering

Splitting is infrequent (grouping flows by target)

Splitting is infrequent (approximate splitting)

Reordering is low