High Speed Networks Need Proactive Congestion Control

Presentation transcript:

High Speed Networks Need Proactive Congestion Control. Lavanya Jose, Lisa Yan, Nick McKeown, Sachin Katti (Stanford University); Mohammad Alizadeh (MIT); George Varghese (Microsoft Research).

The Congestion Control Problem. [Figure: four flows share a chain of five links: Link 0 (100G), Link 1 (60G), Link 2 (30G), Link 3 (10G), Link 4 (100G). Flow A crosses Links 0 and 1; Flow B crosses Links 1 and 2; Flow C crosses Links 2 and 3; Flow D crosses Links 3 and 4. What rate should each flow send at?]

Ask an oracle. Given the link capacities (Link 0: 100G, Link 1: 60G, Link 2: 30G, Link 3: 10G, Link 4: 100G) and which flows cross which links, the oracle hands back the max-min fair rates directly: Flow A = 35G, Flow B = 25G, Flow C = 5G, Flow D = 5G.
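As a quick check (using the flow paths shown in the figure), the oracle's rates exactly fill each bottleneck link:

$$\underbrace{35+25}_{\text{Link 1: A+B}}=60, \qquad \underbrace{25+5}_{\text{Link 2: B+C}}=30, \qquad \underbrace{5+5}_{\text{Link 3: C+D}}=10.$$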

Traditional Congestion Control. No explicit information about the traffic matrix: measure congestion signals such as queueing or packet loss, then react by adjusting the rate after a measurement delay. The adjustment is gradual; these “reactive algorithms” can’t jump to the right rates, they only know the direction. I’m going to show you a typical example of such a reactive algorithm, called RCP. RCP was designed specifically to quickly find max-min fair rates, which is the notion of fairness I’m looking at.

I’m going to show you how RCP tries to figure out the rates for these flows. On the y-axis I have the transmission rate; on the x-axis, time measured in RTTs. First, let me plot the ideal rate for Flow C: at the smaller 10G link there’s one other flow, so the red Flow C’s ideal fair share is 5G. For background, RCP switches try to figure out a fair-share rate by measuring only congestion signals, like queueing and the input traffic rate at their links; a flow sends at the minimum rate it hears from its links. Now let’s see what RCP does for the flow.

30 RTTs to Converge. This behavior is typical of all traditional congestion control, be it TCP, XCP, or DCTCP.

Convergence Times Are Long. If flows only last a few RTTs, then we can’t wait 30 RTTs to converge, and as link speeds get faster this problem will only get worse. For example, at 100G a typical flow in a search workload is less than 7 RTTs long (1 MB / 100 Gb/s = 80 µs).
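Writing out the arithmetic behind that claim (the “less than 7 RTTs” figure is consistent with the roughly 12 µs RTT quoted in the backup slides):

$$\frac{1\,\text{MB}}{100\,\text{Gb/s}}=\frac{8\times10^{6}\,\text{b}}{10^{11}\,\text{b/s}}=80\,\mu\text{s},\qquad \frac{80\,\mu\text{s}}{12\,\mu\text{s}/\text{RTT}}\approx 7\,\text{RTTs}.$$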

Why “Reactive” Schemes Take Long. No explicit information, therefore measure congestion signals and react. The sender can’t leap to the correct values, but it knows the direction; and because each reaction is fed back into the network, it must take cautious steps. [Diagram: a feedback loop: Measure Congestion → Adjust Flow Rate → repeat.]
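To make “cautious steps” concrete, here is a minimal sketch of a reactive rate update in the spirit of RCP. It is my own simplification, not the actual RCP control law: the point is only that the sender sees congestion signals (spare bandwidth, queue) rather than the traffic matrix, so each RTT it nudges the rate in the right direction but cannot jump to the fair rate.

```python
# A toy reactive controller: measure congestion signals, nudge the rate.
# (Illustrative only; not the exact RCP/TCP update rule.)
def reactive_update(rate, capacity, arrival_rate, queue, rtt,
                    alpha=0.4, beta=0.2):
    spare = capacity - arrival_rate   # measured spare bandwidth (may be < 0)
    backlog = queue / rtt             # rate needed to drain the queue in one RTT
    # Small multiplicative step: the signals give a direction, but the
    # effect is only observed a full feedback delay later, so be cautious.
    return max(0.0, rate * (1 + (alpha * spare - beta * backlog) / capacity))

# One flow on a 10G link, starting at 1G with an empty queue: the rate
# creeps toward 10G over tens of RTTs instead of jumping there.
rate = 1.0
for _ in range(30):
    rate = reactive_update(rate, capacity=10.0, arrival_rate=rate,
                           queue=0.0, rtt=1.0)
print(round(rate, 2))
```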

Reactive algorithms give up explicit flow information and pay for it with long convergence times. Can we use explicit flow information and get shorter convergence times?

Back to the oracle: how did she use the traffic matrix to compute the rates? [Figure: the topology again, annotated with the answer: Flow A = 35G, Flow B = 25G, Flow C = 5G, Flow D = 5G.]

Waterfilling Algorithm. Start with every flow at zero: Link 0 (0/100G), Link 1 (0/60G), Link 2 (0/30G), Link 3 (0/10G), Link 4 (0/100G); Flows A, B, C, D all at 0G.

Waterfilling: the 10G link is fully used. Raising all four flows together to 5G saturates Link 3 (10/10G), so Flows C and D freeze at 5G. State: Link 0 (5/100G), Link 1 (10/60G), Link 2 (10/30G), Link 4 (5/100G).

Waterfilling: the 30G link is fully used. Raising Flows A and B to 25G saturates Link 2 (30/30G: B’s 25G plus C’s 5G), so Flow B freezes at 25G. State: Link 0 (25/100G), Link 1 (50/60G), Link 4 (5/100G).

Waterfilling: the 60G link is fully used. Raising Flow A to 35G saturates Link 1 (60/60G: A’s 35G plus B’s 25G). Final state: Link 0 (35/100G), Flow A = 35G, Flow B = 25G, Flow C = 5G, Flow D = 5G.

Fair Share of Bottlenecked Links. Each saturated link ends up with a fair-share rate: Link 1 (60G): 35G; Link 2 (30G): 25G; Link 3 (10G): 5G. A flow’s rate is the minimum fair share along its path: A = 35G, B = 25G, C = 5G, D = 5G. If we try to implement this in a centralized controller, the controller would have to be involved in each flow event and communicate new rates to all the affected flows. Moreover, this scheme takes time proportional to the number of bottleneck links.
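The oracle’s procedure fits in a few lines. Below is a minimal Python sketch of progressive filling (water-filling) on the example topology; the flow paths are the ones inferred from the figures, and the code is mine, not from the talk.

```python
def waterfill(capacity, paths):
    """Max-min fair rates by progressive filling (water-filling)."""
    rate = {f: 0.0 for f in paths}
    spare = dict(capacity)                 # unused capacity per link
    active = set(paths)                    # flows not yet bottlenecked
    while active:
        # Raise all active flows together until some link saturates:
        # a link can give spare/N more to each of its N active flows.
        inc = min(spare[l] / sum(1 for f in active if l in paths[f])
                  for l in capacity
                  if any(l in paths[f] for f in active))
        for f in active:
            rate[f] += inc
            for l in paths[f]:
                spare[l] -= inc
        # Flows crossing a saturated link freeze at their current rate.
        full = {l for l in capacity if spare[l] < 1e-9}
        active = {f for f in active if not full & paths[f]}
    return rate

capacity = {0: 100, 1: 60, 2: 30, 3: 10, 4: 100}
paths = {'A': {0, 1}, 'B': {1, 2}, 'C': {2, 3}, 'D': {3, 4}}
print(waterfill(capacity, paths))
# -> {'A': 35.0, 'B': 25.0, 'C': 5.0, 'D': 5.0}, as on the slides
```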

A centralized water-filling scheme may not scale. Can we let the network figure out rates in a distributed fashion?

Fair Share for a Single Link. Flow demands: A = ∞, B = ∞. Capacity at Link 1 is 30G, so the fair-share rate is 30G/2 = 15G for each flow.

A second link introduces a dependency. Flow A’s demand is still ∞, but Flow B now also crosses Link 2 (10G), so its demand at Link 1 becomes 10G. [Figure: Link 1 (30G) carries Flows A and B; Flow B continues through Link 2 (10G).]
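Worked out, the dependency is two small calculations (numbers from the figure): Link 2 caps Flow B below Link 1’s even split, and Link 1 can then give Flow A everything Flow B cannot use:

$$\text{rate}_B=\min\!\left(\tfrac{30}{2},\,10\right)=10\,\text{G},\qquad \text{rate}_A=30-10=20\,\text{G}.$$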

Dependency Graph

Dependency Graph. [Figure: the same flows and links drawn as a graph, with Flow B’s 10G constraint from Link 2 propagating to Link 1.]

Proactive Explicit Rate Control (PERC) Overview. Round 1 (Flows → Links). Flows and links alternately exchange messages. A flow sends a “demand”: ∞ when it has heard no fair share from other links, otherwise the minimum fair share of its other links. A link sends a “fair share”: C/N when all demands are ∞, otherwise computed by water-filling. [Figure labels: both flows send demand ∞; Link 1’s fair share is 15 (to each flow), Link 2’s is 10.]

Proactive Explicit Rate Control (PERC) Overview. Round 2 (Flows → Links). Same message rules as before. Messages are approximate, but they jump to the right values quickly with more rounds. [Figure labels: Flow A again sends ∞ to Link 1; Flow B sends demand 10 to Link 1 (the fair share it heard from Link 2) and demand 15 to Link 2.]

Proactive Explicit Rate Control (PERC) Overview. Round 2 (Links → Flows). Same message rules as before. [Figure labels: Link 1 water-fills its 30G over demands (∞, 10) and sends fair share 20 to both flows; Link 2 sends 10. Flow A’s rate: 20G; Flow B’s rate: min(20, 10) = 10G.]
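The rounds above can be simulated in a few lines. This is a synchronous, round-based sketch of PERC-style message passing on the two-flow example (Flow A on Link 1 only; Flow B on Links 1 and 2); the structure and names are mine, and real PERC carries these messages asynchronously in per-flow control packets, as the next slides show.

```python
INF = float('inf')

def fill_level(cap, demands):
    """Single-link water-fill level t such that sum(min(d, t)) == cap."""
    for i, d in enumerate(sorted(demands)):
        level = cap / (len(demands) - i)   # split what's left evenly
        if level <= d:
            return level                   # remaining flows are capped at t
        cap -= d                           # demand d is fully granted
    return INF                             # all demands met; not a bottleneck

paths = {'A': [1], 'B': [1, 2]}
capacity = {1: 30.0, 2: 10.0}
share = {(l, f): INF for f in paths for l in paths[f]}   # link -> flow messages

for _ in range(3):                         # a few rounds are enough here
    # Flows -> links: a flow's demand at link l is the smallest fair
    # share it has heard from its *other* links (INF if it has none).
    demand = {(l, f): min((share[(m, f)] for m in paths[f] if m != l),
                          default=INF)
              for f in paths for l in paths[f]}
    # Links -> flows: each link water-fills over the latest demands and
    # tells every flow on it the resulting fair-share level.
    for l, cap in capacity.items():
        flows = [f for f in paths if l in paths[f]]
        level = fill_level(cap, [demand[(l, f)] for f in flows])
        for f in flows:
            share[(l, f)] = level

print({f: min(share[(l, f)] for l in paths[f]) for f in paths})
# -> {'A': 20.0, 'B': 10.0}, matching the rounds on the slides
```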

Message Passing Algorithms. Decoding error-correcting codes (LDPC, Gallager 1963): variable nodes x1, x2, x3 exchange messages with parity checks x1 + x3 = 0 and x2 + x3 = 0. Estimating flow counts from shared counters (Counter Braids, Lu et al. 2008): Flows A and B resolve their counts by message passing with shared Counters 1 and 2.

Making PERC concrete

PERC Implementation. Control Packet for Flow B: d| ∞ | ∞ f| ? | ?. The messages that a flow exchanges with its links are all carried in a control packet. Here in Flow B’s control packet you can see demands for each link on its path and placeholders for fair shares. Links along the way look at the demands and fill in the fair shares. Note that each flow keeps sending these control packets as long as it is active, independently of other flows, in an asynchronous way.

PERC Implementation Control Packet For Flow B d| ∞ | ∞ f|15 | ?

PERC Implementation Control Packet For Flow B d| ∞ | ∞ d| ∞ | ∞

PERC Implementation Control Packet For Flow B d|10 |15 send at 15G!
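A toy model of the control packet on these slides, with made-up field names (the talk doesn’t give the actual header layout): the packet carries a demand slot and a fair-share slot per link on Flow B’s path, each link stamps its fair share as the packet passes, and the source sends at the minimum stamped share.

```python
# Hypothetical control-packet processing; field names are illustrative.
def traverse(pkt, fair_share_of):
    for hop, link in enumerate(pkt['links']):
        # A real switch would update its demand table from pkt['d'] and
        # recompute its water-fill level here; we just look up a value.
        pkt['f'][hop] = fair_share_of[link]
    return min(pkt['f'])

pkt = {'flow': 'B', 'links': [1, 2],
       'd': [float('inf'), float('inf')],   # d| ∞ | ∞
       'f': [None, None]}                   # f| ? | ?

# Link 1's fair share (15) is from the slide; Link 2's (20) is a placeholder.
print('send at %gG!' % traverse(pkt, {1: 15, 2: 20}))   # -> send at 15G!
```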

PERC converges fast: < 5 RTTs on the earlier example, where RCP took 30 RTTs to converge.

PERC Converges Fast. Convergence times: 4 vs 14 RTTs at the median, 10 vs 71 at the tail (99th percentile).

Some unanswered questions. How to calculate fair shares in PERC switches? How to bound convergence times in theory? What about other policies?

Takeaways. Reactive schemes are slow for short flows (the majority) at 100G. Proactive schemes like PERC are fundamentally different and can converge quickly because they calculate explicit rates based on out-of-band information about the set of active flows. Message passing is a promising proactive approach: it could be practical, but further analysis is needed to understand convergence times in practice.

Thanks!

Shorter FCTs (flow completion times) For Flows That Last A Few RTTs (“Medium”). 100G, 12 µs RTT.

XCP

RCP

ATM / Charny etc. ATM’s explicit-rate schemes required flow set-up time. Experiments show we converge faster; the message-passing-based idea seems promising.

Discussion. Is there fundamentally any limit on how fast we can get max-min rates, whether explicit or implicit?