AQM and packet scheduling,... again! Jim Roberts IRT-SystemX, France.


A new AQM working group at IETF new interest is arising from perceived problems in home networks (bufferbloat) and data center interconnects the IETF working group will define an AQM that –“minimizes standing queues” –“helps sources control rates without loss (using ECN)” –“protects from aggressive flows” –“avoids global synchronization” for queues in routers, home networks, data centers,... “combined queue management / packet scheduling algorithms are in-scope”

Outline bufferbloat the data center interconnect fair flow queuing for large shared links user controlled sharing of the access link

“Bufferbloat” the problem: overly large buffers, notably in home routers, are filled by TCP, leading to long packet latency
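
To put an illustrative (hypothetical) number on this: a home router with a 256 KB buffer in front of an 8 Mbit/s uplink holds 256 × 8 = 2048 kbits when full, so a single bulk TCP upload adds roughly 2048/8000 ≈ 0.25 s of queueing delay to every packet, VoIP and DNS included, on top of the path RTT.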

Bufferbloat: how TCP should use the buffer [diagram: a TCP flow through a router buffer onto the link; plots of TCP window/rate and packets in buffer, a “good queue”]

Bufferbloat: impact of a bloated buffer: longer delays, same throughput [diagram: a TCP flow through a bloated router buffer; the packets in the buffer form a standing “bad queue”]

Bufferbloat: impact of a bloated buffer: high latency for real time flows [diagram: a VoIP flow queued behind a TCP flow in a bloated router buffer, a “bad queue”]

Bufferbloat: impact of drop tail: unfair bandwidth sharing [diagram: two TCP flows competing for a bloated drop-tail router buffer]

Bufferbloat: impact of drop tail: unfair bandwidth sharing [diagram: a TCP flow competing with an unresponsive “UDP” flow in a bloated drop-tail router buffer]

AQM to combat bufferbloat CoDel (Controlled Delay) [Nichols & Jacobson, 2012] –measure packet sojourn time –drop packets to keep minimum delay near target PIE (Proportional Integral controller Enhanced) [Pan et al, 2013] –drop probability updated based on queue length & departure rate both rely on TCP in end-system, neither ensures fairness
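
The CoDel control law can be made concrete with a simplified sketch (Python, illustrative only; this is not the reference CoDel code). The 5 ms target and 100 ms interval are CoDel's usual defaults; per-packet sojourn times and the current clock are assumed to be supplied by the caller.

    # Simplified sketch of the CoDel dropping law (illustrative, not the
    # reference algorithm): drop just enough packets to keep the minimum
    # per-packet sojourn time near TARGET.
    import math

    TARGET = 0.005      # target minimum sojourn time (5 ms)
    INTERVAL = 0.100    # window over which the minimum must exceed TARGET (100 ms)

    class CoDelSketch:
        def __init__(self):
            self.first_above_time = 0.0   # when sojourn time first exceeded TARGET
            self.dropping = False         # are we in the dropping state?
            self.drop_next = 0.0          # time of the next scheduled drop
            self.count = 0                # drops since entering the dropping state

        def should_drop(self, sojourn_time, now):
            """Return True if the packet dequeued at time `now` should be dropped."""
            if sojourn_time < TARGET:
                # queue is draining: leave the dropping state
                self.first_above_time = 0.0
                self.dropping = False
                return False
            if self.first_above_time == 0.0:
                self.first_above_time = now + INTERVAL
                return False
            if not self.dropping and now >= self.first_above_time:
                # sojourn time has stayed above TARGET for a full INTERVAL
                self.dropping = True
                self.count = 1
                self.drop_next = now + INTERVAL / math.sqrt(self.count)
                return True
            if self.dropping and now >= self.drop_next:
                # keep dropping, at intervals shrinking as INTERVAL / sqrt(count)
                self.count += 1
                self.drop_next = now + INTERVAL / math.sqrt(self.count)
                return True
            return False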

AQM and scheduling: fq_codel fq_codel combines SFQ (stochastic fairness queuing) and CoDel –hash flow ID to one of ~1000 queues (buckets) –deficit round robin scheduling over queues –control latency in each queue using CoDel with some enhancements –priority to packets of “new” flows –drop from queue head rather than tail V. Jacobson (quoted by D. Täht): –“If we're sticking code into boxes to deploy CoDel, don't do that. Deploy fq_codel. It's just an across the board win”
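
A sketch of the SFQ-plus-DRR half of fq_codel as described above (the per-queue CoDel and the new-flow priority are omitted); the bucket count and quantum are illustrative and the Python class names are invented for the example.

    # Sketch of stochastic fair queuing with deficit round robin (DRR):
    # hash the flow ID to one of ~1000 buckets and serve active buckets in
    # round robin, each receiving a quantum of bytes per cycle.
    from collections import deque

    NUM_BUCKETS = 1024
    QUANTUM = 1514   # bytes granted to a bucket per round-robin cycle

    class DRRScheduler:
        def __init__(self):
            self.queues = [deque() for _ in range(NUM_BUCKETS)]
            self.deficit = [0] * NUM_BUCKETS
            self.active = deque()   # buckets currently holding packets

        def enqueue(self, flow_id, packet, size):
            b = hash(flow_id) % NUM_BUCKETS
            if not self.queues[b]:
                self.active.append(b)   # bucket becomes active
                self.deficit[b] = 0
            self.queues[b].append((packet, size))

        def dequeue(self):
            """Return the next packet to send, or None if all queues are empty."""
            while self.active:
                b = self.active[0]
                if self.queues[b][0][1] <= self.deficit[b]:
                    packet, size = self.queues[b].popleft()
                    self.deficit[b] -= size
                    if not self.queues[b]:
                        self.deficit[b] = 0
                        self.active.popleft()   # bucket emptied, leave the ring
                    return packet
                # head packet does not fit in the remaining deficit: grant a
                # fresh quantum and move the bucket to the back of the ring
                self.deficit[b] += QUANTUM
                self.active.rotate(-1)
            return None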

Advantages of per-flow fair queuing fairness is imposed, no need to rely on TCP –flows use the congestion control they want (eg, high speed TCP) –no danger from “unresponsive flows” realizes implicit service differentiation –since the rate of streaming flows is generally less than the fair rate –still lower latency by giving priority to “new” flows as proposed by [Nagle 1985], [Kumar 1998], [Kortebi 2004],... –with longest queue drop as AQM [diagram: Fair Queuing serving each flow at the fair rate]
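
A minimal sketch of the longest queue drop decision mentioned above, reusing per-flow queues like those of the DRR example; the linear scan is purely illustrative (a real implementation would maintain per-queue byte counts).

    # Sketch of "longest queue drop": when the shared buffer overflows,
    # discard a packet from the flow (bucket) currently holding the most bytes.
    def longest_queue_drop(queues):
        """queues: list of deques of (packet, size); returns the dropped packet."""
        longest = max(range(len(queues)),
                      key=lambda b: sum(size for _, size in queues[b]))
        packet, _ = queues[longest].popleft()   # drop from the head (cf. fq_codel)
        return packet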

Outline bufferbloat the data center interconnect fair flow queuing for large shared links user controlled sharing of the access link

Congestion in the interconnect [diagrams: pods of servers connected through top of rack switches, aggregation switches and core routers; query flows and bulk data sharing a switch buffer] 1000s of servers connected by commodity switches and routers mixture of bulk data transfers requiring high throughput... and query flows requiring low latency a practical observation –regular TCP does not ensure high throughput and low latency

Many new congestion control schemes DCTCP [Alizadeh 2010] –limits delays by refined ECN scheme to smooth rate variations D³ [Wilson 2011] –“deadline driven delivery” D²TCP [Vamanan 2012] –combines aspects of previous two PDQ [Hong 2012] –size-based pre-emptive scheduling HULL [Alizadeh 2012] –low delay by “phantom queues” and ECN evaluations assume all data center flows implement the recommended protocol

pFabric: “minimalist data center transport” instead of end-to-end congestion control, implement scheduling in switch and server buffers [Alizadeh 2013] “key insight: decouple flow schedule from rate control” “shortest job first” scheduling to minimize flow completion time –also optimal for flows that arrive over time minimal rate control: eg, start at max rate, adjust using AIMD [diagram: input and output queues of a server or switch; the green flow is pre-empted by a shorter flow and resumes once it completes]
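
The scheduling idea can be sketched as follows (illustrative Python, not the pFabric implementation): each packet carries its flow's remaining size as a priority, the port dequeues the smallest value first, and on overflow it discards a packet of the flow with the most data left; the buffer size and class names are invented.

    # Sketch of pFabric-style port behaviour: priority = remaining flow size,
    # smallest value served first, largest value dropped on overflow.
    import heapq

    BUFFER_PACKETS = 64   # illustrative small per-port buffer

    class PFabricPort:
        def __init__(self):
            self.heap = []   # entries (remaining_size, seq, packet)
            self.seq = 0     # tie-breaker preserving arrival order

        def enqueue(self, remaining_size, packet):
            heapq.heappush(self.heap, (remaining_size, self.seq, packet))
            self.seq += 1
            if len(self.heap) > BUFFER_PACKETS:
                # buffer full: drop the packet of the flow with the most data left
                victim = max(self.heap)
                self.heap.remove(victim)
                heapq.heapify(self.heap)

        def dequeue(self):
            """Send the packet of the flow with the least remaining data."""
            if self.heap:
                return heapq.heappop(self.heap)[2]
            return None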

pFabric: possible drawbacks relies on scheduler knowing flow sizes –hosts can lie or divide long flows into successive short flows priority in networks can lead to loss of traffic capacity –capacity is the load at which delays explode –eg, 2 classes & 2-buffer paths requires ρ₂ < (1 − ρ₁)² –cf. [Bonald & Massoulié 2003], [Verloop et al. 2005] [diagram: ingress and egress buffers of a server]
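
A worked example, taking the stability condition quoted above at face value: if the high-priority class offers load ρ₁ = 0.5, the low-priority class is only stable for ρ₂ < (1 − 0.5)² = 0.25, so the link carries at most a total load of 0.75 instead of 1; strict priority has cost a quarter of the traffic capacity.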

Outline bufferbloat the data center interconnect fair flow queuing for large shared links user controlled sharing of the access link

How can the Internet continue to rely on end-to-end congestion control? “TCP saved the Internet from congestion collapse” –but there are easier ways to avoid the “dead packet” phenomenon new TCP versions are necessary for high speed links –but they are generally very unfair to legacy versions no incentive for applications to be “TCP friendly” –in particular, congestion pricing is unworkable better, like pFabric, to decouple scheduling and rate control [diagram: local congestion ⇒ many dropped packets ⇒ large number of “dead packets” ⇒ the flow suffers congestion collapse]

“Flow-aware networking” [Kortebi et al. 2004] rather than IntServ, DiffServ, MPLS TE,... routers should impose per-flow fair sharing and not rely on end-system implemented congestion control fair queuing is feasible and scalable and realizes implicit service differentiation for network neutral traffic control note: fairness is an expedient, not a socio-economic objective [diagram: Fair Queuing serving each flow at the fair rate]

Understanding traffic at flow level a flow is an instance of some application (document transfer, voice signal,...) –a set of packets with like header fields, local in space and time traffic is a process of flows of different types –conversational, streaming, interactive data, background –with different traffic characteristics – rates, volumes,... –and different requirements for latency, integrity, throughput flows are characterized by size and “peak rate” [diagram: a video stream and a TCP data transfer, each with its own peak rate]

Controlled bandwidth sharing three bandwidth sharing regimes transparent regime: –all flows suffer negligible loss, no throughput degradation elastic regime: –some high rate flows can saturate residual bandwidth; without control these can degrade quality for all other flows overload regime: –traffic load is greater than capacity; all flows suffer from congestion unless they are handled with priority [diagram: the “transparent”, “elastic” and “overload” regimes]

Statistical bandwidth sharing ie, statistical multiplexing with elastic traffic consider a network link handling flows between users, servers, data centers,... define, link load = flow arrival rate x mean flow size / link rate = packet arrival rate x mean packet size / link rate = mean link utilization
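
An illustrative calculation (numbers hypothetical): flows arriving at 500 per second with a mean size of 1.25 MB (10 Mbits) on a 10 Gbit/s link offer 5 Gbit/s of traffic, giving a load, and hence a mean utilization, of 0.5.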

Traffic variations and stationarity [plots: mean link utilization over one day and over one week; busy hour demand behaves as a stationary stochastic process varying about its mean]

Bandwidth sharing performance in the following simulation experiments, assume flows –arrive as a Poisson process –have exponential size distribution –instantaneously share link bandwidth fairly results apply more generally thanks to insensitivity
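
Under these assumptions the number of flows in progress behaves as a simple birth-death process (the total completion rate is link rate / mean size whenever the link is busy), so the simulation can be sketched in a few lines of Python; all numerical values are illustrative.

    # Sketch of the flow-level model: Poisson flow arrivals, exponential flow
    # sizes, perfect fair sharing of the link. The number of flows in progress
    # is then a birth-death process: arrivals at rate lam, completions at rate
    # link_rate / mean_size whenever at least one flow is present.
    import random

    def simulate(load, events=200_000, link_rate=1e9, mean_size=1e7):
        lam = load * link_rate / mean_size   # flow arrival rate (flows/s)
        mu = link_rate / mean_size           # total completion rate when busy
        n, samples = 0, []
        for _ in range(events):
            total_rate = lam + (mu if n > 0 else 0)
            dt = random.expovariate(total_rate)   # time to next event
            samples.append((n, dt))               # time-weighted sample
            if random.random() < lam / total_rate:
                n += 1                             # flow arrival
            else:
                n -= 1                             # flow completion
        total_time = sum(dt for _, dt in samples)
        return sum(n * dt for n, dt in samples) / total_time

    for load in (0.5, 0.9, 0.99):
        # roughly 1, 9 and 99 flows on average (the last needs a long run)
        print(load, round(simulate(load), 1))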

Performance of fair shared link [simulation plots: number of active flows over time, and flow performance (mean rate, duration), as load (arrival rate x mean size / link rate) increases towards 100%]

Observations the number of flows using a fairly shared link is small until load approaches 100% (for any link capacity) therefore, fair queuing schedulers are feasible and scalable our simulations make Markovian assumptions but the results for the number of active flows are true for much more general traffic [Ben-Fredj 2001] [diagram: sessions arrive as a Poisson process; a session is a sequence of flows (flow 1, flow 2,..., flow n) separated by think times, with a new flow of the same session after each think time until the session departs]
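
A step worth making explicit: in the Markovian model the number of active flows on the fairly shared link is geometrically distributed, P(n) = (1 − ρ)ρⁿ with mean ρ/(1 − ρ), whatever the link capacity; even at 90% load the average is only 9 concurrent flows, and the insensitivity result of [Ben-Fredj 2001] is what allows this conclusion to extend to much more general session and flow size distributions.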

More simulations on Internet core links (≥ 10 Gbps), the vast majority of flows cannot use all available capacity; their rate is constrained elsewhere on their path (eg, ≤ 10 Mbps) consider a link shared by flows whose maximum rate is only 1% of the link rate –conservatively assume these flows emit packets as a Poisson process at rate proportional to the number of flows in progress

Performance with rate limited flows [simulation plots: number of flows in progress and number of active flows over time, at increasing loads; “active flows” have ≥ 1 packet in the queue]

Observations 2 most flows are not elastic and emit packets at their peak rate these flows are “active”, and need to be scheduled, only when they have a packet in the queue the number of active flows is small until load approaches 100% fair queuing is feasible and scalable, even when the number of flows in progress is very large

Yet more simulations links may be shared by many rate limited flows and a few elastic flows consider a link where 50% of the traffic comes from flows whose peak rate is 1% of the link rate and 50% is elastic traffic

Performance of link with elastic and rate limited flows [simulation plots: number of flows in progress and number of active flows over time, at increasing loads]

Observations 3 the number of active flows is small (<100) with high probability until load approaches 100% therefore, fair queuing is feasible and scalable fair queuing means packets of limited peak rate flows see negligible delay: –they are delayed by at most 1 round robin cycle –this realizes implicit service differentiation since conversational and streaming flows are in the low rate category –since the queues of such low rate flows are usually empty, a scheduler like DRR sees their packets as belonging to “new” flows; we can therefore identify them and give them priority (cf. fq_codel) imposed flow fairness yields predictable performance that is insensitive to most traffic characteristics –an “Erlang formula” for the Internet [Bonald & Roberts 2012]

Rate control and congestion collapse like pFabric, FQ decouples flow schedule from rate control –minimal rate control: eg, start at max rate, adjust using AIMD –performance doesn’t depend on a standard control algorithm (eg, TCP) –FQ has maximal traffic capacity (unlike SRPT) congestion collapse occurs in the overload regime (ie, load > 1) –to be avoided by traffic engineering and overload control another form of congestion collapse would occur if users were all unresponsive to congestion (eg, [Floyd and Fall 1999]) –but this is hardly a realistic scenario! [diagram: in the overload regime, local congestion ⇒ a large number of flows in progress, including many high rate flows ⇒ these flows suffer from a low fair rate]

Recommendation for traffic control on shared network links implement per-flow fair queuing in router queues with longest queue drop –this is scalable and feasible –avoids relying on end-system congestion control –and realizes implicit service differentiation –view fairness as an expedient not a socio-economic objective –yielding predictable performance, an “Internet Erlang formula” apply traffic engineering to ensure load is not too close to 100% and implement overload controls in case this fails this works for the Internet and data center interconnects but access networks need more than fair sharing...

Outline bufferbloat the data center interconnect fair flow queuing for large shared links user controlled sharing of the access link

Sharing the first/last mile this is the usual bottleneck for Internet flows –for DSL, cable, wireless, fiber what role for AQM, scheduling, congestion control, QoS? –for upstream and downstream who rules? who decides priorities? –surely, the user, not the network operator [diagram: “home router” and “edge router” on the access link, carrying uploads, ACKs, VoIP,... upstream and downloads, streams, VoIP,... downstream]

Sharing the first/last mile CoDel, PIE, fq_codel do not distinguish classes of service –eg, though users have priorities (eg, background uploads) also, difficult coexistence of AQM and low priority congestion control –eg, CoDel, FQ,... give the same share to LEDBAT & TCP [Gong 2013] prefer explicit, per-flow scheduling to realize the user’s fairness and priority objectives –eg, priority to Dad’s flows, limit Junior’s total bandwidth,... [diagram: “home router” and “edge router” carrying uploads, streams, ACKs, VoIP,... upstream and downloads, streams, ACKs, VoIP,... downstream]

Sharing the first/last mile operators differentiate managed and over-the-top services –though users may want other priorities (eg, priority to Skype, background downloads) user control would be a better alternative, requiring: –a flow scheduler in the router trading off complexity and flexibility –a signalling protocol allowing the user to dictate priorities [diagram: “home router” and “edge router” carrying uploads, streams, ACKs, VoIP,... upstream and downloads, streams, ACKs, VoIP,... downstream]
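
As a purely illustrative sketch (not a mechanism proposed in the talk), such a flow scheduler might combine weighted round robin across household members with strict priority inside each member's share for flows the user has marked as important; all names and parameter values below are invented.

    # Illustrative user-controlled access-link scheduler: weighted round robin
    # across users, strict priority within a user's share for marked flows.
    from collections import deque

    class UserClass:
        def __init__(self, weight):
            self.weight = weight      # relative share of the access link
            self.priority = deque()   # packets of flows marked "priority"
            self.normal = deque()     # everything else
            self.deficit = 0

    class AccessScheduler:
        def __init__(self, users):    # users: dict of name -> weight
            self.users = {name: UserClass(w) for name, w in users.items()}
            self.ring = deque(self.users)   # round-robin order over user names

        def enqueue(self, user, packet, size, priority=False):
            u = self.users[user]
            (u.priority if priority else u.normal).append((packet, size))

        def dequeue(self, quantum=1514):
            if not any(u.priority or u.normal for u in self.users.values()):
                return None
            while True:
                u = self.users[self.ring[0]]
                queue = u.priority or u.normal   # strict priority within the user
                if queue and queue[0][1] <= u.deficit:
                    packet, size = queue.popleft()
                    u.deficit -= size
                    if not (u.priority or u.normal):
                        u.deficit = 0            # user emptied: reset its deficit
                    return packet
                # head packet (if any) does not fit: grant a weighted quantum
                # and move on to the next user in the ring
                u.deficit += quantum * u.weight if queue else 0
                self.ring.rotate(-1)

    # Example: Dad's flows get twice Junior's share; Dad's VoIP has priority.
    sched = AccessScheduler({"dad": 2, "junior": 1})
    sched.enqueue("dad", "voip-pkt", 200, priority=True)
    sched.enqueue("junior", "torrent-pkt", 1500)
    print(sched.dequeue(), sched.dequeue())   # voip-pkt torrent-pkt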

Conclusions congestion in data center interconnects –new specific congestion control algorithms and AQMs –pFabric decouples scheduling and rate control imposed fair flow sharing is a scalable solution for large capacity links in the “elastic regime” –avoids relying on end-to-end congestion control –and realizes implicit service differentiation bufferbloat in the home network –new AQMs reduce latency but assume TCP congestion control –fair queuing (fq-codel) is “an across the board win” the first/last mile requires more than just fairness –schedulers in home and edge routers –a signalling scheme to implement user downstream priorities

References
–Alizadeh et al., “Data center TCP (DCTCP)”, SIGCOMM, 2010.
–Alizadeh et al., “Less is more: trading a little bandwidth for ultra-low latency in the data center”, NSDI, 2012.
–Alizadeh et al., “pFabric: minimal near-optimal datacenter transport”, SIGCOMM, 2013.
–Ballani et al., “Towards predictable datacenter networks”, SIGCOMM, 2011.
–Ben-Fredj et al., “Statistical bandwidth sharing: a study of congestion at flow level”, SIGCOMM, 2001.
–Bonald and Massoulié, “Impact of fairness on Internet performance”, SIGMETRICS, 2001.
–Bonald and Roberts, “Internet and the Erlang formula”, SIGCOMM Comput. Commun. Rev., 2012.
–Floyd and Fall, “Promoting the use of end-to-end congestion control”, IEEE ToN, 1999.
–Gong et al., “Fighting the bufferbloat: on the coexistence of AQM and low priority congestion control”, TMA, 2013.

References (2)
–Hong et al., “Finishing flows quickly with preemptive scheduling”, SIGCOMM, 2012.
–Kortebi et al., “Cross-protect: implicit service differentiation and admission control”, HPSR, 2004.
–Kumar et al., “Beyond best effort: router architectures for the differentiated services of tomorrow’s Internet”, IEEE Comm. Mag., 1998.
–Nagle, “On packet switches with infinite storage”, RFC 970, 1985.
–Nichols and Jacobson, “Controlling queue delay”, Comm. ACM, 2012.
–Pan et al., “PIE: a lightweight control scheme to address the bufferbloat problem”, HPSR, 2013.
–Vamanan et al., “Deadline-aware datacenter TCP (D²TCP)”, SIGCOMM, 2012.
–Verloop et al., “Stability of size-based scheduling disciplines in resource sharing networks”, Perform. Eval., 2005.
–Wilson et al., “Better never than late: meeting deadlines in datacenter networks”, SIGCOMM, 2011.