CSIT560 By M. Hamdi 1 Packet Arbitration in VoQ switches and Others and QoS.

Presentation transcript:

CSIT560 By M. Hamdi 1 Packet Arbitration in VoQ switches and Others and QoS

CSIT560 By M. Hamdi 2 Recap High-Performance Switch Design –We need scalable switch fabrics: crossbar, bit-sliced crossbar, Clos networks. –We need to solve the memory bandwidth problem: our conclusion is to go for input-queued switches, and to use VOQs instead of FIFO queues. –For these switches to function at high speed, we need efficient and practically implementable scheduling/arbitration algorithms.

CSIT560 By M. Hamdi 3 Switch core architecture [figure: ports #1 to #256, each with optics, a port processor and an LCS protocol block, attached to a crossbar and a scheduler; the links carry request, grant/credit and cell data]

CSIT560 By M. Hamdi 4 Algorithms for VOQ Switching We analyzed several algorithms for matching inputs and outputs
–Maximum size matching: based on bipartite maximum matching, which can be solved using max-flow techniques in O(N^2.5)
  - These are not practical for high-speed implementations
  - They are stable (100% throughput) for uniform traffic
  - They are not stable for non-uniform traffic
–Maximal size matching: these try to approximate maximum size matching (PIM, iSLIP, SRR, etc.)
  - These are practical: they can be executed in parallel in O(log N) or even O(1)
  - They are stable for uniform traffic and unstable for non-uniform traffic

CSIT560 By M. Hamdi 5 Algorithms for VOQ Switching
–Maximum weight matching: maximum matchings based on weights such as queue length (LQF, LPF) or age of cell (OCF), with a complexity of O(N^3 log N)
  - These are not practical for high-speed implementations, and are much more difficult to implement than maximum size matching
  - They are stable (100% throughput) under any admissible traffic
–Maximal weight matching: these try to approximate maximum weight matching. They use an RGA (request-grant-accept) mechanism like iSLIP (iLQF, iLPF, iOCF, etc.)
  - These are “somewhat” practical: they can be executed in parallel in O(log N) or even O(1) like iSLIP, BUT the arbiters are much more complex to build

CSIT560 By M. Hamdi 6 Differences between RRM, iSlip & FIRM (pointer updates)
Input pointer:
–No grant: unchanged (RRM, iSlip, FIRM)
–Granted: one location beyond the accepted one (RRM, iSlip, FIRM)
Output pointer:
–No request: unchanged (RRM, iSlip, FIRM)
–Grant accepted: one location beyond the granted one (RRM, iSlip, FIRM)
–Grant not accepted: RRM: one location beyond the previously granted one; iSlip: unchanged; FIRM: the granted one

CSIT560 By M. Hamdi 7 Algorithms for VOQ Switching –Randomized algorithms They try in a smart way to approximate maximum weight matching by avoiding using an iterative process They are stable under any admissible traffic Their time complexity is small (depending on the algorithm) Their hardware complexity is yet untested.

CSIT560 By M. Hamdi 8 Can we avoid having schedulers altogether !!!

CSIT560 By M. Hamdi 9 Remember: Two Successive Scaling Problems OQ routers: + work-conserving (QoS) - memory bandwidth = (N+1)R IQ routers: + memory bandwidth = 2R - arbitration complexity (bipartite matching)

CSIT560 By M. Hamdi 10 IQ Arbitration Complexity Today: 64 ports at 10 Gbps, 64-byte cells. Arbitration time = 64 bytes / 10 Gbps = 51.2 ns. Request/grant communication BW = 17.5 Gbps. Scaling to 160 Gbps: arbitration time = 3.2 ns, request/grant communication BW = 280 Gbps. Two main alternatives for scaling: 1. Increase cell size 2. Eliminate arbitration
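The arbitration-time figures on this slide follow from the time it takes to transmit one cell at line rate. A minimal sketch of that arithmetic is shown here; the function name is an illustrative assumption, not part of the lecture.

```python
def arbitration_time_ns(cell_bytes: int, line_rate_gbps: float) -> float:
    """Time to send one cell at line rate, i.e. the budget for one arbitration decision."""
    bits = cell_bytes * 8
    # bits divided by Gbit/s gives nanoseconds
    return bits / line_rate_gbps

for rate in (10, 160):
    print(f"{rate} Gbps: {arbitration_time_ns(64, rate):.1f} ns")
```

With 64-byte cells this gives 51.2 ns at 10 Gbps and 3.2 ns at 160 Gbps, matching the slide.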

CSIT560 By M. Hamdi 11 Desirable Characteristics for Router Architecture Ideal: OQ 100% throughput Minimum delay Maintains packet order Necessary: able to regularly connect any input to any output What if the world was perfect? Assume Bernoulli iid uniform arrival traffic...

CSIT560 By M. Hamdi 12 Round-Robin Scheduling Uniform & non-bursty traffic => 100% throughput Problem: traffic is non-uniform & bursty

CSIT560 By M. Hamdi 13 Two-Stage Switch (I) [figure: external inputs 1..N, first round-robin stage, internal inputs 1..N, second round-robin stage, external outputs 1..N]

CSIT560 By M. Hamdi 14 Two-Stage Switch (I) [same figure; the first round-robin stage performs load balancing]

CSIT560 By M. Hamdi 15 Two-Stage Switch Characteristics 100% throughput Problem: unbounded mis-sequencing [figure: external inputs 1..N, cyclic shift, internal inputs 1..N, external outputs 1..N]

CSIT560 By M. Hamdi 16 Two-Stage Switch (II) New: N^3 instead of N^2

CSIT560 By M. Hamdi 17 Expanding VOQ Structure Solution: expand VOQ structure by distinguishing among switch inputs a b

CSIT560 By M. Hamdi 18 What is being done in practice (Cisco for example) They want schedulers that achieve 100% throughput and very low delay (Like MWM) They want it to be as simple as iSLIP in terms of hardware implementation Is there any solution to this !!!!!

CSIT560 By M. Hamdi 19 Typical Performance of ISLIP-like Algorithms PIM with 4 iterations

CSIT560 By M. Hamdi 20 What is being done in practice (Cisco for example)
Company | Switching Capacity | Switch Architecture | Fabric Overspeed
Agere | 40 Gbit/s-2.5 Tbit/s | Arbitrated crossbar | 2x
AMCC | Gbit/s | Shared memory | 1.0x
AMCC | 40 Gbit/s-1.2 Tbit/s | Arbitrated crossbar | 1-2x
Broadcom | Gbit/s | Buffered crossbar | 1-4x
Cisco | Gbit/s | Arbitrated crossbar | 2x

CSIT560 By M. Hamdi 21 Can we make these scheduling algorithms simpler? Using a Simpler Architecture

CSIT560 By M. Hamdi 22 Buffered Crossbar Switches A buffered crossbar switch is a switch with a buffered fabric (memory inside the crossbar). A pure buffered crossbar switch architecture has buffering only inside the fabric and none anywhere else. Because of the HOL blocking problem, VOQs are used on the input side.

CSIT560 By M. Hamdi 23 Buffered Crossbar Architecture [figure: input cards 1..N, each with VOQs 1..N and an arbiter; a crossbar with internal buffers; output cards 1..N; data and flow-control paths]

CSIT560 By M. Hamdi 24 Scheduling Process Scheduling is divided into three steps: –Input scheduling: each input selects, according to some policy, one cell from the HoL of an eligible queue and sends it to the corresponding internal buffer. –Output scheduling: each output selects, according to some policy, one of the cells buffered in its crossbar column to be delivered to the output port. –Delivery notifying: for each delivered cell, the corresponding input is informed of the internal buffer status.

CSIT560 By M. Hamdi 25 Advantages Total independence between input and output arbiters (distributed design) (1/N complexity as compared to centralized schedulers) Performance of Switch is much better (because there is much less output contention) – a combination of IQ and OQ switches Disadvantage: Crossbar is more complicated

CSIT560 By M. Hamdi 26 I/O Contention Resolution

CSIT560 By M. Hamdi 27 I/O Contention Resolution

CSIT560 By M. Hamdi 28 The Round Robin Algorithm: InRr-OutRr –Input scheduling: InRr (Round-Robin). Each input selects the next eligible VOQ, starting from its highest-priority pointer, and sends its HoL packet to the internal buffer. –Output scheduling: OutRr (Round-Robin). Each output selects the next nonempty internal buffer, starting from its highest-priority pointer, and sends its cell to the output link.
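The two round-robin steps can be sketched as one time slot of a small simulation. This is only an illustration under stated assumptions: `voq[i][j]` and `xb[i][j]` are hypothetical occupancy counters, each crosspoint buffer holds one cell, a pointer advances to one location beyond the chosen entry, and a cell may enter and leave the crossbar in the same slot. Delivery notification is implicit because inputs read `xb` directly.

```python
def rr_pick(candidates, pointer, n):
    """Return the first index at or after pointer (cyclically) that is in candidates."""
    for k in range(n):
        idx = (pointer + k) % n
        if idx in candidates:
            return idx
    return None

def rr_rr_slot(voq, xb, in_ptr, out_ptr, capacity=1):
    """Run one InRr-OutRr time slot; returns the (input, output) cells that departed."""
    n = len(voq)
    # Input scheduling: an eligible VOQ is non-empty and its crosspoint buffer has room.
    for i in range(n):
        eligible = {j for j in range(n) if voq[i][j] > 0 and xb[i][j] < capacity}
        j = rr_pick(eligible, in_ptr[i], n)
        if j is not None:
            voq[i][j] -= 1
            xb[i][j] += 1
            in_ptr[i] = (j + 1) % n   # assumed pointer update rule
    # Output scheduling: each output serves one non-empty buffer in its column.
    departures = []
    for j in range(n):
        nonempty = {i for i in range(n) if xb[i][j] > 0}
        i = rr_pick(nonempty, out_ptr[j], n)
        if i is not None:
            xb[i][j] -= 1
            out_ptr[j] = (i + 1) % n  # assumed pointer update rule
            departures.append((i, j))
    return departures

voq = [[1, 1], [1, 0]]
xb = [[0, 0], [0, 0]]
in_ptr, out_ptr = [0, 0], [0, 0]
print(rr_rr_slot(voq, xb, in_ptr, out_ptr))
print(rr_rr_slot(voq, xb, in_ptr, out_ptr))
```

Note that each input and each output decides using only its own pointer and its own row or column of buffers, which is the independence between arbiters that the deck lists as the main advantage.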

CSIT560 By M. Hamdi 29 Input Scheduling (InRr.)

CSIT560 By M. Hamdi 30 Output Scheduling (OutRr.)

CSIT560 By M. Hamdi 31 Output pointer update + notification delivery

CSIT560 By M. Hamdi 32 Performance study Delay/throughput under Bernoulli Uniform and Bursty Uniform Stability performance:

CSIT560 By M. Hamdi 33 Bernoulli Uniform Arrivals

CSIT560 By M. Hamdi 34 Bursty Uniform Arrivals

CSIT560 By M. Hamdi 35 Scheduling Process Because the arbitration is simple: –We can afford to have algorithms based on weights, for example (LQF, OCF). –We can afford to have algorithms that provide QoS

CSIT560 By M. Hamdi 36 Buffered Crossbar Solution: Scheduler The algorithm MVF-RR is composed of two parts: –Input scheduler – MVF (most vacancies first) Each input selects the column of internal buffers (destined to the same output) where there are most vacancies (non-full buffers). –Output scheduler – Round-robin Each output chooses the internal buffer which appears next on its static round-robin schedule from the highest priority one and updates the pointer to 1 location beyond the chosen one.
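A minimal sketch of the MVF input decision follows. The data layout (`voq`, `xb`), the eligibility rule (non-empty VOQ whose own crosspoint buffer has room) and the tie-break toward the lowest output index are assumptions made for illustration; the slide does not specify them.

```python
def mvf_select(i, voq, xb, capacity=1):
    """Pick the output column with the most non-full internal buffers for input i."""
    n = len(voq)
    # Assumed eligibility: input i has a cell for j and its buffer xb[i][j] has room.
    eligible = [j for j in range(n) if voq[i][j] > 0 and xb[i][j] < capacity]
    if not eligible:
        return None

    def vacancies(j):
        # Number of non-full buffers in column j (all buffers destined to output j).
        return sum(1 for r in range(n) if xb[r][j] < capacity)

    # Most vacancies first; ties go to the lowest output index (assumed).
    return max(eligible, key=lambda j: (vacancies(j), -j))

voq = [[1, 1, 0], [0, 0, 0], [1, 1, 1]]
xb = [[0, 0, 0], [1, 0, 0], [1, 1, 0]]
print(mvf_select(0, voq, xb))
```

In this example column 0 has one vacancy and column 1 has two, so input 0 chooses output 1.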

CSIT560 By M. Hamdi 37 Buffered Crossbar Solution: Scheduler The algorithm ECF-RR is composed of two parts: –Input scheduler – ECF (empty column first) Each input selects the first empty column of internal buffers (destined to the same output). If there is no empty column, it selects on a round-robin basis. –Output scheduler – Round-robin Each output chooses the internal buffer which appears next on its static round-robin schedule from the highest priority one and updates the pointer to 1 location beyond the chosen one.

CSIT560 By M. Hamdi 38 Buffered Crossbar Solution: Scheduler The algorithm RR-REMOVE is composed of two parts: –Input scheduler – Round-robin (with remove-request signal sending) Each input chooses the non-empty VOQ which appears next on its static round-robin schedule from the highest priority one and updates the pointer to 1 location beyond the chosen one. It then sends out at most one remove-request signal to outputs. –Output scheduler – REMOVE For each output, if it receives any remove-request signals, it chooses one of them based on its highest priority pointer and removes the cell. If no signal is received, it does simple round-robin arbitration.

CSIT560 By M. Hamdi 39 Buffered Crossbar Solution: Scheduler The algorithm ECF-REMOVE is composed of two parts: –Input scheduler – ECF (with remove-request signal sending) Each input selects the first empty column of internal buffers (destined to the same output). If there is no empty column, it selects on a round-robin basis. It then sends out at most one remove-request signal to outputs. –Output scheduler – REMOVE For each output, if it receives any remove-request signals, it chooses one of them based on its highest priority pointer and removes the cell. If no signal is received, it does simple round-robin arbitration.

CSIT560 By M. Hamdi 40 Hardware Implementation of ECF-RR: An Input Scheduling Block [figure: selectors 0 to N-1 feeding a round-robin arbiter; signals shown: any grant, grants, highest priority pointer, arbitration results]

CSIT560 By M. Hamdi 41 Performance Evaluation: Simulation Study Uniform Traffic

CSIT560 By M. Hamdi 42 Performance Evaluation: Simulation Study Improvement Factor: ECF-REMOVE over RR-RR, by load
Improvement Percentage: 1%, 3%, 6%, 13%, 17%, 12%
Normalized Improvement Percentage: 1%, 3%, 6%, 12%, 15%, 11%

CSIT560 By M. Hamdi 43 Performance Evaluation : Simulation Study Bursty Traffic

CSIT560 By M. Hamdi 44 Performance Evaluation: Simulation Study Improvement Factor: ECF-REMOVE over RR-RR, by load
Improvement Percentage: 10%, 13%, 16%, 20%, 22%, 18%, 11%
Normalized Improvement Percentage: 9%, 12%, 14%, 16%, 18%, 16%, 10%

CSIT560 By M. Hamdi 45 Performance Evaluation : Simulation Study Hotspot Traffic

CSIT560 By M. Hamdi 46 Performance Evaluation: Simulation Study Improvement Factor: ECF-REMOVE over RR-RR, by load
Improvement Percentage: 0.2%, 0.3%, 0.5%, 0.8%, 1%, 0.7%
Normalized Improvement Percentage: 0.2%, 0.3%, 0.5%, 0.8%, 1%, 0.7%

CSIT560 By M. Hamdi 47 Quality of Service Mechanisms for Switches/Routers and the Internet

CSIT560 By M. Hamdi 48 VOQ Algorithms and Delay But, delay is key –Because users don’t care about throughput alone –They care (more) about delays –Delay = QoS (= $ for the network operator) Why is delay difficult to approach theoretically? –Mainly because it is a statistical quantity –It depends on the traffic statistics at the inputs –It depends on the particular scheduling algorithm used. The last point makes it difficult to analyze delays in input-queued switches. For example, in VOQ switches it is almost impossible to give any guarantees on delay.

CSIT560 By M. Hamdi 49 VOQ Algorithms and Delay This does not mean that we cannot have an algorithm that can do that. It means that none exists at this moment. For this exact reason, almost all quality of service schemes (whether for delay or bandwidth guarantees) assume that you have an output-queued switch [figure: links 1 to 4, each with an ingress and an egress side]

CSIT560 By M. Hamdi 50 QoS Router Policer Classifier Policer Classifier Per-flow Queue Scheduler Per-flow Queue Scheduler Per-flow Queue shaper Queue management

CSIT560 By M. Hamdi 51 VOQ Algorithms and Delay WHY: Because an OQ switch has no “fabric” scheduling/arbitration algorithm.  Delay simply depends on traffic statistics  Researchers have shown that you can provide a lot of QoS algorithms (like WFQ) using a single server and based on the traffic statistics But, OQ switches are extremely expensive to build –Memory bandwidth requirement is very high –These QoS scheduling algorithms have little practical significance for scalable and high-performance switches/routers.

CSIT560 By M. Hamdi 52 Output Queueing The “ideal”

CSIT560 By M. Hamdi 53 How to get good delay cheaply? Enter speedup… –The fabric speedup for an IQ switch equals 1 (memory bandwidth = 2R) –The fabric speedup for an OQ switch equals N (memory bandwidth = (N+1)R) –Suppose we consider switches with a fabric speedup of S, 1 < S << N –Such a switch will require buffers both at the input and the output; call these combined input- and output-queued (CIOQ) switches. Such switches could help if, with very small values of S, we get the performance (both delay and throughput) of an OQ switch

CSIT560 By M. Hamdi 54 A CIOQ switch Consist of –An (internally non-blocking, e.g. crossbar) fabric with speedup S > 1 –Input and output buffers –A scheduler to determine matchings

CSIT560 By M. Hamdi 55 A CIOQ switch For concreteness, suppose S = 2. The operation of the switch consists of –Transferring no more than 2 cells from (to) each input (output) –Logically, we will think of each time slot as consisting of two phases –Arrivals to (departures from) switch occur at most once per time slot –The transfer of cells from inputs to outputs can occur in each phase

CSIT560 By M. Hamdi 56 Using Speedup

CSIT560 By M. Hamdi 57 Performance of CIOQ switches Now that we have a higher speedup, do we get a handle on delay? –Can we say something about delay (e.g., every packet from a given flow should experience a delay below 15 msec)? –There is one way of doing this: competitive analysis; the idea is to compete with the performance of an OQ switch

CSIT560 By M. Hamdi 58 Intuition Speedup = 1: fabric throughput = 0.58, average input queue = too large. Speedup = 2: fabric throughput = 1.16, average input queue = 6.25.

CSIT560 By M. Hamdi 59 Intuition (continued) Speedup = 3: fabric throughput = 1.74, average input queue = 1.35. Speedup = 4: fabric throughput = 2.32, average input queue = 0.75.

CSIT560 By M. Hamdi 60 Performance of CIOQ switches The setup –Under arbitrary, but identical inputs (packet-by-packet) –Is it possible to replace an OQ switch by a CIOQ switch and schedule the CIOQ switch so that the outputs are identical packet-by-packet, that is, to exactly mimic an OQ switch? –If yes, what is the scheduling algorithm?

CSIT560 By M. Hamdi 61 What is exact mimicking? Apply same inputs to an OQ and a CIOQ switch - - packet by packet Obtain same outputs - - packet by packet

CSIT560 By M. Hamdi 62 Consequences Suppose, for now, that a CIOQ is competitive wrt an OQ switch. Then –We get perfect emulation of an OQ switch –This means we inherit all its throughput and delay properties –Most importantly – all QoS scheduling algorithms originally designed for OQ switches can be directly used on a CIOQ switch –But, at the cost of introducing a scheduling algorithm – which is the key

CSIT560 By M. Hamdi 63 Emulating OQ Switches with CIOQ Consider an N x N switch with (integer) speedup S > 1 –We’re going to see if this switch can emulate an OQ switch We’ll apply the same inputs, cell-by-cell, to both switches –We’ll assume that the OQ switch sends out packets in FIFO order –And we’ll see if the CIOQ switch can match cells on the output side

CSIT560 By M. Hamdi 64 Key concept: Urgency Urgency of a cell at any time = its departure time - current time. It indicates how many time slots remain before this cell departs the OQ switch. This value is decremented after each time slot. When the value reaches 0, the cell must depart (it is at the HoL of the output queue).

CSIT560 By M. Hamdi 65 Key concept: Urgency Algorithm: Most urgent cell first (MUCF). In each “phase”: 1. Outputs try to get their most urgent cells from inputs. 2. Each input grants to the output whose cell is most urgent. In case of ties, output i takes priority over output i + k. 3. Loser outputs try to obtain their next most urgent cell from another (unmatched) input. 4. When no more matchings are possible, cells are transferred.
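The four MUCF steps can be sketched as a small matching routine for one phase. This is an illustration, not the lecture's implementation: the `cells` layout, 0-based port numbers, and the rule that an output choosing between equally urgent cells prefers the lowest input index are assumptions.

```python
def urgency(departure_time, now):
    """Urgency as defined on the previous slide: OQ departure time minus current time."""
    return departure_time - now

def mucf_phase(cells):
    """cells[i] is a list of (output, urgency) pairs held at input i.
    Returns one phase's matching as {input: output}."""
    # best[(out, inp)] = urgency of the most urgent cell for `out` held at `inp`
    best = {}
    for inp, queue in enumerate(cells):
        for out, urg in queue:
            if (out, inp) not in best or urg < best[(out, inp)]:
                best[(out, inp)] = urg
    matched_in, matched_out = {}, set()
    while True:
        # Steps 1 and 3: every unmatched output requests the unmatched input
        # that holds its most urgent remaining cell.
        requests = {}
        for out in sorted({o for o, _ in best} - matched_out):
            options = [(u, i) for (o, i), u in best.items()
                       if o == out and i not in matched_in]
            if options:
                u, i = min(options)
                requests.setdefault(i, []).append((u, out))
        if not requests:
            break  # Step 4: no more matchings are possible
        # Step 2: each input grants the most urgent request; ties go to the
        # lower-numbered output.
        for i, reqs in requests.items():
            _, out = min(reqs)
            matched_in[i] = out
            matched_out.add(out)
    return matched_in

# Two outputs tie for input 0; the losing output falls back to input 1.
print(mucf_phase([[(0, 1), (1, 1)], [(1, 3)]]))
```

Every pass of the loop matches at least one input, so a phase ends after at most N passes.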

CSIT560 By M. Hamdi 66 Key concept: Urgency - Example At the beginning of phase 1, both outputs 1 and 2 request input 1 to obtain their most urgent cells Since there is a tie, then input 1 grants to output 1 (give it to least port #). Output 2 proceeds to get its next most urgent cell (from input 2 and has urgency of 3)

CSIT560 By M. Hamdi 67 Implementing MUCF The way in which MUCF matches inputs to outputs is similar to the “stable marriage problem” (SMP) The SMP finds “stable” matchings in bipartite graphs –There are N women and N men –Each woman (man) ranks each man (woman) in order of preference for marriage

CSIT560 By M. Hamdi 68 An example Consider the example we have already seen Executing GSA… –With men proposing we get the matching (1, 1), (2, 4), (3, 2), (4, 3) – this takes 7 proposals (iterations) –With women proposing we get the matching (1, 1), (2, 3), (3, 2), (4, 4) – this takes 7 proposals (iterations) –Both matchings are stable –The first is man-optimal – men get the best partners of any stable matching –Likewise the second is woman-optimal
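For reference, the Gale-Shapley algorithm (GSA) can be sketched as below. The preference lists in the usage example are hypothetical, chosen only to exercise the code; they are not the preference lists behind the matchings quoted on this slide, which the transcript does not include.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return ({proposer: receiver}, number_of_proposals) for a proposer-optimal stable matching.
    Each preference list ranks the other side from most to least preferred."""
    n = len(proposer_prefs)
    # rank[r][p] = position of proposer p in receiver r's list (lower is better)
    rank = [{p: pos for pos, p in enumerate(prefs)} for prefs in receiver_prefs]
    next_choice = [0] * n          # next index each proposer will try
    engaged_to = [None] * n        # engaged_to[r] = proposer currently held by r
    free = list(range(n))
    proposals = 0
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        proposals += 1
        current = engaged_to[r]
        if current is None:
            engaged_to[r] = p
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p      # r trades up; the old partner is free again
            free.append(current)
        else:
            free.append(p)         # r rejects p
    return {p: r for r, p in enumerate(engaged_to)}, proposals

men = [[0, 1, 2], [0, 2, 1], [1, 0, 2]]      # hypothetical preferences
women = [[1, 0, 2], [0, 1, 2], [0, 1, 2]]    # hypothetical preferences
matching, count = gale_shapley(men, women)
print(matching, count)
```

Swapping the two arguments gives the matching that is optimal for the other side, which is the man-optimal versus woman-optimal distinction made on the slide.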

CSIT560 By M. Hamdi 69 Theorem A CIOQ switch with a speedup of 4 operating under the MUCF algorithm exactly mimics a FIFO output-queued switch. This is true even for non-FIFO OQ scheduling schemes (e.g., WFQ, strict priority, etc.). Similar results can be achieved with S = 2.

CSIT560 By M. Hamdi 70 Implementation - a closer look Difficulty of implementation: –Estimating urgency –Matching process: too many iterations? Estimating urgency depends on what is being emulated: –FIFO, strict priorities: no problem –WFQ, etc.: problems (and communicating this info among inputs and outputs)

CSIT560 By M. Hamdi 71 QoS Scheduling Algorithms

CSIT560 By M. Hamdi 72 Principles for QOS Guarantees Consider a phone application at 1Mbps and an FTP application sharing a 1.5 Mbps link. –bursts of FTP can congest the router and cause audio packets to be dropped. –want to give priority to audio over FTP PRINCIPLE 1: Marking of packets is needed for router to distinguish between different classes; and new router policy to treat packets accordingly

CSIT560 By M. Hamdi 73 Principles for QOS Guarantees (more) Applications misbehave (audio sends packets at a rate higher than 1Mbps assumed above); PRINCIPLE 2: provide protection (isolation) for one class from other classes (Fairness)

CSIT560 By M. Hamdi 74 QoS Differentiation: Two options Stateful (per flow) IETF Integrated Services (Intserv)/RSVP Stateless (per class) IETF Differentiated Services (Diffserv)

CSIT560 By M. Hamdi 75 The Building Blocks: May contain more functions Classifier Shaper Policer Scheduler Dropper

CSIT560 By M. Hamdi 76 QoS Mechanisms
Admission Control –Determines whether the flow can/should be allowed to enter the network.
Packet Classification –Classifies the data based on admission control for desired treatment through the network.
Traffic Policing –Measures the traffic to determine if it is out of profile. Packets that are determined to be out-of-profile can be dropped or marked differently (so they may be dropped later if needed).
Traffic Shaping –Provides some buffering, therefore delaying some of the data, to make sure the traffic fits into the profile (may affect only bursts, or all traffic to make it similar to Constant Bit Rate).
Queue Management –Determines the behavior of data within a queue. Parameters include queue depth and drop policy.
Queue Scheduling –Determines how different queues empty onto the outbound link.

CSIT560 By M. Hamdi 77 QoS Router Policer Classifier Policer Classifier Per-flow Queue Scheduler Per-flow Queue Scheduler Per-flow Queue shaper Queue management

CSIT560 By M. Hamdi 78 Queue Scheduling Algorithms

CSIT560 By M. Hamdi 79 Scheduling at the output link of an OQ Switch Sharing always results in contention. A scheduling discipline resolves contention: it decides when and what packet to send on the output link –Usually implemented at the output interface –Scheduling is a key to fairly sharing resources and providing performance guarantees [figure: links 1 to 4, each with an ingress and an egress side]

CSIT560 By M. Hamdi 80 Output Scheduling scheduler Allocating output bandwidth Controlling packet delay

CSIT560 By M. Hamdi 81 Types of Queue Scheduling Strict Priority –Empties the highest priority non-empty queue first, before servicing lower priority queues. It can cause starvation of lower priority queues. Round Robin –Services each queue by emptying a certain amount of data and then going to the next queue in order. Weighted Fair Queuing (WFQ) –Empties an amount of data from a queue based on the relative weight for the queue (driven by reserved bandwidth) before servicing the next queue. Earliest Deadline First –Determines the latest time a packet must leave to meet the delay requirements and services the queues in that order.

CSIT560 By M. Hamdi 82 Scheduling: Deterministic Priority Packet is served from a given priority level only if no packets exist at higher levels (multilevel priority with exhaustive service). Highest level gets lowest delay. Watch out for starvation! Usually map priority levels to delay classes, in priority order: low bandwidth urgent messages, realtime, non-realtime.
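Multilevel priority with exhaustive service can be sketched with one FIFO per level. The class name and the convention that level 0 is the highest priority are assumptions made for illustration.

```python
from collections import deque

class StrictPriorityScheduler:
    """One FIFO per priority level; level 0 is the highest priority (assumed convention)."""

    def __init__(self, levels):
        self.queues = [deque() for _ in range(levels)]

    def enqueue(self, level, packet):
        self.queues[level].append(packet)

    def dequeue(self):
        # Serve a level only if every higher level is empty.
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None  # nothing to send

sched = StrictPriorityScheduler(levels=3)
sched.enqueue(2, "non-realtime")
sched.enqueue(0, "urgent")
sched.enqueue(1, "realtime")
print([sched.dequeue() for _ in range(4)])
```

The starvation risk is visible in `dequeue`: as long as level 0 keeps receiving packets, the loop never reaches the lower levels.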

CSIT560 By M. Hamdi 83 Scheduling: No Classification FIFO: first come, first served. This is the simplest possible discipline, but we cannot provide any guarantees. With FIFO queues, if the depth of the queue is not bounded, there is very little that can be done. We can perform preferential dropping. We can use other service disciplines on a single queue (e.g., EDF).

CSIT560 By M. Hamdi 84 Scheduling: Class Based Queuing At each output port, packets of the same class are queued at distinct queues. Service disciplines within each queue can vary (e.g., FIFO, EDF, etc.). Usually it is FIFO Service disciplines between classes can vary as well (e.g., strict priority, some kind of sharing, etc.) Class 1 Class 2 Class 3 Class 4 Class based scheduling

CSIT560 By M. Hamdi 85 Per Flow Packet Scheduling Each flow is allocated a separate “virtual queue” –Lowest level of aggregation –Service disciplines between the flows vary (FIFO, SP, etc.) [figure: classifier, per-flow queues (flow 1, flow 2, …, flow n) with buffer management, scheduler]

CSIT560 By M. Hamdi 86 Per-flow classification Sender Receiver

CSIT560 By M. Hamdi 87 Per-flow buffer management Sender Receiver

CSIT560 By M. Hamdi 88 Per-flow scheduling Sender Receiver

CSIT560 By M. Hamdi 89 The problems caused by FIFO queues in routers 1.In order to maximize its chances of success, a source has an incentive to maximize the rate at which it transmits. 2.(Related to #1) When many flows pass through it, a FIFO queue is “unfair” – it favors the most greedy flow. 3.It is hard to control the delay of packets through a network of FIFO queues. Fairness Delay Guarantees

CSIT560 By M. Hamdi 90 Round Robin (RR) RR avoids starvation All sessions have the same weight and the same packet length: A:B:C: Round #2 … Round #1

CSIT560 By M. Hamdi 91 RR with variable packet length A:B:C: Round #1Round #2 … But the Weights are equal !!!

CSIT560 By M. Hamdi 92 Solution… A:B:C: #1#2#3 … #4

CSIT560 By M. Hamdi 93 Weighted Round Robin (WRR) W_A = 3, W_B = 1, W_C = 4; round length = 8 [figure: rounds #1, #2, …]

CSIT560 By M. Hamdi 94 WRR – non Integer weights W_A = 1.4, W_B = 0.2, W_C = 0.8. Normalize: W_A = 7, W_B = 1, W_C = 4; round length = 12

CSIT560 By M. Hamdi 95 Weighted round robin Serve a packet from each non-empty queue in turn –Can provide protection against starvation –It is easy to implement in hardware Unfair if packets are of different length or weights are not equal What is the Solution? Different weights, fixed packet size –serve more than one packet per visit, after normalizing to obtain integer weights
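With fixed-size packets and integer weights, one WRR round can be sketched as follows. The function name, the queue labels and the choice to serve each queue's whole quota before moving on are assumptions made for illustration.

```python
from collections import deque

def wrr_round(queues, weights):
    """Serve one round: up to weights[name] packets from each queue, in turn."""
    served = []
    for name, weight in weights.items():
        queue = queues[name]
        for _ in range(weight):
            if not queue:
                break  # an empty queue gives up the rest of its quota
            served.append(queue.popleft())
    return served

weights = {"A": 3, "B": 1, "C": 4}
queues = {name: deque(f"{name}{k}" for k in range(1, 6)) for name in weights}
round_1 = wrr_round(queues, weights)
print(len(round_1), round_1)
```

When all three queues are backlogged, the round length is 3 + 1 + 4 = 8 packets, as on the WRR slide.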

CSIT560 By M. Hamdi 96 Problems with Weighted Round Robin Different weights, variable size packets –normalize weights by mean packet size, e.g. weights {0.5, 0.75, 1.0}, mean packet sizes {50, 500, 1500}; normalized weights: {0.5/50, 0.75/500, 1.0/1500} = {0.01, 0.0015, 0.000667}; normalize again to integers: {60, 9, 4}. With variable size packets, need to know mean packet size in advance. Fairness is only provided at time scales larger than the schedule.
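Both normalization steps (fractional weights to integers, and weights divided by mean packet size) can be checked with exact rational arithmetic. This is a sketch; `integer_weights` is a hypothetical helper name.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def _lcm(a, b):
    return a * b // gcd(a, b)

def integer_weights(weights, mean_packet_sizes=None):
    """Scale weights to the smallest integers that keep the same ratios.
    If mean packet sizes are given, each weight is first divided by its size."""
    fracs = [Fraction(str(w)) for w in weights]   # exact decimal parsing
    if mean_packet_sizes is not None:
        fracs = [f / size for f, size in zip(fracs, mean_packet_sizes)]
    common = reduce(_lcm, (f.denominator for f in fracs), 1)
    ints = [int(f * common) for f in fracs]
    divisor = reduce(gcd, ints)
    return [x // divisor for x in ints]

print(integer_weights([1.4, 0.2, 0.8]))                      # [7, 1, 4]
print(integer_weights([0.5, 0.75, 1.0], [50, 500, 1500]))    # [60, 9, 4]
```

The first call reproduces the non-integer WRR example and the second reproduces the {60, 9, 4} result for variable-size packets.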

CSIT560 By M. Hamdi 97 Fairness
[Figure: A and B send through router R1 to C; link rates shown: 10 Mb/s, 100 Mb/s, and 1.1 Mb/s; each flow is shown receiving 0.55 Mb/s.]
What is the “fair” allocation: (0.55 Mb/s, 0.55 Mb/s) or (0.1 Mb/s, 1 Mb/s)?
A flow is, e.g., an HTTP flow with a given (IP SA, IP DA, TCP SP, TCP DP).

CSIT560 By M. Hamdi 98 Fairness
[Figure: A and B send through router R1 towards C and D; link rates shown: 10 Mb/s, 100 Mb/s, 1.1 Mb/s, and 0.2 Mb/s.]
What is the “fair” allocation?

CSIT560 By M. Hamdi 99 Max-Min Fairness
The min of the flows should be as large as possible
Max-Min fairness for single resource:
–Bottlenecked (unsatisfied) connections share the residual bandwidth equally
–Their share is >= the share held by the connections not constrained by this bottleneck
[Figure: C = 10, demands F1 = 25 and F2 = 6; allocations F1′ = 5 and F2′ = 5.]

CSIT560 By M. Hamdi 100 Max-Min Fairness An allocation is fair if it satisfies max-min fairness –each connection gets no more than what it wants –the excess, if any, is equally shared

CSIT560 By M. Hamdi 101 Max-Min Fairness: a common way to allocate flows
N flows share a link of rate C. Flow f wishes to send at rate W(f), and is allocated rate R(f).
1. Pick the flow, f, with the smallest requested rate among the flows not yet allocated.
2. If W(f) ≤ C/N, then set R(f) = W(f).
3. If W(f) > C/N, then set R(f) = C/N.
4. Set N = N – 1. C = C – R(f).
5. If N > 0, go to 1.

CSIT560 By M. Hamdi 102 Max-Min Fairness: an example
[Figure: four flows share the link of rate C out of router R1, with W(f1) = 0.1, W(f2) = 0.5, W(f3) = 10, W(f4) = 5. The arithmetic below implies C = 1.]
Round 1: Set R(f1) = 0.1
Round 2: Set R(f2) = 0.9/3 = 0.3
Round 3: Set R(f4) = 0.6/2 = 0.3
Round 4: Set R(f3) = 0.3/1 = 0.3
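The allocation procedure and the four-flow example can be sketched as below. The function name is hypothetical, and the link rate of 1.0 is an assumption inferred from the example's arithmetic (0.9 remains after f1 is given 0.1).

```python
def max_min_allocation(demands, capacity):
    """Return {flow: allocated rate} using the smallest-demand-first procedure."""
    remaining = capacity
    n = len(demands)
    rates = {}
    # Visit flows in increasing order of requested rate.
    for flow, want in sorted(demands.items(), key=lambda kv: kv[1]):
        fair_share = remaining / n
        # A flow gets its demand if it fits, otherwise an equal share of what is left.
        rates[flow] = min(want, fair_share)
        remaining -= rates[flow]
        n -= 1
    return rates

demands = {"f1": 0.1, "f2": 0.5, "f3": 10, "f4": 5}
rates = max_min_allocation(demands, capacity=1.0)  # capacity is an assumed value
print(rates)
```

Sorting once up front is equivalent to repeatedly picking the smallest remaining demand, because demands do not change during the procedure.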

CSIT560 By M. Hamdi 103 Max-Min Fairness How can an Internet router “allocate” different rates to different flows? First, let’s see how a router can allocate the “same” rate to different flows…

CSIT560 By M. Hamdi 104 Fair Queueing
1. Packets belonging to a flow are placed in a FIFO. This is called “per-flow queueing”.
2. FIFOs are scheduled one bit at a time, in a round-robin fashion.
3. This is called Bit-by-Bit Fair Queueing.
[Figure: flows 1 to N pass through classification into per-flow FIFOs, then bit-by-bit round-robin scheduling.]

CSIT560 By M. Hamdi 105 Weighted Bit-by-Bit Fair Queueing
Likewise, flows can be allocated different rates by servicing a different number of bits for each flow during each round.
[Figure: router R1 with a link of rate C = 1 shared as R(f1) = 0.1, R(f2) = 0.3, R(f3) = 0.3, R(f4) = 0.3.]
Order of service for the four queues: … f1, f2, f2, f2, f3, f3, f3, f4, f4, f4, f1, …
Also called “Generalized Processor Sharing (GPS)”

CSIT560 By M. Hamdi 106 Understanding bit-by-bit WFQ: 4 queues sharing 4 bits/sec of bandwidth, equal weights (1:1:1:1)
[Figure, three animation frames. Queued packet lengths in bits: A1 = 4, B1 = 3, C1 = 1, C2 = 1, D1 = 1, D2 = 2.]
Round 1 serves A1, B1, C1, D1. D1 and C1 depart at R=1. A2 = 2 and C3 = 2 arrive.
Round 2 serves A1, B1, C2, D2. C2 departs at R=2.

CSIT560 By M. Hamdi 107 Understanding bit-by-bit WFQ: 4 queues sharing 4 bits/sec of bandwidth, equal weights (1:1:1:1)
[Figure, continued from the previous slide.]
Round 3 serves A1, B1, C3, D2. D2 and B1 depart at R=3.
A1 departs at R=4.
C3 and A2 depart at R=6.
Departure order for packet-by-packet WFQ: sort packets by finish round, giving C1, D1, C2, B1, D2, A1, C3, A2.

CSIT560 By M. Hamdi 108 Understanding bit-by-bit WFQ: 4 queues sharing 4 bits/sec of bandwidth, weights 3:2:2:1
[Figure, three animation frames. Queued packet lengths in bits: A1 = 4, B1 = 3, C1 = 1, C2 = 1, D1 = 1, D2 = 2; A2 = 2 and C3 = 2 arrive later.]
Round 1 serves A1, B1, C1, C2, D1. D1, C2, and C1 depart at R=1.

CSIT560 By M. Hamdi 109 Understanding bit-by-bit WFQ: 4 queues sharing 4 bits/sec of bandwidth, weights 3:2:2:1
[Figure, continued from the previous slide.]
Round 2 serves A1, A2, B1. B1, A2, and A1 depart at R=2.
C3 and D2 are served next; the frame is labelled “D2, C3 Depart at R=2”.
Departure order for packet-by-packet WFQ: sort packets by finish time, giving C1, C2, D1, A1, A2, B1, C3, D2.

CSIT560 By M. Hamdi 110 Packetized Weighted Fair Queueing (WFQ)
Problem: We need to serve a whole packet at a time.
Solution:
1. Determine the time at which a packet, p, would complete if we served the flows bit-by-bit. Call this the packet’s finishing time, F_p.
2. Serve packets in the order of increasing finishing time.
Also called “Packetized Generalized Processor Sharing (PGPS)”
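The two steps can be sketched for a simplified case. The code below assumes every packet is already queued at time 0, so a packet's finishing round is the previous finishing round of its flow plus its length divided by the flow's weight. Handling later arrivals needs a virtual-time (round number) computation that is deliberately left out. The function name and tuple layout are hypothetical; the packet names and lengths are borrowed from the equal-weights example.

```python
def wfq_order(packets, weights):
    """Return packet names sorted by bit-by-bit finishing round.

    packets -- list of (flow, name, length_bits) in per-flow arrival order
    weights -- dict: flow -> bits served per round
    Simplifying assumption: all packets are queued at time 0.
    """
    last_finish = {}   # finishing round of the previous packet of each flow
    tagged = []
    for seq, (flow, name, length) in enumerate(packets):
        finish = last_finish.get(flow, 0.0) + length / weights[flow]
        last_finish[flow] = finish
        tagged.append((finish, seq, name))
    # Equal finishing rounds are broken by position in the input list.
    return [name for _, _, name in sorted(tagged)]

packets = [("A", "A1", 4), ("B", "B1", 3), ("C", "C1", 1),
           ("C", "C2", 1), ("D", "D1", 1), ("D", "D2", 2)]
order = wfq_order(packets, {"A": 1, "B": 1, "C": 1, "D": 1})
print(order)
```

A full sort is used here for brevity; a linecard would more likely keep the tags in a priority queue so that the smallest F_p can be found without re-sorting.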

CSIT560 By M. Hamdi 111 WFQ is complex
There may be hundreds to millions of flows; the linecard needs to manage a FIFO queue for each flow.
The finishing time must be calculated for each arriving packet.
Packets must be sorted by their departure time.
Most of the effort on QoS scheduling algorithms goes into finding practical algorithms that can approximate WFQ!
[Figure: egress linecard. N packets arrive, F_p is calculated for each, the smallest F_p is found, and that packet departs.]

CSIT560 By M. Hamdi 112 When can we Guarantee Delays?
Theorem: If flows are leaky-bucket constrained and all nodes employ GPS (WFQ), then the network can guarantee worst-case delay bounds to sessions.

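CSIT560 By M. Hamdi 113 Deterministic analysis of a router queue: FIFO case
[Figure: model of a router queue with arrivals A(t), backlog B(t), service rate R, and departures D(t). Plot of cumulative bytes against time: the vertical gap between A(t) and D(t) is the backlog B(t), and the horizontal gap is the FIFO delay d(t).]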

CSIT560 By M. Hamdi 114
[Figure: flows 1 to N are classified into per-flow queues and served by a WFQ scheduler; flow i has arrivals A_i(t), departures D_i(t), and service rate R(f_i). Plot of cumulative bytes against time showing A_1(t), D_1(t), and slope R(f_1).]
Key idea: In general, we don’t know the arrival process. So let’s constrain it.

CSIT560 By M. Hamdi 115 Let’s say we can bound the arrival process
The number of bytes that can arrive in any period of length t is bounded by σ + ρt.
This is called “(σ, ρ) regulation”.
[Figure: cumulative bytes against time; A_1(t) stays below a line with intercept σ and slope ρ.]

CSIT560 By M. Hamdi 116 The leaky bucket “(σ, ρ)” regulator
[Figure: tokens arrive at rate ρ into a token bucket of size σ; packets wait in a packet buffer and are released at one byte (or packet) per token.]
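A regulator of this kind can be sketched as a small class, writing sigma for the token bucket size and rho for the token rate. The class and method names are hypothetical, the bucket is assumed to start full, and time is passed in explicitly so the behaviour is deterministic. The sketch only reports whether a packet conforms; it does not model the packet buffer.

```python
class TokenBucket:
    """Token bucket of size sigma (bytes), refilled at rho tokens per second."""

    def __init__(self, sigma, rho):
        self.sigma = sigma
        self.rho = rho
        self.tokens = sigma   # assumption: the bucket starts full
        self.last = 0.0       # time of the previous update, in seconds

    def conforms(self, now, size):
        """Return True and spend tokens if `size` bytes may pass at time `now`."""
        elapsed = now - self.last
        # Tokens accumulate at rate rho but never beyond the bucket size.
        self.tokens = min(self.sigma, self.tokens + self.rho * elapsed)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False  # a full regulator would hold the packet until tokens accrue

bucket = TokenBucket(sigma=1000, rho=100)
results = [bucket.conforms(0.0, 800),
           bucket.conforms(0.0, 300),
           bucket.conforms(1.0, 300)]
print(results)
```

Over any interval of length t this lets through at most sigma + rho * t bytes, which is the arrival bound the regulator is meant to enforce.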

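CSIT560 By M. Hamdi 117 (σ, ρ) Constrained Arrivals and Minimum Service Rate
[Figure: cumulative bytes against time; A_1(t) is bounded by a line with intercept σ and slope ρ, and D_1(t) has slope R(f_1). d_max is the largest horizontal gap and B_max the largest vertical gap between the two.]
Theorem [Parekh, Gallager ’93]: If flows are leaky-bucket constrained, and routers use WFQ, then end-to-end delay guarantees are possible.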
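For a single node, the two gaps in the figure can be worked out directly. The sketch below assumes a fluid model in which arrivals are bounded by sigma + rho * t and the flow is served at a constant rate of at least R(f_1), with rho no larger than that rate. Under those assumptions the largest backlog is sigma and the largest delay is sigma divided by the service rate. The function name is hypothetical, and this is not the end-to-end bound of the theorem.

```python
def single_node_bounds(sigma, rho, service_rate):
    """Return (d_max, b_max) for one (sigma, rho)-constrained flow at one node.

    Assumes a fluid model and a guaranteed service rate of at least `service_rate`.
    """
    if rho > service_rate:
        # The backlog would grow without bound, so no finite guarantee exists.
        raise ValueError("rho must not exceed the guaranteed service rate")
    d_max = sigma / service_rate   # time needed to drain a full burst
    b_max = sigma                  # a full burst arriving at once
    return d_max, b_max

d_max, b_max = single_node_bounds(sigma=1000, rho=100, service_rate=200)
print(d_max, b_max)
```

The check on rho reflects why a minimum service rate matters: once the sustained arrival rate exceeds the service rate, neither bound holds.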