CSIT560 By M. Hamdi 1 Packet Scheduling/Arbitration in Virtual Output Queues and Others.

Presentation transcript:

CSIT560 By M. Hamdi 1 Packet Scheduling/Arbitration in Virtual Output Queues and Others

CSIT560 By M. Hamdi 2 Key Characteristics in Designing Internet Switches and Routers
–Scalability in terms of line rates
–Scalability in terms of number of interfaces (port numbers)

CSIT560 By M. Hamdi 3 Switch fabric chips comparison d= d=47959

CSIT560 By M. Hamdi 4 Head-of-Line Blocking Blocked!

CSIT560 By M. Hamdi 5

6

7 Crossbar Switches: Virtual Output Queues
Virtual Output Queues:
–At each input port, there are N queues, each associated with an output port
–Only one packet can go from an input port at a time
–Only one packet can be received by an output port at a time
It retains the scalability of FIFO input-queued switches.
It eliminates the HoL problem of FIFO input queues.

CSIT560 By M. Hamdi 8 Virtual Output Queues

CSIT560 By M. Hamdi 9 Scheduler VOQs VOQs: How Packets Move

CSIT560 By M. Hamdi 10 Crossbar Scheduler in VOQ Architecture Scheduler Memory b/w=2R Can be quite complex!

CSIT560 By M. Hamdi 11 Question: do more lanes help? Answer: it depends on the scheduling. [Figure panels: Head of Line Blocking; VOQs with Bad Scheduling; Good Scheduling?] Ayalon: depends on traffic matrix…

CSIT560 By M. Hamdi 12 Crossbar Scheduler in VOQ Architecture. Which packets can I send during each configuration of the crossbar?

CSIT560 By M. Hamdi 13 Switch core architecture [Figure: Port #1 through Port #256, each with a port processor linked over optics, using the LCS protocol, to the crossbar and its scheduler; signals shown: Request, Grant/Credit, Cell Data]

CSIT560 By M. Hamdi 14 Basic Switch Model [Figure: N x N switch with arrivals $A_{ij}(n)$ at input i for output j, VOQ occupancies $L_{11}(n), \ldots, L_{NN}(n)$, crossbar configuration $S(n)$, and departures $D_1(n), \ldots, D_N(n)$]

CSIT560 By M. Hamdi 15 Some definitions 3. Queue occupancies: $L_{11}(n), \ldots, L_{NN}(n)$

CSIT560 By M. Hamdi 16 Some possible performance goals When traffic is admissible

CSIT560 By M. Hamdi 17 VOQ Switch Scheduling
The VOQ switch scheduling can be represented by a bipartite graph:
–The left-hand side nodes of the bipartite graph are the input ports
–The right-hand side nodes of the bipartite graph are the output ports
–The edges between the nodes are requests for packet transmission between input ports and output ports
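As a minimal sketch of the bipartite representation above, the request edges can be read straight off the VOQ occupancies. The function name `request_graph` and the 3 x 3 occupancy matrix are hypothetical and chosen only for illustration.

```python
def request_graph(occupancy):
    """Return the request edges (input, output) for every nonempty VOQ.

    occupancy[i][j] is the number of cells queued at input i for output j.
    """
    return [(i, j)
            for i, row in enumerate(occupancy)
            for j, cells in enumerate(row)
            if cells > 0]


# Hypothetical 3 x 3 switch state (an assumption, not taken from the slides).
occupancy = [[2, 0, 1],
             [0, 0, 3],
             [1, 0, 0]]
edges = request_graph(occupancy)
print(edges)  # [(0, 0), (0, 2), (1, 2), (2, 0)]
```

Any scheduler discussed later selects a subset of these edges in which no input and no output appears twice.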

CSIT560 By M. Hamdi 18 Maximum size bipartite match. Intuition: maximizes instantaneous throughput. [Figure: “Request” graph with an edge from input i to output j whenever $L_{ij}(n) > 0$; a bipartite match; the maximum size match]

CSIT560 By M. Hamdi 19 Network flows and bipartite matching. Finding a maximum size bipartite matching is equivalent to solving a network flow problem with capacities and flows of size “1”. [Figure: bipartite graph with a source s joined to the left-hand nodes and a sink t joined to the right-hand nodes]

CSIT560 By M. Hamdi 20 Network Flows
Let G=[V,E] be a directed graph with capacity cap(v,w) on edge [v,w].
A flow is an (integer) function, f, that is chosen for each edge so that f(v,w) <= cap(v,w).
We wish to maximize the flow allocation.
[Figure: flow network with source s, sink t, and nodes a, b, c, d]

CSIT560 By M. Hamdi 21 A maximum network flow example. By inspection. Step 1: Flow is of size 10. [Figure: flow network with source s, sink t, and nodes a, b, c, d; edges labeled "capacity, flow"]

CSIT560 By M. Hamdi 22 A maximum network flow example. Step 2: Flow is of size 10+1 = 11. Maximum flow: Flow is of size 10+2 = 12. Not obvious. [Figure: the same network after each augmentation]

CSIT560 By M. Hamdi 23 Ford-Fulkerson method of augmenting paths
1. Set f(v,w) = -f(w,v) on all edges.
2. Define a Residual Graph, R, in which res(v,w) = cap(v,w) – f(v,w).
3. Find paths from s to t for which there is positive residue.
4. Increase the flow along the paths to augment them by the minimum residue along the path.
5. Keep augmenting paths until there are no more to augment.
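The five steps above can be sketched as follows. This is one possible variant, assuming the augmenting path is found by breadth-first search (shortest path first); the function name `max_flow` and the capacities are hypothetical and are not the graph from the slide figures.

```python
from collections import deque


def max_flow(cap, s, t):
    """Augmenting-path maximum flow; cap maps node -> {neighbor: capacity}."""
    # Residual graph: res[v][w] = cap(v,w) - f(v,w), where f(v,w) = -f(w,v).
    res = {}
    for v, nbrs in cap.items():
        for w, c in nbrs.items():
            res.setdefault(v, {}).setdefault(w, 0)
            res.setdefault(w, {}).setdefault(v, 0)
            res[v][w] += c
    total = 0
    while True:
        # Step 3: breadth-first search for an s-t path with positive residue.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            v = queue.popleft()
            for w, r in res.get(v, {}).items():
                if r > 0 and w not in parent:
                    parent[w] = v
                    queue.append(w)
        if t not in parent:
            return total  # Step 5: no augmenting path is left.
        # Step 4: augment by the minimum residue along the path.
        path, w = [], t
        while parent[w] is not None:
            path.append((parent[w], w))
            w = parent[w]
        bottleneck = min(res[a][b] for a, b in path)
        for a, b in path:
            res[a][b] -= bottleneck
            res[b][a] += bottleneck
        total += bottleneck


# Hypothetical capacities (an assumption for illustration only).
capacities = {
    "s": {"a": 16, "c": 13},
    "a": {"b": 12, "c": 10},
    "b": {"c": 9, "t": 20},
    "c": {"a": 4, "d": 14},
    "d": {"b": 7, "t": 4},
}
print(max_flow(capacities, "s", "t"))  # 23
```

The value 23 can be checked by hand: the edges a->b (12), d->b (7) and d->t (4) form a cut of capacity 23, and the paths s-a-b-t (12), s-c-d-t (4) and s-c-d-b-t (7) carry that much flow.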

CSIT560 By M. Hamdi 24 Example of Residual Graph. Flow is of size 10. res(v,w) = cap(v,w) – f(v,w). [Figure: flow network on s, a, b, c, d, t beside its residual graph R, with an augmenting path marked]

CSIT560 By M. Hamdi 26 Example of Residual Graph. Step 2: Flow is of size 10+1 = 11. [Figure: flow network beside its residual graph, with an augmenting path marked]

CSIT560 By M. Hamdi 27 Example of Residual Graph. Step 3: Flow is of size 10+2 = 12. [Figure: flow network beside its residual graph; no augmenting path remains]

CSIT560 By M. Hamdi 28 Another example: Ford-Fulkerson method. [Figure sequence, slides 28 to 31: graph G on nodes s, a, b, c, d, t beside its residual graph G_f; an augmenting path p is found at each step. The flow grows from f=0 to f=4, then f=4+12=16, then f=16+7=23.] No more augmenting paths: the maximum flow is 23.

CSIT560 By M. Hamdi 32 An example for Flow: obvious solution. [Figure: input graph G, residual graph G_r and flow graph G_f between source S and sink T] Total flow = 10: a sub-optimal solution!

CSIT560 By M. Hamdi 33 Flow algorithm, optimal version. [Figure: the same three graphs after further augmentations] Total flow = 19 units!

CSIT560 By M. Hamdi 34 Complexity of network flow problems
–In general, it is possible to find a solution by considering at most V·E paths, by picking the shortest augmenting path first.
–There are many variations, such as picking the most augmenting path first.
–The complexity of the algorithm is less when the graph is bipartite.
–There are techniques other than the Ford-Fulkerson method.

CSIT560 By M. Hamdi 35 Ford-Fulkerson Algorithm. Network flows and bipartite matching: finding a maximum size bipartite matching is equivalent to solving a network flow problem with capacities and flows of size “1”. [Figure sequence, slides 35 to 42: source, nodes a to f, sink. The flow is increased by 1 in each of five steps, then augmented along an augmenting path.] Maximum flow found! Thus maximum matching found.
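With unit capacities, the augmenting-path idea reduces to re-routing already matched inputs. The sketch below is one standard way to write it (a depth-first augmenting search); the function name and the request sets are hypothetical.

```python
def maximum_matching(requests, n_outputs):
    """Maximum size bipartite matching by repeated augmenting paths.

    requests[i] lists the outputs for which input i has a queued cell.
    Returns (size, match_of_output), where match_of_output[j] is the input
    matched to output j, or None.
    """
    match_of_output = [None] * n_outputs

    def try_augment(i, seen):
        # Try to match input i, re-routing earlier inputs if necessary.
        for j in requests[i]:
            if j in seen:
                continue
            seen.add(j)
            holder = match_of_output[j]
            if holder is None or try_augment(holder, seen):
                match_of_output[j] = i
                return True
        return False

    size = sum(1 for i in range(len(requests)) if try_augment(i, set()))
    return size, match_of_output


# Hypothetical requests: input 1 can only use output 0, so input 0 is
# moved to output 1 when input 1 is added (an augmenting path).
size, match_of_output = maximum_matching([[0, 1], [0], [0, 2]], 3)
print(size, match_of_output)  # 3 [1, 0, 2]
```

Note how the edge (input 0, output 0) is added first and later removed; this is exactly what a maximal matching is not allowed to do.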

CSIT560 By M. Hamdi 43 Complexity of Maximum Matchings
Maximum Size/Cardinality Matchings:
–Algorithm by Dinic: $O(N^{5/2})$
Maximum Weight Matchings:
–Algorithm by Kuhn: $O(N^3 \log N)$
ftp://dimacs.rutgers.edu/pub/netflow/matching/ (contains code for maximum size/weight matching algorithms)
In general:
–Hard to implement in hardware
–Slooooow.

CSIT560 By M. Hamdi 44 Maximum size bipartite match. Intuition: maximizes instantaneous throughput for uniform traffic. [Figure: “Request” graph with an edge from input i to output j whenever $L_{ij}(n) > 0$; a bipartite match; the maximum size match]

CSIT560 By M. Hamdi 45 Why doesn’t maximizing instantaneous throughput give 100% throughput for non-uniform traffic? Three possible matches, S (n):

CSIT560 By M. Hamdi 46 Maximum weight matching [Figure: switch model with arrivals $A_{ij}(n)$, VOQ occupancies $L_{ij}(n)$ and departures $D_j(n)$; “Request” graph whose edges carry weights $L_{11}(n), \ldots, L_{N1}(n)$; bipartite match $S^*(n)$, the maximum weight match]
–Weight could be length of queue or age of packet
–Achieves 100% throughput under all admissible traffic patterns

CSIT560 By M. Hamdi 47 Packet Scheduling/Arbitration in Virtual Output Queues: Maximal Matching Algorithms

CSIT560 By M. Hamdi Maximum size matching Maximum weight matching Maximum Matching in VOQ Architecture

CSIT560 By M. Hamdi 49 Complexity of Maximum Matchings
Maximum Size/Cardinality Matchings:
–Algorithm by Dinic: $O(N^{5/2})$
Maximum Weight Matchings:
–Algorithm by Kuhn: $O(N^3 \log N)$
In general:
–Hard to implement in hardware
–Slooooow.

CSIT560 By M. Hamdi 50 Maximal Matching
A maximal matching is built by adding edges one at a time and never removing an edge once it has been added, i.e., no augmenting paths are allowed (they remove edges added earlier), like matching by inspection.
The result cannot be extended by another edge: no input and output are left unnecessarily idle.

CSIT560 By M. Hamdi 51 Example of Maximal Size Matching [Figure: the same request graph on nodes A to F shown twice, once with a maximal matching and once with a maximum matching]

CSIT560 By M. Hamdi 52 Comments on Maximal Matchings
–In general, maximal matching is much simpler to implement, and has a much faster running time.
–A maximal size matching is at least half the size of a maximum size matching.
–A maximal weight matching is defined in the obvious way (edges are added greedily, heaviest first).
–A maximal weight matching is at least half the weight of a maximum weight matching.
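A small sketch of the size bound above: a greedy pass over the request edges yields a maximal matching, and a poor edge order can leave it at exactly half the maximum size. The function name and the edge lists are hypothetical.

```python
def greedy_maximal_matching(edges):
    """Add edges one at a time; an edge, once added, is never removed."""
    used_inputs, used_outputs, matching = set(), set(), []
    for i, j in edges:
        if i not in used_inputs and j not in used_outputs:
            matching.append((i, j))
            used_inputs.add(i)
            used_outputs.add(j)
    return matching


# Hypothetical requests: input 0 wants outputs 0 and 1, input 1 wants output 0.
unlucky = greedy_maximal_matching([(0, 0), (0, 1), (1, 0)])
lucky = greedy_maximal_matching([(0, 1), (1, 0), (0, 0)])
print(unlucky)  # [(0, 0)]          maximal, size 1
print(lucky)    # [(0, 1), (1, 0)]  also maximal, and maximum, size 2
```

Both results are maximal because no remaining edge can be added; only the second is maximum. For the weighted case, sorting the edges by decreasing weight before the same pass gives the greedy maximal weight matching.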

CSIT560 By M. Hamdi 53 PIM Maximal Size Matching Algorithm: Performance and Properties
It is among the very first practical schedulers proposed for VOQ architectures (used by DEC).
It is based on having arbiters at the inputs and outputs.
It iterates the following steps until no more requests can be accepted (or for a given number of iterations):
1. Request: Each unmatched input sends a request to every output for which it has a queued cell.
2. Grant (outputs): If an unmatched output receives any requests, it grants one, selected uniformly at random over all requests.
3. Accept (inputs): If an unmatched input receives a grant, it accepts one by selecting an output randomly among those granted to this input.
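The request, grant and accept steps above can be sketched in software as below. This is only an illustrative model, not the hardware arbiter design; the function name `pim`, its parameters and the request sets are assumptions.

```python
import random


def pim(requests, n, iterations=None, rng=random):
    """Parallel Iterative Matching on an n x n switch.

    requests[i] is the set of outputs for which input i has a queued cell.
    Runs until no grant is issued, or for `iterations` rounds if given.
    Returns a dict mapping each matched input to its output.
    """
    in_match, out_match = {}, {}
    rounds = 0
    while iterations is None or rounds < iterations:
        rounds += 1
        # Step 1 (Request): unmatched inputs request unmatched outputs.
        received = {j: [] for j in range(n) if j not in out_match}
        for i in range(n):
            if i not in in_match:
                for j in requests[i]:
                    if j in received:
                        received[j].append(i)
        # Step 2 (Grant): each output grants one request uniformly at random.
        grants = {}
        for j, inputs in received.items():
            if inputs:
                grants.setdefault(rng.choice(inputs), []).append(j)
        if not grants:
            break
        # Step 3 (Accept): each input accepts one grant uniformly at random.
        for i, outputs in grants.items():
            j = rng.choice(outputs)
            in_match[i] = j
            out_match[j] = i
    return in_match


# Hypothetical request pattern for a 4 x 4 switch; input 3 has no cells.
requests = [{0, 1}, {0}, {0, 2}, set()]
match = pim(requests, 4, rng=random.Random(1))
print(match)
```

Every round that issues a grant adds at least one match, so the loop ends after at most n rounds; when run to completion the result is a maximal matching, although which one depends on the random choices.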

CSIT560 By M. Hamdi 54 State of Input Queues (N 2 bits) 1 2 N 1 2 N Decision Register Grant Arbiters Request Arbiters Implementation of the parallel maximal matching algorithms

CSIT560 By M. Hamdi 55 Implementation of the parallel maximal matching algorithms (another similar way)

CSIT560 By M. Hamdi PIM Maximal Size Matching Algorithm. PIM: 1st iteration. [Figure: Step 1: Request; Step 2: Grant (random selection); Step 3: Accept]

CSIT560 By M. Hamdi PIM Maximal Size Matching Algorithm. PIM: 2nd iteration. [Figure: Step 1: Request; Step 2: Grant; Step 3: Accept]

CSIT560 By M. Hamdi 58 Traffic Types to Evaluate Algorithms
–Uniform traffic
–Unbalanced traffic
–Hotspot traffic

CSIT560 By M. Hamdi 59 Parallel Iterative Matching PIM with a single iteration

CSIT560 By M. Hamdi 60 Parallel Iterative Matching PIM with 4 iterations

CSIT560 By M. Hamdi 61 Parallel Iterative Matching Analytical Results Number of iterations to converge:

CSIT560 By M. Hamdi 62 PIM Maximal Size Matching Algorithm: Performance and Properties
–It is a fair algorithm in servicing inputs.
–It can achieve 100% throughput under uniform traffic.
–It converges to a maximal size matching in O(log N) iterations on average.
–It performs poorly (about 63% throughput) with 1 iteration, because several outputs can randomly grant the same input and all but one of those grants are wasted.
–It is not easy to build random arbiters in hardware.
–The best iterative maximal size matching algorithm takes $O(N^2 \log N)$ serial or $O(\log N)$ parallel time steps.
–If the number of iterations is constant, it can be implemented in constant time (that is why it is practical); however, the hardware design is not trivial.

CSIT560 By M. Hamdi 63 RRM Maximal Size Matching Algorithm: Performance and Properties
Round Robin Matching (RRM) is easier to implement than PIM (in terms of designing the I/O arbiters). The pointers of the arbiters move in a straightforward way.
It iterates the following steps until no more requests can be accepted (or for a given number of iterations):
Request. Each input sends a request to every output for which it has a queued cell.
Grant. If an output receives any requests, it chooses the one that appears next in a fixed, round-robin schedule starting from the highest priority element. The output notifies each input whether or not its request was granted. The pointer g_i to the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input. If no request is received, the pointer stays unchanged.

CSIT560 By M. Hamdi 64 RRM Maximal Size Matching Algorithm: Performance and Properties
Accept. If an input receives a grant, it accepts the one that appears next in a fixed, round-robin schedule starting from the highest priority element. The pointer a_i to the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the accepted output. If no grant is received, the pointer stays unchanged.
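A minimal single-iteration sketch of the RRM grant and accept rules above, assuming a 2 x 2 switch in which every VOQ is always backlogged. The names `rr_pick` and `rrm_slot` are hypothetical. The run shows the grant pointers moving in lock-step, so only one cell is served per slot.

```python
def rr_pick(candidates, pointer, n):
    """First candidate at or after `pointer` in cyclic order, else None."""
    for k in range(n):
        x = (pointer + k) % n
        if x in candidates:
            return x
    return None


def rrm_slot(requests, grant_ptr, accept_ptr):
    """One RRM iteration; updates the pointer lists in place."""
    n = len(requests)
    granted_by = {}  # input -> set of outputs that granted it
    for j in range(n):
        requesters = {i for i in range(n) if j in requests[i]}
        i = rr_pick(requesters, grant_ptr[j], n)
        if i is not None:
            granted_by.setdefault(i, set()).add(j)
            grant_ptr[j] = (i + 1) % n  # RRM: moves whether or not accepted
    match = {}
    for i, outputs in granted_by.items():
        j = rr_pick(outputs, accept_ptr[i], n)
        match[i] = j
        accept_ptr[i] = (j + 1) % n
    return match


# Assumed scenario: 2 x 2 switch, all four VOQs always nonempty.
requests = [{0, 1}, {0, 1}]
grant_ptr, accept_ptr = [0, 0], [0, 0]
sizes = [len(rrm_slot(requests, grant_ptr, accept_ptr)) for _ in range(8)]
print(sizes)  # [1, 1, 1, 1, 1, 1, 1, 1]
```

Both outputs always grant the same input, so half of the switch capacity is wasted in this scenario.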

CSIT560 By M. Hamdi 65 RRM Maximal Matching Algorithm (1) Step 1: Request

CSIT560 By M. Hamdi 66 RRM Maximal Matching Algorithm (2) Step 2: Grant

CSIT560 By M. Hamdi 67 RRM Maximal Matching Algorithm (2) Step 2: Grant

CSIT560 By M. Hamdi 68 RRM Maximal Matching Algorithm (2) Step 2: Grant

CSIT560 By M. Hamdi 69 RRM Maximal Matching Algorithm (2) Step 2: Grant

CSIT560 By M. Hamdi 70 RRM Maximal Matching Algorithm (3) Step 3: Accept

CSIT560 By M. Hamdi 71 RRM Maximal Matching Algorithm (3) Step 3: Accept

CSIT560 By M. Hamdi 72 RRM Maximal Matching Algorithm (3) Step 3: Accept

CSIT560 By M. Hamdi 73 Poor performance of RRM Maximal Matching Algorithm % Throughput

CSIT560 By M. Hamdi 74 iSLIP Maximal Size Matching Algorithm: Performance and Properties
It is a scheduler used in most VOQ switches (e.g., Cisco).
It is exactly like the RRM algorithm with the following change:
Grant. If an output receives any requests, it chooses the one that appears next in a fixed, round-robin schedule starting from the highest priority element. The output notifies each input whether or not its request was granted. The pointer g_i to the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input if and only if the grant is accepted in the Accept phase.
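The modified grant rule can be sketched with a single iteration per slot, assuming a 2 x 2 switch in which every VOQ is always backlogged. The names `rr_pick` and `islip_slot` are hypothetical. After the first slot the output pointers are desynchronized and both ports are served in every slot.

```python
def rr_pick(candidates, pointer, n):
    """First candidate at or after `pointer` in cyclic order, else None."""
    for k in range(n):
        x = (pointer + k) % n
        if x in candidates:
            return x
    return None


def islip_slot(requests, grant_ptr, accept_ptr):
    """One iSLIP iteration; updates the pointer lists in place."""
    n = len(requests)
    granted_by = {}  # input -> set of outputs that granted it
    for j in range(n):
        requesters = {i for i in range(n) if j in requests[i]}
        i = rr_pick(requesters, grant_ptr[j], n)
        if i is not None:
            granted_by.setdefault(i, set()).add(j)
    match = {}
    for i, outputs in granted_by.items():
        j = rr_pick(outputs, accept_ptr[i], n)
        match[i] = j
        accept_ptr[i] = (j + 1) % n
        grant_ptr[j] = (i + 1) % n  # moved only because the grant was accepted
    return match


# Assumed scenario: 2 x 2 switch, all four VOQs always nonempty.
requests = [{0, 1}, {0, 1}]
grant_ptr, accept_ptr = [0, 0], [0, 0]
sizes = [len(islip_slot(requests, grant_ptr, accept_ptr)) for _ in range(8)]
print(sizes)  # [1, 2, 2, 2, 2, 2, 2, 2]
```

In the first slot both outputs grant input 0 and only output 0 is accepted, so only output 0 advances its pointer; from then on the two outputs point at different inputs.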

CSIT560 By M. Hamdi iSLIP Maximal Size Matching Algorithm. iSlip: 1st iteration. [Figure: Step 1: Request; Step 2: Grant; Step 3: Accept; legend: original pointer, selected one, updated pointer]

CSIT560 By M. Hamdi iSLIP Maximal Size Matching Algorithm. iSlip: 2nd iteration. [Figure: Step 1: Request; Step 2: Grant; Step 3: Accept; no change to the pointers; legend: original pointer, selected one, updated pointer]

CSIT560 By M. Hamdi 77 Simple Iterative Algorithms: iSlip Step 1: Request

CSIT560 By M. Hamdi 78 Simple Iterative Algorithms: iSlip Step 2: Grant

CSIT560 By M. Hamdi Step 2: Grant Simple Iterative Algorithms: iSlip

CSIT560 By M. Hamdi Step 3: Accept Simple Iterative Algorithms: iSlip

CSIT560 By M. Hamdi Step 3: Accept Simple Iterative Algorithms: iSlip

CSIT560 By M. Hamdi 82 Simple Iterative Algorithms: iSlip Step 3: Accept

CSIT560 By M. Hamdi 83 Simple Iterative Algorithms: iSlip Step 3: Accept

CSIT560 By M. Hamdi 84 Simple Iterative Algorithms: iSlip Step 3: Accept

CSIT560 By M. Hamdi 85 iSLIP Implementation Grant Accept 1 2 N 1 2 N State N N N Decision log 2 N Programmable Priority Encoder

CSIT560 By M. Hamdi 86 Hardware Design Layout of the 256 bits Priority Encoder

CSIT560 By M. Hamdi 87 Hardware Design Layout of 256 bits grant arbiter

CSIT560 By M. Hamdi 88 FIRM Maximal Size Matching Algorithm: Performance and Properties
It is exactly like iSLIP with a very small, yet significant, modification.
Grant (outputs): If an unmatched output receives a request, it grants the one that appears next in a fixed, round-robin schedule starting from the highest priority element. The output notifies each input whether or not its request is granted. If the grant is accepted, the pointer to the highest priority element of the round-robin schedule is incremented to one location beyond the granted input. If the input does not accept, the pointer is set to the granted input.

CSIT560 By M. Hamdi Step 3: Accept Simple Iterative Algorithms: FIRM

CSIT560 By M. Hamdi 90 Pointer Synchronization Why this is good: this small change prevents the output arbiters from moving in lock-step (being synchronized – pointing to the same input) leading to a dramatic improvement in performance. If several outputs grant the same input, no matter how this input chooses, only one match can be made, and the other outputs will be idle. To get as many matches as possible, it's better that each output grants a different input. Since each output will select the highest priority input if a request is received from this input, it's better to keep the output pointers desynchronized (pointing to different locations).

CSIT560 By M. Hamdi 91 iSLIP Maximal Matching Algorithm % Throughput

CSIT560 By M. Hamdi 92 Pointer Synchronization: Differences between RRM, iSlip & FIRM

CSIT560 By M. Hamdi 93 Differences between RRM, iSlip & FIRM
Input (accept) pointer:
–No grant: unchanged (RRM, iSlip, FIRM)
–Granted: one location beyond the accepted one (RRM, iSlip, FIRM)
Output (grant) pointer:
–No request: unchanged (RRM, iSlip, FIRM)
–Grant accepted: one location beyond the granted one (RRM, iSlip, FIRM)
–Grant not accepted: RRM: one location beyond the previously granted one; iSlip: unchanged; FIRM: the granted one
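The output-pointer rows of this comparison can be written as one small function. This is a sketch under an assumption: the RRM entry "one location beyond the previously granted one" is read here as one location beyond the input granted in this slot. The function name and argument names are hypothetical.

```python
def next_grant_pointer(policy, pointer, granted, accepted, n):
    """Next value of an output's grant pointer.

    policy   -- "RRM", "iSLIP" or "FIRM"
    pointer  -- current pointer value
    granted  -- input granted in this slot, or None if no request arrived
    accepted -- True if that input accepted the grant
    n        -- number of ports
    """
    if granted is None:
        return pointer                # no request: unchanged for all three
    if accepted or policy == "RRM":
        return (granted + 1) % n      # one location beyond the granted input
    if policy == "iSLIP":
        return pointer                # grant not accepted: unchanged
    if policy == "FIRM":
        return granted                # grant not accepted: stay on that input
    raise ValueError("unknown policy: " + policy)


# Example: 4 ports, pointer at 1, input 2 is granted but does not accept.
for policy in ("RRM", "iSLIP", "FIRM"):
    print(policy, next_grant_pointer(policy, 1, 2, False, 4))
# RRM 3
# iSLIP 1
# FIRM 2
```

With FIRM the output keeps offering to the same input in the next slot, which is why a request that was granted but not accepted is served sooner than under iSLIP.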

CSIT560 By M. Hamdi 94 General remarks
–Since all of these algorithms try to approximate maximum size matching, they can be unstable under non-uniform traffic
–They can achieve 100% throughput under uniform traffic
–Under a large number of iterations, their performance is similar
–They have similar implementation complexity

CSIT560 By M. Hamdi 95 Input Queueing: Longest Queue First or Oldest Cell First. Maximum weight matching with weight = {queue length, waiting time} gives 100% throughput.

CSIT560 By M. Hamdi 96 Input Queueing
Why is serving long/old queues better than serving the maximum number of queues?
–When traffic is uniformly distributed, servicing the maximum number of queues leads to 100% throughput.
–When traffic is non-uniform, some queues become longer than others.
–A good algorithm keeps the queue lengths matched, and services a large number of queues.
[Figure: average occupancy per VOQ, one plot for uniform traffic and one for non-uniform traffic]

CSIT560 By M. Hamdi 97 Maximum/Maximal Weight Matching
100% throughput for admissible traffic (uniform or non-uniform)
Maximum Weight Matching
–OCF (Oldest Cell First): w = cell waiting time
–LQF (Longest Queue First): w = input queue occupancy
–LPF (Longest Port First): w = QL of the source port + sum of QL from the source port to the destination port
Maximal Weight Matching (practical algorithms)
–iOCF
–iLQF
–iLPF (comparators in the critical path of iLQF are removed)

CSIT560 By M. Hamdi 98 Maximal Weight Matching Algorithms: iLQF
Request. Each unmatched input sends a multi-bit request word to each output for which it has a queued cell, indicating the number of cells that it has queued to that output.
Grant. If an unmatched output receives any requests, it chooses the largest valued request. Ties are broken randomly.
Accept. If an unmatched input receives one or more grants, it accepts the one to which it made the largest valued request. Ties are broken randomly.
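A software sketch of the three iLQF steps above, iterated until no grant is issued. The function name `ilqf` and the occupancy matrix are hypothetical; a real implementation would use comparator trees in hardware rather than loops.

```python
import random


def ilqf(L, rng=random):
    """Iterative Longest Queue First on occupancy matrix L (L[i][j] cells).

    Returns a dict mapping each matched input to its output.
    """
    n = len(L)
    in_match, out_match = {}, {}
    while True:
        grants = {}  # input -> outputs that granted it in this iteration
        for j in range(n):
            if j in out_match:
                continue
            # Request: unmatched inputs with cells for j send their length.
            reqs = [(L[i][j], i) for i in range(n)
                    if i not in in_match and L[i][j] > 0]
            if reqs:
                # Grant: largest request wins, ties broken randomly.
                best = max(w for w, _ in reqs)
                i = rng.choice([i for w, i in reqs if w == best])
                grants.setdefault(i, []).append(j)
        if not grants:
            return in_match
        for i, outputs in grants.items():
            # Accept: the grant for the longest queue, ties broken randomly.
            best = max(L[i][j] for j in outputs)
            j = rng.choice([j for j in outputs if L[i][j] == best])
            in_match[i] = j
            out_match[j] = i


# Hypothetical occupancies with no ties, so the result is deterministic.
L = [[5, 1, 0],
     [4, 0, 2],
     [0, 3, 0]]
print(ilqf(L))  # {0: 0, 2: 1, 1: 2}
```

The longest queue in the example (5 cells at input 0 for output 0) is matched in the first iteration, in line with Property 1 of iLQF.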

CSIT560 By M. Hamdi 99 Maximal Weight Matching Algorithms: iLQF
The i-LQF algorithm has the following properties:
Property 1. Independent of the number of iterations, the longest input queue is always served.
Property 2. As with i-SLIP, the algorithm converges in at most logN iterations.
Property 3. For an inadmissible offered load, an input queue may be starved.

CSIT560 By M. Hamdi 100 Maximal Weight Matching Algorithms: iOCF
The i-OCF algorithm works in similar fashion to iLQF, and has the following properties:
Property 1. Independent of the number of iterations, the cell that has been waiting the longest time in the input queues (it must be at the head of its queue) is always served.
Property 2. As with i-LQF, the algorithm converges in at most logN iterations.
Property 3. No input queue can be starved indefinitely.
Property 4. It is difficult to keep time stamps on the cells.

CSIT560 By M. Hamdi 101 iLQF - Implementation

CSIT560 By M. Hamdi 102 iLPF - Implementation Complicated hardware

CSIT560 By M. Hamdi 103 Other research efforts Packet-based arbitration Exhaustive-based arbitration Numerous other efforts

CSIT560 By M. Hamdi 104 Packet Scheduling/Arbitration in Virtual Output Queues: Randomized Algorithms and Others

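CSIT560 By M. Hamdi 105 Input-Queued Packet Switch [Figure: N inputs and N outputs joined by a crossbar under a scheduler; VOQ occupancy $X_{i,j}$ and arrival rate $\lambda_{i,j}$ from input i to output j.] Admissible traffic: $\sum_i \lambda_{i,j} < 1$ and $\sum_j \lambda_{i,j} < 1$.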

CSIT560 By M. Hamdi 106 Bipartite Graph and Matrix inputs outputs

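CSIT560 By M. Hamdi 107 Stability of Scheduling. Definition: Let $X_{i,j}(t)$ be the number of packets queued at input i for output j at time-slot t. Then an algorithm is stable iff the expected total queue occupancy stays bounded: $\sup_t E\big[\sum_{i,j} X_{i,j}(t)\big] < \infty$.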

CSIT560 By M. Hamdi 108 Motivation
Networking problems suffer from the “curse of dimensionality”
–algorithmic solutions do not scale well
Typical causes
–size: large number of users or large number of I/O
–time: very high speeds of operation
A good deterministic algorithm exists (Max Flow), but …
–it requires too large a data structure
–it needs state information, and “state” is too big
–it “starts from scratch” in each iteration

CSIT560 By M. Hamdi 109 Randomization
Randomized algorithms have frequently been used in situations where the state space is very large (e.g., there are N! different ways of connecting the inputs to the outputs)
Randomized algorithms
–are a powerful way of approximating
–it is often possible to randomize deterministic algorithms
–this simplifies the implementation while retaining a (surprisingly) high level of performance
The main idea is
–to simplify the decision-making process
–by basing decisions upon a small, randomly chosen sample of the state
–rather than upon the complete state

CSIT560 By M. Hamdi 110 Randomizing Iterative Schemes (e.g., iSLIP)
Often, we want to perform some operation iteratively
Example: find the heaviest matching in a switch in every time slot
Since, in each time slot
–at most one packet can arrive at each input
–and, at most one packet can depart from each output
the size of the queues, or the “state” of the switch, doesn’t change by much between successive time slots; so, a matching that was heavy at time t will quite likely continue to be heavy at time t+1
This suggests that
–knowing a heavy matching at time t should help in determining a heavy matching at time t+1
–there is no need to start from scratch in each time slot

CSIT560 By M. Hamdi 111 Summarizing Randomized Algorithms
Randomized algorithms can help simplify the implementation
–by reducing the amount of work in each iteration
If the state of the system doesn’t change by much between iterations, then
–we can reduce the work even further by carrying information between iterations
The big pay-off is
–that, even though it is an approximation, the performance of a randomized scheme can be surprisingly good

CSIT560 By M. Hamdi 112 Randomized Scheduling Algorithms: Example
Consider a 3 x 3 input-queued switch
–input traffic is Bernoulli IID with λij = α/3 for all i, j, and α < 1
–this is admissible
–note: there are a total of 6 (= 3!) possible service matrices

CSIT560 By M. Hamdi 113 Random Scheduling Algorithms
In time slot n, let S(n) be equal to one of the 6 possible matchings, chosen independently and uniformly at random (u.a.r.)
Stability of Random
–Consider L11(n), the number of packets in VOQ11
–arrivals to VOQ11 occur according to A11(n), which is Bernoulli IID with input rate λ11 = α/3
–this queue gets served whenever the service matrix connects input 1 to output 1
–there are 2 service matrices that connect input 1 to output 1
–since Random chooses service matrices u.a.r., input 1 is connected to output 1 for a fraction of time = 2/6 = 1/3, which is the service rate between input 1 and output 1
–E(L11(n)) < ∞ iff λ11 < 1/3, i.e., iff α < 1
This random algorithm is stable under this uniform traffic.

CSIT560 By M. Hamdi 114 Random Scheduling Algorithms
Instability of Random
Now suppose λii = α for all i and λij = 0 for i ≠ j
–clearly, this is admissible traffic for all α < 1
–but, under Random, the service rate at VOQ11 is 1/3 at best
–hence VOQ11 and the switch will be unstable as soon as α ≥ 1/3
Stability (or 100% throughput) means it is stable under all admissible traffic!
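The service-rate argument for Random can be checked by enumerating the service matrices. In this sketch a matching is a permutation of the outputs, ports are numbered from 0, and the function names and the value α = 9/10 are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations


def service_rate(n, i, j):
    """Fraction of the n! matchings that connect input i to output j."""
    matchings = list(permutations(range(n)))
    hits = sum(1 for m in matchings if m[i] == j)
    return Fraction(hits, len(matchings))


def voq_is_stable(arrival_rate, n, i, j):
    """A VOQ is stable under Random iff its arrival rate is below its service rate."""
    return arrival_rate < service_rate(n, i, j)


alpha = Fraction(9, 10)                   # an admissible load, chosen arbitrarily
print(service_rate(3, 0, 0))              # 1/3
print(voq_is_stable(alpha / 3, 3, 0, 0))  # True: uniform traffic
print(voq_is_stable(alpha, 3, 0, 0))      # False: diagonal traffic
```

The same load α is admissible in both cases, yet only the uniform pattern is stable, which is why Random does not give 100% throughput.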

CSIT560 By M. Hamdi 115 Obvious Randomized Schemes
–Choose a matching at random and use it as the schedule: doesn’t give 100% throughput (already shown)
–Choose 2 matchings at random and use the heavier one as the schedule
–Choose N matchings at random and use the heaviest one as the schedule
None of these can give 100% throughput!

CSIT560 By M. Hamdi 116

CSIT560 By M. Hamdi 117 Iterative Randomized Scheme (Tassiulas)
–Say M is the matching used at time t
–Let R be a new matching chosen uniformly at random (u.a.r.) among the N! different matchings
–At time t+1, use the heavier of M and R
Complexity is very low: O(1) iterations
This gives 100% throughput! Note that the boost in throughput is due to memory (saving previous matchings)
But delays are very large
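One step of this scheme can be sketched as follows, assuming a matching is stored as a list in which entry i is the output connected to input i and the weight of a matching is the total occupancy of the VOQs it serves. The function names and the occupancy matrix are hypothetical.

```python
import random


def weight(matching, X):
    """Total occupancy of the VOQs served by a matching."""
    return sum(X[i][j] for i, j in enumerate(matching))


def tassiulas_step(M, X, rng=random):
    """Keep the heavier of the previous matching M and a fresh random matching R."""
    R = list(range(len(X)))
    rng.shuffle(R)  # a matching chosen uniformly at random among the N! ones
    return R if weight(R, X) > weight(M, X) else M


# Hypothetical VOQ occupancies and a poor starting matching.
X = [[5, 0, 1],
     [0, 3, 0],
     [2, 0, 4]]
M = [1, 2, 0]
rng = random.Random(0)
for _ in range(20):
    M = tassiulas_step(M, X, rng)
print(M, weight(M, X))
```

Because the previous matching is only replaced by a heavier one, the weight never decreases while the occupancies are held fixed; in a running switch the occupancies change by at most one cell per queue per slot, which is what makes carrying M forward useful.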

CSIT560 By M. Hamdi 118

CSIT560 By M. Hamdi 119 Finer Observations
–Let M be the schedule used at time t
–Choose a “good” random matching R
–M’ = Merge(M,R)
–M’ includes the best edges from M and R
–Use M’ as the schedule at time t+1
The above procedure yields an algorithm called LAURA
There are many other small variations to this algorithm.

CSIT560 By M. Hamdi Merging Procedure [Figure: matchings X, with W(X)=12, and R, with W(R)=10, are merged into M, with W(M)=13]
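One common way to realize such a merge, shown here as a sketch and not necessarily the exact procedure on the slide, uses the fact that the union of two full matchings splits into disjoint cycles; within each cycle the heavier of the two sets of edges is kept. The function name, the list representation (entry i is the output of input i) and the occupancy matrix are assumptions.

```python
def merge(M, R, X):
    """Merge two full matchings, keeping the heavier side of every cycle."""
    n = len(M)
    inv_R = [0] * n
    for i, j in enumerate(R):
        inv_R[j] = i                   # input that R connects to output j
    merged = [None] * n
    for start in range(n):
        if merged[start] is not None:
            continue
        # Walk one cycle: input i --M--> output M[i] --R--> input inv_R[M[i]].
        cycle, i = [], start
        while True:
            cycle.append(i)
            i = inv_R[M[i]]
            if i == start:
                break
        w_m = sum(X[k][M[k]] for k in cycle)
        w_r = sum(X[k][R[k]] for k in cycle)
        source = M if w_m >= w_r else R
        for k in cycle:
            merged[k] = source[k]
    return merged


# Hypothetical occupancies: M is better on inputs 0 and 1, R on inputs 2 and 3.
X = [[5, 1, 0, 0],
     [2, 4, 0, 0],
     [0, 0, 1, 6],
     [0, 0, 3, 1]]
M = [0, 1, 2, 3]   # weight 5 + 4 + 1 + 1 = 11
R = [1, 0, 3, 2]   # weight 1 + 2 + 6 + 3 = 12
print(merge(M, R, X))  # [0, 1, 3, 2], weight 5 + 4 + 6 + 3 = 18
```

The inputs of a cycle use the same set of outputs under M and under R, so the merged result is again a valid matching, and its weight is at least that of the heavier input matching.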

CSIT560 By M. Hamdi 121