Enabling Class of Service for CIOQ Switches with Maximal Weighted Algorithms Thursday, October 08, 2015 Feng Wang Siu Hong Yuen.


2 Contents
1. Motivation
   • WFQ on OQ switches can provide service for different classes.
   • Can we find maximal weight matching algorithms to provide service for different classes for CIOQ switches?
2. Bandwidth Metric
3. Simulation Environment
4. Algorithms used and their results
5. Intuition behind the result
6. Further work
7. Conclusion

3 Motivation
• We know that with WFQ, an OQ switch can provide service to different classes based on the priorities of the classes.
• However, OQ switches are impractical to implement because of the high memory bandwidth and switch fabric bandwidth they require (with N inputs, an output's memory may have to accept up to N packets in a single time slot).

4 Motivation
• It has been shown that with a speedup of 2, a CIOQ switch using a stable marriage matching algorithm can emulate an OQ switch.
• Can we find maximal matching algorithms that let a CIOQ switch with speedup S provide the same service to different classes as an OQ switch running WFQ?

5 Contents
1. Motivation
2. Metric used
   • WFQ as an ideal algorithm
   • Using bandwidth as a quantitative metric
3. Simulation Environment
4. Algorithms used and their results
5. Intuition behind the result
6. Further work
7. Conclusion

6 Metric used
• We use the WFQ algorithm implemented on an OQ switch as the ideal algorithm for serving multiple classes.
• To measure the effectiveness of our algorithms, we therefore need a quantitative metric that compares them against WFQ.

7 Metric used
• The bandwidth metric measures whether the distribution of bandwidth produced by our algorithm is similar to that produced by WFQ.
• During a time period T, we observe the distribution of packets departing from the OQ switch (using WFQ) and from the CIOQ switch (using our algorithm).
• Denote by $X_{jk}$ the number of class k packets departed from output port j of the OQ switch, and by $Y_{jk}$ the number of class k packets departed from output port j of the CIOQ switch.

8 Metric used
• For output port j:
  - bandwidth used by class k in the OQ switch: $x_{jk} = X_{jk} / T$
  - bandwidth used by class k in the CIOQ switch: $y_{jk} = Y_{jk} / T$
• Bandwidth metric we use: BDiff
  - BDiff ranges from 0 to 1.
  - The closer BDiff is to 0, the closer the CIOQ switch is to emulating WFQ on an OQ switch.
• T is chosen as the time taken for the WFQ algorithm to finish one round-robin cycle.
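The per-class bandwidths $x_{jk}$ and $y_{jk}$ can be computed directly from the departure counts. The exact BDiff formula is not given in this transcript, so the sketch below assumes a stand-in: half the L1 (total variation) distance between the per-class bandwidth shares at one port. That choice lies in [0, 1] and is 0 exactly when both switches give every class the same share, but it is a hypothetical metric and may differ from the authors' definition. The function names `class_bandwidth` and `bdiff` are illustrative.

```python
from typing import Sequence


def class_bandwidth(departures: Sequence[int], T: int) -> list[float]:
    """Per-class bandwidth at one output port: x_jk = X_jk / T."""
    if T <= 0:
        raise ValueError("T must be positive")
    return [count / T for count in departures]


def bdiff(x: Sequence[float], y: Sequence[float]) -> float:
    """HYPOTHETICAL BDiff: half the L1 distance between class shares.

    x[k] is the OQ+WFQ bandwidth of class k, y[k] the CIOQ bandwidth.
    The result is in [0, 1]; 0 means identical per-class shares.
    """
    total_x, total_y = sum(x), sum(y)
    if total_x == 0 or total_y == 0:
        # No departures in one switch: identical only if both are empty.
        return 0.0 if total_x == total_y else 1.0
    return 0.5 * sum(abs(a / total_x - b / total_y) for a, b in zip(x, y))


# Hypothetical counts for one port over T = 10 slots (weights 5:2:2:1).
x = class_bandwidth([5, 2, 2, 1], 10)   # OQ with WFQ
y = class_bandwidth([4, 3, 2, 1], 10)   # CIOQ with a candidate algorithm
print(round(bdiff(x, y), 3))
```

Normalizing by the total makes the stand-in compare how bandwidth is divided among classes rather than the absolute throughput of the port.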

9 Contents
1. Motivation
2. Metric used
3. Simulation Environment
   • Simulator
   • Switch configuration
   • Traffic
   • Sampling
4. Algorithms used and their results
5. Intuition behind the result
6. Further work
7. Conclusion

10 Simulation Environment
• Simulator: SIM v2.35
• Switch: 8x8, 4 classes of service with weights 5:2:2:1
• Traffic models:
  - Bernoulli iid uniform
  - Bernoulli iid nonuniform: overloaded traffic
  - Bursty uniform
  - Bursty nonuniform: overloaded traffic
• The same input traffic trace is used for the OQ and CIOQ switches.
• The distribution of departing packets at port 0 is sampled every 10 time slots.
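The two uniform traffic models can be sketched as simple cell generators for the 8x8, 4-class setup. This is not SIM's generator: it assumes the class of each cell is drawn uniformly (the slides do not say), and it models bursty traffic as a two-state on-off Markov source per input with geometric burst lengths, every cell of a burst sharing one destination and class. The nonuniform, overloaded variants are not sketched because the slides do not specify their destination pattern. All names here are hypothetical.

```python
import random
from dataclasses import dataclass

N_PORTS = 8     # 8x8 switch, as in the simulation setup
N_CLASSES = 4   # 4 classes of service


@dataclass
class Cell:
    slot: int   # arrival time slot
    src: int    # input port
    dst: int    # output port
    cls: int    # class of service, 0..N_CLASSES-1


def bernoulli_uniform(load: float, slots: int, rng: random.Random) -> list[Cell]:
    """Bernoulli iid uniform: in each slot, each input gets a cell with
    probability `load`, sent to a uniformly chosen output.
    ASSUMPTION: the class is also drawn uniformly."""
    cells = []
    for t in range(slots):
        for i in range(N_PORTS):
            if rng.random() < load:
                cells.append(Cell(t, i, rng.randrange(N_PORTS), rng.randrange(N_CLASSES)))
    return cells


def bursty_uniform(load: float, mean_burst: float, slots: int,
                   rng: random.Random) -> list[Cell]:
    """Bursty uniform traffic as an on-off Markov source per input.
    ASSUMPTION: geometric bursts; one destination and class per burst."""
    if not 0.0 < load < 1.0 or mean_burst < 1.0:
        raise ValueError("need 0 < load < 1 and mean_burst >= 1")
    a = 1.0 / mean_burst            # P(on -> off) after each cell sent
    b = a * load / (1.0 - load)     # P(off -> on): long-run on-fraction = load
    if b > 1.0:
        raise ValueError("load too high for this mean burst length")
    on = [False] * N_PORTS
    dst = [0] * N_PORTS
    cls = [0] * N_PORTS
    cells = []
    for t in range(slots):
        for i in range(N_PORTS):
            if on[i]:
                cells.append(Cell(t, i, dst[i], cls[i]))
                if rng.random() < a:
                    on[i] = False
            elif rng.random() < b:
                # A new burst starts: pick its destination and class once.
                on[i] = True
                dst[i] = rng.randrange(N_PORTS)
                cls[i] = rng.randrange(N_CLASSES)
    return cells
```

Feeding the same list of `Cell` objects to both switch models mirrors the slide's point that the OQ and CIOQ switches see the same input trace.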

11 Contents
1. Motivation
2. Metric used
3. Simulation Environment
4. Algorithms used and their results
   • algo0 to algo4
5. Intuition behind the result
6. Further work
7. Conclusion

12 Algorithms
• We came up with 5 maximal weight matching algorithms that attempt to provide service for multiple classes.
• They are based on request-grant-accept phases similar to iSLIP.
• Each $VOQ_{ij}$ is split into P sub-queues; each sub-queue stores the packets of one class.
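The per-class split of each VOQ can be sketched as a small data structure for one input port. The layout is hypothetical: each sub-queue stores only the arrival slot of each cell, which is enough to know whether a VOQ has a packet, which classes it holds (the bitmap that algo4 requests carry) and how long each class's head-of-line cell has waited (the quantity algo1 weights).

```python
from collections import deque
from typing import Optional


class InputVOQs:
    """VOQs at one input port. VOQ_ij is split into P per-class sub-queues;
    each entry holds the cell's arrival slot (hypothetical layout)."""

    def __init__(self, n_outputs: int, n_classes: int):
        # self.sub[j][k] is the class-k sub-queue of the VOQ for output j.
        self.sub = [[deque() for _ in range(n_classes)] for _ in range(n_outputs)]

    def enqueue(self, dst: int, cls: int, arrival_slot: int) -> None:
        self.sub[dst][cls].append(arrival_slot)

    def has_packet(self, dst: int) -> bool:
        # An empty deque is falsy, so this is True iff some sub-queue is non-empty.
        return any(self.sub[dst])

    def class_bitmap(self, dst: int) -> int:
        """Bit k is set iff the class-k sub-queue for `dst` is non-empty."""
        return sum(1 << k for k, q in enumerate(self.sub[dst]) if q)

    def hol_wait(self, dst: int, cls: int, now: int) -> Optional[int]:
        """Waiting time of the class-k head-of-line cell, or None if empty."""
        q = self.sub[dst][cls]
        return now - q[0] if q else None

    def dequeue(self, dst: int, cls: int) -> int:
        """Remove the head-of-line cell of one class and return its arrival slot."""
        return self.sub[dst][cls].popleft()
```

Keeping classes in separate FIFOs means a low-weight class can never block a higher-weight class headed to the same output.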

13 algo0
• algo0 is the most basic of the 5 algorithms, and the subsequent algorithms build on it. algo0 is a variation of PIM with support for different priorities.
• Request: For each output j that input i has a packet for, input i requests that output with weight = 1.
• Grant: If output j receives any requests, it selects the request with the largest weight (all weights are equal in this case). If multiple requests share the largest weight, ties are broken randomly.
• Accept: If input i receives any grants, it selects the grant with the largest weight (all weights are equal in this case). If multiple grants share the largest weight, ties are broken randomly.
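The request-grant-accept phases above can be sketched as one function that selects the largest weight and breaks ties at random. With every request weight equal to 1 it behaves like algo0; passing per-VOQ weights instead gives the matching step of algo1. It assumes, as in PIM, that the phases are repeated over still-unmatched ports until no new match is added, which makes the result a maximal matching; the slides do not state the number of iterations. `rga_match` is a hypothetical name.

```python
import random
from typing import Optional


def rga_match(weights: list[list[float]], rng: random.Random,
              max_iters: Optional[int] = None) -> dict[int, int]:
    """Request-grant-accept matching; largest weight wins, ties broken randomly.

    weights[i][j] > 0 means input i requests output j with that weight.
    ASSUMPTION: PIM-style iteration until no new match is added (maximal).
    Returns a dict mapping matched input -> output.
    """
    n_in = len(weights)
    n_out = len(weights[0]) if n_in else 0
    match: dict[int, int] = {}
    matched_out: set[int] = set()
    it = 0
    while max_iters is None or it < max_iters:
        it += 1
        # Grant: each unmatched output picks one request with the largest weight.
        grants: dict[int, list[int]] = {}   # input -> outputs that granted it
        for j in range(n_out):
            if j in matched_out:
                continue
            reqs = [i for i in range(n_in) if i not in match and weights[i][j] > 0]
            if not reqs:
                continue
            best = max(weights[i][j] for i in reqs)
            i = rng.choice([i for i in reqs if weights[i][j] == best])
            grants.setdefault(i, []).append(j)
        if not grants:
            break  # no unmatched output has a request from an unmatched input
        # Accept: each input picks one of its grants with the largest weight.
        for i, outs in grants.items():
            best = max(weights[i][j] for j in outs)
            j = rng.choice([j for j in outs if weights[i][j] == best])
            match[i] = j
            matched_out.add(j)
    return match


# algo0: weight 1 wherever input i has a packet for output j.
has_packet = [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
print(rga_match(has_packet, random.Random(0)))
```

Because each output grants at most one input and each input accepts at most one grant, every iteration only adds conflict-free pairs, so the loop always terminates.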

14 algo1
• algo0 does not differentiate between requests, i.e. all requests are treated equally.
• algo1 improves on this by associating a weight with each request.
• For each $VOQ_{ij}$, we calculate $W_{ij}^{k}$ = (weight of class k) × (time the class k packet at the head of line (HoL) has waited). We then take the maximum over all classes, $W_{ij} = \max_k W_{ij}^{k}$, and use it as the weight of the request from input i to output j.
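The weight $W_{ij}$ can be computed from the HoL waiting times of the class sub-queues of one VOQ. This sketch uses the 5:2:2:1 class weights from the simulation setup; `request_weight` is a hypothetical name, and it returns None (no request) when every sub-queue is empty, so a VOQ whose HoL cell arrived this slot still requests, with weight 0.

```python
from typing import Optional, Sequence

CLASS_WEIGHTS = (5, 2, 2, 1)   # class weights from the simulation setup


def request_weight(hol_waits: Sequence[Optional[int]],
                   class_weights: Sequence[float] = CLASS_WEIGHTS) -> Optional[float]:
    """W_ij = max over k of (weight of class k) * (HoL wait of class k in VOQ_ij).

    hol_waits[k] is the waiting time of the class-k HoL packet,
    or None if the class-k sub-queue is empty.
    """
    candidates = [w * t for w, t in zip(class_weights, hol_waits) if t is not None]
    return max(candidates) if candidates else None


# Class 0 waited 3 slots (5*3=15), class 1 waited 10 (2*10=20),
# class 3 waited 40 (1*40=40): the long-waiting low-weight class wins.
print(request_weight([3, 10, None, 40]))
```

Multiplying by waiting time lets a low-weight class eventually outbid a high-weight one, which keeps low-weight classes from being starved.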

15 algo1
• The rest of the algorithm is the same as algo0.
• Request: For each output j that input i has a packet for, input i requests that output with weight = $W_{ij}$.
• Grant: If output j receives any requests, it selects the request with the largest weight. If multiple requests share the largest weight, ties are broken randomly.
• Accept: If input i receives any grants, it selects the grant with the largest weight. If multiple grants share the largest weight, ties are broken randomly.

16 algo2 and algo3
• In algo0 and algo1, ties in the grant and accept phases are broken randomly.
• This does not take into account which request was granted or accepted previously.
• algo2 and algo3 improve on algo0 and algo1 by remembering previous matches, similar to iSLIP:
  - For each output, we keep a pointer, for every priority, to the input whose grant was last accepted.
  - For each input, we keep a pointer, for every priority, to the last accepted output.

17 algo2
• algo2 is algo0 with the pointer enhancement.
• Request: For each output j that input i has a packet for, input i requests that output with weight = 1.
• Grant: If output j receives any requests, it selects the request with the highest weight (all weights are equal in this case). Ties are broken first by priority. If multiple requests have the same priority, the input last granted for this priority is least preferred, and output j grants the most preferred input in round-robin order.
• Accept: If input i receives any grants, it selects the grant with the highest weight (all weights are equal in this case). If there are ties, it first selects a priority at random. If there are multiple grants with this priority, the output last accepted for this priority is least preferred, and input i accepts the most preferred output in round-robin order.
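The pointer-based tie-breaking of algo2 can be sketched as a round-robin pick that starts just after the port chosen last time. The sketch assumes that a lower class index means a higher priority (class 0 has weight 5) and, following the pointer definition for algo2 and algo3, that pointers move only when a grant is accepted, as in iSLIP. `round_robin_pick`, `algo2_grant`, `algo2_accept` and `update_pointers` are hypothetical names.

```python
import random
from typing import Sequence


def round_robin_pick(candidates: Sequence[int], last: int, n: int) -> int:
    """Pick the first candidate after `last` in round-robin order, so the
    port chosen last time is the least preferred."""
    return min(candidates, key=lambda p: (p - last - 1) % n)


def algo2_grant(requests: dict[int, int], grant_ptr: list[int], n: int) -> tuple[int, int]:
    """Grant phase at one output. requests maps input -> highest priority it
    holds for this output (ASSUMPTION: lower index = higher priority).
    grant_ptr[k] is the input whose grant for priority k was last accepted."""
    k = min(requests.values())                        # ties broken first by priority
    tied = [i for i, p in requests.items() if p == k]
    return round_robin_pick(tied, grant_ptr[k], n), k


def algo2_accept(grants: dict[int, int], accept_ptr: list[int], n: int,
                 rng: random.Random) -> tuple[int, int]:
    """Accept phase at one input. grants maps output -> granted priority.
    A priority is chosen at random, then round-robin among its grants."""
    k = rng.choice(sorted(set(grants.values())))
    tied = [j for j, p in grants.items() if p == k]
    return round_robin_pick(tied, accept_ptr[k], n), k


def update_pointers(i: int, j: int, k: int,
                    grant_ptr_of_j: list[int], accept_ptr_of_i: list[int]) -> None:
    """After input i accepts output j's grant for priority k, both pointers move."""
    grant_ptr_of_j[k] = i
    accept_ptr_of_i[k] = j


# Output with requests from inputs 0 (priority 1), 2 and 5 (priority 0);
# input 2 was last accepted for priority 0, so input 5 is granted.
print(algo2_grant({0: 1, 2: 0, 5: 0}, grant_ptr=[2, 0, 0, 0], n=8))
```

algo3 would use the same tie-breaking after first filtering requests and grants down to those with the largest $W_{ij}$.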

18 algo3
• algo3 is algo1 with the pointer enhancement.
• Request: For each output j that input i has a packet for, input i requests that output with weight = $W_{ij}$.
• Grant: If output j receives any requests, it selects the request with the highest weight. Ties are broken first by priority. If multiple requests have the same priority, the input last granted for this priority is least preferred, and output j grants the most preferred input in round-robin order.
• Accept: If input i receives any grants, it selects the grant with the highest weight. If there are ties, it first selects a priority at random. If there are multiple grants with this priority, the output last accepted for this priority is least preferred, and input i accepts the most preferred output in round-robin order.

19 algo4
• algo2 and algo3 rotate the pointer for the preferred input port to grant and the preferred output port to accept.
• Instead of a pointer that rotates regularly, algo4 rotates the preference according to the weight of each class. It attempts to rotate the pointer in a WFQ-like manner: the pointer stays at a particular preferred port according to the schedule determined by WFQ.

20 algo4
• Request: For each output j that input i has a packet for, input i sends a request carrying a bitmap that shows which priorities have a packet.
• Grant: Output j maintains a preferred priority, which is updated in a WFQ-like way whenever one of its grants is accepted. Assume the preferred priority of output j is k. Output j checks which of the received requests have a priority k packet. If multiple inputs have priority k packets, ties are broken randomly. If no input has a priority k packet, output j updates its preferred priority to the next one.
• Accept: Input i also maintains a preferred priority, which is updated in a WFQ-like way whenever it accepts a grant. Assume the preferred priority of input i is k. If input i receives any grants, it looks for a grant with priority k. If multiple grants have priority k, ties are broken randomly. If no grant has priority k, the preferred priority is updated to the next one.
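The preferred-priority mechanism of algo4 can be sketched per port. The slides only say the preference is updated "in a way similar to WFQ", so this sketch swaps in weighted round-robin as an approximation: priority k stays preferred for weights[k] accepted matches before the preference moves on, and priorities absent from the current requests are skipped. `PreferredPriority` and `algo4_grant` are hypothetical names; the accept side at an input would use the same class with grants instead of requests.

```python
import random
from typing import Collection, Optional, Sequence


class PreferredPriority:
    """Per-port preferred priority. ASSUMPTION: the WFQ-like update is
    approximated by weighted round-robin over the class weights."""

    def __init__(self, weights: Sequence[int]):
        if not weights or min(weights) <= 0:
            raise ValueError("weights must be positive")
        self.weights = list(weights)
        self.k = 0
        self.credit = self.weights[0]

    def _advance(self) -> None:
        self.k = (self.k + 1) % len(self.weights)
        self.credit = self.weights[self.k]

    def choose(self, available: Collection[int]) -> Optional[int]:
        """Return the preferred priority, advancing past priorities that no
        request (or grant) carries. None if nothing valid is available."""
        valid = {k for k in available if 0 <= k < len(self.weights)}
        if not valid:
            return None
        while self.k not in valid:
            self._advance()
        return self.k

    def on_accepted(self) -> None:
        """Charge one accepted match to the current preferred priority."""
        self.credit -= 1
        if self.credit == 0:
            self._advance()


def algo4_grant(bitmaps: dict[int, int], pref: PreferredPriority,
                rng: random.Random) -> Optional[tuple[int, int]]:
    """Grant phase at one output: bitmaps maps input -> request bitmap."""
    n = len(pref.weights)
    available = {k for bm in bitmaps.values() for k in range(n) if (bm >> k) & 1}
    k = pref.choose(available)
    if k is None:
        return None
    tied = [i for i, bm in bitmaps.items() if (bm >> k) & 1]
    return rng.choice(tied), k      # ties among inputs broken randomly


# With weights 5:2:2:1 and every class always present, one cycle of
# 10 accepted matches visits the classes as 0,0,0,0,0,1,1,2,2,3.
p = PreferredPriority([5, 2, 2, 1])
cycle = []
for _ in range(10):
    cycle.append(p.choose({0, 1, 2, 3}))
    p.on_accepted()
print(cycle)
```

Charging credit only on accepted matches mirrors the slide's rule that the preferred priority is updated for accepted grants, not for every grant issued.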

21 Result: Bernoulli iid uniform

22 Result: Bernoulli iid nonuniform

23 Result: Bursty uniform

24 Result: Bursty nonuniform

25 Results
• In most cases, algo1 is better than algo0, and algo3 is better than algo2.
• algo3 is not always better than algo1.
• algo4 is not always better than algo1 and algo3.
• As the speedup increases, the results of the different algorithms become close.
• For speedup > 4, BDiff = 0.

26 Contents
1. Motivation
2. Bandwidth Metric
3. Simulation Environment
4. Algorithms used and their results
5. Intuition behind the result
   • Weight information is helpful
   • Size of matching is not helpful
   • WFQ on both input and output side is not helpful
   • Speedup for BDiff = 0
6. Further work
7. Conclusion

27 Intuition behind the result
• Adding weight information to the algorithms helps the scheduler make better decisions when serving different classes.
• Compared with algo0 and algo1, algo2 and algo3 increase the size of the matching because they desynchronize the grants to different ports. However, we observed that algo2 and algo3 did not improve the BDiff metric, so the size of the matching does not help in serving different classes.
• Implementing WFQ on both the input and output ports to select grants and accepts (algo4) does not lead to better decisions. Intuitively, WFQ on the output side may help make better decisions, but perhaps we should use other criteria to break ties on the input side.

28 Intuition behind the result
• BDiff = 0 for speedup > 4, and 4 is the number of classes in our test. So perhaps BDiff = 0 whenever the speedup exceeds the number of classes. However, in a couple of tests with 5 classes, BDiff = 0 still held for speedup > 4.

29 Contents
1. Motivation
2. Bandwidth Metric
3. Simulation Environment
4. Algorithms used and their results
5. Intuition behind the result
6. Further work
   • Latency metric
   • Existence of a constant speedup S for BDiff = 0?
7. Conclusion

30 Future work
• Besides the bandwidth allocated to different classes of service, latency is another metric for measuring how good an algorithm is. We would define the latency metric as how close the per-class packet latency is to that of the OQ switch, and measure it for the different algorithms.
• Investigate further whether there exists a constant speedup S at which a CIOQ switch can emulate OQ WFQ in terms of the service rate for different classes. This needs more theoretical analysis.

31 Conclusion
• We defined a metric to evaluate the capability of algorithms to provide class of service, and measured it for different algorithms.
• The results suggest that using weight information to select grants and accepts helps at smaller speedups. As the speedup increases, the differences between the algorithms become less noticeable, so there is a trade-off between algorithm simplicity and speedup.
• Among the algorithms we tried, algo1 is good enough to provide a good service rate for different classes; algo3 and algo4 do not improve on algo1.
• Our results suggest it is possible to find a maximal matching algorithm that, with a certain speedup, lets a CIOQ switch emulate OQ WFQ in terms of the service rate of different classes.