Engineering Jon Turner Computer Science & Engineering Washington University www.arl.wustl.edu/~jst Coarse-Grained Scheduling for Multistage Interconnects.

Presentation transcript:

Coarse-Grained Scheduling for Multistage Interconnects
Jon Turner
Computer Science & Engineering, Washington University

2 Overview
System level traffic regulation needed
  » arriving traffic unpredictable and largely uncontrolled
  » prevent congestion in interconnect
  » isolate uncongested links from effects of overloaded links
Methods used for crossbars don’t directly apply
  » scale issues – hundreds to thousands of ports
  » not practical to schedule every packet transmission
Alternate approach
  » maintain Virtual Output Queues at inputs
  » regulate traffic flows by controlling input sending rates
  » adjust sending rates periodically in response to traffic
  » trade-off – responsiveness vs. overhead

3 Coarse-Grained Scheduling
[Figure: interconnect with ports 1, 2, ..., n, schedulers (DS), and virtual output queues labeled "to output 1" through "to output n"]
Coarse-grained nature makes it scalable
  » limit status traffic to fraction of interconnect bandwidth
Schedulers exchange periodic status reports
Virtual output queues
Scheduler paces queues to avoid overload

4 Batch LOOFA Algorithm
Goals: avoid congestion and underflow at outputs
Preference to outputs with smallest queues
  » for output with smallest queue, send max # of cells allowed by switch bandwidth and data in input-side VOQs
  » repeat for output with second smallest queue, etc.
  » continue until no input/output pair can transfer more cells
Variants based on how inputs are selected
  » longest VOQ first
  » backlog proportional allocation
Batch LOOFA is coarse-grained, but not distributed
  » but, can be shown to be work-conserving for speedup 2
  » motivates variants that can be distributed
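
The allocation loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from the talk: it assumes the "longest VOQ first" variant, a single per-period budget (standing in for the switch bandwidth limit) that applies to every input and every output, and ties broken by index. The names `batch_loofa`, `voq`, `out_queue`, and `budget` are hypothetical.

```python
def batch_loofa(voq, out_queue, budget):
    """One scheduling period of a Batch-LOOFA-style allocation (sketch).

    voq[i][j]    -- cells waiting at input i for output j
    out_queue[j] -- cells currently queued at output j
    budget       -- assumed cap on cells each input may send and each
                    output may receive during the period
    Returns x, where x[i][j] is the number of cells to move from i to j.
    """
    n_in, n_out = len(voq), len(out_queue)
    in_left = [budget] * n_in    # unused sending capacity per input
    out_left = [budget] * n_out  # unused receiving capacity per output
    x = [[0] * n_out for _ in range(n_in)]

    # Outputs with the smallest queues are served first.
    for j in sorted(range(n_out), key=lambda j: (out_queue[j], j)):
        # "Longest VOQ first" variant: favor inputs with the most data for j.
        for i in sorted(range(n_in), key=lambda i: (-voq[i][j], i)):
            amount = min(voq[i][j], in_left[i], out_left[j])
            x[i][j] = amount
            in_left[i] -= amount
            out_left[j] -= amount
    return x


schedule = batch_loofa([[5, 3], [4, 0]], [2, 1], budget=6)
print(schedule)
```

A single pass is enough for the "continue until no pair can transfer more cells" condition in this sketch: after a pair (i, j) is visited, its VOQ is drained, or input i is out of capacity, or output j is out of capacity, and none of those can be undone later in the pass.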

5 Implementing Batch LOOFA
Finding maximal schedule equivalent to finding a blocking flow (flow that saturates at least one edge on every source-sink path)
  » blocking flows can be found in O(n²) time
  » even when we also favor low occupancy outputs
Hardware implementation possible
[Figure: example with S = 1.5, T = 8. Panels: Scheduling Problem (inputs with VOQ levels, outputs with output queue levels), Scheduling Solution, and Blocking Flow Problem with Solution (source s, input nodes a0 to a3, output nodes b0 to b3, sink t; edges labeled capacity,flow)]
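
To make the blocking-flow view concrete, the sketch below checks whether a candidate schedule is a blocking flow in the network s → a_i → b_j → t. The capacities are assumptions made for illustration: each s → a_i edge and each b_j → t edge has capacity `budget`, and each a_i → b_j edge has capacity equal to the VOQ length. The function name and parameters are hypothetical.

```python
def is_blocking_flow(voq, flow, budget):
    """Check that `flow` is feasible and blocks every s -> a_i -> b_j -> t path.

    voq[i][j]  -- assumed capacity of edge a_i -> b_j (VOQ length)
    flow[i][j] -- candidate transfer from input i to output j
    budget     -- assumed capacity of every s -> a_i and b_j -> t edge
    """
    n_in, n_out = len(voq), len(voq[0])
    sent = [sum(flow[i]) for i in range(n_in)]
    received = [sum(flow[i][j] for i in range(n_in)) for j in range(n_out)]

    # Feasibility: no edge carries more than its capacity.
    if any(s > budget for s in sent) or any(r > budget for r in received):
        return False
    for i in range(n_in):
        for j in range(n_out):
            if not 0 <= flow[i][j] <= voq[i][j]:
                return False

    # Blocking: every path must contain at least one saturated edge.
    for i in range(n_in):
        for j in range(n_out):
            path_open = (sent[i] < budget
                         and flow[i][j] < voq[i][j]
                         and received[j] < budget)
            if path_open:
                return False
    return True


print(is_blocking_flow([[5, 3], [4, 0]], [[3, 3], [3, 0]], budget=6))
```

A blocking flow need not be a maximum flow; it only guarantees that no further cells can be added without rerouting, which is the sense of "maximal schedule" used here.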

6 BLOOFA is Work-Conserving
Idealized view of coarse-grained scheduling
  » each input receives up to T bytes during input phase
  » during transfer phase, each input can send up to ST bytes and each output can receive up to ST bytes
  » each output sends up to T bytes during output phase
  » scheduler is work-conserving if any time an output j sends <T bytes in an output phase, no input has data for j
Define
  » $q_j$ = number of bytes in packets at output j
  » $p_{ij}$ = number of bytes in $V_{ij}$ and VOQs that precede $V_{ij}$
  » $\text{slack}_{ij} = q_j - p_{ij}$
For speedup 2, $\text{slack}_{ij} \ge T$ before an output phase
  » because each transfer phase increases $\min_j \text{slack}_{ij}$ by 2T
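
A small numeric sketch of the slack definition may help. It reads slack as q_j minus p_ij and assumes that, at a given input, the VOQ for output k precedes the VOQ for output j when output k has the shorter queue (ties broken by index). That ordering rule is an assumption of this sketch, as are the function names.

```python
def slacks_for_input(voq_row, out_queue):
    """Return {j: slack_ij} for the non-empty VOQs of one input i.

    voq_row[j]   -- bytes in V_ij at this input
    out_queue[j] -- q_j, bytes queued at output j
    Assumption: V_ik precedes V_ij when output k has the shorter queue.
    """
    order = sorted(range(len(out_queue)), key=lambda j: (out_queue[j], j))
    slack = {}
    preceding = 0  # running p_ij: bytes in V_ij plus the VOQs ahead of it
    for j in order:
        if voq_row[j] == 0:
            continue  # empty VOQs carry no data, so they are skipped
        preceding += voq_row[j]
        slack[j] = out_queue[j] - preceding
    return slack


def min_slack(voq_row, out_queue):
    """Smallest slack over the non-empty VOQs, or None if the input is idle."""
    slack = slacks_for_input(voq_row, out_queue)
    return min(slack.values()) if slack else None


print(slacks_for_input([4, 0, 6], [10, 3, 20]))
print(min_slack([4, 0, 6], [10, 3, 20]))
```

In the example, the VOQ for output 0 holds 4 bytes and nothing non-empty precedes it, so its slack is 10 - 4 = 6. The VOQ for output 2 has 4 + 6 = 10 bytes at or ahead of it, so its slack is 20 - 10 = 10.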

7 Minimum Slack Increases
Define $\text{minSlack}_i = \min_j \text{slack}_{ij}$
Outline of proof
  » $\text{slack}_{ij}$ increases by 2T during each transfer phase in which $V_{ij}$ is not passed
  » if $\text{slack}_{ij} = \text{minSlack}_i + \delta$ and $\delta < 2T$, then $\text{slack}_{ij}$ increases by at least $2T - \delta$ during transfer phase
      to prove, must account for VOQs that pass $V_{ij}$
  » so, $\text{minSlack}_i$ increases by at least 2T during each transfer
  » no VOQ can pass another during an input or output phase
  » $\text{minSlack}_i$ never decreases during a busy period at input i
  » $\text{minSlack}_i \ge 0$ before each output phase (so, $\text{slack}_{ij} \ge 0$ also)
  » so, no wasted output phases

8 Distributed Batch LOOFA (DBL)
Periodic status exchange
  » input i sends $V_{ij}$ (VOQ length) to output j (for all i,j)
  » output j sends $q_j$ and $\sum_i V_{ij}$ to input i (for all i,j)
Input i sets upper limit on rate to j to $S \times V_{ij} / \sum_i V_{ij}$
  » guarantees traffic to j does not congest switch
Input i sets rates starting with shortest output queues first
  » for each output, go up to upper limit unless no remaining input-side bandwidth
  » can also limit by VOQ contents
Only approximates centralized BLOOFA
  » rate assignments may correspond to non-blocking flow
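
The per-input rate computation can be sketched as follows. This is an illustration under stated assumptions, not the algorithm as implemented in the talk: rates are expressed as multiples of the external link rate, the total input-side bandwidth is taken to be S, ties between equal output queues are broken by index, and the optional "limit by VOQ contents" step is omitted. The function and parameter names are hypothetical.

```python
def dbl_rates(voq_row, column_totals, out_queue, speedup):
    """Rates that one input i assigns to each output (sketch).

    voq_row[j]       -- V_ij, this input's VOQ length for output j
    column_totals[j] -- sum over all inputs of their VOQ lengths for j,
                        as reported by output j
    out_queue[j]     -- q_j, as reported by output j
    speedup          -- S; also used here as the input's total bandwidth
    """
    n = len(voq_row)
    rates = [0.0] * n
    remaining = speedup  # input-side bandwidth not yet assigned

    # Shortest output queues are considered first.
    for j in sorted(range(n), key=lambda j: (out_queue[j], j)):
        if column_totals[j] == 0:
            continue  # nobody has traffic for j
        # Upper limit: this input's proportional share of S at output j.
        limit = speedup * voq_row[j] / column_totals[j]
        rates[j] = min(limit, remaining)
        remaining -= rates[j]
    return rates


print(dbl_rates([10, 15, 0], [40, 30, 5], [8, 2, 1], speedup=2.0))
```

Because each input's limit for output j is its proportional share, the limits across all inputs sum to S, so the combined rate into any output never exceeds S. This is the sense in which the limit keeps traffic to j from congesting the switch.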

9 Stress Test
Inputs combine to build backlogs for outputs 0, 2, ...
  » creates input contention
As inputs “drop out” they switch to unique output
  » must supply unique output, while clearing backlogs
Ideal switch can forward all packets from p phase test with k steps per phase in pk steps
  » overshoot = (excess steps)/pk
  » miss rate = (missed xmit opportunities)/(# of steps)
[Figure: traffic pattern in phase 0, phase 1, phase 2, phase 3]
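
The two metrics are simple ratios. The sketch below spells them out, taking "excess steps" to mean the steps needed beyond the ideal pk. The numbers in the usage lines are invented for illustration and are not results from the talk.

```python
def overshoot(steps_taken, phases, steps_per_phase):
    """Excess steps relative to the ideal p*k steps."""
    ideal = phases * steps_per_phase
    return (steps_taken - ideal) / ideal


def miss_rate(missed_opportunities, steps_taken):
    """Missed transmission opportunities per step of the test."""
    return missed_opportunities / steps_taken


# Hypothetical run: 4 phases of 100 steps that needed 450 steps to drain,
# with 45 missed transmission opportunities along the way.
print(overshoot(450, phases=4, steps_per_phase=100))  # 50 / 400
print(miss_rate(45, 450))                             # 45 / 450
```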

10 Sample Stress Test
[Figure: sample stress test results, with annotations]
  » ideally, no input-side queueing for outputs 1, 3, 5
  » overshoot of 12.8%

11 Performance on Stress Tests
BLOOFA/L favors inputs with longest VOQs; BLOOFA/P uses backlog proportional allocation among inputs
Approximate Output Leveling Algorithm (A-OLA) and Distributed Output Leveling Algorithm (DOLA) seek to equalize output queue levels