Belief-Propagation Assisted Scheduling in Input-Queued Switches. S. Atalla (1), D. Cuda (2), P. Giaccone (1), M. Pretti (2). (1) Politecnico di Torino, (2) Italian National Research Council.


Belief-Propagation Assisted Scheduling in Input-Queued Switches
S. Atalla (1), D. Cuda (2), P. Giaccone (1), M. Pretti (2)
(1) Politecnico di Torino, (2) Italian National Research Council
Hot Interconnects 2010, August 2010

Outline
- Background motivations
- System model
- Basic belief-propagation algorithm for MWM
- Assisted scheduling
- Belief-propagation for assisted scheduling
- Performance evaluation
- Hardware implementation
- Conclusions

Background motivations
- Internet traffic is steadily increasing
  - Routers and switches must process a growing amount of data faster and faster
- Input-Queued (IQ) switches can be considered a reference architecture
  - Memory speed = line rate
- IQ switches require suitable scheduling algorithms that
  - Ensure good performance (throughput, delay)
  - Run fast (a few ns for each scheduling decision)
  - Are implementable in hardware (HW)

System model
- NxN crossbar with Virtual Output Queuing (VOQ)
  - one FIFO queue for each input-output pair
  - total of N^2 queues
- Synchronous architecture:
  - time is slotted
  - fixed-size packets

Scheduling algorithm
- At each timeslot, the scheduler selects a set of head-of-line packets compatible with the crossbar constraint:
  - At most one packet can be transferred from each input port and to each output port
  - equivalent to choosing a matching in a bipartite graph
- Inputs: lengths q_ij of the VOQs
- Outputs: a matching described through binary variables: x_ij = 1 iff input i transfers a packet to output j
[Figure: the queue lengths q_ij enter the scheduler (MWM, iSLIP, iLQF, …), which outputs the x_ij, e.g. x_00 = 1, x_33 = 0]
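The crossbar constraint on the binary variables x_ij can be checked mechanically. The following is a minimal sketch, not taken from the slides: the function names `is_valid_matching` and `served_packets`, and the example matrices, are hypothetical and chosen only for illustration.

```python
def is_valid_matching(x):
    """Return True if the 0/1 matrix x respects the crossbar constraint:
    at most one 1 per row (input) and at most one 1 per column (output)."""
    n = len(x)
    rows_ok = all(sum(x[i]) <= 1 for i in range(n))
    cols_ok = all(sum(x[i][j] for i in range(n)) <= 1 for j in range(n))
    return rows_ok and cols_ok


def served_packets(x, q):
    """Count the packets actually transferred: a matched pair (i, j)
    moves a packet only if VOQ (i, j) is not empty."""
    n = len(x)
    return sum(1 for i in range(n) for j in range(n)
               if x[i][j] == 1 and q[i][j] > 0)


# Hypothetical 2x2 example: q holds VOQ lengths, x a candidate schedule.
q = [[2, 0],
     [1, 3]]
x_good = [[1, 0],
          [0, 1]]   # input 0 -> output 0, input 1 -> output 1
x_bad = [[1, 0],
         [1, 0]]    # two inputs contend for output 0

print(is_valid_matching(x_good), served_packets(x_good, q))
print(is_valid_matching(x_bad))
```

Any scheduler discussed in these slides, optimal or heuristic, has to return a matrix that passes this check.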

Scheduling algorithm dichotomy
- Maximum Weight Matching (MWM) is
  - Optimal in terms of performance
  - Difficult to implement in HW
    - O(N^3) operations, difficult to parallelize
- Heuristic algorithms mimicking MWM
  - E.g., iSLIP, iLQF, WFA (and many others)
  - Efficient to implement in HW
    - e.g., iSLIP was implemented in a Cisco router series
  - Possible traffic losses under critical traffic patterns
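To make the MWM objective concrete, the sketch below computes it by exhaustive search over all N! permutations. This is only a reference for very small N and is not the O(N^3) algorithm mentioned on the slide. The function name `mwm_brute_force` and the queue matrix are hypothetical. Since queue lengths are non-negative, restricting the search to full permutations does not lose any weight.

```python
from itertools import permutations


def mwm_brute_force(q):
    """Return (match, weight), where match[i] is the output assigned to
    input i and weight is the total queue length of the matched VOQs.
    Exhaustive search: usable only for tiny N."""
    n = len(q)

    def weight(perm):
        return sum(q[i][perm[i]] for i in range(n))

    best = max(permutations(range(n)), key=weight)
    return list(best), weight(best)


# Hypothetical VOQ lengths for a 2x2 switch.
q = [[3, 4],
     [0, 3]]
print(mwm_brute_force(q))
```

In this example the heaviest single queue (length 4) is not part of the MWM, which is the kind of case where heuristics that chase the longest queue can deviate from the optimum.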

Basic belief-propagation for MWM
- Recently, the Belief-Propagation (BP) algorithm has been proposed to solve the MWM problem [1,2]
  - BP algorithms are message-passing algorithms first conceived to study Graphical Models (GMs)
  - GMs combine graph theory and probability theory
- BP is exact for MWM over a bipartite graph (see [1]), but
  - To ensure convergence, the MWM must be unique
    - Small random noise can be added to the queue lengths
  - It takes O(N^3/ε) to converge
    - ε: difference in weight between the two heaviest matchings
    - not known a priori
[1] M. Bayati, D. Shah, and M. Sharma, "Max-product for maximum weight matching: Convergence, correctness, and LP duality," IEEE Transactions on Information Theory, vol. 54, no. 3, pp. 1241–1251, Mar.
[2] M. Bayati, B. Prabhakar, D. Shah, and M. Sharma, "Iterative scheduling algorithms," in IEEE INFOCOM 2007, pp. 445–453.

Basic belief-propagation for MWM
[Figure: step-by-step exchange of BP messages between the inputs and the outputs of the bipartite graph]
- After convergence, each output is matched to the input associated with the largest message.
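The message-update formulas of these slides are not available in the transcript. The sketch below therefore assumes an update rule in the simplified max-product style, consistent with the hardware description later in the deck (a max over the other messages, a subtraction, and a selection between 0 and the result). The names `bp_messages`, `decide` and `max_excluding`, the synchronous schedule, and the all-zero initialization are assumptions made for illustration.

```python
def max_excluding(values, skip):
    """Largest entry of values, ignoring position skip (0 if none left)."""
    return max((v for k, v in enumerate(values) if k != skip), default=0)


def bp_messages(w, iterations):
    """Run the assumed BP update synchronously for a fixed number of
    iterations. f[i][j] is the message from input i to output j,
    b[j][i] the message from output j to input i."""
    n = len(w)
    f = [[0] * n for _ in range(n)]
    b = [[0] * n for _ in range(n)]
    for _ in range(iterations):
        # Assumed rule: edge weight minus the best competing incoming
        # message, clipped at 0.
        new_f = [[max(0, w[i][j] - max_excluding([b[k][i] for k in range(n)], j))
                  for j in range(n)] for i in range(n)]
        new_b = [[max(0, w[i][j] - max_excluding([f[k][j] for k in range(n)], i))
                  for i in range(n)] for j in range(n)]
        f, b = new_f, new_b
    return f, b


def decide(f):
    """Each output j picks the input with the largest incoming message."""
    n = len(f)
    return [max(range(n), key=lambda i: f[i][j]) for j in range(n)]


w = [[5, 1],
     [1, 5]]                # hypothetical VOQ lengths
f, b = bp_messages(w, 3)
print(f, decide(f))
```

With this rule the messages stay between 0 and the corresponding edge weight, which matches the later remark that messages share the numerical range of the queue lengths. Before convergence the per-output decisions need not form a valid matching, so this sketch says nothing about how the authors resolve conflicts.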

Assisted scheduling
- Our major contribution is the concept of assisted scheduling:
  - Scheduling algorithms are modified to use the messages computed by BP as weights, instead of the queue lengths
- We show that BP-assisted scheduling boosts the performance of existing schedulers while keeping backward compatibility

Assisted scheduling
- We introduce the Belief-Propagation Message-Processing (BP-MP) module between the VOQs and the scheduler
  - BP-MP computes the message values as a function of the queue lengths Q(t), based on a BP algorithm
- The scheduler works in the usual way, but scheduling decisions are based on the messages F(t) computed by the BP-MP module rather than on Q(t)
  - F(t) can be seen as a correction of the VOQ lengths Q(t)
[Figure: VOQs → BP-MP (few iterations I) → Scheduler]

Assisted scheduling
- BP has been improved with:
  - Relaxation of the MWM uniqueness constraint
    - We do not need BP to converge anymore
    - No random noise
  - Finite (and small) number of iterations
  - Integer number representation
  - Memory
  - Self-asynchronous update

Messages for assisted scheduling
- BP runs for a fixed (and small) number of iterations I
- Messages are bounded
- Messages are represented through integer numbers
  - Same numerical range as the queue lengths (around log2 Q_max bits)

Memory for assisted scheduling
- Queues exhibit a strong correlation, which is reflected in the message dynamics
  - A queue length can change by at most 1 at each timeslot
- Memory: messages are initialized to the last computed messages
- Memory speeds up convergence

Self-asynchronous update for assisted scheduling
- Studies on BP showed that updating messages in a random sequential order is beneficial for convergence (asynchronous update)
  - Not easy to implement in HW
- Self-asynchronous update:
  - exploits the randomness of the arrival process
  - updates only the messages associated with queues that have changed since the previous timeslot
  - mimics the asynchronous update
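Memory and self-asynchronous update can be combined in one per-timeslot routine. The sketch below is an illustration under assumptions: it reuses an assumed message rule (edge weight minus the largest competing message, clipped at 0), it applies the change flag to both messages of an edge, and the name `bp_mp_timeslot` and all numeric values are hypothetical.

```python
def bp_mp_timeslot(w, w_prev, f, b, iterations):
    """One timeslot of the BP-MP module (sketch).
    Memory: f and b are the messages left over from the previous
    timeslot and serve as the starting point.
    Self-asynchronous update: only messages on edges whose queue length
    changed since the previous timeslot are recomputed."""
    n = len(w)
    changed = [[w[i][j] != w_prev[i][j] for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        new_f = [row[:] for row in f]
        new_b = [row[:] for row in b]
        for i in range(n):
            for j in range(n):
                if not changed[i][j]:
                    continue  # keep the stored message
                rival_b = max((b[k][i] for k in range(n) if k != j), default=0)
                rival_f = max((f[k][j] for k in range(n) if k != i), default=0)
                new_f[i][j] = max(0, w[i][j] - rival_b)   # input i -> output j
                new_b[j][i] = max(0, w[i][j] - rival_f)   # output j -> input i
        f, b = new_f, new_b
    return f, b


# Hypothetical state: one packet arrives at VOQ (0, 0) in this timeslot.
w_prev = [[5, 1], [1, 5]]
w_now = [[6, 1], [1, 5]]
f_old = [[5, 0], [0, 5]]
b_old = [[4, 0], [0, 4]]
print(bp_mp_timeslot(w_now, w_prev, f_old, b_old, 1))
```

When no queue changes, the routine returns the stored messages untouched, so the work per timeslot scales with the number of arrivals and departures rather than with N^2.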

Scheduling algorithms
- iLQF vs. BP-assisted iLQF (BP-iLQF)
  - Distributed greedy algorithm
  - Each input (each output) is equipped with an arbiter that selects the output (input) associated with the longest queue
- Greedy MWM (GMWM) vs. BP-assisted GMWM (BP-GMWM)
  - centralized scheduling, iterating N times
  - at each iteration, it selects the unmatched input/output pair associated with the longest queue
- iSLIP
  - like iLQF, but exchanging only binary information (queue empty/not empty)
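The greedy scheduler and its assisted variant differ only in the weight matrix they receive. The sketch below implements the centralized greedy selection described above with a generic `weights` argument. The function names, the queue matrix Q, and the message matrix F are hypothetical: F is an illustrative stand-in for BP-MP output and is not a result from the slides.

```python
def greedy_mwm(weights):
    """Centralized greedy matching: N iterations, each one picking the
    heaviest pair among the still unmatched inputs and outputs.
    Returns match, where match[i] is the output assigned to input i."""
    n = len(weights)
    free_in = list(range(n))
    free_out = list(range(n))
    match = [None] * n
    for _ in range(n):
        i, j = max(((i, j) for i in free_in for j in free_out),
                   key=lambda pair: weights[pair[0]][pair[1]])
        match[i] = j
        free_in.remove(i)
        free_out.remove(j)
    return match


def matching_weight(q, match):
    """Total queue length served by a matching."""
    return sum(q[i][j] for i, j in enumerate(match))


Q = [[3, 4],
     [0, 3]]   # hypothetical VOQ lengths
F = [[2, 1],
     [0, 3]]   # hypothetical messages standing in for BP-MP output

plain = greedy_mwm(Q)      # GMWM on queue lengths
assisted = greedy_mwm(F)   # same scheduler, messages as weights
print(plain, matching_weight(Q, plain))
print(assisted, matching_weight(Q, assisted))
```

The scheduler code is unchanged between the two calls, which is the sense in which assisted scheduling keeps backward compatibility: only the input weights are swapped.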

Performance evaluation settings
- Simulation settings:
- Traffic patterns:
  - Critical traffic pattern

Performance evaluation results
- BP-assisted scheduling improves performance (I=3)
[Figure: performance plots comparing Memory vs. No Memory, and Self-asynchronous vs. Synchronous vs. Asynchronous update]

Hardware design: General overview
- 2N modules running in parallel
- IM and OM perform the same operations
- When n = I, IM sends F(t) to the scheduler
[Figure: BP-MP block between the VOQs and the scheduler, exchanging forward and backward messages]

Hardware design: IM details
- Self-asynchronous: if w_ij(t) ≠ w_ij(t-1) then e_ij = 1, else e_ij = 0
  - Flags associated with the VOQs at input i
- Memory: registers storing the messages computed during the previous timeslot
  - N registers of size log2 Q_max
- Max operation
  - Tournament implementation: ⌈log2(N-1)⌉ stages and (N-2) comparisons
- Subtraction operation
  - c is used to select between 0 and the result of the subtraction
- When n = I, messages are sent to the scheduler
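The stage and comparison counts of the tournament can be checked with a small software model. The sketch below is not the hardware design itself: `tournament_max` is a hypothetical name, and the input list stands for the N-1 competing messages seen by one module (here N = 8, so 7 values).

```python
def tournament_max(values):
    """Pairwise knockout maximum of a non-empty list.
    Returns (maximum, number_of_stages, number_of_comparisons)."""
    level = list(values)
    stages = 0
    comparisons = 0
    while len(level) > 1:
        winners = []
        # Compare neighbours two by two; all comparisons of one stage
        # could run in parallel in hardware.
        for k in range(0, len(level) - 1, 2):
            winners.append(max(level[k], level[k + 1]))
            comparisons += 1
        if len(level) % 2 == 1:
            winners.append(level[-1])   # odd element advances unopposed
        level = winners
        stages += 1
    return level[0], stages, comparisons


# N = 8 ports: each message update takes the max over N - 1 = 7 values.
print(tournament_max([3, 9, 1, 4, 7, 2, 5]))
```

For 7 inputs the model uses 3 stages, which equals ⌈log2(7)⌉, and 6 comparisons, which equals N-2, in line with the figures quoted on the slide.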

Conclusion
- We proposed BP-assisted scheduling to boost the performance of existing scheduling algorithms while keeping backward compatibility
  - BP runs for a few iterations
- We simplified and improved the basic BP algorithm:
  - Relaxation of the MWM uniqueness constraint
  - Integer messages (backward compatibility)
  - Message memory
  - Self-asynchronous update
- We provided a high-level description of a possible HW implementation of the BP-MP module:
  - BP-MP can be efficiently implemented in HW and is compatible with existing implementations

Any questions? Thank you for your attention!

Example: MWM computation over a tree
- Node "1" must decide whether or not to add edge (1,2) to the matching
- Node "1" takes its decision based only on the information provided by the nodes in its neighborhood
- E.g., node "2" sends two messages to node "1":
  - the MWM of the sub-tree rooted at "2", including (2,1), given that (2,1) is part of the MWM rooted at "1"
  - the MWM of the sub-tree rooted at "2", including (2,1), given that (2,1) is not part of the MWM rooted at "1"
[Figure: tree rooted at node "1" with edge weights w_21, w_61, w_71, w_32, w_42. To take or not to take (2,1)?]

Example: MWM computation over a tree
Message definitions:
- If (2,1) is part of the MWM, then (3,2), (4,2), (5,2) cannot be in the MWM
- If (2,1) is not in the MWM, then at most one among (3,2), (4,2), (5,2) can be part of the MWM
- The number of exchanged messages can be reduced by combining the two messages into a single one

Example: MWM computation over a tree
Node "1" decision:
- Node "1" adds edge (1,2) to the MWM if:
- or equivalently
[Figure: tree rooted at node "1" with edge weights w_21, w_61, w_71, w_32, w_42. To take or not to take (2,1)?]
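The slide's decision formulas are not available in the transcript, so the sketch below shows one standard way to obtain such a decision on a tree, offered as an assumption rather than as the authors' derivation. Each node sends its parent a single combined message: the weight of their shared edge minus the best gain the node could otherwise get from its own children. The root then takes the edge with the largest positive message. The function names and all edge weights are hypothetical.

```python
def message(tree, weights, child, parent):
    """Combined message child -> parent: weight of edge (child, parent)
    minus what child gives up below itself by being matched upward."""
    incoming = [message(tree, weights, g, child)
                for g in tree[child] if g != parent]
    return weights[frozenset((child, parent))] - max([0] + incoming)


def root_decision(tree, weights, root):
    """Return (chosen_neighbor_or_None, messages received by root)."""
    msgs = {c: message(tree, weights, c, root) for c in tree[root]}
    best = max(msgs, key=msgs.get)
    return (best if msgs[best] > 0 else None), msgs


# Tree from the example: node 1 has neighbors 2, 6, 7; node 2 has 3, 4, 5.
tree = {1: [2, 6, 7], 2: [1, 3, 4, 5],
        3: [2], 4: [2], 5: [2], 6: [1], 7: [1]}
# Hypothetical edge weights (the slides only give the symbols).
weights = {frozenset(e): v for e, v in {
    (2, 1): 5, (3, 2): 2, (4, 2): 3, (5, 2): 1, (6, 1): 4, (7, 1): 1}.items()}

choice, msgs = root_decision(tree, weights, 1)
print(choice, msgs)
```

With these weights node 2 reports 5 - 3 = 2, which is less than the 4 reported by node 6, so node 1 does not take (2,1). The best matching is then (1,6) together with (4,2), for a total weight of 7. Raising the weight of (2,1) to 9 flips the decision.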

Graphical models
- BP algorithms are message-passing algorithms first conceived to study Graphical Models (GMs)
- GMs are a "marriage" between probability theory and graph theory [Note: I would say this only aloud; it does not mean anything here]
- GMs are becoming a powerful tool in several fields of science (AI, speech recognition, coding/decoding, bioinformatics) to compute marginal probabilities and the maximum a posteriori probability (max-product algorithm)
- "BP" and "max-product" are usually both referred to simply as "BP", since computing the maximum a posteriori probability first requires computing the marginal distributions [Note: I did not understand this sentence and it seems very risky to me!]

[Figure: VOQ → BP-MP → Scheduler]

Scheduler: iLQF
- If the MWM is unique, BP-assisted iLQF, running with weights computes exactly the MWM

Performance evaluation: results
- BP-assisted scheduling improves performance (I=3)
- Average delays: the delays of BP-iLQF/GMWM are at most 1.37 times the delays of iLQF/GMWM
[Figure: performance plots comparing Memory vs. No Memory, and Self-asynchronous vs. Synchronous vs. Asynchronous update]
