Some Unsolved Problems in High Speed Packet Switching


Some Unsolved Problems in High Speed Packet Switching. Shivendra S. Panwar. Joint work with: Yihan Li, Yanming Shen and H. Jonathan Chao. Polytechnic University, Brooklyn, NY. NY State Center for Advanced Technology in Telecommunications. http://catt.poly.edu/CATT/panwar.html

Advice to Woodward and Bernstein: “Follow the money” -- Deep Throat (aka Mark Felt)

Advice to performance analysts: “Find the bottleneck”

Packet Switching

Buffering in a Packet Switch. Fixed-size packet switches operate in a time-slotted manner; the slot duration is equal to the cell transmission time. Contention occurs when multiple inputs have arrivals destined to the same output, so buffering is needed to avoid packet loss. Buffering schemes in a packet switch: output queueing (OQ); input queueing (IQ); virtual output queueing (VOQ) / combined input-output queueing (CIOQ).

Output Queuing (OQ). 100% throughput. Internal speedup of N. Impractical for large N. [Figure: 4x4 output-queued switch, Inputs 1-4 and Outputs 1-4]

Input Queuing (IQ). Easy to implement. Head-of-line (HOL) blocking limits the saturation throughput to 58.6% (the large-N limit under uniform traffic). [Figure: 4x4 input-queued switch illustrating head-of-line blocking, Inputs 1-4 and Outputs 1-4]
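The 58.6% figure (2 - sqrt(2) in the large-N limit) can be checked numerically. The following is a minimal simulation sketch, not the analysis behind the slide: it assumes saturated inputs, uniformly random destinations, and random contention resolution at each output. The function name `hol_saturation_throughput` is hypothetical.

```python
import random


def hol_saturation_throughput(n_ports, n_slots, seed=0):
    """Estimate saturation throughput of a FIFO input-queued switch.

    Assumptions (illustrative, not from the slides): every input always
    has a cell waiting, destinations are uniform, and each output picks
    one contending head-of-line (HOL) cell at random per slot.
    """
    rng = random.Random(seed)
    # hol[i] is the destination of the cell at the head of input i.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    served = 0
    for _ in range(n_slots):
        # Group inputs by the output their HOL cell wants.
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        # Each requested output serves exactly one contender; the losers
        # keep their HOL cell and block the cells queued behind it.
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            hol[winner] = rng.randrange(n_ports)  # next cell moves to the head
            served += 1
    return served / (n_ports * n_slots)


if __name__ == "__main__":
    for n in (2, 8, 32):
        print(n, round(hol_saturation_throughput(n, 5000), 3))
```

For small N the estimate is noticeably above 58.6% (0.75 for N = 2) and it drifts toward 2 - sqrt(2) as N grows.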

Virtual Output Queuing (VOQ). Overcomes HOL blocking. No speedup requirement. Needs scheduling algorithms to resolve contention. Complexity. Performance guarantee. [Figure: an input with one virtual output queue per output, 1-4]

Challenges in Switch Design. Stability. 100% throughput. Delay performance. Scalability. Scale to a high number of linecards and to high linecard speeds. A distributed scheduler is more desirable than a centralized scheduler. Scheduler complexity. Pin count.

High Speed Packet Switches: VOQ switches and scheduling algorithms; buffered crossbar switch; load-balanced switch; multi-stage switch.

VOQ Switch Architecture. [Figure labels: Inputs 1-4, ISM, VOQ 1..N, Switch Fabric, ORM, Outputs 1-4] Input Segmentation Module (ISM): segments packets into fixed-length cells. Output Reassembly Module (ORM): reassembles cells into packets.
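The ISM/ORM pair can be sketched as a segment-and-reassemble round trip. This is an illustrative sketch only: the 48-byte cell payload, the zero padding, and the (last-flag, length, payload) cell layout are assumptions, not details from the slides.

```python
CELL_PAYLOAD = 48  # assumed cell payload size in bytes (hypothetical)


def segment(packet, cell_size=CELL_PAYLOAD):
    """ISM sketch: split a packet into fixed-length cells.

    Each cell is (is_last, valid_length, payload); the final cell is
    zero-padded so that every payload has exactly cell_size bytes.
    """
    cells = []
    for off in range(0, len(packet), cell_size):
        chunk = packet[off:off + cell_size]
        is_last = off + cell_size >= len(packet)
        cells.append((is_last, len(chunk), chunk.ljust(cell_size, b"\x00")))
    return cells


def reassemble(cells):
    """ORM sketch: strip the padding and join the cells of one packet."""
    return b"".join(payload[:length] for _, length, payload in cells)


if __name__ == "__main__":
    pkt = bytes(range(100))
    cells = segment(pkt)
    print(len(cells), reassemble(cells) == pkt)
```

A real ORM also has to wait until the last cell of a packet arrives, which is the source of the reassembly delay discussed later in the deck.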

Scheduling for VOQ Switch. Scheduling is needed to avoid output contention. A scheduling problem can be modeled as a matching problem in a bipartite graph: an input and an output are connected by an edge if the corresponding VOQ is not empty. Each edge may have a weight, which can be the length of the VOQ or the age of the HOL cell.

Maximum Weight Matching (MWM). MWM always finds a match with the maximum weight. Stable under any admissible traffic. Very high complexity, O(N^3); impractical. [Figure: weighted bipartite graph; weight of the match: 25] References: L. Tassiulas, A. Ephremides, “Stability properties of constrained queueing systems and scheduling for maximum throughput in multihop radio networks,” IEEE Transactions on Automatic Control, Vol. 37, No. 12, pp. 1936-1949, December 1992. E. Leonardi, M. Mellia, F. Neri, M. Ajmone Marsan, “On the stability of Input-Queued Switches with speed-up,” IEEE/ACM Transactions on Networking, Vol. 9, No. 1, pp. 104-118, February 2001. N. McKeown, V. Anantharam, and J. Walrand, “Achieving 100% Throughput in an Input-Queued Switch,” IEEE Transactions on Communications, Vol. 47, No. 8, pp. 1260-1267, August 1999. J.G. Dai and B. Prabhakar, “The throughput of data switches with and without speedup,” INFOCOM 2000.
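To make the definition concrete, here is a brute-force sketch that finds a maximum weight match by enumerating all N! permutations, using VOQ lengths as weights. It is deliberately naive (factorial time, small N only) and is not the O(N^3) algorithm the slide refers to; the queue matrix `q` is a made-up example.

```python
from itertools import permutations


def match_weight(match, q):
    """Weight of a match: sum of q[i][j] over matched pairs (i, match[i])."""
    return sum(q[i][j] for i, j in enumerate(match))


def max_weight_matching(q):
    """Exhaustive MWM over all N! input-to-output permutations.

    A permutation may pair an input with an empty VOQ; such a pair simply
    contributes weight 0 and would carry no cell.
    """
    n = len(q)
    return max(permutations(range(n)), key=lambda m: match_weight(m, q))


if __name__ == "__main__":
    # Hypothetical VOQ occupancy: q[i][j] = cells queued at input i for output j.
    q = [[7, 0, 4],
         [3, 8, 0],
         [0, 5, 6]]
    best = max_weight_matching(q)
    print(best, match_weight(best, q))
```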

Maximum Weight Matching. The maximum weight matching algorithm is strongly stable under any admissible traffic pattern. Lyapunov function. Strongly stable. Admissible. References: E. Leonardi, M. Mellia, F. Neri, M. Ajmone Marsan, “On the stability of Input-Queued Switches with speed-up,” IEEE/ACM Transactions on Networking, Vol. 9, No. 1, pp. 104-118, February 2001. N. McKeown, V. Anantharam, and J. Walrand, “Achieving 100% Throughput in an Input-Queued Switch,” IEEE Transactions on Communications, Vol. 47, No. 8, pp. 1260-1267, August 1999.

Maximum Weight Matching. Fluid model. The maximum weight matching is rate stable if the arrival processes satisfy a strong law of large numbers (SLLN) with probability one, and the traffic is admissible (no input or output is oversubscribed). References: J.G. Dai and B. Prabhakar, “The throughput of data switches with and without speedup,” INFOCOM 2000, pp. 556-564.

Approximate MWM (1-APRX). A function f(.) is a sub-linear function if lim_{x→∞} f(x)/x = 0. Let the weight of a schedule obtained by a scheduling algorithm B be W_B, and let the weight of the maximum weight match for the same switch state be W*. If W_B ≥ W* - f(W*), B is a 1-APRX to MWM. B is stable if … This makes it possible to find stable matching algorithms with lower complexity than MWM. References: D. Shah, M. Kopikare, “Delay bounds for approximate Maximum weight matching algorithms for input-queued switches,” IEEE INFOCOM, New York, USA, June 2002.

Average Delay Bound. Delay bound for MWM. Lyapunov function. References: E. Leonardi, M. Mellia, F. Neri, and M. Ajmone Marsan, “Bounds on average delays and queue size averages and variances in input-queued cell-based switches,” Proceedings of IEEE INFOCOM, 2001.

Average Delay Bound (contd.). Delay bound for approximate MWM. Lyapunov function. Cb: weight difference to the MWM matching. Under uniform traffic, they have the same result. References: D. Shah, M. Kopikare, “Delay bounds for approximate Maximum weight matching algorithms for input-queued switches,” IEEE INFOCOM, New York, USA, June 2002.

Open Issues. In simulations, MWM has the best (cell) delay performance. Average delay: if the weight of a queue is chosen as Q^a, then the delay increases with a for a > 0. Is MWM the optimal scheduling scheme for achieving the minimum average cell delay? What is the optimal scheduling scheme to achieve the minimum average packet delay (including reassembly delay)?

Maximal Matching. Add connections incrementally, without removing connections made earlier. No more matches can be made trivially by the end of the operation. The solution may not be unique. Complexity O(N log N). [Figure: weighted bipartite graph with a maximal matching; weight of the match: 23]
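The incremental rule above can be sketched as a greedy pass over the non-empty VOQs. Visiting the heaviest edges first is an assumption made here for illustration; any visiting order yields a maximal match. Sorting all N^2 edges also costs more than the O(N log N) quoted on the slide, so this shows the idea rather than a hardware-grade scheduler.

```python
def greedy_maximal_matching(q):
    """Build a maximal match by adding edges and never removing them.

    q[i][j] is the length of VOQ(i, j); only non-empty VOQs are edges.
    Returns a dict mapping matched inputs to outputs.
    """
    n = len(q)
    edges = sorted(
        ((q[i][j], i, j) for i in range(n) for j in range(n) if q[i][j] > 0),
        reverse=True,  # heaviest first (an illustrative choice)
    )
    match = {}
    used_outputs = set()
    for _, i, j in edges:
        if i not in match and j not in used_outputs:
            match[i] = j
            used_outputs.add(j)
    return match


if __name__ == "__main__":
    # Hypothetical occupancy where greedy is maximal but not maximum weight:
    # it takes the edge of weight 6 and blocks the two edges of weight 5.
    q = [[5, 6],
         [0, 5]]
    print(greedy_maximal_matching(q))
```

In this example no further edge can be added (input 1 has cells only for the already-taken output 1), yet the match weighs 6 while the maximum weight match weighs 10, which is the gap between "maximal" and "maximum".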

Maximal Matching. A maximal matching achieves 100% throughput with speedup S ≥ 2 under any admissible traffic pattern [Leonardi, ToN 2001]. 100% throughput if … with probability 1. A maximal matching algorithm is rate stable with speedup S ≥ 2 [Dai, Infocom 2000]. References: E. Leonardi, M. Mellia, F. Neri, M. Ajmone Marsan, “On the stability of Input-Queued Switches with speed-up,” IEEE/ACM Transactions on Networking, Vol. 9, No. 1, pp. 104-118, February 2001. J.G. Dai and B. Prabhakar, “The throughput of data switches with and without speedup,” INFOCOM 2000, pp. 556-564.

Multiple Iterative Matching. Use multiple iterations to converge on a maximal matching: Parallel Iterative Matching (PIM), iSLIP and DRRM. The complexity of each iteration is O(log N). O(log N) iterations are needed to converge on a maximal matching (iSLIP). 100% throughput only under uniform traffic.

iSLIP. Step 1: Request. Each input sends a request to every output for which it has a queued cell. Step 2: Grant. If an output receives multiple requests, it chooses the one that appears next in a fixed round-robin schedule. The output arbiter pointer is incremented by one location beyond the granted input if, and only if, the grant is accepted in step 3. Step 3: Accept. If an input receives multiple grants, it accepts the one that appears next in a fixed round-robin schedule. The input arbiter pointer is incremented by one location beyond the accepted output. [Figure: request, grant and accept phases between inputs and outputs]
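The three steps can be sketched as a single iSLIP iteration. This is a minimal software sketch under stated assumptions: the function name, the boolean request matrix, and the list-based pointers are hypothetical, and only one iteration per time slot is modeled.

```python
def islip_iteration(requests, grant_ptr, accept_ptr):
    """Run one request/grant/accept iteration.

    requests[i][j] is True when VOQ(i, j) holds a cell (step 1).
    grant_ptr[j] and accept_ptr[i] are the round-robin pointers of output j
    and input i; both lists are updated in place.
    Returns a dict mapping each matched input to its output.
    """
    n = len(requests)

    # Step 2: each output grants the requesting input that appears next
    # at or after its pointer in round-robin order.
    grants = {}  # input -> list of outputs that granted it
    for j in range(n):
        for k in range(n):
            i = (grant_ptr[j] + k) % n
            if requests[i][j]:
                grants.setdefault(i, []).append(j)
                break

    # Step 3: each input accepts the granting output that appears next
    # at or after its pointer. Pointers move only for accepted grants.
    match = {}
    for i, outs in grants.items():
        for k in range(n):
            j = (accept_ptr[i] + k) % n
            if j in outs:
                match[i] = j
                accept_ptr[i] = (j + 1) % n
                grant_ptr[j] = (i + 1) % n
                break
    return match


if __name__ == "__main__":
    req = [[True] * 3 for _ in range(3)]  # every VOQ backlogged
    g, a = [0, 0, 0], [0, 0, 0]
    for slot in range(3):
        print(slot, islip_iteration(req, g, a))
```

With all VOQs backlogged and all pointers starting at 0, the first slot matches only one pair, the second two, and the third all three: updating the grant pointer only on acceptance is what desynchronizes the output arbiters.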

Achieving 100% Throughput without Speedup: matching algorithms using memory; polling system based matching.

Low Complexity Algorithms with 100% Throughput. Algorithms with memory: use the previous schedule as a candidate. References: L. Tassiulas, “Linear complexity algorithms for maximum throughput in radio networks and input queued switches,” IEEE INFOCOM 1998, vol. 2, New York, 1998, pp. 533-539. P. Giaccone, B. Prabhakar, D. Shah, “Toward simple, high-performance schedulers for high-aggregate bandwidth switches,” IEEE INFOCOM 2002, New York, 2002. Polling system based matching algorithms: improve the efficiency by using exhaustive service. References: Y. Li, S. Panwar, H. J. Chao, “Exhaustive service matching algorithms for input queued switches,” 2004 Workshop on High Performance Switching and Routing (HPSR 2004), April 2004. Y. Li, S. Panwar, H. J. Chao, “Performance Analysis of a Dual Round Robin Matching Switch with Exhaustive Service,” IEEE GLOBECOM 2002.

Matching Algorithms with Memory. The queue length of each VOQ does not change much during successive time slots: in each time slot, at most one cell arrives at each input and at most one cell departs from each input. If the queue length is used as the weight of a connection, it is likely that a busy connection will continue to be busy over a few time slots. Use the match in the previous time slot as a candidate for the new match. Important results: randomized algorithm with memory [Tassiulas 98]; derandomized algorithm with memory [Giaccone 02]; with higher complexity: APSARA, LAURA, SERENA [Giaccone 02].

Notations. For an N×N switch, there are N! possible matches. Q(t) = [q_ij]_{N×N}, where q_ij is the queue length of VOQ_ij. M(t): a match at time t. The weight of M(t): W(t) = <M(t), Q(t)>, the sum of the lengths of all matched VOQs.

Randomized algorithm with memory. Let S(t) be the schedule used at time t. At time t+1, uniformly select a match R(t+1) at random from the set of all N! possible matches. Let S(t+1) = argmax_{S ∈ {S(t), R(t+1)}} <S, Q(t+1)>. Stable under any Bernoulli i.i.d. admissible arrival traffic. Very simple to implement, complexity O(log N). Delay performance is very poor.
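The rule amounts to keeping whichever of the previous schedule and a fresh random match is heavier under the current queue lengths. The sketch below assumes that reading of the Tassiulas reference; function names are hypothetical, and the software shuffle is O(N), so it does not reproduce the O(log N) complexity quoted on the slide.

```python
import random


def weight(match, q):
    """<M, Q>: total length of the VOQs selected by the match."""
    return sum(q[i][j] for i, j in enumerate(match))


def randomized_with_memory(prev_match, q, rng):
    """Return the heavier of prev_match and a uniformly random match.

    prev_match is a permutation (tuple) mapping input i to output
    prev_match[i]; q is the queue-length matrix at the new time slot.
    """
    candidate = list(range(len(q)))
    rng.shuffle(candidate)  # uniform over all N! permutations
    # max() returns the first maximal item, so ties keep the old schedule.
    return max((tuple(prev_match), tuple(candidate)), key=lambda m: weight(m, q))


if __name__ == "__main__":
    rng = random.Random(1)
    q = [[7, 0, 4], [3, 8, 0], [0, 5, 6]]  # hypothetical VOQ occupancy
    s = (1, 2, 0)  # deliberately poor starting schedule (weight 0)
    for t in range(10):
        s = randomized_with_memory(s, q, rng)
        print(t, s, weight(s, q))
```

The schedule weight never decreases for a fixed queue state, but it may take many slots to stumble on a heavy match, which is consistent with the poor delay noted above.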

Derandomized Algorithm with Memory. Hamiltonian walk: a walk which visits every vertex of a graph exactly once. In an N×N switch there are N! vertices (possible schedules), and a Hamiltonian walk visits each vertex once every N! time slots. H(t): the value of the vertex which is visited at time t. The complexity of generating H(t+1) when H(t) is known is O(1). Derandomized algorithm with memory: use the match generated by the Hamiltonian walk instead of the random match. Similar performance to the randomized algorithm.
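One way to visit all N! schedules exactly once is the Steinhaus-Johnson-Trotter ordering, in which consecutive permutations differ by a single adjacent swap. The sketch below uses that ordering as an illustrative stand-in; it is not claimed to be the construction used in the cited work, and this straightforward version spends O(N) per step rather than the O(1) quoted on the slide.

```python
def hamiltonian_walk(n):
    """Yield every permutation of range(n) exactly once (SJT order)."""
    perm = list(range(n))
    direction = [-1] * n  # direction[v]: -1 = looking left, +1 = looking right
    yield tuple(perm)
    while True:
        # Find the largest "mobile" element: one whose neighbor in its
        # direction exists and is smaller than it.
        mobile, pos = -1, -1
        for idx, v in enumerate(perm):
            nxt = idx + direction[v]
            if 0 <= nxt < n and perm[nxt] < v and v > mobile:
                mobile, pos = v, idx
        if mobile < 0:
            return  # no mobile element: all N! permutations were visited
        nxt = pos + direction[mobile]
        perm[pos], perm[nxt] = perm[nxt], perm[pos]
        # Reverse the direction of every element larger than the moved one.
        for v in range(mobile + 1, n):
            direction[v] = -direction[v]
        yield tuple(perm)


if __name__ == "__main__":
    for h in hamiltonian_walk(3):
        print(h)
```

A derandomized scheduler would restart the generator after N! slots and compare H(t+1) against the previous schedule in place of the random match R(t+1).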

Compared to MWM… Simple matching algorithms can achieve stability as MWM does: it is not necessary to find “the best match” in each time slot to achieve 100% throughput. MWM has much better delay performance than randomized and derandomized matching: “better” matches lead to better delay performance.

With Higher Complexity and Lower Delay. Introduce higher complexity for much lower delay than the randomized and derandomized algorithms. APSARA: include the neighbors of the latest match as candidates. LAURA: merge the latest match with a random match to remember the heavy edges. SERENA: merge the latest match with the arrival figure (generated from the current arrival pattern); complexity O(N).

Polling System Based Matching. Exhaustive Service Matching: inspired by exhaustive service polling systems. All the cells in the corresponding VOQ are served after an input and an output are matched. Slot times wasted to achieve an input-output match are amortized over all the cells waiting in the VOQ instead of only one. Cells within the same packet are transferred continuously. A Hamiltonian walk is used to guarantee stability.

Exhaustive Service Matching with Hamiltonian Walk (EMHW). Let S(t) be the match at time t. At time t+1, generate match Z(t+1) by the Exhaustive Service Matching algorithm based on S(t), and H(t+1) by the Hamiltonian walk. Let S(t+1) = argmax_{S ∈ {Z(t+1), H(t+1)}} <S, Q(t+1)>, where <S, Q(t+1)> is the weight of S at time t+1. Stable under any admissible traffic. Analyzed by an exhaustive service polling system. Implementation complexity of HE-iSLIP: O(log N).

E-iSLIP Average Delay Analysis. Exhaustive random polling system model. Symmetric system: only consider one input. N VOQs per input with an exhaustive service policy: an exhaustive service polling system with N stations. The service order of the VOQs is not fixed: a random polling system; assume all stations (VOQs) have the same probability of selection for service after a VOQ is served. Switch-over time S. Average delay T [Levy and Kleinrock].

Delay Performance of HE-iSLIP. Packet delay: the sum of cell delay and reassembly delay. Cell delay: measured from VOQ to destination output. Reassembly delay: time spent in an ORM, often ignored in other work. [Figure labels: Inputs 1-4, ISM, VOQ 1..N, Switch Fabric, ORM, Outputs 1-4]

Performance Summary
Scheme | Complexity | Stable | Packet delay performance
iSLIP | O(log N) | No | Always higher than HE-iSLIP.
HE-iSLIP | O(log N) | Yes | Lowest when packet size is larger than 1 cell.
Derandomized | O(log N) | Yes | Highest for all traffic patterns.
SERENA | O(N) | Yes | Lower than HE-iSLIP only under nonuniform diagonal traffic.
MWM | O(N^3) | Yes | Lowest when packet size is 1 cell.

Packet Delay under Uniform Traffic. Pattern 1: packet size is 1 cell. [Figure: packet delay curves for SERENA, iSLIP, HE-iSLIP and MWM]

Packet Delay under Uniform Traffic. Pattern 2: packet length is 10 cells. Pattern 3: packet length is variable, with an average of 10 cells (Internet packet size distribution). [Figures: one panel per pattern, each with packet delay curves for SERENA, iSLIP, MWM and HE-iSLIP]

When packet length is larger than 1 cell. Why does HE-iSLIP have a lower packet delay than MWM? For example, when packet length is 10 cells: [Figures: cell delay and reassembly delay, each comparing HE-iSLIP and MWM]. Low cell delay and low reassembly delay are both needed for low packet delay. Open problem: which scheduler minimizes packet delay?

Packet-Based Scheduling. A packet-based scheduling algorithm, once it starts transmitting the first cell of a packet to an output port, continues the transmission until the whole packet is completely received at the corresponding output port. Packet-based MWM is stable for any admissible Bernoulli i.i.d. traffic (Lyapunov function): M. Ajmone Marsan, A. Bianco, P. Giaccone, E. Leonardi, and F. Neri, “Packet Scheduling in Input-Queued Cell-Based Switches,” INFOCOM 2001, pp. 1085-1094. Packet-based MWM is stable under regenerative admissible input traffic (fluid model): Y. Ganjali, A. Keshavarzian, D. Shah, “Input Queued Switches: Cell switching v/s Packet switching,” Proceedings of Infocom, 2003. Regenerative: let T be the time between two successive occurrences of the event that all ports are free, with E(T) finite. The modified waiting PB-MWM algorithm is stable under any admissible traffic.

Buffered Crossbar Switch. One buffer for each crosspoint. Distributed arbitration for inputs and outputs. From each input, one cell can be sent to a crosspoint buffer if it has space. One cell can be sent to an output if at least one crosspoint buffer to that output is nonempty. References: Y. Doi and N. Yamanaka, “A High-Speed ATM Switch with Input and Cross-Point Buffers,” IEICE Trans. Commun., Vol. E76, No. 3, pp. 310-314, March 1993. R. Rojas-Cessa, E. Oki, Z. Jing, and H. J. Chao, “CIXB-1: Combined Input-One-Cell-Crosspoint Buffered Switch,” Proceedings of IEEE Workshop of High Performance Switches and Routers 2001.
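The two independent arbitration rules can be sketched as one time slot of a buffered crossbar. Round-robin arbiters, one-cell crosspoint buffers, and letting outputs drain in the same slot in which inputs fill are all assumptions made for illustration; the function name is hypothetical.

```python
def crossbar_slot(voq, xb, in_ptr, out_ptr, xb_size=1):
    """Advance a buffered crossbar by one time slot (state updated in place).

    voq[i][j]: cells waiting at input i for output j.
    xb[i][j]:  cells held in the crosspoint buffer (i, j).
    in_ptr[i], out_ptr[j]: round-robin pointers of the input/output arbiters.
    Returns the list of (input, output) pairs whose cell departed.
    """
    n = len(voq)
    # Input arbitration: each input independently moves at most one cell
    # into a crosspoint buffer that still has space.
    for i in range(n):
        for k in range(n):
            j = (in_ptr[i] + k) % n
            if voq[i][j] > 0 and xb[i][j] < xb_size:
                voq[i][j] -= 1
                xb[i][j] += 1
                in_ptr[i] = (j + 1) % n
                break
    # Output arbitration: each output independently drains at most one
    # cell from a non-empty crosspoint buffer in its column.
    departures = []
    for j in range(n):
        for k in range(n):
            i = (out_ptr[j] + k) % n
            if xb[i][j] > 0:
                xb[i][j] -= 1
                departures.append((i, j))
                out_ptr[j] = (i + 1) % n
                break
    return departures


if __name__ == "__main__":
    voq = [[2, 0], [1, 0]]  # both inputs contend for output 0
    xb = [[0, 0], [0, 0]]
    ip, op = [0, 0], [0, 0]
    for slot in range(4):
        print(slot, crossbar_slot(voq, xb, ip, op))
```

No centralized matching is computed: contention for output 0 is absorbed by the crosspoint buffers and resolved by that output's own arbiter.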

Birkhoff-von Neumann Switch. When the traffic matrix is known: Birkhoff-von Neumann decomposition. Reference: Cheng-Shang Chang, Wen-Jyh Chen and Hsiang-Yi Huang, “On service guarantees for input buffered crossbar switches: a capacity decomposition approach by Birkhoff and von Neumann,” IEEE IWQoS'99, pp. 79-86, London, U.K., 1999.
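A Birkhoff-von Neumann decomposition writes a doubly stochastic rate matrix as a weighted sum of permutation matrices, so that each permutation can be scheduled for its share of time slots. The sketch below is a small-N illustration that finds each permutation by exhaustive search; the example matrix is made up, and a practical implementation would use a bipartite matching algorithm instead.

```python
from itertools import permutations


def bvn_decompose(rate, tol=1e-9):
    """Decompose a doubly stochastic matrix into (coefficient, permutation) terms.

    Repeatedly pick a permutation supported on strictly positive entries,
    subtract the smallest of those entries along it, and record the term.
    """
    n = len(rate)
    residual = [row[:] for row in rate]
    terms = []
    while True:
        perm = next(
            (p for p in permutations(range(n))
             if all(residual[i][p[i]] > tol for i in range(n))),
            None,
        )
        if perm is None:
            break  # residual is (numerically) zero
        coeff = min(residual[i][perm[i]] for i in range(n))
        for i in range(n):
            residual[i][perm[i]] -= coeff
        terms.append((coeff, perm))
    return terms


if __name__ == "__main__":
    # Hypothetical rate matrix; every row and column sums to 1.
    rate = [[0.5, 0.5, 0.0],
            [0.25, 0.25, 0.5],
            [0.25, 0.25, 0.5]]
    for coeff, perm in bvn_decompose(rate):
        print(coeff, perm)
```

Each step zeroes at least one entry, so the number of terms is bounded, but the schedule must be recomputed whenever the traffic matrix changes.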

Birkhoff-von Neumann Switch Example High complexity, impractical

Load-Balanced Switch. Convert the traffic to uniform, then use fixed switching. 100% throughput for a broad class of traffic. No centralized scheduler needed; scalable. [Figure: a load-balancing stage followed by a switching stage, with ports 1..k..N]
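The "convert to uniform" step can be illustrated with a fixed, periodic connection pattern in the first stage. The round-robin rule below, where input i connects to intermediate port (i + t) mod N at slot t, is an assumption chosen for illustration; the helper names are hypothetical.

```python
def first_stage_port(i, t, n):
    """Intermediate port connected to input i at time slot t (assumed pattern)."""
    return (i + t) % n


def spread(n, slots, src=0):
    """Count how many slots input src spends connected to each intermediate port."""
    counts = [0] * n
    for t in range(slots):
        counts[first_stage_port(src, t, n)] += 1
    return counts


if __name__ == "__main__":
    # Even if input 0 sends every cell to one output, its cells are
    # spread evenly over the intermediate ports.
    print(spread(4, 8))
```

Because the pattern is fixed and independent of the queues, no scheduler has to inspect the traffic; the price is that cells of one flow take different paths and may leave out of sequence.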

Original Work on LB Switch. Stability: the load-balanced switch is stable. Delay: burst reduction. Problem: unbounded out-of-sequence delays. Reference: C.-S. Chang, D.-S. Lee and Y.-S. Jou, “Load balanced Birkhoff-von Neumann switches, Part I: one-stage buffering,” Computer Comm., Vol. 25, pp. 611-622, 2002.

LB Switch Variants. Solve the out-of-sequence problem. FCFS (first come first serve): jitter control mechanism; increases the average delay. EDF (earliest deadline first): reduces the average delay; high complexity. Mailbox switch: prevents packets from being out-of-sequence; not 100% throughput. References: C.-S. Chang, D.-S. Lee and C.-M. Lien, “Load balanced Birkhoff-von Neumann switches, Part II: multi-stage buffering,” Computer Comm., Vol. 25, pp. 623-634, 2002. C.-S. Chang, D. Lee, and Y. J. Shih, “Mailbox switch: A scalable two-stage switch architecture for conflict resolution of ordered packets,” in Proceedings of IEEE INFOCOM, Hong Kong, March 2004.

More LB Switch Variants. FFF (full frames first) (Infocom 2002, McKeown): frame-based; no need for resequencing; requires multi-stage buffer communication, hence high complexity. FOFF (full ordered frames first) (Sigcomm 2003, McKeown): maximum resequencing delay N^2; bandwidth wastage. References: I. Keslassy and N. McKeown, “Maintaining packet order in two-stage switches,” Proc. of the IEEE Infocom, June 2002. I. Keslassy, S.-T. Chuang, K. Yu, D. Miller, M. Horowitz, O. Solgaard and N. McKeown, “Scaling Internet routers using optics,” ACM SIGCOMM ’03, Karlsruhe, Germany, Aug. 2003.

Byte-Focal Switch Architecture. [Figure labels: Arrival, Input VOQ (i,1)..(i,N), 1st stage switch fabric, Second-stage VOQ (j,1)..(j,N), 2nd stage switch fabric, Re-sequencing buffer; ports 1..N]

Byte-Focal Switch. Packet-by-packet scheduling. Improves the average delay performance. The maximum resequencing delay is N^2. The time complexity of the resequencing buffer is O(1). Does not need communication between linecards. References: Y. Shen, S. Jiang, S. S. Panwar, H. J. Chao, “Byte-Focal: a practical load-balanced switch,” HPSR 2005, Hong Kong.

Multi-Stage Switches. Single-stage switches (e.g., cross-point switch): single path between each input-output pair; cannot meet the increasing demands of Internet traffic; no packets out-of-sequence; easy to design; lack of scalability. Multi-stage switches (e.g., Clos-network switch): multiple paths between each input-output pair; better tradeoff between switch performance and complexity; highly scalable and fault tolerant. Memory-less multi-stage switches: no packets out-of-sequence, but may encounter internal blocking. Buffered multi-stage switches: packets may be out-of-sequence, but scheduling is easy.

Multi-Stage Architecture

Trueway: A Multi-Plane Multi-Stage Switch

Trueway Switch. The switch fabric consists of multiple switching planes, each being a three-stage Clos network with m center modules. Each input/output pair has multiple routing paths. Highly scalable. Cross-point buffered memory. [Figure: multi-plane three-stage Clos fabric]

Challenges in Multi-Stage Switching. How to efficiently allocate and share the limited on-chip memory? How to schedule packets on multiple paths to maximize memory utilization and system performance? How to minimize link congestion and prevent buffer overflow (i.e., stage-to-stage flow control)? How to maintain cell/packet order if they are delivered over multiple paths (i.e., port-to-port flow control)? How to achieve 100% throughput?

Conclusion. Introduced switch architecture trends. Many open research problems. Bottleneck keeps changing!