Using Load-Balancing To Build High-Performance Routers Isaac Keslassy Ph.D. Oral Examination Department of Electrical Engineering Stanford University.

Presentation transcript:

Using Load-Balancing To Build High-Performance Routers Isaac Keslassy Ph.D. Oral Examination Department of Electrical Engineering Stanford University

2 Typical Router Architecture
[Figure: inputs and outputs at line rate R, connected through a switch fabric controlled by a scheduler]

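3 Definitions: Traffic Matrix
• Traffic matrix: Λ = [λ_ij], where λ_ij is the arrival rate from input i to output j (1 ≤ i, j ≤ N)
• Uniform traffic matrix: λ_ij = λ
[Figure: N inputs and N outputs, each at rate R]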

4 Definitions: 100% Throughput
• 100% throughput: for any traffic matrix whose row and column sums are less than R, λ_ij < μ_ij
[Figure: N inputs and N outputs, each at rate R]

5 Router Wish List
Scale to High Linecard Speeds: No Centralized Scheduler; Optical Switch Fabric; Low Packet-Processing Complexity
Scale to High Number of Linecards: High Number of Linecards; Arbitrary Arrangement of Linecards
Provide Performance Guarantees: 100% Throughput Guarantee; Delay Guarantee; No Packet Reordering

6 Stanford 100Tb/s Router
• “Optics in Routers” project
• Some challenging numbers:
- 100Tb/s
- 160Gb/s linecards
- 640 linecards

7 100% Throughput in a Mesh Fabric
• Router capacity = NR
• Switch capacity = N^2 R
[Figure: full mesh from N inputs to N outputs; every link runs at rate R because the traffic pattern is unknown]

8 If Traffic Is Uniform
[Figure: mesh from inputs to outputs at line rate R; each mesh link at rate R/N]
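A short check of why links of rate R/N are enough under uniform traffic, using only the definitions on slides 3 and 4 (row sums below R). This is a sketch of the reasoning, not a formula taken from the slides:

```latex
\[
\sum_{j=1}^{N} \lambda_{ij} = N\lambda < R
\quad\Longrightarrow\quad
\lambda_{ij} = \lambda < \frac{R}{N}
\]
```

Each of the N^2 mesh links then needs only rate R/N, so the total switch capacity is N^2 · R/N = NR rather than N^2 R.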

9 Real Traffic is Not Uniform
[Figure: the same mesh with links at rate R/N; some links are offered up to rate R]

10 Load-Balanced Switch
• Two stages: load-balancing stage, then forwarding stage
• 100% throughput for weakly mixing traffic (Valiant, C.-S. Chang)
[Figure: two meshes in series, line rate R, each mesh link at rate R/N]

11 Load-Balanced Switch
[Figure: two meshes in series, line rate R, each mesh link at rate R/N]

12 Load-Balanced Switch
[Figure: two meshes in series, line rate R, each mesh link at rate R/N]

13 Intuition: 100% Throughput
• Arrivals to second mesh:
• Capacity of second mesh:
• Second mesh: arrival rate < service rate
[Figure: two meshes in series, line rate R, each mesh link at rate R/N]
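The two formulas on this slide did not survive in the transcript. The following is a sketch of the usual argument, assuming the first mesh spreads each input's traffic evenly over the N middle ports; the index k for a middle port is introduced here for illustration, and λ_ij is as defined on slide 3:

```latex
\begin{align*}
\text{arrival rate from middle port } k \text{ to output } j
  &= \sum_{i=1}^{N} \frac{\lambda_{ij}}{N}
   = \frac{1}{N} \sum_{i=1}^{N} \lambda_{ij}
   < \frac{R}{N}, \\
\text{capacity of the link from middle port } k \text{ to output } j
  &= \frac{R}{N}.
\end{align*}
```

The inequality uses the admissibility condition from slide 4 (every column sum is less than R).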

14 Router Wish List
Scale to High Linecard Speeds: No Centralized Scheduler; Optical Switch Fabric; Low Packet-Processing Complexity
Scale to High Number of Linecards: High Number of Linecards; Arbitrary Arrangement of Linecards
Provide Performance Guarantees: 100% Throughput Guarantee; Delay Guarantee; No Packet Reordering (?)

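15 Packet Reordering
[Figure: two-stage load-balanced switch; packets 1 and 2 travel through different middle ports]

16 Bounding Delay Difference Between Middle Ports
[Figure: two-stage load-balanced switch; packets 1 and 2 travel through different middle ports]

17 UFS (Uniform Frame Spreading)
[Figure: two-stage load-balanced switch; packets 1 and 2 travel through different middle ports]

18 FOFF (Full Ordered Frames First)
[Figure: two-stage load-balanced switch; packets 1 and 2 travel through different middle ports]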

19 FOFF (Full Ordered Frames First)
• Input Algorithm
• N FIFO queues corresponding to the N output flows
• Spread each flow uniformly: if the last packet was sent to middle port k, send the next to k+1.
• Every N time-slots, pick a flow:
- If a full frame exists, pick it and spread like UFS
- Else if all frames are partial, pick one in round-robin order and send it
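The input algorithm above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation: the class and method names are hypothetical, middle ports are numbered 0 to N-1, and full frames are chosen in round-robin order among the queues that hold one (the slide does not say how to choose among several full frames).

```python
from collections import deque


class FoffInput:
    """Sketch of one FOFF input linecard (names are illustrative)."""

    def __init__(self, n):
        self.n = n
        self.queues = [deque() for _ in range(n)]  # one FIFO per output flow
        self.next_port = [0] * n   # next middle port for each flow
        self.rr_full = 0           # round-robin pointer for full frames (assumption)
        self.rr_partial = 0        # round-robin pointer for partial frames

    def enqueue(self, output, packet):
        self.queues[output].append(packet)

    def _pick(self, start, predicate):
        # Scan the flows in round-robin order starting at `start`.
        for step in range(self.n):
            flow = (start + step) % self.n
            if predicate(self.queues[flow]):
                return flow
        return None

    def next_frame(self):
        """Call once every n time-slots.

        Returns a list of (middle_port, output, packet) tuples.
        """
        flow = self._pick(self.rr_full, lambda q: len(q) >= self.n)
        if flow is not None:
            self.rr_full = (flow + 1) % self.n
        else:
            flow = self._pick(self.rr_partial, lambda q: len(q) > 0)
            if flow is None:
                return []  # nothing queued at this input
            self.rr_partial = (flow + 1) % self.n
        count = min(self.n, len(self.queues[flow]))
        sent = []
        for _ in range(count):
            # Continue where this flow stopped: port k, then k+1, and so on.
            port = self.next_port[flow]
            sent.append((port, flow, self.queues[flow].popleft()))
            self.next_port[flow] = (port + 1) % self.n
        return sent
```

Because each flow remembers its last middle port, a partial frame followed by later packets of the same flow still visits the middle ports in cyclic order.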

20 Bounding Reordering
[Figure: two-stage load-balanced switch; packets 1, 2 and 3 travel through different middle ports]

21 FOFF
• Output properties
• N FIFO queues corresponding to the N middle ports
• Buffer size less than N^2 packets
• If there are N^2 packets, one of the head-of-line packets is in order
[Figure: output linecard with N FIFO queues]
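A minimal Python sketch of the output side, assuming each packet carries its source input and a per-flow sequence number so the output can tell which head-of-line packet is in order. Those tags, the scan order and all names are assumptions made for illustration; the slide only states the queue structure and the in-order property.

```python
from collections import deque


class FoffOutput:
    """Sketch of one FOFF output linecard (names are illustrative)."""

    def __init__(self, n):
        self.queues = [deque() for _ in range(n)]  # one FIFO per middle port
        self.expected = [0] * n  # next sequence number expected from each input

    def receive(self, middle_port, src, seq):
        # Packets from one middle port arrive in FIFO order.
        self.queues[middle_port].append((src, seq))

    def depart(self):
        """Release one in-order head-of-line packet, or None if there is none."""
        for queue in self.queues:
            if queue and queue[0][1] == self.expected[queue[0][0]]:
                src, seq = queue.popleft()
                self.expected[src] += 1
                return (src, seq)
        return None
```

Only head-of-line packets are examined, so the work per departure is bounded by the number of queues.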

22 FOFF Properties
• Property 1: FOFF maintains packet order.
• Property 2: FOFF has O(1) complexity.
• Property 3: Congestion buffers operate independently.
• Property 4: FOFF maintains an average packet delay within a constant of that of an ideal output-queued router.
• Corollary: FOFF has 100% throughput for any adversarial traffic.

23 Output-Queued Router
[Figure: full mesh from N inputs to N outputs at line rate R; every mesh link at rate R]

24 Router Wish List
Scale to High Linecard Speeds: No Centralized Scheduler; Optical Switch Fabric; Low Packet-Processing Complexity
Scale to High Number of Linecards: High Number of Linecards; Arbitrary Arrangement of Linecards
Provide Performance Guarantees: 100% Throughput Guarantee; Delay Guarantee; No Packet Reordering

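25 From Two Meshes to One Mesh
[Figure: two-stage switch with mesh links at rate R/N; the input and output sides that belong to one linecard are highlighted]

26 From Two Meshes to One Mesh
[Figure: each linecard has an In side and an Out side; the first mesh and the second mesh both connect the same linecards at rate R]

27 From Two Meshes to One Mesh
[Figure: the two meshes merged into one combined mesh between the linecards; labels 2R and R]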

28 Many Fabric Options
• Options:
- Space: full uniform mesh
- Time: round-robin crossbar
- Wavelength: static WDM
• Any spreading device works: channels C_1, C_2, …, C_N
• N channels, each at rate 2R/N
[Figure: one linecard sends channels C_1, C_2, C_3, …, C_N, one to each linecard]
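The time-domain option can be illustrated with a short Python sketch. It assumes one common round-robin pattern, in which input i is connected to output (i + t) mod N at time-slot t; this particular offset and the function names are assumptions for illustration, since the slide only names the option.

```python
def round_robin_output(input_port, time_slot, n):
    """Output reached by `input_port` at `time_slot` (0-indexed, assumed pattern)."""
    return (input_port + time_slot) % n


def connection_table(n):
    """Row t lists the output connected to inputs 0..n-1 at time-slot t."""
    return [[round_robin_output(i, t, n) for i in range(n)] for t in range(n)]


def channel_rate(link_rate, n):
    """Average rate between one pair of linecards when a link is shared n ways."""
    return link_rate / n


if __name__ == "__main__":
    for row in connection_table(4):
        print(row)
    # A 2R = 320 Gb/s link shared by N = 640 linecards (numbers from slide 6).
    print(channel_rate(320, 640), "Gb/s per pair")
```

Every time-slot is a permutation and every pair of linecards is connected once per N slots, which is the same average connectivity as a uniform mesh with links at rate 2R/N.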

29 AWGR (Arrayed Waveguide Grating Router)
• A passive optical component
• Wavelength i on input port j goes to output port (i+j-1) mod N
• Can shuffle information from different inputs
[Figure: N×N AWGR between Linecards 1, 2, …, N on the input side and Linecards 1, 2, …, N on the output side; each input carries wavelengths λ1, λ2, …, λN]
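The routing rule can be checked with a few lines of Python. The slide gives the rule with 1-indexed ports, so a result of 0 is read here as port N; that reading and the function name are assumptions made for this sketch.

```python
def awgr_output_port(wavelength, input_port, n):
    """Output port for `wavelength` entering on `input_port` (both 1-indexed)."""
    port = (wavelength + input_port - 1) % n
    # Assumption: with 1-indexed ports, a residue of 0 denotes port n.
    return port if port != 0 else n


if __name__ == "__main__":
    n = 4
    for wavelength in range(1, n + 1):
        outputs = [awgr_output_port(wavelength, j, n) for j in range(1, n + 1)]
        print(f"wavelength {wavelength}: inputs 1..{n} -> outputs {outputs}")
```

For a fixed wavelength the inputs map to distinct outputs, and for a fixed input the N wavelengths reach all N outputs, so a linecard selects its destination simply by choosing a wavelength.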

30 Static WDM Switching: Packaging
• AWGR: passive and almost zero power
• N WDM channels, each at rate 2R/N
[Figure: linecards A, B, C, D each send wavelengths A, B, C, D into the AWGR; after the AWGR the fibers carry A, A, A, A / B, B, B, B / C, C, C, C / D, D, D, D]

31 Router Wish List
Scale to High Linecard Speeds: No Centralized Scheduler; Optical Switch Fabric; Low Packet-Processing Complexity
Scale to High Number of Linecards: High Number of Linecards; Arbitrary Arrangement of Linecards
Provide Performance Guarantees: 100% Throughput Guarantee; Delay Guarantee; No Packet Reordering

32 Scaling Problem
• For N < 64, an AWGR is a good solution.
• We want N = 640.
• Need to decompose.

33 A Different Representation of the Mesh
[Figure: linecards (In/Out) connected through a mesh; labels R and 2R]

34 A Different Representation of the Mesh
[Figure: linecards (In/Out) connected by mesh links; labels R and 2R/N]

35 Example: N = 8
[Figure: mesh links at rate 2R/8]

36 When N is Too Large
• Decompose into groups (or racks)
[Figure: linecards arranged in groups; labels R, 2R and 4R/4]

37 When N is Too Large
• Decompose into groups (or racks)
[Figure: Group/Rack 1 to Group/Rack G, each with linecards 1, 2, …, L connected at 2R; labels 2RL and 2RL/G on the links between groups]

38 Router Wish List
Scale to High Linecard Speeds: No Centralized Scheduler; Optical Switch Fabric; Low Packet-Processing Complexity
Scale to High Number of Linecards: High Number of Linecards; Arbitrary Arrangement of Linecards
Provide Performance Guarantees: 100% Throughput Guarantee; Delay Guarantee; No Packet Reordering

39 When Linecards Fail
• Solution: replace mesh with sum of permutations
[Figure: Group/Rack 1 to Group/Rack G, each with linecards 1, 2, …, L connected at 2R; the mesh between groups is drawn as a sum of permutations; labels 2RL, 2RL/G, ≤ 2RL and G * 2RL/G]

40 Hybrid Electro-Optical Architecture Using MEMS Switches
[Figure: Group/Rack 1 to Group/Rack G, each with linecards 1, 2, …, L connected at 2R; the groups are interconnected through MEMS switches]

41 When Linecards Fail
[Figure: Group/Rack 1 to Group/Rack G, each with linecards 1, 2, …, L connected at 2R; the groups are interconnected through MEMS switches]

42 Fiber Link Capacity
• Link capacity ≈ 64 λ’s * 5 Gb/s/λ = 320 Gb/s = 2R
[Figure: Group/Rack 1 to Group/Rack G interconnected through MEMS switches; each fiber carries wavelengths from laser/modulators combined by a MUX]

43 Example: 2 Groups of 2 Linecards
[Figure: two groups/racks, each with linecards 1 and 2; labels R, 2R and 4R]

44 Number of MEMS Switches
• Theorem: M ≡ L+G-1 MEMS switches are sufficient for bandwidth.
• Examples:
• G groups, L_i linecards in group i,
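A small worked example of the sizing arithmetic in Python. N = 640 and R = 160 Gb/s come from slide 6, and the rate expressions 2RL and 2RL/G from the labels on slide 37; the split into groups of L = 16 linecards is a hypothetical choice made only for this illustration, and the function name is likewise an assumption.

```python
def size_router(n_linecards, rate_gbps, linecards_per_group):
    """Return basic sizing figures for equal-sized groups (illustrative only)."""
    groups, remainder = divmod(n_linecards, linecards_per_group)
    if remainder:
        raise ValueError("this sketch assumes equal-sized groups")
    return {
        "router_capacity_gbps": n_linecards * rate_gbps,           # N * R
        "groups": groups,                                           # G
        "group_rate_gbps": 2 * rate_gbps * linecards_per_group,     # 2RL
        "group_to_group_rate_gbps":
            2 * rate_gbps * linecards_per_group / groups,           # 2RL/G
        "mems_switches": linecards_per_group + groups - 1,          # M = L+G-1
    }


if __name__ == "__main__":
    # L = 16 is a hypothetical group size, not a value stated in the slides.
    for key, value in size_router(640, 160, 16).items():
        print(f"{key}: {value}")
```

With these inputs the router capacity comes out at 102,400 Gb/s, in line with the 100Tb/s target on slide 6.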

45 Packet Schedule
[Figure: Group A and Group B, each with linecards 1 and 2, on the transmitting and receiving sides; labels 2R and 4R]

46 Rules for Packet Schedule
At each time-slot:
• Each transmitting linecard sends one packet
• Each receiving linecard receives one packet
• (MEMS constraint) Each transmitting group i sends at most one packet to each receiving group j through each MEMS connecting them
In a schedule of N time-slots:
• Each transmitting linecard sends exactly one packet to each receiving linecard

47 Packet Schedule
                        T+1  T+2  T+3  T+4
Tx Group A   Tx LC A1   ?    ?    ?    ?
             Tx LC A2   ?    ?    ?    ?
Tx Group B   Tx LC B1   ?    ?    ?    ?
             Tx LC B2   ?    ?    ?    ?

48 Packet Schedule
                        T+1  T+2  T+3  T+4
Tx Group A   Tx LC A1   A1   A2   B1   B2
             Tx LC A2   B2   A1   A2   B1
Tx Group B   Tx LC B1   B1   B2   A1   A2
             Tx LC B2   A2   B1   B2   A1

49 Bad Packet Schedule
                        T+1  T+2  T+3  T+4
Tx Group A   Tx LC A1   A1   A2   B1   B2
             Tx LC A2   B2   A1   A2   B1
Tx Group B   Tx LC B1   B1   B2   A1   A2
             Tx LC B2   A2   B1   B2   A1

50 Group Schedule
Columns: T+1, T+2, T+3, T+4
Tx Group A: A, B
Tx Group B: A, B

51 Good Packet Schedule
                        T+1  T+2  T+3  T+4
Tx Group A   Tx LC A1   A1   A2   B1   B2
             Tx LC A2   B2   B1   A2   A1
Tx Group B   Tx LC B1   B1   B2   A1   A2
             Tx LC B2   A2   A1   B2   B1
• Theorem: There exists a polynomial-time algorithm that finds the correct packet schedule.
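The rules on slide 46 can be checked mechanically against the schedules on slides 49 and 51. The Python sketch below is a checker only, not the polynomial-time construction the theorem refers to. It assumes that in this 2-group example each pair of groups can exchange at most one packet per time-slot (the parameter `links_per_group_pair`, a hypothetical name); the slides state the MEMS constraint per MEMS switch and do not give that number explicitly.

```python
from collections import Counter

# Schedules copied from slides 49 and 51: row = transmitting linecard,
# entry t = receiving linecard at time-slot T+t+1.
BAD = {
    "A1": ["A1", "A2", "B1", "B2"],
    "A2": ["B2", "A1", "A2", "B1"],
    "B1": ["B1", "B2", "A1", "A2"],
    "B2": ["A2", "B1", "B2", "A1"],
}
GOOD = {
    "A1": ["A1", "A2", "B1", "B2"],
    "A2": ["B2", "B1", "A2", "A1"],
    "B1": ["B1", "B2", "A1", "A2"],
    "B2": ["A2", "A1", "B2", "B1"],
}


def group_of(linecard):
    return linecard[0]  # "A1" -> "A"


def check_schedule(schedule, links_per_group_pair=1):
    """Return a list of rule violations (empty if the schedule is valid)."""
    senders = list(schedule)
    n = len(senders)
    violations = []
    # Over N slots, each sender reaches each receiver exactly once.
    for tx, row in schedule.items():
        if sorted(row) != sorted(senders):
            violations.append(f"{tx} does not reach every linecard exactly once")
    for t in range(n):
        column = [schedule[tx][t] for tx in senders]
        # Each receiving linecard receives exactly one packet per slot.
        if len(set(column)) != n:
            violations.append(f"slot T+{t + 1}: a linecard receives two packets")
        # MEMS constraint, under the one-link-per-group-pair assumption.
        load = Counter((group_of(tx), group_of(rx))
                       for tx, rx in zip(senders, column))
        for (src, dst), count in load.items():
            if count > links_per_group_pair:
                violations.append(
                    f"slot T+{t + 1}: group {src} sends {count} packets to group {dst}")
    return violations


if __name__ == "__main__":
    print("good:", check_schedule(GOOD))
    print("bad:", check_schedule(BAD))
```

Both schedules satisfy the per-linecard rules; under the stated assumption, only the group-level constraint separates them, at time-slots T+2 and T+4.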

52 Router Wish List
Scale to High Linecard Speeds: No Centralized Scheduler; Optical Switch Fabric; Low Packet-Processing Complexity
Scale to High Number of Linecards: High Number of Linecards; Arbitrary Arrangement of Linecards
Provide Performance Guarantees: 100% Throughput Guarantee; Delay Guarantee; No Packet Reordering

53 Summary
• The load-balanced switch
- Does not need any centralized scheduling
- Can use a mesh
• Using FOFF
- It keeps packets in order
- It guarantees 100% throughput
• Using the hybrid electro-optical architecture
- It scales to high port numbers
- It tolerates linecard failure

54 Summary of Contributions
• Load-Balanced Switch
- I. Keslassy and N. McKeown, “Maintaining Packet Order in Two-Stage Switches,” Proceedings of IEEE Infocom '02, New York, June
- I. Keslassy, S.-T. Chuang, K. Yu, D. Miller, M. Horowitz, O. Solgaard and N. McKeown, “Scaling Internet Routers Using Optics,” ACM SIGCOMM '03, Karlsruhe, Germany, August. Also in Computer Communication Review, vol. 33, no. 4, p. 189, October
- I. Keslassy, S.-T. Chuang and N. McKeown, “A Load-Balanced Switch with an Arbitrary Number of Linecards,” to appear in Proceedings of IEEE Infocom ’04, Hong Kong, March
- I. Keslassy, C.-S. Chang, N. McKeown and D.-S. Lee, “Maximizing the Throughput of Fixed Interconnection Networks,” in preparation.

55 Summary of Contributions
• Packet-Switch Scheduling
- I. Keslassy and N. McKeown, “Analysis of Scheduling Algorithms That Provide 100% Throughput in Input-Queued Switches,” Proceedings of the 39th Annual Allerton Conference on Communication, Control, and Computing, Monticello, Illinois, October
- I. Keslassy, M. Kodialam, T. V. Lakshman and D. Stiliadis, “On Guaranteed Smooth Scheduling for Input-Queued Switches,” Proceedings of IEEE Infocom '03, San Francisco, California, April
- I. Keslassy, R. Zhang-Shen and N. McKeown, “Maximum Size Matching is Unstable for Any Packet Switch,” IEEE Communications Letters, Vol. 7, No. 10, pp , Oct
- I. Keslassy, M. Kodialam, T. V. Lakshman and D. Stiliadis, “On Guaranteed Smooth Scheduling for Input-Queued Switches,” submitted to IEEE/ACM Transactions on Networking.

56 Summary of Contributions
• Scheduling in Optical Networks
- I. Keslassy, M. Kodialam, T. V. Lakshman and D. Stiliadis, “Scheduling Schemes for Delay Graphs with Applications to Optical Packet Networks,” to appear in Proceedings of IEEE HPSR ’04, Phoenix, Arizona, April
• Scheduling in Wireless Networks
- I. Keslassy, M. Kodialam and T. V. Lakshman, “Faster Algorithms for Minimum-Energy Scheduling of Wireless Data Transmissions,” Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt '03), INRIA Sophia-Antipolis, France, March 2003.

57 Summary of Contributions
• Router Buffer Sizing
- G. Appenzeller, I. Keslassy and N. McKeown, “Sizing Router Buffers,” submitted to ACM SIGCOMM ’04.
• Image Classification
- I. Keslassy, M. Kalman, D. Wang, and B. Girod, “Classification of Compound Images Based on Transform Coefficient Likelihood,” Proceedings of the International Conference on Image Processing (ICIP '01), Thessaloniki, Greece, October 2001.

58 Merci !
• Nick McKeown
• Balaji Prabhakar
• Mark Horowitz, David Miller, Olav Solgaard
• John and Kate Wakerly (Stanford Graduate Fellowship)
• SNRC, DARPA/MARCO, Cisco, NSF
• Da
• Rui and Nandita
• Group Members: Gireesh, Greg, Guido, Martin, Masayoshi, Matthew, Mingjie, Pablo, Sundar, Theresa, Yashar
• Friends and Colleagues: Abtin, Alan, Allen, Amalia, Amelia, Anamaya, Ananthan, Arjun, Athina, Bill, Brian, Chang, Chandra, Changhua, Chao-Kai, Chao-Lin, Christine, Christophe, Damon, Dana, Daniel, Danny, David, Denise, Derek, Devavrat, Dimitri, Elif, Emilio, Eric, Flavio, Giulio, Hanna, In-Sung, Ingrid, Joachim, Jonathan, Ken, Kevin, Kostas, Kyoungsik, Lakshman, Laurence, Lizzi, Marcy, Marissa, Mark, Maureen, Max-David, Mayank, Milind, Mina, Mohsen, Murali, Myles, Nathan, Neda, Neha, Nick, Ofer, Paolo, Pascal, Paul, Peter, Prashanth, Rivi, Rong, Ruben, Ryan, Sam, Sylvia, Tali, Vinayak, Vincent, Yoav, … and the audience!
• In memory of my departed grandparents Z’’L.
• To My Family: Mamie, Papa, Maman, Michael
• and the numerous cousins…

Thank you.