1 Using Load-Balancing To Build High-Performance Routers
Isaac Keslassy
Ph.D. Oral Examination
Department of Electrical Engineering, Stanford University
2 Typical Router Architecture
[Diagram: N linecards at rate R; inputs connect through a switch fabric, governed by a centralized scheduler, to the outputs.]
3 Definitions: Traffic Matrix
Traffic matrix: λ_ij is the average rate of traffic arriving at input i and destined to output j (1 ≤ i, j ≤ N).
Uniform traffic matrix: λ_ij = λ for all i, j.
4 Definitions: 100% Throughput
100% throughput: for any traffic matrix whose row and column sums are all less than R, every flow is served at its arrival rate: λ_ij < μ_ij, where μ_ij is the long-term service rate from input i to output j.
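Restated in symbols (a transcription of the two definitions above, nothing beyond them):

```latex
% Admissibility: every row and column sum of the traffic matrix is below R
\forall i:\ \sum_{j=1}^{N} \lambda_{ij} < R, \qquad
\forall j:\ \sum_{i=1}^{N} \lambda_{ij} < R
% 100% throughput: every admissible flow is served at its arrival rate
\lambda_{ij} < \mu_{ij} \quad \text{for all } i, j
```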
5 Router Wish List
- Scale to high linecard speeds: no centralized scheduler; optical switch fabric; low packet-processing complexity.
- Scale to a high number of linecards: high number of linecards; arbitrary arrangement of linecards.
- Provide performance guarantees: 100% throughput guarantee; delay guarantee; no packet reordering.
6 Stanford 100Tb/s Router
"Optics in Routers" project: http://yuba.stanford.edu/or/
Some challenging numbers: 100 Tb/s total capacity, 160 Gb/s linecards, 640 linecards.
7 100% Throughput in a Mesh Fabric?
[Diagram: N inputs and N outputs, each at rate R, connected by a full mesh of links with unknown required rates.]
Router capacity = NR. Switch capacity = N²R.
8 If Traffic Is Uniform
[Diagram: each input spreads its rate-R traffic equally over all N outputs, so each mesh link needs only rate R/N.]
9 Real Traffic Is Not Uniform
[Diagram: a non-uniform pattern can send all of an input's rate-R traffic toward one output, overloading its rate-R/N mesh link.]
10 Load-Balanced Switch
[Diagram: two cascaded meshes with rate-R/N links; the load-balancing stage spreads arriving traffic evenly over all middle ports, and the forwarding stage delivers packets to their outputs.]
100% throughput for weakly mixing traffic (Valiant, C.-S. Chang).
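A minimal sketch of the two-stage operation (names and packet model are mine, not the thesis implementation): stage 1 spreads each input's packets round-robin over the middle ports, and stage 2 forwards at most one packet per middle-to-output link per time-slot.

```python
from collections import deque

class LoadBalancedSwitch:
    def __init__(self, n):
        self.n = n
        self.rr = [0] * n  # round-robin pointer, one per input
        # middle[k][j]: queue at middle port k holding packets for output j
        self.middle = [[deque() for _ in range(n)] for _ in range(n)]

    def arrive(self, inp, out, pkt):
        """Stage 1 (load balancing): send the packet from input `inp`
        to the next middle port in round-robin order, regardless of `out`."""
        k = self.rr[inp]
        self.rr[inp] = (k + 1) % self.n
        self.middle[k][out].append(pkt)

    def forward(self):
        """Stage 2 (forwarding): each middle port sends at most one packet
        to each output per time-slot (one rate-R/N link per pair)."""
        delivered = []
        for k in range(self.n):
            for j in range(self.n):
                if self.middle[k][j]:
                    delivered.append((j, self.middle[k][j].popleft()))
        return delivered
```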
11 Load-Balanced Switch
[Diagram: packets 1, 2, 3 arriving at one input are spread over middle ports 1, 2, 3 by the first mesh.]
12 Load-Balanced Switch
[Diagram: the second mesh forwards packets 3, 2, 1 from their middle ports to the destination output.]
13 Intuition: 100% Throughput
Arrivals to second mesh: traffic for output j is spread evenly, so each middle port receives at most (1/N)·Σ_i λ_ij < R/N of it.
Capacity of second mesh: each middle-to-output link runs at rate R/N.
Second mesh: arrival rate < service rate.
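As a one-line calculation (restating the slide's intuition with the admissibility condition of slide 4):

```latex
% Traffic destined to output j is spread evenly over the N middle ports,
% so each middle-to-output link of the second mesh carries at most
\frac{1}{N}\sum_{i=1}^{N} \lambda_{ij} \;<\; \frac{R}{N},
% which is exactly the link rate of the second mesh.
```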
14 Router Wish List
- Scale to high linecard speeds: no centralized scheduler; optical switch fabric; low packet-processing complexity.
- Scale to a high number of linecards: high number of linecards; arbitrary arrangement of linecards.
- Provide performance guarantees: 100% throughput guarantee; delay guarantee; no packet reordering?
15 Packet Reordering
[Diagram: packets 1 and 2 of the same flow traverse different middle ports and can leave the switch out of order.]
16 Bounding Delay Difference Between Middle Ports
[Diagram: packets 1 and 2 see different queue occupancies at their middle ports, so their delays through the switch differ.]
17 UFS (Uniform Frame Spreading)
[Diagram: an input accumulates packets 1, 2, 3, … of a flow into a full frame of N packets, then spreads them one per middle port, so packets cannot be reordered.]
18 FOFF (Full Ordered Frames First)
[Diagram: the input keeps per-output queues and spreads packets 1, 2 over the middle ports.]
19 FOFF (Full Ordered Frames First)
Input Algorithm (see the sketch below):
- N FIFO queues, one per output flow.
- Spread each flow uniformly: if the flow's last packet was sent to middle port k, send its next packet to middle port k+1.
- Every N time-slots, pick a flow:
  - If a full frame exists, pick it and spread it like UFS.
  - Else, if all frames are partial, pick one in round-robin order and send it.
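A minimal sketch of the input side (my reading of the slide, not the thesis code; the scan order over full frames is an assumption):

```python
from collections import deque

class FoffInput:
    def __init__(self, n):
        self.n = n
        self.flows = [deque() for _ in range(n)]  # one FIFO per output flow
        self.next_mid = [0] * n  # next middle port, kept per flow
        self.rr = 0              # round-robin pointer over partial frames

    def enqueue(self, out, pkt):
        self.flows[out].append(pkt)

    def pick_flow(self):
        """Every N time-slots: prefer a flow holding a full frame of N
        packets; otherwise pick a non-empty partial frame round-robin."""
        for j in range(self.n):  # assumed scan order for full frames
            if len(self.flows[j]) >= self.n:
                return j
        for d in range(self.n):
            j = (self.rr + d) % self.n
            if self.flows[j]:
                self.rr = (j + 1) % self.n
                return j
        return None

    def spread(self, j):
        """Send up to one frame of flow j, one packet per middle port,
        continuing from the middle port where the flow left off."""
        sent = []
        for _ in range(min(self.n, len(self.flows[j]))):
            k = self.next_mid[j]
            self.next_mid[j] = (k + 1) % self.n
            sent.append((k, self.flows[j].popleft()))
        return sent
```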
20 Bounding Reordering
[Diagram: packets 1, 2, 3 reach the output from different middle ports; because frames are spread uniformly, the amount of reordering is bounded.]
21 FOFF Output Properties
- N FIFO queues, one per middle port.
- Buffer size less than N² packets.
- If there are N² packets, one of the head-of-line packets is in order (see the resequencer sketch below).
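A sketch of the output resequencer under assumed details (per-output sequence numbers and the release loop are mine; the slide only specifies the N FIFOs and the head-of-line property):

```python
from collections import deque

class FoffOutput:
    def __init__(self, n):
        self.n = n
        self.fifos = [deque() for _ in range(n)]  # one FIFO per middle port
        self.next_seq = 0  # next in-order sequence number for this output

    def receive(self, mid, seq, pkt):
        self.fifos[mid].append((seq, pkt))

    def release(self):
        """Release in-order packets: repeatedly scan the head-of-line
        packets for the one carrying the next expected sequence number."""
        out = []
        progress = True
        while progress:
            progress = False
            for q in self.fifos:
                if q and q[0][0] == self.next_seq:
                    out.append(q.popleft()[1])
                    self.next_seq += 1
                    progress = True
        return out
```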
22 FOFF Properties
Property 1: FOFF maintains packet order.
Property 2: FOFF has O(1) complexity.
Property 3: Congestion buffers operate independently.
Property 4: FOFF keeps the average packet delay within a constant of that of an ideal output-queued router.
Corollary: FOFF achieves 100% throughput for any adversarial traffic.
23 Output-Queued Router
[Diagram: the ideal reference router: N inputs and N outputs at rate R, with all queueing at the outputs.]
24 Router Wish List
- Scale to high linecard speeds: no centralized scheduler; optical switch fabric; low packet-processing complexity.
- Scale to a high number of linecards: high number of linecards; arbitrary arrangement of linecards.
- Provide performance guarantees: 100% throughput guarantee; delay guarantee; no packet reordering.
25 From Two Meshes to One Mesh
[Diagram: the input of the load-balancing stage and the output of the forwarding stage are combined onto one linecard.]
26 From Two Meshes to One Mesh
[Diagram: every linecard attaches to both the first mesh and the second mesh, each over rate-R links.]
27 From Two Meshes to One Mesh
[Diagram: the two meshes collapse into a single combined mesh; each linecard sends and receives at an aggregate rate of 2R.]
28 Many Fabric Options
Options for the spreading fabric:
- Space: full uniform mesh.
- Time: round-robin crossbar.
- Wavelength: static WDM.
- Any spreading device with channels C_1, C_2, …, C_N.
N channels, each at rate 2R/N.
29 AWGR (Arrayed Waveguide Grating Router)
- A passive optical component.
- Wavelength i on input port j goes to output port (i+j-1) mod N.
- Can shuffle information from different inputs.
[Diagram: N linecards send wavelengths λ_1 … λ_N into an N×N AWGR, which delivers a fixed permutation of them to the N receiving linecards.]
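A small sketch of the routing rule from the slide; ports and wavelengths are numbered 1..N as on the slide, with 0 after the modulus read as port N:

```python
def awgr_output(i, j, n):
    """Wavelength i on input port j leaves on output port (i+j-1) mod n."""
    return ((i + j - 2) % n) + 1  # 0-based arithmetic, back to 1-based ports

n = 4
for j in range(1, n + 1):  # for each input port
    outs = [awgr_output(i, j, n) for i in range(1, n + 1)]
    # every input reaches all n outputs, one per wavelength: a fixed permutation
    assert sorted(outs) == list(range(1, n + 1))
    print(f"input {j}: wavelengths 1..{n} -> outputs {outs}")
```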
30 Static WDM Switching: Packaging AWGR
Passive and almost zero power.
[Diagram: linecards A, B, C, D each transmit the wavelength set {A, B, C, D}; the AWGR delivers all "A" wavelengths to one output, all "B" wavelengths to another, and so on.]
N WDM channels, each at rate 2R/N.
31 Router Wish List
- Scale to high linecard speeds: no centralized scheduler; optical switch fabric; low packet-processing complexity.
- Scale to a high number of linecards: high number of linecards; arbitrary arrangement of linecards.
- Provide performance guarantees: 100% throughput guarantee; delay guarantee; no packet reordering.
32 Scaling Problem
For N < 64, an AWGR is a good solution. We want N = 640. Need to decompose.
33 A Different Representation of the Mesh
[Diagram: the mesh redrawn as linecards feeding a central interconnect; each linecard sends 2R into the fabric and receives 2R from it.]
34 A Different Representation of the Mesh
[Diagram: the interconnect expanded into fixed point-to-point channels, each at rate 2R/N.]
35 Example: N = 8
[Diagram: 8 transmitting and 8 receiving linecards joined by fixed channels, each at rate 2R/8.]
36 When N Is Too Large
Decompose into groups (or racks).
[Diagram: the 8 linecards arranged as 2 groups of 4; each linecard still sends 2R, and the thin channels between two groups aggregate into a group-to-group link of rate 4R.]
37 When N Is Too Large
Decompose into groups (or racks).
[Diagram: G groups (racks) of L linecards each, labeled 1, 2, …, L; each group sends 2RL in aggregate, split into group-to-group links of rate 2RL/G.]
38 Router Wish List
- Scale to high linecard speeds: no centralized scheduler; optical switch fabric; low packet-processing complexity.
- Scale to a high number of linecards: high number of linecards; arbitrary arrangement of linecards.
- Provide performance guarantees: 100% throughput guarantee; delay guarantee; no packet reordering.
39 When Linecards Fail
[Diagram: with fixed group-to-group links of rate 2RL/G, failed or missing linecards leave the links over- or under-provisioned.]
Solution: replace the mesh with a sum of permutations: the uniform group-to-group interconnect equals the sum of G reconfigurable permutations of rate 2RL/G each (G · 2RL/G = 2RL).
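A minimal sketch of the "sum of permutations" idea, using the G circular shifts as the permutations (one per MEMS configuration; the concrete decomposition here is an illustration, not the thesis algorithm):

```python
import numpy as np

G = 4
rate = 1.0                    # stands in for 2RL/G
mesh = np.full((G, G), rate)  # uniform group-to-group interconnect

shifts = []
for s in range(G):
    P = np.zeros((G, G))
    for i in range(G):
        P[i, (i + s) % G] = rate  # permutation: group i -> group (i+s) mod G
    shifts.append(P)

# G permutations of rate 2RL/G each add up to the full 2RL mesh
assert np.allclose(sum(shifts), mesh)
```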
40 Hybrid Electro-Optical Architecture Using MEMS Switches
[Diagram: G transmitting racks connect to G receiving racks through MEMS switches; each MEMS realizes one group-level permutation.]
41 When Linecards Fail
[Diagram: after a linecard failure, the MEMS switches are reconfigured to rebalance the group-to-group capacity.]
42 Fiber Link Capacity
[Diagram: on each linecard, laser/modulator outputs are multiplexed (MUX) onto a single fiber into the MEMS switch.]
Link capacity ≈ 64 λ's × 5 Gb/s per λ = 320 Gb/s = 2R.
43 Example: 2 Groups of 2 Linecards
[Diagram: groups (racks) 1 and 2, each with 2 linecards at rate 2R (4R per group in aggregate); each group-to-group link carries 2R.]
44 Number of MEMS Switches
Setting: G groups, L_i linecards in group i, L ≡ max_i L_i.
Theorem: M ≡ L + G − 1 MEMS switches are sufficient for bandwidth.
Example: 2 groups of 2 linecards need M = 2 + 2 − 1 = 3 MEMS switches.
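A tiny sketch of the count; the 16-racks-of-40 split is my illustration of one way to arrange the 640 linecards from slide 6, not a number taken from this slide:

```python
def mems_needed(group_sizes):
    """M = L + G - 1, with G groups and L = max_i L_i linecards per group."""
    return max(group_sizes) + len(group_sizes) - 1

print(mems_needed([2, 2]))     # slide 43's example: 2 groups of 2 -> 3
print(mems_needed([40] * 16))  # e.g., 640 linecards as 16 racks of 40 -> 55
```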
45 Packet Schedule
[Diagram: transmitting groups A and B, each with linecards 1 and 2 at rate 2R (4R per group in aggregate), connected to the receiving groups; each group-to-group link carries 2R.]
46 Rules for Packet Schedule (checked in the sketch below)
At each time-slot:
- Each transmitting linecard sends one packet.
- Each receiving linecard receives one packet (MEMS constraint).
- Each transmitting group i sends at most one packet to each receiving group j through each MEMS connecting them.
In a schedule of N time-slots:
- Each transmitting linecard sends exactly one packet to each receiving linecard.
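A small checker for these rules on the 2-groups-of-2 example (names and the one-MEMS-per-group-pair assumption are mine; the schedules are from slides 49 and 51):

```python
from collections import Counter

def group(lc):
    return lc[0]  # 'A1' -> group 'A'

def check_schedule(schedule, mems_between=1):
    """schedule: dict mapping each Tx linecard to its Rx linecard per slot."""
    lcs = sorted(schedule)
    T = len(schedule[lcs[0]])
    for t in range(T):
        dests = [schedule[lc][t] for lc in lcs]
        # each receiving linecard receives at most one packet per slot
        assert len(set(dests)) == len(dests), f"receiver conflict at slot {t}"
        # each group pair carries at most `mems_between` packets per slot
        pairs = Counter((group(lc), group(schedule[lc][t])) for lc in lcs)
        assert all(c <= mems_between for c in pairs.values()), \
            f"group-pair overload at slot {t}"
    # over N slots, each sender reaches every receiver exactly once
    for lc in lcs:
        assert sorted(schedule[lc]) == lcs, f"{lc} does not cover all outputs"

good = {  # the "good" schedule from slide 51
    "A1": ["A1", "A2", "B1", "B2"],
    "A2": ["B2", "B1", "A2", "A1"],
    "B1": ["B1", "B2", "A1", "A2"],
    "B2": ["A2", "A1", "B2", "B1"],
}
check_schedule(good)
print("good schedule passes")

bad = dict(good, A2=["B2", "A1", "A2", "B1"])  # slide 49's row for A2
try:
    check_schedule(bad)
except AssertionError as e:
    print("bad schedule rejected:", e)
```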
47 Packet Schedule

            T+1  T+2  T+3  T+4
Tx LC A1     ?    ?    ?    ?
Tx LC A2     ?    ?    ?    ?
Tx LC B1     ?    ?    ?    ?
Tx LC B2     ?    ?    ?    ?

(Tx Group A = {A1, A2}, Tx Group B = {B1, B2})
48 Packet Schedule

            T+1  T+2  T+3  T+4
Tx LC A1     A1   A2   B1   B2
Tx LC A2     B2   A1   A2   B1
Tx LC B1     B1   B2   A1   A2
Tx LC B2     A2   B1   B2   A1

(Tx Group A = {A1, A2}, Tx Group B = {B1, B2})
49 Bad Packet Schedule

            T+1  T+2  T+3  T+4
Tx LC A1     A1   A2   B1   B2
Tx LC A2     B2   A1   A2   B1
Tx LC B1     B1   B2   A1   A2
Tx LC B2     A2   B1   B2   A1

This schedule is bad: at T+2, both A1 and A2 send to group A, exceeding the one packet per group pair that a single MEMS path allows.
50 Group Schedule

            T+1  T+2  T+3  T+4
Tx Group A  A,B   …
Tx Group B  A,B   …

Here each transmitting group sends one packet to each receiving group in every time-slot.
51 Good Packet Schedule

            T+1  T+2  T+3  T+4
Tx LC A1     A1   A2   B1   B2
Tx LC A2     B2   B1   A2   A1
Tx LC B1     B1   B2   A1   A2
Tx LC B2     A2   A1   B2   B1

Theorem: There exists a polynomial-time algorithm that finds the correct packet schedule.
52 Router Wish List
- Scale to high linecard speeds: no centralized scheduler; optical switch fabric; low packet-processing complexity.
- Scale to a high number of linecards: high number of linecards; arbitrary arrangement of linecards.
- Provide performance guarantees: 100% throughput guarantee; delay guarantee; no packet reordering.
53 Summary
The load-balanced switch:
- Does not need any centralized scheduling.
- Can use a mesh.
Using FOFF:
- It keeps packets in order.
- It guarantees 100% throughput.
Using the hybrid electro-optical architecture:
- It scales to high port numbers.
- It tolerates linecard failure.
54 Summary of Contributions
Load-Balanced Switch:
- I. Keslassy and N. McKeown, "Maintaining Packet Order in Two-Stage Switches," Proceedings of IEEE Infocom '02, New York, June 2002.
- I. Keslassy, S.-T. Chuang, K. Yu, D. Miller, M. Horowitz, O. Solgaard and N. McKeown, "Scaling Internet Routers Using Optics," ACM SIGCOMM '03, Karlsruhe, Germany, August 2003. Also in Computer Communication Review, vol. 33, no. 4, p. 189, October 2003.
- I. Keslassy, S.-T. Chuang and N. McKeown, "A Load-Balanced Switch with an Arbitrary Number of Linecards," to appear in Proceedings of IEEE Infocom '04, Hong Kong, March 2004.
- I. Keslassy, C.-S. Chang, N. McKeown and D.-S. Lee, "Maximizing the Throughput of Fixed Interconnection Networks," in preparation.
55 Summary of Contributions
Packet-Switch Scheduling:
- I. Keslassy and N. McKeown, "Analysis of Scheduling Algorithms That Provide 100% Throughput in Input-Queued Switches," Proceedings of the 39th Annual Allerton Conference on Communication, Control, and Computing, Monticello, Illinois, October 2001.
- I. Keslassy, M. Kodialam, T. V. Lakshman and D. Stiliadis, "On Guaranteed Smooth Scheduling for Input-Queued Switches," Proceedings of IEEE Infocom '03, San Francisco, California, April 2003.
- I. Keslassy, R. Zhang-Shen and N. McKeown, "Maximum Size Matching is Unstable for Any Packet Switch," IEEE Communications Letters, vol. 7, no. 10, pp. 496-498, October 2003.
- I. Keslassy, M. Kodialam, T. V. Lakshman and D. Stiliadis, "On Guaranteed Smooth Scheduling for Input-Queued Switches," submitted to IEEE/ACM Transactions on Networking.
56 Summary of Contributions
Scheduling in Optical Networks:
- I. Keslassy, M. Kodialam, T. V. Lakshman and D. Stiliadis, "Scheduling Schemes for Delay Graphs with Applications to Optical Packet Networks," to appear in Proceedings of IEEE HPSR '04, Phoenix, Arizona, April 2004.
Scheduling in Wireless Networks:
- I. Keslassy, M. Kodialam and T. V. Lakshman, "Faster Algorithms for Minimum-Energy Scheduling of Wireless Data Transmissions," Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt '03), INRIA Sophia-Antipolis, France, March 2003.
57 Summary of Contributions
Router Buffer Sizing:
- G. Appenzeller, I. Keslassy and N. McKeown, "Sizing Router Buffers," submitted to ACM SIGCOMM '04.
Image Classification:
- I. Keslassy, M. Kalman, D. Wang and B. Girod, "Classification of Compound Images Based on Transform Coefficient Likelihood," Proceedings of the International Conference on Image Processing (ICIP '01), Thessaloniki, Greece, October 2001.
58 Merci! (Thank you!)
- Nick McKeown
- Balaji Prabhakar
- Mark Horowitz, David Miller, Olav Solgaard
- John and Kate Wakerly (Stanford Graduate Fellowship)
- SNRC, DARPA/MARCO, Cisco, NSF
- Da Rui and Nandita
- Group Members: Gireesh, Greg, Guido, Martin, Masayoshi, Matthew, Mingjie, Pablo, Sundar, Theresa, Yashar
- Friends and Colleagues: Abtin, Alan, Allen, Amalia, Amelia, Anamaya, Ananthan, Arjun, Athina, Bill, Brian, Chang, Chandra, Changhua, Chao-Kai, Chao-Lin, Christine, Christophe, Damon, Dana, Daniel, Danny, David, Denise, Derek, Devavrat, Dimitri, Elif, Emilio, Eric, Flavio, Giulio, Hanna, In-Sung, Ingrid, Joachim, Jonathan, Ken, Kevin, Kostas, Kyoungsik, Lakshman, Laurence, Lizzi, Marcy, Marissa, Mark, Maureen, Max-David, Mayank, Milind, Mina, Mohsen, Murali, Myles, Nathan, Neda, Neha, Nick, Ofer, Paolo, Pascal, Paul, Peter, Prashanth, Rivi, Rong, Ruben, Ryan, Sam, Sylvia, Tali, Vinayak, Vincent, Yoav, … and the audience!
- In memory of my departed grandparents Z''L.
- To My Family: Mamie, Papa, Maman, Michael and the numerous cousins…
59 Thank you.