
Slide 1: Challenges in Modern Multi-Terabit Class Switch Design
Shivkumar Kalyanaraman, Rensselaer Polytechnic Institute

Slide 2: Tbps System Architecture
[Block diagram: line cards 1..N, each with an OC-x interface, network processor, and ingress/egress buffers, connect through switch fabric interface chips to the switch fabric and switch core; ingress/egress data flow control (FC) and the ingress/egress round-trips (iRT, eRT) are marked.]

Slide 3: Trend: Single-POP Routers
- Very high capacity (10+ Tb/s)
- Line rates from T1 to OC-768
- Reasons:
  - A big multi-rack router is more efficient than many single-rack routers
  - Fewer routers are easier to manage
  - Power requirements are easier to meet

Slide 4: Multi-Tbps Systems: Goals
- Design of a terabit-class system
- Several Tb/s aggregate throughput
- 2.5 Tb/s: 256x256 OC-192 or 64x64 OC-768
- OEM
- Achieve wide coverage of the application spectrum
- Single-stage
- Electronic fabric

Slide 5: Trends & Consequences
- Trend 1: CPU instructions available per minimum-length packet [chart over time]
- Trend 2: Disparity between traffic and router growth [chart: router capacity doubles every 18 months, traffic doubles every year; scale from 1 Tb/s to 100 Tb/s]
Consequences:
1. Per-packet processing is getting harder.
2. Efficient, simple processing will become more important.
3. Routers will get faster, simpler, and more efficient. (Weren't they supposed to be simple in the first place?)
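A quick back-of-the-envelope (in Python) makes trend 1 concrete; the 40-byte minimum packet and 3 GHz processor clock are illustrative assumptions, not figures from the slide:

```python
# How long a minimum-length packet lasts at different line rates, and roughly
# how many clock cycles that leaves for per-packet work.
MIN_PKT_BYTES = 40      # assumed minimum IP packet size
CLOCK_HZ = 3e9          # assumed processor clock

for line_rate_gbps in (2.5, 10, 40, 160):
    pkt_time_s = MIN_PKT_BYTES * 8 / (line_rate_gbps * 1e9)
    cycles = pkt_time_s * CLOCK_HZ
    print(f"{line_rate_gbps:6.1f} Gb/s: one packet every {pkt_time_s * 1e9:5.1f} ns"
          f" -> ~{cycles:5.0f} clock cycles")
```

At 160 Gb/s this gives the 2 ns figure quoted later (slide 30), with only a handful of cycles per packet.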

Slide 6: Trends and Consequences (2)
- Trend 3: Power consumption is out of control
- Trend 4: Disparity between line rate and memory access time
Consequences:
1. Power efficiency will continue to be important.
2. Memories will seem slower and slower. Are we just going to keep adding more parallelism?

Slide 7: What's Hard, What's Not
- Line-rate forwarding:
  - Line-rate LPM (longest-prefix match) was an issue for a while.
  - Commercial TCAMs and algorithms are available up to 100 Gb/s.
  - 1M prefixes fit in a corner of a 90nm ASIC.
  - All 2^32 IPv4 addresses will fit in a $10 DRAM in 8 years.
- Packet buffering:
  - Not a problem up to about 10 Gb/s; a big problem above 10 Gb/s.
- Header processing:
  - For basic IPv4 operations: not a problem.
  - If we keep adding functions, it will be a problem.
  - More on this later…
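To make the LPM discussion concrete, here is a minimal longest-prefix-match sketch using a unibit trie; real line-rate lookups use TCAMs or compressed multibit tries as the slide notes, and the prefixes and next hops below are made up for illustration:

```python
# Unibit trie for IPv4 longest-prefix match: walk the address bit by bit,
# remembering the last node that carried a next hop.

class TrieNode:
    __slots__ = ("children", "next_hop")
    def __init__(self):
        self.children = [None, None]      # 0-bit child, 1-bit child
        self.next_hop = None

def insert(root, prefix, plen, next_hop):
    node = root
    for i in range(plen):
        bit = (prefix >> (31 - i)) & 1
        if node.children[bit] is None:
            node.children[bit] = TrieNode()
        node = node.children[bit]
    node.next_hop = next_hop

def lookup(root, addr):
    node, best = root, None
    for i in range(32):
        if node.next_hop is not None:     # longest match seen so far
            best = node.next_hop
        node = node.children[(addr >> (31 - i)) & 1]
        if node is None:
            return best
    return node.next_hop if node.next_hop is not None else best

root = TrieNode()
insert(root, 0x0A000000, 8, "port1")      # 10.0.0.0/8   (example prefix)
insert(root, 0x0A010000, 16, "port2")     # 10.1.0.0/16  (example prefix)
print(lookup(root, 0x0A010203))           # 10.1.2.3 -> longest match is port2
```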

Slide 8: What's Hard, What's Not (2)
- Switching:
  - If throughput doesn't matter:
    - Easy: lots of multistage, distributed, or load-balanced switch fabrics.
  - If throughput matters:
    - Use a crossbar, VOQs, and a centralized scheduler, or
    - a multistage fabric and lots of speedup.
  - If a throughput guarantee is required:
    - Maximal matching, VOQs, and a speedup of two [Dai & Prabhakar '00]; or
    - a load-balanced two-stage switch [Chang '01; SIGCOMM '03].
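As a rough illustration of what a per-cell-time matching decision looks like, here is a toy greedy maximal matcher over a VOQ occupancy matrix. It is unweighted and single-pass; practical schedulers (iSLIP and friends) iterate with rotating pointers, and this sketch is not any particular published algorithm:

```python
# Greedy maximal matching: pair each input with the first free output it has
# traffic for. "Maximal" means no unmatched input/output pair with queued
# cells remains, which is the property the speedup-of-two result relies on.

def maximal_match(voq):                  # voq[i][j] = cells queued, input i -> output j
    n = len(voq)
    matched_in, matched_out, match = set(), set(), {}
    for i in range(n):
        for j in range(n):
            if voq[i][j] > 0 and i not in matched_in and j not in matched_out:
                match[i] = j
                matched_in.add(i)
                matched_out.add(j)
                break
    return match

voq = [[2, 0, 1],
       [0, 3, 0],
       [1, 0, 0]]
print(maximal_match(voq))    # {0: 0, 1: 1}: input 2 only wants output 0, already taken
```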

Slide 9: Memory: Buffers
- Memory speed will matter more than size
  - Memory speed will remain a problem.
  - Waiting for slow off-chip memory will become intolerable.
  - Memory size will become less of an issue.
- Memory size
  - Packet buffers: today they are too big; they'll get smaller.
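One argument for why buffers can shrink is the buffer-sizing result of Appenzeller, Keslassy, and McKeown (SIGCOMM 2004): roughly RTT x C / sqrt(N) of buffering suffices when N long-lived flows share a link, versus the classic RTT x C rule. The citation and the numbers below are added here for context, not taken from the slide:

```python
# Buffer sizing under the two rules; RTT, link rate, and flow count are
# illustrative assumptions.
RTT_S    = 0.25        # round-trip time, seconds
LINK_BPS = 40e9        # 40 Gb/s link
N_FLOWS  = 10_000      # long-lived TCP flows

classic_bytes = RTT_S * LINK_BPS / 8
small_bytes   = classic_bytes / (N_FLOWS ** 0.5)
print(f"RTT*C:         {classic_bytes / 1e9:.2f} GB")
print(f"RTT*C/sqrt(N): {small_bytes / 1e6:.1f} MB")
```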

Slide 10: Switching: Myths about CIOQ-Based Crossbar Switches
1. "Input-queued crossbars have low throughput"
   - An input-queued crossbar can have as high throughput as any switch.
2. "Crossbars don't support multicast traffic well"
   - A crossbar inherently supports multicast efficiently.
3. "Crossbars don't scale well"
   - Today, it is the number of chip I/Os, not the number of crosspoints, that limits the size of a switch fabric. Expect 5-10 Tb/s crossbar switches.
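A back-of-the-envelope for myth 3, with assumed SerDes lane rates and pins per lane (not slide data), shows why chip I/O rather than crosspoints is the binding constraint:

```python
# Crosspoints grow as N^2 but are cheap gates; high-speed I/O grows only as N
# yet runs out of pins first.
SERDES_GBPS   = 3.125     # assumed rate per serial lane
PINS_PER_LANE = 2         # differential pair

for n_ports, port_gbps in ((64, 10), (256, 10), (64, 40)):
    crosspoints = n_ports ** 2
    lanes = n_ports * 2 * port_gbps / SERDES_GBPS    # ingress + egress
    pins = lanes * PINS_PER_LANE
    print(f"{n_ports:3d}x{n_ports:<3d} @ {port_gbps:2d} Gb/s/port: "
          f"{crosspoints:6d} crosspoints, ~{pins:5.0f} high-speed pins")
```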

Slide 11: Packet Processing Gets Harder
[Chart: instructions per arriving byte over time; what we'd like (more features: QoS, multicast, security, …) vs. what will happen.]

Slide 12: Packet Processing Gets Harder (2)
[Chart: clock cycles per minimum-length packet, since 1996.]

Slide 13: Power
- Requirement
  - Do not exceed the per-shelf (2 kW), per-board (150 W), and per-chip (20 W) budgets
  - Forced-air cooling; avoid hot spots
  - More throughput at the same power: Gb/s/W density is increasing
  - I/O uses an increasing fraction of power (> 50%)
    - Electrical I/O technology has not kept pace with capacity demand
    - Low-power, high-density I/O technology is a must
  - CMOS density increases faster than W/gate decreases
    - Functionality per chip is constrained by power rather than density
- Power determines the number of chips and boards
  - The architecture must be able to be distributed accordingly
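The quoted budgets already fix the coarse partitioning of the system; trivially (ignoring cooling margin and shared infrastructure, which are simplifications added here):

```python
# Power, not silicon density, caps how many boards fit in a shelf and how many
# chips fit on a board, using the budgets from the slide.
SHELF_W, BOARD_W, CHIP_W = 2000, 150, 20

print("boards per shelf:", SHELF_W // BOARD_W)   # 13
print("chips per board :", BOARD_W // CHIP_W)    # 7
```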

Slide 14: Packaging
- Requirement: NEBS compliance
- Constrained by:
  - Standard form factors
  - Power budget at the chip, card, and rack level
- Switch core:
  - Link, connector, and chip packaging technology
  - Connector density (pins/inch)
  - CMOS density doubles, but the number of pins grows only 5-10% per generation
  - This determines the maximum per-chip and per-card throughput
- Line cards:
  - Increasing port counts
  - Prevalent line-rate granularity: OC-192 (10 Gb/s)
  - 1 adapter per card
- > 1 Tb/s systems require multi-rack solutions:
  - Long cables instead of a backplane (30 to 100 m)
  - Interconnect accounts for a large part of system cost

Slide 15: Packaging (continued)
- 2.5 Tb/s with 1.6x speedup, over 2.5 Gb/s links with 8b/10b coding: 4000 links (differential pairs)
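The 4000-link figure can be reproduced as follows; counting both the ingress and egress directions is an assumption made here to match the slide's number:

```python
# 2.5 Tb/s of traffic, 1.6x internal speedup, 2.5 Gb/s serial links,
# 8b/10b line coding (25% overhead).
USER_BPS = 2.5e12
SPEEDUP  = 1.6
LINK_BPS = 2.5e9
CODING   = 10 / 8          # 8b/10b

per_direction = USER_BPS * SPEEDUP * CODING / LINK_BPS
print(int(per_direction * 2), "differential pairs")   # -> 4000
```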

Slide 16: Switch-Internal Round-Trip (RT)
- Physical system size
  - A direct consequence of packaging
- CMOS technology
  - Clock speeds are increasing much more slowly than density
  - More parallelism is required to increase throughput
- Shrinking packet cycle
  - Line rates have gone up drastically (OC-3 through OC-768)
  - The minimum packet size has remained constant
- Result: a large round-trip (RT) in terms of minimum packet durations
  - Can be (many) tens of packets per port
  - Used to be only a node-to-node issue, now also inside the node
  - System-wide clocking and synchronization
[Chart: evolution of RT]
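A rough estimate of the round-trip in packet times, using the 30-100 m cable lengths from slide 14, the ~8 ns minimum-packet time at OC-768 (slide 21), and an assumed ~5 ns/m propagation delay:

```python
# How many minimum-length packets are in flight over the switch-internal
# round trip; the propagation figure is an assumption, not slide data.
NS_PER_M, CELL_NS = 5.0, 8.0

for cable_m in (30, 100):
    rtt_ns = 2 * cable_m * NS_PER_M
    print(f"{cable_m:3d} m cable: RTT {rtt_ns:5.0f} ns "
          f"= {rtt_ns / CELL_NS:5.1f} min-packet times in flight")
```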

Slide 17: Switch-Internal Round-Trip (RT) (continued)
[Same system block diagram as slide 2, with the ingress round-trip (iRT) and egress round-trip (eRT) highlighted.]
- Consequences:
  - Performance impact?
  - All buffers must be scaled by the RT
  - Fabric-internal flow control becomes an important issue

Slide 18: Physical Separation — Separating Control & Data
[Diagram: linecards connect to the switch fabric over separate control and data channels via LCS interfaces, with a buffer or guard band absorbing the round trip. Sequence: 1: Request sent on the control channel; 2: Grant/credit returned by the switch scheduler; 3: Data sent on the data channel. The linecard measures the RTT to within ~1 cell time.]

Slide 19: Speed-Up
- Requirement
  - The "industry standard" 2x speed-up
  - Three flavors:
    - Utilization: compensate for SAR (segmentation and reassembly) overhead
    - Performance: compensate for scheduling inefficiencies
    - OQ speed-up: memory access time
- Switch-core speed-up S is very costly
  - Bandwidth is a scarce resource: COST and POWER
  - Core buffers must run S times faster
  - The core scheduler must run S times faster
  - Is it really needed?
    - SAR overhead reduction: variable-length packet switching is hard to implement, but may be more cost-effective
    - Performance: does the gain in performance justify the increase in cost and power?
      - Depends on the application
      - Low Internet utilization reduces the need for speed-up
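The "utilization" flavor is easy to quantify: segmenting variable-length packets into fixed cells wastes the tail of the last cell. Assuming a 64 B cell payload (a choice made here, not stated on the slide):

```python
# Cell-padding overhead for a few packet sizes; the worst case (one byte more
# than a cell) needs nearly 2x the fabric bandwidth, hence the 2x speed-up.
CELL_PAYLOAD = 64

for pkt_bytes in (40, 64, 65, 128, 1500):
    cells = -(-pkt_bytes // CELL_PAYLOAD)            # ceiling division
    expansion = cells * CELL_PAYLOAD / pkt_bytes
    print(f"{pkt_bytes:5d} B packet -> {cells:2d} cells, "
          f"{expansion:4.2f}x fabric bandwidth")
```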

Slide 20: Multicast
- Requirement: full multicast support
  - Many multicast groups, full link utilization, no blocking, QoS
  - Complicates everything: buffering, queuing, scheduling, flow control, QoS
- Is sophisticated multicast support really needed?
  - Expensive
  - Often disabled in the field… (complexity, billing, potential for abuse, etc.)
  - Again, depends on the application

Slide 21: Packet Size
- Requirement: support very short packets (32-64 B)
  - At OC-768, a minimum-length packet lasts about 8 ns
- Short packet duration
  - Determines the speed of the control section (queues and schedulers)
  - Implies a longer RT (in packet times)
  - Wider data paths
- Do we have to switch short packets individually?
  - Aggregation techniques: burst, envelope, container switching, "packing"
  - Single-stage, multi-path switches, e.g. the parallel packet switch

Slide 22: Increase Payload Size — Packing Packets into "Cells"
[Diagram: packets of 100 B, 178 B, and 40 B, all in the same VOQ, are packed back-to-back into 128 B cell payloads; each cell carries a small header with a start-of-packet marker and packet length.]
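A small sketch of the packing idea, using the packet sizes from the slide and the 128 B cell payload; the per-cell header fields (start-of-packet offset, packet length) are omitted for brevity:

```python
# Consecutive packets from the same VOQ share fixed cell payloads instead of
# each packet padding out its own cells.
CELL_PAYLOAD = 128

def pack(packets):
    cells, cur = [], bytearray()
    for pkt in packets:
        while pkt:                                # spread the packet across cells
            room = CELL_PAYLOAD - len(cur)
            cur.extend(pkt[:room])
            pkt = pkt[room:]
            if len(cur) == CELL_PAYLOAD:
                cells.append(bytes(cur))
                cur = bytearray()
    if cur:                                       # pad the final, partly-filled cell
        cells.append(bytes(cur) + b"\x00" * (CELL_PAYLOAD - len(cur)))
    return cells

pkts = [b"A" * 100, b"B" * 178, b"C" * 40]        # 100 B, 178 B, 40 B packets
print(len(pack(pkts)), "cells packed vs",
      sum(-(-len(p) // CELL_PAYLOAD) for p in pkts), "cells if segmented separately")
```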

Slide 23: Can Optics Help in Switching?
[Diagram: typical IP router linecards (physical layer, framing & maintenance, packet processing with lookup tables, buffer management & scheduling with buffer & state memory, plus optics) connect to a buffered or bufferless switch fabric with arbitration; request/grant signals and the electrical/optical boundary are marked.]

Slide 24: Can Optics Help? — The Cynical View
1. A packet switch (e.g. an IP router) must have buffering.
2. Optical buffering is not feasible.
3. Therefore, optical routers are not feasible.
4. Hence, "optical switches" are circuit switches (e.g. TDM, space, or lambda switches).

Slide 25: Can Optics Help? — The Open-Minded View
- Optics seems ill-suited to processing-intensive functions, or where random-access memory is required.
- Optics seems well-suited to bufferless, reconfigurable datapaths.

Slide 26: 100 Tb/s Optical Router — Stanford University Research Project
- Collaboration
  - Four professors at Stanford (Mark Horowitz, Nick McKeown, David Miller, and Olav Solgaard), and our groups
- Objective
  - Determine the best way to incorporate optics into routers
  - Push technology hard to expose new issues: photonics, electronics, system design
- Motivating example: the design of a 100 Tb/s Internet router
  - Challenging but not impossible (~100x current commercial systems)
  - It identifies some interesting research problems

Slide 27: 100 Tb/s Optical Router — Architecture
[Diagram: electronic linecards #1 … #625 (100 Tb/s = 625 x 160 Gb/s), each performing line termination, IP packet processing, and packet buffering, connect to a central optical switch; an arbitration block exchanges request/grant messages with the linecards; 160 Gb/s and 40 Gb/s rates are labeled.]

Slide 28: Research Problems
- Linecard
  - Memory bottleneck: address lookup and packet buffering
- Architecture
  - Arbitration: computational complexity
- Switch fabric
  - Optics: fabric scalability and speed
  - Electronics: switch control and link electronics
  - Packaging: the three-surface problem

Slide 29: 160 Gb/s Linecard: Packet Buffering
- Problem
  - The packet buffer needs the density of DRAM (40 Gbits) and the speed of SRAM (2 ns per packet)
- Solution
  - A hybrid design uses on-chip SRAM and off-chip DRAM
  - Optimal algorithms were identified that minimize the size of the SRAM (12 Mbits)
  - The result precisely emulates the behavior of a 40 Gbit, 2 ns SRAM
[Diagram: 160 Gb/s stream through a queue manager backed by on-chip SRAM and off-chip DRAM.]
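A heavily simplified sketch of the hybrid idea, assuming a single FIFO queue and an arbitrary 8-cell DRAM block size; the real design sizes the SRAM head and tail caches with provably optimal algorithms, which this sketch does not attempt to reproduce:

```python
# Small on-chip SRAM holds the head and tail of the queue; bulk data moves
# to/from DRAM in large blocks so the DRAM only needs block-rate access.
from collections import deque

BLOCK = 8          # cells moved per DRAM access (arbitrary choice)

class HybridQueue:
    def __init__(self):
        self.tail_sram = deque()   # recently arrived cells
        self.dram = deque()        # blocks of BLOCK cells
        self.head_sram = deque()   # cells staged for departure

    def enqueue(self, cell):
        self.tail_sram.append(cell)
        if len(self.tail_sram) >= BLOCK:                  # spill a full block to DRAM
            self.dram.append([self.tail_sram.popleft() for _ in range(BLOCK)])

    def dequeue(self):
        if not self.head_sram:
            if self.dram:                                 # refill head from DRAM
                self.head_sram.extend(self.dram.popleft())
            elif self.tail_sram:                          # queue still fits in SRAM
                self.head_sram.append(self.tail_sram.popleft())
        return self.head_sram.popleft() if self.head_sram else None

q = HybridQueue()
for i in range(20):
    q.enqueue(i)
print([q.dequeue() for _ in range(20)])    # cells come out in FIFO order
```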

Slide 30: The Arbitration Problem
- A packet switch fabric is reconfigured for every packet transfer.
- At 160 Gb/s, a new IP packet can arrive every 2 ns.
- The configuration is picked to maximize throughput and not waste capacity.
- Known algorithms are too slow.

Slide 31: 100 Tb/s Router
[Diagram: racks of 160 Gb/s linecards connected by optical links to an optical switch fabric.]

Slide 32: Racks with 160 Gb/s Linecards
[Diagram: each linecard contains a lookup block and a queue manager with SRAM and DRAM.]

Slide 33: Passive Optical Switching
[Diagram: n ingress linecards, n midstage linecards, and n egress linecards interconnected by an integrated AWGR or diffraction-grating-based wavelength router.]

Slide 34: Question
- Can we use an optical fabric at 100 Tb/s with 100% throughput?
- Conventional answer: No.
  - The switch would need to be reconfigured too often.
  - 100% throughput requires a complex electronic scheduler.

Slide 35: Two-Stage Load-Balancing Switch
[Diagram: a load-balancing stage followed by a switching stage; each external port runs at rate R, and each stage spreads traffic over internal links of rate R/N.]
- 100% throughput for weakly mixing, stochastic traffic. [C.-S. Chang, Valiant]
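A toy simulation of the two-stage idea under uniform traffic (N = 4 ports and 95% offered load are choices made here); the carried load tracks the offered load even though no scheduler computes a matching. Mis-sequencing, which the real designs must also address, is ignored:

```python
# Stage 1 spreads each arriving cell over the middle ports with a fixed,
# time-varying permutation (no scheduler); stage 2 connects middle ports to
# outputs with the same kind of fixed permutation.
import random

N, T, LOAD = 4, 20_000, 0.95
mid_voq = [[0] * N for _ in range(N)]    # mid_voq[k][j]: cells at middle port k for output j
arrived = delivered = 0

for t in range(T):
    for i in range(N):                   # stage 1: input i -> middle port (i + t) % N
        if random.random() < LOAD:
            j = random.randrange(N)      # destination output, uniform
            mid_voq[(i + t) % N][j] += 1
            arrived += 1
    for k in range(N):                   # stage 2: middle port k -> output (k + t) % N
        j = (k + t) % N
        if mid_voq[k][j] > 0:
            mid_voq[k][j] -= 1
            delivered += 1

print(f"offered load {arrived / (N * T):.3f}, carried load {delivered / (N * T):.3f}")
```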

Slide 36: Optical Two-Stage Router
[Diagram: linecards (lookup + buffer) connected through two switching phases, Phase 1 and Phase 2.]

Slide 37: 100 Tb/s Load-Balanced Router
[Diagram: linecard racks 1 … G (G = 40), each holding L 160 Gb/s linecards, interconnected through a MEMS switch rack consuming < 100 W.]

Slide 38: Predictions: Core Internet Routers
- The need for more capacity within a given power and volume budget will mean:
  - Fewer functions in routers:
    - Little or no optimization for multicast
    - Continued over-provisioning will lead to little or no support for QoS, DiffServ, …
  - Fewer unnecessary requirements:
    - Mis-sequencing will be tolerated
    - Latency requirements will be relaxed
  - Less programmability in routers, and hence no network processors (NPs used at the edge…)
  - Greater use of optics to reduce power in the switch

Slide 39: Likely Events
- The need for capacity and reliability will mean:
  - Widespread replacement of core routers with transport switching based on circuits:
    - Circuit switches have proved simpler, more reliable, lower power, higher capacity, and lower cost per Gb/s. Eventually, this is going to matter.
  - The Internet will evolve into edge routers interconnected by a rich mesh of WDM circuit switches.

Slide 40: Summary
- High-speed routers: lookup, switching, classification, buffer management
- Lookup: range matching, tries, multi-way tries
- Switching: circuit switching, crossbar, Batcher-banyan
- Queuing: input/output queuing issues
- Classification, scheduling: …
- The road ahead to 100 Tb/s routers…