Design of a High Performance Active Router
Jon Turner (and a cast of thousands)
Washington University
Active Nets PI Meeting - 12/01
2 - Jonathan Turner - December 5, 2001
Washington University Active Router
[Figure: ATM switch core (switch fabric) with six ports, each with IPP/OPP, a Smart Port Card (SPC, embedded processor) and a transmission interface (TI), plus a control processor attached to the fabric.]
Smart Port Card: Sys. FPGA, 64 MB memory, Pentium, cache, North Bridge, APIC.
Control Processor
»global coordination & control
»routing protocols
»builds routing tables and other information needed by SPCs
»active plugin code server
SPC Software Architecture
[Figure: input side processing (general filters, flow & route lookup, plugin control, plugins, virtual output queues) and output side processing (general filters, flow lookup, plugin control, plugins, output queues, rate control, reassembly queues), plus a distributed queueing component.]
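To make the input-side path in the diagram concrete, here is a minimal Python sketch: a packet is first matched against an exact-match flow table, falls back to the general filters if no flow entry exists, may be handed to a plugin, and is finally placed in the virtual output queue for its output port. All names (`Packet`, `InputSide`, `flows`, `routes`) are hypothetical, the exact-match `routes` dict is only a stand-in for real longest-prefix route lookup, and the output side, rate control and distributed queueing are not modeled.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Optional, Tuple

# Hypothetical 5-tuple flow key: (src, dst, protocol, src port, dst port).
FlowKey = Tuple[str, str, int, int, int]


@dataclass
class Packet:
    src: str
    dst: str
    proto: int
    sport: int
    dport: int

    def key(self) -> FlowKey:
        return (self.src, self.dst, self.proto, self.sport, self.dport)


# A plugin returns the (possibly modified) packet, or None to drop it.
Plugin = Callable[[Packet], Optional[Packet]]


@dataclass
class InputSide:
    n_outputs: int
    # Exact-match destination -> output port (stand-in for real route lookup).
    routes: Dict[str, int] = field(default_factory=dict)
    # General filters: (predicate, plugin) pairs, checked in order.
    filters: List[Tuple[Callable[[Packet], bool], Plugin]] = field(default_factory=list)
    # Exact-match flow table: flow key -> plugin bound to that flow.
    flows: Dict[FlowKey, Plugin] = field(default_factory=dict)
    voqs: List[Deque[Packet]] = field(init=False)

    def __post_init__(self) -> None:
        # One virtual output queue per switch output port.
        self.voqs = [deque() for _ in range(self.n_outputs)]

    def process(self, pkt: Packet) -> Optional[int]:
        """Queue pkt in a virtual output queue; return the port, or None if dropped."""
        plugin = self.flows.get(pkt.key())
        if plugin is None:
            # No flow entry: fall back to the first matching general filter.
            plugin = next((p for pred, p in self.filters if pred(pkt)), None)
        if plugin is not None:
            result = plugin(pkt)
            if result is None:
                return None  # plugin dropped the packet
            pkt = result
        port = self.routes.get(pkt.dst)
        if port is None:
            return None  # no route
        self.voqs[port].append(pkt)
        return port
```

For example, `InputSide(n_outputs=4, routes={"10.0.0.2": 3})` with a filter whose plugin returns `None` for UDP would queue TCP packets to 10.0.0.2 on port 3 and drop UDP ones. Checking the flow table before the filters reflects the usual idea that per-flow state, once installed, short-circuits the more general classification.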
SPC Throughput - Packets Per Second
Comparison with SPC 2
SPC Throughput - Mb/s
SPC Throughput vs. Packet Length
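The packets-per-second and Mb/s views are related through packet length: bit rate = packet rate × packet length in bytes × 8. The small helpers below do that conversion. The second pair additionally counts 53-byte ATM cells under an assumed AAL5 encapsulation (8-byte trailer, payload padded to a multiple of 48 bytes); that encapsulation is an assumption about how packets cross the ATM switch core, not something stated on these slides.

```python
import math


def mbps(pps: float, pkt_bytes: int) -> float:
    """Packet-level bit rate in Mb/s for a packet rate and packet length in bytes."""
    return pps * pkt_bytes * 8 / 1e6


def aal5_cells(pkt_bytes: int) -> int:
    """ATM cells per packet, assuming AAL5: 8-byte trailer, 48-byte cell payloads."""
    return math.ceil((pkt_bytes + 8) / 48)


def atm_line_mbps(pps: float, pkt_bytes: int) -> float:
    """Bit rate in Mb/s counting full 53-byte cells (assumed AAL5 encapsulation)."""
    return pps * aal5_cells(pkt_bytes) * 53 * 8 / 1e6
```

For instance, 100,000 packets/s of 1500-byte packets is 1200 Mb/s at the packet level; the cell-level figure is higher because of cell headers and padding, and the gap is largest for packet lengths just past a 48-byte boundary.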
Distributed Queueing
[Figure: switch fabric with six ports, each with routing, a scheduler and a transmission interface (TI), plus a control processor.]
»queue per output
»periodic queue length reports
»scheduler paces each queue according to its backlog share
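One simple way to pace per-output queues at assigned rates is to give each queue a next-eligible time and, whenever the link can send, serve the eligible queue with the earliest such time. The Python sketch below assumes the rates (packets per time unit, all positive) have already been derived from the backlog shares; the `Pacer` class and its methods are hypothetical and the actual SPC scheduler may pace differently.

```python
import heapq
from typing import Dict, List, Optional, Tuple


class Pacer:
    """Limits queue j to at most rates[j] packets per time unit (rates assumed > 0)."""

    def __init__(self, rates: Dict[int, float]) -> None:
        self.rates = dict(rates)
        self.backlog: Dict[int, int] = {j: 0 for j in rates}
        self.next_time: Dict[int, float] = {j: 0.0 for j in rates}
        # Heap of (next eligible time, output); a queue is present iff backlogged.
        self.heap: List[Tuple[float, int]] = []

    def enqueue(self, j: int, count: int = 1) -> None:
        was_empty = self.backlog[j] == 0
        self.backlog[j] += count
        if was_empty:
            heapq.heappush(self.heap, (self.next_time[j], j))

    def dequeue(self, now: float) -> Optional[int]:
        """Send one packet from the earliest eligible queue; return its output or None."""
        if not self.heap or self.heap[0][0] > now:
            return None  # nothing eligible yet
        t, j = heapq.heappop(self.heap)
        self.backlog[j] -= 1
        # The next packet from queue j may leave 1/rate after this one.
        self.next_time[j] = max(t, now) + 1.0 / self.rates[j]
        if self.backlog[j] > 0:
            heapq.heappush(self.heap, (self.next_time[j], j))
        return j
```

With rates `{0: 1.0, 1: 0.5}`, three packets in each queue and one send opportunity per time unit, the sequence of outputs served at times 0 through 5 is 0, 1, 0, 0, 1, then nothing. In a distributed queueing setting the rates would be refreshed from the periodic queue length reports; this sketch leaves rate updates out.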
Distributed Queueing Algorithm
Goal: avoid switch congestion and output queue underflow.
Let hi(i,j) be input i's share of the input-side backlog to output j.
»can avoid switch congestion by sending from input i to output j at rate $L \cdot S \cdot \mathrm{hi}(i,j)$
»where L is the external link rate and S is the switch speedup
Let lo(i,j) be input i's share of the total backlog for output j.
»can avoid underflow of the queue at output j by sending from input i to output j at rate $L \cdot \mathrm{lo}(i,j)$
»this works if $L\,(\mathrm{lo}(i,1)+\cdots+\mathrm{lo}(i,n)) \le L \cdot S$ for all i
Let wt(i,j) be the ratio of lo(i,j) to lo(i,1)+···+lo(i,n).
Let $\mathrm{rate}(i,j) = L \cdot S \cdot \min(\mathrm{wt}(i,j), \mathrm{hi}(i,j))$.
Note: the algorithm avoids congestion, and avoids underflow for large enough S.
»what is the smallest value of S for which underflow cannot occur?
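The rate rule can be computed directly from backlog reports. This Python sketch takes a matrix `B[i][j]` of input-side backlog at input i for output j and a vector `out[j]` of output-queue backlog, and computes hi, lo, wt and rate for an n × n router. It assumes that "total backlog for output j" means the input-side backlog to j plus the output queue at j, and it treats shares with a zero denominator as 0; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def shares(B: Sequence[Sequence[float]], out: Sequence[float]) -> Tuple[Matrix, Matrix]:
    """Return (hi, lo): input i's share of input-side and of total backlog for output j."""
    n = len(B)
    col = [sum(B[i][j] for i in range(n)) for j in range(n)]  # input-side backlog to j
    hi = [[B[i][j] / col[j] if col[j] > 0 else 0.0 for j in range(n)]
          for i in range(n)]
    lo = [[B[i][j] / (col[j] + out[j]) if col[j] + out[j] > 0 else 0.0
           for j in range(n)] for i in range(n)]
    return hi, lo


def dq_rates(B: Sequence[Sequence[float]], out: Sequence[float],
             L: float, S: float) -> Matrix:
    """rate[i][j] = L * S * min(wt(i,j), hi(i,j)); L = link rate, S = speedup."""
    hi, lo = shares(B, out)
    rate = []
    for i in range(len(B)):
        lo_sum = sum(lo[i])  # lo(i,1) + ... + lo(i,n)
        wt = [x / lo_sum if lo_sum > 0 else 0.0 for x in lo[i]]
        rate.append([L * S * min(wt[j], hi[i][j]) for j in range(len(B))])
    return rate


def lo_condition_holds(B: Sequence[Sequence[float]], out: Sequence[float],
                       S: float) -> bool:
    """Check L * (lo(i,1) + ... + lo(i,n)) <= L * S for every input i."""
    _, lo = shares(B, out)
    return all(sum(row) <= S for row in lo)
```

Because the wt(i,j) of one input sum to at most 1, no input sends more than L·S in total, and because the hi(i,j) for one output sum to at most 1, no output receives more than L·S, which is the congestion argument. For B = [[4, 0], [4, 2]], out = [0, 2], L = 1 and S = 2, the rates come out as [[1, 0], [1, 1]].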
Stress Test
Stress Test Simulation - Min Rates
Stress Test Simulation - Actual Rates
Stress Test Simulation - Backlog
Stress Test Measurement Results
Reconfigurable Hardware Extension
[Figure: switch fabric with six ports, each with IPP/OPP, a Field Programmable Port Extender (FPX), SPC and TI, plus a control processor.]
Field Programmable Port Extender: Network Interface Device, Reprogrammable Application Device, 128 MB SDRAM, 4 MB SRAM.
Active Packet Processing
[Figure: switch fabric with six ports, each with IPP/OPP, FPX, SPC and TI, plus a control processor.]
Smart Port Card: Sys. FPGA, memory, Pentium, cache, North Bridge, APIC.
Logical Port Architecture
[Figure: each logical port spans the FPX and the SPC. Input side processing: general filters, flow & route lookup, active flow queues, return queues, virtual output queues, PCU, plugins. Output side processing: general filters, flow lookup, active flow queues, return queues, output queues, PCU, plugins.]
Fast IP Lookup (Eatherton & Dittia)
Multibit trie with clever data encoding.
»small memory requirements (4-6 bytes per prefix typical)
»low memory bandwidth and a simple lookup yield fast lookup rates
»updates have negligible impact on lookup performance
Avoid the impact of external memory latency on throughput by interleaving several concurrent lookups.
»an 8-lookup-engine configuration uses about 10% of the logic cells of a Virtex 1000E
[Figure: example address and trie node encoding, showing the internal bit vector and external bit vector.]
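The slide only outlines the encoding, so here is a toy multibit trie in the spirit of it (the scheme usually called a tree bitmap). Each node covers `STRIDE` address bits; an internal bit vector marks prefixes that end inside the node, an external bit vector marks which children exist, and both children and next hops are stored packed, so counting the set bits below a position turns that position into an array index. The stride, the class names and the use of bit strings for addresses are illustrative assumptions, not the FPX engine's memory layout.

```python
from typing import List, Optional

STRIDE = 3  # address bits consumed per trie node (illustrative choice)


def ones_below(bits: int, idx: int) -> int:
    """Set bits in `bits` at positions < idx, i.e. the index into a packed array."""
    return bin(bits & ((1 << idx) - 1)).count("1")


def internal_pos(bits: str) -> int:
    """Level-order position of a prefix of length < STRIDE inside a node."""
    return (1 << len(bits)) - 1 + (int(bits, 2) if bits else 0)


class Node:
    def __init__(self) -> None:
        self.internal = 0  # 2**STRIDE - 1 bits: prefixes of length 0..STRIDE-1
        self.external = 0  # 2**STRIDE bits: which children exist
        self.children: List["Node"] = []  # existing children only, packed
        self.results: List[str] = []      # next hops, packed in internal-bit order


class MultibitTrie:
    def __init__(self) -> None:
        self.root = Node()

    def insert(self, prefix: str, next_hop: str) -> None:
        """prefix is a bit string such as '010001'; '' is the default route."""
        node, bits = self.root, prefix
        while len(bits) >= STRIDE:
            c = int(bits[:STRIDE], 2)
            idx = ones_below(node.external, c)
            if not (node.external >> c) & 1:
                node.external |= 1 << c
                node.children.insert(idx, Node())
            node, bits = node.children[idx], bits[STRIDE:]
        pos = internal_pos(bits)
        idx = ones_below(node.internal, pos)
        if (node.internal >> pos) & 1:
            node.results[idx] = next_hop  # replace an existing prefix
        else:
            node.internal |= 1 << pos
            node.results.insert(idx, next_hop)

    def lookup(self, addr: str) -> Optional[str]:
        """Longest-prefix match for a bit-string address; one node visit per stride."""
        node, bits, best = self.root, addr, None
        while True:
            # A match in a deeper node is longer than any match above it.
            for length in range(min(STRIDE - 1, len(bits)), -1, -1):
                pos = internal_pos(bits[:length])
                if (node.internal >> pos) & 1:
                    best = node.results[ones_below(node.internal, pos)]
                    break
            if len(bits) < STRIDE:
                return best
            c = int(bits[:STRIDE], 2)
            if not (node.external >> c) & 1:
                return best
            node = node.children[ones_below(node.external, c)]
            bits = bits[STRIDE:]
```

With prefixes `''`→A, `010`→B, `010001`→C and `1`→D, looking up the 8-bit address `01000111` returns C and `11111111` returns D. Storing only bit vectors plus packed arrays is what keeps the per-prefix memory small, and because each trie level is a single node read, several independent lookups can be interleaved to hide external memory latency.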
Lookup Throughput & Latency
[Figure annotations: linear throughput gain; negligible latency increase.]
Update Performance
[Figure annotations: reasonable update rates have little impact; 1 update every 10 s.]
Performance of Combined Traffic
Summary and Status
Latest version of SPC software nearly complete.
»additional testing of distributed queueing
»testing of new output queueing subsystem (QSDRR)
»porting active applications to new plugin environment
SPC2 almost ready for production.
»finalizing details of PC board schematic and layout
»overload performance testing on development system
Completion of FPX design & integration with SPC.
»low-level debugging of FPX interface circuit
»distributed queueing implementation in FPX
»FIPL extension for flow classification
»enhance active flow and output queueing subsystems