Published by Gwendoline Morris. Modified over 8 years ago.
1
Extreme Networking
Achieving Nonstop Network Operation Under Extreme Operating Conditions
Jon Turner
jst@cse.wustl.edu
http://www.arl.wustl.edu/arl
DARPA PI Meeting, January 27-29, 2003
2
Project Overview
Motivation
»data networks have become mission-critical resource
»networks often subject to extreme traffic conditions
»need to design networks for worst-case conditions
»technology advances making extreme defenses practical
Extreme network services
»Lightweight Flow Setup (LFS)
»Network Access Service (NAS)
»Reserved Tree Service (RTS)
Key router technology components
»Super-Scalable Packet Scheduling (SPS)
»Dynamic Queues with Auto-aggregation (DQA)
»Scalable Distributed Queueing (SDQ)
3
Prototype Extreme Router
[Figure: Switch Fabric connected to ports, each labeled IPP/OPP, FPX, SPC and Line Card; Control Processor]
5
Prototype Extreme Router
[Figure: same router diagram, highlighting the Field Programmable Port Extenders and the ATM Switch Core]
Field Programmable Port Ext.
»Network Interface Device
»Reprogrammable Application Device
»SDRAM 128 MB
»SRAM 4 MB
6
Prototype Extreme Router
[Figure: same router diagram, highlighting the Embedded Processors]
Smart Port Card 2
»Pentium
»Cache
»North Bridge
»128 MB
»Flash Disk
»FPGA
»APIC
7
Prototype Extreme Router
[Figure: same router diagram, highlighting the Gigabit Ethernet line card]
Gigabit Ethernet
»GBIC
»Framer
»FPGA
8
Performance of SPC-2
Largest gain at small packet sizes.
PCI bus limits performance for large packets.
9
More SPC-2 Performance
Throughput loss at high loads due to PCI bus contention and input priority.
10
Field Programmable Port Extender (FPX)
Network Interface Device (NID) routes cells to/from RAD.
Reprogrammable Application Device (RAD) functions:
»will implement core router functions in extensible router
»may also implement arbitrary packet processing functions
Functions for extreme router.
»high speed packet storage manager
»packet classification & route lookup
  –fast route lookup
  –exact match filters
  –32 general filters
»flexible queue manager
  –per-flow queues for reserved flows
  –route packets to/from SPC
[Figure: Network Interface Device with 2 Gb/s interface; Reprogrammable App. Device (400 Kg+80 KB) with 6.4 Gb/s link to the NID; two SDRAM banks (64 MB each, 64-bit, 100 MHz) and two SRAM banks (1 MB each, 36-bit, 100 MHz)]
11
Logical Port Architecture
[Figure, Input Side Processing: reassembly contexts, Packet Classification & Route Lookup, special flow queues, virtual output queues, DQ, plugins on the SPC, PCU, FPX]
[Figure, Output Side Processing: reassembly contexts, Packet Classification, special flow queues, output queues, RC, plugins on the SPC, PCU, FPX]
12
FPX Packet Processor Block Diagram
[Figure: data path from SW and from LC through ISAR, Packet Storage Manager (includes free space list, backed by SDRAM), Header Proc., Classification and Route Lookup (backed by SRAM), Queue Manager and OSAR, to SW and to LC; Header Pointer and Discard Pointer signals; Control Cell Processor with Register Set carrying DQ Status & Rate Control, Route & Filter Updates, and Register Set Updates & Status]
13
Classification and Route Lookup (CARL)
Three lookup engines.
»route lookup for routing datagrams - best prefix
»flow filters for multicast & reserved flows - exact
»general filters (32) for management - exhaustive
Input processing.
»parallel check of all three
»return highest priority exclusive and highest priority non-exclusive
»general filters have unique priority
»all flow filters share single priority
»ditto for routes
Output processing.
»no route lookup on output
Route lookup & flow filters share off-chip SRAM.
General filters processed on-chip.
[Figure: headers pass through Input Demux to Route Lookup, General Filters and Flow Filters, then to Result Proc. & Priority Resolution; a bypass path is also shown]
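The priority resolution step described above (return the highest priority exclusive match and the highest priority non-exclusive match) can be sketched as follows. This is a minimal illustration, not the FPX logic: the `Match` record, the `resolve` function, and the convention that a smaller number means higher priority are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """One result from a lookup engine (hypothetical record layout)."""
    source: str      # "route", "flow" or "general"
    priority: int    # assumption: smaller number = higher priority
    exclusive: bool
    action: str


def resolve(matches):
    """Return (best exclusive match, best non-exclusive match).

    Either element is None when no match of that kind exists.
    """
    exclusive = [m for m in matches if m.exclusive]
    shared = [m for m in matches if not m.exclusive]
    best_exclusive = min(exclusive, key=lambda m: m.priority, default=None)
    best_shared = min(shared, key=lambda m: m.priority, default=None)
    return best_exclusive, best_shared


results = [
    Match("route", 10, False, "forward"),
    Match("flow", 5, True, "queue-7"),
    Match("general", 1, False, "count"),
    Match("general", 3, True, "drop"),
]
best_exclusive, best_shared = resolve(results)
```

In hardware the three engines run in parallel and the resolution is combinational; the list comprehension here only mirrors the selection rule.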
14
Exact Match Lookup
Exact match lookup table used for reserved flows.
»includes LFS, signaled QOS flows and multicast
»and, flows requiring processing by SPCs
»each of these flows has separate queue in QM
»multicast flows have two queues (recycling multicast)
»implemented using hashing
tag = [src, dst, sport, dport, proto]
data includes
»2 outputs + 2 QIDs
»LFS rates
»packet, byte counters
»flags
separate memory areas for ingress and egress packets
[Figure: packet src/dst fields feed a simple hash; on-chip SRAM entries carry ingress valid and egress valid bits; off-chip SRAM holds tag+data entries]
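A minimal software sketch of a hashed exact-match table keyed on the 5-tuple tag, with separate ingress and egress areas as on the slide. The bucket count, the use of CRC-32 as the "simple hash", chaining within a bucket, and all names (`FlowKey`, `ExactMatchTable`) are assumptions for illustration; the slide does not specify them.

```python
import zlib
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The tag from the slide: [src, dst, sport, dport, proto]."""
    src: str
    dst: str
    sport: int
    dport: int
    proto: int


class ExactMatchTable:
    def __init__(self, num_buckets=1024):
        self.num_buckets = num_buckets
        # Separate memory areas for ingress and egress packets.
        self.ingress = [[] for _ in range(num_buckets)]
        self.egress = [[] for _ in range(num_buckets)]

    def _bucket(self, key, egress):
        table = self.egress if egress else self.ingress
        # Assumption: CRC-32 stands in for the unspecified simple hash.
        index = zlib.crc32(repr(tuple(key)).encode()) % self.num_buckets
        return table[index]

    def insert(self, key, data, egress=False):
        bucket = self._bucket(key, egress)
        for i, (tag, _) in enumerate(bucket):
            if tag == key:            # overwrite an existing entry
                bucket[i] = (key, data)
                return
        bucket.append((key, data))

    def lookup(self, key, egress=False):
        # The full tag comparison is what makes the match exact.
        for tag, data in self._bucket(key, egress):
            if tag == key:
                return data
        return None


table = ExactMatchTable(num_buckets=8)
flow = FlowKey("10.0.0.1", "10.0.0.2", 1234, 80, 6)
table.insert(flow, {"qid": 7})
```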
15
General Filter Match
General filter match considers full 5-tuple
»prefix match on source and destination addresses
»range match on source and destination ports
»exact or wildcard match on protocol
»each filter has a priority and may be exclusive or non-exclusive
Intended primarily for management filters.
»firewall filters
»class-based monitoring
»class-based special processing
Implemented using parallel exhaustive search.
»limit of 32 filters
[Figure: filter memory feeding a row of matchers]
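The matching rules above can be sketched in software as below. This is an illustrative model only: the `Filter` record and `match_all` function are hypothetical names, and the sequential loop stands in for what the slide describes as a parallel exhaustive search in hardware.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

MAX_FILTERS = 32  # limit stated on the slide


@dataclass
class Filter:
    src_prefix: str        # prefix match, e.g. "10.0.0.0/8"
    dst_prefix: str
    sport_range: tuple     # inclusive (low, high) range match
    dport_range: tuple
    proto: Optional[int]   # exact protocol number, or None for wildcard
    priority: int
    exclusive: bool

    def matches(self, src, dst, sport, dport, proto):
        return (
            ipaddress.ip_address(src) in ipaddress.ip_network(self.src_prefix)
            and ipaddress.ip_address(dst) in ipaddress.ip_network(self.dst_prefix)
            and self.sport_range[0] <= sport <= self.sport_range[1]
            and self.dport_range[0] <= dport <= self.dport_range[1]
            and (self.proto is None or self.proto == proto)
        )


def match_all(filters, packet):
    """Check every filter against the packet's 5-tuple (exhaustive search)."""
    if len(filters) > MAX_FILTERS:
        raise ValueError("at most 32 general filters are supported")
    return [f for f in filters if f.matches(*packet)]


filters = [
    Filter("10.0.0.0/8", "0.0.0.0/0", (0, 65535), (80, 80), 6,
           priority=1, exclusive=True),
    Filter("0.0.0.0/0", "192.168.1.0/24", (1024, 65535), (0, 1023), None,
           priority=2, exclusive=False),
]
hits = match_all(filters, ("10.1.2.3", "192.168.1.7", 40000, 80, 6))
```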
16
Fast IP Lookup (Eatherton & Dittia)
Multibit trie with clever data encoding.
»small memory requirements (<7 bytes per prefix)
»small memory bandwidth, simple lookup yields fast lookup rates
»updates have negligible impact on lookup performance
Avoid impact of external memory latency on throughput by interleaving several concurrent lookups.
»8 lookup engine config. uses about 6% of Virtex 2000E logic cells
[Figure: example multibit trie traversed for address 101 100 101 000, with each node's internal bit vector and external bit vector]
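To make the multibit trie idea concrete, here is a small longest-prefix-match sketch that consumes the address three bits at a time, as in the slide's example address. It deliberately uses plain dictionaries instead of the compact internal and external bit-vector encoding, so it shows the traversal but not the memory savings; the names `Node`, `insert` and `lookup`, and the use of bit strings, are assumptions for the example.

```python
STRIDE = 3  # bits consumed per trie node


class Node:
    def __init__(self):
        # Prefixes that end inside this node (fewer than STRIDE extra bits),
        # keyed by those extra bits; plays the role of the internal bit vector.
        self.internal = {}
        # Child nodes keyed by a full STRIDE-bit chunk; plays the role of
        # the external bit vector.
        self.children = {}


def insert(root, prefix, next_hop):
    """Insert a prefix given as a bit string such as '1011'."""
    node = root
    while len(prefix) >= STRIDE:
        chunk, prefix = prefix[:STRIDE], prefix[STRIDE:]
        node = node.children.setdefault(chunk, Node())
    node.internal[prefix] = next_hop


def lookup(root, address):
    """Return the next hop of the longest prefix matching the bit string."""
    best = None
    node = root
    pos = 0
    while node is not None:
        chunk = address[pos:pos + STRIDE]
        # Longest internal prefix of this node that matches the chunk.
        for length in range(min(len(chunk), STRIDE - 1), -1, -1):
            hop = node.internal.get(chunk[:length])
            if hop is not None:
                best = hop  # deeper nodes always hold longer prefixes
                break
        if len(chunk) < STRIDE:
            break
        node = node.children.get(chunk)
        pos += STRIDE
    return best


root = Node()
insert(root, "", "default")
insert(root, "1", "A")
insert(root, "101", "B")
insert(root, "1011", "C")
insert(root, "10110001", "D")
```

Each loop iteration corresponds to one node visit, which is the unit of memory access that interleaved lookup engines would overlap.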
17
Lookup Throughput
SRAM Bandwidth – 450 MB/s
linear throughput gain
Split tree cuts storage by 30%
18
Update Performance
reasonable update rates have little impact
1 update per s
19
Queue Manager Logical View (QM)
64 hashed datagram queues for traffic isolation
separate queues for each reserved flow
separate queue set for each output
separate queue for each SPC flow
[Figure: arriving packets are placed in SPC queues (SPC pkt. sched., to SPC / from SPC), in per-output sets of res. flow queues plus a datagram queue (VOQ pkt. sched., to output 0, to output 1, ..., to output 8, DQ, to switch), or in res. flow queues and datagram queues served by the link pkt. sched. (to link)]
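One way to picture the 64 hashed datagram queues is a function that maps a datagram's flow fields to a queue index, so that packets of one flow always share a queue while different flows tend to be spread apart. The sketch below is an assumption-laden illustration: the slide does not say which header fields are hashed or which hash is used, so the 5-tuple key and CRC-32 are placeholders, and `datagram_queue` is a hypothetical name.

```python
import zlib

NUM_DATAGRAM_QUEUES = 64  # from the slide


def datagram_queue(src, dst, sport, dport, proto):
    """Pick one of the 64 datagram queues for a packet (illustrative hash)."""
    key = f"{src},{dst},{sport},{dport},{proto}".encode()
    return zlib.crc32(key) % NUM_DATAGRAM_QUEUES


queue_a = datagram_queue("10.0.0.1", "10.0.0.2", 1234, 80, 6)
queue_b = datagram_queue("10.0.0.1", "10.0.0.2", 1234, 80, 6)
```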
20
Backlogged TCP Flows with Tail Discard
with large buffers get large delay variance
with small buffers get underflow and low throughput
21
DRR with Discard from Longest Queue
Smaller fluctuations, but still significant.
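For readers unfamiliar with the scheme named in the slide title, the following is a minimal sketch of Deficit Round Robin (DRR) service combined with discarding from the longest queue when the shared buffer overflows. The class name, the byte-based buffer limit, and the choice to drop the packet at the tail of the longest queue are assumptions; the slide does not give these details.

```python
from collections import deque


class DRRLongestQueueDiscard:
    def __init__(self, num_queues, quantum, buffer_limit):
        self.queues = [deque() for _ in range(num_queues)]  # packet sizes in bytes
        self.backlog = [0] * num_queues                     # bytes queued per queue
        self.deficit = [0] * num_queues
        self.quantum = quantum
        self.buffer_limit = buffer_limit                    # shared buffer, in bytes

    def enqueue(self, q, size):
        """Queue a packet; return the (queue, size) pairs that were discarded."""
        self.queues[q].append(size)
        self.backlog[q] += size
        dropped = []
        while sum(self.backlog) > self.buffer_limit:
            longest = max(range(len(self.queues)), key=lambda i: self.backlog[i])
            victim = self.queues[longest].pop()   # assumption: drop from the tail
            self.backlog[longest] -= victim
            dropped.append((longest, victim))
        return dropped

    def service_round(self):
        """Run one DRR round; return the (queue, size) pairs that were sent."""
        sent = []
        for q, queue in enumerate(self.queues):
            if not queue:
                continue
            self.deficit[q] += self.quantum
            while queue and queue[0] <= self.deficit[q]:
                size = queue.popleft()
                self.deficit[q] -= size
                self.backlog[q] -= size
                sent.append((q, size))
            if not queue:
                self.deficit[q] = 0   # an idle queue does not bank credit
        return sent


sched = DRRLongestQueueDiscard(num_queues=2, quantum=500, buffer_limit=2000)
first = sched.enqueue(0, 1000)
second = sched.enqueue(0, 1000)
overflow = sched.enqueue(1, 300)   # buffer exceeded, queue 0 is longest
round1 = sched.service_round()
round2 = sched.service_round()
```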
22
Queue State DRR
Add hysteresis to packet discard policy
»discard from same queue until it becomes the shortest non-empty queue
low variation, even with small queues, low delay, no tuning
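The hysteresis rule can be sketched as a small policy object that remembers which queue it is currently discarding from. This is an interpretation of the one-line description on the slide, not the authors' implementation: the class name `HysteresisDiscard`, the use of a list of queue lengths, and the choice to start with the longest queue are assumptions.

```python
class HysteresisDiscard:
    """Keep discarding from one queue until it is the shortest non-empty queue."""

    def __init__(self):
        self.victim = None  # index of the queue currently being discarded from

    def choose(self, backlog):
        """Return the index of the queue to discard from, or None if all are empty."""
        nonempty = [i for i, length in enumerate(backlog) if length > 0]
        if not nonempty:
            self.victim = None
            return None
        shortest = min(backlog[i] for i in nonempty)
        if (
            self.victim is None
            or backlog[self.victim] == 0
            or backlog[self.victim] <= shortest
        ):
            # Assumption: a new victim is the currently longest queue.
            self.victim = max(nonempty, key=lambda i: backlog[i])
        return self.victim


policy = HysteresisDiscard()
choices = [
    policy.choose([5, 3, 2]),   # start with the longest queue
    policy.choose([4, 3, 2]),   # keep the same victim
    policy.choose([3, 4, 2]),   # queue 1 is now longest, but the victim is kept
    policy.choose([2, 4, 2]),   # victim is now a shortest queue, so switch
]
```

The third call is where this differs from plain longest-queue discard, which would have moved to queue 1 immediately.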
23
Packet Scheduling with Approx. Radix Sorting
To implement virtual time schedulers, need to quickly find the queue whose “lead packet” has the smallest virtual finish time.
»using priority queue, this requires O(log n) time for n queues
Use approximate radix sorting, with compensation – O(1).
»timing wheels with increasing granularity and range
»approximate sorting produces inter-packet timing errors
»observe errors & compensate when next packet scheduled
Fast-forward bits used to skip to empty slots.
Scheduler puts no limit on number of queues.
Two copies of data structure needed for approx. version of WF2Q+.
[Figure: wheel 1, wheel 2 and wheel 3 feeding an output list; fast forward bits 00110100 10000010 00101010]
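The sketch below shows a single timing wheel with a bitmap of occupied slots, as a simplified stand-in for the multi-wheel structure on the slide. It assumes the fast-forward bits mark non-empty slots so the scan can jump past empty ones, that all finish times fall within one revolution of the wheel, and that queues in the same slot are served in arrival order (which is where the approximation error comes from). The compensation step and the coarser wheels are not modeled; `TimingWheel` and its methods are hypothetical names.

```python
from collections import deque


class TimingWheel:
    def __init__(self, num_slots=8, granularity=1):
        self.num_slots = num_slots
        self.granularity = granularity      # virtual-time width of one slot
        self.slots = [deque() for _ in range(num_slots)]
        self.nonempty_bits = 0              # bit i set iff slot i holds a queue
        self.current = 0                    # slot most recently served

    def insert(self, queue_id, finish_time):
        """File a queue under the slot covering its lead packet's finish time."""
        slot = (finish_time // self.granularity) % self.num_slots
        self.slots[slot].append(queue_id)
        self.nonempty_bits |= 1 << slot

    def pop_next(self):
        """Return the next queue to serve, or None if the wheel is empty."""
        if self.nonempty_bits == 0:
            return None
        mask = (1 << self.num_slots) - 1
        c = self.current
        # Rotate the bitmap so the current slot sits at bit 0, then locate the
        # lowest set bit: one step that skips over all empty slots.
        rotated = ((self.nonempty_bits >> c)
                   | (self.nonempty_bits << (self.num_slots - c))) & mask
        offset = (rotated & -rotated).bit_length() - 1
        slot = (c + offset) % self.num_slots
        queue_id = self.slots[slot].popleft()
        if not self.slots[slot]:
            self.nonempty_bits &= ~(1 << slot)
        self.current = slot
        return queue_id


fine = TimingWheel(num_slots=8, granularity=1)
for name, finish in [("a", 5), ("b", 2), ("c", 5), ("d", 7)]:
    fine.insert(name, finish)
fine_order = [fine.pop_next() for _ in range(4)]

coarse = TimingWheel(num_slots=8, granularity=4)
for name, finish in [("a", 5), ("b", 6), ("c", 4)]:
    coarse.insert(name, finish)
coarse_order = [coarse.pop_next() for _ in range(3)]
```

With granularity 4, the three queues land in the same slot and come out in arrival order rather than finish-time order, illustrating the inter-packet timing error that the slide says is observed and compensated.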
24
Resource Usage Estimates
Key resources in Xilinx FPGAs
»flip flops - 38,400
»lookup tables (LUTs) - 38,400
  –each can implement any 4 input Boolean function
»block RAMs (4 Kbits each) - 160
25
FPGA Performance Characteristics
26
Summary
Version 1 Hardware status.
»hardware operating in lab, passing packets
»but, still have some bugs to correct
»one day for typical test-diagnose-correction cycle
»version 1 has simplified queue manager
Planning several system demos in next month.
»system level throughput testing – focus on lookup proc.
»verifying basic fair queueing behavior
»TCP SYN attack suppressor
  –SPC-resident plugin monitors new TCP connections going to server
  –when too many “half-open” connections, oldest are reset
  –flow filters inserted for stable connections, enabling hw forwarding
Expect to complete version 2 hardware in next six months.
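The SYN attack suppressor behavior listed above can be sketched as a small state tracker. This is only a model of the three bullets, not the SPC plugin itself: the class `SynSuppressor`, its method names, the connection identifiers, and the threshold value are assumptions, and actually sending resets or installing flow filters is left to the caller.

```python
from collections import OrderedDict


class SynSuppressor:
    def __init__(self, max_half_open=3):
        self.max_half_open = max_half_open   # hypothetical threshold
        self.half_open = OrderedDict()       # connection id -> None, oldest first
        self.stable = set()                  # connections with a flow filter

    def on_syn(self, conn):
        """Record a new half-open connection; return connections to reset."""
        if conn in self.stable or conn in self.half_open:
            return []
        self.half_open[conn] = None
        resets = []
        while len(self.half_open) > self.max_half_open:
            oldest, _ = self.half_open.popitem(last=False)
            resets.append(oldest)
        return resets

    def on_handshake_complete(self, conn):
        """Return True if a flow filter should be inserted for this connection."""
        if conn in self.half_open:
            del self.half_open[conn]
            self.stable.add(conn)   # stable: can now be forwarded in hardware
            return True
        return False


monitor = SynSuppressor(max_half_open=2)
events = [
    monitor.on_syn("a"),
    monitor.on_syn("b"),
    monitor.on_handshake_complete("a"),
    monitor.on_syn("c"),
    monitor.on_syn("d"),                  # third half-open: oldest ("b") is reset
    monitor.on_handshake_complete("b"),   # already reset, so no filter
]
```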