An NP-Based Router for the Open Network Lab Design


1 An NP-Based Router for the Open Network Lab Design
John DeHart

2 ONL Work Items (Updated 4/9/08)
NPR and NSP Integration
  Configuration
  Switch
  Ring Operation
NSP
  Replace Filter Result without doing a Remove and Add
  Removal of MCast Filter
  Fix and Testing
  Create and copy pkts in Plugins
  Ingress vs. Egress
NPR
  Mart’s problem with ordered_signal.h
    Init of signal does not appear to always work; adding code to the function helps
  Drop Counters (not all are implemented)
  Other Counters
    Verify that everything is implemented
  NPUB
  Add UseRoute capability
  DONE: Test Reclassify path
  Plugin Tags, etc.
Performance Testing
Demo at GENI meeting in June/July
Workshop at a later GENI or similar meeting

3 ONL NP Router Register Counters
Rx Per Port Counters
  0: Rx P0 Bytes    1: Rx P0 Pkts
  2: Rx P1 Bytes    3: Rx P1 Pkts
  4: Rx P2 Bytes    5: Rx P2 Pkts
  6: Rx P3 Bytes    7: Rx P3 Pkts
  8: Rx P4 Bytes    9: Rx P4 Pkts
Tx Per Port Counters
  10: Tx P0 Bytes   11: Tx P0 Pkts
  12: Tx P1 Bytes   13: Tx P1 Pkts
  14: Tx P2 Bytes   15: Tx P2 Pkts
  16: Tx P3 Bytes   17: Tx P3 Pkts
  18: Tx P4 Bytes   19: Tx P4 Pkts
Mux Counters
  20: From XScale Bytes   21: From XScale Pkts
  22: From Plugin Bytes   23: From Plugin Pkts
IP Drop Counters
  24: HEC Drops
  25: Length Error Drops
  26: Hdr Length Error Drops
  27: IP Version Error Drops
  28: PLC to Plugin Drops
  29: PLC to XScale Drops
  30: PLC out of Descriptors Drops
QM
  31: Queue Overflow Drops
XScale
  32: XScale Drops
Rx
  33: Rx Drops
Tx
  34: Tx Drops
Freelist Manager
  35: Buffer Descriptor Errors
Free Registers
  36
  37
Plugin Counters
  38-41: Plugin 0 Counters 0-3
  42-45: Plugin 1 Counters 0-3
  46-49: Plugin 2 Counters 0-3
  50-53: Plugin 3 Counters 0-3
  54-57: Plugin 4 Counters 0-3
Exception Drop Counters
  58: NextHop Invalid Drops
  59: ARP Drops
  60: No Route Drops
Filters
  61: Filter Directed Drops
HF
  62: HF Descriptor Error
Free Register
  63

4 ONL NP Router Register Counters
#define ONL_ROUTER_RX_PORT0_BYTE_CNTR 0
#define ONL_ROUTER_RX_PORT0_PKT_CNTR 1
#define ONL_ROUTER_RX_PORT1_BYTE_CNTR 2
#define ONL_ROUTER_RX_PORT1_PKT_CNTR 3
#define ONL_ROUTER_RX_PORT2_BYTE_CNTR 4
#define ONL_ROUTER_RX_PORT2_PKT_CNTR 5
#define ONL_ROUTER_RX_PORT3_BYTE_CNTR 6
#define ONL_ROUTER_RX_PORT3_PKT_CNTR 7
#define ONL_ROUTER_RX_PORT4_BYTE_CNTR 8
#define ONL_ROUTER_RX_PORT4_PKT_CNTR 9
#define ONL_ROUTER_TX_PORT_BASE 10
#define ONL_ROUTER_TX_BYTE_CNTR(PORT) (ONL_ROUTER_TX_PORT_BASE + ((PORT)*2))
#define ONL_ROUTER_TX_PKT_CNTR(PORT) (ONL_ROUTER_TX_PORT_BASE + ((PORT)*2) + 1)
#define ONL_ROUTER_TX_PORT0_BYTE_CNTR 10
#define ONL_ROUTER_TX_PORT0_PKT_CNTR 11
#define ONL_ROUTER_TX_PORT1_BYTE_CNTR 12
#define ONL_ROUTER_TX_PORT1_PKT_CNTR 13
#define ONL_ROUTER_TX_PORT2_BYTE_CNTR 14
#define ONL_ROUTER_TX_PORT2_PKT_CNTR 15
#define ONL_ROUTER_TX_PORT3_BYTE_CNTR 16
#define ONL_ROUTER_TX_PORT3_PKT_CNTR 17
#define ONL_ROUTER_TX_PORT4_BYTE_CNTR 18
#define ONL_ROUTER_TX_PORT4_PKT_CNTR 19
#define ONL_ROUTER_XSCALE_TO_MUX_BYTE_CNTR 20
#define ONL_ROUTER_XSCALE_TO_MUX_PKT_CNTR 21
#define ONL_ROUTER_PLUGIN_TO_MUX_BYTE_CNTR 22
#define ONL_ROUTER_PLUGIN_TO_MUX_PKT_CNTR 23
#define ONL_ROUTER_IP_HEC_DROP_CNTR 24
#define ONL_ROUTER_IP_LENGTH_ERR_DROP_CNTR 25
#define ONL_ROUTER_IP_HDR_LENGTH_ERR_DROP_CNTR 26
#define ONL_ROUTER_IP_VERSION_ERR_DROP_CNTR 27
#define ONL_ROUTER_PLC_TO_PLUGIN_DROP_CNTR 28
#define ONL_ROUTER_PLC_TO_XSCALE_DROP_CNTR 29
#define ONL_ROUTER_PLC_OUT_OF_DESCS_DROP_CNTR 30
#define ONL_ROUTER_QUEUE_OVERFLOW_DROP_CNTR 31
#define ONL_ROUTER_XSCALE_DROP_CNTR 32
#define ONL_ROUTER_RX_DROP_CNTR 33
#define ONL_ROUTER_TX_DROP_CNTR 34
#define ONL_ROUTER_FREELISTMGR_BUFFER_DESC_ERROR_CNTR 35
// Counters 36,37 are Free
#define ONL_ROUTER_PLUGIN_0_CNTR_
#define ONL_ROUTER_PLUGIN_0_CNTR_
#define ONL_ROUTER_PLUGIN_0_CNTR_
#define ONL_ROUTER_PLUGIN_0_CNTR_
#define ONL_ROUTER_PLUGIN_1_CNTR_
#define ONL_ROUTER_PLUGIN_1_CNTR_
#define ONL_ROUTER_PLUGIN_1_CNTR_
#define ONL_ROUTER_PLUGIN_1_CNTR_
#define ONL_ROUTER_PLUGIN_2_CNTR_
#define ONL_ROUTER_PLUGIN_2_CNTR_
#define ONL_ROUTER_PLUGIN_2_CNTR_
#define ONL_ROUTER_PLUGIN_2_CNTR_
#define ONL_ROUTER_PLUGIN_3_CNTR_
#define ONL_ROUTER_PLUGIN_3_CNTR_
#define ONL_ROUTER_PLUGIN_3_CNTR_
#define ONL_ROUTER_PLUGIN_3_CNTR_
#define ONL_ROUTER_PLUGIN_4_CNTR_
#define ONL_ROUTER_PLUGIN_4_CNTR_
#define ONL_ROUTER_PLUGIN_4_CNTR_
#define ONL_ROUTER_PLUGIN_4_CNTR_
#define ONL_ROUTER_EXC_NH_INV_DROP_CNTR 58
#define ONL_ROUTER_EXC_ARP_DROP_CNTR 59
#define ONL_ROUTER_EXC_NO_ROUTE_DROP_CNTR 60
#define ONL_ROUTER_FILTER_DIRECTED_DROP_CNTR 61
#define ONL_ROUTER_HF_DESCRIPTOR_ERROR_CNTR 62
// Counter 63 is Free

5 Notes: Exceptions Exception bits may not work exactly as we planned:
For example, if a pkt comes in with an expired TTL but still matches a route, then the Copy block will use the route result and send the packet out.
  Somehow the exception should take precedence over a Route.
Should filters be forced to explicitly match on the exception bits?
  That is, the mask bits for the exception bits should be all 1’s.
Exceptions we detect:
  TTL
  No Route
  IP Options
  ARP Pkt
  Non IP Pkt
  NH Invalid
  ARP Needed

6 Adding Use_Route Capability
In the NSP router, filters can be added with a specification that says: use the Route Lookup result for which port to send the packet to.
In the NPR we do not currently support that.
We would like to add Use_Route for PF and AF
  Priority field in PF TCAM Result is currently 8 bits.
    We only use 6 bits to conform to its usage in the NSP
  Take top bit of the 8 bit Priority field and make it the Use_Route bit and reduce the Priority field to 6 bits, with the other bit RESERVED.
  Use similar bit in AF TCAM Result
Routes are always Unicast
If Use_Route bit is set and RL is a MISS
  Treat packet as NO_ROUTE
If Use_Route bit is set and RL is a HIT:
  Read Filter Result AND Route Result from QDR SRAM Bank0
    Normally we only read one of PR or RL result from QDR SRAM Bank0
  Primary/Aux Result uses:
    From Primary/Aux Filter Result:
      QID
      Stats
      UCast/MCast bits for D/PPS/Plugins
    From Route Result:
      VALID Bits
      UCast/MCast bits for OutPort
      NH_MAC/NH_IP
  If Filter result is multi-copy, then each copy gets the same OutPort
  If NH_MAC is not valid, then ARP result is used
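The Use_Route decision above can be sketched in C. This is only an illustration: the slide says the top bit of the 8-bit Priority field becomes Use_Route and that 6 bits remain for Priority, so placing the RESERVED bit at bit 6 is an assumption, and all names here (`pf_use_route`, `use_route_decide`, the `ACT_*` values) are hypothetical rather than taken from the router source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the 8-bit PF Priority field after the change:
 *   bit 7    : Use_Route
 *   bit 6    : RESERVED (position assumed)
 *   bits 5:0 : Priority
 */
#define PF_USE_ROUTE_BIT 0x80u
#define PF_PRIORITY_MASK 0x3Fu

static inline bool pf_use_route(uint8_t prio_field) {
    return (prio_field & PF_USE_ROUTE_BIT) != 0;
}

static inline uint8_t pf_priority(uint8_t prio_field) {
    return (uint8_t)(prio_field & PF_PRIORITY_MASK);
}

typedef enum {
    ACT_USE_FILTER,            /* Use_Route clear: filter result alone   */
    ACT_USE_FILTER_WITH_ROUTE, /* Use_Route set, RL hit: read both       */
    ACT_NO_ROUTE               /* Use_Route set, RL miss: NO_ROUTE       */
} use_route_action_t;

static inline use_route_action_t use_route_decide(uint8_t prio_field,
                                                  bool route_hit) {
    if (!pf_use_route(prio_field))
        return ACT_USE_FILTER;
    return route_hit ? ACT_USE_FILTER_WITH_ROUTE : ACT_NO_ROUTE;
}
```

In the `ACT_USE_FILTER_WITH_ROUTE` case the caller would take QID, Stats and the D/PPS/Plugin bits from the filter result, and the VALID bits, OutPort and NH_MAC/NH_IP from the route result, as listed on the slide.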

7 Notes: Lookup Needs to work from both NPUA and NPUB
Currently hardcoded DB Ids
Need to have XScale write to Scratch or SRAM with either:
  Indication of NPUA vs. NPUB
  The 4 Ids that this Lookup is to use.

8 Notes: Exceptions User Preferences: No Route ARP Needed NH Inv
Exception Type Action if Lookup Hit Action if Lookup Miss TTL XScale No Route NA (Not Applicable) User Pref IP Options Use Result (Handled by No Route) ARP Pkt ARP Needed NA Non IP Pkt NH Invalid User Preferences: No Route ARP Needed NH Inv

9 Notes: ARP Handling When Parse receives an ARP packet it may leave garbage/random bits in the lookup key. Fixed

10 ARP Notes Add 4th database for IP to MAC translations
ARP DB will be populated either statically at configuration time or dynamically by ARP daemon on XScale
This ARP DB will be queried at the same time the other 3 are, and if a result is returned it can be used for attached hosts.
NH Router filter and route results will be populated with either NH IP or NH MAC addresses.
  If NH IP, then PLC will send to XScale with NH IP and address of filter result that needs to be overwritten with new NH MAC Address
  If NH MAC, then PLC will use that.
  If neither, then use result from ARP DB lookup
    If that had no result, then send to XScale with DAddr from pkt for it to perform ARP and populate ARP DB.
Logic:
  if (NH_IP valid) {
    // this implies that we have not arped for this filter's NH_IP
    // But, we might have arped for that IP for another filter...
    // So, if we have 10 filters pointing to the same NH router,
    // we could end up arping for that NH Router 10 times.
    send pkt to XScale, include NH_IP and address of filter result
  } else if (NH_MAC valid) {
    Use it.
  } else if (ARP_valid) {
    use MAC from ARP result
  } else {
    Send Pkt DAddr to XScale.
  }
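The next-hop logic on this slide, written as a compilable C function. The three inputs correspond to the NH_IP valid bit, the NH_MAC valid bit and the ARP DB hit indication; the enum and function names are hypothetical, chosen only for this sketch.

```c
#include <stdbool.h>

typedef enum {
    NH_SEND_TO_XSCALE_WITH_NH_IP, /* XScale ARPs for NH_IP, rewrites result */
    NH_USE_NH_MAC,                /* filter/route result already has a MAC  */
    NH_USE_ARP_DB_MAC,            /* attached host found in ARP DB          */
    NH_SEND_TO_XSCALE_WITH_DADDR  /* XScale ARPs for pkt DAddr, fills ARP DB */
} nh_action_t;

/* Same precedence as the slide's pseudocode: NH_IP, then NH_MAC,
 * then the ARP DB result, then fall back to the packet's DAddr. */
static inline nh_action_t resolve_next_hop(bool nh_ip_valid,
                                           bool nh_mac_valid,
                                           bool arp_valid) {
    if (nh_ip_valid)
        return NH_SEND_TO_XSCALE_WITH_NH_IP;
    if (nh_mac_valid)
        return NH_USE_NH_MAC;
    if (arp_valid)
        return NH_USE_ARP_DB_MAC;
    return NH_SEND_TO_XSCALE_WITH_DADDR;
}
```

Because NH_IP is tested first, a result that still carries an unresolved NH_IP goes to the XScale even when the ARP DB already holds a mapping, which is the "10 filters, 10 ARPs" concern noted in the pseudocode comments.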

11 ARP Notes
Format of data to XScale:
  L3 (IP, ARP, …) Pkt Length (16b)
  Buffer Handle (24b)
  Stats Index (16b)
  QID (16b)
  In Port (3b)
  Plugin Tag (5b)
  Flags (8b)
  Rsv
  NH_IP[31:0] or Filter SRAM Addr or NH MAC DA[47:16] (32b)
  NH MAC DA[15:0] (16b)
  EtherType (16b)
  Unicast/MCast Bits
  Reserved (16b)
Flags (8b): Why pkt is being sent to XScale
  TTL (1b): TTL expired
  Options (1b): IP Options present
  NoRoute (1b): No matching route or filter
  NonIP (1b): Non IP Packet received
  ARP_Needed (1b): NH_IP valid, but no MAC
  NH_Invalid (1b): NH_IP AND NH_MAC both invalid
  Update_ARP_Table (1b): Put ARP result in ARP DB
  Reserved (1b): currently unused
Flags bit layout (bit 7 down to bit 0):
  Rsvd (1b) | ARP DB (1b) | NH INV (1b) | ARP (1b) | NI (1b) | NR (1b) | Opt (1b) | TTL (1b)

12 ARP Notes
NH field:
  ARP DB bit in flags indicates whether ARP result should go into the ARP DB (=1) or into the NH_MAC field of a filter or route (=0).
  If the result is to go into the ARP DB, then the upper 32 bits of the NH field are the 32 bit IP DAddr to be ARPed.
  If the result is to go into the NH_MAC field of a filter or route, then the NH field is the SRAM address of the filter or route result.
    In this case the XScale will read the address given to retrieve the NH IP Address, ARP it, and then write back the NH MAC address and reset the flag bits in the filter/route result to indicate NH_MAC is valid and NH_IP is not valid.

13 ARP Notes Aux Filters: Always Unicast
Always contain a NH_IP or a NH_MAC i.e. never uses ARP DB May still require XScale to ARP to resolve IP to MAC

14 Distribution Notes First, some notes about IXP files:
.ind: Script file for doing initialization when running in Simulation
.uc: Microcode (Assembly) source file
.c: ‘C’ source file
.ucp: Intermediate file
  Output of Preprocessor
  Input to Assembler
.uci: Intermediate file
  Created by Assembler
  Used by Assembler to create .list file
.list: executable image for a particular ME
.uof: package of one or more ME-specific .list files
  Produced by Linker
  Loaded onto a Chip
There is an issue with ONL having blades with two different versions of IXPs, 2800 and 2805, which use different binary files.

15 Distribution Notes How should we distribute our ONL NP Router ONL users? Can we do it without giving them all of our source code? And any of Intel’s or Radisys’s source code? Proposal, Give ONL Users the following: .list files for all the blocks that we provide. If possible we will remove the code from the .list files so they can’t even see it. Charlie and John would have versions of the .list files with the code for debugging purposes. A script for building their plugin.c files into .list files. A script for building a .uof file from all the .list files with inclusion of any possible plugins that they want to include. May be separate scripts for building HW version of .uof file(s) A DEBUG Only Simulation project file. This allows them to do all the debugging we currently do you just can't "Go to Source". All the initialization scripts they will need and configure them in the project file. TCAM stuff: Library: Was there an NDA for the runtime support? .txt files configuring the TCAM DBs. Do we need to remove the “code” for the IDT stuff from the list files? Some sample pktgen flows to get them started. Some sample plugin code to get them started.

16 Distribution and Usage Notes
How will they load plugins in HW? They will need a way to load, unload and reload plugins while they are running in HW. This may be doable with a full .uof file that contains everything or the scripts may have to build an individual .uof file for each plugin.

17 Distribution and Usage Notes
TCAM and Lookup
  We will NOT distribute the TCAM simulation for runtime
  We will hack up lookup to do a simple distribution of pkts:
    IP DAddr[2:0]
      0: Sent to Plugin 0
      1: Sent to Plugin 1
      2: Sent to Plugin 2
      3: Sent to Plugin 3
      4: Sent to Plugin 4
      5: Sent to QM for Output 0
      6: Sent to QM for Output 1
      7: Sent to QM for Output 2
    Each of the above has a separate Lookup Result stored in SRAM Bank 0
    Extension to use the upper bits of IP DAddr to detect Multicast and have 8 different Results to handle 8 potential Multicast results.
  This hacked up lookup will be for SIMULATION only.
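A minimal C sketch of the simulation-only distribution described above, selecting a destination from the low three bits of the IP destination address. The type and function names are hypothetical, and the address is assumed to be in host byte order; the real block would return one of the eight Lookup Results stored in SRAM Bank 0 rather than this small struct.

```c
#include <stdint.h>

typedef enum { SIM_DEST_PLUGIN, SIM_DEST_QM_OUTPUT } sim_dest_kind_t;

typedef struct {
    sim_dest_kind_t kind;
    unsigned index; /* plugin number 0-4, or output number 0-2 */
} sim_dest_t;

/* IP DAddr[2:0]: 0-4 -> Plugin 0-4, 5-7 -> QM for Output 0-2. */
static inline sim_dest_t sim_lookup(uint32_t ip_daddr) {
    unsigned sel = ip_daddr & 0x7u;
    sim_dest_t d;
    if (sel <= 4) {
        d.kind = SIM_DEST_PLUGIN;
        d.index = sel;
    } else {
        d.kind = SIM_DEST_QM_OUTPUT;
        d.index = sel - 5;
    }
    return d;
}
```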

18 ONL NP Router
[Figure: block diagram of the ONL NP Router data path.]
  Blocks: Rx (2 ME), Mux (1 ME), Parse, Lookup, Copy (3 MEs), QM (1 ME), HdrFmt (1 ME), Tx (1 ME), Stats (1 ME), FreeList Mgr (1 ME), Plugin0 to Plugin4, xScale
  Memories: TCAM, Assoc. Data ZBT-SRAM, SRAM
  Inter-block rings: NN Rings, Scratch Rings, Small SRAM Rings, Large SRAM Rings (64KW), 512W rings on the plugins, xScale (3 Rings?)
  Other paths: LD, Except Errors, Plugin to XScale Ctrl, Update & RLI Msgs
  Legend: New; Needs A Lot Of Mod.; Needs Some Mod.; Mostly Unchanged

19 Performance What is our performance target? To hit 5 Gb rate:
Minimum Ethernet frame: 76B
  64B frame + 12B InterFrame Spacing
5 Gb/sec * 1B/8b * packet/76B = 8.22 Mpkt/sec
IXP ME processing:
  1.4 GHz clock rate
  1.4 Gcycle/sec * 1 sec/8.22 Mpkt = 170 cycles per packet
  compute budget: (MEs*170)
    1 ME: 170 cycles
    2 ME: 340 cycles
    3 ME: 510 cycles
    4 ME: 680 cycles
  latency budget: (threads*170)
    1 ME: 8 threads: 1360 cycles
    2 ME: 16 threads: 2720 cycles
    3 ME: 24 threads: 4080 cycles
    4 ME: 32 threads: 5440 cycles
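The budget arithmetic on this slide can be reproduced with two small C helpers. The function names are hypothetical; the inputs (5 Gb/s, 76 bytes per minimum packet, 1.4 GHz, 8 threads per ME) are the slide's own figures.

```c
/* Packets per second on a link, given the bytes each packet occupies. */
static inline double pkts_per_sec(double link_bits_per_sec,
                                  unsigned bytes_per_pkt) {
    return link_bits_per_sec / (8.0 * bytes_per_pkt);
}

/* Cycles one ME may spend per packet to keep up with the packet rate. */
static inline double cycles_per_pkt(double clock_hz, double pkt_rate) {
    return clock_hz / pkt_rate;
}

/* Compute budget scales with the number of MEs sharing the work. */
static inline unsigned compute_budget(unsigned mes, unsigned per_pkt) {
    return mes * per_pkt;
}

/* Latency budget scales with the number of threads (8 per ME). */
static inline unsigned latency_budget(unsigned mes, unsigned per_pkt) {
    return mes * 8u * per_pkt;
}
```

With these inputs, `pkts_per_sec(5e9, 76)` is about 8.22 million and `cycles_per_pkt(1.4e9, ...)` is about 170.2, which the slide rounds to 170 before multiplying by MEs and threads.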

20 Performance Results – July 2007
Methodology:
  Set workbench to stop after each 100 pkts received
  Record Tx pkt count and cycle count at Rx pkt counts of 0, 100, 200, 300, 400, 500
  Performance metric is Tx rate between Rx pkt counts of 100 and 500.
Input Rate: 7.44 Mpkts/s
  Minimum sized UDP packets
  Ethernet Frame wire occupancy: 84 Bytes
    64 Byte Ethernet Frame
    12 Byte Ethernet Interframe Spacing
    8 Byte Ethernet Preamble
  Ethernet wire bit rate: Mb/s
Output Rates for various configurations:
  All Real Blocks:
    40 Queues (QM worst case): 3.35 Mpkts/s
      Each packet causes eviction and reload of a queue
    5 Queues (QM best case): Mpkts/s
      All queues remain resident, no evicting and reloading
  Some Stub Blocks:
    PLC: Mpkts/s
      PLC Stub uses only 5 queues, should be very similar to 5 Queues above
    QM: Mpkts/s
    QM, Mux: MPkts/s
    QM, PLC: MPkts/s
      Previous two numbers tell us that PLC is slightly faster than Mux
    QM, Mux, PLC: MPkts/s
    QM, Mux, PLC, HF: 7.35 MPkts/s
Bottleneck Order:
  QM
  MUX
  PLC

21 Performance Results – July 2007
Methodology:
  Set workbench to stop after each 100 pkts received
  Record Tx pkt count and cycle count at Rx pkt counts of 0, 100, 200, 300, 400, 500
  Performance metric is Tx rate between Rx pkt counts of 100 and 500.
  Some measurement artifacts possible due to Tx being “just about to transmit”
Input Rate: 7.44 Mpkts/s (Minimum sized UDP packets)
  Ethernet Frame wire occupancy: 84 Bytes
    64 Byte Ethernet Frame
    12 Byte Ethernet Interframe Spacing
    8 Byte Ethernet Preamble
  Ethernet wire bit rate: Gb/s
Bottleneck blocks: QM (45%), Mux (90%), PLC (97%)

Stubs             Tx MPkt/s  Tx Gbit/s  Tx %    Comments
None (40 Qs)      3.35       2.25       45      QM Worst Case
None (5 Qs)       3.81       2.56       51.25   QM Best Case
PLC               3.95       2.65       53      Uses 5 Qs (similar to above)
QM                6.75       4.54       90.75
QM, Mux           7.25       4.87       97.5
QM, PLC           6.68       4.49       89.75   PLC Slightly better than Mux
QM, Mux, PLC      7.46       5.01       100.25  HF and Tx are not bottlenecks
QM, Mux, PLC, HF  7.35       4.94       98.75   Artifact or HF Stub slower than HF
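The Tx Gbit/s column follows from the Tx MPkt/s column and the 84-byte wire occupancy of a minimum-sized frame. A small C helper, with a hypothetical name, makes the conversion explicit:

```c
/* Wire bit rate in Gbit/s for a packet rate in Mpkt/s, where each
 * packet occupies wire_bytes on the wire (frame + IFS + preamble). */
static inline double wire_gbps(double mpkts_per_sec, unsigned wire_bytes) {
    return mpkts_per_sec * wire_bytes * 8.0 / 1000.0;
}
```

For example, `wire_gbps(3.35, 84)` is about 2.25 and `wire_gbps(7.46, 84)` is about 5.01, matching the first and seventh rows. The same formula applied to the 7.44 Mpkts/s input rate gives roughly 5.0 Gbit/s. Note that this slide counts the 8-byte preamble (84 bytes per packet), whereas the target calculation on the Performance slide uses 76 bytes.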

22 Performance Results – July 2007
Multicast:

Stubs        Tx MPkt/s  Tx Gbit/s  Tx %
None (5 Qs)  3.25       2.18       43.3
QM           5.15       3.46       68.6

23 Inter Block Rings
Scratch Rings (sizes in 32b Words: 128, 256, 512, 1024)
  XScale → MUX
    3 Words per pkt
    256 Word Ring
    256/3 pkts
  PLC → XScale
  MUX → PLC
  → QM
    3 Words per pkt
    1024 Word Ring
    1024/3 Pkts
  HF → TX
    5 Words per pkt
    256/5 pkts
  → Stats
    1 Word per pkt
    256 pkts
  → Freelist Mgr
Total Scratch Size: 4KW (16KB)
Total Used in Rings: 2.5 KW

24 Inter Block Rings
SRAM Rings (sizes in 32b KW: 0.5, 1, 2, 4, 8, 16, 32, 64)
  RX → MUX
    2 Words per pkt
    64KW Ring
    32K Pkts
  PLC → Plugins (5 of them)
    3 Words per pkt
    64KW Rings
    ~21K Pkts
  Plugins → MUX (1 serving all plugins)
    3 Words per pkt
    64KW Rings
    ~21K Pkts
NN Rings (128 32b words)
  QM → HF
    1 Word per pkt
    128 Pkts
  Plugin N → Plugin N+1 (for N=1 to N=4)
    Words per pkt is plugin dependent
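The per-ring packet capacities quoted on the two Inter Block Rings slides are ring size divided by words per packet, rounded down. A one-line C helper (hypothetical name) shows the calculation:

```c
#include <stdint.h>

/* Whole packets that fit in a ring of ring_words 32-bit words when
 * each packet entry takes words_per_pkt words. */
static inline uint32_t ring_capacity_pkts(uint32_t ring_words,
                                          uint32_t words_per_pkt) {
    return ring_words / words_per_pkt;
}
```

A 64KW SRAM ring at 2 words per packet holds 32768 packets (the slide's 32K), and at 3 words per packet 21845 (the slide's ~21K). The 256-word scratch rings hold 85 packets at 3 words each and 51 at 5 words each.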

25 ONL SRAM Buffer Descriptor
Problem:
  With the use of Filters, Plugins and recycling back around for reclassification, we can end up with an arbitrary number of copies of one packet in the system at a time.
  Each copy of a packet could end up going to an output port and need a different MAC DAddr from all the other copies.
  Having one Buffer Descriptor per packet regardless of the number of copies will not be sufficient.
Solution:
  When there are multiple copies of the packet in the system, each copy will need a separate Header buffer descriptor which will contain the MAC DAddr for that copy.
  When the Copy block gets a packet that it only needs to send one copy of to QM, it will read the current reference count, and if this copy is the ONLY copy in the system, it will not prepend the Header buffer descriptor.
    SRAM buffer descriptors are the scarce resource and we want to optimize their use.
    Therefore: We do NOT want to always prepend a header buffer descriptor
  Otherwise, Copy will prepend a Header buffer descriptor to each copy going to the QM.
  Copy does NOT need to prepend a Header buffer descriptor to copies going to plugins
  Copy does NOT need to prepend a Header buffer descriptor to a copy going to the XScale
  The Header buffer descriptors will come from the same pool (freelist 0) as the PacketPayload buffer descriptors.
    There is no advantage to associating these Header buffer descriptors with small DRAM buffers.
      DRAM is not the scarce resource
      SRAM buffer descriptors are the scarce resource.
  We want to avoid getting a descriptor coming in to PLC for reclassification with the Header buffer descriptor chained in front of the payload buffer descriptor.
  Plugins and XScale should append a Header Buffer descriptor when they are sending something that has copies and that is going directly to the QM or to Mux and PLC for PassThrough.

26 ONL SRAM Buffer Descriptor
LW0  Buffer_Next (32b)
LW1  Buffer_Size (16b) | Offset (16b)
LW2  Packet_Size (16b) | Free_list 0000 (4b) | Reserved (4b) | Ref_Cnt (8b)
LW3  Stats Index (16b) | MAC DAddr_47_32 (16b)
LW4  MAC DAddr_31_00 (32b)
LW5  EtherType (16b) | Reserved (16b)
LW6  Reserved (32b)
LW7  Packet_Next (32b)
Ref_Cnt (8b): Written by Rx, Added to by Copy, Decremented by Freelist Mgr
Legend (who writes each field):
  Written by Freelist Mgr
  Written by Rx
  Written by Copy
  Written by Rx and Plugins
  Written by QM
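One way to model the eight-longword descriptor in C is as an array of 32-bit words with accessor functions, which avoids compiler-dependent bit-field ordering. This is a sketch, not the router's code: it assumes the fields of each longword are listed most-significant first, and every name (`onl_buf_desc_t`, `bd_*`) is hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t lw[8]; /* LW0..LW7, 32 bytes per descriptor */
} onl_buf_desc_t;

/* LW2: Packet_Size(16) | Free_list(4) | Reserved(4) | Ref_Cnt(8) */
static inline uint16_t bd_packet_size(const onl_buf_desc_t *d) {
    return (uint16_t)(d->lw[2] >> 16);
}
static inline uint8_t bd_free_list(const onl_buf_desc_t *d) {
    return (uint8_t)((d->lw[2] >> 12) & 0xFu);
}
static inline uint8_t bd_ref_cnt(const onl_buf_desc_t *d) {
    return (uint8_t)(d->lw[2] & 0xFFu);
}
static inline void bd_set_ref_cnt(onl_buf_desc_t *d, uint8_t n) {
    d->lw[2] = (d->lw[2] & ~0xFFu) | n;
}

/* LW3 low half holds MAC DAddr[47:32], LW4 holds MAC DAddr[31:0]. */
static inline uint64_t bd_mac_daddr(const onl_buf_desc_t *d) {
    return ((uint64_t)(d->lw[3] & 0xFFFFu) << 32) | d->lw[4];
}
static inline void bd_set_mac_daddr(onl_buf_desc_t *d, uint64_t mac) {
    d->lw[3] = (d->lw[3] & 0xFFFF0000u) | (uint32_t)((mac >> 32) & 0xFFFFu);
    d->lw[4] = (uint32_t)(mac & 0xFFFFFFFFu);
}
```

Setting the MAC address leaves the Stats Index in the upper half of LW3 untouched, and updating Ref_Cnt leaves Packet_Size and Free_list untouched, which matters because different blocks write different fields of the same longword.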

27 ONL DRAM Buffer and SRAM Buffer Descriptor
SRAM Buffer Descriptor Fields:
  Buffer_Next: ptr to next buffer in a multi-buffer packet
  Buffer_Size: number of bytes in the associated DRAM buffer
  Packet_Size: total number of bytes in the pkt
    QM (dequeue) uses this to decrement qlength
  Offset: byte offset into DRAM buffer where packet (ethernet frame) starts. From RX:
    0x180: Constant offset to start of Ethernet Hdr
    0x18E: Constant offset to start of IP/ARP/etc hdr
    However, Plugins can do ANYTHING so we cannot depend on the constant offsets.
    The following slides will, however, assume that nothing funny has happened.
  Freelist: Id of freelist that this buffer came from and should be returned to when it is freed
  Ref_Cnt: Number of copies of this buffer currently in the system
  MAC_DAddr: Ethernet MAC Destination Address that should be used for this packet
  Stats Index: Index into statistics counters that should be used for this packet
  EtherType: Ethernet Type field that should be used for this packet
  Packet_Next: ptr to next packet in the queue when this packet is queued by the QM
DRAM Buffer:
  0x000  Empty
  0x180  Ethernet Hdr
  0x18E  IP Packet
  0x800  (end of buffer)

28 ONL DRAM Buffer and SRAM Buffer Descriptor
Normal Unicast case:
  One copy of packet being sent to one output port
SRAM Buffer Descriptor Fields:
  Buffer_Next: NULL
  Buffer_Size: IP_Pkt_Length
  Packet_Size: IP_Pkt_Length
  Offset: 0x18E
  Freelist:
  Ref_Cnt:
  MAC_DAddr: <result of lookup>
  Stats Index: <from lookup result>
  EtherType: 0x0800 (IP)
  Packet_Next: <as used by QM>
DRAM Buffer:
  0x000  Empty
  0x180  Ethernet Hdr
  0x18E  IP Packet
  0x800  (end of buffer)

29 ONL DRAM Buffer and SRAM Buffer Descriptor
Multi-copy case:
  >1 copy of packet in system
  This copy going from Copy to QM to go out on an output port
[Figure: Header Buf Descriptor chained in front of Payload Buf Descriptor, each with its own DRAM buffer.]
  DRAM buffer of Header Buf Descriptor:
    0x000  Empty
    0x180  Empty
    0x18E  Empty
    0x800  (end of buffer)
  DRAM buffer of Payload Buf Descriptor:
    0x000  Empty
    0x180  Ethernet Hdr
    0x18E  IP Packet
    0x800  (end of buffer)

30 ONL DRAM Buffer and SRAM Buffer Descriptor
Multi-copy case (continued):
  >1 copy of packet in system
  This copy going from Copy to QM to go out on an output port
Header Buf Descriptor: SRAM Buffer Descriptor Fields:
  Buffer_Next: ptr to payload buf desc
  Buffer_Size: 0 (Don’t Care)
  Packet_Size: IP_Pkt_Length
  Offset: (Don’t Care)
  Freelist:
  Ref_Cnt:
  MAC_DAddr: <result of lookup>
  Stats Index: <from lookup result>
    Different copies of the same packet may actually have different Stats Indices
  EtherType: 0x0800 (IP)
  Packet_Next: <as used by QM>
[Figure: Header Buf Descriptor (DRAM buffer empty) chained in front of Payload Buf Descriptor (DRAM buffer holding Ethernet Hdr at 0x180 and IP Packet at 0x18E).]

31 ONL DRAM Buffer and SRAM Buffer Descriptor
Multi-copy case (continued):
  >1 copy of packet in system
  This copy going from Copy to QM to go out on an output port
Payload Buf Descriptor: SRAM Buffer Descriptor Fields:
  Buffer_Next: NULL
  Buffer_Size: IP_Pkt_Length
  Packet_Size: IP_Pkt_Length
  Offset: 0x18E
  Freelist:
  Ref_Cnt: <number of copies currently in system>
  MAC_DAddr: <don’t care>
  Stats Index: <should not be used>
  EtherType: <don’t care>
  Packet_Next: <should not be used>
[Figure: Header Buf Descriptor (DRAM buffer empty) chained in front of Payload Buf Descriptor (DRAM buffer holding Ethernet Hdr at 0x180 and IP Packet at 0x18E).]

32 ONL SRAM Buffer Descriptor
Rx writes:
  Buffer_size ← ethernet frame length
  Packet_size ← ethernet frame length
  Offset ← 0x180
  Freelist ← 0
Mux Block writes:
  Buffer_size ← (frame length from Rx) - 14
  Packet_size ← (frame length from Rx) - 14
  Offset ← 0x18E
  Ref_cnt ← 1
Copy Block initializes a newly allocated Hdr desc:
  Buffer_Next to point to original payload buffer
  Buffer_size ← 0 (don’t care, no one should be using this field)
  Packet_size ← IP Pkt Length (should be length from input ring)
  Offset ← 0 (don’t care, no one should be using this field)
  Stats_Index ← from lookup result
  MAC DAddr ← from lookup result (or calculated for Mcast)
  EtherType ← 0x0800 IP
    If Copy is making copies then we must have done a classification, so it must have been an IP packet
  Packet_Next ← 0
The QM will now be using the IP Pkt length for its qlength increments and decrements.

33 SRAM Usage What will be using SRAM? Buffer descriptors
Buffer descriptors
  Current MR supports 229,376 buffers
  32 Bytes per SRAM buffer descriptor
  7 MBytes
Queue Descriptors
  Current MR supports queues
  16 Bytes per Queue Descriptor
  1 MByte
Queue Parameters
  16 Bytes per Queue Params (actually only 12 used in SRAM)
QM Scheduling structure:
  Current MR supports batch buffers per QM ME
  44 Bytes per batch buffer
  Bytes
QM Port Rates
  4 Bytes per port
Plugin “scratch” memory
  How much per plugin?
Large inter-block rings
  Rx → Mux
  → Plugins
  Plugins →
Stats/Counters
  Currently 64K sets, 16 bytes per set: 1 MByte
Lookup Results

34 SRAM Bank Allocation SRAM Banks:
4 MB total, 2MB per NPU
Same interface/bus as TCAM
Bank1-3
  8 MB each
Criteria for how SRAM banks should be allocated?
  Size:
  SRAM Bandwidth:
    How many SRAM accesses per packet are needed for the various SRAM uses?
  QM needs buffer desc and queue desc in same bank

35 Proposed SRAM Bank Allocation
TCAM Lookup Results
SRAM Bank 1 (2.5MB/8MB):
  QM Queue Params (1MB)
  QM Scheduling Struct (0.5 MB)
  QM Port Rates (20B)
  Large Inter-Block Rings (1MB)
    SRAM Rings are of sizes (in Words): 0.5K, 1K, 2K, 4K, 8K, 16K, 32K, 64K
    Rx → Mux (2 Words per pkt): 64KW (32K pkts): 128KB
    → Plugin (3 Words per pkt): 64KW each (21K Pkts each): 640KB
    → Plugin (3 Words per pkt): 64KW (21K Pkts): 256KB
SRAM Bank 2 (8MB/8MB):
  Buffer Descriptors (7MB)
  Queue Descriptors (1MB)
SRAM Bank 3 (6MB/8MB):
  Stats Counters (1MB)
  Global Registers (256 * 4B)
  Plugin “scratch” memory (5MB, 1MB per plugin)

36 Queues and QIDs Assigned Queues vs. Datagram Queues
Assigned Queues vs. Datagram Queues
  A flow or set of flows can be assigned to a specific Queue by assigning a specific QID to its/their filter(s) and/or route(s)
  A flow can be assigned to use a Datagram queue by assigning QID=0 to its filter(s) and/or route(s)
    There are 64 datagram queues
    If it sees a lookup result with a QID=0, the PLC block will calculate the datagram QID for the result based on the following hash function:
      DG QID = SA[9:8] SA[6:5] DA[6:5]
      Concatenate IP src addr bits 9 and 8, IP src addr bits 6 and 5, IP dst addr bits 6 and 5
Who/What assigns QIDs to flows?
  The ONL User can assign QIDs to flows or sets of flows using the RLI
  The XScale daemon can assign QIDs to flows on behalf of the User/RLI if so requested:
    User indicates that they want an assigned QID but they want the system to pick it for them and report it back to them.
  The ONL User indicates that they want to use a datagram queue and the data path (Copy block) calculates the QID using a defined hash fct
Using the same QID for all copies of a multicast does not work
  The QM does not partition QIDs across ports
  We cannot assume that the User will partition the QIDs, so we will have to enforce a partitioning.
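The datagram hash above concatenates three 2-bit fields into a 6-bit value, which is why there are 64 datagram queues. A C sketch follows; the function name is hypothetical and the addresses are assumed to be in host byte order, so that bit 0 is the least-significant bit of the IP address.

```c
#include <stdint.h>

/* DG QID = SA[9:8] . SA[6:5] . DA[6:5]  (6 bits, values 0-63) */
static inline uint16_t datagram_qid(uint32_t ip_saddr, uint32_t ip_daddr) {
    uint16_t sa_9_8 = (uint16_t)((ip_saddr >> 8) & 0x3u);
    uint16_t sa_6_5 = (uint16_t)((ip_saddr >> 5) & 0x3u);
    uint16_t da_6_5 = (uint16_t)((ip_daddr >> 5) & 0x3u);
    return (uint16_t)((sa_9_8 << 4) | (sa_6_5 << 2) | da_6_5);
}
```

Only six address bits feed the hash, so many flows share each datagram queue; that is the intended trade-off against assigned queues, which isolate a flow or set of flows.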

37 Queues and QIDs (continued)
Proposed partitioning of QIDs:
  QID[15:13]: Port Number 0-4 (numbered 1-5)
    Copy block will add these bits
  QID[12:0]: per port queues
    8128 Reserved queues per port
    64 datagram queues per port
      yyy xx xxxx: Datagram queues for port <yyy>
    QIDs 64-8191: per port Reserved Queues
    QIDs 0-63: per port Datagram Queues
With this partitioning, only 13 bits of the QID should be made available to the ONL User.
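The proposed partitioning can be expressed as a few C helpers that pack and unpack the 16-bit QID. The names are hypothetical. The sketch takes the raw 3-bit port field as given, since the slide mentions both 0-4 and 1-5 and the mapping from port number to field value is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define QID_PORT_SHIFT     13
#define QID_PER_PORT_MASK  0x1FFFu /* QID[12:0], 8192 queues per port */
#define QID_NUM_DATAGRAM   64u     /* per-port QIDs 0-63              */

/* Copy block adds the port bits to the user-visible 13-bit QID. */
static inline uint16_t make_qid(unsigned port_field, uint16_t per_port_qid) {
    return (uint16_t)(((port_field & 0x7u) << QID_PORT_SHIFT) |
                      (per_port_qid & QID_PER_PORT_MASK));
}

static inline unsigned qid_port_field(uint16_t qid) {
    return (unsigned)(qid >> QID_PORT_SHIFT);
}

static inline uint16_t qid_per_port(uint16_t qid) {
    return (uint16_t)(qid & QID_PER_PORT_MASK);
}

static inline bool qid_is_datagram(uint16_t qid) {
    return qid_per_port(qid) < QID_NUM_DATAGRAM;
}
```

Masking the user-supplied value to 13 bits is what enforces the partitioning: a user cannot select another port's queues by setting the upper bits.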

38 Lookups How will lookups be structured?
Three Databases:
  Route Lookup: Containing Unicast and Multicast Entries
    Unicast:
      Port: Can be wildcarded
      Longest Prefix Match on DAddr
      Routes should be sorted in the DB with longest prefixes first.
    Multicast:
      Port: Can be wildcarded?
      Exact Match on DAddr
      Longest Prefix Match on SAddr
      Routes should be sorted in the DB with longest prefixes first.
  Primary Filter
    Filters should be sorted in the DB with higher priority filters first
  Auxiliary Filter
Priority between Primary Filter and Route Lookup
  A priority will be stored with each Primary Filter
  A priority will be assigned to RLs (all routes have same priority)
  PF priority and RL priority compared after result is retrieved.
    One of them will be selected based on this priority comparison.
Auxiliary Filters:
  If matched, cause a copy of packet to be sent out according to the Aux Filter’s result.

39 Route Lookup
Route Lookup Key (72b)
  Port (3b): Can be a wildcard (for Unicast, probably not for Multicast)
    Value of 111b in Port field can be used to denote a packet that originated from the XScale
    Value of 110b in Port field can be used to denote a packet that originated from a Plugin
    Ports numbered 0-4
  PluginTag (5b): Can be a wildcard (for Unicast, probably not for Multicast)
    Plugins numbered 0-4
  DAddr (32b)
    Prefixed for Unicast
    Exact Match for Multicast
  SAddr (32b)
    Unicast entries always have this and its mask set to 0
    Prefixed for Multicast
Route Lookup: Result (96b)
  Unicast/Multicast Fields (determined by IP_MCast_Valid bit (1: MCast, 0: Unicast)) (13b)
    IP_MCast Valid (1b)
    MulticastFields (12b)
      Plugin/Port Selection Bit (1b):
        0: Send pkt to both Port and Plugin. Does it get the MCast CopyVector?
        1: Send pkt to all Plugin bits set, include MCast CopyVector in data going to plugins
      MCast CopyVector (11b)
        One bit for each of the 5 ports and 5 plugins and one bit for the XScale; to drop a MCast, set MCast CopyVector to all 0’s
    UnicastFields (8b)
      Drop Bit (1b)
        0: handle normally
        1: Drop Unicast pkt
      Plugin/Port Selection Bit (1b)
        0: Send packet to port indicated by Unicast Output Port field
        1: Send packet to plugin indicated by Unicast Output Plugin field. Unicast Output Port, QID, Stats Index, and NH fields also get sent to plugin
      Unicast Output Port (3b): Port or XScale
        0: Port0, 1: Port1, 2: Port2, 3: Port3, 4: Port4
      Unicast Output Plugin (3b):
        0: Plugin0, 1: Plugin1, 2: Plugin2, 3: Plugin3, 4: Plugin4
        5: XScale (treated like a plugin)
  QID (16b)
  Stats Index (16b)
  NH_IP/NH_MAC (48b): At most one of NH_IP or NH_MAC should be valid
  Valid Bits (3b): At most one of the following three bits should be set
    IP_MCast Valid (1b) (Also included above)
    NH_IP_Valid (1b)
    NH_MAC_Valid (1b)

40 Lookup Key and Results Formats
140 Bit Key:
  IP DAddr (32b) | IP SAddr (32b) | P Tag (5b) | P (3b) | Proto (8b) | DPort (16b) | SPort (16b) | Exceptions (16b) | TCP Flags (12b)
  RL uses: IP DAddr, IP SAddr, P Tag, P
  PF and AF use: the full key
32 Bit Result in TCAM Assoc. Data SRAM:
  PF: Prio (8b) | D (1b) | H (1b) | MH (1b) | Address (21b)
  AF: Res (8b) | D (1b) | H (1b) | MH (1b) | Address (21b)
  RL: Res (8b) | D (1b) | H (1b) | MH (1b) | Address (21b)
  TCAM Ctrl Bits: D: Done, H: HIT, MH: Multi-Hit
96 Bit Result in QDR SRAM Bank0:
  PF: V (4b) | UCast/MCast (12b) | QID (16b) | Stats Index (16b) | NH_MAC (48b), or NH_IP (32b) + Res (16b)
  AF: V (4b) | SB (2b) | Res (2b) | UniCast (8b) | QID (16b) | Stats Index (16b) | NH_MAC (48b), or NH_IP (32b) + Res (16b)
  RL: V (4b) | UCast/MCast (12b) | QID (16b) | Stats Index (16b) | NH_MAC (48b), or NH_IP (32b) + Res (16b)
V (4b):
  Entry Valid (1b) | NH IP Valid (1b) | NH MAC Valid (1b) | IP MC Valid (1b)
UCast/MCast (12b):
  If IP MC Valid = 0: Reserved (4b) | D (1b) | PPS (1b) | UCast Out Port (3b) | UCast Out Plugin (3b)
  If IP MC Valid = 1: PPS (1b) | Multicast Copy Vector (11b)

41 Lookup Key and Results Formats
Multicast Copy Vector (IP MCV == 1)
  PPS (1b) | Multicast Copy Vector (11b)
  Multicast Copy Vector (11b): Port Bits (5b) | Plugin Bits (5b) | Xsc (1b)
    PT4 (1b) | PT3 (1b) | PT2 (1b) | PT1 (1b) | PT0 (1b) | PL4 (1b) | PL3 (1b) | PL2 (1b) | PL1 (1b) | PL0 (1b) | Xsc (1b)

42 Lookup Key and Results Formats
140 Bit Key (RL, PF and AF):
  IP DAddr (32b), IP SAddr (32b), P Tag (5b), P (3b), Proto (8b), DPort (16b), SPort (16b), Exceptions (16b), TCP Flags (12b)
32 Bit Result in TCAM Assoc. Data SRAM:
  PF: Prio (8b), D (1b), H (1b), MH (1b), Address (21b)
  AF: Res (8b), D (1b), H (1b), MH (1b), Address (21b)
  RL: Res (8b), D (1b), H (1b), MH (1b), Address (21b)
  ARP: Res (8b), D (1b), H (1b), MH (1b), Address (21b)
96 Bit Result in QDR SRAM Bank0:
  PF: V (4b), UCast/MCast (12b), QID (16b), Stats Index (16b), NH_MAC (48b) or NH_IP (32b) + Res (16b)
  AF: V (4b), SB (2b), Res (2b), UniCast (8b), QID (16b), Stats Index (16b), NH_MAC (48b) or NH_IP (32b) + Res (16b)
  RL: V (4b), UCast/MCast (12b), QID (16b), Stats Index (16b), NH_MAC (48b) or NH_IP (32b) + Res (16b)

43 Exception Bits in Lookup Key
140 Bit Key (RL, PF and AF):
  IP DAddr (32b), IP SAddr (32b), P Tag (5b), P (3b), Proto (8b), DPort (16b), SPort (16b), Exceptions (16b), TCP Flags (12b)
Exceptions (16b): Reserved (12b), Non-IP (1b), ARP (1b), IP Opt (1b), TTL (1b)
Exception Bits:
  TTL: TTL has expired. It was 0 or 1 on arriving packet
  IP Opt: IP Packet contained Options
  ARP: Ethertype field in ethernet header was ARP
  Non-IP: Ethertype field in ethernet header was NOT IP
NOTE: An ARP packet will have ARP bit and Non-IP bit set
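As a rough illustration of how the four exception bits could be derived for an arriving packet, the sketch below builds the low bits of the Exceptions field. The names are hypothetical, and the bit positions (TTL in bit 0 up to Non-IP in bit 3) are an assumption based on the field order shown on the slide. The EtherType constants 0x0800 (IPv4) and 0x0806 (ARP) are the standard values.

```python
from enum import IntFlag

ETHERTYPE_IP = 0x0800
ETHERTYPE_ARP = 0x0806


class ExceptionBits(IntFlag):
    # ASSUMED bit positions: the slide lists Reserved, Non-IP, ARP, IP Opt, TTL
    # left to right, read here as most significant bit first.
    TTL = 1 << 0
    IP_OPT = 1 << 1
    ARP = 1 << 2
    NON_IP = 1 << 3


def exception_bits(ethertype: int, ttl: int = 64,
                   has_ip_options: bool = False) -> ExceptionBits:
    """Return the exception bits for one packet (hypothetical helper)."""
    bits = ExceptionBits(0)
    if ethertype != ETHERTYPE_IP:
        bits |= ExceptionBits.NON_IP
        if ethertype == ETHERTYPE_ARP:
            # An ARP packet carries both the ARP bit and the Non-IP bit.
            bits |= ExceptionBits.ARP
        return bits
    if ttl <= 1:                      # TTL was 0 or 1 on the arriving packet
        bits |= ExceptionBits.TTL
    if has_ip_options:
        bits |= ExceptionBits.IP_OPT
    return bits
```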

44 Multicast Copy Vector (11b)
UCast/MCast Bits
Format of the UCast/MCast fields in Ring data going to XScale and Plugins:
Multicast (IP MCV = 1): Reserved (3b), IP MCV (1b), PPS (1b), Multicast Copy Vector (11b)
  Multicast Copy Vector (11b): Port Bits (5b), Plugin Bits (5b), Xsc (1b)
    Port Bits: PT4 (1b), PT3 (1b), PT2 (1b), PT1 (1b), PT0 (1b)
    Plugin Bits: PL4 (1b), PL3 (1b), PL2 (1b), PL1 (1b), PL0 (1b)
    Xsc (1b)
Unicast (IP MCV = 0): Reserved (3b), IP MCV (1b), D (1b), PPS (1b), UCast Out Port (3b), UCast Out Plugin (3b), Reserved (4b)

45 Primary Filter Primary Filter Lookup Key (140b)
Port (3b): Can be a wildcard (for Unicast, probably not for Multicast)
  Value of 111b in Port field to denote coming from the XScale
  Ports numbered 0-4
PluginTag (5b): Can be a wildcard (for Unicast, probably not for Multicast)
  Plugins numbered 0-4
DAddr (32b)
SAddr (32b)
Protocol (8b)
DPort (16b)
Sport (16b)
TCP Flags (12b)
Exception Bits (16b): Allow for directing of packets based on defined exceptions
Primary Filter Result (104b)
  Unicast/Multicast Fields (13b), determined by IP_MCast_Valid bit (1: MCast, 0: Unicast)
    IP_MCast Valid (1b)
    MulticastFields (12b)
      Plugin/Port Selection Bit (1b):
        0: Send pkt to ports and plugins indicated by MCast Copy Vector.
        1: Send pkt to plugin(s) indicated by MCast Copy Vector but not ports, and send Plugin(s) the MulticastFields bits
      MCast CopyVector (11b): One bit for each of the 5 ports and 5 plugins and one bit for the XScale. To drop a MCast, set MCast CopyVector to all 0’s
    UnicastFields (8b)
      Drop Bit (1b):
        0: handle normally
        1: Drop pkt
      Plugin/Port Selection Bit (1b):
        0: Send packet to port indicated by Unicast Output Port field
        1: Send packet to plugin indicated by Unicast Output Plugin field. Unicast Output Port, QID, Stats Index, and NH fields also get sent to plugin
      Unicast Output Port (3b): Port or XScale
        0: Port0, 1: Port1, 2: Port2, 3: Port3, 4: Port4
      Unicast Output Plugin (3b):
        0: Plugin0, 1: Plugin1, 2: Plugin2, 3: Plugin3, 4: Plugin4, 5: XScale (treated like a plugin)
  QID (16b)
  Stats Index (16b)
  NH IP(32b)/MAC(48b) (48b): At most one of NH_IP or NH_MAC should be valid
  Valid Bits (3b): At most one of the following three bits should be set
    IP_MCast Valid (1b) (also included above)
    NH IP Valid (1b)
    NH MAC Valid (1b)
  Priority (8b)

46 Auxiliary Filter Auxiliary Filter Lookup Key (140b)
Port (3b): Can be a wildcard (for Unicast, probably not for Multicast)
  Value of 111b in Port field to denote coming from the XScale
  Ports numbered 0-4
PluginTag (5b): Can be a wildcard (for Unicast, probably not for Multicast)
  Plugins numbered 0-4
DAddr (32b)
SAddr (32b)
Protocol (8b)
DPort (16b)
Sport (16b)
TCP Flags (12b)
Exception Bits (16b)
  Allow for directing of packets based on defined exceptions
  Can be wildcarded.
Auxiliary Filter Lookup Result (93b)
  Unicast Fields (8b): (No Multicast fields)
    Drop Bit (1b) (Should never actually be set by control software, but kept here for symmetry with other Unicast Fields)
      0: handle normally
      1: Drop pkt
    Plugin/Port Selection Bit (1b):
      0: Send packet to port indicated by Unicast Output Port field
      1: Send packet to plugin indicated by Unicast Output Plugin field. Unicast Output Port, QID, Stats Index, and NH fields also get sent to plugin
    Unicast Output Port (3b): Port or XScale
      0: Port0, 1: Port1, 2: Port2, 3: Port3, 4: Port4
    Unicast Output Plugin (3b):
      0: Plugin0, 1: Plugin1, 2: Plugin2, 3: Plugin3, 4: Plugin4, 5: XScale
  QID (16b)
  Stats Index (16b)
  NH IP(32b)/MAC(48b) (48b): At most one of NH_IP or NH_MAC should be valid
  Valid Bits (3b): At most one of the following three bits should be set
    NH IP Valid (1b)
    NH MAC Valid (1b)
    IP_MCast Valid (1b): Should always be 0 for AF Result
  Sampling bits (2b): For Aux Filters only
    00: “Sample All”
    01: Use Random Number generator 1
    10: Use Random Number generator 2
    11: Use Random Number generator 3

47 TCAM Operations for Lookups
Five TCAM Operations of interest:
  Lookup (Direct): 1 DB, 1 Result
  Multi-Hit Lookup (MHL) (Direct): 1 DB, <= 8 Results
  Simultaneous Multi-Database Lookup (SMDL) (Direct): 2 DB, 1 Result Each
    DBs must be consecutive!
    Care must be given when assigning segments to DBs that use this operation. There must be a clean separation of even and odd DBs and segments.
  Multi-Database Lookup (MDL) (Indirect): <= 8 DB, 1 Result Each
  Simultaneous Multi-Database Lookup (SMDL) (Indirect)
    Functionally same as Direct version but key presentation and DB selection are different.
    DBs need not be consecutive.

48 Lookups: Proposed Design
Use SRAM Bank 0 (2 MB per NPU) for all Results
  B0 Byte Address Range: 0x – 0x3FFFFF (22 bits)
  B0 Word Address Range: 0x – 0x3FFFFC (20 bits, two trailing 0’s)
Use 32-bit Associated Data SRAM result for Address of actual Result:
  Done: 1b
  Hit: 1b
  MHit: 1b
  Priority: 8b (present for Primary Filters; for RL and Aux Filters should be 0)
  SRAM B0 Word Address: 21b
  1 spare bit
Use Multi-Database Lookup (MDL) Indirect for searching all 3 DBs
  Order of fields in Key is important.
  Each thread will need one TCAM context
Route DB:
  Lookup Size: 68b (3 32b words transferred across QDR intf)
  Core Size: 72b
  AD Result Size: 32b
  SRAM B0 Result Size: 78b (3 Words)
Primary DB:
  Lookup Size: 136b (5 32b words transferred across QDR intf)
  Core Size: 144b
  SRAM B0 Result Size: 82b (3 Words)
    Priority not included in SRAM B0 result because it is in AD result
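A minimal sketch of unpacking the 32-bit Associated Data result follows. The field positions are an assumption: they follow the order drawn on the "Lookup Key and Results Formats" slide (Prio 8b, D, H, MH, Address 21b, most significant bit first), which fills all 32 bits and leaves no spare bit. The class and function names are hypothetical.

```python
from typing import NamedTuple


class AdResult(NamedTuple):
    priority: int   # 8b, meaningful for Primary Filters only
    done: bool      # D: TCAM operation finished
    hit: bool       # H: the database matched
    multi_hit: bool # MH: more than one entry matched
    address: int    # 21b address field of the full result in SRAM Bank 0


def decode_ad_result(word: int) -> AdResult:
    """Unpack one 32-bit AD SRAM result word (ASSUMED bit layout)."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("AD result must be a 32-bit word")
    return AdResult(
        priority=(word >> 24) & 0xFF,
        done=bool((word >> 23) & 1),
        hit=bool((word >> 22) & 1),
        multi_hit=bool((word >> 21) & 1),
        address=word & 0x1FFFFF,
    )
```

A lookup block would check `done` first and only then trust `hit` and `address`, which matches the polling loop in the PLC Lookup pseudocode.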

49 Block Interfaces The next set of slides show the block interfaces
These slides are still very much a work in progress

50 ONL NP Router
[Block diagram: Rx (2 ME), Mux (1 ME), Parse, Lookup, Copy (3 MEs), QM (1 ME), HdrFmt (1 ME), Tx (1 ME), Stats (1 ME), FreeList Mgr (1 ME), Plugin0-Plugin4 and the xScale, connected by NN rings, SRAM rings and scratch rings (64KW each); TCAM with Assoc. Data ZBT-SRAM.]

51 ONL NP Router
[Block diagram as on slide 50.]
Ring data format highlighted on this slide:
  Buf Handle (32b)
  InPort (4b), Reserved (12b), Eth. Frame Len (16b)

52 ONL NP Router
[Block diagram as on slide 50.]
Ring data format highlighted on this slide:
  Rsv (4b), Out Port (4b), Buffer Handle (24b)
  L3 (IP, ARP, …) Pkt Length (16b), QID (16b)
  Plugin Tag (5b), In Port (3b), Flags (8b), Stats Index (16b)
Flags:
  Src: Source (2b): 00: Rx, 01: XScale, 10: Plugin, 11: Undefined
  PT (1b): PassThrough(1)/Classify(0)
  Reserved (5b)
  Flags layout as drawn, left to right: Reserved (5b), Src (2b), PT (1b)

53 ONL NP Router
[Block diagram as on slide 50.]
QM will not do any Stats Operations so it does not need the Stats Index. But the QM code is nasty enough that it will not be easy to change the format for the input. We will attack that change when we do other optimizations for QM.
Ring data format highlighted on this slide:
  Rsv (8b), Buffer Handle (24b)
  Rsv (4b), Out Port (4b), Rsv (8b), QID (16b)
  L3 (IP, ARP, …) Pkt Length (16b), Reserved (16b)

54 ONL NP Router
[Block diagram as on slide 50.]
Ring data format highlighted on this slide:
  Buffer Handle (24b), Rsv (3b), Port (4b), V (1b)

55 ONL NP Router
[Block diagram as on slide 50.]
Ring data format highlighted on this slide:
  Buffer Handle (24b), Rsv (3b), Port (4b), V (1b)
  Ethernet DA[47-16] (32b)
  Ethernet DA[15-0] (16b), Ethernet SA[47-32] (16b)
  Ethernet SA[31-0] (32b)
  Ethernet Type (16b), Reserved (16b)

56 ONL NP Router
[Block diagram as on slide 50.]
Flags (8b): Why pkt is being sent to Plugin
  TTL (1b): TTL expired
  Options (1b): IP Options present
  NoRoute (1b): No matching route or filter
  NonIP (1b): Non IP Packet received
  ARP_Needed (1b): NH_IP valid, but no MAC
  NH_Invalid (1b): NH_IP AND NH_MAC both invalid
  Reserved (2b): currently unused
MC (1b):
  1: Multiple copies of this pkt exist in the system
  0: This is the only copy of pkt
Ring data format carrying Flags:
  Rsv (8b), Buffer Handle (24b)
  L3 (IP, ARP, …) Pkt Length (16b), QID (16b)
  Plugin Tag (5b), In Port (3b), Flags (8b), Stats Index (16b)
  NH MAC DA[47:16] (32b)
  NH MAC DA[15:0] (16b), EtherType (16b)
  Reserved (16b), Unicast/MCast Bits (16b)
Ring data format carrying MC and Out Port:
  MC (1b), Rsv (3b), Out Port (4b), Buffer Handle (24b)
  L3 (IP, ARP, …) Pkt Length (16b), QID (16b)
  Plugin Tag (5b), In Port (3b), Rsv (8b), Stats Index (16b)
  NH MAC DA[47:16] (32b)
  NH MAC DA[15:0] (16b), EtherType (16b)
  Reserved (16b), Unicast/MCast bits (16b)

57 ONL NP Router
[Block diagram as on slide 50.]
Ring data format highlighted on this slide:
  Rsv (4b), Out Port (4b), Buffer Handle (24b)
  L3 (IP, ARP, …) Pkt Length (16b), QID (16b)
  Plugin Tag (5b), In Port (3b), Flags (8b), Stats Index (16b)
Flags:
  PT (1b): PassThrough(1)/Classify(0)
  Reserved (7b)
  Flags layout as drawn, left to right: Reserved (7b), PT (1b)

58 ONL NP Router
[Block diagram as on slide 50.]
Flags (8b): Why pkt is being sent to XScale
  TTL (1b): TTL expired
  Options (1b): IP Options present
  NoRoute (1b): No matching route or filter
  NonIP (1b): Non IP Packet received
  ARP_Needed (1b): NH_IP valid, but no MAC
  NH_Invalid (1b): NH_IP AND NH_MAC both invalid
  ARP_DB_Update (1b): Update ARP DB with result
  Reserved (1b): currently unused
  Flags layout as drawn, left to right: Rsvd (1b), ARP DB (1b), NH INV (1b), ARP (1b), NI (1b), NR (1b), Opt (1b), TTL (1b)
Ring data fields highlighted on this slide:
  Buffer Handle (24b), L3 (IP, ARP, …) Pkt Length (16b)
  QID (16b), Stats Index (16b)
  Plugin Tag (5b), In Port (3b), Flags (8b), Rsv
  NH MAC DA[47:16] (32b)
  NH MAC DA[15:0] (16b), EtherType (16b)
  Unicast/MCast Bits, Reserved (16b)

59 ONL NP Router
[Block diagram as on slide 50.]
Flags:
  PassThrough/Classify (1b)
  Reserved (7b)
Ring data fields highlighted on this slide:
  Buffer Handle (24b), L3 (IP, ARP, …) Pkt Length (16b)
  QID (16b), Stats Index (16b)
  Plugin Tag (5b), In Port (3b), Flags (8b), Rsv (4b), Out

60 ONL NP Router
[Block diagram as on slide 50.]
Ring data format highlighted on this slide:
  Opcode (4b), Data (12b), Stats Index (16b)

61 ONL NP Router
[Block diagram as on slide 50.]
Ring data format highlighted on this slide:
  Buffer Handle (24b), Reserved (8b)

62 Extra Slides Everything after this is either OLD or is just extra support data for me to use.

63 ONL NP Router
[Block diagram: xScale, Parse, Lookup, Copy, QM and Plugins, with TCAM (Assoc. Data ZBT-SRAM) and SRAM.]
Input Data
  Buffer Handle
  In Plugin
  In Port
  Out Port
  Flags
    Source (3b): Rx/XScale/Plugin
    PassThrough/Classify (1b)
    Reserved (4b)
  QID
  Frame Length
  Stats Index
Control Flags
  PassThrough/Reclassify
Key (136b)
  Port/Plugin (4b): 0-4: Port, 5-9: Plugin, 15: XScale
  DAddr (32b)
  SAddr (32b)
  Protocol (8b)
  DPort (16b)
  Sport (16b)
  TCP Flags (12b)
  Exception Bits (16b): TTL Expired, IP Options present, No Route
Primary Result
  Valid (1b)
  CopyVector (10b)
  NH IP/MAC (48b)
  QID (16b)
  LD (1b): Send to XScale
  Drop (1b): Drop pkt
  Valid Bits (3b): NH IP Valid (1b), NH MAC Valid (1b), IP_MCast Valid (1b)
Auxiliary Result
  Valid (1b)
  CopyVector (10b)
  NH IP/MAC (48b)
  QID (16b)
  LD (1b): Send to XScale
  Drop (1b): Drop pkt
  NH IP Valid (1b), NH MAC Valid (1b), IP_MCast Valid (1b)
  Sampling bits (2b)

64 Lookup Results
Results of a lookup could be:
1 PF/RL Result:
  IP Unicast: 1 packet sent to a Port
  Plugin Unicast: 1 packet sent to a Plugin
  Unicast with Plugin Copies: 0 or 1 packet sent to a port, 1-5 copies sent to plugin(s)
  IP Multicast: 0-10 copies sent, 1 to each of 5 ports and one to each of 5 plugins
1 Aux Filter Result:
  0 or 1 copy sent to a Port
  1-5 copies sent to plugins

65 PLC
Main() {
  If (PassThrough) {
    Copy()
  } Else {
    Parse()
    if (!Drop) {
      Lookup()

66 PLC
Lookup() {
  write KEY to TCAM
  use timestamp delay to wait appropriate time
  while !DoneBit   // DONE Bit BUG Fix requires reading just first word
    read 1 word from Results Mailbox
    check DoneBit
  done
  read words 2 and 3 from Results Mailbox
  If (PrimaryFilter and RouteLookup results HIT) {
    compare priorities
    PrimaryResult.Valid <- TRUE
    store higher priority result as Primary Result (read result from SRAM Bank0)
  } else if (PrimaryFilter results HIT) {
    PrimaryResults.* <- PrimaryFilter.* (read result from SRAM Bank0)
  } else if (RouteLookup results HIT) {
    PrimaryResults.* <- RouteLookup.* (read result from SRAM Bank0)
  }
  if (AuxiliaryFilter result HIT) {
    store result as Auxiliary Result (read result from SRAM Bank0)
    mark Auxiliary Result VALID
  }
}

67 PLC
Copy() {
  currentRefCnt <- Read(Buffer Descriptor Ref Cnt)
  copyCount <- 0
  outputData.bufferHandle <- inputData.bufferHandle
  outputData.QID <- inputData.QID
  outputData.frameLength <- inputData.frameLength
  outputData.statsIndex <- inputData.statsIndex
  if (PassThrough) {
    // It came from either XScale or Plugin, process inputData
    copyCount <- 1
    if (inputData.outPort == XScale) {
      // Do we need to include any additional flags when sending to XScale?
      outputData.outPort <- inputData.outPort
      outputData.Flags <- inputData.Flags
      outputData.inPort <- inputData.inPort
      outputData.Plugin <- inputData.Plugin
      // Packets to XScale do not (we think) need additional Header buf desc.
      sendToXScale()
    }
    if (inputData.outPort == {Port}) {
      // Pass Through pkt should already have MAC DAddr in buffer desc.
      // Pass Through pkt should not need any additional Header buf desc.
      sendToQM()
    }
    if (inputData.outPort == {Plugin}) {
      // Packets to Plugins do not need additional Header buf desc.
      sendToPlugin(Plugin#)
    }
    return
  }

68 PLC
  else { // Process Lookup Results
    // PrimaryResult is either Primary Filter or Route Lookup, depending on Priority
    if (PrimaryResult.Valid == TRUE) {
      if (PrimaryResult.IP_MCastValid == TRUE) {
        IP_MCast_Daddr = read DRAM
        MacDAddr = calculateMCast(IP_MCast_Daddr)
      } else { // Unicast
        if (countPorts(PrimaryResult.copyVector) > 1) { ILLEGAL }
        if (PrimaryResult.NH_Mac_Valid == TRUE) {
          MacDAddr = PrimaryResult.NH_Address
        }
      }
      copyCount = copyCount + countOnes(PrimaryResult.copyVector);
    }
    if (AuxiliaryResult.Valid == TRUE) {
      if (countPorts(AuxiliaryResult.copyVector) > 1) { ILLEGAL }
      copyCount = copyCount + countOnes(AuxiliaryResult.copyVector);
    }
    update reference counter in pkt buffer descriptor
    for each copy {
      if ((copy is going to QM) and ((copyCount + currentRefCnt) > 1)) {
        Add header SRAM buffer descriptor and header DRAM buffer
        sendCopy(header Buffer Descriptor)
      } else {
        sendCopy(Pkt Buffer Descriptor)
      }
    }
  }
}

69 ONL NP Router
[Block diagram as on slide 50, with each block marked by how much change it needs: Mostly Unchanged, Needs Some Mod., Needs A Lot Of Mod., or New.]

70 ONL NP Router
[Block diagram: Rx (2 ME), Mux (1 ME), Parse, Lookup, Copy (3 MEs), Queue Manager (1 ME), HdrFmt (1 ME), Tx (2 ME), with TCAM and SRAM.]

71 ONL NP Router
[Block diagram: Rx (2 ME), Mux (1 ME), Parse, Lookup, Copy (3 MEs), Queue Manager (1 ME), HdrFmt (1 ME), Tx (2 ME), with TCAM and SRAM.]
Interface data formats shown on this slide:
  Buf Handle (32b), Port (8b), Reserved, Eth. Frame Len (16b)
  Buf Handle (24b), Frm Offset (16b), Frm Length (16b), Port (8)
  Frame Length (16b), Buffer Handle (32b), Stats Index (16b), QID (20b), Rsv (4b), Port
  Buffer Handle (24b), Rsv (3b), Port (4b), V (1b)
  Buffer Handle (24b), Rsv (3b), Port (4b), V (1b)

72 ONL NP Router
Parse, Lookup, PHF&Copy (3 MEs)
Parse
  Do IP Router checks
  Extract lookup key
Lookup
  Perform lookups: potentially three lookups
    Route Lookup
    Primary Filter lookup
    Auxiliary Filter lookup
Copy
  Port: Identifies Source MAC Addr
    Write it to buffer descriptor or let HF determine it via port?
  Unicast:
    Valid MAC: Write MAC Addr to Buffer descriptor and queue pkt
    No Valid MAC: Prepare pkt to be sent to XScale for ARP processing
  Multicast:
    Calculate Ethernet multicast Dst MAC Addr: Fct(IP Multicast Dst Addr)
    Write Dst MAC Addr to buf desc. Same for all copies!
    For each bit set in copy bit vector: Queue a packet to port represented by bit in bit vector.
    Reference Count in buffer desc.
Interface data formats shown on this slide:
  Buf Handle (24b), Frm Offset (16b), Frm Length (16b), Port (8)
  Frame Length (16b), Buffer Handle (32b), Stats Index (16b), QID (20b), Rsv (4b), Port
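The slide leaves the multicast address function as "Fct(IP Multicast Dst Addr)". A plausible reading is the standard IPv4 mapping from RFC 1112, where the low 23 bits of the group address are placed under the fixed prefix 01:00:5e. The sketch below assumes that mapping is what the Copy block intends; the function name is hypothetical.

```python
import ipaddress


def multicast_mac(ip_dst: str) -> str:
    """Map an IPv4 multicast address to its Ethernet multicast MAC (RFC 1112)."""
    addr = ipaddress.IPv4Address(ip_dst)
    if not addr.is_multicast:
        raise ValueError(f"{ip_dst} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF           # keep the low 23 bits of the group
    mac = (0x01005E << 24) | low23         # prefix 01:00:5e, bit 23 stays 0
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
```

Because only 23 of the 28 group bits survive, 32 different group addresses share each MAC address. That is consistent with the note that the destination MAC is the same for all copies of one multicast packet.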

73 ONL NP Router
[Block diagram: Rx (2 ME), Mux (1 ME), Parse, Lookup, Copy (4 MEs), Queue Manager (1 ME), HdrFmt (1 ME), Tx (1 ME), Stats (1 ME), five Plugins and the xScale, with TCAM (Assoc. Data ZBT-SRAM), SRAM and large SRAM rings.]
Notes on the diagram:
  add configurable per port delay (up to 150 ms total delay)
  add large SRAM ring
  Each output has common set of QiDs
  Multicast copies use same QiD for all outputs
  QiD ignored for plugin copies
  Plugin write access to QM Scratch Ring

74 ONL NP Router
[Block diagram: Rx (2 ME), Mux (1 ME), Parse, Lookup, Copy (4 MEs), Queue Manager (1 ME), HdrFmt (1 ME), Tx (1 ME), Stats (1 ME), Plugin0-Plugin4 on NN rings and the xScale, with TCAM and SRAM.]
Notes on the diagram:
  Each output has common set of QiDs
  Multicast copies use same QiD for all outputs
  QiD ignored for plugin copies

75 Lookup Results
Results of a lookup could be:
1 PF/RL Result:
  IP Unicast: 1 packet sent to a Port
  Plugin Unicast: 1 packet sent to a Plugin
  Unicast with Plugin Copies: 0 or 1 packet sent to a port, 1-5 copies sent to plugin(s)
  IP Multicast: 0-10 copies sent, 1 to each of 5 ports and one to each of 5 plugins
1 Aux Filter Result:
  0 or 1 copy sent to a Port
  1-5 copies sent to plugins
Valid Combinations of the Above:
  (A1 or A3) and (B1 or B3)
    Potentially two different unicast MAC DAddresses needed
  (A1 or A3) and B2
  A1 and (B1 or B3)
  A2 and B2
  A4 and B4
    Potentially 1 unicast MAC DAddr and 1 multicast MAC DAddr needed

76 PLC
Input Data
  Buffer Handle
  In Plugin
  In Port
  Out Port
  Flags
    Source (3b): Rx/XScale/Plugin
    PassThrough/Classify (1b)
    Reserved (4b)
  QID
  Frame Length
  Stats Index
Control Flags
  PassThrough/Reclassify
Key (136b)
  Port/Plugin (4b): 0-4: Port, 5-9: Plugin, 15: XScale
  DAddr (32b)
  SAddr (32b)
  Protocol (8b)
  DPort (16b)
  Sport (16b)
  TCP Flags (12b)
  Exception Bits (16b): TTL Expired, IP Options present, No Route
Primary Result
  Valid (1b)
  CopyVector (10b)
  NH IP/MAC (48b)
  QID (16b)
  LD (1b): Send to XScale
  Drop (1b): Drop pkt
  Valid Bits (3b): NH IP Valid (1b), NH MAC Valid (1b), IP_MCast Valid (1b)
Auxiliary Result
  Sampling bits (2b)
Output Data
  Buffer Handle
  Plugin (To XScale only)
  In Port (To XScale only)
  Out Port (To XScale or QM only)
  Flags (To XScale only)
  QID
  Frame Length
  Stats Index

77 SRAM Buffer Descriptor
Problem:
  With the use of Filters, Plugins and recycling back around for reclassification, we can end up with an arbitrary number of copies of one packet in the system at a time.
  Each copy of a packet could end up going to an output port and need a different MAC DAddr from all the other copies.
  Having one Buffer Descriptor per packet regardless of the number of copies will not be sufficient.
Solution:
  When there are multiple copies of the packet in the system, each copy will need a separate Header buffer descriptor which will contain the MAC DAddr for that copy.
  When the Copy block gets a packet for which it only needs to send one copy to the QM, it will read the current reference count and, if this copy is the ONLY copy in the system, it will not prepend the Header buffer descriptor.
    SRAM buffer descriptors are the scarce resource and we want to optimize their use.
    Therefore: We do NOT want to always prepend a header buffer descriptor.
  Otherwise, Copy will prepend a Header buffer descriptor to each copy going to the QM.
  Copy does not need to prepend a Header buffer descriptor to copies going to plugins.
  We have to think some more about the case of copies going to the XScale.
  The Header buffer descriptors will come from the same pool (freelist 0) as the PacketPayload buffer descriptors.
    There is no advantage to associating these Header buffer descriptors with small DRAM buffers.
    DRAM is not the scarce resource.
    SRAM buffer descriptors are the scarce resource.
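The prepend rule can be stated compactly. The sketch below mirrors the test used in the PLC Copy pseudocode, `(copy is going to QM) and ((copyCount + currentRefCnt) > 1)`. The function name and the string labels for destinations are hypothetical, and the XScale case is left as "no header" only because the slides say it is still undecided.

```python
def needs_header_descriptor(dest: str, copy_count: int, current_ref_cnt: int) -> bool:
    """Decide whether Copy should prepend a Header buffer descriptor.

    dest            -- "QM", "Plugin" or "XScale" (labels are illustrative)
    copy_count      -- copies being created by this pass through Copy
    current_ref_cnt -- reference count already stored in the buffer descriptor
    """
    if dest != "QM":
        # Copies to plugins need no header descriptor; XScale is an open question.
        return False
    # Only a packet that is the single copy in the system can skip the header.
    return (copy_count + current_ref_cnt) > 1
```

Keeping the single-copy case header-free is what conserves SRAM buffer descriptors, which the slide identifies as the scarce resource.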

78 MR Buffer Descriptor
LW0: Buffer_Next (32b)
LW1: Buffer_Size (16b), Offset (16b)
LW2: Packet_Size (16b), Free_list 0000 (4b), Reserved (4b), Reserved (8b)
LW3: Reserved (16b), Stats Index (16b)
LW4: Reserved (16b), Reserved (8b), Reserved (4b), Reserved (4b)
LW5: Reserved (4b), Reserved (4b), Reserved (32b)
LW6: Reserved (16b), Reserved (16b)
LW7: Packet_Next (32b)

79 Intel Buffer Descriptor
LW0: Buffer_Next (32b)
LW1: Buffer_Size (16b), Offset (16b)
LW2: Packet_Size (16b), Free_list (4b), Rx_stat (4b), Hdr_Type (8b)
LW3: Input_Port (16b), Output_Port (16b)
LW4: Next_Hop_ID (16b), Fabric_Port (8b), Reserved (4b), NHID type (4b)
LW5: ColorID (4b), Reserved (4b), FlowID (32b)
LW6: Class_ID (16b), Reserved (16b)
LW7: Packet_Next (32b)

80 SRAM Accesses Per Packet
To support 8.22 M pkts/sec we can Read 24 Words and Write 24 Words per pkt (200M/8.22M)
Rx:
  SRAM Dequeue (1 Word): To retrieve a buffer descriptor from free list
  Write buffer desc (2 Words)
Parse
Lookup
  TCAM Operations
  Reading Results
Copy
  Write buffer desc (3 Words): Ref_cnt, MAC DAddr, Stats Index
  Pre-Q stats increments
    Read: 2 Words
    Write: 2 Words
HF
  Should not need to read or write any of the buffer descriptor
Tx
  Read buffer desc (4 Words)
Freelist Mgr:
  SRAM Enqueue: Write 1 Word, to return buffer descriptor to free list.
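The per-packet budget and the accesses itemized above can be checked with a few lines of arithmetic. This sketch assumes that "200M" means 200 million 32-bit words per second in each direction, and that an SRAM dequeue counts as a read and an enqueue as a write (as the QM slides label them). TCAM and QM accesses are not included, and the table names are illustrative.

```python
SRAM_WORDS_PER_SEC = 200_000_000   # ASSUMED: words/s available for reads, and again for writes
PKTS_PER_SEC = 8_220_000           # target packet rate from the slide


def words_per_packet(words_per_sec: int = SRAM_WORDS_PER_SEC,
                     pkts_per_sec: int = PKTS_PER_SEC) -> int:
    """Whole words of SRAM bandwidth available per packet, per direction."""
    return words_per_sec // pkts_per_sec


# (reads, writes) in words per packet for the blocks itemized on the slide.
ACCESSES = {
    "Rx": (1, 2),            # freelist dequeue, write buffer descriptor
    "Copy": (0, 3),          # Ref_cnt, MAC DAddr, Stats Index
    "Pre-Q stats": (2, 2),
    "Tx": (4, 0),            # read buffer descriptor
    "Freelist Mgr": (0, 1),  # enqueue descriptor back on the free list
}


def totals(accesses=ACCESSES):
    """Sum the listed reads and writes across all blocks."""
    reads = sum(r for r, _ in accesses.values())
    writes = sum(w for _, w in accesses.values())
    return reads, writes
```

Under these assumptions the listed blocks use 7 of the 24 read words and 8 of the 24 write words, leaving the remainder for the QM and the lookup results.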

81 QM SRAM Accesses Per Packet
QM (Worst case analysis)
Enqueue (assume queue is idle and not loaded in Q-Array)
  Write Q-Desc (4 Words): Eviction of Least Recently Used Queue
  Write Q-Params ?
    When we evict a Q do we need to write its params back?
    The Q-Length is the only thing that the QM is changing.
    Looks like it writes it back every time it enqueues or dequeues AND it writes it back when it evicts (we can probably remove the one when it evicts)
  Read Q-Desc (4 Words)
  Read Q-Params (3 Words): Q-Length, Threshold, Quantum
  Write Q-Length (1 Word)
  SRAM Enqueue -- Write (1 Word)
Scheduling structure accesses?
  They are done once every 5 pkts (when running full rate)
Dequeue (assume queue is not loaded in Q-Array)
  See notes in enqueue section
  SRAM Dequeue -- Read (1 Word)
Post-Q stats increments
  2 Reads
  2 Writes

82 QM SRAM Accesses Per Packet
QM (Worst case analysis)
Total Per Pkt accesses:
  Queue Descriptors and Buffer Enq/Deq:
    Write: 9 Words
    Read: 9 Words
  Queue Params:
    Write: 2 Words
    Read: 6 Words
Scheduling Structure Accesses Per Iteration (batch of 5 packets):
  Advance Head: Read 11 Words
  Write Tail: Write 11 Words
  Update Freelist: Read 2 Words OR Write 5 Words

83 TCAM Core Lookup Performance
Routes Filters
Lookup/Core size of 72 or 144 bits, Freq=200MHz
CAM Core can support 100M searches per second
For 1 Router on each of NPUA and NPUB:
  8.22 MPkt/s per Router
  3 Searches per Pkt (Primary Filter, Aux Filter, Route Lookup)
  Total Per Router: M Searches per second
  TCAM Total: M Searches per second
So, the CAM Core can keep up
Now let's look at the LA-1 Interfaces…

84 TCAM LA-1 Interface Lookup Performance
Routes Filters
Lookup/Core size of 144 bits (ignore for now that Route size is smaller)
Each LA-1 interface can support 40M searches per second.
For 1 Router on each of NPUA and NPUB (each NPU uses a separate LA-1 Intf):
  8.22 MPkt/s per Router
  Maximum of 3 Searches per Pkt (Primary Filter, Aux Filter, Route Lookup)
    Max of 3 assumes they are each done as a separate operation
  Total Per Interface: M Searches per second
So, the LA-1 Interfaces can keep up
Now let's look at the AD SRAM Results …

85 TCAM Assoc. Data SRAM Results Performance
8.22M 72b or 144b lookups
  32b results consumes 1/12
  64b results consumes 1/6
  128b results consumes 1/3
Routes Filters
Lookup/Core size of 72 or 144 bits, Freq=200MHz, SRAM Result Size of 128 bits
Associated SRAM can support up to 25M searches per second.
For 1 Router on each of NPUA and NPUB:
  8.22 MPkt/s per Router
  3 Searches per Pkt (Primary Filter, Aux Filter, Route Lookup)
  Total Per Router: M Searches per second
  TCAM Total: M Searches per second
So, the Associated Data SRAM can NOT keep up
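The three capacity checks on the TCAM performance slides share the same arithmetic, sketched below from the stated inputs (8.22 MPkt/s per router, 3 searches per packet, one router on each of two NPUs). The names are illustrative. The sketch assumes the CAM core and the Associated Data SRAM are shared by both routers while each router has its own LA-1 interface, which is how the slides appear to frame it.

```python
PKTS_PER_SEC = 8_220_000
SEARCHES_PER_PKT = 3      # Primary Filter, Aux Filter, Route Lookup
ROUTERS = 2               # one router on each of NPUA and NPUB

# Capacities quoted on the slides, in searches per second.
CAPACITY = {
    "cam_core": 100_000_000,
    "la1_interface": 40_000_000,   # per interface, one interface per NPU
    "ad_sram_128b": 25_000_000,    # with 128-bit results
}


def search_load(pkts=PKTS_PER_SEC, searches=SEARCHES_PER_PKT, routers=ROUTERS):
    """Return (searches/s per router, searches/s across all routers)."""
    per_router = pkts * searches
    return per_router, per_router * routers


def keeps_up():
    """Compare the offered load with each quoted capacity."""
    per_router, total = search_load()
    return {
        "cam_core": total <= CAPACITY["cam_core"],                  # ASSUMED shared
        "la1_interface": per_router <= CAPACITY["la1_interface"],   # one per NPU
        "ad_sram_128b": total <= CAPACITY["ad_sram_128b"],          # ASSUMED shared
    }
```

With these inputs the result agrees with the slides' conclusions: the CAM core and the LA-1 interfaces keep up, while 128-bit results out of the Associated Data SRAM do not.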

86 Lookups: Latency Three searches in one MDL Indirect Operation
Latencies for operation
  QDR xfer time: 6 clock cycles
    1 for MDL Indirect subinstruction
    5 for 144 bit key transferred across QDR Bus
  Instruction Fifo: 2 clock cycles
  Synchronizer: 3 clock cycles
  Execution Latency: search dependent
  Re-Synchronizer: 1 clock cycle
  Total: 12 clock cycles

87 Lookups: Latency 144 bit DB, 32 bits of AD (two of these)
Instruction Latency: 30
Core blocking delay: 2
Backend latency: 8
72 bit DB, 32 bits of AD
  Core blocking delay: 2
Latency of first search (144 bit DB): = 41 clock cycles
Latency of subsequent searches: (previous search latency) – (backend latency of previous search) + (core block delay of previous search) + (backend latency of this search)
Latency of second 144 bit search: 41 – = 43
Latency of third search (72 bit): 43 – = 45 clock cycles
45 QDR Clock cycles (200 MHz clock) -> 315 IXP Clock cycles (1400 MHz clock)
This is JUST for the TCAM operation, we also need to read the SRAM:
  SRAM Read to retrieve TCAM Results Mailbox (3 words, one per search)
  TWO SRAM Reads to then retrieve the full results (3 Words each) from SRAM Bank 0, but we don’t have to wait for one to complete before issuing the second.
  About 150 IXP cycles for an SRAM Read -> = 615 IXP Clock cycles
Let's estimate 650 IXP Clock cycles for issuing, performing and retrieving results for a lookup. (multi-word, two reads, …)
Does not include any lookup block processing
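The latency chain above can be replayed numerically. The sketch takes the slide's 41-cycle first-search latency as given and applies the stated recurrence. It assumes a backend latency of 8 cycles for the 72-bit search as well (the slide lists only its core blocking delay), and it counts two back-to-back SRAM read latencies because the two full-result reads overlap. Function and constant names are illustrative.

```python
QDR_MHZ = 200
IXP_MHZ = 1400
SRAM_READ_IXP_CYCLES = 150   # "about 150 IXP cycles for an SRAM Read"


def next_search_latency(prev_latency: int, prev_backend: int,
                        prev_core_block: int, this_backend: int) -> int:
    """Recurrence from the slide for the second and later searches."""
    return prev_latency - prev_backend + prev_core_block + this_backend


def lookup_latency(first_latency: int = 41, backend: int = 8, core_block: int = 2):
    """Return (second, third, TCAM latency in IXP cycles, total IXP cycles)."""
    second = next_search_latency(first_latency, backend, core_block, backend)
    third = next_search_latency(second, backend, core_block, backend)  # ASSUMES backend = 8
    tcam_ixp = third * (IXP_MHZ // QDR_MHZ)         # 7 IXP cycles per QDR cycle
    # Mailbox read, then the two overlapped full-result reads: two read latencies.
    total_ixp = tcam_ixp + 2 * SRAM_READ_IXP_CYCLES
    return second, third, tcam_ixp, total_ixp
```

Under those assumptions the chain reproduces the figures quoted on the slide (43, 45, 315 and 615), which the slide then rounds up to an estimate of 650 IXP cycles.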

88 Lookups: SRAM Bandwidth
Analysis is PER LA-1 QDR Interface
  That is, each of NPUA and NPUB can do the following.
16-bit QDR SRAM at 200 MHz
  Separate read and write bus
  Operations on rising and falling edge of each clock
  32 bits of read AND 32 bits of write per clock tick
QDR Write Bus:
  6 32-bit cycles per instruction
    Cycle 0: Write Address bus contains the TCAM Indirect Instruction; Write Data bus contains the TCAM Indirect MDL Sub-Instruction
    Cycles 1-5: Write Data bus contains the 5 words of the Lookup Key
  Write Bus can support 200M/6 = M searches/sec
QDR Read Bus:
  Retrieval of Results Mailbox: 3 32-bit cycles per instruction
  Retrieval of two full results from QDR SRAM Bank 0:
  Total of 9 32-bit cycles per instruction
  Read Bus can support 200M/9 = M searches/sec
Conclusion:
  Plenty of SRAM bandwidth to support TCAM operations AND SRAM Bank 0 accesses to perform all aspects of lookups at over 8.22 M searches/sec.
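The bus limits above follow directly from the cycle counts. The sketch below assumes one 32-bit transfer per 200 MHz clock tick in each direction, as the slide describes, and uses illustrative names.

```python
QDR_TRANSFERS_PER_SEC = 200_000_000   # 32-bit words per second, per direction
REQUIRED_SEARCHES_PER_SEC = 8_220_000

WRITE_CYCLES_PER_SEARCH = 6   # 1 instruction cycle + 5 key words
READ_CYCLES_PER_SEARCH = 9    # total quoted on the slide for mailbox plus full results


def max_searches_per_sec(cycles_per_search: int) -> int:
    """Whole searches per second that one bus direction can sustain."""
    return QDR_TRANSFERS_PER_SEC // cycles_per_search


def bus_has_headroom() -> bool:
    """True if both the write bus and the read bus exceed the required rate."""
    return (max_searches_per_sec(WRITE_CYCLES_PER_SEARCH) > REQUIRED_SEARCHES_PER_SEC
            and max_searches_per_sec(READ_CYCLES_PER_SEARCH) > REQUIRED_SEARCHES_PER_SEC)
```

The read bus is the tighter of the two, yet it still sustains more than twice the 8.22 M lookups per second each router needs, which is consistent with the slide's conclusion.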

89 Lookups Route Lookup: Key (72b)
Port (4b): Can be a wildcard (for Unicast, probably not for Multicast)
  Value of 1111b in Port field to denote coming from the XScale
  Ports numbered 0-4
Plugin (4b): Can be a wildcard (for Unicast, probably not for Multicast)
  Plugins numbered 0-4
DAddr (32b)
  Prefixed for Unicast
  Exact Match for Multicast
SAddr (32b)
  Unicast entries always have this and its mask set to 0
  Prefixed for Multicast
Result (99b)
  CopyVector (11b): One bit for each of the 5 ports and 5 plugins and one bit for the XScale
  PluginOutputPortVector (5b) (under consideration)
    This would allow users to send packets to a plugin which could then send it along to output port(s). The copyvector is not useful for this since bits set in the copyvector would cause the Copy block to send out multiple copies to different places.
  QID (16b)
  Stats Index (16b)
  NH_IP/NH_MAC (48b): At most one of NH_IP or NH_MAC should be valid
  Valid Bits (3b): At most one of the following three bits should be set
    IP_MCast Valid (1b)
    NH_IP_Valid (1b)
    NH_MAC_Valid (1b)

90 Lookups Filter Lookup Key (140b)
Port (4b): Can be a wildcard (for Unicast, probably not for Multicast)
  Value of 1111b in Port field to denote coming from the XScale
  Ports numbered 0-4
Plugin (4b): Can be a wildcard (for Unicast, probably not for Multicast)
  Plugins numbered 0-4
DAddr (32b)
SAddr (32b)
Protocol (8b)
DPort (16b)
Sport (16b)
TCP Flags (12b)
Exception Bits (16b): Allow for directing of packets based on defined exceptions
Result (109b)
  CopyVector (11b): One bit for each of the 5 ports and 5 plugins and one bit for the XScale
  PluginOutputPortVector (5b) (under consideration)
    This would allow users to send packets to a plugin which could then send it along to output port(s). The copyvector is not useful for this since bits set in the copyvector would cause the Copy block to send out multiple copies to different places.
  NH IP(32b)/MAC(48b) (48b): At most one of NH_IP or NH_MAC should be valid
  QID (16b)
  Stats Index (16b)
  Valid Bits (3b): At most one of the following three bits should be set
    NH IP Valid (1b)
    NH MAC Valid (1b)
    IP_MCast Valid (1b)
  Sampling bits (2b): For Aux Filters only
    00: “Sample All”
  Priority (8b): For Primary Filters only

91 Filters, Ports/Plugins, Unicast/Multicast, Etc
The following slides are my thoughts on the current problems we have been having with filters, plugins, etc., and my proposal on how to address them.
- We have perhaps over-generalized our model/design.
  - We have a copy vector in all of our filter and route lookup results, allowing users to copy any packet to <= 1 Port and 0 or more plugins, even when the filter/route is “Unicast”.
  - We have also generalized the design to allow plugins to send a packet directly to the QM, either by putting it directly into the QM input ring or by sending it back to the MUX block and then the PLC block with a flag bit indicating that the packet is NOT to be reclassified.
- This has led to difficulties in supporting what we think Plugins will want to do with packets when they receive them.
  - There is only one set of output information (Port, QID, NH Address).
  - There is no easy way to indicate to a Plugin, through the Filter or Route Lookup result, where it should send the packet next.
- We have also generalized the Route Lookups so that they could easily be implemented as a Primary Filter.
  - This can confuse someone who asks why we have both and which one should be used when.

92 Filters, Ports/Plugins, Unicast/Multicast, Etc
I propose we return to a simpler model:
- A Unicast Primary Filter has a result that can send ONE copy to either ONE Port or ONE Plugin.
- A Unicast Route Lookup has a result that can send ONE copy to either ONE Port or ONE Plugin.
- A Unicast Auxiliary Filter can be used to send ONE copy to either ONE Port or ONE Plugin.
- Plugins are allowed to make copies of packets and send them where they want:
  - The Plugin Framework will support the making of copies.
- Packets with an IP Multicast Destination Address can match:
  - Primary Filters and Route Lookups:
    - Result in 0 to 10 copies
    - Each copy can go to 1 of the 5 output ports or 1 of the 5 plugins
    - Two copies can NOT go to the same Port or to the same Plugin directly:
      - Plugins can redirect a packet anywhere after processing it, thus causing two or more copies to end up going to the same Port or Plugin.
  - Auxiliary Filters:
    - Result in 1 copy going to either ONE port or ONE Plugin
    - An Auxiliary Filter matching an IP Multicast Destination Address should provide the NH IP or NH MAC Address if it is directed to an Output Port (i.e., Aux filters will not utilize the IP MCast Address to Ethernet MCast Address translation).
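The multicast rule above (0 to 10 copies, never two to the same Port or Plugin) falls out naturally from a bit vector, since each destination has exactly one bit. A minimal sketch follows. The bit assignment is an assumption (the slides give only the vector width), the XScale bit is taken from the CopyVector descriptions on other slides, and `expand_copy_vector` is an illustrative name.

```python
# Assumed bit assignment (not given on the slides):
# bits 0-4 -> Port0..Port4, bits 5-9 -> Plugin0..Plugin4, bit 10 -> XScale.
NUM_PORTS = 5
NUM_PLUGINS = 5
XSCALE_BIT = NUM_PORTS + NUM_PLUGINS


def expand_copy_vector(vector):
    """Return the list of destinations named by an 11-bit copy vector."""
    if not 0 <= vector < (1 << (XSCALE_BIT + 1)):
        raise ValueError("copy vector must fit in 11 bits")
    dests = [("port", i) for i in range(NUM_PORTS) if (vector >> i) & 1]
    dests += [("plugin", i) for i in range(NUM_PLUGINS)
              if (vector >> (NUM_PORTS + i)) & 1]
    if (vector >> XSCALE_BIT) & 1:
        dests.append(("xscale", None))
    return dests
```

Under this assignment a vector with all ten port and plugin bits set yields the maximum of 10 copies, and an all-zero vector yields none.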

93 Filters, Ports/Plugins, Unicast/Multicast, Etc
Route Lookup: Result (101b)
- MCast CopyVector (11b)
  - One bit for each of the 5 ports and 5 plugins and one bit for the XScale
- Plugin/Port Selection Bit (1b):
  - 0: Send packet to port indicated by Unicast Output Port field
  - 1: Send packet to plugin indicated by Unicast Output Plugin field
    - Unicast Output Port, QID, Stats Index, and NH fields also get sent to plugin
- Unicast Output Port (3b): Port or XScale
  - 0: Port0, 1: Port1, 2: Port2, 3: Port3, 4: Port4, 5: XScale
- Unicast Output Plugin (3b):
  - 0: Plugin0, 1: Plugin1, 2: Plugin2, 3: Plugin3, 4: Plugin4
- QID (16b)
- Stats Index (16b)
- NH_IP/NH_MAC (48b)
  - At most one of NH_IP or NH_MAC should be valid
- Valid Bits (3b)
  - At most one of the following three bits should be set
  - IP_MCast Valid (1b)
  - NH_IP_Valid (1b)
  - NH_MAC_Valid (1b)

We can probably be clever and overload the MCast CopyVector field and use it for the Plugin/Port Selection bit, Unicast Output Port, and Unicast Output Plugin fields as well. If the IP_MCast Valid bit is set, the field is an MCast CopyVector; if the IP_MCast Valid bit is not set, the field holds the Unicast fields.
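The overloading idea can be sketched as a decoder that interprets the same 11 bits differently depending on the IP_MCast Valid bit. The slide proposes the overloading but does not fix a layout, so the bit positions below are hypothetical, as are the names `encode_unicast` and `decode_destination_field`.

```python
# Hypothetical unicast packing inside the shared 11-bit field:
#   bit 0      Plugin/Port Selection (0 = port, 1 = plugin)
#   bits 1-3   Unicast Output Port
#   bits 4-6   Unicast Output Plugin
#   bits 7-10  unused
FIELD_BITS = 11


def encode_unicast(to_plugin, port, plugin):
    """Pack the three unicast fields into the shared 11-bit field."""
    if not (0 <= port < 8 and 0 <= plugin < 8):
        raise ValueError("port and plugin must each fit in 3 bits")
    return int(bool(to_plugin)) | (port << 1) | (plugin << 4)


def decode_destination_field(field, ip_mcast_valid):
    """Interpret the shared field according to the IP_MCast Valid bit."""
    if not 0 <= field < (1 << FIELD_BITS):
        raise ValueError("field must fit in 11 bits")
    if ip_mcast_valid:
        # Multicast: the whole field is the copy vector.
        return {"kind": "mcast", "copy_vector": field}
    return {
        "kind": "unicast",
        "to_plugin": bool(field & 1),
        "port": (field >> 1) & 0x7,
        "plugin": (field >> 4) & 0x7,
    }
```

The three unicast fields need only 7 of the 11 bits, which is why the overloading would fit without widening the result.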

94 Filters, Ports/Plugins, Unicast/Multicast, Etc
Primary Filter Lookup Result (119b)
- Plugin/Port Selection Bit (1b):
  - 0:
    - Unicast: Send packet to port indicated by Unicast Output Port field
    - MCast: Send pkt to both Port and Plugin. Does it get the MCast CopyVector?
  - 1:
    - Unicast: Send packet to plugin indicated by Unicast Output Plugin field. Unicast Output Port, QID, Stats Index, and NH fields also get sent to plugin
    - MCast: Send pkt to all Plugin bits set, include MCast CopyVector in data going to plugins
- MCast CopyVector (11b)
  - One bit for each of the 5 ports and 5 plugins and one bit for the XScale
- Unicast Output Port (3b): Port or XScale
  - 0: Port0, 1: Port1, 2: Port2, 3: Port3, 4: Port4, 5: XScale
- Unicast Output Plugin (3b):
  - 0: Plugin0, 1: Plugin1, 2: Plugin2, 3: Plugin3, 4: Plugin4
- NH IP (32b)/MAC (48b) (48b)
  - At most one of NH_IP or NH_MAC should be valid
- QID (16b)
- Stats Index (16b)
- Valid Bits (3b)
  - At most one of the following three bits should be set
  - NH IP Valid (1b)
  - NH MAC Valid (1b)
  - IP_MCast Valid (1b)
- Priority (8b)

We can probably be clever and overload the MCast CopyVector field and use it for the Plugin/Port Selection bit, Unicast Output Port, and Unicast Output Plugin fields as well. If the IP_MCast Valid bit is set, the field is an MCast CopyVector; if the IP_MCast Valid bit is not set, the field holds the Unicast fields.

95 Filters, Ports/Plugins, Unicast/Multicast, Etc
Auxiliary Filter Lookup Result (92b)
- Plugin/Port Selection Bit (1b):
  - 0: Send packet to port indicated by Unicast Output Port field
    - Ignore Unicast Output Plugin field
  - 1: Send packet to plugin indicated by Unicast Output Plugin field
    - Unicast Output Port, QID, Stats Index, and NH fields also get sent to plugin
- Unicast Output Port (3b): Port or XScale
  - 0: Port0, 1: Port1, 2: Port2, 3: Port3, 4: Port4, 5: XScale
- Unicast Output Plugin (3b):
  - 0: Plugin0, 1: Plugin1, 2: Plugin2, 3: Plugin3, 4: Plugin4
- NH IP (32b)/MAC (48b) (48b)
  - At most one of NH_IP or NH_MAC should be valid
- QID (16b)
- Stats Index (16b)
- Valid Bits (3b)
  - At most one of the following three bits should be set
  - NH IP Valid (1b)
  - NH MAC Valid (1b)
  - IP_MCast Valid (1b)
- Sampling bits (2b): For Aux Filters only
  - 00: “Sample All”
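Summing the field widths listed for the three proposed results gives a quick consistency check against the totals in the slide headings. The dictionary keys below are illustrative names, and the widths are copied from the slides. The Route Lookup and Auxiliary Filter fields sum to the stated 101 and 92 bits, while the Primary Filter fields as listed sum to 109 rather than the stated 119. The sketch only reports that difference and makes no claim about which figure is intended.

```python
# (stated total from the slide heading, listed field widths)
RESULTS = {
    "route": (101, {
        "mcast_copy_vector": 11, "plugin_port_select": 1,
        "unicast_output_port": 3, "unicast_output_plugin": 3,
        "qid": 16, "stats_index": 16, "nh_ip_or_mac": 48, "valid_bits": 3,
    }),
    "primary": (119, {
        "plugin_port_select": 1, "mcast_copy_vector": 11,
        "unicast_output_port": 3, "unicast_output_plugin": 3,
        "nh_ip_or_mac": 48, "qid": 16, "stats_index": 16,
        "valid_bits": 3, "priority": 8,
    }),
    "aux": (92, {
        "plugin_port_select": 1,
        "unicast_output_port": 3, "unicast_output_plugin": 3,
        "nh_ip_or_mac": 48, "qid": 16, "stats_index": 16,
        "valid_bits": 3, "sampling_bits": 2,
    }),
}


def width_report():
    """Map each result type to (stated width, sum of listed field widths)."""
    return {name: (stated, sum(fields.values()))
            for name, (stated, fields) in RESULTS.items()}
```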

96 Filters, Ports/Plugins, Unicast/Multicast, Etc
- A packet could match either a Primary Filter or a Route Lookup but not both.
  - If it matches both, one of them will be selected based on priority.
- In addition to a Primary Filter or Route Lookup, a packet could match an Auxiliary Filter.
- Extra copies of Unicast flows could be made using Auxiliary filters, Plugins, and sending packets back through PLC for reclassification.
- With this model, I believe a Plugin would have the information it would need to process a packet and send it on to the specified output port.
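The selection rule above can be sketched as follows. The slides say only that the choice between a Primary Filter and a Route Lookup is "based on priority". Treating a smaller number as higher priority, giving routes a comparable priority value, and breaking ties in favor of the Primary Filter are all assumptions of this sketch, as are the names `Match` and `select_results`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Match:
    source: str    # "primary", "route", or "aux"
    priority: int  # assumption: a smaller number means higher priority
    result: str    # stand-in for the lookup result contents


def select_results(primary: Optional[Match],
                   route: Optional[Match],
                   aux: Optional[Match] = None) -> List[Match]:
    """Pick at most one of primary/route, then add the aux match if any."""
    chosen = []
    candidates = [m for m in (primary, route) if m is not None]
    if candidates:
        # min() keeps the first item on ties, so a tie goes to the primary filter.
        chosen.append(min(candidates, key=lambda m: m.priority))
    if aux is not None:
        chosen.append(aux)
    return chosen
```

A packet therefore produces at most two results here: one from the Primary Filter or Route Lookup and one from an Auxiliary Filter.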

97 Filters, Ports/Plugins, Unicast/Multicast, Etc
This is the end of the slides on Filters, Ports/Plugins, Unicast/Multicast, Etc.

98 ONL NP Router
[Diagram labels: xScale, plugins, Rx (2 ME), Mux (1 ME), Parse, Lookup, Copy (3 MEs). Legend: SRAM Ring, Scratch Ring, NN Ring. Rings are labeled 64KW.]

Ring entry formats shown in the diagram:
- Two-word entry:
  - Buf Handle (32b)
  - In Port (4b) | Reserved (12b) | Eth. Frame Len (16b)
- Three-word entry (repeated on several rings in the diagram):
  - Rsv (4b) | Out Port (4b) | Buffer Handle (24b)
  - L3 (IP, ARP, …) Pkt Length (16b) | QID (16b)
  - In Plugin (4b) | In Port (4b) | Flags (8b) | Stats Index (16b)

Flags (8b), two variants shown:
- PassThrough/Classify (1b), Reserved (7b)
- Source (3b): Rx/XScale/Plugin, PassThrough/Classify (1b), Reserved (4b)

Field usage:
- In Port: Used as part of lookup key
- In Plugin: Used as part of lookup key
- Out Port: Used to tell QM, HF and Tx which physical interface the pkt is destined for

Open question: What is the priority for servicing input rings?
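The three-word ring entry can be sketched as three packed 32-bit words. The field widths are from the slide. Placing the first-listed field in the most significant bits of each word is an assumption, and `pack_entry` and `unpack_entry` are illustrative names.

```python
def pack_entry(out_port, buf_handle, pkt_len, qid,
               in_plugin, in_port, flags, stats_index):
    """Pack one ring entry into three 32-bit words (reserved bits left at 0)."""
    # Word 0: Rsv(4) | Out Port(4) | Buffer Handle(24)
    w0 = ((out_port & 0xF) << 24) | (buf_handle & 0xFFFFFF)
    # Word 1: L3 Pkt Length(16) | QID(16)
    w1 = ((pkt_len & 0xFFFF) << 16) | (qid & 0xFFFF)
    # Word 2: In Plugin(4) | In Port(4) | Flags(8) | Stats Index(16)
    w2 = (((in_plugin & 0xF) << 28) | ((in_port & 0xF) << 24)
          | ((flags & 0xFF) << 16) | (stats_index & 0xFFFF))
    return (w0, w1, w2)


def unpack_entry(words):
    """Split three packed words back into named fields."""
    w0, w1, w2 = words
    return {
        "out_port": (w0 >> 24) & 0xF,
        "buf_handle": w0 & 0xFFFFFF,
        "pkt_len": (w1 >> 16) & 0xFFFF,
        "qid": w1 & 0xFFFF,
        "in_plugin": (w2 >> 28) & 0xF,
        "in_port": (w2 >> 24) & 0xF,
        "flags": (w2 >> 16) & 0xFF,
        "stats_index": w2 & 0xFFFF,
    }
```

Each word's fields add up to exactly 32 bits, so an entry occupies three ring slots of one word each under this layout.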

99 ONL NP Router
[Diagram labels: TCAM, Assoc. Data ZBT-SRAM, xScale, Parse, Lookup, Copy, QM, SRAM, Plugins.]

Ring entry formats shown in the diagram:
- Three-word entry (shown twice):
  - Rsv (4b) | Out Port (4b) | Buffer Handle (24b)
  - L3 (IP, ARP, …) Pkt Length (16b) | QID (16b)
  - In Plugin (4b) | In Port (4b) | Flags (8b) | Stats Index (16b)
- Two-word entry:
  - Rsv (4b) | Out Port (4b) | Buffer Handle (24b)
  - L3 (IP, ARP, …) Pkt Length (16b) | QID (16b)
- Three-word entry with reserved fields:
  - Reserved (8b) | Buffer Handle (24b)
  - L3 (IP, ARP, …) Pkt Length (16b) | QID (16b)
  - In Plugin (4b) | In Port (4b) | Rsv (8b) | Stats Index (16b)

100 ONL NP Router
[Diagram labels: xScale, TCAM, Assoc. Data ZBT-SRAM, SRAM, Rx (2 ME), Mux (1 ME), Parse, Lookup, Copy (3 MEs), QM (1 ME), HdrFmt (1 ME), Tx (1 ME), Stats (1 ME), FreeList Mgr (1 ME), Plugin0 through Plugin4. Labels near Stats: QM, Copy, Plugins. Labels near FreeList Mgr: Tx, QM, Parse, Plugin, XScale. Legend: SRAM Ring, Scratch Ring, NN Ring. Rings are labeled 64KW.]

QM will not do any Stats Operations, so it does not need the Stats Index.

Ring entry format shown:
- Rsv (4b) | Out Port (4b) | Buffer Handle (24b)
- L3 (IP, ARP, …) Pkt Length (16b) | QID (16b)

