ONL NP Router
[Block diagram. Blocks shown, in the order listed on the slide: Rx (2 MEs), Mux (1 ME), Parse, Lookup, Copy (3 MEs), QM (1 ME), HdrFmt (1 ME), Tx (1 ME). Also shown: Stats (1 ME), FreeList Mgr (1 ME), Plugin0 through Plugin4, the XScale, the TCAM with its Assoc. Data ZBT-SRAM, SRAM rings (64KW each), NN rings, and scratch rings. Ring labels: "Tx, QM, Parse, Plugin, XScale" and "QM, Copy, Plugins".]
MUX -> PLC
Ring data passed from Mux to PLC (fields recovered from the diagram):
  - Buffer Handle (24b)
  - Flags (8b): Reserved (5b), Src (2b), PT (1b)
      - Src: Source (2b): 00: Rx, 01: XScale, 10: Plugin, 11: Undefined
      - PT (1b): PassThrough (1) / Classify (0)
  - L3 (IP, ARP, …) Pkt Length (16b)
  - QID (16b)
  - Stats Index (16b)
  - In Port (3b)
  - Plugin Tag (5b)
  - Rsv (4b)
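The Flags byte described above can be decoded with two small helpers. This is only a sketch: the bit positions assume the diagram lists fields from most significant to least significant bit (Reserved in bits 7:3, Src in bits 2:1, PT in bit 0), and the names `flags_src`, `flags_pt` and the `SRC_*` constants are made up for illustration.

```c
#include <stdint.h>

/* Source encodings as listed on the slide. */
enum pkt_src {
    SRC_RX        = 0,  /* 00: Rx        */
    SRC_XSCALE    = 1,  /* 01: XScale    */
    SRC_PLUGIN    = 2,  /* 10: Plugin    */
    SRC_UNDEFINED = 3   /* 11: Undefined */
};

/* ASSUMPTION: layout is Reserved(5b) | Src(2b) | PT(1b), MSB to LSB. */
unsigned flags_src(uint8_t flags)
{
    return (flags >> 1) & 0x3u;
}

/* Returns 1 for PassThrough, 0 for Classify. */
unsigned flags_pt(uint8_t flags)
{
    return flags & 0x1u;
}
```

If the real layout differs, only the shift and mask constants would need to change.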
PLC -> XScale
Ring data passed from PLC to XScale (fields recovered from the diagram):
  - Buffer Handle (24b)
  - Flags (8b): why the pkt is being sent to XScale
      - TTL (1b): TTL expired
      - Options (1b): IP Options present
      - NoRoute (1b): No matching route or filter
      - NonIP (1b): Non IP Packet received
      - ARP_Needed (1b): NH_IP valid, but no MAC
      - NH_Invalid (1b): NH_IP AND NH_MAC both invalid
      - Reserved (2b): currently unused
      - Order shown in the diagram: Reserved (2b), NR (1b), TTL, Opt, NI, ARP, NH INV
  - L3 (IP, ARP, …) Pkt Length (16b)
  - QID (16b)
  - Stats Index (16b)
  - In Port (3b)
  - Plugin Tag (5b)
  - NH MAC DA[47:16] (32b)
  - NH MAC DA[15:0] (16b)
  - EtherType (16b)
  - Unicast/MCast bits
  - Rsv, Reserved (16b)
Diagram annotations: part of the entry is the same as ring_in; the rest is based on Parse & Lookup results.
PLC -> Plugins
Ring data passed from PLC to Plugins 0-4 (fields recovered from the diagram; same layout as the PLC -> XScale entry):
  - Buffer Handle (24b)
  - Flags (8b): why the pkt is being sent (the slide text says "to XScale")
      - TTL (1b): TTL expired
      - Options (1b): IP Options present
      - NoRoute (1b): No matching route or filter
      - NonIP (1b): Non IP Packet received
      - ARP_Needed (1b): NH_IP valid, but no MAC
      - NH_Invalid (1b): NH_IP AND NH_MAC both invalid
      - Reserved (2b): currently unused
      - Order shown in the diagram: Reserved (2b), NR (1b), TTL, Opt, NI, ARP, NH INV
  - L3 (IP, ARP, …) Pkt Length (16b)
  - QID (16b)
  - Stats Index (16b)
  - In Port (3b)
  - Plugin Tag (5b)
  - NH MAC DA[47:16] (32b)
  - NH MAC DA[15:0] (16b)
  - EtherType (16b)
  - Unicast/MCast bits
  - Rsv, Reserved (16b)
Diagram annotations: part of the entry is the same as ring_in; the rest is based on Parse & Lookup results.
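PLC -> QM
Ring data passed from PLC to QM (fields recovered from the diagram):
  - Buffer Handle (24b): maybe the added hdr buffer handle
  - Rsv (4b), Out Port
  - QID (16b)
  - L3 (IP, ARP, …) Pkt Length (16b)
  - Reserved (16b), Reserved (16b), (8b)
Diagram annotations: part of the entry is the same as ring_in; the rest is based on Parse & Lookup results.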
Types of Pkts Arriving at PLC
From Rx:
  - Only have a payload buf, ref_cnt == 1.
  - Subject to classification, except malformed IP pkts detected by Parse.
From XScale/Plugins:
  - Passthrough (PT) pkt
      - May or may not have a hdr buf, ref_cnt >= 1.
      - Not processed by PLC. Copy sends it to QM.
  - Non-PT pkt
      - Only has a payload buf at arrival, ref_cnt >= 1.
      - Is an IP pkt. Will be classified if it passes the IP hdr validation done by Parse.
PLC()
PLC() {
    if (!ring_in.PT) {
        Parse();
        if (dlNextBlock != BID_FREELISTMGR)
            Lookup();
    }
    Copy();
}
Inside Copy(), dl_sink() is called to enqueue pkts to downstream blocks.
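To make the control flow of PLC() concrete, here is a small host-side sketch with stub stages that only count how often they run. Everything except the names PLC, Parse, Lookup, Copy, ring_in.PT, dlNextBlock, BID_QM and BID_FREELISTMGR is hypothetical: the enum values, the `bad_ip_hdr` field, the counters, and the choice to reset dlNextBlock to BID_QM at the start of each pkt are assumptions made for the example.

```c
#include <stdbool.h>

/* Hypothetical block IDs; the real values come from the dispatch-loop headers. */
enum { BID_QM = 1, BID_FREELISTMGR = 2 };

struct ring_in_data {
    bool PT;          /* PassThrough flag from the Mux ring entry */
    bool bad_ip_hdr;  /* stand-in for "Parse found a malformed IP hdr" */
};

struct ring_in_data ring_in;
int dlNextBlock;
int parse_calls, lookup_calls, copy_calls;  /* counters for observing the flow */

/* Stubs: each stage only records that it ran. */
void Parse(void)
{
    parse_calls++;
    if (ring_in.bad_ip_hdr)
        dlNextBlock = BID_FREELISTMGR;  /* error pkt: skip Lookup, Copy frees it */
}

void Lookup(void) { lookup_calls++; }
void Copy(void)   { copy_calls++; }

void PLC(void)
{
    dlNextBlock = BID_QM;               /* ASSUMPTION: default next block */
    if (!ring_in.PT) {
        Parse();
        if (dlNextBlock != BID_FREELISTMGR)
            Lookup();
    }
    Copy();                             /* Copy always runs, even for PT pkts */
}
```

With these stubs, a PT pkt reaches only Copy, a malformed pkt reaches Parse and Copy, and a valid non-PT pkt passes through all three stages.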
dl_sink
Special design of dl_sink(first, last, try_action):
  - “first” == TRUE means the current pkt is the first one of a sequence of pkts to be sunk.
  - “last” == TRUE means the current pkt is the last one of a sequence of pkts to be sunk.
  - “try_action” == 1: drop the pkt if the ring is full; otherwise keep trying until the enqueue succeeds.
  - dl_sink() has a return value to indicate whether the enqueue was successful.
  - If DL_ORDERED is defined and “first” == TRUE, the thread waits for a signal from the previous context before enqueuing the first pkt.
  - If DL_ORDERED is defined and “last” == TRUE, the thread passes the signal to the next context after enqueuing the last pkt.
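The dl_sink() semantics above can be illustrated with a toy host-side model. This is not the microengine implementation: the ring is a plain counter, the inter-context signals are replaced by counters, and `dl_sink_model`, `ring_put`, `ring_drain_one` and `RING_CAP` are hypothetical names. The model also assumes that the next context is signalled even when the last pkt is dropped, which the slide does not state.

```c
#include <stdbool.h>

#define RING_CAP 2            /* hypothetical capacity, tiny on purpose */

struct ring { int count; };

int prev_ctx_waits;           /* times we "waited" on the previous context */
int next_ctx_signals;         /* times we "signalled" the next context */

bool ring_put(struct ring *r)
{
    if (r->count >= RING_CAP)
        return false;         /* ring full */
    r->count++;
    return true;
}

/* Toy downstream consumer: removes one entry so that a retry can succeed. */
void ring_drain_one(struct ring *r)
{
    if (r->count > 0)
        r->count--;
}

bool dl_sink_model(struct ring *r, bool first, bool last, int try_action)
{
    bool ok;

    if (first)
        prev_ctx_waits++;     /* DL_ORDERED: wait for the previous context */

    ok = ring_put(r);
    if (!ok && try_action != 1) {
        /* "try till succeeds": here the toy consumer makes room */
        while (!ring_put(r))
            ring_drain_one(r);
        ok = true;
    }
    /* try_action == 1 and ring full: the pkt is dropped, ok stays false */

    if (last)
        next_ctx_signals++;   /* DL_ORDERED: release the next context */
    return ok;
}
```

The return value lets the caller account for dropped pkts, which matters for Copy when it must release a buffer whose enqueue failed.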
Parse
Functions:
  - Do IP Router checks (wrong ver, Hlen, Plen, cksum), count error pkts.
  - Decrement TTL on pkts from Rx and recompute IP cksum.
  - Detect exceptions (expired TTL, IP option, Non-IP).
  - Extract lookup key.
Input: ring_in data
Output: lookup key, dlNextBlock, eth_type, DG QID.
140 Bit Key (RL, PF and AF): IP DAddr (32b), IP SAddr (32b), P Tag (5b), P (3b), Proto (8b), DPort (16b), SPort (16b), Exceptions (16b), TCP Flags (12b)
Parse
Operations:
  - If pkt is from Rx: read eth_type from the Ethernet hdr. If eth_type != 0x0800, set NIP in exception bits; if eth_type == 0x0806, also set ARP in exception bits. Return.
  - Otherwise (pkt from XScale/plugins): eth_type <= 0x0800.
  - For pkt from Rx, offset <= 0x18E; for pkt from XScale/plugins, get offset from payload buf desc.
  - Read 20B IP hdr. Check ver, Hlen, Plen. If a check fails, count and dlNextBlock <= BID_FREELISTMGR. Return.
  - If Hlen > 5, set OPT in exception bits, read IP option.
  - Verify cksum. If it fails, count and dlNextBlock <= BID_FREELISTMGR. Return.
  - If pkt is from Rx and TTL <= 1, or pkt is from plugins/XScale and TTL < 1, set TTL in exception bits.
  - If pkt is from Rx and TTL > 1, decrement TTL, recompute cksum and write them back to DRAM.
  - Form Datagram QID using DG QID = SA[9:8] SA[6:5] DA[6:5] (used by Copy in case of a zero QID).
  - Extract key from IP hdr.
  - If IP protocol is TCP/UDP, read 14B, and extract key from TCP/UDP hdr.
  - Copy Plugin Tag and In Port to lookup key.
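Three of the Parse steps above lend themselves to a short sketch: the IP header checksum check, the TTL-expiry rule, and the Datagram QID formation. The helper names `ip_hdr_cksum`, `ttl_expired` and `dg_qid` are invented for this example, and the checksum is the standard ones' complement sum over big-endian 16-bit words rather than code taken from the router source. DG QID is interpreted here as a 6-bit concatenation, with SA[9:8] in the top two bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones' complement checksum over an IP header given as raw bytes.
 * Over a header whose cksum field is correct, the result is 0.
 * Over a header whose cksum field is zeroed, the result is the cksum to store. */
uint16_t ip_hdr_cksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* TTL exception rule from the slide: Rx pkts expire at TTL <= 1,
 * pkts from plugins/XScale expire at TTL < 1. */
int ttl_expired(int from_rx, unsigned ttl)
{
    return from_rx ? (ttl <= 1) : (ttl < 1);
}

/* DG QID = SA[9:8] SA[6:5] DA[6:5], read as a 6-bit concatenation. */
unsigned dg_qid(uint32_t sa, uint32_t da)
{
    return (((sa >> 8) & 0x3u) << 4) |
           (((sa >> 5) & 0x3u) << 2) |
            ((da >> 5) & 0x3u);
}
```

After a TTL decrement, the same function can recompute the checksum by zeroing bytes 10 and 11 of the header first and storing the returned value there.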
Lookup Key and Results Formats
140 Bit Key (RL, PF and AF): IP DAddr (32b), IP SAddr (32b), P Tag (5b), P (3b), Proto (8b), DPort (16b), SPort (16b), Exceptions (16b), TCP Flags (12b)
32 Bit Result in TCAM Assoc. Data SRAM:
  - PF: Prio (8b), D (1b), H, MH, Address (21b)
  - AF: Res (8b), D (1b), H, MH, Address (21b)
  - RL: Res (8b), D (1b), H, MH, Address (21b)
  - TCAM Ctrl Bits: D: Done, H: HIT, MH: Multi-Hit
96 Bit Result in QDR SRAM Bank0:
  - PF: V (4b), UCast MCast (12b), QID (16b), Stats Index (16b), NH_MAC (48b), NH_IP (32b), Res (16b)
  - AF: V (4b), SB (2b), Res (2b), UniCast (8b), QID (16b), Stats Index (16b), NH_MAC (48b), NH_IP (32b), Res (16b)
  - RL: V (4b), UCast MCast (12b), QID (16b), Stats Index (16b), NH_MAC (48b), NH_IP (32b), Res (16b)
  - V (4b): Entry Valid (1b), NH IP, NH MAC, IP MC valid bits
  - UCast MCast (12b):
      - If IP MC Valid = 0: D (1b), PPS, UCast Out Port (3b), Out Plugin, Reserved (4b)
      - If IP MC Valid = 1: Multicast Copy Vector (11b), PPS (1b)
Lookup Overview
Initialization
  - Control Plane initializes TCAM and Route and Filter DBs
Runtime Updates
  - Control Plane updates to Route and Filter DBs
Design: in upcoming slides
Processing: in upcoming slides
Lookup will be written in C
  - There are many things about writing IXP code in “C” that I need to learn. Here are some of them:
      - Performing multiple memory operations in parallel and waiting on a set of signals (if needed for performance reasons)
      - Performing timestamp waits
      - Calling IDT microcode macros
Lookup: Design -- Databases
Three Databases:
  - Route Lookup:
      - Unicast: sorted by DAddr Prefix Length
      - Multicast: exact match on DAddr and prefix of SAddr
  - Primary Filter:
      - Filters should be sorted in the DB with higher priority filters first
  - Auxiliary Filter
Priority between Primary Filter and Route Lookup:
  - A priority will be stored with each Primary Filter.
  - A priority will be assigned to RLs (all routes have the same priority).
  - PF priority and RL priority are compared after the result is retrieved. One of them will be selected based on this priority comparison.
Auxiliary Filters:
  - If matched, cause a copy of the packet to be sent out according to the Aux Filter’s result.
Lookup: Design -- Results
Use SRAM Bank 0 (2 MB per NPU) for Results
  - B0 Byte Address Range: 0x000000 to 0x1FFFFF (21 bits)
  - B0 Word Address Range: 0x000000 to 0x1FFFFC (19 significant bits, 2 trailing 0’s)
Store result in two parts:
  - 32-bit Associated Data SRAM result, holding the address of the actual Result:
      - TCAM Control Bits (3b): Done: 1b, Hit: 1b, MHit: 1b
      - Priority: 8b. Present for Primary Filters; for RL and Aux Filters it should be 0.
      - SRAM B0 Word Address: 21b
          - 2 spare bits if needed for anything else
  - 3 Words (<= 96 bits) of Result in SRAM Bank0
Use Multi-Database Lookup (MDL) Indirect for searching all 3 DBs
  - Order of fields in Key is important.
  - Each thread will need one TCAM context.
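As an illustration of the 32-bit Associated Data word, the following sketch unpacks it into its fields. The bit positions are an assumption: they follow the order in which the format slide lists the fields (Priority, D, H, MH, Address), read from most significant to least significant bit. `struct ad_result` and `ad_decode` are hypothetical names.

```c
#include <stdint.h>

struct ad_result {
    unsigned prio;   /* 8b, non-zero only for Primary Filters */
    unsigned done;   /* TCAM Done bit      */
    unsigned hit;    /* TCAM Hit bit       */
    unsigned mhit;   /* TCAM Multi-Hit bit */
    uint32_t addr;   /* 21b address of the full result in SRAM Bank0 */
};

/* ASSUMED layout: Prio[31:24] | D[23] | H[22] | MH[21] | Address[20:0]. */
struct ad_result ad_decode(uint32_t w)
{
    struct ad_result r;

    r.prio = (w >> 24) & 0xFFu;
    r.done = (w >> 23) & 0x1u;
    r.hit  = (w >> 22) & 0x1u;
    r.mhit = (w >> 21) & 0x1u;
    r.addr = w & 0x1FFFFFu;   /* word aligned, so the low 2 bits are 0 */
    return r;
}
```

Because results are word aligned, the two low address bits are always zero, which is where the "2 spare bits" noted on the slide come from.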
Lookup Processing
write KEY to TCAM
use timestamp delay to wait appropriate time
    make the delay long enough that we are as sure as possible that we will have to read the 1st word of the Results MB only once
while !DoneBit    // DONE Bit BUG Fix requires reading just first word
    read 1 word from Results Mailbox and check DoneBit
done
read words 2 and 3 from Results Mailbox
if (PrimaryFilter AND RouteLookup results HIT) {
    PrimaryResult.Valid <= TRUE
    compare priorities
    store higher priority result as Primary Result (read result from SRAM Bank0)
} else if (PrimaryFilter results HIT) {
    PrimaryResult.* <= PrimaryFilter.* (read result from SRAM Bank0)
} else if (RouteLookup results HIT) {
    PrimaryResult.* <= RouteLookup.* (read result from SRAM Bank0)
} else
    PrimaryResult.Valid <= FALSE
if (AuxiliaryFilter result HIT) {
    AuxiliaryResult.Valid <= TRUE
    AuxiliaryResult.* <= AuxiliaryFilter.* (read result from SRAM Bank0)
} else
    AuxiliaryResult.Valid <= FALSE
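The primary-result selection in the processing steps above reduces to a small decision function. The sketch below is a guess at its shape, not the actual lookup.c code. In particular, the slides do not say whether a numerically smaller or larger value means higher priority, nor how ties are broken; this example assumes smaller means higher priority and that ties go to the Primary Filter. `select_primary` and the `PRIMARY_*` constants are hypothetical.

```c
#include <stdbool.h>

enum primary_src { PRIMARY_NONE, PRIMARY_PF, PRIMARY_RL };

/* ASSUMPTION: a numerically smaller priority value wins; ties go to PF. */
enum primary_src select_primary(bool pf_hit, unsigned pf_prio,
                                bool rl_hit, unsigned rl_prio)
{
    if (pf_hit && rl_hit)
        return (pf_prio <= rl_prio) ? PRIMARY_PF : PRIMARY_RL;
    if (pf_hit)
        return PRIMARY_PF;
    if (rl_hit)
        return PRIMARY_RL;
    return PRIMARY_NONE;      /* PrimaryResult.Valid <= FALSE */
}
```

Only the selected database's full result then needs to be read from SRAM Bank0, in addition to the Auxiliary Filter result when it hits.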
Exception Bits in Lookup Key
140 Bit Key (RL, PF and AF): IP DAddr (32b), IP SAddr (32b), P Tag (5b), P (3b), Proto (8b), DPort (16b), SPort (16b), Exceptions (16b), TCP Flags (12b)
Exceptions (16b): Reserved (12b), Non-IP (1b), ARP (1b), IP Opt (1b), TTL (1b)
Exception Bits:
  - TTL: TTL has expired. It was 0 or 1 on the arriving packet.
  - IP Opt: IP packet contained Options.
  - ARP: Ethertype field in the Ethernet header was ARP.
  - Non-IP: Ethertype field in the Ethernet header was NOT IP.
NOTE: An ARP packet will have both the ARP bit and the Non-IP bit set.
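The note about ARP packets setting two bits is easy to see in code. In this sketch the bit positions are assumed from the order of the fields on the slide, read MSB to LSB (TTL in bit 0, IP Opt in bit 1, ARP in bit 2, Non-IP in bit 3); the `EXC_*` names and `exc_for_ethertype` are made up for illustration.

```c
#include <stdint.h>

/* ASSUMED bit positions within the 16-bit Exceptions field. */
#define EXC_TTL    (1u << 0)
#define EXC_IPOPT  (1u << 1)
#define EXC_ARP    (1u << 2)
#define EXC_NONIP  (1u << 3)

#define ETH_TYPE_IP   0x0800u
#define ETH_TYPE_ARP  0x0806u

/* Exception bits implied by the Ethertype alone. */
uint16_t exc_for_ethertype(uint16_t eth_type)
{
    uint16_t exc = 0;

    if (eth_type != ETH_TYPE_IP)
        exc |= EXC_NONIP;     /* anything that is not IP */
    if (eth_type == ETH_TYPE_ARP)
        exc |= EXC_ARP;       /* ARP is also Non-IP, so both bits end up set */
    return exc;
}
```

A filter that wants only ARP traffic can therefore match on the ARP bit, while a filter for all non-IP traffic matches on the Non-IP bit alone.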
Lookup Block Diagram
Steps, with memory access and latency:
  - Setup Lookup Key
  - Write Lookup Key to TCAM: SRAM Write: 5W, TimeStamp Delay; 315 cycles; ctx_swap
  - Read 1W Result from AD: SRAM Read: 1W; 150 cycles; ctx_swap
  - Check Done Bit
  - Read 2W Result from AD: SRAM Read: 2W; 150 cycles; ctx_swap
  - Read 2 Full Results from QDR: SRAM Read: 3W; 150 cycles; ctx_swap; then SRAM Read: 3W; 150 cycles
  - Setup Results for Copy
TOTAL (no optimization): 915 cycles
Lookup
File locations
Code:
  - src/applications/ONL_Router/src/plc/ONL/lookup.c
Include Paths:
  - src/applications/ONL_Router/src/dispatch_loop/ONL/
      - dl_source.h and dl_source.c: dl_source() and dl_sink() functions
  - src/IDT_NSE/data_place_IXP2XXX/include
      - IDT IIPC defines and macros
  - others?
Copy
Functions:
  - Drop error pkts detected by Parse.
  - Count and send PT pkts to QM.
  - Process lookup results:
      - When “control error” (NH_IP and NH_MAC both invalid or both valid), or “ARP request needed” (unicast pkt with valid NH_IP but invalid NH_MAC), or “no route” (invalid PR and AR) is detected, the pkt should be sent to XScale, or to one of the five plugins, or dropped, based on user preference (dynamically configurable).
      - Compute MAC DAddr for IP multicast pkts.
      - Update total ref_cnt in payload buf desc for each classified pkt.
      - If total ref_cnt == 1 and pkt goes to QM, fill in payload buf desc with NH_MAC, Stats index and EtherType.
      - If total ref_cnt > 1, add hdr buf to each pkt going to QM, and fill in hdr buf desc with buffer_next, packet_size, ref_cnt (=1), NH_MAC, Stats index, EtherType.
      - For each copy, form ring_out data and enqueue it to the outgoing ring.
Input: ring_in data, exception bits, dlNextBlock, eth_type, DG QID, lookup results
Output: ring_out data
Copy
Operations:
  - Error pkt detected by Parse (to freelist_mgr):
      - Construct ring_out data to Freelist_mgr.
      - Call dl_sink(TRUE, TRUE, 0). Return.
  - PT pkt (to QM):
      - dlNextBlock <= BID_QM
      - Construct ring_out data to QM from ring_in data.
  - Classified pkt:
      - Pre-check:
          - Copy exception bits to flags.
          - If PR and AR are both invalid, this pkt has no route. Set NR bit in flags. Set dlNextBlock to user preference. Construct ring_out data. Call dl_sink(TRUE, TRUE, 1). Return.
      - Pre-process lookup results, because:
          - Both PR and AR can generate pkt copies to QM and XScale/plugins.
          - Need to know the total number of copies to:
              - Decide whether a hdr buf desc should be added to the copy going to QM.
              - Set “last” bit in dl_sink().
              - Update total ref_cnt in payload buf desc using one atomic addition.
      - Post-process lookup results, copy and send pkts to proper rings.
Copy
Pre-processing data structure:
struct copy_qm_prep {
    unsigned int pkt_cnt;
    bool pr_ucast;
    bool pr_mcast;
    bool ar_ucast;
} qm;
struct copy_plugins_prep {
    unsigned int pkt_cnt;
    bool pr_arp;
    bool pr_nh_inv;
    bool pr_ucast;
    bool pr_mcast;
    bool ar_arp;
    bool ar_nh_inv;
    bool ar_ucast;
} plugins;
PR: V (4b), UCast MCast (12b), QID (16b), Stats Index (16b), NH_MAC (48b), NH_IP (32b), Res (16b)
AR: V (4b), SB (2b), Res, UniCast (8b), QID (16b), Stats Index (16b), NH_MAC (48b), NH_IP (32b), Res (16b)
Annotations on the slide: “Pre-process fields”, “Post-process fields”, “Only one can be TRUE”.
Copy (pre-process PR)
PR: V (4b), UCast MCast (12b), QID (16b), Stats Index (16b), NH_MAC (48b), NH_IP (32b), Res (16b)
  - V (4b): Entry Valid (1b), NH IP, NH MAC, IP MC valid bits
  - UCast MCast (12b):
      - If IP MC Valid = 0: D (1b), PPS, UCast Out Port (3b), Out Plugin, Reserved (4b)
      - If IP MC Valid = 1: Multicast Copy Vector (11b), PPS (1b)
If PR is valid,
  - If NH_IP is valid and NH_MAC is invalid and IP_MC is invalid, this pkt needs ARP. Set ARP bit in flags. If pkt should be sent to XScale/plugins, plugins.pkt_cnt ++; plugins.pr_arp <= TRUE.
  - Else if NH_IP and NH_MAC are both invalid or both valid, lookup entry is mis-configured. Set NH_INV bit in flags. plugins.pkt_cnt ++; plugins.pr_nh_inv <= TRUE.
Copy (pre-process PR)
UCast MCast (12b):
  - Unicast: D (1b), PPS, UCast Out Port (3b), Out Plugin, Reserved (4b)
  - Multicast: Multicast Copy Vector (11b), PPS (1b)
  - Else,
      - If MC_valid == 0 (unicast),
          - If Drop == 0,
              - If PPS == 0, qm.pkt_cnt ++, qm.pr_ucast <= TRUE.
              - Else (PPS == 1), plugins.pkt_cnt ++, plugins.pr_ucast <= TRUE.
      - Else (MC_valid == 1 (multicast)),
          - If PPS == 0,
              - qm.pkt_cnt <= total 1’s in high 5 bits in MCast_copyVector;
              - plugins.pkt_cnt <= total 1’s in low 6 bits in MCast_copyVector.
              - If qm.pkt_cnt > 0, qm.pr_mcast <= TRUE.
              - If plugins.pkt_cnt > 0, plugins.pr_mcast <= TRUE.
          - Else (PPS == 1),
              - plugins.pkt_cnt <= total 1’s in low 6 bits in MCast_copyVector;
              - If plugins.pkt_cnt > 0, plugins.pr_mcast <= TRUE.
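The multicast branch above amounts to counting set bits in two slices of the 11-bit copy vector. A minimal sketch follows, assuming the "high 5 bits" are bits 10:6 and the "low 6 bits" are bits 5:0; `count_ones` and `mcast_counts` are hypothetical helpers, and the counters are returned through pointers instead of the qm/plugins structures to keep the example short.

```c
/* Count the 1 bits in v (portable loop, no compiler builtins). */
unsigned count_ones(unsigned v)
{
    unsigned n = 0;

    while (v) {
        n += v & 1u;
        v >>= 1;
    }
    return n;
}

/* Split the 11-bit Multicast Copy Vector into QM and plugin copy counts.
 * ASSUMPTION: bits 10:6 select QM copies, bits 5:0 select plugin/XScale copies. */
void mcast_counts(unsigned copy_vec, unsigned pps,
                  unsigned *qm_cnt, unsigned *plugins_cnt)
{
    unsigned high5 = (copy_vec >> 6) & 0x1Fu;
    unsigned low6  = copy_vec & 0x3Fu;

    *qm_cnt      = (pps == 0) ? count_ones(high5) : 0;  /* PPS == 1: no QM copies */
    *plugins_cnt = count_ones(low6);
}
```

For example, a vector with bits 10, 8, 1 and 0 set yields two QM copies and two plugin copies when PPS == 0, and only the two plugin copies when PPS == 1.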
Copy (pre-process AR)
AR: V (4b), SB (2b), Res, UniCast (8b), QID (16b), Stats Index (16b), NH_MAC (48b), NH_IP (32b), Res (16b)
  - V (4b): Entry Valid (1b), NH IP, NH MAC, IP MC valid bits
  - UniCast (8b): D (1b), PPS, UCast Out Port (3b), Out Plugin
If AR is valid,
  - If NH_IP is valid and NH_MAC is invalid, this pkt needs ARP. Set ARP bit in flags. If pkt should be sent to XScale/plugins, plugins.pkt_cnt ++; plugins.ar_arp <= TRUE.
  - Else if NH_IP and NH_MAC are both invalid or both valid, lookup entry is mis-configured. Set NH_INV bit in flags. plugins.pkt_cnt ++; plugins.ar_nh_inv <= TRUE.
Copy (pre-process AR)
AR fields used: SB, Rsv (2b), D (1b), PPS, UCast Out Port (3b), Out Plugin
  - Else,
      - Each SB value is associated with a sampling rate rt(SB) that is dynamically configurable by users.
      - Generate a random number rd.
      - If rd <= rt(SB),
          - If PPS == 0, qm.ar_ucast <= TRUE, qm.pkt_cnt ++.
          - Else (PPS == 1), plugins.ar_ucast <= TRUE, plugins.pkt_cnt ++.
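The sampling decision for Auxiliary Filter matches can be sketched as a threshold comparison. The rate table values below are invented for illustration (the slides only say that rt(SB) is user configurable), the random number is passed in by the caller so the function stays deterministic, and `struct prep_counts` is a simplified stand-in for the qm and plugins pre-processing structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL thresholds: SB=0 always samples, then 1/2, 1/4, 1/8. */
uint32_t rt[4] = { 0xFFFFFFFFu, 0x7FFFFFFFu, 0x3FFFFFFFu, 0x1FFFFFFFu };

struct prep_counts {
    unsigned qm_pkt_cnt;
    unsigned plugins_pkt_cnt;
    bool     qm_ar_ucast;
    bool     plugins_ar_ucast;
};

/* rd is a caller-supplied random number; a copy is made when rd <= rt(SB). */
void aux_sample(struct prep_counts *p, unsigned sb, unsigned pps, uint32_t rd)
{
    if (rd > rt[sb & 0x3u])
        return;                       /* not sampled: no aux copy */

    if (pps == 0) {
        p->qm_ar_ucast = true;
        p->qm_pkt_cnt++;
    } else {
        p->plugins_ar_ucast = true;
        p->plugins_pkt_cnt++;
    }
}
```

Keeping the random source outside the function makes the rule easy to test; on the microengine the random number would come from whatever generator the router code uses.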
Copy (post-process)
(qm and plugins are the copy_qm_prep and copy_plugins_prep structures from the pre-processing slide; PR and AR are the lookup results.)
  - If qm.pkt_cnt + plugins.pkt_cnt == 0, dlNextBlock <= BID_FREELISTMGR, dl_sink(TRUE, TRUE, 0). Return.
  - Read ref_cnt from payload buf desc.
  - If qm.pkt_cnt + plugins.pkt_cnt + (ref_cnt - 1) == 1,
      - If qm.pkt_cnt == 1,
          - DO NOT add hdr buf desc.
          - Based on qm.pr_ucast, qm.pr_mcast, qm.ar_ucast, fill in NH_MAC, stats index, and eth_type in payload buf desc.
          - Construct QM ring_out data. dlNextBlock <= BID_QM. dl_sink(TRUE, TRUE, 0). Return.
      - Else (plugins.pkt_cnt == 1),
          - Based on plugins.*, construct plugins/XScale ring_out data.
          - Set dlNextBlock to XScale or one of the five plugins. dl_sink(TRUE, TRUE, 1). Return.
Copy (post-process)
  - Else (qm.pkt_cnt + plugins.pkt_cnt + (ref_cnt - 1) > 1),
      - Add (qm.pkt_cnt + plugins.pkt_cnt - 1) to ref_cnt in payload buf desc.
      - If qm.pkt_cnt > 0, for each copy to QM, add hdr buf desc.
          - Based on qm.pr_ucast, qm.pr_mcast, qm.ar_ucast, fill in buffer_next, packet_size, ref_cnt (=1), NH, stats index, and eth_type in hdr buf desc.
          - Construct QM ring_out data. dlNextBlock <= BID_QM.
          - Call dl_sink() qm.pkt_cnt times.
          - If plugins.pkt_cnt == 0, set “last” bit in dl_sink() for the last copy and return.
      - If plugins.pkt_cnt > 0,
          - Based on plugins.*, construct plugins/XScale ring_out data.
          - Set dlNextBlock to XScale or one of the five plugins.
          - Call dl_sink() plugins.pkt_cnt times. Set “last” bit in dl_sink() for the last copy and return.
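The post-processing rules reduce to a small planning step that decides whether the pkt is dropped, whether hdr bufs are needed, and what ref_cnt the payload buf desc ends up with. The sketch below captures just that arithmetic; `struct copy_plan` and `plan_copies` are hypothetical names, and the real code performs the ref_cnt update as an atomic addition on the buffer descriptor rather than returning a value.

```c
#include <stdbool.h>

struct copy_plan {
    bool     drop;         /* no copies at all: send to freelist mgr */
    bool     add_hdr_buf;  /* each QM copy needs its own hdr buf desc */
    unsigned new_ref_cnt;  /* ref_cnt of the payload buf afterwards */
};

struct copy_plan plan_copies(unsigned qm_cnt, unsigned plugins_cnt,
                             unsigned ref_cnt)
{
    struct copy_plan plan = { false, false, ref_cnt };
    unsigned copies = qm_cnt + plugins_cnt;

    if (copies == 0) {
        plan.drop = true;                      /* dlNextBlock <= BID_FREELISTMGR */
        return plan;
    }
    if (copies + (ref_cnt - 1) == 1)
        return plan;                           /* single owner: no hdr buf, no update */

    plan.new_ref_cnt = ref_cnt + copies - 1;   /* one addition for all copies */
    plan.add_hdr_buf = (qm_cnt > 0);
    return plan;
}
```

Note that a pkt arriving from a plugin with ref_cnt == 2 still takes the multi-copy path even when only one new copy is made, because the payload buffer is already shared.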
PLC
Typical Case: valid IP pkt from Rx, no option, goes to one port as a result of PR. AR entry is invalid.
Block: Operation; Memory Access; Cycle Count
  - dl_source: Dequeue pkt from MUX; 3LW Scratch Ring read; 60
  - Parse: Read eth_type, IP, TCP/UDP hdr; Aligned 10LW DRAM read; 300
  - Parse: Write TTL, IP cksum; Aligned 4LW DRAM write; 300
  - Lookup: Lookup; 915
  - Copy: Write NH MAC, stats index, eth_type; Aligned 3LW SRAM write; 150
  - dl_sink: Enqueue pkt to QM; 3LW Scratch Ring write; 60
Total: 1785