John DeHart Design of a Diversified Router: Lookup Block with All Associated Data in SRAM John DeHart




Revision History
- 5/23/06, 5/25/06, 5/26/06 (JDD): Changes for all Associated Data in SRAM
- 5/25/06 (JDD): Put Port # back in MR Results
- 5/26/06 (JDD): Added data format from Lookup block to downstream neighbor.
- 5/30-5/31/06 (JDD): Clean up definition of data going from Lookup block to Hdr Format blocks.

Issues to investigate
Questions/Issues that came up 5/16/06:
- Negation bit: match everything but this key
- Exclusive/Non-exclusive Filters
  - GM filters for monitoring (makes a copy of packet)
- Protocol field “trick” to shorten GM filter Key
  - 2 bits to define: UDP, TCP, Other. Maybe even expand it to 4 bits.
  - For Other, the full 8-bit protocol field overlaps a TCP/UDP Port field
  - Even better, use this trick with the TCP_Flags field.
- 76 Bytes as minimum size frame for judging performance:
  - 64 Byte minimum Ethernet Frame
  - 96 bit (12 byte) Ethernet inter-frame spacing
- To increase the lookup rate we might need to move one of the LC Associated Data storage to SRAM
  - Probably keep them both in TCAM AD for November and then look at modifying it in the next phase of Lookup block development.
- Multicast
  - Separate Multicast DB
  - MHL on Multicast DB yielding 8 32-bit AD Results
    - Actually 29 useful bits per Result
    - Maximum of 232 bits
  - QID (20b) and MI (16b) specified for each copy
    - 232/36 = 6 copies
    - We’d need to get the result down to 29 bits to support 8 copies
  - Is there any way to make use of the Loopback block to make more copies?
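The multicast copy count above is plain integer arithmetic, and a minimal sketch makes it easy to re-run with other per-copy widths. The constant and function names here are illustrative assumptions, not names from the design.

```python
# Multi-Hit Lookup (MHL) budget from the slide: 8 results, 29 useful bits each.
MHL_RESULTS = 8
USEFUL_BITS_PER_RESULT = 29
QID_BITS = 20
MI_BITS = 16


def max_multicast_copies(bits_per_copy: int) -> int:
    """Number of whole per-copy records that fit in one MHL response."""
    total_bits = MHL_RESULTS * USEFUL_BITS_PER_RESULT  # 232 bits
    return total_bits // bits_per_copy


# QID + MI per copy (36 bits) versus a hypothetical 29-bit per-copy record.
print(max_multicast_copies(QID_BITS + MI_BITS))
print(max_multicast_copies(USEFUL_BITS_PER_RESULT))
```

With 36 bits per copy the budget yields 6 copies; only a 29-bit record reaches 8, which matches the slide's observation.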

Overview These slides are as much a definition of what is NOT in the Lookup Block as they are what is. In defining what is not in the Lookup Block I am putting some requirements on other blocks. These requirements have to do with where fields are added to frame headers. Not everything can or needs to be kept in the TCAM. There are also: Constants Fields that have to be calculated for each frame Fields that are configurable per Blade or per physical interface. Etc. Also, there is a lot of information about the TCAM here. And, finally, a design for the Lookup Block(s).

Architecture Review
First let's review the architecture of the Promentum™ ATCA-7010 card which will be used to implement our LC and NP Blades:
- Two Intel IXP2850 NPs
  - 1.4 GHz Core
  - 700 MHz XScale
- Each NPU has:
  - 3x256MB RDRAM, 533 MHz
  - 4 QDR II SRAM Channels
    - Channels 1, 2 and 3 populated with 8MB each running at 200 MHz
    - Channel 0: TCAM with an associated ZBT SRAM
      - 2 MB of QDR-II SRAM for EACH NPU
  - 16KB of Scratch Memory
  - 16 Microengines
    - Instruction Store: 8K 40-bit wide instructions
    - Local Memory: 640 32-bit words
- TCAM: Network Search Engine (NSE) on SRAM channel 0
  - Each NPU has a separate LA-1 Interface
  - Part Number: IDT75K72234
  - 18Mb TCAM

NP Blades

TCAM HW Details
CAM Size:
- Data: 256K 72-bit entries
- Mask: 256K 72-bit entries
- Organized into Segments.
Segments:
- Each Segment is 8K 72-bit entries
- 32 Segments
- Segments are not shared between Databases.
  - Minimum database size is therefore 8K 72-bit entries.
- Databases wider than 72 bits use sequential entries in a segment to make up longer entries
  - 36b DB has 16K entries per segment
  - 72b DB has 8K entries per segment
  - 144b DB has 4K entries per segment
  - 288b DB has 2K entries per segment
  - 576b DB has 1K entries per segment
- Segments can be dynamically added to a Database as it grows
  - More on this feature in a future issue of the IDT User Manual…
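The per-segment entry counts follow directly from the segment size, so they can be derived rather than memorized. The sketch below is only a sanity check of the slide's numbers; the names are assumptions made for illustration.

```python
K = 1024

# Figures taken from the slide: 32 segments of 8K 72-bit core entries.
SEGMENTS = 32
CORE_ENTRIES_PER_SEGMENT = 8 * K
CORE_ENTRY_BITS = 72
DB_CORE_SIZES = (36, 72, 144, 288, 576)


def entries_per_segment(db_width_bits: int) -> int:
    """Database entries that fit in one segment for a given core size."""
    segment_bits = CORE_ENTRIES_PER_SEGMENT * CORE_ENTRY_BITS
    return segment_bits // db_width_bits


for width in DB_CORE_SIZES:
    print(f"{width}b DB: {entries_per_segment(width) // K}K entries per segment")

# Total data array size in megabits (the part is described as an 18Mb TCAM).
total_mbits = SEGMENTS * CORE_ENTRIES_PER_SEGMENT * CORE_ENTRY_BITS / (K * K)
print(f"total: {total_mbits:.0f} Mb")
```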

TCAM HW Details
- Number of Databases available: 16
- Database Core Sizes: 36b, 72b, 144b, 288b, 576b
  - Core size implies how many CAM core entries are used per DB entry
- Key/Entry size
  - Can be different for each Database.
  - Key/Entry size <= Database Core Size
  - Key/Entry size tells us how many memory access cycles it will take to get the Key into the TCAM across the 16-bit wide QDR II SRAM interface.
- Result Type
  - Absolute Index: relative to beginning of CAM
  - Database Relative Index: relative to beginning of Database
  - Memory Pointer: Translation based on database configuration registers
    - Base address
    - Result size
  - TCAM Associated Data of width 32, 64 or 128 bits

TCAM HW Details
Memory Usage: Results can be stored in TCAM Associated Data SRAM or IXP SRAM.
- TCAM Associated Data
  - 512K x 36 bit ZBT SRAM (4 bits of parity)
  - Supports 256K 64-bit Results
    - If used for Ingress and Egress then 128K in each direction
  - Supports 128K 128-bit Results
    - If used for Ingress and Egress then 64K in each direction
  - Results deposited directly in Results Mailbox
- IXP QDR II SRAM Channel
  - 2 x 2Mx18 (effective 4M x 18b)
  - 4 times as much as the TCAM ZBT SRAM.
  - Supports 1024K 64-bit Results
    - If used for Ingress and Egress then 512K in each direction
  - Supports 512K 128-bit Results
    - If used for Ingress and Egress then 256K in each direction
  - Read Results Mailbox to check Hit bit and to get Index or Memory Pointer
    - Then read SRAM for actual Result.
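The result capacities quoted for the two memories can be reproduced from the word counts and data widths. This is a rough check under one stated assumption: the slide gives 4 parity bits per 36-bit ZBT word, and the sketch assumes 2 parity bits per 18-bit QDR word (not stated on the slide, but consistent with its totals). All names are illustrative.

```python
K = 1024


def result_capacity(words: int, data_bits_per_word: int, result_bits: int) -> int:
    """How many results of result_bits fit in a memory of the given shape."""
    return words * data_bits_per_word // result_bits


# TCAM Associated Data: 512K x 36b ZBT SRAM, 4 parity bits -> 32 data bits.
ZBT_WORDS, ZBT_DATA_BITS = 512 * K, 36 - 4
# IXP QDR II channel: effective 4M x 18b; ASSUMED 2 parity bits -> 16 data bits.
QDR_WORDS, QDR_DATA_BITS = 4 * K * K, 18 - 2

for name, words, bits in (("ZBT", ZBT_WORDS, ZBT_DATA_BITS),
                          ("QDR", QDR_WORDS, QDR_DATA_BITS)):
    for result_bits in (64, 128):
        total = result_capacity(words, bits, result_bits)
        # Sharing between Ingress and Egress halves the per-direction count.
        print(f"{name}: {total // K}K {result_bits}-bit results, "
              f"{total // 2 // K}K per direction")
```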

TCAM HW Details
Lookup commands supported:
- Direct: Command is encoded in 2b Instruction field on Address bus
- Indirect: Instruction field = 11b, Command encoded on Data bus.
- Lookup (Direct)
  - 1 DB, 1 Result
- Multi-Hit Lookup (Direct)
  - 1 DB, <= 8 Results
- Simultaneous Multi-Database Lookup (Direct)
  - 2 DB, 1 Result Each
  - DBs must be consecutive!
- Multi-Database Lookup (Indirect)
  - <= 8 DB, 1 Result Each
- Simultaneous Multi-Database Lookup (Indirect)
  - Functionally same as Direct version but key presentation and DB selection are different.
  - DBs need not be consecutive.
- Re-Issue Multi-Database Lookup (Indirect)
  - Search Key can be modified for each DB being searched.
  - First 32 bits of search key can be specified for each
  - Rest of key is same for each.

TCAM HW Details
Mask Registers Notes (mostly for reference)
- When are these used?
  - I think we will need one of these for each database that is to be used in a Multi Database Lookup (MDL), where the database entries do not actually use all the bits in the corresponding core size.
  - For example: a 32-bit lookup would have a core size of 36 bits and so would need a GMR configured as 0xFFFFFFFF0 to mask off the low order 4 bits when it is used in an MDL where there are larger databases also being searched.
- 64 72-bit Global Mask Registers (GMR)
  - Can be combined for different database sizes
  - 36-bit databases have access to 31 out of a total of 64 GMRs
    - A bit in the configuration for a database selects which half of the GMRs can be used
    - A field in each lookup command selects which specific GMR is to be used with the lookup key.
    - Value of 0x1F (31) is used in the command to indicate no GMR is to be used. Hence, 36-bit lookups cannot use all 32 GMRs in their half.
  - 72-bit databases have access to 31 out of a total of 64 GMRs
    - Value of 0x1F (31) is used in the command to indicate no GMR is to be used. Hence, 72-bit lookups cannot use all 32 GMRs in their half.
  - 144-bit lookups have 32 GMRs available to them.
  - 288-bit lookups have 16 GMRs available to them.
  - 576-bit lookups have 8 GMRs available to them.
- Each lookup command can have one GMR associated with it.

TCAM Usage Notes
- Database Types are defined and managed by the IMS Software.
  - The Type of the Database is defined in the software only.
  - It tells the software how to define and use masks and priorities (weights).
  - Allows the software to provide to the user a more flexible way to specify entries.
- Types of Databases:
  - Longest Prefix Match (LPM): Mask matches length of prefix
  - Exact Match (EM): Mask matches full Entry size
  - Best/Range Match: What we typically call General Match. Mask is completely general.
- Priority:
  - Priority within a database is done by order of the entries.
  - Exact Match should not need priority within the database since only one Entry should match a supplied Key.
  - LPM and Best/Range Match do use priority within the databases. So, the order in which the entries are stored in these databases is important.
  - For LPM DBs we would want to group prefixes by length in the TCAM. And this is almost certainly what the IDT software does.
  - Changing priorities on existing entries may cause us some problems.
    - It appears that the only way to change the priority of a Best/Range Match entry might be to write a new entry in a different location (different priority) and then delete the old entry.
    - Changing the priority of an LPM entry really would mean changing its prefix.
  - The IDT software uses a weight assigned to Entries as they are added for LPM and Best/Range Match.
    - I believe this weight is just used to group entries of the same weight together and to ensure that entries are ordered based on their weights as they are added.

TCAM Performance
- Three Factors that affect performance:
  - Lookup Size (Entry/Key)
  - Associated Data Width (Result)
  - CAM Core Lookup Rate
- IXP/TCAM LA-1 Interface
  - 16 bits wide
  - 200 MHz QDR II SRAM Interface
  - Effectively 32 bits per clock tick
  - So getting Key in is 32 bits/tick
  - Example: 128b Key would take 4 ticks to get clocked into TCAM.
    - Max of 50 M Lookups/sec
- Table on next slide shows some of the performance numbers for some Sizes that are of interest to us.
- What we’ll see a little later is that in the worst case, we need a TOTAL Lookup rate of 12.5 M/sec (6.25 M/sec on each LA-1 interface)
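The 128b example generalizes: the number of ticks needed to clock a key in bounds the lookup rate on one LA-1 interface. The sketch below models only that key-transfer bound; it deliberately ignores the CAM core rate and Associated Data width, which the slide lists as separate factors. Function names are assumptions for illustration.

```python
import math

LA1_CLOCK_HZ = 200e6       # 200 MHz QDR II SRAM interface
BITS_PER_TICK = 32         # 16 bits wide, effectively 32 bits per clock tick


def key_ticks(key_bits: int) -> int:
    """Clock ticks needed to move a key of key_bits across LA-1."""
    return math.ceil(key_bits / BITS_PER_TICK)


def max_la1_lookup_rate(key_bits: int) -> float:
    """Upper bound on lookups/sec set by key transfer alone."""
    return LA1_CLOCK_HZ / key_ticks(key_bits)


print(key_ticks(128), max_la1_lookup_rate(128) / 1e6)
```

For a 128-bit key this gives 4 ticks and a 50 M lookups/sec ceiling, matching the slide's example.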

TCAM Performance (Rates in M/sec)
Table columns: Lookup Size, #LA-1 Words, Core Size, Assoc. Data, Single LA-1 Max Rate, Max Core Rate, Avg Shared Rate (Each of 2 LA-1s)
Table values as extracted: 32 1 36 50 25 64 128 12.5 2 72 100 3 67 4 144 5 40 160 288
Rows marked: LC_Egress, LC_Ingress

TCAM Software Several software components exist, enough to be really confusing. IDT Libraries: MicroEngine Libraries: NSE-QDR Data Plane Macro (DPM) API Iipc.uc and Iipc.h IIPC: Integrated IP Co-processor Microengine Lookup Library (MLL) IipcMll.uc 5 slightly higher level macros than Iipc.uc XScale: Lookup Management Library (LML) Control Plane: Initialization Management and Search (IMS) Library Simulation: NSE with Dual QDR Interfaces IDT75K234SLAM Intel Libraries: TCAM Classifier Library Microengine and XScale support for using TCAM. Requires installation of MLL and LML. Is geared toward a very specific application of NSE to IPv4 Forwarding App. May be useful as an example of code to look at but probably not useful for us to use directly. IXA SDK 4.0 Location: src/library/microblocks_library/microcode/idt_tcam_classifier

Lookup Block
- Three Lookup Blocks Needed. All the Lookup Blocks will use the TCAM.
  - LC-Ingress
    - All Databases for Ingress will be Exact Match
  - LC-Egress
    - All Databases for Egress will be Exact Match
  - MR
    - There will probably be multiple versions of this: Shared, Dedicated, IPv4, MPLS
    - But let's think of it as one for now and focus on IPv4.
    - Discussion later on what combination of the three types of DB we might use.
- Base functionality and code should be the same for all three
  - Sizes of Keys and Results will differ.
- LC-Ingress and LC-Egress will share a TCAM
  - ARP on the LC might need/want to use the TCAM.
  - The aging properties of the TCAM might be very useful for ARP.
  - So, we should leave some room for ARP on the LC TCAM.
  - We will need to think more about ARP when we get into the details of the control plane.
- There will be two MR Lookup Blocks sharing a TCAM

MR Lookup Block
[Diagram labels: MR (NPUA), MR (NPUB), Control, TCAM, XScale, Rx, DeMux, Parse, Header Format, QM, Tx]

LC Lookup Block
[Diagram labels: Ingress (NPUB): Phy Int Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Switch Tx. Egress (NPUA): Switch Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Phy Int Tx, Rate Monitor. Also shown: RTM, SWITCH, XScale, LC TCAM, ARP]

Lookup Block Requirements
Number of Packets per second required to handle?
- Line Rate: 10Gb/s
- Average:
  - Assume an average IP Packet Size of 200 Bytes (1600 bits)
    - (10Gb/s)/(1600 bits/pkt) = 6.25 Mpkts/s
  - Ethernet Header of 14 Bytes
    - Average Frame Size of 214 Bytes (1712 bits)
    - (10Gb/s)/(1712 bits/pkt) = 5.841 Mpkts/s
  - Ethernet Inter-Frame Spacing: 96 bits
    - Average Frame Size with Inter-Frame Spacing: 1808 bits
    - (10Gb/s)/(1808 bits/pkt) = 5.53 Mpkts/s
  - I’ll use 6.25 Mpkts/s as a target.
- Minimum Pkt size:
  - Minimum Ethernet Frame Size: 64 Bytes (512 bits)
  - Ethernet Inter-Frame Spacing: 96 bits (512 + 96 = 608 bits)
  - (10Gb/s)/(608 bits/pkt) = 16.45 Mpkts/s
  - Max Core rate for LC Ingress: 50 M Lookups/s
    - 16.45/50 = 32.9%
  - Max Core rate for LC Egress: 25 M Lookups/s
    - 16.45/25 = 65.8%
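The packet-rate figures above all come from dividing the line rate by the bits each frame occupies on the wire. A small sketch, with illustrative names, reproduces them and the resulting core-rate utilization.

```python
LINE_RATE_BPS = 10e9          # 10 Gb/s
IFS_BITS = 96                 # Ethernet inter-frame spacing
ETH_HDR_BYTES = 14


def mpkts_per_sec(bits_on_wire: int) -> float:
    """Packets per second, in millions, at the line rate."""
    return LINE_RATE_BPS / bits_on_wire / 1e6


avg_ip = mpkts_per_sec(200 * 8)                                   # IP packet only
avg_frame = mpkts_per_sec((200 + ETH_HDR_BYTES) * 8)              # plus Ethernet header
avg_frame_ifs = mpkts_per_sec((200 + ETH_HDR_BYTES) * 8 + IFS_BITS)
min_frame_ifs = mpkts_per_sec(64 * 8 + IFS_BITS)                  # worst case

print(f"{avg_ip:.2f} {avg_frame:.3f} {avg_frame_ifs:.2f} {min_frame_ifs:.2f}")

# Fraction of the maximum core lookup rate consumed at minimum packet size.
for name, core_rate in (("LC Ingress", 50.0), ("LC Egress", 25.0)):
    print(f"{name}: {100 * min_frame_ifs / core_rate:.1f}%")
```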

Lookup Block Requirements
- LC: Number of Lookups per second required:
  - 1 Ingress and 1 Egress lookup required per packet
  - If we assume 6.25 MPkts/sec then we need 12.5 M Lookups/sec.
- MR/NPU: Number of Lookups per second required:
  - 5 Gb/s per MR/NPU: 3.125 M Lookups/sec
  - Total of 6.25 M Lookups/sec
- Total Number of Lookup Entries to be supported?
  - Dependent on Size of Entries
- Size of Entries and Keys?
  - Dependent on type of Lookup: MR, LC-Ingress, LC-Egress
- Size of Results?

Keys and Results for Ingress LC and Egress LC
- Ingress (Link → Router): What fields in the External Frame Formats uniquely identify the MetaLink?
  - First we have to identify the Substrate Link Type
  - Then we can identify the Substrate Link and MetaLink
- Egress (Router → Link): What fields in the Internal Frame Format uniquely identify the MetaLink?
- Results: We need to identify what fields are needed to build the appropriate frame headers. The fields needed may consist of several parts:
  - Constant fields: Ethertype in most cases
  - Calculated fields: Things like Checksums
  - Statically configured fields that can be stored in Local Memory
    - Things like per physical interface or Blade Ethernet Src Addresses
  - ARP results for Ethernet DAddr on a Multi-Access link
  - Lookup Result from TCAM
    - Everything else…
  - Ingress (Link → Router): Details later
  - Egress (Router → Link): Details later

Field Sizes in Keys and Results
Field and Identifier sizes:
- MR id: 16 bits (64K Meta Routers per Substrate Router)
  - MR ID == VLAN (Defined locally on a Substrate Router)
  - Note: We can probably shorten this to 12 bits since our switch only supports 4K VLANs, which is 12 bits.
- MI id: 16 bits (64K Meta Interfaces per Meta Router)
  - This seems like a lot. What level of flexibility do we need to support?
- MLI: 16 bits (64K Meta Links per Substrate Link)
  - This seems safe and should not be changed.
- Port: 4 bits (16 Physical Interfaces per Line Card)
  - Note: I originally had this defined as 8 bits but since the RTM only supports 10 physical interfaces, 4 bits is enough. There were some places where the extra 4 bits pushed us to a larger size.
- QID: 20 bits (QM_ID:Queue_ID)
  - Queue_ID: 17 bits (128K Queues per Queue Manager)
  - QM_ID: 3 bits (8 Queue Managers per LC or PE.)
    - We probably can only support 4 QMs, which could be encoded in 2 bits.
    - (64 Q-Array Entries) / (16 CAM entries) → 4 QMs per SRAM Controller.
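To make the QID layout concrete, here is a minimal sketch of packing and unpacking the 20-bit QID. It assumes, based only on the QM_ID:Queue_ID notation, that QM_ID occupies the high 3 bits and Queue_ID the low 17 bits; the function names are hypothetical.

```python
QM_ID_BITS = 3
QUEUE_ID_BITS = 17


def pack_qid(qm_id: int, queue_id: int) -> int:
    """Build a 20-bit QID. ASSUMES QM_ID is the high field, Queue_ID the low field."""
    if not 0 <= qm_id < (1 << QM_ID_BITS):
        raise ValueError("qm_id does not fit in 3 bits")
    if not 0 <= queue_id < (1 << QUEUE_ID_BITS):
        raise ValueError("queue_id does not fit in 17 bits")
    return (qm_id << QUEUE_ID_BITS) | queue_id


def unpack_qid(qid: int) -> tuple:
    """Split a 20-bit QID back into (qm_id, queue_id)."""
    return qid >> QUEUE_ID_BITS, qid & ((1 << QUEUE_ID_BITS) - 1)


qid = pack_qid(3, 5)
print(hex(qid), unpack_qid(qid))
```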

LC: Internal Frame Formats
- Internal Frame Leaving Ingress LC, fields shown: RxMI (2B), LEN (2B), PAD (nB), CRC (4B), Meta Frame, NhAddr (nB), MnFlags (1B), Type=802.1Q (2B), VLAN (2B), Type=Substrate (2B), DstAddr (6B), SrcAddr (6B)
- Internal Frame Arriving at Egress LC, fields shown: TxMI (2B), LEN (2B), PAD (nB), CRC (4B), Meta Frame, NhAddr (nB), MnFlags (1B), Type=802.1Q (2B), VLAN (2B), Type=Substrate (2B), DstAddr (6B), SrcAddr (6B)
[Diagram labels: Packet arriving on Port N, LC, Switch, MR, IXP PE, LC, Packet leaving on Port M]

LC: External Frame Formats
Substrate Link types shown: P2P-DC (Configured), P2P-Tunnel, Legacy, P2P-VLAN0, Multi-Access
Frame variants shown in the diagram (fields as labeled):
- Type=IP (2B), MLI (2B), LEN (2B), PAD (nB), CRC (4B), Meta Frame, Dst Addr (4B), Src Addr (4B), Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol=Substrate (1B), Hdr Cksum (2B), Type=802.1Q (2B), TCI ≠ VLAN0 (2B), DstAddr (6B), SrcAddr (6B)
- Type=IP (2B), PAD (nB), CRC (4B), IP Payload, Dst Addr (4B), Src Addr (4B), Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol (1B), Hdr Cksum (2B), Type=802.1Q (2B), TCI ≠ VLAN0 (2B), DstAddr (6B), SrcAddr (6B)
- Type=802.1Q (2B), MLI (2B), LEN (2B), Meta Frame, TCI (2B), Type=Substrate (2B), PAD (nB), CRC (4B), DstAddr (6B), SrcAddr (6B)
- Type=802.1Q (2B), MLI (2B), LEN (2B), Meta Frame, TCI=VLAN0 (2B), Type=Substrate (2B), PAD (nB), CRC (4B), DstAddr (6B), SrcAddr (6B)
- Type=802.1Q (2B), MLI (2B), LEN (2B), Meta Frame, TCI≠VLAN0 (2B), Type=Substrate (2B), PAD (nB), CRC (4B), DstAddr (6B), SrcAddr (6B)

LC: TCAM Lookup Keys
[Diagram: the internal frame formats (leaving Ingress LC, arriving at Egress LC) and the external frame formats for P2P-DC (Configured), P2P-Tunnel, Legacy, P2P-VLAN0 and Multi-Access, annotated for Ingress LC and Egress LC. Blue Shading: Determine SL Type. Black Outline: Key Fields from pkt.]

LC: TCAM Lookup Keys on Ingress
- P2P-DC (24 bits): SL (4b) = 0000, Port (4b), MLI (16b)
- IPv4 Tunnel (72 bits): SL (4b) = 0001, Port (4b), EtherType (16b) = 0x0800, IP SAddr (32b), MLI (16b)
- Legacy (24 bits): SL (4b) = 0010, Port (4b), EtherType (16b) = 0x0800
- P2P-VLAN0 (24 bits): SL (4b) = 0011, Port (4b), MLI (16b)
- MA (72 bits): SL (4b) = 0100, Port (4b), Ethernet SAddr (48b), MLI (16b)
[Diagram: external frame formats for P2P-DC (Configured), P2P-Tunnel, Legacy, P2P-VLAN0 and Multi-Access. Blue Shading: Determine SL Type. Black Outline: Key Fields from pkt.]
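The ingress key widths can be checked by summing their fields, and each key can then be matched against the available database core sizes (36b, 72b, 144b, 288b, 576b). The sketch below is illustrative only: the table and function names are assumptions, and choosing the smallest core size that holds each key is one plausible policy, not a stated design decision.

```python
# Field widths (bits) per Substrate Link type, taken from the slide.
INGRESS_KEY_FIELDS = {
    "P2P-DC":      {"SL": 4, "Port": 4, "MLI": 16},
    "IPv4 Tunnel": {"SL": 4, "Port": 4, "EtherType": 16, "IP SAddr": 32, "MLI": 16},
    "Legacy":      {"SL": 4, "Port": 4, "EtherType": 16},
    "P2P-VLAN0":   {"SL": 4, "Port": 4, "MLI": 16},
    "MA":          {"SL": 4, "Port": 4, "Ethernet SAddr": 48, "MLI": 16},
}
SL_CODES = {"P2P-DC": 0b0000, "IPv4 Tunnel": 0b0001, "Legacy": 0b0010,
            "P2P-VLAN0": 0b0011, "MA": 0b0100}
DB_CORE_SIZES = (36, 72, 144, 288, 576)


def key_bits(sl_type: str) -> int:
    """Total key width for one Substrate Link type."""
    return sum(INGRESS_KEY_FIELDS[sl_type].values())


def smallest_core_size(bits: int) -> int:
    """Smallest database core size satisfying Key size <= Core size."""
    return next(size for size in DB_CORE_SIZES if size >= bits)


for sl_type in INGRESS_KEY_FIELDS:
    bits = key_bits(sl_type)
    print(f"{sl_type}: SL={SL_CODES[sl_type]:04b}, key={bits}b, "
          f"core>={smallest_core_size(bits)}b")
```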

LC: TCAM Lookup Results on Ingress
- We need the Ethernet Header fields to get the frame to the blade that is to process it next.
- We also need a QID and RxMI.
- Ethernet header fields that are constants can be configured and do not need to be in the TCAM Lookup Result.
- Ethernet Header fields:
  - DAddr: Depends on MetaLink
  - SAddr: Can be constant and configured per LC
  - EtherType1: Can be a constant: 802.1Q
  - VLAN(TCI): Different for each MR
  - EtherType2: Can be a constant: Substrate
- TCAM Lookup Result (76b)
  - VLAN (16b)
  - RxMI (16b)
  - DAddr (8b)
    - We can control the MAC Addresses of the Blades, so let's say that 40 of the 48 bits of DAddr are constant across all blades and 8 bits are assigned and stored in the Lookup Result.
    - Will 8 bits be enough to support multiple chassis?
    - We could go up to 12 bits and still use 64bit Associated Data
  - QID (20b)
  - Stats Index (16b)
- What about Ingress → Egress Pass Thru MetaLinks?
  - We will define a special Substrate VLAN for this use
  - We will also define a special set of MIs

LC: TCAM Lookup Results on Ingress
- TCAM Lookup Result (76b)
  - VLAN (16b)
  - RxMI (16b)
  - DAddr (8b)
    - We can control the MAC Addresses of the Blades, so let's say that 40 of the 48 bits of DAddr are constant across all blades and 8 bits are assigned and stored in the Lookup Result.
    - Will 8 bits be enough to support multiple chassis?
    - We could go up to 12 bits and still use 64bit Associated Data
  - QID (20b)
  - Stats Index (16b)
- Data format to downstream neighbor (one 32-bit word per line, bits 31..0):
  - Buf Handle (32b)
  - VLAN (16b), RxMI (16b)
  - Rsv (12b), QID (20b)
  - Rsv (8b), DA (8b), Stats (16b)
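As a rough illustration of the data handed to the downstream neighbor, the sketch below packs the ingress fields into four 32-bit words. The word order and the placement of each field at the most-significant end of its word (reserved bits as zero) are assumptions read off the slide's left-to-right layout; `pack_ingress_result` is a hypothetical helper, not part of the design.

```python
def _check(value: int, bits: int, name: str) -> int:
    """Reject values that do not fit in the field width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits")
    return value


def pack_ingress_result(buf_handle, vlan, rxmi, qid, daddr, stats):
    """Return four 32-bit words in the ASSUMED order shown on the slide."""
    w0 = _check(buf_handle, 32, "buf_handle")
    w1 = (_check(vlan, 16, "vlan") << 16) | _check(rxmi, 16, "rxmi")
    w2 = _check(qid, 20, "qid")                       # upper 12 bits reserved
    w3 = (_check(daddr, 8, "daddr") << 16) | _check(stats, 16, "stats")  # upper 8 reserved
    return [w0, w1, w2, w3]


words = pack_ingress_result(0xDEADBEEF, 0x0123, 0x4567, 0xABCDE, 0x9A, 0xBEEF)
print([f"{w:08X}" for w in words])
```

The five result fields total 76 bits (16 + 16 + 8 + 20 + 16), so with the buffer handle and reserved bits they occupy exactly four words.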

Pass Thru MetaLinks and Multi-Access SLs
- When going MR → LC-Egress the MR may provide a Next Hop MN Address for the LC to use to map to a MAC address.
  - This is particularly used when the destination Substrate Link is Multi-Access and there may be multiple MAC addresses used on the same Multi-Access MetaLink.
- When going LC-Ingress → LC-Egress for a pass through MetaLink, do we need to do something similar?
  - This could arise when a MetaNet has hosts on a multi-access network but the first Substrate Router that these hosts have access to does not have an MR for that MN.
  - However, I contend that if there is no MR on that access SR, then there is nothing there to discriminate between the multiple MN addresses on the single MA MetaLink and hence it cannot be supported.

Pass Thru MetaLinks and Multi-Access SLs
[Diagram labels: Host1 through HostN, MA Network, MA SL, P2P SL, ML, LC, ARP, MR, Substrate Router1, Substrate Router2. Annotation: No way to communicate Next Hop addresses from MR to distant LC.]
Implications:
- We will not extend MA links across Substrate Routers and other Substrate Links.
- MetaNets must place an MR in the substrate router that terminates an MA Substrate Link on which they want to support hosts.

LC: TCAM Lookups on Egress
Key: VLAN (16b), TxMI (16b)
Result: The Lookup Result for Egress will consist of several parts:
- Lookup Result
- Constant fields
- Calculated fields
- Fields that can be stored in Local Memory
Some of these are common across all SL Types. Other fields are specific to each SL Type.
Common across all SL Types (108b):
- From Result (60b): SL Type (4b), Port (4b) (Physical Interface 1-10 on LC RTM), MLI (16b), QID (20b), Stats Index (16b)
- Local Memory (48b): Eth Hdr SA (48b), tied to physical interface (10 entry table in Egress Hdr Format)
SL Type Specific Headers are on following slides.
[Diagram: internal frame arriving at Egress LC, with fields TxMI (2B), LEN (2B), PAD (nB), CRC (4B), Meta Frame, NhAddr (nB), MnFlags (1B), Type=802.1Q (2B), VLAN (2B), Type=Substrate (2B), DstAddr (6B), SrcAddr (6B)]

LC: TCAM Lookups on Egress
Key: VLAN (16b), TxMI (16b)
Result, common across all SL Types (108b):
- From Result (60b): SL Type (4b), Port (4b), MLI (16b), QID (20b), Stats Index (16b)
- Local Memory (48b): Eth Hdr SA (48b), tied to physical interface (10 entry table in Egress Hdr Format)
SL Type Specific Headers: P2P-DC Hdr (64b)
- Constant (16b), in Egress Hdr Format: EtherType (16b) = Substrate
- Calculated (0b)
- From Result (48b): Eth DA (48b)
Lookup Result Total (Common Result + Specific Result): 108 bits
Total (Common + Specific): 156 bits
Data format to downstream neighbor (one 32-bit word per line, bits 31..0):
- Buf Handle (32b)
- MLI (16b), Eth DA[15:0] (16b)
- Rsv (4b), Port (4b), SL (4b), QID (20b)
- Eth DA[47:16] (32b)
[Diagram: frame with fields Type=802.1Q (2B), MLI (2B), LEN (2B), Meta Frame, TCI (2B), Type=Substrate (2B), PAD (nB), CRC (4B), DstAddr (6B), SrcAddr (6B)]

LC: TCAM Lookups on Egress
Key: VLAN (16b), TxMI (16b)
Result, common across all SL Types (108b):
- From Result (60b): SL Type (4b), Port (4b), MLI (16b), QID (20b), Stats Index (16b)
- Local Memory (48b): Eth Hdr SA (48b), tied to physical interface (10 entry table in Egress Hdr Format)
SL Type Specific Headers: MA Hdr (64b)
- Constant (16b), in Egress Hdr Format: EtherType (16b) = Substrate
- Calculated (0b)
- ARP Lookup on NhAddr (Is ARP cache another database in TCAM?) (48b): Eth DA (48b)
- From Result (0b)
Lookup Result Total (Common From Result + Specific From Result): 60 bits
Total (Common + Specific): 156 bits
Data format to downstream neighbor (one 32-bit word per line, bits 31..0):
- Buf Handle (32b)
- MLI (16b), Rsv (16b)
- Rsv (4b), Port (4b), SL (4b), QID (20b)
[Diagram: frame with fields MLI (2B), LEN (2B), Meta Frame, Type=Substrate (2B), PAD (nB), CRC (4B), DstAddr (6B), SrcAddr (6B)]

LC: TCAM Lookups on Egress
Key: VLAN (16b), TxMI (16b)
Result, common across all SL Types (108b):
- From Result (60b): SL Type (4b), Port (4b), MLI (16b), QID (20b), Stats Index (16b)
- Local Memory (48b): Eth Hdr SA (48b), tied to physical interface (10 entry table in Egress Hdr Format)
SL Type Specific Headers: MA with VLAN Hdr (96b)
- Constant (32b), in Egress Hdr Format: EtherType1 (16b) = 802.1Q, EtherType2 (16b) = Substrate
- Calculated (0b)
- ARP Lookup on NhAddr (Is ARP cache another database in TCAM?) (48b): Eth DA (48b)
- From Result (16b): VLAN/TCI (16b)
Lookup Result Total (Common From Result + Specific From Result): 76 bits
Total (Common + Specific): 188 bits
Data format to downstream neighbor (one 32-bit word per line, bits 31..0):
- Buf Handle (32b)
- MLI (16b), VLAN (16b)
- Rsv (4b), Port (4b), SL (4b), QID (20b)
[Diagram: frame with fields Type=802.1Q (2B), MLI (2B), LEN (2B), Meta Frame, TCI≠VLAN0 (2B), Type=Substrate (2B), PAD (nB), CRC (4B), DstAddr (6B), SrcAddr (6B)]

LC: TCAM Lookups on Egress
Key: VLAN (16b), TxMI (16b)
Result, common across all SL Types (108b):
- From Result (60b): SL Type (4b), Port (4b), MLI (16b), QID (20b), Stats Index (16b)
- Local Memory (48b): Eth Hdr SA (48b), tied to physical interface (10 entry table in Egress Hdr Format)
SL Type Specific Headers: P2P-VLAN0 Hdr (96b)
- Constant (32b), in Egress Hdr Format: EtherType1 (16b) = 802.1Q, EtherType2 (16b) = Substrate
- Calculated (0b)
- From Result (64b): Eth DA (48b), VLAN/TCI (16b)
Lookup Result Total (Common From Result + Specific From Result): 124 bits
Total (Common + Specific): 188 bits
Data format to downstream neighbor (one 32-bit word per line, bits 31..0):
- Buf Handle (32b)
- MLI (16b), Rsv (16b)
- Rsv (4b), Port (4b), SL (4b), QID (20b)
- Eth DA[47:16] (32b)
- VLAN (16b), Eth DA[15:0] (16b)
[Diagram: frame with fields Type=802.1Q (2B), MLI (2B), LEN (2B), Meta Frame, TCI=VLAN0 (2B), Type=Substrate (2B), PAD (nB), CRC (4B), DstAddr (6B), SrcAddr (6B)]

LC: TCAM Lookups on Egress
Result (continued), common across all SL Types (108b):
- From Result (60b): SL Type (4b), Port (4b), MLI (16b), QID (20b), Stats Index (16b)
- Local Memory (48b): Eth Hdr SA (48b), tied to physical interface (10 entry table in Egress Hdr Format)
SL Type Specific Headers: P2P-Tunnel Hdr for IPv4 Tunnel without VLANs (224b)
- Constant (48b), in Egress Hdr Format:
  - Eth Hdr EtherType (16b) = 0x0800
  - IP Hdr Version (4b)/HLen (4b)/Tos (8b) (16b): All can be constant?
  - IP Hdr TTL (8b): Initialized to a constant when sending.
  - IP Hdr Proto (8b) = Substrate
- Calculated (64b), by Egress Hdr Format:
  - IP Pkt Len (16b): Calculated for each packet.
  - IP Hdr ID (16b): Should be unique for each packet sent, so shouldn’t be in Result.
  - IP Hdr Checksum (16b): Needs to be calculated, so shouldn’t be in Result.
  - IP Hdr Flags (3b)/FragOff (13b) (16b): If fragments are never used, these are constants. If it is possible that we will have to use them, then this has to be calculated. Either way, it shouldn’t be in Result.
- Local Memory (32b): IP Hdr Src Addr (32b), tied to physical interface (10 entry table in Egress Hdr Format)
- From Result (80b): Eth Hdr DA (48b), IP Hdr Dst Addr (32b)
Lookup Result Total (Common From Result + Specific From Result): 140 bits
Total (Common + Specific): 316 bits
Data format to downstream neighbor (one 32-bit word per line, bits 31..0):
- Buf Handle (32b)
- MLI (16b), Rsv (16b)
- Rsv (4b), Port (4b), SL (4b), QID (20b)
- Eth DA[47:16] (32b)
- Rsv (16b), Eth DA[15:0] (16b)
- IP DA (32b)
[Diagram: frame with fields Type=IP (2B), MLI (2B), LEN (2B), PAD (nB), CRC (4B), Meta Frame, Dst Addr (4B), Src Addr (4B), Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol=Substrate (1B), Hdr Cksum (2B), DstAddr (6B), SrcAddr (6B)]

LC: TCAM Lookups on Egress
Result (continued), common across all SL Types (108b):
- From Result (60b): SL Type (4b), Port (4b), MLI (16b), QID (20b), Stats Index (16b)
- Local Memory (48b): Eth Hdr SA (48b), tied to physical interface (10 entry table in Egress Hdr Format)
SL Type Specific Headers: P2P-Tunnel Hdr for IPv4 Tunnel with VLANs (256b)
- Constant (64b), in Egress Hdr Format:
  - First Eth Hdr EtherType (16b) = 802.1Q
  - Second Eth Hdr EtherType (16b) = 0x0800
  - IP Hdr Version (4b)/HLen (4b)/Tos (8b) (16b): All can be constant?
  - IP Hdr TTL (8b): Initialized to a constant when sending.
  - IP Hdr Proto (8b) = Substrate
- Calculated (64b), by Egress Hdr Format:
  - IP Pkt Len (16b): Calculated for each packet.
  - IP Hdr ID (16b): Should be unique for each packet sent, so shouldn’t be in Result.
  - IP Hdr Checksum (16b): Needs to be calculated, so shouldn’t be in Result.
  - IP Hdr Flags (3b)/FragOff (13b) (16b): Frags needed?
- Local Memory (32b): IP Hdr Src Addr (32b), tied to physical interface (10 entry table in Egress Hdr Format)
- From Result (96b): Eth Hdr DA (48b), IP Hdr Dst Addr (32b), VLAN/TCI (16b)
Lookup Result Total (Common From Result + Specific From Result): 156 bits (PROBLEM!)
Total (Common + Specific): 348 bits
Data format to downstream neighbor (one 32-bit word per line, bits 31..0):
- Buf Handle (32b)
- MLI (16b), Rsv (16b)
- Rsv (4b), Port (4b), SL (4b), QID (20b)
- Eth DA[47:16] (32b)
- VLAN (16b), Eth DA[15:0] (16b)
- IP DA (32b)
[Diagram: frame with fields Type=IP (2B), MLI (2B), LEN (2B), PAD (nB), CRC (4B), Meta Frame, Dst Addr (4B), Src Addr (4B), Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol=Substrate (1B), Hdr Cksum (2B), Type=802.1Q (2B), TCI ≠ VLAN0 (2B), DstAddr (6B), SrcAddr (6B)]

LC: TCAM Lookups on Egress
Key: VLAN (16b), TxMI (16b)
Result
Common across all SL Types (108b):
  From Result (60b): SL Type (4b), Port (4b), MLI (16b) (ignored for Legacy traffic), QID (20b), Stats Index (16b)
  Local Memory (48b): Eth Hdr SA (48b), tied to physical interface (10-entry table in Egress Hdr Format)
SL Type Specific Headers
Legacy (IPv4) with VLAN Hdr (96b): IP Header provided by MR!
  Constant (16b), in Egress Hdr Format: EtherType1 (16b) = 802.1Q
  Calculated (0b)
  ARP Lookup on NhAddr (48b): Eth DA (48b). (Is the ARP cache another database in the TCAM?)
  From Result (32b): EtherType2 (16b) = IPv4, TCI (16b)
Lookup Result Total (Common From Result + Specific From Result): 92 bits
Total (Common + Specific): 188 bits
Frame format (figure): DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), TCI ≠ VLAN0 (2B), Type=IP (2B), Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Payload, PAD (nB), CRC (4B)
Data format to downstream neighbor (one 32-bit word per line):
  Buf Handle (32b)
  MLI (16b), Rsv (16b)
  Rsv (4b), Port (4b), SL (4b), QID (20b)
  VLAN (16b), EType (16b)

LC: TCAM Lookups on Egress
Key: VLAN (16b), TxMI (16b)
Result
Common across all SL Types (108b):
  From Result (60b): SL Type (4b), Port (4b), MLI (16b) (ignored for Legacy traffic), QID (20b), Stats Index (16b)
  Local Memory (48b): Eth Hdr SA (48b), tied to physical interface (10-entry table in Egress Hdr Format)
SL Type Specific Headers
Legacy (IPv4) without VLAN Hdr (64b): IP Header provided by MR!
  Constant (0b)
  Calculated (0b)
  ARP Lookup on NhAddr (48b): Eth DA (48b). (Is the ARP cache another database in the TCAM?)
  From Result (16b): EtherType (16b) = IPv4
Lookup Result Total (Common From Result + Specific From Result): 76 bits
Total (Common + Specific): 156 bits
Frame format (figure): DstAddr (6B), SrcAddr (6B), Type=IP (2B), Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Payload, PAD (nB), CRC (4B)
Data format to downstream neighbor (one 32-bit word per line):
  Buf Handle (32b)
  MLI (16b), EType (16b)
  Rsv (4b), Port (4b), SL (4b), QID (20b)

LC: Lookup Block Parameters
All lookups will be Exact Match.
Ingress:
  # Databases: 1
  4 bits in Key identify the SL Type:
    0000: DC
    0001: IPv4 Tunnel
    0010: Legacy (non-substrate) with or without VLAN
    0011: VLAN0
    0100: MA (with or without VLAN)
  Core Size: 72b
  Key Size: 24b - 72b
  AD Result Size: 64b, of which we’ll use 60 bits
Egress:
  Core Size: 36b
  Key Size: 32b
  AD Result Size: 128b, of which we’ll use different amounts per SL Type, with one problem still to work out.

SUMMARY: LC: TCAM Lookups
DC Tunnel W/ vlan w/o VLAN0 MA w/ Legacy Legacy w/o vlan Ingress Key 24 72 Result 76 Egress 32 108 156 140 124 60 92
Ingress Key Size: 24 bits or 72 bits
Ingress Result Size: 76 bits
Egress Key Size: 32 bits
Egress Result Size: 60-156 bits
The IP Tunnel with VLANs Substrate Link option is a problem. Ways to handle it are discussed on the next slide.
We also need to watch out for the Egress Result for Tunnels w/o VLANs. If we introduce anything else we want in there, we go beyond the 128 bits supportable through the TCAM’s Associated memory.

Handling IP Tunnel SL with VLANs
Result Fields (156 bits): SL Type (4b), Port (4b), MLI (16b), QID (20b), Stats Index (16b), Eth Hdr DA (48b), IP Hdr Dst Addr (32b), VLAN (16b)
128 bits is the maximum size of a Result stored in the TCAM Associated Data SRAM.
Options for handling this Result:
  Do not allow this type of SL. Might be OK for the short term but almost certainly not OK for the long term.
  Find 28 bits we don’t really need in the Result.
  Do a second lookup when we find an SL like this.
  Do a Multi-Hit lookup: put two entries in for these SLs and only one entry for all others, then concatenate the two results when we get them.
  Only allow a small fixed number of this type of SL: store an index in the 4 bits we have left and store the extra bits we need in a table in Local Memory. However, this is a little tricky, since we would then need to get the extra bits from the control plane into Local Memory, and we will want Substrate Links to be able to be added dynamically.
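The 156-bit total and the 28-bit excess quoted above can be checked by summing the listed field widths. The following is a minimal sketch in Python; the dictionary and function names are illustrative and not part of the design.

```python
# Field widths (in bits) of the egress Result for an IPv4 tunnel SL with VLANs,
# copied from the slide. The names below are illustrative only.
RESULT_FIELDS = {
    "SL Type": 4,
    "Port": 4,
    "MLI": 16,
    "QID": 20,
    "Stats Index": 16,
    "Eth Hdr DA": 48,
    "IP Hdr Dst Addr": 32,
    "VLAN": 16,
}

MAX_AD_RESULT_BITS = 128  # widest Result the TCAM Associated Data SRAM can hold


def result_overflow(fields, limit=MAX_AD_RESULT_BITS):
    """Return (total bits, bits over the limit) for a set of Result fields."""
    total = sum(fields.values())
    return total, max(0, total - limit)


total_bits, excess_bits = result_overflow(RESULT_FIELDS)
print(f"Result needs {total_bits} bits, {excess_bits} more than the {MAX_AD_RESULT_BITS}-bit limit")
```

Dropping or relocating any combination of fields that adds up to at least the excess brings the Result back under the limit, which is what the options above try to achieve in different ways.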

MR Lookup Block Control TCAM XScale XScale Rx DeMux Parse Parse DeMux Tx QM Header Format Header Format QM Tx MR (NPUA) MR (NPUB)

Common Router Framework (CRF) Functional Blocks Parse Header Format Rx DeMux Lookup QM Tx MR-1 . . . MR-1 MR-n . . . MR-n MR Lookup Key(NB) MR Id(16b) MR Mem Ptr(32b) Buf Handle(32b) Buffer Handle(32b) MR Id(16b) MR_ID and MR Mem Ptr Not needed for Dedicated IPv4 MR Lookup Function Perform lookup in TCAM based on MR Id and lookup key Result: Output MI QID Stats index MR-specific Lookup Result (flags, etc. ?) How wide can/should this be? MR Mem Ptr(32b) Lookup Result(16B)

MR Lookup Block Requirements
Shared NP Lookup Engine specific:
  Number of lookups per second required:
    1 lookup required per packet
    5 Gb/s per NP on a blade
    Average-sized packet: 200 bytes, 1600 bits
      If we assume 6.25 MPkts/sec for 10 Gb/s, then 5 Gb/s would be 3.125 MPkts/sec.
      We would want 3.125 M lookups/sec per LA-1 Interface, for a total of 6.25 M lookups/sec for the TCAM core.
    Minimum-sized packet: 76 bytes, 608 bits
      If we assume 16.45 MPkts/sec for 10 Gb/s, then 5 Gb/s would be 8.225 MPkts/sec.
      We would want 8.225 M lookups/sec per LA-1 Interface, for a total of 16.45 M lookups/sec for the TCAM core.
  Number of MRs to be supported?
    Will each get its own database? No. This would limit it to 16, which is not enough.
    How many keys will each MR be limited to?
  How much of the Result can be MR-specific?
  How much of the Key can be MR-specific?
  How are masks to be supported?
    Mask core is the same size as the Data core. One mask per Entry.
    Global Mask Registers are also available for masking the key to match the size of the Entry during Multi Database Lookups where the multiple databases have different sizes.
  How will multiple hits across databases be supported?
  How will priorities be supported?
    Priorities within a database are purely by the order of the keys. For example, in a GM filter table, if Keys 4 and 7 both match, Key 4 is selected.
    Priorities across databases will have to be included in the Entries.
  Do we need support for non-exclusive (make a copy) filters? Later?
  How are GM filters with range fields supported?
    The IDT libraries support this by adding multiple entries, each with its own mask, to the DB to cover the range of the field.
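The lookup-rate figures above follow from dividing the line rate by the packet size in bits, since one lookup is needed per packet. A small Python sketch of that arithmetic is shown here; the function name is illustrative, and the count of two LA-1 interfaces sharing the TCAM core is an assumption inferred from the per-interface and total figures on the slide.

```python
def lookups_per_second(link_bps, packet_bytes):
    """One lookup per packet, so the lookup rate equals the packet rate."""
    return link_bps / (packet_bytes * 8)


NP_RATE_BPS = 5e9    # 5 Gb/s per NP on a blade (from the slide)
LA1_INTERFACES = 2   # assumption: two LA-1 interfaces share one TCAM core

for label, size_bytes in (("average", 200), ("minimum", 76)):
    per_interface = lookups_per_second(NP_RATE_BPS, size_bytes)
    total = per_interface * LA1_INTERFACES
    print(f"{label:>7} ({size_bytes} B): "
          f"{per_interface / 1e6:.3f} M/s per interface, {total / 1e6:.3f} M/s total")
```

For minimum-sized packets the exact per-interface value comes out slightly below the slide's 8.225 M/s, because the slide halves the already rounded 16.45 M/s figure.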

IPv4 MR Lookup Entry Examples Route Lookup: Longest Prefix Match Entry (64b): MR ID (16b) MI (16b) DAddr (32b) Mask: (32 + Prefix length) high order bits set to 1 GM Match Lookup Entry (142b): SAddr (32b) Sport(16b) Dport(16b) Protocol_Selector (2b) : 00: Protocol is NOT TCP and so following field should be interpreted as Protocol 01: Protocol is TCP and so following field should be interpreted as TCP_Flags 10: reserved 11: reserved Protocol_TCP_Flags (12b) Mask: Completely general, user defined. EM Match Lookup Entry (136b): Mask: 136 high order bits set to 1

IPv4 MR Lookup Databases
How many databases to use? Three options:
  3: a separate DB for each
  2: one DB for GM and one for RL and EM
  1: RL, GM and EM all in one DB
Assumptions:
  We want to be able to easily change priorities of Filters.
  We want Routes to be strictly Longest Prefix Match: the longest matching prefix is the best match.
  A Filter, either Exact Match or Range/Best Match, always takes precedence over a Route.
  EM is generally higher priority than Range/Best Match, but not always. We still want the highest-priority match of each and then compare them.
  We may not want to pay the overhead penalty of shuffling filter entries when we change priorities. It is currently unknown what the penalty will be.

IPv4 MR Lookup Databases Means we would use Multi Database Lookup (MDL) command More efficient use of CAM core entries as each DB could be sized closer to its Entry size Guaranteed at least one Result from each Database if an existing match existed in each database. 2 Databases: We could use MDL command Guaranteed one Result from GM and one from either EM or RL but not both! Order is important: EM filters would all go first in EM/RL DB, with full masks. At most one entry would match EM filters would always be higher priority than Routes. If no EM filter match, we would get the best RL match. RL entries would be sorted by prefix length so first match was the longest. We could use two separate commands: Lookup or MHL for GM and MHL for EM/RL Guaranteed at least one Result from each {GM,EM,RL} if an existing match existed in each. Price: Two lookups per packet. 1 Database: Use Multi Hit Lookup (MHL) command Efficient use of the CAM core entries is a potential problem. Would not be as bad, if we could get the GM filters down to 144 bits by making the MR/MI fields a combined 4 bits shorter. Order is important EM Filters first GM Filters second With Result of 64 bit AD, we can get back at most 4 Results (1 EM and 3 GM or 4 GM or something less…) RL Entries last EM and GM always take priority over RL. Priority field in Results could be used to arbitrate between matched EM and GM filters

IPv4 MR Lookup Example: 3 DBs Order matters: Same Key will be applied to all Databases(MDL) Multi-Database Lookup (MDL) Each Database will use the number of bits it was configured for, starting at the MSB. DAddr field needs to be first TCP_Flags field needs to be last Route Lookup: Longest Prefix Match Key (64b): MR ID (16b) MI (16b) DAddr (32b) GM Match Lookup: Best/Range Match Key (148b): SAddr (32b) Protocol (8b) Sport(16b) Dport(16b) TCP_Flags (12b) MASK/Ranges How will we handle Masks for Addr fields Ranges for Port fields Wildcard for Protocol field EM Match Lookup: Exact Match Key (136b): DAddr (32b) SAddr (32b) Sport (16b) TCP_Flags (12b) Protocol (8b) DPort (16b) MI (16b) MR ID (16b)

IPv4 MR Lookup Example: 3 DBs Lookup Key: 148 bits out of 5 32-bit words transmitted with Lookup command. MR ID (16b) MI (16b) DAddr(32b) SAddr(32b) DPort(16b) SPort(16b) Proto (8b) TCP_Flags (12b) Pad (12b) W1 W2 W3 W4 W5 MDL Mask Mask Data Core Entries for EM DB 136 bits Core Size: 144 bits GMR=0xFFFFFFFFF 0xFFFFFFFFF 0xFFFFFFF00 Mask Data Core Entries For RL DB 64 bits Core Size: 72 bits GMR=0xFFFFFFFFF 0xFFFFFFF00 GMR Data GMR GMR Core Entries for GM DB 148 bits Core Size: 288 bits GMR=0xFFFFFFFFF 0xFFFFFFFFF 0xF00000000 0x000000000
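The Global Mask Register values above select the high-order bits of the shared key that each database compares. The sketch below generates such a mask for a given key width and core size. It assumes the mask is written as 36-bit words (9 hex digits each, as in the values on the slide) with the high-order word first; the function name is hypothetical and not part of the IDT API. For the 72-bit RL database it reproduces the two words listed on the slide.

```python
WORD_BITS = 36  # assumption: GMR values are written as 36-bit words (9 hex digits)


def gmr_words(key_bits, core_bits):
    """Return the mask with `key_bits` high-order ones, as 36-bit hex words.

    The mask spans `core_bits` bits; the first word returned is the most
    significant one.
    """
    if core_bits % WORD_BITS != 0 or not 0 <= key_bits <= core_bits:
        raise ValueError("core size must be a multiple of 36 and hold the key")
    mask = ((1 << key_bits) - 1) << (core_bits - key_bits)
    word_mask = (1 << WORD_BITS) - 1
    return [
        f"0x{(mask >> shift) & word_mask:09X}"
        for shift in range(core_bits - WORD_BITS, -1, -WORD_BITS)
    ]


for name, key_bits, core_bits in (("RL", 64, 72), ("EM", 136, 144), ("GM", 148, 288)):
    print(name, " ".join(gmr_words(key_bits, core_bits)))
```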

IPv4 MR Lookup Result
Result fields:
  QID (20b)
  Output MI (16b)
  Priority (8b): range 0-255. Only used by the Lookup block to handle arbitration when multiple hits occur.
  Drop (1b)
  DAddr (8b): Identifies the blade this packet is destined for. We can control the MAC addresses of the blades, so let’s say that 40 of the 48 bits of DAddr are constant across all blades and 8 bits are assigned and stored in the Lookup Result. Will 8 bits be enough to support multiple chassis? We could go up to 12 bits and still use 64-bit Associated Data.
  Port (8b)
  Stats Index (8b): 256 indices for stats. The Lookup block will take care of incrementing the Stats counters.
  MrBits (40b): Used for things like NhMnFlags and NhMnAddr. For the IPv4 MR, we might use MrBits[39:32] for the MnFlags and MrBits[31:0] for the NhMnAddr (IPv4 address).
  Total of 109 bits
Data format to downstream neighbor (one 32-bit word per line):
  Buf Handle (32b)
  MI (16b), DA (8b), Port (8b)
  HD (2b; H: Hit, D: Drop), Rsv (2b), MrBits[39:32] (8b), QID (20b)
  MrBits[31:0] (32b)

IPv4 MR Database Core Sizes Route Database Core Size: 72b Entries per Segment: 8K Number of Entries needed per route: 1 Number of Routes per Segment: 8K GM Database Core Size: 288b Entries per Segment: 2K Number of Entries needed per filter: dependent on filter Number of Filters per Segment: <= 2K EM Database Core Size: 144b Entries per Segment: 4K Number of Entries needed per filter: 1 Number of Filters per Segment: 4K Configuration used in our FPX-based Router: 32 GM filters 10K Ingress EM Filters 10K Egress EM Filters ~ 40K Route Entries Configuration needed to achieve approximately the same numbers: 5 Segments for Route Database 1 Segment for GM Database 5 Segments for EM Database Total of 11 Segments (out of a total of 32 in TCAM)
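The segment counts above can be reproduced from the per-segment capacities. The sketch below assumes every segment holds the same number of bits (8K entries of 72 bits, which also gives the 4K and 2K figures for the wider cores), takes K as 1024, and assumes each of the 32 GM filters expands to few enough entries to fit in one segment. All names are illustrative.

```python
import math

K = 1024
SEGMENT_BITS = 8 * K * 72   # from "Core Size: 72b, Entries per Segment: 8K"
TOTAL_SEGMENTS = 32         # segments available in the TCAM


def entries_per_segment(core_bits):
    """Number of entries of the given core size that fit in one segment."""
    return SEGMENT_BITS // core_bits


def segments_needed(entries, core_bits):
    """Whole segments required to hold `entries` entries."""
    return math.ceil(entries / entries_per_segment(core_bits))


# (entries wanted, core size in bits), matching the FPX-based router numbers.
DEMAND = {
    "Route": (40 * K, 72),
    "GM": (32, 288),         # assumption: 32 filters stay within one segment
    "EM": (20 * K, 144),     # 10K ingress + 10K egress filters
}

plan = {name: segments_needed(n, core) for name, (n, core) in DEMAND.items()}
total = sum(plan.values())
print(plan, f"total {total} of {TOTAL_SEGMENTS} segments")
```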

IPv4 MR Database AD Usage Each Segment can be configured with a Base Address and a result size for calculating an address into the Associated Data. The Associated Data is stored in a 512K x 36 bit ZBT SRAM Using 64bit Results will give us 256K slots in the AD SRAM. 48K Route DB Entries <= 2K GM DB Entries 20K EM DB Entries Max Total of 70K Results needed. Plenty of room in the AD for the IPv4 MR Results
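A quick capacity check for the Associated Data SRAM is sketched below. It assumes that 32 of the 36 bits in each SRAM word carry result data, which is consistent with a 64-bit Result occupying two words and yielding 256K slots; that assumption and the names used are not taken from the device documentation.

```python
K = 1024
AD_WORDS = 512 * K        # 512K x 36-bit ZBT SRAM
DATA_BITS_PER_WORD = 32   # assumption: 32 data bits per 36-bit word


def ad_slots(result_bits):
    """Number of Result slots of the given width that fit in the AD SRAM."""
    words_per_result = -(-result_bits // DATA_BITS_PER_WORD)  # ceiling division
    return AD_WORDS // words_per_result


# Worst-case Result counts from the slide.
results_needed = 48 * K + 2 * K + 20 * K   # Route + GM + EM entries

slots = ad_slots(64)
print(f"{slots // K}K slots available, {results_needed // K}K needed, "
      f"{(slots - results_needed) // K}K spare")
```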

MPLS Lookup
MPLS uses a 20-bit Label.
Key (52 bits): MR ID (16b), MI (16b), MPLS_Label (20b)
Use an Exact Match Database.
MPLS Label Database:
  Core Size: 72b
  Entries per Segment: 8K
  Number of Entries needed per label: 1
  Number of Labels per Segment: 8K
Drop Bit: Does MPLS need a Drop Bit?
  Perhaps it would treat a Miss as the same thing as Drop. That is, the fact that a label is not entered in the Database is an indication that frames using that label should be dropped.
  But if we explicitly have a Drop bit, then Hits on those Entries could be counted separately from Misses.
What will the MPLS Label Lookup Result look like?
  New Label (20b), QID (20b), Output MI (16b), Stats Index (16b), Drop Bit (1b)
  Total of 73 bits (128-bit wide Associated Data)
NOTE: We could use a 64-bit AD if we did not use the Drop bit and only supported an 8-bit Stats Index.

MPLS Lookup Result Examples
Result fields:
  Reserved (3b): Don’t use these; they will not show up in the Results Mailbox.
  New Label (20b)
  QID (20b)
  Output MI (16b)
  Stats Index (16b)
  Drop bit (1b)
  Total of 73 bits (76 counting reserved bits)
The DB will use 128 bits of Associated Data and will return the Associated Data followed by the Absolute Index.
  We don’t need the Absolute Index and we don’t need the top 3 bits of the AD.
  With this ordering we just have to read the first 4 words of the Results Mailbox instead of 5.
  RTN=1b, ADSP=1b, AD WIDTH=10b
Results Mailbox fields:
  D: Done (1b): set to 1 when the search is completed.
  H: Hit (1b): set to 1 if the search was successful and the result is valid, 0 otherwise.
  MH: MHit (1b): set to 1 if the search was successful and there were additional hits in the database.
  Absolute Index: Index offset from the beginning of the TCAM array.
  Associated Data: 128 bits of Associated Data from the Associated Data ZBT SRAM attached to the TCAM.
Results Mailbox layout (one 32-bit word per line):
  D, H, MH, Associated Data[124:96]
  Associated Data[95:64]
  Associated Data[63:32]
  Associated Data[31:0]
  Reserved[31:22], Absolute Index[21:0]
  Not Used
  Not Used
  Not Used

Lookup Block Implementation Plan Investigate impact of shortening MR_ID to 12 bits How much shifting, masking and anding will this take? How costly in cycles will that be? Phase 0: Implement a generic Lookup Block with 1 Database With the right #ifdef’s to generalize it this should work for: LC-Ingress LC-Egress IPv4 MR with 1 combined DB MPLS MR This may be what we run for the November demo. Phase 1: Implement a 2 or 3 DB IPv4 MR Lookup Block I believe that for flexibility and ease of management this will be what we really want for this project and for ONL. Phase 2: Shared NPU Lookup Block

Lookup Block . CTX-0 QDR SRAM NSE Interface TCAM CTX-1 . . . CTX-2 SRAM Controller CTX-1 In NN Ring Out NN Ring . . . CTX-2 . . . KEY KEY KEY KEY Result Result Result Result . CTX-7

Lookup Block
Signals used by each context (CTX-x):
  In NN !Empty: Input NN Ring is not empty, something for us to read.
  Out NN !Full: Output NN Ring is not full, space for us to write to it.
  NSE Result Read Done: Our read of the Results Mailbox has completed.
  Next_Ctx Start: Our turn to read from the In NN Ring.
  Next_Ctx Done: Our turn to write to the Out NN Ring.

Ingress LC Lookup Block Pseudocode
Initialization Phase
Start
  Wait on ((Next_Ctx Start signal) and (In NN Ring !Empty signal))
Phase 1
  Assert Next_Ctx Start signal
  Read In NN Ring (buf_handle, Key, SL_Type)
  Extract Key of correct size based on SL_Type
  Build Lookup command (IDT Macro)
  Send Lookup command to NSE (sram[] write instruction)
  Calculate Delay Time and Wait (IDT Macro)
Phase 2
  Issue command to Read Result from Results Mailbox (IDT Macro)
    Macro does Wait for Result, checks Done bit, and continues to read until Done bit is set.
  Wait for ((Next_Ctx Done signal) and (Out NN Ring !Full signal))
Phase 3
  Assert Next_Ctx Done signal
  Send (buf_handle, Result) to Out NN Ring
  GoTo Phase 1

Egress LC Lookup Block Pseudocode
Initialization Phase
Start
  Wait on ((Next_Ctx Start signal) and (In NN Ring !Empty signal))
Phase 1
  Assert Next_Ctx Start signal
  Read In NN Ring (buf_handle, Offset, Key)
  Extract Key of VLAN and TxMI
  Build Lookup command (IDT Macro)
  Send Lookup command to NSE (sram[] write instruction)
  Calculate Delay Time and Wait (IDT Macro)
Phase 2
  Issue command to Read Result from Results Mailbox (IDT Macro)
    Macro does Wait for Result, checks Done bit, and continues to read until Done bit is set.
  Wait for ((Next_Ctx Done signal) and (Out NN Ring !Full signal))
Phase 3
  Assert Next_Ctx Done signal
  Send (buf_handle, Offset, Result) to Out NN Ring
  Wait on Next_Ctx Start signal
  GoTo Phase 1

IPv4 MR 3 DB Lookup Block Pseudocode
Initialization Phase
  Initialize GMR_GM, GMR_EM, GMR_RL for each type/size of lookup database/key
Start
  Wait on ((Next_Ctx Start signal) and (In NN Ring !Empty signal))
Phase 1
  Assert Next_Ctx Start signal
  Read In NN Ring (buf_handle, dram_ptr(?), Offset, MR_Id, Input_MI, MR_Mem_Ptr, Key)
  Extract Key of correct number of bits
  Build Multi Database Lookup (MDL) command using Key and GMR_GM, GMR_EM, GMR_RL (IDT Macro)
  Send MDL command to NSE (sram[] write instruction)
  Calculate Delay Time and Wait (IDT Macro)
Phase 2
  Issue command to Read Result from Results Mailbox (IDT Macro)
    Macro does Wait for Result, checks Done bit, and continues to read until Done bit is set.
  If no hits, zero Out_Result and then set Miss bit
  Else compare priority of hits, select highest priority, and write it into Out_Result
  Wait for ((Next_Ctx Done signal) and (Out NN Ring !Full signal))
Phase 3
  Assert Next_Ctx Done signal
  Send (buf_handle, dram_ptr(?), Offset, MR_Id, MR_Mem_ptr, Out_Result) to Out NN Ring
  GoTo Phase 1

Extra The next set of slides are for templates or extra information if needed


IPv4 MR Lookup Result Examples
PROBLEM: One problem with using the MDL is that we do not get an index back with our results. Hence we will not be able to easily increment a counter based on the lookup result.
Multi Database Lookup (MDL) cmd returns one and only one of the following per database searched:
  Absolute Index
  Translated Index
  Associated Data
Lookup cmd returns one of:
  Absolute Index followed by Associated Data
  Associated Data followed by Absolute Index
Option 1: Three back-to-back Lookup cmds, one for each database.
  Each result would provide us with the result data and the Index.
  This requires 3 times the number of lookups in the TCAM.
Option 2: Use MDL but have the result include the index and not the data.
  We would then have to keep the result data in a separate memory that we would then read.
Option 3: Add a Stats Index to the Result.
  Keep a table of counters and increment based on the index.
Result (for MDL, really only 61 out of the 64 bits are available):
  QID (20b)
  Output MI (16b)
  Priority (8b): range 0-255
  Drop (1b)
  Stats Index (16b): 65536 indices for stats
  Total of 61 bits
Note: If we increase the result size to 128 bits, then we CANNOT get 3 results back in the Results Mailbox.
I like Option 3 and will continue along those lines.

MPLS Lookup Result Examples
Note: No Stats Index included in Results.
Option 1: Use the Absolute Index returned with the Associated Data to locate a counter to increment.
Option 2: Increase the result size to 128 bits. This would also allow us to put the Drop bit in.
  Result: New Label (20b), QID (20b), Output MI (16b), Stats Index (16b), Drop bit (1b)
  Total of 73 bits
Let’s go with Option 2.

TCAM Latency Data IDT App Note AN-459: “IDT75K72234 Instruction Latency” Provides data and examples for latency calculations Assumptions: The NSE has no instructions in the pipeline The measured instruction is the only instruction issued The other NSE interfaces are idle.

TCAM Latency Data
Example from IDT App Note AN-459:
  288-bit Lookup (288/32 = 9 QDR cycles to transfer 288-bit key)
  32 bits of Associated ZBT SRAM data is returned
  QDR clock frequency is 200 MHz
  System clock frequency is 200 MHz
Description         Clock Domain   Freq (MHz)   # of clocks   Time (ns)
QDR xfer time       QDR            200          9             45
Instruction FIFO                                2             10
Synchronizer        System                      3             15
Execution Latency                               32            160
Re-Synchronizer                                 1             5
Total Time                                      47            235
Execution Latency numbers are from Table 1 of AN-459.

TCAM Latency Data
Parameters for our LC Ingress Lookup:
  128-bit Lookup (128/32 = 4 QDR cycles to transfer 128-bit key)
  128 bits of Associated ZBT SRAM data is returned
  QDR clock frequency is 200 MHz
  System clock frequency is 200 MHz
  Core Blocking (CB) Delay: 8 cycles
  Backend Latency: 14 cycles
Description         Clock Domain   Freq (MHz)   # of clocks   Time (ns)
QDR xfer time       QDR            200          4             20
Instruction FIFO                                2             10
Synchronizer        System                      3             15
Execution Latency                               36            180
Re-Synchronizer                                 1             5
Total Time                                      46            230
Core Blocking                                   8             40
Total Time + CB                                 54            270
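The totals in these latency tables are the sum of the per-stage clock counts, converted to time at 200 MHz (5 ns per clock). The sketch below reproduces that sum. It assumes, as the slides do, that the QDR and System clocks both run at 200 MHz, and it takes the Execution Latency as an input rather than deriving it from AN-459. The function and parameter names are illustrative.

```python
import math


def lookup_latency(key_bits, exec_clocks, core_blocking=0, clock_mhz=200,
                   fifo_clocks=2, sync_clocks=3, resync_clocks=1):
    """Sum the per-stage clock counts of a TCAM lookup.

    Returns (clocks, ns, clocks including core blocking, ns including core blocking).
    """
    qdr_clocks = math.ceil(key_bits / 32)      # 32 key bits transferred per QDR clock
    clocks = qdr_clocks + fifo_clocks + sync_clocks + exec_clocks + resync_clocks
    ns_per_clock = 1000 / clock_mhz            # 5 ns at 200 MHz
    with_cb = clocks + core_blocking
    return clocks, clocks * ns_per_clock, with_cb, with_cb * ns_per_clock


# AN-459 example: 288-bit key, 32-bit Associated Data.
print(lookup_latency(288, exec_clocks=32))
# LC Ingress: 128-bit key, 128-bit Associated Data, 8 cycles of core blocking.
print(lookup_latency(128, exec_clocks=36, core_blocking=8))
```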

TCAM Latency Data
Parameters for possible LC Ingress Lookup:
  72-bit Lookup (72/32 rounded up = 3 QDR cycles to transfer 72-bit key)
  128 bits of Associated ZBT SRAM data is returned
  QDR clock frequency is 200 MHz
  System clock frequency is 200 MHz
  Core Blocking (CB) Delay: 8 cycles
  Backend Latency: 14 cycles
Description         Clock Domain   Freq (MHz)   # of clocks   Time (ns)
QDR xfer time       QDR            200          3             15
Instruction FIFO                                2             10
Synchronizer        System                      3             15
Execution Latency                               36            180
Re-Synchronizer                                 1             5
Total Time                                      45            225
Core Blocking                                   8             40
Total Time + CB                                 53            265

TCAM Latency Data
Parameters for possible LC Ingress Lookup:
  72-bit Lookup (72/32 rounded up = 3 QDR cycles to transfer 72-bit key)
  64 bits of Associated ZBT SRAM data is returned
  QDR clock frequency is 200 MHz
  System clock frequency is 200 MHz
  Core Blocking (CB) Delay: 4 cycles
  Backend Latency: 10 cycles
Description         Clock Domain   Freq (MHz)   # of clocks   Time (ns)
QDR xfer time       QDR            200          3             15
Instruction FIFO                                2             10
Synchronizer        System                      3             15
Execution Latency                               32            160
Re-Synchronizer                                 1             5
Total Time                                      41            205
Core Blocking                                   4             20
Total Time + CB                                 45            225

TCAM Latency Data
Parameters for possible LC Ingress Multi Hit Lookup:
  72-bit Lookup (72/32 rounded up = 3 QDR cycles to transfer 72-bit key)
  64 bits of Associated ZBT SRAM data is returned
  QDR clock frequency is 200 MHz
  System clock frequency is 200 MHz
  Core Blocking (CB) Delay: 6 cycles
  Backend Latency: 10 cycles
Description         Clock Domain   Freq (MHz)   # of clocks   Time (ns)
QDR xfer time       QDR            200          3             15
Instruction FIFO                                2             10
Synchronizer        System                      3             15
Execution Latency                               32            160
Re-Synchronizer                                 1             5
Total Time                                      41            205
Core Blocking                                   6             30
Total Time + CB                                 47            235

IDT Data Plane Macro API IDT provides a set of macros for creating commands to the TCAM Here are some that will be particularly useful to us: IipcMakeBase() IipcMakeDirectInstruction() IipcMakeIndirectInstruction() IipcMakeSubInstruction() IipcQDRDelay() IipcNPUDelay() IipcSignalXXXDone() IipcSramRead() IipcFormContextFromCsrMeCtx() IipcMake36BitLookupInstruction() IipcSramReadResultStatus()

IDT Data Plane Macro API
IipcSramRead()
  …
  .sig sramread
  .reg $t00, $t01, $t02, $t03, $t04, $t05, $t06, $t07
  .xfer_order_rd $t00 $t01 $t02 $t03 $t04 $t05 $t06 $t07
  .set $t00, $t01, $t02, $t03, $t04, $t05, $t06, $t07
  ; Read the first word, re-try the reading until Done bit is set
  SRAMREAD#:
  sram[ read, $t00, base, 0x0, 1 ], ctx_swap[ sramread ]
  br_bclr[ $t00, 31, SRAMREAD# ]
  ; Now, read again to make sure the index is set correctly when the Done bit is cleared
  sram[ read, $t00, base, 0x0, amount ], ctx_swap[ sramread ]
  ; Manually set the Done bit on the returned result
  immed_w0[ result[0], 0x0000 ]
  immed_w1[ result[0], 0x8000 ]
  alu[ result[0], result[0], or, $t00 ]
  ; Transfer the rest of results depending on read amount
IipcSramReadResultStatus()
  #macro IipcSramReadResultStatus( result, amount, base, regnum )
  .begin
  .reg lowword
  .set lowword
  immed_w0[ lowword, ( 0x1 << 5 | regnum << 2 ) ]
  immed_w1[ lowword, 0 ]
  ;SRAMREAD#:
  sram[ read, result, base, lowword, amount ], ctx_swap[ sramread ]
  ; sram[ read, result, base, 0x0, amount ], ctx_swap[ sramread ]
  ; br_bclr[ result, 31, SRAMREAD# ]
  .end
  #endm

IDT Data Plane Macro API: Example
Example of a Direct Lookup command
  ; channel = 0
  ; select = 0
  ; context = 4
  IipcMakeBase[ iipc_base_word, 0x0, IIPC_DOUBLE_32MB_SELECT_0, 0x4 ]
  ; instruction = 0 (IIPC_LOOKUP)
  ; gmask = 31 (no GMR)
  ; database = 0 (database 0)
  IipcMakeDirectInstruction[ iipc_command_word, IIPC_LOOKUP, IIPC_NO_GMASK, 0x0 ]
  ; Create the 72 bit search key to lookup in the write transfer registers.
  ; key = 0xBBBBBBBBBBBBBBBBBB
  immed_w0[ data, 0xBBBB ]
  immed_w1[ data, 0xBBBB ]
  alu[ $w00, --, B, data ]
  alu[ $w01, --, B, data ]
  immed_w0[ data, 0x0000 ]
  immed_w1[ data, 0xBB00 ]
  alu[ $w02, --, B, data ]
  ; perform QDR write, sending command to NSE
  sram[ write, $w00, iipc_base_word, iipc_command_word, 3 ], ctx_swap[sramwrite]
  ; compute approximate delay time and set signal
  IipcSignalLookupDone[ 3, 72 ]
  ; perform a read that does not return until the NSE results mailbox Done bit is set
  IipcSramRead[ $r00, 1, iipc_base_word ]

TCAM Performance
Three performance metrics:
  IXP/TCAM LA-1 Interface: Single LA-1 Max Lookup Rate
  CAM Core Max Lookup Rate
  Associated Data Width
IXP/TCAM LA-1 Interface:
  16 bits wide
  200 MHz QDR
  Effectively 32 bits per clock tick, so getting the Key in and the Result out is 32 bits/tick.
  Example: A 128b Key would take 4 ticks to get clocked into the TCAM. Max of 50 M Lookups/sec.
  Example: 128b results from ZBT SRAM via the TCAM would take 4 ticks to get clocked out of the TCAM. Max of 50 M Results/sec.
Performance Numbers
  Key Size (200 MHz, 16-bit Interface for commands). Rates are max PER LA-1 Interface, up to the CAM Core max rate:
    32b (1W): 50M Lookups/sec (CAM Core constraint)
    36b (2W): 50M Lookups/sec (CAM Core constraint)
    64b (2W): 100M Lookups/sec (Interface constraint)
    72b (3W): 67M Lookups/sec (Interface constraint)
    128b (4W): 50M Lookups/sec (Interface constraint)
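The per-interface rates above are the smaller of two limits: how fast the key can be clocked over the LA-1 interface, and how fast the CAM core can search. A minimal Python sketch of that calculation follows. The core rates are the Index/Pointer figures from the "Performance Numbers (continued)" slide; picking the smallest core size that holds the key is an assumption, and the function names are illustrative.

```python
import math

QDR_HZ = 200e6        # LA-1 interface clock
BITS_PER_TICK = 32    # 16-bit QDR interface moves 32 bits per clock tick

# Max CAM core lookup rate by core size (Index/Pointer result types).
CORE_RATE = {36: 50e6, 72: 100e6, 144: 100e6, 288: 50e6, 576: 25e6}


def core_size_for(key_bits):
    """Smallest core size that can hold the key (assumed selection rule)."""
    return min(size for size in CORE_RATE if size >= key_bits)


def max_lookup_rate(key_bits):
    """Return (lookups/sec, limiting factor) for one LA-1 interface."""
    ticks = math.ceil(key_bits / BITS_PER_TICK)
    interface_rate = QDR_HZ / ticks
    core_rate = CORE_RATE[core_size_for(key_bits)]
    if core_rate < interface_rate:
        return core_rate, "CAM Core"
    return interface_rate, "Interface"


for key_bits in (32, 36, 64, 72, 128):
    rate, limit = max_lookup_rate(key_bits)
    print(f"{key_bits:>3}b key: {rate / 1e6:5.1f} M lookups/sec ({limit} constraint)")
```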

TCAM Performance
Performance Numbers (continued)
Result Type
  Index or Pointer Types: Max CAM Core lookup rate by Key Size
    36b: 50M/sec
    72b: 100M/sec
    144b: 100M/sec
    288b: 50M/sec
    576b: 25M/sec
  Associated Data: CAM Core rates. These rates are totals across both LA-1 Interfaces.
    Key Size   32b Result Rate   64b Result Rate   128b Result Rate
    36b        50                50                25
    72b        100               50                25
    144b       100               50                25

TCAM Performance Tables

TCAM Performance Graphs

TCAM Performance Graphs

OLD The rest of these are old slides that should be deleted at some point.

START: LC-Rx With MA SL on a PT ML This set of slides is with the assumption that we DO need to support a MA SL on one of a Pass Through MetaLink

SUMMARY: LC: TCAM Lookups DC Tunnel VLAN0 MA w/ vlan w/o Legacy w/ vlan Legacy w/o vlan RX Key 24 72 Result 61* TX 32 96 128 112 64 48 80 Rx Key: 72 bits Rx Result Size: 61* bits 61 bits if there is no need for NhAddr Multiple results if there is a need for NhAddr, see earlier discussion. Tx Key Size: 32 bits Tx Result Size: 128 bits We need to watch out for the Tx Result for Tunnels. If we introduce anything else we want in there then we go beyond the 128 bits supportable through the TCAM’s Associated memory.

LC: TCAM Lookup Keys on RX P2P-DC Port(8b) MLI(16b) 24 bits IPv4 Tunnel Port (8b) EtherType (16b) 0x0800 IP SAddr (32b) MLI (16b) 72 bits MA Port (8b) Ethernet SAddr (48b) MLI (16b) 72 bits P2P-VLAN0 Port(8b) MLI(16b) 24 bits Legacy Port (8b) EtherType (16b) 0x0800 24 bits DstAddr (6B) Blue Shading: Determine SL Type Black Outline: Key Fields from pkt DstAddr (6B) SrcAddr (6B) SrcAddr (6B) Type=802.1Q (2B) Type=802.1Q (2B) TCI ≠ VLAN0 (2B) TCI ≠ VLAN0 (2B) Type=802.1Q (2B) MLI (2B) LEN (2B) Meta Frame TCI (2B) Type=Substrate (2B) PAD (nB) CRC (4B) DstAddr (6B) SrcAddr (6B) Type=IP (2B) Multi-Access Type=802.1Q (2B) MLI (2B) LEN (2B) Meta Frame TCI≠VLAN0 (2B) Type=Substrate (2B) PAD (nB) CRC (4B) DstAddr (6B) SrcAddr (6B) Type=802.1Q (2B) MLI (2B) LEN (2B) Meta Frame TCI=VLAN0 (2B) Type=Substrate (2B) PAD (nB) CRC (4B) DstAddr (6B) SrcAddr (6B) P2P-VLAN0 Type=IP (2B) Ver/HLen/Tos/Len (4B) Ver/HLen/Tos/Len (4B) ID/Flags/FragOff (4B) ID/Flags/FragOff (4B) TTL (1B) TTL (1B) Protocol=Substrate (1B) Protocol (1B) Hdr Cksum (2B) Hdr Cksum (2B) Src Addr (4B) Src Addr (4B) Dst Addr (4B) Dst Addr (4B) MLI (2B) IP Payload LEN (2B) Meta Frame PAD (nB) PAD (nB) CRC (4B) CRC (4B) P2P-DC Configured P2P-Tunnel Legacy

LC: TCAM Lookup Results on RX
Fields we definitely need:
  VLAN (16b): We could probably drop this to 12b since our switch is supposed to only support 4K VLANs, but we might want to leave this open to switches supporting larger numbers of VLANs.
  MI (16b)
  Blade Eth Hdr (8b): Only needs to have the DAddr. The rest can be constant and configured:
    SAddr can be configured and constant per LC.
    First EtherType can be constant: 802.1Q
    Second EtherType can be constant: Substrate
    We can control the MAC addresses of the blades, so let’s say that 40 of the 48 bits of DAddr are fixed and 8 bits are assigned and stored in the Lookup Result. Will 8 bits be enough to support multiple chassis?
  QID (20b)
Possible fields to handle pass-through Meta Links:
  MnFlags (8b): (see next slide)
  NhAddr (60b): (if needed, see next slide)

LC: TCAM Lookup Results on RX Can we say there will be no Pass Through Meta Links where one side will be on a Multi Access and hence might need a NhAddr field? If so then we can drop the MnFlags and NhAddr fields from result Result size then becomes: 60b Pass Through Meta Link Fields: MnFlags(8b) NhAddr(nB) We have 60 bits left over that could be used for NhAddr If we need more, options: Do a second lookup with the following fields to retrieve the Next Hop bits from another Database for NH Bits? Port (8b) VLAN (16b) MI (16b) If the size indicated in the MnFlags is greater than 60 bits, use put an Memory Pointer in the bits after the MnFlags and lookup the NhAddr in memory. We could even include 32 bits of the NhAddr in the original TCAM result and still have a memory pointer to get the rest. This might cut down on memory access time needed to retrieve the NhAddr.

Rx Handling NhAddr for Pass-Through MLs
For Tx to handle Multi-Access Substrate Links we need to provide an ARP capability on behalf of the MetaNets.
  For MR → LC-TX: we allow the MRs to give the LCs a MN Next Hop Address. The LC-TX does a lookup on it to see if we have a MAC address for it and, if not, issues an ARP to request it.
  But the LC does not know anything in general about MN addresses or their size.
  The MnFlags field includes the size, so Tx can handle variable sizes.
  The MnFlags field has a 6-bit length in bytes, so the address can be up to 63 bytes. But what is a good limit?
    IPv4 is 32 bits.
    IPv6 is 128 bits, but IPv6 uses the Neighbor Discovery Protocol to do what ARP does, and NDP does a lot more. We have a lot to learn about IPv6…
For Pass Through MetaLinks where one side is MA, we need the LC-RX to have the Next Hop Address in its lookup result so the Tx can do the translation to a MAC address in the same way.
  Can we assume that this won’t happen? Actually it is probably fairly likely to happen on access links. Perhaps the MetaNet does not have an MR located at the first Substrate Router but its access MetaLinks pass through.
If we find that we have to store the MnNhAddr in the Rx Result, I think we can make it variable sized by using Multi-Hit Lookups (MHL) and storing the same Key multiple times with a different result for each one.
  Each subsequent result would be the additional bits to concatenate on to the MnNhAddr.
  In the Results Mailbox for an MHL, we can have at most:
    250 bits in 2 128-bit AD Results, or
    244 bits in 4 64-bit AD Results, or
    232 bits in 8 32-bit AD Results
So, let’s assume that in general we don’t need the NhAddr in the Rx entry. (next slide…)
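The three Multi-Hit Lookup capacities quoted above are consistent with each returned result losing 3 bits of its Associated Data width. The sketch below reproduces them under that assumption; the 3-bit figure is inferred from the totals on the slide rather than taken from the device documentation, and the names are illustrative.

```python
UNUSABLE_BITS_PER_RESULT = 3   # assumption inferred from the totals on the slide


def mhl_payload_bits(num_results, ad_width_bits):
    """Usable bits across all results returned by one Multi-Hit Lookup."""
    return num_results * (ad_width_bits - UNUSABLE_BITS_PER_RESULT)


def fits_in_mhl(address_bits, num_results, ad_width_bits):
    """True if an address of the given size fits in the combined results."""
    return address_bits <= mhl_payload_bits(num_results, ad_width_bits)


for num_results, width in ((2, 128), (4, 64), (8, 32)):
    print(f"{num_results} x {width}-bit AD results: "
          f"{mhl_payload_bits(num_results, width)} usable bits")

# A 128-bit (IPv6-sized) address fits; the 63-byte maximum allowed by MnFlags does not.
print(fits_in_mhl(128, 4, 64), fits_in_mhl(63 * 8, 2, 128))
```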

Rx Handling NhAddr for Pass-Through MLs
Rx Result (61 bits):
  VLAN (16b)
  MI (16b)
  Blade Eth Hdr (8b)
  QID (20b)
  Continuation bit (1b):
    0: no need for MnFlags and NhAddr; MHL should report MHit of 0.
    1: MnFlags and NhAddr contained in subsequent results; MHL should report MHit of 1.
This then leaves us 3 more possible results to the MHL, giving us (3*61)-8 = 175 bits of Next Hop Address.
We have to be careful when writing the entries for Rx:
  We write the main and subsequent entries in the right order. I assume that the order of the results is based on the priority of the entries in the TCAM, which is determined by order.
  The continuation bit is set correctly. Actually this is just a safety check. If we always do an MHL, then the MH bit in the result should tell us if there are subsequent results.

END: LC-RX With MA SL on a PT ML
This set of slides assumes that we DO need to support a MA SL on one side of a Pass-Through MetaLink.

Iipc.uc Macros
Validate Macros: #if/#error range and value checking on fields and parameters
  IipcValidateChannel IipcValidateSelect IipcValidateContext IipcValidateInstruction
  IipcValidateLearnOptions IipcValidateGMask IipcValidateDatabase IipcValidate36BitMode
  IipcValidatePpe IipcValidatePp IipcValidateSubInstruction IipcValidateRegion
  IipcValidateSubRegion IipcValidateComponent IipcValidateAddress IipcValdiateDatawords

Iipc.uc Macros
Instruction Building Macros: build the encoded TCAM instruction/command
  IipcMakeBase IipcMakeDirectInstruction IipcMake36BitLookupInstruction
  IipcMakePreloadInstruction IipcMakeIndirectInstruction IipcMakeLearnSubInstruction
  IipcMakeMultiHitInvalidateSubInstruction IipcMakeReadWriteSubRegionSubInstruction
  IipcMakeSramCopySubInstruction IipcMakeMdlSubInstruction IipcMakeResetSubInstruction
  IipcMakeFlushSubInstruction IipcMakeSubInstruction

Iipc.uc Macros
Results Macros:
  IipcSramRead IipcSramReadResultStatus IipcGetAssocData
Time and Delay Macros:
  IipcStartTimestamp IipcDelayUsingFutureCount IipcQDRDelay IipcNPUDelay
Utility Macros:
  IipcFormContextFromCsrMeCtx

Iipc.uc Macros
Signal Instruction Done Macros:
  IipcSignalLookupDone IipcSignalMHLDone IipcSignalReadDone IipcSignalWriteDone
  IipcSignalWriteKeepValidDone IipcSignalDualWriteDone IipcSignalLearnDone
  IipcSignalSetValidDone IipcSignalClearValidDone IipcSignalSramCopyDone
  IipcSignalMHIDone IipcSignalMDLDone IipcSignalRMDLDone IipcSignalLearnInitDone
  IipcSignalFlushDone IipcSignalParityCheckDone

Iipc.uc Macros IipcValidateChannel IipcValidateSelect IipcValidateContext IipcValidateInstruction IipcValidateLearnOptions IipcValidateGMask IipcValidateDatabase IipcValidate36BitMode IipcValidatePpe IipcValidatePp IipcValidateSubInstruction IipcValidateRegion IipcValidateSubRegion IipcValidateComponent IipcValidateAddress IipcValdiateDatawords IipcMakeBase IipcMakeDirectInstruction IipcMake36BitLookupInstruction IipcMakePreloadInstruction IipcMakeIndirectInstruction IipcMakeLearnSubInstruction IipcMakeMultiHitInvalidateSubInstruction IipcMakeReadWriteSubRegionSubInstruction IipcMakeSramCopySubInstruction IipcMakeMdlSubInstruction IipcMakeResetSubInstruction IipcMakeFlushSubInstruction IipcMakeSubInstruction IipcSramRead IipcSramReadResultStatus IipcStartTimestamp IipcDelayUsingFutureCount IipcQDRDelay IipcNPUDelay IipcSignalLookupDone IipcSignalMHLDone IipcSignalReadDone IipcSignalWriteDone IipcSignalWriteKeepValidDone IipcSignalDualWriteDone IipcSignalLearnDone IipcSignalSetValidDone IipcSignalClearValidDone IipcSignalSramCopyDone IipcSignalMHIDone IipcSignalMDLDone IipcSignalRMDLDone IipcSignalLearnInitDone IipcSignalFlushDone IipcSignalParityCheckDone IipcFormContextFromCsrMeCtx IipcGetAssocData

IipcMll.uc Macros ix_tcam_lkup_build_handle ix_tcam_lkup_start ix_tcam_lkup_complete ix_tcam_lkup_get_index ix_tcam_lkup_get_data

LC: Notes on TCAM Lookups
Lookup Key size options:
  Key Sz: 32 36 64 72 96 128 144 160 192 224 256 288 320 352 384 416 448 480 512 544 576
  Core Sz Ticks: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Lookup Result options:
  Absolute Index: relative to beginning of TCAM array
  Database Relative Index: relative to beginning of selected database
  Memory Pointer Index: points to SRAM location of result data
  Associated Data: 32, 64 or 128 bits of data associated with the lookup result. Associated Data is stored in ZBT SRAM attached to TCAM.

LC: TCAM Lookup Keys on RX
  P2P-DC (24 bits): Port (8b), MLI (16b)
  IPv4 Tunnel (72 bits): Port (8b), EtherType (16b) = 0x0800, IP SAddr (32b), MLI (16b)
  MA (72 bits): Port (8b), Ethernet SAddr (48b), MLI (16b)
  P2P-VLAN0 (24 bits): Port (8b), MLI (16b)
  Legacy (24 bits): Port (8b), EtherType (16b) = 0x0800
[Figure: frame formats for each substrate link type. Panel labels: P2P-DC, Multi-Access, P2P-VLAN0, Configured, P2P-Tunnel, Legacy. Blue shading: fields that determine the SL type. Black outline: key fields taken from the packet.]
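The per-SL-type key layouts above can be sketched as a small key builder. The field widths and the 0x0800 EtherType come from the slide; the MSB-first concatenation order and all function and parameter names are assumptions made for illustration, not the actual microcode.

```python
ETHERTYPE_IPV4 = 0x0800


def pack_fields(fields):
    """Concatenate (value, width_in_bits) pairs MSB-first; return (key, width)."""
    key, total = 0, 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value:#x} does not fit in {width} bits")
        key = (key << width) | value
        total += width
    return key, total


def rx_lookup_key(sl_type, port, mli=0, ip_saddr=0, eth_saddr=0):
    """Build the RX lookup key for one substrate link type (hypothetical helper)."""
    layouts = {
        "P2P-DC": [(port, 8), (mli, 16)],
        "P2P-VLAN0": [(port, 8), (mli, 16)],
        "IPv4 Tunnel": [(port, 8), (ETHERTYPE_IPV4, 16), (ip_saddr, 32), (mli, 16)],
        "MA": [(port, 8), (eth_saddr, 48), (mli, 16)],
        "Legacy": [(port, 8), (ETHERTYPE_IPV4, 16)],
    }
    return pack_fields(layouts[sl_type])


key, width = rx_lookup_key("MA", port=2, eth_saddr=0x001122334455, mli=0x0007)
print(f"{width}-bit key: {key:#x}")
```

Note that P2P-DC and P2P-VLAN0 produce identical 24-bit layouts here, so something outside the key (for example the database selected for the lookup) would have to distinguish them; the slide does not say how.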

IPv4 MR Lookup Result Examples
  QID (20b)
  Output MI (16b)
  Priority (8b): range 0-255
  Drop (1b)
  DAddr (8b): identifies the blade this packet is destined for. We can control the MAC addresses of the blades, so let's say that 40 of the 48 bits of DAddr are constant across all blades and 8 bits are assigned and stored in the Lookup Result. Will 8 bits be enough to support multiple chassis? We could go up to 12 bits and still use 64-bit Associated Data.
  Port (8b)
  Stats Index (8b): 256 indices for stats
  Total of 69 bits
Each Database will have 64 bits of associated data, of which we will use the low-order 61 bits. And for MDL lookups only 61 of the 64 bits of Associated Data are returned.
  RTN=1b ADSP=1b AD WIDTH=01b
Results Mailbox:
  D: Done (1b): set to 1 when ALL searches are completed.
  H: Hit (1b): set to 1 if the search was successful and the result is valid, 0 otherwise.
  MH: MHit (1b): set to 1 if the search was successful and there were additional hits in the database.
  R: Reserved bits.
  AD (Associated Data): 61 of the 64 bits of Associated Data from the Associated Data ZBT SRAM attached to the TCAM.
Results Mailbox layout (32-bit words, bits 31..0):
  D | H | MH | AD[60:32] (1st Search)
  AD[31:0] (1st Search)
  R | H | MH | AD[60:32] (2nd Search)
  AD[31:0] (2nd Search)
  R | H | MH | AD[60:32] (3rd Search)
  AD[31:0] (3rd Search)
  Not Used
  Not Used
Data format to downstream neighbor (32-bit words, bits 31..0):
  Buf Handle (32b)
  MI (16b) | Stats (8b) | Port (8b)
  0H0D (4b) | Pri (8b) | QID (20b)
  0H0D: H: Hit, D: Drop
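As an illustration of how one Results Mailbox entry might be decoded, the sketch below splits the two 32-bit words of the first search into the D, H and MH flags and the 61-bit AD value. The exact bit positions (D in bit 31, H in bit 30, MH in bit 29 of the upper word) are an assumption read off the slide's left-to-right drawing; the function name is hypothetical.

```python
def decode_mailbox_result(hi_word: int, lo_word: int) -> dict:
    """Decode one Results Mailbox entry (two 32-bit words).

    Assumed layout: hi_word = D(31) H(30) MH(29) AD[60:32](28..0),
    lo_word = AD[31:0].
    """
    if not (0 <= hi_word < (1 << 32) and 0 <= lo_word < (1 << 32)):
        raise ValueError("mailbox words must be 32-bit values")
    return {
        "done": (hi_word >> 31) & 1,
        "hit": (hi_word >> 30) & 1,
        "mhit": (hi_word >> 29) & 1,
        "ad": ((hi_word & 0x1FFFFFFF) << 32) | lo_word,  # 29 + 32 = 61 bits
    }


result = decode_mailbox_result(0xC0000005, 0xDEADBEEF)
print(result["done"], result["hit"], result["mhit"], f"{result['ad']:#x}")
```

For the second and third searches the top bit is reserved rather than Done, so a caller would only trust `done` from the first entry.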

LC: TCAM Lookups on Egress
Key: VLAN (16b), TxMI (16b)
Result
  Common across all SL Types (108b):
    From Result (60b): SL Type (4b), Port (4b), MLI (16b), QID (20b), Stats Index (16b)
    Local Memory (48b): Eth Hdr SA (48b), tied to physical interface (10-entry table in Egress Hdr Format)
  SL Type Specific Headers: P2P-DC Hdr (64b)
    Constant (16b), in Egress Hdr Format: EtherType (16b) = Substrate
    Calculated (0b)
    From Result (48b): Eth DA (48b)
  Lookup Result Total (Common Result + Specific Result): 108 bits
  Total (Common + Specific): 156 bits
Frame format: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), TCI (2B), Type=Substrate (2B), MLI (2B), LEN (2B), Meta Frame, PAD (nB), CRC (4B)
Data format to downstream neighbor (32-bit words, bits 31..0):
  Buf Handle (32b)
  MLI (16b) | Stats (16b)
  Rsv (4b) | Port (4b) | SL (4b) | QID (20b)
  Eth DA[47:16] (32b)
  Rsv (16b) | Eth DA[15:0] (16b)
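The P2P-DC data format to the downstream neighbor on this slide is five 32-bit words. A minimal sketch of assembling them is shown below; the field widths and word grouping follow the slide, while the placement of fields MSB-first within each word, zero-filled reserved bits, and the function name are assumptions.

```python
def egress_p2p_dc_words(buf_handle, mli, stats, port, sl_type, qid, eth_da):
    """Build the five 32-bit words passed downstream for a P2P-DC result."""
    checks = (
        ("buf_handle", buf_handle, 32), ("mli", mli, 16), ("stats", stats, 16),
        ("port", port, 4), ("sl_type", sl_type, 4), ("qid", qid, 20),
        ("eth_da", eth_da, 48),
    )
    for name, value, width in checks:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value:#x} does not fit in {width} bits")
    return [
        buf_handle,                            # Buf Handle (32b)
        (mli << 16) | stats,                   # MLI (16b) | Stats (16b)
        (port << 24) | (sl_type << 20) | qid,  # Rsv (4b)=0 | Port | SL | QID
        eth_da >> 16,                          # Eth DA[47:16]
        eth_da & 0xFFFF,                       # Rsv (16b)=0 | Eth DA[15:0]
    ]


words = egress_p2p_dc_words(0xCAFE0001, 0x00AB, 0x0007, 3, 1, 0x12345,
                            0x0011223344AA)
print([f"{w:#010x}" for w in words])
```

The other SL types reuse the first three words and differ only in what follows (VLAN, EType, or IP DA), so the same approach would extend to them.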

LC: TCAM Lookups on Egress
Key: VLAN (16b), TxMI (16b)
Result
  Common across all SL Types (108b):
    From Result (60b): SL Type (4b), Port (4b), MLI (16b), QID (20b), Stats Index (16b)
    Local Memory (48b): Eth Hdr SA (48b), tied to physical interface (10-entry table in Egress Hdr Format)
  SL Type Specific Headers: MA Hdr (64b)
    Constant (16b), in Egress Hdr Format: EtherType (16b) = Substrate
    Calculated (0b)
    ARP Lookup on NhAddr (48b): Eth DA (48b). (Is the ARP cache another database in the TCAM?)
    From Result (0b)
  Lookup Result Total (Common From Result + Specific From Result): 60 bits
  Total (Common + Specific): 156 bits
Frame format: DstAddr (6B), SrcAddr (6B), Type=Substrate (2B), MLI (2B), LEN (2B), Meta Frame, PAD (nB), CRC (4B)
Data format to downstream neighbor (32-bit words, bits 31..0):
  Buf Handle (32b)
  MLI (16b) | Stats (16b)
  Rsv (4b) | Port (4b) | SL (4b) | QID (20b)

LC: TCAM Lookups on Egress
Key: VLAN (16b), TxMI (16b)
Result
  Common across all SL Types (108b):
    From Result (60b): SL Type (4b), Port (4b), MLI (16b), QID (20b), Stats Index (16b)
    Local Memory (48b): Eth Hdr SA (48b), tied to physical interface (10-entry table in Egress Hdr Format)
  SL Type Specific Headers: MA with VLAN Hdr (96b)
    Constant (32b), in Egress Hdr Format: EtherType1 (16b) = 802.1Q, EtherType2 (16b) = Substrate
    Calculated (0b)
    ARP Lookup on NhAddr (48b): Eth DA (48b). (Is the ARP cache another database in the TCAM?)
    From Result (16b): VLAN/TCI (16b)
  Lookup Result Total (Common From Result + Specific From Result): 76 bits
  Total (Common + Specific): 188 bits
Frame format: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), TCI≠VLAN0 (2B), Type=Substrate (2B), MLI (2B), LEN (2B), Meta Frame, PAD (nB), CRC (4B)
Data format to downstream neighbor (32-bit words, bits 31..0):
  Buf Handle (32b)
  MLI (16b) | Stats (16b)
  Rsv (4b) | Port (4b) | SL (4b) | QID (20b)
  Rsv (16b) | VLAN (16b)

LC: TCAM Lookups on Egress
Key: VLAN (16b), TxMI (16b)
Result
  Common across all SL Types (108b):
    From Result (60b): SL Type (4b), Port (4b), MLI (16b), QID (20b), Stats Index (16b)
    Local Memory (48b): Eth Hdr SA (48b), tied to physical interface (10-entry table in Egress Hdr Format)
  SL Type Specific Headers: P2P-VLAN0 Hdr (96b)
    Constant (32b), in Egress Hdr Format: EtherType1 (16b) = 802.1Q, EtherType2 (16b) = Substrate
    Calculated (0b)
    From Result (64b): Eth DA (48b), VLAN/TCI (16b)
  Lookup Result Total (Common From Result + Specific From Result): 124 bits
  Total (Common + Specific): 188 bits
Frame format: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), TCI=VLAN0 (2B), Type=Substrate (2B), MLI (2B), LEN (2B), Meta Frame, PAD (nB), CRC (4B)
Data format to downstream neighbor (32-bit words, bits 31..0):
  Buf Handle (32b)
  MLI (16b) | Stats (16b)
  Rsv (4b) | Port (4b) | SL (4b) | QID (20b)
  Eth DA[47:16] (32b)
  VLAN (16b) | Eth DA[15:0] (16b)

LC: TCAM Lookups on Egress
Result (continued)
  Common across all SL Types (108b):
    From Result (60b): SL Type (4b), Port (4b), MLI (16b), QID (20b), Stats Index (16b)
    Local Memory (48b): Eth Hdr SA (48b), tied to physical interface (10-entry table in Egress Hdr Format)
  SL Type Specific Headers: P2P-Tunnel Hdr for IPv4 Tunnel without VLANs (224b)
    Constant (48b), in Egress Hdr Format:
      Eth Hdr EtherType (16b) = 0x0800
      IP Hdr Version (4b)/HLen (4b)/Tos (8b) (16b): all can be constant?
      IP Hdr TTL (8b): initialized to a constant when sending.
      IP Hdr Proto (8b) = Substrate
    Calculated (64b), by Egress Hdr Format:
      IP Pkt Len (16b): calculated for each packet.
      IP Hdr ID (16b): should be unique for each packet sent, so shouldn't be in Result.
      IP Hdr Checksum (16b): needs to be calculated, so shouldn't be in Result.
      IP Hdr Flags (3b)/FragOff (13b) (16b): if fragments are never used, these are constants; if we might have to use them, they have to be calculated. Either way, they shouldn't be in Result.
    Local Memory (32b): IP Hdr Src Addr (32b), tied to physical interface (10-entry table in Egress Hdr Format)
    From Result (80b): Eth Hdr DA (48b), IP Hdr Dst Addr (32b)
  Lookup Result Total (Common From Result + Specific From Result): 140 bits
  Total (Common + Specific): 316 bits
Frame format: DstAddr (6B), SrcAddr (6B), Type=IP (2B), Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol=Substrate (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), MLI (2B), LEN (2B), Meta Frame, PAD (nB), CRC (4B)
Data format to downstream neighbor (32-bit words, bits 31..0):
  Buf Handle (32b)
  MLI (16b) | Stats (16b)
  Rsv (4b) | Port (4b) | SL (4b) | QID (20b)
  Eth DA[47:16] (32b)
  Rsv (16b) | Eth DA[15:0] (16b)
  IP DA (32b)

LC: TCAM Lookups on Egress
Result (continued)
  Common across all SL Types (108b):
    From Result (60b): SL Type (4b), Port (4b), MLI (16b), QID (20b), Stats Index (16b)
    Local Memory (48b): Eth Hdr SA (48b), tied to physical interface (10-entry table in Egress Hdr Format)
  SL Type Specific Headers: P2P-Tunnel Hdr for IPv4 Tunnel with VLANs (256b)
    Constant (64b), in Egress Hdr Format:
      First Eth Hdr EtherType (16b) = 802.1Q
      Second Eth Hdr EtherType (16b) = 0x0800
      IP Hdr Version (4b)/HLen (4b)/Tos (8b) (16b): all can be constant?
      IP Hdr TTL (8b): initialized to a constant when sending.
      IP Hdr Proto (8b) = Substrate
    Calculated (64b), by Egress Hdr Format:
      IP Pkt Len (16b): calculated for each packet.
      IP Hdr ID (16b): should be unique for each packet sent, so shouldn't be in Result.
      IP Hdr Checksum (16b): needs to be calculated, so shouldn't be in Result.
      IP Hdr Flags (3b)/FragOff (13b) (16b): frags needed?
    Local Memory (32b): IP Hdr Src Addr (32b), tied to physical interface (10-entry table in Egress Hdr Format)
    From Result (96b): Eth Hdr DA (48b), IP Hdr Dst Addr (32b), VLAN/TCI (16b)
  Lookup Result Total (Common From Result + Specific From Result): 156 bits (PROBLEM!)
  Total (Common + Specific): 348 bits
Frame format: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), TCI≠VLAN0 (2B), Type=IP (2B), Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol=Substrate (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), MLI (2B), LEN (2B), Meta Frame, PAD (nB), CRC (4B)
Data format to downstream neighbor (32-bit words, bits 31..0):
  Buf Handle (32b)
  MLI (16b) | Stats (16b)
  Rsv (4b) | Port (4b) | SL (4b) | QID (20b)
  Eth DA[47:16] (32b)
  VLAN (16b) | Eth DA[15:0] (16b)
  IP DA (32b)

LC: TCAM Lookups on Egress
Key: VLAN (16b), TxMI (16b)
Result
  Common across all SL Types (108b):
    From Result (60b): SL Type (4b), Port (4b), MLI (16b, ignored for Legacy traffic), QID (20b), Stats Index (16b)
    Local Memory (48b): Eth Hdr SA (48b), tied to physical interface (10-entry table in Egress Hdr Format)
  SL Type Specific Headers: Legacy (IPv4) with VLAN Hdr (96b). IP Header provided by MR!
    Constant (16b), in Egress Hdr Format: EtherType1 (16b) = 802.1Q
    Calculated (0b)
    ARP Lookup on NhAddr (48b): Eth DA (48b). (Is the ARP cache another database in the TCAM?)
    From Result (32b): EtherType2 (16b) = IPv4, TCI (16b)
  Lookup Result Total (Common From Result + Specific From Result): 92 bits
  Total (Common + Specific): 188 bits
Frame format: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), TCI≠VLAN0 (2B), Type=IP (2B), Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Payload, PAD (nB), CRC (4B)
Data format to downstream neighbor (32-bit words, bits 31..0):
  Buf Handle (32b)
  MLI (16b) | Stats (16b)
  Rsv (4b) | Port (4b) | SL (4b) | QID (20b)
  VLAN (16b) | EType (16b)

LC: TCAM Lookups on Egress
Key: VLAN (16b), TxMI (16b)
Result
  Common across all SL Types (108b):
    From Result (60b): SL Type (4b), Port (4b), MLI (16b, ignored for Legacy traffic), QID (20b), Stats Index (16b)
    Local Memory (48b): Eth Hdr SA (48b), tied to physical interface (10-entry table in Egress Hdr Format)
  SL Type Specific Headers: Legacy (IPv4) without VLAN Hdr (64b). IP Header provided by MR!
    Constant (0b)
    Calculated (0b)
    ARP Lookup on NhAddr (48b): Eth DA (48b). (Is the ARP cache another database in the TCAM?)
    From Result (16b): EtherType (16b) = IPv4
  Lookup Result Total (Common From Result + Specific From Result): 76 bits
  Total (Common + Specific): 156 bits
Frame format: DstAddr (6B), SrcAddr (6B), Type=IP (2B), Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Payload, PAD (nB), CRC (4B)
Data format to downstream neighbor (32-bit words, bits 31..0):
  Buf Handle (32b)
  MLI (16b) | Stats (16b)
  Rsv (4b) | Port (4b) | SL (4b) | QID (20b)
  Rsv (16b) | EType (16b)
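The "Lookup Result Total" figures across the egress slides all follow the same sum: the 60 common From Result bits plus the SL-type-specific From Result bits. The sketch below tabulates them as a cross-check. The bit counts are taken from the slides; the dictionary keys and function name are just labels chosen here.

```python
# Common From Result: SL Type + Port + MLI + QID + Stats Index
COMMON_FROM_RESULT_BITS = 4 + 4 + 16 + 20 + 16  # = 60

# SL-type-specific bits that must come from the lookup result (per slide).
SPECIFIC_FROM_RESULT_BITS = {
    "P2P-DC": 48,                  # Eth DA
    "MA": 0,                       # Eth DA comes from the ARP lookup
    "MA with VLAN": 16,            # VLAN/TCI
    "P2P-VLAN0": 48 + 16,          # Eth DA + VLAN/TCI
    "P2P-Tunnel, no VLAN": 48 + 32,       # Eth DA + IP Dst Addr
    "P2P-Tunnel, VLAN": 48 + 32 + 16,     # Eth DA + IP Dst Addr + VLAN/TCI
    "Legacy with VLAN": 16 + 16,   # EtherType2 + TCI
    "Legacy without VLAN": 16,     # EtherType
}


def lookup_result_bits(sl_type: str) -> int:
    """Total bits the TCAM lookup result must carry for one SL type."""
    return COMMON_FROM_RESULT_BITS + SPECIFIC_FROM_RESULT_BITS[sl_type]


for sl_type in SPECIFIC_FROM_RESULT_BITS:
    print(f"{sl_type:22s} {lookup_result_bits(sl_type):4d} bits")
```

The P2P-Tunnel with VLAN case is the largest at 156 bits, which is the entry the slides flag as a problem.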