1
SPP Router Plans and Design
John DeHart
2
SPP Versions
SPP Version 0: What we used for the SIGCOMM paper
SPP Version 1: Bare minimum we would need to release something to PlanetLab users
SPP Version 2: What we would REALLY like to release to PlanetLab users
3
SPP Plans
SPP Version 1:
  1 5-Port NPE (still don’t use NPUB)
  Switch Blade integration
  10GE Tx module integration
  ARP
  Egress Traffic monitoring
  MR Code Options
    Anything new?
  Control
    Local Control
      Booting NPU
      Add/Remove Slices
    MR Control
      Add/Remove Routes
    Node Manager
  GPE
    1 vs. multiple
    NAT?
    SSH Forwarding
  PLC integration
SPP Version 2:
  Deal with constraints imposed by switch
    can send to only one NPU; can receive from only one NPU
    split processing across NPUs
      parsing, lookup on one; queueing on other
  Provide more resources for slice-specific processing
  Decouple QM schedulers from links
    collection of largely independent schedulers
    may use several to send to the same link
      e.g. separate rate classes (1-10M, M, M)
    optionally adjust scheduler rates dynamically
  Provide support for multicast
    requires addition of next-hop IP address after queueing
  Enable single slice to operate at 10 Gb/s
4
Objectives for SPP-NPE version 2
Deal with constraints imposed by switch
  can send to only one NPU; can receive from only one NPU
  split processing across NPUs
    parsing, lookup on one; queueing on other
Provide more resources for slice-specific processing
Decouple QM schedulers from links
  collection of largely independent schedulers
  may use several to send to the same link
    e.g. separate rate classes (1-10M, M, M)
  optionally adjust scheduler rates dynamically
Provide support for multicast
  requires addition of next-hop IP address after queueing
Enable single slice to operate at 10 Gb/s
5
NPE Version 2 Block Diagram
[Block diagram]
NPUA: RxA (2 ME); Decap, Parse, LookupA, AddShim (8 MEs); TxA (2 ME); Stats (1 ME); SRAM; large SRAM ring
NPUB: RxB (2 ME); LookupB & Copy (2 ME); Queue Manager (4 MEs); HdrFmt (4 MEs); TxB (2 ME); Stats (1 ME); SRAM; large SRAM ring
Also shown: TCAM, two SPI Switch blocks, Switch Blade, GPE
Open question marked on the diagram: flow control?
6
NPE Version 2 Block Diagram
[Block diagram]
NPUA: RxA (2 ME); Decap, Parse, LookupA, AddShim (8 MEs); TxA (2 ME); Stats (1 ME); SRAM
NPUB: RxB (2 ME); LookupB & Copy (2 ME); Queue Manager (4 MEs); HdrFmt (4 MEs); TxB (2 ME); Stats (1 ME); SRAM
Also shown: TCAM, two SPI Switch blocks, Switch Blade, GPE
Open question marked on the diagram: flow control?
Annotations:
  Lookup produces resultIndx, statsIndx
  slice#, resultIndx, etc. passed in shim
  for unicast, resultIndx replaced by QiD, allowing output side to skip lookup
  Lookup on <slice#, resultIndx> yields fanout, list of QiDs; copy to queues, adding copy# (slice#, resultIndx remain in packet buffer)
  use slice# to select slice to format packet; use resultIndx to get next-hop
7
Questions/Issues
Where are exit and entry points for packets sent to and from the GPE for exception processing?
  Parse (NPUA) and LookupA (NPUA) are where most exceptions are generated:
    IP Options
    No Route
    Etc.
  HdrFormat (NPUB) is where we do Ethernet header processing
What needs to be in the SHIM going from NPUA to NPUB?
  ResultIndex (32b)
  Exception Bits (12b)
  StatsIndex (16b)
  Slice# (12b)
  ???
Will we support multi-copy in a way similar to the ONL Router?
  How big can the fanout be?
  How many QIDs need to be stored with the LookupB Result?
  Is there some encoding for the QIDs that can take into account support for multicast and the copy#? For example:
    Multicast QID (20b)
      Multicast (1b): 1
      Copy# (4b)
      PerMulticast QID (15b): One PerMulticast QID allocated for each Multicast
    Unicast QID (20b)
      Unicast (1b): 0
      QID (19b)
Are there timing/synchronization issues with adding, deleting or changing lookup entries between the two NPUs' databases?
Do we need flow control between TxA and RxB?
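The example QID encoding in the question above can be sketched in a few lines of Python. The field widths (1b flag, 4b Copy#, 15b PerMulticast QID, or 1b flag plus 19b QID) come from the slide; the bit positions (flag in the most significant bit, Copy# next, QID in the low bits) and the function names are assumptions made only for illustration.

```python
# Sketch of the 20-bit QID encoding proposed on the slide.
# ASSUMPTION: flag is bit 19, Copy# is bits 18..15, PerMulticast QID is bits 14..0.
MCAST_FLAG = 1 << 19


def encode_unicast_qid(qid: int) -> int:
    """Unicast: flag bit 0, followed by a 19-bit QID."""
    if not 0 <= qid < (1 << 19):
        raise ValueError("unicast QID must fit in 19 bits")
    return qid


def encode_multicast_qid(copy_num: int, per_mcast_qid: int) -> int:
    """Multicast: flag bit 1, 4-bit Copy#, 15-bit PerMulticast QID."""
    if not 0 <= copy_num < (1 << 4):
        raise ValueError("Copy# must fit in 4 bits")
    if not 0 <= per_mcast_qid < (1 << 15):
        raise ValueError("PerMulticast QID must fit in 15 bits")
    return MCAST_FLAG | (copy_num << 15) | per_mcast_qid


def decode_qid(value: int) -> dict:
    """Split a 20-bit QID back into its fields."""
    if value & MCAST_FLAG:
        return {
            "multicast": True,
            "copy": (value >> 15) & 0xF,
            "qid": value & 0x7FFF,
        }
    return {"multicast": False, "qid": value & 0x7FFFF}
```

With this layout a 4-bit Copy# caps the fanout at 16 copies per multicast, which bears directly on the "How big can the fanout be?" question.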
8
NPE Version 2 Block Diagram
[Block diagram]
NPUA: RxA (2 ME); Decap, Parse, LookupA, AddShim (8 MEs); TxA (2 ME); Stats (1 ME); SRAM
NPUB: RxB (2 ME); LookupB & Copy (2 ME); Queue Manager (4 MEs); HdrFmt (4 MEs); TxB (2 ME); Stats (1 ME); SRAM
Also shown: TCAM, two SPI Switch blocks, Switch Blade, GPE
Open question marked on the diagram: flow control?
NPUA:
  RxA: Same as Version 0
  TxA: New 10Gb/s
  Decap: Same as Version 0
  Parse: Same as Version 0
    New code options?
  LookupA: Results will be different from Version 0
  AddShim: New
9
NPE Version 2 Block Diagram
[Block diagram]
NPUA: RxA (2 ME); Decap, Parse, LookupA, AddShim (8 MEs); TxA (2 ME); Stats (1 ME); SRAM
NPUB: RxB (2 ME); LookupB & Copy (2 ME); Queue Manager (4 MEs); HdrFmt (4 MEs); TxB (2 ME); Stats (1 ME); SRAM
Also shown: TCAM, two SPI Switch blocks, Switch Blade, GPE
Open question marked on the diagram: flow control?
NPUB:
  RxB: Same as Version 0
  TxB: New 10Gb/s with L2 Header coming in on input ring?
  LookupB: New
  Copy: New, may be able to use some code from ONL Copy
  QM: New, decoupled from Links
  HF: New, may use some code from Version 0
10
SPP Version 2 System Architecture
[System architecture diagram, highlighting: Fast-Path Data]
LC (7010 Blade): LC Ingress, LC Egress; RTM; 1 10Gb/s OR 10 1Gb/s
NPE (7010 Blade): NPUA (Decap, Parse, Lookup, AddShim); NPUB (Copy, QM, HdrFormat)
Also shown: two SPI Switch blocks, two FICs, Switch Blade, two GPE Blades
11
SPP Version 2 System Architecture
[System architecture diagram, highlighting: Default Data Path]
LC (7010 Blade): LC Ingress, LC Egress; RTM; 1 10Gb/s OR 10 1Gb/s
NPE (7010 Blade): NPUA (Decap, Parse, Lookup, AddShim); NPUB (Copy, QM, HdrFormat)
Also shown: two SPI Switch blocks, two FICs, Switch Blade, two GPE Blades
12
SPP Version 2 System Architecture
[System architecture diagram, highlighting: Exception Data Path]
LC (7010 Blade): LC Ingress, LC Egress; RTM; 1 10Gb/s OR 10 1Gb/s
NPE (7010 Blade): NPUA (Decap, Parse, Lookup, AddShim); NPUB (Copy, QM, HdrFormat)
Also shown: two SPI Switch blocks, two FICs, Switch Blade, two GPE Blades
13
PlanetLab NPE Input Frame from LC
Ethernet Header:
  DstAddr: MAC address of NPE
  SrcAddr: MAC address of LC
  VLAN: One VLAN per MR (MR == Slice)
IP Header:
  Dst Addr: IP address of this node
    How many IP Addresses can a NODE have?
  Src Addr: IP address of previous hop
  Protocol: UDP
UDP Header:
  Dst Port: Identifies input tunnel
  Src Port: with IP Src Addr identifies sending entity
Frame layout (figure marks 8-byte boundaries, assuming no IP Options):
  Ethernet Header: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), VLAN (2B), Type=IP (2B)
  IP Header: Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol = UDP (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Options (0-40B)
  UDP Header: Src Port (2B), Dst Port (2B), UDP length (2B), UDP checksum (2B)
  UDP Payload (MN Packet)
  PAD (nB)
  Ethernet Trailer: CRC (4B)
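To make the frame layout concrete, here is a minimal Python sketch that pulls the fields Decap cares about out of a raw input frame: the VLAN (slice), the IP source address, and the UDP ports. The function name and the returned dictionary keys are hypothetical. The offsets follow the layout on this slide plus the standard 802.1Q, IPv4, and UDP header formats; IP options are handled through the header-length field.

```python
import struct

ETHERTYPE_VLAN = 0x8100  # 802.1Q tag
ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 17


def parse_npe_input_frame(frame: bytes) -> dict:
    """Extract slice and tunnel identifiers from an LC-to-NPE frame (sketch)."""
    # DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), VLAN (2B), Type=IP (2B)
    dst, src, tpid, tci, ethertype = struct.unpack_from("!6s6sHHH", frame, 0)
    if tpid != ETHERTYPE_VLAN or ethertype != ETHERTYPE_IPV4:
        raise ValueError("expected a VLAN-tagged IPv4 frame")
    ip_off = 18
    ihl = (frame[ip_off] & 0x0F) * 4          # IP header length in bytes
    if frame[ip_off + 9] != IPPROTO_UDP:
        raise ValueError("expected UDP")
    ip_src, ip_dst = struct.unpack_from("!4s4s", frame, ip_off + 12)
    udp_off = ip_off + ihl
    sport, dport, udp_len, _cksum = struct.unpack_from("!HHHH", frame, udp_off)
    return {
        "slice_id": tci & 0x0FFF,             # one VLAN per MR/slice
        "ip_src": ip_src,
        "ip_dst": ip_dst,
        "udp_src_port": sport,
        "udp_dst_port": dport,                # identifies the input tunnel
        # UDP length covers header + payload, so PAD and CRC are excluded.
        "payload": frame[udp_off + 8:udp_off + udp_len],
    }
```

The UDP length field, rather than the frame length, bounds the payload here so that the trailing PAD and CRC bytes are not mistaken for MN packet data.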
14
SPP Version2 NPUA to NPUB Frame
SHIM (16B):
  ResultIndex (32b)
  Exception Bits (12b)
  StatsIndex (16b)
  Slice# (12b)
IP Header:
  Dst Addr: IP address of this node
    How many IP Addresses can a NODE have?
  Src Addr: IP address of previous hop
  Protocol: UDP
UDP Header:
  Dst Port: Identifies input tunnel
  Src Port: with IP Src Addr identifies sending entity
Frame layout (figure marks 8-byte boundaries, assuming no IP Options):
  SHIM (16B)
  Type=IP (2B)
  IP Header: Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol = UDP (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Options (0-40B)
  UDP Header: Src Port (2B), Dst Port (2B), UDP length (2B), UDP checksum (2B)
  UDP Payload (MN Packet)
  PAD (nB)
  Ethernet Trailer: CRC (4B)
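A small Python sketch of packing and unpacking the 16-byte shim follows. The four fields and their widths are the ones listed on this slide. The slide does not give the placement of the fields within the 16 bytes, so the byte order, the 16-bit slots used for the two 12-bit fields, and the six trailing reserved bytes are all assumptions, as are the function names.

```python
import struct

# ASSUMED layout (not specified on the slide), network byte order:
#   bytes 0-3   resultIndex (32b)
#   bytes 4-5   statsIndex  (16b)
#   bytes 6-7   exception bits (12b, in the low bits of a 16-bit slot)
#   bytes 8-9   slice#      (12b, in the low bits of a 16-bit slot)
#   bytes 10-15 reserved, zero
SHIM_FMT = "!IHHH6x"
SHIM_LEN = struct.calcsize(SHIM_FMT)  # 16 bytes


def pack_shim(result_index: int, stats_index: int,
              exception_bits: int, slice_id: int) -> bytes:
    """Build the NPUA-to-NPUB shim from its four fields."""
    if not 0 <= exception_bits < (1 << 12):
        raise ValueError("exception bits must fit in 12 bits")
    if not 0 <= slice_id < (1 << 12):
        raise ValueError("slice# must fit in 12 bits")
    # struct itself range-checks the 32-bit and 16-bit fields.
    return struct.pack(SHIM_FMT, result_index, stats_index,
                       exception_bits, slice_id)


def unpack_shim(shim: bytes) -> tuple:
    """Return (result_index, stats_index, exception_bits, slice_id)."""
    result_index, stats_index, exc, sid = struct.unpack(SHIM_FMT, shim[:SHIM_LEN])
    return result_index, stats_index, exc & 0x0FFF, sid & 0x0FFF
```

The four listed fields total 72 bits, so a 16-byte shim leaves 56 bits spare under any layout; that is the room available for the "???" items raised elsewhere in the deck.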
15
NPE Version 2 Block Diagram
[Block diagram]
NPUA: RxA (2 ME); Decap, Parse, LookupA, AddShim (8 MEs); TxA (2 ME); StatsA (1 ME); FL MgrA (1 ME); Sram2NN (1 ME); SRAM
NPUB: RxB (2 ME); LookupB & Copy (2 ME); Queue Manager (4 MEs); HdrFmt (4 MEs); TxB (2 ME); StatsB (1 ME); FL MgrB (1 ME); Scr2NN (1 ME); SRAM
Also shown: TCAM, two SPI Switch blocks, Switch Blade, GPE
Open question marked on the diagram: flow control?
NPUB has 17 MEs currently spec’ed
16
SPP V2: MR Specific Code
Where does the MR Specific Code reside in V2:
  Parse
  HdrFormat
What about LookupA and LookupB?
  Lookup is a “service” provided to the MRs by the Substrate.
  No MR Specific code needed in LookupA or LookupB.
What about SideA AddShim?
  The Exception bits that go in the shim are MR Specific, but they should be passed to AddShim and it will write them into the Shim.
  No MR Specific code needed in AddShim.
What about SideB Copy?
  Is there anything MR specific about setting up multiple copies of a packet? There shouldn’t be.
  We will have the Copy block allocate a new hdr buffer descriptor and link it to the existing data buffer descriptor and take care of reference counts.
  The actual building of the new header(s) for the copies will be left to HF.
  No MR Specific code needed in Copy.
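The Copy behaviour described above (one new header descriptor per copy, all linked to the same data buffer, with a reference count controlling when the data is freed) can be sketched as follows. The class and function names are hypothetical, and the model ignores the real SRAM descriptor format and free-list rings.

```python
from dataclasses import dataclass


@dataclass
class DataBuffer:
    """Stands in for the shared data buffer and its descriptor."""
    payload: bytes
    ref_cnt: int = 1      # ASSUMPTION: set to 1 when the packet is received
    freed: bool = False


@dataclass
class HdrDescriptor:
    """Per-copy header descriptor linked to the shared data buffer."""
    data: DataBuffer
    copy_num: int


def make_copies(buf: DataBuffer, fanout: int) -> list:
    """Allocate one header descriptor per copy and bump the reference count."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    buf.ref_cnt += fanout - 1          # the original reference covers copy 0
    return [HdrDescriptor(buf, n) for n in range(fanout)]


def release(desc: HdrDescriptor) -> None:
    """Drop one reference; the data buffer is freed only when none remain."""
    buf = desc.data
    buf.ref_cnt -= 1
    if buf.ref_cnt == 0:
        buf.freed = True
```

Nothing in this path depends on the slice: building the per-copy headers is left to HF, which is why the slide concludes that Copy needs no MR Specific code.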
17
SPP V2: Hdr Format
Lots of changes for HF:
  Move behind QM
  More general:
    Support multiple source IP Addresses
    General support for Tunnels
      Eventually different kinds of tunnels (UDP/IP, GRE, …)?
  Support for Multicast
    Dealing with header buffer descriptors
    Reading Fanout table
  Substrate portion of HF will need to do Decap type table lookup
    Slice ID (Code Option, Slice Memory Pointer, Slice Memory Size)
HF gets a buffer descriptor from the QM
The Substrate portion of HF must determine:
  Code Option (8b)
  Slice ID (12b)
  Location of Next Hop information (20b - 32b)
    LD vs. FWD?
  Stats Index (16b)
    Should HF do this or QM?
The MR portion of HF must determine:
  Exception bits (16b)
Let’s put all of the above data in the Buf Desc
  LookupB/Copy will need to write it there based on what comes across from SideA in the shim
18
SPP V2 SideB SRAM Buffer Descriptor
LW0: Buffer_Next (32b)
LW1: Buffer_Size (16b) | Offset (16b)
LW2: Packet_Size (16b) | Free_list 0000 (4b) | Reserved (4b) | Ref_Cnt (8b)
LW3: Stats Index (16b) | Reserved (4b) | Slice ID (xsid) (12b)
LW4: NextHop Data Ptr (32b)
LW5: Reserved (16b) | MR Exception Bits (16b)
LW6: MR Bits (32b)
LW7: Packet_Next (32b)
Still working on this
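The eight-longword descriptor above maps naturally onto a fixed 32-byte record. The Python sketch below packs and unpacks it using the field names and widths from the slide. The byte order and the convention that fields are laid out left to right from the most significant bit of each longword are assumptions, and the slide itself marks the layout as still in progress.

```python
import struct
from dataclasses import dataclass

# One struct code per field, LW0 through LW7 (ASSUMED big-endian, MSB-first):
# I | HH | H B B | H H | I | H H | I | I
DESC_FMT = "!IHHHBBHHIHHII"


@dataclass
class BufDesc:
    buffer_next: int = 0        # LW0
    buffer_size: int = 0        # LW1 high half
    offset: int = 0             # LW1 low half
    packet_size: int = 0        # LW2 bits 31..16
    free_list: int = 0          # LW2, 4 bits (followed by 4 reserved bits)
    ref_cnt: int = 0            # LW2 low byte
    stats_index: int = 0        # LW3 high half
    slice_id: int = 0           # LW3 low 12 bits (top 4 bits reserved)
    next_hop_ptr: int = 0       # LW4
    mr_exception_bits: int = 0  # LW5 low half (high half reserved)
    mr_bits: int = 0            # LW6
    packet_next: int = 0        # LW7

    def pack(self) -> bytes:
        return struct.pack(
            DESC_FMT, self.buffer_next, self.buffer_size, self.offset,
            self.packet_size, (self.free_list & 0xF) << 4, self.ref_cnt,
            self.stats_index, self.slice_id & 0x0FFF, self.next_hop_ptr,
            0, self.mr_exception_bits, self.mr_bits, self.packet_next)

    @classmethod
    def unpack(cls, raw: bytes) -> "BufDesc":
        (bn, bs, off, ps, fl, rc, si, sid, nh,
         _reserved, exc, mr, pn) = struct.unpack(DESC_FMT, raw)
        return cls(bn, bs, off, ps, fl >> 4, rc, si, sid & 0x0FFF,
                   nh, exc, mr, pn)
```

A round trip such as `BufDesc.unpack(d.pack()) == d` is a quick check that no field overlaps another once the layout is finalized.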
19
SPP V2: Result
We need to be much more general in our support for Tunnels, Interfaces, MetaInterfaces, and Next Hops.
SideB Result:
  Interface:
    IP SAddr (32b)
    Eth MAC DAddr (48b) (LC, GPE1, GPE2, …, GPEn)
    SchedulerId (8b): which QM should handle pkt
  TxMI:
    IP Sport (16b)
  TxNextHop:
    IP DAddr (32b)
    IP DPort (16b)
20
Data Areas
Where are the tables and what data is transmitted from SideA to SideB?
  SideA Tables
  Shim between SideA and SideB
  SideB Tables
21
Pkt Processing Data and Tables
SideA:
  MR/Slice Table:
    Generated by Control
    Used by: Substrate Decap to retrieve a MR/Slice’s parameters
    Indexed by SliceId == VLAN
    Contains:
      Code option
      Slice Memory ptr
      Slice Memory size
      ???
  TCAM:
    LookupA
      Key:
      Result:
22
Data Areas
Shim between SideA and SideB
  Written to DRAM Buffer to be sent from SideA to SideB
  Contains:
    resultIndex (32b):
      Generated by Control
      Result of TCAM lookup on SideA
      Translates into an SRAM Address on SideB
    exceptionBits (16b)
      Generated by SideA Parse/Lookup
      Used by: SideB HF
    statsIndex (16b)
      SideA Lookup/AddShim to increment counters
      SideB Lookup/Copy to increment PreQ Cntrs (or perhaps SideA is the PreQ cntrs)
      SideB HF or QM to increment PostQ Cntrs
    sliceId (12b)
      Result of Decap read of Ethernet hdr (VLAN)
    ???
      codeOption (12b)
      Slice Memory Ptr (32b)
23
Data Areas
SideB
  Data Buffer Descriptor
  Hdr Buffer Descriptor
    Used for multi-copy packets
    SPP V2 may require Tx to handle multi-buffer packets. It is unclear if we can cleanly do the same thing that we do with ONL, where HF passes the Ethernet header to Tx.
    We may also need to have support for MR specific per copy data
  Results Table
    Generated by Control
    Used by:
      LookupB/Copy
      HF
        Should HF get its per copy info from here as well?
    Contains:
      Fanout (if fanout is > 1 we can overload some of the following fields with a pointer into a Fanout table)
      QID
      InterfaceId
      TxMI Id
        Probably doesn’t help to make it an index into a table for UDP Tunnels since UDP Port is 16 bits
        But for tunnels other than UDP tunnels it may help?
      TX NextHop Id
        Index into a table of Tunnel Next Hops
  Fanout Table
    QID[Fanout]
    Tx Next Hop ID[Fanout]
    Implementation Choices:
      One contiguous block of memory
        Fixed size or variable sized
      Chained with one set of values per entry
      Chained with N (N=4?) sets of values per entry
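As a rough illustration of the Results Table and Fanout Table relationship, the sketch below models the "one contiguous block of memory" choice: a results entry with fanout 1 carries its QID and Tx Next Hop Id directly, while an entry with fanout greater than 1 overloads the QID field with an index into the fanout table. All class, field, and method names are hypothetical, and InterfaceId and TxMI Id are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class ResultsEntry:
    fanout: int
    qid_or_ptr: int       # QID if fanout == 1, else index of the first fanout slot
    tx_nh_id: int = 0     # only meaningful when fanout == 1


class FanoutTable:
    """Contiguous array of (QID, Tx Next Hop Id) pairs, one per copy."""

    def __init__(self) -> None:
        self.slots = []

    def add_multicast(self, pairs: list) -> ResultsEntry:
        """Append a variable-sized block and return the entry pointing at it."""
        if len(pairs) < 2:
            raise ValueError("multicast entries need a fanout of at least 2")
        ptr = len(self.slots)
        self.slots.extend(pairs)
        return ResultsEntry(fanout=len(pairs), qid_or_ptr=ptr)

    def expand(self, entry: ResultsEntry) -> list:
        """Return (copy#, QID, Tx Next Hop Id) for every copy to enqueue."""
        if entry.fanout == 1:
            return [(0, entry.qid_or_ptr, entry.tx_nh_id)]
        block = self.slots[entry.qid_or_ptr:entry.qid_or_ptr + entry.fanout]
        return [(n, qid, nh) for n, (qid, nh) in enumerate(block)]
```

A chained layout would replace the slice in `expand` with a walk over linked entries, trading the single contiguous read for easier insertion and deletion of group members.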
24
NPE Version 2 Block Diagram
[Block diagram]
NPUA: RxA (2 ME); Decap, Parse, LookupA, AddShim (8 MEs); TxA (2 ME); StatsA (1 ME); FL MgrA (1 ME); Sram2NN (1 ME); SRAM
NPUB: RxB (2 ME); LookupB & Copy (2 ME); Queue Manager (4 MEs); HdrFmt (4 MEs); TxB (2 ME); StatsB (1 ME); FL MgrB (1 ME); Scr2NN (1 ME); SRAM
Also shown: TCAM, two SPI Switch blocks, Switch Blade, GPE
Open question marked on the diagram: flow control?
SHIM:
  resultIndex (32b)
  statsIndex (16b)
  MR Bits (32b)
  exceptions (16b)
  sliceId (16b)
25
NPE Version 2 Block Diagram
[Block diagram]
NPUA: RxA (2 ME); Decap, Parse, LookupA, AddShim (8 MEs); TxA (2 ME); Stats (1 ME); SRAM
NPUB: RxB (2 ME); LookupB & Copy (2 ME); Queue Manager (4 MEs); HdrFmt (4 MEs); TxB (2 ME); Stats (1 ME); SRAM
Also shown: TCAM, two SPI Switch blocks, Switch Blade, GPE
Open question marked on the diagram: flow control?
SHIM: resultIndex (32b), statsIndex (16b), exceptions (16b), sliceId (16b), MR Bits (32b)
Results Tbl: entry0, entry1, …, entryN
Results Entry: Fanout (8b), QID (20b), InterfaceId (8b), TxMI Id (16b), Tx NH Id (16b)
Interface Entry: IP SAddr (32b), Eth DA (48b), Sched ID (8b)
NH Entry: IP DAddr (32b), IP DPort (16b)
26
Extra Slides
The rest of the slides are old or for extra information
27
NPE Version 2 Block Diagram
[Block diagram]
NPUA: RxA (2 ME); Decap, Parse, LookupA, AddShim (8 MEs); TxA (2 ME); Stats (1 ME); SRAM
NPUB: RxB (2 ME); LookupB & Copy (2 ME); Queue Manager (4 MEs); HdrFmt (4 MEs); TxB (2 ME); Stats (1 ME); SRAM
Also shown: TCAM, two SPI Switch blocks, Switch Blade, GPE
Open question marked on the diagram: flow control?
SHIM: resultIndex (32b), statsIndex (16b), exceptions (16b), sliceId (16b), codeOpt (8b)
Results Tbl: entry0, entry1, …, entryN
Results Entry: Fanout (8b), QID (20b), InterfaceId (8b), TxMI Id (16b), Tx NH Id (16b)
28
SPP V1 Lookup Result
Stored in TCAM
Lookup Result (128b):
  TCAM Status Bits
  Port (4b)
  QID (20b)
  DA (8b)
  Tx IP DAddr (32b)
  Cntr Index (16b)
  D (1b)
  Reserved (11b)
  Tx UDP SPort (16b)
  Tx UDP DPort (16b)
29
ONL SRAM Buffer Descriptor
LW0: Buffer_Next (32b)
LW1: Buffer_Size (16b) | Offset (16b)
LW2: Packet_Size (16b) | Free_list 0000 (4b) | Reserved (4b) | Ref_Cnt (8b)
LW3: Stats Index (16b) | MAC DAddr_47_32 (16b)
LW4: MAC DAddr_31_00 (32b)
LW5: EtherType (16b) | Reserved (16b)
LW6: Reserved (32b)
LW7: Packet_Next (32b)
Still working on this
Ref_Cnt (8b): written by Rx, added to by Copy, decremented by Freelist Mgr
Field writers marked on the slide: Written by Freelist Mgr; Written by Rx; Written by Copy; Written by Rx and Plugins; Written by QM
30
Statistics
LC provides counts on UDP ports
Matching filter gives slice-specific stats-index, which is updated for each packet handled by filter
Pre-queue/post-queue counters for each QiD
Memory space/bandwidth issues
  off-chip SRAMs support 200M 32 bit reads & writes per sec
  can have 16M 80 byte packets/sec, so one SRAM supports 12 reads/12 writes per packet
  for 250 byte packets, 36 reads/36 writes per packet
  updating stats for QiD takes 4 reads/4 writes
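The SRAM budget figures above can be reproduced with a short calculation. The 200M operations/sec and 16M packets/sec inputs are taken from the slide; the function names are hypothetical. Scaling the packet rate linearly with packet size gives about 39 operations per 250-byte packet, so the slide's figure of 36 looks like a more conservative rounding (12 × 3); that reading is an assumption.

```python
SRAM_OPS_PER_SEC = 200e6      # 32-bit reads (and, separately, writes) per second
MIN_PKT_RATE = 16e6           # 80-byte packets per second, from the slide
QID_STATS_OPS = 4             # reads (and writes) to update stats for one QiD


def ops_per_packet(pkts_per_sec: float) -> float:
    """SRAM reads (or writes) available per packet at a given packet rate."""
    return SRAM_OPS_PER_SEC / pkts_per_sec


def scaled_rate(base_rate: float, base_bytes: int, pkt_bytes: int) -> float:
    """Packet rate at the same byte rate for a different packet size."""
    return base_rate * base_bytes / pkt_bytes


budget_80 = ops_per_packet(MIN_PKT_RATE)                          # 12.5
budget_250 = ops_per_packet(scaled_rate(MIN_PKT_RATE, 80, 250))   # 39.0625
stats_share_80 = QID_STATS_OPS / int(budget_80)                   # 4 of 12

print(f"80B packets:  {int(budget_80)} reads and {int(budget_80)} writes per packet")
print(f"250B packets: {int(budget_250)} reads and {int(budget_250)} writes per packet")
print(f"QiD stats use {stats_share_80:.0%} of one SRAM's budget at 80B")
```

At minimum-size packets, a single QiD counter update consumes a third of one SRAM channel's per-packet budget, which is why the slide flags statistics as a memory bandwidth issue.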
31
SPP Version 0
NPE: Rx (2ME), Substr Decap (1ME), Parse (1ME), Lookup, Header Format (1ME), QM (2ME), Tx (1ME)
LC Ingress and Egress:
  Ingress: Phy Int Rx, Key Extract, Lookup, Hdr Format, QM, Switch Tx
  Egress: Switch Rx, Key Extract, Lookup, Hdr Format, QM, Phy Int Tx
  SWITCH sits between the ingress and egress paths
Notes:
  Ideally no need to look at the actual code, if I’ve done my job right; this is a block design review, not a code review.
  Encountered some performance discrepancies, and the problems they illustrate are likely to come up again in the future.
  Would like to spend some time illustrating performance issues and demonstrating common code speed techniques.
32
NPE Version 2 Block Diagram
[Block diagram]
NPUA: RxA (2 ME); Decap, Parse, LookupA, AddShim (8 MEs); TxA (2 ME); Stats (1 ME); SRAM
NPUB: RxB (2 ME); LookupB & Copy (2 ME); Queue Manager (4 MEs); HdrFmt (4 MEs); TxB (2 ME); Stats (1 ME); SRAM
Also shown: TCAM, two SPI Switch blocks, Switch Blade, GPE
Open question marked on the diagram: flow control?
SHIM:
  resultIndex (32b)
  statsIndex (16b)
  MR Bits (32b)
  exceptions (16b)
  sliceId (16b)