SPP Version 1 Router Plans and Design


1 SPP Version 1 Router Plans and Design
John DeHart

2 SPP Versions
- SPP Version 0: What we used for SIGCOMM Paper
- SPP Version 1: Bare minimum we would need to release something to PlanetLab Users
- SPP Version 2: What we would REALLY like to release to PlanetLab users.

3 November HW Test
First test in HW should happen in Nov. 2007
Plan:
- Retry SPP V0 demo on Dev. Chassis with new boards
- Finish all three projects in Simulation:
  - LCI: Currently missing ICMP and NAT
  - LCE: Currently missing NAT
    - KE and Lookup are nearing completion
    - HF MAC Address lookup fine tuning
    - Flow stats: FS2 nearing completion, needs archive thread and testing, testing, testing
    - Initialization scripts need work
  - NPE: One more memory update needed for Substrate Encap (DRAM write)
    - Working on initialization scripts.
- Testing
  - Test all three in simulation including initialization scripts
  - Convert *.ind initialization scripts to ‘cmd’ utility HW initialization scripts
  - Review *.ind and ‘cmd’ scripts with Fred and control group
  - TCAM utility for SPP V1?
    - Plan A: Use Jonathon’s test utility
    - Plan B:
  - Packet generation for HW test? Use Traffic Generators?
  - Test all in HW

4 November HW Test Status
Status:
- TCAM Utilities for all three projects seem to be working
- LCI: Packets going through LCI and arriving at NPE
- NPE:
  - Project files were not configured correctly
  - Substrate Encap not properly configured/initialized for Chassis MAC addresses
  - Mike and JohnD will work this out today
- LCE: As soon as we get packets through the NPE we can test the LCE
Next Steps:
- NPE config/init (JDD and MLW)
- Orchid (MLW)
- V1 Testing (JDD)
  - This results in November HW Test milestone
- Orchid Integration (MLW and JDD)
- Performance testing (JDD and MLW)
- NAT (DMZ)
- ICMP pkt handling (JDD)
- Control Integration (JDD, FK, etc)
- V2

5 QM Scheduler Performance

6 QM Scheduler Performance

7 QM Scheduler Performance

8 QM Scheduler Performance

9 Notes
- For NPE, look at putting a limit on the number of outstanding buffers a Slice has at a time.
  - Add a counter to the Substrate Decap VLAN/Slice table.
  - When SD gets a packet, increment the counter for that Slice.
  - When a buffer is freed, have the generic buf_free code decrement the counter for that Slice.
  - This will probably require recording the Slice ID in the buffer descriptor and having the buf_free code read the descriptor.
- Look at using all 10 external interfaces on LC
  - Each interface that is used will be connected to different ports on the same router.
  - Thus the SPP node does not have to worry about participating in routing protocols in V1.
- Use both fabric interfaces on GPEs
  - V1 will use just one of them
  - The one that is used will be associated with 1 external interface.
  - The interfaces from different GPEs may or may not be associated with different external interfaces.
  - There may be cases where GPEs share an IP Address
  - There may be cases where GPEs have different IP Addresses
  - We need to support both cases
- Check on how we handle fragmentation
- Add Ring specs to block diagram
- Schedule for upcoming meetings:
  - 8/14: Charlie’s SIGCOMM talk and NAT
  - 8/21: Plugin Framework (Shakir)
  - 8/28: Flow Stats (JMM)
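The per-Slice outstanding-buffer limit in the first note can be sketched roughly as follows. This is only an illustration in Python, not the microengine code: the class name, the dict-based descriptor, and the choice to refuse a packet once the limit is reached are all assumptions, since the notes do not say what happens at the limit.

```python
class SliceBufferLimiter:
    """Tracks outstanding buffers per Slice and enforces a fixed cap (hypothetical helper)."""

    def __init__(self, max_outstanding):
        self.max_outstanding = max_outstanding
        self.outstanding = {}  # Slice ID -> buffers currently in flight

    def on_packet(self, slice_id):
        """Models Substrate Decap receiving a packet for a Slice.

        Returns a buffer descriptor (a dict here) that records the Slice ID,
        or None when the Slice is already at its limit (assumed policy).
        """
        count = self.outstanding.get(slice_id, 0)
        if count >= self.max_outstanding:
            return None
        self.outstanding[slice_id] = count + 1
        return {"slice_id": slice_id}

    def buf_free(self, descriptor):
        """Models the generic buf_free path: read the Slice ID back from the
        descriptor and decrement that Slice's counter."""
        slice_id = descriptor["slice_id"]
        self.outstanding[slice_id] -= 1
```

Recording the Slice ID in the descriptor is what lets `buf_free` stay generic: it never needs to know which block allocated the buffer.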

10 SPP V1 Plans
SPP Version 1:
- 1 5-Port NPE (still don’t use NPUB)
- Support Multiple External IP Addresses
- Switch Blade integration
- 10GE Tx module integration
- ARP: Probably not needed in V1
- NAT
- Flow Stats: Egress Traffic monitoring
- MR Code Options: Anything new?
- Control
  - Local Control
    - Booting NPU
    - Add/Remove Slices
  - MR Control
    - Add/Remove Routes
  - Node Manager
- GPE
  - Multiple GPEs
  - NAT
  - SSH Forwarding
  - PLC integration
Main focus today will be on the LC:
- Block/ME design
- Lookups
- Flow stats
- ARP

11 Cycle Budget (min eth packets)
To hit 5 Gb rate:
- 76B per min IPv4 packet (64 min Eth + 12B IFS)
- 1.4 GHz clock rate
- 5 Gb/sec * 1B/8b * packet/76B = 8.22 Mp/sec
- 1.4 Gcycle/sec * 1 sec/8.22 Mp = 170 cycles per packet (rounded down)
- compute budget: 170 cycles
- latency budget (threads*170): 8 threads: 1360 cycles
To hit 10 Gb rate:
- 10 Gb/sec * 1B/8b * packet/76B = 16.45 Mp/sec
- 1.4 Gcycle/sec * 1 sec/16.45 Mp = 85 cycles per packet (rounded down)
- compute budget: 85 cycles
- latency budget (threads*85): 8 threads: 680 cycles
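The budget arithmetic on this slide can be reproduced with a short Python sketch. The function name and its defaults (1.4 GHz clock, 8 threads) are illustrative choices taken from the slide's numbers; cycles are rounded down, which matches the 170 and 85 cycle figures.

```python
def cycle_budget(link_gbps, wire_bytes, clock_hz=1.4e9, threads=8):
    """Return (Mpkts/sec, compute cycles per packet, latency budget in cycles).

    wire_bytes is the per-packet cost on the wire, e.g. 64B minimum
    Ethernet frame + 12B IFS = 76B.
    """
    pkts_per_sec = link_gbps * 1e9 / 8 / wire_bytes
    compute_cycles = int(clock_hz / pkts_per_sec)  # round down
    latency_cycles = compute_cycles * threads      # one packet per thread in flight
    return round(pkts_per_sec / 1e6, 2), compute_cycles, latency_cycles


print(cycle_budget(5, 76))
print(cycle_budget(10, 76))
```

The latency budget is simply the compute budget multiplied by the number of threads, since each thread can hide memory latency while the others run.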

12 Cycle Budget (IPv4 MN packets)
To hit 5 Gb rate:
- 90B per min IPv4 packet (78 min IPv4MN + 12B IFS)
- 1.4 GHz clock rate
- 5 Gb/sec * 1B/8b * packet/90B = 6.94 Mp/sec
- 1.4 Gcycle/sec * 1 sec/6.94 Mp = 201 cycles per packet (rounded down)
- compute budget: 201 cycles
- latency budget (threads*201): 8 threads: 1608 cycles
- To support 6.94 M pkts/sec we can Read 28 Words and Write 28 Words per pkt per SRAM Bank (200M/6.94M = 28.8)
To hit 10 Gb rate:
- 10 Gb/sec * 1B/8b * packet/90B = 13.88 Mp/sec
- 1.4 Gcycle/sec * 1 sec/13.88 Mp = 100 cycles per packet (rounded down)
- compute budget: 100 cycles
- latency budget (threads*100): 8 threads: 800 cycles
- To support 13.88 M pkts/sec we can Read 14 Words and Write 14 Words per pkt per SRAM Bank (200M/13.88M = 14.4)
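The SRAM word budget follows from the same packet rate. The sketch below assumes the slide's 200M figure means 200 million word reads (and, separately, 200 million word writes) per second per SRAM bank; the function name is hypothetical.

```python
def sram_words_per_packet(link_gbps, wire_bytes, sram_ops_per_sec=200e6):
    """Words that can be read (and, separately, written) per packet per SRAM bank.

    sram_ops_per_sec is assumed to be the per-bank word rate in each
    direction, taken from the "200M" on the slide.
    """
    pkts_per_sec = link_gbps * 1e9 / 8 / wire_bytes
    return int(sram_ops_per_sec / pkts_per_sec)  # round down to whole words


print(sram_words_per_packet(5, 90))
print(sram_words_per_packet(10, 90))
```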

13 Cycle Budget (Average Pkts)
To hit 5 Gb rate:
- 218B per average packet (200 avg IPv4MN + 12B IFS)
- 1.4 GHz clock rate
- 5 Gb/sec * 1B/8b * packet/218B = 2.87 Mp/sec
- 1.4 Gcycle/sec * 1 sec/2.87 Mp = 487 cycles per packet (rounded down)
- compute budget: 487 cycles
- latency budget (threads*487): 8 threads: 3896 cycles
- To support 2.87 M pkts/sec we can Read 69 Words and Write 69 Words per pkt per SRAM Bank (200M/2.87M = 69.7)
To hit 10 Gb rate:
- 10 Gb/sec * 1B/8b * packet/218B = 5.74 Mp/sec
- 1.4 Gcycle/sec * 1 sec/5.74 Mp = 243 cycles per packet (rounded down)
- compute budget: 243 cycles
- latency budget (threads*243): 8 threads: 1944 cycles
- To support 5.74 M pkts/sec we can Read 34 Words and Write 34 Words per pkt per SRAM Bank (200M/5.74M = 34.8)

14 SPP V1 ARP Notes
- Statically configure the Ethernet Addr of next hop(s).
  - Don’t need ARP in V1.
- LC uses scheme similar to ONL
  - LCE
    - Lookup result contains Next Hop IP or NH Ethernet Addr.
    - If NH Ethernet Addr is present then update packet and send
    - If NH IP Addr present instead of NH Ethernet Addr then send to XScale
      - Need to define shim/descriptor for LCE to XScale: Physical Interface, NH IP Address
    - XScale will send ARP Broadcast on physical interface
  - LCI receives Unicast ARP Response from RTM Port
    - Sends to XScale indicating which physical interface recv’d on.
    - XScale updates filter table
    - If XScale has waiting packet, send to data path.
  - LCI receives ARP Broadcast from RTM port
    - XScale processes and sends ARP Response if needed.
  - ARP Entry Aging.

15 SPP V1 ARP Interfaces
- LCI to XScale Interface
  - LCI just needs to detect EtherType Field of ARP
  - Should be able to do this in Key Extract.
  - Code already there to detect ARP and send to XScale.
  - We may have to adjust for shim/descriptor to communicate additional info to XScale
- LCE to XScale Interface
  - Needs to be Post Lookup and Pre QM.
  - Needs to update the Shim/Descriptor to send info to the XScale.
  - Hdr Format is probably the best place for this.
- XScale to LCE Interface
  - Should be Queued to keep Port rate control sane.
  - Does it need to be a separate scratch ring or can it go directly into the QM input ring(s)?

16 SPP V1 NAT Notes
- We want to support existing PlanetLab applications As Is.
  - Users should not have to make code changes to get their applications to run on our GPEs.
- Multiple GPEs competing for TCP/UDP Port space and ICMP ID space on physical interfaces.
  - NAT needed for Port translation
  - NAT NOT needed for IP Address translation
- It would be good if we could:
  - Avoid Packet dropping while awaiting NAT resolution.
  - Maintain Packet Order.
- NAT translation to be done for:
  - TCP and UDP
    - Src Port for outgoing pkts (LC Egress)
    - Dst Port for incoming pkts (LC Ingress)
  - ICMP
    - Lookup needs to include an ICMP type field to differentiate between Echo Request and Echo Reply
    - ID for outgoing Echo Request (LC Egress)
    - ID for incoming Echo Reply (LC Ingress)
    - Other ICMP messages?
- No Application Level Gateways needed for our system:
  - Examples of things that need them in “normal” networks: FTP, SNMP, DNS

17 SPP V1 NAT Notes
Ingress Traffic:
- Destined for NPE
  - Preconfigured entries in Lookup table
  - Should be no need for NAT
- Destined for GPE
  - Slice on GPE registers that it is going to listen on a particular IP Addr, Protocol, Port
    - This will cause a preconfigured entry in Lookup Table
  - Result of Egress traffic from GPE
    - Traffic going through Egress initiated by a GPE causes XScale/Control to install filter(s) in Ingress.
    - More details in Egress discussion
- Destined for CP
  - Preconfigured
  - Do we want a preconfigured “default” for ICMP Echo Request?
  - What about ICMP errors (Destination Unreachable)?
- No default destination for Ingress Lookup Misses.
  - They are sent to the Ingress XScale which in conjunction with the Egress XScale sends an ICMP Error pkt out to the sender.

18 SPP V1 NAT Notes
Egress Traffic:
- From NPE
  - Preconfigured entries in Lookup table
  - Should be no need for NAT
- From GPE
  - Preconfigured entries in Lookup table: ???
  - Slice on GPE initiates a new flow
    - Examples:
      - Slice on GPE opens TCP connection to another node.
      - Slice on GPE pings another node
      - Slice on GPE initiates a UDP flow with a bind
      - Slice on GPE initiates a UDP flow without a bind
    - When the first packet of one of these types of flows arrives at the LC Egress we may not have a filter entry that matches it.
    - Anything that does not have a match gets sent to the XScale for resolution
    - This may cause drops and re-ordering in V1 but we’ll live with this for now and try to deal with it in V2.
    - In V2 we will look at the possibility of adding:
      - Support in GPE Kernel and/or libc to “catch” calls to bind, send, etc so we can configure entries in the LC for NAT.
      - Support in LC Data path for queuing packets awaiting NAT resolution.
      - Other solutions…
- From CP
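A rough model of the egress miss path described here: the first packet of a GPE-initiated flow misses, the XScale resolves it by choosing a translated port, and matching entries are installed for both directions. Everything in this sketch is an assumption for illustration (the class name, the `gpe` identifier used in the key, and the sequential port pool); the slides do not specify how the XScale allocates ports.

```python
class PortNat:
    """Toy model of NAT port translation with miss resolution (hypothetical)."""

    def __init__(self, first_port=40000, last_port=40999):
        self._free_ports = list(range(first_port, last_port + 1))
        self.egress = {}   # (gpe, proto, sport) -> translated sport
        self.ingress = {}  # (proto, translated dport) -> (gpe, original port)

    def egress_lookup(self, gpe, proto, sport):
        """Data-path lookup; None models a NAT miss sent to the XScale."""
        return self.egress.get((gpe, proto, sport))

    def resolve_miss(self, gpe, proto, sport):
        """Control-path resolution: install egress and ingress entries."""
        key = (gpe, proto, sport)
        if key in self.egress:
            return self.egress[key]
        if not self._free_ports:
            raise RuntimeError("translated port pool exhausted")
        translated = self._free_ports.pop(0)
        self.egress[key] = translated
        self.ingress[(proto, translated)] = (gpe, sport)
        return translated

    def ingress_lookup(self, proto, dport):
        """Reverse mapping for returning traffic; None models an ingress miss."""
        return self.ingress.get((proto, dport))
```

Two GPEs that both use source port 5000 end up with distinct translated ports, which is the port-space conflict the NAT is meant to resolve while leaving IP addresses untouched.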

19 SPP V1 NAT Notes
LC Ingress Lookup Key (72b):
- Interface (8b)
- IP DAddr (32b)
- Protocol (8b): TCP, UDP, ICMP, etc.
- DPort/Identifier (16b)
  - DPort for TCP and UDP
  - Identifier for ICMP Echo Request/Reply
- Type (8b)
  - Primarily for use with ICMP to distinguish between ICMP Echo Request and Reply
  - For TCP and UDP should be a Don’t Care.
LC Ingress Lookup Result (72b):
- VLAN (12b)
- Stats Index (16b)
- MAC Addr (8b)
- QID (20b)
  - QM_ID (2b)
  - Scheduler (3b)
  - QID (15b)
- Translated DPort/Identifier (16b)
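As a sanity check on the 72-bit ingress key, the fields can be packed into a single integer. The field order below (first listed field in the most significant bits) and the field names are assumptions for illustration; the slide lists widths but not bit positions. The mask helper shows the Type field being treated as Don't Care for non-ICMP protocols.

```python
import ipaddress

# (name, width in bits), assumed to be laid out most-significant first.
INGRESS_KEY_FIELDS = (
    ("interface", 8),
    ("ip_daddr", 32),
    ("protocol", 8),
    ("dport_or_id", 16),
    ("icmp_type", 8),
)
KEY_BITS = sum(width for _, width in INGRESS_KEY_FIELDS)  # 72


def pack_ingress_key(**values):
    """Pack the named fields into one integer, checking each field's range."""
    key = 0
    for name, width in INGRESS_KEY_FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        key = (key << width) | value
    return key


def ingress_key_mask(protocol):
    """TCAM-style mask: all bits compared for ICMP (protocol 1),
    Type byte ignored (Don't Care) for everything else."""
    full = (1 << KEY_BITS) - 1
    return full if protocol == 1 else full & ~0xFF


key = pack_ingress_key(
    interface=1,
    ip_daddr=int(ipaddress.IPv4Address("10.0.0.1")),
    protocol=17,        # UDP
    dport_or_id=53,
    icmp_type=0,
)
print(hex(key))
```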

20 SPP V1 NAT Notes
LC Egress Lookup Key (64b):
- IP SAddr (32b)
- Protocol (8b): TCP, UDP, ICMP, etc.
- SPort/Identifier (16b)
  - SPort for TCP and UDP
  - Identifier for ICMP Echo Request/Reply
- Type (8b)
  - Primarily for use with ICMP to distinguish between ICMP Echo Request and Reply
  - For TCP and UDP should be a Don’t Care.
LC Egress Lookup Result (64b):
- VLAN (12b)
- Stats Index (16b)
- QID (20b)
  - QM_ID (2b)
  - Scheduler (3b)
  - QID (15b)
- Translated SPort/Identifier (16b)

21 SPP V1 NAT Notes
ICMP Messages:
- Echo Request
- Echo Reply
- Errors
  - Ingress:
    - Contains the IP Hdr of original packet.
    - Presumably the original packet was sent by a GPE and hence should have an entry in the Egress Lookup table.
  - Egress:
    - Being sent out by GPE, NPE or CP.
    - Treat it like an Echo Request?
    - Translation of embedded IP hdr Ports?

22 ICMP – RFC 792
ICMP Message: IP Hdr (20B) | ICMP Hdr (4B+) | Data (Variable)
ICMP Hdr: Type | Code | Checksum | Optional Data
Purposes of ICMP (Protocol == 1):
- Error reporting from routers or destination host to source host.
  - ICMP data includes header and first 64 bits (8 bytes) of data from the IP packet that caused the error
  - Only fragment 0 of fragmented messages generates ICMP error messages
- Control messages between routers/hosts.

23 ICMP Echo
ICMP Message: Type = 0/8 | Code = 0 | Checksum | Identifier | Sequence Number | Optional Data
- Request: Type = 8
- Reply: Type = 0
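The Echo layout above maps directly onto an 8-byte header: Type (8b), Code (8b), Checksum (16b), Identifier (16b), Sequence Number (16b), all in network byte order. A minimal parsing sketch follows; the function name and returned dict are illustrative. The Identifier is the field the NAT notes propose translating.

```python
import struct

ICMP_ECHO_REPLY = 0
ICMP_ECHO_REQUEST = 8


def parse_icmp_echo(icmp_bytes):
    """Parse the first 8 bytes of an ICMP Echo Request/Reply message.

    Returns None for any other ICMP type or a too-short buffer.
    """
    if len(icmp_bytes) < 8:
        return None
    icmp_type, code, checksum, identifier, sequence = struct.unpack(
        "!BBHHH", icmp_bytes[:8]
    )
    if icmp_type not in (ICMP_ECHO_REPLY, ICMP_ECHO_REQUEST):
        return None
    return {
        "kind": "request" if icmp_type == ICMP_ECHO_REQUEST else "reply",
        "code": code,
        "checksum": checksum,
        "identifier": identifier,
        "sequence": sequence,
    }
```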

24 ICMP Message Types
Type  Code  Message
0           Echo Reply
3     -     Destination Unreachable (Error)
      0     Network Unreachable
      1     Host Unreachable
      2     Protocol Unreachable
      3     Port Unreachable
      4     Fragmentation needed and DF set
      5     Source route failed
      6     Destination network unknown
      7     Destination host unknown
      8     Source host isolated
      9     Communication with destination network administratively prohibited
      10    Communication with destination host administratively prohibited
      11    Network unreachable for type of service
      12    Host unreachable for type of service
4           Source Quench: report congestion to original host
5     -     Redirect: request host use different route
      0     Redirect for network (obsolete)
      1     Redirect for host
      2     Redirect for type-of-service and network
      3     Redirect for type-of-service and host
8           Echo Request
9           Router Advertisement
10          Router Solicitation
11    -     Time Exceeded for a Datagram
      0     Time-to-live equals 0 during transit (traceroute)
      1     Time-to-live equals 0 during reassembly (timeout occurred while waiting for fragments)
12    -     Parameter Problem: any other error condition (incorrect option)
      0     IP Header bad
      1     Required option missing
13          Timestamp Request
14          Timestamp Reply
15          Information Request (obsolete)
16          Information Reply (obsolete)
17          Address Mask Request
18          Address Mask Reply
From Comer, “Internetworking with TCP/IP”, volume 1, 4th edition, 2000.

25 SPP V1 LC Ingress(1x10Gb/s and 10x1Gb/s)
NAT pkt misses cause an ICMP Error pkt to be generated by the XScale, which is sent to the Egress side and put into the QM Input Rings there.
[Block diagram. Blocks: RTM, MSF, RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, QM0, QM1, Scr2NN, 1x10G Tx1, 1x10G Tx2, TBUF, MSF, SWITCH, Stats (1 ME), XScale with NAT Miss Scratch Ring. Blocks are linked by NN and SCR rings; memories shown: SRAM1, SRAM2, SRAM3.]

26 SPP V1 LC Egress with 1x10Gb/s Tx
[Block diagram. Blocks: SWITCH, MSF, RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, QM0, QM1, Flow Stats1, Flow Stats2, 1x10G Tx1, 1x10G Tx2, TBUF, MSF, RTM, Stats (1 ME), XScale with NAT Miss Scratch Ring, Invalidate FlowStat Entry Ring and NAT Pkt return ring. Blocks are linked by NN and SCR rings; memories shown: SRAM1, SRAM2, SRAM3.]

27 SPP V1 LC Egress with 10x1Gb/s Tx
[Block diagram. Blocks: SWITCH, MSF, RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, QM0, QM1, Flow Stats1, Flow Stats2, 5x1G Tx1 (P0-P4), 5x1G Tx2 (P5-P9), TBUF, MSF, RTM, Stats (1 ME), XScale with NAT Miss Scratch Ring, Invalidate FlowStat Entry Ring and NAT Pkt return ring. Blocks are linked by NN and SCR rings; memories shown: SRAM, SRAM1, SRAM2, SRAM3.]

28 Block Interfaces

29 Notes on Schedulers and Interfaces
- For V1, let’s make the leap and go to having 4 QMs.
  - This will give us 20 Schedulers
- For V2, we will hope to have 6 QMs
  - This will give us 30 Schedulers
- A lookup result will designate a scheduler but NOT an interface
  - Sched(5b)
    - QM_ID(2b)
      - Upper limit of 4 QM MEs supported
      - If we want more we should have QM_ID(3b), PerQMSched(3b), PerSchedQID(14b)
    - PerQMSched(3b)
      - Each QM currently only supports 5 schedulers.
  - PerSchedQID(15b)
- A scheduler (QM Dequeue) will be configured with an associated interface.
  - Dequeue reads its rate from SRAM periodically.
  - Rate is 16 bits, stored in a 32 bit SRAM word
  - We can use the other 16 bits to configure the associated physical interface.
    - Of course 16 bits is more than we need.
  - This will allow us to configure and re-configure the associated interface for each scheduler.
  - This will also allow us to configure the case where we use the Switch Blade and need all schedulers to send to interface 0.
  - Thus there should be nothing special that needs to be done by following blocks:
    - SCR2NN in LC_Ingress
    - FlowStats in LC_Egress
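The 20-bit QID split described on this slide can be illustrated with a pair of helpers. The bit positions are an assumption (QM_ID in the top 2 bits, then the 3-bit per-QM scheduler, then the 15-bit PerSchedQID), and the function names are hypothetical. The check on the scheduler number reflects the note that each QM currently supports only 5 schedulers, giving 4 * 5 = 20 schedulers in V1.

```python
SCHEDULERS_PER_QM = 5   # from the slide: each QM currently supports 5 schedulers
NUM_QMS_V1 = 4


def make_qid(qm_id, sched, per_sched_qid):
    """Build a 20-bit QID as QM_ID(2b) | Sched(3b) | PerSchedQID(15b) (assumed order)."""
    if not 0 <= qm_id < NUM_QMS_V1:
        raise ValueError("QM_ID must fit in 2 bits")
    if not 0 <= sched < SCHEDULERS_PER_QM:
        raise ValueError("each QM supports only 5 schedulers")
    if not 0 <= per_sched_qid < (1 << 15):
        raise ValueError("PerSchedQID must fit in 15 bits")
    return (qm_id << 18) | (sched << 15) | per_sched_qid


def split_qid(qid):
    """Return (qm_id, sched, per_sched_qid) from a 20-bit QID."""
    if not 0 <= qid < (1 << 20):
        raise ValueError("QID must fit in 20 bits")
    return (qid >> 18) & 0x3, (qid >> 15) & 0x7, qid & 0x7FFF


print(hex(make_qid(3, 4, 0x7FFF)))
print(NUM_QMS_V1 * SCHEDULERS_PER_QM)
```

Because the QM uses the full 20 bits as its queue identifier, blocks between Lookup and the QM can pass the value through untouched and only peek at the top bits for routing.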

30 Notes on Schedulers and Interfaces
- Decoupling the scheduler and interface has implications for Header Format in each of the three projects
  - LCI: needs to know the Dst MAC Address for frame (i.e. what board it is going to)
  - NPE: needs to know what Src IP Addr to put on outgoing Tunnel Pkt.
  - LCE: needs to know what Src and Dst MAC to put on outgoing Ethernet Frame
- For LCI and LCE the key is to provide enough schedulers so we can handle the load
- For V1, the schedulers should be configured at boot time
  - Then we can also configure the HFs at boot time so they know which interface a scheduler is associated with.
  - Schedulers will not be dynamically changed from one interface to another in V1
- For V2, we should move the NPE and LCE HFs to be after the QMs
  - We already planned to do this for the NPE, might as well do it for LCE also.
  - LCI gets no help from moving HF after QM
    - LCI remains statically configured.
  - All the information that the HFs will need to re-write the frame and pkt headers will have to be written to the buffer descriptor.
  - The schedulers in the QMs will output the interface that the frame is destined for so the HF will have that information provided to it.
  - Then we can be more dynamic with the schedulers.

31 SPP V1 LC Ingress(1x10Gb/s and 10x1Gb/s)
[Block diagram. Blocks: RTM, MSF, RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, QM0, QM1, QM2, QM3, Scr2NN, 1x10G Tx1, 1x10G Tx2, TBUF, MSF, SWITCH, Stats (1 ME), XScale with NAT Miss Scratch Ring. Blocks are linked by NN and SCR rings; memories shown: SRAM1, SRAM2, SRAM3.]

32 SPP V1 LC Ingress(1x10Gb/s and 10x1Gb/s)
[Block diagram of the LC Ingress pipeline: RTM, MSF, RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, QM0-QM3, Scr2NN, 1x10G Tx1, 1x10G Tx2, TBUF, MSF, SWITCH, Stats (1 ME), XScale with NAT Miss Scratch Ring; NN and SCR rings; SRAM1-SRAM3.]
Annotated ring entry fields: Buf Handle (24b), Intf (4b), Reserved (12b), Eth. Frame Len (16b), Rx Flags (8b)

33 Notes on Frame vs. Pkt Lengths
- RX reports Ethernet Frame Length
- KE passes along IP Pkt length and Ethernet Hdr Length
- HF uses Ethernet Hdr Length and Buffer Offset to find start of IP Pkt so it can put on new ethernet header.
  - HF passes along Ethernet Frame Length
- TX needs Ethernet Frame Length which it gets from buffer descriptor Buffer Size
- QM Dequeue gets length from buffer descriptor
  - Thus it will get Ethernet Frame Length just like TX
- QM Enqueue gets a length from input ring which must agree with what QM Dequeue gets from buffer descriptor.
- Thus:
  - HF must pass Ethernet Frame length in output ring AND it must write it to buffer descriptor.
  - QM Link rates should include IFS, etc.

34 SPP V1 LC Ingress(1x10Gb/s and 10x1Gb/s)
[Block diagram of the LC Ingress pipeline: RTM, MSF, RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, QM0-QM3, Scr2NN, 1x10G Tx1, 1x10G Tx2, TBUF, MSF, SWITCH, Stats (1 ME), XScale with NAT Miss Scratch Ring; NN and SCR rings; SRAM1-SRAM3.]
Annotated ring formats:
- Reserved (8b), Buf Handle (24b) | IP Pkt Length (16b), Eth Hdr Len (8b), Rsv (4b), Intf (4b) | Lookup Key: IP DAddr (32b) | Protocol (8b), UDP DPort (16b), Type (8b) | IP Hdr 1st Word (32b) | IP Hdr 2nd Word (32b)
- Buf Handle (24b), Intf (4b), Reserved (12b), Eth. Frame Len (16b), Rx Flags (8b)

35 Changes for Key Extract
- Lookup Key Changes:
  - Old Lookup Key (64b): SL Type (4b), Port (4b), IP DAddr (32b), IP Proto (8b), UDP DPort (16b)
  - New Lookup Key (72b), changed fields: Reserved (4b), Interface (4b), ICMP Type (8b)
- Move Ethernet Hdr Length field

36 SPP V1 LC Ingress(1x10Gb/s and 10x1Gb/s)
[Block diagram of the LC Ingress pipeline: RTM, MSF, RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, QM0-QM3, Scr2NN, 1x10G Tx1, 1x10G Tx2, TBUF, MSF, SWITCH, Stats (1 ME), XScale with NAT Miss Scratch Ring; NN and SCR rings; SRAM1-SRAM3.]
Annotated ring formats:
- Flags (8b, includes H), Buf Handle (24b) | IP Pkt Length (16b), Eth Hdr Len (8b), Reserved (8b) | Lookup Result: rsv (4b), VLAN (12b), Stats Index (16b) | Translated DPort/ID (16b), PerSchedQID (11b), Sch (3b), QM (2b) | IP Hdr 1st Word (32b) | IP Hdr 2nd Word (32b)
- Reserved (8b), Buf Handle (24b) | IP Pkt Length (16b), Eth Hdr Len (8b), Rsv (4b), Intf (4b) | Lookup Key: IP DAddr (32b) | Protocol (8b), UDP DPort (16b), Type (8b) | IP Hdr 1st Word (32b) | IP Hdr 2nd Word (32b)

37 Changes for Lookup
- Lookup Key Changes: See KE notes
- Lookup Result Changes:
  - Add Translated DPort/ID
  - Move MAC DAddr
  - Move Vlan
  - Remove Port field
  - QID(20b) split into QM_ID (2b), Sched (3b), PerSchedQID (15b)
    - The QM uses the full 20 bits as its QID.
  - Change in size of lookup result
- Move Eth Hdr Len.
- Add Flags (H=HIT)

38 SPP V1 LC Ingress(1x10Gb/s and 10x1Gb/s)
[Block diagram of the LC Ingress pipeline, NAT MISS case: RTM, MSF, RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, QM0-QM3, Scr2NN, 1x10G Tx1, 1x10G Tx2, TBUF, MSF, SWITCH, Stats (1 ME), XScale with NAT Miss Scratch Ring; NN and SCR rings; SRAM1-SRAM3.]
Annotated ring formats:
- NAT Miss Scratch Ring: Reserved (8b), Buf Handle (24b) | IP Pkt Length (16b), Eth Hdr Len (8b), Reserved (8b)
- Flags (8b), Buf Handle (24b) | IP Pkt Length (16b), Eth Hdr Len (8b), Reserved (8b) | rsv (4b), VLAN (12b), Stats Index (16b) | Translated DPort/ID (16b), PerSchedQID (11b), Sch (3b), QM (2b) | IP Hdr 1st Word (32b) | IP Hdr 2nd Word (32b)

39 SPP V1 LC Ingress(1x10Gb/s and 10x1Gb/s)
[Block diagram of the LC Ingress pipeline, NAT HIT case: RTM, MSF, RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, QM0-QM3, Scr2NN, 1x10G Tx1, 1x10G Tx2, TBUF, MSF, SWITCH, Stats (1 ME), XScale with NAT Miss Scratch Ring; NN and SCR rings; SRAM1-SRAM3.]
Annotated ring formats:
- Flags (8b), Buf Handle (24b) | IP Pkt Length (16b), Eth Hdr Len (8b), Reserved (8b) | rsv (4b), VLAN (12b), PerSchedQID (11b), Sch (3b), QM (2b) | Translated DPort/ID (16b), Stats Index (16b) | IP Hdr 1st Word (32b) | IP Hdr 2nd Word (32b)
- Reserved (8b), Buffer Handle (24b) | Reserved (16b), PerSchedQID (11b), Sch (3b), QM (2b) | Frame Length (16b), Stats Index (16b)

40 Changes for Header Format
- Lookup Result Changes: See Lookup notes
- Process NAT Miss
  - If (H==0) then NAT Miss
  - Send to XScale
  - Ingress/Egress XScales will generate ICMP Error msg.
- Move Eth Hdr Len in input ring
- No Interface field to pass along
- Perform DPort/ID translation
  - Recalculate IP Hdr checksum.
  - Calculate incremental Transport (TCP and UDP) checksum
    - Check for arriving UDP checksum of 0 which implies that packet is not using the optional UDP checksum
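The incremental transport checksum step can be sketched with the standard one's-complement update from RFC 1624 (HC' = ~(~HC + ~m + m')), where m is the old 16-bit field and m' the new one. This is a Python illustration only; the function names are hypothetical and the real Header Format block would do this in microengine code. The UDP wrapper reflects the slide's check for an arriving checksum of 0, and additionally maps a computed 0 to 0xFFFF as UDP requires.

```python
def ones_complement_add(a, b):
    """16-bit one's-complement addition with end-around carry."""
    total = a + b
    return (total & 0xFFFF) + (total >> 16)


def incremental_checksum(old_csum, old_field, new_field):
    """Update a 16-bit Internet checksum after one 16-bit field changes
    (RFC 1624, equation 3)."""
    total = ones_complement_add(~old_csum & 0xFFFF, ~old_field & 0xFFFF)
    total = ones_complement_add(total, new_field & 0xFFFF)
    return ~total & 0xFFFF


def translate_udp_port_checksum(old_csum, old_port, new_port):
    """Checksum to write after translating a UDP port.

    An arriving checksum of 0 means the sender did not use the optional
    UDP checksum, so it is left as 0.
    """
    if old_csum == 0:
        return 0
    new_csum = incremental_checksum(old_csum, old_port, new_port)
    return new_csum if new_csum != 0 else 0xFFFF
```

The same `incremental_checksum` helper applies to a TCP port or an ICMP Identifier, since only a single 16-bit field changes in each case.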

41 SPP V1 LC Ingress(1x10Gb/s and 10x1Gb/s)
[Block diagram of the LC Ingress pipeline: RTM, MSF, RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, QM0-QM3, Scr2NN, 1x10G Tx1, 1x10G Tx2, TBUF, MSF, SWITCH, Stats (1 ME), XScale with NAT Miss Scratch Ring; NN and SCR rings; SRAM1-SRAM3.]
Annotated ring format (shown on two rings): Reserved (8b), Buffer Handle (24b) | Reserved (16b), PerSchedQID (11b), Sch (3b), QM (2b) | Frame Length (16b), Stats Index (16b)

42 Changes for Port Splitter
- No Interface field in input ring
- QM(2b) determines which QM Scr ring to use
- QM will use the Sch(3b) to determine which scheduler to use.
  - Port Splitter does not have to give a separate field anymore.
- The QM(2b), Sch(3b), QID(15b) should be left unchanged in the low 20 bits
  - This will be used by the QM as its full 20-bit QID.

43 SPP V1 LC Ingress(1x10Gb/s and 10x1Gb/s)
[Block diagram of the LC Ingress pipeline: RTM, MSF, RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, QM0-QM3, Scr2NN, 1x10G Tx1, 1x10G Tx2, TBUF, MSF, SWITCH, Stats (1 ME), XScale with NAT Miss Scratch Ring; NN and SCR rings; SRAM1-SRAM3.]
Annotated ring formats:
- Reserved (8b), Buffer Handle (24b) | Reserved (16b), PerSchedQID (11b), Sch (3b), QM (2b) | Frame Length (16b), Stats Index (16b)
- Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)

44 Changes for QM
- QM will extract the Sch(3b) to identify the scheduler (called the port_id in the code) instead of getting a separate ‘port’ field.
- Association of a Scheduler with a physical interface:
  - Dequeue currently reads the interface rate from SRAM periodically.
  - We could extend this to have it also read the interface it is associated with at the same time it is reading the rate.
  - This would also work for the LC_Ingress and NPE where we need to change the interface to 0 before sending it to TX.
  - This could be accomplished by setting the interface read by Dequeue to 0 and then all the Dequeue engines (schedulers) would send to Interface 0.
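One way to picture the scheduler-to-interface association: the earlier scheduler notes say the rate is 16 bits stored in a 32-bit SRAM word, leaving 16 bits free for the interface. The sketch below assumes the rate sits in the low half and the interface in the high half; that placement and the function names are illustrative only.

```python
def pack_sched_config(rate, interface):
    """Pack a 16-bit rate and a 16-bit interface number into one 32-bit word
    (assumed layout: interface in bits 31:16, rate in bits 15:0)."""
    if not 0 <= rate <= 0xFFFF or not 0 <= interface <= 0xFFFF:
        raise ValueError("rate and interface must each fit in 16 bits")
    return (interface << 16) | rate


def unpack_sched_config(word):
    """Return (rate, interface), as Dequeue would see them after a single SRAM read."""
    return word & 0xFFFF, (word >> 16) & 0xFFFF


# Switch Blade / LC_Ingress / NPE case: every scheduler is pointed at interface 0.
word = pack_sched_config(rate=1000, interface=0)
print(unpack_sched_config(word))
```

Reading both values in one word means a reconfiguration of the interface is picked up on the same periodic read that already refreshes the rate.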

45 SPP V1 LC Ingress(1x10Gb/s and 10x1Gb/s)
[Block diagram of the LC Ingress pipeline: RTM, MSF, RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, QM0-QM3, Scr2NN, 1x10G Tx1, 1x10G Tx2, TBUF, MSF, SWITCH, Stats (1 ME), XScale with NAT Miss Scratch Ring; NN and SCR rings; SRAM1-SRAM3.]
Annotated ring format (shown on two rings): Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)

46 Changes for Scr2NN
- New Block based on Dave’s “Port_Concentrator”
- QM now takes care of giving appropriate value for the Interface field

47 SPP V1 LC Ingress(1x10Gb/s and 10x1Gb/s)
[Block diagram of the LC Ingress pipeline: RTM, MSF, RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, QM0-QM3, Scr2NN, 1x10G Tx1, 1x10G Tx2, TBUF, MSF, SWITCH, Stats (1 ME), XScale with NAT Miss Scratch Ring; NN and SCR rings; SRAM1-SRAM3.]
Annotated ring format: Stats Index (16b), Opcode (4b), Data (12b)

48 Changes for Stats
- Do we want to incorporate the improvements we made for the ONL Stats block?

49 SPP V1 LC Egress with 1x10Gb/s Tx
[Block diagram. Blocks: SWITCH, MSF, RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, QM0, QM1, QM2, QM3, Flow Stats1, Flow Stats2, 1x10G Tx1, 1x10G Tx2, TBUF, MSF, RTM, Stats (1 ME), XScale with NAT Miss Scratch Ring and NAT Pkt return ring. Blocks are linked by NN and SCR rings; memories shown: SRAM1, SRAM2, SRAM3, plus SRAM holding the Freelist and the Archive Records.]

50 SPP V1 LC Egress with 10x1Gb/s Tx
[Block diagram. Blocks: SWITCH, MSF, RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, QM0, QM1, QM2, QM3, Flow Stats1, Flow Stats2, 5x1G Tx1 (P0-P4), 5x1G Tx2 (P5-P9), TBUF, MSF, RTM, Stats (1 ME), XScale with NAT Miss Scratch Ring and NAT Pkt return ring. Blocks are linked by NN and SCR rings; memories shown: SRAM1, SRAM2, SRAM3, plus SRAM holding the Freelist and the Archive Records.]

51 SPP V1 LC Egress with 10x1Gb/s Tx
[Block diagram of the LC Egress pipeline: SWITCH, MSF, RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, QM0-QM3, Flow Stats1, Flow Stats2, 5x1G Tx1 (P0-P4), 5x1G Tx2 (P5-P9), TBUF, MSF, RTM, Stats (1 ME), XScale with NAT Miss Scratch Ring and NAT Pkt return ring; NN and SCR rings; SRAM1-SRAM3, SRAM Freelist, Archive Records.]
Annotated ring entry fields: Buf Handle (24b), Port (4b), Reserved (12b), Eth. Frame Len (16b), Rx Flags (8b)

52 SPP V1 LC Egress with 10x1Gb/s Tx
[Block diagram of the LC Egress pipeline: SWITCH, MSF, RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, QM0-QM3, Flow Stats1, Flow Stats2, 5x1G Tx1 (P0-P4), 5x1G Tx2 (P5-P9), TBUF, MSF, RTM, Stats (1 ME), XScale with NAT Miss Scratch Ring and NAT Pkt return ring; NN and SCR rings; SRAM1-SRAM3, SRAM Freelist, Archive Records.]
Annotated ring formats:
- Buf Handle (24b), Port (4b), Reserved (12b), Eth. Frame Len (16b), Rx Flags (8b)
- Reserved, Buf Handle (24b), IP Pkt Length (16b), Eth Hdr Len (8b), SrcMAC (8b), Lookup Key: VLAN/IP_SAddr (32b), IP Proto (8b), UDP SPort (16b), Type (8b); IP DAddr (32b), IP Hdr 1st Word (32b), IP Hdr 2nd Word (32b)

53 Changes for Key Extract
- Input: No changes
- Output:
  - Move Eth Hdr Len field
  - Add Type field to Lookup Key
    - If ICMP, extract TYPE field from ICMP pkt
    - Otherwise, set to 0.
  - Add Src MAC to Lookup Key
    - Extract low 8 bits of Src MAC from ethernet header
  - Add VLAN/IP_SAddr to Lookup Key
    - If low 6 bits of Src MAC are all 1’s then Src MAC is from NPE
      - Use VLAN in the VLAN/IP_SAddr field
      - VLAN goes in lower 12 bits, upper 20 bits are all 0’s
    - If low 6 bits of Src MAC are NOT all 1’s then it is from GPE
      - Use IP_SAddr in the VLAN/IP_SAddr field
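The NPE-versus-GPE selection rule for the VLAN/IP_SAddr field can be written out as a small function. This is a sketch in Python with a hypothetical function name and argument types (a 6-byte source MAC, an integer VLAN, and an integer IPv4 source address); it follows the rule exactly as stated on the slide.

```python
def egress_key_source_fields(src_mac, vlan, ip_saddr):
    """Return (src_mac_low8, vlan_or_saddr) for the LC Egress lookup key.

    src_mac  : 6-byte source MAC address from the Ethernet header
    vlan     : VLAN ID carried by the frame (used for NPE traffic)
    ip_saddr : IPv4 source address as a 32-bit integer (used for GPE traffic)
    """
    src_mac_low8 = src_mac[-1]              # low 8 bits of the Src MAC
    if src_mac_low8 & 0x3F == 0x3F:         # low 6 bits all 1's: from NPE
        return src_mac_low8, vlan & 0xFFF   # VLAN in low 12 bits, upper 20 bits zero
    return src_mac_low8, ip_saddr & 0xFFFFFFFF  # otherwise: from GPE


npe_mac = bytes([0x00, 0x0E, 0x0C, 0x00, 0x00, 0x3F])
gpe_mac = bytes([0x00, 0x0E, 0x0C, 0x00, 0x00, 0x02])
print(egress_key_source_fields(npe_mac, vlan=0x123, ip_saddr=0x0A000001))
print(egress_key_source_fields(gpe_mac, vlan=0x123, ip_saddr=0x0A000001))
```

The MAC addresses in the usage lines are made-up values chosen only to exercise both branches.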

54 SPP V1 LC Egress with 10x1Gb/s Tx
[Block diagram of the LC Egress pipeline: SWITCH, MSF, RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, QM0-QM3, Flow Stats1, Flow Stats2, 5x1G Tx1 (P0-P4), 5x1G Tx2 (P5-P9), TBUF, MSF, RTM, Stats (1 ME), XScale with NAT Miss Scratch Ring and NAT Pkt return ring; NN and SCR rings; SRAM1-SRAM3, SRAM Freelist, Archive Records.]
Annotated ring formats:
- Flags (8b, includes H), Buf Handle (24b) | IP Pkt Length (16b), Eth Hdr Len (8b), Reserved (8b) | Lookup Result: VLAN (12b), rsv (4b), PerSchedQID (11b), Sch (3b), QM (2b) | Translated SPort (16b), Stats Index (16b) | IP DAddr (32b) | IP Hdr 1st Word (32b) | IP Hdr 2nd Word (32b)
- Reserved (8b), Buf Handle (24b) | IP Pkt Length (16b), Eth Hdr Len (8b), SrcMAC (8b) | VLAN/IP_SAddr (32b) | IP Proto (8b), UDP SPort (16b), Type (8b) | IP DAddr (32b) | IP Hdr 1st Word (32b) | IP Hdr 2nd Word (32b)

55 Changes for Lookup
- Input:
  - Move Eth Hdr Len field
  - Add Type field to Lookup Key
  - Add Src MAC to Lookup Key
  - Add IP SAddr to Lookup Key
- Output:
  - Re-organize output
  - Move IP DAddr to 5th word (Do we still need this?)
  - Move Eth Hdr Len
  - Add Flags in 1st word
  - Add Translated SPort to Lookup Result

56 SPP V1 LC Egress with 10x1Gb/s Tx
[Block diagram of the LC Egress pipeline, NAT MISS case: SWITCH, MSF, RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, QM0-QM3, Flow Stats1, Flow Stats2, 5x1G Tx1 (P0-P4), 5x1G Tx2 (P5-P9), TBUF, MSF, RTM, Stats (1 ME), XScale with NAT Miss Scratch Ring and NAT Pkt return ring; NN and SCR rings; SRAM1-SRAM3, SRAM Freelist, Archive Records.]
Annotated ring formats:
- NAT Miss Scratch Ring: Buf Handle (24b), IP Pkt Length (16b), Reserved (8b), Eth Hdr Len (8b)
- Flags (8b), Buf Handle (24b) | IP Pkt Length (16b), Eth Hdr Len (8b), Reserved (8b) | VLAN (12b), rsv (4b), PerSchedQID (11b), Sch (3b), QM (2b) | Translated SPort (16b), Stats Index (16b) | IP DAddr (32b) | IP Hdr 1st Word (32b) | IP Hdr 2nd Word (32b)

57 SPP V1 LC Egress with 10x1Gb/s Tx
[Block diagram of the LC Egress pipeline, NAT HIT case: SWITCH, MSF, RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, QM0-QM3, Flow Stats1, Flow Stats2, 5x1G Tx1 (P0-P4), 5x1G Tx2 (P5-P9), TBUF, MSF, RTM, Stats (1 ME), XScale with NAT Miss Scratch Ring and NAT Pkt return ring; NN and SCR rings; SRAM1-SRAM3, SRAM Freelist, Archive Records.]
Annotated ring formats:
- Flags (8b), Buf Handle (24b) | IP Pkt Length (16b), Eth Hdr Len (8b), Reserved (8b) | VLAN (12b), rsv (4b), PerSchedQID (11b), Sch (3b), QM (2b) | Translated SPort (16b), Stats Index (16b) | IP DAddr (32b) | IP Hdr 1st Word (32b) | IP Hdr 2nd Word (32b)
- Reserved (8b), Buffer Handle (24b) | Reserved (16b), PerSchedQID (11b), Sch (3b), QM (2b) | Ethernet Frame Length (16b), Cntr Index (16b)

58 Changes for Header Format
- Input:
  - Re-organize input
  - 72 bit lookup result
  - Move IP DAddr to 5th word
  - Move Eth Hdr Len
  - Add Flags in 1st word
  - Add Translated SPort to Lookup Result
- Output to PortSplitter/QM: No changes
- Output to XScale:
- New Function:
  - Write Buffer descriptor including: Packet Size, Buffer Size, Freelist, Offset, SliceID (VLAN), Stats Index
    - Should we also write the ethernet header length?
  - Test HIT Flag to determine if NAT Hit or Miss
    - Send Miss to XScale
    - Send Hit to PortSplitter/QM
  - Perform SPort/ID translation
    - Recalculate IP Hdr checksum.
    - Calculate incremental Transport (TCP and UDP) checksum
      - Check for arriving UDP checksum of 0 which implies that packet is not using the optional UDP checksum

59 Egress Buffer Descriptor
LW0: Buffer_Next (32b)
LW1: Buffer_Size (16b) | Offset (16b)
LW2: Packet_Size (16b) | Free_list 0000 (4b) | Reserved (4b) | Reserved (8b)
LW3: Reserved (4b) | SliceID (VLAN) (12b) | Stats Index (16b)
LW4: Reserved (16b) | Reserved (8b) | Reserved (4b) | Reserved (4b)
LW5: Reserved (4b) | Reserved (4b) | Reserved (32b)
LW6: Reserved (16b) | Reserved (16b)
LW7: Packet_Next (32b)

60 SPP V1 LC Egress with 10x1Gb/s Tx
[Block diagram of the LC Egress pipeline: SWITCH, MSF, RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, QM0-QM3, Flow Stats1, Flow Stats2, 5x1G Tx1 (P0-P4), 5x1G Tx2 (P5-P9), TBUF, MSF, RTM, Stats (1 ME), XScale with NAT Miss Scratch Ring and NAT Pkt return ring; NN and SCR rings; SRAM1-SRAM3, SRAM Freelist, Archive Records.]
Annotated ring format (shown on two rings): Reserved (8b), Buffer Handle (24b) | Reserved (16b), PerSchedQID (11b), Sch (3b), QM (2b) | Ethernet Frame Length (16b), Cntr Index (16b)

61 Changes for Port Splitter
- No Interface field in input ring
- QM(2b) determines which QM Scr ring to use
- Sch(3b) needs to be copied up to the low 3 bits of the top byte to comply with QM’s current input format.
  - We will look into removing this requirement and see if it is easy to have the QM extract the scheduler bits itself
- The QM(2b), Sch(3b), QID(15b) should also be left unchanged in the low 20 bits
  - This will be used by the QM as its full 20-bit QID.
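The Port Splitter steps listed here can be illustrated on a single 32-bit ring word. The sketch assumes the 20-bit QID occupies the low bits as QM(2b) | Sch(3b) | QID(15b), and that "top byte" means bits 31:24 of the same word; both are assumptions for illustration, as is the function name.

```python
def port_splitter(qid_word):
    """Return (qm_ring_index, output_word) for one 32-bit input word.

    - QM(2b), bits 19:18, selects which QM scratch ring receives the entry.
    - Sch(3b), bits 17:15, is copied into the low 3 bits of the top byte
      (bits 26:24) to match the QM's current input format.
    - The low 20 bits are passed through unchanged as the QM's full QID.
    """
    qm = (qid_word >> 18) & 0x3
    sch = (qid_word >> 15) & 0x7
    cleared = qid_word & ~(0x7 << 24) & 0xFFFFFFFF
    return qm, cleared | (sch << 24)


ring, word = port_splitter(0x000E7FFF)
print(ring, hex(word))
```

If the QM later extracts the scheduler bits itself, as the slide suggests investigating, the copy into the top byte could be dropped and the word passed through untouched.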

62 SPP V1 LC Egress with 10x1Gb/s Tx
[Block diagram of the LC Egress pipeline: SWITCH, MSF, RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, QM0-QM3, Flow Stats1, Flow Stats2, 5x1G Tx1 (P0-P4), 5x1G Tx2 (P5-P9), TBUF, MSF, RTM, Stats (1 ME), XScale with NAT Miss Scratch Ring and NAT Pkt return ring; NN and SCR rings; SRAM1-SRAM3, SRAM Freelist, Archive Records.]
Annotated ring formats:
- Reserved (8b), Buffer Handle (24b) | Reserved (16b), PerSchedQID (11b), Sch (3b), QM (2b) | Ethernet Frame Length (16b), Cntr Index (16b)
- Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)

63 Changes for QM
QM will extract Sch (3b) to identify the scheduler (called the port_id in the code) instead of getting a separate ‘port’ field.
Association of a scheduler with a physical interface: Dequeue currently reads the interface rate from SRAM periodically. We could extend this so that it also reads the associated interface at the same time it reads the rate.
This would also work for the LC_Ingress and NPE, where we need to change the interface to 0 before sending to Tx. Setting the interface read by Dequeue to 0 would make all the Dequeue engines (schedulers) send to Interface 0.
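A minimal sketch of the proposed scheduler-to-interface association, assuming a per-scheduler record in SRAM that holds both the rate and the interface. The names `SchedulerParams` and `Dequeue`, and the use of a dictionary to stand in for SRAM, are illustrative assumptions only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchedulerParams:
    rate: int        # interface rate, already read periodically by Dequeue
    interface: int   # proposed extra field: interface this scheduler sends to


class Dequeue:
    """Caches per-scheduler parameters and refreshes them from 'SRAM'."""

    def __init__(self, sram: dict[int, SchedulerParams]):
        self._sram = sram             # stands in for the SRAM parameter table
        self._cached = dict(sram)

    def refresh(self) -> None:
        # Models the periodic SRAM read: rate and interface arrive together.
        self._cached = dict(self._sram)

    def tx_interface(self, qid20: int) -> int:
        sch = (qid20 >> 15) & 0x7     # Sch (3b) sits above the 15-bit QID
        return self._cached[sch].interface
```

On the LC_Ingress and NPE, control would write interface 0 into every record, so every scheduler sends to Interface 0 after the next refresh.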

64 SPP V1 LC Egress with 10x1Gb/s Tx
[Figure: same LC Egress block diagram as slide 60 (10x1Gb/s Tx).]
Ring entry format highlighted (shown twice): Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)

65 SPP V1 LC Egress with 1x10Gb/s Tx
[Figure: LC Egress block diagram for 1x10Gb/s Tx. Same blocks as slide 60, with 1x10G Tx1 and 1x10G Tx2 in place of the two 5x1G Tx blocks.]
Ring entry format highlighted (shown twice): Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)

66 Changes for FlowStats
New block.
Output for 10x1Gb/s Tx: to one of two scratch rings, depending on the outgoing interface.
Output for 1x10Gb/s Tx: to the NN ring.
QM will now take care of setting the appropriate interface, so FlowStats doesn’t have to do anything special.
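The FlowStats output choice can be illustrated with a small sketch. It assumes the port split shown in the egress diagrams (P0-P4 on Tx1, P5-P9 on Tx2); the ring names and the `tx_mode` strings are hypothetical labels.

```python
def flowstats_output_ring(interface: int, tx_mode: str) -> str:
    """Pick the ring FlowStats writes to for one dequeued packet."""
    if tx_mode == "1x10G":
        return "NN"                        # single 10G Tx is fed by the NN ring
    if tx_mode == "10x1G":
        if not 0 <= interface <= 9:
            raise ValueError("interface must be in 0..9")
        # Two scratch rings, one per 5-port Tx block.
        return "SCR_TX1" if interface <= 4 else "SCR_TX2"
    raise ValueError(f"unknown tx_mode: {tx_mode}")
```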

67 SPP V1 LC Egress with 10x1Gb/s Tx
[Figure: same LC Egress block diagram as slide 60 (10x1Gb/s Tx).]
Ring entry format highlighted: Stats Index (16b), Opcode (4b), Data (12b)

68 Changes for Stats Do we want to incorporate the improvements we made for the ONL Stats block?

69 NPE Next we’ll look at the design for the NPE for SPP V1

70 SPP V1 NPE (MetaRouters)
[Figure: SPP V1 NPE block diagram. Blocks: Switch, RBUF, MSF, Rx1, Rx2, Substrate Decap, Parse, Lookup (TCAM), Hdr Format, Scr2NN, Port Splitter, QM0-QM3, 1x10G Tx1, 1x10G Tx2, TBUF, Stats (1 ME), SRAM1-SRAM3. Blocks are joined by NN and SCR rings.]

71 SPP V1 NPE (MetaRouters)
[Figure: same SPP V1 NPE block diagram as slide 70.]
Ring entry format highlighted: Buf Handle (24b), Port (4b), Reserved (12b), Eth. Frame Len (16b), Rx Flags (8b)

72 SPP V1 NPE (MetaRouters)
[Figure: same SPP V1 NPE block diagram as slide 70.]
Ring entry formats highlighted:
Format 1: Buf Handle (24b), Port (4b), Reserved (12b), Eth. Frame Len (16b), Rx Flags (8b)
Format 2: Buf Handle (32b), MN Frm Offset (16b), MN Frm Length (16b), Slice ID (VLAN) (16b), Rx UDP DPort (16b), Rx IP SAddr (32b), Rx UDP SPort (16b), Reserved (12b), Code (4b), Slice Data Ptr (32b)

73 SPP V1 NPE (MetaRouters)
[Figure: same SPP V1 NPE block diagram as slide 70.]
Ring entry formats highlighted:
Format 1: Buf Handle (32b), MN Frm Offset (16b), MN Frm Length (16b), Slice ID (VLAN) (16b), Rx UDP DPort (16b), Rx IP SAddr (32b), Rx UDP SPort (16b), Reserved (12b), Code (4b), Slice Data Ptr (32b)
Format 2: Buf Handle (32b), IP Pkt Length (16b), IP Pkt Offset (16b), Lookup Key[ ] Type (1b)/Slice ID (15b)/Rx UDP DPort (16b), Lookup Key[111-80] DA (32b), Lookup Key[79-48] SA (32b), Lookup Key[47-16] Ports (32b), Lookup Key[15-0] Proto/TCP_Flags (16b), Rsv (4b), Exception Bits (12b), Slice Data Ptr (32b), Rx UDP SPort (16b), Reserved (12b), Code (4b), Rx IP SAddr (32b)

74 SPP V1 NPE (MetaRouters)
[Figure: same SPP V1 NPE block diagram as slide 70.]
Ring entry formats highlighted:
Format 1: Buf Handle (32b), IP Pkt Length (16b), IP Pkt Offset (16b), Lookup Key[ ] Type (1b)/Slice ID (15b)/Rx UDP DPort (16b), Lookup Key[111-80] DA (32b), Lookup Key[79-48] SA (32b), Lookup Key[47-16] Ports (32b), Lookup Key[15-0] Proto/TCP_Flags (16b), Rsv (4b), Exception Bits (12b), Slice Data Ptr (32b), Rx UDP SPort (16b), Reserved (12b), Code (4b), Rx IP SAddr (32b)
Format 2: Buf Handle (32b), IP Pkt Length (16b), IP Pkt Offset (16b), Rx UDP DPort (16b), Slice ID (VLAN) (16b), Cntr Index (16b), RSVd (1b), D, H, L flag bits, Exception Bits (12b), Tx IP DAddr (32b), Tx UDP DPort (16b), Tx UDP SPort (16b), MAC DA (8b), Rsv, QM (2b), Sch (3b), PerSchedQID (11b), Slice Data Ptr (32b), Rx UDP SPort (16b), Reserved (12b), Code (4b), Rx IP SAddr (32b)

75 Changes for Lookup
No Port field in the input data, but Lookup doesn’t look at that data anyway, so no changes are needed.

76 SPP V1 NPE (MetaRouters)
[Figure: same SPP V1 NPE block diagram as slide 70.]
Ring entry formats highlighted:
Format 1: Buf Handle (32b), IP Pkt Length (16b), IP Pkt Offset (16b), Rx UDP DPort (16b), Slice ID (VLAN) (16b), Cntr Index (16b), RSVd (1b), D, H, L flag bits, Exception Bits (12b), Tx IP DAddr (32b), Tx UDP DPort (16b), Tx UDP SPort (16b), MAC DA (8b), Rsv, QM (2b), Sch (3b), PerSchedQID (11b), Slice Data Ptr (32b), Rx UDP SPort (16b), Reserved (12b), Code (4b), Rx IP SAddr (32b)
Format 2 (three 32-bit words): Reserved (8b), Buffer Handle (24b); Reserved (16b), QM (2b), Sch (3b), PerSchedQID (11b); Ethernet Frame Length (16b), Cntr Index (16b)

77 Changes for HF
No Port field in the input data.
Use the QM/Sched bits to determine the Src IP Address to use on the outgoing Tunnel Header.
Src MAC Addr should be a constant.
Dst MAC Addr should be configured: LCE, GPE.

78 SPP V1 NPE (MetaRouters)
[Figure: same SPP V1 NPE block diagram as slide 70.]
Ring entry format highlighted (shown twice, three 32-bit words): Reserved (8b), Buffer Handle (24b); Reserved (16b), QM (2b), Sch (3b), PerSchedQID (11b); Ethernet Frame Length (16b), Cntr Index (16b)

79 Changes for PortSplitter
No Port field in the input data.
Use the QM bits to determine which Scratch Ring to write to.
4 Scratch Rings.

80 SPP V1 NPE (MetaRouters)
[Figure: same SPP V1 NPE block diagram as slide 70.]
Ring entry formats highlighted:
Format 1 (three 32-bit words): Reserved (8b), Buffer Handle (24b); Reserved (14b), QM (2b), Sch (3b), PerSchedQID (13b); Ethernet Frame Length (16b), Cntr Index (16b)
Format 2 (one 32-bit word): Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)

81 Changes for QM Use Sched bits to determine which Scheduler to use.

82 SPP V1 NPE (MetaRouters)
[Figure: same SPP V1 NPE block diagram as slide 70.]
Ring entry format highlighted (shown twice): Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)

83 SPP V1 NPE (MetaRouters)
[Figure: same SPP V1 NPE block diagram as slide 70.]
Ring entry format highlighted: Stats Index (16b), Opcode (4b), Data (12b)

84 TCAM Performance (Rates in M/sec)
Lookup Size #LA-1 Words Core Size Assoc. Data Single LA-1 Max Rate Max Core Rate Avg Shared Rate (Each of 2 LA-1s) 32 1 36 50 25 64 128 12.5 2 72 100 3 67 4 144 5 40 160 288 LC_Ingress/LC_Egress IPv4 MR

85 Extra Slides The rest of the slides are old or for extra information

86 SPP V1 LC Ingress(1x10Gb/s and 10x1Gb/s)
Buf Handle(24b) IP Pkt Length (16b) PerSchedQID (15b) Translated DPort/ID (16b) Stats Index (16b) MAC DAddr (8b) VLAN (12b) Eth Hdr Len (8b) IP Hdr 1st Word (32b) Flags (8b) Sch 3b QM 2b IP Hdr 2nd Word (32b) XScale NAT HIT! NAT Miss Scratch Ring SCR R B U F R T M M S F Rx1 Rx2 Key Extract Lookup Hdr Format NN NN NN NN Frame Length (16b) Buffer Handle(24b) Stats Index (16b) Reserved (12b) (8b) PerSchedQID(15b) Sch 3b QM 2b TCAM NN Port Splitter QM0 SCR QM1 QM2 QM3 S W I T C H T B U F Scr2NN M S F 1x10G Tx2 1x10G Tx1 NN NN Stats (1 ME) SRAM1 SRAM3 SCR SRAM2

87 Block Interfaces

88 SPP V0: LC Ingress: Lookup Block Interfaces
Phy Int Rx Key Extract Lookup Hdr Format QM/Schd Switch Tx S W I T C H Buf Handle(32b) Buf Handle(32b) IP Pkt Length (16b) Reserved (8b) Eth Hdr Len (8b) IP Pkt Length (16b) Reserved (8b) Eth Hdr Len (8b) Lookup Key[63-32] (32b) Rsvd (4b) VLAN (16b) Stats Index (16b) Lookup Key[ 31-0] (32b) DAddr (8b) Port (4b) QID (20b) Lookup Key: Lookup Result: SL (4b) Intfc (4b) D_Addr[31:8] (24b) Rsvd (4b) VLAN (16b) Stats Index (16b) D_Addr[7:0] (8b) Protocol (8b) UDP/TCP DPort//ICMP ID (16b) MACDAddr (8b) Intfc (4b) QID (20b) NAT will need to translate ICMP IDs also. SL: Substrate Link Type. May not be needed anymore. Do we need something different in the “UDP DPort” field for different Protocols? How many different protocols?

89 SPP V0: IPv4 MR Lookup Block Interfaces
Rx DeMux Parse Lookup Header Format QM Tx Lookup Key[111-80] DA (32b) Buf Handle(32b) IP Pkt Length (16b) IP Pkt Offset (16b) Lookup Key[ 79-48] SA (32b) Lookup Key[ 47-16] Ports (32b) Lookup Key Proto/TCP_Flags [15- 0] (16b) Exception Bits (12b) Lookup Key[ ] Slice ID/Rx UDP DPort (32b) L Flags (4b) Buf Handle(32b) IP Pkt Length (16b) IP Pkt Offset (16b) Slice ID (VLAN) (16b) Rx UDP DPort(16b) R S V d (1b) H (1b) L D (1b) D (1b) Exception Bits (12b) Cntr Index (16b) Tx IP DAddr (32b) Tx UDP DPort (16b) Tx UDP SPort(16b) DA(8b) Port (4b) QID(20b) Slice Data Ptr (32b) Slice Data Ptr (32b) Reserved (28b) Code (4b) Reserved (28b) Code (4b) Lookup Key (144b): Slice ID/Rx UDP DPort (32b) IP DAddr (32b) IP SAddr (32b) How does this change for V1? SPort (16b) DPort (16b) Proto/TCP_Flags(16b)

90 SPP V0: IPv4 MR Functional Block Results
Lookup Key (144b): IP DAddr (32b) IP SAddr (32b) SPort (16b) Slice ID/Rx UDP DPort (32b) DPort (16b) Proto/TCP_Flags(16b) TCAM Status Bits Stored in TCAM Lookup Result (128b): As given to HF Lookup Result (128b): D O N e 1b H I t 1b M H I t 1b D 1b L D 1b Reserved (11b) Cntr Index (16b) Port (4b) QID(20b) DA(8b) Tx IP DAddr (32b) Cntr Index (16b) D (1b) Exception Bits (12b) Tx UDP SPort(16b) Tx UDP DPort (16b) R S V d H I t L Tx IP DAddr (32b) Tx UDP DPort (16b) Tx UDP SPort(16b) DA(8b) Port (4b) QID(20b) How does this change for V1?

91 SPP V0: LC Egress: Lookup Block Interfaces
Phy Int Tx QM/Schd Hdr Format Lookup Key Extract Switch Rx S W I T C H Buf Handle(32b) Buf Handle(32b) IP Pkt Length (16b) Reserved (8b) Eth Hdr Len (8b) IP Pkt Length (16b) Reserved (8b) Eth Hdr Len (8b) IP DAddr (32b) IP DAddr (32b) Lookup Result [63-32] (32b) Lookup Key IP Proto (8b) Lookup Key – UDP SPort (16b) Reserved (8b) Lookup Result [31-0] (32b) Lookup Result: Lookup Key: QID (20b) VLAN (12b) Stats Index (16b) Port (4b) Rsvd UDP SPort (16b) Protocol (8b) Reserved

92 SPP V1: LC Egress: Lookup Block Interfaces
Phy Int Tx QM/Schd Hdr Format Lookup Key Extract Switch Rx S W I T C H Buf Handle(32b) Buf Handle(32b) IP Pkt Length (16b) Reserved (8b) Eth Hdr Len (8b) IP Pkt Length (16b) Reserved (8b) Eth Hdr Len (8b) IP DAddr (32b) IP DAddr (32b) Lookup Result [63-32] (32b) Lookup Key IP Proto (8b) Lookup Key – UDP SPort (16b) Reserved (8b) Lookup Result [31-0] (32b) Lookup Result: Lookup Key: QID (20b) VLAN (12b) Stats Index (16b) Port (4b) Rsvd X L A T e 1b N H V IP 1b N H V Eth 1b IP Src Addr (32b) TCP/UDP Sport ICMP ID (16b) Protocol (8b) MAC SAddr (8b) NAT Port XLate (16b) NH Address (16b) NH Address (32b)

93 SPP V1 LC: Functional Blocks
Ingress (12 MEs): Phy Int Rx (2 ME), Key Extract (1 ME), Lookup (1 ME), Hdr Format (1 ME), Port-Splitter (1 ME), QM (2 ME), Scr2NN (1 ME), 10Gb/s Tx (2 ME), Stats (1 ME).
Egress (12 MEs): Switch Rx (2 ME), Key Extract (1 ME), Lookup (1 ME), Hdr Format (1 ME), Port-Splitter (1 ME), QM (2 ME), Flow Stats1 (1 ME), Flow Stats2 (1 ME), Phy Int Tx (1 ME), Stats (1 ME).
Also shown: Switch, SRAM, TCAM.
How many MEs can we spare for FlowStats? Let’s peek ahead to V2…

94 SPP V1 LC Ingress(1x10Gb/s and 10x1Gb/s)
XScale NAT Miss Scratch Ring SCR R B U F R T M M S F Rx1 Rx2 Key Extract Lookup Hdr Format NN NN NN NN TCAM NN S W I T C H T B U F QM0 M S F 1x10G Tx2 1x10G Tx1 Scr2NN SCR SCR Port Splitter NN NN SCR QM1 SCR Stats (1 ME) SRAM1 SRAM3 SCR SRAM2

95 SPP V1 NAT Notes
Egress Traffic:
From NPE: Preconfigured entries in the Lookup table. There should be no need for NAT.
From GPE: A slice on the GPE initiates a new flow. Examples:
Slice on GPE opens a TCP connection to another node.
Slice on GPE pings another node.
Slice on GPE initiates a UDP flow with a bind(2).
Slice on GPE initiates a UDP flow with a send(2).
In order not to drop or re-order packets, it seems we need a way for the fast path to handle the allocation of ports for remapping GPE-initiated egress flows.
Result of Egress Lookup needs to be: Physical Interface, QID, VLAN, Stats Index.
Egress needs a pool of available Port/ID numbers.
From CP
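The pool of available Port/ID numbers mentioned above could look roughly like this. It is a sketch under assumptions: the class name `PortPool`, the flow key, and the FIFO reuse policy are all hypothetical, and a fast-path version would live in microengine memory rather than a Python dictionary.

```python
from collections import deque


class PortPool:
    """Pool of NAT port/ID numbers handed out per GPE-initiated egress flow."""

    def __init__(self, first: int, last: int):
        self._free = deque(range(first, last + 1))
        self._by_flow = {}

    def allocate(self, flow):
        """Return the port mapped to `flow`, allocating one on first use.

        Returns None when the pool is exhausted.
        """
        if flow in self._by_flow:
            return self._by_flow[flow]
        if not self._free:
            return None
        port = self._free.popleft()
        self._by_flow[flow] = port
        return port

    def release(self, flow) -> None:
        """Return the flow's port to the back of the free list."""
        port = self._by_flow.pop(flow, None)
        if port is not None:
            self._free.append(port)
```

Because later packets of the same flow hit the existing mapping, the first packet does not have to wait for the control path, which is the property the slide asks for.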

96 SPP V1 LC Ingress R B U F R T M M S F S W I T C H T B U F M S F TCAM
Rx1 Rx2 Key Extract Lookup Hdr Format NN NN NN NN NN S W I T C H T B U F QM0 M S F 10G Tx2 10G Tx1 Scr2NN SCR SCR Port Splitter NN NN SCR QM1 SCR Stats (1 ME) SRAM1 SRAM3 SCR Small SRAM Ring SRAM2 Large SRAM Ring SCR Scratch Ring NN NN Ring

97 SPP V2 LC: Functional Blocks
Ingress (14 MEs): Phy Int Rx (2 ME), Key Extract (1 ME), Lookup (1 ME), Hdr Format (1 ME), Port-Splitter (1 ME), QM (4 ME), Scr2NN (1 ME), 10Gb/s Tx (2 ME), Stats (1 ME). Possibly also FL_Mgr? (1 ME).
Egress (15 MEs): Switch Rx (2 ME), Key Extract (1 ME), Lookup (1 ME), Hdr Format (1 ME), Port-Splitter (1 ME), QM (4 ME), Flow Stats1 (1 ME), Flow Stats2 (1 ME), 10Port Tx (2 ME), Stats (1 ME). Possibly also FL_Mgr? (1 ME).
Also shown: Switch, SRAM, TCAM.
Depending on the performance of the QM, we may be able to use 2 or 3 MEs for FlowStats.

98 LC Ingress: Functional Blocks
Phy Int Rx Key Extract Lookup Hdr Format QM/Schd Switch Tx S W I T C H RBUF Buf Handle(32b) Eth. Frame Len (16b) Reserved (12b) Port (4b) Rx (2 Microengines): Function: Coordinate transfer of packets from RBUF to DRAM

99 LC Ingress: Functional Blocks
[Pipeline: Phy Int Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Switch Tx, Switch]
Input: Buf Handle (32b), Eth. Frame Len (16b), Reserved (12b), Port (4b)
Output: Buf Handle (32b), IP Pkt Length (16b), Reserved (8b), Eth Hdr Len (8b), Lookup Key[63-32] (32b), Lookup Key[31-0] (32b), IP Hdr 1st Word (32b)
Key_Extract (1 Microengine):
Function: Extracts the lookup key. Peel ARP packets off and send them to the XScale???
Lookup Key (64b): SL Type (4b): 0101b; Port (4b): may not be needed; IP DAddr (32b); IP Proto (8b); UDP DPort (16b)
Notes:
Frame offset in the buffer is a constant and does not need to be read from the Buffer Descriptor.
Ethernet Hdr Length should be passed along the chain so Hdr Format can figure out where to start writing its header.
The Ethernet header could have different lengths depending on whether VLANs are present or not.

100 LC Ingress: Functional Blocks
Phy Int Rx Key Extract Lookup Hdr Format QM/Schd Switch Tx S W I T C H Buf Handle(32b) Buf Handle(32b) IP Pkt Length (16b) Reserved (8b) Eth Hdr Len (8b) IP Pkt Length (16b) Reserved (8b) Eth Hdr Len (8b) Lookup Key[63-32] (32b) VLAN (16b) Stats Index (16b) Lookup Key[ 31-0] (32b) DAddr (8b) Port (4b) QID (20b) IP Hdr 1st Word (32b) IP Hdr 1st Word (32b) Lookup: Notes on next page

101 LC Ingress: Functional Blocks
Lookup:
Function: Performs the lookup and passes the result on to Hdr Format.
Lookup Key (64b):
SL Type (4b): 0101b
Port (4b): May not be needed
IP DAddr (32b)
IP Proto (8b)
UDP DPort (16b)
Lookup Result (56b):
DAddr (8b): only 8 bits of the Ethernet DAddr are variable; the other 40 are static per node.
VLAN (12b)
QID (20b)
Stats Index (16b)
Port (4b): For the case with an external switch, it is the actual physical interface to use. Also one port per GPE and one port per NPE. For the case with a switch blade, is it just used to spread traffic across QM/Scheduler?
Notes:
Does the Lookup Key need to include Port? Seems like it should not.
Does Lookup still need Frame Length? Will it be maintaining any byte counters?
The result should not have to include RxMI; it is not used for anything.
Stats Index may be a per-MI stats index if desired.
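As an illustration of the 64-bit ingress lookup key listed above, the sketch below packs the five fields into two 32-bit words. It assumes the fields are laid out most-significant-first in the order given (SL Type, Port, IP DAddr, IP Proto, UDP DPort); the function name is hypothetical.

```python
import ipaddress

SL_TYPE = 0b0101   # Substrate Link type value given on the slide


def pack_ingress_key(port: int, ip_daddr: str, ip_proto: int, udp_dport: int):
    """Return (Lookup Key[63-32], Lookup Key[31-0]) as two 32-bit integers."""
    daddr = int(ipaddress.IPv4Address(ip_daddr))
    key = (
        (SL_TYPE << 60)                 # SL Type (4b)
        | ((port & 0xF) << 56)          # Port (4b)
        | (daddr << 24)                 # IP DAddr (32b)
        | ((ip_proto & 0xFF) << 16)     # IP Proto (8b)
        | (udp_dport & 0xFFFF)          # UDP DPort (16b)
    )
    return key >> 32, key & 0xFFFFFFFF
```

With this packing the destination address straddles the two words: the upper 24 bits land in the first word and the low 8 bits in the second.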

102 LC Ingress: Functional Blocks
[Pipeline: Phy Int Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Switch Tx, Switch]
Input: Buf Handle (32b), IP Pkt Length (16b), Reserved (8b), Eth Hdr Len (8b), VLAN (16b), Stats Index (16b), DAddr (8b), Port (4b), QID (20b), IP Hdr 1st Word (32b)
Output: Buffer Handle (32b), Rsv (4b), Port (4b), Rsv (4b), QID (20b), Frame Length (16b), Stats Index (16b)
Hdr Format:
Function:
From the lookup result, re-writes just the Ethernet header in DRAM to make the frame ready to transmit.
Extracts QID, Port, Stats Index and Frame Length to pass on to QM/Scheduler.
May need to increment a counter based on Stats Index.
Notes:
Pass Size on to QM/Scheduler so it does not have to read the buffer descriptor for Enqueue to update Q Length.
The offset to the beginning of the old Ethernet header should be constant, but we don’t necessarily know how long it was, so we don’t know where to put our new one.
Ethernet Hdr Len is used to determine where the new header should go.

103 LC Ingress: Functional Blocks
Phy Int Rx Key Extract Lookup Hdr Format QM/Schd Switch Tx S W I T C H Buffer Handle(32b) Buffer Handle(24b) Rsv (3b) Port (4b) V 1 Rsv (4b) Port (4b) Rsv (4b) QID(20b) V: Valid Bit Frame Length (16b) Stats Index (16b) QM/Scheduler (See Sailesh’s slides for more details) Function: Enqueue and Dequeue from queues Scheduling algorithm Drop Policy Notes:

104 LC Ingress: Functional Blocks
Phy Int Rx Key Extract Lookup Hdr Format QM/Schd Switch Tx S W I T C H Buffer Handle(24b) Rsv (3b) Port (4b) V 1 TBUF V: Valid Bit Switch TX: Function: Coordinate transfer of packets from DRAM to TBUF Notes:

105 LC Egress: Functional Blocks
Phy Int Tx QM/Schd Hdr Format Lookup Key Extract Switch Rx S W I T C H Buf Handle(32b) RBUF Eth. Frame Len (16b) Reserved (12b) Port (4b) Rx: Function: Coordinate transfer of packets from RBUF to DRAM Notes: Do we need port? May not make sense to remove it since, it is there for other versions of Rx.

106 LC Egress: Functional Blocks
Phy Int Tx QM/Schd Hdr Format Lookup Key Extract Switch Rx S W I T C H Buf Handle(32b) IP Pkt Length (16b) Reserved (8b) Eth Hdr Len (8b) IP DAddr (32b) Lookup Key – UDP SPort (16b) Lookup Key IP Proto IP Hdr 1st Word (32b) Buf Handle(32b) Eth. Frame Len (16b) Reserved (12b) Port (4b) Key_Extract: Function: Extracts lookup key Notes:

107 LC Egress: Functional Blocks
Phy Int Tx QM/Schd Hdr Format Lookup Key Extract Switch Rx S W I T C H Buf Handle(32b) IP DAddr (32b) Lookup Result [63-32] (32b) Lookup Result [31-0] (32b) IP Pkt Length (16b) Reserved (8b) Eth Hdr Len (8b) IP Hdr 1st Word (32b) Buf Handle(32b) IP DAddr (32b) IP Pkt Length (16b) Reserved (8b) Eth Hdr Len (8b) Lookup Key – UDP SPort (16b) Lookup Key IP Proto IP Hdr 1st Word (32b) Lookup: Function: Performs Lookup and passes result on to Hdr Format. Lookup Key: IP Protocol (8b) UDP Sport (16b) Lookup Result (52b): VLAN (12b): Value of 0x000 or 0xFFF, indicates invalid? QID (20b) Port (4b) Stats/Counter Index (16b) Static values for Egress Ethernet address: Ethernet SAddr Types: IP and/or 802.1Q Notes: Lookup does no processing on the lookup result.

108 Stats/Counter Index (16b)
Lookup Result Buf Handle(32b) IP Pkt Length (16b) Reserved (8b) Eth Hdr Len (8b) IP DAddr (32b) Rsvd (4b) VLAN(12b) Stats/Counter Index (16b) Rsvd (4b) Port (4b) Rsvd (4b) QID (20b) IP Hdr 1st Word (32b)

109 LC Egress: Functional Blocks
Phy Int Tx QM/Schd Hdr Format Lookup Key Extract Switch Rx S W I T C H Buffer Handle(32b) Buf Handle(32b) IP Pkt Length (16b) Reserved (8b) Eth Hdr Len (8b) Rsv (4b) Port (4b) Rsv (4b) QID(20b) IP DAddr (32b) Ethernet Frame Length (16b) Cntr Index (16b) Lookup Result [63-32] (32b) Lookup Result [31-0] (32b) IP Hdr 1st Word (32b) Hdr Format: Function: From lookup result: re-writes ethernet header in DRAM to make frame ready to transmit. Extract QID and frame length to pass on to QM/Scheduler Notes: Pass Size on to QM/Scheduler so it does not have to read buffer descriptor for Enqueue to update Q Length.

110 LC Egress: Functional Blocks
Phy Int Tx QM/Schd Hdr Format Lookup Key Extract Switch Rx S W I T C H Ethernet Frame Length (16b) Buffer Handle(32b) Cntr Index (16b) QID(20b) Rsv (4b) Port Buffer Handle(24b) Rsv (3b) Port (4b) V 1 V: Valid Bit QM/Scheduler (See Sailesh’s slides for more details) Function: Enqueue and Dequeue from queues Scheduling algorithm Drop Policy Memory Accesses: DRAM: None SRAM: Q-Array Reads and Writes Scheduling Data Structure Reads and Writes QLength Data Structure Reads and Writes Dequeue: Read Buffer Descriptor to retrieve Packet Size Buffer Descriptor Accesses: Read packet size Notes:

111 LC Egress: Functional Blocks
Phy Int Tx QM/Schd Hdr Format Lookup Key Extract Switch Rx S W I T C H TBUF Buffer Handle(24b) Rsv (3b) Port (4b) V 1 V: Valid Bit Switch TX: Function: Coordinate transfer of packets from DRAM to TBUF Memory Accesses: SRAM: Read Buffer Descriptor DRAM: Transfer to TBUF Buffer Descriptor Accesses: Read Size and Offset Notes: Calculate DRAM address based on SRAM Descriptor address in buffer handle

112 SPP V2 LC: Functional Blocks
Port- Splitter (1 ME) QM (4 ME) Scr2NN (1 ME) Phy Int Rx (2 ME) Key Extract (1 ME) Lookup (1 ME) Hdr Format (1 ME) 10Gb/s Tx (2 ME) S W I T C H Stats (1 ME) FL_Mgr? (1 ME) SRAM TCAM Stats (1 ME) SRAM FL_Mgr? (1 ME) 10Port Tx (2 ME) Flow Stats1 (1 ME) QM (4 ME) Port- Splitter (1 ME) Hdr Format (1 ME) Lookup (1 ME) Key Extract (1 ME) Switch Rx (2 ME) Flow Stats2 (1 ME)

113 SPP Plans
SPP Version 1:
1 5-Port NPE (still don’t use NPUB)
Switch Blade integration
10GE Tx module integration
ARP
Egress Traffic monitoring
MR Code Options: Anything new?
Control
Local Control
Booting NPU
Add/Remove Slices
MR Control
Add/Remove Routes
Node Manager
GPE
1 vs. multiple
NAT?
SSH Forwarding
PLC integration
SPP Version 2:
Deal with constraints imposed by switch: can send to only one NPU; can receive from only one NPU; split processing across NPUs (parsing, lookup on one; queueing on other).
Provide more resources for slice-specific processing.
Decouple QM schedulers from links: collection of largely independent schedulers; may use several to send to the same link, e.g. separate rate classes (1-10M, M, M); optionally adjust scheduler rates dynamically.
Provide support for multicast: requires addition of next-hop IP address after queueing.
Enable single slice to operate at 10 Gb/s.

114 Objectives for SPP-NPE version 2
Deal with constraints imposed by switch: can send to only one NPU; can receive from only one NPU; split processing across NPUs (parsing, lookup on one; queueing on other).
Provide more resources for slice-specific processing.
Decouple QM schedulers from links: collection of largely independent schedulers; may use several to send to the same link, e.g. separate rate classes (1-10M, M, M); optionally adjust scheduler rates dynamically.
Provide support for multicast: requires addition of next-hop IP address after queueing.
Enable single slice to operate at 10 Gb/s.

115 NPE Version 2 Block Diagram
SRAM large sram ring NPUA SRAM GPE RxA (2 ME) Decap, Parse, LookupA, AddShim (8 MEs) TxA (2 ME) Stats (1 ME) SRAM SPI Switch TCAM SPI Switch Switch Blade flow control? Stats (1 ME) SRAM TxB (2 ME) HdrFmt (4 MEs) Queue Manager (4 MEs) LookupB &Copy (2 ME) RxB (2 ME) large sram ring NPUB SRAM SRAM

116 NPE Version 2 Block Diagram
[Figure: NPE Version 2 block diagram. NPUA: RxA (2 ME); Decap, Parse, LookupA, AddShim (8 MEs); TxA (2 ME); Stats (1 ME); SRAM. NPUB: RxB (2 ME); LookupB & Copy (2 ME); Queue Manager (4 MEs); HdrFmt (4 MEs); TxB (2 ME); Stats (1 ME); SRAM. Also shown: TCAM, GPE, SPI Switches, Switch Blade, and a "flow control?" annotation.]
Annotations:
slice#, resultIndx, etc. are passed in the shim.
Lookup produces resultIndx and statsIndx.
For unicast, resultIndx is replaced by QiD, allowing the output side to skip the lookup.
Lookup on <slice#, resultIndx> yields the fanout and a list of QiDs; copy to queues, adding copy# (slice#, resultIndx remain in the packet buffer).
Use slice# to select the slice to format the packet; use resultIndx to get the next hop.

117 Questions/Issues
Where are the exit and entry points for packets sent to and from the GPE for exception processing?
Parse (NPUA) and LookupA (NPUA) are where most exceptions are generated: IP Options, No Route, etc.
HdrFormat (NPUB) is where we do Ethernet header processing.
What needs to be in the SHIM going from NPUA to NPUB?
ResultIndex (32b), Exception Bits (12b), StatsIndex (16b), Slice# (12b), ???
Will we support multi-copy in a way similar to the ONL Router?
How big can the fanout be?
How many QIDs need to be stored with the LookupB Result?
Is there some encoding for the QIDs that can take into account support for multicast and the copy#? For example:
Multicast QID (20b): Multicast (1b): 1; Copy# (4b); PerMulticast QID (15b): one PerMulticast QID allocated for each multicast.
Unicast QID (20b): Unicast (1b): 0; QID (19b).
Are there timing/synchronization issues with adding, deleting or changing lookup entries between the two NPUs’ databases?
Do we need flow control between TxA and RxB?
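The example QID encoding raised in the question above can be sketched as follows, assuming the multicast flag is the top bit of the 20-bit QID and the remaining fields follow in the order listed. The function names are illustrative only.

```python
MCAST_FLAG = 1 << 19   # assumed position: most significant bit of the 20-bit QID


def encode_multicast_qid(copy_num: int, per_mcast_qid: int) -> int:
    """Multicast (1b) = 1 | Copy# (4b) | PerMulticast QID (15b)."""
    if not 0 <= copy_num < 16 or not 0 <= per_mcast_qid < (1 << 15):
        raise ValueError("field out of range")
    return MCAST_FLAG | (copy_num << 15) | per_mcast_qid


def encode_unicast_qid(qid: int) -> int:
    """Multicast (1b) = 0 | QID (19b)."""
    if not 0 <= qid < (1 << 19):
        raise ValueError("field out of range")
    return qid


def decode_qid(qid20: int) -> dict:
    """Split a 20-bit QID back into its fields."""
    if qid20 & MCAST_FLAG:
        return {"multicast": True,
                "copy_num": (qid20 >> 15) & 0xF,
                "qid": qid20 & 0x7FFF}
    return {"multicast": False, "copy_num": None, "qid": qid20 & 0x7FFFF}
```

Under this encoding the 4-bit Copy# caps the fanout at 16 copies per multicast, which bears directly on the "how big can the fanout be?" question.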

118 NPE Version 2 Block Diagram
[Figure: NPE Version 2 block diagram, same blocks as slide 116.]
NPUA:
RxA: Same as Version 0
TxA: New 10Gb/s
Decap: Same as Version 0
Parse: Same as Version 0. New code options?
LookupA: Results will be different from Version 0
AddShim: New

119 NPE Version 2 Block Diagram
[Figure: NPE Version 2 block diagram, same blocks as slide 116.]
NPUB:
RxB: Same as Version 0
TxB: New 10Gb/s with L2 Header coming in on input ring?
LookupB: New
Copy: New, may be able to use some code from ONL Copy
QM: New, decoupled from Links
HF: New, may use some code from Version 0

120 SPP Version 2 System Architecture
Fast-Path Data GPE Blade GPE Blade LC Ingress Decap Parse Lookup AddShim NPUA 1 10Gb/s OR 10 1Gb/s SPI Switch SPI Switch Switch Blade RTM FIC FIC Copy QM HdrFormat LC Egress NPUB NPE 7010 Blade LC 7010 Blade

121 SPP Version 2 System Architecture
Default Data Path GPE Blade GPE Blade LC Ingress Decap Parse Lookup AddShim NPUA 1 10Gb/s OR 10 1Gb/s SPI Switch SPI Switch Switch Blade RTM FIC FIC Copy QM HdrFormat LC Egress NPUB NPE 7010 Blade LC 7010 Blade

122 SPP Version 2 System Architecture
Exception Data Path GPE Blade GPE Blade LC Ingress Decap Parse Lookup AddShim NPUA 1 10Gb/s OR 10 1Gb/s SPI Switch SPI Switch Switch Blade RTM FIC FIC Copy QM HdrFormat LC Egress NPUB NPE 7010 Blade LC 7010 Blade

123 PlanetLab NPE Input Frame from LC
Ethernet Header:
DstAddr: MAC address of NPE
SrcAddr: MAC address of LC
VLAN: One VLAN per MR (MR == Slice)
IP Header:
Dst Addr: IP address of this node. How many IP addresses can a node have?
Src Addr: IP address of previous hop
Protocol: UDP
UDP Header:
Dst Port: Identifies input tunnel
Src Port: with IP Src Addr identifies sending entity
Frame layout (assuming no IP Options; the figure marks 8-byte boundaries):
Ethernet Header: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), VLAN (2B), Type=IP (2B)
IP Header: Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol = UDP (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Options (0-40B)
UDP Header: Src Port (2B), Dst Port (2B), UDP length (2B), UDP checksum (2B)
UDP Payload (MN Packet)
PAD (nB)
Ethernet Trailer: CRC (4B)
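To make the frame layout concrete, here is a rough sketch of pulling the fields the NPE cares about (slice VLAN, tunnel endpoints, MN packet offset) out of such a frame. The function name and the returned dictionary keys are hypothetical; the EtherType values 0x8100 (802.1Q) and 0x0800 (IPv4) and UDP protocol number 17 are the standard ones.

```python
import ipaddress
import struct

ETH_VLAN_HDR_LEN = 18    # DstAddr 6 + SrcAddr 6 + Type 2 + VLAN 2 + Type 2
TPID_8021Q = 0x8100
ETHERTYPE_IP = 0x0800
IPPROTO_UDP = 17


def parse_npe_input_frame(frame: bytes) -> dict:
    """Extract slice and tunnel identifiers from an LC-to-NPE frame."""
    _dst, _src, tpid, tci, ethertype = struct.unpack_from("!6s6sHHH", frame, 0)
    if tpid != TPID_8021Q or ethertype != ETHERTYPE_IP:
        raise ValueError("expected an 802.1Q-tagged IP frame")
    ip_off = ETH_VLAN_HDR_LEN
    ip_hdr_len = (frame[ip_off] & 0x0F) * 4     # HLen counts 32-bit words
    if frame[ip_off + 9] != IPPROTO_UDP:
        raise ValueError("expected UDP")
    src_addr, dst_addr = struct.unpack_from("!4s4s", frame, ip_off + 12)
    udp_off = ip_off + ip_hdr_len               # skips IP Options if present
    sport, dport, _ulen, _cksum = struct.unpack_from("!HHHH", frame, udp_off)
    return {
        "slice_id": tci & 0x0FFF,               # VLAN ID: one VLAN per MR/slice
        "ip_src": str(ipaddress.IPv4Address(src_addr)),
        "ip_dst": str(ipaddress.IPv4Address(dst_addr)),
        "udp_sport": sport,                     # with ip_src: sending entity
        "udp_dport": dport,                     # identifies the input tunnel
        "mn_offset": udp_off + 8,               # start of the MN packet
    }
```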

124 SPP Version2 NPUA to NPUB Frame
SHIM (16B): ResultIndex (32b), Exception Bits (12b), StatsIndex (16b), Slice# (12b)
IP Header:
Dst Addr: IP address of this node. How many IP addresses can a node have?
Src Addr: IP address of previous hop
Protocol: UDP
UDP Header:
Dst Port: Identifies input tunnel
Src Port: with IP Src Addr identifies sending entity
Frame layout (assuming no IP Options; the figure marks 8-byte boundaries):
SHIM (16B)
Type=IP (2B)
IP Header: Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol = UDP (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Options (0-40B)
UDP Header: Src Port (2B), Dst Port (2B), UDP length (2B), UDP checksum (2B)
UDP Payload (MN Packet)
PAD (nB)
Ethernet Trailer: CRC (4B)

125 NPE Version 2 Block Diagram
NPUA SRAM Sram2NN (1 ME) SRAM GPE RxA (2 ME) Decap, Parse, LookupA, AddShim (8 MEs) TxA (2 ME) StatsA (1 ME) SRAM FL MgrA (1 ME) TCAM SPI Switch SPI Switch Switch Blade flow control? FL MgrB (1 ME) StatsB (1 ME) SRAM SRAM TxB (2 ME) HdrFmt (4 MEs) Queue Manager (4 MEs) LookupB &Copy (2 ME) RxB (2 ME) NPUB has 17 MEs currently spec’ed Scr2NN (1 ME) SRAM SRAM NPUB

126 SPP V2: MR Specific Code
Where does the MR-specific code reside in V2? Parse and HdrFormat.
What about LookupA and LookupB? Lookup is a “service” provided to the MRs by the Substrate. No MR-specific code is needed in LookupA or LookupB.
What about SideA AddShim? The Exception bits that go in the shim are MR-specific, but they should be passed to AddShim, which will write them into the shim. No MR-specific code is needed in AddShim.
What about SideB Copy? Is there anything MR-specific about setting up multiple copies of a packet? There shouldn’t be. We will have the Copy block allocate a new hdr buffer descriptor, link it to the existing data buffer descriptor, and take care of reference counts. The actual building of the new header(s) for the copies will be left to HF. No MR-specific code is needed in Copy.
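The Copy block behaviour described above (one header buffer descriptor per copy, all linked to a shared data buffer descriptor with a reference count) might be modelled like this. The class and function names are hypothetical, and the policy of setting the count to the fanout up front is an assumption, not something the slides specify.

```python
from dataclasses import dataclass


@dataclass
class DataBufDesc:
    ref_cnt: int = 0          # number of header descriptors still pointing here


@dataclass
class HdrBufDesc:
    data: DataBufDesc         # link to the shared data buffer descriptor
    copy_num: int             # copy# carried along for HF


def make_copies(data: DataBufDesc, fanout: int) -> list:
    """Allocate one header descriptor per copy; HF builds the headers later."""
    data.ref_cnt = fanout
    return [HdrBufDesc(data=data, copy_num=n) for n in range(fanout)]


def release_copy(hdr: HdrBufDesc) -> bool:
    """Drop one reference; True means the data buffer can be freed."""
    hdr.data.ref_cnt -= 1
    return hdr.data.ref_cnt == 0
```

Nothing in this bookkeeping depends on the slice, which is consistent with the conclusion that Copy needs no MR-specific code.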

127 SPP V2: Hdr Format
Lots of changes for HF:
Move behind QM.
More general: support multiple source IP addresses; general support for tunnels; eventually different kinds of tunnels (UDP/IP, GRE, …)?
Support for multicast: dealing with header buffer descriptors; reading the Fanout table.
The Substrate portion of HF will need to do a Decap-type table lookup: Slice ID → (Code Option, Slice Memory Pointer, Slice Memory Size).
HF gets a buffer descriptor from the QM.
The Substrate portion of HF must determine:
Code Option (8b)
Slice ID (12b)
Location of Next Hop information (20b - 32b)
LD vs. FWD?
Stats Index (16b)
Should HF do this or QM?
The MR portion of HF must determine:
Exception bits (16b)
Let’s put all of the above data in the Buf Desc. LookupB/Copy will need to write it there based on what comes across from SideA in the shim.

128 SPP V2 SideB SRAM Buffer Descriptor
LW0: Buffer_Next (32b)
LW1: Buffer_Size (16b), Offset (16b)
LW2: Packet_Size (16b), Free_list 0000 (4b), Reserved (4b), Ref_Cnt (8b)
LW3: Stats Index (16b), Reserved (4b), Slice ID (xsid) (12b)
LW4: NextHop Data Ptr (32b)
LW5: Reserved (16b), MR Exception Bits (16b)
LW6: MR Bits (32b)
LW7: Packet_Next (32b)
Still working on this.
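The eight-longword descriptor above can be expressed as a packing routine. This is a sketch only: it assumes fields are placed most-significant-first within each longword in the order listed and that longwords are stored big-endian; the function name is hypothetical, and the layout itself is still marked as work in progress on the slide.

```python
import struct


def pack_sideb_buf_desc(buffer_next, buffer_size, offset, packet_size, ref_cnt,
                        stats_index, slice_id, next_hop_ptr,
                        mr_exception_bits, mr_bits, packet_next,
                        free_list=0) -> bytes:
    """Pack the SideB SRAM buffer descriptor into 32 bytes (LW0..LW7)."""
    lw = [
        buffer_next & 0xFFFFFFFF,                                    # LW0
        ((buffer_size & 0xFFFF) << 16) | (offset & 0xFFFF),          # LW1
        ((packet_size & 0xFFFF) << 16) | ((free_list & 0xF) << 12)
        | (ref_cnt & 0xFF),                                          # LW2
        ((stats_index & 0xFFFF) << 16) | (slice_id & 0xFFF),         # LW3
        next_hop_ptr & 0xFFFFFFFF,                                   # LW4
        mr_exception_bits & 0xFFFF,                                  # LW5
        mr_bits & 0xFFFFFFFF,                                        # LW6
        packet_next & 0xFFFFFFFF,                                    # LW7
    ]
    return struct.pack(">8I", *lw)
```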

129 SPP V2: Result
We need to be much more general in our support for Tunnels, Interfaces, MetaInterfaces, and Next Hops.
SideB Result:
Interface: IP SAddr (32b); Eth MAC DAddr (48b) (LC, GPE1, GPE2, …, GPEn); SchedulerId (8b): which QM should handle pkt
TxMI: IP Sport (16b)
TxNextHop: IP DAddr (32b); IP DPort (16b)

130 Data Areas Where are the tables and what data is transmitted from SideA to SideB? SideA Tables Shim between SideA and SideB SideB Tables

131 Pkt Processing Data and Tables
SideA:
  MR/Slice Table:
    Generated by Control
    Used by: Substrate Decap to retrieve a MR/Slice’s parameters
    Indexed by SliceId == VLAN
    Contains:
      Code option
      Slice Memory ptr
      Slice Memory size
      ???
  TCAM:
    LookupA
      Key:
      Result:

132 Data Areas Shim between SideA and SideB
  Written to DRAM Buffer to be sent from SideA to SideB
  Contains:
    resultIndex (32b):
      Generated by Control
      Result of TCAM lookup on SideA
      Translates into an SRAM Address on SideB
    exceptionBits (16b)
      Generated by SideA Parse/Lookup
      Used by: SideB HF
    statsIndex (16b)
      SideA Lookup/AddShim to increment counters
      SideB Lookup/Copy to increment PreQ Cntrs (or perhaps SideA is the PreQ cntrs)
      SideB HF or QM to increment PostQ Cntrs
    sliceId (12b)
      Result of Decap read of Ethernet hdr (VLAN)
    ???
      codeOption (12b)
      Slice Memory Ptr (32b)

133 Data Areas SideB Data Buffer Descriptor Hdr Buffer Descriptor
  Used for multi-copy packets
  SPP V2 may require Tx to handle multi-buffer packets. It is unclear if we can cleanly do the same thing that we do with ONL, where HF passes the Ethernet header to Tx.
  We may also need to have support for MR specific per copy data
Results Table
  Generated by Control
  Used by:
    LookupB/Copy
    HF: Should HF get its per copy info from here as well?
  Contains:
    Fanout (if fanout is > 1 we can overload some of the following fields with a pointer into a Fanout table)
    QID
    InterfaceId
    TxMI Id
      Probably doesn’t help to make it an index into a table for UDP Tunnels since UDP Port is 16 bits
      But for tunnels other than UDP tunnels it may help?
    TX NextHop Id
      Index into a table of Tunnel Next Hops
Fanout Table
  QID[Fanout]
  Tx Next Hop ID[Fanout]
  Implementation Choices:
    One contiguous block of memory
      Fixed size or variable sized
    Chained with one set of values per entry
    Chained with N (N=4?) sets of values per entry
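The last implementation choice for the Fanout Table (chained entries holding N sets of values each) can be sketched as follows. This is a minimal illustration, not the planned SRAM layout: the class and function names are hypothetical, the table is modeled as a Python list rather than memory addresses, and N=4 is taken from the slide's tentative "N=4?".

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple

SETS_PER_ENTRY = 4  # the slide's tentative "N=4?"

@dataclass
class FanoutEntry:
    # Up to SETS_PER_ENTRY (QID, Tx Next Hop ID) pairs, one per packet copy.
    copies: List[Tuple[int, int]]
    # Index of the next chained entry, or None at the end of the chain.
    next_index: Optional[int] = None

def build_chain(table: List[FanoutEntry],
                copies: List[Tuple[int, int]]) -> Optional[int]:
    """Append chained entries for one multicast result; return the head index."""
    head = None
    prev = None
    for start in range(0, len(copies), SETS_PER_ENTRY):
        table.append(FanoutEntry(copies[start:start + SETS_PER_ENTRY]))
        idx = len(table) - 1
        if prev is None:
            head = idx
        else:
            table[prev].next_index = idx
        prev = idx
    return head

def walk_chain(table: List[FanoutEntry],
               head: Optional[int]) -> Iterator[Tuple[int, int]]:
    """Yield every (QID, Tx Next Hop ID) pair, following the chain links."""
    idx = head
    while idx is not None:
        yield from table[idx].copies
        idx = table[idx].next_index

fanout_table: List[FanoutEntry] = []
six_copies = [(qid, qid + 100) for qid in range(6)]
head_index = build_chain(fanout_table, six_copies)
recovered = list(walk_chain(fanout_table, head_index))
```

With a fanout of 6 and N=4 the chain needs two entries, so the reader makes two table reads instead of the six that a one-set-per-entry chain would need, at the cost of unused slots in the last entry.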

134 NPE Version 2 Block Diagram
[Block diagram]
NPUA: RxA (2 ME); Decap, Parse, LookupA, AddShim (8 MEs); TxA (2 ME); StatsA (1 ME); FL MgrA (1 ME); Sram2NN (1 ME); TCAM; SRAM
NPUB: RxB (2 ME); LookupB & Copy (2 ME); Queue Manager (4 MEs); HdrFmt (4 MEs); TxB (2 ME); StatsB (1 ME); FL MgrB (1 ME); Scr2NN (1 ME); SRAM
External: GPE; SPI Switch; Switch Blade (flow control?)
SHIM: resultIndex (32b), statsIndex (16b), exceptions (16b), sliceId (16b), MR Bits (32b)

135 NPE Version 2 Block Diagram
[Block diagram]
NPUA: RxA (2 ME); Decap, Parse, LookupA, AddShim (8 MEs); TxA (2 ME); Stats (1 ME); TCAM; SRAM
NPUB: RxB (2 ME); LookupB & Copy (2 ME); Queue Manager (4 MEs); HdrFmt (4 MEs); TxB (2 ME); Stats (1 ME); SRAM
External: GPE; SPI Switch; Switch Blade (flow control?)
SHIM: resultIndex (32b), statsIndex (16b), exceptions (16b), sliceId (16b), MR Bits (32b)
Results Tbl: entry0, entry1, …, entryN
Results Entry: Fanout (8b), QID (20b), InterfaceId (8b), TxMI Id (16b), Tx NH Id (16b)
Interface Entry: Sched ID (8b), IP SAddr (32b), Eth DA (48b)
NH Entry: IP DAddr (32b), IP DPort (16b)

136 NPE Version 2 Block Diagram
[Block diagram]
NPUA: RxA (2 ME); Decap, Parse, LookupA, AddShim (8 MEs); TxA (2 ME); Stats (1 ME); TCAM; SRAM
NPUB: RxB (2 ME); LookupB & Copy (2 ME); Queue Manager (4 MEs); HdrFmt (4 MEs); TxB (2 ME); Stats (1 ME); SRAM
External: GPE; SPI Switch; Switch Blade (flow control?)
SHIM: resultIndex (32b), statsIndex (16b), exceptions (16b), sliceId (16b), codeOpt (8b)
Results Tbl: entry0, entry1, …, entryN
Results Entry: Fanout (8b), QID (20b), InterfaceId (8b), TxMI Id (16b), Tx NH Id (16b)

137 SPP V1 Lookup Result TCAM Status Bits Stored in TCAM
Lookup Result (128b): TCAM Status Bits Port (4b) QID(20b) DA(8b) Tx IP DAddr (32b) Cntr Index (16b) D 1b Reserved (11b) Tx UDP SPort(16b) Tx UDP DPort (16b) O N e H I t M L

138 ONL SRAM Buffer Descriptor
Still working on this
LW0: Buffer_Next (32b)
LW1: Buffer_Size (16b) | Offset (16b)
LW2: Packet_Size (16b) | Free_list 0000 (4b) | Reserved (4b) | Ref_Cnt (8b)
LW3: Stats Index (16b) | MAC DAddr_47_32 (16b)
LW4: MAC DAddr_31_00 (32b)
LW5: EtherType (16b) | Reserved (16b)
LW6: Reserved (32b)
LW7: Packet_Next (32b)
Ref_Cnt (8b): Written by Rx, Added to by Copy, Decremented by Freelist Mgr
Legend: Written by Freelist Mgr; Written by Rx; Written by Copy; Written by Rx and Plugins; Written by QM

139 Statistics LC provides counts on UDP ports
  Matching filter gives slice-specific stats-index which is updated for each packet handled by filter
Pre-queue/post-queue counters for each QiD
Memory space/bandwidth issues
  off-chip SRAMs support 200M 32 bit reads & writes per sec
  can have 16M 80 byte packets/sec, so one SRAM supports 12 reads/12 writes per packet
  for 250 byte packets, 36 reads/36 writes per packet
  updating stats for QiD takes 4 reads/4 writes
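The per-packet SRAM budget quoted above can be reproduced with a short calculation. This is a sketch under stated assumptions: the function name is hypothetical, the 200M figure is treated as the budget available in each direction (reads and writes), and the packet rate for larger packets is assumed to scale inversely with packet size from the 16M packets/sec quoted for 80-byte packets.

```python
def sram_accesses_per_packet(packet_bytes,
                             sram_ops_per_sec=200_000_000,
                             ref_packets_per_sec=16_000_000,
                             ref_packet_bytes=80):
    """Return how many 32-bit SRAM accesses are available per packet.

    The link's byte rate is derived from the reference point on the slide
    (16M packets/sec at 80 bytes); larger packets arrive proportionally
    less often, which leaves more SRAM accesses for each one.
    """
    link_bytes_per_sec = ref_packets_per_sec * ref_packet_bytes
    packets_per_sec = link_bytes_per_sec / packet_bytes
    return sram_ops_per_sec / packets_per_sec

budget_80 = sram_accesses_per_packet(80)
budget_250 = sram_accesses_per_packet(250)
# Fraction of the 80-byte budget used by a QiD stats update (4 accesses).
qid_share_80 = 4 / int(budget_80)
```

Straight division gives 12.5 accesses per 80-byte packet, which the slide rounds down to 12. For 250-byte packets the same division gives about 39; the slide's figure of 36 matches scaling the 12-access budget by a factor of 3, a slightly more conservative estimate. Either way, a QiD stats update at 4 reads/4 writes uses about a third of the budget for minimum-size packets.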

140 SPP Version 0
NPE: Rx (2ME), Substr Decap (1ME), Parse (1ME), Lookup, Header Format (1ME), QM (2ME), Tx (1ME)
LC Ingress and Egress:
  Ingress: Phy Int Rx, Key Extract, Lookup, Hdr Format, QM, Switch Tx
  SWITCH
  Egress: Switch Rx, Key Extract, Lookup, Hdr Format, QM, Phy Int Tx
Ideally no need to look at the actual code, if I’ve done my job right; this is a block design review, not a code review.
Encountered some performance discrepancies, and the problems they illustrate are likely to come up again in the future.
Would like to spend some time illustrating performance issues and demonstrating common code speed techniques.

141 NPE Version 2 Block Diagram
[Block diagram]
NPUA: RxA (2 ME); Decap, Parse, LookupA, AddShim (8 MEs); TxA (2 ME); Stats (1 ME); TCAM; SRAM
NPUB: RxB (2 ME); LookupB & Copy (2 ME); Queue Manager (4 MEs); HdrFmt (4 MEs); TxB (2 ME); Stats (1 ME); SRAM
External: GPE; SPI Switch; Switch Blade (flow control?)
SHIM: resultIndex (32b), statsIndex (16b), exceptions (16b), sliceId (16b), MR Bits (32b)

142 SPP V1 NAT Notes LC Ingress Lookup Key (72b):
  Interface (8b)
  IP DAddr (32b)
  Protocol (8b)
    TCP
    UDP
    ICMP
  DPort/Identifier (16b)
    DPort for TCP and UDP
    Identifier for ICMP Echo Request/Reply
  Type (8b)
    Primarily for use with ICMP to distinguish between ICMP Echo Request and Reply
    For TCP and UDP should be a Don’t Care.
Removed from V0 Lookup Key:
  SL (4b): Substrate Link type
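The 72-bit key above can be sketched as a packed value plus a compare mask. The function names, the most-significant-first field order, and the mask convention (a 1 bit means "compare", a 0 bit means "Don't Care") are assumptions for illustration and are not taken from the TCAM's actual configuration. The protocol numbers are the standard IP values (ICMP=1, TCP=6, UDP=17).

```python
import ipaddress

PROTO_ICMP, PROTO_TCP, PROTO_UDP = 1, 6, 17  # standard IP protocol numbers

def pack_lc_ingress_key(interface, ip_daddr, protocol, dport_or_id, icmp_type=0):
    """Pack Interface(8) | IP DAddr(32) | Protocol(8) | DPort/Id(16) | Type(8)."""
    daddr = int(ipaddress.IPv4Address(ip_daddr))
    key = ((interface & 0xFF) << 64
           | daddr << 32
           | (protocol & 0xFF) << 24
           | (dport_or_id & 0xFFFF) << 8
           | (icmp_type & 0xFF))
    return key.to_bytes(9, "big")  # 72 bits = 9 bytes

def lc_ingress_key_mask(protocol):
    """Return a 9-byte mask; the Type byte is Don't Care for TCP and UDP."""
    mask = (1 << 72) - 1
    if protocol in (PROTO_TCP, PROTO_UDP):
        mask &= ~0xFF  # clear the low 8 bits (Type field)
    return mask.to_bytes(9, "big")

# ICMP packet with Type=0 and Identifier 0xABCD arriving on interface 1.
icmp_key = pack_lc_ingress_key(1, "10.0.0.1", PROTO_ICMP, 0xABCD, icmp_type=0)
udp_mask = lc_ingress_key_mask(PROTO_UDP)
icmp_mask = lc_ingress_key_mask(PROTO_ICMP)
```

The field widths add up to 8 + 32 + 8 + 16 + 8 = 72 bits, matching the key size in the slide title.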

143 SPP V1 NAT Notes LC Ingress Lookup Result (72b):
  VLAN (12b)
  Stats Index (16b)
  MAC Addr (8b)
  QID (20b)
  Translated DPort/Identifier (16b)
Removed from V0 Lookup Result:
  Interface (4b): indicated which RTM port to send pkt out on.
  Since we won’t be using the RTM between LC and NPE/GPE we don’t need this.
  We will need some indication of a port so the QM can operate as if it has 10 ports and queue appropriately. But we can do that with some bits from the QID.
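The 72-bit result above can be unpacked in the same spirit. This is a sketch only: the names are hypothetical and the fields are assumed to be laid out most-significant-first in the order the slide lists them.

```python
from typing import NamedTuple

class LcIngressResult(NamedTuple):
    vlan: int             # 12b
    stats_index: int      # 16b
    mac_addr: int         # 8b
    qid: int              # 20b
    translated_port: int  # 16b, translated DPort or ICMP Identifier

def unpack_lc_ingress_result(raw: bytes) -> LcIngressResult:
    """Split a 9-byte (72-bit) lookup result into its five fields."""
    if len(raw) != 9:
        raise ValueError("expected a 72-bit (9-byte) result")
    value = int.from_bytes(raw, "big")
    return LcIngressResult(
        vlan=(value >> 60) & 0xFFF,
        stats_index=(value >> 44) & 0xFFFF,
        mac_addr=(value >> 36) & 0xFF,
        qid=(value >> 16) & 0xFFFFF,
        translated_port=value & 0xFFFF,
    )

sample_result = unpack_lc_ingress_result(bytes.fromhex("123456789abcdef012"))
```

The widths add up to 12 + 16 + 8 + 20 + 16 = 72 bits. If some QID bits are used to indicate a port, as the slide suggests, they would be extracted from the `qid` field after this step.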

144 Traffic Examples: ICMP Echo Request
[Figure: SPP system diagram showing the LC (ingress/egress mux with TCAM), NPE, GPEs, and CP, connected by the 10GbE (fabric, data) and 1GbE (base, control) switches, with an external interface to fabric and base for additional GPEs.]
Packet shown: SAddr, DAddr, Proto=ICMP, Type=0, ID=0xABCD
TCAM lookup: HIT!
Send pkt back to LCE

145 Traffic Examples: ICMP Echo Reply
[Figure: SPP system diagram showing the LC (ingress/egress mux with TCAM), NPE, GPEs, and CP, connected by the 10GbE (fabric, data) and 1GbE (base, control) switches, with an external interface to fabric and base for additional GPEs.]
Packet shown: SAddr, DAddr, Proto=ICMP, Type=8, ID=0xABCD
TCAM lookup: MISS!
GPE should not receive an ICMP Echo Request, so it should not be sending out an ICMP Echo Reply.

146 QM Scheduler Performance

147 QM Scheduler Performance

