David M. Zar Block Design Review: PlanetLab Line Card Header Format.


David M. Zar - 3/8/2016
Revision History
10/31/06 (DMZ):
»Initial Draft
11/04/06 (DMZ):
»Updates for performance issues

Line Card Centric Overview
[Diagram: ingress and egress line card pipelines between the physical interface (Phy Int) and the Switch. Blocks shown in each direction: Rx, Key Extract, Lookup, Hdr Format, Port Splitter, QM/Schd, Tx.]
Port Splitter (Ingress and Egress):
»Accepts packets on a NN ring
»Dispatches based on the physical destination port number:
Ø Ports 0-4 go to QM1 on a scratch ring
Ø Ports 5-9 go to QM2 on a scratch ring
»Measured delay is about 120 cycles, including memory latency
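The Port Splitter rule can be sketched in a few lines of Python. This is only an illustration of the 0-4 / 5-9 split stated on the slide; the function name, the ring labels, and the error handling for out-of-range ports are assumptions, and the real block is microengine code.

```python
# Sketch of the Port Splitter decision. Ring labels are hypothetical;
# only the 0-4 / 5-9 port split is taken from the slide.
QM1_PORTS = range(0, 5)   # physical ports 0-4
QM2_PORTS = range(5, 10)  # physical ports 5-9


def select_qm_ring(dest_port: int) -> str:
    """Return the label of the scratch ring for a physical destination port."""
    if dest_port in QM1_PORTS:
        return "QM1"
    if dest_port in QM2_PORTS:
        return "QM2"
    # The slide does not say what happens for other port numbers;
    # raising here is an assumption made for this sketch.
    raise ValueError(f"physical port {dest_port} is outside 0-9")
```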

Ingress Header Format

Microengine Usage:
»One microengine
»Eight identical threads
»NN ring input from Lookup
»NN ring output to Port Splitter
Main functions:
»Using data from Lookup, modify packet header in DRAM for proper routing to PE:
Ø Destination MAC address: first five bytes are the same as the source MAC address
Ø Source MAC address: address of this LC
Ø VLAN tag
»Adjust pre-queue stats counters
»Format input data for QM:
Ø QID
Ø Port Number
Ø Ethernet Frame Length

LC Ingress Functional Blocks
[Diagram: LC ingress pipeline (Phy Int, Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Tx, Switch) with the data formats entering and leaving Hdr Format. Field lists below are in the order recovered from the slide.]
Data from Lookup:
»Buf Handle (32b), IP Pkt Length (16b), QID (20b), VLAN (16b), Stats Index (16b), DAddr (8b), Port (4b), Reserved (8b), Eth Hdr Len (8b)
Data to QM:
»Buffer Handle (32b), Stats Index (16b), Frame Length (16b), QID (20b), Rsv (4b), Port (4b), Rsv (4b)
Possible Input Packet Formats:
»VLAN-tagged Ethernet header: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), VLAN (2B), Type=IP (2B)
»Untagged Ethernet header: DstAddr (6B), SrcAddr (6B), Type=IP (2B)
Output Packet Format:
»VLAN-tagged Ethernet header: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), VLAN (2B), Type=IP (2B)
All formats continue with:
»IP Header: Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol = UDP (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Options (0-40B)
»UDP Header: Src Port (2B), Dst Port (2B), UDP length (2B), UDP checksum (2B)
»UDP Payload (MN Packet)
»Ethernet Trailer: PAD (nB), CRC (4B)

MAC Address and VLAN Tag (Ingress)
The source MAC address is fixed and set at boot time (_WU_get_mac_address).
The destination MAC address will only differ in the last byte, and this byte is obtained from the Lookup data.
The VLAN tag is obtained from the Lookup data.
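As a rough illustration of the ingress rewrite, the sketch below derives the destination MAC from the source MAC plus the 8-bit DAddr value and assembles the 18-byte VLAN Ethernet header. The function names are hypothetical and are not the _WU_ macros; the EtherType constants are the standard 802.1Q and IPv4 values, and treating the 16-bit VLAN field from Lookup as the complete tag word is an assumption.

```python
import struct

ETHERTYPE_8021Q = 0x8100  # standard 802.1Q tag protocol identifier
ETHERTYPE_IPV4 = 0x0800   # standard EtherType for IPv4


def ingress_dest_mac(source_mac: bytes, daddr: int) -> bytes:
    """Destination MAC: first five bytes of the source MAC, last byte from Lookup."""
    if len(source_mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    if not 0 <= daddr <= 0xFF:
        raise ValueError("DAddr must fit in 8 bits")
    return source_mac[:5] + bytes([daddr])


def build_vlan_header(source_mac: bytes, daddr: int, vlan: int) -> bytes:
    """Assemble DstAddr, SrcAddr, Type=802.1Q, VLAN, Type=IP (18 bytes total)."""
    dest_mac = ingress_dest_mac(source_mac, daddr)
    # Assumption: the 16-bit VLAN value from Lookup is written as-is.
    return struct.pack("!6s6sHHH", dest_mac, source_mac,
                       ETHERTYPE_8021Q, vlan & 0xFFFF, ETHERTYPE_IPV4)
```

The header length produced here (6 + 6 + 2 + 2 + 2 bytes) matches the 18-byte VLAN Ethernet header length used for the byte counters.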

Stats/Counters (Ingress/Egress)
The Stats Index is obtained from the Lookup data.
The pre-queue packet and byte counters are updated (_WU_update_counters):
»Packet counter is incremented (atomic SRAM)
»Byte count is incremented by the number of bytes in the entire Ethernet frame (_WU_get_enet_frame_length):
Ø Frame_length = IP_pkt_len + 18
Ø 18 is the VLAN Ethernet header length
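The counter update can be modeled as follows. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, the counters live in Python dictionaries rather than SRAM, and the atomicity of the real SRAM operations is not modeled. Only the Frame_length = IP_pkt_len + 18 rule comes from the slide.

```python
from collections import defaultdict

VLAN_ETH_HEADER_LEN = 18  # DstAddr 6 + SrcAddr 6 + 802.1Q type 2 + VLAN 2 + IP type 2


def enet_frame_length(ip_pkt_len: int) -> int:
    """Frame_length = IP_pkt_len + 18, as stated on the slide."""
    return ip_pkt_len + VLAN_ETH_HEADER_LEN


class PreQueueCounters:
    """Hypothetical stand-in for the SRAM counter structure, keyed by Stats Index."""

    def __init__(self) -> None:
        self.packets = defaultdict(int)
        self.bytes = defaultdict(int)

    def update(self, stats_index: int, ip_pkt_len: int) -> int:
        """Count one packet and its full Ethernet frame length; return that length."""
        frame_len = enet_frame_length(ip_pkt_len)
        self.packets[stats_index] += 1
        self.bytes[stats_index] += frame_len
        return frame_len
```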

QM Data Formatting (Ingress and Egress)
QID is extracted from Lookup data.
Port number is extracted from Lookup data.
Total Ethernet frame length is passed to QM.
Stats index is passed on for post-queue counters.
Data to QM: Stats Index (16b), Buffer Handle (32b), Frame Length (16b), QID (20b), Rsv (4b), Port (4b), Rsv (4b)
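The field widths above add up to 96 bits, i.e. three 32-bit words. The sketch below packs them that way. The slide gives the widths but its word order and bit positions did not survive extraction, so the layout used here (buffer handle in word 0, Stats Index above Frame Length in word 1, QID in the top 20 bits of word 2) is an assumption, as is the function name.

```python
def _check(name: str, value: int, bits: int) -> None:
    """Reject values that do not fit in the stated field width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits")


def pack_qm_words(buffer_handle: int, stats_index: int, frame_length: int,
                  qid: int, port: int) -> tuple[int, int, int]:
    """Pack the QM input fields into three 32-bit words (assumed layout)."""
    _check("buffer_handle", buffer_handle, 32)
    _check("stats_index", stats_index, 16)
    _check("frame_length", frame_length, 16)
    _check("qid", qid, 20)
    _check("port", port, 4)
    word0 = buffer_handle
    word1 = (stats_index << 16) | frame_length
    # QID[31:12], Rsv[11:8] = 0, Port[7:4], Rsv[3:0] = 0
    word2 = (qid << 12) | (port << 4)
    return word0, word1, word2
```

Reserved fields are written as zero in this sketch.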

Ingress HF Block Diagram
[Diagram: per-thread processing loop of the ingress Hdr Format microengine. Recoverable labels follow; the pairing of the remaining cycle annotations (17, 16, 10, 5, 2, 1) with individual blocks is lost.]
Dispatch-loop labels: init, signal, dl_source() (NN dequeue), dl_sink() (NN enqueue), wait for prev ctx, signal next ctx
Processing macros: _WU_get_enet_frame_length, _WU_write_vlan_header, _WU_update_counters, _WU_update_buffer_descriptor
Memory accesses:
»DRAM: 4|5 4B writes (26 cycles)
»SRAM: 1 read, 1 write (10 cycles)
»SRAM: 3 writes (12 cycles)
Total cycles: 33+66=99
Budget: 1400 MHz/(10 Gb/s/(8*90)) => 100 cycles
Measured latency: 745
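The budget figure can be reproduced with a short calculation. Reading the 90 in the slide's formula as a packet size in bytes is an assumption (the slide gives only the number), and the function name is hypothetical.

```python
def cycle_budget(clock_hz: float, line_rate_bps: float, packet_bytes: int) -> float:
    """Cycles available per packet on one microengine at full line rate."""
    packets_per_second = line_rate_bps / (8 * packet_bytes)
    return clock_hz / packets_per_second


# 1400 MHz clock, 10 Gb/s line rate, 90-byte packets (assumed meaning of "90")
budget = cycle_budget(1.4e9, 10e9, 90)
used = 33 + 66  # total cycles reported on the slide
headroom = budget - used
```

With these inputs the budget comes out just above 100 cycles, so the reported 99 cycles leaves under two cycles of headroom per packet.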

Ingress Validation
Send in non-tunneled packets and check the output packets to see that they are our internal, tunneled packets.
»Worked during development but not tested in integrated system at this point.
Send in tunneled packets and check the output packets to see that they are our internal, tunneled packets.
»Example input: a0b0c 81000aaa ff11 3a61c0a8 0001c0a ffbd c ff11 3a7dc0a8 0001c0a e87 [6d7e d5be] (the bracketed CRC is stripped by RX)
»Output: a a0b ff11 3a61c0a8 0001c0a ffbd c ff11 3a7dc0a8 0001c0a e87

Egress Header Format

Microengine Usage:
»One microengine
»Eight identical threads
»NN ring input from Lookup
»NN ring output to Port Splitter
Main functions:
»Using data from Lookup, modify packet header in DRAM for proper routing to Switch:
Ø Destination MAC address: first five bytes are the same as the source MAC address; the rest is looked up based on the IP address from Lookup
Ø Source MAC address: address of this LC
Ø VLAN tag
»Adjust pre-queue stats counters
»Format input data for QM:
Ø QID
Ø Port Number
Ø Ethernet Frame Length

LC Egress Functional Blocks
[Diagram: LC egress pipeline (Switch, Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Tx, Phy Int) with the data formats entering and leaving Hdr Format. Field lists below are in the order recovered from the slide.]
Data from Lookup:
»Buf Handle (32b), IP Pkt Length (16b), Reserved (8b), Eth Hdr Len (8b), VLAN (12b), QID (20b), Rsvd (4b), Port (4b), Rsvd (4b), Stats Index (16b), Rsvd (4b), IP DAddr (32b)
Data to QM:
»Ethernet Frame Length (16b), Buffer Handle (32b), Stats Index (16b), QID (20b), Rsv (4b), Port (4b), Rsv (4b)
Input Packet Format and Output Packet Format (same layout):
»Ethernet Header: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), VLAN (2B), Type=IP (2B)
»IP Header: Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol = UDP (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Options (0-40B)
»UDP Header: Src Port (2B), Dst Port (2B), UDP length (2B), UDP checksum (2B)
»UDP Payload (MN Packet)
»Ethernet Trailer: PAD (nB), CRC (4B)

MAC Address and VLAN Tag (Egress)
The source MAC address is fixed and set at boot time (_WU_get_mac_address).
The destination MAC address will only differ in the last nibble, and this nibble is obtained from the Lookup data.
»_WU_ip_lookup will take 32 bits from the destination IP address and use the local CAM to obtain the least significant 4 bits of the MAC address.
»The CAM state bits are used for this, which is why only 4 bits of data are returned.
The VLAN tag is obtained from the Lookup data.
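The egress lookup can be modeled with a dictionary standing in for the local CAM. This is a sketch, not the _WU_ip_lookup macro: the function name is hypothetical, and the behavior on a CAM miss (a KeyError here) is an assumption because the slide does not describe it.

```python
def egress_dest_mac(source_mac: bytes, dest_ip: int, cam: dict[int, int]) -> bytes:
    """Replace the low nibble of the source MAC's last byte with the CAM result."""
    if len(source_mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    # A miss raises KeyError; the real block's miss handling is not documented.
    nibble = cam[dest_ip] & 0xF
    last_byte = (source_mac[5] & 0xF0) | nibble
    return source_mac[:5] + bytes([last_byte])
```

For example, with a source MAC ending in 0x10 and a CAM entry mapping 192.168.0.1 (0xC0A80001) to 0x5, the destination MAC ends in 0x15.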

Egress HF Block Diagram
[Diagram: per-thread processing loop of the egress Hdr Format microengine. Recoverable labels follow; the pairing of the remaining cycle annotations (10, 2, 1) with individual blocks is lost.]
Dispatch-loop labels: init, signal, dl_source() (NN dequeue), dl_sink() (NN enqueue), wait for prev ctx, signal next ctx
Processing macros: _WU_get_enet_frame_length, _WU_write_vlan_header, _WU_update_counters, _WU_update_buffer_descriptor, _WU_ip_lookup
Memory accesses:
»DRAM: 1 4B read, 4 4B writes (32 cycles)
»SRAM: 1 add, 1 incr (6 cycles)
»SRAM: 3 writes (10 cycles)
Total cycles: 65
Measured latency *: ~660

Egress Validation
Send in our internal, tunneled packets and check the output packets to see that they are our valid IP, tunneled packets.
»For the PlanetLab demo, there are no non-tunneled output packets
Check packet and byte counters for valid updates.
Check CAM for proper initialization (data watch).

HF Initialization (Ingress/Egress)
All memory locations are defined in dl_system.h:
»Base address for HF: LC[I/E]_HF_SRAM_INIT_BASE
Ø MAC_ADDR_HI32
Ø MAC_ADDR_LO16
»Pre-Queue Counters: LC[I/E]_LU_COUNTERS_SRAM_INIT_BASE
Ø LC[I/E]_LU_PRE_Q_PKT_CNT_OFFSET: offset into counters structure for packet counter
Ø LC[I/E]_LU_PRE_Q_BYTE_CNT_OFFSET: offset into counters structure for byte counter
Thread 0 waits for signal from rx.
For Egress, the CAM is filled (_WU_hfe_initialize_ip_lookup) with data from LCE_HF_SRAM_INIT_BASE + 8. Each entry is 64 bits: cam_entry (32b), RSVD (28b), MAC_DEST (4b).
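The 64-bit CAM initialization entries could be decoded along these lines. The sketch assumes big-endian words with cam_entry in the first 32-bit word and MAC_DEST in the low 4 bits of the second; the slide gives only the field widths, so the byte order and field placement are assumptions, and the function name is hypothetical.

```python
import struct


def parse_cam_init(table: bytes) -> dict[int, int]:
    """Decode 64-bit entries: cam_entry (32b), RSVD (28b), MAC_DEST (4b)."""
    if len(table) % 8 != 0:
        raise ValueError("table length must be a multiple of 8 bytes")
    entries = {}
    for offset in range(0, len(table), 8):
        # Assumed layout: first word is the CAM tag, second word holds
        # 28 reserved bits followed by the 4-bit MAC_DEST value.
        cam_entry, low_word = struct.unpack_from("!II", table, offset)
        entries[cam_entry] = low_word & 0xF
    return entries
```

The resulting mapping from 32-bit CAM tag to 4-bit MAC_DEST is the kind of table the egress MAC lookup consults.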

File Locations (Ingress and Egress)
Main code:
»Applications/LC_Ingress/src/hdr_format/PL/hdr_format.uc
»Applications/LC_Egress/src/hdr_format/PL/hdr_format.uc
Library:
»library/DataPlane/hdr_format_util.uc

Required Includes (Ingress and Egress)
Files:
»build/PL/dispatch_loop/dl_system.h: memory locations
»IXA_SDK_4.0/src/library/microblocks_library/
Ø dl_meta: for metadata macros
»IXA_SDK_4.0/src/library/dataplane_library/
Ø dram: for DRAM read/write macros
Ø sram: for SRAM read/write/add/incr macros
Ø xbuf: for transfer buffer macros

Performance Issues

Ingress Performance Anomalies
These stalls are in various SRAM and DRAM accesses: the command FIFO is FULL!

Ingress Anomalies (Explanation)
These bus arbiters are shared across all memory interfaces.
The SRAM Controllers have a command FIFO.

Ingress/Egress SRAM Issues
It seems that using atomic ADD/INCR instructions is expensive at the SRAM controller.
If I remove them and instead read the SRAM, add myself, and write the SRAM, this is quicker and consumes less of the SRAM controller's time and, thus, the command queue never backs up.
In this new design, there are more instructions executed, but there may be a few I could eliminate with some optimizing of code.
No stalling in the WU microblocks (well, QM does, and RX and TX still do, but these look normal).

Ingress/Egress Performance
~99 CPU cycles
~745 cycles latency
Expected performance:
»Should have no trouble going at 10 Gb/s, but does…
Simulated performance (as of 11/06/2006):
»~10 Gb/s
»With all other microengines in place (i.e. real simulation)

Future Work

Ingress/Egress Future Work
Determine source of I/O stalls.
Update Stubs projects for validation of Ingress/Egress blocks (done for Ingress).
Extend both blocks for all possible packet formats:
»Ingress: inputs
»Egress: outputs
Possible instruction optimization to give a little headroom (99 cycles out of 100).
Currently, the design will not work for standard IPv4 packets; PlanetLab VLAN packets are OK.