Published byΑφροδίσιος Βιτάλη Modified over 6 years ago
1
Code Review for IPv4 Metarouter Header Format
Jing Lu
2
Header Format
Pipeline: Rx -> Substr Decap -> Parse -> Lookup -> Header Format -> QM -> Tx
Main functions:
- Put on the MN internal header (slow path), the tunnel frame header (IP/UDP header) and the Ethernet VLAN header, based on:
  - Exception flags raised by the Parse block
    - TTL expired: bit 0 of exception flags
    - IP option: bit 1 of exception flags
  - Lookup result: Hit, Drop, Local delivery bits
  - If Rx UDP DPort = Tx UDP SPort, the packet should be redirected
- Increment the pre-queue packet counter and byte counter for each incoming packet, based on counter index
- Update the buffer descriptor with the new buffer/packet size, buffer offset and counter index
- Pass relevant fields to QM
- NN communication
- Single thread: handle the substrate header portion of a packet
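The exception and redirect tests listed above could be sketched in C as follows. The bit positions (bit 0 for TTL expired, bit 1 for IP option) and the redirect rule come from the slide; the macro and function names are hypothetical and are not taken from hdr_format.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions stated on the slide: bit 0 = TTL expired, bit 1 = IP option. */
#define EXC_TTL_EXPIRED (1u << 0)
#define EXC_IP_OPTION   (1u << 1)

/* Hypothetical helper names; the real code in hdr_format.c may differ. */
static bool exc_ttl_expired(uint32_t exc_flags)
{
    return (exc_flags & EXC_TTL_EXPIRED) != 0;
}

static bool exc_ip_option(uint32_t exc_flags)
{
    return (exc_flags & EXC_IP_OPTION) != 0;
}

/* Redirect rule from the slide: Rx UDP DPort equals Tx UDP SPort. */
static bool needs_redirect(uint16_t rx_udp_dport, uint16_t tx_udp_sport)
{
    return rx_udp_dport == tx_udp_sport;
}
```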
3
Where is the code
Dispatch loop:
  IPv4_MR\src\dispatch_loop\PL\hdr_format_dl.[c,h]
  IPv4_MR\src\dispatch_loop\PL\dl_source.[c,h]
  IPv4_MR\src\dispatch_loop\PL\nn_rings.[c,h]
Header format:
  IPv4_MR\src\hdr_format\PL\hdr_format.[c,h]
IPv4 header format:
  IPv4_MR\src\ipv4\PL\ipv4_hdr_format.[c,h]
External dependencies:
  Ring data format: IPv4_MR/src/dispatch_loop/PL/ring_formats.h
  System definitions and memory locations: IPv4_MR/build/PL/dispatch_loop/dl_system.h
4
Required Includes
Files:
  IXA_SDK_4.0\microengineC\src\intrinsic.c
  IXA_SDK_4.0\microengineC\src\rtl.c
Directories:
  IXA_SDK_4.0\src\library\microblocks_library\microc\
  IXA_SDK_4.0\MicroengineC\include\..\..\..\..\
  IXA_SDK_4.0\src\library\dataplane_library\microc\
These are required to gain access to the buffer libraries and intrinsic functions!
5
Input and Output
Input ring data (Lookup -> Hdr Format), one 32-bit word per line:
  Buf Handle (32b)
  IP Pkt Length (16b) | IP Pkt Offset (16b)
  Slice ID (VLAN) (16b) | Rx UDP DPort (16b)
  Rsvd (1b) | H (1b) | LD (1b) | D (1b) | Exception Bits (12b) | Cntr Index (16b)
  Tx IP DAddr (32b)
  Tx UDP DPort (16b) | Tx UDP SPort (16b)
  DA (8b) | Port (4b) | QID (20b)
  Slice data pointer (32b)
  Rx UDP SPort (16b) | Rsv2 (12b) | Code opt (4b)
  Rx IP SAddr (32b)
Flag legend: H: Hit, D: Drop, LD: Local Delivery, Exception[0]: TTL, Exception[1]: IP Option
Output ring data (Hdr Format -> QM), one 32-bit word per line:
  Buf Handle (32b)
  Rsv_1 (4b) | Port (4b) | Rsv_2 (4b) | QID (20b)
  MN Frame Length (16b) | Cntr Index (16b)
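As an illustration of the ring formats above, the flag word coming from Lookup could be unpacked, and the Port/QID word going to QM packed, roughly like this. The slide gives only field widths; the assumption here is that fields are packed most-significant-bit first in the order listed, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of the word
 * Rsvd(1) | H(1) | LD(1) | D(1) | Exception Bits(12) | Cntr Index(16).
 * Assumption: the first field listed sits in the most significant bits. */
typedef struct {
    bool     hit;
    bool     local_delivery;
    bool     drop;
    uint16_t exceptions;   /* 12 bits; bit 0 = TTL, bit 1 = IP option */
    uint16_t cntr_index;
} lookup_flags;

static lookup_flags unpack_flags_word(uint32_t w)
{
    lookup_flags f;
    f.hit            = ((w >> 30) & 1u) != 0;
    f.local_delivery = ((w >> 29) & 1u) != 0;
    f.drop           = ((w >> 28) & 1u) != 0;
    f.exceptions     = (uint16_t)((w >> 16) & 0x0FFFu);
    f.cntr_index     = (uint16_t)(w & 0xFFFFu);
    return f;
}

/* Output word Rsv_1(4) | Port(4) | Rsv_2(4) | QID(20); reserved bits left 0. */
static uint32_t pack_port_qid(uint32_t port, uint32_t qid)
{
    return ((port & 0xFu) << 24) | (qid & 0xFFFFFu);
}
```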
6
Initialization
Static configuration by XScale
Control block (12B):
- Ethernet address
- IP address (global IP)
Slice info table per slice (36B):
- GPE IP address (local IP)
- NPE IP address (local IP)
- GPE Ethernet address
- UDP SRC port
- UDP DST port
- Port
- QID for local delivery
- QID for exception packets

typedef struct _hdr_format_control_block {
    unsigned int eth_addr_hi32;
    unsigned int eth_addr_lo16;
    unsigned int this_ip_addr;
} hdr_format_control_block;

typedef struct _hdr_format_slice_info_table {
    unsigned int gpe_ip_addr;
    unsigned int npe_ip_addr;
    unsigned int gpe_eth_addr_hi32;
    unsigned int gpe_eth_addr_lo16;
    unsigned int udp_src_port;
    unsigned int udp_dst_port;
    unsigned int port;
    unsigned int ld_qid;
    unsigned int excpt_qid;
} hdr_format_slice_info_table;
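The stated sizes (12B and 36B) follow from three and nine 32-bit words. A minimal host-side sketch that checks this and expands the split Ethernet address into six bytes is shown below. It uses uint32_t in place of unsigned int, mirror struct names with a _sketch suffix that are hypothetical, and assumes the lo16 word carries the last two address bytes in its low 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side mirrors of the slide's structs (hypothetical names). */
typedef struct {
    uint32_t eth_addr_hi32;
    uint32_t eth_addr_lo16;
    uint32_t this_ip_addr;
} control_block_sketch;

typedef struct {
    uint32_t gpe_ip_addr, npe_ip_addr;
    uint32_t gpe_eth_addr_hi32, gpe_eth_addr_lo16;
    uint32_t udp_src_port, udp_dst_port;
    uint32_t port, ld_qid, excpt_qid;
} slice_info_sketch;

static_assert(sizeof(control_block_sketch) == 12, "control block is 12B");
static_assert(sizeof(slice_info_sketch) == 36, "slice info entry is 36B");

/* Assumption: hi32 holds the first four bytes of the MAC address and the
 * low 16 bits of lo16 hold the last two. */
static void eth_addr_bytes(uint32_t hi32, uint32_t lo16, uint8_t out[6])
{
    out[0] = (uint8_t)(hi32 >> 24);
    out[1] = (uint8_t)(hi32 >> 16);
    out[2] = (uint8_t)(hi32 >> 8);
    out[3] = (uint8_t)hi32;
    out[4] = (uint8_t)(lo16 >> 8);
    out[5] = (uint8_t)lo16;
}
```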
7
Global Variables
Externally defined global variables (in hdr_format_dl.c):
- ring_in
- ring_out
- dlNextBlock
Initialization variables shared by all threads (in hdr_format.c):
- this_ip_addr
- eth_addr_hi32
- eth_addr_lo16
- partial_ip_cksum (computed on known IP header fields)
header_format_init() reads the control block in SRAM and initializes these variables.
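The idea behind partial_ip_cksum can be sketched as below: sum the IP header words that never change once at init time, then add the per-packet fields and fold. This is only an illustration of the technique. Which fields are actually constant in the real code is not stated on the slide, so the constant values here (TOS 0, ID 0, DF set, TTL 64) are assumptions, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Add a 32-bit value to a running one's-complement sum as two 16-bit words. */
static uint32_t csum_add32(uint32_t sum, uint32_t v)
{
    return sum + (v >> 16) + (v & 0xFFFFu);
}

/* Fold carries back into 16 bits and complement. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Done once at init: sum over fields assumed identical for every packet. */
static uint32_t partial_ip_cksum_init(uint32_t src_ip)
{
    uint32_t sum = 0;
    sum += 0x4500u;            /* version 4, header length 5 words, TOS 0 */
    sum += 0x0000u;            /* ID (assumed 0) */
    sum += 0x4000u;            /* flags/fragment offset: DF set (assumed) */
    sum += (64u << 8) | 17u;   /* TTL 64 (assumed), protocol 17 = UDP */
    return csum_add32(sum, src_ip);
}

/* Done per packet: add the varying fields and finish the checksum. */
static uint16_t ip_cksum_finish(uint32_t partial, uint16_t total_len,
                                uint32_t dst_ip)
{
    uint32_t sum = partial + total_len;
    return csum_fold(csum_add32(sum, dst_ip));
}
```

The per-packet work is then three additions and a fold instead of a pass over all ten header words.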
8
Header Data Structure
Legend (shading on the original slide): same for all pkts / vary per pkt
Ethernet VLAN Header (18B):
  DstAddr (6B)
  SrcAddr (6B)
  Type=802.1Q (2B)
  VLAN (2B)
  Type=IP (2B)
IP Header (20B):
  Ver/HLen/Tos (2B)
  Len (2B)
  ID/Flags/FragOffset (4B)
  TTL (1B)
  Protocol = UDP (1B)
  Hdr Cksum (2B)
  Src Addr (4B)
  Dst Addr (4B)
UDP Header (8B):
  Src Port (2B)
  Dst Port (2B)
  UDP length (2B)
  UDP checksum (2B)
MN Internal Header (8,16B):
  Rsvd, Type (4B)
  hdr_length (2B)
  Rx UDP DPort (2B)
  Rx IP SAddr (4B)
  Rx UDP SPort (2B)
  Type dependent data (8B)
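As a small illustration of the layout above, the 18-byte Ethernet VLAN header could be assembled as follows. The EtherType values 0x8100 (802.1Q) and 0x0800 (IPv4) are the standard ones; the function name is hypothetical, and the 16-bit VLAN argument is written as given, without splitting it into priority and VLAN ID. The static assertion also shows that 18 + 20 + 8 = 46 bytes, which matches the lower bound of the DRAM write size quoted on the performance slide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_VLAN_HDR_LEN 18
#define IP_HDR_LEN       20
#define UDP_HDR_LEN       8

static_assert(6 + 6 + 2 + 2 + 2 == ETH_VLAN_HDR_LEN, "Ethernet VLAN header");
static_assert(ETH_VLAN_HDR_LEN + IP_HDR_LEN + UDP_HDR_LEN == 46,
              "tunnel headers without MN internal header");

/* Hypothetical helper: writes the header in network byte order. */
static void build_eth_vlan_hdr(uint8_t out[ETH_VLAN_HDR_LEN],
                               const uint8_t dst[6], const uint8_t src[6],
                               uint16_t vlan)
{
    memcpy(out, dst, 6);               /* DstAddr */
    memcpy(out + 6, src, 6);           /* SrcAddr */
    out[12] = 0x81; out[13] = 0x00;    /* Type = 802.1Q */
    out[14] = (uint8_t)(vlan >> 8);    /* VLAN tag, high byte */
    out[15] = (uint8_t)(vlan & 0xFF);  /* VLAN tag, low byte */
    out[16] = 0x08; out[17] = 0x00;    /* Type = IP */
}
```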
9
Function and Performance
Functions                                Memory access          Processing cycles (common case/worst case)
Dequeue ring_in data                     NN: 9W reads           42/42
Construct MN int hdr                                            44/86
Construct IP, UDP, Ethernet, VLAN hdr                           64/73
Set IP checksum                                                 12/12
Set UDP checksum                                                11/11
Write hdr to DRAM                        DRAM: 46-58B writes    37/40
Inc Pre_queue Cnt                        SRAM: 8B writes        15/15
Update buffer descriptor                 SRAM: 10B writes       66/66
Enqueue ring_out data                    NN: 3W writes          27/27
Total                                                           318/372
10
Performance
- 372 cycles for CPU processing
- ~1300 cycles latency
- Expected performance (90B min IPv4 packet (78B min IPv4 MN + 12B IFS)): (201/372)*5Gbps = 2.7Gbps
- To achieve 5Gbps, need two MEs running in parallel
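The arithmetic on this slide can be reproduced with a few lines of C. The slide does not say where 201 comes from; it is consistent with the cycle budget of a 90B packet at 5Gbps on a 1.4 GHz microengine clock (144 ns per packet), but that clock rate is an assumption here, not something the slide states. Function names are hypothetical.

```c
#include <assert.h>

/* Cycles available per packet: wire time in ns times clock in cycles/ns. */
static double cycle_budget(double pkt_bytes, double line_gbps, double clock_ghz)
{
    double ns_per_pkt = pkt_bytes * 8.0 / line_gbps;  /* bits / (bits per ns) */
    return ns_per_pkt * clock_ghz;
}

/* Throughput of one ME that needs cycles_per_pkt but has budget_cycles. */
static double achievable_gbps(double budget_cycles, double cycles_per_pkt,
                              double line_gbps)
{
    return budget_cycles / cycles_per_pkt * line_gbps;
}

/* Integer ceiling of cycles_per_pkt / budget_cycles. */
static unsigned mes_needed(unsigned cycles_per_pkt, unsigned budget_cycles)
{
    return (cycles_per_pkt + budget_cycles - 1) / budget_cycles;
}
```

With the slide's numbers, achievable_gbps(201, 372, 5) gives about 2.7, and mes_needed(372, 201) gives 2, matching the conclusion that two MEs are needed.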
11
IPv4 Internal Header Format
Header fields:
  Type (28b) | 0000
  Length (2B)
  Rx UDP DPort (2B)
  Rx IP SAddr (4B)
  Rx UDP SPort (2B)
  Type Dependent Data (8B): Tx UDP DPort (2B), Tx IP DAddr (4B), Tx UDP SPort (2B)

Type field by path (Path, Category: [Type field bit] Reason: Outgoing MN Internal Hdr):
GPE->NPE:
  [0] Reclassify: Rx UDP DPort if set, otherwise Rx UDP DPort + FwdKey
NPE->Egress LC, Fast path:
  No MN Int Hdr
NPE->GPE, Exception:
  [2] No route: Rx UDP DPort
  [3] Expired TTL: Rx UDP DPort
  [4] IP w/ options: Rx UDP DPort + FwdKey
  [5] Redirect due to Rx UDP DPort = Tx UDP SPort: Rx UDP DPort + FwdKey
NPE->GPE, Control:
  [6] Local delivery
  [7] Inspect
NPE->GPE, Debug:
  [8] Monitor
  [9] Log due to error in pkts

FwdKey = [Tx UDP DPort + Tx UDP SPort + Tx IP DAddr]
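The type bits and the forwarding key could be captured in C along these lines. The bit indices are the ones listed on the slide; the enum and struct names are hypothetical, and the helper that says which types carry a FwdKey reflects one reading of the table (IP options and redirect), not confirmed code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit indices within the Type field, as listed on the slide. */
enum mn_type_bit {
    MN_TYPE_RECLASSIFY     = 0,
    MN_TYPE_NO_ROUTE       = 2,
    MN_TYPE_TTL_EXPIRED    = 3,
    MN_TYPE_IP_OPTIONS     = 4,
    MN_TYPE_REDIRECT       = 5,
    MN_TYPE_LOCAL_DELIVERY = 6,
    MN_TYPE_INSPECT        = 7,
    MN_TYPE_MONITOR        = 8,
    MN_TYPE_LOG            = 9
};

static uint32_t mn_type_mask(enum mn_type_bit b)
{
    return 1u << b;
}

/* FwdKey = [Tx UDP DPort + Tx UDP SPort + Tx IP DAddr]: 2 + 2 + 4 bytes. */
typedef struct {
    uint16_t tx_udp_dport;
    uint16_t tx_udp_sport;
    uint32_t tx_ip_daddr;
} fwd_key;

static_assert(sizeof(fwd_key) == 8, "FwdKey fills the 8B type dependent data");

/* One reading of the table: only these exception types append the FwdKey. */
static bool mn_type_has_fwd_key(enum mn_type_bit b)
{
    return b == MN_TYPE_IP_OPTIONS || b == MN_TYPE_REDIRECT;
}
```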
12
Construct IPv4 MN Internal header
Decision flow:
1. Drop bit set? Yes: return.
2. Hit bit set? No: set NR bit in type, then check TTL expired?
   Yes: set TTL bit in type; set Rx UDP DPort; Length = 4.
3. Hit bit set? Yes: test the following in order and take the first that applies:
   - TTL expired? Yes: set TTL bit in type; set Rx UDP DPort; Length = 4
   - Local DL? Yes: set LD bit in type; set Rx UDP DPort; Length = 4
   - IP option? Yes: set OPT bit in type; set Rx UDP DPort; set TypeDependData; Length = 12
   - Redirect? Yes: set RD bit in type; set Rx UDP DPort; set TypeDependData; Length = 12
4. Return.
Cost: 86 cycles for the worst case, 44 cycles for the common case.
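One possible C rendering of this flowchart is given below. The slide survives only as flattened text, so the branch structure is a reconstruction: it assumes that a dropped packet gets no header, that a miss always yields a short (Length = 4) no-route header with the TTL bit added when TTL has expired, and that a hit with no exception takes the fast path with no MN internal header. Struct and function names are hypothetical; the type bit indices follow the internal header format slide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { T_NR = 2, T_TTL = 3, T_OPT = 4, T_RD = 5, T_LD = 6 };

typedef struct {            /* hypothetical view of the lookup result */
    bool     hit, drop, local_delivery;
    uint16_t exceptions;    /* bit 0 = TTL expired, bit 1 = IP option */
    uint16_t rx_udp_dport, tx_udp_dport, tx_udp_sport;
    uint32_t tx_ip_daddr;
} lookup_result;

typedef struct {            /* hypothetical in-memory form of the header */
    uint32_t type;
    uint16_t length;        /* 4 or 12, as on the slide */
    uint16_t rx_udp_dport;
    uint16_t tx_udp_dport, tx_udp_sport;  /* type dependent data */
    uint32_t tx_ip_daddr;
} mn_int_hdr;

static void fill_short(mn_int_hdr *h, const lookup_result *r)
{
    h->rx_udp_dport = r->rx_udp_dport;
    h->length = 4;
}

static void fill_long(mn_int_hdr *h, const lookup_result *r)
{
    fill_short(h, r);
    h->tx_udp_dport = r->tx_udp_dport;
    h->tx_udp_sport = r->tx_udp_sport;
    h->tx_ip_daddr  = r->tx_ip_daddr;
    h->length = 12;
}

/* Returns true when an MN internal header was built into *h. */
static bool build_mn_int_hdr(const lookup_result *r, mn_int_hdr *h)
{
    bool ttl = (r->exceptions & 1u) != 0;
    bool opt = (r->exceptions & 2u) != 0;

    memset(h, 0, sizeof *h);
    if (r->drop)
        return false;                       /* assumed: no header on drop */
    if (!r->hit) {                          /* no route */
        h->type |= 1u << T_NR;
        if (ttl)
            h->type |= 1u << T_TTL;
        fill_short(h, r);
        return true;
    }
    if (ttl)               { h->type |= 1u << T_TTL; fill_short(h, r); return true; }
    if (r->local_delivery) { h->type |= 1u << T_LD;  fill_short(h, r); return true; }
    if (opt)               { h->type |= 1u << T_OPT; fill_long(h, r);  return true; }
    if (r->rx_udp_dport == r->tx_udp_sport) {
        h->type |= 1u << T_RD;
        fill_long(h, r);
        return true;
    }
    return false;                           /* fast path: no MN internal header */
}
```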
13
Testing MR Header Format
Test pipeline: Stub Parse -> Dummy Lookup -> Hdr Format
Ring data (Stub Parse -> Dummy Lookup), one 32-bit word per line:
  Buf Handle (32b)
  IP Pkt Length (16b) | IP Pkt Offset (16b)
  Lookup Key[ ]: Slice ID/Rx UDP DPort (32b)
  Lookup Key[111-80]: DA (32b)
  Lookup Key[79-48]: SA (32b)
  Lookup Key[47-16]: Ports (32b)
  Lookup Key[15-0]: Proto/TCP_Flags (16b) | L Flags (4b) | Exception Bits (12b)
  Slice Data Ptr (32b)
  Rx UDP SPort (16b) | Rsv2 (12b) | Code opt (4b)
  Rx IP SAddr (32b)
Ring data (Dummy Lookup -> Hdr Format), one 32-bit word per line:
  Buf Handle (32b)
  IP Pkt Length (16b) | IP Pkt Offset (16b)
  Slice ID (VLAN) (16b) | Rx UDP DPort (16b)
  Rsvd (1b) | H (1b) | LD (1b) | D (1b) | Exception Bits (12b) | Cntr Index (16b)
  Tx IP DAddr (32b)
  Tx UDP DPort (16b) | Tx UDP SPort (16b)
  DA (8b) | Port (4b) | QID (20b)
  Slice data pointer (32b)
  Rx UDP SPort (16b) | Rsv2 (12b) | Code opt (4b)
  Rx IP SAddr (32b)
Flag legend: H: Hit, D: Drop, LD: Local Delivery, Exception[0]: TTL, Exception[1]: IP Option
The Dummy Lookup block enumerates all combinations of the five bits and generates the corresponding NN ring data for Hdr Format.
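The enumeration done by the Dummy Lookup block might look like the sketch below. It reads "the five bits" as H, LD, D, Exception[0] and Exception[1], and it assumes the flag word is packed most-significant-bit first in the order listed (Rsvd at bit 31, H at bit 30, and so on); both are assumptions, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_COMBOS 32   /* 2^5 combinations of the five flag bits */

/* Build one flag word from a 5-bit combination index.
 * Assumed layout: Rsvd(31) H(30) LD(29) D(28) Exception(27..16) Cntr(15..0). */
static uint32_t make_flags_word(unsigned combo, uint16_t cntr_index)
{
    uint32_t w = cntr_index;
    if (combo & 0x01) w |= 1u << 30;   /* H: hit */
    if (combo & 0x02) w |= 1u << 29;   /* LD: local delivery */
    if (combo & 0x04) w |= 1u << 28;   /* D: drop */
    if (combo & 0x08) w |= 1u << 16;   /* Exception[0]: TTL */
    if (combo & 0x10) w |= 1u << 17;   /* Exception[1]: IP option */
    return w;
}

/* Fill out[] with every combination, one test vector per entry. */
static void enumerate_flag_words(uint32_t out[NUM_COMBOS], uint16_t cntr_index)
{
    for (unsigned c = 0; c < NUM_COMBOS; c++)
        out[c] = make_flags_word(c, cntr_index);
}
```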
14
Possible Optimizations
Functions                                Memory access          Cycles (common/worst)   Optimizations
Dequeue ring_in data                     NN: 9W reads           42/                     More efficient dequeue
Construct MN int hdr                                            44/                     Reduce redundant assignments for worst case
Construct IP, UDP, Ethernet, VLAN hdr                           64/                     Static fields only initialized by the first packet in each thread
Set IP checksum                                                 12/12
Set UDP checksum                                                11/11
Write hdr to DRAM                        DRAM: 46-58B writes    37/40                   Aligned sram writes, use assembler
Inc Pre_queue Cnt                        SRAM: 8B writes        15/                     Similar to DRAM writes
Update buffer descriptor                 SRAM: 10B writes       66/
Enqueue ring_out data                    NN: 3W writes          27/27
Total                                                           318/
15
Implementation Status
- Add dynamic statistics
  - Packet counter for fast path packets
  - Packet counter for exception path packets
  - Packet counter per exception case
- Decide which field in buffer descriptor to store counter index
- Run 8-thread simulation