Receiver ‘packet-splitting’
A look at how a driver can cause the 82573L NIC to separate a packet’s headers from its data
NIC can do packet-parsing
Intel’s newest gigabit Ethernet controllers offer an enhancement to the ‘extended’ Receive Descriptor, called the ‘packet-split’ format, which enables the hardware to recognize the packet ‘headers’ used with the most common network protocols and to separate those headers automatically from their accompanying packet ‘data’.
‘Extended’ RX-Descriptors
CPU writes this, NIC reads it:
  quadword 0: Base-address (64 bits)
  quadword 1: reserved (= 0)
NIC writes this, CPU reads it (fields listed from high-order to low-order bits):
  quadword 0: Packet-checksum (16) | IP identification (16) | MRQ, multiple receive queues (32)
  quadword 1: VLAN tag (16) | Packet-length (16) | Extended errors (12) | Extended status (20)
The device-driver initializes the ‘base-address’ field with the physical address of a packet-buffer, and it initializes the ‘reserved’ field with a zero-value; the network hardware will later modify both fields. The network controller ‘writes back’ the values for these fields when it has transferred a received packet’s data into the packet-buffer.
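The two views of one 16-byte descriptor can be sketched as a C union, with a helper that arms the descriptor and another that checks whether the NIC has written it back. This is a minimal sketch, assuming a little-endian x86 target and GCC-style bit-field allocation (low bits first); the names EXT_DESC_FETCH, EXT_DESC_STORE, ext_desc_init and ext_desc_done are hypothetical, and the assumption that status bit 0 is the ‘DD’ (descriptor done) flag follows the status-bit figure on the next slide.

```c
#include <stdint.h>

/* Hypothetical names: what the CPU writes and the NIC reads. */
typedef struct {
    uint64_t base_addr;   /* physical address of the packet-buffer */
    uint64_t reserved;    /* driver must write zero here */
} EXT_DESC_FETCH;

/* What the NIC writes back and the CPU reads (low fields first). */
typedef struct {
    uint32_t mrq;                /* multiple receive queues */
    uint16_t ip_identification;
    uint16_t packet_chksum;
    uint32_t desc_status : 20;   /* extended status, bit 0 assumed = DD */
    uint32_t desc_errors : 12;   /* extended errors */
    uint16_t packet_length;
    uint16_t vlan_tag;
} EXT_DESC_STORE;

typedef union {
    EXT_DESC_FETCH rxf;
    EXT_DESC_STORE rxs;
} EXT_DESCRIPTOR;

/* Arm a descriptor: give the NIC a buffer address and a zeroed second
 * quadword, which also clears the status bits (including DD). */
static void ext_desc_init(volatile EXT_DESCRIPTOR *d, uint64_t buf_phys)
{
    d->rxf.base_addr = buf_phys;
    d->rxf.reserved  = 0;
}

/* Nonzero once the NIC has written back this descriptor. */
static int ext_desc_done(const volatile EXT_DESCRIPTOR *d)
{
    return d->rxs.desc_status & 1u;
}
```

A driver would call ext_desc_init once per descriptor before advancing the tail index, then poll (or, in an interrupt handler, test) ext_desc_done before reading packet_length and the buffer contents.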
‘Packet-Split’ RX-Descriptors
CPU writes this, NIC reads it:
  quadword 0: Base-address 0 (64 bits)
  quadword 1: Base-address 1 (64 bits)
  quadword 2: Base-address 2 (64 bits)
  quadword 3: Base-address 3 (64 bits)
NIC writes this, CPU reads it (fields listed from high-order to low-order bits):
  quadword 0: Packet-checksum | IP identification | MRQ (multiple receive queues)
  quadword 1: VLAN tag | Packet-Length 0 | Extended errors | Extended status
  quadword 2: Packet-Length 3 | Packet-Length 2 | Packet-Length 1 | SP | Header-Length
  quadword 3: reserved
The device-driver initializes the four ‘base-address’ fields (with ‘even-numbered’ addresses). The network controller ‘writes back’ values to the fields shown above when it has transferred a received packet’s data into those packet-buffers.
Same ‘Extended’ Status/Errors fields
  Extended status (bits 19..0); labeled bits, from high to low: ACK, UDPV, IPIDV, PIF, IPCS, TCPCS, UDPCS, VP, IXSM, EOP, DD
  Extended errors (bits 11..0); labeled bits, from high to low: RXE, IPE, TCPE, SEQ, SE, CE
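A driver usually tests a few status bits before touching a received buffer. The sketch below assumes that the eight rightmost labels sit in bits 0 through 7 in the order shown (DD at bit 0, EOP at bit 1, and so on up to PIF at bit 7); the positions of the higher status bits and of the error bits are not assumed. The helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Assumed positions of the two most-used status bits. */
#define RXS_DD   (1u << 0)   /* descriptor done: NIC has written back */
#define RXS_EOP  (1u << 1)   /* end of packet */

/* Assumed names of status bits 0..7, lowest bit first. */
static const char *const rxs_names[8] = {
    "DD", "EOP", "IXSM", "VP", "UDPCS", "TCPCS", "IPCS", "PIF"
};

/* True when the descriptor is done and holds the packet's last part. */
static int rx_packet_complete(unsigned int status)
{
    return (status & (RXS_DD | RXS_EOP)) == (RXS_DD | RXS_EOP);
}

/* Write the names of the set bits among status bits 7..0 into buf,
 * space-separated (e.g. 0x23 gives "DD EOP TCPCS"); output is
 * truncated safely if buf is too small.  len must be at least 1. */
static void rx_status_names(unsigned int status, char *buf, size_t len)
{
    buf[0] = '\0';
    for (unsigned int bit = 0; bit < 8; ++bit) {
        if (!(status & (1u << bit)))
            continue;
        size_t used = strlen(buf);
        snprintf(buf + used, len - used, "%s%s",
                 used ? " " : "", rxs_names[bit]);
    }
}
```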
Syntax modifications for ‘fetch’

```c
typedef struct {
    unsigned long long  base_addr0;
    unsigned long long  base_addr1;
    unsigned long long  base_addr2;
    unsigned long long  base_addr3;
} RX_DESC_FETCH;
```
Syntax modifications for ‘store’

```c
typedef struct {
    /* quadword 0 */
    unsigned int        mrq;                /* multiple receive queues */
    unsigned short      ip_identification;
    unsigned short      packet_chksum;
    /* quadword 1 */
    unsigned int        desc_status:20;     /* extended status */
    unsigned int        desc_errors:12;     /* extended errors */
    unsigned short      packet_length0;
    unsigned short      vlan_tag;
    /* quadword 2 */
    unsigned short      header_length;      /* header length plus the SP flag */
    unsigned short      packet_length1;
    unsigned short      packet_length2;
    unsigned short      packet_length3;
    /* quadword 3 */
    unsigned long long  reserved;
} RX_DESC_STORE;
```
Same syntax for the ‘union’

```c
typedef union {
    RX_DESC_FETCH  rxf;     /* written by the CPU, read by the NIC */
    RX_DESC_STORE  rxs;     /* written back by the NIC, read by the CPU */
} RX_DESCRIPTOR;
```
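With the union in place, arming one packet-split descriptor amounts to storing four buffer addresses. The sketch below restates the three type definitions with fixed-width types so that it compiles on its own, checks that both views occupy 32 bytes, and rejects odd addresses. One plausible reason for the ‘even-numbered’ rule, judging from the layout, is that bit 0 of base_addr1 occupies the same position as the DD status bit in the written-back quadword 1, so an even address there makes DD read as 0 until the NIC writes back. The function name ps_desc_init is hypothetical, and a little-endian target with GCC-style bit-fields is assumed.

```c
#include <stdint.h>

typedef struct {
    uint64_t base_addr0, base_addr1, base_addr2, base_addr3;
} RX_DESC_FETCH;

typedef struct {
    uint32_t mrq;
    uint16_t ip_identification;
    uint16_t packet_chksum;
    uint32_t desc_status : 20;
    uint32_t desc_errors : 12;
    uint16_t packet_length0;
    uint16_t vlan_tag;
    uint16_t header_length;
    uint16_t packet_length1;
    uint16_t packet_length2;
    uint16_t packet_length3;
    uint64_t reserved;
} RX_DESC_STORE;

typedef union {
    RX_DESC_FETCH rxf;
    RX_DESC_STORE rxs;
} RX_DESCRIPTOR;

/* Both views must describe the same 32-byte descriptor. */
_Static_assert(sizeof(RX_DESC_FETCH) == 32, "fetch view is 32 bytes");
_Static_assert(sizeof(RX_DESC_STORE) == 32, "store view is 32 bytes");

/* Arm one descriptor with four buffer addresses.  Returns -1, leaving
 * the descriptor untouched, if any address is odd; returns 0 otherwise. */
static int ps_desc_init(volatile RX_DESCRIPTOR *d, const uint64_t phys[4])
{
    for (int i = 0; i < 4; ++i)
        if (phys[i] & 1)
            return -1;
    d->rxf.base_addr0 = phys[0];
    d->rxf.base_addr1 = phys[1];   /* its bit 0 overlays the DD bit */
    d->rxf.base_addr2 = phys[2];
    d->rxf.base_addr3 = phys[3];
    return 0;
}
```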
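NIC Registers involved
  RCTL (Receive Control register): the DTYP field (bits 11:10) selects the receive-descriptor type
  RFCTL (Receive Filter Control register): the EXTEN bit (bit 15) enables extended status; bits 31:16 are reserved (= 0)
  PSRCTL (Packet-Split Receive Control register): one byte-wide buffer-size field per buffer
    LEN3 (bits 31:24, in 1KB units) | LEN2 (bits 23:16, in 1KB units) | LEN1 (bits 15:8, in 1KB units) | LEN0 (bits 7:0, in 128-byte units)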
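The register values can be computed before they are written to the device. This sketch assumes that DTYP = 01b selects packet-split descriptors and that the buffer sizes come from the kmem layout below (four 1024-byte buffers per descriptor, so LEN0 = 1024 / 128 = 8 and LEN1..LEN3 = 1); register offsets and the actual device writes are deliberately left out, and the helper names are hypothetical.

```c
#include <stdint.h>

#define RCTL_DTYP_MASK  (3u << 10)   /* descriptor-type field, bits 11:10 */
#define RCTL_DTYP_PS    (1u << 10)   /* assumed: 01b = packet-split */
#define RFCTL_EXTEN     (1u << 15)   /* extended-status enable */

/* Build a PSRCTL value from byte sizes: LEN0 counts 128-byte units,
 * LEN1..LEN3 count 1KB units, each field occupying one byte lane. */
static uint32_t psrctl_value(uint32_t len0_bytes, uint32_t len1_bytes,
                             uint32_t len2_bytes, uint32_t len3_bytes)
{
    return  ((len0_bytes / 128)  & 0xFFu)
         | (((len1_bytes / 1024) & 0xFFu) << 8)
         | (((len2_bytes / 1024) & 0xFFu) << 16)
         | (((len3_bytes / 1024) & 0xFFu) << 24);
}

/* Read-modify-write helper: keep the other RCTL bits, set DTYP = 01b. */
static uint32_t rctl_select_packet_split(uint32_t rctl)
{
    return (rctl & ~RCTL_DTYP_MASK) | RCTL_DTYP_PS;
}
```

With four 1024-byte buffers, psrctl_value(1024, 1024, 1024, 1024) yields 0x01010108.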
Each descriptor has four buffers
[Figure: a Packet-Split Rx-descriptor whose base_addr0, base_addr1, base_addr2 and base_addr3 fields point to buffer0, buffer1, buffer2 and buffer3]
Four buffers are allocated for receiving one packet.
Refresh for ‘reuse’
As with the ‘extended’ receive-descriptors, a device-driver must set up each ‘packet-split’ receive-descriptor again any time it is going to be ‘reused’, since the network controller overwrites the prior buffer-addresses during a packet-reception. So the driver needs either a formula for recalculating the buffer-addresses or a ‘shadow’ array that preserves them.
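The ‘shadow’ array option can be sketched as an ordinary array that the NIC never writes: the driver records each descriptor’s four addresses once, at setup time, and copies them back before each reuse. PS_FETCH, shadow_save and shadow_refresh are hypothetical names; PS_FETCH stands in for the fetch view of the descriptor, and N_RX_DESC = 16 matches the sixteen descriptors in the kmem layout.

```c
#include <stdint.h>
#include <string.h>

#define N_RX_DESC 16

/* Hypothetical stand-in for the fetch view of one descriptor. */
typedef struct {
    uint64_t base_addr[4];
} PS_FETCH;

/* Copy of every descriptor's buffer addresses, in memory the NIC
 * never writes back into. */
static uint64_t shadow[N_RX_DESC][4];

/* Record descriptor i's addresses once, when the ring is first set up. */
static void shadow_save(int i, const uint64_t phys[4])
{
    memcpy(shadow[i], phys, sizeof shadow[i]);
}

/* Restore descriptor i before reuse, overwriting the NIC's write-back. */
static void shadow_refresh(volatile PS_FETCH *ring, int i)
{
    for (int j = 0; j < 4; ++j)
        ring[i].base_addr[j] = shadow[i][j];
}
```

The trade-off is 512 extra bytes of memory (16 descriptors, 4 addresses of 8 bytes each) in exchange for not having to recompute addresses.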
Kernel-memory layout (kmem)
  offset 0:    sixteen Rx-descriptors (32 bytes each)        = 512 bytes
  offset 512:  sixty-four receive-buffers (1024 bytes each)  = 65536 bytes
  KMEM_SIZE = 512 + 65536 = 66048 bytes
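The ‘formula’ option follows from this layout. The sketch assumes buffers are assigned in order (descriptor 0 gets buffers 0..3, descriptor 1 gets buffers 4..7, and so on) and that the physical address of kmem is known; that assignment is one natural choice, not necessarily the one the demo uses, and rx_buf_phys is a hypothetical name.

```c
#include <stdint.h>

#define N_RX_DESC    16
#define DESC_SIZE    32
#define N_RX_BUF     (4 * N_RX_DESC)             /* four per descriptor */
#define RX_BUF_SIZE  1024
#define DESC_AREA    (N_RX_DESC * DESC_SIZE)     /* 512 bytes */
#define KMEM_SIZE    (DESC_AREA + N_RX_BUF * RX_BUF_SIZE)   /* 66048 */

/* Physical address of buffer j (0..3) of descriptor i (0..15), given
 * the physical base address of kmem.  The result is even whenever
 * kmem_phys is even, as the descriptor format expects. */
static uint64_t rx_buf_phys(uint64_t kmem_phys, unsigned i, unsigned j)
{
    return kmem_phys + DESC_AREA + (uint64_t)(4 * i + j) * RX_BUF_SIZE;
}
```

Refreshing descriptor i is then four assignments, base_addrN = rx_buf_phys(kmem_phys, i, N) for N = 0..3, with no extra storage.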
Caveats
- Short packets are not always ‘split’.
- Unrecognized packet-headers may not be separated from the accompanying packet-data.
- Demonstrating the packet-split capability will require us to devise a way to transmit packets that carry the TCP/UDP and IP packet-headers the NIC recognizes.
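Because a packet is not always split, a driver should inspect the written-back descriptor rather than assume its headers landed in buffer 0. This sketch assumes the SP flag is bit 15 of the header_length word, that the header length itself occupies the low 10 bits, and that each packet_lengthN field counts the bytes the NIC placed in bufferN; PS_RESULT and ps_examine are hypothetical names.

```c
#include <stdint.h>

#define PS_HDR_SPLIT     0x8000u   /* assumed: SP flag, bit 15 */
#define PS_HDR_LEN_MASK  0x03FFu   /* assumed: header length, bits 9:0 */

/* Summary of one written-back packet-split descriptor. */
typedef struct {
    int      split;       /* did the NIC separate headers from data? */
    unsigned hdr_len;     /* header bytes reported by the NIC */
    unsigned total_len;   /* bytes placed in all four buffers */
} PS_RESULT;

/* len[] holds packet_length0..packet_length3 from the descriptor. */
static PS_RESULT ps_examine(uint16_t header_length, const uint16_t len[4])
{
    PS_RESULT r;
    r.split     = (header_length & PS_HDR_SPLIT) != 0;
    r.hdr_len   = header_length & PS_HDR_LEN_MASK;
    r.total_len = (unsigned)len[0] + len[1] + len[2] + len[3];
    return r;
}
```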
TIMEOUT for an in-class demonstration
Our ‘pktsplit.c’ demo
We created a ‘minimal’ kernel-module for demonstrating the NIC’s ‘packet-splitting’ capabilities.
In-class exercise
Can you enhance our ‘pktsplit.c’ module so that its Receive-Descriptor Queue will function automatically as a ring-buffer (as happens in our ‘extended.c’ example)? Your best option is to install an ISR that reinitializes some Rx-Descriptors (and advances the RDT index) each time an RXDMT0 interrupt gets triggered.
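The index arithmetic such an ISR needs can be sketched apart from any kernel code. The sketch assumes the usual head/tail convention for this NIC family: the hardware owns descriptors RDH up to (but not including) RDT, so the ISR may re-arm every descriptor from the old RDT up to one short of RDH, then write the new RDT. Stopping one short keeps RDT from equaling RDH, which would mean ‘no descriptors available’. rx_ring_plan is a hypothetical name; re-arming itself (by formula or shadow array) and the register accesses are left to the caller, and any received data must be consumed before its descriptor is re-armed.

```c
#define N_RX_DESC 16

/* Collect, in ring order, the indices of the descriptors to re-arm,
 * store their count in *n, and return the new RDT value to write. */
static unsigned rx_ring_plan(unsigned rdh, unsigned rdt,
                             unsigned idx[N_RX_DESC], unsigned *n)
{
    unsigned new_tail = (rdh + N_RX_DESC - 1) % N_RX_DESC;
    unsigned count = 0;

    for (unsigned i = rdt; i != new_tail; i = (i + 1) % N_RX_DESC)
        idx[count++] = i;      /* descriptor i is back in driver hands */

    *n = count;
    return new_tail;
}
```

For example, with RDH = 5 and RDT = 15 the ISR would re-arm descriptors 15, 0, 1, 2 and 3, then set RDT to 4, leaving fifteen of the sixteen descriptors available to the NIC.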