Published by Claire Tunnicliff. Modified over 9 years ago
1
Utilizing NIC’s enhancements
A look at how driver software needs to change when using newer features of our hardware
2
‘theory’ versus ‘practice’
The engineering designs one encounters in computer hardware components can be observed to undergo an ‘evolution’ during successive iterations: from a scheme that embodies simplicity, purity, and symmetry at the outset, based upon what designers think will be the device’s likely uses, to a conglomeration of disparate ‘add-ons’ as actual practices dictate accommodations.
3
‘backward compatibility’
An historically important consideration in the marketing of computer hardware has been the need to maintain past functions in a ‘transparent’ manner (i.e., no change is needed to run older software on newer equipment), while offering enhancements as ‘options’ that can be selectively enabled.
4
Example: Intel’s x86
The current generation of Intel CPUs will still execute all of the software written for PCs a quarter-century ago (based on a small set of 16-bit registers, a restricted set of instructions, and a one-megabyte memory-space), but is able, as an option, to use more and larger registers (64 bits), richer instruction-sets, and more memory.
5
Gigabit NICs
Intel’s network controller designs exhibit this same kind of ‘evolution’ over time.
The ‘Legacy’ descriptor-format is just one example of keeping prior-generation functionality: it’s simple, it’s ‘pure’ (i.e., not tied to any specific network-protocols, but emphasizing ‘mechanism’, not ‘policy’).
But now alternatives exist, as options!
6
‘Legacy’ RX-Descriptors
Fields: Base-address (64 bits) | Packet-length | Packet-checksum | status | errors | VLAN tag
The device-driver initializes the ‘base-address’ field with the physical address of a packet-buffer, and the network hardware never modifies it.
The network controller later will ‘write-back’ values into all the other fields when it has finished transferring a received packet’s data into that packet-buffer.
7
RxDesc Status-field
Bit:    7    6     5      4      3   2     1    0
Field:  PIF  IPCS  TCPCS  UDPCS  VP  IXSM  EOP  DD
DD = Descriptor Done (1=yes, 0=no): shows if the NIC is finished with the descriptor
EOP = End Of Packet (1=yes, 0=no): shows if this descriptor is logically the last one for its packet
IXSM = Ignore Checksum Indications (1=yes, 0=no)
VP = VLAN Packet match (1=yes, 0=no)
UDPCS = UDP Checksum calculated on packet (1=yes, 0=no)
TCPCS = TCP Checksum calculated on packet (1=yes, 0=no)
IPCS = IPv4 Checksum calculated on packet (1=yes, 0=no)
PIF = Passed In-exact Filter (1=yes, 0=no): shows if software must check whether to accept the packet
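The status bits above translate naturally into C bit masks. The following is a minimal sketch, not code from the presentation's driver: the `RXD_STAT_*` names and both helper functions are hypothetical, chosen only to show how a driver might test the DD, EOP, and checksum bits.

```c
#include <stdint.h>
#include <stdbool.h>

/* Bit masks for the 'Legacy' Rx-Descriptor status byte.
   The names are illustrative assumptions, not taken from the slides. */
enum {
    RXD_STAT_DD    = 1 << 0,  /* Descriptor Done */
    RXD_STAT_EOP   = 1 << 1,  /* End Of Packet */
    RXD_STAT_IXSM  = 1 << 2,  /* Ignore Checksum Indications */
    RXD_STAT_VP    = 1 << 3,  /* VLAN Packet match */
    RXD_STAT_UDPCS = 1 << 4,  /* UDP Checksum calculated */
    RXD_STAT_TCPCS = 1 << 5,  /* TCP Checksum calculated */
    RXD_STAT_IPCS  = 1 << 6,  /* IPv4 Checksum calculated */
    RXD_STAT_PIF   = 1 << 7   /* Passed In-exact Filter */
};

/* True when the NIC is done with the descriptor (DD) and the
   descriptor holds the last part of a packet (EOP). */
bool rxd_packet_complete(uint8_t status)
{
    uint8_t need = RXD_STAT_DD | RXD_STAT_EOP;
    return (status & need) == need;
}

/* True when the TCP checksum indication may be used:
   IXSM must be clear and TCPCS must be set. */
bool rxd_tcp_checksum_checked(uint8_t status)
{
    return !(status & RXD_STAT_IXSM) && (status & RXD_STAT_TCPCS);
}
```

For example, a status byte of 0x03 (DD and EOP set) would count as a complete packet, while 0x01 (DD only) would not.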
8
RxDesc Errors-field
Bit:    7    6    5     4     3     2    1   0
Field:  RXE  IPE  TCPE  (=0)  (=0)  SEQ  SE  CE
Bits 4 and 3 are reserved (=0)
CE = CRC Error or Alignment Error (check statistics registers to differentiate)
TCPE = TCP/UDP Checksum Error
IPE = IPv4 Checksum Error
These bits are relevant only while the NIC is operating in ‘SerDes’ mode:
  SE = Symbol Error
  SEQ = Sequence Error
RXE = Rx Data Error
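A driver typically wants to know two things from the errors byte: whether the frame itself is damaged, and whether only a checksum failed. The sketch below groups the bits that way. The grouping and all names (`RXD_ERR_*`, `rxd_frame_is_bad`, `rxd_checksum_is_bad`) are assumptions made for illustration, not something the slides prescribe.

```c
#include <stdint.h>

/* Bit masks for the 'Legacy' Rx-Descriptor errors byte (illustrative names). */
enum {
    RXD_ERR_CE   = 1 << 0,  /* CRC Error or Alignment Error */
    RXD_ERR_SE   = 1 << 1,  /* Symbol Error (SerDes mode) */
    RXD_ERR_SEQ  = 1 << 2,  /* Sequence Error (SerDes mode) */
    RXD_ERR_TCPE = 1 << 5,  /* TCP/UDP Checksum Error */
    RXD_ERR_IPE  = 1 << 6,  /* IPv4 Checksum Error */
    RXD_ERR_RXE  = 1 << 7   /* Rx Data Error */
};

/* Assumed grouping: errors that make the frame data itself suspect. */
#define RXD_ERR_FRAME (RXD_ERR_CE | RXD_ERR_SE | RXD_ERR_SEQ | RXD_ERR_RXE)

/* Assumed grouping: errors reported by the checksum logic. */
#define RXD_ERR_CSUM  (RXD_ERR_TCPE | RXD_ERR_IPE)

/* Nonzero if any frame-level error bit is set. */
int rxd_frame_is_bad(uint8_t errors)
{
    return (errors & RXD_ERR_FRAME) != 0;
}

/* Nonzero if a TCP/UDP or IPv4 checksum error bit is set. */
int rxd_checksum_is_bad(uint8_t errors)
{
    return (errors & RXD_ERR_CSUM) != 0;
}
```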
9
‘Extended’ RX-Descriptors
CPU writes this, NIC reads it:
  Base-address (64 bits)
  reserved (=0)
NIC writes this, CPU reads it:
  MRQ (multiple receive queues) | IP identification | Packet-checksum
  Extended status | Extended errors | Packet-length | VLAN tag
The device-driver initializes the ‘base-address’ field with the physical address of a packet-buffer, and it initializes the ‘reserved’ field with a zero-value; the network hardware will later modify both fields.
The network controller will ‘write-back’ the values for these fields when it has transferred a received packet’s data into the packet-buffer.
10
An alternative option
CPU writes this, NIC reads it:
  Base-address (64 bits)
  reserved (=0)
NIC writes this, CPU reads it:
  MRQ (multiple receive queues) | RSS Hash (Receive Side Scaling)
  Extended status | Extended errors | Packet-length | VLAN tag
‘Receive Side Scaling’ refers to an optional capability in the network controller to assist with routing of network packets to various CPUs within a modern multiprocessor system (see Section 3.2.13 in Intel’s Software Developer’s Manual).
11
Extended Rx-Status (20 bits)
Bit:    19-16  15   14-11  10    9     8  7    6     5      4      3   2     1    0
Field:  0000   ACK  0000   UDPV  IPIV  0  PIF  IPCS  TCPCS  UDPCS  VP  IXSM  EOP  DD
Bits 7-0 have the same meanings as in a ‘Legacy’ Rx-Status byte:
  DD = Descriptor Done
  EOP = End Of Packet
  IXSM = Ignore Checksum Indications
  VP = VLAN Packet match
  UDPCS = UDP Checksum calculated
  TCPCS = TCP Checksum calculated
  IPCS = IPv4 Checksum calculated
  PIF = Passed In-exact Filter
The ‘extra’ status-bits provide additional hardware support to driver software for processing ethernet packets that conform to standard TCP/IP network protocols (with possibilities for future expansion):
  ACK = TCP ACK-Packet identification
  UDPV = Valid UDP checksum
  IPIV = Valid IP Identification
12
Extended Rx-Errors (12 bits)
Bit:    11   10   9     8  7  6    5   4   3-0
Field:  RXE  IPE  TCPE  0  0  SEQ  SE  CE  0000
Bits 11-4 have the same meanings, and they occupy the same arrangement, as in the ‘Legacy’ Rx-Errors byte
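Since the 20-bit status and the 12-bit errors share one 32-bit word, they can also be separated with plain masks and shifts instead of bitfields. This is a minimal sketch that assumes the status occupies bits 0..19 of the word and the errors occupy bits 20..31; the function names are hypothetical.

```c
#include <stdint.h>

#define RXD_EXT_STATUS_BITS 20  /* width of the Extended status field */

/* Low 20 bits of the word: the Extended status. */
uint32_t rxd_ext_status(uint32_t word)
{
    return word & ((1u << RXD_EXT_STATUS_BITS) - 1);
}

/* High 12 bits of the word: the Extended errors. */
uint32_t rxd_ext_errors(uint32_t word)
{
    return word >> RXD_EXT_STATUS_BITS;
}

/* Bits 7..0 of the Extended status match the 'Legacy' status byte. */
uint8_t rxd_legacy_status(uint32_t word)
{
    return (uint8_t)(rxd_ext_status(word) & 0xFF);
}

/* Bits 11..4 of the Extended errors match the 'Legacy' errors byte. */
uint8_t rxd_legacy_errors(uint32_t word)
{
    return (uint8_t)(rxd_ext_errors(word) >> 4);
}
```

With the word 0x81008003, for instance, the status would be 0x08003 (ACK, EOP, DD) and the errors would be 0x810 (RXE, CE), which corresponds to a legacy-style status byte of 0x03 and errors byte of 0x81.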
13
Main device-driver changes
If we want to utilize the NIC’s ‘Extended’ Receive Descriptor format, we will need several significant changes in our driver source-code and data-types:
- Our module’s initialization of ‘base_address’ fields
- Our new need for programming register RFCTL
- Our ‘typedef’ for the ‘RX_DESCRIPTOR’ structs
- Our ‘get_info_rx()’ function for ‘/proc/nicrx’ display
- Our interrupt-handler’s treatment of ‘rxring’ entries
14
Use of C language ‘union’
Each Receive-Descriptor now has a ‘dual’ identity, as far as the NIC is concerned:
- one layout during its ‘fetch’ from memory
- another layout during ‘write-back’ to memory
The C language provides a special ‘type’ construction for accommodating this kind of programming situation. It is known as a union, and it requires a special syntax.
15
‘Bitfields’ in C
Some of the fields in the ‘Extended’ RX Descriptor do not align with the CPU’s natural 8-bit, 16-bit and 32-bit data-sizes:
  Extended errors (12 bits) | Extended status (20 bits)
The C language provides ‘bitfields’ for a situation like this (the syntax is standard, but the layout of bitfields in memory is left to each compiler)
16
Syntax for Rx-Descriptors
    typedef struct {
        unsigned long long  base_address;
        unsigned long long  reserved;
    } RX_DESC_FETCH;

    typedef struct {
        unsigned int    mrq;
        unsigned short  ip_identification;
        unsigned short  packet_chksum;
        unsigned int    desc_status:20;
        unsigned int    desc_errors:12;
        unsigned short  packet_length;
        unsigned short  vlan_tag;
    } RX_DESC_STORE;

    typedef union {
        RX_DESC_FETCH  rxf;
        RX_DESC_STORE  rxs;
    } RX_DESCRIPTOR;
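Because the NIC expects every descriptor to be exactly 16 bytes, it can be worth asking the compiler to confirm that both views of the union have that size. The sketch below repeats the typedefs and adds compile-time checks; the checks and the `rxd_prepare()` helper are additions for illustration (not part of the presentation's driver), and the expected sizes assume a compiler such as gcc on x86 that packs the two bitfields into one 32-bit unit.

```c
#include <stddef.h>  /* offsetof */

typedef struct {
    unsigned long long  base_address;
    unsigned long long  reserved;
} RX_DESC_FETCH;

typedef struct {
    unsigned int    mrq;
    unsigned short  ip_identification;
    unsigned short  packet_chksum;
    unsigned int    desc_status:20;
    unsigned int    desc_errors:12;
    unsigned short  packet_length;
    unsigned short  vlan_tag;
} RX_DESC_STORE;

typedef union {
    RX_DESC_FETCH  rxf;   /* layout the NIC fetches */
    RX_DESC_STORE  rxs;   /* layout the NIC writes back */
} RX_DESCRIPTOR;

/* Compile-time layout checks (C11); these are ABI assumptions. */
_Static_assert(sizeof(RX_DESC_FETCH) == 16, "fetch layout must be 16 bytes");
_Static_assert(sizeof(RX_DESC_STORE) == 16, "write-back layout must be 16 bytes");
_Static_assert(sizeof(RX_DESCRIPTOR) == 16, "descriptor must be 16 bytes");
_Static_assert(offsetof(RX_DESC_STORE, packet_length) == 12,
               "bitfields should share one 32-bit unit");

/* Hypothetical helper: put a descriptor into its 'fetch' form. */
void rxd_prepare(RX_DESCRIPTOR *d, unsigned long long buf_phys)
{
    d->rxf.base_address = buf_phys;
    d->rxf.reserved = 0;
}
```

If a compiler lays the bitfields out differently, the build fails at the assertion rather than the driver misreading descriptors at run time.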
17
RFCTL (0x5008)
The Receive Filter Control register
  bits 31-16  reserved (=0)
  bit 15      EXTEN
  bit 14      IPFRSP_DIS
  bit 13      ACKD_DIS
  bit 12      ACKDIS
  bit 11      IPv6XSUM_DIS
  bit 10      IPv6_DIS
  bits 9-8    NFS_VER
  bit 7       NFSR_DIS
  bit 6       NFSW_DIS
  bits 5-1    iSCSI_DWC
  bit 0       iSCSI_DIS
EXTEN (bit 15) = Extended Status Enable (1=yes, 0=no)
This enables the NIC to write-back the ‘Extended Status’
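Turning on EXTEN is a read-modify-write of RFCTL, so that the register's other bits keep their values. The sketch below models the register space with an ordinary array so it can run in user space; `fake_regs`, `reg_read()`, `reg_write()`, and the `E1000_RFCTL*` names are assumptions for illustration. In a kernel driver the accesses would instead go through `ioread32()` and `iowrite32()` on the mapped device memory.

```c
#include <stdint.h>

#define E1000_RFCTL        0x5008      /* register offset shown on the slide */
#define E1000_RFCTL_EXTEN  (1u << 15)  /* Extended Status Enable (bit 15) */

/* Stand-in for the NIC's memory-mapped registers (hypothetical). */
static uint32_t fake_regs[0x8000 / 4];

uint32_t reg_read(uint32_t offset)
{
    return fake_regs[offset / 4];
}

void reg_write(uint32_t value, uint32_t offset)
{
    fake_regs[offset / 4] = value;
}

/* Set EXTEN while leaving every other RFCTL bit unchanged. */
void enable_extended_rx_descriptors(void)
{
    uint32_t rfctl = reg_read(E1000_RFCTL);
    rfctl |= E1000_RFCTL_EXTEN;
    reg_write(rfctl, E1000_RFCTL);
}
```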
18
Modifying ‘my_read()’
To implement use of ‘Extended’ Receive Descriptors in our most recent character-mode device-driver (i.e., ‘zerocopy.c’), we need some changes in the ‘read()’ method.
Most obvious example: a packet-buffer’s memory address can no longer be obtained from an Rx-Descriptor’s ‘base_address’ field (which now gets ‘overwritten’ by the NIC).
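One way to recover a packet-buffer's address without reading the descriptor is to compute it from the descriptor's index in the ring. The sketch below assumes the packet-buffers sit in one contiguous region directly after the descriptor ring, which is what the address arithmetic in this presentation's interrupt handler suggests. The function name and the constant values are hypothetical.

```c
#include <stddef.h>

/* Illustrative values; the driver's real constants are not shown on the slides. */
enum {
    N_RX_DESC    = 32,    /* number of descriptors in the ring */
    RX_BUFSIZ    = 2048,  /* bytes per packet-buffer */
    RX_DESC_SIZE = 16     /* bytes per descriptor */
};

/* Locate the packet-buffer that belongs to descriptor number 'index',
   assuming the buffers follow the ring in the same allocation. */
unsigned char *rx_buffer_for(void *ring_base, unsigned int index)
{
    unsigned char *buffers =
        (unsigned char *)ring_base + RX_DESC_SIZE * N_RX_DESC;

    return buffers + (size_t)(index % N_RX_DESC) * RX_BUFSIZ;
}
```

The same index-based calculation works for physical addresses, so the driver never depends on a field the NIC may have overwritten.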
19
For our pseudo-file’s sake…
Also, our driver’s ‘read()’ function shouldn’t prepare a current rx-descriptor for reuse, as it did in earlier drivers, since that would destroy all of the useful information which the NIC has just written into that descriptor.
Instead, the preparation of a descriptor for reuse in a future packet-receive operation should be deferred, at least temporarily.
20
OK, but then when?
We can reassign the duty to ‘refresh’ some Rx-Descriptors for reuse to our driver’s Interrupt Service Routine; specifically, at the point in time when an ‘RXDMT0’ event is signaled (Rx-Descriptor Min-Threshold).
It might be best to create a ‘bottom half’ to take care of those re-initializations, but we haven’t yet done that in our new prototype.
21
Handling ‘RXDMT0’ interrupts
    irqreturn_t my_isr( int irq, void *dev_id )
    {
        int  intr_cause = ioread32( io + E1000_ICR );

        if ( intr_cause & (1<<4) )  // Rx-Descriptors Low
        {
            unsigned int  rx_buf = virt_to_phys( rxring ) + 16 * N_RX_DESC;
            unsigned int  rxtail = ioread32( io + E1000_RDT ), i, ba;

            // prepare the next eight Rx-Descriptors for 'reuse' by the NIC
            for (i = 0; i < 8; i++)
            {
                ba = rx_buf + rxtail * RX_BUFSIZ;
                rxring[ rxtail ].base_address = ba;
                rxring[ rxtail ].reserved = 0LL;
                rxtail = (1 + rxtail) % N_RX_DESC;
            }

            // now give the NIC 'ownership' of these reinitialized descriptors
            iowrite32( rxtail, io + E1000_RDT );
        }

        // (the slide's excerpt ends here; handling of other causes is not shown)
        return IRQ_HANDLED;
    }
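The refill loop in the handler can be tried outside the kernel by treating the ring as an ordinary array. The following sketch mirrors its arithmetic, including the wrap-around of the tail index, under a few assumptions: the constants are illustrative, `rx_refill()` and `rx_desc_fetch` are hypothetical names, and addresses are kept as 64-bit values rather than the `unsigned int` used on the slide.

```c
#include <stdint.h>

/* Illustrative values; the driver's real constants are not shown on the slides. */
enum { N_RX_DESC = 32, RX_BUFSIZ = 2048, REFILL_COUNT = 8 };

/* The 'fetch' view of a descriptor, as the NIC reads it. */
typedef struct {
    uint64_t base_address;
    uint64_t reserved;
} rx_desc_fetch;

/* Re-arm REFILL_COUNT descriptors starting at 'tail'.
   'buf_phys' is the physical address of packet-buffer number 0.
   Returns the new tail index, which a driver would write to RDT. */
unsigned int rx_refill(rx_desc_fetch *ring, uint64_t buf_phys, unsigned int tail)
{
    for (int i = 0; i < REFILL_COUNT; i++) {
        ring[tail].base_address = buf_phys + (uint64_t)tail * RX_BUFSIZ;
        ring[tail].reserved = 0;
        tail = (tail + 1) % N_RX_DESC;  /* wrap around at the end of the ring */
    }
    return tail;
}
```

Starting from a tail of 28 in a 32-entry ring, the loop would re-arm entries 28 through 31 and then 0 through 3, returning a new tail of 4.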
22
‘extended.c’
Here’s our revision of ‘zerocopy.c’, aimed at showing how we can incorporate use of the NIC’s ‘Extended’ Receive Descriptors.
It appears to function exactly as before, until a user attempts to view the driver’s Receive-Descriptor queue:
    $ cat /proc/nicrx
Then we are shown descriptors having two distinct formats (i.e., FETCH and STORE).
23
Demo: ‘bitfield.c’
Because the manner in which ‘bitfields’ are handled in the C language varies with the particular C-compiler being used, we have created a short demo-program that shows us how our GNU C-compiler ‘gcc’ handles the layout of bitfields within a C data-item:
    typedef struct {
        unsigned int  desc_status:20;  // bits 0..19
        unsigned int  desc_errors:12;  // bits 20..31
    } RXD_ELT;
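The presentation's ‘bitfield.c’ itself is not reproduced on the slide. As a rough idea of what such a check can look like, the sketch below fills one bitfield at a time and copies the struct's bytes into a plain 32-bit integer for inspection. The `rxd_elt_raw()` function is hypothetical, and the results mentioned afterwards assume gcc on a little-endian x86 machine.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned int  desc_status:20;  /* expected in bits 0..19 */
    unsigned int  desc_errors:12;  /* expected in bits 20..31 */
} RXD_ELT;

/* Build an RXD_ELT from the two field values and return its raw
   32-bit memory image, so the compiler's bit placement is visible. */
uint32_t rxd_elt_raw(unsigned int status, unsigned int errors)
{
    RXD_ELT  elt;
    uint32_t raw;

    elt.desc_status = status;
    elt.desc_errors = errors;
    memcpy(&raw, &elt, sizeof raw);  /* copy the bytes without type punning */
    return raw;
}
```

On that assumed platform, setting only the status field to all ones should give 0x000FFFFF and setting only the errors field to all ones should give 0xFFF00000, matching the bit ranges noted in the comments. A different compiler or byte order could legitimately produce another image.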