Our ‘recv1000.c’ driver
Implementing a ‘packet-receive’ capability with the Intel 82573L network interface controller



Similarities
There are quite a few similarities between implementing the ‘transmit-capability’ and the ‘receive-capability’ in a device-driver for Intel’s 82573L ethernet controller:
– Identical device-discovery and ioremap steps
– Same steps for ‘global reset’ of the hardware
– Comparable data-structure initializations
– Parallel setups for the TX and RX registers
But there are also a few fundamental differences (such as the ‘active’ versus ‘passive’ role of the driver).

‘push’ versus ‘pull’
[Figure: host memory (transmit packet buffer, receive packet buffer) and the Ethernet controller (transmit-FIFO, receive-FIFO, to/from LAN), with arrows labeled ‘push’ and ‘pull’]
The ‘write()’ routine in our ‘xmit1000.c’ driver could transfer data at any time, but the ‘read()’ routine in our ‘recv1000.c’ driver has to wait for data to arrive.
So, to avoid wasteful busy-waiting, our ‘recv1000.c’ driver can use the Linux kernel’s sleep/wakeup mechanism, provided it enables the NIC’s interrupts.

Sleep/wakeup
We will need to employ a wait-queue, we will need to enable device-interrupts, and we will need to write and install the code for an interrupt service routine (ISR).
So our ‘recv1000.c’ driver will have a few additional code and data components that were absent in our ‘xmit1000.c’ driver.

Driver’s components
[Figure: the components of the driver, each annotated with its role]
my_fops: this ‘struct’ holds one function-pointer (read)
my_read(): this function will program the actual data-transfer
my_get_info(): this function will allow us to inspect the receive-descriptors
my_isr(): this function will awaken any sleeping reader-task
wait_queue_head: the wait-queue on which a reader-task sleeps
module_init(): this function will detect and configure the hardware, define page-mappings, allocate and initialize the descriptors, install our ISR and enable interrupts, start the ‘receive’ engine, create the pseudo-file and register ‘my_fops’
module_exit(): this function will do needed ‘cleanup’ when it’s time to unload our driver: turn off the ‘receive’ engine, disable interrupts and remove our ISR, free memory, and delete the page-table entries, the pseudo-file, and the ‘my_fops’

How NIC’s interrupts work
There are four interrupt-related registers which are essential for us to understand:
ICR (0x00C0): Interrupt Cause Read
ICS (0x00C8): Interrupt Cause Set
IMS (0x00D0): Interrupt Mask Set/Read
IMC (0x00D8): Interrupt Mask Clear

Interrupt event-types (82573L)
31: INT_ASSERTED (1=yes, 0=no)
ACK (Rx-ACK Frame detected)
16: SRPD (Small Rx-Packet detected)
15: TXD_LOW (Tx-Descr Low Thresh hit)
9: MDAC (MDI/O Access Completed)
7: RXT0 (Receiver Timer expired)
6: RXO (Receiver Overrun)
4: RXDMT0 (Rx-Desc Min Thresh hit)
2: LSC (Link Status Change)
1: TXQE (Transmit Queue Empty)
0: TXDW (Transmit Descriptor Written Back)
The remaining bits are shown as reserved.
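
As a small illustration of how a handler might ‘log’ these event-types, here is a user-space sketch that turns an ICR value into a list of names. The bit numbers are the ones listed on the slide; the table layout and the helper ‘icr_describe()’ are our own hypothetical additions, and ACK is left out because its bit number is not given above.

```c
#include <stdint.h>
#include <string.h>

/* Bit numbers copied from the slide; the table itself is our own. */
struct icr_bit { unsigned bit; const char *name; };

static const struct icr_bit icr_bits[] = {
    {  0, "TXDW" },  {  1, "TXQE" },  {  2, "LSC" },   {  4, "RXDMT0" },
    {  6, "RXO" },   {  7, "RXT0" },  {  9, "MDAC" },  { 15, "TXD_LOW" },
    { 16, "SRPD" },  { 31, "INT_ASSERTED" },
};

/* Hypothetical helper: write the names of all set bits into 'out',
 * separated by spaces, and return how many names were written.
 * Names that no longer fit in 'out' are dropped. */
int icr_describe(uint32_t cause, char *out, size_t outlen)
{
    int found = 0;
    size_t used = 0;

    if (outlen == 0)
        return 0;
    out[0] = '\0';
    for (size_t i = 0; i < sizeof icr_bits / sizeof icr_bits[0]; i++) {
        const char *name = icr_bits[i].name;
        size_t need;

        if (!(cause & (UINT32_C(1) << icr_bits[i].bit)))
            continue;                     /* this cause is not set */
        need = strlen(name) + (found ? 1 : 0);
        if (used + need >= outlen)
            break;                        /* no room left for this name */
        if (found)
            out[used++] = ' ';
        strcpy(out + used, name);
        used += strlen(name);
        found++;
    }
    return found;
}
```

For example, the value 0x80000080 (bits 31 and 7) is described as "RXT0 INT_ASSERTED". In a real driver the same table could feed ‘printk()’ instead of a string buffer.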

Interrupt Mask Set/Read
This register is used to enable a selection of the device’s interrupts which the driver will be prepared to recognize and handle.
A particular interrupt becomes ‘enabled’ if software writes a ‘1’ to the corresponding bit of this Interrupt Mask Set register.
Writing ‘0’ to any register-bit has no effect, so interrupts can be enabled one-at-a-time.

Interrupt Mask Clear
Your driver can discover which interrupts have been enabled by reading IMS, but your driver cannot ‘disable’ any interrupts by writing to that register.
Instead, a specific interrupt can be disabled by writing a ‘1’ to the corresponding bit in the Interrupt Mask Clear register.
Writing ‘0’ to a register-bit has no effect on the interrupt controller’s Interrupt Mask.
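
The set/clear behaviour of IMS and IMC can be tried out with a tiny software model. This is only a sketch of the semantics described on the slides, not of the real hardware: the struct ‘mask_model’ and its two helpers are hypothetical, while the register offsets are the ones given earlier.

```c
#include <stdint.h>

enum { E1000_IMS = 0x00D0, E1000_IMC = 0x00D8 };  /* offsets from the slide */

/* Hypothetical stand-in for the controller's internal Interrupt Mask. */
struct mask_model { uint32_t mask; };

/* A write to IMS sets the mask bits that are '1' in value;
 * a write to IMC clears them. '0' bits change nothing either way. */
void mask_model_write(struct mask_model *m, unsigned reg, uint32_t value)
{
    if (reg == E1000_IMS)
        m->mask |= value;
    else if (reg == E1000_IMC)
        m->mask &= ~value;
}

/* Reading IMS reports which interrupts are currently enabled. */
uint32_t mask_model_read_ims(const struct mask_model *m)
{
    return m->mask;
}
```

Because ‘0’ bits are ignored, a driver never needs a read-modify-write sequence to change one interrupt’s enable state; a single write with one bit set is enough.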

Interrupt Cause Read
Whenever interrupts occur, your driver’s interrupt service routine can discover the specific conditions that triggered them if it reads the Interrupt Cause Read register.
In this case your driver can clear any selection of these bits (except bit #31) by writing ‘1’s to them (writing ‘0’s to this register will have no effect).
In case no interrupt has occurred, reading this register may have the side-effect of clearing it.

Interrupt Cause Set
For testing your driver’s interrupt-handler, you can artificially trigger any particular combination of interrupts by writing ‘1’s into the corresponding register-bits of this Interrupt Cause Set register (assuming your combination of bits corresponds to interrupts that are ‘enabled’ by ‘1’s being present for them in the Interrupt Mask).
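
To see how the cause bits and the mask interact when testing a handler, here is a simplified user-space model. It assumes only what the slides state: writing ‘1’s to ICS raises causes, writing ‘1’s to ICR clears them, and an interrupt is signaled only for causes that are enabled in the mask. The struct and function names are hypothetical, and details such as bit #31 and the read side-effect of ICR are deliberately not modeled.

```c
#include <stdint.h>

/* Hypothetical model of the controller's interrupt state. */
struct irq_model {
    uint32_t cause;   /* pending causes, as ICR would report them */
    uint32_t mask;    /* enabled causes, as IMS would report them */
};

/* Writing '1's to ICS artificially raises the chosen causes. */
void irq_model_write_ics(struct irq_model *m, uint32_t value)
{
    m->cause |= value;
}

/* Writing '1's to ICR clears the chosen causes; '0's have no effect. */
void irq_model_write_icr(struct irq_model *m, uint32_t value)
{
    m->cause &= ~value;
}

/* An interrupt is signaled only if some pending cause is enabled. */
int irq_model_asserted(const struct irq_model *m)
{
    return (m->cause & m->mask) != 0;
}
```

With only RXT0 (bit 7) enabled, raising LSC (bit 2) through ICS leaves the interrupt line quiet in this model, while raising RXT0 asserts it until the handler writes that bit back to ICR.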

Our interrupt-handler
We decided to enable all possible causes (and we ‘log’ them via ‘printk()’ messages which we’ve omitted from the code-fragment here):

    irqreturn_t my_isr( int irq, void *dev_id )
    {
        int intr_cause = ioread32( io + E1000_ICR );

        if ( intr_cause == 0 ) return IRQ_NONE;     // not our device's interrupt
        wake_up_interruptible( &wq_rd );            // awaken any sleeping reader
        iowrite32( intr_cause, io + E1000_ICR );    // clear the causes we saw
        return IRQ_HANDLED;
    }

We ‘tweak’ our packet-format
Our ‘xmit1000.c’ driver elected to have the NIC append ‘padding’ to any short packets.
But this prevents a receiver from knowing how many bytes represent actual data.
To solve this problem, we added our own ‘count’ field to each packet’s payload.
Packet layout: destination MAC-address | source MAC-address | Type/Len | count | actual bytes of user-data
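
The layout above can be made concrete with a short user-space sketch that builds such a packet and reads its ‘count’ back. The offsets 14 (count) and 16 (user-data) match the ones used by our ‘read()’ method; the helper names are hypothetical, and two details are assumptions: the Type/Len field is stored in network byte-order, and ‘count’ is stored in host byte-order because the receiver fetches it as an ‘unsigned short’.

```c
#include <stdint.h>
#include <string.h>

enum { MAC_LEN = 6, COUNT_OFFSET = 14, DATA_OFFSET = 16 };

/* Hypothetical helper: lay out one packet in 'frame' and return its
 * length in bytes, or 0 if 'cap' is too small to hold it. */
size_t frame_build(uint8_t *frame, size_t cap,
                   const uint8_t dst[MAC_LEN], const uint8_t src[MAC_LEN],
                   uint16_t type_len, const void *data, uint16_t count)
{
    if (cap < (size_t)DATA_OFFSET + count)
        return 0;
    memcpy(frame, dst, MAC_LEN);                      /* bytes 0..5   */
    memcpy(frame + MAC_LEN, src, MAC_LEN);            /* bytes 6..11  */
    frame[12] = (uint8_t)(type_len >> 8);             /* assumed big-endian */
    frame[13] = (uint8_t)(type_len & 0xFF);
    memcpy(frame + COUNT_OFFSET, &count, sizeof count); /* host byte-order */
    memcpy(frame + DATA_OFFSET, data, count);         /* actual user-data */
    return (size_t)DATA_OFFSET + count;
}

/* Hypothetical helper: fetch the 'count' field from a received packet. */
uint16_t frame_count(const uint8_t *frame)
{
    uint16_t count;

    memcpy(&count, frame + COUNT_OFFSET, sizeof count);
    return count;
}
```

Even if the NIC pads the packet on the wire, the receiver can recover the true data length from ‘frame_count()’ and ignore the padding.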

Our ‘read()’ method

    ssize_t my_read( struct file *file, char *buf, size_t len, loff_t *pos )
    {
        static int      rxhead = 0;     // to remember where we left off
        unsigned char   *from = phys_to_virt( rxdesc[ rxhead ].base_addr );
        unsigned int    count;

        // go to sleep if no new data-packets have been received yet
        if ( ioread32( io + E1000_RDH ) == rxhead )
            if ( wait_event_interruptible( wq_rd,
                    ioread32( io + E1000_RDH ) != rxhead ) ) return -EINTR;

        // get the number of actual data-bytes in the new (possibly padded) data-packet
        count = *(unsigned short*)(from + 14);  // data-count as stored by ‘xmit1000.c’
        if ( count > len ) count = len;         // can’t transfer more bytes than buffer can hold
        if ( copy_to_user( buf, from+16, count ) ) return -EFAULT;

        // advance our static array-index variable to the next receive-descriptor
        rxhead = (1 + rxhead) % 8;              // this index wraps-around after 8 descriptors
        return count;                           // tell kernel how many bytes were transferred
    }
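
The index bookkeeping in ‘my_read()’ can be exercised outside the kernel with a small model. This sketch assumes only what the code above shows: the driver compares its own ‘rxhead’ with the NIC’s RDH, and both indices wrap around after 8 descriptors. The struct and the three functions are hypothetical stand-ins, not part of the driver.

```c
enum { RING_SIZE = 8 };     /* eight descriptors, as in the driver */

struct rx_ring_model {
    unsigned rdh;           /* stands in for the NIC's RDH register */
    unsigned rxhead;        /* the driver's own 'static' index      */
};

/* The driver's test for "a new packet is waiting". */
int ring_has_packet(const struct rx_ring_model *r)
{
    return r->rdh != r->rxhead;
}

/* Model the controller storing one packet and advancing RDH. */
void ring_nic_receive(struct rx_ring_model *r)
{
    r->rdh = (r->rdh + 1) % RING_SIZE;
}

/* Model one successful read(): return the descriptor index that was
 * consumed, or -1 where the real driver would go to sleep. */
int ring_driver_read(struct rx_ring_model *r)
{
    int slot;

    if (!ring_has_packet(r))
        return -1;
    slot = (int)r->rxhead;
    r->rxhead = (1 + r->rxhead) % RING_SIZE;
    return slot;
}
```

One thing the model makes visible: if eight packets arrive while nothing is read, ‘rdh’ wraps all the way around to equal ‘rxhead’ again, so the ring looks empty to the driver even though every buffer has been refilled.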

Hardware’s initialization
We allocate and initialize a minimum-size Receive Descriptor Queue (8 descriptors).
We perform a ‘global reset’ via the RST-bit in the NIC’s Device Control register (with a side-effect of zeroing both RDH and RDT).
We configure the ‘receive’ engine (RCTL) plus a few additional registers that affect the network-controller’s reception-options (namely: RXCSUM, RFCTL, PSRCTL).

Receive Control (0x0100)
[Figure: bit-layout of the Receive Control register (82573L)]
EN = Receive Enable
SBP = Store Bad Packets
UPE = Unicast Promiscuous Enable
MPE = Multicast Promiscuous Enable
LPE = Long Packet reception Enable
LBM = Loopback Mode
RDMTS = Rx-Descriptor Minimum Threshold Size
DTYP = Descriptor Type
MO = Multicast Offset
BAM = Broadcast Accept Mode
BSIZE = Receive Buffer Size
VFE = VLAN Filter Enable
CFIEN = Canonical Form Indicator Enable
CFI = Canonical Form Indicator bit-value
DPF = Discard Pause Frames
PMCF = Pass MAC Control Frames
BSEX = Buffer Size Extension
SECRC = Strip Ethernet CRC
FLXBUF = Flexible Buffer size
Our driver initially will program this register with the value 0x C.
Then later, when everything is ready, it will turn on bit #1 to ‘start the receive engine’.
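
The two-step programming of RCTL (configure first, enable later) comes down to a single bit-operation, sketched here in plain C. Only the bit position (bit #1, EN) is taken from the slide; the macro ‘RCTL_EN’ and the helper functions are hypothetical names, and the configuration value is left as a parameter.

```c
#include <stdint.h>

#define RCTL_EN (UINT32_C(1) << 1)   /* bit #1: Receive Enable */

/* Value to write when 'starting the receive engine': the same
 * configuration as before, with the EN bit turned on. */
uint32_t rctl_start(uint32_t rctl)
{
    return rctl | RCTL_EN;
}

/* Value to write when turning the receive engine off again. */
uint32_t rctl_stop(uint32_t rctl)
{
    return rctl & ~RCTL_EN;
}

/* Nonzero if the given RCTL value has the receive engine enabled. */
int rctl_is_enabled(uint32_t rctl)
{
    return (rctl & RCTL_EN) != 0;
}
```

Keeping EN clear until the descriptors and buffers are ready means the controller cannot begin storing packets into memory that has not yet been set up.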

Packet-Split Rx Control (0x2170)
[Figure: register fields BSIZE3 (in KB), BSIZE2 (in KB), BSIZE1 (in KB), BSIZE0 (in 1/8 KB)]
If the controller is configured to use the packet-split feature (RCTL.DTYP=1), then this register controls the sizes of the four receive-buffers, so there are certain requirements that nonzero values appear in several of these fields.
But our ‘recv1000.c’ driver will use the ‘legacy’ receive-descriptor format (i.e., RCTL.DTYP=0), so this register will be disregarded by the NIC and we are allowed to program it with the value 0x

Receive Filter Control (0x5008)
[Figure: bit-layout of the Receive Filter Control register, with fields including EXSTEN, IPFRSP_DIS, ACKD_DIS, ACKDIS, IPv6XSUM_DIS, IPv6_DIS, NFS_VER, NFSR_DIS, NFSW_DIS, iSCSI_DIS and iSCSI_DWC]
Our driver writes 0x to this register, which among other effects will cause the ethernet controller NOT to write Extended Status information into our device-driver’s legacy-format Receive Descriptors (bit 15: EXSTEN=0).

RX Checksum Control (0x5000)
[Figure: register fields: reserved; TCP/UDP Checksum Off-load enabled (1=yes, 0=no); IP Checksum Off-load enabled (1=yes, 0=no); packet checksum start]
The ‘packet checksum start’ field controls the starting byte for the Packet Checksum calculation.
Our driver programs this register with the value 0x , which disables Checksum Off-loading for TCP/UDP packets (which we won’t be receiving) and for IP packets (which likewise won’t be sent by our ‘xmit1000.c’ driver); all Packet-Checksums will be calculated starting from the very first byte.

Rx-Descriptor Control (0x2828)
[Figure: register fields GRAN, WTHRESH (Writeback Threshold), HTHRESH (Host Threshold), PTHRESH (Prefetch Threshold)]
Recommended for 82573: 0x (GRAN=1, WTHRESH=1)
“This register controls the fetching and write back of receive descriptors. The three threshold values are used to determine when descriptors are read from, and written to, host memory. Their values can be in units of cache lines or of descriptors (each descriptor is 16 bytes), based on the value of the GRAN bit (0=cache lines, 1=descriptors). When GRAN = 1, all descriptors are written back (even if not requested).” -- Intel manual

Maximum-size buffers
We use a minimal number of maximum-size receive-buffers (eight of 1536 bytes).
[Figure: a ring of eight rx-descriptors, each referring to one of eight buffers (buffer 0 through buffer 7) in kernel memory]

NIC “owns” our rx-descriptors
[Figure: the ring of rx-descriptors located by RDBAH/RDBAL, with RDLEN = 0x80, showing the positions of RDH, RDT and rxhead]
RDT: this register gets initialized to 8, then never gets changed.
RDH: this register gets initialized to 0, then gets changed by the controller as new packets are received.
rxhead: our ‘static’ variable.
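
The value RDLEN = 0x80 follows from eight descriptors of 16 bytes each, which this user-space sketch works out. Only the ‘base_addr’ field name appears in our driver code above; the remaining field names are our own labels for the rest of the 16-byte legacy receive-descriptor, and the helper functions are hypothetical.

```c
#include <stdint.h>
#include <string.h>

enum { N_RX_DESC = 8, RX_BUF_SIZE = 1536 };

/* One legacy-format receive-descriptor (16 bytes). Field names other
 * than 'base_addr' are assumptions chosen for this sketch. */
struct rx_descriptor {
    uint64_t base_addr;       /* physical address of the packet buffer */
    uint16_t packet_length;   /* filled in by the NIC                  */
    uint16_t packet_chksum;
    uint8_t  desc_status;
    uint8_t  desc_errors;
    uint16_t vlan_tag;
};

/* The number of bytes occupied by the ring, i.e. the RDLEN value. */
uint32_t rx_ring_bytes(void)
{
    return (uint32_t)(N_RX_DESC * sizeof(struct rx_descriptor));
}

/* Zero every descriptor and point it at its own buffer, assuming the
 * eight buffers are contiguous and start at 'first_buf_phys'. */
void rx_ring_init(struct rx_descriptor *ring, uint64_t first_buf_phys)
{
    memset(ring, 0, N_RX_DESC * sizeof *ring);
    for (int i = 0; i < N_RX_DESC; i++)
        ring[i].base_addr = first_buf_phys + (uint64_t)i * RX_BUF_SIZE;
}
```

A zeroed status byte marks a descriptor as not yet used by the controller, which is why the whole ring is cleared before the buffer addresses are filled in.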

Driver ‘defects’
If an application tries to ‘read’ from our device-file ‘/dev/nic’, but the controller received a packet that contains more bytes of data than the user requested, the excess bytes get “lost” (i.e., discarded).
If an application delays reading packets while the controller continues receiving, then an earlier packet gets “overwritten”.

In-class exercise #1
Discuss with your nearest class-member your ideas for how these driver ‘defects’ might be overcome, so that packet-data being received will be protected against getting “lost” and/or being “overwritten”.

In-class exercise #2
Login to a pair of machines on the ‘anchor’ cluster and install our ‘xmit1000.ko’ and our ‘recv1000.ko’ modules (one on each).
Try transferring a textfile from one of the machines to the other, by using ‘cat’:
    anchor01$ cat textfile > /dev/nic
    anchor02$ cat /dev/nic > recv1000.out
How large a textfile can you successfully transfer using our simple driver-modules?