Architecture and Hardware

1 Architecture and Hardware
APIC Tutorial --- Architecture and Hardware
John DeHart
Washington University

2 Coverage
APIC is a complicated device
No way we can cover everything today.
  In the original workshop we spent one whole day on the APIC architecture and hardware and a second day on the software.
Lots more details in Zubin’s slides from the original workshop:
  go to “Course Slides & Papers” in left margin
Also, papers and documentation from web site.

3 Our Original Goals for the APIC
Build a high speed ATM host interface
  Single Chip
  Low cost
  High Bandwidth
    Gigabit all the way to the application
  Low Latency
  Zero copy
  Support for Quality of Service

4 APIC Features Overview
32 bit and 64 bit PCI at 33MHz
  All of our cards are 32 bit.
Point-to-Point, Multipoint and Loopback VCs
AAL5 Segmentation and Reassembly
AAL0: Raw ATM (RATM)
Support for multiple traffic types
Batching of cells in PCI Transaction
Control via PCI bus and remotely via control cells
Multiple DMA modes
Interrupts and Notification List for efficient interrupt handling
Flow Control: UTOPIA and ATM GFC field

5 APIC Internal Design
[Figure: APIC block diagram. Utopia Ports 0 and 1 each have an Input Port, Input Sync, Output Sync and Output Port; Port 2 has Tx Sync and Rx Sync. Other blocks: VC Translation Table (VCXT), Cell Store, Requestor, Register Manager, Pacer, DataPath, BusInterface, Interrupt/Notification Manager, PCI-32/64 Bus. Data paths and control paths are drawn separately.]

6 APIC Internal Design: 6 Clock Regions
[Figure: the APIC block diagram from slide 5, divided into clock regions A through F.]
A, B, C, D: Link Clocks (typically 62.5 MHz)
E: Bus Clock (PCI: 33 MHz)
F: Internal Clock (85 MHz)

7 APIC Transit Path: ATM Port -> ATM Port
[Figure: the APIC block diagram from slide 5, showing this path.]

8 APIC Receive Path: ATM Port -> Memory
[Figure: the APIC block diagram from slide 5, showing this path.]

9 APIC Transmit Path: Memory -> ATM Port
[Figure: the APIC block diagram from slide 5, showing this path.]

10 APIC Multipoint Receive Path: ATM Port -> *
[Figure: the APIC block diagram from slide 5, showing this path.]

11 APIC Multipoint Transmit Path: Memory -> *
[Figure: the APIC block diagram from slide 5, showing this path.]

12 APIC Loopback Path: Memory -> Memory
[Figure: the APIC block diagram from slide 5, showing this path.]

13 APIC Multipoint Loopback Path: Memory -> *
[Figure: the APIC block diagram from slide 5, showing this path.]

14 APIC Control and Response Cell Path
[Figure: the APIC block diagram from slide 5, showing this path.]

15 APIC and AALs
AAL5
  Frames up to 65535 bytes.
  Used for IP Packets
  Format on next slide
AAL0
  Host can send and receive individual ATM Cells
  Used for:
    communication with raw ATM devices
    sending specially formatted control cells
  APIC uses 56 byte cell format shown on a future slide.

16 AAL5 Frames
AAL5 Frame (total length is a multiple of 48 bytes):
  Field          Length (bytes)
  Packet data    1 to 65535 (maximum from slide 15)
  Padding        0 to 47
  User-to-User   1
  Reserved       1
  Length         2
  CRC            4
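The "multiple of 48 bytes" rule fixes how much padding a frame needs. Below is a minimal Python sketch of that arithmetic, assuming the standard AAL5 trailer sizes (User-to-User 1, Reserved 1, Length 2, CRC 4 bytes). The function and constant names are made up for illustration and are not part of any APIC driver.

```python
AAL5_TRAILER_BYTES = 8    # User-to-User (1) + Reserved (1) + Length (2) + CRC (4)
CELL_PAYLOAD_BYTES = 48   # payload carried by one ATM cell


def aal5_padding(payload_len: int) -> int:
    """Padding bytes needed so payload + padding + trailer is a multiple of 48."""
    if not 1 <= payload_len <= 65535:
        raise ValueError("AAL5 payload must be 1 to 65535 bytes")
    return -(payload_len + AAL5_TRAILER_BYTES) % CELL_PAYLOAD_BYTES


def aal5_cell_count(payload_len: int) -> int:
    """Number of ATM cells needed to carry one AAL5 frame."""
    total = payload_len + aal5_padding(payload_len) + AAL5_TRAILER_BYTES
    return total // CELL_PAYLOAD_BYTES
```

For example, a 40-byte packet needs no padding and fits in one cell, while a 41-byte packet needs 47 bytes of padding and two cells.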

17 AAL0 Frames
One Cell, 56 Bytes
  Internally, 56 bytes. When it goes out onto the ATM Link, of course it is 53 bytes
AAL0 Frame: APIC header (4 bytes), ATM header (4 bytes), payload (48 bytes)
APIC AAL0 Header:
  [Figure: 32-bit header with bit positions 0, 8, 16, 24, 31 marked; fields ChanId, L, C, pOut, pIn]
  pIn: Port In
  pOut: Port Out
  C: Control Cell
  L: Low Delay
  ChanId: Channel Id

18 APIC Traffic Types
Transmit
  Low Delay
    highest priority
    transmitted at link rate (APIC Global Pacing Rate)
  Paced
    transmitted at rate configured for channel
    rates independently configurable for each channel
  Best Effort
    lowest priority
    can use whatever bandwidth is left after low delay and paced channels
Receive
  Low Delay
    Strictly higher priority than Normal Delay
  Normal Delay
    Only serviced when all Low Delay queues are empty

19 APIC Descriptors and Buffers
[Figure: chain of descriptors, with the Current Descriptor marked, pointing to Full Buffers, a Partially Filled Buffer and an Empty Buffer.]
Buffer Descriptor points to a buffer queued for sending data from or receiving data into
Buffer Descriptor contains:
  Address of buffer
    physical address: PCI bus operates on physical not virtual memory
  Buffer Length
  Link to next descriptor
  Flags

20 Buffer Details
Receive Buffers:
  8-byte aligned and a multiple of 8 bytes in length
  CAVEAT: RX Sync Bug
    AAL0 buffers should be multiple of 56 bytes in length
    AAL5 buffers should be multiple of 48 bytes in length
  Single AAL5 frame can span multiple buffers
  No buffer can contain data from more than one AAL5 frame
  EndOfFrame bit (E) set in buffer containing the last 8 bytes of the AAL5 frame.
    with caveat above, this expands to be the last cell of the AAL5 frame
  Multiple AAL0 frames can occupy the same buffer
  Single AAL0 frame can span multiple buffers
    BUT because of caveat above, this won’t happen. Buffers for AAL0 will be completely filled
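The receive-buffer rules above are easy to check before a buffer is queued. A minimal Python sketch of such a check follows; the function name, the message strings and the "AAL0"/"AAL5" keys are assumptions made for illustration and do not come from the APIC drivers.

```python
RX_ALIGNMENT = 8                       # receive buffers are 8-byte aligned
RX_LENGTH_UNIT = {"AAL0": 56,          # RX Sync caveat: whole 56-byte APIC cells
                  "AAL5": 48}          # RX Sync caveat: whole 48-byte cell payloads


def rx_buffer_problems(phys_addr: int, length: int, aal: str) -> list:
    """Return a list of rule violations for one receive buffer (empty if OK)."""
    problems = []
    if phys_addr % RX_ALIGNMENT:
        problems.append("address is not 8-byte aligned")
    if length <= 0 or length % RX_ALIGNMENT:
        problems.append("length is not a positive multiple of 8")
    if length <= 0 or length % RX_LENGTH_UNIT[aal]:
        problems.append("length is not a multiple of %d (RX Sync caveat)"
                        % RX_LENGTH_UNIT[aal])
    return problems
```

Because 56 and 48 are themselves multiples of 8, any buffer that satisfies the RX Sync caveat also satisfies the length rule; the check keeps them separate so the report names the rule that was broken.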

21 Buffer Details
Transmit Buffers:
  Need not be aligned on word boundaries
    But our drivers always align them…
  Can be of any length
  Single AAL5 frame can span multiple buffers
  No buffer can contain data from more than one AAL5 frame
  EndOfFrame bit (E) set in buffer containing first byte of the last cell for the AAL5 frame.
  Multiple AAL0 frames can occupy the same buffer
  A single AAL0 frame can span multiple buffers
  All buffers will be completely transmitted unless there is an error

22 Descriptor Details
All descriptors MUST reside in a block of contiguous physical memory, 1MB or less
All descriptors MUST be 16-byte aligned
APIC global register, descriptor area pointer register, must contain the address of this block of memory
Think of the descriptor area as an array of descriptors
  nextDescOfs field in the descriptors is an index into the descriptor array
  16 bit index -> 65536 descriptors possible
  65536 descriptors * 16 bytes per descriptor = 1MB
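Treating the descriptor area as an array means a descriptor's physical address follows directly from its index. A minimal Python sketch of that calculation, with hypothetical names (`descriptor_address`, `area_base`) chosen for illustration:

```python
DESCRIPTOR_BYTES = 16          # each descriptor is 16 bytes and 16-byte aligned
MAX_DESCRIPTORS = 1 << 16      # nextDescOfs is a 16-bit index


def descriptor_address(area_base: int, index: int) -> int:
    """Physical address of descriptor `index` in the descriptor area.

    `area_base` stands for the value written to the descriptor area
    pointer register.
    """
    if area_base % DESCRIPTOR_BYTES:
        raise ValueError("descriptor area must be 16-byte aligned")
    if not 0 <= index < MAX_DESCRIPTORS:
        raise ValueError("index must fit in 16 bits")
    return area_base + index * DESCRIPTOR_BYTES
```

The largest index, 65535, lands 16 bytes short of the 1MB limit, which is where the "1MB or less" bound comes from.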

23 APIC Receive Descriptor
[Figure: receive descriptor layout with these fields:]
  BufAddrLo (physical address)
  BufAddrHi (physical address)
  Flag bits: E C Y T X L V I O S
  Match/TCP_Checksum
  BufLen
  NextDescOfs
We’ll look at the Y field …
For more details, see Zubin’s original workshop slides

24 APIC Transmit Descriptor
[Figure: transmit descriptor layout with these fields:]
  BufAddrLo (physical address)
  BufAddrHi (physical address)
  Flag bits: E Y T V I O S
  TCRC
  Match
  BufLen
  NextDescOfs
We’ll look at the Y field next …
For more details, see Zubin’s original workshop slides

25 Sync Bits (Y Field) of APIC Descriptor
Sync (Y) Bits: Implement Ready/Done
  0 -> DONE_VALIDLINK
    APIC is finished with this descriptor and its link to the next descriptor is valid
  1 -> DONE_INVALIDLINK
    APIC is done with this descriptor BUT its link to the next descriptor is not valid!
    Be Careful of this one
  2 -> NOT_READY
    Not ready for the APIC to use
    The last descriptor in a chain is always marked NOT_READY by the driver
  3 -> READY
    Ready for the APIC to use
    Set in Receive Descriptors in a chain for APIC to use
    Set in Transmit Descriptors that are ready for the APIC to send
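The four Y values form a small ready/done handshake between driver and APIC. The Python sketch below models it on the host side only. The `Descriptor` class, `build_chain` and `may_follow_link` are hypothetical helpers, and letting the final descriptor link to itself is an assumption; the slides only say that the last descriptor is marked NOT_READY.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sync(IntEnum):
    DONE_VALIDLINK = 0     # APIC finished, link to next descriptor is valid
    DONE_INVALIDLINK = 1   # APIC finished, link must NOT be followed
    NOT_READY = 2          # APIC must not use this descriptor
    READY = 3              # APIC may use this descriptor


@dataclass
class Descriptor:
    buf_addr: int          # physical address of the buffer
    buf_len: int
    next_desc_ofs: int     # index into the descriptor array
    sync: Sync


def build_chain(buffers, first_index=0):
    """Build a chain over (addr, len) pairs: all READY except the last."""
    chain = []
    last = len(buffers) - 1
    for i, (addr, length) in enumerate(buffers):
        is_last = i == last
        # Assumption: the final descriptor links to itself until extended.
        next_ofs = first_index + i if is_last else first_index + i + 1
        chain.append(Descriptor(addr, length, next_ofs,
                                Sync.NOT_READY if is_last else Sync.READY))
    return chain


def may_follow_link(desc: Descriptor) -> bool:
    """True only when the APIC is done AND the link field can be trusted."""
    return desc.sync == Sync.DONE_VALIDLINK
```

A driver walking completed descriptors would stop at the first one for which `may_follow_link` is false, which covers both the DONE_INVALIDLINK case and descriptors the APIC has not finished yet.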

26 APIC DMA Modes
Simple DMA
  Separate queue of buffer descriptors for each connection
  works well for transmit
  Inefficient for receive
    no sharing of receive buffers and descriptors
Pool DMA
  multiple connections share a pool of buffer descriptors
  works well for receive
    caveat: one connection can use up all the buffer descriptors
  obviously, does not work for transmit
Protected DMA
  queueing operations executed by user-space driver
  pair of descriptors associated with each buffer:
    kernel descriptor
    user descriptor
  See details in Zubin’s original workshop slides.

27 Simple DMA

28 Pool DMA

29 APIC Interrupts and Notifications
Interrupts used to report an asynchronous event:
  completion of transmission/reception of a frame
  error condition
Interrupts can be enabled/disabled per channel
Notification List contains list of channels that have had events.
  APIC issues an interrupt and disables further interrupts until processor re-enables.
  subsequent events will just set an entry in notification list.
  This reduces frequency of interrupts
  This can also help reduce overhead of interrupt processing.
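The interrupt/notification scheme can be pictured with a toy model. The Python sketch below is a host-side simulation only, not APIC register-level behavior. The class and method names are invented, and recording each channel at most once in the list is an assumption.

```python
class NotificationModel:
    """Toy model of interrupt coalescing through a notification list."""

    def __init__(self):
        self.interrupts_enabled = True
        self.notification_list = []
        self.interrupts_raised = 0

    def event(self, channel: int) -> None:
        """An event (frame done or error) occurs on `channel`."""
        # Assumption: a channel appears at most once in the list.
        if channel not in self.notification_list:
            self.notification_list.append(channel)
        if self.interrupts_enabled:
            self.interrupts_raised += 1
            self.interrupts_enabled = False   # stays off until re-enabled

    def service(self) -> list:
        """Processor drains the list and re-enables interrupts."""
        pending, self.notification_list = self.notification_list, []
        self.interrupts_enabled = True
        return pending
```

Four events on channels 3, 7, 3 and 9 before the processor responds cost one interrupt in this model, and a single call to `service()` returns all three channels.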

30 APIC Memory Mapped Register Space

31 APIC Register Addresses
27 bit address space On PCI Bus, high order 5 bits are device select These are programmed into the APIC PCI Configuration space at boot time by the BIOS RegID 00 14 9 2 Global Registers (i.e. not per channel):

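32 APIC Register Addresses (continued)
Kernel Access Per-channel Registers:
  [Figure: address bit layout; fields: 10, t, CID, RegID, 00; field widths shown: 2 8 8 9 2]
User Access Per-channel Registers:
  [Figure: address bit layout; fields: 11, t, CID, RegID, 00; field widths shown: 2 8 8 9 2]
t=0 -> Rx Channel, t=1 -> Tx Channel
CID: Channel Index or VCI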

33 APIC Pacing: General Stuff
Pacing is for Transmit Channels only
Cells are NOT Paced out onto the wire
  Not Exactly
  Pacing is done on the PCI bus
Pacing is not a Guarantee, it is just a Restriction
Pacing Calculations include the ATM headers
  But not the APIC header

34 APIC Pacing: General Stuff
Two pacer controls:
  Global Pacing
    APIC Pacing Parameter register (Global, 0x208)
  Per VC Pacing
    TX Channel Pacing Parameter Register (TX, 0x500XX68)
    XX is the Channel ID
Three types of Channels:
  Low Delay (Highest Priority)
  Paced
  Best Effort (Lowest Priority)
All channels are paced by the Global Pacing
Paced Channels also use Per VC Pacing

35 APIC Data Transfers
APIC pulls data from memory across the PCI bus in Batches of cells.
  The number of cells in a Batch is controlled by a register
The Pacer identifies when it is time to transmit data and which connection should transmit
  Pacer “wakes up” every 14 PCI Bus clock ticks
  checks to see if it is time to transmit
    Controlled by the Global APIC Pacing Parameter (APP)
  If it is time to transmit, it takes the first connection off the previously sorted list of keys and transmits its data.
A lot of gory details about keys and heap storage of connections are not going to be included here.
  Read Rex’s documentation and/or read the VHDL if you want that level of detail

36 Global Pacing Parameter
Pacing parameters are 24 bits
  16 bits of Integer
  8 bits of fractional part
Global APIC Pacing Parameter (APP):
  APP = (256 * BatchSz * 53 * 8 * 8192 * InternalClockMhz) / (14 * ClockEstimate * LinkRateMbps)
[Items in formula explained on next slide]

37 Explanation of Expression
APP = (256 * BatchSz * 53 * 8 * 8192 * InternalClockMhz) / (14 * ClockEstimate * LinkRateMbps)
256: shifts left by 8 bits to set “decimal point”
BatchSz: How many cells per transfer
53*8: Translate cells/second into bits/second
8192, InternalClockMhz (85MHz), ClockEstimate:
  APIC counts how many of its internal 85MHz clock ticks take place during the time it takes for 8192 PCI bus clock ticks. This value is the ClockEstimate.
  PCI Bus Clock Rate in MHz = (8192 * 85)/ClockEstimate
14: # of PCI Bus Ticks in a Pacer Period
LinkRateMbps: Our target rate
[Example on next 2 slides]
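The APP expression translates directly into code. A minimal Python sketch follows, assuming the result is truncated to an integer (the slides do not say how rounding is done). The function names are hypothetical. The ClockEstimate of 20955 in the usage note is an assumed value, about a 33.2 MHz PCI clock, picked because it reproduces the APP = 2061 of the 1Gb/s example; it is not a value taken from the slides.

```python
PACER_PERIOD_TICKS = 14        # PCI bus ticks per pacer period
ESTIMATE_WINDOW_TICKS = 8192   # PCI ticks over which ClockEstimate is counted
INTERNAL_CLOCK_MHZ = 85        # APIC internal clock
CELL_BITS = 53 * 8             # bits per ATM cell on the link


def pci_clock_mhz(clock_estimate: int) -> float:
    """PCI bus clock rate implied by a ClockEstimate reading."""
    return ESTIMATE_WINDOW_TICKS * INTERNAL_CLOCK_MHZ / clock_estimate


def apic_pacing_parameter(batch_sz: int, clock_estimate: int,
                          link_rate_mbps: int) -> int:
    """Global APP as a 24-bit value with 8 fractional bits."""
    numerator = (256 * batch_sz * CELL_BITS
                 * ESTIMATE_WINDOW_TICKS * INTERNAL_CLOCK_MHZ)
    denominator = PACER_PERIOD_TICKS * clock_estimate * link_rate_mbps
    app = numerator // denominator      # assumption: truncate, do not round
    if app >= 1 << 24:
        raise ValueError("APP does not fit in 24 bits")
    return app


def split_fixed_point(app: int):
    """Return (integer part, fractional part in 1/256 units)."""
    return app >> 8, app & 0xFF
```

With the assumed inputs, `apic_pacing_parameter(8, 20955, 1000)` gives 2061 (0x80D), which `split_fixed_point` breaks into an integer part of 8 and a fraction of 13/256.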

38 Example: Units in the APP Formula
APP = (256 * BatchSz * 53 * 8 * 8192 * InternalClockMhz) / (14 * ClockEstimate * LinkRateMbps)
Units: (256 * Cells * Bytes/Cell * Bits/Byte * 8192 * M/sec) / (14 * 1 * MBits/sec)

39 Example: APP for 1Gb/s Link Rate
APP = (256 * BatchSz * 53 * 8 * 8192 * InternalClockMhz) / (14 * ClockEstimate * LinkRateMbps)
BatchSz = 8
53*8: Translate cells/second into bits/second
InternalClockMhz = 85MHz
ClockEstimate = (typical value)
LinkRateMbps: 1000 (1000 Mb/s == 1Gb/s)
APP = (256 * 8 * 53 * 8 * 8192 * 85) / (14 * ClockEstimate * 1000)
APP = 2061 = 0x80D

40 Example: APP for 1Gb/s Link Rate
APP = 2061 = 0x80D
The integer part of APP is 8 (0x80D >> 8), i.e. 8 pacer periods of 14 PCI Bus clock ticks each.
This means that every 14*8 = 112 PCI Bus clock ticks the APIC will be able to pull 8 Cells worth of data across the PCI Bus.
(8 Cells)/(112 * 30ns) = (3392 bits)/(3360ns) ~= 1Gb/s
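The rate check on this slide can be reproduced in a few lines. The Python sketch below follows the slide in using only the integer part of APP and a 30 ns PCI clock tick. The function name and parameter names are made up for illustration.

```python
def paced_rate_gbps(app: int, batch_sz: int, pci_tick_ns: float = 30.0) -> float:
    """Approximate transfer rate implied by an APP value.

    Uses only the integer part of the 16.8 fixed-point APP, as the slide does.
    Bits per nanosecond is numerically the same as Gb/s.
    """
    pacer_periods = app >> 8                 # 0x80D >> 8 == 8
    if pacer_periods == 0:
        raise ValueError("APP integer part must be at least 1")
    pci_ticks = 14 * pacer_periods           # 14 PCI ticks per pacer period
    bits = batch_sz * 53 * 8                 # whole cells, ATM header included
    return bits / (pci_ticks * pci_tick_ns)
```

For `app=0x80D` and `batch_sz=8` this evaluates 3392 bits over 3360 ns, slightly above 1 Gb/s.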

41 Per VC Pacing
Per VC Pacing Parameter
  What portion of the full link rate can be used
  e.g. an integer value of 2 means that this channel can use half the link rate
Conceptually like this:
  [Figure: chain of counters. 33 MHz PCI Bus Clock -> Count to 14 -> Count to APP -> Count to TX Pacing Parameter -> This Tx Channel is Ready to Transmit BATCH Cells]

42 Per VC Pacing
[Figure: timeline in units of one APIC Pacing Period, with expired connections marked X up to the current pacedTime; example uses vcPacingParameter ~ 10.]
oldExpirationTime + vcPacingParameter -> newExpirationTime

43 pacedTime
pacedTime is incremented every global pacing cycle in which a non-LowDelay connection wins contention
Example with two connections:
  (L) Low Delay at 1/24th of the global rate
  (P) Paced at 1/6th of the global rate
[Figure: timeline marked at 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72, 78, 84, showing when L and P fire.]

44 pacedTime (continued)
[Figure: timeline marked at 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72, 78, 84, showing L and P.]
We might expect the Paced channel to miss its exact turn and fire on the next global pacing interval but keep its next expiration on the (0,6,12,18,…) boundaries. But…

45 pacedTime (continued)
[Figure: timeline of L and P firings against both clocks.]
pacedTime:   t+ 5 11 17 22 28 34 40 45 51 57 63 68 74 80
“Real” time: t+ 6 12 18 24 30 36 42 48 54 60 66 72 78 84
Actual rate for Paced connection:
  (GlobalRate) * (3*(1/6) + 1*(1/7))/4
  (GlobalRate) * (.1607)
For a Global Rate of 24Mb/s (DQ test example): 24 * .1607 ~= 3.86 Mb/s

46 Example of a Pacing Oddity
Suppose we have a channel on which we are sending single cell packets at a rate of 2 cells every pacing period for that channel, and the BATCH size is 1 cell, so that the channel should only send 1 cell during each pacing period.
[Figure: timeline with a row of D markers.]
You would expect the connection to build up a backlog, but it doesn’t…

47 Example of a Pacing Oddity (con’t)
Turns out the Driver does a RESUME each time it puts data in an empty transmit queue to restart it.
A RESUME causes the ExpireTime to be set to the current PacedTime.
This causes the channel to be expired at the very next Pacer Period.
Thus the channel transmits at twice its expected rate
[Figure: timeline of alternating D and T markers, with a row of R markers beneath.]
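The effect of RESUME can be reproduced with a very small simulation. The Python sketch below is a simplified model, not the APIC pacer: time is counted in pacer periods, one cell arrives every `arrival_interval` periods, the batch size is 1, and a channel transmits in the same period in which it expires. All names are hypothetical.

```python
def simulate(ticks: int, vc_period: int, arrival_interval: int,
             resume_on_empty: bool):
    """Return (cells sent, cells still queued) after `ticks` pacer periods."""
    queue = 0
    sent = 0
    expire_time = 0
    for t in range(ticks):
        if t % arrival_interval == 0:
            if queue == 0 and resume_on_empty:
                # RESUME: ExpireTime is set to the current PacedTime.
                expire_time = t
            queue += 1
        if queue > 0 and t >= expire_time:
            queue -= 1                   # BATCH size of 1 cell
            sent += 1
            # oldExpirationTime + vcPacingParameter -> newExpirationTime
            expire_time += vc_period
    return sent, queue
```

With a per-VC period of 10 and a cell arriving every 5 periods, `simulate(100, 10, 5, resume_on_empty=False)` sends 10 cells and leaves 10 queued, while the same run with `resume_on_empty=True` sends all 20 and leaves none: the queue is empty at every arrival, so every arrival triggers a RESUME.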

48 APIC Bugs and Caveats: RxSync
RxSync Lockup when buffers too short
  APIC is receiving data for a connection.
  APIC runs out of buffers when there is still data left
  If this happens repeatedly, under certain conditions the APIC’s Rx-Sync module can lock up.
Example: if we have three 16-byte buffers set up to receive one 56-byte AAL0 cell (remember that the APIC AAL0 cell size is 56 bytes), then each time we receive a cell with these buffers we will have 8 bytes left over that the APIC SHOULD throw away. After the eighth time we use this chain of buffers to receive a cell, the APIC locks up.
A similar problem exists for AAL5.
Bug has not been identified in VHDL
Work-arounds:
  For AAL0, always allocate buffers in multiples of 56 bytes.
  For AAL5, always allocate buffers in multiples of 48 bytes.

49 APIC Bugs and Caveats: Word Swap
APIC swaps contiguous 32bit words when receiving data into host memory.
  Exists in APIC when used in Intel architectures
  Exists only in 32bit PCI mode
Bug has been identified in VHDL
  but we aren’t going to respin the chip…
Work-arounds:
  Driver performs a word swap on all data received.
    painful and costly data touch
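The driver-side work-around amounts to exchanging neighbouring 32-bit words in every received buffer. A minimal Python sketch of that operation follows, assuming the swapped words form aligned 8-byte pairs (consistent with receive buffers being multiples of 8 bytes, but not stated on the slide). The function name is hypothetical and this is not the actual driver code.

```python
def swap_word_pairs(data: bytes) -> bytes:
    """Exchange the two 32-bit words inside every aligned 8-byte group."""
    if len(data) % 8:
        raise ValueError("length must be a multiple of 8 bytes")
    out = bytearray(len(data))
    for i in range(0, len(data), 8):
        out[i:i + 4] = data[i + 4:i + 8]     # second word moves first
        out[i + 4:i + 8] = data[i:i + 4]     # first word moves second
    return bytes(out)
```

Applying the function twice returns the original bytes. Every received byte is read and written once, which is the "costly data touch" the slide mentions.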

50 APIC Bugs and Caveats: ILR
Bug in APIC decode of Interrupt Line Register address on writes
  ILR is at 0x3C
  BIOS writes IRQ value to ILR register and then reads it back to see if this is a functioning PCI device. If it doesn’t read back properly, it “removes” this device from the PCI bus
  BIOS write to 0x3C enters APIC as write to 0x7C
  reads of 0x3C are ok.
Bug has been identified in VHDL.
Work-around implemented on NICs and SPCs
  you should never have to worry about this one…
