Washington WASHINGTON UNIVERSITY IN ST LOUIS Overview of the APIC Pacer.


2 Washington WASHINGTON UNIVERSITY IN ST LOUIS John DeHart- 12/6/2015 3:38 PM
APIC Pacing: General Stuff
Pacing is for Transmit Channels only
Cells are NOT Paced out onto the wire
– Not Exactly
Pacing is done on the PCI bus
Pacing is not a Guarantee, it is just a Restriction
Pacing Calculations include the ATM headers
– But not the APIC header
More APIC Info:

3 APIC Pacing: General Stuff
Two pacer controls:
– Global Pacing: APIC Pacing Parameter register (Global, 0x208)
– Per VC Pacing: TX Channel Pacing Parameter Register (TX, 0x500XX68), where XX is the Channel ID
Three types of Channels:
– Low Delay (Highest Priority)
– Paced
– Best Effort (Lowest Priority)
All channels are paced by the Global Pacing
Paced Channels also use Per VC Pacing
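The register addressing above can be sketched as follows. This is only an illustration: it assumes that XX is a one-byte Channel ID written as the two hex digits shown in 0x500XX68, and the function name is hypothetical.

```python
GLOBAL_APP_REGISTER = 0x208      # Global APIC Pacing Parameter register (from the slide)
TX_PACING_TEMPLATE = 0x5000068   # 0x500XX68 with XX set to 00


def tx_pacing_register(channel_id: int) -> int:
    """Return the per-VC pacing register address for a channel.

    Assumption (not stated in the slides): XX is an 8-bit channel ID
    occupying the third and fourth hex digits from the right.
    """
    if not 0 <= channel_id <= 0xFF:
        raise ValueError("channel_id must fit in the two hex digits XX")
    return TX_PACING_TEMPLATE | (channel_id << 8)


print(hex(tx_pacing_register(0x12)))
```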

4 APIC Data Transfers
APIC pulls data from memory across the PCI bus in Batches of cells.
– The number of cells in a Batch is controlled by a register
The Pacer identifies when it is time to transmit data and which connection should transmit
Pacer “wakes up” every 14 PCI Bus clock ticks
– It checks to see if it is time to transmit (controlled by the Global APIC Pacing Parameter (APP))
– If it is time to transmit, it takes the first connection off the previously sorted list of keys and transmits its data.
The gory details about keys and heap storage of connections are not included here. Read Rex’s documentation and/or read the VHDL if you want that level of detail.

5 Global Pacing Parameter
Pacing parameters are 24 bits
– 16 bits of integer part
– 8 bits of fractional part
Global APIC Pacing Parameter (APP):
APP = (256 * BatchSz * 53 * 8 * 8192 * InternalClockMhz) / (14 * ClockEstimate * LinkRateMbps)
[Items in formula explained on next slide]

6 Explanation of Expression
APP = (256 * BatchSz * 53 * 8 * 8192 * InternalClockMhz) / (14 * ClockEstimate * LinkRateMbps)
256: shifts left by 8 bits to set the “decimal point”
BatchSz: How many cells per transfer
53*8: Translate cells/second into bits/second
8192, InternalClockMhz (85MHz), ClockEstimate:
– APIC counts how many of its internal 85MHz clock ticks take place during the time it takes for 8192 PCI bus clock ticks. This value is the ClockEstimate.
– PCI Bus Clock Rate in MHz = (8192 * 85)/ClockEstimate
14: # of PCI Bus Ticks in a Pacer Period
LinkRateMbps: Our target rate
[Example on next 2 slides]
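A minimal sketch of the APP expression and the PCI clock relation, for experimenting with the terms explained above. The function names are hypothetical, truncating the result to an integer is an assumption, and the ClockEstimate of 20955 used in the usage line is an illustrative value chosen only because it reproduces APP = 2061; it is not taken from the slides.

```python
INTERNAL_CLOCK_MHZ = 85      # APIC internal clock (from the slide)
PCI_TICKS_COUNTED = 8192     # PCI ticks over which ClockEstimate is measured
PACER_PERIOD_TICKS = 14      # PCI bus ticks in one Pacer period
BITS_PER_CELL = 53 * 8       # ATM cell size in bits


def pci_clock_mhz(clock_estimate):
    """PCI Bus Clock Rate in MHz = (8192 * 85) / ClockEstimate."""
    return PCI_TICKS_COUNTED * INTERNAL_CLOCK_MHZ / clock_estimate


def global_app(batch_sz, clock_estimate, link_rate_mbps):
    """Global APIC Pacing Parameter as a 16.8 fixed-point register value.

    Assumption: the result is truncated (floor division); the slides do
    not say how the value is rounded.
    """
    numerator = (256 * batch_sz * BITS_PER_CELL
                 * PCI_TICKS_COUNTED * INTERNAL_CLOCK_MHZ)
    denominator = PACER_PERIOD_TICKS * clock_estimate * link_rate_mbps
    return numerator // denominator


# Hypothetical ClockEstimate, not from the source.
print(hex(global_app(batch_sz=8, clock_estimate=20955, link_rate_mbps=1000)))
```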

7 Example: Units in the APP Formula
APP = (256 * BatchSz * 53 * 8 * 8192 * InternalClockMhz) / (14 * ClockEstimate * LinkRateMbps)
APP = (256 * Cells * Bytes/Cell * Bits/Byte * 8192 * M/sec) / (14 * 1 * MBits/sec)

8 Example: APP for 1Gb/s Link Rate
APP = (256 * BatchSz * 53 * 8 * 8192 * InternalClockMhz) / (14 * ClockEstimate * LinkRateMbps)
BatchSz = 8
53*8: Translate cells/second into bits/second
InternalClockMhz = 85MHz
ClockEstimate = (typical value)
LinkRateMbps: 1000 (1000 Mb/s == 1Gb/s)
APP = (256 * 8 * 53 * 8 * 8192 * 85) / (14 * ClockEstimate * 1000)
APP = 2061 = 0x80D

9 Example: APP for 1Gb/s Link Rate
APP = 2061 = 0x80D
This means that every 14*8 = 112 PCI Bus clock ticks the APIC will be able to pull 8 Cells worth of data across the PCI Bus.
(8 Cells)/(112 * 30ns) = (3392 bits)/(3360ns) ~= 1Gb/s
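The arithmetic on this slide can be checked with a short sketch. It splits the 24-bit value into its 16-bit integer and 8-bit fractional parts and, like the slide, uses only the integer part for the rate estimate. The helper names and the 30 ns default tick are illustrative choices.

```python
def split_app(app):
    """Split a 16.8 fixed-point pacing value into (integer, fraction) parts."""
    return app >> 8, app & 0xFF


def batch_rate_gbps(app, batch_sz, pci_tick_ns=30.0):
    """Approximate rate when one Batch is pulled every 14 * integer_part PCI ticks.

    The fractional part is ignored here, mirroring the slide's estimate.
    """
    integer_part, _ = split_app(app)
    bits = batch_sz * 53 * 8                       # bits in one Batch of cells
    period_ns = 14 * integer_part * pci_tick_ns    # time between Batches
    return bits / period_ns                        # bits per ns equals Gb/s


print(split_app(0x80D))
print(batch_rate_gbps(0x80D, batch_sz=8))
```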

10 Per VC Pacing
Per VC Pacing Parameter
– What portion of the full link rate can be used
– e.g. an integer value of 2 means that this channel can use half the link rate
If you change the Pacing Parameter for a VC:
– It takes effect the next time the OLD pacing expires!!!
Conceptually like this:
[Diagram: 33 MHz PCI Bus Clock -> Count to 14 -> Count to APP -> Count to TX Channel Pacing Parameter -> This Tx Channel is Ready to Transmit BATCH Cells]
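As a rough illustration of the per-VC parameter, the sketch below treats it as a divisor of the link rate. It assumes the per-VC register uses the same 16.8 fixed-point layout described for pacing parameters earlier in the deck; that layout for the per-VC register, the rounding, and the function names are assumptions.

```python
def vc_rate_mbps(link_rate_mbps, vc_pacing_value):
    """Rate available to a channel: a pacing value of 2 gives half the link rate."""
    return link_rate_mbps / vc_pacing_value


def encode_fixed_16_8(value):
    """Encode a pacing value as 16 integer bits plus 8 fractional bits.

    Assumption: the per-VC register shares the 24-bit format of the
    global parameter, and values are rounded to the nearest 1/256.
    """
    raw = round(value * 256)
    if not 0 <= raw < (1 << 24):
        raise ValueError("value does not fit in 24 bits")
    return raw


print(vc_rate_mbps(1000, 2), hex(encode_fixed_16_8(2)))
```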

11 Per VC Pacing
[Timeline figure: expired connections marked X along a time axis, with the current pacedTime and One APIC Pacing Period indicated; vcPacingParameter ~ 10]
oldExpirationTime + vcPacingParameter -> newExpirationTime
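The expiration update can be illustrated with a small priority-queue model. This is a simplified sketch, not the APIC's actual key and heap logic (which the deck defers to Rex's documentation and the VHDL): it ignores the Low Delay and Best Effort classes, advances pacedTime on every period, and serves at most one expired connection per period. All names are hypothetical.

```python
import heapq


def run_pacer(channels, periods):
    """Simulate paced channels; channels maps name -> vcPacingParameter.

    Returns a list of (pacedTime, channel) transmissions.
    """
    # Every channel starts expired at pacedTime 0.
    heap = [(0, name) for name in sorted(channels)]
    heapq.heapify(heap)
    schedule = []
    for paced_time in range(periods):
        # Serve the connection with the earliest expiration, if it has expired.
        if heap and heap[0][0] <= paced_time:
            old_expiration, name = heapq.heappop(heap)
            schedule.append((paced_time, name))
            # newExpirationTime = oldExpirationTime + vcPacingParameter
            heapq.heappush(heap, (old_expiration + channels[name], name))
    return schedule


print(run_pacer({"A": 2, "B": 4}, periods=8))
```

In this model a channel that is served late still has its next expiration computed from the old expiration time rather than from the time it actually transmitted.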

12 pacedTime
pacedTime is incremented every global pacing cycle in which a non-LowDelay connection wins contention
Example with two connections:
– (L) Low Delay at 1/24th of the global rate
– (P) Paced at 1/6th of the global rate
[Timeline figure: transmission slots for L and P]

13 pacedTime (continued)
[Timeline figure: transmission slots for P and L]
We might expect the Paced channel to miss its exact turn and fire on the next global pacing interval but keep its next expiration on the (0,6,12,18,…) boundaries. But…

14 pacedTime (continued)
[Timeline figure: P and L transmission slots plotted against “Real” time and pacedTime]
Actual rate for Paced connection:
– (GlobalRate) * (3*(1/6) + 1*(1/7))/4
– (GlobalRate) * (.1607)
For a Global Rate of 24Mb/s (DQ test example):
– 24 * .1607 ≈ 3.86 Mb/s
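The rate expression above is easy to reproduce. The sketch below evaluates it exactly as written on the slide, as the average of the per-interval rates for three intervals of 6 global cycles and one of 7. The function name is hypothetical.

```python
def paced_fraction(intervals):
    """Average of 1/interval over the observed intervals (the slide's expression)."""
    return sum(1 / n for n in intervals) / len(intervals)


fraction = paced_fraction([6, 6, 6, 7])   # (3*(1/6) + 1*(1/7)) / 4
rate_mbps = 24 * fraction                 # Global Rate of 24 Mb/s

print(round(fraction, 4), round(rate_mbps, 3))
```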

15 Per VC Pacing
If you change the Pacing Parameter for a VC:
– It takes effect the next time the OLD pacing expires!!!
– A Resume of the VC will cause an immediate change to the new Pacing Parameter
This artifact causes some strange behavior for DQ when there are large changes in the Per-VC pacing parameters.
– Going from a low rate to a high rate can be delayed by several DQ periods.
– This can cause the results to appear flawed.
See the next slide for an illustration.

16 Per VC Pacing
[Timeline figure: X marks along a time axis. Labels: “Just Transmitted”; “Next Transmission by old Pacing Parameter”; “Right after, Want to Change Pacing so it will Transmit every other Pacing Interval”; “These will Transmit”; “This will not transmit”]
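The delayed parameter change can be modeled in a few lines. This is a conceptual sketch under stated assumptions: time is counted in pacing periods, a new parameter is held as pending until the old expiration is reached, and a Resume applies it at once and sets the expiration to the current pacedTime. The class and method names are hypothetical.

```python
class TxChannel:
    """Toy model of one paced transmit channel."""

    def __init__(self, pacing, expire_time=0):
        self.pacing = pacing
        self.pending = None           # new parameter waiting to take effect
        self.expire_time = expire_time

    def set_pacing(self, new_pacing):
        # Takes effect only when the OLD pacing expires.
        self.pending = new_pacing

    def resume(self, paced_time):
        # Resume applies the new parameter immediately and expires the channel now.
        if self.pending is not None:
            self.pacing, self.pending = self.pending, None
        self.expire_time = paced_time

    def tick(self, paced_time):
        """Return True if the channel transmits in this pacing period."""
        if paced_time < self.expire_time:
            return False
        if self.pending is not None:
            self.pacing, self.pending = self.pending, None
        self.expire_time += self.pacing
        return True


def transmit_times(channel, start, stop):
    return [t for t in range(start, stop) if channel.tick(t)]


slow = TxChannel(pacing=100)
slow.tick(0)            # just transmitted; next expiration is at 100
slow.set_pacing(2)      # want every other pacing interval
print(transmit_times(slow, 1, 106))
```

With these numbers the channel stays silent until period 100 even though the faster parameter was written at period 1, which is the low-to-high delay described for DQ.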

17 Example of a Pacing Oddity
Suppose we have a channel on which we are sending single-cell packets at a rate of 2 cells every pacing period for that channel.
The BATCH size is 1 cell, so the channel should only send 1 cell during each pacing period.
[Timeline figure: row of D markers]
You would expect the connection to build up a backlog, but it doesn’t……

18 Example of a Pacing Oddity (cont’d)
It turns out the Driver does a RESUME each time it puts data in an empty transmit queue, to restart it.
A RESUME causes the ExpireTime to be set to the current PacedTime.
This causes the channel to be expired at the very next Pacer Period.
Thus the channel transmits at twice its expected rate.
[Timeline figure: rows of D, R, and T markers]
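The oddity can be reproduced with a small simulation. It is a simplified sketch: time is counted in global pacing periods, the channel sends one cell (BATCH of 1) whenever it is expired and has data, and a RESUME on an empty queue sets the expiration to the current time so the channel fires in the same period rather than the next one. The function and parameter names are hypothetical.

```python
def simulate(ticks, vc_pacing, arrival_interval, resume_on_empty):
    """Return (cells_sent, backlog) after the given number of pacing periods."""
    expire = 0
    queue = 0
    sent = 0
    for t in range(ticks):
        if t % arrival_interval == 0:
            # Driver behavior: RESUME when putting data in an empty queue.
            if queue == 0 and resume_on_empty:
                expire = t            # ExpireTime = current PacedTime
            queue += 1
        if queue > 0 and expire <= t:
            queue -= 1                # transmit one cell (BATCH size 1)
            sent += 1
            expire += vc_pacing
    return sent, queue


# Two cells arrive per channel pacing period (pacing 10, arrivals every 5).
print(simulate(100, vc_pacing=10, arrival_interval=5, resume_on_empty=False))
print(simulate(100, vc_pacing=10, arrival_interval=5, resume_on_empty=True))
```

Without the RESUME the model sends half of the offered cells and the rest accumulate; with it every cell goes out as it arrives, so no backlog forms.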