
Confessions of a Performance Monitor Hardware Designer Workshop on Hardware Performance Monitor Design HPCA-11 13 February 2005 Jim Callister Intel Corporation.


1 Confessions of a Performance Monitor Hardware Designer
Workshop on Hardware Performance Monitor Design, HPCA-11
13 February 2005
Jim Callister, Intel Corporation
© Intel Corp. 2005

2 Itanium ® Processor PMU, 13 February 2005
Why Include a PMU?
Ya gotta do something with all those transistors!
Cause my PAPI told me to
To give competitors a fighting chance
To show my boss how great my branch predictor is (i.e., get a raise)
To improve the performance of current and future systems

3 How Much Performance Would You Give Up for PMU Functionality?
Transistors may be “free” but…
  – Wires are not!
  – Design time costs
  – Validation costs
  – Documentation costs
  – Time to Market costs
The answer is not 0%
  – PMU proven to improve performance
But it’s not 10% either!

4 The PMU Has Tentacles Everywhere!
[Diagram: PMU Central Collector]

5 What to Architect in the PMU?
“Machine Architecture is a contract between hardware and software”
Architect too much…
  – Lowers performance through design constraints
  – Events don’t map well to hardware
Architect too little…
  – Jeopardizes Software Investment
  – Discourages Software Support

6 Itanium ® Architecture: PMU
Architected
  – Access & Management of PMU Resources
      PMD registers for Data, PMC registers to control PMU
  – Counter Overflow Behavior and Interrupt Handling
  – Only a few basic counter events
Implementation Dependent
  – Number of counters, width of counters
  – Non-counter performance monitors
  – Events: Encourage use of CPU-specific tables
Itanium architecture protects OS and Tool infrastructure while promoting performance and full visibility

7 Performance Events – Let me count the ways…
Which events are important?
  – How will the events be used?
  – Do you really care about a cache miss if it doesn’t cause any stalls?
Mapping an event to signals
  – Needed signal may not be available
      On critical path, lack of wires, no signal
  – Combining signals is problematic
      Distance between signals, timing, logic

8 Itanium ® 2 Processor PMU Events
Event Category          Number of Events
Cycle Accounting        89
Instruction Execution   42
Branches                69
Caches & TLBs           150
Bus                     73
Misc                    20
Total                   443

9 Where are the Performance Problems?
Counters only give type of problem and magnitude of the problem
Use filters on counters (hunt & peck)
Itanium ® architecture currently includes:
  – Opcode Filters
  – Privilege Level Filters
  – Instruction Address Range Filters
  – Data Address Range Filters

10 A Better Way to Locate Performance Problems
Event Address Registers (EARs)
  – Logs information about a single cache miss
  – The logs are sampled by software
  – Creates a statistical profile of cache misses
Branch Trace Buffer (BTB)
  – Logs information about consecutive branches
  – Logs also sampled by software

11 Lend Me an EAR
Instruction & Data EARs
  – Log Instruction Address of Miss
      Data EAR also logs Data Address of Miss
  – Log Latency of Miss
  – Filter by latency bin
  – Have an associated counter event
  – Can also log TLB misses
      And where TLB miss was resolved
Have proven to be extremely useful
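The way software might turn sampled EAR logs into a statistical profile can be sketched as follows. This is only an illustration: the `EarSample` record, its field names, and the `miss_profile` helper are hypothetical and do not reflect the actual PMD register layout. The sketch assumes software has already read each sample into a plain record, and it applies a simple latency threshold in software rather than the hardware latency bins the slide mentions.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class EarSample(NamedTuple):
    """One sampled Data EAR log entry (hypothetical field layout)."""
    instr_addr: int   # address of the instruction that missed
    data_addr: int    # data address of the miss
    latency: int      # miss latency in cycles


def miss_profile(samples: Iterable[EarSample], min_latency: int = 0) -> Counter:
    """Count sampled misses per instruction address.

    Samples with latency below min_latency are ignored, a software
    stand-in for filtering by latency bin.
    """
    profile = Counter()
    for sample in samples:
        if sample.latency >= min_latency:
            profile[sample.instr_addr] += 1
    return profile


# Made-up samples purely for illustration.
samples = [
    EarSample(0x4000, 0x9000, 120),
    EarSample(0x4000, 0x9040, 200),
    EarSample(0x4010, 0xA000, 12),
    EarSample(0x4020, 0xB000, 150),
]
hot = miss_profile(samples, min_latency=100)
```

Sorting the resulting counts (for example with `hot.most_common()`) points at the instructions responsible for the most sampled long-latency misses.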

12 The D-EAR Shadow Effect
[Figure: timeline of cache misses. Labels: "Miss Recorded" (twice), "Latency Counter Busy", and "Without extra hardware, these misses would never be recorded!"]

13 The D-EAR Shadow Effect: The Itanium ® 2 Processor Solution
Don’t track every opportunity: randomly pick misses to track
Tradeoff: shadow mitigation versus sampling frequency
Use an LFSR to decide which port to sample and whether to sample
Every miss has ~1 in 8 chance of being tracked
This mitigates the shadow effect but does not totally eliminate it
Customer feedback indicates it works very well
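The random selection described above can be sketched in software. The slides do not give the LFSR width, its polynomial, or how its bits map to the decision, so everything specific here is an assumption: a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11 (a commonly cited maximal-length choice), three state bits for the roughly 1-in-8 decision, and one further bit to choose between two hypothetical ports.

```python
class Lfsr16:
    """16-bit Fibonacci LFSR, taps 16/14/13/11 (an assumed polynomial)."""

    def __init__(self, seed: int = 0xACE1):
        seed &= 0xFFFF
        if seed == 0:
            raise ValueError("seed must be non-zero or the LFSR locks up")
        self.state = seed

    def step(self) -> int:
        s = self.state
        # XOR the tapped bits to form the new most significant bit.
        bit = (s ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1
        self.state = (s >> 1) | (bit << 15)
        return self.state


def decide(lfsr: Lfsr16) -> tuple[bool, int]:
    """Return (track_this_miss, port) for one miss opportunity.

    The low three bits are all zero for about 1 state in 8; bit 3 picks
    one of two ports. Both mappings are assumptions of this sketch.
    """
    state = lfsr.step()
    track = (state & 0b111) == 0
    port = (state >> 3) & 1
    return track, port


def tracked_fraction(n_misses: int, seed: int = 0xACE1) -> float:
    """Fraction of n_misses opportunities that would be tracked."""
    lfsr = Lfsr16(seed)
    tracked = sum(1 for _ in range(n_misses) if decide(lfsr)[0])
    return tracked / n_misses
```

Because most misses are skipped, a miss that would otherwise sit in the shadow of an earlier one gets a chance to be the one tracked, at the cost of fewer samples per unit time.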

14 The Itanium ® 2 Processor’s Branch Trace Buffer (BTB)
An eight entry Circular Buffer
Each entry contains either:
  – Address & Prediction Data of a branch, or
  – Address of a branch target
Uses of the BTB
  – Mis-predicted branch profiler
  – An efficient Instruction Address Profiler
  – Path Profiler
Cool use: in conjunction with EARs
  – Path leading up to sampled miss!
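A software model of an eight-entry circular trace buffer might look like the sketch below. The class name, method names, and tuple layout are hypothetical and chosen for illustration; the real BTB is a set of hardware registers whose format is defined in the processor reference manual.

```python
from collections import deque


class BranchTraceBuffer:
    """Toy model of an eight-entry circular branch trace buffer."""

    def __init__(self, entries: int = 8):
        # A bounded deque drops the oldest entry when a new one arrives,
        # which mimics a circular buffer being overwritten.
        self._buf = deque(maxlen=entries)

    def log_branch(self, address: int, mispredicted: bool) -> None:
        """Record a branch address together with its prediction outcome."""
        self._buf.append(("branch", address, mispredicted))

    def log_target(self, address: int) -> None:
        """Record a branch target address."""
        self._buf.append(("target", address, None))

    def snapshot(self) -> list:
        """Return entries oldest first, as sampling software would read them."""
        return list(self._buf)


# Five branch/target pairs produce ten entries; only the last eight survive.
btb = BranchTraceBuffer()
for i in range(5):
    btb.log_branch(0x1000 + 16 * i, mispredicted=(i == 4))
    btb.log_target(0x2000 + 16 * i)
path = btb.snapshot()
```

Reading the snapshot when an EAR sample is taken gives the last few branches and targets, that is, the path that led up to the sampled miss.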

15 The Itanium ® 2 Processor’s PMU Helps Improve Performance
Performance is measured using specific computer systems and reflects the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance.

16 Conclusions
Walking a micron in HW design shoes
  – Balancing PMU functionality & overall performance
We need to move beyond counters!
  – Itanium ® 2 processors provide EARs and BTBs
  – What’s next?
The Itanium 2 processor’s PMU has much to offer
  – Customers are making good use of it
  – Would like to see more use: how do we do it?
Discussion
  – What is the long-term vision for the PMU?
  – What can the PMU provide to improve current and future systems?
  – Did anything “stick” or resonate?
Itanium ® and Itanium ® 2 are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

17 For More Information…
http://developer.intel.com/design/itanium/documentation.htm
Manuals
Intel Itanium Architecture Software Developer's Manuals, Volume 1: Application Architecture
  – Part II: Optimization Guide
Intel Itanium Architecture Software Developer's Manuals, Volume 2: System Architecture
  – Chapter 7: Debugging and Performance Monitoring
  – Chapter 12: Performance Monitoring Support
Intel Itanium 2 Processor Reference Manual for Software Development and Optimization
  – Chapter 10: Performance Monitoring
  – Chapter 11: Performance Monitor Events

