Performance Data Standard and API Shirley Browne, Jack Dongarra, and Philip Mucci University of Tennessee from the Ptools Annual Meeting, May 1998.

Performance Counters
Almost all high-performance processors include hardware performance counters.
On most platforms the counter APIs, if they exist at all, are not appropriate, functional, or well documented.
Existing performance counter APIs:
– Cray T3E
– SGI MIPS R10000
– IBM Power series
– DEC Alpha pfm pseudo-device interface
– Windows 95, NT and Linux

Goals
– Specify a standard application programming interface (API) for accessing hardware performance counters
– Include a standard set of definitions for a common set of performance metrics
– Encourage vendors to implement the standard API for their platforms, based on the reference implementation
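A standard set of metric definitions lets a tool ask for "cycles" or "TLB misses" by a portable name instead of a vendor-specific event code. The sketch below shows one way this could look, assuming a hypothetical `pd_` prefix, a handful of metrics taken from the Performance Data list, and a per-platform bitmask advertising which metrics that platform can count. None of these names come from the proposal itself.

```c
#include <string.h>

/* Hypothetical portable identifiers for some of the common metrics in this
   proposal; names and ordering are illustrative, not a published standard. */
typedef enum {
    PD_CYCLES,          /* cycle count */
    PD_INSTRUCTIONS,    /* instruction count */
    PD_FP_INSTRUCTIONS, /* floating point instruction count */
    PD_L1_DCACHE_MISS,  /* level-1 data cache misses */
    PD_BRANCH_MISPRED,  /* branch mispredictions */
    PD_TLB_MISS,        /* TLB misses */
    PD_NUM_METRICS
} pd_metric;

/* Portable names, indexed by pd_metric. */
static const char *const pd_metric_names[PD_NUM_METRICS] = {
    "cycles", "instructions", "fp_instructions",
    "l1_dcache_misses", "branch_mispredictions", "tlb_misses"
};

/* Translate a portable metric name into its identifier; -1 if unknown. */
int pd_metric_from_name(const char *name) {
    for (int i = 0; i < PD_NUM_METRICS; i++) {
        if (strcmp(pd_metric_names[i], name) == 0) return i;
    }
    return -1;
}

/* Assumption: each platform port reports the standard metrics it supports
   as a bitmask (bit m set means metric m can be counted). */
int pd_metric_available(unsigned supported_mask, pd_metric m) {
    return (unsigned)m < PD_NUM_METRICS && ((supported_mask >> m) & 1u);
}
```

A tool can then check availability up front and degrade gracefully on platforms that lack a given counter, rather than failing deep inside vendor-specific code.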

Issues
– Ease of use and interpretation
– Availability of counters and restrictions on the number that can be measured at once
– Differentiating between the system, the user’s process, and other processes
– Guarding against counter overflow
– Handling dynamic/speculative execution
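One common way to guard against counter overflow is to extend a narrow hardware counter in software: sample it regularly, take the difference from the previous sample using unsigned (modular) arithmetic, and accumulate the difference into a 64-bit total. The sketch assumes a 32-bit counter that is read at least once per wraparound period; real counter widths vary by processor, and the `pd_counter` type is hypothetical.

```c
#include <stdint.h>

/* Software-extended counter. Assumption: the hardware counter is 32 bits
   wide and is sampled at least once before it can wrap twice. */
typedef struct {
    uint32_t last_raw;  /* raw hardware value at the previous sample */
    uint64_t total;     /* accumulated 64-bit count since init */
} pd_counter;

void pd_counter_init(pd_counter *c, uint32_t raw) {
    c->last_raw = raw;
    c->total = 0;
}

/* Fold a new raw sample into the running total. Converting the difference
   to uint32_t makes it modulo 2^32, so one wraparound since the previous
   sample still yields the correct delta. */
uint64_t pd_counter_update(pd_counter *c, uint32_t raw) {
    uint32_t delta = (uint32_t)(raw - c->last_raw);
    c->total += delta;
    c->last_raw = raw;
    return c->total;
}
```

If the counter could wrap more than once between samples, this scheme undercounts silently, which is why overflow handling has to be part of the API design rather than left to each tool.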

Intended Users
– Performance tool developers
– Application engineers needing performance data to evaluate, model, and tune their applications
– The SIP CHSSI team leader (ARL MSRC) has requested cross-platform access to performance counters

Performance Data
– A common set of performance metrics
– Target the information that most affects how applications are tuned (exact set to be determined with the help of user input)

Performance Data (cont.)
– I/D cache misses for different levels
– Branch mispredictions
– TLB misses
– Pipeline stalls due to memory subsystem
– Pipeline stalls due to resource conflicts
– Cache invalidations
– TLB invalidations
– Load/store count
– Instruction count
– Cycle count
– Floating point instruction count
– Integer instruction count
– Branch taken / not taken count
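Several of these raw counts are most useful when combined into ratios. The sketch below derives cycles per instruction (cycle count / instruction count), a data cache miss rate (L1 data cache misses / load-store count), and a branch misprediction rate (mispredictions / taken plus not-taken branches). These are common definitions, not ones fixed by this proposal, and the `pd_counts` struct and function names are hypothetical.

```c
#include <stdint.h>

/* Raw counts for one measured code region (hypothetical field names). */
typedef struct {
    uint64_t cycles;
    uint64_t instructions;
    uint64_t loads_stores;
    uint64_t l1_dcache_misses;
    uint64_t branches_taken;
    uint64_t branches_not_taken;
    uint64_t branch_mispredictions;
} pd_counts;

/* Ratio helper: returns 0.0 when the denominator was not counted. */
static double pd_ratio(uint64_t num, uint64_t den) {
    return den ? (double)num / (double)den : 0.0;
}

/* Cycles per instruction. */
double pd_cpi(const pd_counts *c) {
    return pd_ratio(c->cycles, c->instructions);
}

/* Fraction of loads and stores that missed the L1 data cache. */
double pd_dcache_miss_rate(const pd_counts *c) {
    return pd_ratio(c->l1_dcache_misses, c->loads_stores);
}

/* Fraction of executed branches that were mispredicted. */
double pd_mispredict_rate(const pd_counts *c) {
    return pd_ratio(c->branch_mispredictions,
                    c->branches_taken + c->branches_not_taken);
}
```

For example, a region with 2000 cycles and 1000 instructions has a CPI of 2.0. Whether speculative (later squashed) instructions are included in such counts varies by processor, which is one reason the Issues slide lists handling dynamic/speculative execution.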

Reference Implementations
Reference implementations of the API have been discussed for the following platforms, pending funding and support:
– Sun Ultra
– MIPS R10000
– POWER architecture
– Pentium family
– Cray T3E
They will be layered over the best existing vendor-specific APIs for these platforms.
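Layering a portable API over vendor-specific interfaces is often done with a per-platform table of function pointers: tools call the portable entry points, and only the table behind them changes from platform to platform. The sketch below is a hypothetical illustration of that structure (all `pd_` names are invented), with a fake backend that returns fixed counts so tools can be exercised on machines without counter support.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operations each platform port supplies, wrapping that
   vendor's own interface (e.g. the pfm pseudo-device on DEC Alpha). */
typedef struct {
    const char *platform;
    int (*start)(void);
    int (*stop)(void);
    int (*read)(uint64_t *values, size_t n);
} pd_backend;

static const pd_backend *active_backend = NULL;

void pd_set_backend(const pd_backend *b) { active_backend = b; }

/* Portable entry points: return -1 if no backend has been installed. */
int pd_start(void) { return active_backend ? active_backend->start() : -1; }
int pd_stop(void)  { return active_backend ? active_backend->stop()  : -1; }
int pd_read(uint64_t *values, size_t n) {
    return active_backend ? active_backend->read(values, n) : -1;
}

/* Stand-in backend: reports fixed counts while running, errors otherwise. */
static int fake_running = 0;
static int fake_start(void) { fake_running = 1; return 0; }
static int fake_stop(void)  { fake_running = 0; return 0; }
static int fake_read(uint64_t *values, size_t n) {
    if (!fake_running) return -1;
    for (size_t i = 0; i < n; i++) values[i] = 100 * (i + 1);
    return 0;
}

const pd_backend pd_fake_backend = { "fake", fake_start, fake_stop, fake_read };
```

A real port would replace the fake functions with calls into the vendor interface; the tool-facing functions stay the same on every platform.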

Results
– Portable performance tools on every platform
– Developers will no longer have to rely on prof as the lowest common denominator of tools.