
PAPI Update
Shirley Browne, Cricket Deane, George Ho, Philip Mucci
University of Tennessee Computer Science Department
Ptools Annual Meeting 1999

Review: Why PAPI?
- Hardware counters exist on every major processor today. They give performance tool developers a basis for tool development and give application developers valuable information about sections of their code that can be improved.
- However, only a few APIs allow access to these counters, and most of them are poorly documented, unstable, or unavailable.
- Also, performance metrics may have different definitions (for example, graduated vs. speculative counts).

PAPI Project Goals
- To provide a lightweight, portable, and straightforward API to access these counters on major HPC platforms
- To provide a common subset of these performance metrics on all platforms

PAPI Project Goals (cont.)
- Provide application developers the information they may need to tune their codes on different platforms
- Encourage vendors to standardize the interface to and semantics of the hardware counters

PAPI Project Goals (cont.)
- To make it easy to write tools for
  – Performance analysis
  – Performance modeling
  – Feedback-directed compilation

Current Status
- API Spec
- R10K, Pentium Pro, Pentium II nearly complete
- Null substrate written to help test and debug on any platform

Current Status (cont.)
Library calls working:
  PAPI_add_event()    PAPI_read()
  PAPI_reset()        PAPI_write()
  PAPI_set_opt()      PAPI_get_opt()
  PAPI_start()        PAPI_accum()
  PAPI_stop()

#include
#include "papiStdEventDefs.h"
#include "papi.h"
#include "papi_internal.h"

void main()
{
    int r, i;
    double a, b, c;
    unsigned long long ct[3];
    int EventSet = PAPI_NULL;
    PAPI_option_t options;

    r=PAPI_add_event(&EventSet, PAPI_FP_INS);
    r=PAPI_add_event(&EventSet, PAPI_TOT_INS);
    r=PAPI_add_event(&EventSet, PAPI_TOT_CYC);

    options.domain.eventset=1;
    options.domain.domain=PAPI_DOM_DEFAULT;
    r=PAPI_set_opt(PAPI_SET_DOMAIN, &options);

    r=PAPI_reset(EventSet);
    r=PAPI_start(EventSet);

    a = 0.5;
    b = 6.2;
    for (i=0; i < ; i++)
        c = a*b;

    r=PAPI_stop(EventSet, ct);
}

Script started on Wed Apr 14 19:07:
uname -a
Linux redwood.cs.utk.edu #22 Sun Feb 21 16:57:12 EST 1999 i686 unknown
make clean
rm -rf papi.o linux-pentium.o libpapi.a example1 example2 example3 first second example1.o example2.o example3.o first.o core *~
make first
gcc -g -DDEBUG -Wall -c first.c -o first.o
gcc -g -DDEBUG -Wall -c papi.c -o papi.o
gcc -g -DDEBUG -Wall -c linux-pentium.c -o linux-pentium.o
ar ruv libpapi.a papi.o linux-pentium.o
a - papi.o
a - linux-pentium.o
gcc -g first.o -o first libpapi.a
first
DEBUG: CPU number 1 at 200 MHZ found
DEBUG: Empty slot for EventSetInfo at 2
DEBUG: PAPI_reset returns 0
DEBUG: PAPI_start returns 0
DEBUG: PAPI_stop values[0]:
DEBUG: PAPI_stop values[1]:
DEBUG: PAPI_stop values[2]:

rsh picasso
uname -a
IRIX64 picasso IP28
cd papi/src/
make clean
rm -rf papi.o irix-mips.o libpapi.a example1 example2 example3 first second example1.o example2.o example3.o first.o core *~
make first
cc -g -DDEBUG -fullwarn -O0 -c first.c
cc -g -DDEBUG -fullwarn -O0 -c papi.c
cc -g -DDEBUG -fullwarn -O0 -c irix-mips.c
ar ruv libpapi.a papi.o irix-mips.o
a - papi.o
a - irix-mips.o
ar: Warning: creating libpapi.a
cc -g -O0 first.o -o first libpapi.a
first
DEBUG: CPU number 1 at 195 MHZ found
DEBUG: PAPI_stop values[0]:
DEBUG: PAPI_stop values[1]:
DEBUG: PAPI_stop values[2]:
exit
logout
exit
exit
Script done on Wed Apr 14 19:09:

Specifics
- Overflow
- Multiplexing
- Implementation details

Overflow
- If requested, PAPI can notify the user when a hardware counter exceeds a certain threshold, even when the kernel or hardware cannot.
- How? A high-resolution interval timer with a default setting of 1 ms. On each tick, PAPI checks for overflow and calls the user handler when necessary.

Multiplexing
- If requested, PAPI can multiplex the hardware counters even when the kernel cannot.
- How? A high-resolution interval timer with a default setting of 1 ms. User programmable.
- Accurate? As accurate as it can be; only the active events are multiplexed. Best in user domain.

PAPI: A first application
- Curtis Janssen's vperf, a graphical (Qt) performance visualizer.
- Based on bprof.
- Gives line-by-line profiling.
- All vperf needs is a hash table mapping text addresses to the number of interrupts at each address.
- More interrupts mean more time or events.
- Stay tuned.

Next steps
- Substrates (IBM, Linux/EV6, Ultra)
- Overflow (95% complete)
- Multiple nested event sets. (Ans. 2 new substrate functions)
- Threading issues: safety, portability, accuracy. (Ans. OpenMP thread library calls and a portable spin-lock)

Related Work
- Rabbit - Don Heller
- Perf - Erik Hendriks ftp:// ported to Linux 2.1.x and 2.2.x by Curtis Janssen
- PCL - Performance Counter Library
- More at

More Information
- The draft API is available at
- To join the project's reflector, send a message to with the message subscribe ptools-PAPI

A Parallel Tools Consortium Sponsored project
Work partially funded by the DoD High Performance Computing Modernization Program, CEWES and ARL Major Shared Resource Centers, through Programming Environment and Training (PET).
Views, opinions, and/or findings contained in this report are those of the author(s) and should not be construed as an official Department of Defense position, policy or decision unless so designated by other official documentation.