PAPI Update Shirley Browne, Cricket Deane, George Ho, Philip Mucci University of Tennessee Computer Science Department Ptools Annual Meeting 1999
Review: Why PAPI? - Hardware counters exist on every major processor today. They can give performance tool developers a basis for tool development, and give application developers valuable information about sections of their code that can be improved. - However, there are only a few APIs that allow access to these counters, and most of them are poorly documented, unstable, or unavailable. - Also, performance metrics may have different definitions on different platforms (graduated vs. speculative).
PAPI Project Goals To provide a lightweight, portable, and straightforward API to access these counters on major HPC platforms To provide a common subset of these performance metrics on all platforms.
PAPI Project Goals (cont.) Provide application developers the information they may need to tune their codes on different platforms Encourage vendors to standardize the interface to and semantics of the hardware counters
PAPI Project Goals (cont.) To make it easy to write tools for:
- Performance analysis
- Performance modeling
- Feedback-directed compilation
Current Status API Spec R10K, Pentium Pro, Pentium II nearly complete Null substrate written to help test and debug on any platform
Current Status (cont.) Library calls working: PAPI_add_event(), PAPI_read(), PAPI_reset(), PAPI_write(), PAPI_set_opt(), PAPI_get_opt(), PAPI_start(), PAPI_accum(), PAPI_stop()
#include
#include "papiStdEventDefs.h"
#include "papi.h"
#include "papi_internal.h"

void main()
{
  int r, i;
  double a, b, c;
  unsigned long long ct[3];
  int EventSet = PAPI_NULL;
  PAPI_option_t options;

  r=PAPI_add_event(&EventSet, PAPI_FP_INS);
  r=PAPI_add_event(&EventSet, PAPI_TOT_INS);
  r=PAPI_add_event(&EventSet, PAPI_TOT_CYC);

  options.domain.eventset=1;
  options.domain.domain=PAPI_DOM_DEFAULT;
  r=PAPI_set_opt(PAPI_SET_DOMAIN, &options);

  r=PAPI_reset(EventSet);
  r=PAPI_start(EventSet);

  a = 0.5;
  b = 6.2;
  for (i=0; i < ; i++)
    c = a*b;

  r=PAPI_stop(EventSet, ct);
}
Script started on Wed Apr 14 19:07:
uname -a
Linux redwood.cs.utk.edu #22 Sun Feb 21 16:57:12 EST 1999 i686 unknown
make clean
rm -rf papi.o linux-pentium.o libpapi.a example1 example2 example3 first second example1.o example2.o example3.o first.o core *~
make first
gcc -g -DDEBUG -Wall -c first.c -o first.o
gcc -g -DDEBUG -Wall -c papi.c -o papi.o
gcc -g -DDEBUG -Wall -c linux-pentium.c -o linux-pentium.o
ar ruv libpapi.a papi.o linux-pentium.o
a - papi.o
a - linux-pentium.o
gcc -g first.o -o first libpapi.a
first
DEBUG: CPU number 1 at 200 MHZ found
DEBUG: Empty slot for EventSetInfo at 2
DEBUG: PAPI_reset returns 0
DEBUG: PAPI_start returns 0
DEBUG: PAPI_stop values[0]:
DEBUG: PAPI_stop values[1]:
DEBUG: PAPI_stop values[2]:
rsh picasso
uname -a
IRIX64 picasso IP28
cd papi/src/
make clean
rm -rf papi.o irix-mips.o libpapi.a example1 example2 example3 first second example1.o example2.o example3.o first.o core *~
make first
cc -g -DDEBUG -fullwarn -O0 -c first.c
cc -g -DDEBUG -fullwarn -O0 -c papi.c
cc -g -DDEBUG -fullwarn -O0 -c irix-mips.c
ar ruv libpapi.a papi.o irix-mips.o
a - papi.o
a - irix-mips.o
ar: Warning: creating libpapi.a
cc -g -O0 first.o -o first libpapi.a
first
DEBUG: CPU number 1 at 195 MHZ found
DEBUG: PAPI_stop values[0]:
DEBUG: PAPI_stop values[1]:
DEBUG: PAPI_stop values[2]:
exit
logout
exit
exit
Script done on Wed Apr 14 19:09:
Specifics Overflow Multiplexing Implementation details
Overflow If requested, PAPI can notify the user when a hardware counter exceeds a certain threshold even when the kernel or hardware cannot. How? A high resolution interval timer with a default setting of 1 ms. Check for overflow and call user handler when necessary.
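The check performed on each timer tick can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not PAPI's implementation: the names overflow_watch_t, overflow_watch_init, overflow_watch_tick, and example_count_handler are hypothetical, and the wiring of the 1 ms interval timer to the tick function is omitted so the logic stays deterministic. The caller is assumed to pass in the current counter value it has just read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user callback type: receives the event index and the
   counter value observed when the threshold was crossed. */
typedef void (*overflow_handler_t)(int event_index, unsigned long long value);

/* Hypothetical bookkeeping for one watched counter. */
typedef struct {
    unsigned long long threshold;    /* notify every `threshold` counts */
    unsigned long long next_trigger; /* counter value that fires next   */
    overflow_handler_t handler;      /* user handler to call            */
} overflow_watch_t;

void overflow_watch_init(overflow_watch_t *w, unsigned long long threshold,
                         overflow_handler_t handler)
{
    w->threshold = threshold;
    w->next_trigger = threshold;
    w->handler = handler;
}

/* Meant to be called from the periodic timer (assumed 1 ms by default).
   Returns 1 if the user handler was invoked on this tick, else 0. */
int overflow_watch_tick(overflow_watch_t *w, int event_index,
                        unsigned long long value)
{
    if (w->threshold == 0 || value < w->next_trigger)
        return 0;
    if (w->handler != NULL)
        w->handler(event_index, value);
    /* Advance to the first multiple of the threshold above the value,
       so several crossings within one tick produce one notification. */
    w->next_trigger = value - (value % w->threshold) + w->threshold;
    return 1;
}

/* Example handler used only for illustration: counts notifications. */
int example_overflow_calls = 0;

void example_count_handler(int event_index, unsigned long long value)
{
    (void)event_index;
    (void)value;
    example_overflow_calls++;
}
```

Because detection happens only at tick boundaries, the handler in this sketch runs up to one timer period after the counter actually crossed the threshold, which is the price of emulating overflow in software.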
Multiplexing If requested, PAPI can multiplex the hardware counters even when the kernel cannot. How? A high-resolution interval timer with a default setting of 1 ms, user programmable. Accurate? As accurate as it can be; only the active events are multiplexed. Best in user domain.
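One common way to multiplex in software is round-robin time slicing with scaled estimates, sketched below. The slides do not say which scheme PAPI uses, so the rotation policy and the scaling formula (count multiplied by total ticks, divided by ticks live) are assumptions, as are all names (mpx_state_t, mpx_init, mpx_is_live, mpx_tick, mpx_estimate). The per-slice counts are passed in by the caller rather than read from hardware.

```c
#include <assert.h>

#define MPX_MAX_EVENTS 8

/* Hypothetical multiplexing state: more events than physical counters. */
typedef struct {
    int num_events;    /* events requested (<= MPX_MAX_EVENTS)      */
    int num_counters;  /* physical counters available               */
    int first_live;    /* first event of the group now on hardware  */
    unsigned long long count[MPX_MAX_EVENTS];      /* raw counts    */
    unsigned long long ticks_live[MPX_MAX_EVENTS]; /* slices counted */
    unsigned long long total_ticks;                /* slices elapsed */
} mpx_state_t;

void mpx_init(mpx_state_t *m, int num_events, int num_counters)
{
    int i;
    m->num_events = num_events;
    m->num_counters = num_counters;
    m->first_live = 0;
    m->total_ticks = 0;
    for (i = 0; i < MPX_MAX_EVENTS; i++) {
        m->count[i] = 0;
        m->ticks_live[i] = 0;
    }
}

/* An event is live if it falls in the window of num_counters events
   starting at first_live (wrapping around the event list). */
int mpx_is_live(const mpx_state_t *m, int event)
{
    int offset = (event - m->first_live + m->num_events) % m->num_events;
    return offset < m->num_counters;
}

/* Called once per timer tick. deltas[i] is the count observed for event i
   during the slice that just ended; it is ignored for events not live. */
void mpx_tick(mpx_state_t *m, const unsigned long long *deltas)
{
    int i;
    for (i = 0; i < m->num_events; i++) {
        if (mpx_is_live(m, i)) {
            m->count[i] += deltas[i];
            m->ticks_live[i]++;
        }
    }
    m->total_ticks++;
    /* Rotate the next group of events onto the counters. */
    m->first_live = (m->first_live + m->num_counters) % m->num_events;
}

/* Scale the raw count by the fraction of time the event was measured. */
unsigned long long mpx_estimate(const mpx_state_t *m, int event)
{
    if (m->ticks_live[event] == 0)
        return 0;
    return m->count[event] * m->total_ticks / m->ticks_live[event];
}
```

The estimate is exact only when the event rate is uniform across slices, which is one reason accuracy depends on how many events share the counters and on how long the measured region runs.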
PAPI: A first application Curtis Janssen’s vperf graphical (Qt) performance visualizer. Based on bprof. Gives line by line profiling. All vperf needs is a hash table of text addresses to the number of interrupts at that address. More interrupts mean more time or events. Stay tuned.
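The table described above, mapping text addresses to interrupt counts, might look like the following sketch. This is not vperf's or bprof's actual data structure: the fixed-size open-addressing layout, the hash function, and the names prof_table_t, prof_init, prof_record, and prof_hits are all hypothetical choices made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PROF_SLOTS 1024 /* must be a power of two */

/* One slot: a program (text) address and how many interrupts hit it.
   Address 0 marks an empty slot. */
typedef struct {
    uintptr_t addr;
    unsigned long hits;
} prof_slot_t;

typedef struct {
    prof_slot_t slots[PROF_SLOTS];
} prof_table_t;

void prof_init(prof_table_t *t)
{
    memset(t, 0, sizeof *t);
}

/* Multiplicative hash on the address, dropping the low alignment bits. */
static size_t prof_hash(uintptr_t addr)
{
    return (size_t)((addr >> 2) * 2654435761u) & (PROF_SLOTS - 1);
}

/* Record one interrupt at addr. Returns 0 on success, -1 if the table
   is full or addr is 0 (reserved as the empty marker). */
int prof_record(prof_table_t *t, uintptr_t addr)
{
    size_t i, start;
    if (addr == 0)
        return -1;
    start = prof_hash(addr);
    for (i = 0; i < PROF_SLOTS; i++) {
        prof_slot_t *s = &t->slots[(start + i) & (PROF_SLOTS - 1)];
        if (s->addr == addr || s->addr == 0) {
            s->addr = addr;
            s->hits++;
            return 0;
        }
    }
    return -1;
}

/* Look up the number of interrupts recorded at addr (0 if never seen). */
unsigned long prof_hits(const prof_table_t *t, uintptr_t addr)
{
    size_t i, start = prof_hash(addr);
    for (i = 0; i < PROF_SLOTS; i++) {
        const prof_slot_t *s = &t->slots[(start + i) & (PROF_SLOTS - 1)];
        if (s->addr == addr)
            return s->hits;
        if (s->addr == 0)
            return 0;
    }
    return 0;
}
```

In a real profiler, prof_record would be called from the overflow or timer interrupt handler with the interrupted program counter, and a tool such as vperf would later map each address back to a source line.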
Next steps
- Substrates (IBM, Linux/EV6, Ultra)
- Overflow (95% complete)
- Multiple nested event sets. (Ans. 2 new substrate functions)
- Threading issues: safety, portability, accuracy. (Ans. OpenMP thread library calls and a portable spin-lock)
Related Work Rabbit - Don Heller Perf - Erik Hendriks ftp:// ported to Linux 2.1.x and 2.2.x by Curtis Janssen PCL - Performance Counter Library More at
More Information The draft API is available at To join the project’s reflector, send a message to with the message subscribe ptools-PAPI
A Parallel Tools Consortium Sponsored project Work partially funded by the DoD High Performance Computing Modernization Program, CEWES and ARL Major Shared Resource Centers, through Programming Environment and Training (PET) Views, opinions, and/or findings contained in this report are those of the author(s) and should not be construed as an official Department of Defense position, policy or decision unless so designated by other official documentation.