PAPI and Dynaprof: Application Signatures and Performance Analysis of Scientific Applications (Philip J. Mucci, Innovative Computing Laboratory, 11/17/02)

1 PAPI and Dynaprof
Application Signatures and Performance Analysis of Scientific Applications
Philip J. Mucci
Innovative Computing Laboratory, UTK
Performance Evaluation Research Center, LBL
mucci@cs.utk.edu
http://icl.cs.utk.edu/~mucci/dynaprof/snapshots/sc2002.ppt

2 Goals
● Understanding the behavior of the application
  – Identification of bottlenecks.
  – Usage of the hardware resources.
  – Effects of that usage on performance.
● Using Dynaprof to achieve that goal
  – Command line usage
  – 3 Dynaprof probes
    ● Wallclock time
    ● Hardware performance counters
    ● Resource usage traces

3 Motivation
● Optimize the application's performance.
● Evaluate the algorithm's efficiency.
● Generate an application signature.
  – A collection of data that represents the major terms in the performance model.
● Develop a performance model.

4 Overview of Hardware Counters
● Data is NOT PORTABLE, but PAPI is...
● A small number of registers is dedicated to performance monitoring functions:
  – AMD Athlon: 4 counters
  – Pentium <= III: 2 counters
  – Pentium IV: 18 counters
  – IA64: 4 counters
  – Alpha 21x64: 2 counters
  – Power 3: 8 counters
  – Power 4: 8 counters to a group
  – UltraSparc II: 2 counters
  – MIPS R14K: 2 counters

5 Applications used in this Tutorial
● Serial:
  – FSPX: A binary alloy solidification benchmark.
  – SWIM: The SPEC shallow water benchmark.
● Parallel (MPI):
  – Ex19 from the PETSc distribution.
  – Solves a nonlinear driven cavity problem with multigrid: a 2D driven cavity problem solved in a velocity-vorticity formulation.

6 FSPX Execution Environment
● Intel PIII, 1.2 GHz
  – FP results/clock: 1 (1.2 Gflips)
    ● 4 SP/clk with SSE, 2 DP/clk with SSE2
  – Caches: 16K/16K, 256K
● G77 version 2.96, built with -g -O -malign-double -mpentiumpro -funroll-loops -fexpensive-optimizations
● Execution time:
  > /bin/time fspx
  115.370u 0.030s 1:58.17 97.6% 0+0k 0+0io 162pf+0w
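The execution-time line above is csh-style /bin/time output: user CPU seconds, system CPU seconds, elapsed wall time, then CPU utilization. A minimal Python sketch that extracts the first three fields and recomputes utilization. The helper names are hypothetical, and the sketch assumes exactly this field order; other time implementations format their output differently.

```python
def parse_elapsed(text: str) -> float:
    """Convert an elapsed field such as '1:58.17' or '0:15' (m:ss or h:mm:ss) to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_csh_time(line: str) -> dict:
    """Parse the leading 'user sys elapsed' fields of a csh-style time line.

    Hypothetical helper; assumes the layout shown on the slides, e.g.
    '115.370u 0.030s 1:58.17 97.6% 0+0k 0+0io 162pf+0w'.
    """
    fields = line.split()
    user = float(fields[0].rstrip("u"))    # user CPU seconds
    system = float(fields[1].rstrip("s"))  # system CPU seconds
    elapsed = parse_elapsed(fields[2])     # wall-clock seconds
    return {
        "user": user,
        "system": system,
        "elapsed": elapsed,
        # Fraction of the wall time during which the process was on a CPU.
        "cpu_utilization": (user + system) / elapsed,
    }


if __name__ == "__main__":
    stats = parse_csh_time("115.370u 0.030s 1:58.17 97.6% 0+0k 0+0io 162pf+0w")
    print(round(stats["cpu_utilization"], 3))
```

For the FSPX line this yields roughly 0.98, consistent with the reported 97.6%. The serial run kept a CPU busy for almost its entire elapsed time.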

7 SWIM Execution Environment
● IBM Nighthawk, 16-way Power 3, 375 MHz
  – FP results/clock: 4 (1.5 Gflips)
  – Caches: 32K/64K, 8MB
  – MPI over TCP/IP via switch
● xlc 5.0.2.1, built with -g -O3 -qstrict -qarch=pwr3 -qtune=pwr3
● Execution time:
  > /bin/time poe swim -procs 2
  0.4u 0.0s 0:15 3% 217+3933k 0+0io 1pf+0w
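The Gflips figures on these slides are FP results per clock multiplied by the clock rate (1 × 1.2 GHz = 1.2 Gflips; 4 × 375 MHz = 1.5 Gflips). A small sketch of that arithmetic follows. It also includes a hypothetical helper that compares a measured FP count, for instance one gathered with the PAPI_FP_INS preset, against the peak.

```python
def peak_gflips(results_per_clock: float, clock_hz: float) -> float:
    """Theoretical peak rate in billions of floating-point results per second."""
    return results_per_clock * clock_hz / 1e9


def fraction_of_peak(fp_ops: float, seconds: float,
                     results_per_clock: float, clock_hz: float) -> float:
    """Achieved FP rate divided by the theoretical peak (hypothetical helper)."""
    achieved_gflips = fp_ops / seconds / 1e9
    return achieved_gflips / peak_gflips(results_per_clock, clock_hz)


if __name__ == "__main__":
    print(peak_gflips(1, 1.2e9))  # Intel PIII at 1.2 GHz
    print(peak_gflips(4, 375e6))  # IBM Power 3 at 375 MHz
```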

8 ex19 Execution Environment
● IBM Nighthawk, 16-way Power 3, 375 MHz
  – FP results/clock: 4 (1.5 Gflips)
  – Caches: 32K/64K, 8MB
● xlc 5.0.2.1, built with -g
● Execution time:
  > /bin/time poe ex19 -procs 2 -da_grid_x 56 -da_grid_y 56
  0.520u 0.200s 0:44.18 1.6% 297+3580k 0+0io 0pf+0w

9 Gprof
● Samples the program counter at timer interrupts and builds a histogram of samples vs. text address.
● Requires recompiling with the -pg option.
● A gprof profile is useful for a high-level overview.
● Does it tell us why?
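Gprof's flat profile comes from program-counter sampling. Each timer interrupt records the current text address, and afterwards each address is attributed to the function whose code contains it. A minimal sketch of that attribution step follows. It assumes two hypothetical inputs: a symbol table of function start addresses, and a 10 ms sampling interval. Real gprof reads this information from the executable and from gmon.out. The output also shows the limitation raised above: the profile says where time went, not why.

```python
import bisect
from collections import Counter


def attribute_samples(symbols, samples):
    """Map sampled program-counter values to the function containing them.

    symbols: (start_address, name) pairs, one per function. Each function is
             assumed to end where the next one starts (a simplification).
    samples: program-counter values recorded at each timer interrupt.
    """
    ordered = sorted(symbols)
    starts = [start for start, _ in ordered]
    counts = Counter()
    for pc in samples:
        i = bisect.bisect_right(starts, pc) - 1
        if i >= 0:  # ignore samples below the first known symbol
            counts[ordered[i][1]] += 1
    return counts


def flat_profile(counts, interval_s=0.01):
    """Return (name, seconds, percent) rows, most expensive first.

    interval_s is an assumed sampling period, not a value from the slides.
    """
    total = sum(counts.values())
    rows = [(name, n * interval_s, 100.0 * n / total) for name, n in counts.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```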

10 Gprof Profile of FSPX

11 FSPX: Top 4 functions
● The top 4 functions account for 50% of execution time.
● In module update.F:
  – flux
  – proflux
  – pde
● In module phase.F:
  – phase
● Use the list command to explore modules and functions.

12 Gprof Profile of SWIM

13 Gprof Profile of ex19

14 Dynaprof Environment Variables
● LD_LIBRARY_PATH: Colon-separated list of directories in which to look for shared libraries. It must let the loader find:
  – The DynInst library
  – The PAPI library
  – Any dependencies of the above (libperfctr.so, libcpc.so)
● DYNINSTAPI_RT_LIB: Full pathname of the DynInst runtime library.
● No settings are necessary for the AIX/DPCL port.
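Both settings can be sanity-checked before starting Dynaprof. Below is a minimal Python sketch that assumes only the rules stated above: LD_LIBRARY_PATH is a colon-separated list of directories searched in order, and DYNINSTAPI_RT_LIB must be a full pathname. The function names are hypothetical. The sketch does not reproduce the dynamic loader's other search steps, such as its cache and default directories.

```python
import os


def find_in_library_path(libname, ld_library_path=None):
    """Return the first directory in a colon-separated path that contains libname.

    Hypothetical helper: mimics only the LD_LIBRARY_PATH part of the loader's
    search. Empty entries are ignored in this sketch.
    """
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    for directory in ld_library_path.split(":"):
        if directory and os.path.isfile(os.path.join(directory, libname)):
            return directory
    return None


def check_dyninst_rt_lib(value=None):
    """True if DYNINSTAPI_RT_LIB is a full (absolute) pathname of an existing file."""
    if value is None:
        value = os.environ.get("DYNINSTAPI_RT_LIB", "")
    return os.path.isabs(value) and os.path.isfile(value)
```

For example, find_in_library_path("libperfctr.so") returns the directory that would satisfy PAPI's dependency on Linux, or None if the search path misses it.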

15 Running Dynaprof
● Usage: dynaprof [-d] [serial_application]
● -d enables debugging output.
● Specifying an application automatically loads it into the tool immediately after initialization.

16 Command Line Interface
● Uses the GNU Readline library for input.
● Full-featured command line editing:
  – File and command completion
  – History
● Settings, macros and aliases in ~/.inputrc
● Allows Emacs- or vi-style bindings:
  – set editing-mode emacs
  – set editing-mode vi
● See the man page, Texinfo file or home page.

17 Load command
● Starts the application and stops it at the first instruction.
● Usage: load [args]
  > dynaprof
  (dynaprof) load tests/fspx

18 Poeload command
● For use with MPI applications on AIX with DPCL.
  – DPCL < 3.2.5 requires the full path.
● Usage: poeload [args]
  (dynaprof) poeload tests/swim -procs 2

19 Mpiload command
● For use with MPI applications.
● Stops the application after it calls PMPI_Init().
● Mostly useful for script-driven execution of MPI jobs.
● Usage: mpiload [args]
  (dynaprof) mpiload tests/mpicount

20 Attach command
● Attaches to a running application (or poe process) and stops it.
● Usage: attach
  (dynaprof) ^Z
  > tests/fspx &
  [2] 17500
  > fg
  (dynaprof) attach tests/fspx 17500
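The attach workflow above can also be scripted: start the program in the background, note its PID, then attach. A minimal sketch using Python's subprocess module follows. The helper name is hypothetical. The sketch only builds the attach command string shown on the slide and does not drive Dynaprof itself.

```python
import subprocess


def launch_for_attach(argv):
    """Start argv in the background and return (process, attach command).

    Equivalent to typing 'tests/fspx &' in the shell and reading the PID from
    the job-control message; the returned string is what you would then type
    at the (dynaprof) prompt. Hypothetical helper, not part of Dynaprof.
    """
    proc = subprocess.Popen(argv)
    return proc, f"attach {argv[0]} {proc.pid}"
```

For example, `proc, cmd = launch_for_attach(["tests/fspx"])` leaves the program running and puts a string of the form `attach tests/fspx <pid>` in cmd.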

21 Poeattach command
● For use with MPI applications on AIX with DPCL.
  – DPCL < 3.2.5 requires the full path.
● Usage: poeattach
  (dynaprof) ^Z
  > poe ex19 -da_grid_x 56 -da_grid_y 56 -procs 2 &
  [2] 17500
  > fg
  (dynaprof) poeattach ex19 17500

22 List command
● list – List all modules in process
● list – List all matching modules
● list – List all functions in module
● list – List all matching functions in module
● list – List instrumentable points in function

23 Exploring FSPX
● G77's Fortran runtime support
● Code compiled with g77 without -g ends up in the DEFAULT_MODULE.
● Application code
● Shared libraries

24 Exploring FSPX 2
● G77's Fortran runtime support
● Code compiled with g77 without -g ends up in the DEFAULT_MODULE.

25 Exploring FSPX 3
● Function calls

26 Use command
● Loads a probe shared library into the address space:
  (dynaprof) use [probe [args]]
● use by itself displays the current probe.
● To change options, respecify the probe.
● 4 probes in this release, including:
  – Wallclock: real-time clock
  – PAPI: hardware metrics
  – Perfometer: real-time visualization of streaming hardware metrics

27 Instr command
● instr – List all instrumented functions.
● instr module [arg] – Instrument all functions in modules matching pattern.
● instr function [arg] – Instrument all functions matching pattern in module.

28 Threads and Dynaprof Probes
● For threaded code, use the same probe!
● Dynaprof detects threads and loads a special version of the probe library.
● Each probe specifies what to do when a new thread is discovered.
● Each thread gets the same instrumentation.
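The thread handling described above can be pictured with a toy Python analogue. This is not Dynaprof's implementation, which instruments compiled code through DynInst or DPCL. In the sketch, every thread runs the same enter/exit hooks, and the first time a thread appears the probe allocates state for it. The class name and its methods are hypothetical.

```python
import threading
import time


class WallclockProbeSketch:
    """Toy stand-in for a wallclock probe that keeps separate data per thread.

    Hypothetical class: the first call from a new thread allocates that
    thread's state, i.e. "what to do when a new thread is discovered".
    """

    def __init__(self):
        self._local = threading.local()
        self._lock = threading.Lock()
        self.per_thread_us = {}  # thread name -> accumulated microseconds

    def _thread_state(self):
        if not hasattr(self._local, "stack"):
            # New thread discovered: give it its own stack of entry timestamps.
            self._local.stack = []
            with self._lock:
                self.per_thread_us[threading.current_thread().name] = 0.0
        return self._local

    def enter(self):
        """Hook run at entry to an instrumented region."""
        self._thread_state().stack.append(time.perf_counter_ns())

    def exit(self):
        """Hook run at exit; adds the elapsed time in microseconds."""
        state = self._thread_state()
        elapsed_us = (time.perf_counter_ns() - state.stack.pop()) / 1000
        with self._lock:
            self.per_thread_us[threading.current_thread().name] += elapsed_us
```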

29 Probe Warning
● Instrumentation is not free.
● Consider the granularity of the region being measured.
● Overhead for PAPI 2.3 is O(1000) cycles.
  – Between 500 and 2000 cycles for a 2-counter read.
● Overhead for Wallclock is O(100) cycles.
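The granularity advice can be made concrete: what matters is the probe cost relative to the work done inside the region. A back-of-the-envelope sketch follows. The 2000-cycle figure is the upper end quoted above for a PAPI 2.3 read. Treating it as the whole per-call probe cost (entry plus exit) is an assumption, as is the 10,000-cycle function body in the example.

```python
def relative_overhead(probe_cycles_per_call, body_cycles_per_call):
    """Probe cost as a fraction of the work done inside one call of the region."""
    return probe_cycles_per_call / body_cycles_per_call


def min_region_cycles(probe_cycles_per_call, max_overhead):
    """Smallest per-call region size (cycles) that keeps overhead below max_overhead."""
    return probe_cycles_per_call / max_overhead


if __name__ == "__main__":
    # Assumed: ~2000 probe cycles per call on a function body of 10,000 cycles.
    print(relative_overhead(2000, 10_000))
    # Region size needed to keep that probe under 1% overhead.
    print(min_region_cycles(2000, 0.01))
```

A 20% distortion on a 10,000-cycle function explains why short, frequently called routines are poor candidates for instrumentation.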

30 Wallclock Probe
● High-resolution, low-latency timer
● Usage: use wallclockprobe
● Reports time in microseconds (1.0×10^-6 s).

31 PAPI Probe
● Counts PAPI presets or native events.
● Usage: use papiprobe [event,event,...]
● The default argument is PAPI_FP_INS, or PAPI_TOT_INS if the architecture doesn't support PAPI_FP_INS.
● Available events can be listed with: papi_avail -a
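The default-event rule above can be written out as a small selection function. This is a hypothetical helper, not Dynaprof's code. It assumes the set of available event names has already been obtained, for instance from the papi_avail -a listing.

```python
def resolve_papi_events(arg, available):
    """Turn a 'use papiprobe [event,event,...]' argument into an event list.

    Hypothetical helper.
    arg:       the comma-separated argument, or '' when none was given.
    available: set of event names the platform supports (e.g. from papi_avail -a).
    With no argument, use PAPI_FP_INS, or PAPI_TOT_INS where PAPI_FP_INS is
    not supported, following the rule on the slide.
    """
    events = [e.strip() for e in arg.split(",") if e.strip()]
    if not events:
        events = ["PAPI_FP_INS" if "PAPI_FP_INS" in available else "PAPI_TOT_INS"]
    missing = [e for e in events if e not in available]
    if missing:
        raise ValueError(f"events not available on this platform: {missing}")
    return events
```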

32 PAPI Probe and Multiplexing
● Requesting more events than there are physical counters automatically enables multiplexing.
● Instrumented regions must run for a minimum time, so that every virtual counter gets a chance to run at least once:
  run-time_min = num_events × 0.01 s
● Automatic warning functionality is being rolled into PAPI.
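As a worked instance of the rule above: a Pentium III has 2 counters (per the counter overview), so requesting 4 events enables multiplexing. Each instrumented region must then run for at least 4 × 0.01 s = 0.04 s. A sketch of that check follows. The 0.01 s slice comes from the slide; the function names are hypothetical.

```python
def needs_multiplexing(num_events, num_counters):
    """More requested events than physical counters forces multiplexing."""
    return num_events > num_counters


def min_multiplexed_runtime(num_events, slice_s=0.01):
    """Minimum run time so that every virtual counter is scheduled at least once."""
    return num_events * slice_s


def check_region(num_events, num_counters, region_runtime_s):
    """Return a warning string if a multiplexed region is too short, else None."""
    if not needs_multiplexing(num_events, num_counters):
        return None
    needed = min_multiplexed_runtime(num_events)
    if region_runtime_s < needed:
        return (f"region ran {region_runtime_s:.3f} s, but {num_events} multiplexed "
                f"events need at least {needed:.3f} s")
    return None
```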

33 PAPI Native Events
● Look in the PAPI distribution.
● See the README file for your architecture in the src directory.
● See the example program tests/native.c in the src/tests directory.

34 Power 3 Events

35 Power 3 Events 2

36 Power 4 Events

37 Pentium III Events

38 Intel Pentium IV Events (Arguments to perfex -e from PerfCtr distribution)

39 Sun UltraSparc II Events

40 Sun UltraSparc III Events

41 MIPS R12K Events

42 Alpha/DADD 21264 Events

43 Perfometer Probe
● Sends a stream of performance data every N seconds to the Perfometer GUI.
● Functions can be colored at instrumentation time.
  – The default color is white, 0xFFFFFF.
● Usage: use perfometerprobe [0xRRGGBB] instr
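Colors use the 0xRRGGBB hexadecimal form, while showrgb lists X11 color names alongside decimal red, green and blue values. Two hypothetical helpers convert between the two forms, so that a named color can be passed as a probe argument.

```python
def parse_rgb(spec="0xFFFFFF"):
    """Split a 0xRRGGBB color argument into (red, green, blue), each 0-255.

    Hypothetical helper; the default mirrors the probe's default color, white.
    """
    if not spec.lower().startswith("0x") or len(spec) != 8:
        raise ValueError(f"expected 0xRRGGBB, got {spec!r}")
    value = int(spec, 16)  # int() accepts the 0x prefix when base is 16
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


def to_spec(red, green, blue):
    """Build a 0xRRGGBB argument from decimal components such as showrgb prints."""
    for c in (red, green, blue):
        if not 0 <= c <= 255:
            raise ValueError(f"component out of range: {c}")
    return f"0x{red:02X}{green:02X}{blue:02X}"
```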

44 Perfometer Probe 2
● The Perfometer GUI is NOT launched automatically.
● showrgb in X11 lists colors and names.
● Run the Java GUI:
  – java -jar Perfometer.jar
● Connect to the specified hostname and port.

45 Instrumenting SWIM with perfometerprobe

46 Instrumenting FSPX for Instructions Per Cycle

47 Instrumenting SWIM for Instructions Per Cycle

48 Reporting Probe Data
● The wallclock and PAPI probes produce very similar data.
● Both use a parsing script written in Perl:
  – wallclockrpt
  – papiproberpt
● These produce 3 profiles:
  – Inclusive: T_function = T_self + T_children
  – Exclusive: T_function = T_self
  – 1-level call tree: T_child = inclusive T_function of the child
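The three profiles can be illustrated on a toy call tree. The sketch below does not reproduce the logic of wallclockrpt or papiproberpt, which are Perl scripts. It is a Python illustration of the formulas above. It assumes each function appears once in the call tree and that self times are already known; the record format is hypothetical.

```python
from collections import defaultdict


def build_profiles(calls):
    """Compute exclusive, inclusive and 1-level call-tree times.

    calls: hypothetical (function, parent, self_time) records for one call
           tree, with parent None for the root. Each function is assumed to
           be a single node in the tree (a simplification).
    """
    exclusive = defaultdict(float)
    children = defaultdict(list)
    for func, parent, self_time in calls:
        exclusive[func] += self_time  # Exclusive: T_function = T_self
        if parent is not None and func not in children[parent]:
            children[parent].append(func)

    inclusive = {}

    def total(func):
        # Inclusive: T_function = T_self + T_children (children counted inclusively).
        if func not in inclusive:
            inclusive[func] = exclusive[func] + sum(total(c) for c in children[func])
        return inclusive[func]

    for func in list(exclusive):
        total(func)

    # 1-level call tree: each direct child is listed with its inclusive time.
    call_tree = {f: {c: inclusive[c] for c in children[f]}
                 for f in exclusive if children[f]}
    return dict(exclusive), inclusive, call_tree
```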

49 FSPX Cycles & Instrs.

50 FSPX IPC
  Function  IPC
  proflux   0.61
  phase     0.63
  flux      0.49
  pde       0.46
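Each IPC value in this table is an instruction count divided by a cycle count for one function, as gathered with the PAPI probe. The following minimal sketch assumes the two counts came from the PAPI_TOT_INS and PAPI_TOT_CYC presets; the slide does not name the exact events. Sorting by ascending IPC puts the functions with the most stall time first.

```python
def ipc(instructions, cycles):
    """Instructions per cycle, e.g. from PAPI_TOT_INS and PAPI_TOT_CYC counts."""
    if cycles <= 0:
        raise ValueError("cycle count must be positive")
    return instructions / cycles


def rank_by_ipc(counts):
    """counts: {function: (instructions, cycles)} -> [(function, ipc)], lowest first."""
    return sorted(((f, ipc(i, c)) for f, (i, c) in counts.items()),
                  key=lambda item: item[1])
```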

51 SWIM Cycles & Instrs.

52 SWIM IPC
  Function  IPC
  calc2     0.59
  calc1     0.53
  calc3     0.46

53 Perfometer Screenshot

54 Dynaprof 0.8 SC Release
● Binary distribution for 4 platforms on the website:
  – AIX 3.x / DPCL 3.2.5 on Power 3
  – Linux / DynInst 3.0 on Pentium <= III
  – Solaris 2.8 / DynInst 3.0 on UltraSparc II/III
  – IRIX / DynInst 3.0 on MIPS R10/12/14k
  – Power 4 and Pentium 4 are coming...
● Xdynaprof Java/Swing GUI included
● perfometerprobe and GUI included
● Updated documentation

55 References
● The Dynaprof Homepage: http://www.cs.utk.edu/~mucci/dynaprof
● The PAPI Homepage: http://icl.cs.utk.edu/projects/papi
● The DynInst Homepage: http://www.dyninst.org
● The DPCL Homepage: http://oss.software.ibm.com/developerworks/opensource/dpcl
● The Vprof Homepage: http://aros.ca.sandia.gov/~cljanss/perf/vprof
● The GNU Readline Homepage: http://cnswww.cns.cwru.edu/~chet/readline/rltop.html

