1 Dynaprof Evaluation Report
Adam Leko, Hans Sherburne
UPC Group, HCS Research Laboratory, University of Florida
Color encoding key: Blue: Information; Red: Negative note; Green: Positive note

2 Basic Information
- Name: Dynaprof
- Developer: Philip Mucci (UTK)
- Current versions:
  - Dynaprof CVS as of 2/21/2005
  - DynInst API v4.1.1 (dependency)
  - PAPI v3.0.7 (dependency)
- Website: http://www.cs.utk.edu/~mucci/dynaprof/
- Contact: Philip Mucci (mucci@cs.utk.edu)

3 Dynaprof Overview
- Merges existing tools
  - PAPI
  - DynInst API
- Command-line tool
  - Dynamically instruments programs at runtime
    - Requires no recompilation!
  - Inserts probes at runtime
  - Metrics available
    - Wall clock time
    - Any PAPI metrics
    - Can be extended
- Only simple GUI available (see right)
  - Just a wrapper around the command-line version
  - Currently pretty broken

Startup banner shown on the slide:
  DynaProf 0.9
  Philip J. Mucci, mucci@cs.utk.edu, 2000-2003
  Provided courtesy of UTK's Innovative Computing Laboratory.
  See http://icl.cs.utk.edu for more information.
  This is Open Source Software!
  (dynaprof)

4 Instrumentation Overview
- Instrumentation very easy
  - Especially for sequential/threaded applications
- Compile application regularly (-g eases naming later)
    gcc -O3 -g -o camel camel.c
- Dynaprof commands
  - Load the executable
      load camel
  - Specify which probe you wish to use
      use papiprobe [args]
  - List available functions
      list camel.c
  - Instrument command
    - All functions in a file:
        instr module camel.c
    - A single function:
        instr function camel.c main
  - Run command continue pauses execution (currently does not work)
- Instrumentation output is produced in an additional file (its location is shown at runtime)

5 Instrumentation Overview (2)
- No special commands needed for
  - sequential applications
  - pthread applications
- MPI not supported directly through command line
  - Wrapper scripts available for MPICH and LAM
  - Dynaprof must be run in “batch mode”
    - A file containing all instrumentation commands
  - Halts the app before MPI_Init() is called
  - However, not working with current version of MPICH
    - Hits an assertion failure and stops working
    - Can only use MPI programs with 1 process
- UPC?
  - Tried GCC-UPC and BUPC (smp + pthreads)
  - Both produced no output or crashed Dynaprof

6 Instrumentation Overhead
- Could only instrument one-process MPI code
  - MPI run wrapper script broken
  - No PPerf apps! (all require > 1 process)
- Camel overhead very high
  - Only instrumented main
- LU overhead really low?
- Possible causes of overhead
  - Frequent subroutine calls from main
  - Use of tsc.h processor counters for timers confuses Dynaprof
- Expect overhead similar to Paradyn
  - 5-10% for most applications with a reasonable number of instrumentation points
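Overhead figures such as the 5-10% quoted for Paradyn are normally read as the relative slowdown of an instrumented run against an uninstrumented one. A minimal sketch of that calculation follows. The function name and the timings are hypothetical and are not measurements from this report.

```python
def overhead_percent(baseline_seconds: float, instrumented_seconds: float) -> float:
    """Relative slowdown of an instrumented run versus an uninstrumented run."""
    if baseline_seconds <= 0:
        raise ValueError("baseline time must be positive")
    return 100.0 * (instrumented_seconds - baseline_seconds) / baseline_seconds


# Hypothetical wall clock timings, for illustration only.
print(overhead_percent(100.0, 107.5))
```

Timing each configuration several times and comparing the fastest or median runs makes the figure less sensitive to system noise.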

7 Dynaprof Probe Information
- Probes perform all data collection and analysis
  - Provide code to insert into a function when instrumented
- Probes can be called at 4 different points
  - Function entry point
  - Function exit point
  - Function call point
  - Function return point
- Each probe is encapsulated in a shared library
  - Allows relatively easy creation of new probes
- Available probes
  - “Wallclock” probe (records wall clock time)
  - PAPI wallclock probe (same as wallclock, uses high-resolution timers)
  - PAPI probe (records any PAPI metric, such as FLOPs)
    - Specify PAPI metrics as args in the use papiprobe [args] command
- Existing probes provide profile-style data only
  - Although there is no reason that a trace could not also be collected
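The entry and exit probe points can be pictured with a small Python analogy. This is only a sketch of the idea: Dynaprof patches the running binary through DynInst, whereas the decorator below wraps a function at source level. The names wallclock_probe, totals, calls and work are hypothetical and are not part of Dynaprof.

```python
import functools
import time
from collections import defaultdict

# Accumulated wall clock time and call counts, keyed by function name.
totals = defaultdict(float)
calls = defaultdict(int)


def wallclock_probe(func):
    """Wrap func so that measurement code runs at its entry and exit points."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # entry point: record the start time
        try:
            return func(*args, **kwargs)
        finally:
            # Exit point: accumulate elapsed time, even if func raised.
            totals[func.__name__] += time.perf_counter() - start
            calls[func.__name__] += 1
    return wrapper


@wallclock_probe
def work(n):
    return sum(i * i for i in range(n))


for _ in range(3):
    work(10_000)

print(calls["work"], totals["work"] >= 0.0)
```

Because the probe only accumulates totals per function, it yields profile-style data. Appending a timestamped record at each entry and exit instead would give a trace.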

8 Probe Output
- After running, an ASCII file containing raw data is created
  - At runtime, a message like “…output will be in /home/leko/…” is printed indicating where the file will be
- Three programs are provided which analyze the raw data
  - wallclockrpt – for wall clock probe
  - papiclockrpt – for PAPI wall clock probe
  - papiproberpt – for PAPI probe
- Summary statistics are provided
  - Exclusive profile (metric collected excluding children)
  - Inclusive profile (metric collected including children)
  - 1-call level deep profile (see which functions an instrumented function called)
- Output from *rpt programs is simple ASCII (sample next page)
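The difference between the exclusive and inclusive profiles can be sketched with a short Python example. It assumes a hypothetical stream of properly nested enter/exit events with timestamps. This is not the raw file format Dynaprof writes, and the function names in the event list are made up.

```python
from collections import defaultdict


def profile(events):
    """Compute inclusive and exclusive totals from (kind, name, time) events.

    kind is "enter" or "exit"; events must be properly nested.
    """
    inclusive = defaultdict(float)
    exclusive = defaultdict(float)
    stack = []  # each entry: [name, enter_time, time_spent_in_children]
    for kind, name, t in events:
        if kind == "enter":
            stack.append([name, t, 0.0])
        else:
            top_name, start, child_time = stack.pop()
            if top_name != name:
                raise ValueError(f"mismatched exit for {name}")
            elapsed = t - start
            inclusive[name] += elapsed               # includes children
            exclusive[name] += elapsed - child_time  # excludes children
            if stack:
                stack[-1][2] += elapsed              # charge time to the parent
    return dict(inclusive), dict(exclusive)


# Hypothetical events: main calls solve (6 units) and report (2 units).
events = [
    ("enter", "main", 0.0),
    ("enter", "solve", 1.0),
    ("exit", "solve", 7.0),
    ("enter", "report", 7.0),
    ("exit", "report", 9.0),
    ("exit", "main", 10.0),
]
inclusive, exclusive = profile(events)
print(inclusive)
print(exclusive)
```

Here main has an inclusive total of 10 but an exclusive total of only 2, since 8 units were spent in its children. The sketch does not handle recursion, where inclusive time would be counted more than once.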

9 Sample Probe Report (lu.W.1)

[leko@eta-1 dynaprof]$ wallclockrpt lu-1.wallclock.16143

Exclusive Profile.

Name             Percent      Total        Calls
-------------    -------      -----        -------
TOTAL            100          1.436e+11    1
unknown          100          1.436e+11    1
main             3.837e-06    5511         1

Inclusive Profile.

Name             Percent      Total        SubCalls
-------------    -------      -----        -------
TOTAL            100          1.436e+11    0
main             100          1.436e+11    5

1-Level Inclusive Call Tree.

Parent/-Child    Percent      Total        Calls
-------------    -------      -----        --------
TOTAL            100          1.436e+11    1
main             100          1.436e+11    1
- f_setarg.0     1.414e-05    2.03e+04     1
- f_setsig.1     1.324e-05    1.902e+04    1
- f_init.2       2.569e-05    3.691e+04    1
- atexit.3       7.042e-06    1.012e+04    1
- MAIN__.4       0            0            1

Note: only “main” was instrumented in this profiled run
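The Percent column appears to be each row's Total divided by the TOTAL row, times 100. The sketch below checks that reading against three rows of the sample report. The helper name percent_of_total is hypothetical. Because the report prints TOTAL rounded to four significant figures, the recomputed values should only be expected to match the printed ones to about three significant figures.

```python
def percent_of_total(value: float, total: float) -> float:
    """Share of the overall total, expressed as a percentage."""
    return 100.0 * value / total


TOTAL = 1.436e11  # TOTAL row of the sample report (as printed, i.e. rounded)

# Total column values copied from the sample report.
rows = {
    "main (exclusive)": 5511,
    "f_setarg.0": 2.03e4,
    "f_init.2": 3.691e4,
}

for name, value in rows.items():
    print(f"{name:18s} {percent_of_total(value, TOTAL):.3e}")
```

Read this way, main accounts for a vanishing share of exclusive time and nearly everything lands in "unknown", which is consistent with the note that only main was instrumented.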

10 Bottleneck Identification Test Suite
- Testing metric: what did output of probe tell us?
- CAMEL: FAILED
  - Instrumenting main caused too much application perturbation
- NAS LU (“W” workload): TOSS-UP
  - Given enough time, any bottleneck could be identified
    - Even cache miss problems, thanks to PAPI!
    - But how much time to identify bottlenecks?
  - Communication problems difficult/impossible to pinpoint
    - No tracing
    - No communication visualization
- PPerfMark tests: NOT TESTED
  - Could not evaluate PPerfMark suite (running MPI commands broken)
  - However, same comments for LU would probably apply to all
- In general,
  - Heavily reliant on user’s proficiency with pinpointing problems
    - Incremental approach
    - Instrument, re-run, instrument w/PAPI, re-run…
  - Process can be tedious
  - But, ease of instrumentation does ease this

11 Dynaprof General Comments
- Good points
  - Free
  - Source code available, relatively organized
  - Good reference on how to use PAPI & DynInst API
  - Very easy to use
  - Relatively easy to extend
  - Developer very responsive to questions
- Not-so-good points
  - High instrumentation overhead in a few cases
  - Simple to understand, but not much available functionality
  - Only profiling data with current probes
  - Not really being updated much any more
  - Changing program arguments requires reloading & reinstrumenting executable
- Dynaprof illustrates that a tool doesn’t have to be ultra-complicated to be useful
  - KISS!

12 Adding UPC/SHMEM Support
- UPC support
  - Would need to do a ton of work
  - Best bet
    - Provide a UPC probe
    - Instrument “known” UPC runtime functions
    - GASNet functions for Berkeley
    - Etc.
  - Need one probe per UPC runtime/compiler environment
- SHMEM support
  - No extra work necessary!
  - Handles instrumenting libraries like any other code
- However, a few potential problems
  - Reliance on DynInst
    - Hard to port
    - Hard to compile!
  - Reliance on PAPI
    - Can add own probes which do not use PAPI though…
- Best way to use Dynaprof
  - Steal ideas on how to make tool extensible
    - Probes as shared libraries nice idea!
  - Steal code on how to use DynInst & PAPI

13 Evaluation (1)
- Available metrics: 1/5
  - Can use PAPI to get lots of data
  - Limited in what you can collect in a single run, only
    - Two PAPI metrics or
    - Wall clock time
- Cost: 5/5
  - Free
- Documentation quality: 4/5
  - Minimal documentation, but covers the basics pretty well
- Extensibility: 3.5/5
  - Open source
  - Can add new functionality by writing new probes
  - Must write new code to extend (not much existing functionality)
- Filtering and aggregation: 2/5
  - Most program data is filtered out for you
    - Direct result of profile-nature of current probes
  - Many times too much information is lost
  - Filtering and aggregation behavior fixed in source code of probes

14 Evaluation (2)
- Hardware support: 3/5
  - 64-bit Linux (Itanium only), Sparc, IRIX, AlphaServer (Tru64), IBM SP (AIX)
  - Most everything supported: Linux, AIX, IRIX, HP-UX
  - Reliance on PAPI and DynInst could hinder porting
  - No Cray support
- Heterogeneity support: 0/5 (not supported)
- Installation: 3/5
  - Dynaprof easy to compile, but
  - PAPI and DynInst a nightmare to install
  - Also had to hack up some source code a bit to work with newer versions of gcc & javac (JDK1.5)
- Interoperability: 0.5/5
  - No export interoperability with other tools
  - There is a half-done TAU probe
    - Not sure if it works
    - Or how useful it is!
- Learning curve: 4/5
  - Very easy to use
  - Anyone used to prof/gprof will feel right at home

15 Evaluation (3)
- Manual overhead: 3/5
  - Can automatically instrument all functions, a handful of functions, and all function calls within a given function
  - Very easy to choose which functions you want instrumented
  - Can script behavior of dynaprof executable
  - Reinstrumenting requires no recompilation
- Measurement accuracy: 5/5
  - For LU, tracing overhead almost negligible using PAPI probes
  - Tracing overhead small as long as number of instrumented functions kept reasonable
  - Program’s correctness of execution not affected
  - Dynamic instrumentation does not get in compiler’s way for optimizations
- Multiple executions: 0/5
  - Not supported
- Multiple analyses & views: 1/5
  - One way of recording data, one way of presenting it
  - Probes could theoretically present things differently, but none currently do

16 Evaluation (4)
- Performance bottleneck identification: 1/5
  - No automatic detection
  - Usefulness of tool directly related to cleverness of user
  - Many bottlenecks would be very difficult to detect with only the basic profile information given by hardware counters
- Profiling/tracing support: 2/5
  - Only supports profiling
  - Could feasibly add tracing if you wanted to code
- Response time: 3/5
  - No data at all until after run has completed and tracefile has been opened
  - Generating reports from raw data instantaneous though
- Software support: 4.5/5
  - Can link against (and instrument!!) any existing library
  - Supports MPI (although broken) and shared-memory threaded programs
- Source code correlation: 2/5
  - Data reported to user at the function name level
- Searching: 0/5 (not supported)

17 Evaluation (5)
- System stability: 3/5
  - Command-line interface relatively stable
  - pause while running broken in command-line
  - GUI severely broken
- Technical support: 4/5
  - Responses from contact within 24 hours
  - Philip Mucci very helpful, knowledgeable

