Dynaprof Evaluation Report
Adam Leko, Hans Sherburne
UPC Group, HCS Research Laboratory, University of Florida

Color encoding key:
- Blue: Information
- Red: Negative note
- Green: Positive note

2. Basic Information
- Name: Dynaprof
- Developer: Philip Mucci (UTK)
- Current versions:
  - Dynaprof CVS as of 2/21/2005
  - DynInst API v4.1.1 (dependency)
  - PAPI v3.0.7 (dependency)
- Website:
- Contact: Philip Mucci

3. Dynaprof Overview
- Merges existing tools:
  - PAPI
  - DynInst API
- Command-line tool
  - Dynamically instruments programs at runtime; requires no recompilation!
  - Inserts probes at runtime
  - Available metrics:
    - Wall clock time
    - Any PAPI metric
    - Can be extended
- Only a simple GUI is available (see right)
  - Just a wrapper around the command-line version
  - Currently quite broken
[Screenshot: GUI banner reads "DynaProf 0.9, Philip J. Mucci. Provided courtesy of UTK's Innovative Computing Laboratory. This is Open Source Software!", followed by a (dynaprof) prompt]

4. Instrumentation Overview
- Instrumentation is very easy, especially for sequential/threaded applications
- Compile the application as usual (-g eases naming later):
  gcc -O3 -g -o camel camel.c
- Dynaprof commands:
  - Load the executable: load camel
  - Specify which probe to use: use papiprobe [args]
  - List available functions: list camel.c
  - Instrument all functions in a file: instr module camel.c
  - Instrument a single function: instr function camel.c main
  - Run the program: continue (pause, which suspends execution, currently does not work)
- Instrumentation output is produced in an additional file (the location is shown at runtime)

5. Instrumentation Overview (2)
- No special commands needed for:
  - sequential applications
  - pthread applications
- MPI is not supported directly through the command line
  - Wrapper scripts are available for MPICH and LAM
  - Dynaprof must be run in "batch mode" (a file containing all instrumentation commands)
  - The wrapper halts the app before MPI_Init() is called
  - However, this does not work with the current version of MPICH: an assertion failure occurs and Dynaprof stops working
  - Can only use MPI programs with 1 process
- UPC?
  - Tried GCC-UPC and Berkeley UPC (smp + pthreads)
  - Both produced no output or crashed Dynaprof

6. Instrumentation Overhead
- Could only instrument one-process MPI code
  - MPI run wrapper script is broken
  - No PPerf apps! (all require > 1 process)
- Camel overhead very high
  - Only main was instrumented
- LU overhead surprisingly low?
- Possible causes of the overhead:
  - Frequent subroutine calls from main; probe cost is paid on every call, so overhead scales with call frequency (see the sketch below)
  - Use of tsc.h processor counters for timers may confuse Dynaprof
- Expect overhead similar to Paradyn: 5-10% for most applications with a reasonable number of instrumentation points
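The "frequent subroutine calls" explanation is easy to sanity-check: each entry/exit probe pays roughly the cost of a timer read, so total overhead is on the order of (instrumented calls x per-probe cost). Below is a minimal sketch for estimating the per-read cost of PAPI's wall clock timer; this is not part of Dynaprof, just an illustrative micro-benchmark assuming PAPI headers and library are installed.

```c
/* timer_cost_sketch.c - estimate the cost of one PAPI timer read */
#include <stdio.h>
#include <papi.h>

int main(void)
{
    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
        return 1;

    const long long iters = 10000000;   /* 10^7 timer reads */
    long long sink = 0;

    long long start = PAPI_get_real_usec();
    for (long long i = 0; i < iters; i++)
        sink += PAPI_get_real_usec();   /* roughly what a probe does per call */
    long long elapsed = PAPI_get_real_usec() - start;

    /* e.g. ~0.05 usec/read would mean 10^8 instrumented calls add ~5 s */
    printf("%.4f usec per timer read (sink=%lld)\n",
           (double)elapsed / iters, sink);
    return 0;
}
```

By this arithmetic, a main that makes very many calls to instrumented subroutines (as the CAMEL result suggests) can easily dominate the run, while a code like LU with fewer, longer function invocations shows little overhead.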

7. Dynaprof Probe Information
- Probes perform all data collection and analysis
  - They provide the code inserted into a function when it is instrumented
- Probes can be invoked at 4 different points:
  - Function entry point
  - Function exit point
  - Function call point
  - Function return point
- Each probe is encapsulated in a shared library
  - Allows relatively easy creation of new probes (see the sketch below)
- Available probes:
  - "Wallclock" probe (records wall clock time)
  - PAPI wallclock probe (same as wallclock, but uses high-resolution timers)
  - PAPI probe (records any PAPI metric, such as FLOPs); specify PAPI metrics as args in the use papiprobe [args] command
- Existing probes provide profile-style data only
  - Although there is no reason a trace could not also be collected
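To make the probe model concrete, here is a minimal sketch of what a wallclock-style probe library might look like. The hook names (probe_init, probe_enter, probe_exit, probe_shutdown) and the integer site argument are hypothetical and do not reflect Dynaprof's actual probe ABI; PAPI_get_real_usec() is the kind of high-resolution timer call the PAPI wallclock probe relies on.

```c
/* wallclock_probe_sketch.c
 * Minimal sketch of a Dynaprof-style probe library.  The hook names
 * below are hypothetical and do NOT match Dynaprof's real probe ABI. */
#include <stdio.h>
#include <papi.h>               /* PAPI_library_init(), PAPI_get_real_usec() */

#define MAX_SITES 1024          /* one slot per instrumented function */

static long long entry_us[MAX_SITES];   /* timestamp at function entry */
static long long total_us[MAX_SITES];   /* accumulated inclusive time  */
static long long calls[MAX_SITES];

/* Hypothetical hook: run once when the probe shared library is loaded. */
void probe_init(void)
{
    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
        fprintf(stderr, "probe: PAPI initialization failed\n");
}

/* Hypothetical hook: inserted at each instrumented function's entry.
 * (Recursion/nesting is ignored here for brevity.) */
void probe_enter(int site)
{
    entry_us[site] = PAPI_get_real_usec();
}

/* Hypothetical hook: inserted at each instrumented function's exit. */
void probe_exit(int site)
{
    total_us[site] += PAPI_get_real_usec() - entry_us[site];
    calls[site]++;
}

/* Hypothetical hook: dump raw data at program end; a real probe writes
 * to the output file announced at runtime, which the *rpt programs on
 * the next slide then turn into profiles. */
void probe_shutdown(void)
{
    for (int site = 0; site < MAX_SITES; site++)
        if (calls[site] > 0)
            printf("site %d: %lld calls, %lld usec inclusive\n",
                   site, calls[site], total_us[site]);
}
```

Built as a shared library (roughly gcc -shared -fPIC -o probe.so wallclock_probe_sketch.c -lpapi), a probe in this style is what makes the design easy to extend: the tool stays the same and only the probe changes.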

8. Probe Output
- After a run, an ASCII file containing the raw data is created
  - At runtime, a message like "...output will be in /home/leko/..." is printed, indicating where the file will be
- Three programs are provided to analyze the raw data:
  - wallclockrpt: for the wallclock probe
  - papiclockrpt: for the PAPI wallclock probe
  - papiproberpt: for the PAPI probe
- Summary statistics provided:
  - Exclusive profile (metric collected, excluding children)
  - Inclusive profile (metric collected, including children)
  - 1-call-level-deep profile (shows which functions an instrumented function called)
- Output from the *rpt programs is simple ASCII (sample on the next slide)

9. Sample Probe Report (lu.W.1)

    dynaprof]$ wallclockrpt lu-1.wallclock

    Exclusive Profile.
    Name           Percent   Total      Calls
    TOTAL          ...       ...e+11    1
    unknown        ...       ...e+11    1
    main           ...       3.837e...  ...

    Inclusive Profile.
    Name           Percent   Total      SubCalls
    TOTAL          ...       ...e+11    0
    main           ...       ...e...    ...

    1-Level Inclusive Call Tree.
    Parent/-Child  Percent   Total      Calls
    TOTAL          ...       ...e+11    1
    main           ...       ...e...
    -f_setarg      ...e...   ...e...
    -f_setsig      ...e...   ...e...
    -f_init        ...e...   ...e...
    -atexit        ...e...   ...e...
    -MAIN__

    [most numeric values garbled in the transcript]

Note: only "main" was instrumented in this profiled run

10. Bottleneck Identification Test Suite
- Testing metric: what did the probe output tell us?
- CAMEL: FAILED
  - Instrumenting main caused too much application perturbation
- NAS LU ("W" workload): TOSS-UP
  - Given enough time, any bottleneck could be identified (even cache miss problems, thanks to PAPI!), but how much time does that take?
  - Communication problems are difficult or impossible to pinpoint: no tracing, no communication visualization
- PPerfMark tests: NOT TESTED
  - Could not evaluate the PPerfMark suite (running MPI programs is broken)
  - However, the same comments as for LU would probably apply to all of them
- In general:
  - Heavily reliant on the user's proficiency at pinpointing problems
  - Incremental approach: instrument, re-run, instrument with PAPI, re-run...
  - The process can be tedious, but the ease of instrumentation helps

11. Dynaprof General Comments
- Good points:
  - Free
  - Source code available and relatively well organized
  - Good reference on how to use PAPI & the DynInst API
  - Very easy to use
  - Relatively easy to extend
  - Developer very responsive to questions
- Not-so-good points:
  - High instrumentation overhead in a few cases
  - Simple to understand, but not much functionality available
  - Only profiling data with the current probes
  - Not really being updated much any more
  - Changing program arguments requires reloading & re-instrumenting the executable
- Dynaprof illustrates that a tool doesn't have to be ultra-complicated to be useful: KISS!

12. Adding UPC/SHMEM Support
- UPC support:
  - Would require a great deal of work
  - Best bet: provide a UPC probe
    - Instrument "known" UPC runtime functions (GASNet functions for Berkeley UPC, etc.)
    - One probe is needed per UPC runtime/compiler environment (a hedged sketch follows below)
- SHMEM support:
  - No extra work necessary!
  - Dynaprof handles instrumenting libraries like any other code
- However, a few potential problems:
  - Reliance on DynInst: hard to port, hard to compile!
  - Reliance on PAPI (though probes that do not use PAPI can be added)
- Best way to use Dynaprof:
  - Steal ideas on how to make a tool extensible (probes as shared libraries is a nice idea!)
  - Steal code on how to use DynInst & PAPI
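As a thought experiment, the "UPC probe" suggested above might pair the same entry/exit hooks as the wallclock sketch on slide 7 with a table of known runtime symbols to target. Everything here is hypothetical (hook names, report format); upc_memget/upc_memput are standard UPC library calls and gasnet_get/gasnet_put are real GASNet calls one might target under Berkeley UPC, but the right symbol list depends on the specific runtime.

```c
/* upc_probe_sketch.c - illustrative sketch only; hook names hypothetical */
#include <stdio.h>
#include <papi.h>

/* Known runtime entry points a UPC probe might instrument; a table
 * like this would be needed per UPC compiler/runtime environment. */
static const char *targets[] = {
    "upc_memget", "upc_memput",   /* UPC library bulk transfers   */
    "gasnet_get", "gasnet_put",   /* Berkeley UPC's GASNet layer  */
};
#define NTARGETS (sizeof(targets) / sizeof(targets[0]))

static long long entry_us[NTARGETS];
static long long total_us[NTARGETS];
static long long calls[NTARGETS];

/* Entry/exit hooks mirror the wallclock sketch; the tool would arrange
 * for `site` to index into the target table above. */
void upc_probe_enter(int site) { entry_us[site] = PAPI_get_real_usec(); }

void upc_probe_exit(int site)
{
    total_us[site] += PAPI_get_real_usec() - entry_us[site];
    calls[site]++;
}

/* Hypothetical report hook: time spent per runtime communication call. */
void upc_probe_report(void)
{
    for (size_t i = 0; i < NTARGETS; i++)
        if (calls[i] > 0)
            printf("%-12s %lld calls, %lld usec\n",
                   targets[i], calls[i], total_us[i]);
}
```

A separate symbol table (and thus a separate probe build) would be needed for GCC-UPC or any other runtime, which is the "one probe per environment" cost noted above.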

13. Evaluation (1)
- Available metrics: 1/5
  - Can use PAPI to collect lots of data
  - Limited in what can be collected in a single run: only two PAPI metrics, or wall clock time
- Cost: 5/5
  - Free
- Documentation quality: 4/5
  - Minimal documentation, but it covers the basics quite well
- Extensibility: 3.5/5
  - Open source
  - New functionality can be added by writing new probes
  - Must write new code to extend (not much existing functionality)
- Filtering and aggregation: 2/5
  - Most program data is filtered out for you (a direct result of the profile nature of the current probes)
  - Often too much information is lost
  - Filtering and aggregation behavior is fixed in the source code of the probes

14. Evaluation (2)
- Hardware support: 3/5
  - 64-bit Linux (Itanium only), Sparc, IRIX, AlphaServer (Tru64), IBM SP (AIX)
  - Most everything supported: Linux, AIX, IRIX, HP-UX
  - Reliance on PAPI and DynInst could hinder porting
  - No Cray support
- Heterogeneity support: 0/5 (not supported)
- Installation: 3/5
  - Dynaprof itself is easy to compile, but PAPI and DynInst are a nightmare to install
  - Also had to hack up some source code a bit to work with newer versions of gcc & javac (JDK 1.5)
- Interoperability: 0.5/5
  - No export interoperability with other tools
  - There is a half-done TAU probe (not sure whether it works, or how useful it is!)
- Learning curve: 4/5
  - Very easy to use
  - Anyone used to prof/gprof will feel right at home

15. Evaluation (3)
- Manual overhead: 3/5
  - Can automatically instrument all functions, a handful of functions, or all function calls within a given function
  - Very easy to choose which functions to instrument
  - Behavior of the dynaprof executable can be scripted
  - Re-instrumenting requires no recompilation
- Measurement accuracy: 5/5
  - For LU, measurement overhead was almost negligible using the PAPI probes
  - Overhead stays small as long as the number of instrumented functions is kept reasonable
  - The program's correctness of execution is not affected
  - Dynamic instrumentation does not get in the compiler's way during optimization
- Multiple executions: 0/5
  - Not supported
- Multiple analyses & views: 1/5
  - One way of recording data, one way of presenting it
  - Probes could theoretically present things differently, but none currently do

16. Evaluation (4)
- Performance bottleneck identification: 1/5
  - No automatic detection
  - The tool's usefulness is directly related to the cleverness of the user
  - Many bottlenecks would be very difficult to detect with only the basic profile information provided by hardware counters
- Profiling/tracing support: 2/5
  - Only supports profiling
  - Tracing could feasibly be added if you are willing to write the code
- Response time: 3/5
  - No data at all until the run has completed and the data file has been opened
  - Generating reports from the raw data is instantaneous, though
- Software support: 4.5/5
  - Can link against (and instrument!) any existing library
  - Supports MPI (although currently broken) and shared-memory threaded programs
- Source code correlation: 2/5
  - Data is reported to the user at the function-name level
- Searching: 0/5 (not supported)

17. Evaluation (5)
- System stability: 3/5
  - Command-line interface relatively stable
  - pause while running is broken in the command-line interface
  - GUI severely broken
- Technical support: 4/5
  - Responses from the contact within 24 hours
  - Philip Mucci very helpful and knowledgeable