
1 Parallel Performance Analysis with Open|SpeedShop, Half Day Tutorial @ SC 2008, Austin, TX

2 Tutorial @ SC2008, Austin, TX Slide 2 What is Open|SpeedShop? A comprehensive open source performance analysis framework: combining profiling and tracing, a common workflow for all experiments, flexible instrumentation, and extensibility through plugins. Partners: DOE/NNSA Tri-Labs (LLNL, LANL, SNLs), the Krell Institute, and the Universities of Wisconsin and Maryland.

3 Tutorial @ SC2008, Austin, TX Slide 3 Highlights. Open source performance analysis tool framework: the most common performance analysis steps all in one tool; extensible by using plugins for data collection and representation. Several instrumentation options: all work on unmodified application binaries; offline and online data collection / attach to running applications. Flexible and easy to use: user access through GUI, command line, and Python scripting. Large range of platforms: Linux clusters with x86, IA-64, Opteron, and EM64T CPUs; easier portability with the offline data collection mechanism. Availability: used at all three ASC labs with lab-size applications; full source available on sourceforge.net.

4 Tutorial @ SC2008, Austin, TX Slide 4 O|SS Target Audience. Programmers/code teams: use Open|SpeedShop out of the box; powerful performance analysis; ability to integrate O|SS into projects. Tool developers: single, comprehensive infrastructure; easy deployment of new tools; project/product specific customizations; predefined/custom experiments.

5 Tutorial @ SC2008, Austin, TX Slide 5 Tutorial Goals. Introduce Open|SpeedShop: basic concepts & terminology; running first examples. Provide an overview of features: sampling & tracing in O|SS; performance comparisons; parallel performance analysis. Overview of advanced techniques: interactive performance analysis; scripting & Python integration.

6 Tutorial @ SC2008, Austin, TX Slide 6 "Rules". Let's keep this interactive: feel free to ask as we go along; online demos as we go along. Feel free to play along: Live CDs with O|SS installed (for PCs); ask us if you get stuck. Feedback on O|SS: what is good/missing in the tool? What should be done differently? Please report bugs/incompatibilities.

7 Tutorial @ SC2008, Austin, TX Slide 7 Presenters: Martin Schulz, LLNL; Jim Galarowicz, Krell; Don Maghrak, Krell; David Montoya, LANL; Scott Cranford, Sandia. Larger team: William Hachfeld, Krell; Samuel Gutierrez, LANL; Joseph Kenny, Sandia NLs; Chris Chambreau, LLNL.

8 Tutorial @ SC2008, Austin, TX Slide 8 Outline: Introduction & Overview; Running a First Experiment; O|SS Sampling; Simple Comparisons; Break (30 minutes); I/O Tracing Experiments; Parallel Performance Analysis; Installation Requirements and Process; Advanced Capabilities.

9 Section 1 Overview & Terminology. Parallel Performance Analysis with Open|SpeedShop

10 Tutorial @ SC2008, Austin, TX Slide 10 Experiment Workflow (diagram): run the application under an "Experiment", which consists of one or more data "Collectors"; results are stored in an SQL database and can be displayed using several "Views" (e.g., the Process Management Panel).

11 Tutorial @ SC2008, Austin, TX Slide 11 High-level Architecture (diagram): GUI, pyO|SS, and CLI interfaces on top of experiments, code instrumentation, and open source software, running on AMD and Intel based clusters/SSI using Linux.

12 Tutorial @ SC2008, Austin, TX Slide 12 Performance Experiments. Concept of an experiment: what to measure and analyze? The experiment is chosen by the user, and any experiment can be applied to any code. An experiment consists of Collectors and Views: Collectors define specific data sources (e.g., hardware counters, tracing of certain routines); Views specify data aggregation and presentation. Multiple collectors per experiment are possible.

13 Tutorial @ SC2008, Austin, TX Slide 13 Experiment Types in O|SS. Sampling experiments: periodically interrupt the run and record the location; report the statistical distribution of these locations; typically provide a good overview; overhead is mostly low and uniform. Tracing experiments: gather and store individual application events, e.g., function invocations (MPI, I/O, ...); provide detailed, low-level information; higher overhead, potentially bursty.
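
To make the sampling/tracing distinction concrete, here is a minimal pyO|SS sketch modeled on the scripting example shown later in this deck ("Three Interfaces" slide); the executable name my_app and the use of the "pcsamp" and "io" experiment names with ExpTypeList are illustrative assumptions, not verified output.

import openss

# Sampling: low-overhead statistical overview of where time is spent
samp_exp = openss.expCreate(openss.FileList("my_app"), openss.ExpTypeList("pcsamp"))
openss.expGo()

# Tracing: record every I/O call event (more detail, more data, more overhead)
trace_exp = openss.expCreate(openss.FileList("my_app"), openss.ExpTypeList("io"))
openss.expGo()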

14 Tutorial @ SC2008, Austin, TX Slide 14 Sampling Experiments. PC Sampling (pcsamp): records the PC in user defined time intervals; low overhead overview of the time distribution. User Time (usertime): PC sampling plus call stacks for each sample; provides inclusive & exclusive timing data. Hardware Counters (hwc, hwctime): sample HWC overflow events; access to data like cache and TLB misses. * Updated
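
A hedged pyO|SS sketch of a usertime run, reusing the FileList/ExpTypeList/ViewTypeList/MetricList calls shown on the "Three Interfaces" slide; the program name fred and the "inclusive" metric selection for usertime are assumptions for illustration only.

import openss

exp = openss.expCreate(openss.FileList("fred"), openss.ExpTypeList("usertime"))
openss.expGo()
openss.wait()

# usertime adds call stacks, so both exclusive and inclusive times can be viewed
excl = openss.expView(exp, openss.ViewTypeList("usertime"), openss.MetricList("exclusive"))
incl = openss.expView(exp, openss.ViewTypeList("usertime"), openss.MetricList("inclusive"))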

15 Tutorial @ SC2008, Austin, TX Slide 15 Tracing Experiments. I/O Tracing (io, iot): records the invocation of all POSIX I/O events; provides aggregate and individual timings. MPI Tracing (mpi, mpit, mpiotf): records the invocation of all MPI routines; provides aggregate and individual timings. Floating Point Exception Tracing (fpe): triggered by any FPE caused by the code; helps pinpoint numerical problem areas. * Updated

16 Tutorial @ SC2008, Austin, TX Slide 16 Parallel Experiments. O|SS supports MPI and threaded codes: tested with a variety of MPI implementations; thread support is based on POSIX threads; OpenMP is only supported through POSIX threads. Any experiment can be parallel: automatically applied to all tasks/threads; default views aggregate across all tasks/threads; data from individual tasks/threads is available. Specific parallel experiments (e.g., MPI).

17 Tutorial @ SC2008, Austin, TX Slide 17 High-level Architecture (diagram): GUI, pyO|SS, and CLI interfaces on top of code instrumentation and open source software, running on AMD and Intel based clusters/SSI using Linux.

18 Tutorial @ SC2008, Austin, TX Slide 18 Code Instrumentation in O|SS. Dynamic instrumentation through DPCL: data delivered directly to the tool online; ability to attach to a running application.

19 Tutorial @ SC2008, Austin, TX Slide 19 Initial Solution: DPCL (diagram): the MPI application is instrumented and data flows to O|SS through DPCL; drawbacks are a communication bottleneck and OS limitations.

20 Tutorial @ SC2008, Austin, TX Slide 20 Code Instrumentation in O|SS. Dynamic instrumentation through DPCL: data delivered directly to the tool online; ability to attach to a running application. Offline/external data collection: instrument the application at startup; write data to raw files and convert to O|SS.

21 Tutorial @ SC2008, Austin, TX Slide 21 Offline Data Collection (diagram): the MPI application writes raw data that O|SS converts post-mortem (offline), contrasted with the online DPCL connection between the MPI application and O|SS.

22 Tutorial @ SC2008, Austin, TX Slide 22 Code Instrumentation in O|SS. Dynamic instrumentation through DPCL: data delivered directly to the tool online; ability to attach to a running application. Offline/external data collection: instrument the application at startup; write data to raw files and convert to O|SS. Scalable data collection with MRNet: similar to DPCL, but with a scalable transport layer; ability for interactive online analysis.

23 Tutorial @ SC2008, Austin, TX Slide 23 Hierarchical Online Collection (diagram): the offline path (MPI application, data processed by O|SS post-mortem), the DPCL path (MPI application connected directly to O|SS), and the MRNet path (MPI application connected to O|SS through a hierarchical network).

24 Tutorial @ SC2008, Austin, TX Slide 24 Code Instrumentation in O|SS. Dynamic instrumentation through DPCL: data delivered directly to the tool online; ability to attach to a running application. Offline/external data collection: instrument the application at startup; write data to raw files and convert to O|SS. Scalable data collection with MRNet: similar to DPCL, but with a scalable transport layer; ability for interactive online analysis.

25 Tutorial @ SC2008, Austin, TX Slide 25 High-level Architecture (diagram): GUI, pyO|SS, and CLI interfaces on top of code instrumentation and open source software, running on AMD and Intel based clusters/SSI using Linux.

26 Tutorial @ SC2008, Austin, TX Slide 26 Three Interfaces.
Experiment Commands: expAttach, expCreate, expDetach, expGo, expView.
List Commands: listExp, listHosts, listStatus.
Session Commands: setBreak, openGui.
Python scripting example:
import openss
my_filename = openss.FileList("myprog.a.out")
my_exptype = openss.ExpTypeList("pcsamp")
my_id = openss.expCreate(my_filename, my_exptype)
openss.expGo()
my_metric_list = openss.MetricList("exclusive")
my_viewtype = openss.ViewTypeList("pcsamp")
result = openss.expView(my_id, my_viewtype, my_metric_list)

27 Tutorial @ SC2008, Austin, TX Slide 27 Summary. Open|SpeedShop provides comprehensive performance analysis options. Important terminology: experiments are types of performance analysis; collectors are data sources; views are data presentation and aggregation. Sampling vs. tracing: sampling gives overview data at low overhead; tracing gives details, but at higher cost.

28 Section 2 Running your First Experiment. Parallel Performance Analysis with Open|SpeedShop

29 Tutorial @ SC2008, Austin, TX Slide 29 Running your First Experiment. What do we mean by an experiment? Running a very basic experiment: what does the command syntax look like? What are the outputs from the experiment? Viewing and interpreting gathered measurements. Introduce additional command syntax.

30 Tutorial @ SC2008, Austin, TX Slide 30 What is an Experiment? The concept of an experiment: identify the application/executable to profile and the type of performance data that is to be gathered; together they form what we call an experiment. Application/executable: doesn't need to be recompiled, but needs a -g type option to associate gathered data with functions and/or statements. Type of performance data (metric): sampling based (program counter, call stack, # of events) or tracing based (wrap functions and record information).

31 Tutorial @ SC2008, Austin, TX Slide 31 Basic experiment syntax: openss -offline -f "executable" pcsamp. openss is the command to invoke Open|SpeedShop. -offline indicates the user interface to use (immediate command); there are a number of user interface options. -f is the option for specifying the executable name; the "executable" can be a sequential or parallel command. pcsamp indicates what type of performance data (metric) you will gather: here it means we periodically take a sample of the address the program counter is pointing to and associate that address with a function and/or source line. There are several existing performance metric choices.

32 Tutorial @ SC2008, Austin, TX Slide 32 What are the outputs? Outputs from: openss -offline -f "executable" pcsamp. Normal program output while the executable is running. The sorted list of performance information: a list of the top time taking functions and the corresponding sample derived time for each function. A performance information database file: it contains all the information needed to view the data at any time in the future without the executable(s), including symbol table information from the executable(s) and system libraries, the performance data openss gathered, and time stamps for when dso(s) were loaded and unloaded.

33 Tutorial @ SC2008, Austin, TX Slide 33 Example Run with Output * Updated openss -offline -f "orterun -np 128 sweep3d.mpi" pcsamp

34 Tutorial @ SC2008, Austin, TX Slide 34 Example Run with Output * Updated openss -offline -f "orterun -np 128 sweep3d.mpi" pcsamp

35 Tutorial @ SC2008, Austin, TX Slide 35 Using the Database file. The database file is one of the outputs from running: openss -offline -f "executable" pcsamp. Use this file to view the data. How to open the database file with openss: openss -f <database file>; openss (then use the menus or wizard to open); or openss -cli followed by exprestore -f <database file>. In this example, we show both: openss -f X.0.openss (GUI) and openss -cli -f X.0.openss (CLI). X.0.openss is the file name openss creates by default. * Updated

36 Tutorial @ SC2008, Austin, TX Slide 36 Output from Example Run. Loading the database file: openss -cli -f X.0.openss * NEW

37 Tutorial @ SC2008, Austin, TX Slide 37 openss -f X.0.openss: control your job, focus the stats panel, create process subsets. Process Management Panel * NEW

38 Tutorial @ SC2008, Austin, TX Slide 38 GUI view of gathered data * Updated

39 Tutorial @ SC2008, Austin, TX Slide 39 Associate Source & Data * Updated

40 Tutorial @ SC2008, Austin, TX Slide 40 Additional experiment syntax. openss -offline -f "executable" pcsamp: -offline indicates the user interface is immediate command mode; uses the offline (LD_PRELOAD) collection mechanism. openss -cli -f "executable" pcsamp: -cli indicates the user interface is the interactive command line; uses the online (dynamic instrumentation) collection mechanism. openss -f "executable" pcsamp: no option indicates the graphical user interface; uses the online (dynamic instrumentation) collection mechanism. openss -batch < input.commands.file: executes from a file of CLI commands.

41 Tutorial @ SC2008, Austin, TX Slide 41 Demo of Basic Experiment Run Program Counter Sampling (pcsamp) on a sequential application.

42 Section 3 Sampling Experiments. Parallel Performance Analysis with Open|SpeedShop

43 Tutorial @ SC2008, Austin, TX Slide 43 Sampling Experiments. PC Sampling: approximates CPU time for line and function; no call stacks. User Time: inclusive vs. exclusive time; includes call stacks. HW Counters: samples hardware counter overflows.

44 Tutorial @ SC2008, Austin, TX Slide 44 Sampling - Considerations. Sampling is a statistical subset of all events: low overhead, low perturbation, and good to get an overview / find hotspots.

45 Tutorial @ SC2008, Austin, TX Slide 45 Example 1: Offline usertime Experiment - smg2000.
[samuel@yra084 test]$ openss -cli
Welcome to OpenSpeedShop 1.6
openss>>RunOfflineExp("mpirun -np 16 smg2000 -n 100 100 100","usertime")
(Experiment type: usertime; Application: the mpirun command for smg2000)

46 Tutorial @ SC2008, Austin, TX Slide 46 Example 1 - Views. Default View: values aggregated across all ranks; manually include/exclude individual processes.

47 Tutorial @ SC2008, Austin, TX Slide 47 Example 1 - Views Cont. Load Balance View: calculates min, max, average across ranks, processes, or threads.

48 Tutorial @ SC2008, Austin, TX Slide 48 Example 1 - Views Cont. Butterfly View: callers and callees of hypre_SMGResidual. * Updated

49 Tutorial @ SC2008, Austin, TX Slide 49 * Updated Example 1 Cont. Source Code Mapping

50 Tutorial @ SC2008, Austin, TX Slide 50 Example 2 - hwc. Offline hwc Experiment - gamess.
openss -offline -f "./rungms tools/fmo/samples/PIEDA 01 1" hwc
-OR-
openss -cli
Welcome to OpenSpeedShop 1.6
openss>>RunOfflineExp("./rungms tools/fmo/samples/PIEDA 01 1","hwc")

51 Tutorial @ SC2008, Austin, TX Slide 51 Example 2 - hwc Cont. Offline hwc Experiment - Default View

52 Section 4 Simple Comparisons. Parallel Performance Analysis with Open|SpeedShop

53 Tutorial @ SC2008, Austin, TX Slide 53 Simple comparisons. Views provide basic "default" views: sorted by statements/functions/objects; summed across all tasks. How to get more insight? Compare results from different sources; combine multiple metrics. O|SS allows users to customize views: independent of experiment or metric; predefined "canned" views; user defined views.

54 Tutorial @ SC2008, Austin, TX Slide 54 Compare Wizard. Compare entire experiments against each other: select compare experiments from the Intro Wizard; select experiments from the dialog page; follow the wizard instructions; this creates a side by side compare panel. Example shown in the next slides: create & run the executable from the "orig" directory; copy the source to a "modified" directory & change it; run the new "modified" version & compare results. * NEW

55 Tutorial @ SC2008, Austin, TX Slide 55 Compare Wizard Select Compare Saved Performance Data * NEW

56 Tutorial @ SC2008, Austin, TX Slide 56 Compare Wizard Choose the experiments you want to compare * NEW

57 Tutorial @ SC2008, Austin, TX Slide 57 Compare Wizard Side by side performance results * NEW

58 Tutorial @ SC2008, Austin, TX Slide 58 Compare Wizard Go to the source for the selected function * NEW

59 Tutorial @ SC2008, Austin, TX Slide 59 Compare Wizard Side by Side Source for the two versions * NEW

60 Tutorial @ SC2008, Austin, TX Slide 60 Customizing Views. Pick your own views: compare entire experiments against each other; combine any data from any active experiment; select metrics provided by any collector; restrict data to arbitrary node sets. Configurable through the menu option "Customize Stats Panel": activate from the stats panel context menu or select the "CC" icon in the toolbar; allows definition of the content for each column.

61 Tutorial @ SC2008, Austin, TX Slide 61 Terminology: Views / Columns View Column

62 Tutorial @ SC2008, Austin, TX Slide 62 Designing a New View. Decide on what to compare: load all necessary experiments; create all columns using the context menu. Select data for each column: pick experiment/collector/metric; select the process and thread set (drag and drop from the process management panel, or use the Process Add/Remove dialog box). "Focus StatsPanel" to activate the view (context menu of the Customize View StatsPanel).

63 Tutorial @ SC2008, Austin, TX Slide 63 Customize Comparison Panel Activate from context menu or select icon

64 Tutorial @ SC2008, Austin, TX Slide 64 Customize Comparison Panel “Load another experiment menu item” to load exp. Load additional experiments

65 Tutorial @ SC2008, Austin, TX Slide 65 Customize Comparison Panel Choose experiment to load.

66 Tutorial @ SC2008, Austin, TX Slide 66 Customize Comparison Panel Add a compare column Add compare column menu item

67 Tutorial @ SC2008, Austin, TX Slide 67 Customize Comparison Panel Choose experiment to compare in new column (2) Available Experiments has two choices

68 Tutorial @ SC2008, Austin, TX Slide 68 Customize Comparison Panel Choose “Focus StatsPanel” to create new panel Focus StatsPanel creates a comparison stats panel

69 Tutorial @ SC2008, Austin, TX Slide 69 Comparing Experiments View after Choosing “Focus Stats Panel” Data from “pcsamp” Experiment with ID 1 Data from “hwc” Experiment with ID 2

70 Tutorial @ SC2008, Austin, TX Slide 70 Summary. Simple comparison of two different metrics on the same executable. Views can be customized to enable detailed analysis. Customized views can contrast results from multiple experiments and multiple metrics from several collectors.

71 Section 5 I/O Tracing Experiments. Parallel Performance Analysis with Open|SpeedShop

72 Tutorial @ SC2008, Austin, TX Slide 72 I/O Tracing - Overview. What does IO tracing record? I/O events: it intercepts all calls to I/O functions and records the current stack trace & start/end time. What about IOT tracing? It collects additional information: function parameters and the function return value.

73 Tutorial @ SC2008, Austin, TX Slide 73 I/O Tracing - Overview. What I/O events are recorded? read, readv, pread, pread64; write, writev, pwrite, pwrite64; open, open64; close; pipe, dup, dup2; creat, creat64; lseek, lseek64. * Updated
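
A hedged pyO|SS sketch of an I/O tracing run, reusing the scripting calls from the "Three Interfaces" slide; the application name my_io_app and the "count" metric name in the Python MetricList are assumptions carried over from the CLI's "-m count" option, not verified against the Python API.

import openss

# Trace POSIX I/O calls ("io"); "iot" would additionally capture parameters and return values
exp = openss.expCreate(openss.FileList("my_io_app"), openss.ExpTypeList("io"))
openss.expGo()
openss.wait()

# Aggregate time per I/O function, then per-function call counts (mirrors "expview -m count" in the CLI)
times = openss.expView(exp, openss.ViewTypeList("io"), openss.MetricList("exclusive"))
counts = openss.expView(exp, openss.ViewTypeList("io"), openss.MetricList("count"))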

74 Tutorial @ SC2008, Austin, TX Slide 74 I/O Tracing - Considerations. Trace experiments collect large amounts of data, with more overhead and more perturbation compared to sampling, but allow for fine-grained analysis.

75 Tutorial @ SC2008, Austin, TX Slide 75 Example 1: Offline IOT Experiment - gamess.
openss -offline -f "./ddikick.x gamess.01.x tools/efp/lysine -ddi 1 1 lanz -scr /scr/scranfo" iot
-OR-
openss -cli
Welcome to OpenSpeedShop 1.6
openss>>RunOfflineExp("./ddikick.x gamess.01.x tools/efp/lysine -ddi 1 1 lanz -scr /scr/scranfo","iot")

76 Tutorial @ SC2008, Austin, TX Slide 76 Example 1 Cont. View Experiment Status - CLI: expstatus

77 Tutorial @ SC2008, Austin, TX Slide 77 Example 1 Cont. View Experiment Output - CLI: expview. CLI view help: help expview. View Experiment Output - GUI: opengui

78 Tutorial @ SC2008, Austin, TX Slide 78 Example 1 Cont. expview: Show Top N IO Calls with expview -m count iotN.
openss>>expview -m count iot6
Number of Calls  Function (defining location)
6994  __libc_write (/lib/libpthread-2.4.so)
548  llseek (/lib/libpthread-2.4.so)
384  __libc_read (/lib/libpthread-2.4.so)
8  __libc_close (/lib/libpthread-2.4.so)
5  __libc_open64 (/lib/libpthread-2.4.so)
5  __dup (/lib/libc-2.4.so)

79 Tutorial @ SC2008, Austin, TX Slide 79 Example 1 Cont. expview: Show Time Spent In a Particular Function with expview -f <function>.
openss>>expview -f __libc_read
Exclusive I/O Call Time(ms)  % of Total  Function (defining location)
11.742272  0.416965  __libc_read (/lib/libpthread.so)

80 Tutorial @ SC2008, Austin, TX Slide 80 Example 1 Cont. expview: Show Call Count For a Particular Function with expview -f <function> -m count.
openss>>expview -f __libc_read -m count
Number of Calls  Function (defining location)
384  __libc_read (/lib/libpthread-2.4.so)

81 Tutorial @ SC2008, Austin, TX Slide 81 Example 1 Cont. expview: Show Return Value & Start Time For Top N Calls with expview -v trace -f <function> -m retval -m start_time iotN.
openss>>expview -v trace -f __libc_read -m retval -m start_time iot5
Function Dependent Return Value  Start Time  Call Stack Function (defining location)
32720  2008/07/08 15:36:37  >>__libc_read (/lib/libpthread-2.4.so)
32720  2008/07/08 15:36:38  >>__libc_read (/lib/libpthread-2.4.so)
20960  2008/07/08 15:36:37  >>__libc_read (/lib/libpthread-2.4.so)
8192  2008/07/08 15:36:38  >>__libc_read (/lib/libpthread-2.4.so)

82 Tutorial @ SC2008, Austin, TX Slide 82 Example 1 Cont. expview: Show Top N Most Time Consuming Call Trees with expview -v calltrees,fullstack iotN.
openss>>expview -vcalltrees,fullstack -mcount,time iot1
Number of Calls  Exclusive I/O Call Time(ms)  Call Stack Function (defining location)
_start (gamess.01.x)
.......
>>>>main (gamess.01.x)
>>>>> @ 545 in MAIN__ (gamess.01.x: gamess.f,352)
>>>>>> @ 741 in brnchx_ (gamess.01.x: gamess.f,695)
>>>>>>> @ 989 in energx_ (gamess.01.x: gamess.f,777)
>>>>>>>> @ 1742 in wfn_ (gamess.01.x: gamess.f,1637)
......
>>>>>>>>>>>>>>>find_or_create_unit (libgfortran.so.1.0.0)
>>>>>>>>>>>>>>>>find_or_create_unit (libgfortran.so.1.0.0)
845  167.248327  >>>>>>>>>>>>>>>>>__libc_write (libpthread-2.4.so)
* Updated

83 Tutorial @ SC2008, Austin, TX Slide 83 Example 1 Cont. - opengui * Updated

84 Tutorial @ SC2008, Austin, TX Slide 84 Example 2 - Parallel App. Let's look at a parallel example: Offline IO Experiment - smg2000.
[samuel@yra131 smg2000]$ openss -cli
Welcome to OpenSpeedShop 1.6
openss>>RunOfflineExp("mpirun -np 16 smg2000 -n 100 100 100","io")
......
openss>>opengui

85 Tutorial @ SC2008, Austin, TX Slide 85 Example 2 – Parallel App. * Updated

86 Tutorial @ SC2008, Austin, TX Slide 86 Summary. I/O collectors intercept all calls to I/O functions, record the current stack trace & start/end time, and can collect detailed ancillary data (IOT). Trace experiments collect large amounts of data and allow for fine-grained analysis.

87 Section 6 Parallel Performance Analysis. Parallel Performance Analysis with Open|SpeedShop

88 Tutorial @ SC2008, Austin, TX Slide 88 Experiment Types. Open|SpeedShop is designed to work on parallel jobs; the focus here is parallelism using MPI. Sequential experiments: apply the experiment/collectors to all nodes; by default display aggregate results; optionally select individual groups of processes. MPI experiments: tracing of MPI calls; can be combined with sequential collectors.

89 Tutorial @ SC2008, Austin, TX Slide 89 Configuration. Multiple MPI versions possible - Open|SpeedShop MPI and Vampirtrace.
OPENSS_MPI_LAM: set to the MPI LAM installation dir
OPENSS_MPI_OPENMPI: set to the MPI OPENMPI installation dir
OPENSS_MPI_MPICH: set to the MPI MPICH installation dir
OPENSS_MPI_MPICH2: set to the MPI MPICH2 installation dir
OPENSS_MPI_MPICH_DRIVER: mpich/mpich2 driver name [ch-p4]
OPENSS_MPI_MPT: set to the SGI MPI MPT installation dir
OPENSS_MPI_MVAPICH: set to the MPI MVAPICH installation dir
Specify during installation (using the install.sh script).

90 Tutorial @ SC2008, Austin, TX Slide 90 MPI Job Start & Attach (offline). MPI process control: link the O|SS collector into the MPI application. Examples:
openss -offline -f "mpirun -np 4 sweep3d.mpi" pcsamp
openss -offline -f "srun -N 4 -n 16 sweep3d.mpi" pcsamp
openss -offline -f "orterun -np 16 sweep3d.mpi" usertime

91 Tutorial @ SC2008, Austin, TX Slide 91 Parallel Result Analysis. Default views: values aggregated across all ranks; manually include/exclude individual processes. Rank comparisons: use the Customize Stats Panel view; create columns for process groups. Cluster analysis: automatically create process groups of similar processes; available from the Stats Panel context menu.

92 Tutorial @ SC2008, Austin, TX Slide 92 Viewing Results by Process Choice of ranks

93 Tutorial @ SC2008, Austin, TX Slide 93 MPI Tracing. Similar to I/O tracing: records all MPI call invocations; by default records call times (mpi); optionally records all arguments (mpit). Equal events will be aggregated to save space in the database and reduce overhead. Future plans: full MPI traces in a public format.
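
A hedged pyO|SS sketch of an MPI tracing run, again modeled on the "Three Interfaces" scripting example; launching through mpirun inside the FileList string mirrors the RunOfflineExp examples elsewhere in this deck and is an assumption for the Python path, as is the application name my_mpi_app.

import openss

# Trace all MPI call invocations; "mpit" would additionally record call arguments
exp = openss.expCreate(openss.FileList("mpirun -np 16 my_mpi_app"), openss.ExpTypeList("mpi"))
openss.expGo()
openss.wait()

# Default view: per-MPI-function times, aggregated across all ranks
result = openss.expView(exp, openss.ViewTypeList("mpi"), openss.MetricList("exclusive"))
openss.dumpView()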

94 Tutorial @ SC2008, Austin, TX Slide 94 Tracing Results: Default View * Updated

95 Tutorial @ SC2008, Austin, TX Slide 95 Tracing Results: Event View * Updated

96 Tutorial @ SC2008, Austin, TX Slide 96 Tracing Results: Creating Event View * Updated

97 Tutorial @ SC2008, Austin, TX Slide 97 Tracing Results: Specialized Event View

98 Tutorial @ SC2008, Austin, TX Slide 98 Results / Show: Callstacks

99 Tutorial @ SC2008, Austin, TX Slide 99 Predefined Analysis Views. O|SS provides common analysis functions: designed for quick analysis of MPI applications; create new views in the StatsPanel; accessible through the context menu or toolbar. Load Balance View: calculates min, max, average across ranks, processes, or threads. Comparative Analysis View: uses a "cluster analysis" algorithm to group like-performing ranks, processes, or threads.

100 Tutorial @ SC2008, Austin, TX Slide 100 Quick Min, Max, Average View Select “LB” in Toolbar

101 Tutorial @ SC2008, Austin, TX Slide 101 Comparative Analysis: Clustering Ranks Select “CA” in Toolbar

102 Tutorial @ SC2008, Austin, TX Slide 102 Comparing Ranks (1) Use CustomizeStatsPanel Create columns for each process set to compare Select set of ranks for each column

103 Tutorial @ SC2008, Austin, TX Slide 103 Comparing Ranks (2) Rank 0 Rank 1

104 Tutorial @ SC2008, Austin, TX Slide 104 Summary. Open|SpeedShop manages MPI jobs: works with multiple MPI implementations; process control using the MPIR interface (dynamic). Parallel experiments: apply sequential collectors to all nodes; specialized MPI tracing experiments. Results: by default aggregated across all ranks; optionally select individual processes; compare or group ranks & specialized views.

105 Section 7 System Requirements & Installation. Parallel Performance Analysis with Open|SpeedShop

106 Tutorial @ SC2008, Austin, TX Slide 106 System Requirements. System architecture: AMD Opteron/Athlon; Intel x86, x86-64, and Itanium-2. Operating system: tested on many popular Linux distributions (SLES, RHEL, Fedora Core, etc.).

107 Tutorial @ SC2008, Austin, TX Slide 107 System Software Prerequisites. libelf (devel): a system-specific version is highly recommended. Python (devel). Qt3 (devel). A typical Linux development environment: GNU Autotools, GNU libtool, etc.

108 Tutorial @ SC2008, Austin, TX Slide 108 Getting The Source. Sourceforge project home: http://sourceforge.net/projects/openss. CVS access: http://sourceforge.net/cvs/?group_id=176777. Packages: accessible from the project home Download tab. Additional information: http://www.openspeedshop.org/

109 Tutorial @ SC2008, Austin, TX Slide 109 Preparing The Build Environment. Important environment variables: OPENSS_PREFIX, OPENSS_INSTRUMENTOR, OPENSS_MPI_<implementation>, QTDIR.
Simple bash example:
export OPENSS_PREFIX=/home/skg/local
export OPENSS_INSTRUMENTOR=mrnet
export OPENSS_MPI_OPENMPI=/opt/openmpi
export QTDIR=/usr/lib64/qt-3.3
* Updated

110 Tutorial @ SC2008, Austin, TX Slide 110 Building OpenSpeedShop. Preparing the directory structure for the install script (needed if OSS & OSS_ROOT were pulled from CVS): rename OpenSpeedShop to openspeedshop-1.6; tar -czf openspeedshop-1.6.tar.gz openspeedshop-1.6 (in general, openspeedshop-<version>); move openspeedshop-1.6 to OSS_ROOT/SOURCES. Navigate to OpenSpeedShop_ROOT. We now have two options: use install.sh to build prerequisites/OSS interactively, or use option 9 for an automatic build: ./install.sh [--with-option] [OPTION]

111 Tutorial @ SC2008, Austin, TX Slide 111 Installation Script Overview

112 Tutorial @ SC2008, Austin, TX Slide 112 Basic Installation Structure

113 Tutorial @ SC2008, Austin, TX Slide 113 Post-Installation Setup. Important environment variables (all instrumentors): OPENSS_PREFIX - path to the installation directory; OPENSS_PLUGIN_PATH - path to the directory where plugins are stored; OPENSS_MPI_IMPLEMENTATION - if there are multiple MPI implementations, this points openss at the one you are using in your application; QTDIR, LD_LIBRARY_PATH, PATH - QT installation directory and Linux path variables. * Updated

114 Tutorial @ SC2008, Austin, TX Slide 114 Post-Installation Setup Cont. Simple bash example:
export OPENSS_PREFIX=/home/skg/local
export OPENSS_MPI_IMPLEMENTATION=openmpi
export OPENSS_PLUGIN_PATH=$OPENSS_PREFIX/lib64/openspeedshop
export LD_LIBRARY_PATH=$OPENSS_PREFIX/lib64:$LD_LIBRARY_PATH
export QTDIR=/usr/lib64/qt-3.3
export PATH=$OPENSS_PREFIX/bin:$PATH
* Updated

115 Tutorial @ SC2008, Austin, TX Slide 115 Post-Installation Setup (MRNet). Additional MRNet-specific environment variables: OPENSS_MRNET_TOPOLOGY_FILE (additional info: http://www.paradyn.org/mrnet), DYNINSTAPI_RT_LIB, MRNET_RSH, XPLAT_RSHCOMMAND, OPENSS_RAWDATA_DIR.
Simple bash example:
export OPENSS_MRNET_TOPOLOGY_FILE=/home/skg/oss.top
export DYNINSTAPI_RT_LIB=$OPENSS_PREFIX/lib64/libdyninstAPI_RT.so.1
export MRNET_RSH=ssh
export XPLAT_RSHCOMMAND=ssh
export OPENSS_RAWDATA_DIR=/panfs/scratch2/vol3/skg/rawdata

116 Tutorial @ SC2008, Austin, TX Slide 116 Post-Installation Setup (Offline). Additional offline-specific environment variable: OPENSS_RAWDATA_DIR. It must be defined within the user's environment before OpenSpeedShop is invoked. If it is not defined in the user's current terminal session before the invocation of OpenSpeedShop, then all raw performance data files will be placed in /tmp/$USER/offline-oss; that is, OPENSS_RAWDATA_DIR will automatically be initialized to /tmp/$USER/offline-oss for the duration of the OpenSpeedShop session. NOTE: In general, it is best if a user selects a target directory located on scratch space. Simple bash example:
export OPENSS_RAWDATA_DIR=/panfs/scratch2/vol3/skg/rawdata

117 Section 8 Advanced Capabilities. Parallel Performance Analysis with Open|SpeedShop

118 Tutorial @ SC2008, Austin, TX Slide 118 Advanced Features Overview. Dynamic instrumentation and online analysis: pros & cons, or what to use when; interactive performance analysis wizards. Advanced GUI features. Scripting Open|SpeedShop: using the command line interface; integrating O|SS into Python. Plugin concept to extend Open|SpeedShop; future plugin plans.

119 Tutorial @ SC2008, Austin, TX Slide 119 Interactive Analysis. Dynamic instrumentation: works on binaries; add/change instrumentation at runtime. Hierarchical communication: efficient broadcast of commands; online data reduction. Interactive control: available through GUI and CLI; start/stop/adjust data collection. (Diagram: MPI application connected to O|SS through MRNet.)

120 Tutorial @ SC2008, Austin, TX Slide 120 Pros & Cons. Advantage - new capabilities: attach to/detach from a running job; intermediate/partial results. Advantage - easier use: control directly from one environment. Disadvantage - longer wait times: less information available at database creation time; binary parsing is expensive. Disadvantage - stability: techniques are low-level and OS specific.

121 Tutorial @ SC2008, Austin, TX Slide 121 Using Dynamic Instrumentation. MRNet instrumentor: can be installed in parallel with the offline capabilities; site specific daemon launch instructions (site.py). Launch control from within O|SS: select the binary at experiment creation. Attach option from within O|SS: select the PID of the target process at experiment creation. Alternative option - wizards: the GUI offers wizards for all default experiments; an easy to use walk-through of all options.

122 Tutorial @ SC2008, Austin, TX Slide 122 Wizard Panels (1) Gather data from new runs Analyze and/or compare existing data from previous runs O|SS Command Line Interface

123 Tutorial @ SC2008, Austin, TX Slide 123 Wizard Panels (2) Select type of data to be gathered by Open|SpeedShop

124 Tutorial @ SC2008, Austin, TX Slide 124 Using the Wizards. Wizards guide through the standard steps: access to all available experiments; options in separate screens; attach to and start of applications possible. Example - pcsamp wizard: select "... where my program spends time ..."; select the sampling rate (100 is a good default); select the program to run or attach to; ensure all online daemons are started (site.py); review and select finish.

125 Tutorial @ SC2008, Austin, TX Slide 125 Dynamic MPI Job Start & Attach. MPI process control: starting depends on the MPI launcher; uses the TotalView MPIR interface for MPI-1. Examples: with MPICH, run on the actual binary; with SLURM, run on "srun". Attach: attach to the process with the MPIR table. Select "-v mpi" or check "Attach to MPI job".

126 Tutorial @ SC2008, Austin, TX Slide 126 Process Management Execution Control Process Overview Process Details

127 Tutorial @ SC2008, Austin, TX Slide 127 Experiment Control. Process management panel: access to all tasks/processes/threads; information on status and location. Execution control: start/pause/resume/stop jobs; select process groups for views. Collector control: add additional collectors to tasks; create custom experiments.

128 Tutorial @ SC2008, Austin, TX Slide 128 General GUI Features. GUI panel management: peel off and rearrange any panel; color coded panel groups per experiment. Context sensitive menus: right click at any location for access to different views and to activate additional panels. Access to the source location of events: double click on the stats panel opens a source panel with (optional) statistics.

129 Tutorial @ SC2008, Austin, TX Slide 129 Leaving the GUI. Three different options to run without the GUI, with equal functionality and the ability to transfer state/results. Interactive command line interface: openss -cli. Batch interface: openss -batch < openss_cmd_file, or openss -batch -f <executable> <experiment> (direct). Python scripting API: python openss_python_script_file.py

130 Tutorial @ SC2008, Austin, TX Slide 130 CLI Language. An interactive command line interface with gdb/dbx like processing. Several interactive commands: create experiments, provide process/thread control, view experiment results. Where possible, commands execute asynchronously. http://www.openspeedshop.org/docs/cli_doc/

131 Tutorial @ SC2008, Austin, TX Slide 131 User-Time Example.
lnx-jeg.americas.sgi.com-17>openss -cli
openss>>Welcome to OpenSpeedShop 1.9
openss>>expcreate -f test/executables/fred/fred usertime   (create experiment and load application)
The new focused experiment identifier is: -x 1
openss>>expgo   (start application)
Start asynchronous execution of experiment: -x 1
openss>>Experiment 1 has terminated.

132 Tutorial @ SC2008, Austin, TX Slide 132 Showing CLI Results.
openss>>expview
Excl CPU time in seconds.  Inclu CPU time in seconds.  % of Total Exclusive CPU Time  Function (defining location)
5.2571  5.2571  49.7297  f3 (fred: f3.c,2)
3.3429  3.3429  31.6216  f2 (fred: f2.c,2)
1.9714  1.9714  18.6486  f1 (fred: f1.c,2)
0.0000  10.5714  0.0000  __libc_start_main (libc.so.6)
0.0000  10.5714  0.0000  _start (fred)
0.0000  10.5429  0.0000  work (fred: work.c,2)
0.0000  10.5714  0.0000  main (fred: fred.c,5)

133 Tutorial @ SC2008, Austin, TX Slide 133 CLI Batch Scripting (1). Create a batch file with CLI commands (a plain text file). Example:
# Create batch file
echo expcreate -f fred pcsamp >> input.script
echo expgo >> input.script
echo expview pcsamp10 >> input.script
# Run OpenSpeedShop
openss -batch < input.script

134 Tutorial @ SC2008, Austin, TX Slide 134 CLI Batch Scripting (2). Open|SpeedShop batch example results:
The new focused experiment identifier is: -x 1
Start asynchronous execution of experiment: -x 1
Experiment 1 has terminated.
CPU Time  Function (defining location)
24.2700  f3 (mutatee: mutatee.c,24)
16.0000  f2 (mutatee: mutatee.c,15)
8.9400  f1 (mutatee: mutatee.c,6)
0.0200  work (mutatee: mutatee.c,33)

135 Tutorial @ SC2008, Austin, TX Slide 135 CLI Batch Scripting (3). Open|SpeedShop batch example, direct:
# Run Open|SpeedShop as a single non-interactive command
openss -batch -f fred pcsamp
The new focused experiment identifier is: -x 1
Start asynchronous execution of experiment: -x 1
Experiment 1 has terminated.
CPU Time  Function (defining location)
24.2700  f3 (mutatee: mutatee.c,24)
16.0000  f2 (mutatee: mutatee.c,15)
8.9400  f1 (mutatee: mutatee.c,6)
0.0200  work (mutatee: mutatee.c,33)

136 Tutorial @ SC2008, Austin, TX Slide 136 CLI Integration in GUI Program Output & Command Line Panel

137 Tutorial @ SC2008, Austin, TX Slide 137 Command Line Panel. Integrated with the output panel: O|SS command line language; optional Python support. Access to all loaded experiments: same context as the GUI; use the experiment ID listed in the panel name. "History" command: displays all commands issued by the GUI; cut & paste the output for scripting.

138 Tutorial @ SC2008, Austin, TX Slide 138 History Command

139 Tutorial @ SC2008, Austin, TX Slide 139 CLI Command Overview. Experiment creation: expcreate, expattach. Experiment control: expgo, expwait, expdisable, expenable. Experiment storage: expsave, exprestore. Result presentation: expview, opengui. Misc. commands: help, list, log, record, playback, history, quit.

140 Tutorial @ SC2008, Austin, TX Slide 140 Python Scripting Open|SpeedShop Python API that executes “same” Interactive/Batch Open|SpeedShop commands User can intersperse “normal” Python code with Open|SpeedShop Python API Run Open|SpeedShop experiments via the Open|SpeedShop Python API

141 Tutorial @ SC2008, Austin, TX Slide 141 Python Example (1). Necessary steps: import the O|SS Python module, prepare arguments for the target application, set the view and experiment type, and create the experiment.
import openss
my_filename = openss.FileList("usability/phaseII/fred")
my_viewtype = openss.ViewTypeList()
my_viewtype += "pcsamp"
exp1 = openss.expCreate(my_filename, my_viewtype)

142 Tutorial @ SC2008, Austin, TX Slide 142 Python Example (2). After experiment creation: start the target application (asynchronous!), wait for completion, and write the results.
try:
    openss.expGo()
    openss.wait()
except openss.error:
    print "expGo(exp1,my_modifier) failed"
openss.dumpView()

143 Tutorial @ SC2008, Austin, TX Slide 143 Python Example Output. Two interfaces to dump the data: plain text (similar to the CLI) for viewing, or as Python objects for post-processing.
>python example.py
/work/jeg/OpenSpeedShop/usability/phaseII/fred: successfully completed.
Excl. CPU time  % of CPU Time  Function (def. location)
4.6700  47.7994  f3 (fred: f3.c,23)
3.5100  35.9263  f2 (fred: f2.c,2)
1.5900  16.2743  f1 (fred: f1.c,2)
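
A hedged sketch of the object-based path mentioned above, building on the result = openss.expView(...) call from the "Three Interfaces" slide; the assumption that the returned object can be iterated as rows is illustrative only and is not taken from the deck.

# Capture the view as a Python object instead of dumping plain text
result = openss.expView(exp1, openss.ViewTypeList("pcsamp"), openss.MetricList("exclusive"))

# Hypothetical post-processing: assumes the result behaves like a sequence of rows
for row in result:
    print row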

144 Tutorial @ SC2008, Austin, TX Slide 144 Summary. Multiple non-graphical interfaces: interactive command line, batch scripting, and a Python module, all with equal functionality and similar commands. Results are transferable: e.g., run in Python and view in the GUI; possibility to switch GUI ↔ CLI.

145 Tutorial @ SC2008, Austin, TX Slide 145 Extensibility. O|SS is more than a performance tool: all functionality in one toolset with one interface; a general infrastructure to create new tools. Plugins add new functionality: they cover all essential steps of performance analysis and are automatically loaded at O|SS startup. Three types of plugins: Collectors - how to acquire performance data; Views - how to aggregate and present data; Panels - how to visualize data in the GUI.

146 Tutorial @ SC2008, Austin, TX Slide 146 Plugin Dataflow (diagram): Wizard, Collector, View, and Panel plugins interact with the execution environment, instrumentor, data abstraction, CLI, and SQL database.

147 Tutorial @ SC2008, Austin, TX Slide 147 Plugin Plans. Existing extensions to the base experiments: MPI tracing directly to OTF; extended parameter capture for MPI and I/O traces. New plugins planned: code coverage, memory profiling, extended hardware counter profiling. Interested in creating your own plugins? Step 1: design the tool workflow and map it to plugins. Step 2: define the data format (XDR encoding). Examples and a plugin creation guide are available.

148 Tutorial @ SC2008, Austin, TX Slide 148 Summary / Advanced Features. Two techniques for instrumentation: online vs. offline, with different strengths for different target scenarios. A flexible GUI that can be customized. Several compatible scripting options: command line language, direct batch interface, integration of O|SS into Python. GUI and scripting are interoperable. Plugin concept to extend Open|SpeedShop.

149 Conclusions. Parallel Performance Analysis with Open|SpeedShop

150 Tutorial @ SC2008, Austin, TX Slide 150 Open|SpeedShop. Goal - one stop performance analysis: all basic experiments in one tool; extensible using the plugin mechanism; supports parallel programs. Multiple interfaces: flexible GUI; multiple scripting and batch options. Rich features: comparing results; tracebacks/callstacks in most experiments; full I/O tracing support.

151 Tutorial @ SC2008, Austin, TX Slide 151 Documentation. Open|SpeedShop User Guide documentation: http://www.openspeedshop.org/docs/users_guide/ or /opt/OSS/share/doc/packages/OpenSpeedShop/users_guide, where /opt/OSS is the installation directory. Python scripting API documentation: http://www.openspeedshop.org/docs/pyscripting_doc/ or /opt/OSS/share/doc/packages/OpenSpeedShop/pyscripting_doc. Command Line Interface documentation: http://www.openspeedshop.org/docs/cli_doc/ or /opt/OSS/share/doc/packages/OpenSpeedShop/cli_doc.

152 Tutorial @ SC2008, Austin, TX Slide 152 Current Status. Open|SpeedShop 1.9 is available: packages and source from sourceforge.net; tested on a variety of platforms. Keep in mind, though: Open|SpeedShop is still under development and relies on low-level OS specific components. Keep us informed: we are happy to help with problems and are interested in getting feedback. * Updated

153 Tutorial @ SC2008, Austin, TX Slide 153 Building a Community. More than yet another research tool: a large scale environment, a flexible framework, a cooperative atmosphere. Wanted: new users, open source tool developers, and tools/software companies. Goal: a self-sustaining project. * Updated

154 Tutorial @ SC2008, Austin, TX Slide 154 Availability and Contact. Open|SpeedShop website: http://www.openspeedshop.org/. Download options: package with install script; source for the tool and base libraries. Feedback: bug tracking is available from the website; contact information is on the website; feel free to contact the presenters directly.

