Parallel Performance Analysis with Open|SpeedShop
Seminar @ NASA, NASA Ames Research Center, October 29, 2008


Slide 2: Presenters and Partners (Seminar @ NASA, 10-29-08)
- Presenters: Jim Galarowicz (Krell), Don Maghrak (Krell)
- Larger team: Martin Schulz (LLNL), David Montoya (LANL), Scott Cranford (Sandia National Laboratories), William Hachfeld (Krell), Samuel Gutierrez (LANL), Joseph Kenny (Sandia National Laboratories), Chris Chambreau (LLNL)
- University partners: University of Wisconsin, University of Maryland, Rice University

Slide 3: Seminar Goals
- Introduce Open|SpeedShop
  - Basic concepts, terminology, modes of operation
  - Running first examples
- Provide an overview of features
  - Sampling and tracing in O|SS
  - Performance comparisons
  - Parallel performance analysis
- Status and roadmap

Slide 4: Highlights
- Open source performance analysis tool framework
  - Most common performance analysis steps in one tool
  - Extensible through plugins for data collection and representation
  - Profiling (sampling) and tracing (wrapping functions)
- Multiple instrumentation options
  - All work on unmodified application binaries
  - Compile with -g so data can be mapped to source lines; optimized builds (-O2, -O3, etc.) also work
  - Offline data collection: run the program from start to end
  - Online data collection, with the ability to attach to running applications and to start and stop data collection

Slide 5: Highlights (continued)
- Flexible and easy to use; user access through:
  - Graphical User Interface (GUI)
  - Interactive command line
  - Python scripting API
- Large range of platforms
  - Linux clusters/SSI with x86, IA-64, Opteron, and EM64T CPUs
  - New: more portable offline data collection mechanism
- Availability
  - Full source available on sourceforge.net
  - Release tarballs on sourceforge.net

Slide 6: O|SS Target Audience
- Programmers/code teams
  - Use Open|SpeedShop out of the box
  - Powerful performance analysis
  - Ability to integrate O|SS into projects
- Tool developers
  - Single, comprehensive infrastructure
  - Easy deployment of new tools
- Project/product-specific customizations
  - Predefined/custom experiments

Slide 7: Performance Experiments
- Concept of an experiment
  - What program to analyze
  - What type of performance data to gather
  - How often the performance data is gathered
- Consists of collectors and views
  - Collectors define a specific type of performance data, such as hardware counters, program counter samples, or tracing of certain routines (I/O, MPI)
  - Views specify data aggregation and presentation
  - Multiple collectors per experiment are possible

Slide 8: Experiment Workflow
(Diagram) The application is run as an "Experiment", which consists of one or more data "Collectors" and is controlled through the Process Management Panel. Results are stored in an SQL database and can be displayed using several "Views".
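
The slide says experiment results are stored in an SQL database and displayed through views. As an illustration only, the sketch below uses Python's built-in `sqlite3` module to store per-rank sample counts and expresses a "view" as an aggregation query. The table name `samples`, its columns, the sample counts, and the 10 ms interval are assumptions made for this example; they are not the O|SS database schema.

```python
import sqlite3


def build_demo_db(rows):
    """Store (mpi_rank, func_name, n_samples) rows in an in-memory SQLite database.

    Hypothetical table layout, NOT the O|SS schema: it only illustrates keeping
    raw performance data in SQL and computing views with queries.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples (mpi_rank INTEGER, func_name TEXT, n_samples INTEGER)"
    )
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", rows)
    return conn


def function_view(conn, interval_s):
    """A 'view': sample-derived time per function, aggregated over all ranks."""
    query = """
        SELECT func_name, SUM(n_samples) * ? AS seconds
        FROM samples
        GROUP BY func_name
        ORDER BY seconds DESC
    """
    return conn.execute(query, (interval_s,)).fetchall()


if __name__ == "__main__":
    # Made-up sample counts for two ranks, 10 ms sampling interval.
    demo = [
        (0, "f3", 260), (0, "f2", 170), (0, "f1", 100),
        (1, "f3", 250), (1, "f2", 160), (1, "f1", 95),
    ]
    for name, seconds in function_view(build_demo_db(demo), 0.01):
        print(f"{seconds:8.2f}  {name}")
```

Keeping raw samples and deriving views by query means a different view (per rank, per source line) only needs a different query over the same stored data.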

Slide 9: Experiment Types in O|SS
- Sampling experiments
  - Periodically interrupt the run and record the location
  - Report the statistical distribution of these locations
  - Typically provide a good overview
  - Overhead mostly low and uniform
- Tracing experiments
  - Gather and store individual application events, e.g., function invocations (MPI, I/O, …)
  - Provide detailed, low-level information
  - Higher overhead, potentially bursty
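
To make "periodically interrupt the run and report the statistical distribution" concrete, here is a toy simulation. It assumes a run can be described as a list of hypothetical (function, duration) segments and counts which function is active at each sampling tick; a real sampler uses a timer interrupt on the live process instead.

```python
from collections import Counter


def pc_sample(timeline, interval):
    """Simulate periodic sampling of a recorded (hypothetical) run.

    timeline: (function_name, duration_s) segments in execution order; a name
              may appear many times (e.g. f1 called in a loop).
    interval: sampling period in seconds.
    Interrupt k fires at (k + 0.5) * interval; the half-period offset only keeps
    sample times away from segment boundaries in this toy model.
    Returns a Counter of samples per function.
    """
    counts = Counter()
    k = 0            # index of the next interrupt
    t_start = 0.0    # start time of the current segment
    for name, duration in timeline:
        t_end = t_start + duration
        while (k + 0.5) * interval < t_end:
            counts[name] += 1    # the interrupt landed inside this segment
            k += 1
        t_start = t_end
    return counts


if __name__ == "__main__":
    run = [("main", 0.05), ("f1", 0.20), ("f2", 0.30), ("f3", 0.45)]
    interval = 0.01
    for name, n in pc_sample(run, interval).most_common():
        print(f"{name:5s} {n:3d} samples  ~{n * interval:.2f} s")
```

With a 10 ms interval each sample stands for roughly 10 ms spent in the function it hit, which is how sample counts become sample-derived times.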

Slide 10: Sampling Experiments
- PC Sampling (pcsamp)
  - Record the PC at user-defined time intervals
  - Low-overhead overview of the time distribution
- User Time (usertime)
  - PC sampling plus call stacks for each sample
  - Provides inclusive and exclusive timing data
- Hardware Counters (hwc, hwctime)
  - Sample hardware counter (HWC) overflow events
  - Access to data like cache and TLB misses
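
What usertime adds over pcsamp is a call stack per sample, and that is what makes inclusive time possible. The sketch below assumes stacks are given as lists of function names (outermost first) and uses the hypothetical functions main, work, f1, f2, f3; it shows one common way to derive both metrics and is not taken from the O|SS source.

```python
from collections import Counter


def inclusive_exclusive(stacks, interval):
    """Derive exclusive and inclusive time from call-stack samples.

    stacks:   one call stack per sample, outermost frame first, for example
              ["main", "work", "f3"]; the last entry is where the PC was.
    interval: sampling period in seconds.
    Exclusive time credits only the innermost frame. Inclusive time credits
    every function on the stack once per sample, so recursion is not
    double counted.
    """
    excl, incl = Counter(), Counter()
    for stack in stacks:
        excl[stack[-1]] += 1
        for name in set(stack):
            incl[name] += 1

    def to_seconds(counts):
        return {name: n * interval for name, n in counts.items()}

    return to_seconds(excl), to_seconds(incl)


if __name__ == "__main__":
    # Hypothetical samples: 10 stacks, 10 ms apart.
    samples = (
        [["main", "work", "f3"]] * 5
        + [["main", "work", "f2"]] * 3
        + [["main", "work", "f1"]] * 2
    )
    excl, incl = inclusive_exclusive(samples, 0.01)
    for name in ["f3", "f2", "f1", "work", "main"]:
        print(f"{excl.get(name, 0.0):6.2f} {incl[name]:6.2f}  {name}")
```

Callers such as main and work get zero exclusive time but the full run as inclusive time, the same pattern visible in a usertime expview listing.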

Slide 11: Tracing Experiments
- I/O Tracing (io, iot)
  - Record invocation of all POSIX I/O events
  - Provides aggregate and individual timings
- MPI Tracing (mpi, mpit, mpiotf)
  - Record invocation of all MPI routines
  - Provides aggregate and individual timings
- Floating Point Exception Tracing (fpe)
  - Triggered by any FPE caused by the code
  - Helps pinpoint numerical problem areas
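
Tracing works by wrapping functions: the wrapper records each invocation (giving individual timings), and those records can later be aggregated. The Python decorator below is only an analogy for that wrap-record-forward pattern; `traced`, `TRACE`, `summarize`, and `write_chunk` are hypothetical names, whereas the O|SS I/O and MPI collectors wrap the application's actual POSIX I/O and MPI calls.

```python
import functools
import os
import tempfile
import time

TRACE = []  # hypothetical event log: one (function_name, start_s, duration_s) per call


def traced(func):
    """Wrap func so every invocation is recorded, then forwarded unchanged."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:  # record the event even if the call raises
            TRACE.append((func.__name__, start, time.perf_counter() - start))
    return wrapper


def summarize(trace):
    """Aggregate view: (call count, total seconds) per traced function."""
    totals = {}
    for name, _start, duration in trace:
        calls, total = totals.get(name, (0, 0.0))
        totals[name] = (calls + 1, total + duration)
    return totals


@traced
def write_chunk(path, data):
    """Hypothetical application I/O routine that we want to trace."""
    with open(path, "ab") as fh:
        return fh.write(data)


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "out.bin")
    for _ in range(3):
        write_chunk(out, b"x" * 4096)
    for name, (calls, total) in summarize(TRACE).items():
        print(f"{name}: {calls} calls, {total * 1e6:.1f} us total")
```

Because every event is kept, per-call timings stay available alongside the aggregate, at the cost of overhead proportional to the number of calls.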

Slide 12: Parallel Experiments
- O|SS supports MPI and threaded codes
  - Tested with a variety of MPI implementations
  - Thread support based on POSIX threads
- Any collector can be applied to a parallel job
  - Automatically applied to all tasks/threads
  - Default views aggregate across all tasks/threads
  - Data from individual tasks/threads is available
- Specific parallel experiments (e.g., mpi, mpit)

Slide 13: High-level Architecture
(Diagram) User interfaces (GUI, pyO|SS, CLI) built on code instrumentation and open source software, running on AMD- and Intel-based clusters/SSI using Linux.

Slide 14: Code Instrumentation in O|SS
- Offline/external data collection
  - Instrument the application at start-up
  - Write data to raw files and convert them to O|SS format
  - Performance data available at the end of execution
- Online scalable data collection via MRNet
  - Scalable transport layer
  - Performance data delivered directly to the tool online
  - Ability for interactive online analysis and viewing of intermediate results
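
The offline path (write raw files during the run, convert after it ends) can be pictured with the sketch below. The per-rank file names and the JSON-lines format are assumptions for illustration; the actual O|SS raw data format is not described on these slides.

```python
import json
import os
import tempfile
from collections import Counter


def write_raw(directory, rank, samples):
    """Offline mode: each process writes its own raw file while it runs.

    Hypothetical file name and JSON-lines format, standing in for the real
    O|SS raw data files.
    """
    path = os.path.join(directory, f"rank-{rank}.raw")
    with open(path, "w") as fh:
        for function in samples:
            fh.write(json.dumps({"rank": rank, "function": function}) + "\n")
    return path


def convert(directory):
    """Post-mortem step: merge every raw file into one aggregate result."""
    merged = Counter()
    for name in sorted(os.listdir(directory)):
        if name.endswith(".raw"):
            with open(os.path.join(directory, name)) as fh:
                for line in fh:
                    merged[json.loads(line)["function"]] += 1
    return merged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as raw_dir:
        write_raw(raw_dir, 0, ["f3", "f3", "f2"])
        write_raw(raw_dir, 1, ["f3", "f1"])
        # Only after all processes have exited is the data converted and viewable.
        print(convert(raw_dir).most_common())
```

Writing locally and merging later avoids any communication during the run; the trade-off is that nothing can be viewed until the job finishes, which is what the online MRNet path addresses.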

Slide 15: Offline & Online Data Collection
(Diagram) Offline: MPI application, analyzed by O|SS post-mortem. Online: MPI application connected to O|SS through MRNet.

Slide 16: High-level Architecture
(Diagram, repeated from Slide 13) GUI, pyO|SS, CLI; code instrumentation; open source software; AMD- and Intel-based clusters/SSI using Linux.

Slide 17: Three Interfaces (GUI, CLI, Python)
- Experiment commands: expAttach, expCreate, expDetach, expGo, expView
- List commands: list -v exp, list -v hosts, list -v status
- Session commands: setBreak, openGui
- Python equivalent of a simple pcsamp experiment:

    import openss as oss
    my_filename = oss.FileList("myprog.a.out")
    my_exptype = oss.ExpTypeList("pcsamp")
    my_id = oss.expCreate(my_filename, my_exptype)
    oss.expGo()
    my_metric_list = oss.MetricList("exclusive")
    my_viewtype = oss.ViewTypeList("pcsamp")
    result = oss.expView(my_id, my_viewtype, my_metric_list)

Slide 18: Running an Experiment
- Running a simple example experiment
  - Examine the command syntax
  - List the outputs from the experiment
- Viewing and interpreting gathered measurements
  - GUI, CLI via the experiment database file
- Show the "-offline" example in more detail
- Introduce additional command syntax

Slide 19: Basic Offline Experiment Syntax

    openss -offline -f "executable" pcsamp

- openss is the command to invoke Open|SpeedShop
- -offline indicates the user interface to use (immediate command)
  - There are a number of user interface options
- -f is the option for specifying the executable name
  - The "executable" can be a sequential or parallel command
- pcsamp indicates what type of performance data (metric) you will gather
  - Here pcsamp means that we periodically take a sample of the address the program counter is pointing to, and associate that address with a function and/or source line
  - There are several existing performance metric choices
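
Associating a sampled program counter value with a function amounts to finding the symbol whose start address is the largest one not above the PC. A sketch using Python's `bisect` module, with a made-up symbol table (the addresses and function names are hypothetical):

```python
import bisect
from collections import Counter

# Hypothetical symbol table: (start_address, function_name), sorted by address.
# A real tool reads symbols from the executable and its shared libraries.
SYMBOLS = [
    (0x400000, "_start"),
    (0x400100, "main"),
    (0x400200, "work"),
    (0x400300, "f1"),
    (0x400380, "f2"),
    (0x400400, "f3"),
]
STARTS = [start for start, _ in SYMBOLS]


def function_for(pc):
    """Return the function whose start address is the closest one <= pc.

    This sketch ignores function sizes, so an address past the end of the last
    function is still attributed to it; returns None below the first entry.
    """
    i = bisect.bisect_right(STARTS, pc) - 1
    return SYMBOLS[i][1] if i >= 0 else None


if __name__ == "__main__":
    sampled_pcs = [0x400410, 0x400420, 0x400390, 0x400304, 0x400450]
    print(Counter(function_for(pc) for pc in sampled_pcs).most_common())
```

Source-line attribution follows the same lookup idea, using the line information that compiling with -g adds.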

Slide 20: What Are the Outputs?
Outputs from: openss -offline -f "executable" pcsamp
- Normal program output while the executable is running
- The sorted list of performance information
  - A list of the top time-taking functions
  - The corresponding sample-derived time for each function
- A performance information database file
  - The database file contains all the information needed to view the data at any time in the future, without the executable(s):
    - Symbol table information from the executable(s) and system libraries
    - Performance data openss gathered
    - Time stamps for when DSOs were loaded and unloaded

Slide 21: Example Parallel Run with Output

    openss -offline -f "orterun -np 128 sweep3d.mpi" pcsamp
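
The quotes around the parallel launch command matter: everything after -f up to the experiment name must reach openss as a single argument. The sketch below (the helper name `offline_command` is hypothetical) builds that argument vector in Python and uses `shlex.join` to show a correctly quoted command line.

```python
import shlex


def offline_command(launch_cmd, experiment="pcsamp"):
    """Build the argument vector for: openss -offline -f "<launch_cmd>" <experiment>

    Hypothetical helper. The whole launch command (MPI launcher, its options
    and the program) must reach openss as ONE argument after -f, so it is
    kept as a single string.
    """
    return ["openss", "-offline", "-f", launch_cmd, experiment]


if __name__ == "__main__":
    argv = offline_command("orterun -np 128 sweep3d.mpi")
    print(shlex.join(argv))  # shell-safe rendering of the command line
    # subprocess.run(argv, check=True) would launch it if openss is installed.
```

Printed with `shlex.join`, the command becomes `openss -offline -f 'orterun -np 128 sweep3d.mpi' pcsamp`; for this command the single quotes are equivalent to the double quotes on the slide.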

Slide 22: Output from Example Run

    openss -offline -f "orterun -np 128 sweep3d.mpi" pcsamp

Slide 23: Using the Database File
- The database file is one of the outputs from running: openss -offline -f "executable" pcsamp
- Use this file to view the data
- How to open the database file with openss:
  - openss -f <database file>
  - openss (then use the menus or wizard to open it)
  - openss -cli, then: exprestore -f <database file>
- In this example we show both:
  - openss -cli -f X.0.openss (CLI)
  - openss -f X.0.openss (GUI)
- X.0.openss is the file name openss creates by default

Slide 24: Output from Example Run
Loading the database file: openss -cli -f X.0.openss

Slide 25: Process Management Panel
Control your job, focus the stats panel, and create process subsets.

Slide 26: Default Stats Panel View
openss -f X.0.openss: performance statistics by function is the default view.

Slide 27: Results Map to Source
Split-screen mapping of performance data to source lines.

Slide 28: Min, Max, Average (Load Balance) View
Select "LB" in the toolbar to generate the load balance view.
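
A min/max/average view compares one metric across all ranks so that imbalanced functions stand out. The sketch below computes that summary from a dictionary of per-rank times; the input layout, the function names, and the numbers are hypothetical, and the GUI's actual computation may differ in details such as how missing data is treated.

```python
def load_balance(per_rank):
    """Min/max/average of one metric across ranks, per function.

    per_rank: {rank: {function: seconds}}  (hypothetical input layout).
    Returns {function: (min_s, min_rank, max_s, max_rank, avg_s)}.
    A rank with no samples for a function counts as 0 seconds for it.
    """
    functions = {f for times in per_rank.values() for f in times}
    result = {}
    for f in sorted(functions):
        values = [(times.get(f, 0.0), rank) for rank, times in per_rank.items()]
        low, high = min(values), max(values)
        avg = sum(v for v, _ in values) / len(values)
        result[f] = (low[0], low[1], high[0], high[1], avg)
    return result


if __name__ == "__main__":
    # Hypothetical per-rank exclusive times in seconds.
    times = {
        0: {"sweep": 4.0, "mpi_wait": 1.0},
        1: {"sweep": 3.0, "mpi_wait": 2.0},
        2: {"sweep": 5.0},
    }
    for f, (mn, mn_r, mx, mx_r, avg) in load_balance(times).items():
        print(f"{f:9s} min {mn:.2f} (rank {mn_r})  max {mx:.2f} (rank {mx_r})  avg {avg:.2f}")
```

Reporting which rank holds the minimum and maximum, not just the values, is what lets you go straight to the outlier task.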

Slide 29: Comparative Analysis: Clustering Ranks
Select "CA" in the toolbar to generate the comparative analysis view.

Slide 30: Comparative Analysis: Clustering Ranks
Select "CA" in the toolbar to generate the comparative analysis view.
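
The slides do not say which clustering algorithm the comparative analysis view uses. As a hedged illustration of grouping ranks with similar behavior, here is a simple leader-style clustering over hypothetical per-rank time profiles; treat it as one possible technique, not the O|SS implementation.

```python
import math


def cluster_ranks(profiles, threshold):
    """Group ranks with similar per-function time profiles (leader clustering).

    profiles:  {rank: [seconds_f0, seconds_f1, ...]}, same function order for all.
    threshold: maximum Euclidean distance to a cluster's representative.
    Each rank joins the first cluster whose representative (its first member)
    is close enough; otherwise it starts a new cluster.
    """
    clusters = []  # list of (representative_profile, member_ranks)
    for rank in sorted(profiles):
        vec = profiles[rank]
        for rep, members in clusters:
            if math.dist(rep, vec) <= threshold:
                members.append(rank)
                break
        else:
            clusters.append((vec, [rank]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    # Hypothetical [compute_s, mpi_s] profiles: rank 0 behaves differently.
    profiles = {0: [10.0, 1.0], 1: [5.0, 6.0], 2: [5.2, 5.9], 3: [5.1, 6.1]}
    print(cluster_ranks(profiles, threshold=0.5))
```

With many ranks, inspecting one representative per cluster is far cheaper than comparing every rank individually; the threshold controls how coarse the grouping is.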

Slide 31: Additional Experiment Syntax
- openss -offline -f "executable" pcsamp
  - -offline indicates the user interface is immediate command mode
  - Uses the offline (LD_PRELOAD) collection mechanism
- openss -cli -f "executable" pcsamp
  - -cli indicates the user interface is the interactive command line
  - Uses the online (dynamic instrumentation) collection mechanism
- openss -f "executable" pcsamp
  - No interface option indicates the user interface is the graphical user interface
  - Uses the online (dynamic instrumentation) collection mechanism
- openss -batch < input.commands.file
  - Executes from a file of CLI commands

Slide 32: Wizard Panel, Page 1
Options: gather data from new runs; analyze and/or compare existing data from previous runs; O|SS command line interface.

Slide 33: Wizard Panel: Gather New Data
Select the type of data to be gathered by Open|SpeedShop.

Slide 34: Compare Wizard
Side-by-side performance results.

Slide 35: Compare Wizard
Side-by-side source for the two versions.

Slide 36: Comparing MPI Ranks
Rank 0 vs. Rank 1.

Slide 37: CLI Language
- An interactive command line interface
  - gdb/dbx-like processing
- Several interactive commands
  - Create experiments
  - Provide process/thread control
  - View experiment results
- Where possible, commands execute asynchronously
- http://www.openspeedshop.org/docs/cli_doc/

Slide 38: CLI Command Overview
- Experiment creation: expcreate, expattach
- Experiment control: expgo, expwait, expdisable, expenable
- Experiment storage: expsave, exprestore
- Result presentation: expview, opengui
- Misc. commands: help, list, log, record, playback, history, quit

Slide 39: User-Time Example
Create the experiment and load the application, then start the application:

    lnx-jeg.americas.sgi.com-17>openss -cli
    openss>>Welcome to OpenSpeedShop 1.9
    openss>>expcreate -f test/executables/fred/fred usertime
    The new focused experiment identifier is: -x 1
    openss>>expgo
    Start asynchronous execution of experiment: -x 1
    openss>>Experiment 1 has terminated.

Slide 40: Showing CLI Results

    openss>>expview
    Excl CPU time  Inclu CPU time  % of Total Exclusive  Function
      in seconds.     in seconds.              CPU Time  (defining location)
           5.2571          5.2571               49.7297  f3 (fred: f3.c,2)
           3.3429          3.3429               31.6216  f2 (fred: f2.c,2)
           1.9714          1.9714               18.6486  f1 (fred: f1.c,2)
           0.0000         10.5714                0.0000  __libc_start_main (libc.so.6)
           0.0000         10.5714                0.0000  _start (fred)
           0.0000         10.5429                0.0000  work (fred: work.c,2)
           0.0000         10.5714                0.0000  main (fred: fred.c,5)

Slide 41: CLI Batch Scripting (1)
- Create a batch file with CLI commands (plain text file)
- Example:

    # Create batch file (">" starts a new file, ">>" appends to it)
    echo expcreate -f fred pcsamp > input.script
    echo expgo >> input.script
    echo expview pcsamp10 >> input.script
    # Run Open|SpeedShop
    openss -batch < input.script
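
The same batch file can be produced from Python, which is handy when the command list is generated programmatically. In this sketch the helper names `write_batch_script` and `run_batch` are hypothetical; it only runs `openss -batch` if an `openss` executable is found on PATH.

```python
import shutil
import subprocess
from pathlib import Path

# The three CLI commands from the batch example above.
COMMANDS = ["expcreate -f fred pcsamp", "expgo", "expview pcsamp10"]


def write_batch_script(path, commands):
    """Hypothetical helper: write one CLI command per line, replacing any old file."""
    Path(path).write_text("\n".join(commands) + "\n")


def run_batch(path):
    """Hypothetical helper: feed the script to `openss -batch` on stdin, if installed."""
    if shutil.which("openss") is None:
        return None
    with open(path) as script:
        return subprocess.run(["openss", "-batch"], stdin=script, check=True)


if __name__ == "__main__":
    write_batch_script("input.script", COMMANDS)
    run_batch("input.script")
```

Passing the open file as `stdin` mirrors the shell redirection `openss -batch < input.script`.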

Slide 42: CLI Batch Scripting (2)
Open|SpeedShop batch example results:

    The new focused experiment identifier is: -x 1
    Start asynchronous execution of experiment: -x 1
    Experiment 1 has terminated.
    CPU Time  Function (defining location)
     24.2700  f3 (mutatee: mutatee.c,24)
     16.0000  f2 (mutatee: mutatee.c,15)
      8.9400  f1 (mutatee: mutatee.c,6)
      0.0200  work (mutatee: mutatee.c,33)

Slide 43: CLI Batch Scripting (3)
Open|SpeedShop batch example: direct

    # Run Open|SpeedShop as a single non-interactive command
    openss -batch -f fred pcsamp
    The new focused experiment identifier is: -x 1
    Start asynchronous execution of experiment: -x 1
    Experiment 1 has terminated.
    CPU Time  Function (defining location)
     24.2700  f3 (mutatee: mutatee.c,24)
     16.0000  f2 (mutatee: mutatee.c,15)
      8.9400  f1 (mutatee: mutatee.c,6)
      0.0200  work (mutatee: mutatee.c,33)

Slide 44: Python Scripting
- Open|SpeedShop Python API that executes the "same" interactive/batch Open|SpeedShop commands
- Users can intersperse "normal" Python code with the Open|SpeedShop Python API
- Run Open|SpeedShop experiments via the Open|SpeedShop Python API

Slide 45: Python Example (1)
Necessary steps:
- Import the O|SS Python module
- Prepare arguments for the target application
- Set view and experiment type
- Create experiment

    import openss
    my_filename = openss.FileList("usability/phaseII/fred")
    my_viewtype = openss.ViewTypeList()
    my_viewtype += "pcsamp"
    exp1 = openss.expCreate(my_filename, my_viewtype)

Slide 46: Python Example (2)
After experiment creation:
- Start the target application (asynchronous!)
- Wait for completion
- Write results

    try:
        openss.expGo()
        openss.wait()
    except openss.error:
        print("expGo failed")
    openss.dumpView()

Slide 47: Python Example Output
- Two interfaces to dump data
  - Plain text (similar to the CLI) for viewing
  - As Python objects for post-processing

    >python example.py
    /work/jeg/OpenSpeedShop/usability/phaseII/fred: successfully completed.
    Excl. CPU time  % of CPU Time  Function (def. location)
            4.6700        47.7994  f3 (fred: f3.c,23)
            3.5100        35.9263  f2 (fred: f2.c,2)
            1.5900        16.2743  f1 (fred: f1.c,2)
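
For post-processing, the percentage column is each function's exclusive time divided by the sum of the exclusive times listed (4.67 / 9.77 ≈ 47.80%). The exact Python objects the O|SS API returns are not shown on the slide, so this sketch assumes plain (function, seconds) tuples copied from the output above and also adds a cumulative percentage; `percent_of_total` is a hypothetical helper.

```python
def percent_of_total(rows):
    """Add '% of total' and a running cumulative % to (function, seconds) rows.

    Hypothetical helper. Returns (function, seconds, pct, cumulative_pct)
    tuples, largest time first.
    """
    total = sum(seconds for _, seconds in rows)
    out, running = [], 0.0
    for name, seconds in sorted(rows, key=lambda r: r[1], reverse=True):
        running += seconds
        out.append((name, seconds, 100.0 * seconds / total, 100.0 * running / total))
    return out


if __name__ == "__main__":
    # Exclusive CPU times copied by hand from the example output above.
    rows = [("f3", 4.67), ("f2", 3.51), ("f1", 1.59)]
    for name, s, pct, cum in percent_of_total(rows):
        print(f"{s:8.4f} {pct:8.4f} {cum:8.2f}  {name}")
```

The cumulative column answers a common follow-up question directly: how few functions account for most of the run time.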

Slide 48: Extensibility
- O|SS is more than a performance tool
  - All functionality in one toolset with one interface
  - General infrastructure to create new tools
- Plugins to add new functionality
  - Cover all essential steps of performance analysis
  - Automatically loaded at O|SS startup
- Three types of plugins
  - Collectors: how to acquire performance data?
  - Views: how to aggregate and present data?
  - Panels: how to visualize data in the GUI?
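
The plugin idea (collectors acquire data, views aggregate it, panels display it) can be sketched as a registry. Here registration happens via a decorator when the module is imported, standing in for O|SS loading its plugins at startup; `plugin`, `PLUGINS`, and `run_experiment` are hypothetical names, and no panel is registered since there is no GUI in this example.

```python
from collections import Counter, defaultdict

PLUGINS = defaultdict(dict)  # hypothetical registry: kind -> {name: callable}
KINDS = ("collector", "view", "panel")


def plugin(kind, name):
    """Decorator that registers a callable as a plugin of the given kind."""
    if kind not in KINDS:
        raise ValueError(f"unknown plugin kind: {kind!r}")

    def register(obj):
        PLUGINS[kind][name] = obj
        return obj
    return register


@plugin("collector", "pcsamp")
def pcsamp_collector():
    # Stand-in data; a real collector would sample the running program.
    return ["f3", "f3", "f2", "f1", "f3"]


@plugin("view", "pcsamp")
def pcsamp_view(samples):
    # Aggregate samples into a per-function count, largest first.
    return Counter(samples).most_common()


def run_experiment(name):
    """Pick a collector and a view by name, the way an experiment combines them."""
    samples = PLUGINS["collector"][name]()
    return PLUGINS["view"][name](samples)


if __name__ == "__main__":
    print(run_experiment("pcsamp"))
```

Because the core only looks plugins up by kind and name, adding a new experiment means registering new callables rather than changing the core.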

Slide 49: Overview Summary
- Two techniques for instrumentation
  - Online vs. offline
  - Different strengths for different target scenarios
- Flexible GUI that can be customized
- Several compatible scripting options
  - Command line language
  - Direct batch interface
  - Integration of O|SS into Python
- GUI and scripting are interoperable
- Plugin concept to extend Open|SpeedShop

Slide 50: Status & Future Plans
- Open|SpeedShop 1.9 available shortly
  - Packages and source from sourceforge.net
  - Tested on a variety of platforms
- Offline version featured in version 1.9
- Online (MRNet) work in progress
  - Target is version 2.0 in December
  - Working on some platforms but not all
- Focus on scalability in coming months
- Support for capability machines via Office of Science proposal with ASC assistance

Slide 51: Availability and Contact
- Open|SpeedShop website: http://www.openspeedshop.org/
- Installed on cfe1.nas.nasa.gov
- Download options
  - Package with install script
  - Source for tool and base libraries
- Feedback
  - Bug tracking and contact info available from the website
  - Feel free to contact the presenters directly: jeg@krellinst.org and/or dpm@krellinst.org

