Parallel Performance Analysis with Open|SpeedShop, Half Day Tutorial, SC 2008, Austin, TX.



SC2008, Austin, TX Slide 2 What is Open|SpeedShop?
Comprehensive Open Source Performance Analysis Framework
- Combining profiling and tracing
- Common workflow for all experiments
- Flexible instrumentation
- Extensibility through plugins
Partners
- DOE/NNSA Tri-Labs (LLNL, LANL, SNLs)
- Krell Institute
- Universities of Wisconsin and Maryland

SC2008, Austin, TX Slide 3 Highlights
Open Source Performance Analysis Tool Framework
- Most common performance analysis steps all in one tool
- Extensible by using plugins for data collection and representation
Several Instrumentation Options
- All work on unmodified application binaries
- Offline and online data collection / attach to running applications
Flexible and Easy to Use
- User access through GUI, command line, and Python scripting
Large Range of Platforms
- Linux clusters with x86, IA-64, Opteron, and EM64T CPUs
- Easier portability with offline data collection mechanism
Availability
- Used at all three ASC labs with lab-size applications
- Full source available on sourceforge.net

SC2008, Austin, TX Slide 4 O|SS Target Audience
Programmers/code teams
- Use Open|SpeedShop out of the box
- Powerful performance analysis
- Ability to integrate O|SS into projects
Tool developers
- Single, comprehensive infrastructure
- Easy deployment of new tools
- Project/product specific customizations
- Predefined/custom experiments

SC2008, Austin, TX Slide 5 Tutorial Goals
Introduce Open|SpeedShop
- Basic concepts & terminology
- Running first examples
Provide Overview of Features
- Sampling & tracing in O|SS
- Performance comparisons
- Parallel performance analysis
Overview of Advanced Techniques
- Interactive performance analysis
- Scripting & Python integration

SC2008, Austin, TX Slide 6 "Rules"
Let's keep this interactive
- Feel free to ask as we go along
- Online demos as we go along
Feel free to play along
- Live CDs with O|SS installed (for PCs)
- Ask us if you get stuck
Feedback on O|SS
- What is good/missing in the tool?
- What should be done differently?
- Please report bugs/incompatibilities

SC2008, Austin, TX Slide 7 Presenters
Martin Schulz, LLNL
Jim Galarowicz, Krell
Don Maghrak, Krell
David Montoya, LANL
Scott Cranford, Sandia
Larger Team:
- William Hachfeld, Krell
- Samuel Gutierrez, LANL
- Joseph Kenny, Sandia
- Chris Chambreau, LLNL

SC2008, Austin, TX Slide 8 Outline  Introduction & Overview  Running a First Experiment  O|SS Sampling  Simple Comparisons  Break (30 minutes)  I/O Tracing Experiments  Parallel Performance Analysis  Installation Requirements and Process  Advanced Capabilities

9 Section 1 Overview & Terminology Parallel Performance Analysis with Open|SpeedShop

SC2008, Austin, TX Slide 10 Experiment Workflow
[Workflow diagram] Run the application under an "Experiment", which consists of one or more data "Collectors". Results are stored in an SQL database and can be displayed using several "Views"; running processes are controlled through the Process Management Panel.
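The workflow above can be sketched with a toy collector and view. This is illustration only: the table layout and names here are invented and are not the actual Open|SpeedShop database schema.

```python
import sqlite3

# Toy sketch of the experiment workflow: a "collector" writes samples
# into an SQL database, and a "view" is just an aggregating query.
# The schema below is invented for illustration, not the O|SS schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE samples (function TEXT, time_ms REAL)")

# A collector would append one row per recorded sample.
for func, t in [("solve", 8.0), ("solve", 7.5), ("io_dump", 2.0)]:
    db.execute("INSERT INTO samples VALUES (?, ?)", (func, t))

# A default "view": total time per function, sorted descending.
view = db.execute(
    "SELECT function, SUM(time_ms) FROM samples "
    "GROUP BY function ORDER BY SUM(time_ms) DESC").fetchall()
print(view)  # [('solve', 15.5), ('io_dump', 2.0)]
```

Because the data lives in a database file, any view can be recomputed later without rerunning the application, which is exactly the point of the stored experiment results.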

SC2008, Austin, TX Slide 11 High-level Architecture
[Architecture diagram] GUI, Python (pyO|SS), and CLI interfaces sit on top of the open source framework, with code instrumentation and experiments below, targeting AMD- and Intel-based clusters/SSI systems running Linux.

SC2008, Austin, TX Slide 12 Performance Experiments
Concept of an Experiment
- What to measure and analyze?
- Experiment chosen by user
- Any experiment can be applied to any code
Consists of Collectors and Views
- Collectors define specific data sources (hardware counters, tracing of certain routines)
- Views specify data aggregation and presentation
- Multiple collectors per experiment possible

SC2008, Austin, TX Slide 13 Experiment Types in O|SS
Sampling Experiments
- Periodically interrupt run and record location
- Report statistical distribution of these locations
- Typically provides good overview
- Overhead mostly low and uniform
Tracing Experiments
- Gather and store individual application events, e.g., function invocations (MPI, I/O, ...)
- Provides detailed, low-level information
- Higher overhead, potentially bursty
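The sampling/tracing distinction can be made concrete with a toy sketch (the names and data here are made up; this is not how the tool is implemented internally). Tracing keeps one record per event; sampling only looks at the current location periodically, so it keeps a statistical summary.

```python
import time

# Tracing: record every event with timestamps -> detailed, more data.
trace_log = []
def traced_call(name):
    start = time.perf_counter()
    # ... the real routine body would run here ...
    trace_log.append((name, start, time.perf_counter()))

for name in ["read", "write", "read"]:
    traced_call(name)

# Sampling: only inspect the "current location" at intervals.
locations = ["compute", "compute", "compute", "io", "compute"]
sample_counts = {}
for loc in locations[::2]:       # pretend we interrupt every 2nd step
    sample_counts[loc] = sample_counts.get(loc, 0) + 1

print(len(trace_log))   # one record per event: 3
print(sample_counts)    # a statistical summary, not every event
```

Note how the sampler here misses the "io" step entirely: short or rare events can fall between samples, which is why tracing is the right tool for detailed event analysis despite its higher overhead.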

SC2008, Austin, TX Slide 14 Sampling Experiments
PC Sampling (pcsamp)
- Record PC in user-defined time intervals
- Low overhead overview of time distribution
User Time (usertime)
- PC sampling + call stacks for each sample
- Provides inclusive & exclusive timing data
Hardware Counters (hwc, hwctime)
- Sample HWC overflow events
- Access to data like cache and TLB misses

SC2008, Austin, TX Slide 15 Tracing Experiments
I/O Tracing (io, iot)
- Record invocation of all POSIX I/O events
- Provides aggregate and individual timings
MPI Tracing (mpi, mpit, mpiotf)
- Record invocation of all MPI routines
- Provides aggregate and individual timings
Floating Point Exception Tracing (fpe)
- Triggered by any FPE caused by the code
- Helps pinpoint numerical problem areas

SC2008, Austin, TX Slide 16 Parallel Experiments
O|SS supports MPI and threaded codes
- Tested with a variety of MPI implementations
- Thread support based on POSIX threads
- OpenMP only supported through POSIX threads
Any experiment can be parallel
- Automatically applied to all tasks/threads
- Default views aggregate across all tasks/threads
- Data from individual tasks/threads available
Specific parallel experiments (e.g., MPI)

SC2008, Austin, TX Slide 17 High-level Architecture
[Architecture diagram, repeated] GUI, Python (pyO|SS), and CLI interfaces on top of the open source framework and its code instrumentation layer, on AMD- and Intel-based Linux clusters/SSI systems.

SC2008, Austin, TX Slide 18 Code Instrumentation in O|SS
Dynamic Instrumentation through DPCL
- Data delivered directly to tool online
- Ability to attach to running application

SC2008, Austin, TX Slide 19 Initial Solution: DPCL
[Diagram: MPI application instrumented via DPCL, reporting directly to O|SS]
Issues: communication bottleneck, OS limitations

SC2008, Austin, TX Slide 20 Code Instrumentation in O|SS
Dynamic Instrumentation through DPCL
- Data delivered directly to tool online
- Ability to attach to running application
Offline/External Data Collection
- Instrument application at startup
- Write data to raw files and convert to O|SS

SC2008, Austin, TX Slide 21 Offline Data Collection
[Diagram: MPI application writing raw data for post-mortem conversion into O|SS (offline), shown alongside the original online DPCL path]

SC2008, Austin, TX Slide 22 Code Instrumentation in O|SS
Dynamic Instrumentation through DPCL
- Data delivered directly to tool online
- Ability to attach to running application
Offline/External Data Collection
- Instrument application at startup
- Write data to raw files and convert to O|SS
Scalable Data Collection with MRNet
- Similar to DPCL, but scalable transport layer
- Ability for interactive online analysis

SC2008, Austin, TX Slide 23 Hierarchical Online Collection
[Diagram: three collection paths into O|SS: offline post-mortem conversion, direct DPCL, and the MRNet tree-based transport]

SC2008, Austin, TX Slide 24 Code Instrumentation in O|SS
Dynamic Instrumentation through DPCL
- Data delivered directly to tool online
- Ability to attach to running application
Offline/External Data Collection
- Instrument application at startup
- Write data to raw files and convert to O|SS
Scalable Data Collection with MRNet
- Similar to DPCL, but scalable transport layer
- Ability for interactive online analysis

SC2008, Austin, TX Slide 25 High-level Architecture
[Architecture diagram, repeated] GUI, Python (pyO|SS), and CLI interfaces on top of the open source framework and its code instrumentation layer, on AMD- and Intel-based Linux clusters/SSI systems.

SC2008, Austin, TX Slide 26 Three Interfaces
Experiment Commands: expAttach, expCreate, expDetach, expGo, expView
List Commands: listExp, listHosts, listStatus
Session Commands: setBreak, openGui
Python scripting example:
import openss as oss
my_filename = oss.FileList("myprog.a.out")
my_exptype = oss.ExpTypeList("pcsamp")
my_id = oss.expCreate(my_filename, my_exptype)
oss.expGo()
my_metric_list = oss.MetricList("exclusive")
my_viewtype = oss.ViewTypeList("pcsamp")
result = oss.expView(my_id, my_viewtype, my_metric_list)

SC2008, Austin, TX Slide 27 Summary
Open|SpeedShop provides comprehensive performance analysis options
Important terminology
- Experiments: types of performance analysis
- Collectors: data sources
- Views: data presentation and aggregation
Sampling vs. Tracing
- Sampling: overview data at low overhead
- Tracing: details, but at higher cost

28 Section 2 Running Your First Experiment Parallel Performance Analysis with Open|SpeedShop

SC2008, Austin, TX Slide 29 Running Your First Experiment
What do we mean by an experiment?
Running a very basic experiment
- What does the command syntax look like?
- What are the outputs from the experiment?
Viewing and interpreting gathered measurements
Introduce additional command syntax

SC2008, Austin, TX Slide 30 What is an Experiment?
The concept of an experiment
- Identify the application/executable to profile
- Identify the type of performance data that is to be gathered
- Together they form what we call an experiment
Application/executable
- Doesn't need to be recompiled, but needs a -g type option to associate gathered data with functions and/or statements
Type of performance data (metric)
- Sampling based (program counter, call stack, # of events)
- Tracing based (wrap functions and record information)

SC2008, Austin, TX Slide 31 Basic experiment syntax
openss -offline -f "executable" pcsamp
- openss is the command to invoke Open|SpeedShop
- -offline indicates the user interface to use (immediate command); there are a number of user interface options
- -f is the option for specifying the executable name; the "executable" can be a sequential or parallel command
- pcsamp indicates what type of performance data (metric) you will gather; there are several existing performance metric choices
Here pcsamp indicates that we will periodically take a sample of the address that the program counter is pointing to, and associate that address with a function and/or source line.

SC2008, Austin, TX Slide 32 What are the outputs?
Outputs from: openss -offline -f "executable" pcsamp
- Normal program output while the executable is running
- The sorted list of performance information: the top time-taking functions and the corresponding sample-derived time for each function
- A performance information database file
The database file contains all the information needed to view the data at any time in the future without the executable(s):
- Symbol table information from executable(s) and system libraries
- Performance data openss gathered
- Time stamps for when dso(s) were loaded and unloaded

SC2008, Austin, TX Slide 33 Example Run with Output
openss -offline -f "orterun -np 128 sweep3d.mpi" pcsamp

SC2008, Austin, TX Slide 34 Example Run with Output (cont.)
openss -offline -f "orterun -np 128 sweep3d.mpi" pcsamp

SC2008, Austin, TX Slide 35 Using the Database File
The database file is one of the outputs from running: openss -offline -f "executable" pcsamp
Use this file to view the data. How to open the database file with openss:
- openss -f <database file>
- openss (then use menus or wizard to open)
- openss -cli, then: exprestore -f <database file>
In this example, we show both:
- openss -f X.0.openss (GUI)
- openss -cli -f X.0.openss (CLI)
X.0.openss is the file name openss creates by default.

SC2008, Austin, TX Slide 36 Output from Example Run
Loading the database file: openss -cli -f X.0.openss

SC2008, Austin, TX Slide 37 Process Management Panel
openss -f X.0.openss: control your job, focus the stats panel, create process subsets

SC2008, Austin, TX Slide 38 GUI view of gathered data

SC2008, Austin, TX Slide 39 Associate Source & Data

SC2008, Austin, TX Slide 40 Additional experiment syntax
openss -offline -f "executable" pcsamp
- -offline indicates the user interface is immediate command mode; uses the offline (LD_PRELOAD) collection mechanism
openss -cli -f "executable" pcsamp
- -cli indicates the user interface is interactive command line; uses the online (dynamic instrumentation) collection mechanism
openss -f "executable" pcsamp
- No option indicates the graphical user interface; uses the online (dynamic instrumentation) collection mechanism
openss -batch < input.commands.file
- Executes from a file of CLI commands

SC2008, Austin, TX Slide 41 Demo of Basic Experiment
Run Program Counter Sampling (pcsamp) on a sequential application.

42 Section 3 Sampling Experiments Parallel Performance Analysis with Open|SpeedShop

SC2008, Austin, TX Slide 43 Sampling Experiments
PC Sampling
- Approximates CPU time for line and function
- No call stacks
User Time
- Inclusive vs. exclusive time
- Includes call stacks
HW Counters
- Samples hardware counter overflows
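How inclusive and exclusive time fall out of call-stack samples (as in the usertime experiment) can be shown with a toy calculation. The sample data and sampling period below are made up for illustration.

```python
# Each sample is a call stack from the outermost caller to the
# innermost frame where the program counter was (made-up data).
samples = [
    ["main", "solve", "kernel"],
    ["main", "solve", "kernel"],
    ["main", "solve"],
    ["main", "io_dump"],
]
period_ms = 10.0  # assumed sampling interval

inclusive, exclusive = {}, {}
for stack in samples:
    for frame in set(stack):            # anywhere on the stack -> inclusive
        inclusive[frame] = inclusive.get(frame, 0) + period_ms
    leaf = stack[-1]                    # innermost frame only -> exclusive
    exclusive[leaf] = exclusive.get(leaf, 0) + period_ms

print(inclusive["solve"], exclusive["solve"])       # 30.0 10.0
print(inclusive["main"], exclusive.get("main", 0))  # 40.0 0
```

main has large inclusive time but no exclusive time: all samples landed in its callees. That is exactly the distinction the usertime views report.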

SC2008, Austin, TX Slide 44 Sampling - Considerations
Sampling: statistical subset of all events
- Low overhead
- Low perturbation
- Good to get an overview / find hotspots

SC2008, Austin, TX Slide 45 Example 1
Offline usertime experiment - smg2000
test]$ openss -cli
Welcome to OpenSpeedShop 1.6
openss>>RunOfflineExp("mpirun -np 16 smg2000 -n ","usertime")
(the first argument is the application launch command, the second is the experiment type)

SC2008, Austin, TX Slide 46 Example 1 - Views
Default View
- Values aggregated across all ranks
- Manually include/exclude individual processes

SC2008, Austin, TX Slide 47 Example 1 - Views Cont.
Load Balance View
- Calculates min, max, average across ranks, processes, or threads
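The load-balance reduction is easy to sketch: per-function times reported by each rank are reduced to min/max/average across ranks. The function names and numbers below are invented for illustration.

```python
from statistics import mean

# rank -> {function: exclusive time in seconds} (made-up numbers)
per_rank = {
    0: {"hypre_SMGResidual": 12.0, "MPI_Waitall": 1.0},
    1: {"hypre_SMGResidual": 9.0,  "MPI_Waitall": 4.0},
    2: {"hypre_SMGResidual": 10.0, "MPI_Waitall": 3.0},
}

# Reduce each function's per-rank values to (min, max, average).
balance = {}
for func in {f for times in per_rank.values() for f in times}:
    vals = [times[func] for times in per_rank.values()]
    balance[func] = (min(vals), max(vals), round(mean(vals), 2))

print(balance["MPI_Waitall"])  # (1.0, 4.0, 2.67)
```

A large spread between min and max for a function such as MPI_Waitall is the usual signature of load imbalance, which is what this view is designed to expose.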

SC2008, Austin, TX Slide 48 Example 1 - Views Cont.
Butterfly View
- Callers of hypre_SMGResidual
- Callees of hypre_SMGResidual

SC2008, Austin, TX Slide 49 Example 1 Cont. Source Code Mapping

SC2008, Austin, TX Slide 50 Example 2 - hwc
Offline hwc experiment - gamess
openss -offline -f "./rungms tools/fmo/samples/PIEDA 01 1" hwc
-OR-
openss -cli
Welcome to OpenSpeedShop 1.6
openss>>RunOfflineExp("./rungms tools/fmo/samples/PIEDA 01 1","hwc")

SC2008, Austin, TX Slide 51 Example 2 - hwc Cont.
Offline hwc experiment - Default View

52 Section 4 Simple Comparisons Parallel Performance Analysis with Open|SpeedShop

SC2008, Austin, TX Slide 53 Simple comparisons
Views provide basic "default" views
- Sorted by statements/functions/objects
- Sum across all tasks
How to get more insight?
- Compare results from different sources
- Combine multiple metrics
O|SS allows users to customize views
- Independent of experiment or metric
- Predefined "canned" views
- User defined views
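Conceptually, a comparison view just merges two experiments' per-function results into side-by-side columns. A minimal sketch, with invented function names and numbers (not output from the tool):

```python
# Two experiments' per-function results (made-up numbers), e.g. two
# runs of the same code before and after a modification.
exp1 = {"kernel": 20.0, "io_dump": 5.0}
exp2 = {"kernel": 18.5, "io_dump": 5.2, "init": 0.4}

# Merge into side-by-side columns; None marks a function that only
# appears in one of the two experiments.
rows = []
for func in sorted(set(exp1) | set(exp2)):
    rows.append((func, exp1.get(func), exp2.get(func)))

for func, a, b in rows:
    print(f"{func:10} {a!s:>8} {b!s:>8}")
```

The union over function names matters: a function present in only one run (here the hypothetical init) still gets a row, which is how a comparison view reveals code paths that appear or disappear between versions.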

SC2008, Austin, TX Slide 54 Compare Wizard
Compare entire experiments against each other
- Select compare experiments from the Intro Wizard
- Select experiments from the dialog page
- Follow the wizard instructions
- Creates a side-by-side compare panel
Example shown in the next slides
- Create & run executable from the "orig" directory
- Copy source to the "modified" directory & change it
- Run the new "modified" version & compare results

SC2008, Austin, TX Slide 55 Compare Wizard: Select Compare Saved Performance Data

SC2008, Austin, TX Slide 56 Compare Wizard: Choose the experiments you want to compare

SC2008, Austin, TX Slide 57 Compare Wizard: Side by side performance results

SC2008, Austin, TX Slide 58 Compare Wizard: Go to the source for the selected function

SC2008, Austin, TX Slide 59 Compare Wizard: Side by side source for the two versions

SC2008, Austin, TX Slide 60 Customizing Views
Pick your own views
- Compare entire experiments against each other
- Combine any data from any active experiment
- Select metrics provided by any collector
- Restrict data to arbitrary node sets
Configurable through menu option "Customize Stats Panel"
- Activate from the stats panel context menu or select the "CC" icon in the toolbar
- Allows definition of content for each column

SC2008, Austin, TX Slide 61 Terminology: Views / Columns
[Screenshot annotations: View, Column]

SC2008, Austin, TX Slide 62 Designing a New View
Decide on what to compare
- Load all necessary experiments
- Create all columns using the context menu
Select data for each column
- Pick experiment/collector/metric
- Select process and thread set (drag and drop from the Process Management Panel, or use the Process Add/Remove dialog box)
"Focus StatsPanel" to activate the view
- From the context menu of the Customize View StatsPanel

SC2008, Austin, TX Slide 63 Customize Comparison Panel Activate from context menu or select icon

SC2008, Austin, TX Slide 64 Customize Comparison Panel
Use the "Load another experiment" menu item to load additional experiments.

SC2008, Austin, TX Slide 65 Customize Comparison Panel Choose experiment to load.

SC2008, Austin, TX Slide 66 Customize Comparison Panel
Add a compare column via the "Add compare column" menu item.

SC2008, Austin, TX Slide 67 Customize Comparison Panel
Choose the experiment to compare in the new column; "Available Experiments" has two choices.

SC2008, Austin, TX Slide 68 Customize Comparison Panel
Choose "Focus StatsPanel" to create a comparison stats panel.

SC2008, Austin, TX Slide 69 Comparing Experiments
View after choosing "Focus StatsPanel": data from the "pcsamp" experiment (ID 1) alongside data from the "hwc" experiment (ID 2).

SC2008, Austin, TX Slide 70 Summary
Simple comparison of two different metrics on the same executable
Views can be customized
- Enable detailed analysis
Customized views to contrast:
- Results from multiple experiments
- Multiple metrics from several collectors

71 Section 5 I/O Tracing Experiments Parallel Performance Analysis with Open|SpeedShop

SC2008, Austin, TX Slide 72 I/O Tracing - Overview
What does IO tracing record?
- I/O events
- Intercepts all calls to I/O functions
- Records the current stack trace & start/end time
What about IOT tracing?
- Collects additional information: function parameters and function return value
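The interception idea can be sketched with a small wrapper. This is a toy stand-in only: the real tool interposes on the library calls themselves, whereas here a Python decorator wraps a pretend read() to record timing, arguments, and return value, like the iot experiment does.

```python
import functools
import time

io_events = []  # one record per intercepted I/O call

def io_traced(func):
    # Wrap an "I/O function" so every call records start/end time,
    # plus (like iot) argument and return-value details.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        io_events.append({
            "function": func.__name__,
            "args": args,
            "retval": result,
            "duration_s": time.perf_counter() - start,
        })
        return result
    return wrapper

@io_traced
def read(nbytes):          # hypothetical stand-in for a POSIX read()
    return b"x" * nbytes

read(4)
read(2)
print([(e["function"], e["retval"]) for e in io_events])
```

Because every single call is recorded, the event list grows with the number of I/O operations, which is exactly why the next slides warn about the data volume and overhead of trace experiments.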

SC2008, Austin, TX Slide 73 I/O Tracing - Overview Cont.
What I/O events are recorded?
- read, readv, pread, pread64
- write, writev, pwrite, pwrite64
- open, open64
- close
- pipe, dup, dup2
- creat, creat64
- lseek, lseek64

SC2008, Austin, TX Slide 74 I/O Tracing - Considerations
Trace experiments
- Collect large amounts of data
- More overhead (compared to sampling)
- More perturbation (compared to sampling)
- Allow for fine-grained analysis

SC2008, Austin, TX Slide 75 Example 1
Offline IOT experiment - gamess
openss -offline -f "./ddikick.x gamess.01.x tools/efp/lysine -ddi 1 1 lanz -scr /scr/scranfo" iot
-OR-
openss -cli
Welcome to OpenSpeedShop 1.6
openss>>RunOfflineExp("./ddikick.x gamess.01.x tools/efp/lysine -ddi 1 1 lanz -scr /scr/scranfo","iot")

SC2008, Austin, TX Slide 76 Example 1 Cont. View Experiment Status – CLI: expstatus

SC2008, Austin, TX Slide 77 Example 1 Cont.
View experiment output – CLI: expview (CLI view help: help expview)
View experiment output – GUI: opengui

SC2008, Austin, TX Slide 78 Example 1 Cont.
expview: show top N I/O calls: expview -m count iotN
openss>>expview -m count iot6
Number of Calls  Function (defining location)
6994  __libc_write (/lib/libpthread-2.4.so)
548  llseek (/lib/libpthread-2.4.so)
384  __libc_read (/lib/libpthread-2.4.so)
8  __libc_close (/lib/libpthread-2.4.so)
5  __libc_open64 (/lib/libpthread-2.4.so)
5  __dup (/lib/libc-2.4.so)

SC2008, Austin, TX Slide 79 Example 1 Cont.
expview: show time spent in a particular function: expview -f <function>
openss>>expview -f __libc_read
Exclusive I/O Call Time(ms)  % of Total  Function (defining location)
__libc_read (/lib/libpthread.so)

SC2008, Austin, TX Slide 80 Example 1 Cont.
expview: show call count for a particular function: expview -f <function> -m count
openss>>expview -f __libc_read -m count
Number of Calls  Function (defining location)
384  __libc_read (/lib/libpthread-2.4.so)

SC2008, Austin, TX Slide 81 Example 1 Cont.
expview: show return value & start time for top N calls: expview -v trace -f <function> -m retval -m start_time iotN
openss>>expview -v trace -f __libc_read -m retval -m start_time iot5
Return Value  Start Time  Call Stack Function (defining location)
/07/08 15:36:37  >>__libc_read (/lib/libpthread-2.4.so)
/07/08 15:36:38  >>__libc_read (/lib/libpthread-2.4.so)
/07/08 15:36:37  >>__libc_read (/lib/libpthread-2.4.so)
/07/08 15:36:38  >>__libc_read (/lib/libpthread-2.4.so)

SC2008, Austin, TX Slide 82 Example 1 Cont.
expview: show top N most time-consuming call trees: expview -v calltrees,fullstack iotN
openss>>expview -v calltrees,fullstack -m count,time iot1
Number of Calls  Exclusive I/O Call Time(ms)  Call Stack Function (defining location)
_start (gamess.01.x)
…
>>>>main (gamess.01.x)
545 in MAIN__ (gamess.01.x: gamess.f,352)
741 in brnchx_ (gamess.01.x: gamess.f,695)
989 in energx_ (gamess.01.x: gamess.f,777)
1742 in wfn_ (gamess.01.x: gamess.f,1637)
…
>>>>>>>>>>>>>>>find_or_create_unit (libgfortran.so.1.0.0)
>>>>>>>>>>>>>>>>find_or_create_unit (libgfortran.so.1.0.0)
>>>>>>>>>>>>>>>>>__libc_write (libpthread-2.4.so)

SC2008, Austin, TX Slide 83 Example 1 Cont. - opengui

SC2008, Austin, TX Slide 84 Example 2 – Parallel App.
Let's look at a parallel example: Offline IO Experiment - smg2000
smg2000]$ openss -cli
Welcome to OpenSpeedShop 1.6
openss>>RunOfflineExp("mpirun -np 16 smg2000 -n ","io")
openss>>opengui

SC2008, Austin, TX Slide 85 Example 2 – Parallel App.

SC2008, Austin, TX Slide 86 Summary
I/O collectors intercept all calls to I/O functions, record the current stack trace & start/end time, and can collect detailed ancillary data (IOT).
Trace experiments collect large amounts of data but allow for fine-grained analysis.

87 Section 6 Parallel Performance Analysis Parallel Performance Analysis with Open|SpeedShop

SC2008, Austin, TX Slide 88 Experiment Types
Open|SpeedShop is designed to work on parallel jobs. Focus here: parallelism using MPI.
Sequential experiments: apply experiment/collectors to all nodes; by default display aggregate results; optionally select individual groups of processes.
MPI experiments: tracing of MPI calls; can be combined with sequential collectors.

SC2008, Austin, TX Slide 89 Configuration
Multiple MPI versions possible (Open|SpeedShop MPI tracing and VampirTrace).
OPENSS_MPI_LAM: set to LAM/MPI installation dir
OPENSS_MPI_OPENMPI: set to Open MPI installation dir
OPENSS_MPI_MPICH: set to MPICH installation dir
OPENSS_MPI_MPICH2: set to MPICH2 installation dir
OPENSS_MPI_MPICH_DRIVER: mpich/mpich2 driver name [ch-p4]
OPENSS_MPI_MPT: set to SGI MPT installation dir
OPENSS_MPI_MVAPICH: set to MVAPICH installation dir
Specify during installation (using the install.sh script).

SC2008, Austin, TX Slide 90 MPI Job Start & Attach (offline)
MPI process control: link the O|SS collector into the MPI application.
Examples:
openss -offline -f "mpirun -np 4 sweep3d.mpi" pcsamp
openss -offline -f "srun -N 4 -n 16 sweep3d.mpi" pcsamp
openss -offline -f "orterun -np 16 sweep3d.mpi" usertime

SC2008, Austin, TX Slide 91 Parallel Result Analysis
Default views: values aggregated across all ranks; manually include/exclude individual processes.
Rank comparisons: use the Customize Stats Panel view; create columns for process groups.
Cluster analysis: automatically create process groups of similar processes; available from the Stats Panel context menu.
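Rank-restricted views are also reachable from the CLI. The following is a sketch only: the -r rank modifier and the colon range syntax are assumptions about this Open|SpeedShop version's expview command, not verified against the release used in this tutorial.

```shell
# Inside the openss CLI (started with: openss -cli).
# -r and the rank-range syntax below are assumed, not verified.
expview -r 0:7 pcsamp      # view aggregated over ranks 0-7 only
expview -r 8:15 pcsamp     # view aggregated over ranks 8-15 for comparison
```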

SC2008, Austin, TX Slide 92 Viewing Results by Process Choice of ranks

SC2008, Austin, TX Slide 93 MPI Tracing
Similar to I/O tracing: record all MPI call invocations. By default, record call times (mpi); optionally record all arguments (mpit).
Equal events are aggregated to save space in the database and reduce overhead.
Future plans: full MPI traces in a public format.
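Following the offline pattern from the I/O examples earlier, an MPI tracing run might look like the sketch below. The smg2000 binary and mpirun line are reused from Example 2; the problem-size arguments "-n 35 35 35" are illustrative, and the assumption is that the mpit variant takes the same command form as mpi.

```shell
# Record call times only (mpi experiment)
openss -offline -f "mpirun -np 16 smg2000 -n 35 35 35" mpi

# Record all arguments as well (mpit experiment) - larger database
openss -offline -f "mpirun -np 16 smg2000 -n 35 35 35" mpit

# Inspect the resulting database in the CLI
openss -cli
```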

SC2008, Austin, TX Slide 94 Tracing Results: Default View

SC2008, Austin, TX Slide 95 Tracing Results: Event View

SC2008, Austin, TX Slide 96 Tracing Results: Creating Event View

SC2008, Austin, TX Slide 97 Tracing Results: Specialized Event View

SC2008, Austin, TX Slide 98 Results / Show: Callstacks

SC2008, Austin, TX Slide 99 Predefined Analysis Views
O|SS provides common analysis functions designed for quick analysis of MPI applications. They create new views in the StatsPanel and are accessible through the context menu or toolbar.
Load Balance view: calculates min, max, and average across ranks, processes, or threads.
Comparative Analysis view: uses a cluster-analysis algorithm to group like-performing ranks, processes, or threads.

SC2008, Austin, TX Slide 100 Quick Min, Max, Average View Select “LB” in Toolbar

SC2008, Austin, TX Slide 101 Comparative Analysis: Clustering Ranks Select “CA” in Toolbar

SC2008, Austin, TX Slide 102 Comparing Ranks (1)
Use the Customize StatsPanel: create a column for each process set to compare, then select the set of ranks for each column.

SC2008, Austin, TX Slide 103 Comparing Ranks (2) Rank 0 Rank 1

SC2008, Austin, TX Slide 104 Summary
Open|SpeedShop manages MPI jobs: works with multiple MPI implementations; process control via the MPIR interface (dynamic).
Parallel experiments: apply sequential collectors to all nodes; specialized MPI tracing experiments.
Results: by default aggregated across all ranks; optionally select individual processes; compare or group ranks & specialized views.

105 Section 7 System Requirements & Installation Parallel Performance Analysis with Open|SpeedShop

SC2008, Austin, TX Slide 106 System Requirements
System architecture: AMD Opteron/Athlon; Intel x86, x86-64, and Itanium-2.
Operating system: tested on many popular Linux distributions (SLES, RHEL, Fedora Core, etc.).

SC2008, Austin, TX Slide 107 System Software Prerequisites
libelf (devel): system-specific version highly recommended
Python (devel)
Qt3 (devel)
Typical Linux development environment: GNU Autotools, GNU libtool, etc.

SC2008, Austin, TX Slide 108 Getting The Source
SourceForge project home. CVS access: accessible from the project home Download tab. Additional information on the project website.

SC2008, Austin, TX Slide 109 Preparing The Build Environment
Important environment variables: OPENSS_PREFIX, OPENSS_INSTRUMENTOR, OPENSS_MPI_, QTDIR.
Simple bash example:
export OPENSS_PREFIX=/home/skg/local
export OPENSS_INSTRUMENTOR=mrnet
export OPENSS_MPI_OPENMPI=/opt/openmpi
export QTDIR=/usr/lib64/qt-3.3

SC2008, Austin, TX Slide 110 Building OpenSpeedShop
Preparing the directory structure for the install script (needed if OSS & OSS_ROOT were pulled from CVS):
Rename OpenSpeedShop to openspeedshop-1.6
tar -czf openspeedshop-1.6.tar.gz openspeedshop-1.6
Note: in general, openspeedshop-<version>
Move openspeedshop-1.6 to OSS_ROOT/SOURCES
Navigate to OpenSpeedShop_ROOT. We now have two options: use install.sh to build prerequisites/OSS interactively, or use option 9 for an automatic build:
./install.sh [--with-option] [OPTION]
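The preparation steps above, written out as commands. The directory names OSS_ROOT and OpenSpeedShop_ROOT follow the slide's naming; the install.sh invocation at the end is interactive, so this is a sketch of the sequence rather than an unattended build script.

```shell
# Prepare a CVS checkout for the install script
mv OpenSpeedShop openspeedshop-1.6            # in general: openspeedshop-<version>
tar -czf openspeedshop-1.6.tar.gz openspeedshop-1.6
mv openspeedshop-1.6.tar.gz OSS_ROOT/SOURCES/

# Build prerequisites and OSS interactively (option 9 = automatic build)
cd OpenSpeedShop_ROOT
./install.sh
```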

SC2008, Austin, TX Slide 111 Installation Script Overview

SC2008, Austin, TX Slide 112 Basic Installation Structure

SC2008, Austin, TX Slide 113 Post-Installation Setup
Important environment variables (all instrumentors):
OPENSS_PREFIX: path to the installation directory
OPENSS_PLUGIN_PATH: path to the directory where plugins are stored
OPENSS_MPI_IMPLEMENTATION: if multiple MPI implementations are installed, points openss at the one your application uses
QTDIR, LD_LIBRARY_PATH, PATH: Qt installation directory and Linux path variables

SC2008, Austin, TX Slide 114 Post-Installation Setup Cont.
Simple bash example:
export OPENSS_PREFIX=/home/skg/local
export OPENSS_MPI_IMPLEMENTATION=openmpi
export OPENSS_PLUGIN_PATH=$OPENSS_PREFIX/lib64/openspeedshop
export LD_LIBRARY_PATH=$OPENSS_PREFIX/lib64:$LD_LIBRARY_PATH
export QTDIR=/usr/lib64/qt-3.3
export PATH=$OPENSS_PREFIX/bin:$PATH

SC2008, Austin, TX Slide 115 Post-Installation Setup (MRNet)
Additional MRNet-specific environment variables: OPENSS_MRNET_TOPOLOGY_FILE, DYNINSTAPI_RT_LIB, MRNET_RSH, XPLAT_RSHCOMMAND, OPENSS_RAWDATA_DIR.
Simple bash example:
export OPENSS_MRNET_TOPOLOGY_FILE=/home/skg/oss.top
export DYNINSTAPI_RT_LIB=$OPENSS_PREFIX/lib64/libdyninstAPI_RT.so.1
export MRNET_RSH=ssh
export XPLAT_RSHCOMMAND=ssh
export OPENSS_RAWDATA_DIR=/panfs/scratch2/vol3/skg/rawdata

SC2008, Austin, TX Slide 116 Post-Installation Setup (Offline)
Additional offline-specific environment variable: OPENSS_RAWDATA_DIR.
OPENSS_RAWDATA_DIR must be defined within the user's environment before OpenSpeedShop is invoked. If it is not defined in the current terminal session, all raw performance data files are placed in /tmp/$USER/offline-oss; that is, OPENSS_RAWDATA_DIR is automatically initialized to /tmp/$USER/offline-oss for the duration of the OpenSpeedShop session.
NOTE: in general, it is best to select a target directory located on scratch space.
Simple bash example:
export OPENSS_RAWDATA_DIR=/panfs/scratch2/vol3/skg/rawdata
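Putting the offline setup together, one terminal session might look like the following. All paths are the example values from these slides, and the final openss line reuses the sweep3d.mpi example from Section 6.

```shell
# Environment for an offline run (example values from the slides)
export OPENSS_PREFIX=/home/skg/local
export PATH=$OPENSS_PREFIX/bin:$PATH
export LD_LIBRARY_PATH=$OPENSS_PREFIX/lib64:$LD_LIBRARY_PATH
export OPENSS_RAWDATA_DIR=/panfs/scratch2/vol3/skg/rawdata   # scratch space

# Run an offline experiment; raw data lands in $OPENSS_RAWDATA_DIR
openss -offline -f "mpirun -np 4 sweep3d.mpi" pcsamp
```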

117 Section 8 Advanced Capabilities Parallel Performance Analysis with Open|SpeedShop

SC2008, Austin, TX Slide 118 Advanced Features Overview
Dynamic instrumentation and online analysis: pros & cons, or what to use when; interactive performance analysis wizards; advanced GUI features.
Scripting Open|SpeedShop: using the command line interface; integrating O|SS into Python.
Plugin concept to extend Open|SpeedShop; future plugin plans.

SC2008, Austin, TX Slide 119 Interactive Analysis
Dynamic instrumentation: works on binaries; add/change instrumentation at runtime.
Hierarchical communication: efficient broadcast of commands; online data reduction.
Interactive control: available through GUI and CLI; start/stop/adjust data collection.
MPI Application  O|SS  MRNet

SC2008, Austin, TX Slide 120 Pros & Cons
Advantage - new capabilities: attach to/detach from a running job; intermediate/partial results.
Advantage - easier use: control directly from one environment.
Disadvantage - longer wait times: less information available at database creation time; binary parsing is expensive.
Disadvantage - stability: techniques are low-level and OS-specific.

SC2008, Austin, TX Slide 121 Using Dynamic Instrumentation
MRNet instrumentor: can be installed in parallel with offline capabilities; site-specific daemon launch instructions (site.py).
Launch control from within O|SS: select the binary at experiment creation.
Attach option from within O|SS: select the PID of the target process at experiment creation.
Alternative option - wizards: the GUI offers wizards for all default experiments; an easy-to-use walkthrough of all options.

SC2008, Austin, TX Slide 122 Wizard Panels (1) Gather data from new runs Analyze and/or compare existing data from previous runs O|SS Command Line Interface

SC2008, Austin, TX Slide 123 Wizard Panels (2) Select type of data to be gathered by Open|SpeedShop

SC2008, Austin, TX Slide 124 Using the Wizards
Wizards guide through the standard steps: access to all available experiments; options in separate screens; attaching to and starting applications is possible.
Example - pcsamp wizard:
Select "… where my program spends time …"
Select the sampling rate (100 is a good default)
Select the program to run or attach to
Ensure all online daemons are started (site.py)
Review and select Finish

SC2008, Austin, TX Slide 125 Dynamic MPI Job Start & Attach
MPI process control: starting depends on the MPI launcher; uses the TotalView MPIR interface for MPI-1.
Examples:
MPICH: run on the actual binary.
SLURM: run on "srun ".
Attach: attach to the process with the MPIR table.
Select "-v mpi" or check "Attach to MPI job".

SC2008, Austin, TX Slide 126 Process Management Execution Control Process Overview Process Details

SC2008, Austin, TX Slide 127 Experiment Control
Process management panel: access to all tasks/processes/threads; information on status and location.
Execution control: start/pause/resume/stop jobs; select process groups for views.
Collector control: add additional collectors to tasks; create custom experiments.

SC2008, Austin, TX Slide 128 General GUI Features
GUI panel management: peel off and rearrange any panel; color-coded panel groups per experiment.
Context-sensitive menus: right-click at any location for access to different views and to activate additional panels.
Access to the source location of events: double-click on the stats panel to open a source panel with (optional) statistics.

SC2008, Austin, TX Slide 129 Leaving the GUI
Three different options to run without the GUI, with equal functionality; state/results can be transferred.
Interactive command line interface: openss -cli
Batch interface: openss -batch < openss_cmd_file, or openss -batch -f
Python scripting API: python openss_python_script_file.py

SC2008, Austin, TX Slide 130 CLI Language
An interactive command line interface with gdb/dbx-like processing.
Several interactive commands: create experiments; provide process/thread control; view experiment results.
Where possible, commands execute asynchronously.

SC2008, Austin, TX Slide 131 User-Time Example
lnx-jeg.americas.sgi.com-17>openss -cli
openss>>Welcome to OpenSpeedShop 1.9
openss>>expcreate -f test/executables/fred/fred usertime
The new focused experiment identifier is: -x 1
openss>>expgo
Start asynchronous execution of experiment: -x 1
openss>>Experiment 1 has terminated.
Create experiments and load the application; start the application.

SC2008, Austin, TX Slide 132 Showing CLI Results
openss>>expview
Excl CPU time in seconds  Incl CPU time in seconds  % of Total Exclusive CPU Time  Function (defining location)
f3 (fred: f3.c,2)
f2 (fred: f2.c,2)
f1 (fred: f1.c,2)
__libc_start_main (libc.so.6)
_start (fred)
work (fred: work.c,2)
main (fred: fred.c,5)

SC2008, Austin, TX Slide 133 CLI Batch Scripting (1)
Create a batch file with CLI commands (plain text file). Example:
# Create batch file
echo expcreate -f fred pcsamp >> input.script
echo expgo >> input.script
echo expview pcsamp10 >> input.script
# Run OpenSpeedShop
openss -batch < input.script

SC2008, Austin, TX Slide 134 CLI Batch Scripting (2)
Open|SpeedShop batch example results:
The new focused experiment identifier is: -x 1
Start asynchronous execution of experiment: -x 1
Experiment 1 has terminated.
CPU Time  Function (defining location)
f3 (mutatee: mutatee.c,24)
f2 (mutatee: mutatee.c,15)
f1 (mutatee: mutatee.c,6)
work (mutatee: mutatee.c,33)

SC2008, Austin, TX Slide 135 CLI Batch Scripting (3)
Open|SpeedShop batch example: direct
# Run Open|SpeedShop as a single non-interactive command
openss -batch -f fred pcsamp
The new focused experiment identifier is: -x 1
Start asynchronous execution of experiment: -x 1
Experiment 1 has terminated.
CPU Time  Function (defining location)
f3 (mutatee: mutatee.c,24)
f2 (mutatee: mutatee.c,15)
f1 (mutatee: mutatee.c,6)
work (mutatee: mutatee.c,33)

SC2008, Austin, TX Slide 136 CLI Integration in GUI Program Output & Command Line Panel

SC2008, Austin, TX Slide 137 Command Line Panel
Integrated with the output panel: O|SS command line language; optional Python support.
Access to all loaded experiments: same context as the GUI; use the experiment ID listed in the panel name.
"history" command: displays all commands issued by the GUI; cut & paste the output for scripting.

SC2008, Austin, TX Slide 138 History Command

SC2008, Austin, TX Slide 139 CLI Command Overview
Experiment creation: expcreate, expattach
Experiment control: expgo, expwait, expdisable, expenable
Experiment storage: expsave, exprestore
Result presentation: expview, opengui
Misc. commands: help, list, log, record, playback, history, quit

SC2008, Austin, TX Slide 140 Python Scripting
A Python API that executes the "same" interactive/batch Open|SpeedShop commands.
Users can intersperse "normal" Python code with the Open|SpeedShop Python API.
Run Open|SpeedShop experiments via the Open|SpeedShop Python API.

SC2008, Austin, TX Slide 141 Python Example (1)
Necessary steps: import the O|SS Python module; prepare arguments for the target application; set the view and experiment type; create the experiment.
import openss
my_filename = openss.FileList("usability/phaseII/fred")
my_viewtype = openss.ViewTypeList()
my_viewtype += "pcsamp"
exp1 = openss.expCreate(my_filename, my_viewtype)

SC2008, Austin, TX Slide 142 Python Example (2)
After experiment creation: start the target application (asynchronous!); wait for completion; write the results.
try:
    openss.expGo()
    openss.wait()
except openss.error:
    print "expGo(exp1,my_modifier) failed"
openss.dumpView()

SC2008, Austin, TX Slide 143 Python Example Output
Two interfaces to dump data: plain text (similar to the CLI) for viewing; as Python objects for post-processing.
>python example.py
/work/jeg/OpenSpeedShop/usability/phaseII/fred: successfully completed.
Excl. CPU time  % of CPU Time  Function (def. location)
f3 (fred: f3.c,23)
f2 (fred: f2.c,2)
f1 (fred: f1.c,2)

SC2008, Austin, TX Slide 144 Summary
Multiple non-graphical interfaces: interactive command line; batch scripting; Python module.
Equal functionality: similar commands in all interfaces.
Results are transferable: e.g., run in Python and view in the GUI; possibility to switch GUI ↔ CLI.

SC2008, Austin, TX Slide 145 Extensibility
O|SS is more than a performance tool: all functionality in one toolset with one interface; a general infrastructure to create new tools.
Plugins add new functionality: they cover all essential steps of performance analysis and are automatically loaded at O|SS startup.
Three types of plugins: collectors (how to acquire performance data), views (how to aggregate and present data), panels (how to visualize data in the GUI).

SC2008, Austin, TX Slide 146 Plugin Dataflow Wizard Plugin Execution Environment Collector Plugin View Plugin Panel Plugin Instrumentor Data Abstraction CLI SQL Database

SC2008, Austin, TX Slide 147 Plugin Plans
Existing extensions to base experiments: MPI tracing directly to OTF; extended parameter capture for MPI and I/O traces.
New plugins planned: code coverage; memory profiling; extended hardware counter profiling.
Interested in creating your own plugins? Step 1: design the tool workflow and map it to plugins. Step 2: define the data format (XDR encoding). Examples and a plugin creation guide are available.

SC2008, Austin, TX Slide 148 Summary / Advanced Features
Two techniques for instrumentation: online vs. offline, with different strengths for different target scenarios.
Flexible GUI that can be customized.
Several compatible scripting options: command line language; direct batch interface; integration of O|SS into Python.
GUI and scripting are interoperable.
Plugin concept to extend Open|SpeedShop.

149 Conclusions Parallel Performance Analysis with Open|SpeedShop

SC2008, Austin, TX Slide 150 Open|SpeedShop
Goal - one-stop performance analysis: all basic experiments in one tool; extensible using the plugin mechanism; supports parallel programs.
Multiple interfaces: flexible GUI; multiple scripting and batch options.
Rich features: comparing results; tracebacks/callstacks in most experiments; full I/O tracing support.

SC2008, Austin, TX Slide 151 Documentation
Open|SpeedShop User Guide: /opt/OSS/share/doc/packages/OpenSpeedShop/users_guide
Python scripting API: /opt/OSS/share/doc/packages/OpenSpeedShop/pyscripting_doc
Command line interface: /opt/OSS/share/doc/packages/OpenSpeedShop/cli_doc
(where /opt/OSS is the installation directory)

SC2008, Austin, TX Slide 152 Current Status
Open|SpeedShop 1.9 is available: packages and source from sourceforge.net; tested on a variety of platforms.
Keep in mind, though: Open|SpeedShop is still under development and contains low-level, OS-specific components.
Keep us informed: we are happy to help with problems and are interested in getting feedback.

SC2008, Austin, TX Slide 153 Building a Community
More than yet another research tool: a large-scale environment; a flexible framework; a cooperative atmosphere.
Wanted: new users and open source tool developers; tools/software companies.
Goal: a self-sustaining project.

SC2008, Austin, TX Slide 154 Availability and Contact
Open|SpeedShop website.
Download options: package with install script; source for the tool and base libraries.
Feedback: bug tracking available from the website; contact information on the website; feel free to contact the presenters directly.