Slide 1: Parallel Performance Analysis with Open|SpeedShop
NASA Ames Research Center, October 29, 2008

Slide 2: Presenters and Partners
Presenters:
- Jim Galarowicz, Krell
- Don Maghrak, Krell
Larger team:
- Martin Schulz, LLNL
- David Montoya, LANL
- Scott Cranford, Sandia National Laboratories
- William Hachfeld, Krell
- Samuel Gutierrez, LANL
- Joseph Kenny, Sandia National Laboratories
- Chris Chambreau, LLNL
Partner institutions:
- University of Wisconsin
- University of Maryland
- Rice University

Slide 3: Seminar Goals
- Introduce Open|SpeedShop
  - Basic concepts, terminology, modes of operation
  - Running first examples
- Provide an overview of features
  - Sampling and tracing in O|SS
  - Performance comparisons
  - Parallel performance analysis
- Status and roadmap

Slide 4: Highlights
- Open source performance analysis tool framework
  - Most common performance analysis steps in one tool
  - Extensible via plugins for data collection and representation
  - Profiling (sampling) and tracing (wrapping functions)
- Multiple instrumentation options
  - All work on unmodified application binaries
  - Debug information (-g) is needed to map results to source lines, but it can be combined with optimization (-O2, -O3, etc.)
  - Offline data collection: run the program start to end
  - Online data collection, with the ability to attach to running applications and to start and stop data collection
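As a concrete illustration of that build requirement, an application intended for O|SS analysis can keep full optimization while adding debug information. A generic sketch; the compiler invocation and file names are illustrative, not from the slides:

    # Hypothetical build: -g adds the line mapping O|SS needs,
    # while -O3 keeps the optimized code that is actually measured.
    mpicc -g -O3 -o myapp myapp.c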

Slide 5: Highlights
- Flexible and easy to use; user access through:
  - Graphical user interface (GUI)
  - Interactive command line
  - Python scripting API
- Large range of platforms
  - Linux clusters/SSI with x86, IA-64, Opteron, and EM64T CPUs
  - New: more portable offline data collection mechanism
- Availability
  - Full source available on sourceforge.net
  - Release tarballs on sourceforge.net

Slide 6: O|SS Target Audience
- Programmers/code teams
  - Use Open|SpeedShop out of the box
  - Powerful performance analysis
  - Ability to integrate O|SS into projects
- Tool developers
  - Single, comprehensive infrastructure
  - Easy deployment of new tools
- Project/product-specific customizations
  - Predefined/custom experiments

Slide 7: Performance Experiments
- Concept of an experiment
  - What program to analyze
  - What type of performance data to gather
  - How often the performance data is gathered
- Consists of collectors and views
  - Collectors define a specific type of performance data: hardware counters, program counter samples, or tracing of certain routines (I/O, MPI)
  - Views specify data aggregation and presentation
  - Multiple collectors per experiment are possible
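In CLI terms (using commands that appear later in this deck), the collector is named when the experiment is created and the view is applied when results are displayed. A minimal sketch with an illustrative executable name:

    openss>>expcreate -f myapp usertime   # collector: usertime sampling
    openss>>expgo                         # run the instrumented application
    openss>>expview                       # view: default aggregated results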

Slide 8: Experiment Workflow
[Diagram: running the application executes an "experiment" consisting of one or more data "collectors"; results are stored in a SQL database and can be displayed using several "views", with a Process Management Panel for controlling the run.]

Slide 9: Experiment Types in O|SS
- Sampling experiments
  - Periodically interrupt the run and record the location
  - Report the statistical distribution of these locations
  - Typically provides a good overview
  - Overhead is mostly low and uniform
- Tracing experiments
  - Gather and store individual application events, e.g., function invocations (MPI, I/O, ...)
  - Provides detailed, low-level information
  - Higher overhead, potentially bursty
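The distinction shows up directly in how an experiment is launched. Using the offline syntax introduced on slide 19 (the executable name is illustrative):

    openss -offline -f "./myapp" pcsamp   # sampling: statistical overview, low overhead
    openss -offline -f "./myapp" io       # tracing: records each POSIX I/O call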

Slide 10: Sampling Experiments
- PC sampling (pcsamp)
  - Records the PC at user-defined time intervals
  - Low-overhead overview of the time distribution
- User time (usertime)
  - PC sampling plus call stacks for each sample
  - Provides inclusive and exclusive timing data
- Hardware counters (hwc, hwctime)
  - Sample hardware counter overflow events
  - Access to data like cache and TLB misses
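For example, each sampling collector is selected by its experiment name on the command line. A sketch using the offline syntax, assuming the default sampling rates and hardware counters:

    openss -offline -f "./myapp" pcsamp    # PC samples only
    openss -offline -f "./myapp" usertime  # PC samples plus call stacks
    openss -offline -f "./myapp" hwc       # hardware counter overflow samples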

Slide 11: Tracing Experiments
- I/O tracing (io, iot)
  - Records invocations of all POSIX I/O events
  - Provides aggregate and individual timings
- MPI tracing (mpi, mpit, mpiotf)
  - Records invocations of all MPI routines
  - Provides aggregate and individual timings
- Floating point exception tracing (fpe)
  - Triggered by any FPE caused by the code
  - Helps pinpoint numerical problem areas
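The tracing collectors are invoked the same way; a sketch with an illustrative executable name:

    openss -offline -f "./myapp" io    # aggregate POSIX I/O timings
    openss -offline -f "./myapp" iot   # I/O tracing with per-event detail
    openss -offline -f "./myapp" fpe   # trap floating point exceptions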

Slide 12: Parallel Experiments
- O|SS supports MPI and threaded codes
  - Tested with a variety of MPI implementations
  - Thread support is based on POSIX threads
- Any collector can be applied to a parallel job
  - Automatically applied to all tasks/threads
  - Default views aggregate across all tasks/threads
  - Data from individual tasks/threads is available
- Specific parallel experiments (e.g., mpi, mpit)
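Applying a collector to an MPI job simply means wrapping the launch command, as in the deck's own example on slide 21:

    openss -offline -f "orterun -np 128 sweep3d.mpi" pcsamp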

Slide 13: High-level Architecture
[Diagram: three user interfaces (GUI, CLI, pyO|SS) on top of the open source O|SS framework and its code instrumentation layer, targeting AMD- and Intel-based Linux clusters/SSI.]

Slide 14: Code Instrumentation in O|SS
- Offline/external data collection
  - Instruments the application at start-up
  - Writes data to raw files and converts them to the O|SS database
  - Performance data is available at the end of execution
- Online scalable data collection via MRNet
  - Scalable transport layer
  - Performance data delivered directly to the tool online
  - Allows interactive online analysis and viewing of intermediate results

Slide 15: Offline & Online Data Collection
[Diagram: offline mode, where the MPI application writes raw data that O|SS processes post-mortem, alongside online mode, where data flows from the MPI application to O|SS through MRNet.]

Slide 16: High-level Architecture (revisited)
[Diagram: same architecture as slide 13, repeated to introduce the interface discussion.]

Slide 17: Three Interfaces (GUI, CLI, Python)
- Experiment commands: expAttach, expCreate, expDetach, expGo, expView
- List commands: list -v exp, list -v hosts, list -v status
- Session commands: setBreak, openGui
Python example:

    import openss as oss
    my_filename = oss.FileList("myprog.a.out")
    my_exptype = oss.ExpTypeList("pcsamp")
    my_id = oss.expCreate(my_filename, my_exptype)
    oss.expGo()
    my_metric_list = oss.MetricList("exclusive")
    my_viewtype = oss.ViewTypeList("pcsamp")
    result = oss.expView(my_id, my_viewtype, my_metric_list)

Slide 18: Running an Experiment
- Running a simple example experiment
  - Examine the command syntax
  - List the outputs from the experiment
- Viewing and interpreting gathered measurements
  - GUI and CLI, via the experiment database file
- Show the "-offline" example in more detail
- Introduce additional command syntax

Slide 19: Basic Offline Experiment Syntax

    openss -offline -f "executable" pcsamp

- openss is the command to invoke Open|SpeedShop.
- -offline indicates the user interface to use (immediate command mode); there are a number of user interface options.
- -f specifies the executable; the "executable" can be a sequential or parallel command.
- pcsamp indicates what type of performance data (metric) to gather. Here, pcsamp periodically samples the address the program counter is pointing to and associates that address with a function and/or source line. Several other performance metrics are available.
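Putting the pieces together for a serial run (the executable name is illustrative; the resulting database file naming is shown on slide 23):

    openss -offline -f "./myapp" pcsamp
    # normal program output appears, followed by a sorted function profile;
    # a .openss database file is written for later viewing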

Slide 20: What Are the Outputs?
Outputs from: openss -offline -f "executable" pcsamp
- Normal program output while the executable is running.
- The sorted list of performance information:
  - A list of the top time-consuming functions
  - The corresponding sample-derived time for each function
- A performance information database file, containing all the information needed to view the data at any time in the future without the executable(s):
  - Symbol table information from the executable(s) and system libraries
  - The performance data openss gathered
  - Time stamps for when DSOs were loaded and unloaded

Slide 21: Example Parallel Run with Output

    openss -offline -f "orterun -np 128 sweep3d.mpi" pcsamp

Slide 22: Output from Example Run
[Screenshot: output of openss -offline -f "orterun -np 128 sweep3d.mpi" pcsamp]

Slide 23: Using the Database File
The database file is one of the outputs from running: openss -offline -f "executable" pcsamp. Use this file to view the data.
How to open the database file with openss:
- GUI: openss -f <database file>, or plain openss (then use the menus or wizard to open it)
- CLI: openss -cli, then exprestore -f <database file>
In this example we show both:
- openss -cli -f X.0.openss (CLI)
- openss -f X.0.openss (GUI)
X.0.openss is the file name openss creates by default.
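A short sketch of a CLI session against the saved database (all commands are from this deck; output omitted):

    openss -cli -f X.0.openss
    openss>>expview     # default per-function statistics
    openss>>quit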

Slide 24: Output from Example Run
[Screenshot: loading the database file with openss -cli -f X.0.openss]

Slide 25: Process Management Panel
[Screenshot: control your job, focus the stats panel, create process subsets.]

Slide 26: Default Stats Panel View
[Screenshot: openss -f X.0.openss; performance statistics by function is the default view.]

Slide 27: Results Map to Source
[Screenshot: split-screen mapping of performance data to source lines.]

Slide 28: Min/Max/Average (Load Balance) View
[Screenshot: select "LB" in the toolbar to generate the load balance view.]

Slide 29: Comparative Analysis: Clustering Ranks
[Screenshot: select "CA" in the toolbar to generate the comparative analysis view.]

Slide 30: Comparative Analysis: Clustering Ranks
[Screenshot: second view of the comparative analysis of clustered ranks.]

Slide 31: Additional Experiment Syntax
- openss -offline -f "executable" pcsamp
  - -offline selects immediate command mode.
  - Uses the offline (LD_PRELOAD) collection mechanism.
- openss -cli -f "executable" pcsamp
  - -cli selects the interactive command line.
  - Uses the online (dynamic instrumentation) collection mechanism.
- openss -f "executable" pcsamp
  - No interface option selects the graphical user interface.
  - Uses the online (dynamic instrumentation) collection mechanism.
- openss -batch < input.commands.file
  - Executes a file of CLI commands.

Slide 32: Wizard Panel, Page 1
[Screenshot: the wizard offers three entry points: gather data from new runs; analyze and/or compare existing data from previous runs; or use the O|SS command line interface.]

Slide 33: Wizard Panel, Gather New Data
[Screenshot: select the type of data to be gathered by Open|SpeedShop.]

Slide 34: Compare Wizard
[Screenshot: side-by-side performance results.]

Slide 35: Compare Wizard
[Screenshot: side-by-side source for the two versions.]

Slide 36: Comparing MPI Ranks
[Screenshot: rank 0 vs. rank 1.]

Slide 37: CLI Language
- An interactive command line interface with gdb/dbx-like processing
- Several interactive commands to:
  - Create experiments
  - Provide process/thread control
  - View experiment results
- Where possible, commands execute asynchronously

Slide 38: CLI Command Overview
- Experiment creation: expcreate, expattach
- Experiment control: expgo, expwait, expdisable, expenable
- Experiment storage: expsave, exprestore
- Result presentation: expview, opengui
- Misc. commands: help, list, log, record, playback, history, quit
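A sketch of how these commands chain together in one session. The executable name is illustrative, and the expsave file argument is assumed to mirror exprestore -f:

    openss>>expcreate -f myapp pcsamp   # create and focus an experiment
    openss>>expgo                       # start asynchronous execution
    openss>>expwait                     # block until the run completes
    openss>>expview                     # display results
    openss>>expsave -f myrun.openss     # save for a later exprestore
    openss>>quit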

Slide 39: User-Time Example
Create the experiment, load the application, and start it:

    lnx-jeg.americas.sgi.com-17> openss -cli
    openss>>Welcome to OpenSpeedShop 1.9
    openss>>expcreate -f test/executables/fred/fred usertime
    The new focused experiment identifier is: -x 1
    openss>>expgo
    Start asynchronous execution of experiment: -x 1
    openss>>Experiment 1 has terminated.

Slide 40: Showing CLI Results

    openss>>expview
    Excl CPU time   Incl CPU time   % of Total Excl   Function (defining location)
    in seconds      in seconds      CPU Time
                                                      f3 (fred: f3.c,2)
                                                      f2 (fred: f2.c,2)
                                                      f1 (fred: f1.c,2)
                                                      __libc_start_main (libc.so.6)
                                                      _start (fred)
                                                      work (fred: work.c,2)
                                                      main (fred: fred.c,5)

[Numeric values were not preserved in this transcript.]

Slide 41: CLI Batch Scripting (1)
Create a batch file with CLI commands (a plain text file). Example:

    # Create batch file
    echo expcreate -f fred pcsamp >> input.script
    echo expgo >> input.script
    echo expview pcsamp10 >> input.script

    # Run OpenSpeedShop
    openss -batch < input.script

Slide 42: CLI Batch Scripting (2)
Open|SpeedShop batch example results:

    The new focused experiment identifier is: -x 1
    Start asynchronous execution of experiment: -x 1
    Experiment 1 has terminated.
    CPU Time   Function (defining location)
               f3 (mutatee: mutatee.c,24)
               f2 (mutatee: mutatee.c,15)
               f1 (mutatee: mutatee.c,6)
               work (mutatee: mutatee.c,33)

[Numeric values were not preserved in this transcript.]

Slide 43: CLI Batch Scripting (3)
Open|SpeedShop batch example, direct form:

    # Run Open|SpeedShop as a single non-interactive command
    openss -batch -f fred pcsamp

    The new focused experiment identifier is: -x 1
    Start asynchronous execution of experiment: -x 1
    Experiment 1 has terminated.
    CPU Time   Function (defining location)
               f3 (mutatee: mutatee.c,24)
               f2 (mutatee: mutatee.c,15)
               f1 (mutatee: mutatee.c,6)
               work (mutatee: mutatee.c,33)

[Numeric values were not preserved in this transcript.]

Slide 44: Python Scripting
- An Open|SpeedShop Python API that executes the "same" interactive/batch Open|SpeedShop commands
- Users can intersperse normal Python code with the Open|SpeedShop Python API
- Run Open|SpeedShop experiments via the Open|SpeedShop Python API

Slide 45: Python Example (1)
Necessary steps:
- Import the O|SS Python module
- Prepare arguments for the target application
- Set the view and experiment type
- Create the experiment

    import openss
    my_filename = openss.FileList("usability/phaseII/fred")
    my_viewtype = openss.ViewTypeList()
    my_viewtype += "pcsamp"
    exp1 = openss.expCreate(my_filename, my_viewtype)

Slide 46: Python Example (2)
After experiment creation:
- Start the target application (asynchronous!)
- Wait for completion
- Write results

    try:
        openss.expGo()
        openss.wait()
    except openss.error:
        print "expGo(exp1, my_modifier) failed"
    openss.dumpView()

Slide 47: Python Example Output
Two interfaces to dump data:
- Plain text (similar to the CLI) for viewing
- Python objects for post-processing

    > python example.py
    /work/jeg/OpenSpeedShop/usability/phaseII/fred: successfully completed.
    Excl. CPU time   % of CPU Time   Function (def. location)
                                     f3 (fred: f3.c,23)
                                     f2 (fred: f2.c,2)
                                     f1 (fred: f1.c,2)

[Numeric values were not preserved in this transcript.]
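Assembled into one script, slides 45 and 46 would read roughly as follows; this is a sketch built only from the fragments above, with the error-message text as the sole liberty taken:

    import openss

    # Prepare the target application and experiment type (slide 45)
    my_filename = openss.FileList("usability/phaseII/fred")
    my_viewtype = openss.ViewTypeList()
    my_viewtype += "pcsamp"
    exp1 = openss.expCreate(my_filename, my_viewtype)

    # Run asynchronously, wait, then dump plain-text results (slide 46)
    try:
        openss.expGo()
        openss.wait()
    except openss.error:
        print "experiment failed"
    openss.dumpView()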

Slide 48: Extensibility
- O|SS is more than a performance tool
  - All functionality in one toolset with one interface
  - General infrastructure for creating new tools
- Plugins add new functionality
  - Cover all essential steps of performance analysis
  - Automatically loaded at O|SS startup
- Three types of plugins
  - Collectors: how to acquire performance data
  - Views: how to aggregate and present data
  - Panels: how to visualize data in the GUI

Slide 49: Overview Summary
- Two techniques for instrumentation
  - Online vs. offline
  - Different strengths for different target scenarios
- Flexible GUI that can be customized
- Several compatible scripting options
  - Command line language
  - Direct batch interface
  - Integration of O|SS into Python
- GUI and scripting are interoperable
- Plugin concept to extend Open|SpeedShop

Slide 50: Status & Future Plans
- Open|SpeedShop 1.9 available shortly
  - Packages and source from sourceforge.net
  - Tested on a variety of platforms
  - Offline version featured in version 1.9
- Online (MRNet) work in progress
  - Target is version 2.0 in December
  - Working on some platforms but not all
- Focus on scalability in the coming months
  - Support for capability machines via an Office of Science proposal, with ASC assistance

Slide 51: Availability and Contact
- Open|SpeedShop website:
- Installed on cfe1.nas.nasa.gov
- Download options:
  - Package with install script
  - Source for the tool and base libraries
- Feedback
  - Bug tracking and contact info available from the website
  - Feel free to contact the presenters directly