Profiling OpenSHMEM with TAU Commander

Presentation transcript:

Profiling OpenSHMEM with TAU Commander ParaTools, Inc. 8 September 2017 Baltimore, MD

On Talapas
    ssh studentXX@talapas-ln1.uoregon.edu
40 accounts available, please pick up a card.
    module load prl
    module load workshops/09-08-17
Benchmarks: /usr/local/packages/workshops/09-08-17
Copyright © ParaTools, Inc.

Installing TAU Commander
taucommander.com/downloads
Do you have network access?
    Yes: Use the web-based installer. Lightweight package (324k) downloads software as needed.
    No: Use an all-in-one package. Inclusive packages (700+MB) that do not require network access and will not download software.

Installing TAU Commander
    tar xvzf taucmdr-<options>.tar.gz
    cd taucmdr-<version>
    make install [INSTALLDIR=/path/to/install/to]
Bash (nearly everyone):
    export PATH=INSTALLDIR/bin:$PATH
C-shell (nearly everyone else):
    set path=(INSTALLDIR/bin $path)

Getting Started with TAU Commander
    tau initialize
    tau oshf90 *.f90 -o foo
    tau srun -n 64 ./foo
    tau show
This works on any supported system, even if TAU is not installed or has not been configured appropriately. TAU and all its dependencies will be downloaded and installed if required.
Just put `tau` in front of everything and see what happens.

TAU Commander Online Help: tau --help

Commands have Subcommands
    tau [subcmd] [subsubcmd] [subsubsubcmd] …
Commands are tree-like and become more specific as you move to the right:
    tau application edit dfft --new-name my_dfft
Common command strings:
    tau initialize
    tau select [target] [application] [measurement] [experiment]
    tau target create <name> [options]
    tau application edit <name> [options]
    tau measurement copy <name> <new_name> [options]

Command Line Hacks
All commands and flags support abbreviation:
    tau initialize
    tau initial
    tau init
Boolean flags are flexible:
    tau init --mpi=True
    tau init --mpi=yes
    tau init --mpi=1
    tau init --mpi
Use --help at any point to get help.
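The abbreviation behaviour can be pictured as unique-prefix matching. The sketch below is an illustration only: the slides do not say how TAU Commander resolves abbreviations, so the function name resolve_abbrev and the matching rule are assumptions. The subcommand names are the ones that appear in this deck.

```shell
#!/bin/sh
# Hypothetical sketch of unique-prefix matching; not TAU Commander's own code.
# The names below are subcommands mentioned in the slides.
SUBCOMMANDS="application experiment measurement project target trial initialize select show dashboard"

# resolve_abbrev PREFIX: print the single subcommand that starts with PREFIX.
# Returns 1 when nothing matches or when the prefix is ambiguous.
resolve_abbrev() {
    prefix=$1
    match=""
    count=0
    for cmd in $SUBCOMMANDS; do
        case $cmd in
            "$prefix"*) match=$cmd; count=$((count + 1)) ;;
        esac
    done
    if [ "$count" -eq 1 ]; then
        printf '%s\n' "$match"
    else
        return 1
    fi
}

resolve_abbrev init
resolve_abbrev app
resolve_abbrev s || echo "ambiguous or unknown: s"
```

Under this rule `init` and `initial` both select `initialize`, while `s` is rejected because it could mean `select` or `show`.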

Dashboard

T-A-M Model for Performance Engineering
Target: installed software, available compilers, host architecture/OS
Application: MPI, OpenMP, CUDA, OpenACC, etc.
Measurement: profile, trace, or both; sample, source inst…
Experiment = (Target, Application, Measurement)

All T-A-M Objects Have a Subcommand
Configuration Subcommands:
  application     Create and manage application configurations.
  experiment      Create and manage experiments.
  measurement     Create and manage measurement configurations.
  project         Create and manage project configurations.
  target          Create and manage target configurations.
  trial           Create and manage experiment trials.
$ tau application --help
usage: tau application <subcommand> [arguments]
Create and manage application configurations.
Positional Arguments:
  <subcommand>  See 'subcommands' below.
  [arguments]   Arguments to be passed to <subcommand>.
Optional Arguments:
  -h, --help    Show this help message and exit.
Subcommands:
  copy            Copy and modify application configurations.
  create          Create application configurations.
  delete          Delete application configurations.
  edit            Modify application configurations.
  list            Show application configuration data.
See 'tau application <subcommand> --help' for more information on <subcommand>.

Use the long listing format
$ tau app list --long
== Application Configurations (/storage/users/jlinford/gpu_suite.1.1.0/.tau/project.json) ==================
== gpu_suite.1.1.0 =========================================================================================
Attribute | Value           | Command Flag | Description
==========+=================+==============+===============================================
cuda      | True            | --cuda       | Application uses NVIDIA CUDA.
linkage   | dynamic         | --linkage    | Application linkage.
mpc       | False           | --mpc        | Application uses MPC.
mpi       | False           | --mpi        | Application uses MPI.
opencl    | False           | --opencl     | Application uses OpenCL.
openmp    | False           | --openmp     | Application uses OpenMP.
projects  | gpu_suite.1.1.0 | N/A          | Projects using this application.
pthreads  | False           | --pthreads   | Application uses pthreads.
shmem     | False           | --shmem      | Application uses SHMEM.
tbb       | False           | --tbb        | Application uses Thread Building Blocks (TBB).

Compiling with TAU Commander
    tau <compiler> [options]
    tau gcc *.c -o foo
    tau oshfort -c foo.f90
    tau oshfort foo.o -o foo
tau <compiler> is just a shortcut that expands to tau build <compiler>.
Use tau build --help to show all known compilers.
NOTE: Compilation isn’t always necessary. Use sampling to gather data on uninstrumented executables.

Autotools
Initialize before running configure:
    tau initialize --shmem
If the project is already initialized, be sure you don’t have an “expensive” experiment selected, e.g. tracing or profiling with lots of options.
    ./configure CC="tau oshcc"
Recommend --disable-dependency-tracking to avoid problems with source-based instrumentation. No worries if only sampling.
    make && make install
If you change your experiment you do not have to reconfigure, just recompile:
    make clean

CMake
This should work:
    cmake -DCMAKE_C_COMPILER="tau oshcc"
If it doesn’t, use the wrapper scripts:
    export PATH=$PWD/.tau/bin/<target_name>:$PATH
    cmake -DCMAKE_C_COMPILER="tau_oshcc"
Wrapper scripts are automatically generated for all compilers supported by the target.
The wrapper for <compiler> is “tau_<compiler>”, e.g. tau_gcc, tau_mpicc, tau_oshcc, etc.
Wrappers can be used for any build system that doesn’t like spaces in the compiler name.
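When putting the wrapper directory on PATH, it helps to prepend it rather than overwrite PATH, so that cmake and the underlying compilers stay reachable. The following is a minimal sketch under stated assumptions: prepend_path is a hypothetical helper, and my_target is a placeholder for the real target name.

```shell
#!/bin/sh
# prepend_path DIR: put DIR at the front of PATH unless it is already listed.
# Hypothetical helper for illustration; not part of TAU Commander.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, leave PATH unchanged
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Placeholder: replace my_target with the name of your own target.
TARGET_NAME=my_target
prepend_path "$PWD/.tau/bin/$TARGET_NAME"

# tau_oshcc is one of the wrapper names listed on the slide.
if command -v tau_oshcc >/dev/null 2>&1; then
    echo "using wrapper: $(command -v tau_oshcc)"
else
    echo "tau_oshcc not found in $PWD/.tau/bin/$TARGET_NAME"
fi
```

Calling prepend_path twice with the same directory leaves PATH unchanged, which keeps repeated shell setup from growing PATH.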

Running with TAU Commander
    tau srun -n 4 ./a.out
    tau mpirun -np 4 ./a.out
    tau ./a.out
tau <command> is just a shortcut that expands to tau trial create <command>.

Running with custom launchers
    tau trial create \
        --launcher mylauncher -np 4 -- \
        ./a.out bar baz
Use the --launcher flag to indicate the launcher command and arguments.
Use “--” to mark the beginning of the application command line.
tau mpirun -np 4 ./a.out 20 is shorthand for:
    tau trial create --launcher mpirun -np 4 -- ./a.out 20
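The role of “--” can be illustrated with a few lines of shell. This is a sketch of the convention only, not TAU Commander's parser; split_command is a hypothetical name, and the simple string joining used here does not preserve arguments that contain spaces.

```shell
#!/bin/sh
# split_command ARGS...: print the launcher part and the application part
# of a command line that uses "--" as the separator.
# Hypothetical illustration; TAU Commander's real parsing may differ.
split_command() {
    launcher=""
    while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
        launcher="${launcher:+$launcher }$1"
        shift
    done
    if [ "$#" -gt 0 ]; then
        shift                       # drop the "--" itself
    fi
    echo "launcher: $launcher"
    echo "application: $*"
}

split_command mpirun -np 4 -- ./a.out 20
```

Everything before the separator is treated as the launcher and its arguments, everything after it as the application command line.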

OpenSHMEM Examples

Step 1: Initialize TAU Project
    $ cd ISx
    $ tau initialize --shmem
Creates a new project configuration using defaults.
Project files exist in a directory named “.tau”.
Like git, all directories below the directory containing the “.tau” directory can access the project, e.g. `tau dashboard` works in miniapp1/baseline.
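The git-like lookup can be sketched as a walk up the directory tree until a “.tau” directory is found. This is an assumption about the general technique, not TAU Commander's actual code; find_tau_project is a hypothetical name.

```shell
#!/bin/sh
# find_tau_project [DIR]: print the nearest directory at or above DIR
# (default: the current directory) that contains a ".tau" directory.
# Hypothetical illustration of a git-style upward search.
find_tau_project() {
    dir=$(cd "${1:-.}" && pwd) || return 1
    while :; do
        if [ -d "$dir/.tau" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        if [ "$dir" = "/" ]; then
            return 1                # reached the filesystem root
        fi
        dir=$(dirname "$dir")
    done
}

find_tau_project || echo "no TAU project found at or above $PWD"
```

Run from miniapp1/baseline inside a project, the function would print the directory that holds “.tau”, which is why project commands work from subdirectories.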

Initializing ISx Project on Cori
Compiler detection
Project initialization
Download and install PDT
TAU installation progress

TAU ISx Project Dashboard

Step 2: Use `tau` to compile
Prepend the `tau` command to the compiler command and compile as normal.
TAU Commander constructs a new compilation command line:
    May replace compiler commands with TAU’s compiler wrapper scripts.
    May set environment variables, parse configuration files, etc.
    If no changes are required then nothing is changed.

Step 3: Use `tau` to run
Prepend the `tau` command to the command line.
TAU Commander sets environment variables.
Application executes, possibly with tau_exec.
New data is added to the performance database.

Step 4: Use `tau` to view data
    $ tau show
Note: ISx was sampled with callsites; OpenSHMEM was measured directly via a wrapper library.

Performance Data Analysis (OpenSHMEM’17)

New: Callsites in Profiles and Traces

Memory Allocation/Deallocation
Heap memory usage reported by the mallinfo() call is not 64-bit clean:
    32-bit counters in Linux roll over when more than 4 GB of memory is used.
    We keep track of heap memory usage in 64-bit counters inside TAU.
Compensation of perturbation introduced by the tool:
    Only show what the application uses.
    Create guards for TAU calls to not track I/O and memory allocations/de-allocations performed inside TAU.
Provide broad POSIX I/O and memory coverage.
% paraprof (Right-click label [e.g. “node 0”] → Show Context Event Window)
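A small worked example shows why a 32-bit counter misreports large heaps. The 5 GiB heap size is a made-up figure for illustration, and wrap32 is a hypothetical helper that models the counter as unsigned 32-bit to match the 4 GB rollover named on the slide. The fields of struct mallinfo are C ints, so the real values can also turn negative.

```shell
#!/bin/sh
# wrap32 BYTES: the value an unsigned 32-bit counter would show for BYTES.
# Hypothetical helper; assumes the shell does 64-bit integer arithmetic.
wrap32() {
    echo $(( $1 % 4294967296 ))     # 4294967296 = 2^32
}

GIB=1073741824                      # 2^30 bytes
heap=$(( 5 * GIB ))                 # made-up example: a 5 GiB heap

echo "64-bit counter: $heap bytes"
echo "32-bit counter: $(wrap32 "$heap") bytes"
```

The 64-bit counter reports 5368709120 bytes, while the wrapped 32-bit value is 1073741824 bytes, i.e. the 5 GiB heap appears as 1 GiB.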

Message Statistics
% paraprof (Right-click label [e.g. “node 0”] → Show Context Event Window)

Communication Matrix

3D Profile Visualization
% paraprof (Windows → 3D Visualization)

How Does Each Routine Scale?
% perfexplorer (Charts → Runtime Breakdown)

How Does Each Routine Scale?
% perfexplorer (Charts → Stacked Bar Chart)

Which Events Correlate with Runtime?
% perfexplorer (Charts → Correlate Events with Total Runtime)

Source-based Instrumentation

Source-based vs. Sampling
Direct via Probes:
    call TAU_START('potential')
    ! code
    call TAU_STOP('potential')
    Exact measurement
    Fine-grain control
    Calls inserted into code
Indirect via Sampling:
    No code modification
    Minimal effort
    Relies on debug symbols (-g option)

What are the performance characteristics of my application?
One target, one application, many measurements (Measurement 0 … Measurement N)

Create a New Experiment
Select a new measurement to create a new experiment.
TAU Performance System® is automatically reconfigured and recompiled.
The user is advised that an application rebuild is required to use source-based instrumentation.

Automatic TAU Configuration
./configure -tag=dea32fb3 -arch=craycnl -cc=icc -c++=icpc -fortran=intel -shmem -shmeminc=/opt/cray/pe/mpt/7.4.4/gni/sma/include -shmemlib=/opt/cray/pe/mpt/7.4.4/gni/sma/lib64 -shmemlibrary=-L/opt/cray/pe/libsci/16.09.1/INTEL/15.0/x86_64/lib#-L/opt/cray/dmapp/default/lib64#-L/opt/cray/pe/mpt/7.4.4/gni/mpich-intel/16.0/lib#-L/opt/cray/rca/2.1.6_g2c60fbf-2.265/lib64#-L/opt/cray/alps/6.3.4-2.21/lib64#-L/opt/cray/xpmem/2.1.1_gf9c9084-2.38/lib64#-L/opt/cray/dmapp/7.1.1-39.37/lib64#-L/opt/cray/pe/pmi/5.0.10-1.0000.11050.0.0.ari/lib64#-L/opt/cray/ugni/6.0.15-2.2/lib64#-L/opt/cray/udreg/2.3.2-7.54/lib64#-L/opt/cray/pe/atp/2.1.0/libApp#-L/opt/cray/wlm_detect/1.2.1-3.10/lib64#-lpthread#-lsma#-lpmi#-ldmapp#-lsci_intel_mpi#-lsci_intel#-lm#-ldl#-lmpich_intel#-lrt#-lugni#-lalpslli#-lwlm_detect#-lalpsutil#-lrca#-lxpmem#-ludreg#-lmpichcxx_intel#-lmpichf90_intel -pdt=/global/project/projectdirs/m88/jlinford/taucmdr-test/system/pdt/77f947dd -pdt_c++=icpc -useropt=-O2#-g

Source-based Instrumentation Data

Tracing OpenSHMEM with TAU Commander

Measurement Approaches
Profiling: shows how much time was spent in each routine.
Tracing: shows when events take place on a timeline.

New: OTF2 Now the Default Trace Format
OTF2 dramatically improves on SLOG2:
    Smaller trace files
    Richer trace data, e.g. RMA events
    Better trace visualization (Vampir, Ravel)
TAU can now generate OTF2 files natively: no Score-P required!

TAU Commander with OpenSHMEM+OTF2
    TAU_TRACE_FORMAT=otf2
OTF2 is 143MB vs. SLOG2 514MB.
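The two trace sizes quoted on the slide can be turned into a relative saving with integer arithmetic. The variable names below are illustrative, and the figures are the slide's own for this one ISx trace, so other applications may see different ratios.

```shell
#!/bin/sh
# Trace sizes in MB, taken from the slide.
otf2_mb=143
slog2_mb=514

# Percentage saved, rounded to the nearest whole percent.
saved_pct=$(( ((slog2_mb - otf2_mb) * 100 + slog2_mb / 2) / slog2_mb ))

# Size ratio in tenths, rounded to the nearest tenth.
ratio_x10=$(( (slog2_mb * 10 + otf2_mb / 2) / otf2_mb ))

echo "OTF2 trace is ${saved_pct}% smaller than the SLOG2 trace"
echo "SLOG2 trace is about $(( ratio_x10 / 10 )).$(( ratio_x10 % 10 ))x the size of the OTF2 trace"
```

For these numbers the saving is 72% and the SLOG2 file is about 3.6 times the size of the OTF2 file.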

ISx in Vampir

Different Nodes, Different Timelines

Get/Put Recorded as RMA Events

TAU Commander with SOS

Create a New Target for SOS

Select the SOS Target

TAU Commander and KNL

Create a new Target for KNL

KNL Performance Data

Final Dashboard

TAU Commander and CUDA/OpenCL

tau init --cuda

Run with `tau` as usual

GPUs are shown as “Threads”

Open the GPU “Thread” to see kernel time

Non-GPU threads show CUDA calls

Compiler-based Instrumentation

OpenCL
OpenCL is pretty much the same:
    tau init --opencl
    tau gcc *.c
    tau ./a.out