Scalability Study of S3D using TAU Sameer Shende


Slide 1: Scalability Study of S3D using TAU
Sameer Shende

TAU Performance System | S3D Scalability Study

Slide 2: Acknowledgements
- Alan Morris [UO]
- Kevin Huck [UO]
- Allen D. Malony [UO]
- Kenneth Roche [ORNL]
- Bronis R. de Supinski [LLNL]
The performance data presented here is available at:

Slide 3: TAU Parallel Performance System
- Multi-level performance instrumentation
  - Multi-language automatic source instrumentation
- Flexible and configurable performance measurement
- Widely-ported parallel performance profiling system
  - Computer system architectures and operating systems
  - Different programming languages and compilers
- Support for multiple parallel programming paradigms
  - Multi-threading, message passing, mixed-mode, hybrid

Slide 4: Scalability Study
- Harness testcase
- Platform: Jaguar Cray XT3 at ORNL
  - 1p
  - 8p
  - 64p
  - 512p
- Goal: to evaluate scaling properties of code regions
- Scalability of MPI operations

Slide 5: Introduction to ParaProf: Main Window
% paraprof *.ppk
Load all 1p, 8p, 64p, 512p profile datasets together.
[Screenshot callouts: click left mouse button; click right mouse button]

Slide 6: ParaProf: MFLOPs sorted by Exclusive Time

Slide 7: Source Code View

Slide 8: Comparison Window: Inclusive Time

Slide 9: Comparing Level 1 Data Cache Misses

Slide 10: CPU Resource Stalls

Slide 11: ParaProf: 3D view for 512 cpus - Jagged Edges!

Slide 12: MPI_Wait - Jagged Edges Seen in 3D Window
The pattern repeats every 8 cpus! (512 cpus)
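One way to check a "repeats every 8 cpus" observation numerically is to average a per-rank metric over ranks that share the same rank modulo the suspected period. The sketch below is only an illustration: the helper names are hypothetical, the data is synthetic (not taken from the S3D profiles), and it does not use any TAU or ParaProf interface.

```python
from statistics import mean


def mean_by_residue(per_rank_values, period):
    """Average a per-rank metric over all ranks sharing the same rank % period."""
    buckets = [[] for _ in range(period)]
    for rank, value in enumerate(per_rank_values):
        buckets[rank % period].append(value)
    return [mean(bucket) for bucket in buckets]


def spread(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


# Synthetic, made-up data: 64 ranks where every 8th rank waits longer.
wait_seconds = [5.0 if rank % 8 == 0 else 1.0 for rank in range(64)]

by_8 = mean_by_residue(wait_seconds, 8)  # grouping that matches the pattern
by_7 = mean_by_residue(wait_seconds, 7)  # grouping that does not match

print("spread for period 8:", spread(by_8))
print("spread for period 7:", spread(by_7))
```

A period that matches the pattern keeps the slow ranks together in one group, so the spread between group means stays large; a mismatched period smears them across groups and the spread shrinks.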

Slide 13: MPI_Wait - Histogram (Bins) View

Slide 14: Comparing MPI_Wait
- MPI_Wait time increases steadily with processors!

Slide 15: PerfDMF: Performance Data Mgmt. Framework

Slide 16: PerfExplorer - Comparative Analysis
- Relative speedup, efficiency
  - total runtime, by event, one event, by phase
- Breakdown of total runtime
- Group fraction of total runtime
- Correlating events to total runtime
- Timesteps per second

Slide 17: PerfExplorer
[Diagram labels: TAU's PerfDMF database; S3D]

Slide 18: PerfExplorer: Select Experiment & Analysis

Slide 19: Relative Efficiency By Event

Slide 20: Relative Efficiency For S3D - Weak Scaling
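For readers who want the arithmetic behind a relative efficiency or speedup chart, the following is a minimal sketch. It assumes a common convention: the smallest run is the baseline, weak-scaling efficiency is baseline time divided by the time at p processes, and speedup is efficiency times p over the baseline process count. Whether PerfExplorer uses exactly these definitions is an assumption, and the runtimes are made-up placeholders, not S3D measurements.

```python
def relative_efficiency(times, weak_scaling=True):
    """times maps process count -> runtime in seconds; smallest count is the baseline."""
    base_p = min(times)
    base_t = times[base_p]
    result = {}
    for p in sorted(times):
        if weak_scaling:
            # Work per process is fixed, so ideal runtime stays constant.
            result[p] = base_t / times[p]
        else:
            # Total work is fixed, so ideal runtime shrinks by base_p / p.
            result[p] = (base_t * base_p) / (times[p] * p)
    return result


def relative_speedup(times, weak_scaling=True):
    """Speedup relative to the baseline run: efficiency scaled by p / base_p."""
    base_p = min(times)
    efficiency = relative_efficiency(times, weak_scaling)
    return {p: efficiency[p] * p / base_p for p in efficiency}


# Made-up runtimes for the process counts used in the study (1p to 512p).
runtimes = {1: 100.0, 8: 110.0, 64: 125.0, 512: 160.0}

eff = relative_efficiency(runtimes)
spd = relative_speedup(runtimes)
for p in sorted(runtimes):
    print(f"{p:4d}p  efficiency={eff[p]:.3f}  speedup={spd[p]:.1f}")
```

The same functions can be applied to the time of a single event instead of total runtime to get the per-event view.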

Slide 21: Relative Speedup

Slide 22: Relative Efficiency & Speedup for One Event

Slide 23: Data Mining: Event Correlation to Total Time
r = 1 implies direct correlation
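The r on this slide is presumably the Pearson correlation coefficient between an event's time and total runtime across the runs; that reading is an assumption. A minimal sketch of the calculation, using made-up numbers rather than the S3D data:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up values, one entry per run (1p, 8p, 64p, 512p).
total_time = [100.0, 110.0, 125.0, 160.0]
event_time = [10.0, 15.0, 22.5, 40.0]  # grows linearly with total_time

r = pearson_r(total_time, event_time)
print(f"r = {r:.3f}")
```

An event whose time grows in lockstep with total runtime yields r close to 1, which marks it as a candidate for explaining the loss of scaling.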

Slide 24: MPI Scaling

Slide 25: Total Runtime Breakdown by Events

Slide 26: S3D - Building with TAU
- Change name of compiler in build/make.XT3
  - ftn => tau_f90.sh
  - cc => tau_cc.sh
- Set compile time environment variables
  - setenv TAU_MAKEFILE /spin/proj/perc/TOOLS/tau_latest/xt3/lib/Makefile.tau-callpath-multiplecounters-mpi-papi-pdt-pgi
    - Choose callpath, PAPI counters, MPI profiling, PDT for source instrumentation
  - setenv TAU_OPTIONS '-optTauSelectFile=select.tau -optPreProcess'
    - Selective instrumentation file eliminates instrumentation in lightweight routines
    - Pre-process Fortran source code using cpp before compiling
- Set runtime environment variables for instrumentation control and PAPI event counter selection in job submission script:
  - export TAU_THROTTLE=1
  - export COUNTER1=GET_TIME_OF_DAY
  - export COUNTER2=PAPI_FP_INS
  - export COUNTER3=PAPI_L1_DCM
  - export COUNTER4=PAPI_RES_STL
  - export COUNTER5=PAPI_L2_DCM

Slide 27: Selective Instrumentation in TAU
% cat select.tau
BEGIN_EXCLUDE_LIST
MCADIF
GETRATES
TRANSPORT_M::MCAVIS_NEW
MCEDIF
MCACON
CKYTCP
THERMCHEM_M::MIXCP
THERMCHEM_M::MIXENTH
THERMCHEM_M::GIBBSENRG_ALL_DIMT
CKRHOY
MCEVAL4
THERMCHEM_M::HIS
THERMCHEM_M::CPS
THERMCHEM_M::ENTROPY
END_EXCLUDE_LIST
BEGIN_INSTRUMENT_SECTION
loops routine="#"
END_INSTRUMENT_SECTION
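An exclude list like the one above targets lightweight routines, meaning those called very often with little time per call. The sketch below shows one way such a list could be drafted from a first profiling pass. Everything in it is an illustrative assumption: the routine names and numbers are hypothetical, the thresholds (more than 100000 calls and under 10 microseconds per call) are chosen for the example, and the input dictionary stands in for data exported from a profile rather than any TAU interface.

```python
def lightweight_routines(stats, min_calls=100000, max_usec_per_call=10.0):
    """Pick routines that are called often and cost little per call.

    stats maps routine name -> (number of calls, inclusive time in microseconds).
    """
    picked = []
    for name, (calls, inclusive_usec) in sorted(stats.items()):
        if calls > min_calls and inclusive_usec / calls < max_usec_per_call:
            picked.append(name)
    return picked


def render_exclude_list(names):
    """Format routine names as an exclude section of a selective instrumentation file."""
    return "\n".join(["BEGIN_EXCLUDE_LIST", *names, "END_EXCLUDE_LIST"])


# Hypothetical per-routine statistics (names and numbers are made up).
profile = {
    "TINY_KERNEL": (2_000_000, 4_000_000.0),    # 2 usec per call
    "SMALL_HELPER": (500_000, 1_000_000.0),     # 2 usec per call
    "TIME_STEP_DRIVER": (100, 900_000_000.0),   # few calls, expensive
    "BUSY_BUT_HEAVY": (200_000, 8_000_000.0),   # 40 usec per call
}

excluded = lightweight_routines(profile)
print(render_exclude_list(excluded))
```

Routines that are called rarely or that do substantial work per call stay instrumented, so the profile keeps the regions that matter for the scaling analysis.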

Slide 28: Getting Access to TAU on Jaguar
- set path=(/spin/proj/perc/TOOLS/tau_latest/x86_64/bin $path)
- Choose Stub Makefiles (TAU_MAKEFILE env. var.) from /spin/proj/perc/TOOLS/tau_latest/xt3/lib/Makefile.*
  - Makefile.tau-mpi-pdt-pgi (flat profile)
  - Makefile.tau-mpi-pdt-pgi-trace (event trace, for use with Vampir)
  - Makefile.tau-callpath-mpi-pdt-pgi (single metric, callpath profile)
- Binaries of S3D can be found in:
  - ~sameer/scratch/S3D-BINARIES
    - withtau (papi, multiplecounters, mpi, pdt, pgi options)
    - without_tau

Slide 29: Concluding Discussion
- Performance tools must be used effectively
- More intelligent performance systems for productive use
  - Evolve to application-specific performance technology
  - Deal with scale by "full range" performance exploration
  - Autonomic and integrated tools
  - Knowledge-based and knowledge-driven process
- Performance observation methods do not necessarily need to change in a fundamental sense
  - More automatically controlled and more efficiently used
- Develop next-generation tools and deliver to community
- Open source with support by ParaTools, Inc.

Slide 30: Support Acknowledgements
- Department of Energy (DOE)
- Office of Science
- LLNL, LANL, ORNL, ASC
- PERI