Score-P – A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir Alexandru Calotoiu German Research School for.

Slides:



Advertisements
Similar presentations
Performance Analysis Tools for High-Performance Computing Daniel Becker
Advertisements

Machine Learning-based Autotuning with TAU and Active Harmony Nicholas Chaimov University of Oregon Paradyn Week 2013 April 29, 2013.
Houssam Haitof Technische Univeristät München
Profiling your application with Intel VTune at NERSC
NewsFlash!! Earth Simulator no longer #1. In slightly less earthshaking news… Homework #1 due date postponed to 10/11.
Dynamic performance measurement control Dynamic event grouping Multiple configurable counters Selective instrumentation Application-Level Performance Access.
Workload Characterization using the TAU Performance System Sameer Shende, Allen D. Malony, Alan Morris University of Oregon {sameer,
Robert Bell, Allen D. Malony, Sameer Shende Department of Computer and Information Science Computational Science.
Small Tools and Interoperability Arend Rensink Formal Methods and Tools University of Twente.
Nick Trebon, Alan Morris, Jaideep Ray, Sameer Shende, Allen Malony {ntrebon, amorris, Department of.
INTRODUCTION TO PERFORMANCE ENGINEERING Allen D. Malony Performance Research Laboratory University of Oregon SC ‘09: Productive Performance Engineering.
On the Integration and Use of OpenMP Performance Tools in the SPEC OMP2001 Benchmarks Bernd Mohr 1, Allen D. Malony 2, Rudi Eigenmann 3 1 Forschungszentrum.
TAU Performance System Alan Morris, Sameer Shende, Allen D. Malony University of Oregon {amorris, sameer,
Performance Tools BOF, SC’07 5:30pm – 7pm, Tuesday, A9 Sameer S. Shende Performance Research Laboratory University.
Performance Instrumentation and Measurement for Terascale Systems Jack Dongarra, Shirley Moore, Philip Mucci University of Tennessee Sameer Shende, and.
June 2, 2003ICCS Performance Instrumentation and Measurement for Terascale Systems Jack Dongarra, Shirley Moore, Philip Mucci University of Tennessee.
Scalability Study of S3D using TAU Sameer Shende
Sameer Shende, Allen D. Malony Computer & Information Science Department Computational Science Institute University of Oregon.
Performance Technology for Complex Parallel Systems REFERENCES.
1 Score-P – A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir Markus Geimer 2), Bert Wesarg 1), Brian Wylie.
DKRZ Tutorial 2013, Hamburg Analysis report examination with CUBE Markus Geimer Jülich Supercomputing Centre.
Analysis report examination with CUBE Alexandru Calotoiu German Research School for Simulation Sciences (with content used with permission from tutorials.
Beyond Automatic Performance Analysis Prof. Dr. Michael Gerndt Technische Univeristät München
1 Performance Analysis with Vampir DKRZ Tutorial – 7 August, Hamburg Matthias Weber, Frank Winkler, Andreas Knüpfer ZIH, Technische Universität.
Paradyn Week – April 14, 2004 – Madison, WI DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Bernd Mohr Forschungszentrum.
SC’13: Hands-on Practical Hybrid Parallel Application Performance Engineering1 Score-P Hands-On CUDA: Jacobi example.
Lecture 8. Profiling - for Performance Analysis - Prof. Taeweon Suh Computer Science Education Korea University COM503 Parallel Computer Architecture &
TRACEREP: GATEWAY FOR SHARING AND COLLECTING TRACES IN HPC SYSTEMS Iván Pérez Enrique Vallejo José Luis Bosque University of Cantabria TraceRep IWSG'15.
SC’13: Hands-on Practical Hybrid Parallel Application Performance Engineering Introduction to VI-HPS Brian Wylie Jülich Supercomputing Centre.
Adventures in Mastering the Use of Performance Evaluation Tools Manuel Ríos Morales ICOM 5995 December 4, 2002.
12th VI-HPS Tuning Workshop, 7-11 October 2013, JSC, Jülich1 Score-P – A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca,
Presented by KOJAK and SCALASCA Jack Dongarra, Shirley Moore, Farzona Pulatova, and Fengguang Song University of Tennessee and Oak Ridge National Laboratory.
Integrated Performance Views in Charm++: Projections meets TAU Scott Biersdorff Allen D. Malony Department Computer and Information Science University.
Using TAU on SiCortex Alan Morris, Aroon Nataraj Sameer Shende, Allen D. Malony University of Oregon {amorris, anataraj, sameer,
VAMPIR. Visualization and Analysis of MPI Resources Commercial tool from PALLAS GmbH VAMPIRtrace - MPI profiling library VAMPIR - trace visualization.
March 17, 2005 Roadmap of Upcoming Research, Features and Releases Bart Miller & Jeff Hollingsworth.
Profile Analysis with ParaProf Sameer Shende Performance Reseaerch Lab, University of Oregon
11th VI-HPS Tuning Workshop, April 2013, MdS, Saclay1 Hands-on exercise: NPB-MZ-MPI / BT VI-HPS Team.
Overview of CrayPat and Apprentice 2 Adam Leko UPC Group HCS Research Laboratory University of Florida Color encoding key: Blue: Information Red: Negative.
Martin Schulz Center for Applied Scientific Computing Lawrence Livermore National Laboratory Lawrence Livermore National Laboratory, P. O. Box 808, Livermore,
1 Performance Analysis with Vampir ZIH, Technische Universität Dresden.
© 2002 IBM Corporation Confidential | Date | Other Information, if necessary PTP 2.1 Release Review October 29, 2008.
Dynamic performance measurement control Dynamic event grouping Multiple configurable counters Selective instrumentation Application-Level Performance Access.
ASC Tri-Lab Code Development Tools Workshop Thursday, July 29, 2010 Lawrence Livermore National Laboratory, P. O. Box 808, Livermore, CA This work.
Hands-on: NPB-MZ-MPI / BT VI-HPS Team. 10th VI-HPS Tuning Workshop, October 2012, Garching Local Installation VI-HPS tools accessible through the.
1 Typical performance bottlenecks and how they can be found Bert Wesarg ZIH, Technische Universität Dresden.
Preparatory Research on Performance Tools for HPC HCS Research Laboratory University of Florida November 21, 2003.
Introduction to OpenMP Eric Aubanel Advanced Computational Research Laboratory Faculty of Computer Science, UNB Fredericton, New Brunswick.
SC’13: Hands-on Practical Hybrid Parallel Application Performance Engineering Introduction to Parallel Performance Engineering Markus Geimer, Brian Wylie.
Threaded Programming Lecture 2: Introduction to OpenMP.
21 Sep UPC Performance Analysis Tool: Status and Plans Professor Alan D. George, Principal Investigator Mr. Hung-Hsun Su, Sr. Research Assistant.
CSCS-USI Summer School (Lugano, 8-19 July 2013)1 Hands-on exercise: NPB-MZ-MPI / BT VI-HPS Team.
ESMF,WRF and ROMS. Purposes Not a tutorial Not a tutorial Educational and conceptual Educational and conceptual Relation to our work Relation to our work.
Shangkar Mayanglambam, Allen D. Malony, Matthew J. Sottile Computer and Information Science Department Performance.
Integrated Performance Views in Charm++: Projections meets TAU Scott Biersdorff Allen D. Malony Department Computer and Information Science University.
Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs Allen D. Malony, Scott Biersdorff, Sameer Shende, Heike Jagode†, Stanimire.
SC’13: Hands-on Practical Hybrid Parallel Application Performance Engineering Analysis report examination with CUBE Markus Geimer Jülich Supercomputing.
Score-P – A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir Markus Geimer 1), Peter Philippen 1) Andreas.
EU-Russia Call Dr. Panagiotis Tsarchopoulos Computing Systems ICT Programme European Commission.
Presented by Jack Dongarra University of Tennessee and Oak Ridge National Laboratory KOJAK and SCALASCA.
Introduction to HPC Debugging with Allinea DDT Nick Forrington
COMPARING CROSS-PLATFORM DEVELOPMENT APPROACHES FOR MOBILE APPLICATIONS Henning Heitkötter, Sebastian Hanschke and Tim A. Majchrzak Department of Information.
Performance Tool Integration in Programming Environments for GPU Acceleration: Experiences with TAU and HMPP Allen D. Malony1,2, Shangkar Mayanglambam1.
Introduction to the TAU Performance System®
Performance Technology for Scalable Parallel Systems
Performance Analysis and optimization of parallel applications
Tutorial Outline Welcome (Malony)
TAU integration with Score-P
Advanced TAU Commander
A configurable binary instrumenter
Presentation transcript:

Score-P – A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir Alexandru Calotoiu German Research School for Simulation Sciences (with content used with permission from tutorials by Markus Geimer 1), Peter Philippen 1) Andreas Knüpfer 2), Thomas William 2), Sameer Shende 3) 1) FZ Jülich, 2) ZIH TU Dresden, 3) University of Oregon

EU COST IC0805 ComplexHPC school, 3-7 June 2013, Uppsala Fragmentation of Tools Landscape Several performance tools co-exist Separate measurement systems and output formats Complementary features and overlapping functionality Redundant effort for development and maintenance Limited or expensive interoperability Complications for user experience, support, training Vampir VampirTrace OTF VampirTrace OTF Scalasca EPILOG / CUBE TAU TAU native formats Periscope Online measurement

EU COST IC0805 ComplexHPC school, 3-7 June 2013, Uppsala SILC Project Idea Start a community effort for a common infrastructure –Score-P instrumentation and measurement system –Common data formats OTF2 and CUBE4 Developer perspective: –Save manpower by sharing development resources –Invest in new analysis functionality and scalability –Save efforts for maintenance, testing, porting, support, training User perspective: –Single learning curve –Single installation, fewer version updates –Interoperability and data exchange SILC project funded by BMBF Close collaboration PRIMA project funded by DOE

EU COST IC0805 ComplexHPC school, 3-7 June 2013, Uppsala Partners Forschungszentrum Jülich, Germany German Research School for Simulation Sciences, Aachen, Germany Gesellschaft für numerische Simulation mbH Braunschweig, Germany RWTH Aachen, Germany Technische Universität Dresden, Germany Technische Universität München, Germany University of Oregon, Eugene, USA

EU COST IC0805 ComplexHPC school, 3-7 June 2013, Uppsala Score-P Functionality Provide typical functionality for HPC performance tools Support all fundamental concepts of partner’s tools Instrumentation (various methods) Flexible measurement without re-compilation: –Basic and advanced profile generation –Event trace recording –Online access to profiling data MPI, OpenMP, and hybrid parallelism (and serial) Enhanced functionality (OpenMP 3.0, CUDA, highly scalable I/O)

EU COST IC0805 ComplexHPC school, 3-7 June 2013, Uppsala Design Goals Functional requirements –Generation of call-path profiles and event traces –Using direct instrumentation, later also sampling –Recording time, visits, communication data, hardware counters –Access and reconfiguration also at runtime –Support for MPI, OpenMP, basic CUDA, and all combinations Later also OpenCL/HMPP/PTHREAD/… Non-functional requirements –Portability: all major HPC platforms –Scalability: petascale –Low measurement overhead –Easy and uniform installation through UNITE framework –Robustness –Open Source: New BSD License

EU COST IC0805 ComplexHPC school, 3-7 June 2013, Uppsala Score-P Architecture Instrumentation wrapper Application (MPI×OpenMP×CUDA) VampirScalascaPeriscopeTAU Compiler OPARI 2 POMP2 CUDA User PDT TAU Score-P measurement infrastructure Event traces (OTF2) Call-path profiles (CUBE4, TAU) Online interface Hardware counter (PAPI, rusage) PMPI MPI

EU COST IC0805 ComplexHPC school, 3-7 June 2013, Uppsala Future Features and Management Scalability to maximum available CPU core count Support for OpenCL, HMPP, PTHREAD Support for sampling, binary instrumentation Support for new programming models, e.g., PGAS Support for new architectures Ensure a single official release version at all times which will always work with the tools Allow experimental versions for new features or research Commitment to joint long-term cooperation

EU COST IC0805 ComplexHPC school, 3-7 June 2013, Uppsala Score-P User Instrumentation API Can be used to mark initialization, solver & other phases –Annotation macros ignored by default –Enabled with [--user] flag Appear as additional regions in analyses –Distinguishes performance of important phase from rest Can be of various type –E.g., function, loop, phase –See user manual for details Available for Fortran / C / C++

EU COST IC0805 ComplexHPC school, 3-7 June 2013, Uppsala Score-P User Instrumentation API (Fortran) Requires processing by the C preprocessor #include "scorep/SCOREP_User.inc" subroutine foo(…) ! Declarations SCOREP_USER_REGION_DEFINE( solve ) ! Some code… SCOREP_USER_REGION_BEGIN( solve, “ ", \ SCOREP_USER_REGION_TYPE_LOOP ) do i=1,100 [...] end do SCOREP_USER_REGION_END( solve ) ! Some more code… end subroutine

EU COST IC0805 ComplexHPC school, 3-7 June 2013, Uppsala Score-P User Instrumentation API (C/C++) #include "scorep/SCOREP_User.h" void foo() { /* Declarations */ SCOREP_USER_REGION_DEFINE( solve ) /* Some code… */ SCOREP_USER_REGION_BEGIN( solve, “ ", \ SCOREP_USER_REGION_TYPE_LOOP ) for (i = 0; i < 100; i++) { [...] } SCOREP_USER_REGION_END( solve ) /* Some more code… */ }

EU COST IC0805 ComplexHPC school, 3-7 June 2013, Uppsala Score-P User Instrumentation API (C++) #include "scorep/SCOREP_User.h" void foo() { // Declarations // Some code… { SCOREP_USER_REGION( “ ", SCOREP_USER_REGION_TYPE_LOOP ) for (i = 0; i < 100; i++) { [...] } // Some more code… }

EU COST IC0805 ComplexHPC school, 3-7 June 2013, Uppsala Score-P Measurement Control API Can be used to temporarily disable measurement for certain intervals –Annotation macros ignored by default –Enabled with [--user] flag #include “scorep/SCOREP_User.inc” subroutine foo(…) ! Some code… SCOREP_RECORDING_OFF() ! Loop will not be measured do i=1,100 [...] end do SCOREP_RECORDING_ON() ! Some more code… end subroutine #include “scorep/SCOREP_User.h” void foo(…) { /* Some code… */ SCOREP_RECORDING_OFF() /* Loop will not be measured */ for (i = 0; i < 100; i++) { [...] } SCOREP_RECORDING_ON() /* Some more code… */ } Fortran (requires C preprocessor) C / C++

EU COST IC0805 ComplexHPC school, 3-7 June 2013, Uppsala Further Information Score-P –Community instrumentation & measurement infrastructure Instrumentation (various methods) Basic and advanced profile generation Event trace recording Online access to profiling data –Available under New BSD open-source license –Documentation & Sources: –User guide also part of installation: /share/doc/scorep/{pdf,html}/ –Contact: –Bugs: