Score-P – A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir
Markus Geimer, Bert Wesarg, Brian Wylie

Presentation transcript:

1 Score-P – A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir
Markus Geimer 2), Bert Wesarg 1), Brian Wylie 2)
With contributions from Andreas Knüpfer 1) and Christian Rössel 2)
1) ZIH, TU Dresden; 2) FZ Jülich

2 Fragmentation of Tools Landscape
Several performance tools co-exist, each with its own measurement system and output format:
–Vampir: VampirTrace / OTF
–Scalasca: EPILOG / CUBE
–TAU: TAU native formats
–Periscope: online measurement
Consequences:
–Complementary features and overlapping functionality
–Redundant effort for development and maintenance
–Limited or expensive interoperability
–Complications for user experience, support, and training

3 SILC Project Idea
Start a community effort for a common infrastructure:
–Score-P instrumentation and measurement system
–Common data formats OTF2 and CUBE4
Developer perspective:
–Save manpower by sharing development resources
–Invest in new analysis functionality and scalability
–Save effort on maintenance, testing, porting, support, and training
User perspective:
–Single learning curve
–Single installation, fewer version updates
–Interoperability and data exchange
SILC project funded by BMBF; close collaboration with the PRIMA project funded by the US DOE

4 Partners
Forschungszentrum Jülich, Germany
German Research School for Simulation Sciences, Aachen, Germany
Gesellschaft für numerische Simulation mbH, Braunschweig, Germany
RWTH Aachen, Germany
Technische Universität Dresden, Germany
Technische Universität München, Germany
University of Oregon, Eugene, USA

5 Score-P Functionality
Provides typical functionality for HPC performance tools
Supports all fundamental concepts of the partners' tools
Instrumentation (various methods)
Flexible measurement without re-compilation:
–Basic and advanced profile generation
–Event trace recording
–Online access to profiling data
MPI, OpenMP, and hybrid parallelism (as well as serial execution)
Enhanced functionality (OpenMP 3.0, CUDA, highly scalable I/O)

6 Design Goals
Functional requirements:
–Generation of call-path profiles and event traces
–Direct instrumentation initially, later also sampling
–Recording of time, visits, communication, hardware counters, and metrics from custom sources
–Access and reconfiguration also at runtime
–Support for MPI, OpenMP, CUDA, and all combinations; later also Pthreads/HMPP/OmpSs/OpenCL/…
Non-functional requirements:
–Portability: all major HPC platforms
–Scalability: petascale
–Low measurement overhead
–Easy and uniform installation through the UNITE framework
–Robustness
–Open source: New BSD License

7 Score-P Architecture
[Architecture diagram] The instrumentation wrapper hooks into the application (MPI × OpenMP × CUDA) through several adapters: compiler instrumentation, OPARI2 (POMP2) for OpenMP, CUDA, manual user instrumentation, PDT source instrumentation (TAU), and the PMPI interface for MPI. All events feed the Score-P measurement infrastructure, which also reads hardware counters (PAPI, rusage) and produces event traces (OTF2), call-path profiles (CUBE4, TAU), and an online interface, consumed by Vampir, Scalasca, TAU, and Periscope, respectively.
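To illustrate the "User" adapter in the diagram, here is a minimal sketch of manual region instrumentation with the Score-P user API (function and handle names are placeholders; the file must be compiled with the wrapper's --user option):

    #include <scorep/SCOREP_User.h>

    void compute_step( void )
    {
        /* Register a handle for this source region, then mark enter/exit */
        SCOREP_USER_REGION_DEFINE( step_handle )
        SCOREP_USER_REGION_BEGIN( step_handle, "compute_step",
                                  SCOREP_USER_REGION_TYPE_COMMON )

        /* ... actual computation ... */

        SCOREP_USER_REGION_END( step_handle )
    }

Built with, e.g., scorep --user mpicc -c compute.c, the region appears alongside automatically instrumented functions in profiles and traces.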

8 Future Features and Management
Scalability to maximum available CPU core count
Support for Pthreads, HMPP, OmpSs, OpenCL
Support for sampling and binary instrumentation
Support for new programming models, e.g., PGAS (GASPI, OpenSHMEM, UPC)
Support for new architectures
Management:
–Ensure a single official release version at all times that always works with the tools
–Allow experimental versions for new features or research
–Commitment to joint long-term cooperation

9 Workflow Step 1: Instrumentation
Prefix compile/link commands, e.g.,
  % scorep mpicc -c foo.c
or, more conveniently, in the Makefile:
  MPICC = $(PREP) mpicc
  % make PREP="scorep [options]"
Customization via options, e.g., --pdt, --user; see % scorep --help for all options
Automatic paradigm detection (serial/OpenMP/MPI/hybrid)
Manual approach: get libraries and includes via scorep-config
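Putting these pieces together, a complete instrumented build of a hybrid code might look as follows (file and target names are illustrative):

    # compile and link through the Score-P wrapper; the paradigm
    # (serial/OpenMP/MPI/hybrid) is detected automatically
    % scorep mpicc -fopenmp -c bt.c
    % scorep mpicc -fopenmp -o bt.exe bt.o

    # enable additional adapters via wrapper options, e.g., user regions
    % scorep --user mpicc -fopenmp -c bt.c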

10 Workflow Step 2: Run-Time Recording
Just run your instrumented program!
Customize via environment variables:
–Switch between profiling/tracing/online (without recompilation)
–Select MPI groups to record, e.g., P2P, COLL, …
–Specify total measurement memory
–Trace output options, e.g., SION
–Hardware counters (PAPI, rusage)
–Get help from the scorep-info tool
Customize via files:
–Include/exclude regions by name/wildcards from recording (filtering)
–Restrict trace recording to specified executions of a region (selective tracing)
Data is written to a uniquely named directory
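As a sketch of such a customized run (the values and process count are illustrative; the variable names follow the Score-P documentation):

    # list all configuration variables
    % scorep-info config-vars --full

    # switch from the default profile to trace recording
    % export SCOREP_ENABLE_PROFILING=false
    % export SCOREP_ENABLE_TRACING=true

    # restrict MPI recording to point-to-point and collective operations
    % export SCOREP_MPI_ENABLE_GROUPS=P2P,COLL

    # enlarge the per-process measurement memory
    % export SCOREP_TOTAL_MEMORY=500M

    # apply a filter file and run as usual; results land in a
    # uniquely named scorep-* experiment directory
    % export SCOREP_FILTERING_FILE=scorep.filt
    % mpirun -np 16 ./bt.exe

A filter file excludes regions by name or wildcard, e.g.:

    # scorep.filt – exclude small helper routines from recording
    SCOREP_REGION_NAMES_BEGIN
      EXCLUDE *helper* binvcrhs*
    SCOREP_REGION_NAMES_END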

11 Conclusions
The common measurement part is a community effort:
–Resources freed up can be invested in analysis
Unified file formats (OTF2, CUBE4), open for adoption
Online access interface, open for adoption
Scalability, flexibility, and overhead limitations addressed
Easy to extend due to the layered architecture
Robust due to extensive and continuous testing
Long-term commitment:
–Partners have an extensive history of collaboration

12 Score-P Team
Dieter an Mey, Scott Biersdorff, Kai Diethelm, Dominic Eschweiler, Markus Geimer, Michael Gerndt, Houssam Haitof, Rene Jäkel, Koutaiba Kassem, Andreas Knüpfer, Daniel Lorenz, Suzanne Millstein, Bernd Mohr, Yury Oleynik, Peter Philippen, Christian Rössel, Pavel Saviankou, Dirk Schmidl, Sameer Shende, Wyatt Spear, Ronny Tschüter, Michael Wagner, Bert Wesarg, Felix Wolf, Brian Wylie