SC'13: Hands-on Practical Hybrid Parallel Application Performance Engineering
Hands-on example code: NPB-MZ-MPI / BT (on Live-ISO/DVD)
VI-HPS Team

NPB-MZ-MPI suite

The NAS Parallel Benchmark suite (MPI+OpenMP version)
– Available from the NASA Advanced Supercomputing (NAS) division website
– 3 benchmarks in Fortran77
– Configurable for various sizes & classes

Move into the NPB3.3-MZ-MPI root directory

Subdirectories contain source code for each benchmark
– plus additional configuration and common code

The provided distribution has already been configured for the tutorial, so it is ready to "make" one or more of the benchmarks and install them into a (tool-specific) "bin" subdirectory

  % cd Tutorial; ls
  bin/    common/  jobscript/  Makefile  README.install   SP-MZ/
  BT-MZ/  config/  LU-MZ/      README    README.tutorial  sys/
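The README files visible in the listing describe the setup; reviewing them before building is a quick way to confirm the tutorial configuration (a sketch, commands only; file contents are not reproduced here):

  % less README.tutorial      # tutorial-specific notes
  % less README.install       # generic NPB build/installation notes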

Building an NPB-MZ-MPI benchmark

Type "make" for instructions

  % make
  ===========================================
  =      NAS PARALLEL BENCHMARKS 3.3        =
  =      MPI+OpenMP Multi-Zone Versions     =
  =      F77                                =
  ===========================================

  To make a NAS multi-zone benchmark type

        make <benchmark-name> CLASS=<class> NPROCS=<nprocs>

  where <benchmark-name> is "bt-mz", "lu-mz", or "sp-mz"
        <class>          is "S", "W", "A" through "F"
        <nprocs>         is number of processes
  [...]
  ***************************************************************
  * Custom build configuration is specified in config/make.def  *
  * Suggested tutorial exercise configuration for LiveISO/DVD:  *
  *       make bt-mz CLASS=W NPROCS=4                           *
  ***************************************************************

Hint: the recommended build configuration is available via
  % make suite
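If you only need the tutorial exercise, the suggested configuration can be built directly; the loop below is a sketch that builds all three multi-zone benchmarks with the same class and process count (the exact set built by "make suite" is defined in config/suite.def, which is not shown on the slide):

  % make bt-mz CLASS=W NPROCS=4                                     # tutorial exercise only
  % for b in bt-mz lu-mz sp-mz; do make $b CLASS=W NPROCS=4; done   # all three benchmarks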

Building an NPB-MZ-MPI benchmark

Specify the benchmark configuration
– benchmark name: bt-mz, lu-mz, sp-mz
– the number of MPI processes: NPROCS=4
– the benchmark class (S, W, A, B, C, D, E): CLASS=W

  % make bt-mz CLASS=W NPROCS=4
  cd BT-MZ; make CLASS=W NPROCS=4 VERSION=
  make: Entering directory 'BT-MZ'
  cd ../sys; cc -o setparams setparams.c
  ../sys/setparams bt-mz 4 W
  mpif77 -c -O3 -fopenmp bt.f
  [...]
  cd ../common; mpif77 -c -O3 -fopenmp timers.f
  mpif77 -O3 -fopenmp -o ../bin/bt-mz_W.4 \
    bt.o initialize.o exact_solution.o exact_rhs.o set_constants.o \
    adi.o rhs.o zone_setup.o x_solve.o y_solve.o exch_qbc.o \
    solve_subs.o z_solve.o add.o error.o verify.o mpi_setup.o \
    ../common/print_results.o ../common/timers.o
  Built executable ../bin/bt-mz_W.4
  make: Leaving directory 'BT-MZ'
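A quick sanity check after the build (a sketch, assuming the Linux shell of the Live-ISO; the path follows the "Built executable" line above when run from the NPB3.3-MZ-MPI root directory):

  % ls -l bin/bt-mz_W.4       # confirm the executable was installed
  % file bin/bt-mz_W.4        # should be reported as an executable binary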

NPB-MZ-MPI / BT (Block Tridiagonal solver)

What does it do?
– Solves a discretized version of the unsteady, compressible Navier-Stokes equations in three spatial dimensions
– Performs 200 time-steps on a regular 3-dimensional grid

Implemented in 20 or so Fortran77 source modules

Uses MPI & OpenMP in combination
– 4 processes with 4 threads each should be reasonable
  (don't expect to see speed-up when run on a laptop!)
– bt-mz_W.4 should run in around 5 to 12 seconds on a laptop
– bt-mz_B.4 is more suitable for dedicated HPC compute nodes
  (each class step takes around 10-15x longer)
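Before settling on a process/thread combination it can help to check how many cores the machine actually offers; 4 processes with 4 threads each request 16 threads in total, so oversubscription is likely on a laptop (a sketch, assuming a Linux environment such as the Live-ISO):

  % nproc                                               # number of processing units available
  % lscpu | grep -E '^(CPU\(s\)|Thread|Core|Socket)'    # cores vs. hardware threads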

NPB-MZ-MPI / BT reference execution

Launch as a hybrid MPI+OpenMP application

  % cd bin
  % OMP_NUM_THREADS=4 mpiexec -np 4 ./bt-mz_W.4
  NAS Parallel Benchmarks (NPB3.3-MZ-MPI) - BT-MZ MPI+OpenMP Benchmark
  Number of zones:  4 x  4
  Iterations: 200    dt:
  Number of active processes:  4
  Total number of threads:  16  ( 4.0 threads/process)
  Time step   1
  Time step  20
  Time step  40
  [...]
  Time step 160
  Time step 180
  Time step 200
  Verification Successful
  BT-MZ Benchmark Completed.
  Time in seconds = 5.57

Hint: save the benchmark output (or note the run time) to be able to refer to it later

Alternatively execute the script:
  % sh ../jobscript/ISO/run.sh
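One simple way to keep the reference output for later comparison with instrumented runs (a sketch, not part of the provided job scripts; the output file name is arbitrary):

  % OMP_NUM_THREADS=4 mpiexec -np 4 ./bt-mz_W.4 | tee bt-mz_W.4.reference.out
  % grep "Time in seconds" bt-mz_W.4.reference.out    # note the uninstrumented run time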

Tutorial exercise steps

The tutorial steps are similar and repeated for each tool

Edit config/make.def to adjust the build configuration
– Modify the specification of the compiler/linker: MPIF77

Make clean and build the new tool-specific executable

Change to the directory containing the new executable before running it with the desired tool configuration

  % make clean
  % make bt-mz CLASS=W NPROCS=4
  Built executable ../bin.$(TOOL)/bt-mz_W.4
  % cd bin.$(TOOL)
  % export …
  % OMP_NUM_THREADS=4 mpiexec -np 4 ./bt-mz_W.4
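As a concrete illustration only (a sketch assuming the Score-P wrapper has been selected in config/make.def and that the tutorial's make.def then installs into a bin.scorep directory; the experiment directory name is an arbitrary example):

  % make clean
  % make bt-mz CLASS=W NPROCS=4          # now built with MPIF77 = scorep --user mpif77
  % cd bin.scorep
  % export SCOREP_EXPERIMENT_DIRECTORY=scorep_bt-mz_W_4x4_sum
  % OMP_NUM_THREADS=4 mpiexec -np 4 ./bt-mz_W.4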

NPB-MZ-MPI / BT: config/make.def

  # SITE- AND/OR PLATFORM-SPECIFIC DEFINITIONS
  #
  # Items in this file may need to be changed for each platform.
  #

  #
  # The Fortran compiler used for MPI programs
  #
  MPIF77 = mpif77

  # Alternative variants to perform instrumentation
  #MPIF77 = psc_instrument -u user,mpi,omp -s ${PROGRAM}.sir mpif77
  #MPIF77 = tau_f90.sh
  #MPIF77 = scalasca -instrument mpif77
  #MPIF77 = vtf77 -vt:hyb -vt:f77 mpif77
  #MPIF77 = scorep --user mpif77

  # PREP is a generic preposition macro for instrumentation preparation
  #MPIF77 = $(PREP) mpif77

  # This links MPI Fortran programs; usually the same as ${MPIF77}
  FLINK = $(MPIF77)
  ...

The default setting (MPIF77 = mpif77) performs no instrumentation.

Hint: uncomment one of the alternative compiler wrappers to perform instrumentation
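The $(PREP) variant is a convenient middle ground: uncomment it once and choose the instrumenter on the make command line instead of editing the file for every tool (a sketch; it relies on GNU make passing command-line variables through to the sub-make, and the Score-P wrapper is just one example):

  # in config/make.def, uncomment the generic variant:
  #   MPIF77 = $(PREP) mpif77

  % make bt-mz CLASS=W NPROCS=4 PREP="scorep --user"   # instrumented build
  % make bt-mz CLASS=W NPROCS=4                        # PREP left empty: uninstrumented default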