Paradyn Week – April 14, 2004 – Madison, WI DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Bernd Mohr Forschungszentrum.

Slides:



Advertisements
Similar presentations
Performance Analysis Tools for High-Performance Computing Daniel Becker
Advertisements

Machine Learning-based Autotuning with TAU and Active Harmony Nicholas Chaimov University of Oregon Paradyn Week 2013 April 29, 2013.
NewsFlash!! Earth Simulator no longer #1. In slightly less earthshaking news… Homework #1 due date postponed to 10/11.
1 Lawrence Livermore National Laboratory By Chunhua (Leo) Liao, Stephen Guzik, Dan Quinlan A node-level programming model framework for exascale computing*
The Path to Multi-core Tools Paul Petersen. Multi-coreToolsThePathTo 2 Outline Motivation Where are we now What is easy to do next What is missing.
University of Houston Open Source Software Support for the OpenMP Runtime API for Profiling Oscar Hernandez, Ramachandra Nanjegowda, Van Bui, Richard Krufin.
1 Tuesday, November 07, 2006 “If anything can go wrong, it will.” -Murphy’s Law.
Computer Architecture II 1 Computer architecture II Programming: POSIX Threads OpenMP.
OmpP: A Profiling Tool for OpenMP Karl Fürlinger Michael Gerndt {fuerling, Technische Universität München.
Sameer Shende Department of Computer and Information Science Neuro Informatics Center University of Oregon Tool Interoperability.
On the Integration and Use of OpenMP Performance Tools in the SPEC OMP2001 Benchmarks Bernd Mohr 1, Allen D. Malony 2, Rudi Eigenmann 3 1 Forschungszentrum.
1 ITCS4145/5145, Parallel Programming B. Wilkinson Feb 21, 2012 Programming with Shared Memory Introduction to OpenMP.
Towards a Performance Tool Interface for OpenMP: An Approach Based on Directive Rewriting Bernd Mohr, Felix Wolf Forschungszentrum Jülich John von Neumann.
OpenMPI Majdi Baddourah
A Very Short Introduction to OpenMP Basile Schaeli EPFL – I&C – LSP Vincent Keller EPFL – STI – LIN.
On the Integration and Use of OpenMP Performance Tools in the SPEC OMP2001 Benchmarks Rudi Eigenmann Department of Electrical and Computer Engineering.
Advanced Computing Technology Center © 2005 IBM Corporation The IBM High Performance Computing Toolkit Guojing Cong.
1 Parallel Programming With OpenMP. 2 Contents  Overview of Parallel Programming & OpenMP  Difference between OpenMP & MPI  OpenMP Programming Model.
Budapest, November st ALADIN maintenance and phasing workshop Short introduction to OpenMP Jure Jerman, Environmental Agency of Slovenia.
Programming with Shared Memory Introduction to OpenMP
Shared Memory Parallelization Outline What is shared memory parallelization? OpenMP Fractal Example False Sharing Variable scoping Examples on sharing.
CC02 – Parallel Programming Using OpenMP 1 of 25 PhUSE 2011 Aniruddha Deshmukh Cytel Inc.
Parallel Programming in Java with Shared Memory Directives.
OMPi: A portable C compiler for OpenMP V2.0 Elias Leontiadis George Tzoumas Vassilios V. Dimakopoulos University of Ioannina.
Slides Prepared from the CI-Tutor Courses at NCSA By S. Masoud Sadjadi School of Computing and Information Sciences Florida.
Chapter 17 Shared-Memory Programming. Introduction OpenMP is an application programming interface (API) for parallel programming on multiprocessors. It.
OpenMP China MCP.
UAB Dynamic Monitoring and Tuning in Multicluster Environment Genaro Costa, Anna Morajko, Paola Caymes Scutari, Tomàs Margalef and Emilio Luque Universitat.
ICOM 5995: Performance Instrumentation and Visualization for High Performance Computer Systems Lecture 7 October 16, 2002 Nayda G. Santiago.
Adventures in Mastering the Use of Performance Evaluation Tools Manuel Ríos Morales ICOM 5995 December 4, 2002.
OpenMP Blue Waters Undergraduate Petascale Education Program May 29 – June
Score-P – A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir Alexandru Calotoiu German Research School for.
Presented by KOJAK and SCALASCA Jack Dongarra, Shirley Moore, Farzona Pulatova, and Fengguang Song University of Tennessee and Oak Ridge National Laboratory.
Support for Debugging Automatically Parallelized Programs Robert Hood Gabriele Jost CSC/MRJ Technology Solutions NASA.
VAMPIR. Visualization and Analysis of MPI Resources Commercial tool from PALLAS GmbH VAMPIRtrace - MPI profiling library VAMPIR - trace visualization.
OpenMP – Introduction* *UHEM yaz çalıştayı notlarından derlenmiştir. (uhem.itu.edu.tr)
Deep Computing © 2008 IBM Corporation The IBM High Performance Computing Toolkit Advanced Computing Technology Center
CS 838: Pervasive Parallelism Introduction to OpenMP Copyright 2005 Mark D. Hill University of Wisconsin-Madison Slides are derived from online references.
OpenMP fundamentials Nikita Panov
Threaded Programming Lecture 4: Work sharing directives.
Portable Parallel Performance Tools Shirley Browne, UTK Clay Breshears, CEWES MSRC Jan 27-28, 1998.
Debugging parallel programs. Breakpoint debugging Probably the most widely familiar method of debugging programs is breakpoint debugging. In this method,
Introduction to OpenMP Eric Aubanel Advanced Computational Research Laboratory Faculty of Computer Science, UNB Fredericton, New Brunswick.
Shared Memory Parallelism - OpenMP Sathish Vadhiyar Credits/Sources: OpenMP C/C++ standard (openmp.org) OpenMP tutorial (
David Adams ATLAS DIAL: Distributed Interactive Analysis of Large datasets David Adams BNL August 5, 2002 BNL OMEGA talk.
MPI and OpenMP.
Threaded Programming Lecture 2: Introduction to OpenMP.
1 Using PMPI routines l PMPI allows selective replacement of MPI routines at link time (no need to recompile) l Some libraries already make use of PMPI.
21 Sep UPC Performance Analysis Tool: Status and Plans Professor Alan D. George, Principal Investigator Mr. Hung-Hsun Su, Sr. Research Assistant.
Performane Analyzer Performance Analysis and Visualization of Large-Scale Uintah Simulations Kai Li, Allen D. Malony, Sameer Shende, Robert Bell Performance.
3/12/2013Computer Engg, IIT(BHU)1 OpenMP-1. OpenMP is a portable, multiprocessing API for shared memory computers OpenMP is not a “language” Instead,
Other Tools HPC Code Development Tools July 29, 2010 Sue Kelly Sandia is a multiprogram laboratory operated by Sandia Corporation, a.
CEPBA-Tools experiences with MRNet and Dyninst Judit Gimenez, German Llort, Harald Servat
A Dynamic Tracing Mechanism For Performance Analysis of OpenMP Applications - Caubet, Gimenez, Labarta, DeRose, Vetter (WOMPAT 2001) - Presented by Anita.
Dynamic Tuning of Parallel Programs with DynInst Anna Morajko, Tomàs Margalef, Emilio Luque Universitat Autònoma de Barcelona Paradyn/Condor Week, March.
Presented by Jack Dongarra University of Tennessee and Oak Ridge National Laboratory KOJAK and SCALASCA.
Tuning Threaded Code with Intel® Parallel Amplifier.
NPACI Parallel Computing Institute August 19-23, 2002 San Diego Supercomputing Center S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED.
Introduction to OpenMP
Shared Memory Parallelism - OpenMP
TAU integration with Score-P
Computer Engg, IIT(BHU)
Supporting OpenMP and other Higher Languages in Dyninst
Introduction to OpenMP
Computer Science Department
A configurable binary instrumenter
Programming with Shared Memory Introduction to OpenMP
Allen D. Malony Computer & Information Science Department
Introduction to OpenMP
Presentation transcript:

Paradyn Week – April 14, 2004 – Madison, WI DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Bernd Mohr Forschungszentrum Jülich * Work done while authors were at IBM Research * Seetharami Seelam University of Texas El Paso Luiz DeRose Cray Inc. *

DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Luiz DeRose 2 Outline  What is POMP  What is DPCL  IBM compiler and run-time library features that makes dPOMP possible  Implementation  POMP not supported features (and why)  Probe libraries  POMPROF  KOJAK  Conclusions

DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Luiz DeRose 3 “Standard” OpenMP Monitoring API?  Problem:  OpenMP (unlike MPI) does not define standard monitoring interface  OpenMP is defined mainly by directives/pragmas Solution:  POMP: OpenMP Monitoring Interface  Joint Development Forschungszentrum Jülich University of Oregon  Presented at EWOMP’01, LACSI’01 and SC’01 “The Journal of Supercomputing”, 23, Aug

DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Luiz DeRose 4 What is POMP?  Portable cross-platform/cross-language API to simplify the design and implementation of OpenMP tools  POMP was motivated by the MPI profiling interface (PMPI)  PMPI allows selective replacement of MPI routines at link time Used by most MPI performance tools VampirTrace MP-Profiler User Program Call MPI_Bcast Call MPI_Send MPI Library MPI_Bcast PMPI_Send MPI_Send MPI Library MPI_Bcast PMPI_Send MPI_Send Profiling Library MPI_Send

DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Luiz DeRose 5 A Brief History of OpenMP Instrumentation  POMP1 OpenMP performance monitoring interface  Forschungszentrum Jülich, University of Oregon  Published at "The Journal of Supercomputing", Vol. 23,  European IST Project INTONE  Development of OpenMP tools (includes Monitoring interface)  Pallas, CEPBA, Royal Inst. Of Technology, Tech. Univ. Dresden   Intel KAI Software Laboratory (KSL) - POMP  Development of OpenMP monitoring interface inside ASCI  Based on POMP, but further developed in other directions  Joint proposal presented at EWOMP’02  POMP2 == POMP

DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Luiz DeRose 6 POMP Proposal  Three groups of events  OpenMP constructs and directives/pragmas Enter/Exit around each OpenMP construct Begin/End around associated body Special case for parallel loops: ChunkBegin/End, IterBegin/End, or IterEvent instead of Begin/End “Single” events for small constructs like atomic or flush  OpenMP API calls Enter/Exit events around omp_set_*_lock() functions “single” events for all API functions  User functions and regions  Allows application programmers to specify and control amount of instrumentation

DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Luiz DeRose 7 1: int main() { 2: int id; 3: 4: #pragma omp parallel private(id) 5: { 6: id = omp_get_thread_num(); 7: printf("hello from %d\n", id); 8: } 9: } Example: Standard Instrumentation 1: int main() { 2: int id; 3: 4: #pragma omp parallel private(id) 5: { 6: id = omp_get_thread_num(); 7: printf("hello from %d\n", id); 8: } 9: } *** POMP_Init(); *** POMP_Finalize(); *** { POMP_handle_t pomp_hd1 = 0; *** int32 pomp_tid = omp_get_thread_num(); *** int32 pomp_tid = omp_get_thread_num(); *** } *** POMP_Parallel_enter(&pomp_hd1, pomp_tid, -1, 1, *** "49*type=pregion*file=demo.c*slines=4,4*elines=8,8**"); *** POMP_Parallel_begin(pomp_hd1, pomp_tid); *** POMP_Parallel_end(pomp_hd1, pomp_tid); *** POMP_Parallel_exit(pomp_hd1, pomp_tid);

DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Luiz DeRose 8 Dynamic Performance Monitoring Interface for OpenMP  Collaboration with Forschungszentrum Jülich  Motivation:  POMP under review by the OpenMP ARB! May take too long to be implemented (if accepted)  Approach  A POMP implementation based on dynamic probes Built on top of DPCL Modifies the binary with performance instrumentation No source code or re-compilation required

DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Luiz DeRose 9 What Is DPCL?  IBM’s Dynamic Probe Class Library  Based on DynInst and Paradyn  Allows tools to insert data, functions, and code patches (probes) into a program dynamically  Call site  Call entry  Call exit  Probes can collect and report program information, program state, or modify the program execution  Probes may be placed at specific locations in the program and can be activated  Whenever execution reaches that location  By expiration of a timer  Exactly once

DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Luiz DeRose 10 A() { } OMP loop Source code main() { } A() OMP parallel OMP end parallel DPOMP Instrumentation  The IBM compiler and run-time library run-time library Compiler generated A() { } xlf_Par main() { } A() master thread { } xlf_DoPar all threads do I=start,end loop body enddo { } POMP_Parallel_enter POMP_Parallel_exit POMP_Parallel_begin POMP_Parallel_end POMP_Loop_enter POMP_Loop_exit POMP_Loop_chunk_begin POMP_Loop_chunk_end POMP_Functionl_enter POMP_Functionl_exit

DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Luiz DeRose 11 DPOMP Usage % dpomp  Input parameters  OpenMP application (or mixed-mode)  POMP compliant monitoring library  List of user functions to instrument (optional) dpomp [-f function.lst] libpomp a.out

DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Luiz DeRose 12 DPOMP Instrumentation  Amount of instrumentation can be controlled  By the tool builder Set of POMP calls available in the monitoring library and/or  By the user Environment variables  Events instrumented by default:  All OpenMP constructs  All user functions called in the main program  All MPI Calls  Once the instrumentation is finished, the modified program is executed

DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Luiz DeRose 13 Limitations  63 out of 68 POMP events supported !  Limitations due to compiler issues  POMP_Loop_iter_(begin, or end, or event)  POMP_Implicit_barrier_(end, or exit)  OMP Parallel Loop NOT = OMP Parallel / OMP Loop  Compile Time Context (CTC) hasFirstPrivate, hasLastPrivate, hasNowait, hasCopyin, schedule, hasOrdered, and hasCopypriv not available  Limitations due to DPCL issues  Loop iteration values (init, final, incr, chunk)  Other Limitations  C++ not support

DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Luiz DeRose 14 POMP Profiler (POMPROF)  Generates a detailed profile describing overheads and time spent by each thread in three key regions of the parallel application:  Parallel regions  OpenMP loops inside a parallel region  User defined functions  Profile data is presented in the form of an XML file that can be visualized with PeekPerf

DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Luiz DeRose 15 POMP Profiler (pomprof)

DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Luiz DeRose 16 POMP Profiler (II)

DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Luiz DeRose 17 KOJAK POMP Library (Forschungszentrum Juelich)  POMP monitoring library which generates EPILOG event traces  Processed by KOJAK’s automatic event tracer analyzer EXPERT Performance Property: Which type of behavior caused the problem? Call Tree: Where in the source code? In which context? Location: How is the problem distributed across the machine? Color Scale: How severe is the problem?

DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Luiz DeRose 18 EPILOG Trace Converted to VTF3 (FZ Juelich)  EPILOG-to-VTF3  Maps OpenMP constructs into VAMPIR symbols and activities

DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Luiz DeRose 19 Conclusions  DPCL based implementation of the POMP performance monitoring interface for OpenMP  Easy to use  Two POMP Libraries  KOJAK POMP Library  POMP Profiler Library or build your own library