Supporting OpenMP and other Higher Languages in Dyninst

Nick Rutar, University of Maryland

Parallel Language Support for Dyninst
- OpenMP and other parallel languages are becoming more popular
  - Advantageous to parse and instrument them
  - New languages are on the horizon
- Want the API to be extensible for adding languages
- Start with OpenMP
  - Unless otherwise specified, this talk covers OpenMP
  - UPC, Titanium, Fortress, X10, and Chapel are planned for the future

OpenMP Parallel & Work-Sharing Constructs
- Parallel: the main construct
- Do/for: loop parallelism
- Sections: non-iterative work sharing
- Single: executed by only one thread in the team
- Combined parallel & work-sharing constructs: Parallel Do, Parallel Sections
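
As a quick illustration of these constructs, here is a minimal, self-contained OpenMP program in C (not from the talk; any OpenMP-aware compiler will build it) that exercises each of the directives named above:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int i, sum = 0;

        /* parallel: the main construct -- spawns a team of threads */
        #pragma omp parallel
        {
            /* for: loop iterations are divided among the team */
            #pragma omp for
            for (i = 0; i < 8; i++)
                printf("iteration %d on thread %d\n", i, omp_get_thread_num());

            /* sections: non-iterative work sharing */
            #pragma omp sections
            {
                #pragma omp section
                printf("section A on thread %d\n", omp_get_thread_num());
                #pragma omp section
                printf("section B on thread %d\n", omp_get_thread_num());
            }

            /* single: executed by only one thread in the team */
            #pragma omp single
            printf("single on thread %d\n", omp_get_thread_num());
        }

        /* combined form: parallel + for in one directive */
        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < 100; i++)
            sum += i;
        printf("sum = %d\n", sum);
        return 0;
    }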

OpenMP Synchronization Constructs
- Master: only the master thread executes the block
- Critical: area of code executed by one thread at a time
- Barrier: all threads must reach the point before execution continues
- Atomic: a specific memory location is updated atomically
- Flush: sync point that must have a consistent view of memory
- Ordered: iterations in the loop are executed in the same order as in the serial case
  - Has to be associated with a for directive
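
A companion example (again not from the talk) showing each synchronization construct in source form; note that ordered appears both as a clause on the for directive and as the construct inside the loop:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int counter = 0, atomic_counter = 0, i;

        #pragma omp parallel
        {
            /* master: only the master thread runs this block */
            #pragma omp master
            printf("master is thread %d\n", omp_get_thread_num());

            /* barrier: every thread must arrive before any proceeds */
            #pragma omp barrier

            /* critical: one thread at a time executes the block */
            #pragma omp critical
            counter++;

            /* atomic: a single memory update performed atomically */
            #pragma omp atomic
            atomic_counter++;

            /* flush: enforce a consistent view of memory here */
            #pragma omp flush
        }

        /* ordered: must be associated with a for directive; the
           ordered block executes in serial loop order */
        #pragma omp parallel for ordered
        for (i = 0; i < 4; i++) {
            #pragma omp ordered
            printf("ordered iteration %d\n", i);
        }

        printf("counter=%d atomic_counter=%d\n", counter, atomic_counter);
        return 0;
    }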

Parallel/Work-Sharing Traits (Power)
- Sets up parallelism with:
  - A call to _xlsmpParSelf
  - Register bookkeeping
  - Setting up parameters for parallel behavior
  - A call to _xlsmp*_TPO, which then invokes the parallel regions discussed below
- Actual parallel regions are stored in outlined functions
  - Named <CallingFunction>@OL@<Var++>
- Parallel functions (regions) can call out to nested constructs, e.g. Parallel, For, Ordered
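
To make the outlining concrete, here is a rough conceptual sketch of the transformation described above. This is not actual XL compiler output; the real runtime signatures are undocumented, so arguments are elided, and since @ is not legal in a C identifier the outlined function foo@OL@1 is written as foo_OL_1:

    /* Original source: */
    void foo(void) {
        #pragma omp parallel
        { /* body */ }
    }

    /* Conceptually, the compiler emits something like: */
    void foo_OL_1(void);   /* outlined region, named "foo@OL@1" in the binary */

    void foo(void) {
        _xlsmpParSelf(/* ... */);            /* thread/bookkeeping setup */
        /* ... set up parameters describing the parallel behavior ... */
        _xlsmpParRegionSetup_TPO(/* ..., */ foo_OL_1 /*, ... */);
                                             /* runtime invokes the region */
    }

    void foo_OL_1(void) {
        /* body of the parallel region, executed by each thread */
    }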

Associated Setup Functions (Power)
- Parallel: _xlsmpParRegionSetup_TPO
- Do/for: _xlsmpWSDoSetup_TPO
- Sections: _xlsmpWSSectSetup_TPO
- Single: _xlsmpSingleSetup_TPO
- Parallel Do: _xlsmpParallelDoSetup_TPO
- Parallel Sections: -
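
Since these setup functions are ordinary symbols in the runtime, one way to observe region setup is to instrument their entry points with the standard BPatch API. A minimal sketch, assuming appThread and appImage were obtained as in the sample code later in this talk and snippet is a previously built BPatch_snippet:

    BPatch_Vector<BPatch_function *> setupFuncs;
    appImage->findFunction("_xlsmpWSDoSetup_TPO", setupFuncs);
    if (setupFuncs.size() > 0) {
        /* Entry to the setup call marks the start of a do/for region */
        BPatch_Vector<BPatch_point *> *entries =
            setupFuncs[0]->findPoint(BPatch_entry);
        if (entries != NULL)
            appThread->insertSnippet(snippet, *entries);
    }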

Synchronization Traits (Power)
- Master
  - Makes a call to _xlsmpMaster_TPO
  - Checks whether this is the master thread; if so, explicitly calls an @OL function
- Critical
  - Calls _xlsmpFlush
  - Calls _xlsmpGetDefaultSLock
  - Performs the operation (no @OL call)
  - Calls _xlsmpRelDefaultSLock

Synchronization Traits (Power), continued
- Barrier
  - Calls _xlsmpBarrier_TPO
- Atomic
  - Calls _xlsmpGetAtomicLock
  - Performs the operation (not an @OL call)
  - Calls _xlsmpRelAtomicLock
- Flush
  - Calls _xlsmpFlush
- Ordered
  - Calls _xlsmpBeginOrdered_TPO
  - Explicitly calls an @OL function to do the operation
  - Calls _xlsmpEndOrdered_TPO

Instrumentable Regions
- Instrument the entire function behind an @OL call
  - Entire region is contained neatly within the outlined function
  - Parallel, Do, Section, Single, Ordered, Master
- Instrument a region
  - Make an inst point immediately after the given call
  - Store info about the end of the region
  - Critical, Ordered, Master, Atomic
- One-instruction "region"
  - Flush & Barrier calls can be instrumented
  - Insert a call to Flush or Barrier in an existing parallel region
- Loop region
  - Region consists of the instructions in the parallel loop body

BPatch_parRegion
- New class to deal with parallel languages
- Standard region functions
  - getStartAddress()
  - getEndAddress()
  - size()
  - getInstructions()
- Generic parallel functions
  - getClause(const char *key)
- Language-specific functions
  - replaceOMPParameter(const char *key, int value)
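
For example, a mutator can walk the regions Dyninst discovered and dump their bounds. A sketch, assuming appImage was obtained as in the sample code below (return types are cast for printing):

    BPatch_Vector<BPatch_parRegion *> *regions = appImage->getParRegions();
    for (unsigned i = 0; i < regions->size(); i++) {
        BPatch_parRegion *r = (*regions)[i];
        /* Bounds of each parallel region found in the binary */
        printf("region %u: 0x%lx-0x%lx, size %u\n", i,
               (unsigned long) r->getStartAddress(),
               (unsigned long) r->getEndAddress(),
               (unsigned) r->size());
    }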

getClause
- Accesses information about a parallel region
- Every region has at least the REGION_TYPE key
  - An enum designating what kind of region it is: enum {OMP_NONE, OMP_PARALLEL, OMP_DO_FOR, …}
  - Other languages' region types are easily added
- Region-specific keys, e.g. for OMP_DO_FOR: CHUNK_SIZE, NUM_ITERATIONS, ORDERED, SCHEDULE, IF
- If a key is not an attribute of the region, getClause returns -1
- The documentation lists the valid clauses and API calls as well
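
For instance, querying the chunk size of every do/for region might look like this (a sketch; appImage as in the sample code below):

    BPatch_Vector<BPatch_parRegion *> *regions = appImage->getParRegions();
    for (unsigned i = 0; i < regions->size(); i++) {
        if ((*regions)[i]->getClause("REGION_TYPE") != OMP_DO_FOR)
            continue;
        /* getClause returns -1 if CHUNK_SIZE is not an attribute here */
        int chunk = (*regions)[i]->getClause("CHUNK_SIZE");
        printf("do/for region %u has chunk size %d\n", i, chunk);
    }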

replaceOMPParameter
- OpenMP passes parameters into the setup functions that dictate behavior
- Work-sharing constructs: If, Nowait
- Loops:
  - Schedule type (static, dynamic, guided, runtime)
  - Chunk size
- We can dynamically modify these values
  - Significantly change behavior without recompilation
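
A sketch of retuning a loop at run time with this call; the key name follows the getClause keys above, and 64 is an arbitrary new chunk size (it is also the value the demo below settles on):

    BPatch_Vector<BPatch_parRegion *> *regions = appImage->getParRegions();
    for (unsigned i = 0; i < regions->size(); i++) {
        if ((*regions)[i]->getClause("REGION_TYPE") != OMP_DO_FOR)
            continue;
        /* Overwrite the chunk-size parameter the region passes to its
           setup function; no recompilation of the mutatee is needed */
        (*regions)[i]->replaceOMPParameter("CHUNK_SIZE", 64);
    }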

Sample Code

    /* Instrument the first instruction in each OpenMP sections construct.
       pathname/argv (the mutatee's path and arguments) and snippet (a
       previously built BPatch_snippet) are placeholders supplied by the
       caller. */
    BPatch bPatch;
    BPatch_thread *appThread = bPatch.createProcess(pathname, argv);
    BPatch_image *appImage = appThread->getImage();
    BPatch_Vector<BPatch_parRegion *> *appParRegions =
        appImage->getParRegions();
    for (unsigned i = 0; i < appParRegions->size(); i++) {
        int regionType = (*appParRegions)[i]->getClause("REGION_TYPE");
        if (regionType != OMP_SECTIONS)
            continue;
        BPatch_Vector<BPatch_instruction *> *regionInstructions =
            (*appParRegions)[i]->getInstructions();
        BPatch_instruction *bpInst = (*regionInstructions)[0];
        unsigned long firstAddr = (unsigned long) bpInst->getAddress();
        BPatch_point *point =
            appImage->createInstPointAtAddr((caddr_t) firstAddr);
        appThread->insertSnippet(snippet, *point);
    }

Current Status & Future Work
- Everything in this talk is implemented on Power and Solaris
- Future work
  - Additional platforms for OpenMP support
  - Utilization of these features by performance tools
  - Additional language support: UPC is next on the list
  - Support for shared/private variables
    - Variables are still handled as BPatch_[Local]Var
    - No distinction yet between shared and private

Demo
- OpenMP implementation of Life
  - Trivial nearest-neighbor computation
- Ran on AIX, Power4 with 8 processors
- Program has naive initial settings
  - Schedule type: static
  - Chunk size: 1
- Dynamically change the chunk size
  - 64 would be the logical choice; it would be the default value assigned if not specified
- How much of a speedup can we achieve?
- Serial time: 375.9727771282

Questions?