GASP: A Performance Tool Interface for Global Address Space Languages & Libraries
Adam Leko (1), Dan Bonachea (2), Hung-Hsun Su (1), Bryan Golden (1), Hans Sherburne (1), Alan D. George (1)
(1) Electrical Engineering Dept., University of Florida
(2) Computer Science Dept., UC Berkeley

Slide 2: Outline
- Introduction
- GASP Overview
- GASP Interface
- GASP Overhead
- Conclusions
- Future Directions

Slide 3: Global Address Space (GAS) Languages/Libraries
- Unified Parallel C (UPC), Co-Array Fortran, Titanium, SHMEM, etc.
- Properties:
  - Provide a shared address space abstraction
  - Include one-sided communication operations (put/get)
- Available for a wide range of system architectures (both shared-memory machines and distributed systems), as either a compiler or a library
- An implementation's internals may vary greatly from one system to another for the same language
- Performance can be comparable with MPI code, but generally requires hand-tuning
  - Performance tool support would help greatly

Slide 4: Motivation for Tool Interface (1)
- There is minimal performance tool support for GAS programs. Why is that?
  - Newer languages/libraries
  - Complicated compilers; UPC alone has several different implementation strategies:
    - Direct compilation (GCC-UPC, Cray UPC)
    - Translator + library approach (Berkeley UPC w/ GASNet, HP UPC)
  - Tools need support for tracking one-sided memory operations
  - Tools need support for tracking shared data

Slide 5: Motivation for Tool Interface (2)
Performance tool support strategies:
- Direct source instrumentation
  - May prevent compiler optimizations/reorganization
  - How to deal with the relaxed memory model?
- Binary instrumentation
  - Not available on some architectures
  - Difficult to relate back to source code
- Intermediate libraries (wrappers for functions/procedures)
  - Does not work for "pure compilers"
- Performance interface
  - Defines the basic interaction between compiler and performance tool
  - Leaves it up to compiler developers to decide how best to incorporate the interface (wrapper, translation, etc.)

Slide 6: Overview (1)
- Global Address Space Performance (GASP) interface
- Event-based interface
  - GAS compiler/runtime communicates with performance tools through a standard interface
  - The performance tool is notified when particular actions happen at runtime
  - Implementation-agnostic
- Notification structure
  - Function "callback" into tool-developer code
  - A single function name (gasp_event_notify): pass in an event ID and source-code location, with varargs for the remaining arguments (like printf)
  - Notifications can come from the compiler/runtime (system events) or from user code (user events)
  - Allows calls back into the source language/library to make model-specific queries

Slide 7: Overview (2)

enum gasp_event_type {
  gasp_event_type_start,
  gasp_event_type_end,
  gasp_event_type_atomic
};

void gasp_event_notify(unsigned int event_id,
                       enum gasp_event_type event_type,
                       const char* source_file,
                       unsigned int source_line,
                       unsigned int source_col,
                       ...);

Slide 8: System Event Types (1)
- All system events have symbolic names defined in gasp_[language].h
- Startup and shutdown
  - Initialization: called by each process after the GAS runtime has been initialized
  - Exit: called before all threads stop (two event types: collective exit and non-collective exit)
- Synchronization
  - Fence, notify, wait, barrier start/end
  - Lock functions

Slide 9: System Event Types (2)
- Work sharing
  - Forall start/end
- Collective events
  - Broadcast, scatter, gather, etc.
- Shared variable access
  - Direct (through variable manipulations)
  - Indirect (through bulk transfer functions)
  - Non-blocking operations
- User functions
  - Beginning and end of user functions of interest

Slide 10: User-Defined Event Type
- Allows the user to give context to performance data
- Can be used for:
  - Instrumenting individual loops or regions in user code
  - Phase profiling
  - Hand instrumentation
- Simple language-independent API:
  - gasp_create_event() creates an event with a description
  - gasp_event_start() and gasp_event_end() notify the tool of region entry/exit
  - Event start/end functions also take a variable number of arguments, displayed printf-style inside the performance tool

Slide 11: Instrumentation & Measurement Control
Provides finer instrumentation and measurement control:
- --profile flag
  - Instructs the compiler to instrument all events for use with a performance tool, except:
    - Shared local accesses
    - Accesses that have been privatized through optimizations
- --profile-local flag
  - Instruments everything as in --profile, but also includes shared local accesses
- #pragma pupc [on|off] directive
  - Controls instrumentation at compile time; only has an effect when --profile or --profile-local is used
  - Instructs the compiler to avoid instrumenting specific regions of code, if possible
- pupc_control(int on) function call
  - Controls measurement at runtime, as performed by the performance tool

Slide 12: Vendor Support
- UPC
  - Berkeley UPC: GASP implemented within the runtime library; supported with Berkeley UPC 2.4+ (--enable-profile configure-time option)
  - HP UPC: HP verbally agreed to support GASP at the UPC '05 workshop; unfortunately, that work has since been pushed back
  - Cray UPC, MuPC, GCC-UPC: GASP support pending
- Others
  - Titanium: GASP support is in the pipeline
  - Support for other languages/libraries pending

Slide 13: GASP Overhead (chart omitted from transcript; legend below)
- I: instrumentation only, empty calls (with or without instrumentation for local events)
- I+M: actual events recorded through a preliminary measurement layer
- PAPI: measurement layer records PAPI hardware-counter events
- *Overhead expected to be much lower when gettimeofday() is replaced with a high-performance timer
- All tests executed on a dual 2.4 GHz Opteron cluster (32 nodes) with a Quadrics interconnect

Slide 14: GASP Overhead (chart omitted from transcript)
- *All results are for NAS benchmarks 2.4, class A, unless noted otherwise

Slide 15: Conclusions
- GASP specifies a standard event-based performance interface for GAS languages/libraries
- The preliminary version of GASP includes UPC support (low overhead; a working implementation is available for Berkeley UPC)
- A performance interface should be an integral part of a language/library design and implementation effort
  - Fairly straightforward for compiler developers to add support
  - Avoids interference with compiler optimization
  - But how do we get language/library implementers' support?
- GASP is integrated with a new performance analysis tool (Parallel Performance Wizard) we are currently developing for UPC and SHMEM
- Specifications for other GAS languages/libraries are forthcoming
  - May even extend beyond GAS languages/libraries to other parallel languages/libraries such as OpenMP, MPI-2, X10, Fortress, Chapel, etc.
- For more info, see

Slide 16: (figure only; not included in transcript)

Slide 17: Future Directions
- GASP serves as a starting point for generic parallel performance analysis
- We are currently investigating a generic parallel performance analysis approach that deals with event types rather than language constructs/library functions
  - Execution models do not differ significantly between parallel languages/libraries
  - Once a generic set of analyses is developed, it should be applicable to all languages/libraries
  - Adding performance analysis support for a new language/library then simplifies to:
    - Enabling instrumentation and measurement of events (i.e., GASP)
    - Creating a mapping of language constructs/library functions to event types
    - Making small modifications to visualizations to better present the results
  - Analysis of programs involving multiple languages/libraries becomes possible

Slide 18: (figure only; not included in transcript)

Slide 19: Q&A