Software Group © 2006 IBM Corporation
Compiler Technology
Task, thread and processor — OpenMP 3.0 and beyond
Guansong Zhang, IBM Toronto Lab

Compiler technology 2
Overview
 The purpose of the talk
 – Introducing the latest improvement to the OpenMP standard: tasks (still under discussion, so do not take the syntax as final)
 – Considerations for future OpenMP development: thread affinity

Compiler technology 3
The changing world
 Hardware improvement
 – Development of multicore systems: soon we will have more processors than we know how to program with
   "IBM to Build World's First Cell Broadband Engine Based Supercomputer"
   "Intel: Quad core to turbocharge chips"
   "Terra Soft to Build Cell-Based Super Out of PS3 Beta Iron"
 OpenMP standard
 – The C/C++ and Fortran standards were merged in version 2.5
 Other language committees
 – C++ memory model: atomic access

Compiler technology 4
More changes in the OpenMP world
 New players
 – Microsoft just joined the OpenMP ARB: "…, and Visual C++® 2005 supports the full standard. OpenMP is also supported by the Xbox 360™ platform."
 – GCC: the GOMP project is developing an implementation of OpenMP for the C, C++, and Fortran 95 compilers in the GNU Compiler Collection

Compiler technology 5
Overview
 The purpose of the talk
 – Introducing the latest improvement to the OpenMP standard: tasks (still under discussion, so do not take the syntax as final)
 – Considerations for future OpenMP development: thread affinity

Compiler technology 6
Workshare and task pool
 What is a workshare?
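For contrast with the task examples on slide 8, a minimal sketch of a classic workshare in standard (pre-3.0) OpenMP syntax: the iterations of a single loop are divided among the threads of the team, and every thread of the team must reach the construct.

    #include <omp.h>

    /* A workshare: one loop's iteration space is split across
     * the threads of the enclosing parallel team. */
    void scale(double *a, const double *b, double s, int n)
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            a[i] = s * b[i];
    }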

Compiler technology 7
Workshare and task pool (cont.)
 What is a task?
 – Not a workshare
 – But still "sharing/cooperating" between threads
 Compared with a workshare
 – A unit of work can be generated at run time
 – A unit can wait for another generated unit

Compiler technology 8
Task examples
 Recursive algorithm

    int fib(int n)
    {
        int x, y;
        if (n < 2) return n;
        #pragma omp taskgroup
        {
            #pragma omp task common(x)
            x = fib(n - 1);
            #pragma omp task common(y)
            y = fib(n - 2);
        }
        return x + y;
    }

 Pointer chasing

    #pragma omp parallel
    {
        #pragma omp single
        {
            while (p) {
                #pragma omp task
                process(p);
                p = p->next;
            }
        }
    }
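The slide warns that the task syntax was still under discussion; for reference, here is a sketch of the recursive example in the syntax that was eventually ratified in OpenMP 3.0, which uses shared() and an explicit taskwait instead of common() and taskgroup:

    int fib(int n)
    {
        int x, y;
        if (n < 2) return n;
        #pragma omp task shared(x)
        x = fib(n - 1);
        #pragma omp task shared(y)
        y = fib(n - 2);
        #pragma omp taskwait   /* wait for the two child tasks to finish */
        return x + y;
    }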

Compiler technology 9
Task schedule
 More flexible scheduling
 – Can a task be multi-threaded? When a task is encountered, the encountering thread always goes for the new task
 Advantage
 – The idea is to provide one more level of abstraction: a task-centric view
   Tries to avoid thread starvation
   Potential cache reuse
 Disadvantage
 – Threadprivate: no threadprivate data inside tasks
 – Thread id: HPC users may need the thread id to localize data access
 – Locks: a lock's owner becomes confusing
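A small sketch of the thread-id pitfall listed above, written in the syntax that was eventually ratified in OpenMP 3.0: since a task may be executed by any thread of the team, omp_get_thread_num() inside a task no longer identifies the logical unit of work, so per-thread data indexed by it gets scattered unpredictably.

    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        #pragma omp parallel
        #pragma omp single
        for (int i = 0; i < 8; i++) {
            #pragma omp task   /* i becomes firstprivate by default */
            printf("task %d ran on thread %d\n", i, omp_get_thread_num());
        }
        return 0;
    }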

Compiler technology 10
Overview
 The purpose of the talk
 – Introducing the latest improvement to the OpenMP standard: tasks (still under discussion, so do not take the syntax as final)
 – Considerations for future OpenMP development: thread affinity

Compiler technology 11
Emerging architectures

Compiler technology 12
Performance numbers
[Chart: stride 1 vs. stride 2]

Compiler technology 13
Thread affinity
 Nested parallelism
 – Organizing threads into multiple levels (already in the previous OpenMP standard)
 Thread grouping
 – Balancing the number of threads available against the parallelism in the code
 Thread mapping
 – Associating each OpenMP thread with a physical/logical processor
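Nested parallelism is the one piece already in the standard; a minimal sketch of the two-level organization follows (how each team is then bound to chips or cores was implementation-defined at the time):

    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        omp_set_nested(1);                      /* allow nested parallel regions */
        #pragma omp parallel num_threads(2)     /* outer level: e.g. one team per chip */
        {
            int outer = omp_get_thread_num();
            #pragma omp parallel num_threads(4) /* inner level: threads within a chip */
            printf("outer thread %d, inner thread %d\n",
                   outer, omp_get_thread_num());
        }
        return 0;
    }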

Compiler technology 14
How to represent a thread group

                                                | Environment var                                                  | Explicit index                                                    | Descriptor handle
User interface                                  | No touch for user code                                           | Simple data type; possible multiple changes in the source         | New internal type; allows centralized thread-group programming
Modularity (procedure calls, library functions) | No support                                                       | Pass level, CPU array and array size                              | Pass a group-type variable, which may be used as an execution context (as in MPI)
Nested parallelism (thread-to-thread affinity)  | Fixed in advance; no dynamic adjustment according to user input  | Specify the number of threads at different levels                 | Specify thread composition
Mapping threads                                 | Implementation defined                                           | Supported, through virtual CPU numbers                            | Supported, through omp_get_procs()
Heterogeneous system                            | No support (?)                                                   | Supported; different kinds of CPU with the same numbering scheme? | Different groups
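To make the table's third column concrete, here is a hypothetical sketch of descriptor-handle code; omp_group_t and omp_group_create are invented names for illustration (only omp_get_procs() appears in the table), and none of this is part of any OpenMP standard.

    /* HYPOTHETICAL API: the types and functions below do not exist in
     * OpenMP; they only illustrate the "descriptor handle" column. */
    typedef struct omp_group omp_group_t;

    omp_group_t *omp_get_procs(void);            /* all processors visible to the program */
    omp_group_t *omp_group_create(omp_group_t *parent,
                                  const int *cpus, int n);  /* select a subset */

    void compute(omp_group_t *g);                /* library routine */

    void example(void)
    {
        int chip0[] = { 0, 1, 2, 3 };
        omp_group_t *g = omp_group_create(omp_get_procs(), chip0, 4);
        /* Like an MPI communicator, the handle can be passed through
         * procedure calls, giving the modularity row of the table. */
        compute(g);
    }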

Compiler technology 15
What the future holds
 Performance is still our goal
 – "OpenMP is about performance," as NASA scientists put it
 OpenMP needs to enlarge itself for the broader market
 – C/C++ will become more interesting
 – People would like to see non-numeric programs in OpenMP
 Partition the OpenMP interface into different layers
 – TASK, WORKSHARE vs. THREAD vs. PROC
 – MPI has more than 300 calls, but most people use only 6-8
 – Can we keep the layered approach while extending OpenMP?

Compiler technology 16
Summary
 Start a parallel region
 Split into two nested parallel regions
 – This is the chance to bind threads to the right processors
 Start a task region
 – For independent work, e.g. game objects
 Start a workshare
 – For computation-intensive calculation, e.g. graphics rendering
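A skeleton tying the summary together, written with the task construct as eventually ratified in OpenMP 3.0; process_object and render_tile are placeholder names, and the binding of the two nested teams to processor groups would rely on implementation-specific controls.

    #include <omp.h>

    extern void process_object(int id);   /* placeholder: one independent game object */
    extern void render_tile(int t);       /* placeholder: one compute-heavy tile */

    void frame(int nobjects, int ntiles)
    {
        omp_set_nested(1);
        #pragma omp parallel num_threads(2)   /* split into two nested teams; bind */
        {                                     /* each team to its processors here  */
            if (omp_get_thread_num() == 0) {
                #pragma omp parallel          /* team 0: task region for the       */
                #pragma omp single            /* independent game objects          */
                for (int i = 0; i < nobjects; i++) {
                    #pragma omp task
                    process_object(i);
                }
            } else {
                #pragma omp parallel for      /* team 1: workshare for rendering */
                for (int t = 0; t < ntiles; t++)
                    render_tile(t);
            }
        }
    }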

Compiler technology 17
Q&A