2005/6/2 IWOMP05 1 IWOMP05 panel “OpenMP 3.0” Mitsuhisa Sato (University of Tsukuba, Japan)

2005/6/2 IWOMP05 2 Final comments (from the EWOMP03 panel “What are the necessary ingredients for scalable OpenMP programming?”)
Performance of OpenMP for SDSM
– good for some applications, but sometimes bad
– it depends on network performance
We should look at PC clusters
– high performance and good cost-performance
– “will converge to clusters of small SMP nodes” (quoted in 2001)
– can large-scale SMPs survive?
Mixed OpenMP-MPI does not help unless you already have MPI code
– “Life is too short for MPI” (from the T-shirt)
We should learn from HPF.
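The mixed OpenMP-MPI style mentioned above is worth a concrete picture. A minimal sketch of my own (not from the slides; the size N and the work done are placeholders): MPI splits the rows across nodes, OpenMP threads share the node-local loop.

    /* Hybrid MPI + OpenMP sketch: MPI owns one block of rows per process,
       OpenMP threads parallelize the loop over the local block. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    #define N 1024                       /* illustrative problem size */

    int main(int argc, char **argv) {
        int provided, rank, nprocs;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        int rows = N / nprocs;           /* rows owned by this process */
        double sum = 0.0, total = 0.0;

        /* threads inside the node share the local rows */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < N; j++)
                sum += (double)(rank * rows + i) * j;

        /* processes combine their partial results across nodes */
        MPI_Reduce(&sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0) printf("total = %g\n", total);
        MPI_Finalize();
        return 0;
    }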

2005/6/2 IWOMP05 3 Many ideas and proposals, so far ...
Task queue construct (by KAI)
Conditional variable in the critical construct
Processor binding
Nested parallelism, multi-dimensional parallel loops (see the sketch after this list)
post/wait in the sections construct (task-level parallelism) (by UPC?)
For DSM
– next touch
– mapping directives & affinity scheduling for loops (Omni/SCASH)
– “threadshared” in Cluster OpenMP (KAI?)
– ...
OpenMP on software DSM for distributed memory
– very attractive, but ...
– limitation of the shared memory model for a large-scale system (100+ processors): it requires a large single address (naming) space to map the whole data, and may require a large amount of memory and TLBs ...
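On the nested-parallelism item above: a rough sketch of my own, using only the OpenMP 2.x API available at the time (before any collapse clause existed), of a two-dimensional parallel loop expressed as nested parallel regions.

    /* Nested parallelism over a 2-D loop nest with the OpenMP 2.x API:
       the inner region spawns a second team per outer iteration. */
    #include <omp.h>

    #define NI 64
    #define NJ 64

    void scale(double a[NI][NJ], double f) {
        omp_set_nested(1);               /* allow nested parallel regions */
        #pragma omp parallel for
        for (int i = 0; i < NI; i++) {
            #pragma omp parallel for     /* second-level team */
            for (int j = 0; j < NJ; j++)
                a[i][j] *= f;
        }
    }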

2005/6/2 IWOMP05 4 For OpenMP 3.0
Core spec. to define the programming model
Hint directives for performance tuning (esp. for DSM)
Extensions for distributed memory
(Figure: the three components — the core spec “OpenMP α”, hint directives for performance tuning, and extensions for distributed memory.)

2005/6/2 IWOMP05 5 OpenMP 3.0 Core Spec
Core spec to define the programming model
– mandatory spec. to be compliant
– OpenMP α
– candidates (α) may include:
  task queue construct (by KAI, sketched below)
  conditional variable in the critical construct
  processor binding
  nested parallelism, multi-dimensional parallel loops
  post/wait in the sections construct (task-level parallelism)
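Of these candidates, the task-queue construct is the easiest to picture. A rough sketch of the KAI/Intel “workqueuing” extension as I recall it (vendor-specific pragma syntax, not standard OpenMP; the exact clause spellings may differ): one thread walks a linked list and enqueues each node as a task for the team.

    /* Sketch of the KAI/Intel workqueuing ("taskq") proposal: pointer chasing
       over a list, with each node handed to the team as an independent task. */
    typedef struct node { struct node *next; double val; } node_t;

    void process(node_t *p);             /* user-defined work on one node */

    void walk(node_t *head) {
        #pragma intel omp parallel taskq
        {
            for (node_t *p = head; p != NULL; p = p->next) {
                #pragma intel omp task captureprivate(p)
                process(p);              /* runs as a queued task */
            }
        }
    }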

2005/6/2 IWOMP05 6 Hint directives
For performance tuning
– performance is a key for HPC!
– not mandatory: they can be ignored
– may include:
  to exploit locality (esp. for hardware/software DSM)
  – next touch / first touch
  – mapping directives & affinity scheduling for loops
  for better (loop) scheduling
  – ...?
  ...
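The first-touch idea behind these hints can already be exploited by hand on NUMA machines. A small sketch of my own (plain OpenMP, not a proposed directive): initialize the data in a parallel loop with the same static schedule as the compute loop, so first-touch page placement leaves each page near the thread that later uses it.

    /* First-touch locality by hand: initialization and computation use the
       same static schedule, so pages end up local to the threads that use them. */
    void setup(double *u, double *nu, long n) {
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < n; i++) {   /* each thread first-touches its own part */
            u[i]  = 1.0;
            nu[i] = 0.0;
        }
    }

    void step(double *u, double *nu, long n) {
        #pragma omp parallel for schedule(static)
        for (long i = 1; i < n - 1; i++) /* same distribution as in setup() */
            nu[i] = 0.5 * (u[i - 1] + u[i + 1]);
    }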

2005/6/2 IWOMP05 7 Extensions for distributed memory
“We should look at PC clusters (distributed memory)”
– everybody says “OpenMP is good, but no help for clusters ...”
Should be defined outside of OpenMP
– may be nested with the OpenMP core spec: inside a node, the OpenMP core spec; outside the node, use the extensions
Candidates would be:
– “threadshared” in Cluster OpenMP by KAI: “private” is the default, “shared” must be specified
– UPC
– CAF
– (HPF? too much!?)
– we have proposed “OpenMPI” (not Open MPI!) at the last EWOMP

2005/6/2 IWOMP05 8 An example of “OpenMPI”

    #pragma ompi distvar (dim=1,sleeve=1)
    double u[YSIZE + 2][XSIZE + 2];
    #pragma ompi distvar (dim=1)
    double nu[YSIZE + 2][XSIZE + 2];
    #pragma ompi distvar (dim=0)
    int p[YSIZE + 2][XSIZE + 2];

    #pragma ompi for
    for (j = 1; j <= XSIZE; j++)
        u[i][j] = 1.0;               /* enclosing loop over i not shown on the slide */

    #pragma ompi for
    for (j = 1; j <= XSIZE; j++) {
        u[0][j] = 10.0;
        u[YSIZE+1][j] = 10.0;
    }

(Figure: distribution of the arrays u, nu, and p over the (YSIZE+2) × (XSIZE+2) domain.)

Array distribution
Data consistency
– with the “sleeve” notation, the necessary data are exchanged among neighboring processes
Data reduction
Data synchronization
– pseudo-global variables should be synchronized
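To make the “sleeve” (halo) exchange concrete, here is a hand-written sketch of my own (not from the slides) of roughly what the sleeve=1 annotation would have to generate when expressed directly in MPI; up and down are the neighbor ranks (MPI_PROC_NULL at the boundaries).

    /* Halo ("sleeve") exchange of one boundary row with each neighbor. */
    #include <mpi.h>

    #define XSIZE 100                    /* illustrative extent */

    void exchange_sleeves(double u[][XSIZE + 2], int local_rows,
                          int up, int down, MPI_Comm comm) {
        /* send my first compute row up, receive the lower ghost row from below */
        MPI_Sendrecv(u[1],              XSIZE + 2, MPI_DOUBLE, up,   0,
                     u[local_rows + 1], XSIZE + 2, MPI_DOUBLE, down, 0,
                     comm, MPI_STATUS_IGNORE);
        /* send my last compute row down, receive the upper ghost row from above */
        MPI_Sendrecv(u[local_rows],     XSIZE + 2, MPI_DOUBLE, down, 1,
                     u[0],              XSIZE + 2, MPI_DOUBLE, up,   1,
                     comm, MPI_STATUS_IGNORE);
    }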

2005/6/2 IWOMP05 9 OpenMP for distributed memory?
Limitation of the shared memory model for a very large-scale system (100+ processors)
– requires a large single address (naming) space to map the whole data
– may require a large amount of memory and TLBs ...
– a 64-bit address space is required
Distributed arrays as in HPF
– a portion of the array is stored on each processor
– this differs from a uniform shared memory address space: fine in Fortran, but problematic in C
– mixed HPF-OpenMP?
– OpenMP extension like HPF?
– ...
OpenMP should learn from HPF!?
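What “a portion of the array is stored on each processor” means, in a minimal sketch of my own (block distribution of a 1-D array with the global-to-local index mapping made explicit; all names are illustrative):

    /* Block-distributed array: global index g maps to (owner, local offset),
       unlike a uniform shared address space where g alone names the element. */
    #define N      1000                          /* global length  */
    #define NPROCS 4                             /* process count  */
    #define BLOCK  ((N + NPROCS - 1) / NPROCS)   /* elements per block */

    static double local_a[BLOCK];                /* this process's portion only */

    static int owner(int g)       { return g / BLOCK; }   /* which process owns g    */
    static int local_index(int g) { return g % BLOCK; }   /* offset inside the block */

    /* An element can be read or written directly only by its owner; any other
       access implies communication generated by the compiler or runtime. */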