High Performance Computation --- A Practical Introduction
Chunlin Tian, NAOC, Beijing, 2011

Outline
- Parallelization techniques
- OpenMP: do-loop based
- MPI: communication
- Auto-parallelization, CUDA
- Remark:
  - It is at the introduction level
  - It is NOT a comprehensive introduction

Introduction
- Speeding up the computation
- Mathematics, physics, computation
- Hardware
  - number of CPUs
  - size of memory
  - CPU: multi-processor vs. cluster; GPU
  - memory: distributed vs. shared
- Software
  - auto-parallelization by the compiler
  - OpenMP
  - MPI
  - CUDA

Shared vs. Distributed
- Hardware: desktop vs. supercomputer
- Software: distributed ⇒ shared (a code written for distributed memory can also run on shared-memory hardware)

Auto-parallelization
- Easy to employ
  - set the environment variable: setenv OMP_NUM_THREADS 2
  - compiler options: pgf77 -mp -static … …
                      ifort -parallel … …
- Not smart enough
  - only efficient for dual-core CPUs
  - sometimes even slower than a single thread
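
As a rough sketch of the kind of loop an auto-parallelizing compiler can handle (the program and variable names are illustrative, not from the talk):

  program autopar_demo
    implicit none
    integer, parameter :: n = 1000000
    real(8) :: x(n), y(n)
    integer :: i
    x = 1.0d0
    ! Independent iterations: when built with "ifort -parallel" (or a similar flag)
    ! the compiler may distribute this loop over several threads.
    do i = 1, n
       y(i) = 2.0d0*x(i) + sin(dble(i))
    end do
    print *, sum(y)
  end program autopar_demo

The thread count is typically still controlled by OMP_NUM_THREADS, as on the slide above.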

OpenMP: introduction
- Open Multi-Processing
  - An API supporting multi-platform shared-memory multiprocessing programming.
  - It consists of a set of compiler directives, library routines and environment variables.
- History:
  - 1997, version 1.0 in Fortran
  - 1998, version 1.0 in C, C++
  - 2000, version 2.0 in Fortran
  - 2002, version 2.0 in C, C++
  - 2005, version 2.5 in Fortran, C, C++
  - 2008, version 3.0 in Fortran, C, C++ …
- Compilers: GNU, Intel, IBM, PGI, MS …

Coding with OpenMP
- Step 1: define the parallel region
- Step 2: declare the sharing type of the variables (shared / private)
- Step 3: mark the do-loops to be parallelized
- Remark:
  - You can parallelize your code incrementally, part by part.
  - The number of parallel regions should be kept as small as possible.

Example of OpenMP code

!$omp parallel
!$omp& default (shared)
!$omp& private (tmp)
!$omp do
      do i=1,nx
         tmp=a(i)**2+b(i)**2
         tmp=sqrt(tmp)
         c(i)=a(i)/tmp
         d(i)=b(i)/tmp
      enddo
!$omp end do
!$omp single
      write(*,*) maxval(c), maxval(b)
!$omp end single
!$omp do
      do j=1,ny
         tmp=a(j)**2+b(j)**2
         tmp=sqrt(tmp)
         c(j)=b(j)/tmp
         d(j)=a(j)/tmp
      enddo
!$omp end do
!$omp end parallel
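
For reference, a minimal self-contained sketch wrapping one of these loops in a complete program (array size and initial values are made up for illustration):

  program openmp_demo
    implicit none
    integer, parameter :: nx = 1000
    real(8) :: a(nx), b(nx), c(nx), d(nx), tmp
    integer :: i

    do i = 1, nx
       a(i) = dble(i)
       b(i) = dble(nx - i + 1)
    end do

  !$omp parallel default(shared) private(tmp)
  !$omp do
    do i = 1, nx
       tmp = sqrt(a(i)**2 + b(i)**2)   ! tmp is private: each thread keeps its own copy
       c(i) = a(i)/tmp
       d(i) = b(i)/tmp
    end do
  !$omp end do
  !$omp end parallel

    print *, maxval(c), maxval(d)
  end program openmp_demo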

Run the OpenMP code
- Set the environment variable
  - setenv OMP_NUM_THREADS 4
- Compile: ifort -openmp -intel-static *.f -o openbbs1.e
- Run: ./openbbs1.e

Scalability of OpenMP code
- Ideally the speed-up should be linear in the number of threads.
- In practice, thread initialization, finalization, synchronization, etc. take time.
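
A simple way to check the scaling is to time the parallel region with omp_get_wtime; a sketch (the problem size is arbitrary):

  program scaling_test
    use omp_lib
    implicit none
    integer, parameter :: n = 10000000
    real(8), allocatable :: x(:)
    real(8) :: t0, t1
    integer :: i

    allocate(x(n))
    t0 = omp_get_wtime()
  !$omp parallel do
    do i = 1, n
       x(i) = sqrt(dble(i)) + cos(dble(i))
    end do
  !$omp end parallel do
    t1 = omp_get_wtime()
    print *, 'threads =', omp_get_max_threads(), ' wall time =', t1 - t0
  end program scaling_test

Running it with OMP_NUM_THREADS set to 1, 2, 4, … shows how close the speed-up stays to linear.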

MPI
- Message Passing Interface
  - A specification for an API that allows many computers to communicate with one another.
  - Language-independent protocol, programmer interface, semantic specification.
- History:
  - 1994 May, version 1.0, the final report of the MPI Forum
  - 1995 June, version 1.1
  - 1997 July, version 1.2 (MPI-1) and version 2.0 (MPI-2)
  - 2008 May, version 1.3
  - 2008 June, version 2.1
  - 2009 Sept., version 2.2
- Remark:
  - Open MPI ≠ OpenMP
  - Implementations: MPICH, HP MPI, Intel MPI, MS MPI, …

Coding with MPI
- 1: determine the number of blocks (domain decomposition)
- 2: define the virtual CPU topology
- 3: define the parallel region
- 4: assign tasks to the different processes
- 5: communication between processes
- 6: manage the processes: master-slave or non-master

Example of MPI coding

      include 'mpif.h'
      nx=100;  ny=100                                  ! number of grids
      mx=2;    my=5                                    ! number of blocks
      call MPI_INIT(ierr)                              ! initialize the parallelization
      call MPI_COMM_RANK(MPI_COMM_WORLD,myid,ierr)     ! get the rank (process id)
      … … … … … … …
      call MPI_FINALIZE(ierr)                          ! finalize the parallelization

      myid -> (myidx, myidy) -> the IDs of myid's neighbours             ! virtual topology
      call MPI_SEND(vb,nx*2,MPI_REAL8,receiverid,tag,MPI_COMM_WORLD,ierr)        ! send data
      call MPI_RECV(va,nx*2,MPI_REAL8,senderid,tag,MPI_COMM_WORLD,status,ierr)   ! receive data
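
A minimal self-contained sketch along the same lines (buffer size, tag and ranks are illustrative); compile with mpif77/mpif90 and run with at least two processes:

  program mpi_demo
    implicit none
    include 'mpif.h'
    integer, parameter :: n = 100
    real(8) :: buf(n)
    integer :: ierr, myid, nprocs, tag, status(MPI_STATUS_SIZE)

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
    tag = 1

    if (myid == 0) then
       buf = 3.14d0
       call MPI_SEND(buf, n, MPI_REAL8, 1, tag, MPI_COMM_WORLD, ierr)
    else if (myid == 1) then
       call MPI_RECV(buf, n, MPI_REAL8, 0, tag, MPI_COMM_WORLD, status, ierr)
       print *, 'rank 1 received', buf(1)
    end if

    call MPI_FINALIZE(ierr)
  end program mpi_demo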

CPU Virtual Topology
- 1. Each process has a unique ID.
- 2. Each process may have more than one neighbour.
- 3. The CPUs can be arranged as a one- or multi-dimensional array.
- 4. The topology should be as simple as possible.
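
One standard way to build such a topology is an MPI Cartesian communicator; a sketch using the 2 x 5 decomposition from the example above (run with 10 processes; the other details are illustrative):

  program cart_demo
    implicit none
    include 'mpif.h'
    integer :: ierr, myid, comm2d, coords(2), dims(2)
    integer :: left, right, down, up
    logical :: periods(2), reorder

    call MPI_INIT(ierr)
    dims    = (/ 2, 5 /)              ! mx x my process grid
    periods = (/ .false., .false. /)  ! no periodic boundaries
    reorder = .true.

    ! Create a 2-D Cartesian communicator and find my coordinates in it
    call MPI_CART_CREATE(MPI_COMM_WORLD, 2, dims, periods, reorder, comm2d, ierr)
    call MPI_COMM_RANK(comm2d, myid, ierr)
    call MPI_CART_COORDS(comm2d, myid, 2, coords, ierr)

    ! Ranks of the neighbours in each direction (MPI_PROC_NULL at the edges)
    call MPI_CART_SHIFT(comm2d, 0, 1, left, right, ierr)
    call MPI_CART_SHIFT(comm2d, 1, 1, down, up, ierr)

    print *, 'rank', myid, ' coords', coords, ' neighbours', left, right, down, up
    call MPI_FINALIZE(ierr)
  end program cart_demo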

MPI Communication
- Point-to-point: one process to one process
- Collective:
  - one to many / many to one: broadcast, scatter, gather, reduce, etc.
- Blocking
  - the call returns only once the message buffer is safe to reuse
- Non-blocking
  - the call returns immediately; completion is checked later (e.g. with MPI_WAIT)
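
A sketch of both flavours (values, sizes and tags are illustrative): a collective reduction, then a non-blocking send completed with MPI_WAIT:

  program comm_demo
    implicit none
    include 'mpif.h'
    integer :: ierr, myid, nprocs, req, status(MPI_STATUS_SIZE)
    real(8) :: a, total, halo(10)

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

    ! Collective: every rank contributes one value, rank 0 receives the sum
    a = dble(myid)
    call MPI_REDUCE(a, total, 1, MPI_REAL8, MPI_SUM, 0, MPI_COMM_WORLD, ierr)
    if (myid == 0) print *, 'sum of ranks =', total

    ! Non-blocking: post the send, overlap other work, then wait for completion
    if (nprocs >= 2) then
       halo = dble(myid)
       if (myid == 0) then
          call MPI_ISEND(halo, 10, MPI_REAL8, 1, 99, MPI_COMM_WORLD, req, ierr)
          ! ... computation could overlap the communication here ...
          call MPI_WAIT(req, status, ierr)
       else if (myid == 1) then
          call MPI_RECV(halo, 10, MPI_REAL8, 0, 99, MPI_COMM_WORLD, status, ierr)
       end if
    end if

    call MPI_FINALIZE(ierr)
  end program comm_demo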

Run the MPI code
- Compile
  - mpif77 -O3 *.f -o mpimod4.e
- Start mpd
  - mpdboot
- Run the code
  - mpirun -n 7 mpimod4.e

CUDA
- What's next? GPU supercomputing.
- It is also a do-loop based method: do-loop -> CUDA kernel subroutine.
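
A minimal CUDA Fortran sketch of that idea (assuming the PGI/NVIDIA CUDA Fortran compiler; the names and sizes are illustrative, not from the talk): the body of the do-loop becomes a kernel subroutine launched over many GPU threads.

  module kernels
  contains
    attributes(global) subroutine scale_kernel(a, s, n)
      integer, value :: n
      real(8), value :: s
      real(8) :: a(n)
      integer :: i
      ! Each CUDA thread handles one iteration of the original do-loop
      i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
      if (i <= n) a(i) = s * a(i)
    end subroutine scale_kernel
  end module kernels

  program cuda_demo
    use cudafor
    use kernels
    implicit none
    integer, parameter :: n = 1024
    real(8) :: a(n)
    real(8), device :: a_d(n)

    a = 1.0d0
    a_d = a                                        ! copy host -> device
    call scale_kernel<<< (n+255)/256, 256 >>>(a_d, 2.0d0, n)
    a = a_d                                        ! copy device -> host
    print *, a(1), a(n)
  end program cuda_demo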

Summary
- Parallelization
  - Three levels of parallelization: compiler, OpenMP, MPI
  - Ease of use: from easy (compiler) to difficult (MPI)
  - Scalability: from inefficient (compiler) to efficient (MPI)
- Principle
  - Do-loop based parallelization (OpenMP)
  - Message passing (MPI)