1 ITCS4145 Parallel Programming B. Wilkinson March 23, 2016. hybrid-abw.ppt Hybrid Parallel Programming Introduction.

Hybrid Systems
Since most computers are multi-core, most clusters have both shared-memory and distributed-memory.
[Diagram: four multi-core computers, each with cores sharing local memory, connected by an Ethernet switch.] 2

Hybrid (MPI-OpenMP) Parallel Computing
MPI to run processes concurrently on each computer.
OpenMP to run threads concurrently on each core of a computer.
Advantage: we can make use of shared memory where communication is required.
Why? Because inter-computer communication is an order of magnitude slower than shared-memory synchronization. 3

4 Message-passing routines are used to pass messages between computer systems, and threads execute on each computer system using the multiple cores on the system.

How to create a hybrid OpenMP-MPI program
Write source code with both MPI routines and OpenMP directives/routines.
Compile using mpicc:
– mpicc uses gcc linked with the appropriate MPI libraries.
– gcc supports OpenMP with the -fopenmp option.
So we can use: mpicc -fopenmp -o hybrid hybrid.c
Execute as an MPI program, e.g. on the UNCC cluster cci-gridgw.uncc.edu:
mpiexec.hydra -f <machines file> -n <number of processes> ./hybrid 5
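Not from the slides: a minimal sketch of what such a hybrid source file might look like, assuming the mpicc/-fopenmp toolchain above (file and variable names are illustrative):

#include <stdio.h>
#include <omp.h>
#include "mpi.h"

int main(int argc, char *argv[]) {
   int rank, size;

   MPI_Init(&argc, &argv);                  // one MPI process per computer (or per node, depending on how it is launched)
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &size);

   #pragma omp parallel                     // OpenMP threads share the memory of their MPI process
   printf("MPI process %d of %d: OpenMP thread %d of %d\n",
          rank, size, omp_get_thread_num(), omp_get_num_threads());

   MPI_Finalize();
   return 0;
}

Compiled as mpicc -fopenmp -o hybrid hybrid.c and launched with mpiexec as above, each MPI process prints one line per OpenMP thread.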

Example

#include <stdio.h>
#include <string.h>
#include <omp.h>
#include "mpi.h"
#define N 10

void openmp_code(int rank) { … }   // see next slide

int main(int argc, char **argv) {
   char message[20];
   int i, rank, size, type = 99;
   MPI_Status status;

   MPI_Init(&argc, &argv);
   MPI_Comm_size(MPI_COMM_WORLD, &size);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   if (rank == 0) {
      strcpy(message, "Hello, world");
      for (i = 1; i < size; i++)
         MPI_Send(message, 13, MPI_CHAR, i, type, MPI_COMM_WORLD);
   }
   else
      MPI_Recv(message, 20, MPI_CHAR, 0, type, MPI_COMM_WORLD, &status);

   openmp_code(rank);   // all MPI processes run the OpenMP code, no message passing

   printf("Message from process = %d : %.13s\n", rank, message);
   MPI_Finalize();
   return 0;
} 6

void openmp_code(int rank) {
   int nthreads, i, t;
   double a[N], b[N], c[N];

   for (i = 0; i < N; i++)
      a[i] = b[i] = i * 1.0;      // initialize arrays

   t = 8;
   omp_set_num_threads(t);        // set # of threads for each MPI process
   printf("MPI process %d, number of threads = %d\n", rank, t);

   #pragma omp parallel for shared(a,b,c)
   for (i = 0; i < N; i++) {
      c[i] = a[i] + b[i];
      printf("Process %d: Thread %d: c[%d] = %f\n",
             rank, omp_get_thread_num(), i, c[i]);
   }
   return;
} 7

8

9 Parallelizing a double for loop

#include <stdio.h>
#include <omp.h>
#include "mpi.h"
#define N 4

int main(int argc, char *argv[]) {
   int i, j, blksz, rank, P, start, end;

   MPI_Init(&argc, &argv);
   MPI_Comm_size(MPI_COMM_WORLD, &P);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   blksz = N/P;
   if (N % P != 0)
      printf("ERROR: N must be a multiple of P. N = 4\n");

   start = rank * blksz;
   end = start + blksz;

   for (i = start; i < end; i++) {          // Loop i parallelized across processes
      #pragma omp parallel for
      for (j = 0; j < N; j++) {             // Loop j parallelized across threads
         printf("Process rank %d, thread %d: executing loop iteration i=%d j=%d\n",
                rank, omp_get_thread_num(), i, j);
      }
   }

   MPI_Finalize();
   return 0;
}

10 Sample output

Hybrid (MPI-OpenMP) Parallel Computing
Caution: Using the hybrid approach does not necessarily result in increased performance; it depends strongly upon the application. 11

Matrix Multiplication, C = A * B, where A is an n x l matrix and B is an l x m matrix. 12
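Written out element-wise (the standard definition, not shown on the slide), each entry of C is the dot product of a row of A and a column of B:

c_{i,j} = \sum_{k=0}^{l-1} a_{i,k}\, b_{k,j}, \qquad 0 \le i < n, \quad 0 \le j < m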

One way to parallelize matrix multiplication using the hybrid approach:

for (i = 0; i < N; i++)
   for (j = 0; j < N; j++) {
      c[i][j] = 0.0;
      for (k = 0; k < N; k++) {
         c[i][j] += a[i][k] * b[k][j];
      }
   }
13
Parallelize the i loop, partitioned among the computers, with MPI.
Parallelize the j loop, partitioned among the cores within each computer, using OpenMP.

14 MPI-OpenMP Matrix Multiplication

MPI_Scatter(A, blksz*N, … );   // Scatter input matrix A
MPI_Bcast(B, N*N, … );         // Broadcast input matrix B

for (i = 0; i < blksz; i++) {
   #pragma omp parallel for private(sum,k)   // Simply add this one statement to the MPI code for matrix multiplication
   for (j = 0; j < N; j++) {
      sum = 0;
      for (k = 0; k < N; k++) {
         sum += A[i][k] * B[k][j];
      }
      C[i][j] = sum;
   }
}

MPI_Gather(C, blksz*N, … );

Parallelize the i loop into partitions among the processes on the computers with MPI.
Parallelize the j loop on each computer into partitions using OpenMP.
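Not from the slides: a self-contained sketch of this hybrid matrix multiplication with the elided MPI arguments filled in, assuming N is a multiple of the number of processes P and double-precision matrices; the initialization values and printed check are illustrative only:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#include "mpi.h"
#define N 256

static double A[N][N], B[N][N], C[N][N];   // full matrices, used on rank 0

int main(int argc, char *argv[]) {
   int i, j, k, rank, P, blksz;
   double sum;

   MPI_Init(&argc, &argv);
   MPI_Comm_size(MPI_COMM_WORLD, &P);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   blksz = N / P;                           // assumes N is a multiple of P

   double (*a)[N] = malloc(blksz * N * sizeof(double));   // local block of rows of A
   double (*c)[N] = malloc(blksz * N * sizeof(double));   // local block of rows of C

   if (rank == 0)                           // illustrative initialization on the root
      for (i = 0; i < N; i++)
         for (j = 0; j < N; j++) { A[i][j] = i + j; B[i][j] = i - j; }

   MPI_Scatter(A, blksz*N, MPI_DOUBLE, a, blksz*N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
   MPI_Bcast(B, N*N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

   for (i = 0; i < blksz; i++) {            // i loop: rows owned by this MPI process
      #pragma omp parallel for private(sum, k)
      for (j = 0; j < N; j++) {             // j loop: split among OpenMP threads
         sum = 0.0;
         for (k = 0; k < N; k++)
            sum += a[i][k] * B[k][j];
         c[i][j] = sum;
      }
   }

   MPI_Gather(c, blksz*N, MPI_DOUBLE, C, blksz*N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

   if (rank == 0)
      printf("C[0][0] = %f  C[N-1][N-1] = %f\n", C[0][0], C[N-1][N-1]);

   free(a); free(c);
   MPI_Finalize();
   return 0;
}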

15

16 Hybrid did not do better than MPI only

17 Perhaps we could do better by parallelizing the i loop with both MPI and OpenMP:
Parallelize the i loop into partitions among processes/threads with MPI and OpenMP.
The j loop is not parallelized.
Result: no better!
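A sketch of that variant (not from the slides), written as a drop-in replacement for the compute loop of the sketch above, with the same variables and assumptions:

#pragma omp parallel for private(j, k, sum)   // i loop split again among OpenMP threads within each MPI process
for (i = 0; i < blksz; i++) {                 // blksz rows already assigned to this process by MPI
   for (j = 0; j < N; j++) {                  // j loop left serial
      sum = 0.0;
      for (k = 0; k < N; k++)
         sum += a[i][k] * B[k][j];
      c[i][j] = sum;
   }
}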

Discussion
Although the demos are done on a single 4-core machine*, experiments on a cluster do not show improvements either.
Why does the hybrid approach not outperform MPI-only for this problem?
For what kinds of problems might a hybrid approach do better?
Note the problem size is small: 256 x 256 arrays. 18
*Intel i GHz (4-core hyperthreaded) with 16 GB main memory

Questions 19