Parallel Processing Dr. Guy Tel-Zur.

Presentation transcript:

Parallel Processing Dr. Guy Tel-Zur

Agenda
Home assignment #1
Sending a 2D array in MPI
Network Performance – Revisited
Collective commands
Send/Recv types and summary
Performance Evaluation of Parallel Programs
PBS demo
Parallel Computing environments for this course
Synchronous Parallel Computations

Recommended online references
Designing and Building Parallel Programs, by Ian Foster: http://www.mcs.anl.gov/~itf/dbpp/
http://www.mhpcc.edu/training/workshop/parallel_intro/MAIN.html
https://computing.llnl.gov/tutorials/parallel_comp/ by Blaise Barney, Livermore Computing

Another Linux reference
If you still don’t feel comfortable with Linux, please consult: http://sc.tamu.edu/shortcourses/SC-unix/introToUnix.pdf

About the status in MPI_Recv
It is possible to receive messages non-selectively:
source = MPI_ANY_SOURCE
tag = MPI_ANY_TAG
This is useful, for example, in a Master-Worker style where the master holds a work pool and hands out a new work unit upon each receive.
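A minimal sketch (not the course code) of a master that accepts a result from whichever worker sends first and uses the returned status to learn which worker it was:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nproc, result, received;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    if (rank == 0) {                      /* master */
        for (received = 0; received < nproc - 1; received++) {
            /* accept a result from whichever worker finishes first */
            MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            printf("got %d from worker %d (tag %d)\n",
                   result, status.MPI_SOURCE, status.MPI_TAG);
            /* in a real work pool the master would now send the next
               work unit back to status.MPI_SOURCE */
        }
    } else {                              /* worker */
        result = rank * rank;             /* pretend this is the computed result */
        MPI_Send(&result, 1, MPI_INT, 0, 99, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}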

M-W Demo: π calculation by Monte Carlo (previous lecture) – an Embarrassingly Parallel Computation
Demo under: /users/agnon/misc/tel-zur/mpi/pi_monte_carlo
Execution: mpirun -np 4 ./pi_reduce
The source code – open the IDE

Master-Worker (M-W) Demo
Embarrassingly Parallel Computing paradigm
Calculation of π

Pseudo Code - Serial Version
npoints = 10000
circle_count = 0
do j = 1,npoints
  generate 2 random numbers between 0 and 1
  xcoordinate = random1
  ycoordinate = random2
  if (xcoordinate, ycoordinate) inside circle then
    circle_count = circle_count + 1
end do
PI = 4.0*circle_count/npoints

Pseudo Code - Parallel Version (note: each task must use a different random seed)
npoints = 10000
circle_count = 0
p = number of tasks
num = npoints/p
find out if I am MASTER or WORKER
do j = 1,num
  generate 2 random numbers between 0 and 1
  xcoordinate = random1
  ycoordinate = random2
  if (xcoordinate, ycoordinate) inside circle then
    circle_count = circle_count + 1
end do
if I am MASTER
  receive from WORKERS their circle_counts
  compute PI (use MASTER and WORKER calculations)
else if I am WORKER
  send to MASTER circle_count
endif
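A minimal C/MPI sketch of this parallel version, using MPI_Reduce to sum the per-task circle counts. This is only an illustration of the pseudo code above, not the course's pi_reduce.c:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, nproc;
    long npoints = 10000, j, in_circle = 0, total = 0;
    double x, y, pi;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    srand(rank + 1);                     /* each task must use a different seed */
    for (j = 0; j < npoints / nproc; j++) {
        x = (double)rand() / ((double)RAND_MAX + 1.0);
        y = (double)rand() / ((double)RAND_MAX + 1.0);
        if (x * x + y * y <= 1.0)        /* point falls inside the quarter circle */
            in_circle++;
    }

    /* sum all circle counts onto the master (rank 0) */
    MPI_Reduce(&in_circle, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        pi = 4.0 * (double)total / (double)((npoints / nproc) * nproc);
        printf("Estimated pi = %f\n", pi);
    }
    MPI_Finalize();
    return 0;
}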

M-W Monte-Carlo calculation of π

Reference to the source code: http://www.pdc.kth.se/training/Tutor/MPI/Templates/pi/index.html#top
pi_send.c, pi_reduce.c, dboard.c, make.pi.c
However, I had to modify the scaling of the random numbers – see the next slide.

Scaling the random numbers to 0 < r < 1
Instead of:
  cconst = 2 << (31 - 1);
  r = (double)random()/cconst;
I had to change the code to:
  r = ((double)rand() / ((double)(RAND_MAX)+(double)(1)));

Mandelbrot set

MPI on Windows… Usage: client–server mode
cd to c:\program files (x86)\deinompi\examples
On shell window #1: ..\bin\mpiexec -n 4 pmandel
On shell window #2: pman_vis, then open TCP port 7470

Sending a 2D array in MPI between two tasks
A simple demo – turn to the demo
The source code is enclosed – next slide
Implementation on Windows Vista using DeinoMPI and DevC++

// Guy Tel-Zur (c) 2009
// This is a demo for the PP2010A course
// sending a 2D array - study pointers
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    const int dim = 3;          // array dimension
    int smat[3][3];             // send matrix [row][col]
    int rmat[3][3];             // receive matrix
    int i, j;
    int mytid, nproc;
    int MASTER = 0;
    int WORKER = 1;
    int tag = 99;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mytid);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    printf("MPI task ID = %d\n", mytid);

    if (nproc != 2) {
        printf("This program needs exactly 2 processes\n");
        MPI_Finalize();
        exit(1);
    }

    if (mytid == 0) {
        // fill the matrix
        smat[0][0] = 1; smat[1][0] = 4; smat[2][0] = 7;
        smat[0][1] = 2; smat[1][1] = 5; smat[2][1] = 8;
        smat[0][2] = 3; smat[1][2] = 6; smat[2][2] = 9;
        printf("This is the master\n");
        for (i = 0; i < dim; i++) {
            for (j = 0; j < dim; j++)
                printf("%i", smat[i][j]);
            printf("\n");
        }
        // send the 2D matrix as a linear array to the Worker
        MPI_Send(&smat[0][0], dim*dim, MPI_INT, WORKER, tag, MPI_COMM_WORLD);
    } else {
        MPI_Recv(&rmat[0][0], dim*dim, MPI_INT, MASTER, tag, MPI_COMM_WORLD, &status);
        printf("This is the worker\n");
        for (i = 0; i < dim; i++) {
            for (j = 0; j < dim; j++)
                printf("%i", rmat[i][j]);
            printf("\n");
        }
    }
    // That's it!
    MPI_Finalize();
    return 0;
}
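On a Linux cluster the demo can be built and run as follows (assuming the standard mpicc compiler wrapper is available; the file name send2d.c is only an example):
  mpicc send2d.c -o send2d
  mpirun -np 2 ./send2d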

Use DeinoMPI for the demo!!!

Network Performance
The time needed to transmit data grows with the message size: roughly a fixed startup time (latency) plus a size-dependent term governed by the bandwidth.
Source: Stanford

MPI_Isend – “Latency Hiding”
Identifies an area in memory to serve as a send buffer.
Processing continues immediately, without waiting for the message to be copied out of the application buffer.
A communication request handle is returned for querying the status of the pending message.
The program should not modify the application buffer until subsequent calls to MPI_Wait or MPI_Test indicate that the non-blocking send has completed.
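A minimal runnable sketch of this latency-hiding pattern (illustrative only): rank 0 overlaps a dummy computation with a pending MPI_Isend and only reuses the buffer after MPI_Wait, while rank 1 receives.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, i;
    double buf[1000], local = 0.0;
    MPI_Request request;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        for (i = 0; i < 1000; i++) buf[i] = i;
        /* start the send and return immediately */
        MPI_Isend(buf, 1000, MPI_DOUBLE, 1, 99, MPI_COMM_WORLD, &request);
        /* useful work that does NOT touch buf while the send is in flight */
        for (i = 0; i < 1000000; i++) local += 1.0 / (i + 1.0);
        /* only after the wait completes is it safe to reuse buf */
        MPI_Wait(&request, &status);
        printf("send completed, local work result = %f\n", local);
    } else if (rank == 1) {
        MPI_Recv(buf, 1000, MPI_DOUBLE, 0, 99, MPI_COMM_WORLD, &status);
        printf("received %f ... %f\n", buf[0], buf[999]);
    }
    MPI_Finalize();
    return 0;
}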

Some Common Collective Commands Source: https://computing.llnl.gov/tutorials/parallel_comp/

Allgather

MPI_Reduce

MPI_Reduce

MPI_Reduce
Users can also define their own reduction functions by using the routine MPI_Op_create.
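A minimal sketch of a user-defined reduction created with MPI_Op_create (the absmax operation below is just an illustration, not part of the course material):

#include <mpi.h>
#include <stdio.h>

/* user-defined reduction: element-wise maximum of absolute values */
void absmax(void *in, void *inout, int *len, MPI_Datatype *dtype)
{
    double *a = (double *)in;
    double *b = (double *)inout;
    int i;
    for (i = 0; i < *len; i++) {
        double x = (a[i] < 0.0) ? -a[i] : a[i];
        double y = (b[i] < 0.0) ? -b[i] : b[i];
        b[i] = (x > y) ? x : y;
    }
}

int main(int argc, char **argv)
{
    int rank;
    double val, result;
    MPI_Op op;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    val = (rank % 2 == 0) ? -(double)rank : (double)rank;   /* some per-rank data */

    MPI_Op_create(absmax, 1 /* commutative */, &op);
    MPI_Reduce(&val, &result, 1, MPI_DOUBLE, op, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("max |val| over all ranks = %f\n", result);
    MPI_Op_free(&op);

    MPI_Finalize();
    return 0;
}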

MPI_Allreduce

MPI_Reduce_scatter

Different ways to partition data Source: https://computing.llnl.gov/tutorials/parallel_comp/

Overhead and Complexity Source: https://computing.llnl.gov/tutorials/parallel_comp/

PBS/Torque – job scheduling

-N myjob15 specifies that the name of the job will be myjob15.
-l nodes=1:ppn=1 specifies that the job will use 1 node and that there is 1 processor per node.
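A minimal example of a Torque/PBS job script using these two options, submitted with qsub (everything beyond the two options shown above – the working-directory line and the mpirun line – is an illustrative assumption):

#!/bin/bash
#PBS -N myjob15              # job name
#PBS -l nodes=1:ppn=1        # 1 node, 1 processor per node
cd $PBS_O_WORKDIR            # run from the directory the job was submitted from
mpirun -np 1 ./pi_reduce     # launch the MPI program

Submit the script with: qsub <scriptname>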

PBS/Torque

PBS/Torque

References
Torque: http://www.clusterresources.com/products/torque-resource-manager.php
OpenPBS: http://openpbs.org

Parallel Computing environments for this course
Our Linux cluster: hobbit
Your own resources:
  A dedicated Linux installation with MPI
  Install MPI on Windows
  Dual boot Linux/Windows
  Windows + Cygwin
  Virtualization: VMware Player, VirtualBox…
Parallel Computing on the Cloud ($)

Turn to the Algorithms slides: Synchronous Parallel Computations