Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. 2004.

Slides:



Advertisements
Similar presentations
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Advertisements

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
6.1 Synchronous Computations ITCS 4/5145 Cluster Computing, UNC-Charlotte, B. Wilkinson, 2006.
Sharks and Fishes – The problem
CSCI-455/552 Introduction to High Performance Computing Lecture 25.
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
CSCI-455/552 Introduction to High Performance Computing Lecture 11.
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Cellular Automata: Life with Simple Rules
EECC756 - Shaaban #1 lec # 8 Spring Synchronous Iteration Iteration-based computation is a powerful method for solving numerical (and some.
CSCI-455/552 Introduction to High Performance Computing Lecture 26.
Game of Life Courtesy: Dr. David Walker, Cardiff University.
Definitions A Synchronous application is one where all processes must reach certain points before execution continues. Local synchronization is a requirement.
Introduction to Parallel Processing Final Project SHARKS & FISH Presented by: Idan Hammer Elad Wallach Elad Wallach.
Numerical Algorithms Matrix multiplication
1 The Game of Life Supplement 2. 2 Background The Game of Life was devised by the British mathematician John Horton Conway in More sophisticated.
Numerical Algorithms • Matrix multiplication
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Special Matrices and Gauss-Siedel
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Synchronous Computations
CSCI-455/552 Introduction to High Performance Computing Lecture 22.
CS305j Introduction to Computing Two Dimensional Arrays 1 Topic 22 Two Dimensional Arrays "Computer Science is a science of abstraction -creating the right.
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
EECC756 - Shaaban #1 lec # 9 Spring Synchronous Iteration (Synchronous Parallelism) : –Barriers: Counter Barrier Implementation. Tree Barrier.
Exercise problems for students taking the Programming Parallel Computers course. Janusz Kowalik Piotr Arlukowicz Tadeusz Puzniakowski Informatics Institute.
CSCI-455/552 Introduction to High Performance Computing Lecture 18.
Lecture 8 – Stencil Pattern Stencil Pattern Parallel Computing CIS 410/510 Department of Computer and Information Science.
Synchronous Computations ITCS 4/5145 Parallel computing, UNC-Charlotte, B. Wilkinson Jan 23, 2013 In a (fully) synchronous computation, all the processes.
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Topic 26 Two Dimensional Arrays "Computer Science is a science of abstraction -creating the right model for a problem and devising the appropriate mechanizable.
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
CSCI-455/552 Introduction to High Performance Computing Lecture 13.
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Embarrassingly Parallel Computations Partitioning and Divide-and-Conquer Strategies Pipelined Computations Synchronous Computations Asynchronous Computations.
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Linear Systems – Iterative methods
Cellular Automata Introduction  Cellular Automata originally devised in the late 1940s by Stan Ulam (a mathematician) and John von Neumann.  Originally.
CSCI-455/552 Introduction to High Performance Computing Lecture 6.
CS 484. Iterative Methods n Gaussian elimination is considered to be a direct method to solve a system. n An indirect method produces a sequence of values.
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
CSCI-455/552 Introduction to High Performance Computing Lecture 9.
CSCI-455/552 Introduction to High Performance Computing Lecture 23.
CSCI-455/552 Introduction to High Performance Computing Lecture 21.
CSCI-455/552 Introduction to High Performance Computing Lecture 15.
All-to-All Pattern A pattern where all (slave) processes can communicate with each other Somewhat the worst case scenario! 1 ITCS 4/5145 Parallel Computing,
Data Parallel Computations and Pattern ITCS 4/5145 Parallel computing, UNC-Charlotte, B. Wilkinson, slides6c.ppt Nov 4, c.1.
Synchronous Computations
Courtesy: Dr. David Walker, Cardiff University
Definitions A Synchronous application is one where all processes must reach certain points before execution continues. Local synchronization is a requirement.
Synchronous Computations
Stencil Pattern A stencil describes a 2- or 3- dimensional layout of processes, with each process able to communicate with its neighbors. Appears in simulating.
Synchronous Computations
Numerical Algorithms • Parallelizing matrix multiplication
All-to-All Pattern A pattern where all (slave) processes can communicate with each other Somewhat the worst case scenario! ITCS 4/5145 Parallel Computing,
All-to-All Pattern A pattern where all (slave) processes can communicate with each other Somewhat the worst case scenario! ITCS 4/5145 Parallel Computing,
All-to-All Pattern A pattern where all (slave) processes can communicate with each other Somewhat the worst case scenario! ITCS 4/5145 Parallel Computing,
Jacobi Project Salvatore Orlando.
Introduction to High Performance Computing Lecture 16
Introduction to High Performance Computing Lecture 17
Data Parallel Pattern 6c.1
Data Parallel Computations and Pattern
Data Parallel Computations and Pattern
Presentation transcript:

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M Pearson Education Inc. All rights reserved. Synchronous Computations Chapter 6

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M Pearson Education Inc. All rights reserved. Synchronous Computations. In a (fully) synchronous application, all the processes synchronized at regular points MPI_Barrier() A basic mechanism for synchronizing processes Called by each process in the group, blocking until all members of the group have reached the barrier call and only returning then

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M Pearson Education Inc. All rights reserved. Barrier Implementation Centralized counter implementation (a linear barrier):

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M Pearson Education Inc. All rights reserved. Tree barrier

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M Pearson Education Inc. All rights reserved. Butterfly Barrier

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M Pearson Education Inc. All rights reserved. Fully Synchronized Computation Examples Data Parallel Computations Same operation performed on different data elements simultaneously; i.e., in parallel. Particularly convenient because: Ease of programming (essentially only one program). Scale easily to larger problem sizes. Many numeric and some non-numeric problems can be cast in a data parallel form

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M Pearson Education Inc. All rights reserved. Prefix Operations Given a list of numbers, x 0, …, x n-1, compute all the partial summations, i.e.: x 0 x 0 + x 1 x 0 + x 1 + x 2 x 0 + x 1 + x 2 + x 3 … Any associative operation (e.g. ‘+’, ‘*’, Bitwise-AND etc.) can be used. Practical applications in areas such as sorting, recurrence relations, and polynomial evaluation.

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M Pearson Education Inc. All rights reserved. Parallel Prefix Sum

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M Pearson Education Inc. All rights reserved. Shift-by-2 k (in Gray-Code Order) can be done in two routing steps on a hypercube Example: Shift-by 4 on a 16 PE hypercube o A B C D - E F G H o I J K L - M N O P w id : ) shift-by-2 in reverse order P O B A - D C F E o H G J I - L K N M 2) shift-by-2 again (in reverse order) M N O P - A B C D o E F G H - I J K L T par = 2 routing steps = O(1) MIMD Powershift on a Hypercube

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M Pearson Education Inc. All rights reserved. Shift-by-2 k (in Gray-Code Order) can be done in two routing steps on a hypercube Example: Shift-by 8 on a 16 PE hypercube o A B C D - E F G H o I J K L - M N O P w id : ) shift-by-4 in reverse order P O N M - D C B A o H G F E o L K J I 2) shift-by-4 again (in reverse order) I J K L - M N O P o A B C D - E F G H T par = 2 routing steps = O(1) MIMD Powershift on a Hypercube

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M Pearson Education Inc. All rights reserved. Solving a General System of Linear Equations where x 0, x 1, x 2, … x n-1 are the unknowns By rearranging the i th equation: x i is expressed in terms of the other unknowns This can be used as an iteration formula for each of the unknowns to obtain better approximations.

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M Pearson Education Inc. All rights reserved. 12 Jacobi Iterative Method

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M Pearson Education Inc. All rights reserved. Parallel Jacobi iterations P0P0 P1P1 P n-1 Jacobi method will converge if diagonal value a ii (  i, 0 ≤ i < n) has an absolute value greater than the sum of the absolute values of the other a ij ’s on the same row. Then A matrix is called diagonally dominant: If P << n, how do you do the data & task partitioning?

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M Pearson Education Inc. All rights reserved. Compare values computed in one iteration to values obtained from the previous iteration. Terminate computation when all values are within given tolerance: or: In either case, you need a ‘global sum’ (MPI_Reduce) operation. Q: Do you need to execute it after each and every iteration ? Termination

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M Pearson Education Inc. All rights reserved. Parallel Code Process Pi could be of the form x[i] = b[i]; /*initialize unknown*/ for (iter = 0; iter < limit; iter++) { sum = -a[i][i] * x[i]; for (j = 0; j < n; j++) /* compute summation */ sum = sum + a[i][j] * x[j]; new_x[i] = (b[i] - sum) / a[i][i]; /* compute unknown */ All-to-All-Broadcast(&new_x[i]); /*bcast/rec values */ Global_barrier(); /* wait for all procs */ }

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M Pearson Education Inc. All rights reserved. The problem space is divided into cells. Each cell can be in one of a finite number of states. Cells affected by their neighbors according to certain rules, and all cells are affected simultaneously in a “generation.” Rules re-applied in subsequent generations so that cells evolve, or change state, from generation to generation. Most famous cellular automata is the “Game of Life” devised by John Horton Conway, a Cambridge mathematician. Other fully synchronous problems Cellular Automata

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M Pearson Education Inc. All rights reserved. Board game - theoretically infinite 2-dimensional array of cells. Each cell can hold one “organism” and has eight neighboring cells. Initially, some cells occupied. The following rules were derived by Conway after a long period of experimentation: 1. Every organism with two or three neighboring organisms survives for the next generation. 2. Every organism with four or more neighbors dies from overpopulation. 3. Every organism with one neighbor or none dies from isolation. 4. Each empty cell adjacent to exactly three occupied neighbors will give birth to an organism. The Game of Life

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M Pearson Education Inc. All rights reserved. Fish Might move around according to these rules: 1.If there is one empty adjacent cell, the fish moves to this cell. 2.If there is more than one empty adjacent cell, the fish moves to one cell chosen at random. 3.If there are no empty adjacent cells, the fish stays where it is. 4.If the fish moves and has reached its breeding age, it gives birth to a baby fish, which is left in the vacating cell. 5.Fish die after x generations. Simple Fun Examples of Cellular Automata “Sharks and Fishes” An ocean could be modeled as a 3-dimensional array of cells. Each cell can hold one fish or one shark (but not both).

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M Pearson Education Inc. All rights reserved. Sharks Might be governed by the following rules: 1.If one adjacent cell is occupied by a fish, the shark moves to this cell and eats the fish. 2.If more than one adjacent cell is occupied by a fish, the shark chooses one fish at random, moves to the cell occupied by the fish, and eats the fish. 3.If no fish are in adjacent cells, the shark chooses an unoccupied adjacent cell to move to in a similar manner as fish move. 4.If the shark moves and has reached its breeding age, it gives birth to a baby shark, which is left in the vacating cell. 5.If a shark has not eaten for y generations, it dies.