Grid solver example Gauss-Seidel (near-neighbor) sweeps to convergence  interior n-by-n points of (n+2)-by-(n+2) updated in each sweep  updates done.

Slides:



Advertisements
Similar presentations
Parallel Jacobi Algorithm Steven Dong Applied Mathematics.
Advertisements

Sharks and Fishes – The problem
Dense Matrix Algorithms. Topic Overview Matrix-Vector Multiplication Matrix-Matrix Multiplication Solving a System of Linear Equations.
Perspective on Parallel Programming Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals.
Numerical Algorithms ITCS 4/5145 Parallel Computing UNC-Charlotte, B. Wilkinson, 2009.
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Parallel Programming in C with MPI and OpenMP Michael J. Quinn.
Game of Life Courtesy: Dr. David Walker, Cardiff University.
Numerical Algorithms Matrix multiplication
Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers Chapter 11: Numerical Algorithms Sec 11.2: Implementing.
1 Parallelization of An Example Program Examine a simplified version of a piece of Ocean simulation Iterative equation solver Illustrate parallel program.
Numerical Algorithms • Matrix multiplication
Efficient Parallelization for AMR MHD Multiphysics Calculations Implementation in AstroBEAR Collaborators: Adam Frank Brandon Shroyer Chen Ding Shule Li.
ECE669 L4: Parallel Applications February 10, 2004 ECE 669 Parallel Computer Architecture Lecture 4 Parallel Applications.
Examples of Two- Dimensional Systolic Arrays. Obvious Matrix Multiply Rows of a distributed to each PE in row. Columns of b distributed to each PE in.
ECE669 L5: Grid Computations February 12, 2004 ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations.
1 Parallel Programs. 2 Why Bother with Programs? They’re what runs on the machines we design Helps make design decisions Helps evaluate systems tradeoffs.
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Parallel Programming in C with MPI and OpenMP Michael J. Quinn.
Efficient Parallelization for AMR MHD Multiphysics Calculations Implementation in AstroBEAR.
1 Steps in Creating a Parallel Program 4 steps: Decomposition, Assignment, Orchestration, Mapping Done by programmer or system software (compiler, runtime,...)
NAE C_S2001 FINAL PROJECT PRESENTATION LASIC ISMAR 04/12/01 INSTRUCTOR: PROF. GUTIERREZ.
1 Steps in Creating a Parallel Program 4 steps: Decomposition, Assignment, Orchestration, Mapping Done by programmer or system software (compiler, runtime,...)
Chapter 5, CLR Textbook Algorithms on Grids of Processors.
1 Matrix Addition, C = A + B Add corresponding elements of each matrix to form elements of result matrix. Given elements of A as a i,j and elements of.
ECE669 L6: Programming for Performance February 17, 2004 ECE 669 Parallel Computer Architecture Lecture 6 Programming for Performance.
Components and Concurrency in ESMF Nancy Collins Community Meeting July 21, GMAO Seasonal.
L15: Putting it together: N-body (Ch. 6) October 30, 2012.
Parallel Programming Models Jihad El-Sana These slides are based on the book: Introduction to Parallel Computing, Blaise Barney, Lawrence Livermore National.
Lecture 8 – Stencil Pattern Stencil Pattern Parallel Computing CIS 410/510 Department of Computer and Information Science.
9/20/2015 slide 1 PCOD: Lecture 1 Per Stenström © 2008, Sally A. McKee © credit points Instructor: Sally A. McKee.
Multi-Dimensional Arrays
Computational issues in Carbon nanotube simulation Ashok Srinivasan Department of Computer Science Florida State University.
Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies.
1 High-Performance Grid Computing and Research Networking Presented by Xing Hang Instructor: S. Masoud Sadjadi
1 Programming for Performance Reference: Chapter 3, Parallel Computer Architecture, Culler et.al.
1 Lecture 2: Parallel Programs Topics: parallel applications, parallelization process, consistency models.
Graph Algorithms. Definitions and Representation An undirected graph G is a pair (V,E), where V is a finite set of points called vertices and E is a finite.
SE-292 High Performance Computing Intro. to Parallelization Sathish Vadhiyar.
Chapter 3 Parallel Programming Models. Abstraction Machine Level – Looks at hardware, OS, buffers Architectural models – Looks at interconnection network,
Chapter 3 Shared Memory Parallel Programming Yan Solihin Copyright notice: No part of this publication may be reproduced, stored.
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Parallel graph algorithms Antonio-Gabriel Sturzu, SCPD Adela Diana Almasi, SCPD Adela Diana Almasi, SCPD Iulia Alexandra Floroiu, ISI Iulia Alexandra Floroiu,
Parallelizing Gauss-Seidel Solver/Pre-conditioner Aim: To parallelize a Gauss-Seidel Solver, which can be used as a pre-conditioner for the finite element.
Lecture 4 TTH 03:30AM-04:45PM Dr. Jianjun Hu CSCE569 Parallel Computing University of South Carolina Department of.
CS 484 Designing Parallel Algorithms Designing a parallel algorithm is not easy. There is no recipe or magical ingredient Except creativity We can benefit.
Steps in Creating a Parallel Program
CS 484. Iterative Methods n Gaussian elimination is considered to be a direct method to solve a system. n An indirect method produces a sequence of values.
Finding concurrency Jakub Yaghob. Finding concurrency design space Starting point for design of a parallel solution Analysis The patterns will help identify.
Lecture 3: Designing Parallel Programs. Methodological Design Designing and Building Parallel Programs by Ian Foster www-unix.mcs.anl.gov/dbpp.
Programming assignment # 3 Numerical Methods for PDEs Spring 2007 Jim E. Jones.
Numerical Algorithms Chapter 11.
Steps in Creating a Parallel Program
Steps in Creating a Parallel Program
Perspective on Parallel Programming
Steps in Creating a Parallel Program
Steps in Creating a Parallel Program
Steps in Creating a Parallel Program
Lecture 2: Parallel Programs
Steps in Creating a Parallel Program
Steps in Creating a Parallel Program
Parallelization of An Example Program
Steps in Creating a Parallel Program
Parallel build blocks.
Stencil Pattern ITCS 4/5145 Parallel computing, UNC-Charlotte, B. Wilkinson Oct 14, 2014 slides6b.ppt 1.
Steps in Creating a Parallel Program
Stencil Pattern ITCS 4/5145 Parallel computing, UNC-Charlotte, B. Wilkinson StencilPattern.ppt Oct 14,
To accompany the text “Introduction to Parallel Computing”,
Matrix Addition, C = A + B Add corresponding elements of each matrix to form elements of result matrix. Given elements of A as ai,j and elements of B as.
Parallel Programming in C with MPI and OpenMP
Programming assignment #1 Solving an elliptic PDE using finite differences Numerical Methods for PDEs Spring 2007 Jim E. Jones.
Steps in Creating a Parallel Program
Presentation transcript:

Grid solver example Gauss-Seidel (near-neighbor) sweeps to convergence  interior n-by-n points of (n+2)-by-(n+2) updated in each sweep  updates done in-place in grid, and diff. from prev. value computed  accumulate partial diffs into global diff at end of every sweep  check if error has converged (to within a tolerance parameter)  if so, exit solver; if not, do another sweep 1

Sequential Version 2

Concurrency O(n) along anti-diagonals, serialization O(n) along diag. Retain loop structure, use pt-to-pt synch; Problem: too many synch ops. Restructure loops, use global synch; imbalance and too much synch 3 First Parallel Version

Exploit application knowledge Reorder grid traversal: red-black ordering Different ordering of updates: may converge quicker or slower Red sweep and black sweep are each fully parallel Global synch between them (conservative but convenient) Ocean uses red-black; we use simpler, asynchronous one to illustrate  no red-black, simply ignore dependences within sweep  sequential order same as original, parallel program nondeterministic 4

Assignment Static assignments (given decomposition into rows)  block assignment of rows: Row i is assigned to process floor(i/p)  cyclic assignment of rows: process i is assigned rows i, i+p, and so on Dynamic assignment  get a row index, work on the row, get a new row, and so on Static assignment into rows reduces concurrency (from n to p)  block assign. reduces communication by keeping adjacent rows together 5

MATRIX MULTIPLICATION The product of an m x n matrix A by an n x k matrix B is an m x k matrix C. The pseudo code is : procedure matrix_multiplication(A, B, C) is for i = 1 to m do for j = 1 to k do c ij = 0; for s = 1 to n do c ij = c ij + (a is * b sj ); end for; This procedure takes m*k*n steps.

MATRIX MULTIPLICATION Mesh Matrix multiplication:

MATRIX MULTIPLICATION Mesh Matrix Multiplication: procedure mesh_matrix_multiplication(A, B, C) for i = 1 to m do in parallel for j = 1 to k do in parallel c ij = 0; while P(i, j) receives 2 inputs a and b do c ij = c ij + (a * b); if i < m then send b to P(i +1, j ) end if; if j<k then send a to P(i, j+1); end if; end while; end for; This procedure takes max(m,k)+n steps (latency).