ECE669 L5: Grid Computations February 12, 2004 ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations.

Slides:



Advertisements
Similar presentations
Parallel Jacobi Algorithm Steven Dong Applied Mathematics.
Advertisements

ECE669 L3: Design Issues February 5, 2004 ECE 669 Parallel Computer Architecture Lecture 3 Design Issues.
CSCI-455/552 Introduction to High Performance Computing Lecture 26.
Numerical Algorithms Matrix multiplication
1 Parallelization of An Example Program Examine a simplified version of a piece of Ocean simulation Iterative equation solver Illustrate parallel program.
Grid solver example Gauss-Seidel (near-neighbor) sweeps to convergence  interior n-by-n points of (n+2)-by-(n+2) updated in each sweep  updates done.
CISC October Goals for today: Foster’s parallel algorithm design –Partitioning –Task dependency graph Granularity Concurrency Collective communication.
Revisiting a slide from the syllabus: CS 525 will cover Parallel and distributed computing architectures – Shared memory processors – Distributed memory.
ECE669 L4: Parallel Applications February 10, 2004 ECE 669 Parallel Computer Architecture Lecture 4 Parallel Applications.
ECE669 L15: Mid-term Review March 25, 2004 ECE 669 Parallel Computer Architecture Lecture 15 Mid-term Review.
ECE669 L11: Static Routing Architectures March 4, 2004 ECE 669 Parallel Computer Architecture Lecture 11 Static Routing Architectures.
1 Multiprocessors. 2 Idea: create powerful computers by connecting many smaller ones good news: works for timesharing (better than supercomputer) bad.
Hardware/Software Performance Tradeoffs (plus Msg Passing Finish) CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley.
1 Parallel Programs. 2 Why Bother with Programs? They’re what runs on the machines we design Helps make design decisions Helps evaluate systems tradeoffs.
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Parallel Programming in C with MPI and OpenMP Michael J. Quinn.
Efficient Parallelization for AMR MHD Multiphysics Calculations Implementation in AstroBEAR.
1 Steps in Creating a Parallel Program 4 steps: Decomposition, Assignment, Orchestration, Mapping Done by programmer or system software (compiler, runtime,...)
NAE C_S2001 FINAL PROJECT PRESENTATION LASIC ISMAR 04/12/01 INSTRUCTOR: PROF. GUTIERREZ.
1 Steps in Creating a Parallel Program 4 steps: Decomposition, Assignment, Orchestration, Mapping Done by programmer or system software (compiler, runtime,...)
Implications for Programming Models Todd C. Mowry CS 495 September 12, 2002.
Chapter 13 Finite Difference Methods: Outline Solving ordinary and partial differential equations Finite difference methods (FDM) vs Finite Element Methods.
ECE669 L23: Parallel Compilation April 29, 2004 ECE 669 Parallel Computer Architecture Lecture 23 Parallel Compilation.
ECE669 L6: Programming for Performance February 17, 2004 ECE 669 Parallel Computer Architecture Lecture 6 Programming for Performance.
Parallel Programming: Case Studies Todd C. Mowry CS 495 September 12, 2002.
L15: Putting it together: N-body (Ch. 6) October 30, 2012.
© 2009 Matthew J. Sottile, Timothy G. Mattson, and Craig E Rasmussen 1 Concurrency in Programming Languages Matthew J. Sottile Timothy G. Mattson Craig.
Parallel Programming Models Jihad El-Sana These slides are based on the book: Introduction to Parallel Computing, Blaise Barney, Lawrence Livermore National.
Lecture 8 – Stencil Pattern Stencil Pattern Parallel Computing CIS 410/510 Department of Computer and Information Science.
Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies.
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Parallel Programming in C with MPI and OpenMP Michael J. Quinn.
Concurrency: Mutual Exclusion and Synchronization Chapter 5.
1 Lecture 2: Parallel Programs Topics: parallel applications, parallelization process, consistency models.
SE-292 High Performance Computing Intro. to Parallelization Sathish Vadhiyar.
09/21/2010CS4961 CS4961 Parallel Programming Lecture 9: Red/Blue and Introduction to Data Locality Mary Hall September 21,
1 M. Tudruj, J. Borkowski, D. Kopanski Inter-Application Control Through Global States Monitoring On a Grid Polish-Japanese Institute of Information Technology,
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 Basic Parallel Programming Concepts Computational.
Concurrency: Mutual Exclusion and Synchronization Chapter 5.
Lecture 4 TTH 03:30AM-04:45PM Dr. Jianjun Hu CSCE569 Parallel Computing University of South Carolina Department of.
CS 484 Designing Parallel Algorithms Designing a parallel algorithm is not easy. There is no recipe or magical ingredient Except creativity We can benefit.
Steps in Creating a Parallel Program
Chapter 5 Concurrency: Mutual Exclusion and Synchronization Operating Systems: Internals and Design Principles, 6/E William Stallings Patricia Roy Manatee.
Parallel and Distributed Simulation Time Parallel Simulation.
3/12/2013Computer Engg, IIT(BHU)1 PARALLEL COMPUTERS- 2.
April 24, 2002 Parallel Port Example. April 24, 2002 Introduction The objective of this lecture is to go over a simple problem that illustrates the use.
Parallel Computing Presented by Justin Reschke
Autumn 2006CSE P548 - Dataflow Machines1 Von Neumann Execution Model Fetch: send PC to memory transfer instruction from memory to CPU increment PC Decode.
Department of Computer Science, Johns Hopkins University Lecture 7 Finding Concurrency EN /420 Instructor: Randal Burns 26 February 2014.
Parallel programs Inf-2202 Concurrent and Data-intensive Programming Fall 2016 Lars Ailo Bongo
CS5102 High Performance Computer Systems Thread-Level Parallelism
Parallel Programming By J. H. Wang May 2, 2017.
Steps in Creating a Parallel Program
Steps in Creating a Parallel Program
Perspective on Parallel Programming
Steps in Creating a Parallel Program
Steps in Creating a Parallel Program
Steps in Creating a Parallel Program
Lecture 2: Parallel Programs
Steps in Creating a Parallel Program
Steps in Creating a Parallel Program
Parallelization of An Example Program
Stencil Quiz questions
Stencil Quiz questions
Steps in Creating a Parallel Program
Stencil Pattern ITCS 4/5145 Parallel computing, UNC-Charlotte, B. Wilkinson Oct 14, 2014 slides6b.ppt 1.
Steps in Creating a Parallel Program
Introduction to High Performance Computing Lecture 16
Stencil Pattern ITCS 4/5145 Parallel computing, UNC-Charlotte, B. Wilkinson StencilPattern.ppt Oct 14,
Parallel Programming in C with MPI and OpenMP
Steps in Creating a Parallel Program
Jianmin Chen, Zhuo Huang, Feiqi Su, Jih-Kwon Peir and Jeff Ho
Presentation transcript:

ECE669 L5: Grid Computations February 12, 2004 ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations

ECE669 L5: Grid Computations February 12, 2004 Outline °Motivating Problems (application case studies) °Classifying problems °Parallelizing applications °Examining tradeoffs °Understanding communication costs Remember: software and communication!

ECE669 L5: Grid Computations February 12, 2004 Current status °We saw how to set up a system of equations °How to solve them °Poisson: Basic idea °In iterative methods Iterate till no difference The ultimate parallel method Iterative Direct Jacobi,... Multigrid.... Or 0 for Laplace

ECE669 L5: Grid Computations February 12, 2004 Examining Optimizations Optimizations SOR Gauss-Seidel Use recent values ASAP Slow! Jacobi relaxation n O(n 2 )

ECE669 L5: Grid Computations February 12, 2004 Parallel Implementation °Q: How would you partition the problem? Say, on 4 processors? °Communication! °What about synchronization? i: j:

ECE669 L5: Grid Computations February 12, 2004 Machine model Data is distributed among memories (ignore initial I/O costs) Communication over network-explicit Processor can compute only on data in local memory. -To effect communication, processor sends data to other node Interconnection network M P MM P P

ECE669 L5: Grid Computations February 12, 2004 Turbo charging – Iterative methods °SOR: Successive Over Relaxation Accelerate towards direction of change Difference (new-old) 0(n 2 ) n

ECE669 L5: Grid Computations February 12, 2004 °Basic idea ---> Solve on coarse grid ---> then on fine grid In practice -- solve for errors in next finer grid. But communication and computation patterns stay the same. Multigrid 8, 1 1, 1 8, 8 1, 8 X k+1

ECE669 L5: Grid Computations February 12, 2004 ° Multigrid Basic idea ---> Solve on coarse grid ---> then on fine grid 8, 1 1, 1 8, 8 1, 8 X k+1

ECE669 L5: Grid Computations February 12, , 1 1, 1 8, 8 1, 8 Basic idea ---> Solve on coarse grid ---> then on fine grid (Note: In practice -- solve for errors in next finer grid. But communication and computation patterns stay the same.) 8, 1 1, 1 8, 8 1, 8 ° Multigrid X k+1 i, j

ECE669 L5: Grid Computations February 12, 2004 Example: iterative equation solver °Simplified version of a piece of Ocean simulation °Illustrate program in low-level parallel language C-like pseudocode with simple extensions for parallelism Expose basic comm. and synch. primitives State of most real parallel programming today

ECE669 L5: Grid Computations February 12, 2004 Grid Solver °Gauss-Seidel (near-neighbor) sweeps to convergence Interior n-by-n points of (n+2)-by-(n+2) updated in each sweep Updates done in-place in grid Difference from previous value computed Check if has converged -to within a tolerance parameter

ECE669 L5: Grid Computations February 12, 2004 Exploit Application Knowledge Different ordering of updates: may converge quicker or slower Red sweep and black sweep are each fully parallel: Global synchronization between them Reorder grid traversal: red-black ordering

ECE669 L5: Grid Computations February 12, 2004 Point to Point Event Synchronization °One process notifies another of an event so it can proceed Common example: producer-consumer (bounded buffer) Concurrent programming on uniprocessor: semaphores Shared address space parallel programs: semaphores, or use ordinary variables as flags Busy-waiting or spinning P 1 P 2 A = 1; a: while (flag is 0) do nothing; b: flag = 1; print A;

ECE669 L5: Grid Computations February 12, 2004 Group Event Synchronization °Subset of processes involved Can use flags or barriers (involving only the subset) Concept of producers and consumers °Major types: Single-producer, multiple-consumer Multiple-producer, single-consumer

ECE669 L5: Grid Computations February 12, 2004 Message Passing Grid Solver °Cannot declare A to be global shared array compose it logically from per-process private arrays usually allocated in accordance with the assignment of work -process assigned a set of rows allocates them locally °Transfers of entire rows between traversals °Structurally similar to shared memory °Orchestration different data structures and data access/naming communication synchronization °Ghost rows

ECE669 L5: Grid Computations February 12, 2004 Data Layout and Orchestration P 0 P 1 P 2 P 4 P 0 P 2 P 4 P 1 Data partition allocated per processor Add ghost rows to hold boundary data Send edges to neighbors Receive into ghost rows Compute as in sequential program

ECE669 L5: Grid Computations February 12, 2004 Notes on Message Passing Program °Use of ghost rows °Communication done at beginning of iteration, so no asynchrony °Communication in whole rows, not element at a time °Core similar, but indices/bounds in local rather than global space °Synchronization through sends and receives Could implement locks and barriers with messages

ECE669 L5: Grid Computations February 12, 2004 Send and Receive Alternatives Affect event synch (mutual excl. by fiat: only one process touches data) Affect ease of programming and performance °Synchronous messages provide built-in synch. through match Can extend functionality: stride, scatter-gather, groups Semantic flavors: based on when control is returned Affect when data structures or buffers can be reused at either end Send/Receive SynchronousAsynchronous Blocking asynch.Nonblocking asynch.

ECE669 L5: Grid Computations February 12, 2004 Orchestration: Summary ° Shared address space Shared and private data explicitly separate Communication implicit in access patterns No correctness need for data distribution Synchronization via atomic operations on shared data Synchronization explicit and distinct from data communication ° Message passing Data distribution among local address spaces needed No explicit shared structures (implicit in comm. patterns) Communication is explicit Synchronization implicit in communication (at least in synch. case)

ECE669 L5: Grid Computations February 12, 2004 Correctness in Grid Solver Program °Decomposition and Assignment similar in SAS and message-passing °Orchestration is different Data structures, data access/naming, communication, synchronization Performance?

ECE669 L5: Grid Computations February 12, 2004 Summary °Several techniques to parallelizing grid problems °Specify as a series of difference relations °Use of currently computer values can speed convergence °Multigrid methods require specialized communication °Understanding shared memory and message passing constraints for grid computation Remember: software and communication!