ECE 669 Parallel Computer Architecture, Lecture 4: Parallel Applications (February 10, 2004)

Outline
° Motivating problems (application case studies)
° Classifying problems
° Parallelizing applications
° Examining tradeoffs
° Understanding communication costs
Remember: software and communication!

Simulating Ocean Currents
° Model as two-dimensional grids
  - Discretize in space and time
  - Finer spatial and temporal resolution => greater accuracy
° Many different computations per time step
  - Set up and solve equations
  - Concurrency across and within grid computations
° Static and regular
[Figure: (a) cross sections; (b) spatial discretization of a cross section]

Creating a Parallel Program
° Pieces of the job:
  - Identify work that can be done in parallel (work includes computation, data access, and I/O)
  - Partition work, and perhaps data, among processes
  - Manage data access, communication, and synchronization
° Simplification: how to represent a big problem using simple computation and communication
° Identifying the limiting factor
° Later: balancing resources

Steps in Creating a Parallel Program
° Decomposition of the computation into tasks
° Assignment of tasks to processes
° Orchestration of data access, communication, and synchronization
° Mapping of processes to processors

Decomposition
° Identify concurrency and decide the level at which to exploit it
° Break up the computation into tasks to be divided among processors
  - Tasks may become available dynamically
  - The number of available tasks may vary with time
° Goal: enough tasks to keep processors busy, but not too many
  - The number of tasks available at a time is an upper bound on achievable speedup

Limited Concurrency: Amdahl's Law
° The most fundamental limitation on parallel speedup
° If a fraction s of sequential execution is inherently serial, speedup <= 1/s
° Example: 2-phase calculation
  - Sweep over an n-by-n grid and do some independent computation
  - Sweep again and add each value to a global sum
° Time for the first phase = n^2/p
° The second phase is serialized at the global variable, so its time = n^2
° Speedup <= 2n^2 / (n^2/p + n^2), or at most 2
° Trick: divide the second phase into two
  - Accumulate into a private sum during the sweep
  - Add each per-process private sum into the global sum
° Parallel time is then n^2/p + n^2/p + p, and speedup is at best 2n^2 p / (2n^2 + p^2)
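The 2-phase example above can be checked numerically. This is a minimal sketch with hypothetical function names, using only the cost expressions from the slide (n^2/p per sweep, n^2 for the serialized sum, p for the final private-sum additions):

```python
def two_phase_time(n, p, private_sums):
    """Parallel time for the slide's 2-phase calculation on p processors.

    Phase 1: independent sweep over an n-by-n grid -> n^2/p.
    Phase 2: either serialized at the global sum (n^2), or done with
    per-process private sums (n^2/p) plus p additions into the global sum.
    """
    if private_sums:
        return n * n / p + n * n / p + p
    return n * n / p + n * n

def speedup(n, p, private_sums=False):
    serial_time = 2 * n * n  # both sweeps on one processor
    return serial_time / two_phase_time(n, p, private_sums)

# With the global sum serialized, speedup stays below 2;
# with private sums it approaches p.
print(speedup(1000, 100))                     # ~1.98
print(speedup(1000, 100, private_sums=True))  # ~99.5
```

Running it for n = 1000 and p = 100 shows the slide's point: the serialized version is capped near 2 no matter how many processors are used, while the private-sum version is nearly linear.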

Understanding Amdahl's Law
[Figure: concurrency profiles (a), (b), (c) for the three versions of the 2-phase calculation; vertical axis is work done concurrently (levels 1, p, n^2/p), horizontal axis is time (segments of length n^2/p and n^2)]

Concurrency Profiles
° Area under the curve is the total work done, or the time with 1 processor
° Horizontal extent is a lower bound on time (infinite processors)
° Speedup is the ratio: Speedup(p) = (sum over k of f_k * k) / (sum over k of f_k * ceil(k/p)), where f_k is the time spent at concurrency level k; base case: 1 / (s + (1-s)/p)
° Amdahl's law applies to any overhead, not just limited concurrency
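The speedup ratio above can be evaluated directly from a concurrency profile. A small sketch (the dict-based profile representation is an illustrative choice, not from the slides):

```python
import math

def profile_speedup(profile, p):
    """Speedup from a concurrency profile.

    profile[k] = f_k, the time (on 1 processor) spent at concurrency
    level k. Total work is sum(f_k * k); time on p processors is
    bounded below by sum(f_k * ceil(k / p)).
    """
    work = sum(f * k for k, f in profile.items())
    time_p = sum(f * math.ceil(k / p) for k, f in profile.items())
    return work / time_p

# A program that runs serially for 10 time units and 100-wide for 90:
profile = {1: 10, 100: 90}
print(profile_speedup(profile, 100))  # 90.1
```

With 100 processors the serial segment dominates the remaining time, exactly the Amdahl effect: total work is 9010 units but time cannot drop below 100 units.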

Applications
° Classes of problems
  - Continuum
  - Particle
  - Graph, combinatorial
° Goal: demystifying
° Differential equations ---> parallel program

Particle Problems
° Simulate the interactions of many particles evolving over time
° Computing forces is expensive
  - Locality methods take advantage of the force law: F = G m1 m2 / r^2
° Many time-steps; plenty of concurrency across stars within one
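As a sketch of the direct (pre-locality-method) approach, the following hypothetical helper evaluates all pairwise forces from the law above, restricted to one dimension for brevity. This is the O(n^2) baseline that locality methods such as tree codes improve on:

```python
def gravitational_forces(masses, positions, G=6.674e-11):
    """Direct all-pairs force computation (1-D positions for brevity).

    Each pair contributes G * m1 * m2 / r^2, directed toward the other
    body. Cost is O(n^2) per time step; locality-based methods
    approximate the effect of distant groups to reduce this.
    """
    n = len(masses)
    forces = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r = positions[j] - positions[i]
            sign = 1.0 if r > 0 else -1.0
            forces[i] += sign * G * masses[i] * masses[j] / (r * r)
    return forces

# Two 1000 kg bodies one metre apart attract each other equally:
forces = gravitational_forces([1e3, 1e3], [0.0, 1.0])
```

Every particle's force sum is independent of the others, which is the source of the concurrency within one time step that the slide mentions.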

Graph Problems
° Traveling salesman
° Network flow
° Dynamic programming
° Searching, sorting, lists, ...
° Generally unstructured

Continuous Systems
° Hyperbolic
° Parabolic
° Elliptic
° Examples:
  - Heat diffusion
  - Electrostatic potential
  - Electromagnetic waves
° General form: ∇²A = B
  - Laplace: B is zero
  - Poisson: B is non-zero

Numerical Solutions
° Let's do finite differences first
° To solve:
  - Discretize (finite difference methods or finite element methods; both result in a system of equations)
  - Form the system of equations
  - Solve ---> direct methods, or indirect (iterative) methods

Discretize
° Time: forward difference
  - dA/dt ≈ (A^{n+1} - A^n) / Δt, where n indexes time steps
° Space, 1st derivative:
  - dA/dx ≈ (A_{i+1} - A_i) / Δx, where i indexes grid points
° Space, 2nd derivative:
  - d²A/dx² ≈ (A_{i+1} - 2 A_i + A_{i-1}) / Δx²
° Can use other discretizations
  - Backward
  - Leapfrog
[Figure: space-time grid with grid points A_11, A_12, ..., boundary conditions along the spatial edges, and time levels n-2, n-1, n]
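The difference formulas above can be sanity-checked numerically against a function whose derivatives are known. A minimal sketch using only the standard library:

```python
import math

def forward_diff(f, x, h):
    # 1st-order forward difference: f'(x) ~ (f(x+h) - f(x)) / h
    return (f(x + h) - f(x)) / h

def second_diff(f, x, h):
    # Centered 2nd-derivative stencil: (f(x+h) - 2 f(x) + f(x-h)) / h^2
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

# For f = sin: f'(1) = cos(1), f''(1) = -sin(1)
h = 1e-4
print(forward_diff(math.sin, 1.0, h))  # ~cos(1) = 0.5403...
print(second_diff(math.sin, 1.0, h))   # ~-sin(1) = -0.8414...
```

Shrinking h makes the approximations converge (until floating-point cancellation takes over), which is the "finer resolution => greater accuracy" tradeoff from the ocean-currents slide.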

1-D Case
° (A_i^{n+1} - A_i^n) / Δt = (1/Δx²) (A_{i+1}^n - 2 A_i^n + A_{i-1}^n) - B_i
° Or, at steady state: 0 = (1/Δx²) (A_{i+1} - 2 A_i + A_{i-1}) - B_i
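The 1-D update can be sketched as an explicit time-stepper. This is illustrative code, not from the lecture; it assumes fixed endpoint boundary conditions and a stable step size (dt/dx^2 <= 1/2):

```python
def step_1d(A, dt, dx, B):
    """One explicit step of the slide's 1-D update:
    A_i <- A_i + dt * ((A_{i+1} - 2 A_i + A_{i-1}) / dx^2 - B_i).
    Endpoints are held fixed as boundary conditions.
    """
    new = A[:]
    for i in range(1, len(A) - 1):
        new[i] = A[i] + dt * ((A[i + 1] - 2 * A[i] + A[i - 1]) / (dx * dx) - B[i])
    return new

# Laplace's equation (B = 0) with boundary values 0 and 1:
A = [0.0, 0.0, 0.0, 0.0, 1.0]
B = [0.0] * 5
for _ in range(2000):
    A = step_1d(A, dt=0.1, dx=1.0, B=B)
# The steady state in 1-D is linear: 0, 0.25, 0.5, 0.75, 1
```

Each interior point needs only its two neighbors, so in a parallel version a processor owning a contiguous block of points communicates just its two edge values per step.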

Poisson's Equation
° For each interior grid point: (1/Δx²) (A_{i+1} - 2 A_i + A_{i-1}) = B_i
° Collecting the equations for all grid points gives a linear system: Ax = b

2-D Case
° Unknowns A_11, A_12, ..., A_21, ... on an n-by-n grid, one equation per grid point
° What is the form of this matrix?
[Figure: 2-D grid of unknowns and the resulting system matrix]
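To answer the slide's question concretely, one can assemble the matrix for the standard 5-point stencil and inspect it: it is block tridiagonal (pentadiagonal in natural row-by-row ordering). A sketch; the function name and dense storage are illustrative choices:

```python
def poisson_matrix(n):
    """Dense 5-point Laplacian matrix for an n-by-n interior grid,
    unknowns ordered row by row (natural ordering). The result is
    block tridiagonal: 4 on the diagonal, -1 for each grid neighbor.
    """
    N = n * n
    A = [[0.0] * N for _ in range(N)]
    for i in range(n):
        for j in range(n):
            k = i * n + j
            A[k][k] = 4.0
            if j > 0:
                A[k][k - 1] = -1.0   # left neighbor
            if j < n - 1:
                A[k][k + 1] = -1.0   # right neighbor
            if i > 0:
                A[k][k - n] = -1.0   # neighbor in the row below
            if i < n - 1:
                A[k][k + n] = -1.0   # neighbor in the row above
    return A

A = poisson_matrix(3)  # 9x9 matrix for a 3x3 grid of unknowns
```

Note that the -1 entries sit on the first and n-th off-diagonals only; the matrix is extremely sparse, which is why iterative methods that never form it explicitly are attractive.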

Current Status
° We saw how to set up a system of equations
° How to solve them
  - Direct
  - Iterative: Jacobi, ..., multigrid, ...
° Poisson: ∇²A = B (or B = 0 for Laplace)
° Basic idea of iterative methods: iterate until successive iterates no longer differ
  - The ultimate parallel method

In Matrix Notation
° Set up a system of equations: Ax = b
° Now, solve
° Direct methods: solve Ax = b directly
  - Gaussian elimination, LU, recursive doubling
° Semi-direct: conjugate gradient (CG)
° Iterative methods: solve iteratively
  - Jacobi, multigrid (MG)
° Deriving an iterative method by splitting:
  - Ax = b
  - 0 = -Ax + b
  - Mx = Mx - Ax + b
  - Mx = (M - A) x + b
  - M x^{k+1} = (M - A) x^k + b
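Taking M = diag(A) in the splitting above gives the Jacobi method. A minimal sketch (illustrative, not from the slides; assumes nonzero diagonal and a convergent matrix such as the diagonally dominant Laplacian):

```python
def jacobi(A, b, iters=200):
    """Jacobi iteration from the splitting M x^{k+1} = (M - A) x^k + b
    with M = diag(A): x_i <- (b_i - sum_{j != i} a_ij * x_j) / a_ii.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        # Every component uses only the previous iterate, so all n
        # updates are independent -- the source of Jacobi's parallelism.
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

# 1-D Laplacian with boundary values folded into b; exact solution
# is the linear profile 0.25, 0.5, 0.75:
A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
b = [0.0, 0.0, 1.0]
x = jacobi(A, b)
```

Because each new x_i depends only on the old iterate, the whole update is one embarrassingly parallel sweep per iteration, at the price of slower convergence than methods such as Gauss-Seidel or multigrid.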

Machine Model
° Data is distributed among the memories (ignore initial I/O costs)
° Communication over the network is explicit
° A processor can compute only on data in its local memory
° To effect communication, a processor sends data to another node (writes into the other memory)
[Figure: processors P, each with a local memory M, connected by an interconnection network]
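The model above can be mimicked in miniature with threads and explicit message queues; this is a sketch of the send/receive discipline, not real message-passing hardware or MPI, and all names are illustrative:

```python
import threading
import queue

def node(inbox, outbox):
    """A 'node' computes only on data in its local variables;
    all communication is an explicit send into the other node's queue."""
    local = inbox.get()      # receive data sent by the other node
    outbox.put(local * local)  # send the result back

to_node = queue.Queue()
from_node = queue.Queue()
t = threading.Thread(target=node, args=(to_node, from_node))
t.start()
to_node.put(7)             # explicit send of data to the node
result = from_node.get()   # explicit receive of its result
t.join()
```

The key property the sketch preserves is that no state is shared implicitly: every value a node computes on either started local or arrived through an explicit communication.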

Summary
° Many types of parallel applications
  - Attempt to specify them as classes (graph, particle, continuum)
° We examine continuum problems as a series of finite differences
° Partition in space and time
° Distribute computation to processors
° Understand processing and communication tradeoffs