Domain Decomposed Parallel Heat Distribution Problem in Two Dimensions
Yana Kortsarts, Jeff Rufinus
Widener University, Computer Science Department


Introduction
In 2004, the Office of Science in the Department of Energy issued a twenty-year strategic plan with seven highest priorities, ranging from fusion energy to genomics. To achieve the necessary levels of algorithmic and computational capability, it is essential to educate students in computation and computational techniques. Parallel computing is one of the most attractive topics in computational science.

Introductory Parallel Computing Course
Computer Science Department, Widener University; CS and CIS majors. A series of two courses: Introduction to Parallel Computing I and II. Resources: a computer cluster of six nodes; each node has two 2.4 GHz processors and 1 GB of memory, and the nodes are connected by a Gigabit Ethernet switch.

Course Curriculum
- Matrix manipulation
- Numerical simulation concepts: direct applications in science and engineering
- Introduction to MPI libraries and their applications
- Concepts of parallelism
- Finite difference method for the 2-D heat equation using a parallel algorithm

2-D Heat Distribution Problem
The problem: determine the temperature u(x, y, t) in an isotropic two-dimensional rectangular plate.
The model is the 2-D heat equation
$\frac{\partial u}{\partial t} = \alpha \left( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} \right)$
with given initial and boundary conditions, where $\alpha$ is the thermal diffusivity of the plate.

Finite Difference Method
The finite difference method begins with the discretization of space and time, so that there is an integer number of points in space and an integer number of times at which we calculate the temperature.
[Grid diagram: spatial steps $\Delta x$ and $\Delta y$ define the grid points $(x_i, y_j)$; the time step $\Delta t$ separates the time levels $t_k$ and $t_{k+1}$.]

We will use the notation $u_{i,j,k} \approx u(x_i, y_j, t_k)$ and the finite difference approximations for the derivatives:
$\frac{\partial u}{\partial t} \approx \frac{u_{i,j,k+1} - u_{i,j,k}}{\Delta t}$,
$\frac{\partial^2 u}{\partial x^2} \approx \frac{u_{i+1,j,k} - 2u_{i,j,k} + u_{i-1,j,k}}{(\Delta x)^2}$,
$\frac{\partial^2 u}{\partial y^2} \approx \frac{u_{i,j+1,k} - 2u_{i,j,k} + u_{i,j-1,k}}{(\Delta y)^2}$.
Expressing $u_{i,j,k+1}$ from this equation yields:
$u_{i,j,k+1} = u_{i,j,k} + \frac{\alpha \Delta t}{(\Delta x)^2}\,(u_{i+1,j,k} - 2u_{i,j,k} + u_{i-1,j,k}) + \frac{\alpha \Delta t}{(\Delta y)^2}\,(u_{i,j+1,k} - 2u_{i,j,k} + u_{i,j-1,k})$   (1)

Finite Difference Method: Explicit Scheme
[Stencil diagram: the new value $u_{i,j,k+1}$ is computed from the five current values $u_{i,j,k}$, $u_{i+1,j,k}$, $u_{i-1,j,k}$, $u_{i,j+1,k}$, $u_{i,j-1,k}$; the scheme advances from time level $k$ to $k+1$.]

Single Processor Implementation

double u_old[n+1][n+1], u_new[n+1][n+1];
/* initialize u_old with the initial values and boundary conditions */
for (k = 0; k < nsteps; k++) {                /* time points to compute */
    for (i = 1; i < n; i++)
        for (j = 1; j < n; j++)
            /* formula (1), with cx = alpha*dt/(dx*dx) and
               cy = alpha*dt/(dy*dy) */
            u_new[i][j] = u_old[i][j]
                + cx * (u_old[i+1][j] - 2*u_old[i][j] + u_old[i-1][j])
                + cy * (u_old[i][j+1] - 2*u_old[i][j] + u_old[i][j-1]);
    for (i = 1; i < n; i++)                   /* u_old <- u_new (interior) */
        for (j = 1; j < n; j++)
            u_old[i][j] = u_new[i][j];
}

Parallel Implementation: Domain Decomposition
Dividing the computation and data into pieces. The domain can be decomposed in three ways:
- Column-wise: adjacent groups of columns (A)
- Row-wise: adjacent groups of rows (B)
- Block-wise: adjacent two-dimensional blocks (C)

Domain Decomposition and Partition
Example: column-wise domain decomposition with 200 points to be calculated simultaneously on 4 processors; neighboring processors exchange boundary values with MPI_Send and MPI_Recv.
Processor 1: x_0 ... x_49
Processor 2: x_50 ... x_99
Processor 3: x_100 ... x_149
Processor 4: x_150 ... x_199
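The exchange itself is not reproduced in the transcript; the following is a minimal sketch of the ghost-column exchange that a column-wise decomposition needs, assuming each process stores its ncols local columns plus two ghost columns as u[ncols+2][N]. All names here (exchange_ghosts, ncols, N) are illustrative, not from the talk.

#include <mpi.h>

#define N 200                 /* grid points per column (illustrative) */

/* Exchange ghost columns with the left and right neighbors.
   u[0] and u[ncols+1] are ghost columns that receive the neighbors'
   boundary values; u[1] and u[ncols] are this process's own edges. */
void exchange_ghosts(double u[][N], int ncols, int rank, int nprocs)
{
    MPI_Status st;
    if (rank > 0) {                       /* left neighbor exists  */
        MPI_Send(u[1],         N, MPI_DOUBLE, rank - 1, 0, MPI_COMM_WORLD);
        MPI_Recv(u[0],         N, MPI_DOUBLE, rank - 1, 0, MPI_COMM_WORLD, &st);
    }
    if (rank < nprocs - 1) {              /* right neighbor exists */
        MPI_Recv(u[ncols + 1], N, MPI_DOUBLE, rank + 1, 0, MPI_COMM_WORLD, &st);
        MPI_Send(u[ncols],     N, MPI_DOUBLE, rank + 1, 0, MPI_COMM_WORLD);
    }
}

Ordering the calls this way (send left then receive left, receive right then send right) lets the exchange complete without deadlock even when MPI_Send blocks until the matching receive is posted.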

Load Imbalance
When dividing the data among processes we have to pay attention to how much work each processor receives. An uneven load distribution may cause some processes to finish earlier than others; load imbalance is one source of overhead. Good task mapping is needed: all tasks should be mapped onto processes as evenly as possible, so that all tasks complete in the shortest amount of time and idle time is minimized (a standard balanced partition formula is sketched below).
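One common way to achieve such an even mapping, used for example in Quinn's text (reference 4), is the block distribution in which process i of p receives elements $\lfloor in/p \rfloor$ through $\lfloor (i+1)n/p \rfloor - 1$. A sketch with illustrative names:

/* Balanced block distribution of n columns over p processes:
   process `rank` owns columns block_low .. block_high, and the
   per-process counts differ by at most one column. */
int block_low (int rank, int p, int n) { return (rank * n) / p; }
int block_high(int rank, int p, int n) { return ((rank + 1) * n) / p - 1; }
int block_size(int rank, int p, int n)
{
    return block_high(rank, p, n) - block_low(rank, p, n) + 1;
}

With n = 200 and p = 4 this reproduces the even 50-column split of the previous slide; when p does not divide n evenly, no process receives more than one extra column.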

Communication
Communication time depends on the latency and the bandwidth of the communication network, and both are much slower than the CPU's computation speed. There is therefore a penalty for using too many communications: each message pays the latency cost again.
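A standard model for the cost of a single message, found in the referenced parallel computing texts (a sketch, not taken from the talk itself):

$t_{\text{msg}} = t_s + m\, t_w$

where $t_s$ is the startup (latency) cost, $t_w$ is the transfer time per word, and $m$ is the message length in words. In the column-wise decomposition each interior process exchanges two columns of $M$ points per time step, so its per-iteration communication cost is roughly $2(t_s + M t_w)$. Sending many small messages instead of a few large ones multiplies the $t_s$ term, which is the catch mentioned above.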

Running Time and Speedup
The running time of one time iteration of the sequential algorithm is $\Theta(MN)$, where M and N are the numbers of grid points in each direction. The running time of one time iteration of the parallel algorithm is
computational time + communication time $= \Theta(MN/p) + B$,
where p is the number of processors and B is the total send/receive communication time required for one time iteration. The speedup is defined as
$S(p) = \dfrac{\text{sequential running time}}{\text{parallel running time on } p \text{ processors}}$.
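In practice these times can be measured by timing the solver with MPI_Wtime. A minimal sketch; solve, t_seq, and the function name are illustrative placeholders, not from the talk:

#include <mpi.h>
#include <stdio.h>

/* Time the parallel solver and report its speedup against a
   previously measured sequential running time t_seq. */
void report_speedup(void (*solve)(void), double t_seq, int rank)
{
    MPI_Barrier(MPI_COMM_WORLD);        /* start all processes together */
    double t0 = MPI_Wtime();
    solve();                            /* the time-stepping loop       */
    MPI_Barrier(MPI_COMM_WORLD);        /* wait for the slowest process */
    double t_par = MPI_Wtime() - t0;
    if (rank == 0)
        printf("T_par = %.3f s, speedup = %.2f\n", t_par, t_seq / t_par);
}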

Results
[Figure: the temperature distribution on the two-dimensional plate at a much later time.]

Results
Two cases were considered: M x N = 1,000 and M x N = 500,000. The next slide shows the speed-up versus the number of processors for the two inputs: 500,000 (the top chart) and 1,000 (the bottom chart). The dashed line marks a speed-up of one, i.e., the sequential version of the algorithm; the higher the speed-up at a given number of processors, the better the performance of the parallel algorithm. Most of the results come from the column-wise domain decomposition method.

Results
[Charts: speed-up versus number of processors for total inputs of 500,000 (top) and 1,000 (bottom).]

For the case of input = 1,000, the sequential version (p = 1) is faster than the parallel version (p >= 2): the parallel version pays for the latency and limited speed of the communication network, costs that do not exist in the sequential version. The top chart shows the speedup versus the number of processors for a total input of 500,000. In this case the speedup increases with the number of processors, reaching ~4.13 at p = 10. For a large number of inputs the computation time outweighs the communication time, resulting in better performance of the parallel algorithm.

Speedup comparisons for column-wise and block-wise decomposition methods for 4 and 9 processors
[Table: speedup for column-wise and block-wise decomposition at P = 4 and P = 9, for total inputs of 1,000 and 500,000; the numerical entries are not recoverable from the transcript.]

Results
Overall, the speedups of the two methods are not very different. For 1,000 inputs the column-wise decomposition produces better speed-ups than the block-wise decomposition. For 500,000 inputs the result is mixed: the column-wise method performs better on 9 processors, while the block-wise method performs slightly better on 4 processors. The results in the table do not conclusively show which decomposition method is better; that would require extending the number of inputs and the number of processors beyond those used here.

Summary
Numerical simulation of two-dimensional heat distribution has been used as an example for teaching parallel computing concepts in an introductory course. With this simple example we introduce the core concepts of parallelism:
- Domain decomposition and partitioning
- Load balancing and mapping
- Communication
- Speedup
We also show benchmarking results for the parallel version of the two-dimensional heat distribution problem with different numbers of processors.

References
1. J. Dongarra, I. Foster, G. Fox, W. Gropp, K. Kennedy, L. Torczon, and A. White (editors), Sourcebook of Parallel Computing. Elsevier Science (2003).
2. I. Foster, Designing and Building Parallel Programs. Addison-Wesley (1994).
3. G. E. Karniadakis and R. M. Kirby, Parallel Scientific Computing in C++ and MPI. Cambridge University Press (2003).
4. M. J. Quinn, Parallel Programming in C with MPI and OpenMP. McGraw-Hill (2005).
5. B. Wilkinson and M. Allen, Parallel Programming. Second edition. Prentice Hall (2005).
6. M. Snir, S. Otto, S. Huss-Lederman, D. Walker, and J. Dongarra, MPI: The Complete Reference, Volume 1. Second edition. MIT Press (1998).
7. W. F. Ames, Numerical Methods for Partial Differential Equations. Second edition. Academic Press, New York (1977).
8. T. Myint-U and L. Debnath, Partial Differential Equations for Scientists and Engineers. Elsevier Science (1987).
9. G. D. Smith, Numerical Solution of Partial Differential Equations: Finite Difference Methods. Third edition. Oxford University Press (1985).
10. S. S. Rao, Applied Numerical Methods for Engineers and Scientists. Prentice Hall (2002).