Computational Physics (Lecture 16) PHY4061

Initial-value problems

Typical initial-value problems:
– the time-dependent diffusion equation,
– the time-dependent wave equation.
Some are non-linear:
– the equation for a stretched elastic string,
– the Navier–Stokes equation in fluid dynamics.
One approach is to apply the Fourier transform to the time variable of the equation to reduce it to a stationary equation,
– which can be solved by the relaxation method.
The time dependence can then be obtained
– with an inverse Fourier transform,
– after the solution of the corresponding stationary case is obtained.

For equations with higher-order time derivatives,
– redefine the derivatives and convert the equations to ones with only first-order time derivatives.
For example, we can define the first-order time derivative of the displacement in the wave equation as a new variable, the velocity.

Then we have two coupled first-order equations; this equation set now has the same structure as a first-order equation such as the diffusion equation.
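As a concrete sketch (writing u for the displacement and c for the wave speed; this notation is assumed here rather than taken from the slides), defining the velocity v = ∂u/∂t turns the one-dimensional wave equation into
\[ \frac{\partial u}{\partial t} = v, \qquad \frac{\partial v}{\partial t} = c^{2}\,\frac{\partial^{2} u}{\partial x^{2}}, \]
which, like the diffusion equation, contains only first-order time derivatives.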

This means we can develop numerical schemes for equations with first-order time derivatives only. In the case of higher-order time derivatives, we can always introduce new variables to reduce the higher-order equation to a first-order equation set. After discretization of the spatial variables, we have practically the same initial-value problem as that discussed in the chapter on ODEs.
– However, there is one more complication.
– The specific scheme used to discretize the spatial variables, as well as the time variable, will affect the stability and accuracy of the solution.

To analyze the stability of the problem, let us first consider the one-dimensional diffusion equation
\[ \frac{\partial n(x,t)}{\partial t} = D\,\frac{\partial^{2} n(x,t)}{\partial x^{2}} + S(x,t). \]
If we discretize
– the first-order time derivative by means of the two-point formula with an interval τ,
– the second-order spatial derivative by means of the three-point formula with an interval h,
we obtain the difference equation
\[ n_{i}(t+\tau) = n_{i}(t) + \gamma\,[\,n_{i+1}(t) + n_{i-1}(t) - 2 n_{i}(t)\,] + \tau S_{i}(t), \]
– which is the result of the Euler method.

Here γ = Dτ/h² is a measure of the relative sizes of the time and space intervals.
– Note that we have used n_i(t) = n(x_i, t) for notational convenience.
So the problem is solved if we know the initial value n(x, 0) and the source S(x, t). However, this algorithm is unstable if γ is larger than 1/2.
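A minimal C sketch of this explicit (Euler) scheme, assuming a unit-length rod with fixed zero boundary values and no source term; the grid size, time step, and diffusion constant are illustrative choices rather than values from the lecture:

#include <stdio.h>

#define NX 101                         /* number of grid points (illustrative) */

int main(void) {
    double D = 1.0, h = 1.0 / (NX - 1);
    double tau = 0.4 * h * h / D;      /* keep gamma = D*tau/h^2 <= 1/2 for stability */
    double gamma = D * tau / (h * h);
    double n[NX] = {0.0}, nn[NX];

    n[NX / 2] = 1.0;                   /* initial condition: a spike in the middle */

    for (int step = 0; step < 1000; step++) {
        for (int i = 1; i < NX - 1; i++)   /* interior points; the boundaries stay fixed */
            nn[i] = n[i] + gamma * (n[i + 1] + n[i - 1] - 2.0 * n[i]);  /* add tau*S_i(t) if a source is present */
        for (int i = 1; i < NX - 1; i++)
            n[i] = nn[i];
    }

    for (int i = 0; i < NX; i++)
        printf("%g %g\n", i * h, n[i]);
    return 0;
}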

A better scheme is the Crank–Nicolson method,
– which modifies the Euler method by using the average of the second-order spatial derivative and the source at t and t + τ on the right-hand side of the equation,
– resulting in
\[ n_{i}(t+\tau) = n_{i}(t) + \tfrac{1}{2}\,\{[\,H_{i} n_{i}(t) + \tau S_{i}(t)\,] + [\,H_{i} n_{i}(t+\tau) + \tau S_{i}(t+\tau)\,]\}, \]
– where we have used H_i n_i(t) = γ[n_{i+1}(t) + n_{i−1}(t) − 2n_i(t)] to simplify the notation.
This implicit iterative scheme can be rewritten in the form
\[ (2 - H_{i})\, n_{i}(t+\tau) = (2 + H_{i})\, n_{i}(t) + \tau\,[\,S_{i}(t) + S_{i}(t+\tau)\,], \]
– a linear equation set with a tridiagonal coefficient matrix, which can easily be solved.
It can be shown that the algorithm is stable for any γ and converges as h → 0, and that the error in the solution is of the order of h².
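A minimal C sketch of one Crank–Nicolson step, solving the tridiagonal system with the standard Thomas algorithm; fixed zero boundary values, a zero source, and the grid size are simplifying assumptions made here:

#include <stdio.h>

#define NX 101   /* number of grid points (illustrative) */

/* One Crank-Nicolson step for the 1D diffusion equation with fixed zero
   boundary values and no source; gamma = D*tau/h^2. */
static void cn_step(double n[NX], double gamma) {
    double a = -gamma, b = 2.0 + 2.0 * gamma, c = -gamma;   /* rows of (2 - H) */
    double rhs[NX], cp[NX], dp[NX];

    for (int i = 1; i < NX - 1; i++)                        /* (2 + H) n(t) on the right */
        rhs[i] = (2.0 - 2.0 * gamma) * n[i] + gamma * (n[i + 1] + n[i - 1]);

    /* Thomas algorithm: forward sweep ... */
    cp[1] = c / b;
    dp[1] = rhs[1] / b;
    for (int i = 2; i < NX - 1; i++) {
        double m = b - a * cp[i - 1];
        cp[i] = c / m;
        dp[i] = (rhs[i] - a * dp[i - 1]) / m;
    }
    /* ... and back substitution (n[NX-1] = 0 at the boundary) */
    n[NX - 2] = dp[NX - 2];
    for (int i = NX - 3; i >= 1; i--)
        n[i] = dp[i] - cp[i] * n[i + 1];
}

int main(void) {
    double n[NX] = {0.0};
    n[NX / 2] = 1.0;                    /* initial spike */
    for (int step = 0; step < 1000; step++)
        cn_step(n, 0.5);                /* stable for any gamma; 0.5 is illustrative */
    for (int i = 0; i < NX; i++)
        printf("%d %g\n", i, n[i]);
    return 0;
}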

However, the coefficient matrix is no longer tridiagonal if the system is in a higher-dimensional space. There are two ways to deal with this problem in practice. We can discretize the equation in the same manner and then solve the resulting linear equation set with some other method, such as Gaussian elimination or a general LU decomposition for a full matrix. A more practical approach is to deal with each spatial coordinate separately.

For example, if we are dealing with the two-dimensional diffusion equation, the same construction gives
\[ (2 - H_{ij})\, n_{ij}(t+\tau) = (2 + H_{ij})\, n_{ij}(t) + \tau\,[\,S_{ij}(t) + S_{ij}(t+\tau)\,], \]
with H_{ij} = H_i + H_j, where
\[ H_{i}\, n_{ij}(t) = \gamma_x\,[\,n_{i+1,j}(t) + n_{i-1,j}(t) - 2 n_{ij}(t)\,], \qquad H_{j}\, n_{ij}(t) = \gamma_y\,[\,n_{i,j+1}(t) + n_{i,j-1}(t) - 2 n_{ij}(t)\,]. \]

Here γ_x = Dτ/h_x² and γ_y = Dτ/h_y². The decomposition of H_{ij} into H_i and H_j can be used to take one half of each time step along the x direction and the other half along the y direction.
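Schematically, a sketch of the standard Peaceman–Rachford splitting (the precise handling of the source terms follows the text and is abbreviated as ⋯ here):
\[ (2 - H_{i})\, n_{ij}(t + \tau/2) = (2 + H_{j})\, n_{ij}(t) + \cdots, \]
\[ (2 - H_{j})\, n_{ij}(t + \tau) = (2 + H_{i})\, n_{ij}(t + \tau/2) + \cdots, \]
so the x sweep is implicit in the first half step and the y sweep is implicit in the second.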

In each of the above two steps, we have a tridiagonal coefficient matrix. We still have to be careful in using the Peaceman–Rachford algorithm:
– its convergence can sometimes be very slow in practice.

A few more examples:
– the temperature field of a nuclear waste rod,
– groundwater dynamics.
Just read Tao Pang's book if you are interested in them.

Principles of Parallel Computers and Some Impacts on Their Programming Models

Parallel computing has been a key technology of the last 25 years for solving scientific, mathematical, and technical problems. A broad spectrum of parallel architectures has been developed.

A parallel algorithm can be implemented efficiently
– only if it is designed for the specific needs of the architecture.
This is a basic introduction to the architectures of parallel computers.

Overview of architecture principles

The first supercomputer architectures relied on
– the use of one or a few of the fastest processors,
– achieved by increasing the packing density, minimizing switching time, heavily pipelining the system, and employing vector processing techniques.

Vector processing:
– highly effective for certain numerically intensive applications,
– much less effective in commercial uses such as online transaction processing or databases.
Computational speed was achieved
– at substantial cost: a highly specialized architectural hardware design and the renunciation of techniques such as virtual memory.

Another way is to use multiprocessor systems (MPS), which required only small changes to earlier uniprocessor systems:
– a number of processor elements of the same type are added to multiply the performance of a single-processor machine.
The essential feature of a unified global memory could be maintained.

Later, a unified global memory was no longer required: the total memory is distributed over the processors, each holding a fraction of it in the form of a local memory.

Massively parallel processors appeared in the 1980s, using low-cost standard processors to achieve far greater computational power. One problem for the use of such systems:
– the development of appropriate programming models.

There are no standard models, only a few competing ones:
– message passing, data-parallel programming, and the virtual shared-memory concept.
Efficient use of parallel computers with distributed memory requires:
– exploitation of data locality.

If the performance needs increase:
– a cluster of interconnected workstations can be considered as a parallel machine,
– but the interconnection network of such clusters is characterized by relatively low bandwidth and high latency.

We can integrate massively parallel processors, multiprocessor systems, clusters of interconnected workstations, and vector computers into a network environment and combine them to form a heterogeneous supercomputer. The Message Passing Interface (MPI) is a landmark achievement in making such systems programmable.
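A minimal MPI sketch in C, illustrating the message-passing style in which such systems are programmed (the ranks, tag, and message value are arbitrary choices for illustration):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        double msg = 3.14;
        /* process 0 sends one value to process 1 */
        if (size > 1)
            MPI_Send(&msg, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        double msg;
        MPI_Recv(&msg, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("process 1 of %d received %g from process 0\n", size, msg);
    }

    MPI_Finalize();
    return 0;
}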

Flynn's classification of computer architectures:
– SISD: single instruction stream, single data stream,
– SIMD: single instruction stream, multiple data streams,
– MISD: multiple instruction streams, single data stream,
– MIMD: multiple instruction streams, multiple data streams.

Message-passing multicomputers: while the processors in a multiprocessor system communicate with each other through shared variables in a common memory, each node in a multicomputer system has a local memory that is not shared with other nodes; interprocessor communication is done through message passing.

Massively parallel processor systems

Hundreds or several thousands of identical processors, each with its own memory. Distributed-memory multicomputers are most useful for problems that can be broken into many relatively independent parts. The interaction between parts should be small, because
– interprocessor communication can degrade the system performance;
– the limiting factors are bandwidth and latency.

Message-passing programming model

The communication channels are mapped onto the communication network. The communication hardware is capable of operating independently of its assigned compute node, so that communication and computation can be done concurrently.

The efficiency is determined by the quality of the mapping of the process graph, with its communication edges, onto the distributed-memory architecture. In the ideal case,
– each task gets its own processor, and every communication channel corresponds to a direct physical link between the two communicating nodes.

Given the number of available processors in massively parallel systems, scalability requires a relatively simple communication network, so compromises are unavoidable. For example, a logical communication channel has to be routed when it passes one or more intermediate grid points. The transfer of data takes time, and if there is no hardware support, routing must be done by software emulation.

On the one hand, communication paths with different delays arise from a non-optimal mapping of communication channels onto the network. On the other hand, several logical channels may be multiplexed onto one physical link, so the usable communication bandwidth is decreased.

In recent years, adaptive parallel algorithms have been developed. The decision of how to embed the actual process graph into the processor graph cannot be made statically at compile time, only at run time. Newly created tasks should be placed on processors with less workload to ensure load balance, and the communication paths should be kept as short as possible and not overloaded by existing channels.

HW 3

1. Suppose, in a parallel world, the air-resistance force is proportional to v^(4/3). Rewrite the motorcyclist program shown in the lecture notes in C or Fortran to solve this problem (assume all the parameters are the same). Show the difference between this model and the linear-drag model shown in the lecture notes.

2. Set up and run the Java code for the relaxation method shown in the lecture notes (you can also rewrite it in C/C++ or Fortran if you don't know how to run Java, or if you enjoy programming). Test different p values and the convergence speed, and report the best choice of p in your code.