Sept 2011: COMP60611 Fundamentals of Concurrency, Lab Exercise 2 Notes
Notes on the finite difference performance model example – for the lab
Graham Riley


COMP60611 Fundamentals of Concurrency: Lab Exercise 2 Notes
Notes on the finite difference performance model example – for the lab
Graham Riley, John Gurd
Centre for Novel Computing, School of Computer Science, University of Manchester

Example: Global Atmosphere model

Consider a three-dimensional model of the atmosphere. The model computes values of key atmospheric variables such as temperature, wind speed, pressure and moisture content. The physical processes involved in the atmosphere are described by a set of partial differential equations, in this case describing the basic fluid dynamical behaviour.

Numerical model

The behaviour of the equations in a continuous space is approximated by their behaviour on a finite set of regularly spaced grid points in the space. The equations are integrated in time from an initial state, using a fixed, discrete timestep, typically 20 minutes. The grid points are located on a rectangular latitude-longitude-height grid of size N_x by N_y by N_z.
– There are usually around 30 levels in the atmosphere model (N_z = 30).
– N_x (latitude points) is usually less than N_y (longitude points), with typical values for N_y being in the range (low to high resolution).
Models may cover a limited area of the globe (limited area model, LAM) or the entire globe (global circulation model, GCM). 500 grid points on the equator corresponds to a grid-spacing of approximately 55 miles.

Dynamics and physics

We assume the model uses a finite difference method to update grid values, with a five-point stencil in the horizontal (x- and y-directions) to compute atmospheric motion, and a three-point stencil in the vertical (z-direction). The finite difference computations are concerned with the movement, or dynamics, of air in the atmosphere. In addition to the dynamics, the atmosphere model includes algorithms to simulate various physics processes, such as radiation, convection and precipitation. The data dependencies in physics calculations are normally (in most models) restricted to within vertical columns, by design of the modelling equations.

[Figure: the finite difference stencil at a point; x = latitude, y = longitude, z = height]

Finite difference example – part 1

Assume a grid of N × N × Z grid points. Note that, in this case, the parameter N defines the problem size, but is not actually the problem size itself. Consider first a 1D partition in the horizontal plane (in longitude), so that each task computes N × (N/P) × Z grid points per timestep
– (we only consider the cost of one timestep since, in this problem, all timesteps are assumed equivalent).
Thus, the total computation time for one timestep is:

T_comp = t_c × N × (N/P) × Z

where t_c is the (average) time of computation for one grid point
– assuming all processors are the same!
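A minimal sketch of this computation-cost term in Python (the symbols t_c, N, Z and P follow the slides; the numeric values below are illustrative assumptions, not values from the lab):

```python
def comp_time_1d(N, Z, P, t_c):
    """Per-timestep computation time for the 1D (longitude) partition:
    each of the P tasks updates N * (N/P) * Z grid points, at t_c
    seconds per grid point."""
    return t_c * N * (N / P) * Z

# Illustrative (assumed) values: N = 500, Z = 30, t_c = 1 microsecond.
print(comp_time_1d(500, 30, 1, 1e-6))   # 7.5 seconds on one processor
print(comp_time_1d(500, 30, 8, 1e-6))   # 8x less on eight processors
```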

1D partition in the horizontal

[Figure: the N × N horizontal domain split in longitude into slices, one per processor (Proc 1, Proc 2, …); axes: longitude and latitude]

Communication and idle costs

The stencil is a 5-point stencil, so each task will exchange a total of N × Z points with each of its two neighbours
– note we assume cyclic boundary conditions.
This gives a total communication cost of:

T_comm = 2 (t_s + t_w × N × Z)

where t_s is the communications startup cost and t_w is the cost per 'word' to transmit a message. If we assume P divides N, there will be no idle time.
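The communication term can be sketched the same way (t_s, t_w, N and Z are the slide symbols; the numeric values are assumptions for illustration):

```python
def comm_time_1d(N, Z, t_s, t_w):
    """Per-timestep communication time for the 1D partition: each task
    exchanges N*Z points with each of its two neighbours (cyclic
    boundaries), paying the startup cost t_s once per message."""
    return 2 * (t_s + t_w * N * Z)

# Note the result is independent of P: adding more processors
# does not reduce this term.
print(comm_time_1d(500, 30, 1e-4, 1e-8))
```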

Total cost (i.e. the model)

The total cost is then given by (assuming no idling):

T = t_c × N × (N/P) × Z + 2 (t_s + t_w × N × Z)

i.e.

T = (t_c × N^2 × Z) / P + 2 t_s + 2 t_w × N × Z

Now, what can we do with this model?
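Putting the two terms together gives a small, runnable version of the model (a sketch under assumed parameter values; the formula is the one on this slide):

```python
def total_time_1d(N, Z, P, t_c, t_s, t_w):
    """Total per-timestep cost of the 1D model:
    T = t_c*N*(N/P)*Z + 2*(t_s + t_w*N*Z)."""
    return t_c * N * (N / P) * Z + 2 * (t_s + t_w * N * Z)

# With illustrative (assumed) parameters, the time falls as P grows,
# but approaches the P-independent communication floor 2*(t_s + t_w*N*Z).
for P in (1, 2, 4, 8, 16):
    print(P, total_time_1d(500, 30, P, 1e-6, 1e-4, 1e-8))
```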

Performance metrics: Speed-up and Efficiency – reminder

Define relative speedup as the ratio of the execution time on one processor to that on P processors:

S_rel = T_1 / T_P

Define relative efficiency as:

E_rel = S_rel / P = T_1 / (P × T_P)

This is the fraction of time that processors spend doing useful work (i.e., the time spent doing useful work divided by the total time on all processors). It characterises the effectiveness of an algorithm on a system
– for any problem size and any number of processors.
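These definitions can be applied directly to the 1D cost model (a sketch: taking T_1 to be the pure computation time t_c*N*N*Z, i.e. no communication on one processor, is an assumption about the sequential baseline):

```python
def total_time_1d(N, Z, P, t_c, t_s, t_w):
    # Per-timestep cost of the 1D model.
    return t_c * N * (N / P) * Z + 2 * (t_s + t_w * N * Z)

def relative_speedup(N, Z, P, t_c, t_s, t_w):
    t1 = t_c * N * N * Z   # assumed sequential baseline: no comms term
    return t1 / total_time_1d(N, Z, P, t_c, t_s, t_w)

def relative_efficiency(N, Z, P, t_c, t_s, t_w):
    # Fraction of time spent doing useful work: S_rel / P.
    return relative_speedup(N, Z, P, t_c, t_s, t_w) / P

print(relative_speedup(500, 30, 8, 1e-6, 1e-4, 1e-8))
print(relative_efficiency(500, 30, 8, 1e-6, 1e-4, 1e-8))
```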

Observations on the model

Execution time decreases with increasing P
– Good!
– But it is bounded from below by the cost of exchanging (two) array slices, which implies a limit on the execution time regardless of P.
Execution time increases with increasing N, Z, t_c, t_s and t_w.

Further observations

Once you have an explicit expression for relative efficiency, note:
– Relative efficiency decreases with increasing P, t_s and t_w.
– Relative efficiency increases with increasing N, Z and t_c.
The implications will be explored in the lab. Relative speedup is of limited use.
– Alternatively, define speedup relative to the time of the best known sequential algorithm (executing on the same machine).
See the paper "Twelve ways to fool the masses when giving performance results on parallel computers" by Bailey, Supercomputing Review, Aug. 1991, on misuses of speedup.
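The stated trends can be checked numerically against the model (a sketch; parameter values are assumptions, and the sequential baseline is again assumed to be the pure computation time):

```python
def efficiency(N, Z, P, t_c, t_s, t_w):
    # Relative efficiency of the 1D model: T_1 / (P * T_P), with
    # T_1 assumed to be t_c*N*N*Z (no communication on one processor).
    t_p = t_c * N * (N / P) * Z + 2 * (t_s + t_w * N * Z)
    return (t_c * N * N * Z) / (P * t_p)

# Efficiency falls as P grows (more processors share a fixed comms cost)...
print([round(efficiency(500, 30, p, 1e-6, 1e-4, 1e-8), 4) for p in (2, 8, 32)])
# ...and rises as the problem size N grows at fixed P.
print([round(efficiency(n, 30, 8, 1e-6, 1e-4, 1e-8), 4) for n in (125, 500, 2000)])
```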

Absolute performance metrics

Relative speed-up can be misleading! (Why?) Define absolute speed-up (and efficiency) with reference to the sequential time, T_ref, of an implementation of the best known algorithm for the problem at hand. Note: the best known algorithm may take a different approach to solving the problem from that of the parallel algorithm.

Finite differences example – part 2

Next we consider a 2D partition of the horizontal domain (partitioning both latitude and longitude)…
– P processors in total in a square decomposition.
The number of grid points each task computes is now: ? (derive this…)
Each task will exchange ? (derive this…) grid points with each of ? neighbours at each timestep.

Full 2D model

The total cost for the 2D model is then: ?

What does the 2D model tell us?

How does it compare with the 1D case?
– In terms of performance and scalability.
This will be the basis of the lab exercise.