COMP60621 Designing for Parallelism


Lab Exercise 2 Notes
Notes on the finite difference performance model example – for the lab…
Graham Riley, John Gurd
Centre for Novel Computing, School of Computer Science, University of Manchester

Example: Global atmosphere model
- Consider a three-dimensional model of the atmosphere.
- The model computes values of key atmospheric variables such as temperature, wind speed, pressure and moisture content.
- The physical processes involved in the atmosphere are described by a set of partial differential equations, in this case describing the basic fluid-dynamical behaviour.
5 February, 2019

Numerical model
- The behaviour of the equations in a continuous space is approximated by their behaviour on a finite set of regularly spaced grid points in the space.
- The equations are integrated in time from an initial state, using a fixed, discrete timestep, typically 20 minutes.
- The grid points are located on a rectangular latitude-longitude-height grid of size N_x by N_y by N_z.
- There are usually around 30 levels in the atmosphere model (N_z = 30). N_x (latitude points) is usually less than N_y (longitude points), with typical values for N_y in the range 100-500 (low to high resolution).
- Models may cover a limited area of the globe (limited-area model, LAM) or the entire globe (global circulation model, GCM). 500 grid points on the equator corresponds to a grid spacing of approximately 50 miles (80 km).

Dynamics and physics
- We assume the model uses a finite difference method to update grid values, with a five-point stencil in the horizontal (x- and y-directions) to compute atmospheric motion, and a three-point stencil in the vertical (z-direction).
- The finite difference computations are concerned with the movement, or dynamics, of air in the atmosphere.
- In addition to the dynamics, the atmosphere model includes algorithms to simulate various physics processes, such as radiation, convection and precipitation.
- The data dependencies in the physics calculations are normally (in most models) restricted to within vertical columns, by design of the modelling equations.

The finite difference stencil at a point (figure: axes are x, latitude; y, longitude; z, height)
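The stencil described above can be sketched as a Jacobi-style array update. This is a minimal illustration, not code from the course: the function name, the equal weights and the neighbour-averaging rule are assumptions, and boundary handling is omitted.

```python
import numpy as np

def stencil_step(u):
    """One illustrative update combining the five-point horizontal stencil
    (x- and y-neighbours) with the three-point vertical stencil
    (z-neighbours). Interior points only; equal weights are assumed."""
    v = u.copy()
    v[1:-1, 1:-1, 1:-1] = (
        u[:-2, 1:-1, 1:-1] + u[2:, 1:-1, 1:-1]     # x-neighbours
        + u[1:-1, :-2, 1:-1] + u[1:-1, 2:, 1:-1]   # y-neighbours
        + u[1:-1, 1:-1, :-2] + u[1:-1, 1:-1, 2:]   # z-neighbours
    ) / 6.0
    return v
```

Each interior point reads six neighbours per update; it is this neighbour access pattern that creates the communication at partition boundaries analysed in the following slides.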

Finite difference example – part 1
- Assume a grid of N × N × Z grid points. Note that, in this case, the parameter N defines the problem size, but is not actually the problem size itself.
- Consider first a 1D partition in the horizontal plane (in longitude), so that each task computes N × N/P × Z grid points per timestep (we consider only the cost of one timestep since, in this problem, all timesteps are assumed equivalent).
- Thus, the total computation time for one timestep is:

  Tcomp = tc × N × (N/P) × Z

  where tc is the (average) time of computation for one grid point, assuming all processors are the same.
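The computation term can be written directly as a function. A small sketch (the function name is mine; the parameter names follow the slides):

```python
def t_comp(N, Z, P, tc):
    """Computation time per timestep for the 1D (longitude) partition:
    each of the P tasks updates N * (N/P) * Z grid points,
    each costing tc seconds on average."""
    return tc * N * (N / P) * Z
```

Doubling P halves this term, which is the ideal-scaling part of the model.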

1D partition in the horizontal (figure: the latitude-longitude plane divided along the longitude axis into slices, one per processor, e.g. Proc 1, Proc 2, …)

Communication and idle costs
- The stencil is a 5-point stencil, so each task will exchange a total of N × Z points with each of its two neighbours (note: we assume cyclic boundary conditions).
- This gives a total communication cost of:

  Tcomm = 2 × (ts + tw × N × Z)

  where ts is the communication startup cost and tw is the cost per 'word' to transmit a message.
- If we assume P divides N, there will be no idle time.

Total cost (i.e. the model)
- The total cost is then given by (assuming no idling):

  T = Tcomp + Tcomm

  i.e.

  T = tc × N × (N/P) × Z + 2 × (ts + tw × N × Z)

- Now, what can we do with this model?
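The full 1D model is small enough to evaluate directly. A self-contained sketch (function name is mine; tc, ts, tw follow the slides; cyclic boundaries and no idle time are assumed):

```python
def t_total(N, Z, P, tc, ts, tw):
    """Total time per timestep for the 1D partition:
    computation plus two neighbour exchanges of N*Z points each."""
    t_comp = tc * N * (N / P) * Z      # update N * (N/P) * Z points
    t_comm = 2 * (ts + tw * N * Z)     # two exchanges of N*Z words each
    return t_comp + t_comm
```

Note that the communication term is independent of P: no matter how many processors are used, each task still exchanges two full N × Z slices, which is what bounds the execution time from below.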

Performance metrics: speed-up and efficiency – a reminder
- Define relative speedup as the ratio of the execution time on one processor to that on P processors:

  S = T_1 / T_P

- Define relative efficiency as:

  E = S / P = T_1 / (P × T_P)

- This is the fraction of time that processors spend doing useful work (i.e., the time spent doing useful work divided by the total time on all processors).
- It characterises the effectiveness of an algorithm on a system, for any problem size and any number of processors.
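These metrics can be computed from the cost model. A sketch under one stated assumption: the one-processor time T_1 is taken as the pure computation cost tc × N × N × Z, with no communication (the slides leave this choice open); the function names are mine.

```python
def relative_speedup(N, Z, P, tc, ts, tw):
    """S = T_1 / T_P, with T_1 assumed to be computation only."""
    t1 = tc * N * N * Z
    tp = tc * N * (N / P) * Z + 2 * (ts + tw * N * Z)  # 1D-partition model
    return t1 / tp

def relative_efficiency(N, Z, P, tc, ts, tw):
    """E = S / P = T_1 / (P * T_P)."""
    return relative_speedup(N, Z, P, tc, ts, tw) / P
```

With zero communication cost (ts = tw = 0) the model gives perfect speedup S = P and efficiency E = 1, which is a useful sanity check.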

Observations on the model
- Execution time decreases with increasing P. Good! But it is bounded from below by the cost of exchanging (two) array slices, which implies a limit on the execution time regardless of P.
- Execution time increases with increasing N, Z, tc, ts and tw.

Further observations
- Once you have an explicit expression for relative efficiency, note that:
  - relative efficiency decreases with increasing P, ts and tw;
  - relative efficiency increases with increasing N, Z and tc.
- The implications will be explored in the lab.
- Relative speedup is of limited use. Alternatively, define speedup relative to the time of the best known sequential algorithm (executing on the same machine).
- See the paper "Twelve ways to fool the masses when giving performance results on parallel computers" by Bailey, Supercomputing Review, Aug. 1991, on misuses of speedup.
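The two monotonicity observations can be checked numerically. A self-contained sketch (it restates the model so it runs on its own; T_1 is again assumed to be computation only, and the parameter values are illustrative, not from the slides):

```python
def relative_efficiency(N, Z, P, tc, ts, tw):
    # E = T_1 / (P * T_P), with T_1 = tc * N * N * Z (no communication)
    t1 = tc * N * N * Z
    tp = tc * N * (N / P) * Z + 2 * (ts + tw * N * Z)
    return t1 / (P * tp)

# Illustrative parameter values (assumed, not from the slides):
params = dict(Z=30, tc=1e-6, ts=1e-4, tw=1e-7)

# Efficiency falls as P grows at fixed N...
effs_P = [relative_efficiency(256, P=P, **params) for P in (1, 2, 4, 8, 16)]
# ...and rises as N grows at fixed P.
effs_N = [relative_efficiency(N, P=8, **params) for N in (128, 256, 512)]
```

The compute term grows as N² while the exchanged boundary grows only as N, which is why efficiency improves with problem size; this trade-off is the subject of the lab.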

Absolute performance metrics
- Relative speed-up can be misleading! (Why?)
- Define absolute speed-up (and efficiency) with reference to the sequential time, Tref, of an implementation of the best known algorithm for the problem at hand:

  S_abs = Tref / T_P

- Note: the best known algorithm may take a different approach to solving the problem from that of the parallel algorithm.

Finite differences example – part 2
- Next we consider a 2D partition of the horizontal domain (partitioning both latitude and longitude), with P processors in total in a square decomposition.
- The number of grid points each task computes is now: ? (derive this…)
- Each task will exchange ? (derive this…) grid points with each of ? neighbours at each timestep.

Full 2D model
- The total cost for the 2D model is then: ?

What does the 2D model tell us?
- How does it compare with the 1D case, in terms of performance and scalability?
- This will be the basis of the lab exercise.