COMP60621 Designing for Parallelism

1 COMP60621 Designing for Parallelism
Lab Exercise 2 Notes
Notes on the finite difference performance model example – for the lab…
Graham Riley, John Gurd
Centre for Novel Computing
School of Computer Science
University of Manchester

2 Example: Global Atmosphere model
Consider a three-dimensional model of the atmosphere. The model computes values of key atmospheric variables such as temperature, wind speed, pressure and moisture content. The physical processes involved in the atmosphere are described by a set of partial differential equations, in this case describing the basic fluid dynamical behaviour. 5 February, 2019

3 Numerical model
The behaviour of the equations in a continuous space is approximated by their behaviour on a finite set of regularly spaced grid points in the space. The equations are integrated in time, from an initial state, using a fixed, discrete timestep, typically 20 minutes. The grid points are located on a rectangular latitude-longitude-height grid of size N_x by N_y by N_z. There are usually around 30 levels in the atmosphere model (N_z = 30). N_x (latitude points) is usually less than N_y (longitude points), with typical values for N_y being in the range (low to high resolution). Models may cover a limited area of the globe (limited area model, LAM) or the entire globe (global circulation model, GCM). 500 grid points on the equator correspond to a grid-spacing of approximately 55 miles.

4 Dynamics and physics
We assume the model uses a finite difference method to update grid values, with a five-point stencil in the horizontal (x- and y-directions) to compute atmospheric motion, and a three-point stencil in the vertical (z-direction). The finite difference computations are concerned with the movement, or dynamics, of air in the atmosphere. In addition to the dynamics, the atmosphere model includes algorithms to simulate various physics processes, such as radiation, convection and precipitation. The data dependencies in physics calculations are normally (in most models) restricted to within vertical columns, by design of the modelling equations.

5 The finite difference stencil at a point
[Figure: the finite difference stencil at a point, drawn on axes x (latitude), y (longitude) and z (height).]
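To make the stencil concrete, here is a minimal sketch of an update at one grid point that reads the four horizontal neighbours and the two vertical neighbours. The function name `stencil_update`, the diffusion-like update rule and the coefficient `alpha` are assumptions made for illustration; the notes do not say what the real model computes at each point. Cyclic wrap-around is assumed in both horizontal directions, and the vertical index is simply clamped at the top and bottom levels.

```python
def stencil_update(u, i, j, k, alpha=0.1):
    """Return a new value at grid point (i, j, k) of the nested list u[x][y][z].

    Hypothetical diffusion-like rule: five-point stencil in the horizontal
    plus a three-point stencil in the vertical.
    """
    nx, ny, nz = len(u), len(u[0]), len(u[0][0])
    centre = u[i][j][k]

    # Horizontal neighbours (x and y), with cyclic boundary conditions.
    x_minus = u[(i - 1) % nx][j][k]
    x_plus = u[(i + 1) % nx][j][k]
    y_minus = u[i][(j - 1) % ny][k]
    y_plus = u[i][(j + 1) % ny][k]

    # Vertical neighbours (z), clamped at the bottom and top levels.
    z_minus = u[i][j][max(k - 1, 0)]
    z_plus = u[i][j][min(k + 1, nz - 1)]

    horizontal = x_minus + x_plus + y_minus + y_plus - 4 * centre
    vertical = z_minus + z_plus - 2 * centre
    return centre + alpha * (horizontal + vertical)
```

Only the x and y neighbours can live on another task once the horizontal plane is partitioned; the z neighbours always stay within the same vertical column.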

6 Finite difference example – part 1
Assume a grid of N × N × Z grid points. Note that, in this case, the parameter N defines the problem size, but is not actually the problem size itself. Consider first a 1D partition in the horizontal plane (in longitude) so that each task computes N × N/P × Z grid points per timestep (we only consider the cost of one timestep since, in this problem, all timesteps are assumed equivalent). Thus, the total computation time for one timestep, summed over all P tasks, is: T_comp = t_c N^2 Z, where t_c is the (average) time of computation for one grid point. This assumes all processors are the same!
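The counting argument on this slide can be sketched in a few lines. The function names are illustrative, and the computation time is taken to be the sum over all P tasks (each task doing N × N/P × Z points at t_c seconds per point), which is an assumption about what "total" means here.

```python
def points_per_task(n, z, p):
    """Grid points one task updates per timestep under a 1D partition."""
    if n % p != 0:
        raise ValueError("this sketch assumes P divides N exactly")
    return n * (n // p) * z


def computation_time(n, z, t_c):
    """Computation time for one timestep, summed over all tasks."""
    return t_c * n * n * z
```

Multiplying `points_per_task(n, z, p)` by `p` and by `t_c` gives the same value as `computation_time(n, z, t_c)`, so the summed computation cost does not depend on P.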

7 1D partition in the horizontal
[Figure: 1D partition in the horizontal, showing the latitude-longitude plane divided between Proc 1 and Proc 2.]

8 Communication and idle costs
The stencil is a 5-point stencil, so each task will exchange a total of NZ points with each of two neighbours. Note we assume cyclic boundary conditions. Summed over all P tasks, this gives a total communication cost of: T_comm = 2P (t_s + t_w N Z), where t_s is the comms startup cost and t_w is the cost per ‘word’ to transmit a message. If we assume P divides N, there will be no idle time.
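As a sketch of the communication count: each task sends one message of N·Z words to each of its two neighbours, and each message is assumed to cost t_s + t_w × (number of words). The function name and the choice to sum over all P tasks are assumptions made for illustration.

```python
def communication_time(n, z, p, t_s, t_w):
    """Communication time for one timestep, summed over all P tasks."""
    messages_per_task = 2        # one slice to each of two neighbours
    words_per_message = n * z    # one N x Z slice of the grid
    cost_per_message = t_s + t_w * words_per_message
    return p * messages_per_task * cost_per_message
```

Unlike the summed computation cost, this total grows linearly with P, because every extra task adds two more messages.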

9 Total cost (i.e. the model)
The total cost is then given by (assuming no idling): T = (T_comp + T_comm) / P, i.e. T = t_c N^2 Z / P + 2 t_s + 2 t_w N Z. Now, what can we do with this model?
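The step from the summed costs to the per-timestep execution time can be checked numerically. This sketch assumes the 1D model has computation cost t_c N² Z and communication cost 2P(t_s + t_w N Z), both summed over tasks, with no idle time; the function names are illustrative.

```python
def total_time_from_parts(n, z, p, t_c, t_s, t_w):
    """Execution time per timestep: summed costs divided by P."""
    t_comp = t_c * n * n * z
    t_comm = 2 * p * (t_s + t_w * n * z)
    return (t_comp + t_comm) / p


def total_time_closed_form(n, z, p, t_c, t_s, t_w):
    """Same model after dividing through by P."""
    return t_c * n * n * z / p + 2 * t_s + 2 * t_w * n * z
```

Only the first term of the closed form depends on P; the other two are fixed by the size of the slices being exchanged.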

10 Performance metrics: Speed-up and Efficiency - reminder
Define relative speedup as the ratio of the execution time on one processor to that on P processors: S_rel = T_1 / T_P. Define relative efficiency as: E_rel = T_1 / (P T_P). This is the fraction of time that processors spend doing useful work (i.e., the time spent doing useful work divided by total time on all processors). It characterises the effectiveness of an algorithm on a system, for any problem size and any number of processors.
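The two definitions translate directly into code. The function names are illustrative; `t_1` and `t_p` stand for measured or modelled execution times of the same parallel program on one processor and on P processors.

```python
def relative_speedup(t_1, t_p):
    """Execution time on one processor divided by the time on P processors."""
    return t_1 / t_p


def relative_efficiency(t_1, t_p, p):
    """Relative speedup per processor: useful work over total processor time."""
    return t_1 / (p * t_p)
```

For instance, a run that takes 100 s on one processor and 25 s on 8 processors has a relative speedup of 4 and a relative efficiency of 0.5.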

11 Observations on the model
Execution time decreases with increasing P. Good! But it is bounded from below by the cost of exchanging (two) array slices. This implies a limit on the execution time regardless of P. Execution time increases with increasing N, Z, t_c, t_s and t_w.
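The lower bound can be seen by tabulating the model for growing P. This sketch assumes the 1D model T = t_c N² Z / P + 2 t_s + 2 t_w N Z; the machine parameters below are hypothetical values chosen only to make the trend visible, not measurements.

```python
def time_1d(n, z, p, t_c, t_s, t_w):
    """Modelled execution time per timestep for the 1D partition."""
    return t_c * n * n * z / p + 2 * t_s + 2 * t_w * n * z


def communication_floor(n, z, t_s, t_w):
    """Cost of exchanging two array slices; independent of P."""
    return 2 * t_s + 2 * t_w * n * z


# Hypothetical problem size and machine parameters (seconds).
N, Z = 512, 30
T_C, T_S, T_W = 1e-6, 1e-4, 1e-7

floor = communication_floor(N, Z, T_S, T_W)
for p in (1, 8, 64, 512):
    t = time_1d(N, Z, p, T_C, T_S, T_W)
    print(f"P={p:4d}  T={t:.6f} s  floor={floor:.6f} s")
```

Under a 1D partition P cannot exceed N, so the computation term never quite vanishes, but the printed times flatten out towards the floor well before that.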

12 Further observations
Once you have an explicit expression for relative efficiency, note:
Relative efficiency decreases with increasing P, t_s and t_w.
Relative efficiency increases with increasing N, Z and t_c.
The implications will be explored in the lab.
Relative speedup is of limited use. Alternatively, define speedup relative to the time of the best known sequential algorithm (executing on the same machine). See the paper “Twelve ways to fool the masses when giving performance results on parallel computers” by Bailey, Supercomputing Review, Aug. 1991, on misuses of speedup.
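One way to obtain such an expression is sketched below. It assumes the 1D model T_P = t_c N^2 Z / P + 2 t_s + 2 t_w N Z, and it assumes that a single processor pays no communication cost, so that T_1 = t_c N^2 Z. Both are assumptions of this sketch rather than statements taken from the slides.

```latex
\begin{align*}
T_1 &= t_c N^2 Z, \qquad
T_P = \frac{t_c N^2 Z}{P} + 2 t_s + 2 t_w N Z \\[4pt]
E_{\text{rel}} &= \frac{T_1}{P\,T_P}
  = \frac{t_c N^2 Z}{t_c N^2 Z + 2 P t_s + 2 P t_w N Z} \\[4pt]
&= \frac{1}{1 + \dfrac{2 P t_s}{t_c N^2 Z} + \dfrac{2 P t_w}{t_c N}}
\end{align*}
```

In the last form every quantity that appears in a numerator of the two overhead terms (P, t_s, t_w) lowers the efficiency as it grows, and every quantity that appears only in a denominator (N, Z, t_c) raises it, which matches the observations listed on this slide.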

13 Absolute performance metrics
Relative speed-up can be misleading! (Why?) Define absolute speed-up (and efficiency) with reference to T_ref, the sequential time of an implementation of the best known algorithm for the problem-at-hand. Note: the best known algorithm may take an approach to solving the problem different to that of the parallel algorithm.
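A small numerical sketch hints at one answer to the "Why?". The timings are hypothetical: the parallel program run on one processor is assumed to be slower than the best known sequential algorithm, which flatters the relative figure.

```python
def speedup(t_baseline, t_p):
    """Speedup of a P-processor run against a chosen one-processor baseline."""
    return t_baseline / t_p


def efficiency(t_baseline, t_p, p):
    """Speedup against the chosen baseline, divided by the processor count."""
    return t_baseline / (p * t_p)


# Hypothetical timings in seconds.
T_1 = 120.0    # the parallel algorithm run on one processor
T_REF = 80.0   # the best known sequential algorithm on the same machine
T_P = 20.0     # the parallel algorithm on P processors
P = 8

relative = speedup(T_1, T_P)
absolute = speedup(T_REF, T_P)
print(f"relative speedup = {relative}, absolute speedup = {absolute}")
```

The same run is reported as a speedup of 6 or of 4 depending only on the baseline, so a figure quoted without its baseline says little.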

14 Finite differences example – part 2
Next we consider a 2D partition of the horizontal domain (partitioning both latitude and longitude), with P processors in total in a square decomposition. The number of grid points each task computes is now: ? (derive this…) Each task will exchange ? (derive this…) grid points with each of ? neighbours at each timestep.

15 Full 2D model
The total cost for the 2D model is then: ?

16 What does the 2D model tell us?
How does it compare with the 1D case, in terms of performance and scalability? This will be the basis of the lab exercise.
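One possible way to organise the comparison is a small helper that evaluates two cost models side by side. The helper name `compare_models` is hypothetical, and the models are passed in as functions of P, so the 2D cost (left to be derived in the lab) is supplied by the reader rather than assumed here.

```python
def compare_models(model_a, model_b, processor_counts):
    """Evaluate two cost models at each P and record which is faster.

    model_a and model_b are functions taking P and returning a time.
    Returns a list of (P, time_a, time_b, faster) tuples, where faster
    is "a", "b" or "tie".
    """
    rows = []
    for p in processor_counts:
        t_a = model_a(p)
        t_b = model_b(p)
        if t_a < t_b:
            faster = "a"
        elif t_b < t_a:
            faster = "b"
        else:
            faster = "tie"
        rows.append((p, t_a, t_b, faster))
    return rows
```

Passing the 1D model as `model_a` and a derived 2D model as `model_b`, over a range of P, gives a table from which any crossover point can be read off.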

