
1 Introduction to Scientific Computing on Linux Clusters Doug Sondak Linux Clusters and Tiled Display Walls July 30 – August 1, 2002

2 Outline Why clusters? Parallelization example – Game of Life Performance metrics Ways to Fool the Masses Summary

3 Why Clusters? Scientific computing has traditionally been performed on fast, specialized machines Buzzword - Commodity Computing –clustering cheap, off-the-shelf processors –can achieve good performance at a low cost if the applications scale well

4 Clusters (2) 102 clusters in current Top 500 list http://www.top500.org/list/2001/06/ Reasonable parallel efficiency is the key generally use message passing, even if there are shared-memory CPUs in each box

5 Compilers Linux Fortran compilers (F90/95) –available from many vendors, e.g., Absoft, Compaq, Intel, Lahey, NAG, Portland Group, Salford –g77 is free, but is restricted to Fortran 77 and is relatively slow

6 Compilers (2) Intel offers a free unsupported Fortran compiler for non-commercial purposes –full F95 –OpenMP http://www.intel.com/software/products/compilers/f60l/noncom.htm

7 Compilers (3) compiler comparisons: http://www.polyhedron.com/

8 Compilers (4) Linux C/C++ compilers –gcc/g++ seems to be the standard, usually described as a good compiler –also available from vendors, e.g., Compaq, Intel, Portland Group

9 Parallelization of Scientific Codes

10 Domain Decomposition Typically perform operations on arrays –e.g., setting up and solving a system of equations domain decomposition –arrays are broken into chunks, and each chunk is handled by a separate processor –processors operate simultaneously on their own chunks of the array (a sketch follows)
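As a minimal sketch of domain decomposition (using MPI for concreteness; the array size and variable names are illustrative, not from the original slides), each process can compute the bounds of its own chunk of a 1-D array:

```c
/* Minimal sketch: block-distributing a 1-D array of n elements
   across MPI processes (illustrative, not from the original slides). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    const int n = 1000;            /* global array size (illustrative) */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each process computes the bounds of its own chunk; remainder
       elements are spread over the first (n % nprocs) processes. */
    int base  = n / nprocs;
    int rem   = n % nprocs;
    int start = rank * base + (rank < rem ? rank : rem);
    int count = base + (rank < rem ? 1 : 0);

    printf("process %d handles elements [%d, %d)\n",
           rank, start, start + count);

    MPI_Finalize();
    return 0;
}
```

Each process then operates only on its own chunk; the rank-dependent bounds replace a global loop over the full array.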

11 Other Methods Parallelization is also possible without domain decomposition –less common –e.g., process one set of inputs while reading another set of inputs from a file

12 Embarrassingly Parallel If operations are completely independent of one another, the problem is called embarrassingly parallel –e.g., initializing an array –some Monte Carlo simulations –not usually the case
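A sketch of an embarrassingly parallel computation (a Monte Carlo estimate of pi; the example and names are illustrative, not from the slides). Each process works completely independently, with a single reduction at the end:

```c
/* Sketch: embarrassingly parallel Monte Carlo estimate of pi.
   Each process draws its own samples independently -- no
   communication until the final reduction. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    const long samples_per_proc = 1000000;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    srand(1234 + rank);            /* independent stream per process;
                                      rand() used only for brevity */
    long hits = 0;
    for (long i = 0; i < samples_per_proc; i++) {
        double x = rand() / (double)RAND_MAX;
        double y = rand() / (double)RAND_MAX;
        if (x * x + y * y <= 1.0)
            hits++;
    }

    long total_hits = 0;
    MPI_Reduce(&hits, &total_hits, 1, MPI_LONG, MPI_SUM, 0,
               MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi ~ %f\n",
               4.0 * total_hits / (double)(samples_per_proc * nprocs));

    MPI_Finalize();
    return 0;
}
```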

13 Game of Life An early, simple cellular automaton –created by John Conway 2-D grid of cells –each has one of 2 states (“alive” or “dead”) –cells are initialized with some distribution of alive and dead states

14 Game of Life (2) At each time step, states are modified based on the states of adjacent cells (including diagonals) Rules of the game: –3 alive neighbors - alive –2 alive neighbors - no change –other - dead
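The rules translate directly into code. A serial sketch of the update rule for one interior cell (illustrative, not from the original slides):

```c
/* Sketch: the update rule for one interior cell. grid[i][j] is 1
   for "alive", 0 for "dead"; boundary handling is omitted here. */
int next_state(int **grid, int i, int j)
{
    int alive = 0;
    for (int di = -1; di <= 1; di++)        /* count the 8 neighbors, */
        for (int dj = -1; dj <= 1; dj++)    /* including diagonals */
            if (di != 0 || dj != 0)
                alive += grid[i + di][j + dj];

    if (alive == 3) return 1;           /* 3 alive neighbors: alive */
    if (alive == 2) return grid[i][j];  /* 2 alive neighbors: no change */
    return 0;                           /* otherwise: dead */
}
```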

15 Game of Life (3)

16 Game of Life (4) Parallelize on 2 processors –assign a block of columns to each processor Problem - What happens at the split?

17 Game of Life (5) Solution - overlap cells Each time step, pass the overlap data from processor to processor
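A sketch of the overlap (ghost-cell) exchange in MPI, assuming each process stores its block of columns contiguously with one ghost column on each side; the storage layout and function names here are assumptions for illustration, not the original code:

```c
/* Sketch: exchange ghost columns with left/right neighbors each
   time step. Assumed layout: nloc owned columns plus one ghost
   column on each side, each column of nrows values contiguous,
   i.e., cells has (nloc + 2) * nrows entries. */
#include <mpi.h>

void exchange_ghost_columns(int *cells, int nloc, int nrows,
                            int left, int right, MPI_Comm comm)
{
    /* send my first real column to the left neighbor;
       receive the right neighbor's first column into my right ghost */
    MPI_Sendrecv(&cells[1 * nrows],          nrows, MPI_INT, left,  0,
                 &cells[(nloc + 1) * nrows], nrows, MPI_INT, right, 0,
                 comm, MPI_STATUS_IGNORE);

    /* send my last real column to the right neighbor;
       receive the left neighbor's last column into my left ghost */
    MPI_Sendrecv(&cells[nloc * nrows],       nrows, MPI_INT, right, 1,
                 &cells[0 * nrows],          nrows, MPI_INT, left,  1,
                 comm, MPI_STATUS_IGNORE);
}
```

At the edges of the domain, the neighbor ranks can be set to MPI_PROC_NULL so the same calls work everywhere without special cases.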

18 Message Passing The largest bottleneck to good parallel efficiency is usually message passing –much slower than number crunching Set up your algorithm to minimize message passing Minimize the surface-to-volume ratio of the subdomains

19 Domain Decomp. For this domain, to run on 2 processors, decompose so that the boundary between the two subdomains is as short as possible, not along the long dimension (original figures omitted) –the boundary is what must be communicated each step

20 How to Pass Msgs. MPI is the recommended method –PVM may also be used MPICH –most common –free download http://www-unix.mcs.anl.gov/mpi/mpich/ others also available, e.g., LAM
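For reference, a minimal MPI message-passing program (illustrative, not from the slides): process 0 sends one value to process 1.

```c
/* Minimal MPI example: process 0 sends one number to process 1. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        double x = 3.14;
        MPI_Send(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        double x;
        MPI_Recv(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("process 1 received %f\n", x);
    }

    MPI_Finalize();
    return 0;
}
```

With MPICH this would typically be compiled with mpicc and run with something like mpirun -np 2 a.out.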

21 How to Pass Msgs. (2) some MPI tutorials –Boston University http://scv.bu.edu/Tutorials/MPI/ –NCSA http://pacont.ncsa.uiuc.edu:8900/public/MPI/

22 Performance

23 Code Timing How well has the code been parallelized? CPU time vs. wallclock time –both are seen in the literature –I prefer wallclock (meaningful only on dedicated processors) –CPU time doesn’t account for load imbalance timing tools: the unix time command, the Fortran system_clock subroutine, MPI_Wtime
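A sketch of wallclock timing with MPI_Wtime; the barriers are a choice made here so that all processes start and stop together, which makes load imbalance show up in the measured time:

```c
/* Sketch: timing a section of parallel code with MPI_Wtime. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);   /* synchronize before starting */
    double t0 = MPI_Wtime();

    /* ... the parallel work being timed goes here ... */

    MPI_Barrier(MPI_COMM_WORLD);   /* wait for the slowest process */
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("wallclock time: %f seconds\n", t1 - t0);

    MPI_Finalize();
    return 0;
}
```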

24 Parallel Speedup quantify how well we have parallelized our code: $S_n = T_1 / T_n$, where $S_n$ = parallel speedup, $n$ = number of processors, $T_1$ = time on 1 processor, $T_n$ = time on $n$ processors

25 Parallel Speedup (2)

26 Parallel Efficiency $\eta_n = S_n / n = T_1 / (n\,T_n)$, where $\eta_n$ = parallel efficiency, $T_1$ = time on 1 processor, $T_n$ = time on $n$ processors, $n$ = number of processors
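An illustrative calculation (the numbers are invented for this example, not from the slides): if a code takes $T_1 = 100$ s on one processor and $T_8 = 20$ s on eight, then

$$S_8 = \frac{T_1}{T_8} = \frac{100}{20} = 5, \qquad \eta_8 = \frac{S_8}{8} = 0.625$$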

27 Parallel Efficiency (2)

28 Parallel Efficiency (3) What is a “reasonable” level of parallel efficiency? Depends on –how much CPU time you have available –when the paper is due can think of $(1 - \eta_n)$ as “wasted” CPU time my personal rule of thumb is ~60%

29 Parallel Efficiency (4) Superlinear speedup –parallel efficiency > 1.0 –sometimes quoted in the literature –generally attributed to cache effects: the subdomains fit entirely in cache, while the entire domain does not –this is very problem dependent –be suspicious!

30 Amdahl’s Law There are always some operations that are performed serially, so we want a large fraction of the code to execute in parallel

31 Amdahl’s Law (2) Let the fraction of code that executes serially be denoted $s$, and the fraction that executes in parallel be denoted $p$, so that $s + p = 1$

32 Amdahl’s Law (3) Noting that $p = (1-s)$, the parallel speedup is $S_n = \dfrac{1}{s + (1-s)/n}$ –Amdahl’s Law

33 Amdahl’s Law (4) The parallel efficiency is $\eta_n = \dfrac{S_n}{n} = \dfrac{1}{ns + (1-s)}$ –an alternate version of Amdahl’s Law
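To see what the law implies, an illustrative calculation (numbers invented here, not from the slides): with a serial fraction $s = 0.05$ on $n = 16$ processors,

$$S_{16} = \frac{1}{0.05 + 0.95/16} \approx 9.1, \qquad \eta_{16} = \frac{S_{16}}{16} \approx 0.57$$

and no matter how many processors are used, $S_n < 1/s = 20$.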

34 Amdahl’s Law (5)

35 Amdahl’s Law (6) Should we despair? –No! –bigger machines solve bigger problems, which tend to have a smaller value of $s$ if you want to run on a large number of processors, try to minimize $s$

36 Ways to Fool the Masses full title: “Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers” created by David Bailey of NASA Ames in 1991 the following is a selection of the “ways,” some paraphrased

37 Ways to Fool (2) Scale the problem size with the number of processors Project results linearly –2 proc., 1 hr. → 1800 proc., 1 sec. Present the performance of a kernel, and represent it as the performance of the application

38 Ways to Fool (3) Compare with old code on an obsolete system Quote MFLOPS based on the parallel implementation, not the best serial implementation –increase the number of operations rather than decreasing the time

39 Ways to Fool (4) Quote parallel speedup, making sure the single-processor version is slow Mutilate the algorithm used in the parallel implementation to match the architecture –explicit vs. implicit PDE solvers Measure parallel times on a dedicated system, serial times in a busy environment

40 Ways to Fool (5) If all else fails, show pretty pictures and animated videos, and don’t talk about performance.

41 Summary Clusters are viable platforms for relatively low-cost scientific computing parallel considerations are similar to those on other platforms MPI is a free, effective message-passing API be careful with performance timings

