Introduction to Scientific Computing on Linux Clusters
Doug Sondak
Linux Clusters and Tiled Display Walls, July 30 – August 1, 2002
Outline
– Why Clusters?
– Parallelization example: Game of Life
– performance metrics
– Ways to Fool the Masses
– summary
Why Clusters?
– Scientific computing has traditionally been performed on fast, specialized machines
– Buzzword: Commodity Computing
  – clustering cheap, off-the-shelf processors
  – can achieve good performance at a low cost if the applications scale well
Clusters (2)
– 102 clusters in the current Top 500 list: http://www.top500.org/list/2001/06/
– Reasonable parallel efficiency is the key
– generally use message passing, even if there are shared-memory CPUs in each box
Compilers
– Linux Fortran compilers (F90/95)
  – available from many vendors, e.g., Absoft, Compaq, Intel, Lahey, NAG, Portland Group, Salford
  – g77 is free, but is restricted to Fortran 77 and relatively slow
Compilers (2)
– Intel offers a free, unsupported Fortran compiler for non-commercial purposes
  – full F95
  – OpenMP
  – http://www.intel.com/software/products/compilers/f60l/noncom.htm
Compilers (3)
– http://www.polyhedron.com/
Compilers (4)
– Linux C/C++ compilers
  – gcc/g++ seems to be the standard, usually described as a good compiler
  – also available from vendors, e.g., Compaq, Intel, Portland Group
Parallelization of Scientific Codes
Domain Decomposition
– Typically perform operations on arrays
  – e.g., setting up and solving a system of equations
– domain decomposition
  – arrays are broken into chunks, and each chunk is handled by a separate processor
  – processors operate simultaneously on their own chunks of the array
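To make this concrete, here is a minimal C sketch (an illustration, not code from the talk) of the bookkeeping behind block domain decomposition: each of nprocs processors computes the bounds of its own chunk of an array of length n. The function and variable names are hypothetical.

    #include <stdio.h>

    /* Compute the half-open index range [lo, hi) owned by processor
     * `rank` when an array of length n is split into nprocs nearly
     * equal chunks (the first n % nprocs chunks get one extra cell). */
    void chunk_bounds(int n, int nprocs, int rank, int *lo, int *hi)
    {
        int base = n / nprocs;      /* minimum chunk size */
        int rem  = n % nprocs;      /* leftover elements  */
        *lo = rank * base + (rank < rem ? rank : rem);
        *hi = *lo + base + (rank < rem ? 1 : 0);
    }

    int main(void)
    {
        int lo, hi;
        for (int rank = 0; rank < 4; rank++) {   /* 4 processors, n = 10 */
            chunk_bounds(10, 4, rank, &lo, &hi);
            printf("rank %d owns [%d, %d)\n", rank, lo, hi);
        }
        return 0;
    }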
Other Methods
– Parallelization is also possible without domain decomposition
  – less common
  – e.g., process one set of inputs while reading another set of inputs from a file
Embarrassingly Parallel
– if operations are completely independent of one another, the problem is called embarrassingly parallel
  – e.g., initializing an array
  – some Monte Carlo simulations
– not usually the case
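As an example of the Monte Carlo case, a hedged C sketch (not from the slides): each processor could run estimate_pi below with its own seed, with no communication needed until the final averaging step. The function name and sample count are assumptions for illustration.

    #include <stdio.h>
    #include <stdlib.h>

    /* Estimate pi by sampling random points in the unit square and
     * counting the fraction that land inside the quarter circle.
     * Each processor can run this independently with a different seed. */
    double estimate_pi(long samples, unsigned int seed)
    {
        long hits = 0;
        srand(seed);
        for (long i = 0; i < samples; i++) {
            double x = (double)rand() / RAND_MAX;
            double y = (double)rand() / RAND_MAX;
            if (x * x + y * y <= 1.0)
                hits++;
        }
        return 4.0 * (double)hits / (double)samples;
    }

    int main(void)
    {
        printf("pi ~ %f\n", estimate_pi(1000000L, 42));
        return 0;
    }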
Game of Life
– an early, simple cellular automaton
  – created by John Conway
– 2-D grid of cells
  – each cell has one of 2 states (“alive” or “dead”)
  – cells are initialized with some distribution of alive and dead states
Game of Life (2)
– at each time step, states are modified based on the states of adjacent cells (including diagonals)
– rules of the game:
  – 3 alive neighbors: alive
  – 2 alive neighbors: no change
  – other: dead
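A minimal C sketch of the update rule for one interior cell (an illustration, not code from the talk; the grid size N and function name are assumptions), with 1 = alive and 0 = dead:

    #define N 64   /* grid dimension; the actual size is an assumption */

    /* Apply the rules to interior cell (i, j): `old` is the current
     * generation, `next` receives the new state.  Assumes
     * 0 < i, j < N-1 so that all eight neighbors exist. */
    void update_cell(int i, int j, const int old[N][N], int next[N][N])
    {
        int alive = 0;
        for (int di = -1; di <= 1; di++)       /* count the 8 neighbors */
            for (int dj = -1; dj <= 1; dj++)
                if (di != 0 || dj != 0)
                    alive += old[i + di][j + dj];

        if (alive == 3)
            next[i][j] = 1;              /* 3 alive neighbors: alive     */
        else if (alive == 2)
            next[i][j] = old[i][j];      /* 2 alive neighbors: no change */
        else
            next[i][j] = 0;              /* otherwise: dead              */
    }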
Game of Life (3)
– [figure: illustration of the game, not recoverable from the extracted text]
Game of Life (4)
– parallelize on 2 processors
  – assign a block of columns to each processor
– problem: what happens at the split?
Game of Life (5)
– solution: overlap cells
– at each time step, pass the overlap data from processor to processor
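A hedged C/MPI sketch of the overlap exchange (an illustration under assumed details, not code from the talk; the column-major grid layout, NROWS, and the function name are assumptions). Each processor sends its edge columns to its neighbors and receives their edges into its ghost columns:

    #include <mpi.h>

    #define NROWS 64   /* rows per column; the size is an assumption */

    /* Exchange one column of overlap ("ghost") cells with each neighbor.
     * The local grid is stored as grid[col][row]; columns 1..ncols hold
     * owned cells, while columns 0 and ncols+1 are the ghost columns. */
    void exchange_halo(int grid[][NROWS], int ncols, int rank, int nprocs)
    {
        int left  = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

        /* send owned column 1 to the left, receive into the right ghost */
        MPI_Sendrecv(grid[1],         NROWS, MPI_INT, left,  0,
                     grid[ncols + 1], NROWS, MPI_INT, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* send owned column ncols to the right, receive into the left ghost */
        MPI_Sendrecv(grid[ncols], NROWS, MPI_INT, right, 1,
                     grid[0],     NROWS, MPI_INT, left,  1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

Processors at either end of the row pass MPI_PROC_NULL, which turns the corresponding send/receive into a no-op, so no special boundary cases are needed.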
Message Passing
– the largest bottleneck to good parallel efficiency is usually message passing
  – much slower than number crunching
– set up your algorithm to minimize message passing
– minimize the surface-to-volume ratio of the subdomains
Domain Decomp.
– [figure: a sample domain decomposed for 2 processors; the split that minimizes the interface between the two subdomains is preferred]
How to Pass Msgs.
– MPI is the recommended method
  – PVM may also be used
– MPICH
  – most common
  – free download: http://www-unix.mcs.anl.gov/mpi/mpich/
– others are also available, e.g., LAM
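For readers new to MPI, a minimal, self-contained C example (illustrative, not from the slides) that initializes MPI and passes a single message from rank 0 to rank 1:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs, msg;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        if (rank == 0 && nprocs > 1) {
            msg = 42;                                   /* arbitrary payload */
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d from rank 0\n", msg);
        }

        MPI_Finalize();
        return 0;
    }

Compile and launch with the wrappers provided by the MPI installation, e.g. mpicc and mpirun -np 2 (exact commands depend on the implementation).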
How to Pass Msgs. (2)
– some MPI tutorials
  – Boston University: http://scv.bu.edu/Tutorials/MPI/
  – NCSA: http://pacont.ncsa.uiuc.edu:8900/public/MPI/
Performance
Code Timing
– how well has the code been parallelized?
– CPU time vs. wallclock time
  – both are seen in the literature
  – I prefer wallclock, though it is only meaningful on dedicated processors
  – CPU time doesn’t account for load imbalance
– tools: the Unix time command, the Fortran system_clock subroutine, MPI_Wtime
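A short C sketch of wallclock timing with MPI_Wtime (illustrative; do_computation is a hypothetical placeholder for the section being timed):

    #include <mpi.h>
    #include <stdio.h>

    /* Placeholder for the work being timed (hypothetical). */
    void do_computation(void)
    {
        double x = 0.0;
        for (long i = 1; i <= 100000000L; i++)
            x += 1.0 / (double)i;
    }

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double t0 = MPI_Wtime();     /* wallclock seconds since some epoch */
        do_computation();
        double t1 = MPI_Wtime();

        printf("rank %d: elapsed wallclock time = %f s\n", rank, t1 - t0);

        MPI_Finalize();
        return 0;
    }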
Parallel Speedup
– quantifies how well we have parallelized our code
– S_n = T_1 / T_n
  – S_n = parallel speedup
  – n = number of processors
  – T_1 = time on 1 processor
  – T_n = time on n processors
Parallel Speedup (2)
– [figure not recoverable from the extracted text]
Parallel Efficiency
– η_n = S_n / n = T_1 / (n T_n)
  – η_n = parallel efficiency
  – T_1 = time on 1 processor
  – T_n = time on n processors
  – n = number of processors
Parallel Efficiency (2)
– [figure not recoverable from the extracted text]
Parallel Efficiency (3)
– what is a “reasonable” level of parallel efficiency? It depends on
  – how much CPU time you have available
  – when the paper is due
– can think of (1 − η_n) as “wasted” CPU time
– my personal rule of thumb: ~60%
Parallel Efficiency (4)
– superlinear speedup
  – parallel efficiency > 1.0
  – sometimes quoted in the literature
  – generally attributed to cache effects: each subdomain fits entirely in cache, while the entire domain does not
  – this is very problem dependent
  – be suspicious!
Amdahl’s Law
– there are always some operations which are performed serially
– we want a large fraction of the code to execute in parallel
Amdahl’s Law (2)
– let the fraction of code that executes serially be denoted s
– let the fraction of code that executes in parallel be denoted p
Amdahl’s Law (3)
– the time on n processors is T_n = (s + p/n) T_1
– noting that p = (1 − s), the parallel speedup is S_n = 1 / (s + (1 − s)/n)
– this is Amdahl’s Law
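As a quick illustration (not from the slides), this C snippet tabulates Amdahl's Law for an assumed serial fraction s = 0.05; no matter how many processors are used, the speedup is bounded by 1/s = 20:

    #include <stdio.h>

    int main(void)
    {
        double s = 0.05;   /* assumed serial fraction, for illustration */
        int procs[] = { 1, 2, 4, 8, 16, 64, 256, 1024 };

        for (int i = 0; i < 8; i++) {
            int n = procs[i];
            double speedup = 1.0 / (s + (1.0 - s) / n);   /* Amdahl's Law */
            printf("n = %4d  S_n = %6.2f  efficiency = %5.1f%%\n",
                   n, speedup, 100.0 * speedup / n);
        }
        return 0;
    }

With these numbers the speedup saturates near 19.6 at n = 1024, while the parallel efficiency falls below 2%.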
Amdahl’s Law (4)
– the parallel efficiency is η_n = S_n / n = 1 / (n s + (1 − s))
– this is an alternate version of Amdahl’s Law
Amdahl’s Law (5)
– [figure not recoverable from the extracted text]
Amdahl’s Law (6)
– should we despair? No!
– bigger machines solve bigger problems, which tend to have a smaller value of s
– if you want to run on a large number of processors, try to minimize s
Ways to Fool the Masses
– full title: “Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers”
– created by David Bailey of NASA Ames in 1991
– the following is a selection of the “ways,” some paraphrased
Ways to Fool (2)
– scale the problem size with the number of processors
– project results linearly
  – e.g., 2 processors, 1 hr. → 1800 processors, 1 sec.
– present the performance of a kernel, and represent it as the performance of the application
Ways to Fool (3)
– compare with old code on an obsolete system
– quote MFLOPS based on the parallel implementation, not the best serial implementation
  – i.e., increase the number of operations rather than decreasing the time
Ways to Fool (4)
– quote parallel speedup, making sure the single-processor version is slow
– mutilate the algorithm used in the parallel implementation to match the architecture
  – e.g., explicit vs. implicit PDE solvers
– measure parallel times on a dedicated system, serial times in a busy environment
Ways to Fool (5)
– if all else fails, show pretty pictures and animated videos, and don’t talk about performance
Summary
– clusters are viable platforms for relatively low-cost scientific computing
– parallel considerations are similar to other platforms
– MPI is a free, effective message-passing API
– be careful with performance timings