Seminar on parallel computing
Goal: provide an environment for exploration of parallel computing
Driven by participants
Weekly hour for discussion, show & tell
Focus primarily on distributed-memory computing on Linux PC clusters
Target audience:
– Experience with Linux computing & Fortran/C
– Requires parallel computing for own studies
1 credit possible for completion of a ‘proportional’ project
Main idea
Distribute a job over multiple processing units
Do bigger jobs than are possible on a single machine
Solve bigger problems faster
Resources: e.g., www-jics.cs.utk.edu
Sequential limits
Moore’s law
Clock speed physically limited
– Speed of light
– Miniaturization; dissipation; quantum effects
Memory addressing
– 32-bit addresses in PCs: 4 Gbyte RAM max.
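A quick worked check of that addressing limit (added for clarity, not on the original slide): with byte-addressable memory, a 32-bit address can reach at most $2^{32} = 4{,}294{,}967{,}296$ bytes, i.e. 4 Gbyte, which is why 32-bit PCs top out at about 4 Gbyte of RAM.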
Machine architecture: serial
– Single processor
– Hierarchical memory:
  Small number of registers on the CPU
  Cache (L1/L2)
  RAM
  Disk (swap space)
– Operations require multiple steps:
  Fetch two floating point numbers from main memory
  Add and store
  Put back into main memory
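To illustrate why that hierarchy matters in practice, here is a minimal C sketch (my example, not from the slides): both loops compute the same sum, but the first walks the array in the order it is laid out in memory and reuses each cache line, while the second jumps N*8 bytes between accesses and spends most of its time waiting on RAM.

/* Minimal sketch (not from the slides): effect of the memory hierarchy.
 * A C array is stored row by row, so the i-j loop walks memory
 * contiguously; the j-i loop has a large stride and misses cache often. */
#include <stdio.h>

#define N 2000

double a[N][N];          /* zero-initialized static storage, ~32 Mbyte */

int main(void)
{
    double sum = 0.0;
    int i, j;

    /* Cache-friendly: innermost index varies fastest (contiguous access). */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            sum += a[i][j];

    /* Cache-unfriendly: stride of N doubles between consecutive accesses. */
    for (j = 0; j < N; j++)
        for (i = 0; i < N; i++)
            sum += a[i][j];

    printf("sum = %f\n", sum);
    return 0;
}

Timing the two loops separately (e.g., with the shell’s time command) on any of the cluster nodes makes the effect of the cache visible.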
Vector processing
Speed up single instructions on vectors
– E.g., while adding two floating point numbers, fetch two new ones from main memory
– Pushing vectors through the pipeline
Useful in particular for long vectors
Requires good memory control:
– Bigger cache is better
Common on most modern CPUs
– Implemented in both hardware and software
SIMD
Same instruction works simultaneously on different data sets
Extension of vector computing
Example:
  DO IN PARALLEL
    for i = 1, n
      x(i) = a(i)*b(i)
    end
  DONE PARALLEL
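The same loop in C (a minimal sketch, not from the slides): because every iteration applies the same operation to different data, a vectorizing compiler such as the GNU or Intel compilers mentioned later can map it onto SIMD hardware that processes several elements per instruction.

/* Minimal sketch (not from the slides): the SIMD example above in C. */
void multiply(int n, const double * restrict a,
                     const double * restrict b,
                     double * restrict x)
{
    /* 'restrict' (C99) promises the arrays do not overlap, which makes
       automatic vectorization easier for the compiler.                 */
    int i;
    for (i = 0; i < n; i++)
        x[i] = a[i] * b[i];   /* same instruction, different data */
}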
MIMD
Multiple instruction, multiple data
Most flexible; encompasses SIMD and serial computing
Often best for ‘coarse-grained’ parallelism
Message passing
Example: domain decomposition
– Divide the computational grid into equal chunks
– Work on each domain with one CPU
– Communicate boundary values when necessary
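As a concrete illustration of the “communicate boundary values” step, here is a minimal MPI sketch (my example, not the seminar’s code) for a 1-D domain decomposition; it assumes each process owns interior values u[1..n] plus two ghost cells u[0] and u[n+1].

/* Minimal sketch (not from the slides): exchange of one layer of
 * boundary ("ghost") values between neighboring subdomains.           */
#include <mpi.h>

void exchange_ghosts(double *u, int n, MPI_Comm comm)
{
    int rank, size, left, right;

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* Send my last interior value to the right neighbor,
       receive my left ghost cell from the left neighbor.   */
    MPI_Sendrecv(&u[n], 1, MPI_DOUBLE, right, 0,
                 &u[0], 1, MPI_DOUBLE, left,  0,
                 comm, MPI_STATUS_IGNORE);

    /* Send my first interior value to the left neighbor,
       receive my right ghost cell from the right neighbor. */
    MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, left,  1,
                 &u[n + 1], 1, MPI_DOUBLE, right, 1,
                 comm, MPI_STATUS_IGNORE);
}

MPI_Sendrecv pairs the send and receive so neighboring processes cannot deadlock, and MPI_PROC_NULL turns the exchange into a no-op at the two ends of the chain.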
Historical machines
1976: Cray-1 at Los Alamos (vector)
1980s: Control Data Cyber 205 (vector)
1980s: Cray X-MP
– 4 coupled Cray-1s
1985: Thinking Machines Connection Machine
– SIMD, up to 64k processors
1984+: NEC/Fujitsu/Hitachi
– Automatic vectorization
Sun and SGI (1990s)
Scaling between desktops and compute servers
– Use of both vectorization and large-scale parallelization
– RISC processors
– SPARC for Sun
– MIPS for SGI: PowerChallenge/Origin
Happy developments
High Performance Fortran / Fortran 90
Definitions of message-passing libraries:
– PVM
– MPI
Linux
Performance increase of commodity CPUs
The combination leads to affordable cluster computing
Who’s the biggest?
www.top500.org
Linpack dense linear-system benchmark
June 2003:
– Earth Simulator, Yokohama, NEC, 36 Tflops
– ASCI Q, Los Alamos, HP, 14 Tflops
– Linux cluster, Livermore, 8 Tflops
Parallel approaches
Embarrassingly parallel:
– “Monte Carlo” searches
– SETI@home
– Analyze lots of small time series
Parallelize DO-loops in dominantly serial code
Domain decomposition:
– Fully parallel
– Requires complete rewrite/rethinking
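A minimal sketch of the embarrassingly parallel case (my example, not from the slides): a Monte Carlo estimate of pi in which every MPI process works completely independently and the only communication is a single reduction at the end.

/* Minimal sketch (not from the slides): embarrassingly parallel Monte
 * Carlo with MPI. Each rank throws random points at the unit square on
 * its own; one MPI_Reduce combines the counts on rank 0.              */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    long i, trials = 1000000, hits = 0, total_hits = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    srand(rank + 1);                       /* different stream per rank */
    for (i = 0; i < trials; i++) {
        double x = rand() / (double)RAND_MAX;
        double y = rand() / (double)RAND_MAX;
        if (x * x + y * y <= 1.0)
            hits++;
    }

    /* The only communication: sum the per-rank counts on rank 0. */
    MPI_Reduce(&hits, &total_hits, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("pi estimate = %f\n", 4.0 * total_hits / (trials * size));

    MPI_Finalize();
    return 0;
}

Because the trials are independent, the run scales almost perfectly with the number of processes; a real calculation would use a proper parallel random number generator instead of srand/rand.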
Example: seismic wave propagation
3-D spherical wave propagation modeled with a high-order finite element technique (Komatitsch and Tromp, GJI, 2002)
Massively parallel computation on Linux PC clusters
Approx. 34 Gbyte RAM needed for 10 km average resolution
www.geo.lsa.umich.edu/~keken/waves
Resolution
Spectral elements: 10 km average resolution
4th-order interpolation functions
Reasonable graphics resolution: 10 km or better
– 12 km: 1024³ = 1 Gbyte
– 6 km: 2048³ = 8 Gbyte
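The arithmetic behind those figures (my reading, assuming one byte per grid point, consistent with the 256-color snapshots on the next slides): $1024^3 = 2^{30}$ bytes $=$ 1 Gbyte, and $2048^3 = 2^{33}$ bytes $=$ 8 Gbyte.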
Simulated EQ (d = 15 km) after 17 minutes
[Figure: 512x512 snapshot, 256 colors, positive values only, truncated maximum, log10 scale of particle velocity; labeled phases: P, PP, PKIKP, SK, PPP, PKP, PKPab]
[Figure: 512x512 snapshot, 256 colors, positive values only, truncated maximum, log10 scale of particle velocity, some S component; labeled phases: R, PKS, PcSS, PcS, S, SS]
Resources at UM
Various Linux clusters in Geology:
– Agassiz (Ehlers): 8 Pentium 4 @ 2 Gbyte each
– Panoramix (van Keken): 10 P3 @ 512 Mbyte
– Trans (van Keken, Ehlers): 24 P4 @ 2 Gbyte
SGIs:
– Origin 2000 (Stixrude, Lithgow, van Keken)
Center for Advanced Computing @ UM:
– Athlon clusters (384 nodes @ 1 Gbyte each)
– Opteron cluster (to be installed)
NPACI
Software resources
GNU and Intel compilers
– Fortran/Fortran 90/C/C++
MPICH (www-fp.mcs.anl.gov)
– Primary implementation of MPI
– “Using MPI”, 2nd edition, Gropp et al., 1999
Sun Grid Engine
PETSc (www-fp.mcs.anl.gov)
– Toolbox for parallel scientific computing
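To make the MPICH entry concrete, here is the smallest useful MPI program (a minimal sketch, not from the slides; mpicc and mpirun are the standard MPICH wrapper commands, and the exact invocation may differ per installation).

/* Minimal sketch (not from the slides): MPI "hello world".
 * Compile:  mpicc hello.c -o hello
 * Run:      mpirun -np 4 ./hello                                  */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);                 /* start the MPI environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id         */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    printf("Hello from process %d of %d\n", rank, size);

    MPI_Finalize();                         /* shut MPI down cleanly     */
    return 0;
}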