Published by Herbert Hines. Modified over 9 years ago.
1
On the Performance of PC Clusters in Solving Partial Differential Equations
Xing Cai, Åsmund Ødegård
Department of Informatics, University of Oslo, Norway
2
Outline of the talk
- Introduction
- Beowulf clusters – cost-effective approach to solving PDEs
- Performance analysis of a Linux cluster
- Numerical experiments & measurements
3
A generic finite element PDE solver
- Time stepping t_0, t_1, t_2, …
- Spatial discretization on computational grid
- Solution of nonlinear problems
- Solution of linearized problems
- Iterative solution of Ax=b
4
An observation
- The computation-intensive part is the iterative solution of Ax=b
- A parallel finite element PDE solver needs to run the linear algebra kernels in parallel
  – vector addition
  – inner product of two vectors
  – matrix-vector product
- Two types of inter-processor communication
- Ratio computation/communication is high
- Relatively tolerant of slow communication
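To make the role of the three kernels concrete, here is a minimal sequential sketch of the conjugate gradient (CG) method written only in terms of vector addition, inner product, and matrix-vector product. It is an illustration under assumptions, not the solver used in the talk: the function names, the dense list-of-rows matrix, and the small 1D Poisson test matrix are all chosen here for the example.

```python
def axpy(alpha, x, y):
    """Vector addition kernel: return alpha*x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def dot(x, y):
    """Inner product of two vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))

def matvec(A, x):
    """Matrix-vector product for a dense matrix stored as a list of rows."""
    return [dot(row, x) for row in A]

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A using only the three kernels."""
    x = [0.0] * len(b)
    r = list(b)                 # residual b - A x for the initial guess x = 0
    p = list(r)                 # first search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = axpy(alpha, p, x)           # x <- x + alpha p
        r = axpy(-alpha, Ap, r)         # r <- r - alpha A p
        rr_new = dot(r, r)
        p = axpy(rr_new / rr, p, r)     # p <- r + beta p
        rr = rr_new
    return x

# Assumed test problem: 1D Poisson matrix tridiag(-1, 2, -1) with right-hand side of ones.
n = 8
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x = conjugate_gradient(A, b)
```

Every operation inside the loop is one of the three kernels, which is why parallelizing those kernels is enough to parallelize the iteration as a whole.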
5
A natural parallelization of PDE solvers
- The global solution domain is partitioned into many smaller sub-domains
- One sub-domain works as a "unit", with its sub-matrices and sub-vectors
- No need to create global matrices and vectors physically
- The global linear algebra operations can be realized by local operations + inter-processor communication
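The idea of "local operations + inter-processor communication" can be sketched in a single process by storing a vector as a list of sub-vectors, one per sub-domain. In this hedged illustration the global sum stands in for a reduction across processes and the neighbour look-ups stand in for point-to-point messages; a real code would use MPI for both. The 1D stencil (-1, 2, -1), the three-way split, and the function names are assumptions made for the example.

```python
def global_dot(local_x, local_y):
    """Inner product: each sub-domain computes a partial sum, then the partial sums are added.
    The final sum() plays the role of a global reduction across processes."""
    partial = [sum(a * b for a, b in zip(x, y)) for x, y in zip(local_x, local_y)]
    return sum(partial)

def global_matvec(local_x):
    """y = A x for the 1D stencil (-1, 2, -1), computed sub-domain by sub-domain.
    Each sub-domain needs one boundary value from each neighbour; reading it from the
    neighbouring list plays the role of a message exchange."""
    local_y = []
    for p, x in enumerate(local_x):
        left = local_x[p - 1][-1] if p > 0 else 0.0
        right = local_x[p + 1][0] if p + 1 < len(local_x) else 0.0
        padded = [left] + x + [right]
        local_y.append([2.0 * padded[i] - padded[i - 1] - padded[i + 1]
                        for i in range(1, len(padded) - 1)])
    return local_y

# A global vector 0, 1, ..., 9 that exists only as three sub-vectors.
x = [float(i) for i in range(10)]
local_x = [x[0:4], x[4:7], x[7:10]]

xx = global_dot(local_x, local_x)                 # 285.0
local_y = global_matvec(local_x)
y = [v for block in local_y for v in block]       # flattened only for inspection
```

The two helpers show two different communication patterns: a reduction that involves all sub-domains, and an exchange that involves only neighbours. No global matrix is ever formed.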
6
Linear-algebra level parallelization
- An SPMD model
- Reuse of existing code for local linear algebra operations
- Need new code for the parallelization-specific tasks
  – grid partition (non-overlapping, overlapping)
  – inter-processor communication routines
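As a small illustration of the grid-partition task, the sketch below splits a 1D index range into contiguous sub-domains, optionally extended by a number of overlap points on each interior side. It is a simplified stand-in: the function name and the 1D setting are assumptions, and partitioning an unstructured finite element grid is considerably more involved.

```python
def partition_1d(n, parts, overlap=0):
    """Return (start, stop) index pairs, stop exclusive, for `parts` sub-domains of 0..n-1.
    With overlap=0 the sub-domains are disjoint; with overlap=k each one is extended by
    k points into its neighbours, clipped at the ends of the grid."""
    base, extra = divmod(n, parts)
    subdomains, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)   # spread the remainder over the first blocks
        subdomains.append((max(0, start - overlap), min(n, stop + overlap)))
        start = stop
    return subdomains

non_overlapping = partition_1d(10, 3)             # [(0, 4), (4, 7), (7, 10)]
overlapping = partition_1d(10, 3, overlap=1)      # [(0, 5), (3, 8), (6, 10)]
```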
7
Object orientation
- An add-on "toolbox" containing all the parallelization-specific code
- The "toolbox" has many high-level routines and hides the low-level MPI details
- The existing sequential libraries are slightly modified to include a "dummy" interface, thus incorporating "fake" inter-processor communications
- A seamless coupling between the huge sequential libraries and the add-on toolbox
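The "dummy" interface idea can be sketched as follows. Library code is written once against a small communication interface; the sequential build plugs in an object whose communication calls do nothing, and a parallel build would plug in an MPI-backed object with the same methods. All class and method names here are hypothetical and are not Diffpack's API. The second class only imitates a parallel run by adding pre-recorded contributions from other ranks, where a real toolbox would call something like `MPI_Allreduce`.

```python
import math

class DummyComm:
    """Sequential 'dummy' interface: one process, so a global sum is just the local value."""
    def global_sum(self, local_value):
        return local_value

class RecordedComm:
    """Stand-in for an MPI-backed communicator (hypothetical). It adds fixed partial sums
    that other ranks would have contributed, purely for demonstration."""
    def __init__(self, remote_partial_sums):
        self.remote_partial_sums = list(remote_partial_sums)

    def global_sum(self, local_value):
        return local_value + sum(self.remote_partial_sums)

def norm(local_vector, comm):
    """Euclidean norm of a distributed vector. The same code serves both builds,
    because the only parallel-specific step is hidden behind comm.global_sum."""
    local_sq = sum(v * v for v in local_vector)
    return math.sqrt(comm.global_sum(local_sq))

sequential = norm([3.0, 4.0], DummyComm())               # 5.0
parallel = norm([3.0, 4.0], RecordedComm([144.0]))       # 13.0, as if another rank held [12.0]
```

The design choice worth noting is that `norm` never tests whether it is running in parallel; the coupling between sequential library and toolbox happens entirely through the object passed in.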
8
Diffpack
- Object-oriented software environment for scientific computation (C++)
- Rich collection of PDE solution components: portable, flexible, extensible
- http://www.nobjects.com
- H. P. Langtangen, Computational Partial Differential Equations, Springer, 1999
9
Straightforward parallelization
- Develop a sequential simulator, without paying attention to parallelism
- Follow the Diffpack coding standards
- Use the add-on toolbox for parallel computing
- Add a few new statements for transformation to a parallel simulator
10
A Linux cluster
- 48 Pentium-III 500 MHz processors (24 nodes)
- 512 MB memory per node
- One 3com905B network card per node
- Fast Ethernet, 100 Mbit/s
- 26-port Cisco Catalyst 2926 switch
- Price: around $60,000
11
Parallel simulation of 3D acoustic field
- 3D nonlinear model
12
3D nonlinear acoustic field simulation
Comparison between Origin 2000 and Linux cluster, 1,030,301 grid points

        Origin 2000           Linux cluster
CPUs    CPU-time   Speedup    CPU-time   Speedup
2       8670.8     N/A        6681.5     N/A
4       4726.5     3.75       3545.9     3.77
8       2404.2     7.21       1881.1     7.10
16      1325.6     13.0       953.89     14.0
24      1043.7     16.6       681.77     19.6
32      725.23     23.9       563.54     23.7
48      557.61     31.1       673.77     19.8
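The slide does not state how the speedup figures were computed. Since the 2-CPU entries are marked N/A, one plausible reading is that the 2-CPU run serves as the baseline and is credited with a speedup of 2, so that speedup(P) = 2 · T(2) / T(P). The short sketch below checks that assumption against the Linux cluster column; the formula is an inference from the numbers, not something given in the talk.

```python
cpus = [2, 4, 8, 16, 24, 32, 48]
linux_time = [6681.5, 3545.9, 1881.1, 953.89, 681.77, 563.54, 673.77]

def speedup_relative_to_first(procs, times):
    """Speedup assuming the first run scales perfectly: S(P) = P0 * T(P0) / T(P)."""
    base = procs[0] * times[0]
    return [base / t for t in times]

linux_speedup = speedup_relative_to_first(cpus, linux_time)

for p, s in zip(cpus[1:], linux_speedup[1:]):
    print(f"{p:2d} CPUs: speedup {s:.2f}")
```

Under this assumption the computed values round to the Linux cluster speedups shown in the table. The table also shows the Linux cluster's CPU time rising from 563.54 at 32 CPUs to 673.77 at 48, while the Origin 2000 keeps improving.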
13
Incompressible Navier-Stokes
- Numerical strategy: operator splitting
- Calculation of an intermediate velocity in a predictor-corrector way
- Solution of a Poisson equation
- Correction of the intermediate velocity
14
Incompressible Navier-Stokes
- Explicit schemes for predicting and correcting the velocity
- Implicit solution of the pressure by CG

P     CPU-time   Speedup   Efficiency
1     665.45     N/A
2     329.57     2.02      1.01
4     166.55     4.00      1.00
8     89.98      7.40      0.92
16    48.96      13.59     0.85
24    34.85      19.09     0.80
48    34.22      19.45     0.41
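The speedup and efficiency columns are consistent with the usual definitions relative to the single-processor run, where T(P) is the CPU time on P processors. The slide does not spell these definitions out, so they are stated here as an assumption, followed by a worked check for the last two rows:

```latex
\[
S(P) = \frac{T(1)}{T(P)}, \qquad E(P) = \frac{S(P)}{P}
\]
\[
S(24) = \frac{665.45}{34.85} \approx 19.09, \qquad E(24) = \frac{19.09}{24} \approx 0.80
\]
\[
S(48) = \frac{665.45}{34.22} \approx 19.45, \qquad E(48) = \frac{19.45}{48} \approx 0.41
\]
```

Doubling the processor count from 24 to 48 reduces the CPU time only from 34.85 to 34.22, which is why the efficiency roughly halves.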
15
3D nonlinear water waves
- Fully nonlinear 3D water waves
- Primary unknowns:
16
3D nonlinear water waves
- Global 3D grid: 49x49x41
- Global solver: CG + overlapping Schwarz preconditioner
- Multigrid V-cycle as subdomain solver
- CPU measurement of a total of 32 time steps
- Parallel simulation on the Linux cluster
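The combination "CG + overlapping Schwarz preconditioner" can be illustrated on a much smaller problem. The sketch below applies preconditioned CG with a one-level additive Schwarz preconditioner to the 1D Poisson matrix tridiag(-1, 2, -1). It differs from the setup on the slide in several ways that are assumptions made for brevity: the problem is 1D rather than 3D, the subdomain solves use a direct tridiagonal (Thomas) solve rather than a multigrid V-cycle, the subdomain index ranges are hard-coded, and everything runs in one process.

```python
def apply_A(x):
    """Matrix-free product with the 1D Poisson matrix tridiag(-1, 2, -1)."""
    n = len(x)
    return [2.0 * x[i] - (x[i - 1] if i > 0 else 0.0) - (x[i + 1] if i + 1 < n else 0.0)
            for i in range(n)]

def solve_local(rhs):
    """Thomas algorithm for tridiag(-1, 2, -1) u = rhs: the direct subdomain solve."""
    m = len(rhs)
    c, d = [0.0] * m, [0.0] * m          # modified super-diagonal and right-hand side
    c[0], d[0] = -0.5, rhs[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    u = [0.0] * m
    u[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return u

def schwarz_precondition(r, subdomains):
    """Additive Schwarz: restrict r to each overlapping subdomain, solve locally,
    and add the local corrections back into the global vector."""
    z = [0.0] * len(r)
    for start, stop in subdomains:
        for k, value in enumerate(solve_local(r[start:stop])):
            z[start + k] += value
    return z

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def pcg(b, subdomains, tol=1e-10, max_iter=200):
    """Preconditioned CG; returns the solution and the number of iterations used."""
    x, r = [0.0] * len(b), list(b)
    z = schwarz_precondition(r, subdomains)
    p, rz = list(z), dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, it
        Ap = apply_A(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        z = schwarz_precondition(r, subdomains)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

n = 40
subdomains = [(0, 12), (8, 22), (18, 32), (28, 40)]   # four subdomains, overlapping by 4 points
x, iterations = pcg([1.0] * n, subdomains)
```

In a parallel run each subdomain solve would execute on its own processor, and only the overlap values and the inner products would require communication.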
17
Summary
- OOP + MPI give portable parallel software
- Beowulf clusters are well suited to solving PDEs
- Applicable to a wide range of PDEs
- Performance: satisfactory speed-up
- Some issues need to be considered for further improvement