On the Performance of PC Clusters in Solving Partial Differential Equations. Xing Cai, Åsmund Ødegård. Department of Informatics, University of Oslo, Norway.

Presentation transcript:

Outline of the talk
• Introduction
• Beowulf clusters: a cost-effective approach to solving PDEs
• Performance analysis of a Linux cluster
• Numerical experiments & measurements

A generic finite element PDE solver
• Time stepping t_0, t_1, t_2, …
• Spatial discretization on a computational grid
• Solution of nonlinear problems
• Solution of linearized problems
• Iterative solution of Ax=b

An observation
• The computation-intensive part is the iterative solution of Ax=b
• A parallel finite element PDE solver needs to run the linear algebra kernels in parallel
  – vector addition
  – inner product of two vectors
  – matrix-vector product
• Two types of inter-processor communication
• The computation/communication ratio is high
• Relatively tolerant of slow communication
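To illustrate why these three kernels matter, here is a minimal sequential sketch of the conjugate gradient (CG) method written only in terms of vector addition, inner product, and matrix-vector product. The function names and the small 1D Poisson test matrix are illustrative assumptions, not code from the talk or from Diffpack.

```python
def axpy(alpha, x, y):
    """Vector addition kernel: return alpha*x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def inner(x, y):
    """Inner-product kernel."""
    return sum(xi * yi for xi, yi in zip(x, y))


def matvec(A, x):
    """Matrix-vector product kernel for a dense list-of-rows matrix."""
    return [inner(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve Ax=b for symmetric positive definite A using only the kernels above."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the initial guess x = 0
    p = list(r)              # first search direction
    rr = inner(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rr / inner(p, Ap)
        x = axpy(alpha, p, x)        # x <- x + alpha p
        r = axpy(-alpha, Ap, r)      # r <- r - alpha A p
        rr_new = inner(r, r)
        p = axpy(rr_new / rr, p, r)  # p <- r + beta p
        rr = rr_new
    return x


def poisson_1d(n):
    """Hypothetical test matrix: tridiag(-1, 2, -1) of size n."""
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]


A = poisson_1d(8)
b = [1.0] * 8
x = conjugate_gradient(A, b)
```

Each CG iteration calls one matrix-vector product, two inner products, and three vector additions, so parallelizing those kernels parallelizes the whole solver.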

A natural parallelization of PDE solvers
• The global solution domain is partitioned into many smaller sub-domains
• One sub-domain works as a "unit", with its sub-matrices and sub-vectors
• No need to create global matrices and vectors physically
• The global linear algebra operations can be realized by local operations + inter-processor communication
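The idea of "local operations + communication" can be sketched without MPI by simulating sub-domains as separate Python lists. This is an assumption-laden toy: a 1D Laplacian stencil, non-overlapping contiguous partitions, and plain function calls standing in for a global reduction and for neighbour-to-neighbour messages. All names are hypothetical.

```python
def partition(v, nparts):
    """Split a global vector into contiguous, non-overlapping sub-vectors."""
    n = len(v)
    bounds = [n * k // nparts for k in range(nparts + 1)]
    return [v[bounds[k]:bounds[k + 1]] for k in range(nparts)]


def global_inner(xs, ys):
    """Local inner products, then a global sum (stands in for an all-reduce)."""
    local = [sum(a * b for a, b in zip(x, y)) for x, y in zip(xs, ys)]
    return sum(local)


def exchange_ghosts(xs):
    """Each sub-domain gets one boundary value from each neighbour
    (stands in for point-to-point messages); outer boundaries get 0."""
    ghosts = []
    for k in range(len(xs)):
        left = xs[k - 1][-1] if k > 0 else 0.0
        right = xs[k + 1][0] if k < len(xs) - 1 else 0.0
        ghosts.append((left, right))
    return ghosts


def local_matvec(x, left, right):
    """Apply the 1D stencil (-1, 2, -1) to one sub-vector with ghost values."""
    padded = [left] + x + [right]
    return [-padded[i - 1] + 2.0 * padded[i] - padded[i + 1]
            for i in range(1, len(padded) - 1)]


def global_matvec(xs):
    """Matrix-vector product without ever forming the global matrix."""
    ghosts = exchange_ghosts(xs)
    return [local_matvec(x, l, r) for x, (l, r) in zip(xs, ghosts)]


x_global = [float(i * i % 7) for i in range(12)]
xs = partition(x_global, 3)
y_flat = [v for sub in global_matvec(xs) for v in sub]
y_ref = local_matvec(x_global, 0.0, 0.0)   # same stencil on the undivided vector
```

The inner product needs a global reduction, while the matrix-vector product only needs values from neighbouring sub-domains; no global matrix or vector is assembled in either case.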

Linear-algebra level parallelization
• An SPMD model
• Reuse of existing code for local linear algebra operations
• Need new code for the parallelization-specific tasks
  – grid partition (non-overlapping, overlapping)
  – inter-processor communication routines

Object orientation
• An add-on "toolbox" containing all the parallelization-specific code
• The "toolbox" has many high-level routines and hides the low-level MPI details
• The existing sequential libraries are slightly modified to include a "dummy" interface, thus incorporating "fake" inter-processor communications
• A seamless coupling between the huge sequential libraries and the add-on toolbox
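The "dummy" interface idea can be sketched as follows. This is a hypothetical Python illustration of the design pattern only; the class and method names are invented here and are not the Diffpack or toolbox API. Numerical code calls a communicator object for every global operation; the sequential version does nothing beyond returning the local value, and a parallel version (here only simulated, where a real one would wrap MPI) combines contributions from other processes.

```python
class SequentialComm:
    """'Dummy' communicator: a single process, so global operations are local."""

    def num_procs(self):
        return 1

    def global_sum(self, local_value):
        return local_value  # "fake" communication: nothing to exchange


class SimulatedParallelComm:
    """Stand-in for an MPI-backed communicator. The contributions of the
    other processes are supplied up front purely for illustration."""

    def __init__(self, values_from_other_procs):
        self.others = list(values_from_other_procs)

    def num_procs(self):
        return 1 + len(self.others)

    def global_sum(self, local_value):
        return local_value + sum(self.others)


def norm2(local_x, comm):
    """Euclidean norm of a distributed vector. The same code runs
    sequentially or in parallel, depending only on the communicator."""
    local = sum(v * v for v in local_x)
    return comm.global_sum(local) ** 0.5


seq_norm = norm2([3.0, 4.0], SequentialComm())
par_norm = norm2([3.0, 4.0], SimulatedParallelComm([144.0]))
```

Because `norm2` never refers to MPI directly, a sequential library written against such an interface can be switched to parallel execution by swapping the communicator object.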

Diffpack
• O-O software environment for scientific computation (C++)
• Rich collection of PDE solution components: portable, flexible, extensible
• H. P. Langtangen, Computational Partial Differential Equations, Springer, 1999

Straightforward parallelization
• Develop a sequential simulator, without paying attention to parallelism
• Follow the Diffpack coding standards
• Use the add-on toolbox for parallel computing
• Add a few new statements for transformation to a parallel simulator

A Linux cluster
• 48 Pentium-III 500 MHz processors (24 nodes)
• 512 MB memory per node
• One 3com905B network card per node
• Fast Ethernet, 100 Mbit/s
• 26-port Cisco Catalyst 2926 switch
• Price: around $60,000

Parallel simulation of 3D acoustic field
• 3D nonlinear model

3D nonlinear acoustic field simulation
Comparison between Origin 2000 and Linux cluster, 1,030,301 grid points
Table columns: CPUs | Origin 2000 CPU-time | Origin 2000 Speedup | Linux cluster CPU-time | Linux cluster Speedup
Surviving table entries: N/A, 6681.5, N/A

Incompressible Navier-Stokes
• Numerical strategy: operator splitting
• Calculation of an intermediate velocity in a predictor-corrector way
• Solution of a Poisson equation
• Correction of the intermediate velocity
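For orientation, one common textbook form of such an operator-splitting (projection) scheme is sketched below. It is an assumption made for illustration, not necessarily the exact scheme used in the talk: the intermediate velocity is shown as a single explicit Euler step for brevity, and the symbols (velocity u, pressure p, density ρ, kinematic viscosity ν, time step Δt) are introduced here.

```latex
\begin{align}
\mathbf{u}^{*} &= \mathbf{u}^{n} + \Delta t\left(-(\mathbf{u}^{n}\cdot\nabla)\mathbf{u}^{n}
                  + \nu\nabla^{2}\mathbf{u}^{n}\right)
  && \text{(intermediate velocity)} \\
\nabla^{2} p^{n+1} &= \frac{\rho}{\Delta t}\,\nabla\cdot\mathbf{u}^{*}
  && \text{(Poisson equation for the pressure)} \\
\mathbf{u}^{n+1} &= \mathbf{u}^{*} - \frac{\Delta t}{\rho}\,\nabla p^{n+1}
  && \text{(velocity correction)}
\end{align}
```

Taking the divergence of the correction step and inserting the Poisson equation gives ∇·u^{n+1} = 0, which is why the corrected velocity satisfies the incompressibility constraint. In this form only the Poisson step requires solving a linear system.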

Incompressible Navier-Stokes
• Explicit schemes for predicting and correcting the velocity
• Implicit solution of the pressure by CG
Table columns: P | CPU-time | Speedup | Efficiency
Surviving table entry: N/A
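The speedup and efficiency columns are usually computed as below. This is a sketch under stated assumptions: the talk does not spell out its definitions, so the standard relative definitions are used, with a reference run on `p_ref` processors (which would explain an "N/A" speedup entry in the baseline row). The timings in the example are made up, not measurements from the talk.

```python
def speedup(t_ref, t_p):
    """Relative speedup: reference CPU-time divided by CPU-time on P processors."""
    return t_ref / t_p


def efficiency(t_ref, p_ref, t_p, p):
    """Parallel efficiency relative to the reference run.
    Equals 1.0 when doubling the processors halves the CPU-time."""
    return (t_ref * p_ref) / (t_p * p)


# Hypothetical timings: 100 s on 2 processors, 25 s on 16 processors.
s = speedup(100.0, 25.0)
e = efficiency(100.0, 2, 25.0, 16)
```

With a single-processor reference (`p_ref = 1`) the efficiency reduces to the familiar speedup divided by P.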

3D nonlinear water waves
• Fully nonlinear 3D water waves
• Primary unknowns:

3D nonlinear water waves
• Global 3D grid: 49x49x41
• Global solver: CG + overlapping Schwarz preconditioner
• Multigrid V-cycle as subdomain solver
• CPU measurement of a total of 32 time steps
• Parallel simulation on the Linux cluster
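The solver combination named above can be sketched on a much smaller problem. The example below is an illustrative 1D analogue under explicit assumptions: the matrix is tridiag(-1, 2, -1), there are two overlapping subdomains, the preconditioner is additive Schwarz, and each subdomain problem is solved exactly with the Thomas algorithm instead of the multigrid V-cycle used in the talk. All names are hypothetical.

```python
def apply_A(x):
    """Matrix-free product with the 1D Poisson matrix tridiag(-1, 2, -1)."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0) for i in range(n)]


def solve_local(rhs):
    """Thomas algorithm for tridiag(-1, 2, -1) restricted to one subdomain."""
    m = len(rhs)
    c = [0.0] * m                      # modified super-diagonal
    d = [0.0] * m                      # modified right-hand side
    c[0], d[0] = -0.5, rhs[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    x = [0.0] * m
    x[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def schwarz_precond(r, subdomains):
    """Additive Schwarz: restrict r to each overlapping subdomain,
    solve locally, extend by zero, and sum the corrections."""
    z = [0.0] * len(r)
    for lo, hi in subdomains:
        for k, v in enumerate(solve_local(r[lo:hi])):
            z[lo + k] += v
    return z


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def pcg(b, subdomains, tol=1e-10, max_iter=200):
    """Preconditioned CG; returns the solution and the iteration count."""
    x = [0.0] * len(b)
    r = list(b)
    z = schwarz_precond(r, subdomains)
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, it
        Ap = apply_A(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        z = schwarz_precond(r, subdomains)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


n = 24
b = [1.0] * n
subdomains = [(0, 14), (10, 24)]       # index ranges overlapping on 10..13
x, iterations = pcg(b, subdomains)
```

In a parallel setting each subdomain solve would run on its own processor, so the preconditioner adds only local work; the global communication stays in the inner products and the matrix-vector product of the outer CG iteration.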

Summary
• OOP + MPI gives portable parallel software
• Beowulf clusters are well suited to solving PDEs
• Applicable to a wide range of PDEs
• Performance: satisfactory speed-up
• Some issues need to be considered for further improvement