A Software Framework for Easy Parallelization of PDE Solvers Hans Petter Langtangen Xing Cai Dept. of Informatics University of Oslo.

Presentation transcript:

A Software Framework for Easy Parallelization of PDE Solvers Hans Petter Langtangen Xing Cai Dept. of Informatics University of Oslo

Parallel CFD 2000 Outline of the Talk

The Question
Starting point: sequential PDE solvers. How to do the parallelization?
Resulting parallel solvers should have
• good parallel efficiency
• good overall numerical performance
We need
• a good parallelization strategy
• a good and simple implementation of the strategy

Problem Domain
• Partial differential equations
• Finite elements/differences
• Communication through message passing

A Known Problem
“The hope among early domain decomposition workers was that one could write a simple controlling program which would call the old PDE software directly to perform the subdomain solves. This turned out to be unrealistic because most PDE packages are too rigid and inflexible.”
- Smith, Bjørstad and Gropp
One remedy: use of object-oriented programming techniques

Domain Decomposition
• Solution of the original large problem through iteratively solving many smaller subproblems
• Can be used as solution method or preconditioner
• Flexibility: localized treatment of irregular geometries, singularities, etc.
• Very efficient numerical methods, even on sequential computers
• Suitable for coarse-grained parallelization

Overlapping DD
Alternating Schwarz method for two subdomains
Example: solving an elliptic boundary value problem in [formula not preserved in the transcript]
A sequence of approximations [formula not preserved] where [formula not preserved]
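For reference, the classical alternating Schwarz iteration for two overlapping subdomains is usually written as below. The notation ($L$, $f$, $g$, $\Omega_1$, $\Omega_2$, $\Gamma_1$, $\Gamma_2$) is standard textbook notation assumed here; it is not taken from the slide and may differ from the authors' symbols.

```latex
% Model problem (assumed notation)
L u = f \quad \text{in } \Omega = \Omega_1 \cup \Omega_2, \qquad
u = g \quad \text{on } \partial\Omega .

% Artificial (interior) boundaries of the two overlapping subdomains
\Gamma_i = \partial\Omega_i \setminus \partial\Omega, \qquad i = 1, 2 .

% Iteration n = 1, 2, \ldots, starting from an initial guess u_2^0
\begin{aligned}
L u_1^{n} &= f \ \text{in } \Omega_1, &
u_1^{n} &= g \ \text{on } \partial\Omega_1 \setminus \Gamma_1, &
u_1^{n} &= u_2^{n-1}\big|_{\Gamma_1} \ \text{on } \Gamma_1, \\
L u_2^{n} &= f \ \text{in } \Omega_2, &
u_2^{n} &= g \ \text{on } \partial\Omega_2 \setminus \Gamma_2, &
u_2^{n} &= u_1^{n}\big|_{\Gamma_2} \ \text{on } \Gamma_2 .
\end{aligned}
```

The second solve uses the freshly computed $u_1^{n}$, so the two subdomain solves are sequential in this form.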

Additive Schwarz Method
• Subproblems can be solved in parallel
• Subproblems are of the same form as the original large problem, with possibly different boundary conditions on artificial boundaries
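The idea that the subproblems are independent can be sketched on a small model problem. The code below is an illustration under stated assumptions, not the authors' framework: it solves the 1D problem -u'' = f with zero boundary values by finite differences on two overlapping index ranges, both subdomain solves read only the previous iterate (so they could run concurrently), and the overlap values are merged by a simple ownership split at the midpoint of the overlap. The names `solveLocal` and `schwarz` and the merging rule are hypothetical choices made for this sketch.

```cpp
#include <cassert>
#include <vector>

// Solve 2*u[i] - u[i-1] - u[i+1] = h*h*f[i] for global points lo..hi-1,
// with given Dirichlet values at points lo-1 (left) and hi (right).
// Uses the Thomas algorithm for the tridiagonal system.
std::vector<double> solveLocal(const std::vector<double>& f, int lo, int hi,
                               double h, double left, double right)
{
    const int m = hi - lo;
    assert(m > 0);
    std::vector<double> rhs(m), cp(m), u(m);
    for (int i = 0; i < m; ++i) rhs[i] = h * h * f[lo + i];
    rhs[0] += left;        // boundary value enters the first equation
    rhs[m - 1] += right;   // boundary value enters the last equation
    cp[0] = -0.5;
    rhs[0] *= 0.5;
    for (int i = 1; i < m; ++i) {            // forward elimination
        const double denom = 2.0 + cp[i - 1];
        cp[i] = -1.0 / denom;
        rhs[i] = (rhs[i] + rhs[i - 1]) / denom;
    }
    u[m - 1] = rhs[m - 1];
    for (int i = m - 2; i >= 0; --i)         // back substitution
        u[i] = rhs[i] - cp[i] * u[i + 1];
    return u;
}

// Jacobi-type Schwarz iteration on two overlapping subdomains:
// subdomain 1 owns points 0..b-1, subdomain 2 owns points a..n-1 (a < b).
std::vector<double> schwarz(const std::vector<double>& f, int a, int b,
                            int iterations)
{
    const int n = static_cast<int>(f.size());
    assert(0 < a && a < b && b < n);
    const double h = 1.0 / (n + 1);
    const int mid = (a + b) / 2;             // ownership split in the overlap
    std::vector<double> u(n, 0.0);
    for (int k = 0; k < iterations; ++k) {
        // Both solves read only the previous iterate u: independent work.
        const std::vector<double> u1 = solveLocal(f, 0, b, h, 0.0, u[b]);
        const std::vector<double> u2 = solveLocal(f, a, n, h, u[a - 1], 0.0);
        for (int i = 0; i < mid; ++i) u[i] = u1[i];
        for (int i = mid; i < n; ++i) u[i] = u2[i - a];
    }
    return u;
}
```

For example, `schwarz(std::vector<double>(39, 1.0), 15, 25, 60)` approximates the solution x(1-x)/2 on 39 interior grid points; a smaller overlap (a closer to b) would need more iterations.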

Convergence of the Solution
Single-phase groundwater flow

Coarse Grid Correction
• This DD algorithm is a kind of block Jacobi iteration
• Problem: often (very) slow convergence
• Remedy: coarse grid correction
• A kind of two-grid multigrid algorithm
• Coarse grid solve on each processor

Observations
• DD is a good parallelization strategy
• A program for the original global problem can be reused (modulo B.C.) for each subdomain
• Communication of overlapping point values is required
• The approach is not PDE-specific
• No need for global data
• Data distribution implied
• Explicit temporal schemes are a special case where no iteration is needed (“exact DD”)

Goals for the Implementation
• Reuse sequential solver as subdomain solver
• Add DD management and communication as separate modules
• Collect common operations in generic library modules
• Flexibility and portability
• Simplified parallelization process for the end-user

Generic Programming Framework

The Administrator
• Parameters: solution method or preconditioner, max iterations, stopping criterion, etc.
• DD algorithm: subdomain solve + coarse grid correction
• Operations: matrix-vector product, inner product, etc.

The Subdomain Simulator
• Subdomain Simulator: a generic representation
• C++ class hierarchy
• Interface of generic member functions
[Diagram labels: Subdomain Simulator, seq. solver, add-on, communication]

The Communicator
• Need functionality for exchanging point values inside the overlapping regions
• Build a generic communication module: the communicator
• Encapsulation of communication-related code. The concrete communication model is hidden. MPI in use, but easy to change
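One way to picture "hidden concrete communication model" is an abstract interface with interchangeable back ends. The sketch below is a hypothetical illustration, not the authors' communicator class: `Communicator`, `LocalCommunicator`, `Subdomain`, and `exchange` are invented names, and the in-process mailbox merely stands in for a real message-passing layer such as MPI.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Abstract interface: solver code sees only send/receive of point values.
class Communicator {
public:
    virtual ~Communicator() = default;
    virtual void send(int from, int to, const std::vector<double>& values) = 0;
    virtual std::vector<double> receive(int from, int to) = 0;
};

// In-process stand-in for a message-passing back end (assumption for the
// sketch); an MPI-based class could implement the same interface.
class LocalCommunicator : public Communicator {
public:
    void send(int from, int to, const std::vector<double>& values) override {
        mailbox_[std::make_pair(from, to)] = values;
    }
    std::vector<double> receive(int from, int to) override {
        auto it = mailbox_.find(std::make_pair(from, to));
        assert(it != mailbox_.end());   // a matching send must come first
        std::vector<double> values = it->second;
        mailbox_.erase(it);
        return values;
    }
private:
    std::map<std::pair<int, int>, std::vector<double>> mailbox_;
};

// A subdomain's local vector plus the local indices it sends to and
// receives from its neighbour inside the overlapping region.
struct Subdomain {
    int id;
    std::vector<double> u;
    std::vector<int> sendIdx;
    std::vector<int> recvIdx;
};

// Exchange overlap values between two neighbouring subdomains.
void exchange(Communicator& comm, Subdomain& a, Subdomain& b)
{
    std::vector<double> fromA, fromB;
    for (int i : a.sendIdx) fromA.push_back(a.u[i]);
    for (int i : b.sendIdx) fromB.push_back(b.u[i]);
    comm.send(a.id, b.id, fromA);
    comm.send(b.id, a.id, fromB);
    const std::vector<double> toA = comm.receive(b.id, a.id);
    const std::vector<double> toB = comm.receive(a.id, b.id);
    assert(toA.size() == a.recvIdx.size() && toB.size() == b.recvIdx.size());
    for (std::size_t k = 0; k < toA.size(); ++k) a.u[a.recvIdx[k]] = toA[k];
    for (std::size_t k = 0; k < toB.size(); ++k) b.u[b.recvIdx[k]] = toB[k];
}
```

Because `exchange` depends only on the abstract `Communicator`, swapping the back end would not touch the calling code, which is the design point the slide makes.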

Realization
• Object-oriented programming (C++, Java, Python)
• Use inheritance
  – Simplifies modularization
  – Supports reuse of sequential solver (without touching its source code!)

Generic Subdomain Simulators
• SubdomainSimulator
  – abstract interface to all subdomain simulators, as seen by the Administrator
• SubdomainFEMSolver
  – special case of SubdomainSimulator for finite element-based simulators
• These are generic classes, not restricted to specific application areas
[Class diagram: SubdomainSimulator, SubdomainFEMSolver]

Making the Simulator Parallel
class SimulatorP : public SubdomainFEMSolver,
                   public Simulator
{
  // … just a small amount of code
  virtual void createLocalMatrix ()
    { Simulator::makeSystem (); }
};
[Class diagram: SubdomainSimulator, SubdomainFEMSolver, Simulator, SimulatorP, Administrator]
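The inheritance pattern on this slide can be tried out with stand-in classes. In the sketch below only the class names and the `createLocalMatrix` / `makeSystem` pair come from the slide; the class bodies, the `log` member, and the `administratorSetup` function are hypothetical placeholders and do not reflect the real Diffpack classes.

```cpp
#include <cassert>   // used by the checks in the usage note
#include <string>
#include <vector>

// Stand-in for the existing sequential simulator (contents are assumed).
class Simulator {
public:
    virtual ~Simulator() = default;
    void makeSystem() { log.push_back("Simulator::makeSystem"); }
    std::vector<std::string> log;   // records calls, for illustration only
};

// Generic interface as seen by the administrator.
class SubdomainSimulator {
public:
    virtual ~SubdomainSimulator() = default;
    virtual void createLocalMatrix() = 0;
};

// Finite-element specialization (empty in this sketch).
class SubdomainFEMSolver : public SubdomainSimulator {};

// The parallel simulator combines both bases and forwards the generic
// call to the unchanged sequential code.
class SimulatorP : public SubdomainFEMSolver, public Simulator {
public:
    void createLocalMatrix() override { Simulator::makeSystem(); }
};

// Stand-in for the administrator: it knows only the generic interface.
void administratorSetup(SubdomainSimulator& subdomain)
{
    subdomain.createLocalMatrix();
}
```

Usage: after `SimulatorP s; administratorSetup(s);` the check `assert(s.log.size() == 1);` holds, showing that the generic call reached the sequential `makeSystem` without any change to `Simulator` itself.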

Performance
• Algorithmic efficiency
  – efficiency of original sequential simulator(s)
  – efficiency of domain decomposition method
• Parallel efficiency
  – communication overhead (low)
  – coarse grid correction overhead (normally low)
  – load balancing
    – subproblem size
    – work on subdomain solves
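The slide does not define its efficiency measures. The usual definitions, given here as an assumption about what is meant, use the wall-clock time $T(P)$ on $P$ processors:

```latex
S(P) = \frac{T(1)}{T(P)} \quad \text{(speed-up)}, \qquad
\eta(P) = \frac{S(P)}{P} = \frac{T(1)}{P \, T(P)} \quad \text{(parallel efficiency)} .
```

With these definitions, communication overhead, coarse grid work, and load imbalance all show up as $\eta(P) < 1$.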

Summary So Far
• A generic approach
• Works if the DD algorithm works for the problem at hand
• Implementation in terms of class hierarchies
• The new parallel-specific code, SimulatorP, is very small and simple to write

Application
• Single-phase groundwater flow
• DD as the global solution method
• Subdomain solvers use CG+FFT
• Fixed number of subdomains M = 32 (independent of P)
• Straightforward parallelization of an existing simulator
P: number of processors

Two-phase Porous Media Flow
PEQ: SEQ:
• DD as preconditioner for global BiCGStab solving pressure eq.
• Multigrid V-cycle in subdomain solves

Two-Phase Porous Media Flow
Simulation result obtained on 16 processors

Two-phase Porous Media Flow
History of saturation for water and oil

Nonlinear Water Waves
• Fully nonlinear 3D water waves
• Primary unknowns:
• Parallelization based on an existing sequential Diffpack simulator

Nonlinear Water Waves
• DD as preconditioner for global CG solving Laplace eq.
• Multigrid V-cycle as subdomain solver
• Fixed number of subdomains M = 16 (independent of P)
• Subgrids from partition of a global 41x41x41 grid

Nonlinear Water Waves
3D Poisson equation in water wave simulation

Application
• Test case: 2D linear elasticity, 241 x 241 global grid
• Vector equation
• Straightforward parallelization based on an existing Diffpack simulator

2D Linear Elasticity

2D Linear Elasticity
• DD as preconditioner for a global BiCGStab method
• Multigrid V-cycle in subdomain solves
• I: number of global BiCGStab iterations needed
• P: number of processors (P = #subdomains)

Diffpack
• O-O software environment for scientific computation
• Rich collection of PDE solution components: portable, flexible, extensible
• H.P. Langtangen, Computational Partial Differential Equations, Springer 1999

Straightforward Parallelization
• Develop a sequential simulator, without paying attention to parallelism
• Follow the Diffpack coding standards
• Use add-on libraries for parallelization-specific functionalities
• Add a few new statements for transformation to a parallel simulator

Linear-algebra-level Approach
• Parallelize matrix/vector operations
  – inner-product of two vectors
  – matrix-vector product
  – preconditioning: block contribution from subgrids
• Easy to use
  – access to all existing Diffpack iterative methods, preconditioners and convergence monitors
  – “hidden” parallelization
  – need only to add a few lines of new code
  – arbitrary choice of number of procs at run-time
  – less flexibility than DD
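As an illustration of the first item, a distributed inner product can be formed from per-subgrid partial sums followed by a global reduction. The sketch below is an assumption about how such an operation might look, not Diffpack code: `LocalVector`, `localInnerProd`, and `innerProd` are hypothetical names, the `owned` flags are one possible way to avoid counting shared points twice, and the plain loop over subgrids stands in for a real reduction such as `MPI_Allreduce`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Values stored on one subgrid. Points shared with a neighbour appear on
// both subgrids, but exactly one of them marks the point as owned.
struct LocalVector {
    std::vector<double> values;
    std::vector<bool> owned;
};

// Partial inner product over the points this subgrid owns.
double localInnerProd(const LocalVector& x, const LocalVector& y)
{
    assert(x.values.size() == y.values.size());
    assert(x.values.size() == x.owned.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.values.size(); ++i)
        if (x.owned[i]) sum += x.values[i] * y.values[i];
    return sum;
}

// Global inner product: sum of the partial results. In a message-passing
// code each process would hold one LocalVector and the loop below would be
// replaced by a global sum reduction.
double innerProd(const std::vector<LocalVector>& xs,
                 const std::vector<LocalVector>& ys)
{
    assert(xs.size() == ys.size());
    double sum = 0.0;
    for (std::size_t p = 0; p < xs.size(); ++p)
        sum += localInnerProd(xs[p], ys[p]);
    return sum;
}
```

Since every global point is owned by exactly one subgrid, the result equals the sequential inner product, which is what lets existing iterative solvers run unchanged on top of it.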

New Library Tool
• class GridPartAdm
  – Generate overlapping or non-overlapping subgrids
  – Prepare communication patterns
  – Update global values
  – matvec, innerProd, norm

Mesh Partition Example

A Simple Coding Example
Handle(GridPartAdm) adm;   // access to parallelization functionalities
Handle(LinEqAdm) lineq;    // administrator for linear system & solver
// ...
#ifdef PARALLEL_CODE
  adm->scan (menu);
  adm->prepareSubgrids ();
  adm->prepareCommunication ();
  lineq->attachCommAdm (*adm);
#endif
// ...
lineq->solve ();

set subdomain list = DEFAULT
set global grid = grid1.file
set partition-algorithm = METIS
set number of overlaps = 0

Single-phase Groundwater Flow
• Highly unstructured grid
• Discontinuity in the coefficient K

Measurements
• 130,561 degrees of freedom
• Overlapping subgrids
• Global BiCGStab using (block) ILU prec.

A Fast FEM N-S Solver
• Operator splitting in the tradition of pressure correction, velocity correction, Helmholtz decomposition
• This version is due to Ren & Utnes

A Fast FEM N-S Solver
• Calculation of an intermediate velocity

A Fast FEM N-S Solver
• Solution of a Poisson equation
• Correction of the intermediate velocity

Test Case: Vortex-Shedding

Simulation Snapshots: Pressure

Simulation Snapshots: Pressure

Animated Pressure Field

Simulation Snapshots: Velocity

Simulation Snapshots: Velocity

Animated Velocity Field

Some CPU-Measurements
The pressure equation is solved by the CG method

Summary
• Goal: provide software and programming rules for easy parallelization of sequential simulators
• Two parallelization strategies:
  – domain decomposition: very flexible, compact visible code/algorithm
  – parallelization at the linear algebra level: “automatic” hidden parallelization
• Performance: satisfactory speed-up

Future Application
DD with different PDEs and local solvers
– Out in deep sea: Eulerian, finite differences, Boussinesq PDEs, F77 code
– Near shore: Lagrangian, finite element, shallow water PDEs, C++ code