Optimized Parallel Approach for 3D Modelling of Forest Fire Behaviour G. Accary, O. Bessonov, D. Fougère, S. Meradji, D. Morvan Institute for Problems.

Presentation transcript:

Optimized Parallel Approach for 3D Modelling of Forest Fire Behaviour
G. Accary, O. Bessonov, D. Fougère, S. Meradji, D. Morvan
Institute for Problems in Mechanics, Moscow, Russia
Université de la Méditerranée, Marseille, France
Université Saint-Esprit de Kaslik, Jounieh, Lebanon
Parallel Computing Technologies -- PaCT-2007

PaCT-2007: Optimized Parallel Approach for 3D Modelling
Introduction
In this work we present methods for parallelizing the 3D CFD forest fire modelling code FIRESTAR 3D on NUMA computers within the OpenMP environment.
- Numerical model and method
- Why parallelize?
- Computer system selected for this development
- Parallelization models
- Specifics of OpenMP on NUMA computers
- How to parallelize for OpenMP on NUMA?
- Example of OpenMP parallelization, geometric parallelism
- Current approach to parallelizing FIRESTAR 3D
- Parallelization results for the benchmark problems
- Parallelization of radiative transfer (input data parallelism)
- Conclusion

Numerical model and method
- Fully physical 3D model of forest fire behaviour
- Complex unsteady flow in a 3D rectangular domain
- Solid phases (vegetation) and gas mixture
- Decomposition mechanisms: drying, pyrolysis, combustion
- Transfer: convection, diffusion, radiation, turbulence
- Navier-Stokes equations in the Boussinesq approximation
- Finite Volume discretization, non-uniform staggered grid
- Fully implicit segregated SIMPLER-style solution method
- Linear solvers: BiCGStab (nonsymmetric), CG (symmetric)
- Explicit-class preconditioners for the linear solvers

Why parallelize?
3D vs. 2D:
- much bigger grid (Nx*Ny*Nz grid points vs. Nx*Ny);
- more complicated discretizations;
- additional grid compression in problematic areas.
As a result, the total computational complexity increases by at least 2 orders of magnitude.
Goal: to accelerate the code by a factor of at least 10 and, together with other optimizations, to reach the speed of 2D simulations.

Computer system selected for this development
SGI Altix 350 shared-memory system:
- 20 processors Itanium GHz 4M
- NUMA organization of the system (Non-uniform Memory Architecture): 10 bi-processor modules (SMP nodes), each with its own local memory, interconnected by a very fast interface
Current configuration:
- 8 nodes (16 CPUs) connected to the NUMA switch: "batch" domain for intensive computations;
- 2 nodes (4 CPUs): "interactive" domain for development and debugging.

Parallelization models
Two principal models of parallelization:
- Message passing (MPI):
  - more universal;
  - can be applied to distributed-memory systems (clusters) as well as to shared-memory computers;
  - complicated to program; requires total reorganization of the code and (often) revision of the algorithms.
- Shared memory (OpenMP):
  - looks like an extension of the Fortran and C programming languages;
  - comment-like directives (ignored if compiled without the "-openmp" switch);
  - simple to program; makes it easy to parallelize many algorithms.

    !$OMP DO
    do K=1,Nz
      do J=1,Ny
        do I=1,Nx
          {processing}
        enddo
      enddo
    enddo
    !$OMP END DO

Specifics of OpenMP on NUMA computers
- Access to memory within the same node (local memory) is fast; access to memory in another node (remote memory) is much slower.
  ==> The distribution of the main data arrays among local memories must correspond to the distribution of computational work between processors.
- This is not supported explicitly by OpenMP.
  ==> Special initialization is required (e.g. assignment in a parallel loop).
- Processes must be bound to CPUs (affinity) in order to avoid migration between processors (e.g. with the "dplace" utility).
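The "assignment in a parallel loop" idea can be sketched as follows. This is a minimal illustration, not FIRESTAR 3D code: the array names are borrowed from the slides, while the grid size, the value of `beta` and the initial values are made up. It assumes the operating system places each memory page on the node of the thread that first writes it (the first-touch policy, common on Linux NUMA systems) and that threads are pinned to CPUs.

```fortran
program first_touch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nx = 60, ny = 60, nz = 60   ! illustrative grid size
  real(dp), allocatable :: wo2(:,:,:), wo3(:,:,:)
  real(dp) :: beta
  integer :: i, j, k

  allocate(wo2(nx,ny,nz), wo3(nx,ny,nz))   ! no pages are touched yet
  beta = 0.5_dp                            ! illustrative value

  !$omp parallel private(i, j, k)

  ! Initialization: each thread writes the K-planes it will compute on later,
  ! so under first-touch placement those pages land in its local memory.
  !$omp do schedule(static)
  do k = 1, nz
    do j = 1, ny
      do i = 1, nx
        wo2(i,j,k) = 1.0_dp
        wo3(i,j,k) = 2.0_dp
      end do
    end do
  end do
  !$omp end do

  ! Computation: same loop bounds and same static schedule as above,
  ! so each thread works on the planes it initialized.
  !$omp do schedule(static)
  do k = 1, nz
    do j = 1, ny
      do i = 1, nx
        wo3(i,j,k) = wo2(i,j,k) + beta*wo3(i,j,k)
      end do
    end do
  end do
  !$omp end do

  !$omp end parallel

  print *, 'wo3(1,1,1) =', wo3(1,1,1)   ! 1.0 + 0.5*2.0 = 2.0
end program first_touch
```

Compiled without an OpenMP switch, the directives are treated as comments and the program runs serially with the same result. Initializing the arrays with a whole-array assignment outside the parallel region would instead have a single thread touch every page.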

How to parallelize for OpenMP on NUMA?
- Usually, geometric parallelism is applied: data elements are split along some dimension.
- FIRESTAR 3D: most computations are in the CG solvers and in the calculation of turbulent quantities => easily and naturally parallelizable in OpenMP.
- Algorithms with recursive dependences (3-diagonal solvers, line Jacobi/GS preconditioners) are more difficult and not naturally parallelizable (in development).
- Restriction of OpenMP/NUMA: parallelization in only one spatial direction (about 16 CPUs is the limit).
- Input data parallelism (or event parallelism) is used for the radiative transport equation (split by angles).
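To illustrate what a recursive dependence looks like, here is a hedged sketch of a 3-diagonal line solver (Thomas algorithm). It is not taken from FIRESTAR 3D: the constant coefficients, the grid size and the choice of lines running along I are assumptions made for the example. Within one line, every step needs the result of the previous one, so the I loop cannot be split. Different lines are independent, so in this particular orientation the outermost K loop can still be shared between threads. A line running along K, the direction that is split, would not have that escape.

```fortran
program line_solver
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nx = 32, ny = 8, nz = 16        ! illustrative sizes
  ! Constant, diagonally dominant model coefficients (assumption):
  ! a*x(i-1) + b*x(i) + c*x(i+1) = rhs(i)
  real(dp), parameter :: a = -1.0_dp, b = 4.0_dp, c = -1.0_dp
  real(dp) :: x(nx,ny,nz), rhs(nx,ny,nz)
  real(dp) :: cw(nx), dw(nx), denom, max_err
  integer :: i, j, k

  ! Right-hand side chosen so that the exact solution is x = 1 everywhere.
  rhs = a + b + c
  rhs(1,:,:)  = b + c
  rhs(nx,:,:) = a + b

  ! Work arrays cw and dw are private: each thread needs its own copy.
  !$omp parallel do schedule(static) private(i, j, cw, dw, denom)
  do k = 1, nz
    do j = 1, ny
      ! Forward elimination: step i depends on step i-1 (recursive).
      cw(1) = c / b
      dw(1) = rhs(1,j,k) / b
      do i = 2, nx
        denom = b - a*cw(i-1)
        cw(i) = c / denom
        dw(i) = (rhs(i,j,k) - a*dw(i-1)) / denom
      end do
      ! Back substitution: step i depends on step i+1 (recursive).
      x(nx,j,k) = dw(nx)
      do i = nx-1, 1, -1
        x(i,j,k) = dw(i) - cw(i)*x(i+1,j,k)
      end do
    end do
  end do
  !$omp end parallel do

  max_err = maxval(abs(x - 1.0_dp))
  print *, 'max error =', max_err
end program line_solver
```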

Example of OpenMP parallelization (geometric parallelism)

    !$OMP DO
    do K=1,Nz
      do J=1,Ny
        do I=1,Nx
          Wo3(I,J,K)=Wo2(I,J,K)+beta*Wo3(I,J,K)
        enddo
      enddo
    enddo
    !$OMP END DO

- Every processor computes its own part of the outermost DO loop (do K=1,Nz).
- Iterations of this loop are split evenly between all CPUs.
- Portions of the 3D data arrays must be distributed between local memories accordingly.
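The update above is a pure element-wise operation. CG-type solvers also need inner products, where every thread contributes to a single scalar. In OpenMP this is usually written with a REDUCTION clause. The sketch below is illustrative only: the array names `r` and `z`, their values and the grid size are assumptions, not names from FIRESTAR 3D.

```fortran
program dot_reduction
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nx = 20, ny = 20, nz = 20   ! illustrative grid size
  real(dp) :: r(nx,ny,nz), z(nx,ny,nz), rz
  integer :: i, j, k

  r = 2.0_dp
  z = 0.5_dp
  rz = 0.0_dp

  ! Each thread accumulates a private partial sum over its K-planes;
  ! the partial sums are combined into rz at the end of the loop.
  !$omp parallel do schedule(static) private(i, j) reduction(+:rz)
  do k = 1, nz
    do j = 1, ny
      do i = 1, nx
        rz = rz + r(i,j,k)*z(i,j,k)
      end do
    end do
  end do
  !$omp end parallel do

  print *, 'rz =', rz, ' expected', real(nx*ny*nz, dp)
end program dot_reduction
```

The loop is split over K in the same way as the update loop, so each thread reads the planes that are assumed to sit in its local memory.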

Current approach to parallelizing FIRESTAR 3D
Selection and OpenMP parallelization of the main time-consuming routines:
1) iterative CG solvers and calculation of turbulent quantities: ~80% of CPU time (in serial execution);
2) routines for the transport equations (velocity, temperature) and pressure correction: ~20% (in serial execution);
3) initialization: just assignment in a parallel DO loop that corresponds to the computational parallel DO loops;
4) some serial optimizations and transformations of the code (in order to avoid dependencies and side effects between threads).

Initialization loop (item 3):

    !$OMP DO
    do K=0,Nz+1
      do J=0,Ny+1
        do I=0,Nx+1
          Wo2(I,J,K)=0.
          Wo3(I,J,K)=0.
        enddo
      enddo
    enddo
    !$OMP END DO

Parallelization results for the benchmark problem 60x60x60
Speed-up is good (problem size 170 MB).
- 2 processors: limited by the throughput of the local memory (which is shared by 2 CPUs).
- 4, 8 processors: superlinear speed-up (owing to the large 4 Mbyte L3 cache in every CPU).
- 16 processors: negative effects (grid size not divisible by 16, i.e. load imbalance; the problem is too small, i.e. boundaries have a large influence).
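The load-imbalance remark can be made concrete with a small calculation. The sketch below assumes that the Nz = 60 planes of the outermost loop are split into near-equal contiguous blocks, so the busiest processor gets ceiling(Nz/P) planes. The exact split depends on the OpenMP implementation, and the slides' loops also include boundary planes. The resulting bound on the speed-up is Nz divided by the largest block.

```fortran
program load_balance
  implicit none
  integer, parameter :: nz = 60                        ! planes in the split direction
  integer, parameter :: procs(4) = (/ 2, 4, 8, 16 /)   ! processor counts from the slide
  integer :: n, p, largest
  real :: bound(4)

  print '(a)', ' procs   largest   bound   efficiency(%)'
  do n = 1, size(procs)
    p = procs(n)
    largest = (nz + p - 1) / p             ! ceiling(nz/p): planes on the busiest CPU
    bound(n) = real(nz) / real(largest)    ! best possible speed-up from this split
    print '(i6,i10,f8.2,f12.1)', p, largest, bound(n), 100.0*bound(n)/real(p)
  end do
end program load_balance
```

For 16 processors the busiest CPU gets 4 planes out of 60, so the speed-up from plane distribution alone cannot exceed 15. The bound accounts only for the distribution of K-planes. It ignores the cache and memory-bandwidth effects that the slide identifies for 2, 4 and 8 processors.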

Parallelization results: "airflow canopy" problem 96x96x81
Speed-up is reasonable (problem size 1 GB).
- 2 processors: limited by the throughput of the local memory (which is shared by 2 CPUs).
- 4, 8 processors: no superlinear speed-up (the problem is bigger).
- 16 processors: negative effects (load imbalance etc.) are partly compensated by the positive effect of the large L3 cache.

Parallelization of radiative transfer (input data parallelism)
(This work was done in collaboration with the INRA-URFM-PIF team.)
- The full sphere is split into parts (sectors) corresponding to the number of processors.
- The equations are integrated independently in each sector (for the full domain), i.e. each processor computes its own set of input data.
- Afterwards, data from each sector are distributed to subdomains for further processing with geometric parallelism.
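The two stages can be sketched as follows. This is a toy illustration under stated assumptions, not the radiation solver of FIRESTAR 3D: the number of directions `ndir`, the equal angular weights, the boundary value and the one-line "sweep" with a constant attenuation factor `att` are all placeholders. Stage 1 shares the directions between threads, each covering the full domain. Stage 2 shares the K-planes between threads and sums over all directions.

```fortran
program angular_split
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nx = 16, ny = 8, nz = 8   ! illustrative grid size
  integer, parameter :: ndir = 16                 ! number of directions (assumption)
  real(dp), parameter :: pi = 3.14159265358979323846_dp
  real(dp), parameter :: att = 0.9_dp             ! placeholder attenuation per cell
  real(dp) :: inten(0:nx,ny,nz,ndir), g(nx,ny,nz), w
  integer :: i, j, k, m

  w = 4.0_dp*pi / real(ndir, dp)   ! equal solid-angle weights (assumption)

  !$omp parallel private(i, j, k, m)

  ! Stage 1 (input data parallelism): directions are split between threads;
  ! each thread integrates its directions over the FULL domain.
  !$omp do schedule(static)
  do m = 1, ndir
    do k = 1, nz
      do j = 1, ny
        inten(0,j,k,m) = 1.0_dp                   ! placeholder boundary value
        do i = 1, nx
          inten(i,j,k,m) = att*inten(i-1,j,k,m)   ! placeholder sweep along I
        end do
      end do
    end do
  end do
  !$omp end do
  ! The implicit barrier above guarantees all directions are complete.

  ! Stage 2 (geometric parallelism): K-planes are split between threads;
  ! each thread combines the contributions of all directions.
  !$omp do schedule(static)
  do k = 1, nz
    do j = 1, ny
      do i = 1, nx
        g(i,j,k) = w*sum(inten(i,j,k,:))
      end do
    end do
  end do
  !$omp end do

  !$omp end parallel

  print *, 'g(1,1,1) =', g(1,1,1), ' expected', 4.0_dp*pi*att
end program angular_split
```

In stage 2 each thread reads data written by other threads in stage 1. On a NUMA machine this is the redistribution step mentioned in the last bullet.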

Conclusion
In this work we developed:
- a strategy of OpenMP parallelization for NUMA computers;
- a parallelization method for a 3D CFD fire modelling code.
This new method achieves good parallelization efficiency for a moderate number of processors (up to 16).
Further work: acceleration of the algebraic solvers; development and parallelization of implicit-class preconditioners.

Acknowledgements
This work was supported by the European integrated fire management project (Fire Paradox) and by the Russian Foundation for Basic Research (project # ).

PaCT-2007, September 2007, Pereslavl-Zalessky, Russia