1
Optimized Parallel Approach for 3D Modelling of Forest Fire Behaviour
G. Accary, O. Bessonov, D. Fougère, S. Meradji, D. Morvan
Institute for Problems in Mechanics, Moscow, Russia
Université de la Méditerranée, Marseille, France
Université Saint-Esprit de Kaslik, Jounieh, Lebanon
Parallel Computing Technologies -- PaCT-2007
2
PaCT-2007: Optimized Parallel Approach for 3D Modelling
Introduction
In this work we present methods for parallelizing the 3D CFD forest fire modelling code FIRESTAR 3D on NuMA computers within the OpenMP environment.
Outline:
- Numerical model and method
- Why parallelize?
- Computer system selected for this development
- Parallelization models
- Specifics of OpenMP on NuMA computers
- How to parallelize for OpenMP on NuMA?
- Example of OpenMP parallelization (geometric parallelism)
- Current approach to parallelize FIRESTAR 3D
- Parallelization results for the benchmark problems
- Parallelization of radiative transfer (input data parallelism)
- Conclusion
3
Numerical model and method
- Full-physical 3D model of forest fire behaviour
- Complex unsteady flow in a 3D rectangular domain
- Solid phases (vegetation) and gas mixture
- Decomposition mechanisms: drying, pyrolysis, combustion
- Transfer: convection, diffusion, radiation, turbulence
- Navier-Stokes equations in the Boussinesq approximation
- Finite volume discretization on a non-uniform staggered grid
- Fully implicit segregated SIMPLER-style solution method
- Linear solvers: BiCGStab (nonsymmetric), CG (symmetric)
- Explicit-class preconditioners for the linear solvers
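To make the solver line concrete, here is a minimal C sketch of a preconditioned conjugate gradient (CG) iteration whose vector loops carry OpenMP directives. It is not FIRESTAR 3D code: the matrix is assumed to be the 1D Poisson matrix tridiag(-1, 2, -1), applied matrix-free, and plain Jacobi (diagonal) scaling stands in for an explicit-class preconditioner, since the slides do not say which ones are used. The names apply_A, dot and pcg_poisson1d are hypothetical.

```c
#include <stdlib.h>

/* y = A x for the (assumed) 1D Poisson matrix A = tridiag(-1, 2, -1), matrix-free. */
static void apply_A(int n, const double *x, double *y) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        double left  = (i > 0)     ? x[i - 1] : 0.0;
        double right = (i < n - 1) ? x[i + 1] : 0.0;
        y[i] = 2.0 * x[i] - left - right;
    }
}

/* Dot product; the reduction clause gives each thread a private partial sum. */
static double dot(int n, const double *a, const double *b) {
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (int i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

/* Jacobi-preconditioned CG for A x = b, starting from the guess in x.
   Stops when ||r||^2 <= tol^2 ||b||^2. Returns the iteration count,
   or -1 if max_iter is reached first or allocation fails. */
int pcg_poisson1d(int n, const double *b, double *x, double tol, int max_iter) {
    const double diag = 2.0;                 /* diagonal of A: M = diag(A) */
    double *work = malloc(4 * (size_t)n * sizeof *work);
    if (work == NULL) return -1;
    double *r = work, *z = work + n, *p = work + 2 * n, *q = work + 3 * n;

    apply_A(n, x, q);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        r[i] = b[i] - q[i];                  /* initial residual */
        z[i] = r[i] / diag;                  /* z = M^{-1} r */
        p[i] = z[i];
    }
    double rz = dot(n, r, z);
    double bb = dot(n, b, b);
    if (bb == 0.0) bb = 1.0;                 /* b = 0: criterion becomes absolute */

    int result = -1;
    for (int it = 0; it <= max_iter; it++) {
        if (dot(n, r, r) <= tol * tol * bb) { result = it; break; }
        if (it == max_iter) break;
        apply_A(n, p, q);
        double alpha = rz / dot(n, p, q);
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            x[i] += alpha * p[i];
            r[i] -= alpha * q[i];
            z[i] = r[i] / diag;
        }
        double rz_new = dot(n, r, z);
        double beta = rz_new / rz;
        rz = rz_new;
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            p[i] = z[i] + beta * p[i];
    }
    free(work);
    return result;
}
```

For this constant-diagonal matrix, Jacobi scaling is a uniform rescaling, so the iteration behaves like plain CG. The point is only that every step (matrix-vector product, dot products, vector updates) is a loop over grid points that splits naturally between threads.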
4
Why parallelize?
3D vs. 2D:
-- much bigger grid (Nx*Ny*Nz grid points vs. Nx*Ny);
-- more complicated discretizations;
-- additional grid refinement (compression) in problematic areas.
As a result, the total computational complexity increases by at least 2 orders of magnitude.
Goal: a speed-up of at least 10 times, so that (together with other optimizations) 3D simulations reach the speed of 2D simulations.
5
Computer system selected for this development
SGI Altix 350 shared-memory system:
- 20 Itanium 2 processors (1.5 GHz, 4 MB L3 cache);
- NuMA organization of the system (Non-uniform Memory Architecture): 10 dual-processor modules (SMP nodes), each with its own local memory, interconnected by a very fast interface.
Current configuration:
- 8 nodes (16 CPUs) connected to the NuMA switch: the "batch" domain for intensive computations;
- 2 nodes (4 CPUs): the "interactive" domain for development and debugging.
6
Parallelization models
There are 2 principal models of parallelization.
Message passing (MPI):
- more universal;
- can be applied to distributed-memory systems (clusters) as well as to shared-memory computers;
- complicated to program; requires a complete reorganization of the code and (often) revision of the algorithms.
Shared memory (OpenMP):
- looks like an extension of the Fortran and C programming languages;
- comment-like directives (ignored if the code is compiled without the "-openmp" switch);
- simple to program; many algorithms can be parallelized easily.

      ! executed inside an enclosing !$OMP PARALLEL region
      !$OMP DO
      do K=1,Nz
        do J=1,Ny
          do I=1,Nx
            {processing}
          enddo
        enddo
      enddo
      !$OMP END DO
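A C counterpart of the loop nest above may help show what "comment-like directives" means in practice. In this minimal sketch (scale_field and openmp_enabled are hypothetical names), the #pragma omp line is ignored by a compiler that is not asked to enable OpenMP, so the same source simply runs serially; _OPENMP is the macro that OpenMP-enabled compilers predefine. The array is assumed to be stored with I varying fastest, matching Fortran's column-major layout of A(I,J,K).

```c
#include <stddef.h>

/* Returns 1 if this file was compiled with OpenMP enabled, 0 otherwise. */
int openmp_enabled(void) {
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}

/* Multiply every element of an nx*ny*nz field by factor.
   The outermost (k) loop is split between threads when OpenMP is on;
   otherwise the pragma is ignored and the nest runs serially. */
void scale_field(int nx, int ny, int nz, double *a, double factor) {
    #pragma omp parallel for
    for (int k = 0; k < nz; k++)
        for (int j = 0; j < ny; j++)
            for (int i = 0; i < nx; i++)
                a[((size_t)k * ny + j) * nx + i] *= factor;  /* i fastest */
}
```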
7
Specifics of OpenMP on NuMA computers
Access to memory within a node (local memory) is fast; access to memory in another node (remote memory) is much slower.
==> The distribution of the main data arrays among local memories must correspond to the distribution of computational work between processors!
This is not supported explicitly by OpenMP.
==> Special initialization is required (e.g. assignment in a parallel loop: under the usual first-touch policy, a memory page is placed in the node of the thread that first writes to it).
Processes must be bound to CPUs to avoid migration between processors (e.g. with the "dplace" utility).
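The special initialization can be sketched in C as follows, assuming the operating system places each page in the node of the thread that first writes it (first touch, the usual default on Linux NuMA systems). The names alloc_field and update_wo3 are hypothetical; update_wo3 mirrors the Wo3 update used as an example in these slides. Both loops use schedule(static) over K so that each thread initializes the K-slab it later computes. OpenMP guarantees an identical iteration-to-thread mapping only for loops with the same iteration count that bind to the same parallel region, so across separate regions this sketch relies on common implementation behaviour.

```c
#include <stdlib.h>
#include <stddef.h>

/* Allocate an nx*ny*nz field and zero it in parallel: with first-touch
   placement, the pages of each K-slab end up in the local memory of the
   thread that initializes (and later computes) that slab. */
double *alloc_field(int nx, int ny, int nz) {
    size_t plane = (size_t)nx * ny;
    double *a = malloc(plane * (size_t)nz * sizeof *a);
    if (a == NULL) return NULL;
    #pragma omp parallel for schedule(static)
    for (int k = 0; k < nz; k++)
        for (size_t ij = 0; ij < plane; ij++)
            a[(size_t)k * plane + ij] = 0.0;
    return a;
}

/* Wo3 = Wo2 + beta * Wo3 with the same static split over K as alloc_field. */
void update_wo3(int nx, int ny, int nz, const double *wo2, double *wo3,
                double beta) {
    size_t plane = (size_t)nx * ny;
    #pragma omp parallel for schedule(static)
    for (int k = 0; k < nz; k++)
        for (size_t ij = 0; ij < plane; ij++) {
            size_t m = (size_t)k * plane + ij;
            wo3[m] = wo2[m] + beta * wo3[m];
        }
}
```

By contrast, zeroing the arrays with a serial loop or memset right after malloc would touch every page from the master thread and place the whole array in that thread's node.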
8
How to parallelize for OpenMP on NuMA?
Usually, geometric parallelism is applied: data elements are split along some dimension.
FIRESTAR 3D: most computations are in the CG solvers and in the calculation of turbulent quantities => easily and naturally parallelizable in OpenMP.
Algorithms with recursive dependences (tridiagonal solvers, line Jacobi/Gauss-Seidel preconditioners) are more difficult and do not parallelize naturally (in development).
Restriction of OpenMP/NuMA: parallelization in only one spatial direction (~16 CPUs is the limit).
Input data parallelism (or event parallelism) is used for the radiative transport equation (split by angles).
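To illustrate the recursive dependence, here is a minimal C sketch of the Thomas algorithm (Gaussian elimination for a tridiagonal system). Within one grid line, each forward step needs the result of the previous one, so that loop cannot be split between threads; independent lines can still be solved concurrently. For simplicity the sketch assumes constant coefficients shared by all lines (so the elimination factors cp are computed once), contiguous storage of each line, and a system that needs no pivoting (e.g. diagonally dominant); solve_lines is a hypothetical name. In a decomposition split along K, lines running in the K direction would cross thread boundaries, which is presumably where the difficulty lies.

```c
#include <stdlib.h>
#include <stddef.h>

/* Solve nlines independent tridiagonal systems
   lower*x[i-1] + diag*x[i] + upper*x[i+1] = d[i],  i = 0..n-1,
   where line l uses d[l*n .. l*n+n-1] and writes x[l*n .. l*n+n-1].
   Returns 1 on success, 0 if allocation fails. */
int solve_lines(int n, int nlines, double lower, double diag, double upper,
                const double *d, double *x) {
    double *cp = malloc((size_t)n * sizeof *cp);
    if (cp == NULL) return 0;

    /* Elimination factors depend only on the coefficients: compute once. */
    cp[0] = upper / diag;
    for (int i = 1; i < n; i++)
        cp[i] = upper / (diag - lower * cp[i - 1]);

    /* Lines are independent, so the outer loop is shared between threads;
       the two inner loops are recurrences and stay serial. */
    #pragma omp parallel for
    for (int l = 0; l < nlines; l++) {
        const double *dl = d + (size_t)l * n;
        double *xl = x + (size_t)l * n;
        xl[0] = dl[0] / diag;                       /* forward sweep */
        for (int i = 1; i < n; i++)
            xl[i] = (dl[i] - lower * xl[i - 1]) / (diag - lower * cp[i - 1]);
        for (int i = n - 2; i >= 0; i--)            /* back substitution */
            xl[i] -= cp[i] * xl[i + 1];
    }
    free(cp);
    return 1;
}
```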
9
Example of OpenMP parallelization (geometric parallelism)

      ! executed inside an enclosing !$OMP PARALLEL region
      !$OMP DO
      do K=1,Nz
        do J=1,Ny
          do I=1,Nx
            Wo3(I,J,K)=Wo2(I,J,K)+ &
                       beta*Wo3(I,J,K)
          enddo
        enddo
      enddo
      !$OMP END DO

Every processor computes its own part of the outermost DO loop (do K=1,Nz): the iterations of this loop are split evenly between all CPUs. The portions of the 3D data arrays must be distributed between the local memories accordingly.
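The even split of the K iterations can be sketched as a contiguous block partition. This is one common convention (the first Nz mod P threads get one extra iteration); for schedule(static) without a chunk size, OpenMP only asks for approximately equal chunks and leaves the exact assignment to the implementation. block_range is a hypothetical name, and K is counted from 0 here.

```c
/* Contiguous block partition of iterations 0..n-1 among p threads:
   thread t (0 <= t < p) gets the half-open range [*lo, *hi).
   The first n % p threads receive one extra iteration. */
void block_range(int n, int p, int t, int *lo, int *hi) {
    int base = n / p;
    int extra = n % p;
    *lo = t * base + (t < extra ? t : extra);
    *hi = *lo + base + (t < extra ? 1 : 0);
}
```

For example, with n = 60 and p = 16, thread 0 gets [0, 4) and thread 15 gets [57, 60): twelve threads receive 4 planes and four receive 3.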
10
Current approach to parallelize FIRESTAR 3D
Selection and OpenMP parallelization of the main time-consuming routines:
1) iterative CG solvers and calculation of turbulent quantities: ~80% of CPU time (in serial execution);
2) routines for transport equations (velocity, temperature) and pressure correction: ~20% (in serial execution);
3) initialization: just assignment in a parallel DO loop that corresponds to the computational parallel DO loops;
4) some serial optimizations and transformations of the code (to avoid dependencies and side effects between threads).

      ! executed inside an enclosing !$OMP PARALLEL region
      !$OMP DO
      do K=0,Nz+1
        do J=0,Ny+1
          do I=0,Nx+1
            Wo2(I,J,K)=0.
            Wo3(I,J,K)=0.
          enddo
        enddo
      enddo
      !$OMP END DO
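Item 4 can be illustrated with a typical transformation (a hypothetical example, not taken from FIRESTAR 3D): a running counter used as an array index makes every iteration depend on the previous one, so the K loop cannot be shared between threads as written. Computing the index directly from (I, J, K) removes that dependence. The function names are hypothetical.

```c
#include <stddef.h>

/* Serial original: the counter m couples each iteration to the previous one. */
void fill_with_counter(int nx, int ny, int nz, double *a) {
    size_t m = 0;
    for (int k = 0; k < nz; k++)
        for (int j = 0; j < ny; j++)
            for (int i = 0; i < nx; i++) {
                a[m] = (double)m;
                m = m + 1;                /* loop-carried dependence */
            }
}

/* Transformed: the index is computed from (i, j, k), so iterations are
   independent and the k loop can be split between threads. */
void fill_with_index(int nx, int ny, int nz, double *a) {
    #pragma omp parallel for
    for (int k = 0; k < nz; k++)
        for (int j = 0; j < ny; j++)
            for (int i = 0; i < nx; i++) {
                size_t m = ((size_t)k * ny + j) * nx + i;
                a[m] = (double)m;
            }
}
```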
11
Parallelization results for the benchmark problem 60x60x60
Speed-up is good! (problem size 170 MB)
- 2 processors: limited by the throughput of the local memory (which is shared by the 2 CPUs of a node);
- 4, 8 processors: superlinear speed-up (owing to the large 4 MB L3 cache in every CPU);
- 16 processors: negative effects (60 is not divisible by 16, i.e. load imbalance; the problem is too small, i.e. boundaries have a large influence).
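The load-imbalance part of the 16-processor result can be quantified with a simple bound, assuming the parallel loop runs over the Nz = 60 planes: the busiest thread gets ceil(60/16) = 4 planes while the average is 3.75, so the loop split alone caps the efficiency at 60/(16*4) = 0.9375. This sketch (split_efficiency is a hypothetical name) ignores the cache and boundary effects that the slide lists separately.

```c
/* Upper bound on parallel efficiency caused by the loop split alone:
   n planes over p threads, where the busiest thread gets ceil(n/p). */
double split_efficiency(int n, int p) {
    int busiest = (n + p - 1) / p;           /* ceil(n / p) */
    return (double)n / ((double)p * busiest);
}
```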
12
Parallelization results: "airflow canopy" problem 96x96x81
Speed-up is reasonable (problem size 1 GB)
- 2 processors: limited by the throughput of the local memory (which is shared by the 2 CPUs of a node);
- 4, 8 processors: no superlinear speed-up (bigger problem!);
- 16 processors: negative effects (load imbalance etc.) are partly compensated by the positive effect of the large L3 cache.
13
Parallelization of radiative transfer (input data parallelism)
(this work was done in collaboration with the INRA-URFM-PIF team)
- The full sphere of directions is split into parts (sectors) corresponding to the number of processors;
- the equations are integrated independently in each sector (over the full domain), i.e. each processor computes its own set of input data;
- afterwards, the data from each sector are distributed to the subdomains for further processing with geometric parallelism.
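The two phases can be sketched on a deliberately simplified model: a 1D slab of n cells with a uniform source S and absorption coefficient kappa, a few discrete directions with nonzero cosines mu[d] and weights w[d], and first-order upwind differencing of mu dI/dx = kappa (S - I). None of this is the radiation model of FIRESTAR 3D; sweep and incident_radiation are hypothetical names. Phase 1 shares the directions between threads (a static schedule gives each thread a contiguous block of directions, i.e. a sector), and each sweep covers the full domain; phase 2 switches to geometric parallelism over cells to sum the directional intensities into the incident radiation G.

```c
#include <stdlib.h>
#include <stddef.h>

/* One upwind sweep for direction cosine mu (nonzero) on n cells of width dx:
   I[i] = (a*I_upstream + kappa*S) / (a + kappa), with a = |mu|/dx.
   I_in is the intensity entering the slab from the upstream side. */
static void sweep(int n, double dx, double kappa, double mu, double S,
                  double I_in, double *I) {
    double a = (mu > 0 ? mu : -mu) / dx;
    double prev = I_in;
    for (int s = 0; s < n; s++) {
        int i = (mu > 0) ? s : n - 1 - s;   /* march downstream */
        I[i] = (a * prev + kappa * S) / (a + kappa);
        prev = I[i];
    }
}

/* G[i] = sum_d w[d] * I_d[i]. Returns 1 on success, 0 if allocation fails. */
int incident_radiation(int n, double dx, double kappa, double S, double I_in,
                       int nd, const double *mu, const double *w, double *G) {
    double *I = malloc((size_t)nd * n * sizeof *I);
    if (I == NULL) return 0;

    /* Phase 1: input-data parallelism over directions (sectors). */
    #pragma omp parallel for schedule(static)
    for (int d = 0; d < nd; d++)
        sweep(n, dx, kappa, mu[d], S, I_in, I + (size_t)d * n);

    /* Phase 2: geometric parallelism over cells. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        double g = 0.0;
        for (int d = 0; d < nd; d++)
            g += w[d] * I[(size_t)d * n + i];
        G[i] = g;
    }
    free(I);
    return 1;
}
```

A quick sanity check: if the inflow intensity equals S, every sweep returns I = S (up to rounding), so G equals S times the sum of the weights.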
14
Conclusion
In this work we developed:
- a strategy of OpenMP parallelization for NuMA computers;
- a parallelization method for a 3D CFD fire modelling code.
This new method achieves good parallelization efficiency for a moderate number of processors (up to 16).
Further work: acceleration of the algebraic solvers; development and parallelization of implicit-class preconditioners.
Acknowledgements
This work was supported by the European integrated fire management project (Fire Paradox) and by the Russian Foundation for Basic Research (project # 05-08-18110).
PaCT-2007, September 2007, Pereslavl-Zalessky, Russia