CEA’s Parallel industrial codes in electromagnetism


David GOUDIN, Michel MANDALLENA, Katherine MER-NKONGA, Jean Jacques PESQUE, Muriel SESQUES, Bruno STUPFEL
CEA/DAM/CESTA - France
CSC'05, 21-23 June 2005

Who are we?
- CEA: the French Atomic Energy Commission.
- CESTA: a CEA center located near Bordeaux, working on the electromagnetic behavior of complex targets, with measurement devices, a terascale computer, and simulation codes (ARLENE, ARLAS, ODYSSEE).

The Simulation Program of CEA/DAM
- Goal: reproduce the different stages of the functioning of a nuclear weapon through numerical simulation.
- The TERA project is one of the 3 major systems of the Simulation Program. Machine roadmap (timeline figure, 1999-2012):
  - starting point (1999): 30 (50) Gflops, 2D, mesh size 0.5 x 10^6
  - TERA-1: 1 (5) Tflops, 3D, mesh size 15 x 10^6
  - TERA-10: > 10 Tflops, 3D, mesh size 150 x 10^6 (62 Teraflops, 8704 processors, 30 TB of RAM, 1 PB of disk space)
  - TERA-100: > 100 Tflops, mesh size > 10^9

The supercomputer HP/Compaq TERA-1
- 5 Teraflops of peak performance: 2560 EV68 processors (1 GHz), 640 ES45 SMP nodes (4 processors/node)
- 2.5 TB of RAM, 50 TB of disk space
- High-performance interconnection network (Quadrics)
- Linpack benchmark: 3.98 Teraflops
- 2005-2006: the next level = 30 Teraflops (roadmap figure: 5 Teraflops, then 30 Teraflops, then 100 Teraflops over 2001-2011)

The supercomputer TERA-10: main characteristics
- Installation: December 2005
- SMP nodes: 544 computing nodes x 16 processors
- Peak performance: > 60 Tflops (CEA benchmark: 12.5 Tflops)
- RAM: 30 TB
- Disk space: 1 PB, with 56 I/O nodes (54 OSS + 2 MDS)
- Power consumption: < 2000 kW
(Figure: 544 computing nodes and 1 PB of data connected through an InfiniBand data network; user access through 10 Gb Ethernet.)

What is the problem?
- The simulation of the electromagnetic behavior of a complex target in the frequency domain (figure: incident plane wave (Einc, Hinc, kinc) and diffracted field (Ed, Hd)).
- Numerical resolution of Maxwell's equations in free space, for:
  - RCS (Radar Cross Section) computation
  - antenna computation

3D codes in production before 2004: ARLENE (since 1994) and ARLAS (since 2000)
- Based on a decomposition into an interior domain and an exterior domain, with strong coupling of the numerical methods between domains:
  - BIEM: Boundary Integral Equation Method
  - PDE: Partial Differential Equation method, discretized with finite elements
- The discretization leads to a linear system A.x = b, solved by a direct method because of the characteristics of A.
- The solvers come from EMILIO (an industrial software tool that includes the MPI version of PaStiX), developed in collaboration with the ScAlApplix team (INRIA, Bordeaux).

ARLENE versus ARLAS
- ARLENE: fully BIEM. Only the surfaces are meshed (interfaces between homogeneous isotropic media), so the number of unknowns stays reasonable, but this leads to a full matrix. Parallelization: very efficient.
- ARLAS: hybrid method (free-space problem by BIEM, interior problem by PDE). Hybrid meshes on the outer boundary and in the volume, with a large number of unknowns in the volume; this leads to a matrix with a full part and a sparse part. Parallelization: more difficult...
(Figure: block structure of the ARLAS matrix. The volume unknowns E appear in the sparse blocks Av11, Av12, Av22, while the surface currents J and M appear in the dense blocks As22, As23, As33.)

Solvers
- ARLENE solver: our own parallel Cholesky-Crout solver; the matrix is symmetric, complex and non-Hermitian.
- ARLAS solver: Schur complement approach, by elimination of the electric field E (an illustrative sketch follows):
  1. sparse matrix assembly: Av11
  2. factorization by PaStiX, the parallel sparse solver from EMILIO (INRIA-CEA)
  3. computation of the Schur complement term (Av11)^-1 Av12
  4. dense matrix assembly: As22, As23, As33
  5. addition of the Schur contribution, giving the dense block -Av12^T (Av11)^-1 Av12 + Av22 + As22
  6. factorization and resolution of the resulting dense linear system (blocks As23 and As33 included)
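As a concrete illustration of steps 1 to 6, here is a minimal SciPy sketch (not the ARLAS implementation: the function name schur_eliminate is introduced here, the dense surface blocks are folded into a single block Asurf, and SciPy's SuperLU plus a sequential dense solve stand in for PaStiX and the parallel dense solver):

```python
import numpy as np
import scipy.sparse.linalg as spla

def schur_eliminate(Av11, Av12, Av22, Asurf, rhs_v, rhs_s):
    """Toy Schur-complement elimination of the volume unknowns (field E).

    Av11, Av12, Av22 : sparse volume and coupling blocks (scipy.sparse matrices)
    Asurf            : dense surface block (stands for the assembled As22/As23/As33 part)
    rhs_v, rhs_s     : volume and surface right-hand sides
    """
    lu = spla.splu(Av11.tocsc())                  # steps 1-2: sparse factorization (PaStiX's role)
    W = lu.solve(Av12.toarray())                  # step 3: (Av11)^-1 Av12
    S = Asurf + Av22.toarray() - Av12.T @ W       # steps 4-5: dense Schur complement
    x_s = np.linalg.solve(S, rhs_s - Av12.T @ lu.solve(rhs_v))   # step 6: dense solve
    E = lu.solve(rhs_v - Av12 @ x_s)              # recover the eliminated volume field E
    return x_s, E
```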

ARLENE - Performance tests on TERA-1
- Cone-sphere at 2 GHz: 104 000 unknowns
- Cone-sphere at 5 GHz: 248 000 unknowns
- F117: 216 000 unknowns
- NASA Almond (a quarter of the object), Workshop JINA'02 (18/11/02), conducting body: 237 000 unknowns (948 000 unknowns for the full object)

Cone-sphere at 2 GHz: 104 000 unknowns, matrix size 85.5 GB.
(Figure: monostatic RCS at 2 GHz versus incidence angle in degrees.)

Cone-sphere at 5 GHz: 248 000 unknowns, matrix size 492 GB.
(Figures: monostatic RCS at 5 GHz versus incidence angle in degrees; electric currents for the 2 polarizations.)

F117 at 500 MHz: 216 425 unknowns, matrix size 375 GB, global CPU time on 1024 processors: 10 684 seconds.

NASA Almond at 8 GHz (Workshop JINA'02, case 2, 18 November 2002)
- 233 027 unknowns, with 2 symmetries -> 932 108 unknowns; matrix size: 434 GB
- 4 factorizations and 8 resolutions to compute the currents
- Global CPU time on 1536 processors: 36 083 seconds
(Figure: bistatic RCS at 8 GHz, axial incidence, versus observation angle in degrees.)

ARLAS - Performance tests on TERA-1
- Monopole antenna: 241 500 unknowns in the volume, 9 400 unknowns on the outer boundary
- Cone-sphere: 1 055 000 unknowns in the volume, 14 600 unknowns on the outer boundary

Monopole antenna (Workshop JINA'02, case 3, 18 November 2002): 241 500 unknowns in the volume, 9 400 unknowns on the outer boundary.

Cone-sphere: 1 055 000 unknowns in the volume, 14 600 unknowns on the outer boundary. (Figure: exploded view of the mesh.)

Conclusions
- ARLENE is limited to 450 000 unknowns on the outer boundary; it reaches 65% of peak performance up to 1500 processors, and 58% up to 2048 processors.
- ARLAS is limited to 3 million unknowns in the volume and 25 000 unknowns on the outer boundary.
- But for our future applications this is not enough: we will need more than 30 million unknowns in the volume and 500 000 unknowns on the outer boundary.

Solution? A new 3D code: ODYSSEE (since 2004), based on:
- a domain decomposition method (DDM) with concentric sub-domains;
- a Robin-type transmission condition (TC) on the interfaces;
- on the outer boundary, a radiation condition taken into account by a new boundary integral formulation called EID;
- one or two levels of iterative methods: Gauss-Seidel for the global problem; for the free-space problem, an iterative solver (parallel conjugate gradient) accelerated by a parallel Fast Multipole Method; inside each sub-domain, the PaStiX direct solver or a direct ScaLAPACK solver. (A sketch of the Gauss-Seidel sweep over sub-domains follows.)
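A minimal sketch of the outer Gauss-Seidel sweep over sub-domains (illustrative only: the nested lists A and b and the solve_local callables are hypothetical stand-ins for the coupled sub-domain problems; in ODYSSEE the local solves are performed by PaStiX or ScaLAPACK, or by an FMM-accelerated conjugate gradient for the free-space problem):

```python
import numpy as np

def gauss_seidel_ddm(A, b, solve_local, n_sweeps=7):
    """Block Gauss-Seidel over sub-domains (toy multiplicative Schwarz loop).

    A[i][j]        : coupling matrix from sub-domain j into sub-domain i (None if uncoupled)
    b[i]           : right-hand side restricted to sub-domain i
    solve_local[i] : callable applying the inverse of sub-domain i's local operator
    """
    n = len(b)
    x = [np.zeros_like(b[i]) for i in range(n)]
    for _ in range(n_sweeps):                  # outer (sequential) Gauss-Seidel iterations
        for i in range(n):                     # sweep through the concentric sub-domains
            r = b[i].copy()
            for j in range(n):                 # subtract the coupling with the other sub-domains
                if j != i and A[i][j] is not None:
                    r -= A[i][j] @ x[j]
            x[i] = solve_local[i](r)           # parallel local solve (direct or iterative)
    return x
```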

What about our constraints?
- 3-dimensional targets with complex materials: inhomogeneous layers, isotropic and/or anisotropic layers, conducting or high-index layers taken into account by a Leontovich impedance condition, wires, conducting patches, ...
- We need very good accuracy: low levels of RCS.
- We want to reach medium and high frequencies: body sizes of about 20 to 100 wavelengths.
- We want a parallel industrial code able to run on the supercomputer we have.

Numerical issues
- Unbounded domain: the domain could be truncated by any kind of ABC (Absorbing Boundary Condition), but ABCs are not accurate enough for our applications. The radiation condition is therefore taken into account by a Boundary Integral Equation, which leads to a linear system with a full matrix.
- Discretization: the mesh size is a function of the wavelength, so the higher the frequency, the bigger the problem (a standard scaling estimate follows).
- ODYSSEE therefore needs both advanced numerical methods and High Performance Computing.
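The slide's scaling remark can be quantified with a standard back-of-the-envelope estimate (not on the slide; D is the body size and n_lambda the number of mesh points per wavelength, both introduced here for illustration):

\[
N_{\text{surface}} \sim n_\lambda^{2}\left(\frac{D}{\lambda}\right)^{2},
\qquad
N_{\text{volume}} \sim n_\lambda^{3}\left(\frac{D}{\lambda}\right)^{3},
\qquad
\lambda = \frac{c}{f},
\]

so doubling the frequency roughly quadruples a surface (BIEM) problem and multiplies a volume (PDE) problem by eight; with n_lambda around 10, a body 100 wavelengths long already gives a surface discretization on the order of 10^6 unknowns.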

ODYSSEE - Key ideas
- Domain Decomposition Method: the computational domain is partitioned into concentric sub-domains.
- On the outer boundary, the radiation condition is taken into account by a new BIEM called EID.
- A Multi-Level Fast Multipole Algorithm (MLFMA) is applied to this integral equation in order to reduce the CPU time.
(Figure: concentric sub-domains Ω1, ..., ΩN, ΩN+1 separated by the surfaces S0, S1, ..., SN; PDE inside, BIEM (EID) on the outer boundary.)

Domain Decomposition Method (B. Stupfel - B. Després, 1999)
- DDM partitioned into concentric sub-domains Ω1, ..., ΩN+1 (figure: surfaces S0, S1, ..., SN, with the BIEM on the outer boundary).
- Leontovich impedance condition on S0; transmission condition on each interface Si (an illustrative form is given below).
- Coupling with the outer problem: the transmission condition is equivalent to an impedance condition on SN with Z = 1.
- An iterative resolution is used to solve the global problem.
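For illustration only (the slide does not give the formula, and the exact Stupfel-Després formulation for Maxwell may differ in traces and sign conventions), a Robin-type transmission condition on the interface S_i can be written, in its scalar Helmholtz analogue with wavenumber k and normal n oriented from Ω_i towards Ω_{i+1}, as a mixed-trace matching fed by the neighbouring sub-domain's previous iterate m:

\[
\left(\frac{\partial}{\partial n} + \mathrm{i}\,k\right) u_i^{\,m+1}
= \left(\frac{\partial}{\partial n} + \mathrm{i}\,k\right) u_{i+1}^{\,m}
\quad \text{on } S_i ,
\]

and symmetrically with \(-\mathrm{i}k\) for the neighbouring sub-domain; for Maxwell's equations the scalar traces are replaced by the tangential traces \(E \times n\) and \(n \times (H \times n)\), which is consistent with the slide's remark that the transmission condition on SN amounts to an impedance condition with Z = 1.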

The BIEM: EID (B. Després, 1996)
- Minimization of a positive quadratic functional associated with incoming and outgoing electromagnetic waves, under a set of linear constraints.
- F. Collino & B. Després (2000): link with the classical BIEM.
- Very accurate method, but it leads to solving 2 coupled linear systems with 4 unknowns per edge: 2 linear combinations of the electric and magnetic currents, and 2 Lagrangian unknowns (because of the constraints).
- Very efficient with an impedance condition Z = 1: less memory, and the 2 linear systems become uncoupled. Z = 1 is exactly the case we are interested in.

The MLFMA applied to EID (K. Mer-Nkonga, 2001)
- Interaction potential: separation of the interactions between near and far sources, plus an approximation of the interaction with the far sources (the standard expansion is recalled below).
- In the EID context, we have to deal with 2 operators: a linear combination (the EID unknowns being linear combinations of the electric and magnetic currents).
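The "approximation for the interaction with the far sources" is not reproduced in the transcript; as background (the standard MLFMA plane-wave expansion of the Helmholtz kernel in the Coifman-Rokhlin-Wandzura form, given here for reference and not as the exact EID operators), the far interaction between a source point r' in a cluster centred at r_{m'} and an observation point r in a cluster centred at r_m is approximated by

\[
\frac{e^{\mathrm{i}k|r-r'|}}{4\pi\,|r-r'|}
\;\approx\;
\frac{\mathrm{i}k}{(4\pi)^{2}}
\oint_{S^{2}}
e^{\mathrm{i}k\,\hat{k}\cdot(r-r_m)}\;
T_L\!\left(\hat{k}\cdot\hat{X}\right)\;
e^{\mathrm{i}k\,\hat{k}\cdot(r_{m'}-r')}\,
\mathrm{d}^{2}\hat{k},
\qquad
T_L\!\left(\hat{k}\cdot\hat{X}\right)
= \sum_{l=0}^{L} \mathrm{i}^{\,l}(2l+1)\,h_l^{(1)}(k|X|)\,P_l\!\left(\hat{k}\cdot\hat{X}\right),
\]

where X = r_m - r_{m'}, h_l^{(1)} are spherical Hankel functions and P_l Legendre polynomials; the expansion holds for well-separated clusters and is applied level by level in the multi-level algorithm.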

Multi-scale parallel algorithm
- The global problem is solved by a Gauss-Seidel method: this step is sequential, so the parallelization is at the level of the resolution of each sub-domain.
- Nominal strategy: for each inner problem, use very efficient parallel sparse direct methods (EMILIO with the PaStiX solver) on NPr processors; for the outer problem, use an iterative solver with a parallel FMM on NPu processors, with NPr >> NPu.
- This unbalanced number of processors is exploited to implement a multi-level parallel algorithm: decomposition into pools of processors, each pool in charge of one right-hand side (useful for the 2 polarizations and multiple incidences). (A sketch of such a pool decomposition with MPI follows.)
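A minimal mpi4py sketch of the processor-pool decomposition (illustrative: N_POOLS and the rank-to-pool rule are arbitrary choices made here, not the EMILIO/ODYSSEE scheme): each pool gets its own MPI communicator and handles one right-hand side, i.e. one polarization/incidence.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N_POOLS = 4                                   # e.g. 2 polarizations x 2 incidences (assumption)
pool_id = rank % N_POOLS                      # which right-hand side this rank works on
pool = comm.Split(color=pool_id, key=rank)    # one sub-communicator per pool

# Each pool now runs the whole Gauss-Seidel / sub-domain solve for its own
# right-hand side on pool.Get_size() processors; a PaStiX-like solver would be
# given `pool` as its communicator.
print(f"world rank {rank}/{size} -> pool {pool_id}, "
      f"local rank {pool.Get_rank()}/{pool.Get_size()}")

pool.Free()
```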

Why EMILIO?
- 1997: CESTA needs a robust and versatile industrial software tool to solve very large sparse linear systems (structural mechanics, electromagnetism).
- EMILIO is developed in collaboration with the ScAlApplix team (INRIA, Bordeaux).
- EMILIO provides an efficient solution to the parallel resolution of sparse linear systems by direct methods, thanks to the high-performance direct solver PaStiX (developed by ScAlApplix).
- It is organized in two parts: a sequential pre-processing phase with a global strategy, and a parallel phase.

EMILIO, PaStiX, Scotch?
- General description
- Ordering and block symbolic factorization
- Parallel factorization algorithm
- Block partitioning and mapping
- Work in progress

EMILIO: General description. (Figure: the EMILIO tool chain, built on the Scotch ordering library and the PaStiX solver.)

Ordering (Scotch)
- Collaboration between ScAlApplix (INRIA) and P. Amestoy (ENSEEIHT / IRIT, Toulouse): tight coupling.
- A size threshold decides when to shift from Nested Dissection to Halo Approximate Minimum Degree (HAMD).
- The partition P of the graph vertices is obtained by merging the partition into separators with the supernodes computed by block amalgamation over the columns of the subgraphs ordered by HAMD.
(Figure: orderings obtained without and with the halo. A toy illustration of the threshold idea follows.)
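A toy sketch of the "nested dissection down to a size threshold" idea, written for a structured n x n grid graph (illustrative only: the real ordering is computed by Scotch on arbitrary graphs, and the sub-graphs below the threshold are ordered by HAMD, for which the natural order stands in here):

```python
def nd_order(i0, i1, j0, j1, threshold=16):
    """Nested-dissection ordering of the grid block [i0, i1) x [j0, j1).

    Each half is ordered recursively and listed first; the separator row or
    column is listed last, as in a real nested-dissection elimination order.
    """
    ni, nj = i1 - i0, j1 - j0
    if ni * nj <= threshold:
        # Stand-in for HAMD on small sub-graphs: keep the natural order.
        return [(i, j) for i in range(i0, i1) for j in range(j0, j1)]
    if ni >= nj:
        s = i0 + ni // 2                       # middle row as separator
        return (nd_order(i0, s, j0, j1, threshold)
                + nd_order(s + 1, i1, j0, j1, threshold)
                + [(s, j) for j in range(j0, j1)])
    s = j0 + nj // 2                           # middle column as separator
    return (nd_order(i0, i1, j0, s, threshold)
            + nd_order(i0, i1, s + 1, j1, threshold)
            + [(i, s) for i in range(i0, i1)])

# Example: order an 8 x 8 grid; every vertex appears exactly once.
order = nd_order(0, 8, 0, 8)
assert sorted(order) == [(i, j) for i in range(8) for j in range(8)]
```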

Symbolic block factorization (PaStiX)
- Linear time and space complexities.
- Data block structures => only a few pointers, and use of BLAS3 primitives for the numerical computations.

Parallel factorization algorithm (PaStiX)
- Factorization without pivoting, with a static regulation scheme.
- The algorithm is a parallel supernodal version of sparse L.D.Lt, L.Lt and L.U factorizations, with 1D/2D block distributions.
- Block or column-block computations: block algorithms are highly cache-friendly (BLAS3 computations). A sequential sketch of such a blocked factorization follows.
- Processors communicate using aggregated update blocks only.
- Homogeneous and heterogeneous architectures with predictable performance (SMP).
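To make the "block computations map onto BLAS3" point concrete, here is a sequential NumPy sketch of a right-looking blocked Cholesky (L.Lt) factorization; it is not PaStiX's supernodal, distributed algorithm, only an illustration of how the diagonal-block factorization (potrf), the panel solve (trsm) and the trailing update (syrk/gemm) appear as dense block kernels:

```python
import numpy as np

def blocked_cholesky(A, bs=64):
    """Right-looking blocked Cholesky: returns lower-triangular L with A = L @ L.T."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for k in range(0, n, bs):
        ke = min(k + bs, n)
        # Factor the diagonal block (LAPACK potrf on a dense block).
        A[k:ke, k:ke] = np.linalg.cholesky(A[k:ke, k:ke])
        L_kk = A[k:ke, k:ke]
        if ke < n:
            # Panel solve below the diagonal block (BLAS3 trsm): P <- P @ L_kk^{-T}.
            A[ke:n, k:ke] = np.linalg.solve(L_kk, A[ke:n, k:ke].T).T
            # Trailing sub-matrix update (BLAS3 syrk/gemm).
            A[ke:n, ke:n] -= A[ke:n, k:ke] @ A[ke:n, k:ke].T
    return np.tril(A)

# Quick check on a random symmetric positive definite matrix.
M = np.random.rand(200, 200)
S = M @ M.T + 200 * np.eye(200)
L = blocked_cholesky(S, bs=32)
assert np.allclose(L @ L.T, S)
```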

Block partitioning and mapping (PaStiX)
(Figure: processing chain. The block symbolic matrix, a cost model for computation and communication, and the number of processors feed the matrix partitioning and task-graph construction; mapping and scheduling then produce the local data, the task schedule and the communication scheme used by the parallel factorization and solver, together with an estimate of the computation time and of the memory allocated during factorization.)

Work in progress
- Scotch: on-going work on PT-Scotch (the parallel version).
- PaStiX: a hybrid iterative-direct block solver.
  - Starts from the acknowledgement that it is difficult to build a generic and robust preconditioner for large-scale 3D problems in high performance computing.
  - Derives direct-solver techniques to compute an incomplete-factorization-based preconditioner.
  - What's new: a (dense) block formulation; the incomplete block symbolic factorization removes blocks using algebraic criteria, sensitive at first to the structure rather than to the numerical values.
  - Provides incomplete LDLt, Cholesky and LU (with static pivoting for symmetric patterns). A generic sketch of preconditioning with an incomplete factorization follows.
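A minimal SciPy sketch of the hybrid idea, using an incomplete factorization as the preconditioner of an iterative Krylov method; it relies on SciPy's generic scalar ILU (spilu) and GMRES applied to a toy 2D Laplacian, not on PaStiX's incomplete block LDLt/Cholesky, so it only illustrates the concept:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Toy sparse system: 2D Laplacian (stand-in for a large 3D finite-element matrix).
n = 64
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = sp.kronsum(T, T).tocsc()
b = np.ones(A.shape[0])

# Incomplete LU factorization used as a preconditioner M ~ A^-1.
ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)
M = spla.LinearOperator(A.shape, matvec=ilu.solve)

x, info = spla.gmres(A, b, M=M)
print("converged" if info == 0 else f"gmres info = {info}",
      "| residual norm:", np.linalg.norm(A @ x - b))
```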

Results: mesh of a cone-sphere
- 10 million unknowns in the volume, split into 6 sub-domains
- 220 000 edges for EID on the outer boundary (surface coupling)
- 7 global iterations on 64 processors
- CPU time: 6h15'; maximum memory per processor: 2.6 GB

(Figure: picture of the electric field E, visualization from the PhD work of F. Vivodtzev.)

Conclusions for parallel codes in electromagnetism
- We have developed a 3D code able to take into account all of our constraints.
- We have successfully validated all the physics put into it, by comparison with other codes we have (full BIEM, 2D axisymmetric) and with measurements.
- The next step, for the end of this year, is to reach 30 million unknowns (the total number of degrees of freedom in the volume) and 1 million unknowns on the outer boundary.
- A recent result of the PaStiX solver: 10 million unknowns on 64 Power4 processors (2 SMP nodes, 64 GB of RAM per node); NNZL: 6.7 billion non-zeros in the factor; 43 Tera operations needed to factorize; CPU time to factorize: 400 seconds.

Thank you for your attention (Merci, Grazie mille, Danke schön, Gracias).