Computational Science R&D for Electromagnetic Modeling: Recent Advances and Perspective to Extreme-Scale
Lie-Quan Lee, for the SLAC Computational Team
ComPASS All-Hands Meeting, Boulder, CO, October 2009

Overview
* Recent advances in CS/AM for electromagnetic modeling
  – Eigensolvers
  – Meshing
  – Load balancing
  – Visualization
* Perspective to extreme-scale
  – Extreme-scale problems
  – Perspective from computational science

Frequency-Domain Eigenmode Analysis: Omega3P
Find the frequency, quality factor, and field vector of cavity modes.
* Maxwell's equations in the frequency domain, discretized with the ACE3P finite-element method: curved tetrahedral finite elements with higher-order vector basis functions N_i
  – For order p=2: 20 different N_i's per element; for order p=6: 216 different N_i's
* The discretization yields a generalized eigenvalue problem; the eigenvalues of interest are interior eigenvalues
* With more complex boundary conditions it can become a complex nonlinear eigenvalue problem
(figure: curved tetrahedral element with vector basis functions N_1, N_2)
(the standard formulation is sketched below)
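For reference, a standard way to write the underlying formulation (a sketch for a closed, lossless cavity with perfect-electric-conductor walls; as the slide notes, more general boundary conditions make the problem nonlinear and complex-valued):

```latex
% Curl-curl eigenvalue problem (closed, lossless cavity; k0 = omega/c):
\nabla \times \left( \mu_r^{-1}\, \nabla \times \mathbf{E} \right) = k_0^{2}\, \varepsilon_r \mathbf{E}
\quad \text{in } \Omega, \qquad \hat{\mathbf{n}} \times \mathbf{E} = 0 \ \text{on } \partial\Omega .

% Expanding E in the vector basis functions N_i gives the generalized eigenvalue problem
K\,x = k_0^{2}\, M\,x, \qquad
K_{ij} = \int_\Omega (\nabla \times \mathbf{N}_i) \cdot \mu_r^{-1} (\nabla \times \mathbf{N}_j)\, d\Omega, \qquad
M_{ij} = \int_\Omega \mathbf{N}_i \cdot \varepsilon_r\, \mathbf{N}_j\, d\Omega .
```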

Example Spectrum of the Eigensystem
(figure: full spectrum with a zoomed-in view of the eigenvalues of interest)

Eigensolver and Linear Solver for Modeling Accelerator Cavities
* The eigensystem must be solved for interior eigenvalues
* The shift-invert spectral transformation maps them to extreme eigenvalues: (K - σM)^-1 M x = ν x, with ν = 1/(λ - σ)
* This requires solving a highly indefinite linear system with K - σM
  – Sparse direct solver, or iterative solver with a good preconditioner
(a minimal shift-invert sketch follows below)
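A minimal shift-invert sketch in SciPy (illustrative only, on a small stand-in matrix pair; the production code relies on parallel sparse direct solvers or well-preconditioned iterative solvers for the shifted indefinite system):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# Small stand-in for the FEM matrices K (curl-curl stiffness) and M (mass):
# a 1D Laplacian and an identity mass matrix, for illustration only.
n = 200
K = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
M = sp.identity(n, format="csc")

sigma = 0.05  # shift chosen near the interior eigenvalues of interest

# Shift-invert: eigsh factors (K - sigma*M) internally and iterates with
# (K - sigma*M)^{-1} M, so interior eigenvalues near sigma become extreme.
vals, vecs = eigsh(K, k=4, M=M, sigma=sigma, which="LM")
print("eigenvalues closest to the shift:", np.sort(vals))
```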

Memory Configuration of Recent Supercomputers
Memory-usage scalability of algorithms is a critical issue!

Memory Usage of a Sparse Direct Solver
* Maximal per-core memory usage is 4-5 times larger than the average
* Once the factorization cannot fit on N cores, it most likely will not fit on 2N cores either
* A more memory-usage-scalable solver is needed
(figure: MUMPS per-core memory usage; complex matrix, N = 1.11M, nnz = 46.1M)

Another Sparse Direct Solver
* Speed: scales better
* Memory usage: scales poorly (the bottleneck)

Memory Bottleneck
* Solvers need to be scalable in memory usage
  – Hybrid linear solver (LBNL)
  – Domain-specific spectral multilevel preconditioner
  – Scalable eigensolvers (today's topic)

Construct an Explicit Gradient Space G
* Tree-cotree splitting for the lowest-order vector basis functions
  – Build a minimum spanning tree of the mesh edges (edges on electric boundary conditions need special care)
  – Remove all the DOFs on the tree edges
  – Add the gradients of the vertex functions as the replacement basis functions
* Explicit formulation of the gradient space for higher-order basis functions
(figure: example of tree-cotree splitting for a 2D circular cavity)
(a spanning-tree sketch follows below)
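A minimal sketch of the tree-cotree edge classification (my own illustration on a toy vertex-edge list; it ignores the boundary-condition special cases the slide mentions):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import minimum_spanning_tree

def tree_cotree_split(num_vertices, edges):
    """Classify mesh edges into tree edges (whose lowest-order DOFs are
    replaced by vertex-gradient functions) and cotree edges (kept).

    edges: list of (v0, v1) vertex pairs, one per mesh edge.
    """
    v0 = np.array([e[0] for e in edges])
    v1 = np.array([e[1] for e in edges])
    # Vertex-vertex adjacency; unit weights give an arbitrary spanning tree.
    adj = sp.coo_matrix((np.ones(len(edges)), (v0, v1)),
                        shape=(num_vertices, num_vertices)).tocsr()
    mst = minimum_spanning_tree(adj).tocoo()
    tree_pairs = {frozenset(p) for p in zip(mst.row, mst.col)}
    tree_edges = [i for i, e in enumerate(edges) if frozenset(e) in tree_pairs]
    cotree_edges = [i for i, e in enumerate(edges) if frozenset(e) not in tree_pairs]
    return tree_edges, cotree_edges

# Toy example: a single tetrahedron (4 vertices, 6 edges) -> 3 tree edges.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
tree, cotree = tree_cotree_split(4, edges)
print("tree edges:", tree, " cotree edges:", cotree)
```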

New Scalable Eigensolver: Method 1
Decompose the finite-element basis {N_i} into a gradient space G and a rotational space R. Because the curl of a gradient vanishes, this puts the GEP Kx = λMx into the two-by-two block form

  K = [ K11  0 ;  0  0 ],   M = [ M11  M12 ;  M21  M22 ],   x = [ x1 ; x2 ],

where K11 is symmetric positive definite. Thus the null space of the matrix K is spanned by the columns of

  Y = [ 0 ; I ].

More Scalable Eigensolver (continued)
* One can prove that the original GEP Kx = λMx has the same nonzero eigenvalues as the eigenvalue problem (I - YY^T) M^-1 K x = λx
* The Arnoldi algorithm is used to compute the smallest (extreme) eigenvalues of this EP
* (I - YY^T) is very easy to apply because Y has the simple block form [ 0 ; I ], so YY^T just zeroes out the gradient-space components
* The corresponding eigenvectors of the original problem are recovered by M-orthogonalizing against the null space Y
* The benefit: Mp = q can be solved in a very memory-scalable way, for example with conjugate gradient and an incomplete Cholesky preconditioner
(a small sketch of the projected operator follows below)
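A toy sketch of the Method-1 operator (I - YY^T) M^-1 K driven by ARPACK, under the block structure assumed above with Y = [ 0 ; I ]. For simplicity a dense Cholesky solve stands in for the memory-scalable preconditioned-CG M-solve the slide describes; everything here is illustrative, not the Omega3P implementation:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, eigh
from scipy.sparse.linalg import LinearOperator, eigs

rng = np.random.default_rng(0)
nr, ng = 30, 10              # sizes of the rotational (R) and gradient (G) blocks
n = nr + ng

# Toy matrices with the assumed block structure: K = [K11 0; 0 0], M SPD.
A = rng.standard_normal((nr, nr)); K11 = A @ A.T + nr * np.eye(nr)
B = rng.standard_normal((n, n));   M = B @ B.T + n * np.eye(n)
K = np.zeros((n, n)); K[:nr, :nr] = K11

# Stand-in for the memory-scalable M-solve (CG + incomplete Cholesky in the
# slide); a dense Cholesky factorization keeps the toy exact and simple.
M_chol = cho_factor(M)

def apply_op(x):
    """Apply (I - Y Y^T) M^{-1} K with Y = [0; I]."""
    q = cho_solve(M_chol, K @ x)
    q[nr:] = 0.0                     # the projection zeroes the gradient block
    return q

op = LinearOperator((n, n), matvec=apply_op)
v0 = rng.standard_normal(n); v0[nr:] = 0.0   # start inside the projected subspace
vals = eigs(op, k=2, which="SM", v0=v0, maxiter=5000, return_eigenvectors=False)
print("smallest nonzero eigenvalues:", np.sort(vals.real))

# Cross-check: the nonzero eigenvalues of K x = lambda M x (dense reference).
lam = np.sort(eigh(K, M, eigvals_only=True))
print("reference (dense GEP)       :", lam[ng:ng + 2])
```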

Preliminary Results for Method 1
* DDS cell; compute the first two nonzero eigenpairs
* Shift-invert: number of applications of (K - σM)^-1 M: 53
* New method: number of applications of (I - YY^T) M^-1 K: 1361
* But remember that applying M^-1 is much more scalable and much easier than applying (K - σM)^-1
(figure: computer model of 1/8th of the DDS cell)

Caveats for Method 1
* The residual of the transformed eigenvalue problem cannot be made very small (not a big issue)
  – After transforming back, the residual of the original eigenvalue problem is small
* A larger search space is needed
* The convergence of the eigenvalue problem is mesh-dependent
  – The Arnoldi method makes the extreme eigenvalues converge first: the smallest and the largest
  – A denser mesh makes the largest eigenvalue larger!
  – Deflate the converged but unwanted eigenvalues?
(table: number of iterations vs. number of elements)

New Thoughts
* Consider the eigensystem in the block form above:

  K11 x1 = λ (M11 x1 + M12 x2)   and   0 = λ (M21 x1 + M22 x2),

  so for λ ≠ 0, x2 = -M22^-1 M21 x1.
* That is equivalent to the reduced problem

  K11 x1 = λ (M11 - M12 M22^-1 M21) x1.

Another New Scalable Eigensolver: Method 2
* Make a transformation: A = K11^-1 (M11 - M12 M22^-1 M21), so the smallest nonzero eigenvalues λ become the largest eigenvalues 1/λ of A
* Use the Arnoldi method on A
* For each matrix-vector multiplication A p = q
  – Two solves: one with M22 and the other with K11
  – Use conjugate gradient with an incomplete Cholesky preconditioner
(a small sketch of this operator follows below)
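A toy sketch of the Method-2 operator under the same assumptions as above: A = K11^-1 (M11 - M12 M22^-1 M21) applied matrix-free, with dense Cholesky solves standing in for the two preconditioned-CG inner solves the slide calls for. The largest eigenvalues of A give 1/λ for the smallest nonzero λ:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, eigh
from scipy.sparse.linalg import LinearOperator, eigs

rng = np.random.default_rng(1)
nr, ng = 30, 10
n = nr + ng

# Toy block matrices: K = [K11 0; 0 0] with K11 SPD, M SPD.
A0 = rng.standard_normal((nr, nr)); K11 = A0 @ A0.T + nr * np.eye(nr)
B0 = rng.standard_normal((n, n));   M = B0 @ B0.T + n * np.eye(n)
M11, M12 = M[:nr, :nr], M[:nr, nr:]
M21, M22 = M[nr:, :nr], M[nr:, nr:]

# Stand-ins for the two inner solves (preconditioned CG in the slide).
K11_chol, M22_chol = cho_factor(K11), cho_factor(M22)

def apply_A(p):
    """q = K11^{-1} (M11 - M12 M22^{-1} M21) p  -- two inner solves."""
    t = cho_solve(M22_chol, M21 @ p)
    return cho_solve(K11_chol, M11 @ p - M12 @ t)

Aop = LinearOperator((nr, nr), matvec=apply_A)
mu = eigs(Aop, k=2, which="LM", return_eigenvectors=False)   # largest 1/lambda
print("smallest nonzero eigenvalues (Method 2):", np.sort(1.0 / mu.real))

# Cross-check against the dense GEP K x = lambda M x.
K = np.zeros((n, n)); K[:nr, :nr] = K11
ref = np.sort(eigh(K, M, eigvals_only=True))[ng:ng + 2]
print("reference (dense GEP)                  :", ref)
```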

Preliminary Results for Method 2
* DDS cell; compute the first two nonzero eigenpairs
* Shift-invert: number of applications of (K - σM)^-1 M: 53
* New method: number of applications of A: 38
* And applying A is much more scalable and much easier than applying (K - σM)^-1
(figure: computer model of 1/8th of the DDS cell)

Future Work
* K11 becomes more and more difficult to solve as the mesh gets denser
  – Further study is needed to solve it efficiently (e.g., can convergence be made independent of the mesh size?)
(figure: spectra of K11 from different meshes)

Multi-file NetCDF Format
* SLAC and RPI collaborated on parallel mesh generation for meshes with very large numbers of elements
* A multi-file NetCDF format was designed to remove the synchronized-parallel-writing bottleneck
* Preliminary testing has shown the success and efficacy of the format
(figure: multi-file layout, File1 through File7 plus a summary file, feeding the SLAC finite-element simulation suite)
(a small writer sketch follows below)
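A minimal illustration of the multi-file idea (hypothetical file layout, dimension, and variable names, not the actual ACE3P/NetCDF schema): each partition writes its own file independently, with no synchronized parallel write, and a small summary file records how the pieces fit together.

```python
import numpy as np
from netCDF4 import Dataset

def write_partition(path, coords, connectivity):
    """Write one partition's mesh piece to its own NetCDF file.
    Each rank can do this independently -- no synchronized parallel write."""
    with Dataset(path, "w") as ds:
        ds.createDimension("num_nodes", coords.shape[0])
        ds.createDimension("num_elements", connectivity.shape[0])
        ds.createDimension("dim", 3)
        ds.createDimension("nodes_per_element", connectivity.shape[1])
        ds.createVariable("coords", "f8", ("num_nodes", "dim"))[:] = coords
        ds.createVariable("connectivity", "i8",
                          ("num_elements", "nodes_per_element"))[:] = connectivity

def write_summary(path, part_files):
    """Summary file listing the per-partition files (hypothetical schema)."""
    with Dataset(path, "w") as ds:
        ds.createDimension("num_parts", len(part_files))
        ds.setncattr("format", "multi-file-netcdf-sketch")
        ds.setncattr("part_files", ",".join(part_files))

# Toy usage: two partitions, each holding one trivially small tetrahedron.
coords = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
conn = np.array([[0, 1, 2, 3]], dtype=np.int64)
files = ["mesh_part0.nc", "mesh_part1.nc"]
for f in files:
    write_partition(f, coords, conn)
write_summary("mesh_summary.nc", files)
```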

Effects of Inverted Curved Elements
* Inverted curved elements: e.g., a curved edge crosses the element volume at points other than the vertices
(figure: 2D example of an inverted curved element)
(a small detection sketch follows below)
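One common way to detect such elements (a sketch under my own assumptions, not necessarily the check used by the RPI correction tool) is to sample the determinant of the Jacobian of the curved element mapping, here a 10-node quadratic tetrahedron; a non-positive value anywhere flags the element as inverted.

```python
import numpy as np

# Mid-edge node ordering assumed here: 4 vertices followed by these 6 edges.
EDGES = [(0, 1), (1, 2), (0, 2), (0, 3), (1, 3), (2, 3)]

def shape_functions(xi, eta, zeta):
    """Quadratic (10-node) tetrahedron shape functions in barycentric form."""
    L = np.array([1.0 - xi - eta - zeta, xi, eta, zeta])
    N = [L[i] * (2.0 * L[i] - 1.0) for i in range(4)]      # vertex nodes
    N += [4.0 * L[a] * L[b] for (a, b) in EDGES]           # mid-edge nodes
    return np.array(N)

def mapping(nodes, ref):
    """Physical coordinates of reference point ref = (xi, eta, zeta)."""
    return shape_functions(*ref) @ nodes                   # (10,) @ (10, 3)

def is_inverted(nodes, nsample=4, h=1e-6):
    """Flag an element whose mapping Jacobian determinant is non-positive
    at any lattice point of the reference tetrahedron (finite-difference J)."""
    pts = [(i / nsample, j / nsample, k / nsample)
           for i in range(nsample + 1) for j in range(nsample + 1)
           for k in range(nsample + 1) if i + j + k <= nsample]
    for p in pts:
        J = np.empty((3, 3))
        for d in range(3):                                  # central differences
            dp = np.zeros(3); dp[d] = h
            J[:, d] = (mapping(nodes, np.add(p, dp)) -
                       mapping(nodes, np.subtract(p, dp))) / (2 * h)
        if np.linalg.det(J) <= 0.0:
            return True
    return False

# Toy check: straight-sided tet (valid), then pull one mid-edge node deep into
# the interior so the mapping folds (negative Jacobian -> inverted).
verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
mids = np.array([(verts[a] + verts[b]) / 2.0 for (a, b) in EDGES])
good = np.vstack([verts, mids])
bad = good.copy(); bad[4] = [0.5, 0.45, 0.45]               # distort edge (0,1) node
print(is_inverted(good), is_inverted(bad))
```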

Spectra Comparison for Frequency-Domain Analysis
(figure: red = original mesh with inverted curved elements, blue = fixed mesh)

Spectra Comparison for Frequency-Domain Analysis
(figure: red = original mesh with inverted curved elements, blue = fixed mesh; the abnormal eigenvalue from the inverted elements is marked)

Impact on Frequency-Domain Analysis
* The largest eigenvalue from a mesh with inverted curved elements is an order of magnitude larger
* The small nonzero eigenvalues do not change much! (Good)
* If a sparse direct solver (SDS) is used for the shifted linear system, the impact is minimal because the SDS is relatively robust

Spectra Comparison for Time-Domain Analysis
(figure: red = original mesh with inverted curved elements, blue = fixed mesh)

Spectra Comparison for Time-Domain Analysis
(figure: red = original mesh with inverted curved elements, blue = fixed mesh; the abnormal eigenvalue from the inverted elements is marked)

Impact on Time-Domain Analysis
* The largest eigenvalue from a mesh with inverted curved elements is an order of magnitude larger
* The smallest eigenvalue does not change
* The condition number of the matrix is the ratio of the largest eigenvalue to the smallest
* Convergence of the iterative solver suffers greatly because the condition number is an order of magnitude larger
* The noise associated with this large eigenvalue is not yet understood (likely to have adverse effects over longer times)
* Correcting inverted curved elements and controlling element shape are crucial!
  – RPI implemented an inverted-curved-element correction tool
  – They recently added a shape-control measure
  – The geometry-mesh relation is very important but is still missing from the tool
(the standard CG bound below shows why the condition number matters)
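For reference, the standard conjugate-gradient error bound (a textbook result, not specific to this solver) makes the condition-number argument quantitative: the iteration count grows like the square root of κ, so a condition number ten times larger costs roughly three times as many iterations.

```latex
% CG error bound for an SPD system, with kappa = lambda_max / lambda_min:
\| e_k \|_A \;\le\; 2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^{k} \| e_0 \|_A
% Reducing the error by a fixed factor therefore takes O(sqrt(kappa)) iterations.
```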

Mesh Partitioning for Balanced Load
* Mismatch: partitioning is done on mesh elements, but the load is determined by the number of DOFs
* An improved graph model is needed for unstructured meshes
  – The current graph model represents only element face sharing
  – In the improved model, tetrahedral elements that share only edges will also be accurately represented
  – Graph edges are weighted to balance the number of degrees of freedom
* Refine the partitioning to balance the number of DOFs (>10k cores)
(figure: partitioned mesh with 56 million elements)
(a small weighted-graph construction sketch follows below)
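A toy sketch of the graph-model idea (my own illustration, not the ACE3P implementation): build an element-adjacency graph that includes edge-sharing neighbors, with edge weights counting shared mesh edges and vertex weights approximating per-element DOF counts, ready to hand to a partitioner such as ParMETIS or Zoltan.

```python
import itertools
import numpy as np
import scipy.sparse as sp

def element_graph(tets, dofs_per_element):
    """Weighted element-adjacency graph for a tetrahedral mesh.

    tets: (num_elements, 4) vertex indices per element.
    Returns (adjacency, vertex_weights): adjacency[i, j] counts the mesh
    edges shared by elements i and j (face neighbors get weight 3,
    edge-only neighbors weight 1); vertex_weights approximate per-element load.
    """
    edge_to_elems = {}
    for e, tet in enumerate(tets):
        for pair in itertools.combinations(sorted(tet), 2):   # 6 edges per tet
            edge_to_elems.setdefault(tuple(pair), []).append(e)

    rows, cols = [], []
    for elems in edge_to_elems.values():
        for i, j in itertools.combinations(elems, 2):
            rows += [i, j]; cols += [j, i]
    n = len(tets)
    adjacency = sp.coo_matrix((np.ones(len(rows)), (rows, cols)),
                              shape=(n, n)).tocsr()            # duplicates sum into weights
    vertex_weights = np.full(n, dofs_per_element, dtype=int)
    return adjacency, vertex_weights

# Toy mesh: two tets sharing a face, a third sharing only an edge with the first.
tets = np.array([[0, 1, 2, 3], [1, 2, 3, 4], [0, 1, 5, 6]])
adj, w = element_graph(tets, dofs_per_element=20)              # p=2: ~20 basis functions/element
print(adj.toarray())
```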

Visualization
* See Greg Schussman's talk today

Extreme-Scale Computing Need: ILC / Project X
Cryomodule for the International Linear Collider / Project X
* What we accomplished
  – Frequency domain: 10^7 elements, 10^8 DOFs
  – Time domain: 10^8 elements, 10^9 DOFs
* Physics goal
  – Broadband beam heating in a superconducting cryomodule (~10 m long) with 300 μm beam size
* Problem size
  – 5 x … elements, > … DOFs, … flops
(HEP Extreme-scale Workshop, SLAC, Dec. 2008)

Extreme-Scale Computing Need: CLIC
Two-beam module for CLIC: drive beam (PETS) and main beam (accelerating structure)
* Extreme-scale problem: coupling, wakefield, and dark current
  – 5 x … elements, ~… DOFs, 100K time steps, … flops
* Preliminary study (Arno Candel): 17 million elements with 20 million DOFs
(figure: a snapshot of fields in the coupled system of PETS and main accelerating structure, Arno Candel)
(HEP Extreme-scale Workshop, SLAC, Dec. 2008)

Omega3P Success in EM Modeling
* An increase of 10^5 in problem size, at 0.01% relative error, over a decade
* From closed cavities to waveguide-loaded cavities
(figure: problem size vs. year compared with Moore's law; milestones include 2D cell, 3D cell, 2D detuned structure, 3D detuned structure, SCR cavity, cryomodule, RF unit)

Current Simulation and Analysis Flow
CAD model → mesh generation → NetCDF mesh file (+ input parameters) → partition mesh → assemble matrices → solvers (frequency/time domain) → postprocess E/B fields → analysis & visualization

Path to Extreme-Scale Computing
* Meshing (ITAPS)
  – Parallel mesh generation
  – Adaptive mesh refinement
  – Online mesh generation
* Partitioning and load balancing (CSCAPES & ITAPS)
  – Improved scalability and better balancing
* Solvers (TOPS)
  – Speed and memory-usage scalability
  – Advances at different levels: use new methods (domain decomposition, discontinuous Galerkin, etc.); devise new algorithms for eigensolvers, linear solvers, and preconditioners
* Visualization and analysis (IUSV)
  – Parallel visualization
  – Explorative methods
  – Integrated simulation and analysis