

1 Mark F. Adams SciDAC - 27 June 2005 Ax=b: The Link between Gyrokinetic Particle Simulations of Turbulent Transport in Burning Plasmas and Micro-FE Analysis of Whole Vertebral Bodies in Orthopaedic Biomechanics

2 Outline  Algebraic multigrid (AMG) introduction  Micro-FE bone modeling  Olympus parallel FE framework  Scalability study on IBM SPs  Gyrokinetic Particle Simulations of Turbulent Transport in Burning Plasmas

3 The Multigrid V-cycle: smoothing and coarse grid correction (projection). [Diagram: finest grid → restriction (R) to the first coarse grid (note: smaller grid) → prolongation (P = R^T) back to the fine grid, with smoothing at each level.]

4 Multigrid V(ν1,ν2)-cycle
 Given a smoother S and a coarse grid space: the columns of the "prolongation" operator P are a discrete representation of the coarse grid space
 Function u = MG-V(A, f)
  if A is small: u ← A^-1 f
  else:
   u ← S^ν1(f, u) -- ν1 steps of smoother (pre)
   r_H ← P^T (f − Au) -- restrict residual
   u_H ← MG-V(P^T A P, r_H) -- recursion (Galerkin coarse operator)
   u ← u + P u_H -- coarse grid correction
   u ← S^ν2(f, u) -- ν2 steps of smoother (post)
 Iteration matrix with R = P^T: T = S(I − P(RAP)^-1 RA)S -- multiplicative
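The recursion on this slide can be sketched in a few lines of NumPy. This is a toy dense version for illustration only (weighted Jacobi stands in for the generic smoother S, and the function names are illustrative; the production solver in these slides is Prometheus):

```python
import numpy as np

def mg_v(A, f, u, P_list, nu1=2, nu2=2, omega=2.0 / 3.0):
    """One multigrid V-cycle for Au = f, following the slide's recursion.

    P_list holds the prolongation operators, finest level first; with
    R = P^T the coarse operator is the Galerkin product P^T A P.
    """
    if not P_list or A.shape[0] <= 4:          # coarsest grid: solve directly
        return np.linalg.solve(A, f)
    D_inv = 1.0 / np.diag(A)
    for _ in range(nu1):                       # nu1 steps of weighted-Jacobi pre-smoothing
        u = u + omega * D_inv * (f - A @ u)
    P = P_list[0]
    r_H = P.T @ (f - A @ u)                    # restrict the residual
    A_H = P.T @ A @ P                          # Galerkin coarse-grid operator
    u_H = mg_v(A_H, r_H, np.zeros_like(r_H), P_list[1:], nu1, nu2, omega)
    u = u + P @ u_H                            # coarse grid correction
    for _ in range(nu2):                       # nu2 steps of post-smoothing
        u = u + omega * D_inv * (f - A @ u)
    return u
```

On a 1D Poisson model problem with a linear-interpolation prolongator, a handful of these V-cycles drives the residual down by many orders of magnitude.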

5 Smoothed Aggregation
 Coarse grid space & smoother → MG method
 Start with the kernel vectors B of the operator (e.g., the 6 rigid body modes in elasticity)
 Nodal aggregation with piecewise constant functions gives the "plain" aggregation prolongator P0
 "Smoothed" aggregation lowers the energy of these functions with one Jacobi iteration: P ← (I − ω D^-1 A) P0
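A minimal NumPy sketch of the construction above, dense and scalar-valued for brevity (it assumes a precomputed aggregation map and uses a single constant kernel vector; a real elasticity code would build P0 from the rigid-body-mode blocks B instead):

```python
import numpy as np

def smoothed_aggregation_P(A, aggregates, omega=2.0 / 3.0):
    """Build plain and smoothed-aggregation prolongators, as on the slide.

    aggregates[i] gives the aggregate (coarse point) owning fine node i.
    P0 is piecewise constant over the aggregates ("plain" aggregation);
    one weighted-Jacobi step lowers the energy of the basis functions:
        P = (I - omega * D^{-1} A) P0
    """
    n = A.shape[0]
    n_agg = max(aggregates) + 1
    P0 = np.zeros((n, n_agg))
    P0[np.arange(n), aggregates] = 1.0             # piecewise-constant tentative prolongator
    D_inv = 1.0 / np.diag(A)
    P = P0 - omega * (D_inv[:, None] * (A @ P0))   # one Jacobi smoothing step
    return P0, P
```

The columns of P0 partition the nodes; the Jacobi step spreads each column slightly beyond its aggregate, which is what lowers the energy of the coarse basis functions.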

6 Outline  Algebraic multigrid (AMG) introduction  Micro-FE bone modeling  Olympus parallel FE framework  Scalability study on IBM SPs  Gyrokinetic Particle Simulations of Turbulent Transport in Burning Plasmas

7 Trabecular Bone: 5-mm cube [image labels: cortical bone, trabecular bone]

8 Methods: µFE modeling
 Micro-computed tomography: 22 µm resolution 3D image
 2.5 mm cube → FE mesh with 44 µm elements
 Mechanical testing: E, σ_yield, σ_ult, etc.

9 The vertebral body shown is fairly healthy, from an 80-year-old female. It is a T-10, i.e., thoracic, so it is close to the mid-spine. Research is usually done on T-10 down through the lumbar vertebral bodies. There are 12 thoracic vertebral bodies and 5 lumbar; the numbers go up as you go down the spine.

10 Motivation
 Calibrate material models for continuum elements (e.g., explicit computation of a yield surface)
 Validation for low order models
 Investigation of effects that are not accessible with lower order models: the role of the cortical shell in load carrying of the vertebra; the effects of drug treatment on continuum properties
 [Image: 1 mm slice from a vertebral body]

11 Outline  Algebraic multigrid (AMG) introduction  Micro-FE bone modeling  Olympus parallel FE framework  Scalability study on IBM SPs  Gyrokinetic Particle Simulations of Turbulent Transport in Burning Plasmas

12 Computational Architecture: Olympus
 Athena: parallel FE
 ParMetis: parallel mesh partitioner (University of Minnesota)
 Prometheus: multigrid solver
 FEAP: serial general purpose FE application (University of California)
 PETSc: parallel numerical libraries (Argonne National Labs)
 [Diagram: FE mesh input file → Athena (with ParMetis) partitions to SMPs → FEAP with material card, coupled through pFEAP to Prometheus (PETSc, ParMetis/METIS) → Silo DB → VisIt]

13 Geometric and material nonlinear analysis at 2.25% strain; 8 processors on DataStar (SP4 at UCSD)

14 ParMetis partitions

15 Outline  Algebraic multigrid (AMG) introduction  Micro-FE bone modeling  Olympus parallel FE framework  Scalability study on IBM SPs  Gyrokinetic Particle Simulations of Turbulent Transport in Burning Plasmas

16 Vertebral Body With Shell (80 µm, with shell)
 Large deformation elasticity
 6 load steps (3% strain)
 Scaled speedup: ~131K dof/processor
 7 to 537 million dof
 4 to 292 nodes
 IBM SP Power3: 14 of 16 procs/node used
 Double/Single Colony switch

17 Scalability (80 µm, without shell)
 Inexact Newton
 CG linear solver with variable tolerance
 Smoothed aggregation AMG preconditioner
 Nodal block diagonal smoothers: 2nd order Chebyshev (additive), Gauss-Seidel (multiplicative)
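The solver configuration on this slide wraps the AMG preconditioner inside a CG iteration. A minimal NumPy sketch of preconditioned CG, with the preconditioner abstracted as a callable (a Jacobi stand-in below; in the slides it would be one AMG V-cycle, and these function names are illustrative, not Prometheus's API):

```python
import numpy as np

def pcg(A, b, M_solve, tol=1e-8, maxit=200):
    """Preconditioned conjugate gradients for SPD A.

    M_solve(r) applies the preconditioner, i.e. an SPD approximation
    of A^{-1} acting on the residual r.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_solve(r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)          # step length
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break                      # relative residual converged
        z = M_solve(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p      # new search direction
        rz = rz_new
    return x
```

Swapping `M_solve` from a Jacobi sweep to a multigrid V-cycle is exactly the change that turns plain CG into the AMG-preconditioned solver described here.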

18 Computational phases  Mesh setup (per mesh):  Coarse grid construction (aggregation)  Graph processing  Matrix setup (per matrix):  Coarse grid operator construction  Sparse matrix triple product RAP (expensive for S.A.)  Subdomain factorizations  Solve (per RHS):  Matrix vector products (residuals, grid transfer)  Smoothers (Matrix vector products)
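The slide above flags the Galerkin triple product RAP as the expensive matrix-setup step, especially for smoothed aggregation. A small dense NumPy demo of why (illustrative only; production codes use sparse kernels): smoothing the prolongator widens its columns, so the coarse operator fills in more.

```python
import numpy as np

def rap_fill_in_demo(n=20):
    """Compare nnz(R A P), R = P^T, for plain vs. smoothed aggregation
    on a 1D Poisson matrix with aggregates of two nodes.

    Returns (nnz of P0^T A P0, nnz of P^T A P); the smoothed prolongator
    P overlaps neighboring aggregates, so its coarse operator is denser.
    """
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Poisson
    agg = np.arange(n) // 2                                   # aggregates of 2 nodes
    P0 = np.zeros((n, n // 2))
    P0[np.arange(n), agg] = 1.0                               # plain aggregation
    D_inv = 1.0 / np.diag(A)
    P = P0 - (2.0 / 3.0) * (D_inv[:, None] * (A @ P0))        # one Jacobi smoothing step
    nnz = lambda M: int(np.sum(np.abs(M) > 1e-12))
    return nnz(P0.T @ A @ P0), nnz(P.T @ A @ P)
```

With plain aggregation the coarse operator stays tridiagonal here; after smoothing it picks up extra off-diagonals, a 1D glimpse of the fill-in that makes RAP costly at scale.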

19 [Plot: flops/sec/processor vs. number of processors at 131K dof/processor; aggregate rate 0.47 Teraflop/s]

20 Sources of scale inefficiencies in solve phase
                7.5M dof   537M dof
  #nnz/row         50         68
  Flop rate        76         74
  #elems/proc    19.3K      33.0K
 [Plot: #iterations, model vs. measured]

21 Strong speedup with 7.5M dof problem (1 to 128 nodes)

22 Outline  Algebraic multigrid (AMG) introduction  Micro-FE bone modeling  Olympus parallel FE framework  Scalability study on IBM SPs  Gyrokinetic Particle Simulations of Turbulent Transport in Burning Plasmas

23

24 Finite Element (FEM) Elliptic Solver Developed for GTC
 Global field-aligned mesh: FEM adapted for logically non-rectangular grids; needs adjustments of elements at different toroidal angles
 Linear sparse matrix solver: PETSc (ANL)
 Enabled implementing the split-weight (Manuilskiy & Lee, PoP 2000) and hybrid electron (Lin & Chen, PoP 2001) models
 Ongoing studies of kinetic electron effects on ITG and TEM turbulence
 Ongoing studies of electromagnetic turbulence

25 Performance
 Multigrid preconditioned Krylov solver: Prometheus (Columbia) & HYPRE (LLNL)
 Scaled speedup: ~38K dof per processor; 1 to 32 processors/plane; 8 planes, 20 time steps, 4 particles per cell

26 Thank You Gordon Bell Prize winner 2004: Ultrascalable implicit finite element analyses in solid mechanics with over a half a billion degrees of freedom M.F. Adams, H.H. Bayraktar,T.M. Keaveny, P. Papadopoulos ACM/IEEE Proceedings of SC2004: High Performance Networking and Computing

27 [Plot: linear solver iterations per Newton step vs. load step, for the small (7.5M dof) and large (537M dof) problems]

28 [Plots by S. Ethier: Thunder (LLNL) and Jacquard (NERSC)]

29 [Plot: 164K dof/processor]