Applications of Algebraic Multigrid to Large Scale Mechanics Problems
Mark F. Adams, 22 October 2004

Slide 1: Applications of Algebraic Multigrid to Large Scale Mechanics Problems (Mark F. Adams, 22 October 2004)

Slide 2: Outline
- Algebraic multigrid (AMG) introduction
- Industrial applications
- Micro-FE bone modeling
- Olympus parallel FE framework
- Scalability studies on IBM SPs
  - Scaled speedup
  - Plain speedup
  - Nodal performance

Slide 3: The Multigrid V-cycle
Multigrid alternates smoothing with coarse grid correction (projection): smooth on the finest grid, restrict the residual (restriction R) to the first coarse grid (note: a smaller grid), correct there, and prolongate the correction back (prolongation P = R^T), recursing through successively coarser grids.

Slide 4: Multigrid V(ν1, ν2)-cycle (MG-V)
Function u = MG-V(A, f):
  if A is small:
    u ← A⁻¹ f
  else:
    u ← S^ν1(f, u)             -- ν1 steps of smoother (pre)
    r_H ← P^T (f − A u)         -- restriction
    u_H ← MG-V(P^T A P, r_H)    -- recursion (Galerkin)
    u ← u + P u_H
    u ← S^ν2(f, u)             -- ν2 steps of smoother (post)
Iteration matrix (multiplicative): T = S (I − P (R A P)⁻¹ R A) S
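The recursion above can be sketched in NumPy. This is a minimal dense sketch for readability (real implementations are sparse and parallel); `smooth` is damped Jacobi as a stand-in for the smoother S, and the prolongators `Ps` (one per level, coarsest last) are assumed given.

```python
import numpy as np

def smooth(A, f, u, nu, omega=0.6):
    """nu steps of damped Jacobi, standing in for the smoother S."""
    Dinv = 1.0 / np.diag(A)
    for _ in range(nu):
        u = u + omega * Dinv * (f - A @ u)
    return u

def mg_v(A, f, Ps, u=None, nu1=1, nu2=1):
    """One V(nu1, nu2)-cycle; Ps is the list of prolongators, coarsest last."""
    if u is None:
        u = np.zeros_like(f)
    if not Ps:                       # A is small: direct solve
        return np.linalg.solve(A, f)
    P = Ps[0]
    u = smooth(A, f, u, nu1)         # pre-smoothing
    rH = P.T @ (f - A @ u)           # restrict the residual (R = P^T)
    AH = P.T @ A @ P                 # Galerkin coarse operator
    uH = mg_v(AH, rH, Ps[1:])        # recursion
    u = u + P @ uH                   # coarse grid correction
    u = smooth(A, f, u, nu2)         # post-smoothing
    return u
```

Iterating `mg_v` as a stationary method (or using one cycle as a preconditioner for CG, as in the later slides) reduces the residual by a roughly constant factor per cycle.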

Slide 5: Smoothed Aggregation
- Coarse grid space and smoother give the MG method
- Piecewise constant functions: "plain" aggregation (P0)
  - Start with the kernel vectors B of the operator (e.g., the 6 rigid body modes in elasticity)
  - Nodal aggregation of B gives P0
- "Smoothed" aggregation: lower the energy of the functions
  - One Jacobi iteration: P ← (I − ω D⁻¹ A) P0
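A minimal sketch of the two prolongators, assuming the scalar case (kernel B is the constant vector, so plain aggregation is 0/1 injection) and assuming the node-to-aggregate map is already computed:

```python
import numpy as np

def plain_prolongator(aggregates, n):
    """P0: piecewise-constant injection; aggregates[i] = aggregate of node i.
    Scalar problem assumed, so the kernel vector B is the constant vector."""
    nagg = max(aggregates) + 1
    P0 = np.zeros((n, nagg))
    for i, a in enumerate(aggregates):
        P0[i, a] = 1.0
    return P0

def smoothed_prolongator(A, P0, omega=2.0 / 3.0):
    """One damped-Jacobi step lowers the energy of the coarse basis:
    P = (I - omega * D^-1 A) P0, as on the slide."""
    Dinv = np.diag(1.0 / np.diag(A))
    return P0 - omega * (Dinv @ A @ P0)
```

The point of the smoothing step is visible directly: each smoothed column p satisfies p^T A p below that of the corresponding piecewise-constant column, giving a lower-energy coarse space.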

Slide 6: Parallel Smoothers
- CG/Jacobi: additive (requires damping for MG)
  - Damped by CG (Adams, SC1999)
  - Dot products; non-stationary
- Gauss-Seidel: multiplicative (optimal MG smoother)
  - Complex communication and computation (Adams, SC2001)
- Polynomial smoothers: additive
  - Chebyshev is ideal for multigrid smoothers
  - Chebyshev chooses p(λ) such that |1 − p(λ) λ| is minimized over the interval [λ*, λ_max]
  - An estimate of λ_max is easy
  - Use λ* = λ_max / C (no need for the lowest eigenvalue); C is related to the rate of grid coarsening
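A hedged sketch of such a polynomial smoother: only the upper eigenvalue estimate is needed, and the target interval [λ_max/C, λ_max] follows the slide. The three-term recurrence below is the standard Chebyshev iteration; the function name and defaults are illustrative, not from the source.

```python
import numpy as np

def chebyshev_smooth(A, b, x, lam_max, C=4.0, degree=3):
    """Chebyshev polynomial smoother on [lam_max/C, lam_max].
    Additive and inner-product free: only matrix-vector products."""
    lam_min = lam_max / C
    theta = 0.5 * (lam_max + lam_min)   # interval center
    delta = 0.5 * (lam_max - lam_min)   # interval half-width
    sigma = theta / delta
    r = b - A @ x
    d = r / theta                        # first (degree-1) correction
    x = x + d
    rho_prev = 1.0 / sigma
    for _ in range(degree - 1):          # standard three-term recurrence
        r = b - A @ x
        rho = 1.0 / (2.0 * sigma - rho_prev)
        d = rho * rho_prev * d + (2.0 * rho / delta) * r
        x = x + d
        rho_prev = rho
    return x
```

In practice λ_max is estimated cheaply, e.g. with a few power or Lanczos iterations, which is why no lowest-eigenvalue estimate is ever required.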

Slide 7: Outline (repeated; next section: industrial applications)

Slide 8: Aircraft carrier
- 315,444 vertices
- Shell and beam elements (6 DOF per node)
- Linear dynamics: transient (time domain)
- About 1 min. per solve (rtol = 10⁻⁶)
  - 2.4 GHz Pentium 4/Xeon processors
  - Matrix-vector product runs at 254 Mflops

Slide 9: Solve and setup times (26 Sun processors) [figure]

Slide 10: Adagio: "BR" tire
- ADAGIO: quasi-static solid mechanics application (Sandia)
  - Nearly incompressible visco-elasticity (rubber)
  - Augmented Lagrange formulation with an Uzawa-like update
  - Contact (impenetrability constraint)
- Saddle point solution scheme:
  - Uzawa-like iteration (pressure) with contact search
  - Nonlinear CG (with linear constraints and constant pressure), preconditioned with linear solvers (AMG, FETI, ...) or nodal (Jacobi)

Slide 11: "BR" tire [figure]

Slide 12: Displacement history [figure]

Slide 13: Outline (repeated; next section: micro-FE bone modeling)

Slide 14: Trabecular Bone [figure]
- 5-mm cube
- Cortical bone vs. trabecular bone

Slide 15: Micro-Computed Tomography
Methods: μCT at 22 μm resolution produces a 3D image; a 2.5 mm cube is meshed with 44 μm elements to give the μFE mesh for FE modeling; mechanical testing gives E, σ_yield, σ_ult, etc.

Slide 16: Outline (repeated; next section: Olympus parallel FE framework)

Slide 17: Computational Architecture: Olympus
- Athena: parallel FE
- ParMetis: parallel mesh partitioner (University of Minnesota)
- Prometheus: multigrid solver
- FEAP: serial general-purpose FE application (University of California)
- PETSc: parallel numerical libraries (Argonne National Labs)
[Diagram: the FE mesh input file is read by Athena, which uses ParMetis to partition to SMPs; per-partition FE input files (in memory) and a material card feed FEAP, coupled through pFEAP to Prometheus (built on PETSc, ParMetis, and METIS); results go to a Silo DB for visualization with VisIt.]

Slide 18: Outline (repeated; next section: scalability studies on IBM SPs)

Slide 19: Scalability: 80 µm w/o shell
- Inexact Newton
- CG linear solver with variable tolerance
- Smoothed aggregation AMG preconditioner
- Nodal block diagonal smoothers:
  - 2nd order Chebyshev (additive)
  - Gauss-Seidel (multiplicative)

Slide 20: Inexact Newton
Solve F(x) = 0. Newton:
- Solve F′(x_{k−1}) s_k = −F(x_{k−1}) for s_k such that
  ‖F(x_{k−1}) + F′(x_{k−1}) s_k‖₂ < η_k ‖F(x_{k−1})‖₂
- η_k = β ‖F(x_k)‖₂ / ‖F(x_{k−1})‖₂
- x_k ← x_{k−1} + s_k
- If (s_kᵀ F(x_{k−1}))^{1/2} < τ_r (s_0ᵀ F(x_0))^{1/2}, return x_k
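The steps above can be sketched as follows, assuming a symmetric positive definite Jacobian so that plain CG serves as the inner linear solver (the slides use preconditioned CG); the forcing term η_k and the energy-style stopping test mirror the slide, while the helper names and defaults are illustrative:

```python
import numpy as np

def cg(A, b, rtol, maxit=200):
    """Conjugate gradients to a relative tolerance rtol: the 'inexact' part."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = r @ r
    bnorm = np.linalg.norm(b)
    for _ in range(maxit):
        if np.sqrt(rr) <= rtol * bnorm:
            break
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

def inexact_newton(F, J, x0, beta=0.9, eta0=0.5, tau_r=1e-6, maxit=50):
    """Newton where J(x) s = -F(x) is solved only to tolerance eta_k,
    with the forcing term eta_k = beta * ||F(x_k)|| / ||F(x_{k-1})||."""
    x = x0.copy()
    eta = eta0
    w0 = None
    for _ in range(maxit):
        Fx = F(x)
        s = cg(J(x), -Fx, rtol=eta)
        w = np.sqrt(abs(s @ Fx))        # energy-like measure of the step
        if w0 is None:
            w0 = w                      # reference value from the first step
        x = x + s
        if w < tau_r * w0:              # convergence test from the slide
            break
        eta = min(0.9, beta * np.linalg.norm(F(x)) / max(np.linalg.norm(Fx), 1e-30))
    return x
```

The loose early tolerances save linear-solver iterations far from the solution, while the shrinking η_k recovers fast Newton convergence near it.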

Slide 21: 80 µm w/ shell: Vertebral Body With Shell
- Large deformation elasticity
- 6 load steps (3% strain)
- Scaled speedup: ~131K dof/processor
- 7 to 537 million dof
- 4 to 292 nodes
- IBM SP Power3
  - 15 of 16 procs/node used
  - Double/single Colony switch

Slide 22: Computational phases
- Mesh setup (per mesh):
  - Coarse grid construction (aggregation)
  - Graph processing
- Matrix setup (per matrix):
  - Coarse grid operator construction
  - Sparse matrix triple product RAP (expensive for smoothed aggregation)
  - Subdomain factorizations
- Solve (per RHS):
  - Matrix-vector products (residuals, grid transfer)
  - Smoothers (matrix-vector products)
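The RAP triple product in the matrix-setup phase amounts to two sparse matrix-matrix multiplies. A small sketch using SciPy's sparse matrices (an assumption of this example; production codes use their own parallel kernels), with plain aggregation on a 1D Laplacian:

```python
import numpy as np
import scipy.sparse as sp

def galerkin_coarse_operator(A, P):
    """Coarse operator R A P with R = P^T (Galerkin): two sparse mat-mat
    products. With a smoothed-aggregation P the result is denser than with
    plain aggregation, which is why this phase is expensive."""
    return (P.T @ A @ P).tocsr()

# Example: 1D Laplacian and a plain-aggregation prolongator (pairs of nodes)
n = 8
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
P = sp.csr_matrix((np.ones(n), (np.arange(n), np.arange(n) // 2)),
                  shape=(n, n // 2))
AH = galerkin_coarse_operator(A, P)
```

Here the coarse operator inherits the tridiagonal structure of the fine one, since each aggregate only couples to its neighbors.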

Slide 23: Linear solver iterations (per Newton step)

Load step | Small (7.5M dof), Newton steps 1-5 | Large (537M dof), Newton steps 1-6
1         | 5 14 20 21 18                      | 5 11 35 25 70 2
2         | 5 14 20                            | 5 11 36 26 70 2
3         | 5 14 20 22 19                      | 5 11 36 26 70 2
4         | 5 14 20 22 19                      | 5 11 36 26 70 2
5         | 5 14 20 22 19                      | 5 11 36 26 70 2
6         | 5 14 20 22 19                      | 5 11 36 26 70 2

Slide 24: 131K dof/proc: flop rate per processor [figure]. 0.47 Teraflops on 4088 processors.

Slide 25: End-to-end times and (in)efficiency components [figure]

Slide 26: Sources of scale inefficiencies in the solve phase

                 7.5M dof   537M dof
#iterations      450        897
#nnz/row         50         68
Flop rate        76         74
#elems/proc      19.3K      33.0K
Model            1.00       2.78
Measured         1.00       2.61
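The "model" row is consistent with multiplying the three measured ratios: more iterations, more nonzeros per row, and a slightly lower flop rate at scale. This decomposition is an inference from the table, but it reproduces the 2.78 figure:

```python
# Reproducing the modeled solve-phase inefficiency from the table's rows
# (assumed decomposition: iteration ratio x nnz/row ratio x flop-rate ratio).
iters = {"small": 450, "large": 897}
nnz_per_row = {"small": 50, "large": 68}
flop_rate = {"small": 76, "large": 74}

model = (iters["large"] / iters["small"]) \
      * (nnz_per_row["large"] / nnz_per_row["small"]) \
      * (flop_rate["small"] / flop_rate["large"])
print(round(model, 2))  # 2.78
```

The measured 2.61 is close to this model, suggesting the three factors account for essentially all of the observed loss.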

Slide 27: 164K dof/proc [figure]

Slide 28: First try: flop rates (265K dof/processor)
- 265K dof per processor
- IBM switch bug: bisection bandwidth plateaus at 64-128 nodes
- Solution: use more processors
  - Fewer dof per processor
  - Less pressure on the switch

Slide 29: Outline (repeated; next section: plain speedup)

Slide 30: Speedup with the 7.5M dof problem (1 to 128 nodes) [figure]

Slide 31: Outline (repeated; next section: nodal performance)

Slide 32: Nodal Performance of IBM SP Power3 and Power4
- IBM Power3, 16 processors per node
  - 375 MHz, 4 flops per cycle
  - 16 GB/sec bus (~7.9 GB/sec with the STREAM benchmark)
  - Implies a memory-bandwidth peak of ~1.5 Gflop/s per node for Mat-Vec
  - We get ~1.2 Gflop/s (15 × 0.08 Gflop/s)
- IBM Power4, 32 processors per node
  - 1.3 GHz, 4 flops per cycle
  - Complex memory architecture

Slide 33: Speedup [figure]

