Published by Susan Sutton. Modified over 9 years ago.
1
ME964 High Performance Computing for Engineering Applications
Outlining Midterm Projects
Topic 3: GPU-based FEA
Topic 4: GPU Direct Solver for Sparse Linear Algebra
March 01, 2011
“The real problem is not whether machines think but whether men do.” B. F. Skinner
© Dan Negrut, 2011 ME964 UW-Madison
2
Before We Get Started…
Last time
– Midterm Project topics 1 and 2
  – Discrete Element Method on the GPU. Area coordinator: Toby Heyn
  – Collision Detection on the GPU. Area coordinator: Arman Pazouki
Today
– Midterm Project topics 3 and 4
  – Finite Element Method on the GPU. Area coordinators: Prof. Suresh and Naresh Khude
  – Sparse direct solver on the GPU (Cholesky). Area coordinator: Dan Negrut
Midterm Project Related Issues
– Midterm Project is due on 04/13 at 11:59 PM (use Learn@UW drop-box)
– Intermediate report due on 03/22 at 11:59 PM (use the same Learn@UW drop-box)
– Each area coordinator
  – Will provide a test problem for you to test your GPU implementation
  – Will also assist you with questions related to the non-programming aspects (the “theory”) behind the topic you chose
– You can continue your Midterm Project (MP) and have it become your Final Project (FP)
  – In this case you will be expected to show how the FP implementation is superior to your MP implementation
Other issues
– HW5 due tonight at 11:59 PM
– Use Learn@UW drop-box to submit homework
3
Finite Element Analysis on the GPU? Krishnan Suresh, Associate Professor, suresh@engr.wisc.edu
4
Finite Element Analysis Computer simulation of engineering models Physics: – Structural, thermal, fluid, … Mode: – Static, modal, transient – Linear, non-linear, multi-physics
5
Why GPU? Hours or even days of CPU time. [Gordon; JPL]
6
Question: Can one exploit graphics processing units (GPUs) to speed up finite element analysis?
7
Structural Static FEA: Model → Discretize → Element Stiffness → Assemble/Solve → Post-process
8
FEA: Variations. Pipeline: Model → Discretize → Element Stiffness → Assemble/Solve → Post-process. Variations: Nonlinear; Optimization; Tet/Hex/…; Direct/Iterative; Order/Hybrid
9
FEA: Challenges. Pipeline: Model → Discretize → Element Stiffness → Assemble/Solve → Post-process. Variations: Nonlinear; Optimization; Tet/Hex/…; Direct/Iterative; Order/Hybrid. Challenges: 1. Accuracy 2. Automation 3. Speed
10
Typical Bottleneck: Model → Discretize → Element Stiffness → Assemble/Solve → Post-process
11
GPU & Engineering Analysis. Model → Discretize: CPU or GPU? Discretization: Data: small b-rep (+); Logic: complex (-); Threads: few (-). Not a good candidate for GPU!?
12
Element Stiffness. Model → Discretize → Element Stiffness: CPU or GPU? Data: O(N) (+/-); Logic: simple (+); Threads: N (+). Element types shown: Hex 2nd Order, Hex Hybrid
13
Stiffness: Hex 2nd Order. 8 corners ~ 100 bytes of data (x y z). 27 nodes ~ M = 81 DOF (u v w). k_ij ~ Gaussian integration – 30 flops. [Figure labels: 8 corners; 27 nodes]
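The sizes quoted on this slide can be checked with a few lines of arithmetic. The sketch below assumes the corner coordinates are stored as 4-byte single-precision floats (the slide only says "~100 bytes"); the symmetric-storage count is an extra illustration, not a number from the slide.

```python
# Back-of-the-envelope sizes for one second-order hexahedral element.
# Assumption: coordinates are stored as 4-byte single-precision floats.
CORNERS = 8
NODES = 27
DIMS = 3             # (x, y, z) per corner, (u, v, w) per node
BYTES_PER_FLOAT = 4  # assumed storage size

input_bytes = CORNERS * DIMS * BYTES_PER_FLOAT  # geometry read per element
dof = NODES * DIMS                              # M on the slide
stiffness_entries = dof * dof                   # full M x M element matrix
unique_entries = dof * (dof + 1) // 2           # upper triangle if symmetric

print(input_bytes, dof, stiffness_entries, unique_entries)
```

Under these assumptions each element reads about 96 bytes and produces an 81 x 81 matrix, so the work per byte of input is high, which is consistent with the slide's "(+)" marks for this stage.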
14
Typical Bottleneck: Model → Discretize → Element Stiffness → Assemble/Solve
15
Direct vs. Iterative. K is sparse and usually symmetric positive definite (SPD). Options: Direct; Iterative (GPU variation: assembly-free). Note: Nvidia offers the CuBLAS-3 dense matrix library
16
Direct Sparse on GPU (1) (2006)
17
Direct Sparse on GPU (1)
19
Direct Sparse on GPU (2) (2008)
20
Direct Sparse on GPU (2)
21
Iterative Sparse on GPU (1) (2008). Jacobi-preconditioned conjugate gradient on an ATI GPU. Speed-up: 3.5.
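For reference, a minimal CPU sketch of Jacobi-preconditioned conjugate gradient is given below. It is not the code from the 2008 paper: the function name `pcg`, the dense list-of-lists storage, and the badly scaled tridiagonal test matrix are all assumptions made for illustration. The Jacobi preconditioner is simply the inverse of the diagonal of A, which is why it maps well to a GPU (one independent multiply per unknown).

```python
import math

def matvec(A, x):
    """Dense matrix-vector product; on a GPU this is the SpMV kernel."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(A, b, inv_diag=None, tol=1e-10, max_iter=1000):
    """Conjugate gradient; inv_diag=None disables the Jacobi preconditioner.
    Returns (x, number of iterations used)."""
    x = [0.0] * len(b)
    r = list(b)                                  # residual for x = 0
    z = r if inv_diag is None else [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, k
        z = r if inv_diag is None else [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Hypothetical test problem: SPD tridiagonal matrix with badly scaled rows.
n = 50
scale = [10.0 ** (3.0 * i / (n - 1)) for i in range(n)]
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    A[i][i] = 2.0 * scale[i] ** 2
    if i + 1 < n:
        A[i][i + 1] = A[i + 1][i] = -0.5 * scale[i] * scale[i + 1]
b = [1.0] * n
inv_diag = [1.0 / A[i][i] for i in range(n)]

x_cg, cg_iters = pcg(A, b)               # no preconditioner
x_pcg, pcg_iters = pcg(A, b, inv_diag)   # Jacobi preconditioner
print("plain CG iterations:", cg_iters)
print("Jacobi PCG iterations:", pcg_iters)
```

On this deliberately ill-scaled matrix the diagonal scaling removes most of the conditioning problem, so the preconditioned run needs fewer iterations. On matrices whose difficulty does not come from diagonal scaling, Jacobi helps much less.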
22
Iterative Sparse on GPU (2). Double-precision, real-world SpMV: – CPU (2.3 GHz dual Xeon): 1 GFLOPS – GPU (GTX 280): 16 GFLOPS – Speed-up ~ 16
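SpMV here is the sparse matrix-vector product y = K*u. A minimal sketch using the common compressed sparse row (CSR) layout is shown below; the slide does not say which storage format the quoted study used, so CSR and the name `csr_spmv` are assumptions. The outer loop over rows is the part a GPU would parallelize.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """y = A @ x for a matrix stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):          # on a GPU: one thread (or warp) per row
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]   # 2 flops per stored nonzero
        y[row] = acc
    return y

# Small example matrix:
# [[4, 0, 1],
#  [0, 3, 0],
#  [1, 0, 2]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 1.0, 2.0]

y = csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
flops = 2 * len(values)   # one multiply and one add per nonzero
print(y, flops)
```

A GFLOPS figure like the ones on the slide is then `flops / elapsed_seconds / 1e9`, which makes SpMV rates easy to compare across devices.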
23
FEA/GPU Class Projects? 1. Complete < 6 weeks 2. Important (publishable) 3. Pilot code
24
FEA/GPU Class Projects? 1. GPU Friendly Preconditioners for Thin Structures – Research papers – OpenCL and ViennaCL Pilot Code 2. Topology Optimization – Research papers – CUDA code 3. Others – Can discuss …
25
Thin Structure?
26
Large K
27
Preconditioners? Iterative Methods: – GPU methods available for K*u – Typical preconditioners: simple Jacobi, … Poor preconditioner … slow convergence Objective: – GPU friendly preconditioner for thin structures
28
Research Publication
29
Basic Idea
30
Algorithm
31
Why Preconditioner?
32
Why Double Precision?
33
How Expensive is Preconditioner?
34
GPU Friendly. [Plots: speed-up without preconditioner; speed-up with preconditioner]
35
FEA/GPU Class Projects? 1. GPU Friendly Preconditioners for Thin Structures – Research papers – OpenCL and ViennaCL Pilot Code 2. Topology Optimization – Research papers – CUDA code 3. Others – Can discuss …
36
Topology Optimization D [Sigmund 2001] V = 50% Stiffest topology for a given volume? Where to remove material? Multi Objective + Topology Optimization = MOTO
37
Demo Matlab code: www.ersl.wisc.edu
38
Pareto Optimal Designs. Purely Pareto optimal
39
Comparison D
40
3-D Pareto-Method SIMP
41
3-D GPU Implementation Multi-grid Topology Optimization on the GPU (IDETC conf. 2011)
42
Motivation for Topic 4: Sparse Direct Solver
43
Nomenclature & Simplifying Assumptions
44
The Schur Complement Problem in Multi-Body Dynamics Applications
45
Formulation Framework Position: Orientation: Euler parameters, Translational Velocity: Angular velocities
46
Constrained Equations of Motion
47
Numerical Solution of the Newton-Euler Constrained Equations of Motion. One has to solve a set of Differential Algebraic Equations (DAEs) to find the time evolution of a mechanical system. Most often the numerical solution of the DAEs requires the solution of a linear system of the form:
48
Approach Followed. First solve the “Reduced System” for λ. Then recover the accelerations.
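The slide's equations did not survive in this transcript, so the sketch below assumes the usual saddle-point form, M a + Φᵀ λ = F together with Φ a = γ, which gives the reduced system (Φ M⁻¹ Φᵀ) λ = Φ M⁻¹ F − γ and then a = M⁻¹ (F − Φᵀ λ). The sign convention, the diagonal mass matrix, and every name and number in the code are assumptions for illustration only.

```python
import math

def cholesky_solve(A, b):
    """Solve A x = b for a small dense SPD matrix via A = L L^T."""
    n = len(b)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    y = [0.0] * n
    for i in range(n):                       # forward solve L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):             # backward solve L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

# Hypothetical toy system: three 1-DOF bodies tied together by two constraints.
mass = [2.0, 1.0, 4.0]                       # diagonal of M (assumed diagonal)
phi = [[1.0, -1.0, 0.0],                     # constraint Jacobian, one row per constraint
       [0.0, 1.0, -1.0]]
force = [1.0, 0.0, -2.0]
gamma = [0.0, 0.0]
m, n = len(phi), len(mass)

# Step 1: build and solve the reduced system for the Lagrange multipliers.
N = [[sum(phi[i][k] * phi[j][k] / mass[k] for k in range(n)) for j in range(m)]
     for i in range(m)]
rhs = [sum(phi[i][k] * force[k] / mass[k] for k in range(n)) - gamma[i]
       for i in range(m)]
lam = cholesky_solve(N, rhs)

# Step 2: recover the accelerations.
accel = [(force[k] - sum(phi[j][k] * lam[j] for j in range(m))) / mass[k]
         for k in range(n)]
print(lam, accel)
```

In this toy case the two constraints force all three bodies to share one acceleration, so the result is simply total force over total mass (−1/7), which is a quick sanity check of the two-step procedure.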
49
Iterative Solution of the Reduced System Define positive definite Reduced Matrix Preconditioned Conjugate Gradient requires computation at time of requires preconditioning:
50
Computing A thread is associated with each body We’ll look at how thread 9 does its share of work to compute Time step n, iteration (k):
51
How Thread-9 Does its Work
S1. Compute reaction forces acting on me:
S2. Compute my constraint acceleration
S3. Project my constraint acceleration
Finally,
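One plausible reading of steps S1 to S3 is a matrix-free product of the reduced matrix Φ M⁻¹ Φᵀ with a vector of multipliers, computed body by body. The sketch below follows that reading with 1-DOF bodies and a diagonal mass matrix; the function name, the data, and the interpretation itself are assumptions, since the slide's formulas are not in the transcript. Each pass of the outer loop stands in for one GPU thread.

```python
def reduced_matvec_per_body(mass, phi, lam):
    """Compute (Phi M^-1 Phi^T) lam without ever forming the reduced matrix."""
    n_joints, n_bodies = len(phi), len(mass)
    out = [0.0] * n_joints
    for body in range(n_bodies):               # one GPU thread per body
        joints = [j for j in range(n_joints) if phi[j][body] != 0.0]
        # S1: reaction force on this body from its adjacent joints only
        reaction = sum(phi[j][body] * lam[j] for j in joints)
        # S2: acceleration this body picks up from that reaction
        accel = reaction / mass[body]
        # S3: project the acceleration back onto each adjacent joint.
        # On a GPU this accumulation needs an atomic add or a per-joint reduction.
        for j in joints:
            out[j] += phi[j][body] * accel
    return out

# Hypothetical data: three bodies in a chain, two joints.
mass = [2.0, 1.0, 4.0]
phi = [[1.0, -1.0, 0.0],
       [0.0, 1.0, -1.0]]
lam = [1.0, 2.0]

result = reduced_matvec_per_body(mass, phi, lam)

# Reference: form the dense reduced matrix and multiply.
dense = [[sum(phi[i][k] * phi[j][k] / mass[k] for k in range(3)) for j in range(2)]
         for i in range(2)]
reference = [sum(dense[i][j] * lam[j] for j in range(2)) for i in range(2)]
print(result, reference)
```

The per-body work depends only on how many joints touch that body, which matches the deck's remark that load balance follows from bodies having similar connectivity.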
52
Iteration Operation Count for Body 9 (Thread-9). Table columns: Step, Multiplications, Additions. Rows: S1, S2, S3
53
Computing [Concluding Remarks]. The algorithm scales very well: one thread for each body. Each thread only interacts with adjacent joints. Load balance is obtained when the bodies have similar topology index.
54
Direct Solution of the Reduced System
55
The Sparse Direct Solver
56
The Direct Solver: How Things Get Done. In the reduced linear system each constraint induces an equation. Example: constraint 3 induced equation: Since is positive definite, is also positive definite. Fundamental Idea: Solve for λ_3 and substitute it in all the equations where it shows up
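"Solve for one unknown and substitute it everywhere it shows up" is one step of Gaussian elimination: the remaining equations form a smaller system (a Schur complement) that stays symmetric positive definite when the original matrix is. The sketch below applies that step recursively to a small made-up SPD system; the matrix, the right-hand side, and the function names are illustrative assumptions, not the deck's example.

```python
def eliminate(A, b, k):
    """Solve equation k for unknown k and substitute it into all other equations."""
    idx = [i for i in range(len(b)) if i != k]
    A_red = [[A[i][j] - A[i][k] * A[k][j] / A[k][k] for j in idx] for i in idx]
    b_red = [b[i] - A[i][k] * b[k] / A[k][k] for i in idx]
    return A_red, b_red

def solve_by_elimination(A, b):
    """Eliminate the last unknown, solve the smaller system, then back-substitute."""
    if len(b) == 1:
        return [b[0] / A[0][0]]
    k = len(b) - 1
    A_red, b_red = eliminate(A, b, k)
    x_rest = solve_by_elimination(A_red, b_red)
    x_k = (b[k] - sum(A[k][j] * x_rest[j] for j in range(k))) / A[k][k]
    return x_rest + [x_k]

# Hypothetical 3-unknown SPD system; unknown index 2 plays the role of "constraint 3".
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]

A_red, b_red = eliminate(A, b, 2)
x = solve_by_elimination(A, b)
print(A_red, b_red, x)
```

Note that eliminating unknown 2 only changed the equation of its neighbour (unknown 1): unknowns that do not share an equation with the eliminated one are untouched, which is what makes parallel elimination possible.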
57
First Example: Seven-Body Mechanism
58
59
The Elimination Sequence. The fundamental question is this: in what sequence should the unknowns (the edges of the graph) be eliminated? Different elimination sequences result in different levels of effort. The question becomes more complicated because you are interested in a parallel elimination sequence, and you would like to limit the number of synchronization barriers that the implementation imposes. In the end, although it is formulated as solving a linear system, the problem becomes one of starting with a graph and eliminating its edges in parallel. This is similar to a game of Mikado, or “pick-up sticks”, that you want to play in parallel.
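The effect of the elimination sequence can be seen on a tiny graph. The sketch below works on a coupling graph in which each node is an unknown and two unknowns are linked when they appear in the same equation; this is an assumed simplification of the deck's picture, where the unknowns are the edges of the mechanism graph. Eliminating an unknown links all of its remaining neighbours, and each new link is extra work ("fill-in").

```python
def count_fill(edges, order):
    """Count the new couplings created when unknowns are eliminated in `order`."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    fill = 0
    for node in order:
        nbrs = sorted(adj.pop(node))
        for nb in nbrs:
            adj[nb].discard(node)
        # Eliminating `node` couples every pair of its remaining neighbours.
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                a, c = nbrs[i], nbrs[j]
                if c not in adj[a]:
                    adj[a].add(c)
                    adj[c].add(a)
                    fill += 1
    return fill

# Hypothetical star-shaped coupling: unknown 0 is coupled to unknowns 1..4.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]

fill_hub_first = count_fill(star, [0, 1, 2, 3, 4])
fill_leaves_first = count_fill(star, [1, 2, 3, 4, 0])
print(fill_hub_first, fill_leaves_first)
```

Eliminating the hub first couples all four leaves to each other, while eliminating the leaves first creates no new couplings. The leaves are also mutually independent, so they could be eliminated concurrently between two synchronization barriers, which is the parallel "pick-up sticks" idea.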
60
Second Example: HMMWV Model