Jens Krüger Technische Universität München

Slides:

Advertisements

Similar presentations

Stable Fluids A paper by Jos Stam.

Advertisements

Simulation of Fluids using the Navier-Stokes Equations Kartik Ramakrishnan.

CSCI-455/552 Introduction to High Performance Computing Lecture 25.

1 Numerical Solvers for BVPs By Dong Xu State Key Lab of CAD&CG, ZJU.

CS 290H 7 November Introduction to multigrid methods

SOLVING THE DISCRETE POISSON EQUATION USING MULTIGRID ROY SROR ELIRAN COHEN.

CSCI-455/552 Introduction to High Performance Computing Lecture 26.

MULTISCALE COMPUTATIONAL METHODS Achi Brandt The Weizmann Institute of Science UCLA

Geometric (Classical) MultiGrid. Hierarchy of graphs Apply grids in all scales: 2x2, 4x4, …, n 1/2 xn 1/2 Coarsening Interpolate and relax Solve the large.

Numerical Algorithms ITCS 4/5145 Parallel Computing UNC-Charlotte, B. Wilkinson, 2009.

MATH 685/ CSI 700/ OR 682 Lecture Notes

Systems of Linear Equations

SOLVING SYSTEMS OF LINEAR EQUATIONS. Overview A matrix consists of a rectangular array of elements represented by a single symbol (example: [A]). An individual.

Numerical Algorithms Matrix multiplication

Numerical Algorithms • Matrix multiplication

1cs533d-winter-2005 Notes  Please read Fedkiw, Stam, Jensen, “Visual simulation of smoke”, SIGGRAPH ‘01.

Total Recall Math, Part 2 Ordinary diff. equations First order ODE, one boundary/initial condition: Second order ODE.

CSCI 317 Mike Heroux1 Sparse Matrix Computations CSCI 317 Mike Heroux.

Sparse Matrix Algorithms CS 524 – High-Performance Computing.

Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid Jeffrey Bolz, Ian Farmer, Eitan Grinspun, Peter Schröder Caltech ASCI Center.

Avoiding Communication in Sparse Iterative Solvers Erin Carson Nick Knight CS294, Fall 2011.

Geometric (Classical) MultiGrid. Linear scalar elliptic PDE (Brandt ~1971)  1 dimension Poisson equation  Discretize the continuum x0x0 x1x1 x2x2 xixi.

ECIV 301 Programming & Graphics Numerical Methods for Engineers REVIEW II.

Monica Garika Chandana Guduru. METHODS TO SOLVE LINEAR SYSTEMS Direct methods Gaussian elimination method LU method for factorization Simplex method of.

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Intro to Computational Fluid Dynamics Brandon Lloyd COMP 259 April 16, 2003 Image courtesy of Prof. A.

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Introduction to Modeling Fluid Dynamics 1.

Motivation  Movie  Game  Engineering Introduction  Ideally  Looks good  Fast simulation  Looks good?  Look plausible  Doesn’t need to be exactly.

By Mary Hudachek-Buswell. Overview Atmospheric Turbulence Blur.

A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware Nolan Goodnight Cliff Woolley Gregory Lewin David Luebke Greg Humphreys.

A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware Nolan Goodnight Cliff Woolley Gregory Lewin David Luebke Greg Humphreys.

Finite Differences Finite Difference Approximations  Simple geophysical partial differential equations  Finite differences - definitions  Finite-difference.

Advanced Computer Graphics Spring 2014 K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology.

Multigrid for Nonlinear Problems Ferien-Akademie 2005, Sarntal, Christoph Scheit FAS, Newton-MG, Multilevel Nonlinear Method.

MATH 250 Linear Equations and Matrices

Linear Systems Iterative Solutions CSE 541 Roger Crawfis.

Large Steps in Cloth Simulation - SIGGRAPH 98 박 강 수박 강 수.

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M

Elliptic PDEs and the Finite Difference Method

Parallel Solution of the Poisson Problem Using MPI

Case Study in Computational Science & Engineering - Lecture 5 1 Iterative Solution of Linear Systems Jacobi Method while not converged do { }

Lecture 21 MA471 Fall 03. Recall Jacobi Smoothing We recall that the relaxed Jacobi scheme: Smooths out the highest frequency modes fastest.

Discretization Methods Chapter 2. Training Manual May 15, 2001 Inventory # Discretization Methods Topics Equations and The Goal Brief overview.

Discretization for PDEs Chunfang Chen,Danny Thorne Adam Zornes, Deng Li CS 521 Feb., 9,2006.

Linear Algebra Operators for GPU Implementation of Numerical Algorithms J. Krüger R. Westermann computer graphics & visualization Technical University.

Geometry processing on GPUs Jens Krüger Technische Universität München.

Monte Carlo Linear Algebra Techniques and Their Parallelization Ashok Srinivasan Computer Science Florida State University

Dynamic Geometry Displacement Jens Krüger Technische Universität München.

Programming assignment # 3 Numerical Methods for PDEs Spring 2007 Jim E. Jones.

CSE 872 Dr. Charles B. Owen Advanced Computer Graphics1 Water Computational Fluid Dynamics Volumes Lagrangian vs. Eulerian modelling Navier-Stokes equations.

1 SYSTEM OF LINEAR EQUATIONS BASE OF VECTOR SPACE.

Monte Carlo Linear Algebra Techniques and Their Parallelization Ashok Srinivasan Computer Science Florida State University

Parallel Direct Methods for Sparse Linear Systems

Linear Algebra review (optional)

CUDA Interoperability with Graphical Environments

Chapter 7 Matrix Mathematics

Jens Krüger Technische Universität München

Solving Linear Systems Ax=b

© Fluent Inc. 1/10/2018L1 Fluids Review TRN Solution Methods.

Lecture 19 MA471 Fall 2003.

21th Lecture - Advection - Diffusion

Introduction to Multigrid Method

Matlab tutorial course

Numerical Algorithms • Parallelizing matrix multiplication

3D Image Segmentation and Multigrid Method

Objective Numerical methods Finite volume.

Introduction to Fluid Dynamics & Applications

University of Virginia

Linear Algebra review (optional)

Ph.D. Thesis Numerical Solution of PDEs and Their Object-oriented Parallel Implementations Xing Cai October 26, 1998.

Ax = b Methods for Solution of the System of Equations (ReCap):

Presentation transcript:

Jens Krüger Technische Universität München Linear Algebra on GPUs Jens Krüger Technische Universität München

Why LA on GPUs? Why should we care about Linear Algebra at all? Use LA to solve PDEs solving PDEs can increase realism for VR, Education, Simulations, Games, …

Why Linear Algebra on GPUs? … and why do it on the GPU? The GPU is a fast streaming processor LA operations are easily “streamable” The result of the computation is already on the GPU and ready for display

Getting started … Computer graphics applications GPU as workhorse Visual simulation Visual computing Education and Training Basis linear algebra operators General linear algebra package GPU as workhorse for numerical computations High bandwidth Parallel computing Programmable GPUs

Representation Vector representation 2D textures best we can do Per-fragment vs. per-vertex operations High texture memory bandwidth Read-write access, dependent fetches 1 N 1 N

Representation (cont.) Dense Matrix representation treat a dense matrix as a set of column vectors again, store these vectors as 2D textures Matrix N i N Vectors ... N 1 i N 2D-Textures ... 1 i N

Representation (cont.) Banded Sparse Matrix representation treat a banded matrix as a set of diagonal vectors Matrix 2 Vectors N 1 2 i 2 2D-Textures 1 2 N N

Representation (cont.) Banded Sparse Matrix representation combine opposing vectors to save space i Matrix 2 Vectors N 1 2 2 2D-Textures 1 2 N N-i N

Operations Vector-Vector Operations Reduced to 2D texture operations Coded in vertex/fragment shaders return tex0 OP tex1

Operations (cont.) Vector-Vector Operations Reduce operation for scalar products original Texture ... st 1 pass ... 2 pass nd ... Reduce m x n region in fragment shader

Operations (cont.) In depth example: Vector / Banded-Matrix Multiplication A b x =

Example (cont.) Vector / Banded-Matrix Multiplication A b A b x =

Example (cont.) Compute the result in 2 Passes Pass 1: b A x = multiply

Example (cont.) Compute the result in 2 Passes Pass 2: b A b‘ x = shift x multiply

Representation (cont.) Random sparse matrix representation Textures do not work Splitting yields highly fragmented textures Difficult to find optimal partitions Idea: encode only non-zero entries in vertex arrays Column as TexCoord 1-4 N Matrix Result Vector Matrix = × Values as TexCoord 0 Vertex Array 1 Vertex Array 2 Vector Result Tex0 Pos Pos Tex1-4 Tex0 Pos Pos Tex1-4 Tex1-4 = Row Index as Position

Building a Framework Presented so far: representations on the GPU for single float values vectors matrices dense banded random sparse operations on these representations add, multiply, reduce, … upload, download, clear, clone, …

Building a Framework Example: CG Encapsulate into Classes for more complex algorithms Example use: Conjugate Gradient Method, complete source: void clCGSolver::solveInit() { m_clMatrix->matrixVectorOp(CL_SUB,m_clvX,m_clvB,m_clvR); // R = A*x-b m_clvR->addVector(m_clvR,m_clvR,-1.0f,0.0f); // R = -R m_clvR->addVector(m_clvR,m_clvP,1.0f,0.0f); // P = R m_clvR->reduceAdd(m_clvR, clfRho); // rho = sum(R*R); } void clCGSolver::solveIteration() { m_clMatrix->matrixVectorOp(CL_NULL,m_clvP,NULL,m_clvQ); // Q = Ap; m_clvP->reduceAdd(m_clvQ,clfTemp); // temp = sum(P*Q); clfRho->divZ(clfTemp,clfAlpha); // alpha = rho/temp; m_clvX->addVector(m_clvP,m_clvX,1,clfAlpha); // X = X + alpha*P m_clvR->subtractVector(m_clvQ,m_clvR,1,clfAlpha); // R = R - alpha*Q m_clvR->reduceAdd(m_clvR,clfNewRho); // newrho = sum(R*R); clfNewRho->divZ(clfRho,clfBeta); // beta = newrho/rho m_clvR->addVector(m_clvP,m_clvP,1,clfBeta); // P = R+beta*P; clFloat *temp; temp=clfNewRho; clfNewRho=clfRho; clfRho=temp; // swap rho and newrho pointer void clCGSolver::solve(int maxiter) { solveInit(); for (int i = 0;i< maxiter;i++) solveIteration(); int clCGSolver::solve(float rhoTresh, int maxiter) { solveInit(); clfRho->clone(clfNewRho); for (int i = 0;i< maxiter && clfNewRho.getData() > rhoTresh;i++) solveIteration(); return i;

Building a Framework Example: Jacobi The Jacobi method for a problem can be expressed with matrices as follows: U D L A b x + = × where ( ) 1 k x U L b D × + - = If the Matrix is stored in the diagonal form the matrix D-1 can be computed on the fly by inverting the diagonal elements during the vector-vector product. So one Jacobi step becomes one matrix-vector product, one vector-vector product and one vector subtract.

Example 1 2D Waves (explicit) usually you would compute: you need to write a custom shader for this filter think about this as a matrix-vector operation!

Example 2 2D Waves (implicit) 2D wave equation Finite difference discretization Implicit Crank-Nicholson scheme Key Idea: Rewrite as Matrix-Vector Product

Example 2: 2D wave equation (cont.)

Navier Stokes on GPUs

The Equations ) ( Re 1 = Ñ - + × V div p f v t u d The Navier-Stokes Equations for 2D: Advection ) ( Re 1 2 = Ñ - + × V div p f v t u y x d Diffusion Pressure External Forces Zero Divergence

NSE Discretization ( ) Diffusion Advection Pressure y p g v x uv t ¶ - + ÷ ø ö ç è æ = 2 Re 1 Diffusion Advection Pressure ( ) 2 1 , y v x j i d - + = ú û ù ê ë é ¶ ( ) ÷ ø ö ç è æ - + = ú û ù ê ë é ¶ 2 1 , j i v u x uv d a y p j i d , 1 - = ú û ù ê ë é ¶ +

Navier-Stokes Equations (cont.) Rewrite the Navier Stokes Equations where now F and G can be computed ( ) 1 , - = + p y t G v j i d x F u ( ) ; Re 1 , 2 ÷ ø ö ç è æ + ú û ù ê ë é ¶ - = y j i g v x uv t G u F

Navier-Stokes Equations (cont.) Problem: Pressure is still unknown! From derive: ..to get this Poisson Equation: ( ) 1 , - = + p x t F u j i d ( ) 1 , - = + p y t G v j i d ) ( = V div ¶ u t + 1 ¶ v t + 1 ¶ G t ¶ 2 p t + 1 ¶ F t ¶ 2 p t + 1 = + = - D t + - D t ¶ x ¶ y ¶ x ¶ x 2 ¶ y ¶ y 2 ( + ) - ( + ) + ( + ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) p n 1 2 p n 1 p n 1 p n + 1 - 2 p n + 1 + p n + 1 æ 1 F n - F n G n - G n ö i + 1 , j i , j i - 1 , j + i , j + 1 i , j i , j - 1 = ç i , j i - 1 , j + i , j i , j - 1 ÷ ( ) ( ) ç ÷ d x 2 d y 2 d t d è x d y ø

Navier-Stokes Equations (cont.) The basic algorithm: Compute F and G add external forces advect diffuse solve the Poisson equation update velocities easy  semi-lagrange [Stam 1999] explicit use the CG solver subtract pressure gradient

Multigrid on GPUs

Multigrid “in english” do a few Jacobi/Gauss-Seidel iterations on the fine grid Jacobi/G-S eliminate high frequencies in the error conjugate gradient does not have this property !!! compute the residuum of the last approximaton propagate this residuum to the next coarser grid can be done by the means of a matrix multiplication solve the coarser grid for the absolute error for the solution you can use another multigrid step backpropagate the error to the finer grid can be done by another matrix multiplication (transposed matrix from 3.) use the error to correct the first approximation do another few Jacobi/Gauss-Seidel iterations to remove noise introduced by the propagation steps

( ) Multigrid “in greek” { x b = × ¢ - + 4 3 1 h A e I r 2 Consider this problem: x is the solution vector of a set of linear equations this equation holds for current approximation x’ with the error e rearranging leads to the residual equation with residuum r now multiply both sides with a non quadratic interpolation matrix and replace the error with an error times the transposed interpolation matrix Let A2h be the product of the Interpolation Matrix, A and the transposed Interpolation matrix. Finally we end up with a new set of linear equations with only half the size in every dimension of the old one. We solve this set for the error at this grid level. Propagating the error the above steps we can derive the error for the large system and use it to correct out approximation ( ) { h A e I r x b 2 = × ¢ - + 4 3 1

Multigrid (cont.) „fine grid“, „coarse grid“ only makes sense if the problem to solve corresponds to a grid  this is the case in the finite difference methods described before  need to find an “interpolation matrix” for the propagation step to generate the coarser grid (for instance simple linear interpolation) need an “extrapolation matrix” to move from the coarse to the fine grid the coarse grid matrices can be pre-computed

Multigrid on GPUs Observation: you only need matrix-vector operations and a “Jacobi smoother” to do multigrid to put it on the GPU simply use the matrix-vector operations from the framework Improvement: to do the interpolation an extrapolation steps we can use the fast bilinear interpolation hardware of the GPU instead of a vector-matrix multiplication

Multigrid on GPUs (cont.) We can embed the new “rescale” easily into the vector representation by simply rendering one textured quad. Vector Texture Render Target Interpolation Extrapolation Render Target Vector Texture

Selected References Chorin, A.J., Marsden, J.E. A Mathematical Introduction to Fluid Mechanics. 3rd ed. Springer. New York, 1993 Briggs, Henson, McCormick A Multigrid Tutorial, 2nd ed. siam, ISBN 0-89871-462-1 Acton Numerical Methods that Work, The Mathematical Association of America ISBN 0-88385-450-3 Krüger, J. Westermann, R. Linear algebra operators for GPU implementation of numerical algorithms, In Proceedings of SIGGRAPH 2003, ACM Press / ACM SIGGRAPH Bolz , J., Farmer, I., Grinspun, E., Schröder, P. Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid, In Proceedings of SIGGRAPH 2003, ACM Press / ACM SIGGRAPH Hillesland, K. E. Nonlinear Optimization Framework for Image-Based Modeling on Programmable Graphics Hardware, In Proceedings of SIGGRAPH 2003, ACM Press / ACM SIGGRAPH Fedkiw, R., Stam, J. and Jensen, H.W. Visual Simulation of Smoke. In Proceedings of SIGGRAPH 2001, ACM Press / ACM SIGGRAPH. 2001. Stam, J. Stable Fluids. In Proceedings of SIGGRAPH 1999, ACM Press / ACM SIGGRAPH, 121-128. 1999. Harris, M., Coombe, G., Scheuermann, T., and Lastra, A. Physically-Based Visual Simulation on Graphics Hardware.. Proc. 2002 SIGGRAPH / Eurographics Workshop on Graphics Hardware 2002.