1
Linear Algebra on GPUs
Jens Krüger, Technische Universität München
2
Linear algebra?
Why are we interested in linear algebra?
It is THE machinery to solve PDEs
PDEs are at the core of many graphics applications
Physics-based simulation, animation, mesh fairing, …
3
LA on GPUs?
… and why put LA on the GPU?
A perfect couple …
GPUs are fast stream processors, and many LA operations are “streamable”
… which goes hand in hand
The solution is already on the GPU and ready for display
4
Getting started …
Computer graphics applications: visual simulation, visual computing, education and training
Basic linear algebra operators; general linear algebra package
GPU as workhorse for numerical computations: high bandwidth, parallel computing, programmable GPUs
6
Internal affairs: Vector representation
Per-pixel vs. per-vertex operations: 6 gigapixels/second vs. 0.7 gigavertices/second
Efficient texture memory cache
Texture read-write access
Textures are the best we can do
2D textures are even better
2D RGBA textures really rock
[Figure: vector with elements 1 … N laid out as a 2D texture]
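A minimal CPU-side sketch of how a vector of N floats could be addressed inside a square 2D RGBA texture. It assumes row-major texel order with four consecutive vector elements per texel; the slides do not state the exact layout, and the names `TexelCoord`, `packedTextureSide`, and `texelFor` are hypothetical, not part of the framework presented here.

```cpp
#include <cstddef>

// Position of one vector element inside a square RGBA texture.
struct TexelCoord {
    std::size_t x;        // texel column
    std::size_t y;        // texel row
    std::size_t channel;  // 0 = R, 1 = G, 2 = B, 3 = A
};

// Smallest side length s such that an s x s RGBA texture holds n floats.
std::size_t packedTextureSide(std::size_t n) {
    const std::size_t texels = (n + 3) / 4;  // four floats per texel, rounded up
    std::size_t side = 0;
    while (side * side < texels) ++side;
    return side;
}

// Map element index i to its texel and channel
// (assumed layout: row-major texels, consecutive elements share a texel).
TexelCoord texelFor(std::size_t i, std::size_t side) {
    const std::size_t texel = i / 4;
    return TexelCoord{texel % side, texel / side, i % 4};
}
```

Under this assumed layout a vector of 1024 × 1024 elements fits into a 512 × 512 RGBA texture, so one rendered fragment processes four vector elements at once.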
7
Representation (cont.)
Dense matrix representation
Treat a dense matrix as a set of column vectors
Again, store these vectors as 2D textures
[Figure: N × N matrix → N column vectors (1 … i … N) → N 2D textures]
8
Representation (cont.)
Banded sparse matrix representation
Treat a banded matrix as a set of diagonal vectors
Combine opposing vectors to save space
[Figure: banded N × N matrix → 2 vectors → 2 2D textures; diagonals i and N-i are combined]
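The space saving from combining opposing diagonals can be sketched on the CPU as follows. Diagonal +k of an N × N matrix has N-k entries and the opposing diagonal -(N-k) has k entries, so together they fill exactly one vector of length N. This is an illustrative sketch under that reading of the slide; `Dense` and `packedDiagonal` are hypothetical names, and the matrix is assumed square with k < N.

```cpp
#include <cstddef>
#include <vector>

using Dense = std::vector<std::vector<float>>;

// Return one length-N vector holding diagonal +k (rows 0 .. N-k-1)
// followed by the opposing diagonal -(N-k) (rows N-k .. N-1).
std::vector<float> packedDiagonal(const Dense& a, std::size_t k) {
    const std::size_t n = a.size();
    std::vector<float> d(n);
    for (std::size_t r = 0; r < n; ++r) {
        d[r] = a[r][(r + k) % n];  // the modulo wraps into the opposing diagonal
    }
    return d;
}
```

For a 4 × 4 matrix and k = 1, the result holds the three entries of the first upper diagonal followed by the single entry in the lower-left corner.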
9
Operations 1: Vector-Vector Operations
Reduced to 2D texture operations
Coded in pixel shaders
Example: Vector1 + Vector2 → Vector3
[Figure: a static quad is rendered with Vector 1 bound to TexUnit 0 and Vector 2 bound to TexUnit 1; the vertex shader is a pass-through, the pixel shader returns tex0 + tex1, and the render texture receives Vector 3]
10
Operations 2 (reduce)
Reduce operation for scalar products
Reduce an m × n region in the fragment shader
[Figure: original texture → 1st pass → 2nd pass → …]
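The multi-pass reduce can be emulated on the CPU to see how the texture shrinks from pass to pass. The sketch below assumes a square texture whose side is a power of two and a fixed 2 × 2 region per pass; on the GPU the inner loop body would be the fragment shader. `reducePass` and `reduceSum` are hypothetical names, not the framework's API.

```cpp
#include <cstddef>
#include <vector>

// One pass: every output texel sums a 2 x 2 block of the input (side must be even).
std::vector<float> reducePass(const std::vector<float>& in, std::size_t side) {
    const std::size_t half = side / 2;
    std::vector<float> out(half * half);
    for (std::size_t y = 0; y < half; ++y) {
        for (std::size_t x = 0; x < half; ++x) {
            const std::size_t i = (2 * y) * side + 2 * x;  // top-left texel of the block
            out[y * half + x] = in[i] + in[i + 1] + in[i + side] + in[i + side + 1];
        }
    }
    return out;
}

// Repeat passes until a single value remains (side must be a power of two).
float reduceSum(std::vector<float> tex, std::size_t side) {
    while (side > 1) {
        tex = reducePass(tex, side);
        side /= 2;
    }
    return tex[0];
}
```

A scalar product would first multiply the two vectors component-wise and then reduce the result; a side length of s needs log2(s) passes with this block size.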
11
The “single float” on GPUs
Some operations generate single float values, e.g. reduce
Read-back to main memory is slow
Keep single floats on the GPU as 1x1 textures
12
Operations (cont.): Matrix-Vector Operations
Split it up into vector-vector operations
[Figure: banded matrix → 2 vectors → 2 2D textures; dense matrix → N vectors → N 2D textures]
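For the dense case, splitting a matrix-vector product into vector-vector operations can be sketched as a sum of scaled column vectors. This is a CPU illustration of the idea only; `Vec`, `addScaled`, and `denseMatVec` are hypothetical names, and how the framework actually schedules these passes is not shown on the slide.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;

// y += s * v: the kind of vector-vector operation one render pass would perform.
void addScaled(Vec& y, const Vec& v, float s) {
    for (std::size_t i = 0; i < y.size(); ++i) y[i] += s * v[i];
}

// columns[j] is column j of the matrix; the product A*x is the sum of x[j] * column j.
Vec denseMatVec(const std::vector<Vec>& columns, const Vec& x) {
    Vec y(columns.empty() ? 0 : columns[0].size(), 0.0f);
    for (std::size_t j = 0; j < columns.size(); ++j) addScaled(y, columns[j], x[j]);
    return y;
}
```

With N column vectors this takes N vector-vector passes, which is why the banded representation with only a few diagonals is so much cheaper.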
13
Operations: In-depth example
Vector / Banded-Matrix Multiplication
[Figure: banded matrix A times vector b]
14
Example (cont.)
Vector / Banded-Matrix Multiplication
[Figure: A · b = x]
15
Example (cont.)
Compute the result in 2 passes
[Figure: A · b = x, computed in Pass 1 and Pass 2 from b and b′]
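A CPU sketch of the banded product may help here: every stored diagonal contributes one multiply-add of the diagonal with a shifted read of b. For clarity the sketch keeps each diagonal separate instead of combining opposing diagonals, and it does not reproduce the exact two-pass split shown on the slide. `Diagonal` and `bandedMatVec` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

struct Diagonal {
    long offset;                // 0 = main diagonal, +k above it, -k below it
    std::vector<float> values;  // values[r] is the matrix entry in row r, column r + offset
};

// x = A * b, where A is given only by its non-zero diagonals.
std::vector<float> bandedMatVec(const std::vector<Diagonal>& diags,
                                const std::vector<float>& b) {
    const long n = static_cast<long>(b.size());
    std::vector<float> x(b.size(), 0.0f);
    for (const Diagonal& d : diags) {
        for (long r = 0; r < n; ++r) {
            const long c = r + d.offset;  // column read by this diagonal in row r
            if (c >= 0 && c < n) x[r] += d.values[r] * b[c];  // skip entries outside the matrix
        }
    }
    return x;
}
```

The cost is proportional to the number of diagonals times N, independent of how many zero entries the full matrix would contain.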
16
Building a Framework
Presented so far:
Representations on the GPU for
  Single float values
  Vectors
  Matrices: dense, banded, random sparse (see SIGGRAPH ’03)
Operations on these representations
  Add, multiply, reduce, …
  Upload, download, clear, clone, …
17
Framework Classes (UML)
18
Framework Example: CG
Encapsulate into classes for more complex algorithms
Example use: Conjugate Gradient method, complete source:

    void clCGSolver::solveInit() {
        Matrix->matrixVectorOp(CL_SUB,X,B,R);      // R = A*x-b
        R->multiply(-1);                           // R = -R
        R->clone(P);                               // P = R
        R->reduceAdd(R, Rho);                      // rho = sum(R*R)
    }

    void clCGSolver::solveIteration() {
        Matrix->matrixVectorOp(CL_NULL,P,NULL,Q);  // Q = A*P
        P->reduceAdd(Q,Temp);                      // temp = sum(P*Q)
        Rho->div(Temp,Alpha);                      // alpha = rho/temp
        X->addVector(P,X,1,Alpha);                 // X = X + alpha*P
        R->subtractVector(Q,R,1,Alpha);            // R = R - alpha*Q
        R->reduceAdd(R,NewRho);                    // newrho = sum(R*R)
        NewRho->divZ(Rho,Beta);                    // beta = newrho/rho
        R->addVector(P,P,1,Beta);                  // P = R + beta*P
        clFloat *temp; temp=NewRho; NewRho=Rho; Rho=temp;  // swap rho and newrho pointers
    }

    void clCGSolver::solve(int maxI) {
        solveInit();
        for (int i = 0;i< maxI;i++) solveIteration();
    }

    int clCGSolver::solve(float rhoTresh, int maxI) {
        solveInit();
        Rho->clone(NewRho);
        int i;  // declared outside the loop so it can be returned
        for (i = 0;i< maxI && NewRho->getData() > rhoTresh;i++) solveIteration();
        return i;
    }
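For checking results from the GPU solver, a plain CPU version of the same conjugate gradient iteration can be useful. The sketch below mirrors the steps in the listing above (rho, alpha, beta, and the updates of X, R, P) but uses a dense matrix and double precision for simplicity; `Vec`, `Mat`, `dot`, `matVec`, and `conjugateGradient` are hypothetical names and not part of the framework. It assumes a symmetric positive definite matrix.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // row-major dense matrix

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

Vec matVec(const Mat& a, const Vec& x) {
    Vec y(a.size(), 0.0);
    for (std::size_t r = 0; r < a.size(); ++r) y[r] = dot(a[r], x);
    return y;
}

// Solve A x = b; x holds the initial guess on entry and the solution on exit.
// Stops when rho = sum(r*r) drops to rhoThresh or after maxIter iterations.
// Returns the number of iterations performed.
int conjugateGradient(const Mat& a, const Vec& b, Vec& x, double rhoThresh, int maxIter) {
    Vec r = matVec(a, x);
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = b[i] - r[i];  // r = b - A*x
    Vec p = r;
    double rho = dot(r, r);
    int it = 0;
    while (it < maxIter && rho > rhoThresh) {
        const Vec q = matVec(a, p);            // q = A*p
        const double alpha = rho / dot(p, q);  // alpha = rho / sum(p*q)
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];              // x = x + alpha*p
            r[i] -= alpha * q[i];              // r = r - alpha*q
        }
        const double newRho = dot(r, r);
        const double beta = newRho / rho;      // beta = newrho / rho
        for (std::size_t i = 0; i < p.size(); ++i) p[i] = r[i] + beta * p[i];
        rho = newRho;
        ++it;
    }
    return it;
}
```

Unlike the GPU version, every scalar here lives in host memory; the framework instead keeps rho, alpha, and beta on the GPU as 1x1 textures to avoid read-backs.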
19
Example 1: 2D Waves (explicit)
Finite difference discretization:
You could write a custom shader for this filter
Think about this as a matrix-vector operation
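Written as a direct stencil instead of a matrix-vector product, one explicit time step could look like the sketch below. It uses the same weights as the matrix set up for this example (2 − 4·beta on the main diagonal, beta for the four neighbors) and subtracts the previous time step. Treating neighbors outside the grid as zero is an assumption that matches the zeroed diagonal entries at row ends; `waveStep` is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// One explicit time step on an sX x sY height field stored row by row.
// Neighbors outside the grid are treated as zero.
std::vector<float> waveStep(const std::vector<float>& current,
                            const std::vector<float>& last,
                            std::size_t sX, std::size_t sY, float beta) {
    std::vector<float> next(sX * sY);
    for (std::size_t y = 0; y < sY; ++y) {
        for (std::size_t x = 0; x < sX; ++x) {
            const std::size_t i = y * sX + x;
            float neighbors = 0.0f;
            if (x > 0) neighbors += current[i - 1];        // left
            if (x + 1 < sX) neighbors += current[i + 1];   // right
            if (y > 0) neighbors += current[i - sX];       // row above
            if (y + 1 < sY) neighbors += current[i + sX];  // row below
            next[i] = beta * neighbors + (2.0f - 4.0f * beta) * current[i] - last[i];
        }
    }
    return next;
}
```

Expressing the same update as a banded matrix with five diagonals lets the generic matrix-vector operator do the work, so no custom shader is needed.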
20
2D Waves (cont.)
One Time Matrix Initialization:

    for (i=sY;i<sX*sY;i++) data[i] = beta;                     // setup diagonal-sY
    matrix->getRow(sX*(sY-1))->setData(data);
    for (i=0;i<sX*sY;i++) data[i] = (i%sX) ? beta : 0;         // setup diagonal-1
    matrix->getRow(sX*sY-1)->setData(data);
    for (i=0;i<sX*sY;i++) data[i] = 2-4*beta;                  // setup diagonal
    matrix->getRow(sX*sY)->setData(data);
    for (i=0;i<sX*sY;i++) data[i] = ((i+1)%sX) ? beta : 0;     // setup diagonal+1
    matrix->getRow(sX*sY+1)->setData(data);
    for (i=0;i<sX*(sY-1);i++) data[i] = beta;                  // setup diagonal+sY
    matrix->getRow(sX*(sY+1))->setData(data);

Per Frame Iteration:

    clMatrix->matrixVectorOp(CL_SUB,clCurrent,clLast,clNext);  // next = matrix*current-last
    clLast->copyVector(clCurrent);                             // save for next iteration
    clCurrent->copyVector(clNext);                             // save for next iteration
    cluNext->unpack(clNext);                                   // unpack for rendering
    renderHF(cluNext->m_pVectorTexture);                       // render as heightfield
21
Example 2: 2D Waves (implicit)
Key idea
Use a different time discretization (e.g. Crank-Nicolson)
Results in a system of linear equations
Iterative solution using CG
[Figure: the resulting linear system over unknowns 1 … 9 with right-hand side c]
22
2D Waves (cont.)
One Time Matrix Initialization:

    for (i=sY;i<sX*sY;i++) data[i] = -alpha;                   // setup diagonal-sY
    matrix->getRow(sX*(sY-1))->setData(data);
    for (i=0;i<sX*sY;i++) data[i] = (i%sX) ? -alpha : 0;       // setup diagonal-1
    matrix->getRow(sX*sY-1)->setData(data);
    for (i=0;i<sX*sY;i++) data[i] = 4*alpha;                   // setup diagonal
    matrix->getRow(sX*sY)->setData(data);
    for (i=0;i<sX*sY;i++) data[i] = ((i+1)%sX) ? -alpha : 0;   // setup diagonal+1
    matrix->getRow(sX*sY+1)->setData(data);
    for (i=0;i<sX*(sY-1);i++) data[i] = -alpha;                // setup diagonal+sY
    matrix->getRow(sX*(sY+1))->setData(data);

Per Frame Iteration:

    cluRHS->computeRHS(cluLast, cluCurrent);   // generate c(i,j,t)
    clRHS->repack(cluRHS);                     // encode into RGBA
    iSteps = pCGSolver->solve(iMaxSteps);      // solve using CG
    cluLast->copyVector(cluCurrent);           // save for next iteration
    clNext->unpack(cluCurrent);                // unpack for rendering
    renderHF(cluCurrent->m_pVectorTexture);    // render as heightfield
23
Demos
24
The End
Thank you! Questions?
For more information, browse to: