Jens Krüger Technische Universität München

Presentation transcript:

Linear Algebra on GPUs. Jens Krüger, Technische Universität München

Linear algebra? Why are we interested in linear algebra? It is THE machinery to solve PDEs. PDEs are at the core of many graphics applications: physics-based simulation, animation, mesh fairing, …

LA on GPUs? … and why put LA on the GPU? A perfect couple: GPUs are fast stream processors, and many LA operations are “streamable” … which goes hand in hand: the solution is already on the GPU and ready for display.

Getting started … Computer graphics applications: visual simulation, visual computing, education and training. Basic linear algebra operators; general linear algebra package. GPU as workhorse for numerical computations: high bandwidth, parallel computing, programmable GPUs.

Internal affairs. Vector representation. Per-pixel vs. per-vertex operations: 6 gigapixels/second vs. 0.7 gigavertices/second; efficient texture memory cache; texture read-write access. Textures are the best we can do. 2D textures are even better. 2D RGBA textures really rock. [Figure: a vector with elements 1 … N stored as a 2D texture.]
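The idea of storing a vector in a 2D RGBA texture can be sketched on the CPU. This is only an illustration: the `RgbaTexture` struct, the function names, and the layout (four consecutive elements per texel, texels filled row by row) are assumptions, since the slide does not give the exact channel layout used in the framework.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side stand-in for a square RGBA float texture.
struct RgbaTexture {
    std::size_t side;          // the texture is side x side texels
    std::vector<float> texels; // 4 floats (R, G, B, A) per texel, row by row
};

// Pack a vector into the smallest square RGBA texture that can hold it.
// Channels that are not needed are padded with zeros.
RgbaTexture packVector(const std::vector<float>& v) {
    const std::size_t texelCount = (v.size() + 3) / 4; // 4 elements per texel
    std::size_t side = 1;
    while (side * side < texelCount) ++side;
    RgbaTexture tex{side, std::vector<float>(side * side * 4, 0.0f)};
    for (std::size_t i = 0; i < v.size(); ++i) tex.texels[i] = v[i];
    return tex;
}

// Read element i back by computing its texel coordinate (x, y) and channel c.
float fetch(const RgbaTexture& tex, std::size_t i) {
    const std::size_t texel = i / 4, c = i % 4;
    const std::size_t x = texel % tex.side, y = texel / tex.side;
    assert(y < tex.side);
    return tex.texels[(y * tex.side + x) * 4 + c];
}
```

With this layout a vector of N elements needs only about N/4 texels, which is one plausible reading of why RGBA textures are preferred on the slide.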

Representation (cont.) Dense matrix representation: treat a dense matrix as a set of column vectors. Again, store these vectors as 2D textures. [Figure: N x N matrix → N column vectors (1 … i … N) → N 2D textures (1 … i … N).]

Representation (cont.) Banded sparse matrix representation: treat a banded matrix as a set of diagonal vectors. Combine opposing vectors to save space. [Figure: N x N banded matrix → 2 vectors → 2 2D textures; labels 1, 2, i, N-i, N.]
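One way to read "combine opposing vectors" is that the upper diagonal with offset k (N-k entries) and the lower diagonal with offset k-N (k entries) together fill exactly one vector of length N. The sketch below extracts such a combined diagonal from a dense matrix. The names `Matrix` and `wrappedDiagonal` are hypothetical, and the wrap-around indexing is an assumption about the scheme, not something the slide states.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>; // dense, row-major, N x N

// Combined ("wrapped") diagonal k, 0 <= k < N: entry r holds A[r][(r + k) mod N].
// Rows with r + k < N come from the upper diagonal k (N - k entries);
// the remaining k rows come from the lower diagonal k - N.
std::vector<float> wrappedDiagonal(const Matrix& a, std::size_t k) {
    const std::size_t n = a.size();
    assert(k < n);
    std::vector<float> d(n);
    for (std::size_t r = 0; r < n; ++r) {
        assert(a[r].size() == n);
        d[r] = a[r][(r + k) % n];
    }
    return d;
}
```

Each combined diagonal has the same length as an ordinary vector, so it can be stored in the same kind of 2D texture.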

Operations 1. Vector-vector operations: reduced to 2D texture operations, coded in pixel shaders. Example: Vector1 + Vector2 → Vector3. [Figure: Vector 1 (TexUnit 0) and Vector 2 (TexUnit 1) are the inputs; a static quad is rendered; the vertex shader is a pass-through; the pixel shader is: return tex0 + tex1; Vector 3 is the render texture.]

Operations 2 (reduce). Reduce operation for scalar products. [Figure: original texture → 1st pass → 2nd pass → …] Reduce an m x n region in the fragment shader.
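A CPU emulation can make the multi-pass reduce concrete. The sketch below assumes a square texture whose side is a power of two and a fixed 2 x 2 region per pass; the slide allows a general m x n region, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One reduction pass: every output texel sums a 2x2 block of the input.
// The input is side x side (side even); the output is (side/2) x (side/2).
std::vector<float> reducePass(const std::vector<float>& in, std::size_t side) {
    assert(side % 2 == 0 && in.size() == side * side);
    const std::size_t half = side / 2;
    std::vector<float> out(half * half);
    for (std::size_t y = 0; y < half; ++y)
        for (std::size_t x = 0; x < half; ++x)
            out[y * half + x] = in[(2 * y) * side + 2 * x]
                              + in[(2 * y) * side + 2 * x + 1]
                              + in[(2 * y + 1) * side + 2 * x]
                              + in[(2 * y + 1) * side + 2 * x + 1];
    return out;
}

// Scalar product of two vectors stored as side x side textures.
float dotByReduction(const std::vector<float>& a, const std::vector<float>& b,
                     std::size_t side) {
    assert(side > 0 && (side & (side - 1)) == 0); // power of two
    assert(a.size() == side * side && b.size() == side * side);
    std::vector<float> cur(side * side);
    for (std::size_t i = 0; i < cur.size(); ++i) cur[i] = a[i] * b[i]; // element-wise multiply
    while (side > 1) {                 // log2(side) passes shrink the texture to 1x1
        cur = reducePass(cur, side);
        side /= 2;
    }
    return cur[0];
}
```

The final 1 x 1 result corresponds to the "single float" texture discussed on the next slide.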

The “single float” on GPUs. Some operations generate single float values, e.g. reduce. Read-back to main memory is slow → keep single floats on the GPU as 1x1 textures.

Operations (cont.) Matrix-vector operations: split them up into vector-vector operations. [Figure: banded matrix → 2 vectors → 2 2D textures; dense N x N matrix → N vectors → N 2D textures.]
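For the dense case, the split into vector-vector operations can be sketched as a sum of scaled column vectors, y = x_1·col_1 + … + x_N·col_N. This is a CPU illustration with a hypothetical function name; how the framework schedules these passes on the GPU is not shown on the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Multiply a dense matrix, given as a list of its column vectors, by x.
// Each outer iteration is one "scaled vector add", the kind of
// vector-vector operation that maps to a single render pass.
std::vector<float> multiplyByColumns(const std::vector<std::vector<float>>& columns,
                                     const std::vector<float>& x) {
    assert(columns.size() == x.size());
    const std::size_t rows = columns.empty() ? 0 : columns[0].size();
    std::vector<float> y(rows, 0.0f);
    for (std::size_t i = 0; i < columns.size(); ++i) {
        assert(columns[i].size() == rows);
        for (std::size_t r = 0; r < rows; ++r)
            y[r] += x[i] * columns[i][r];
    }
    return y;
}
```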

Operations. In-depth example: vector / banded-matrix multiplication, A · b = x.

Example (cont.) Vector / banded-matrix multiplication. [Figure: A · b = x.]

Example (cont.) Compute the result in 2 passes. [Figure: pass 1 uses b, pass 2 uses b′; A · b = x.]
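The banded product can be sketched on the CPU as one multiply-add per stored diagonal, each against a shifted copy of b. The storage convention here (full-length diagonals keyed by offset, zero-padded at the ends) and the names `BandedMatrix` and `multiplyBanded` are assumptions for illustration; the figure's split into two passes is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Diagonals keyed by offset k; entry r of diagonal k holds A[r][r + k].
// Every diagonal is stored at full length N and padded with zeros where
// r + k falls outside the matrix.
using BandedMatrix = std::map<int, std::vector<float>>;

std::vector<float> multiplyBanded(const BandedMatrix& a, const std::vector<float>& b) {
    const long n = static_cast<long>(b.size());
    std::vector<float> x(b.size(), 0.0f);
    for (const auto& entry : a) {
        const long k = entry.first;
        const std::vector<float>& diag = entry.second;
        assert(static_cast<long>(diag.size()) == n);
        for (std::size_t r = 0; r < b.size(); ++r) {
            const long c = static_cast<long>(r) + k; // column touched by this diagonal in row r
            if (c >= 0 && c < n) x[r] += diag[r] * b[static_cast<std::size_t>(c)];
        }
    }
    return x;
}
```

The cost is proportional to N times the number of non-zero diagonals, rather than N squared for the dense case.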

Building a Framework. Presented so far: representations on the GPU for single float values, vectors, and matrices (dense, banded, random sparse; see SIGGRAPH ’03). Operations on these representations: add, multiply, reduce, …; upload, download, clear, clone, …

Framework Classes (UML)

Framework Example: CG. Encapsulate into classes for more complex algorithms. Example use: Conjugate Gradient method, complete source:

void clCGSolver::solveInit() {
    Matrix->matrixVectorOp(CL_SUB,X,B,R);       // R = A*x-b
    R->multiply(-1);                            // R = -R
    R->clone(P);                                // P = R
    R->reduceAdd(R, Rho);                       // rho = sum(R*R);
}

void clCGSolver::solveIteration() {
    Matrix->matrixVectorOp(CL_NULL,P,NULL,Q);   // Q = Ap;
    P->reduceAdd(Q,Temp);                       // temp = sum(P*Q);
    Rho->div(Temp,Alpha);                       // alpha = rho/temp;
    X->addVector(P,X,1,Alpha);                  // X = X + alpha*P
    R->subtractVector(Q,R,1,Alpha);             // R = R - alpha*Q
    R->reduceAdd(R,NewRho);                     // newrho = sum(R*R);
    NewRho->divZ(Rho,Beta);                     // beta = newrho/rho
    R->addVector(P,P,1,Beta);                   // P = R+beta*P;
    clFloat *temp; temp=NewRho; NewRho=Rho; Rho=temp; // swap rho and newrho pointers
}

void clCGSolver::solve(int maxI) {
    solveInit();
    for (int i = 0;i< maxI;i++) solveIteration();
}

int clCGSolver::solve(float rhoTresh, int maxI) {
    solveInit();
    Rho->clone(NewRho);
    int i;                                      // declared outside the loop so it can be returned
    for (i = 0;i< maxI && NewRho->getData() > rhoTresh;i++) solveIteration();
    return i;
}
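For comparison, the same iteration can be written as a plain CPU routine. This is a reference sketch of the textbook conjugate gradient method, not the framework's code; `Vec`, `dot`, `conjugateGradient`, and the callback `applyA` are hypothetical names, and double precision is used for clarity.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

double dot(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Solve A x = b for a symmetric positive definite A, given only a function
// that applies A to a vector. x holds the initial guess on entry and the
// approximate solution on exit. Returns the number of iterations performed.
int conjugateGradient(const std::function<Vec(const Vec&)>& applyA,
                      const Vec& b, Vec& x, double rhoThresh, int maxIter) {
    assert(b.size() == x.size());
    Vec r = applyA(x);
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = b[i] - r[i]; // r = b - A x
    Vec p = r;
    double rho = dot(r, r);
    int it = 0;
    for (; it < maxIter && rho > rhoThresh; ++it) {
        const Vec q = applyA(p);                    // q = A p
        const double alpha = rho / dot(p, q);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];                   // x = x + alpha p
            r[i] -= alpha * q[i];                   // r = r - alpha q
        }
        const double newRho = dot(r, r);
        const double beta = newRho / rho;
        for (std::size_t i = 0; i < p.size(); ++i) p[i] = r[i] + beta * p[i];
        rho = newRho;
    }
    return it;
}
```

Each line of the loop corresponds to one vector, matrix-vector, or reduce operation of the kind listed on the previous slides, which is why the method maps well onto such a framework.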

Example 1: 2D waves (explicit). Finite difference discretization. You could write a custom shader for this filter. Think about this as a matrix-vector operation.
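The update rule can be read off the matrix coefficients used for this example: β on the four neighbour diagonals, 2 − 4β on the main diagonal, and next = matrix*current − last. A sketch in LaTeX follows. The symbol x for the height at grid point (i, j) and time step t is an assumption, and β is presumably c²Δt²/Δx² for wave speed c, which the transcript does not state.

```latex
x_{i,j}^{t+1} = \beta \left( x_{i-1,j}^{t} + x_{i+1,j}^{t} + x_{i,j-1}^{t} + x_{i,j+1}^{t} \right)
              + (2 - 4\beta)\, x_{i,j}^{t} - x_{i,j}^{t-1}
```

Written this way, every new value is a fixed linear combination of five current values minus the previous value, which is what makes the matrix-vector view possible.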

2D Waves (cont.)

One-time matrix initialization:

for (i=sY;i<sX*sY;i++) data[i] = beta;                      // setup diagonal-sY
matrix->getRow(sX*(sY-1))->setData(data);
for (i=0;i<sX*sY;i++) data[i] = (i%sX) ? beta : 0;          // setup diagonal-1
matrix->getRow(sX*sY-1)->setData(data);
for (i=0;i<sX*sY;i++) data[i] = 2-4*beta;                   // setup diagonal
matrix->getRow(sX*sY)->setData(data);
for (i=0;i<sX*sY;i++) data[i] = ((i+1)%sX) ? beta : 0;      // setup diagonal+1
matrix->getRow(sX*sY+1)->setData(data);
for (i=0;i<sX*(sY-1);i++) data[i] = beta;                   // setup diagonal+sY
matrix->getRow(sX*(sY+1))->setData(data);

Per-frame iteration:

clMatrix->matrixVectorOp(CL_SUB,clCurrent,clLast,clNext);   // next = matrix*current-last;
clLast->copyVector(clCurrent);                              // save for next iteration
clCurrent->copyVector(clNext);                              // save for next iteration
cluNext->unpack(clNext);                                    // unpack for rendering
renderHF(cluNext->m_pVectorTexture);                        // render as heightfield
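The per-frame matrix-vector product is equivalent to applying a five-point stencil to the grid. The CPU sketch below shows that equivalent "filter" form. The function name `stepWave` and the row-by-row grid layout are assumptions, and neighbours outside the grid are treated as zero, mirroring the zero entries written into the off-diagonals at row boundaries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One explicit time step on an sX x sY height field stored row by row:
// next = beta * (sum of the four neighbours) + (2 - 4*beta) * current - last.
std::vector<float> stepWave(const std::vector<float>& current,
                            const std::vector<float>& last,
                            std::size_t sX, std::size_t sY, float beta) {
    assert(current.size() == sX * sY && last.size() == sX * sY);
    std::vector<float> next(sX * sY);
    for (std::size_t y = 0; y < sY; ++y) {
        for (std::size_t x = 0; x < sX; ++x) {
            const std::size_t i = y * sX + x;
            float sum = 0.0f;                          // neighbours outside the grid count as 0
            if (x > 0)      sum += current[i - 1];
            if (x + 1 < sX) sum += current[i + 1];
            if (y > 0)      sum += current[i - sX];
            if (y + 1 < sY) sum += current[i + sX];
            next[i] = beta * sum + (2.0f - 4.0f * beta) * current[i] - last[i];
        }
    }
    return next;
}
```

Expressing the step as a banded matrix instead of a hand-written filter lets the same framework operators be reused, at the price of storing five diagonals.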

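Example 2: 2D waves (implicit). Key idea: use a different time discretization (e.g. Crank-Nicolson). This results in a system of linear equations, solved iteratively using CG: $(4\alpha+1)\,x_{i,j}^{t+1} - \alpha\,(x_{i-1,j}^{t+1} + x_{i+1,j}^{t+1} + x_{i,j-1}^{t+1} + x_{i,j+1}^{t+1}) = c_{i,j}^{t}$. [Figure: grid points numbered 1 … 9, unknown vector x, right-hand side c.]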

2D Waves (cont.)

One-time matrix initialization:

for (i=sY;i<sX*sY;i++) data[i] = -alpha;                    // setup diagonal-sY
matrix->getRow(sX*(sY-1))->setData(data);
for (i=0;i<sX*sY;i++) data[i] = (i%sX) ? -alpha : 0;        // setup diagonal-1
matrix->getRow(sX*sY-1)->setData(data);
for (i=0;i<sX*sY;i++) data[i] = 4*alpha+1;                  // setup diagonal
matrix->getRow(sX*sY)->setData(data);
for (i=0;i<sX*sY;i++) data[i] = ((i+1)%sX) ? -alpha : 0;    // setup diagonal+1
matrix->getRow(sX*sY+1)->setData(data);
for (i=0;i<sX*(sY-1);i++) data[i] = -alpha;                 // setup diagonal+sY
matrix->getRow(sX*(sY+1))->setData(data);

Per-frame iteration:

cluRHS->computeRHS(cluLast, cluCurrent);                    // generate c(i,j,t)
clRHS->repack(cluRHS);                                      // encode into RGBA
iSteps = pCGSolver->solve(iMaxSteps);                       // solve using CG
cluLast->copyVector(cluCurrent);                            // save for next iteration
clNext->unpack(cluCurrent);                                 // unpack for rendering
renderHF(cluCurrent->m_pVectorTexture);                     // render as heightfield

Demos

The End. Thank you! Questions? For more information, browse to: http://wwwcg.in.tum.de/GPU and http://www.gpgpu.org