Bisection and Twisted SVD on GPU

Presentation transcript:

Bisection and Twisted SVD on GPU, Lu He

Content: Background, Algorithm, Design, Experiments, Conclusion.

Introduction: Application Background. Math: pseudo-inverse of a matrix, homogeneous linear equations, least-squares minimization, low-rank approximation. Engineering: signal processing, computer vision, machine learning, information retrieval.

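Introduction: SVD. $A = U \Sigma V^T$, where $A$ is an arbitrary $m \times n$ matrix, $U$ is the $m \times m$ left orthogonal matrix, $\Sigma$ is the $m \times n$ singular value matrix (only its diagonal entries can be non-zero), and $V$ is the $n \times n$ right orthogonal matrix.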
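
The link between the definition above and the rest of the talk is that the singular values of $A$ are the square roots of the eigenvalues of $A^T A$. A minimal sketch for a 2x2 matrix follows; the function name `singular_values_2x2` and the sample matrix are illustrative assumptions, not taken from the slides.

```python
import math

def singular_values_2x2(a):
    """Singular values of a 2x2 matrix, via the eigenvalues of A^T A."""
    (a11, a12), (a21, a22) = a
    # Entries of the symmetric matrix G = A^T A.
    g11 = a11 * a11 + a21 * a21
    g12 = a11 * a12 + a21 * a22
    g22 = a12 * a12 + a22 * a22
    # Eigenvalues of a symmetric 2x2 matrix from its trace and determinant.
    half_trace = (g11 + g22) / 2.0
    det = g11 * g22 - g12 * g12
    root = math.sqrt(max(half_trace * half_trace - det, 0.0))
    eigs = (half_trace + root, max(half_trace - root, 0.0))
    # Singular values are the square roots, largest first.
    return tuple(math.sqrt(x) for x in eigs)

# Hypothetical sample matrix: A^T A = [[25, 20], [20, 25]], eigenvalues 45 and 5.
sigma = singular_values_2x2([[3.0, 0.0], [4.0, 5.0]])
```

For this sample the product of the two singular values equals |det A| = 15, which is a quick sanity check on any SVD routine.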

Introduction: Traditional Methods. Background: QR, Jacobi, divide-and-conquer. Disadvantages: data dependency, scalability, subsets.

Introduction: Bisection & Twisted. Weak data dependency; scalability.

Algorithm: Bisection for Singular Value. [Figure: a range is divided into sub-ranges, then intermittent sub-ranges, then sub-ranges; figure labels 4, 5, 4.]
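
The slide gives only the picture of a range being narrowed, so here is a rough sketch of how bisection can locate one singular value. It assumes (the slides do not say this) that $A$ has already been reduced so that $T = A^T A$ is symmetric tridiagonal, with diagonal `d` and off-diagonal `e`. The helper names `sturm_count` and `kth_singular_value` are hypothetical.

```python
import math

def sturm_count(d, e, x):
    """Number of eigenvalues of the symmetric tridiagonal T that are < x.

    d is the diagonal of T, e the off-diagonal (len(e) == len(d) - 1).
    The count equals the number of negative pivots of T - x*I.
    """
    count = 0
    q = 1.0
    for i in range(len(d)):
        off = e[i - 1] ** 2 / q if i > 0 else 0.0
        q = d[i] - x - off
        if q == 0.0:
            q = 1e-300  # avoid dividing by an exactly zero pivot
        if q < 0.0:
            count += 1
    return count

def kth_singular_value(d, e, k, tol=1e-12):
    """k-th smallest singular value (k = 0, 1, ...), assuming T = A^T A."""
    n = len(d)
    # Gershgorin discs give a starting range that contains every eigenvalue.
    radius = [(abs(e[i - 1]) if i > 0 else 0.0) +
              (abs(e[i]) if i < n - 1 else 0.0) for i in range(n)]
    lo = min(d[i] - radius[i] for i in range(n))
    hi = max(d[i] + radius[i] for i in range(n))
    for _ in range(200):  # iteration cap, in case tol is below float spacing
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if sturm_count(d, e, mid) > k:
            hi = mid  # more than k eigenvalues below mid: target is below mid
        else:
            lo = mid
    return math.sqrt(max(0.5 * (lo + hi), 0.0))

# Hypothetical test matrix tridiag(-1, 2, -1): eigenvalues 2 - sqrt(2), 2, 2 + sqrt(2).
values = [kth_singular_value([2.0, 2.0, 2.0], [-1.0, -1.0], k) for k in range(3)]
```

Each call to `kth_singular_value` is independent of the others, which is the weak data dependency that makes the method attractive on a GPU.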

Algorithm: Twisted for Singular Vector. $A^T A - \lambda^2 I_n = L D_L L^T = U D_U U^T = W D_W W^T$. [Figure: $A$ and $A^T A - \lambda^2 I_n$, with the factors $U$, $D_U$, $W$, $D_W$.]

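Algorithm: Twisted for Singular Vector. $A^T A - \lambda^2 I_n = L D_L L^T = U D_U U^T = W D_W W^T$. Solving $W z = e$ gives $z$, the right singular vector corresponding to $\lambda$ (it is an eigenvector of $A^T A$, and so a column of $V$). [Figure: $W$, $z$, $e$.]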
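
To make the factorization chain on this slide concrete, the sketch below follows the textbook twisted-factorization recipe for a symmetric tridiagonal matrix. It is not necessarily the exact kernel used in the talk. It assumes $T = A^T A$ is tridiagonal with diagonal `d` and off-diagonal `e`, and that `shift` is an already computed $\lambda^2$. The name `twisted_vector` and the test matrix are hypothetical.

```python
import math

def twisted_vector(d, e, shift):
    """Unit eigenvector of tridiagonal T for the eigenvalue closest to shift."""
    n = len(d)

    def safe(p):
        # Replace an exactly zero pivot so the divisions below stay defined.
        return p if p != 0.0 else 1e-300

    # Forward pivots: T - shift*I = L * D_plus * L^T, computed top to bottom.
    d_plus = [0.0] * n
    d_plus[0] = d[0] - shift
    for i in range(1, n):
        d_plus[i] = (d[i] - shift) - e[i - 1] ** 2 / safe(d_plus[i - 1])

    # Backward pivots: T - shift*I = U * D_minus * U^T, computed bottom to top.
    d_minus = [0.0] * n
    d_minus[n - 1] = d[n - 1] - shift
    for i in range(n - 2, -1, -1):
        d_minus[i] = (d[i] - shift) - e[i] ** 2 / safe(d_minus[i + 1])

    # Twist at the index k whose twisted pivot gamma_k is smallest in magnitude.
    gamma = [d_plus[i] + d_minus[i] - (d[i] - shift) for i in range(n)]
    k = min(range(n), key=lambda i: abs(gamma[i]))

    # Solve the twisted triangular system outward from z[k] = 1.
    z = [0.0] * n
    z[k] = 1.0
    for i in range(k - 1, -1, -1):
        z[i] = -e[i] / safe(d_plus[i]) * z[i + 1]
    for i in range(k + 1, n):
        z[i] = -e[i - 1] / safe(d_minus[i]) * z[i - 1]

    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z]

# Hypothetical test: tridiag(-1, 2, -1) has eigenvalue 2 - sqrt(2)
# with unit eigenvector (1/2, sqrt(2)/2, 1/2).
z = twisted_vector([2.0, 2.0, 2.0], [-1.0, -1.0], 2.0 - math.sqrt(2.0))
```

The two pivot sweeps and the final substitution each touch only neighbouring entries, so one vector can be computed per thread without communicating with the others.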

Design: Original Design

Original Design for Singular Value. Pros: the interval is easy to assign to blocks. Cons: the distribution of singular values is not uniform, so some blocks are usually assigned more singular values than others.

Improved Design for Singular Value. Pros: better balanced than the original design. Cons: assigning the interval to blocks has a cost.
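
The trade-off between the two designs can be illustrated with a small sketch. It assumes the original design cuts the interval into equal-width pieces, and that the improved design places the cuts so every block holds the same number of singular values. The slides do not spell out either rule, so both are assumptions. All function names and the sample matrix are hypothetical.

```python
def sturm_count(d, e, x):
    """Number of eigenvalues of the symmetric tridiagonal T that are < x."""
    count = 0
    q = 1.0
    for i in range(len(d)):
        off = e[i - 1] ** 2 / q if i > 0 else 0.0
        q = d[i] - x - off
        if q == 0.0:
            q = 1e-300  # avoid dividing by an exactly zero pivot
        if q < 0.0:
            count += 1
    return count

def uniform_bounds(lo, hi, blocks):
    """Equal-width cuts: free to compute, blind to where the values lie."""
    width = (hi - lo) / blocks
    return [lo + j * width for j in range(blocks + 1)]

def balanced_bounds(d, e, lo, hi, blocks):
    """Cuts chosen by bisection so each block holds the same number of values."""
    n = len(d)
    bounds = [lo]
    for j in range(1, blocks):
        target = j * n // blocks
        a, b = lo, hi
        mid = 0.5 * (a + b)
        for _ in range(200):  # each step costs one extra Sturm count
            mid = 0.5 * (a + b)
            c = sturm_count(d, e, mid)
            if c == target:
                break
            if c < target:
                a = mid
            else:
                b = mid
        bounds.append(mid)
    bounds.append(hi)
    return bounds

def counts_per_block(d, e, bounds):
    c = [sturm_count(d, e, x) for x in bounds]
    return [c[j + 1] - c[j] for j in range(len(c) - 1)]

# Hypothetical clustered spectrum: eigenvalues lie close to the diagonal entries.
d = [1.0, 1.1, 1.2, 1.3, 9.0, 9.5, 10.0, 20.0]
e = [0.01] * 7
lo, hi = 0.99, 20.01  # Gershgorin bounds for this matrix
uniform = counts_per_block(d, e, uniform_bounds(lo, hi, 4))
balanced = counts_per_block(d, e, balanced_bounds(d, e, lo, hi, 4))
```

With this sample the equal-width cuts leave one block empty and give another half of the work, while the balanced cuts give every block two values at the price of the extra Sturm counts spent finding them.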

Evaluation of two different designs

Design for Singular Vector. Base Design: row major, global memory, read-after-write operations. Read/Write Optimization Design: local memory temporarily holds read-after-write values; GPU memory alignment. Column Major Design: column major.
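
One plausible reason a column-major layout helps is memory coalescing. The sketch below rests on assumptions the slide does not state: each thread works on its own vector, and the vectors form the rows of an `n_vectors` by `length` array. Under those assumptions it prints nothing and simply computes which addresses neighbouring threads touch at the same step. The function names are hypothetical.

```python
def row_major_index(vec, elem, n_vectors, length):
    """Flat offset when each vector is stored contiguously (row major)."""
    return vec * length + elem

def col_major_index(vec, elem, n_vectors, length):
    """Flat offset when element i of every vector is stored together (column major)."""
    return elem * n_vectors + vec

n_vectors, length = 4, 8
step = 2  # all threads read element 2 of their own vector at the same time

# Offsets touched by threads 0..3 in that step, for each layout.
row_addrs = [row_major_index(t, step, n_vectors, length) for t in range(n_vectors)]
col_addrs = [col_major_index(t, step, n_vectors, length) for t in range(n_vectors)]
```

In the row-major case neighbouring threads are `length` elements apart, whereas in the column-major case they read consecutive offsets, which is the access pattern GPU global memory serves most efficiently.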

Design: Evaluation for Singular Vector

Design: Divide-and-Conquer for Large Matrices. The singular value computation can be divided. The singular vector computation is based on the singular values. [Figure: singular values split between GPU1 and GPU2; singular vectors labelled 1 2 1 2 1 2 1 2.]

Experiments: Compared with other Algorithms on GPU

Spec.          Quadro 600   Tesla K40   S1070      M2070
Architecture   Fermi        Kepler      Tesla
CUDA Cores     96           2880        960        448
TFLOPS         0.246        4.29        4.14       1.29
Mem Size       1 GB         12 GB       16 GB      6 GB
Mem BW         25.6 GB/s    288 GB/s    408 GB/s   150 GB/s

Experiments: Large Matrix

Matrix Size    Tesla     Static (2-GPU)     Dynamic (2-GPU)
50K*50K        71s       50s / 45s          44s / 44s
100K*100K      341s      217s / 189s        210s / 202s
150K*150K      864s      524s / 467s        498s / 507s
200K*200K      1407s     955s / 827s        849s / 858s
300K*300K      3490s     2234s / 1906s      2123s / 2110s
400K*400K      6559s     4110s / 3709s      3853s / 3871s
500K*500K      12282s    7371s / 6916s      7148s / 7129s
800K*800K      40311s    22454s / 21627s    22046s / 22026s
1000K*1000K    54801s    36119s / 35071s    35587s / 35607s
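
The slides name static and dynamic 2-GPU schemes without describing them. The toy model below shows one common way to read the difference, under assumptions that are not from the slides: static gives each GPU one fixed contiguous share of the work chunks, and dynamic hands the next chunk to whichever GPU becomes free first. The chunk costs and function names are made up for illustration.

```python
def static_split(costs, workers=2):
    """Finish time per worker when each gets one contiguous, equal-sized slice."""
    size = (len(costs) + workers - 1) // workers
    return [float(sum(costs[w * size:(w + 1) * size])) for w in range(workers)]

def dynamic_split(costs, workers=2):
    """Finish time per worker when each chunk goes to the first free worker."""
    busy_until = [0.0] * workers
    for c in costs:
        w = busy_until.index(min(busy_until))  # earliest-free worker, ties to the lowest index
        busy_until[w] += c
    return busy_until

# Hypothetical per-chunk costs with the heavy chunks at the front.
costs = [5, 4, 3, 3, 2, 1, 1, 1]
static_times = static_split(costs)
dynamic_times = dynamic_split(costs)
```

In this model the static split leaves one worker idle for most of the run, while the dynamic split finishes both workers together.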

Precision vs. Execution Time

Conclusion: Bisection & Twisted algorithms. Implemented and optimized on GPU. Handles large matrices up to 1M*1M. Good scalability. Higher speed than other algorithms.