Bisection and Twisted SVD on GPU

Lu He

Content: Background, Algorithm, Design, Experiments, Conclusion

Introduction: Application Background
Math: pseudo-inverse of a matrix, homogeneous linear equations, least-squares minimization, low-rank approximation
Engineering: signal processing, computer vision, machine learning, information retrieval

Introduction: SVD
A = U Σ V^T
A: m*n arbitrary matrix
U: m*m orthogonal matrix of left singular vectors
Σ: m*n singular value matrix; only the diagonal entries can be non-zero
V: n*n orthogonal matrix of right singular vectors

Introduction: Background
Traditional methods: QR, Jacobi, Divide-and-conquer
Disadvantages: Data dependency, Scalability, Subsets

Introduction: Bisection & Twisted
Advantages: Weak data dependency, Scalability

Bisection For Singular Value
[Figure: Range, Sub-ranges, Intermittent Sub-ranges, Sub-ranges]
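The slides do not spell out the bisection step, so the following is only a minimal sketch of how bisection for one singular value can work. It assumes the matrix has already been reduced to an upper-bidiagonal B with diagonal d and superdiagonal e (the reduction and the names count_below and kth_singular_value are illustrative assumptions, not taken from the slides). The count uses the sign pattern of the LDL^T pivots of B^T B − x^2 I.

```python
def count_below(d, e, x):
    """Count singular values of B that are smaller than x.

    Assumption: B is upper bidiagonal (diagonal d, superdiagonal e), so
    T = B^T B is symmetric tridiagonal. The number of negative pivots in
    the LDL^T factorization of T - x^2 I equals the number of eigenvalues
    of T below x^2.
    """
    shift = x * x
    count = 0
    q = 1.0
    for i in range(len(d)):
        diag = d[i] ** 2
        coupling = 0.0
        if i > 0:
            diag += e[i - 1] ** 2
            coupling = (d[i - 1] * e[i - 1]) ** 2 / q
        q = diag - shift - coupling
        if q == 0.0:
            q = -1e-300  # nudge an exact zero pivot so the next division is safe
        if q < 0.0:
            count += 1
    return count


def kth_singular_value(d, e, k, tol=1e-12):
    """Bisection for the k-th smallest singular value (k = 0 is the smallest)."""
    lo = 0.0
    # ||B||_2 <= max|d| + max|e|, so every singular value lies in [lo, hi].
    hi = max(abs(v) for v in d) + max((abs(v) for v in e), default=0.0)
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if count_below(d, e, mid) > k:
            hi = mid  # at least k + 1 singular values lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    d, e = [1.0, 1.0], [1.0]
    print([kth_singular_value(d, e, k) for k in range(len(d))])
```

Each index k is searched independently of the others, which is one way to read the "weak data dependency" claimed for this method: on a GPU, different threads or blocks could each narrow their own sub-range.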

Algorithm: Twisted for Singular Vector
A^T A − λ^2 I_n = L D_L L^T = U D_U U^T = W D_W W^T
[Figure: from A, form A^T A − λ^2 I_n, then the factors U, D_U and W, D_W]

Twisted for Singular Vector
A^T A − λ^2 I_n = L D_L L^T = U D_U U^T = W D_W W^T
Solve W z = e for z; z is the singular vector corresponding to λ (as an eigenvector of A^T A, it is a right singular vector under A = U Σ V^T)
[Figure: W, z, e]
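As a rough illustration of the twisted step, the sketch below assumes that A^T A is available as a symmetric tridiagonal matrix with diagonal a and off-diagonal b (an assumption; the slides do not state the reduction), and that shift is an already computed λ^2. The forward pass plays the role of L D_L L^T, the backward pass the role of U D_U U^T, and the twist index picks where the two are joined to form W. The function name twisted_vector and all variable names are hypothetical.

```python
import math


def twisted_vector(a, b, shift):
    """Eigenvector of the symmetric tridiagonal T (diagonal a, off-diagonal b)
    for an eigenvalue close to shift, via a twisted factorization of T - shift*I."""
    n = len(a)
    tiny = 1e-300  # stands in for an exact zero pivot

    # Forward (top-down) factorization: T - shift*I = L D+ L^T.
    dplus = [0.0] * n
    l = [0.0] * (n - 1)
    dplus[0] = a[0] - shift
    for i in range(n - 1):
        if dplus[i] == 0.0:
            dplus[i] = tiny
        l[i] = b[i] / dplus[i]
        dplus[i + 1] = a[i + 1] - shift - l[i] * b[i]

    # Backward (bottom-up) factorization: T - shift*I = U D- U^T.
    dminus = [0.0] * n
    u = [0.0] * (n - 1)
    dminus[n - 1] = a[n - 1] - shift
    for i in range(n - 2, -1, -1):
        if dminus[i + 1] == 0.0:
            dminus[i + 1] = tiny
        u[i] = b[i] / dminus[i + 1]
        dminus[i] = a[i] - shift - u[i] * b[i]

    # Twist index: the row where joining both factorizations leaves the
    # smallest residual gamma_k in (T - shift*I) z = gamma_k * e_k.
    gamma = [dplus[i] + dminus[i] - (a[i] - shift) for i in range(n)]
    k = min(range(n), key=lambda i: abs(gamma[i]))

    # Solve for z with z[k] = 1: upward with L entries, downward with U entries.
    z = [0.0] * n
    z[k] = 1.0
    for i in range(k - 1, -1, -1):
        z[i] = -l[i] * z[i + 1]
    for i in range(k, n - 1):
        z[i + 1] = -u[i] * z[i]

    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z]


if __name__ == "__main__":
    # T = tridiag(-1, 2, -1) of size 3 has smallest eigenvalue 2 - sqrt(2).
    print(twisted_vector([2.0, 2.0, 2.0], [-1.0, -1.0], 2.0 - math.sqrt(2.0)))
```

Because the vector for each λ only needs that λ and the matrix entries, the vectors can in principle be computed independently of one another once the singular values are known.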

Design: Original Design

Original Design for Singular Value
Pros: Easy to assign the interval to blocks
Cons: The distribution of singular values is not uniform; usually some blocks are assigned more singular values than others

Improved Design for Singular Value
Pros: Better balanced than the original design
Cons: Assigning the interval to blocks has an extra cost
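The trade-off between the two designs can be sketched as follows. This is not the author's implementation: it assumes a symmetric tridiagonal matrix (diagonal a, off-diagonal b) whose eigenvalues are the squared singular values, and it contrasts equal-width sub-intervals with sub-intervals chosen so that each block receives about the same number of eigenvalues. All function names are hypothetical. The extra counting calls in balanced_edges are one plausible source of the assignment cost mentioned above.

```python
def eig_count_below(a, b, x):
    """Number of eigenvalues below x for the symmetric tridiagonal matrix with
    diagonal a and off-diagonal b (count of negative LDL^T pivots)."""
    count, q = 0, 1.0
    for i in range(len(a)):
        q = a[i] - x - (b[i - 1] ** 2 / q if i > 0 else 0.0)
        if q == 0.0:
            q = -1e-300  # avoid dividing by an exact zero pivot
        if q < 0.0:
            count += 1
    return count


def uniform_edges(lo, hi, blocks):
    """Original-style split: equal-width sub-intervals."""
    width = (hi - lo) / blocks
    return [lo + j * width for j in range(blocks + 1)]


def balanced_edges(a, b, lo, hi, blocks, tol=1e-10):
    """Improved-style split: move each edge until about n/blocks eigenvalues
    fall into every sub-interval. Assumes all eigenvalues lie in (lo, hi)."""
    n = len(a)
    edges = [lo]
    for j in range(1, blocks):
        target = (j * n) // blocks
        left, right = lo, hi
        while right - left > tol:
            mid = 0.5 * (left + right)
            if eig_count_below(a, b, mid) >= target:
                right = mid
            else:
                left = mid
        edges.append(right)
    edges.append(hi)
    return edges


def block_loads(a, b, edges):
    """Eigenvalues assigned to each block for a given list of edges."""
    counts = [eig_count_below(a, b, x) for x in edges]
    return [counts[j + 1] - counts[j] for j in range(len(edges) - 1)]


if __name__ == "__main__":
    a = [0.1, 0.2, 0.3, 0.4, 5.0, 9.0, 9.5, 10.0]  # clustered spectrum
    b = [0.01] * 7
    print(block_loads(a, b, uniform_edges(0.0, 10.5, 4)))
    print(block_loads(a, b, balanced_edges(a, b, 0.0, 10.5, 4)))
```

With a clustered spectrum like the one in the usage example, the equal-width split leaves one block empty while another holds half of the values; the balanced split evens this out at the price of additional counting passes.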

Evaluation of two different designs

Design: Singular Vector
Base Design: row major, global memory; read-after-write operations
Read/Write Optimization Design: local memory temporarily holds values that are read after being written; GPU memory alignment
Column Major Design: column major
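Why the storage order matters can be illustrated with plain index arithmetic. The sketch below assumes one GPU thread per singular vector, with all threads working on the same element position at the same time; this mapping is an assumption, since the slides only name the layouts. Under that assumption, row-major storage puts neighbouring threads a full vector length apart in memory, while column-major storage puts them next to each other, which is the access pattern GPUs can coalesce. The function names are hypothetical.

```python
def row_major_offset(vec, step, n_vecs, length):
    """Offset of element `step` of vector `vec` when each vector is one row."""
    return vec * length + step


def column_major_offset(vec, step, n_vecs, length):
    """Offset of the same element when the array is stored column by column."""
    return step * n_vecs + vec


def warp_offsets(offset_fn, step, n_threads, n_vecs, length):
    """Offsets touched by threads 0..n_threads-1 at the same step
    (assumption: thread t works on vector t)."""
    return [offset_fn(t, step, n_vecs, length) for t in range(n_threads)]


if __name__ == "__main__":
    n_vecs, length = 1024, 512
    print(warp_offsets(row_major_offset, 0, 4, n_vecs, length))     # strided
    print(warp_offsets(column_major_offset, 0, 4, n_vecs, length))  # contiguous
```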

Evaluation for Singular Vector

Design: Divide-and-Conquer for Large Matrices
The singular value computation can be divided
The singular vector computation is based on the singular values
[Figure: singular value work split between GPU1 and GPU2; singular vector work labeled 1 2 1 2 1 2 1 2]

Compared with other Algorithms on GPU
Spec.          Quadro 600   Tesla K40   S1070      M2070
Architecture   Fermi        Kepler      Tesla      (not listed)
CUDA Cores     96           2880        960        448
TFLOPS         0.246        4.29        4.14       1.29
Mem Size       1 GB         12 GB       16 GB      6 GB
Mem BW         25.6 GB/s    288 GB/s    408 GB/s   150 GB/s

Experiments: Large Matrix
Matrix Size    Tesla     Static (2-GPU)      Dynamic (2-GPU)
50K*50K        71s       50s / 45s           44s / 44s
100K*100K      341s      217s / 189s         210s / 202s
150K*150K      864s      524s / 467s         498s / 507s
200K*200K      1407s     955s / 827s         849s / 858s
300K*300K      3490s     2234s / 1906s       2123s / 2110s
400K*400K      6559s     4110s / 3709s       3853s / 3871s
500K*500K      12282s    7371s / 6916s       7148s / 7129s
800K*800K      40311s    22454s / 21627s     22046s / 22026s
1000K*1000K    54801s    36119s / 35071s     35587s / 35607s
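The slides do not define "Static" and "Dynamic". One common reading, used here purely as an assumption, is that static assignment fixes each GPU's share of the work up front, while dynamic assignment hands the next chunk to whichever GPU becomes free first. The small simulation below (hypothetical function names, made-up chunk costs) shows why the second scheme tends to give more even per-device finish times.

```python
import heapq


def static_finish_times(costs, n_dev=2):
    """Contiguous, equal-count split of the chunks decided before the run."""
    times = [0.0] * n_dev
    for i, cost in enumerate(costs):
        times[i * n_dev // len(costs)] += cost
    return times


def dynamic_finish_times(costs, n_dev=2):
    """Each chunk goes to the device that becomes free first."""
    heap = [(0.0, dev) for dev in range(n_dev)]  # (time when free, device id)
    heapq.heapify(heap)
    times = [0.0] * n_dev
    for cost in costs:
        free_at, dev = heapq.heappop(heap)
        times[dev] = free_at + cost
        heapq.heappush(heap, (times[dev], dev))
    return times


if __name__ == "__main__":
    costs = [5, 5, 5, 5, 1, 1, 1, 1]  # illustrative chunk costs, not measured data
    print(static_finish_times(costs))
    print(dynamic_finish_times(costs))
```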

Precision vs. Execution Time

Conclusion
Bisection & Twisted algorithms
Implemented and optimized on GPU
Handles large matrices up to 1M*1M
Good scalability
Faster than other algorithms
