Parallelization of Sparse Coding & Dictionary Learning. University of Colorado Denver, Parallel Distributed Systems, Fall 2016. Huynh Manh
Recap: K-SVD algorithm. Initialize D → Sparse Coding (use MP) to find 𝛼 → Dictionary Update, column-by-column by SVD computation.
Matching Pursuit Algorithms. Given data X [N x K] and dictionary D [N x P], find codes 𝛼 so that X ≈ D𝛼: each example is a linear combination of atoms from D, and each example has a sparse representation with no more than L atoms.
Matching Pursuit Algorithms
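The pursuit pseudocode from this slide is not reproduced in the text, so here is a minimal NumPy sketch of plain matching pursuit under the X ≈ D𝛼 model above (function and variable names are illustrative, not taken from the slides):

```python
import numpy as np

def matching_pursuit(x, D, L):
    """Greedy matching pursuit: approximate x with at most L atoms of D.

    x : (n,) signal, D : (n, K) dictionary with unit-norm columns (assumed).
    Returns the sparse coefficient vector alpha of length K.
    """
    residual = x.copy()
    alpha = np.zeros(D.shape[1])
    for _ in range(L):
        # Correlate the residual with every atom and pick the best match.
        correlations = D.T @ residual
        k = np.argmax(np.abs(correlations))
        alpha[k] += correlations[k]
        # Remove the chosen atom's contribution from the residual.
        residual = residual - correlations[k] * D[:, k]
    return alpha
```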
K-SVD algorithm. Here is a three-dimensional data set, spanned by an over-complete dictionary of four vectors. We want to update each of these vectors to better represent the data.
K-SVD algorithm. 1. Remove one of these vectors from the dictionary. If we do sparse coding using only the remaining three vectors, we cannot perfectly represent the data.
K-SVD algorithm. 2. Find the approximation error on each data point.
K-SVD algorithm. 3. Apply SVD to the error matrix. The SVD provides a set of orthogonal basis vectors sorted in decreasing order of their ability to represent the variance of the error matrix.
K-SVD algorithm. 3. Replace the removed vector with the first left singular vector of the error matrix. 4. Do the same for the other vectors.
K-SVD algorithm. 1. Initialize the dictionary D randomly. 2. Use any pursuit algorithm to find a sparse coding 𝛼 of the input data X with dictionary D. 3. Update D: a. Remove a basis vector 𝑑_𝑘. b. Compute the approximation error 𝐸_𝑘 on the data points that actually use 𝑑_𝑘. c. Take the SVD of 𝐸_𝑘. d. Update 𝑑_𝑘. 4. Go back to step 2 until convergence.
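A compact sketch of this update loop, assuming the standard K-SVD formulation with unit-norm atoms (variable names are illustrative):

```python
import numpy as np

def ksvd_dictionary_update(X, D, A):
    """One pass of the K-SVD dictionary update (step 3 above).

    X : (n, N) data, D : (n, K) dictionary, A : (K, N) sparse codes.
    Each atom d_k is replaced by the first left singular vector of the
    error restricted to the signals that actually use d_k.
    """
    for k in range(D.shape[1]):
        users = np.nonzero(A[k, :])[0]          # signals that use atom k
        if users.size == 0:
            continue
        # Error without atom k's contribution, restricted to those signals (a + b).
        E_k = X[:, users] - D @ A[:, users] + np.outer(D[:, k], A[k, users])
        # SVD of the restricted error matrix (c).
        U, S, Vt = np.linalg.svd(E_k, full_matrices=False)
        # Update the atom and its coefficients (d).
        D[:, k] = U[:, 0]
        A[k, users] = S[0] * Vt[0, :]
    return D, A
```

In practice, step 2 would call a pursuit routine such as the matching pursuit sketch earlier before each dictionary pass.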
Content: Tracking framework; Feature extraction; Singular Value Decomposition (SVD); Serial implementation; GPU implementation and comparisons; Summary; Next work.
Multi-target Tracking (framework overview): Video Sequence → Human Detection (bounding boxes) → Feature Extraction (plus spatial information (x,y), combined with a concatenation operator) → Sparse Coding and Dictionary Learning → Appearance Affinity. A linear motion model gives the Motion Affinity. The appearance and motion affinities are combined into a Final Affinity, which the Hungarian Algorithm turns into an Assignment Matrix and, finally, Tracking Results.
Feature Extraction. Calculate a color histogram on the upper half and bottom half of the detection separately: R: [nbins x 1], G: [nbins x 1], B: [nbins x 1] for each half. All feature channels are concatenated: [6*nbins x 1]. Spatial location (x,y): [2 x 1]. Overall, feature size = [(6*nbins + 2) x 1]. If nbins = 1, the feature size is [8 x 1]; if nbins = 255, it is [1532 x 1]. A sketch of this extraction follows below.
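A rough sketch of this feature vector, assuming an 8-bit RGB crop of the detection bounding box (the histogram range and the absence of normalization are assumptions):

```python
import numpy as np

def extract_feature(patch, x, y, nbins=16):
    """Color-histogram feature for one detection.

    patch : (H, W, 3) RGB crop of the bounding box.
    Returns a ((6*nbins + 2), 1) vector: R/G/B histograms of the upper
    half, R/G/B histograms of the lower half, and the (x, y) location.
    """
    H = patch.shape[0]
    halves = [patch[:H // 2], patch[H // 2:]]
    feats = []
    for half in halves:
        for c in range(3):                      # R, G, B channels
            hist, _ = np.histogram(half[:, :, c], bins=nbins, range=(0, 256))
            feats.append(hist.astype(float))
    feature = np.concatenate(feats + [np.array([x, y], dtype=float)])
    return feature.reshape(-1, 1)
```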
K-SVD algorithm. 3. Apply SVD to the error matrix. We want to replace the atom with the principal component of the error, i.e. the direction that captures the most variance.
Principal Component Analysis (PCA): change of basis. PCA asks: "Is there another basis, which is a linear combination of the original basis, that best re-expresses our data set?" Let 𝑋 and 𝑌 be m x n matrices related by a linear transformation P: 𝑃𝑋 = 𝑌. P is a rotation and a stretch which transforms X into Y. Then the rows of P, 𝑝_1, ..., 𝑝_𝑚, are a set of new basis vectors for X.
Principal Component Analysis (PCA): change of basis. Then the rows of the linear transformation matrix P, 𝑝_1, ..., 𝑝_𝑚, are a set of new basis vectors (important). Each coefficient of 𝑦_𝑖 is the dot product of 𝑥_𝑖 with the corresponding row of P, as written out below.
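Written out explicitly, the statement above says that each column 𝑦_𝑖 of Y is obtained by projecting the corresponding column 𝑥_𝑖 of X onto the rows of P:

```latex
y_i = P x_i =
\begin{bmatrix} p_1 \cdot x_i \\ \vdots \\ p_m \cdot x_i \end{bmatrix},
\qquad i = 1, \dots, n
```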
Principal Component Analysis (PCA) The goal Reducing noise
Principal Component Analysis (PCA) The goal Reducing redundancy
Principal Component Analysis (PCA): the goal. We would like each variable to co-vary as little as possible with the other variables, i.e. we want to diagonalize the covariance matrix 𝑆_X of X (this assumes X is a zero-mean matrix); see the small example below.
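A small NumPy illustration of this goal, with one variable per row and one observation per column (toy data, purely illustrative):

```python
import numpy as np

# Toy data: 3 variables (rows), 100 observations (columns).
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 100))
X = X - X.mean(axis=1, keepdims=True)   # make each variable zero-mean

# Covariance matrix S_x = (1/(n-1)) X X^T.
n = X.shape[1]
S_x = (X @ X.T) / (n - 1)

# Off-diagonal entries measure how much variables co-vary (redundancy);
# PCA looks for a basis in which they are (close to) zero.
print(np.round(S_x, 3))
```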
Principal Component Analysis (PCA): how to find the principal components of X? Find an orthonormal linear transformation P with 𝑌 = 𝑃𝑋 such that 𝑆_Y = (1/(n-1)) 𝑌𝑌ᵀ is diagonalized. Remember that the rows of the linear transformation matrix P, 𝑝_1, ..., 𝑝_𝑚, are a set of new basis vectors (previous slide), so the rows of P are the principal components of X.
Principal Component Analysis (PCA): how to find the principal components of X? Solution 1. Theorem: a symmetric matrix is diagonalized by a matrix of its orthonormal eigenvectors. Proof: [3]. If 𝐴 = 𝑋𝑋ᵀ, then A is symmetric and 𝐴 = 𝐸𝐷𝐸ᵀ, where E is the matrix of eigenvectors of A and D is a diagonal matrix of its eigenvalues.
Principal Component Analysis (PCA): how to find the principal components of X? Solution 1. Find an orthonormal linear transformation P with 𝑌 = 𝑃𝑋 such that 𝑆_Y = (1/(n-1)) 𝑌𝑌ᵀ is diagonalized. Choose 𝑃 = 𝐸ᵀ, meaning that each row of P is an eigenvector of 𝑋𝑋ᵀ. Then 𝑆_Y = (1/(n-1)) 𝑃(𝑋𝑋ᵀ)𝑃ᵀ = (1/(n-1)) 𝐸ᵀ(𝐸𝐷𝐸ᵀ)𝐸 = (1/(n-1)) 𝐷, which is diagonal.
Principal Component Analysis (PCA): how to find the principal components of X? Solution 1, conclusion. Find an orthonormal linear transformation P with 𝑌 = 𝑃𝑋 such that 𝑆_Y = (1/(n-1)) 𝑌𝑌ᵀ is diagonalized. Thus, if P is chosen as the (transposed) eigenvector matrix of 𝑋𝑋ᵀ, then the rows of P are the principal components of X; a small numerical check follows below.
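A short NumPy check of Solution 1 on toy data: with the rows of P taken as eigenvectors of 𝑋𝑋ᵀ, the covariance of Y = PX comes out diagonal (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 500))
X = X - X.mean(axis=1, keepdims=True)
n = X.shape[1]

# Eigendecomposition of the symmetric matrix X X^T; eigenvectors are the columns of E.
eigvals, E = np.linalg.eigh(X @ X.T)

P = E.T                       # rows of P = eigenvectors = principal components
Y = P @ X
S_Y = (Y @ Y.T) / (n - 1)

# The off-diagonal entries of S_Y are (numerically) zero.
print(np.round(S_Y, 6))
```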
Principal Component Analysis (PCA): how to find the principal components of X? Solution 2, SVD. Theorem: any arbitrary [n x m] matrix X has a singular value decomposition X = UΣVᵀ. Proof: [3]. Here V = {𝑣_1, 𝑣_2, ..., 𝑣_𝑟} is the set of orthonormal m x 1 eigenvectors of 𝑋ᵀ𝑋; U = {𝑢_1, 𝑢_2, ..., 𝑢_𝑟} is the set of orthonormal n x 1 vectors defined by 𝑢_𝑖 = (1/𝜎_𝑖) 𝑋𝑣_𝑖; and 𝜎_𝑖 = √𝜆_𝑖 are the singular values.
Principal Component Analysis (PCA): how to find U, Σ, V? Solution 2, SVD. Starting from X = UΣVᵀ: 𝑋ᵀ𝑋 = (𝑈Σ𝑉ᵀ)ᵀ(𝑈Σ𝑉ᵀ) = 𝑉Σ𝑈ᵀ𝑈Σ𝑉ᵀ = 𝑉Σ²𝑉ᵀ, so 𝑋ᵀ𝑋𝑉 = 𝑉Σ²𝑉ᵀ𝑉 = 𝑉Σ². Column by column this has the form (𝑋ᵀ𝑋)𝑣 = 𝜆𝑣, an eigenvalue problem that can be solved as a linear system.
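A quick numerical confirmation of this relation: the squared singular values of X equal the eigenvalues of 𝑋ᵀ𝑋, and the columns of V satisfy (𝑋ᵀ𝑋)𝑣 = 𝜆𝑣 (toy matrix, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
eigvals, V = np.linalg.eigh(X.T @ X)            # eigenvalues in ascending order

# The singular values squared match the eigenvalues of X^T X.
print(np.allclose(np.sort(s**2), eigvals))       # True
# Each column of V solves the eigenvalue equation (X^T X) v = lambda v.
v = V[:, -1]
print(np.allclose(X.T @ X @ v, eigvals[-1] * v))  # True
```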
Serial implementation: Initialize D → Sparse Coding (use MP) → Dictionary Update, column-by-column by SVD computation.
Serial implementation Matching pursuit
Serial implementation Learning Dictionary Update D: a. Remove a basis vector 𝑑 𝑘 b. Compute the approximation error 𝐸 𝑘 on data points that were actually using 𝑑 𝑘 c. Take SVD of 𝐸 𝑘 d. Update 𝑑 𝑘 .
GPU implementation - Matching pursuit kernels: vector-matrix multiplication, "modified" max reduction, inner product (sum reduction), scalar product, subtraction.
GPU implementation - Matching pursuit, max reduction. Threads T0..T7 each hold a product <input, atom>; pairwise comparisons reduce 8 → 4 → 2 → 1 values until T0 holds the maximum, and the winning atom is marked with isChosen[atom index] = true.
GPU implementation - Matching pursuit, max reduction, data layout. Since an atom is chosen only once, each thread carries the product <input, atom> together with its atom index and an isChosen flag through the reduction.
GPU implementation - Matching pursuit, max reduction. An additional branch condition is needed: whenever two entries are swapped during the reduction, their atom indices and isChosen values are swapped as well (see the sketch below).
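The CUDA kernel itself is not included on these slides; the following host-side Python sketch only models its logic, carrying (product, atom index) pairs through a tree reduction and masking atoms that were already selected (names, the power-of-two length, and the use of absolute correlations are assumptions):

```python
import numpy as np

def modified_max_reduction(products, is_chosen):
    """Tree-style max reduction that ignores already-chosen atoms.

    products : (K,) correlations <input, atom>; is_chosen : (K,) bool flags.
    K is assumed to be a power of two, as a real reduction kernel would pad to one.
    Returns the index of the winning atom (or -1 if all atoms are chosen).
    """
    # Each "thread" holds a (value, index) pair; the isChosen flag is folded
    # into the value by masking chosen atoms to -inf.
    vals = np.where(is_chosen, -np.inf, np.abs(products)).astype(float)
    idx = np.arange(len(products))
    stride = len(vals) // 2
    while stride > 0:
        left, right = vals[:stride], vals[stride:2 * stride]
        swap = right > left                      # the extra branch condition
        # When swapping values, swap the carried atom indices as well.
        vals[:stride] = np.where(swap, right, left)
        idx[:stride] = np.where(swap, idx[stride:2 * stride], idx[:stride])
        stride //= 2
    return int(idx[0]) if np.isfinite(vals[0]) else -1
```

After the winning index k is returned, the caller would mark is_chosen[k] = True before the next matching-pursuit iteration.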
GPU implementation - Learning Dictionary. The atoms are divided into 2 sets: independent atoms and dependent atoms. Update D: a. Remove a basis vector 𝑑_𝑘. b. Compute the approximation error 𝐸_𝑘 on the data points that actually use 𝑑_𝑘 (sparse matrix multiplication). c. Take the SVD of 𝐸_𝑘 (parallelized SVD). d. Update 𝑑_𝑘.
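For step b, the sparsity of the codes is what makes the restriction cheap; the sketch below uses SciPy's sparse matrices as a stand-in for the custom sparse-matrix-multiplication kernel mentioned above (illustrative, not the CUDA implementation):

```python
import numpy as np
from scipy import sparse

def restricted_error(X, D, A_sparse, k):
    """Approximation error E_k on the data points that actually use atom k.

    X : (n, N) dense data, D : (n, K) dictionary,
    A_sparse : (K, N) scipy.sparse CSR matrix of sparse codes.
    """
    users = A_sparse.getrow(k).indices          # columns with a nonzero code for atom k
    if users.size == 0:
        return np.empty((X.shape[0], 0)), users
    A_users = A_sparse[:, users].toarray()
    # Reconstruction without atom k's contribution, only for those signals.
    A_users[k, :] = 0.0
    E_k = X[:, users] - D @ A_users
    return E_k, users

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    X = rng.standard_normal((16, 200))
    D = rng.standard_normal((16, 50))
    A = sparse.random(50, 200, density=0.05, format="csr", random_state=5)
    E_k, users = restricted_error(X, D, A, k=3)
    print(E_k.shape, users.size)
```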
GPU implementation - Dataset: PETS09-S2L1 (MOT Challenge 2015). Targets per frame: 4-10 persons. Number of frames: 785. Frame rate: 15 fps. Frame dimensions: 768x576.
GPU implementation - Results (feature size = 512, sparsity 30%)

nDetections  dicSize  gpu (ms)  cpu (ms)  Speedup
3            20       10        -         0.5
4            49       50        100       2
             101      70        420       6
5            203      260       1650      6.34
             299      460       3780      8.21
             401      840       8580      10.21
             500      1130      16670     14.75
7            599      3250      27870     8.57
             700      3510      32909     9.37
             800      5640      41100     7.28
             900      7340      51390     7.00
What's next? Getting more results/analysis (will be in the report). Finish parallelizing the dictionary update: working on the independent/dependent sets, sparse matrix multiplication, and SVD. Existing parallel SVD libraries for CUDA: cuBLAS, CULA. Comparing my own SVD (solving the linear eigenvalue system (𝑋ᵀ𝑋)𝑣 = 𝜆𝑣) with the functions in these libraries; a sketch of that comparison follows below.
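Since K-SVD only needs the leading singular vector and value of 𝐸_𝑘, one simple "own SVD" to benchmark against a library routine is power iteration on 𝐸ᵀ𝐸; the sketch below is an assumed approach for that comparison, not necessarily the presenter's implementation (NumPy here; CULA or a similar library in the CUDA version):

```python
import numpy as np

def leading_singular_triplet(E, iters=100):
    """Leading singular value/vectors of E via power iteration on E^T E."""
    v = np.random.default_rng(3).standard_normal(E.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = E.T @ (E @ v)        # one application of E^T E
        v /= np.linalg.norm(v)
    sigma = np.linalg.norm(E @ v)
    u = (E @ v) / sigma
    return u, sigma, v

# Compare with a library SVD on a toy error matrix.
E = np.random.default_rng(4).standard_normal((64, 20))
u, s, v = leading_singular_triplet(E)
U, S, Vt = np.linalg.svd(E, full_matrices=False)
print(np.isclose(s, S[0]))                                        # True
print(np.allclose(np.abs(u), np.abs(U[:, 0]), atol=1e-4))          # True (up to sign)
```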
What's next? Avoid sending the whole dictionary and input features to the GPU every frame; instead, concatenate the new data onto what is already resident. Algorithms for better dictionary quality that still fit in the GPU's memory and remain good for tracking. Parallelizing the whole multi-target tracking system.
References [1] http://mathworld.wolfram.com/EigenDecompositionTheorem.html [2] http://stattrek.com/matrix-algebra/covariance-matrix.aspx [3] eigenvectors and SVD https://www.cs.ubc.ca/~murphyk/Teaching/Stat406 Spring08/Lectures/linalg1.pdf [4] https://www.cs.princeton.edu/picasso/mats/PCA-Tutorial-Intuition_jp.pdf