Download presentation
Presentation is loading. Please wait.
Published byHelena Sims Modified over 9 years ago
1
Informatics and Mathematical Modelling / Intelligent Signal Processing 1 Sparse’09 8 April 2009 Sparse Coding and Automatic Relevance Determination for Multi-way models Morten Mørup DTU Informatics Intelligent Signal Processing Technical University of Denmark Joint work with Lars Kai Hansen DTU Informatics Intelligent Signal Processing Technical University of Denmark
2
Informatics and Mathematical Modelling / Intelligent Signal Processing 2 Sparse’09 8 April 2009 Multi-way modeling has become an important tool for Unsupervised Learning and are in frequent use today in a variety of fields including Psychometrics (Subject x Task x Time) Chemometrics (Sample x Emission x Absorption) NeuroImaging(Channel x Time x Trial) Textmining(Term x Document x Time/User) Signal Processing (ICA, i.e. diagonalization of Cummulants) Vector: 1-way array/ 1st order tensor, Matrix: 2-way array/ 2nd order tensor, 3-way array/3rd order tensor
3
Informatics and Mathematical Modelling / Intelligent Signal Processing 3 Sparse’09 8 April 2009 Some Tensor algebra Matricizing X X (n) N-mode multiplication i.e. for matrix multiplication CA=A x 1 C, AB T =A x 2 B I x JK Matrix K x IJ Matrix J x KI Matrix X I x J x K X (1) X (2) X (3)
4
Informatics and Mathematical Modelling / Intelligent Signal Processing 4 Sparse’09 8 April 2009 Tensor Decomposition The two most commonly used tensor decomposition models are the CandeComp/PARAFAC (CP) model and the Tucker model
5
Informatics and Mathematical Modelling / Intelligent Signal Processing 5 Sparse’09 8 April 2009 CP-model unique Tucker model not unique What constitutes an adequate number of components, i.e. determining J for CP and J 1,J 2,…,J N for Tucker is an open problem, particularly difficult for the Tucker model as the number of components are specified for each modality separately
6
Informatics and Mathematical Modelling / Intelligent Signal Processing 6 Sparse’09 8 April 2009 Agenda To use sparse coding to Simplify the Tucker core forming a unique representation as well as enable interpolation between the Tucker (full core) and CP (diagonal core) model To use sparse coding to turn off excess components in the CP and Tucker model and thereby select the model order. To tune the pruning strength from data by Automatic Relevance Determination (ARD).
7
Informatics and Mathematical Modelling / Intelligent Signal Processing 7 Sparse’09 8 April 2009 Sparse Coding Bruno A. Olshausen David J. Field Nature, 1996 Preserve InformationPreserve Sparsity (Simplicity) Tradeoff parameter
8
Informatics and Mathematical Modelling / Intelligent Signal Processing 8 Sparse’09 8 April 2009 Sparse Coding (cont.) Patch image Vectorize patches … Image patch 1 to N A corresponds to Gabor like features resembling simple cell behavior in V1 of the brain! Sparse coding attempts to both learn dictionary and encoding at the same time (This problem is not convex!)
9
Informatics and Mathematical Modelling / Intelligent Signal Processing 9 Sparse’09 8 April 2009 Automatic Relevance Determination (ARD) Automatic Relevance Determination (ARD) is a hierarchical Bayesian approach widely used for model selection In ARD hyper-parameters explicitly represents the relevance of different features by defining the range of variation. (i.e., Range of variation0 Feature removed)
10
Informatics and Mathematical Modelling / Intelligent Signal Processing 10 Sparse’09 8 April 2009 A Bayesian formulation of the Lasso / BPD problem
11
Informatics and Mathematical Modelling / Intelligent Signal Processing 11 Sparse’09 8 April 2009 ARD in reality a L0-norm optimization scheme. As such ARD based on Laplace prior corresponds to L0-norm optimization by re-weighted L1-norm L0 norm by re-weighted L2 follows by imposing Gaussian prior instead of Laplace
12
Informatics and Mathematical Modelling / Intelligent Signal Processing 12 Sparse’09 8 April 2009 Sparse Tucker decomposition by ARD Brakes into standard Lasso/BPD sub-problems of the form
13
Informatics and Mathematical Modelling / Intelligent Signal Processing 13 Sparse’09 8 April 2009 Solving efficiently the Lasso/BPD sub-problems Notice, in general the alternating sub-problems for Tucker estimation have J<I!
14
Informatics and Mathematical Modelling / Intelligent Signal Processing 14 Sparse’09 8 April 2009 The Tucker ARD Algorithm
15
Informatics and Mathematical Modelling / Intelligent Signal Processing 15 Sparse’09 8 April 2009 Results Tucker(10,10,10) models were fitted to the data, given are below the extracted cores Fluorescene data: emission x excitation x samples Synthetic Tucker(3,4,5)Tucker(3,6,4) Tucker(3,3,3) Tucker(?,4,4)Tucker(4,4,4)
16
Informatics and Mathematical Modelling / Intelligent Signal Processing 16 Sparse’09 8 April 2009 Sparse Coding in fact 2-way case of Tucker model The presented ARD framework extracts a 20 component model when trained on the natural image data of (Olshausen and Field, Nature 1996) Patch image Vectorize patches … Image patch 1 to N
17
Informatics and Mathematical Modelling / Intelligent Signal Processing 17 Sparse’09 8 April 2009 Conclusion Imposing sparseness on the core enable to interpolate between Tucker and CP model ARD based on MAP estimation form a simple framework to tune the pruning in sparse coding models giving the model order. ARD framework especially useful for the Tucker model where the order is specified for each mode separately which makes exhaustive evaluation of all potential models expensive. ARD-estimation based on MAP is closely related to L0 norm estimation based on reweighted L1 and L2 norm. Thus, ARD form a principled framework for learning the sparsity pattern in CS.
18
Informatics and Mathematical Modelling / Intelligent Signal Processing 18 Sparse’09 8 April 2009 http://mmds.imm.dtu.dk
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.