Informatics and Mathematical Modelling / Intelligent Signal Processing. Sparse'09, 8 April 2009. Sparse Coding and Automatic Relevance Determination for Multi-way Models.

Presentation transcript:

Slide 1: Sparse Coding and Automatic Relevance Determination for Multi-way Models. Morten Mørup, DTU Informatics, Intelligent Signal Processing, Technical University of Denmark. Joint work with Lars Kai Hansen, DTU Informatics, Intelligent Signal Processing, Technical University of Denmark.

Slide 2: Multi-way modeling has become an important tool for unsupervised learning and is in frequent use today in a variety of fields, including:
- Psychometrics (subject x task x time)
- Chemometrics (sample x emission x absorption)
- NeuroImaging (channel x time x trial)
- Text mining (term x document x time/user)
- Signal processing (ICA, i.e. diagonalization of cumulants)
Terminology: a vector is a 1-way array (1st-order tensor), a matrix is a 2-way array (2nd-order tensor), and a 3-way array is a 3rd-order tensor.

Slide 3: Some tensor algebra. Matricizing unfolds a tensor into a matrix, $\mathcal{X} \rightarrow \mathbf{X}_{(n)}$: for $\mathcal{X}$ of size $I \times J \times K$, $\mathbf{X}_{(1)}$ is an $I \times JK$ matrix, $\mathbf{X}_{(2)}$ a $J \times KI$ matrix, and $\mathbf{X}_{(3)}$ a $K \times IJ$ matrix. N-mode multiplication generalizes matrix multiplication, i.e. $\mathbf{C}\mathbf{A} = \mathbf{A} \times_1 \mathbf{C}$ and $\mathbf{A}\mathbf{B}^\top = \mathbf{A} \times_2 \mathbf{B}$.
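Both operations translate directly into a few lines of NumPy. The sketch below is illustrative; the function names matricize and mode_n_product are ours, not from the talk, and modes are 0-indexed in Python where the slides count from 1.

```python
import numpy as np

def matricize(X, n):
    """Unfold tensor X along mode n: X_(n) has shape (I_n, product of the rest)."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_n_product(X, C, n):
    """N-mode multiplication X x_n C: multiply matrix C into mode n of X."""
    Y = C @ matricize(X, n)
    new_shape = (C.shape[0],) + X.shape[:n] + X.shape[n + 1:]
    return np.moveaxis(Y.reshape(new_shape), 0, n)

# Sanity checks matching the slide: CA = A x_1 C and AB^T = A x_2 B
# (modes 0 and 1 here because of Python's 0-indexing).
A = np.random.randn(4, 5)
C = np.random.randn(3, 4)
B = np.random.randn(6, 5)
assert np.allclose(mode_n_product(A, C, 0), C @ A)
assert np.allclose(mode_n_product(A, B, 1), A @ B.T)
```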

Slide 4: Tensor decomposition. The two most commonly used tensor decomposition models are the CANDECOMP/PARAFAC (CP) model and the Tucker model.
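The model equations on this slide were rendered as images; for a 3-way array the two models take the standard textbook forms

\[
\text{CP:}\quad x_{ijk} \approx \sum_{d=1}^{J} a_{id}\, b_{jd}\, c_{kd},
\qquad
\text{Tucker:}\quad x_{ijk} \approx \sum_{l=1}^{J_1}\sum_{m=1}^{J_2}\sum_{n=1}^{J_3} g_{lmn}\, a_{il}\, b_{jm}\, c_{kn}.
\]

CP is thus the special case of Tucker with a diagonal core ($J_1 = J_2 = J_3 = J$ and $g_{lmn} = 0$ unless $l = m = n$), which is what makes the interpolation between the two models on slide 6 possible.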

Slide 5: The CP model is unique (up to trivial scaling and permutation); the Tucker model is not. What constitutes an adequate number of components, i.e. determining J for CP and J_1, J_2, ..., J_N for Tucker, is an open problem. It is particularly difficult for the Tucker model, as the number of components is specified for each modality separately.

Slide 6: Agenda.
- Use sparse coding to simplify the Tucker core, forming a unique representation and enabling interpolation between the Tucker (full core) and CP (diagonal core) models.
- Use sparse coding to turn off excess components in the CP and Tucker models, thereby selecting the model order.
- Tune the pruning strength from the data by Automatic Relevance Determination (ARD).

Slide 7: Sparse coding (Olshausen & Field, Nature, 1996). In its Lasso form the objective is

\[
\min_{\mathbf{A},\mathbf{S}} \;\underbrace{\|\mathbf{X} - \mathbf{A}\mathbf{S}\|_F^2}_{\text{preserve information}} \;+\; \lambda\, \underbrace{\|\mathbf{S}\|_1}_{\text{preserve sparsity (simplicity)}},
\]

with $\lambda$ the tradeoff parameter.

Slide 8: Sparse coding (cont.). [Figure: a natural image is divided into patches, each patch vectorized into a column of X, giving image patches 1 to N.] The learned dictionary A corresponds to Gabor-like features resembling simple-cell behavior in V1 of the brain. Sparse coding attempts to learn both the dictionary and the encoding at the same time (this joint problem is not convex).
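A minimal sketch of what "learn dictionary and encoding at the same time" looks like in practice: alternate an ISTA pass on the codes with a least-squares dictionary update. This is a generic illustration under standard choices (our function name, unit-norm atoms), not the procedure of the talk.

```python
import numpy as np

def sparse_coding(X, J, lam=0.1, outer=50, inner=10, seed=0):
    """Alternating minimization for min_{A,S} 0.5*||X - A S||_F^2 + lam*||S||_1.
    X: (I, N) matrix whose columns are vectorized image patches."""
    rng = np.random.default_rng(seed)
    I, N = X.shape
    A = rng.standard_normal((I, J))
    A /= np.linalg.norm(A, axis=0)          # unit-norm dictionary atoms
    S = np.zeros((J, N))
    for _ in range(outer):
        # Encoding step: ISTA iterations on the Lasso problem for S.
        step = 1.0 / (np.linalg.norm(A.T @ A, 2) + 1e-12)
        for _ in range(inner):
            V = S - step * (A.T @ (A @ S - X))
            S = np.sign(V) * np.maximum(np.abs(V) - step * lam, 0)
        # Dictionary step: least squares in A, then renormalize the atoms
        # (rescaling S accordingly so the product A @ S is unchanged).
        A = X @ np.linalg.pinv(S)
        norms = np.maximum(np.linalg.norm(A, axis=0), 1e-12)
        A /= norms
        S *= norms[:, None]
    return A, S
```

The non-convexity mentioned on the slide shows up here directly: each sub-problem (S with A fixed, A with S fixed) is convex, but the joint problem is not, so the alternation only finds a local optimum.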

Slide 9: Automatic Relevance Determination (ARD). ARD is a hierarchical Bayesian approach widely used for model selection. In ARD, hyper-parameters explicitly represent the relevance of different features by defining their range of variation: if the range of variation goes to 0, the feature is removed.
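In symbols, a standard ARD hierarchy (the talk's exact parameterization may differ) gives each feature $s_j$ its own hyper-parameter $\alpha_j$ controlling its range of variation,

\[
p(s_j \mid \alpha_j) = \tfrac{\alpha_j}{2}\, e^{-\alpha_j |s_j|}
\ \text{(Laplace)}
\qquad\text{or}\qquad
p(s_j \mid \alpha_j) = \mathcal{N}(s_j \mid 0, \alpha_j^{-1})
\ \text{(Gaussian)},
\]

and the $\alpha_j$ are estimated from the data; driving $\alpha_j \to \infty$ collapses the range of variation to 0 and removes the $j$-th feature.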

Slide 10: A Bayesian formulation of the Lasso / BPD problem.
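The slide's equations were not captured in the transcript; the standard Bayesian reading they express is: with a Gaussian likelihood and i.i.d. Laplace priors on the coefficients, the negative log posterior is

\[
-\log p(\mathbf{S} \mid \mathbf{X}, \mathbf{A})
= \tfrac{1}{2\sigma^2}\,\|\mathbf{X} - \mathbf{A}\mathbf{S}\|_F^2 + \lambda\, \|\mathbf{S}\|_1 + \text{const},
\]

so MAP estimation recovers exactly the Lasso / basis pursuit denoising objective of slide 7, with $\lambda$ set by the prior width.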

Slide 11: ARD is in reality an L0-norm optimization scheme. ARD based on a Laplace prior corresponds to L0-norm optimization by re-weighted L1-norm minimization; L0-norm optimization by re-weighted L2 follows by imposing a Gaussian prior instead of the Laplace prior.
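For orientation, a common way to write the re-weighted L1 scheme (this specific update is the Candès-Wakin-Boyd form; the ARD updates in the talk follow from the MAP objective) alternates

\[
\mathbf{s}^{(t+1)} = \arg\min_{\mathbf{s}}\, \|\mathbf{x} - \mathbf{A}\mathbf{s}\|_2^2 + \sum_j \lambda_j^{(t)} |s_j|,
\qquad
\lambda_j^{(t+1)} = \frac{c}{|s_j^{(t+1)}| + \epsilon},
\]

so coefficients that shrink toward zero receive ever larger penalties and are eventually pruned, which is what makes the scheme behave like L0-norm optimization.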

Slide 12: Sparse Tucker decomposition by ARD. The estimation breaks into standard Lasso/BPD sub-problems of the form given below.
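The sub-problem formula itself was an image. In an alternating scheme, updating the mode-$n$ factor with the core and all other factors fixed is (our reconstruction, standard for sparse Tucker) a problem of the form

\[
\min_{\mathbf{A}^{(n)}}\; \|\mathbf{X}_{(n)} - \mathbf{A}^{(n)}\mathbf{Z}^\top\|_F^2 + \lambda_n \|\mathbf{A}^{(n)}\|_1,
\qquad
\mathbf{Z}^\top = \mathbf{G}_{(n)}\,\big(\mathbf{A}^{(N)} \otimes \cdots \otimes \mathbf{A}^{(n+1)} \otimes \mathbf{A}^{(n-1)} \otimes \cdots \otimes \mathbf{A}^{(1)}\big)^\top,
\]

i.e. an ordinary Lasso/BPD problem in $\mathbf{A}^{(n)}$; the core update has the same structure with the Kronecker product running over all factors.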

Slide 13: Solving the Lasso/BPD sub-problems efficiently. Notice that, in general, the alternating sub-problems for Tucker estimation have J < I, far fewer components than observations; a sketch of how this can be exploited follows below.
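The payoff of J < I: precompute the J x J Gram matrix $\mathbf{Z}^\top\mathbf{Z}$ and the correlations $\mathbf{X}_{(n)}\mathbf{Z}$ once, after which each coordinate-descent sweep costs $O(J^2)$ per row of the factor, independent of I. The solver below is our illustration of this idea, not necessarily the one used in the paper.

```python
import numpy as np

def lasso_gram(c, G, lam, iters=100):
    """Coordinate descent for min_a ||x - Z a||_2^2 + lam * ||a||_1,
    using only c = Z^T x (length J) and G = Z^T Z (J x J): the
    I-dimensional data are never touched inside the sweeps."""
    J = G.shape[0]
    a = np.zeros(J)
    for _ in range(iters):
        for j in range(J):
            # Correlation of atom j with the residual, excluding a_j itself.
            rho = c[j] - G[j] @ a + G[j, j] * a[j]
            d = max(G[j, j], 1e-12)  # guard against a degenerate atom
            a[j] = np.sign(rho) * max(abs(rho) - lam / 2, 0) / d
    return a
```

For an I x J sub-problem with I >> J, forming G costs O(I J^2) once; everything thereafter is independent of I.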

Slide 14: The Tucker ARD algorithm. [The algorithm listing on this slide was an image and did not survive the transcript; a hedged reconstruction follows below.]
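From the surrounding slides one can reconstruct the general shape of such an algorithm: alternate Lasso-type factor updates with ARD re-weighting of the per-component penalties, plus a core update. The self-contained sketch below follows that shape for a 3-way array, with a heuristic $1/|a|$ re-weighting in the spirit of slide 11; the verbatim algorithm and its MAP-derived hyper-parameter updates are in the paper.

```python
import numpy as np

def unfold(T, n):
    """Mode-n unfolding in C order: shape (T.shape[n], product of the rest)."""
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

def tucker_ard(X, ranks, iters=100, lam0=0.1, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    A = [rng.standard_normal((X.shape[n], ranks[n])) for n in range(3)]
    lam = [lam0 * np.ones(r) for r in ranks]
    for _ in range(iters):
        # Core update by least squares (a small problem: prod(ranks) unknowns).
        K = np.kron(A[0], np.kron(A[1], A[2]))
        G = np.linalg.lstsq(K, X.reshape(-1), rcond=None)[0].reshape(ranks)
        for n in range(3):
            # Build Z^T so that unfold(X, n) ~ A[n] @ Zt (cf. slide 12).
            others = [A[m] for m in range(3) if m != n]
            Zt = unfold(G, n) @ np.kron(others[0], others[1]).T
            # One proximal-gradient (ISTA) step on the Lasso sub-problem.
            L = np.linalg.norm(Zt @ Zt.T, 2) + eps
            V = A[n] - (A[n] @ Zt - unfold(X, n)) @ Zt.T / L
            A[n] = np.sign(V) * np.maximum(np.abs(V) - lam[n] / L, 0)
            # ARD-style re-weighting: components that shrink get penalized
            # harder and are eventually pruned entirely (heuristic rule).
            lam[n] = 1.0 / (np.abs(A[n]).mean(axis=0) + eps)
    # Columns of A[n] that end up all-zero have been pruned; the surviving
    # column counts give the selected Tucker order.
    return A, G, lam
```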

Slide 15: Results. Tucker(10,10,10) models were fitted to the data; the extracted cores were shown on the slide (figure not reproduced). Panel labels: synthetic data, Tucker(3,4,5), Tucker(3,6,4), Tucker(3,3,3); fluorescence data (emission x excitation x samples), Tucker(?,4,4), Tucker(4,4,4).

Slide 16: Sparse coding is in fact the 2-way case of the Tucker model. The presented ARD framework extracts a 20-component model when trained on the natural image data of Olshausen and Field (Nature, 1996). [Figure: patches are extracted from an image and vectorized, giving image patches 1 to N, as on slide 8.]

Slide 17: Conclusion.
- Imposing sparseness on the core makes it possible to interpolate between the Tucker and CP models.
- ARD based on MAP estimation forms a simple framework for tuning the pruning in sparse coding models, thereby giving the model order.
- The ARD framework is especially useful for the Tucker model, where the order is specified for each mode separately, making exhaustive evaluation of all potential models expensive.
- ARD estimation based on MAP is closely related to L0-norm estimation based on re-weighted L1 and L2 norms. Thus, ARD forms a principled framework for learning the sparsity pattern in compressed sensing (CS).
