Tuning Pruning in Sparse Non-negative Matrix Factorization
Informatics and Mathematical Modelling / Intelligent Signal Processing, EUSIPCO'09, 27 August 2009

Presentation transcript:

Slide 1: Tuning Pruning in Sparse Non-negative Matrix Factorization
Morten Mørup, DTU Informatics, Intelligent Signal Processing, Technical University of Denmark
Joint work with Lars Kai Hansen, DTU Informatics, Intelligent Signal Processing, Technical University of Denmark
EUSIPCO'09, 27 August 2009

Slide 2: Non-negative Matrix Factorization (NMF)
V ≈ WH, with V ≥ 0, W ≥ 0, H ≥ 0 (Lee and Seung, Nature 1999).
- Gives a part-based representation (and as such also promotes sparse representations) (Lee and Seung, 1999).
- Also named Positive Matrix Factorization (PMF) (Paatero and Tapper, 1994).
- Popularized due to a simple algorithmic procedure based on multiplicative updates (Lee & Seung, 2001).
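In symbols, and as a hedged restatement of the model on the slide, NMF approximates the non-negative data matrix by a product of non-negative factors, typically by minimizing either a least-squares or a Kullback-Leibler (Poisson) cost (the two noise models that reappear on the conclusion slide):

```latex
V \approx W H, \qquad V \in \mathbb{R}_{+}^{I \times J},\; W \in \mathbb{R}_{+}^{I \times K},\; H \in \mathbb{R}_{+}^{K \times J},
```
```latex
\min_{W \ge 0,\, H \ge 0}\; \tfrac{1}{2}\,\lVert V - W H \rVert_F^2
\quad \text{or} \quad
\min_{W \ge 0,\, H \ge 0}\; \sum_{i,j} \Big( V_{ij} \log \tfrac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \Big).
```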

Slide 3: Roadmap, some important challenges in NMF
- How to efficiently compute the decomposition (NMF is a non-convex problem)
- How to resolve the non-uniqueness of the decomposition
- How to determine the number of components
(First part of this talk:) A good starting point is not to use multiplicative updates.
(Second part of this talk:) We will demonstrate that Automatic Relevance Determination in Bayesian learning can address these challenges by tuning the pruning in sparse NMF.
NMF is only unique when the data adequately spans the positive orthant (Donoho & Stodden, 2004). (Figure: three 3-D panels with x, y, z axes, labelled Convex Hull and Positive Orthant.)

Slide 4: Multiplicative updates
Multiplicative updates (Lee & Seung, 2001) can be viewed as gradient descent with a particular, multiplicative step size parameter (Salakhutdinov, Roweis, Ghahramani, 2004).
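A minimal sketch of the standard Lee & Seung multiplicative updates for the least-squares cost, for illustration only (the function name, iteration count and random initialization are assumptions, and the talk itself argues against relying on plain multiplicative updates):

```python
import numpy as np

def nmf_multiplicative(V, K, n_iter=200, eps=1e-9, seed=0):
    """Least-squares NMF, V ~ W @ H, via Lee & Seung multiplicative updates.

    Illustrative sketch only; names and defaults are assumptions, not taken
    from the talk.
    """
    rng = np.random.default_rng(seed)
    I, J = V.shape
    W = rng.random((I, K))
    H = rng.random((K, J))
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H); eps guards against division by zero
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # W <- W * (V H^T) / (W H H^T)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```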

Slide 5: Other common approaches for solving the NMF problem
- Active set procedure (analytic closed-form solution within the active set for the LS error) (Lawson and Hanson, 1974), (R. Bro and S. de Jong, 1997)
- Projected gradient (C.-J. Lin, 2007)
Multiplicative updates do not converge to the optimal solution!
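For concreteness, a hedged sketch of a single projected-gradient step on H for the least-squares cost (a fixed step size is assumed here for brevity; practical solvers such as Lin's choose it by an Armijo-type line search):

```python
import numpy as np

def projected_gradient_step_H(V, W, H, step=1e-3):
    """One projected-gradient step on H for the cost 0.5 * ||V - W H||_F^2.

    Illustrative sketch under an assumed fixed step size.
    """
    grad_H = W.T @ (W @ H - V)                 # gradient w.r.t. H
    return np.maximum(H - step * grad_H, 0.0)  # project onto the non-negative orthant
```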

Slide 6:
Sparseness has been imposed to alleviate the non-uniqueness of NMF (P. Hoyer, 2002, 2004), (J. Eggert and E. Körner, 2004). Sparseness is motivated by the principle of parsimony, i.e. forming the simplest account of the data. As such, sparseness is also related to VARIMAX and to ML-ICA based on sparse priors.

Slide 7: Open problems for Sparse NMF (SNMF)
- What is the adequate degree of sparsity imposed?
- What is the adequate number of components K to model the data?
Both issues can be posed as the single problem of tuning the pruning in sparse NMF (SNMF). Hence, by imposing a component-wise sparsity penalty λ_k, the above problems boil down to determining λ_k. Letting λ_k → ∞ results in the kth component being turned off (i.e. removed).
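A component-wise sparse NMF objective of the kind described above could, for the least-squares cost, take the following form (a sketch under assumed conventions: the exact cost and penalty in the paper may differ, and the columns of W are assumed normalized so that the penalty on H is meaningful):

```latex
\min_{W \ge 0,\, H \ge 0}\;
\tfrac{1}{2}\,\lVert V - W H \rVert_F^2
\;+\; \sum_{k=1}^{K} \lambda_k \sum_{n} H_{kn},
\qquad \lVert W_{:,k} \rVert_2 = 1 .
```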

Slide 8: Bayesian Learning and the Principle of Parsimony
- The explanation of any phenomenon should make as few assumptions as possible, eliminating those that make no difference in the observable predictions of the explanatory hypothesis or theory (William of Ockham).
- To get the posterior probability distribution, multiply the prior probability distribution by the likelihood function and then normalize (Thomas Bayes).
- Bayesian learning embodies Occam's razor, i.e. complex models are penalized (David J.C. MacKay). In the accompanying figure, the horizontal axis represents the space of possible data sets D; Bayes' rule rewards models in proportion to how much they predicted the data that occurred, and these predictions are quantified by a normalized probability distribution on D.
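In symbols, the rule stated on the slide (posterior proportional to likelihood times prior, then normalized) reads:

```latex
p(\theta \mid D) \;=\; \frac{p(D \mid \theta)\, p(\theta)}{p(D)},
\qquad
p(D) \;=\; \int p(D \mid \theta)\, p(\theta)\, d\theta .
```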

Slide 9: SNMF in a Bayesian formulation
The model is specified by a likelihood function for the data and a prior over the factors. (In the hierarchical Bayesian framework, priors on λ can further be imposed.)
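One parameterization consistent with the surrounding slides (a sketch, not necessarily the exact choice in the paper) is a Gaussian likelihood with component-wise exponential priors on H:

```latex
p(V \mid W, H, \sigma^2) \;=\; \prod_{i,n} \mathcal{N}\!\big(V_{in} \,\big|\, (WH)_{in},\, \sigma^2\big),
\qquad
p(H \mid \lambda) \;=\; \prod_{k,n} \lambda_k\, e^{-\lambda_k H_{kn}},\quad H_{kn} \ge 0 .
```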

Slide 10: The log posterior for Sparse NMF
The log posterior for Sparse NMF is now given by the log likelihood plus the log priors. The contribution to the log posterior from the normalization constant of the priors makes it possible to learn the regularization strength λ from the data (this is also known as Automatic Relevance Determination, ARD). When the optimizing value of λ is inserted back into the objective, it can be seen that ARD corresponds to a reweighted L0-norm optimization scheme over the component activations.
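To illustrate how such a closed-form λ_k and the reweighting arise, assume the exponential prior sketched above with N columns in H (if the prior also covers W, the corresponding sums would include W's entries). Maximizing the log posterior over λ_k then gives a closed-form value, and substituting it back turns the prior term into a logarithmic, L0-like penalty on each component's total activation:

```latex
\frac{\partial}{\partial \lambda_k}\Big[\, N \log \lambda_k - \lambda_k \textstyle\sum_{n} H_{kn} \Big] = 0
\;\;\Rightarrow\;\;
\lambda_k = \frac{N}{\sum_{n} H_{kn}},
\qquad
N \log \lambda_k - \lambda_k \sum_{n} H_{kn}
= -\,N \log\!\Big(\sum_{n} H_{kn}\Big) + \text{const}.
```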

Slide 11: Tuning Pruning algorithm for sparse NMF
- There is no closed-form solution for the posterior moments of W and H due to the non-negativity constraints and the use of non-conjugate priors.
- The posterior distribution can be estimated by sampling approaches, cf. the previous talk by Mikkel Schmidt.
- Point estimates of W and H can be obtained by maximum a posteriori (MAP) estimation, forming a regular sparse NMF optimization problem.
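The following is an illustrative reconstruction of such a tuning-pruning loop under the assumptions of an LS cost and exponential priors on H, using multiplicative MAP updates only for brevity (the talk recommends other solvers, and the paper also treats a KL/Poisson likelihood); it is a sketch, not the paper's exact algorithm:

```python
import numpy as np

def ard_sparse_nmf(V, K_max=25, n_iter=500, eps=1e-9, prune_tol=1e-6, seed=0):
    """ARD-style sparse NMF sketch: alternate MAP updates of W and H with a
    closed-form update of the component-wise rates lambda_k, pruning
    components whose activation vanishes."""
    rng = np.random.default_rng(seed)
    I, N = V.shape
    W = rng.random((I, K_max))
    H = rng.random((K_max, N))
    lam = np.ones(K_max)
    for _ in range(n_iter):
        # Sparse update of H: the penalty lambda_k enters the denominator
        H *= (W.T @ V) / (W.T @ W @ H + lam[:, None] + eps)
        # Update W (unpenalized here), then renormalize its columns and push
        # the scale into H so the penalty on H remains meaningful
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        scale = np.linalg.norm(W, axis=0) + eps
        W /= scale
        H *= scale[:, None]
        # ARD update: lambda_k = N / sum_n H_kn (from the exponential prior)
        lam = N / (H.sum(axis=1) + eps)
        # Prune components whose total activation has effectively vanished
        keep = H.sum(axis=1) > prune_tol
        W, H, lam = W[:, keep], H[keep], lam[keep]
    return W, H, lam
```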

Slide 12: Data results
- Handwritten digits: X is 256 pixels x 7291 digits
- CBCL face database: X is 361 pixels x 2429 faces
- Wavelet transformed EEG: X is 64 channels x time-frequency bins

Slide 13: Analyzing X vs. X^T
- Handwritten digits (X): 256 pixels x 7291 digits
- Handwritten digits (X^T): 7291 digits x 256 pixels
Depending on which orientation of the data is factorized, SNMF gives a part-based representation (as reported in Lee & Seung, 1999) and exhibits clustering-like properties (as reported in Ding, 2005).

Slide 14: Conclusion
- Bayesian learning forms a simple framework for tuning the pruning in sparse NMF, thereby both establishing the model order and resolving the non-uniqueness of the NMF representation.
- The choice of likelihood function (i.e. KL / Poisson noise vs. LS / Gaussian noise) heavily impacted the extracted number of components. In comparison, a tensor decomposition study (Mørup et al., Journal of Chemometrics, 2009) demonstrated that the choice of prior distribution has only a limited effect on model order estimation.
- Many other parameterizations of the prior, as well as approaches to parameter estimation, are conceivable. Nevertheless, Bayesian learning forms a promising framework for model order estimation as well as for resolving ambiguities in the NMF model through the tuning of the pruning.