Nonsmooth Nonnegative Matrix Factorization (nsNMF) Alberto Pascual-Montano, Member, IEEE, J.M. Carazo, Senior Member, IEEE, Kieko Kochi, Dietrich Lehmann, and Roberto D. Pascual-Marqui, IEEE 2006. Presenter: 張庭豪

Outline: INTRODUCTION, REVIEW OF NMF AND ITS SPARSE VARIANTS, NONSMOOTH NMF (nsNMF), EXPERIMENTS, CONCLUSIONS AND DISCUSSION

INTRODUCTION Nonnegative matrix factorization (NMF) has been introduced as a matrix factorization technique that produces a useful decomposition in the analysis of data. This method results in a reduced representation of the original data that can be seen either as a feature extraction or a dimensionality reduction technique. More importantly, NMF can be interpreted as a parts-based representation of the data due to the fact that only additive, not subtractive, combinations are allowed.

INTRODUCTION Formally, the nonnegative matrix decomposition can be described as $V \approx WH$, where $V \in \mathbb{R}^{p \times n}$ is a positive data matrix with p variables and n objects, $W \in \mathbb{R}^{p \times q}$ contains the q reduced basis vectors or factors, and $H \in \mathbb{R}^{q \times n}$ contains the coefficients of the linear combinations of the basis vectors needed to reconstruct the original data (also known as encoding vectors).
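As a concrete illustration of the dimensions involved, here is a minimal NumPy sketch (with hypothetical sizes p = 5, n = 20, q = 3, matching the synthetic example used later in the experiments):

```python
import numpy as np

p, n, q = 5, 20, 3                 # variables, objects, factors (illustrative sizes)
rng = np.random.default_rng(0)

W = rng.random((p, q))             # basis vectors as columns
H = rng.random((q, n))             # encoding vectors as rows
V = W @ H                          # a rank-q nonnegative data matrix

print(V.shape, W.shape, H.shape)   # (5, 20) (5, 3) (3, 20)
```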

INTRODUCTION In fact, taking a closer look at the basis and encoding vectors produced by NMF, it is noticeable that there is a high degree of overlap among the basis vectors, which contradicts the intuitive notion of "parts". In this sense, a matrix factorization technique capable of producing more localized, less overlapped feature representations of the data is highly desirable in many applications. The new method, here referred to as Nonsmooth Nonnegative Matrix Factorization (nsNMF), differs from the original in the use of an extra smoothness matrix for imposing sparseness. The goal of nsNMF is to find sparse structures in the basis functions that explain the data set.

REVIEW OF NONNEGATIVE MATRIX FACTORIZATION (NMF) AND ITS SPARSE VARIANTS $V \approx WH$, with $V \in \mathbb{R}^{p \times n}$, $W \in \mathbb{R}^{p \times q}$, and $H \in \mathbb{R}^{q \times n}$. The columns of W (the basis vectors) are normalized so that they sum to 1. The objective function, based on the Poisson likelihood, is the divergence between V and WH, which, after some simplifications and elimination of pure data terms, yields the functional that the algorithm minimizes.
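For reference, the standard Lee–Seung divergence form of this objective (reconstructed here rather than quoted from the original slides) is
\[
D(V \,\|\, WH) = \sum_{i=1}^{p}\sum_{j=1}^{n}\left( V_{ij}\,\log\frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \right),
\]
and dropping the pure data terms $V_{ij}\log V_{ij} - V_{ij}$ leaves $F(W,H) = \sum_{i,j}\left( (WH)_{ij} - V_{ij}\,\log (WH)_{ij} \right)$ to be minimized over nonnegative W and H.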

REVIEW OF NONNEGATIVE MATRIX FACTORIZATION (NMF) AND ITS SPARSE VARIANTS Taking the derivative of this functional with respect to H gives the gradient, and the gradient algorithm then updates H by a step in the negative gradient direction, for some step size $\eta$.

REVIEW OF NONNEGATIVE MATRIX FACTORIZATION (NMF) AND ITS SPARSE VARIANTS Forcing a particular, data-dependent choice of the step size turns this additive step into a multiplicative rule for H.
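In the standard derivation (reconstructed, not quoted from the slides), choosing the step size $\eta_{a\mu} = H_{a\mu} / \sum_{k} W_{ka}$ gives the multiplicative rule
\[
H_{a\mu} \leftarrow H_{a\mu}\,\frac{\sum_{i} W_{ia}\, V_{i\mu} / (WH)_{i\mu}}{\sum_{k} W_{ka}},
\]
which simplifies to $H_{a\mu} \leftarrow H_{a\mu} \sum_{i} W_{ia} V_{i\mu}/(WH)_{i\mu}$ when the columns of W sum to 1.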

REVIEW OF NONNEGATIVE MATRIX FACTORIZATION (NMF) AND ITS SPARSE VARIANTS Taking the derivative with respect to W gives the corresponding gradient, and forcing the analogous choice of step size yields the multiplicative rule for W.
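Again in the standard form (a reconstruction), the W update and the subsequent column normalization read
\[
W_{ia} \leftarrow W_{ia}\,\frac{\sum_{\mu} H_{a\mu}\, V_{i\mu}/(WH)_{i\mu}}{\sum_{\nu} H_{a\nu}},
\qquad
W_{ia} \leftarrow \frac{W_{ia}}{\sum_{k} W_{ka}}.
\]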

REVIEW OF NONNEGATIVE MATRIX FACTORIZATION (NMF) AND ITS SPARSE VARIANTS Finally, the derived algorithm is as follows: 1. Initialize W and H with positive random numbers. 2. For each basis vector $W_a \in \mathbb{R}^{p \times 1}$, update the corresponding encoding vector $H_a \in \mathbb{R}^{1 \times n}$, then update and normalize the basis vector $W_a$. Repeat this process until convergence. In pseudocode: repeat until convergence, for a = 1...q: for b = 1...n update the encoding entry $H_{ab}$; for c = 1...p update the basis entry $W_{ca}$, then normalize $W_a$.
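A minimal NumPy sketch of this procedure (my own vectorized implementation of the updates above, handling all factors at once rather than one basis vector at a time; the iteration count and the small constant eps are illustrative choices):

```python
import numpy as np

def nmf_divergence(V, q, n_iter=500, eps=1e-9, seed=0):
    """Divergence-based NMF: V (p x n) ~ W (p x q) @ H (q x n)."""
    rng = np.random.default_rng(seed)
    p, n = V.shape
    W = rng.random((p, q)) + eps              # 1. initialize with positive random numbers
    H = rng.random((q, n)) + eps
    W /= W.sum(axis=0, keepdims=True)         # basis columns sum to 1

    for _ in range(n_iter):                   # 2. repeat until convergence (fixed count here)
        R = V / (W @ H + eps)                 # element-wise ratios V_ij / (WH)_ij
        H *= W.T @ R                          # multiplicative update of the encoding vectors
        R = V / (W @ H + eps)
        W *= (R @ H.T) / (H.sum(axis=1) + eps)  # multiplicative update of the basis vectors
        W /= W.sum(axis=0, keepdims=True)     # renormalize basis columns
    return W, H
```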

REVIEW OF NONNEGATIVE MATRIX FACTORIZATION (NMF) AND ITS SPARSE VARIANTS Local Nonnegative Matrix Factorization (LNMF). The LNMF algorithm is intended for learning spatially localized, parts-based representations of visual patterns. Its aim is to obtain a truly parts-based representation of objects by imposing sparseness constraints on the encoding vectors (matrix H) and locality constraints on the basis components (matrix W). Taking the factorization problem defined above, define $A = [a_{ij}] = W^{t}W$ and $B = [b_{ij}] = HH^{t}$, where $A, B \in \mathbb{R}^{q \times q}$.

REVIEW OF NONNEGATIVE MATRIX FACTORIZATION (NMF) AND ITS SPARSE VARIANTS The LNMF algorithm is based on the following three additional constraints: 1. Maximum sparseness in H: it should contain as many zero components as possible, which implies that the number of basis components required to represent V is minimized. Mathematically, each $a_{ij}$ should be minimal. 2. Maximum expressiveness of W: this constraint is closely related to the previous one and aims at further enforcing maximum sparseness in H. Mathematically, $\sum_{i=1}^{q} b_{ii}$ should be maximal. 3. Maximum orthogonality of W: different bases should be as orthogonal as possible to minimize redundancy. This is enforced by minimizing $\sum_{i \neq j} a_{ij}$; combined with constraint 1, the objective is to minimize $\sum_{i,j} a_{ij}$.

REVIEW OF NONNEGATIVE MATRIX FACTORIZATION (NMF) AND ITS SPARSE VARIANTS Thus, the constrained divergence function augments the NMF divergence with these penalty terms, where $\alpha, \beta > 0$ are constants expressing the importance of the additional constraints described above.
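Putting the pieces together, the LNMF objective is usually written as (reconstructed from the LNMF formulation, with A and B as defined above)
\[
D_{\mathrm{LNMF}}(V \,\|\, WH) = \sum_{i,j}\left( V_{ij}\log\frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \right) + \alpha \sum_{i,j} a_{ij} - \beta \sum_{i} b_{ii},
\]
to be minimized over nonnegative W and H.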

REVIEW OF NONNEGATIVE MATRIX FACTORIZATION (NMF) AND ITS SPARSE VARIANTS Nonnegative Sparse Coding (NNSC). Similar to the LNMF algorithm, the Nonnegative Sparse Coding (NNSC) method is intended to decompose multivariate data into a set of positive sparse components. It combines a small reconstruction error with a sparseness criterion in a single objective function, where the form of $f$ defines how sparseness on H is measured and a weight $\lambda$ controls the trade-off between sparseness and the accuracy of the reconstruction.
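In its standard form (reconstructed from Hoyer's nonnegative sparse coding rather than from the slide), this objective reads
\[
E(W,H) = \tfrac{1}{2}\,\| V - WH \|_{F}^{2} + \lambda \sum_{i,j} f(H_{ij}),
\]
with $\lambda \ge 0$ and $f$ typically the identity, i.e., an $L_1$ penalty on the encodings.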

REVIEW OF NONNEGATIVE MATRIX FACTORIZATION (NMF) AND ITS SPARSE VARIANTS Sparse Nonnegative Matrix Factorization (SNMF). Instead of using a Euclidean least-squares-type functional as in NNSC, SNMF uses a divergence term, penalized for $\alpha > 0$ by the sum of all $H_{ij}$; minimizing this sum is what forces sparseness. The update rule for H is modified accordingly, while the update and normalization rules for W are the same as in standard NMF.
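Written out (a reconstruction of the usual SNMF formulation), the functional and the corresponding H update are
\[
D_{\mathrm{SNMF}}(V \,\|\, WH) = \sum_{i,j}\left( V_{ij}\log\frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \right) + \alpha \sum_{i,j} H_{ij},
\qquad
H_{a\mu} \leftarrow H_{a\mu}\,\frac{\sum_{i} W_{ia}\, V_{i\mu}/(WH)_{i\mu}}{1 + \alpha};
\]
the $(1+\alpha)$ denominator shrinks the encodings toward zero, which is the source of the sparseness.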

OUR PROPOSAL: NONSMOOTH NONNEGATIVE MATRIX FACTORIZATION (nsNMF) Because of the multiplicative nature of the model, i.e., “basis” multiplied by “encoding,” sparseness in one of the factors will almost certainly force “nonsparseness” or smoothness in the other. On the other hand, forcing sparseness constraints on both the basis and the encoding vectors will deteriorate the goodness of fit of the model to the data. Therefore, from the outset, this approach is doomed to failure in achieving generalized sparseness and satisfactory goodness of fit.

OUR PROPOSAL: NONSMOOTH NONNEGATIVE MATRIX FACTORIZATION (nsNMF) The new model proposed in this study, denoted as "NonSmooth Nonnegative Matrix Factorization" (nsNMF), is defined as $V \approx WSH$, where V, W, and H are the same as in the original NMF model ($V \approx WH$). The positive symmetric matrix $S \in \mathbb{R}^{q \times q}$ is a "smoothing" matrix defined as $S = (1-\theta)\,I + \frac{\theta}{q}\,\mathbf{1}\mathbf{1}^{T}$, where $I$ is the identity matrix, $\mathbf{1}$ is a vector of ones, and the parameter satisfies $0 \le \theta \le 1$.

OUR PROPOSAL: NONSMOOTH NONNEGATIVE MATRIX FACTORIZATION (nsNMF) The interpretation of S as a smoothing matrix can be explained as follows: let X be a positive, nonzero vector and consider the transformed vector Y = SX. If $\theta = 0$, then Y = X and no smoothing of X has occurred. However, as $\theta \to 1$, the vector Y tends to the constant vector with all elements almost equal to the average of the elements of X. This is the smoothest possible vector in the sense of "nonsparseness", because all entries take the same nonzero value instead of some being close to zero and others clearly nonzero. Note that the parameter $\theta$ controls the extent of smoothness of the matrix operator S.
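A tiny NumPy check of this behavior (illustrative vector and θ values; S follows the definition above):

```python
import numpy as np

def smoothing_matrix(q, theta):
    """S = (1 - theta) * I + (theta / q) * 11^T, with 0 <= theta <= 1."""
    return (1 - theta) * np.eye(q) + (theta / q) * np.ones((q, q))

X = np.array([4.0, 0.0, 0.0, 0.0])       # a sparse, nonnegative example vector
for theta in (0.0, 0.5, 1.0):
    print(theta, smoothing_matrix(len(X), theta) @ X)
# theta = 0.0 -> [4. 0. 0. 0.]    (unchanged)
# theta = 0.5 -> [2.5 0.5 0.5 0.5]
# theta = 1.0 -> [1. 1. 1. 1.]    (every entry equals the mean of X)
```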

OUR PROPOSAL: NONSMOOTH NONNEGATIVE MATRIX FACTORIZATION (nsNMF) However, due to the multiplicative nature of the model, strong smoothing in S will force strong sparseness in both the basis and the encoding vectors; therefore, the parameter $\theta$ controls the sparseness of the model. Writing $V = (WS)H = W(SH)$: nonsparseness in the basis W forces sparseness in the encoding H, and, at the same time, nonsparseness in the encoding H forces sparseness in the basis W. The algorithm is obtained from the NMF updates as follows: 1. In the update equation for H, substitute W with WS. 2. In the update equation for W, substitute H with SH. 3. The normalization of W remains the same.
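A self-contained NumPy sketch of these modified updates (my own implementation of the substitution described above, not the authors' code; iteration count and eps are illustrative):

```python
import numpy as np

def nsnmf(V, q, theta=0.5, n_iter=500, eps=1e-9, seed=0):
    """Nonsmooth NMF sketch: V ~ W S H with S = (1-theta) I + (theta/q) 11^T."""
    rng = np.random.default_rng(seed)
    p, n = V.shape
    S = (1 - theta) * np.eye(q) + (theta / q) * np.ones((q, q))
    W = rng.random((p, q)) + eps
    H = rng.random((q, n)) + eps
    W /= W.sum(axis=0, keepdims=True)            # basis columns sum to 1

    for _ in range(n_iter):
        WS = W @ S                                # smoothed basis, used to update H
        H *= WS.T @ (V / (WS @ H + eps))          # columns of WS also sum to 1 here
        SH = S @ H                                # smoothed encoding, used to update W
        R = V / (W @ SH + eps)
        W *= (R @ SH.T) / (SH.sum(axis=1) + eps)  # update of the basis vectors
        W /= W.sum(axis=0, keepdims=True)         # renormalize basis columns
    return W, H
```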

EXPERIMENTS As mentioned in the previous section, the multiplicative nature of the sparse variants of the NMF model produces a paradoxical effect: imposing sparseness in one of the factors will almost certainly force smoothness in the other in an attempt to reproduce the data as well as possible. Additionally, forcing sparseness constraints on both the basis and the encoding vectors will decrease the variance of the data explained by the model. Different NMF-type methods were applied to the same randomly generated positive data set (5 variables, 20 items, rank = 3); Table 1 shows the results when using exactly three factors in all cases.
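To reproduce the spirit of this comparison (not the paper's numbers), one could generate such a data set and run the two sketches given earlier, for example:

```python
import numpy as np

rng = np.random.default_rng(1)
V = rng.random((5, 3)) @ rng.random((3, 20))   # random positive data: 5 variables, 20 items, rank 3

W0, H0 = nmf_divergence(V, q=3)                # plain NMF sketch from above
W1, H1 = nsnmf(V, q=3, theta=0.5)              # nsNMF sketch from above

S = 0.5 * np.eye(3) + (0.5 / 3) * np.ones((3, 3))
print("NMF error:  ", np.linalg.norm(V - W0 @ H0))
print("nsNMF error:", np.linalg.norm(V - W1 @ S @ H1))
print("near-zero entries in nsNMF basis:", np.mean(W1 < 1e-3))   # rough sparseness indicator
```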

EXPERIMENTS (Table 1: results of the different NMF-type methods on the synthetic data set.)

EXPERIMENTS Swimmer data set (basis images). NMF failed to extract the 16 limbs and the torso, while nsNMF successfully explained the data using one factor for each independent part. NMF extracts parts of the data in a more holistic manner, while nsNMF represents the same reality sparsely.

EXPERIMENTS CBCL faces data set: 49 basis images (not sparse).

EXPERIMENTS NMF Results. Fig. 3(a) shows the results of the Lee and Seung algorithm applied to the facial database using 49 factors. Even though the factor images give an intuitive notion of a parts-based representation of the original faces, the factorization is not sparse enough to represent unique parts of an average face. In other words, the NMF algorithm allows some undesirable overlapping of parts, especially in areas that are common to most of the faces in the input data.

EXPERIMENTS nsNMF Results, for increasing values of the smoothing parameter θ (percentage of explained variance): θ = 0.5: 83.84; θ = 0.6: 80.69; θ = 0.7: 78.17; θ = 0.8: 76.44.

EXPERIMENTS nsNMF basis images for θ = 0.5, 0.6, 0.7, and 0.8, with the model written as V = (WS)H. As θ increases, W becomes smoother and H becomes sparser.

CONCLUSIONS AND DISCUSSION The approach presented here attempts to improve on the classical NMF algorithm by producing truly sparse components of the data structure. The experimental results on both synthetic and real data sets have shown that the nsNMF algorithm outperformed the existing sparse NMF variants in producing parts-based representations of the data while maintaining the goodness of fit.