Non-Negative Matrix Factorization


1 Non-Negative Matrix Factorization

2 A Quick Review of Linear Algebra
Every vector can be expressed as a linear combination of basis vectors. We can think of images as big vectors (raster-scan the image into a vector). This means we can express an image as a linear combination of a set of basis images.

3 Pixel vector

4 Problem Statement Given a set of images:
Create a set of basis images that can be linearly combined to create new images. Find the set of weights that reproduces each input image from the basis images (one set of weights per input image). A minimal sketch of this setup follows.
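
A minimal NumPy sketch of this setup (the sizes, 19x19-pixel images, 100 images, and r = 25 basis images, are made-up placeholders, and W and H are just random initial guesses here):

import numpy as np

# Hypothetical data: 100 images of 19x19 pixels, each raster-scanned into a
# column of the data matrix V (361 x 100), with nonnegative pixel values.
rng = np.random.default_rng(0)
V = rng.random((19 * 19, 100))

# The factorization problem: find r basis images (columns of W) and one column
# of weights per input image (columns of H) such that V is approximately W @ H.
r = 25
W = rng.random((V.shape[0], r))   # basis images
H = rng.random((r, V.shape[1]))   # weights, one column per input image

reconstruction = W @ H                       # each column approximates one image
error = np.linalg.norm(V - reconstruction)   # what the factorization tries to shrink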

5 Three ways to do this are discussed
Vector Quantization, Principal Components Analysis, and Non-negative Matrix Factorization. Each method optimizes a different aspect.

6 Vector Quantization The reconstructed image is the basis image that is closest to the input image.
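
A small sketch of VQ reconstruction under this description (pure NumPy; the function and variable names are mine, and the codebook of basis images is assumed to be given):

import numpy as np

def vq_reconstruct(V, codebook):
    # V: (pixels x images), codebook: (pixels x basis_images).
    # Each input image is reconstructed as the single basis image (codebook
    # column) that is closest to it in Euclidean distance.
    d = np.linalg.norm(V[:, None, :] - codebook[:, :, None], axis=0)  # (r, n)
    nearest = d.argmin(axis=0)        # index of the closest basis image per input
    return codebook[:, nearest]       # reconstruction = the chosen basis image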

7 What’s wrong with VQ? Limited by the number of basis images
Not very useful for analysis

8 PCA Find a set of orthogonal basis images
The reconstructed image is a linear combination of the basis images
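
A sketch of the PCA version, using the SVD to obtain an orthogonal basis (variable names are mine; images are the columns of V):

import numpy as np

def pca_reconstruct(V, r):
    # Orthogonal basis images = top-r left singular vectors of the centered data.
    mean = V.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(V - mean, full_matrices=False)
    basis = U[:, :r]                  # orthogonal basis images
    coeffs = basis.T @ (V - mean)     # coefficients may be positive or negative
    return mean + basis @ coeffs      # linear combination of the basis images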

9 What don’t we like about PCA?
PCA involves adding up some basis images and then subtracting others. The basis images aren't physically intuitive, and subtracting doesn't make sense in the context of some applications: how do you subtract a face? What does subtraction mean in the context of document classification?

10 Non-negative Matrix Factorization
Like PCA, except the coefficients in the linear combination cannot be negative
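
For context, scikit-learn ships such a factorization; a minimal usage sketch (the data and parameter values here are illustrative only):

import numpy as np
from sklearn.decomposition import NMF

# Illustrative nonnegative data: 100 images of 361 pixels, one image per row.
V = np.random.default_rng(0).random((100, 361))

model = NMF(n_components=25, init='random', random_state=0, max_iter=500)
W = model.fit_transform(V)   # nonnegative weights, one row per input image
H = model.components_        # nonnegative basis images, one per row
reconstruction = W @ H       # every image is rebuilt by only *adding* basis images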

11 Nonnegative Matrix Factorization (NMF)
Data matrix: n points in p-dimensional space, X = (x_1, ..., x_n), where each column x_i is an image, document, webpage, etc. Factorization (low-rank approximation): X ≈ F G^T, with nonnegative matrices F >= 0 and G >= 0.

12 Some historical notes Earlier work by statistics people (cf. G. Golub)
P. Paatero (1994), Environmetrics. Lee and Seung (1999, 2000): parts of a whole (no cancellation), and a multiplicative update algorithm.

13 Lee and Seung (1999): Parts-of-whole Perspective
[Figure: a matrix of face images expressed as a product of basis images and encodings]

14 “Parts of Whole” Picture
Straightforward NMF doesn't get parts-of-whole. Several authors explicitly sparsify F to get parts-of-whole (Li et al., 2001; Hoyer, 2003). Donoho & Stodden (2003) study conditions for parts-of-whole.

15 NMF Basis Images Only allowing adding of basis images makes intuitive sense Has physical analogue in neurons Forcing the reconstruction coefficients to be positive leads to nice basis images To reconstruct images, all you can do is add in more basis images This leads to basis images that represent parts


17 PCA vs NMF
PCA is designed for producing optimal (in some sense) basis images, but just because it's optimal doesn't mean it's good for your application. NMF is designed for producing coefficients with a specific property; forcing the coefficients to behave induces "nice" basis images (there is no SI unit for "nice").

18 The cool idea By constraining the weights, we can control how the basis images wind up. In this case, constraining the weights leads to "parts-based" basis images.

19 How do we derive the update rules?
This is in the NIPS paper. The error function is E = ||V - WH||^2 (the notation is chosen to match the NIPS paper). Use gradient descent to find a local minimum. The gradient descent update rule is H_{aj} <- H_{aj} + eta_{aj} [ (W^T V)_{aj} - (W^T W H)_{aj} ].

20 Deriving Update Rules
Gradient descent rule: H_{aj} <- H_{aj} + eta_{aj} [ (W^T V)_{aj} - (W^T W H)_{aj} ]. Set the step size to eta_{aj} = H_{aj} / (W^T W H)_{aj}. The update rule then becomes the multiplicative rule H_{aj} <- H_{aj} (W^T V)_{aj} / (W^T W H)_{aj}.
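
A compact NumPy sketch of these multiplicative updates for the squared error (my variable names; a small constant eps guards against division by zero):

import numpy as np

def nmf_multiplicative(V, r, n_iter=200, eps=1e-9):
    # Lee-Seung style multiplicative updates for min ||V - W H||^2.
    # Starting from nonnegative W and H, the updates only rescale entries,
    # so W and H stay nonnegative throughout.
    rng = np.random.default_rng(0)
    p, n = V.shape
    W = rng.random((p, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H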

21 What’s significant about this?
This is a multiplicative update instead of an additive update. If the initial values of W and H are all non-negative, then W and H can never become negative. This lets us produce a non-negative factorization (see the NIPS paper for the full proof that this converges).

22 KL-divergence
The alternative error function is the divergence D(V || WH) = sum_{ij} [ V_{ij} log( V_{ij} / (WH)_{ij} ) - V_{ij} + (WH)_{ij} ], which reduces to the KL divergence when V and WH are normalized to sum to 1.

23 Update rules for KL Divergence
H_{aj} <- H_{aj} [ sum_i W_{ia} V_{ij} / (WH)_{ij} ] / [ sum_k W_{ka} ] and W_{ia} <- W_{ia} [ sum_j H_{aj} V_{ij} / (WH)_{ij} ] / [ sum_k H_{ak} ].
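
The corresponding multiplicative updates for the divergence objective, as a hedged NumPy sketch (again with my variable names and a small eps for numerical safety):

import numpy as np

def nmf_divergence(V, r, n_iter=200, eps=1e-9):
    # Multiplicative updates for the divergence D(V || WH).
    rng = np.random.default_rng(0)
    p, n = V.shape
    W = rng.random((p, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0, keepdims=True).T + eps)
        W *= ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1, keepdims=True).T + eps)
    return W, H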

24 How do we know that this will converge?
If F is the objective function, let G be an auxiliary function for F: G(h, h') is an auxiliary function if G(h, h') >= F(h) and G(h, h) = F(h). If G is an auxiliary function of F, then F is non-increasing under the update h^{t+1} = argmin_h G(h, h^t), since F(h^{t+1}) <= G(h^{t+1}, h^t) <= G(h^t, h^t) = F(h^t).


26 Auxiliary Function


28 How do we know that this will converge?
Let the auxiliary function be G(h, h^t) = F(h^t) + (h - h^t)^T grad F(h^t) + (1/2)(h - h^t)^T K(h^t)(h - h^t), where K(h^t) is the diagonal matrix with K_{aa}(h^t) = (W^T W h^t)_a / h^t_a. Then the update is h^{t+1} = argmin_h G(h, h^t) = h^t - K(h^t)^{-1} grad F(h^t), which results in the multiplicative update rule h_a <- h_a (W^T v)_a / (W^T W h)_a.


36 Auxiliary function for divergence


40 Main Contributions The idea that representations which allow negative weights do not make sense in some applications. A simple system for producing basis images with non-negative weights. It points out that this leads to basis images that are based on parts. The larger point is that by constraining the problem in new ways, we can induce nice properties.


42 Meanwhile… Several studies empirically show the usefulness of NMF for pattern discovery and clustering. Research shows that NMF factors give holistic pictures of the data, i.e., NMF is doing data clustering.

43 NMF Gives Holistic Pictures I
F factors

44 NMF Gives Holistic Pictures II
[Figure: F factors alongside the original data]

45 NMF is doing “Data Clustering”
NMF => K-means Clustering

46 NMF-Kmeans Theorem G-orthogonal NMF is equivalent to relaxed K-means clustering. Proof: (Ding, He, Simon, SDM 2005).
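
A hedged illustration of this connection using scikit-learn (this is plain NMF rather than the relaxed orthogonal NMF of the theorem, and the dataset is a toy example, but on well-separated blobs the argmax over the NMF factor typically matches the K-means labels up to a permutation of cluster ids):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import NMF
from sklearn.cluster import KMeans

# Toy data with 3 well-separated groups, shifted to be nonnegative.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)
X = X - X.min()

G = NMF(n_components=3, init='nndsvd', random_state=0, max_iter=500).fit_transform(X)
nmf_labels = G.argmax(axis=1)                                   # cluster read off G
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
# Up to relabeling, nmf_labels and km_labels largely agree on this data.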

47 K-means clustering Computationally efficient (order mN)
Most widely used in practice; a benchmark to evaluate other algorithms. Given n points in m dimensions, the K-means objective is J = sum_{k=1..K} sum_{i in C_k} ||x_i - m_k||^2, where m_k is the centroid of cluster C_k. Also called "isodata" or "vector quantization". Developed in the 1960s (Lloyd, MacQueen, Hartigan, etc.).

48 Reformulate K-means Clustering
Cluster membership indicators: H = (h_1, ..., h_K), where h_k is the normalized indicator vector of cluster k. Solving K-means then reduces to a trace optimization over these indicators (Zha, Ding, Gu, He, Simon, NIPS 2001; Ding & He, ICML 2004); the reformulation is written out below.
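
A reconstruction of the formulation these citations refer to, in my own notation (X holds the data points as columns and H is the n x K normalized indicator matrix with H_{ik} = 1/sqrt(n_k) if x_i belongs to cluster k):

J_K = \sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - m_k \rVert^2
    = \operatorname{Tr}(X^{\top} X) - \operatorname{Tr}(H^{\top} X^{\top} X H),
\qquad H^{\top} H = I, \; H \ge 0.

Since Tr(X^T X) is fixed by the data, minimizing J_K is the same as maximizing Tr(H^T X^T X H) over nonnegative orthogonal H, which is the form that connects K-means to G-orthogonal NMF.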

49 Reformulate K-means Clustering
[Figure: cluster membership indicator vectors for clusters C1, C2, C3]

50 Orthogonality in NMF
Strictly orthogonal G: hard clustering. Non-orthogonal G: soft clustering, which accommodates ambiguous/outlier points.

51 K-means Clustering Theorem
G-orthogonal NMF is equivalent to relaxed K-means clustering. Requires only G-orthogonality and nonnegativity: F gives the cluster centroids and G gives the cluster indicators (Ding, Li, Jordan, 2006).

52 NMF Generalizations
SVD: X ≈ F G^T with F and G of mixed sign. Semi-NMF: X ≈ F G^T with G >= 0 and F unconstrained (Ding, Li, Jordan, 2006). Convex-NMF: F = X W with W >= 0, so each basis vector is a nonnegative combination of data points. Kernel-NMF: the convex formulation applied to a kernel matrix. Tri-NMF: X ≈ F S G^T (Ding, Li, Peng, Park, KDD 2006).

53 Orthogonal Nonnegative Tri-Factorization
3-factor NMF, X ≈ F S G^T, with explicit orthogonality constraints F^T F = I and G^T G = I. 1. The solution is unique. 2. It cannot be reduced to 2-factor NMF. This performs simultaneous K-means clustering of rows and columns: F holds the row cluster indicators and G holds the column cluster indicators (Ding, Li, Peng, Park, KDD 2006).

54 Semi-NMF: For any mixed-sign input data (centered data)
Clustering and low-rank approximation, X ≈ F G^T with G >= 0. Update F: F = X G (G^T G)^{-1}. Update G: G_{ik} <- G_{ik} sqrt( [ (X^T F)^+_{ik} + (G (F^T F)^-)_{ik} ] / [ (X^T F)^-_{ik} + (G (F^T F)^+)_{ik} ] ), where A^+ and A^- denote the elementwise positive and negative parts of A (Ding, Li, Jordan, 2006). A code sketch of these updates follows.
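
A NumPy sketch of these alternating updates (names and defaults are mine; it is intended only to mirror the formulas above, not as a tuned implementation):

import numpy as np

def semi_nmf(X, r, n_iter=200, eps=1e-9):
    # Semi-NMF: X may have mixed sign, F is unconstrained, G is kept nonnegative.
    rng = np.random.default_rng(0)
    G = rng.random((X.shape[1], r))            # nonnegative factor
    pos = lambda A: (np.abs(A) + A) / 2        # elementwise positive part
    neg = lambda A: (np.abs(A) - A) / 2        # elementwise negative part
    for _ in range(n_iter):
        F = X @ G @ np.linalg.pinv(G.T @ G)    # least-squares update of F
        XtF, FtF = X.T @ F, F.T @ F
        G *= np.sqrt((pos(XtF) + G @ neg(FtF)) /
                     (neg(XtF) + G @ pos(FtF) + eps))
    return F, G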

55 Proof of the Semi-NMF Algorithm Correctness
Solve: min_{G >= 0} ||X - F G^T||^2. Constrained optimization: form the Lagrangian L = ||X - F G^T||^2 - Tr(Lambda G^T) and apply the KKT condition ( -X^T F + G F^T F )_{ik} G_{ik} = 0. The update rule satisfies this fixed-point condition.

56 Proof of the Semi-NMF Algorithm Convergence
Use an auxiliary function Z(G, G^t) with Z(G, G^t) >= J(G) and Z(G, G) = J(G), and let G^{t+1} = argmin_G Z(G, G^t). Then the objective J is non-increasing under the Semi-NMF update.

57 Other NMF Algorithms Alternating Least Squares
Projected Gradient Descent
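
A minimal sketch of the alternating-least-squares idea (solve each least-squares subproblem exactly, then clip negative entries to zero; this is a naive variant, with my own names and defaults):

import numpy as np

def nmf_als(V, r, n_iter=200):
    rng = np.random.default_rng(0)
    p, n = V.shape
    W = rng.random((p, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        # Solve W H ~= V for H, then H^T W^T ~= V^T for W; clip to stay nonnegative.
        H = np.clip(np.linalg.lstsq(W, V, rcond=None)[0], 0, None)
        W = np.clip(np.linalg.lstsq(H.T, V.T, rcond=None)[0].T, 0, None)
    return W, H

Projected gradient methods work similarly: take a gradient step on W or H, then project back onto the nonnegative orthant.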


