Non-Negative Matrix Factorization
A Quick Review of Linear Algebra
Every vector can be expressed as a linear combination of basis vectors.
We can think of images as big vectors (raster-scan each image into a vector).
This means we can express an image as a linear combination of a set of basis images.
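This idea can be sketched in a few lines of NumPy (an illustrative sketch, not from the slides; the 4x4 image and the standard basis are arbitrary choices):

```python
import numpy as np

# Raster-scan a small 4x4 "image" into a 16-dimensional pixel vector.
image = np.arange(16, dtype=float).reshape(4, 4)
pixel_vector = image.reshape(-1)           # shape (16,)

# With any full basis (here: the standard basis), the image is exactly
# a linear combination of basis images.
basis_images = np.eye(16)                  # 16 basis images, one per pixel
weights = basis_images.T @ pixel_vector    # one weight per basis image
reconstruction = basis_images @ weights

assert np.allclose(reconstruction, pixel_vector)
```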
Pixel vector
Problem Statement
Given a set of images:
Create a set of basis images that can be linearly combined to create new images.
Find the set of weights that reproduces every input image from the basis images.
One set of weights for each input image.
Three Ways to Do This
Vector Quantization (VQ)
Principal Components Analysis (PCA)
Non-negative Matrix Factorization (NMF)
Each method optimizes a different aspect.
Vector Quantization
The reconstructed image is the basis image that is closest to the input image.
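As a minimal sketch (illustrative data and dimensions, not from the slides), VQ reconstruction just selects the nearest basis image:

```python
import numpy as np

rng = np.random.default_rng(0)
basis_images = rng.random((5, 16))   # 5 basis images, 16 pixels each
input_image = rng.random(16)

# VQ: the reconstruction is the single closest basis image.
dists = np.linalg.norm(basis_images - input_image, axis=1)
reconstruction = basis_images[np.argmin(dists)]
```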
What’s wrong with VQ? Limited by the number of basis images
Not very useful for analysis
PCA
Find a set of orthogonal basis images.
The reconstructed image is a linear combination of the basis images.
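A minimal PCA sketch via the SVD (illustrative shapes only; note that the coefficients can be negative, which is the point of the next slide):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((20, 16))                 # 20 images (rows), 16 pixels each
mean = X.mean(axis=0)
Xc = X - mean                            # PCA operates on centered data

# Orthogonal basis images = top-k right singular vectors.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 4
basis_images = Vt[:k]                    # (k, 16), mutually orthonormal
coeffs = Xc @ basis_images.T             # may contain negative entries
reconstruction = coeffs @ basis_images + mean
```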
What don't we like about PCA?
PCA involves adding up some basis images, then subtracting others.
The basis images aren't physically intuitive.
Subtracting doesn't make sense in the context of some applications:
How do you subtract a face?
What does subtraction mean in the context of document classification?
Non-negative Matrix Factorization
Like PCA, except the coefficients in the linear combination cannot be negative
Nonnegative Matrix Factorization (NMF)
Data matrix X = (x_1, ..., x_n): n points in p-dimensional space. Each x_i is an image, document, webpage, etc.
Factorization (low-rank approximation): X ≈ F G^T, where both factors are nonnegative matrices (F ≥ 0, G ≥ 0).
Some Historical Notes
Earlier work by statisticians (cf. G. Golub)
P. Paatero (1994), Environmetrics
Lee and Seung (1999, 2000): parts of whole (no cancellation); a multiplicative update algorithm
Lee and Seung (1999): Parts-of-whole Perspective
V = matrix of images: each column of V is one image. Factor V ≈ W H, where the columns of W are basis images and the columns of H are the per-image encodings.
“Parts of Whole” Picture
(Li et al., 2001; Hoyer, 2003) Straightforward NMF doesn't give parts-of-whole.
Several papers explicitly sparsify F to get parts-of-whole.
Donoho & Stodden (2003) study conditions under which NMF recovers parts-of-whole.
NMF Basis Images
Only allowing addition of basis images makes intuitive sense, and has a physical analogue in neurons.
Forcing the reconstruction coefficients to be positive leads to nice basis images:
To reconstruct an image, all you can do is add in more basis images.
This leads to basis images that represent parts.
PCA vs. NMF
PCA: designed to produce optimal (in some sense) basis images. Just because it's optimal doesn't mean it's good for your application.
NMF: designed to produce coefficients with a specific property. Forcing the coefficients to behave induces "nice" basis images. (There is no SI unit for "nice".)
The Cool Idea
By constraining the weights, we can control how the basis images wind up.
In this case, constraining the weights leads to "parts-based" basis images.
How do we derive the update rules?
This is in the NIPS paper. The error function (chosen to match the NIPS paper) is the squared reconstruction error:
E = ||V − W H||^2 = Σ_iμ (V_iμ − (W H)_iμ)^2
Use gradient descent to find a local minimum. The gradient descent update rule is:
H_aμ ← H_aμ + η_aμ [ (W^T V)_aμ − (W^T W H)_aμ ]
Deriving Update Rules
Gradient descent rule: H_aμ ← H_aμ + η_aμ [ (W^T V)_aμ − (W^T W H)_aμ ]
Set η_aμ = H_aμ / (W^T W H)_aμ
The update rule becomes:
H_aμ ← H_aμ (W^T V)_aμ / (W^T W H)_aμ, and symmetrically W_ia ← W_ia (V H^T)_ia / (W H H^T)_ia
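A minimal NumPy sketch of these multiplicative updates (illustrative, not the authors' code; a small `eps` guards against division by zero):

```python
import numpy as np

def nmf(V, r, iters=200, eps=1e-9, seed=0):
    """Multiplicative updates for min ||V - WH||^2 with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r))
    H = rng.random((r, m))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # H stays nonnegative
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # W stays nonnegative
    return W, H

V = np.random.default_rng(1).random((30, 20))
W, H = nmf(V, r=5)
```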
What’s significant about this?
This is a multiplicative update instead of an additive update. If the initial values of W and H are all non-negative, then W and H can never become negative. This lets us produce a non-negative factorization. (See the NIPS paper for the full proof that this converges.)
KL-Divergence
An alternative error function is the (generalized) KL divergence:
D(V || W H) = Σ_iμ [ V_iμ log( V_iμ / (W H)_iμ ) − V_iμ + (W H)_iμ ]
Update Rules for KL Divergence
H_aμ ← H_aμ [ Σ_i W_ia V_iμ / (W H)_iμ ] / Σ_k W_ka
W_ia ← W_ia [ Σ_μ H_aμ V_iμ / (W H)_iμ ] / Σ_ν H_aν
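A NumPy sketch of the KL-divergence updates (illustrative; `eps` avoids division by zero, and strictly positive data keeps the divergence finite):

```python
import numpy as np

def nmf_kl(V, r, iters=200, eps=1e-9, seed=0):
    """Multiplicative updates for the generalized KL divergence D(V || WH)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r))
    H = rng.random((r, V.shape[1]))
    for _ in range(iters):
        # H_au <- H_au * sum_i W_ia V_iu/(WH)_iu / sum_k W_ka
        H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
        # W_ia <- W_ia * sum_u H_au V_iu/(WH)_iu / sum_v H_av
        W *= ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

V = np.random.default_rng(1).random((30, 20)) + 0.1   # strictly positive data
W, H = nmf_kl(V, r=5)
```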
How do we know that this will converge?
If F is the objective function, let G(h, h') be an auxiliary function for F: G(h, h') ≥ F(h) and G(h, h) = F(h).
If G is an auxiliary function of F, then F is non-increasing under the update h^{t+1} = argmin_h G(h, h^t).
Auxiliary Function
For the least-squares objective F(h) = ½ ||v − W h||^2, use
G(h, h^t) = F(h^t) + (h − h^t)^T ∇F(h^t) + ½ (h − h^t)^T K(h^t) (h − h^t)
where K(h^t) is the diagonal matrix K_ab = δ_ab (W^T W h^t)_a / h^t_a.
How do we know that this will converge?
Let the auxiliary function be the quadratic upper bound G(h, h^t) with diagonal curvature K_ab = δ_ab (W^T W h^t)_a / h^t_a. Then the update is h^{t+1} = argmin_h G(h, h^t),
which results in the update rule:
h^{t+1}_a = h^t_a (W^T v)_a / (W^T W h^t)_a
Auxiliary Function for Divergence
An analogous auxiliary function, obtained by applying Jensen's inequality to the log term of D(V || W H), yields the KL update rules.
Main Contributions
The idea that representations allowing negative weights do not make sense in some applications.
A simple system for producing basis images with non-negative weights.
Pointing out that this leads to basis images that are based on parts.
A larger point: by constraining the problem in new ways, we can induce nice properties.
Meanwhile…
Several studies empirically show the usefulness of NMF for pattern discovery/clustering.
Research shows NMF factors give holistic pictures of the data, i.e., NMF is doing data clustering.
NMF Gives Holistic Pictures I
[Figure: F factors]
NMF Gives Holistic Pictures II
[Figure: F factors vs. original data]
NMF is doing “Data Clustering”
NMF => K-means Clustering
NMF–K-means Theorem
G-orthogonal NMF is equivalent to relaxed K-means clustering.
Proof: Ding, He, Simon (SDM 2005).
K-means Clustering
Computationally efficient (order mN); the most widely used clustering method in practice; a benchmark to evaluate other algorithms.
Given n points in m dimensions, the K-means objective is
J = Σ_k Σ_{i ∈ C_k} ||x_i − c_k||^2
Also called "isodata" or "vector quantization". Developed in the 1960s (Lloyd, MacQueen, Hartigan, etc.).
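A plain Lloyd-iteration sketch of this objective (an illustrative implementation, not the 1960s originals; the toy data consists of two well-separated pairs):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Lloyd iteration for J = sum_k sum_{i in C_k} ||x_i - c_k||^2 (rows of X are points)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(axis=2), axis=1)
        # Update step: each center moves to the mean of its points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centers = kmeans(X, k=2)
```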
Reformulate K-means Clustering
Cluster membership indicators H = (h_1, ..., h_K), where h_k has entries 1/√|C_k| for points in C_k and 0 elsewhere (so H^T H = I).
Solving K-means ⇒ max Tr(H^T X^T X H) subject to H^T H = I.
(Zha, Ding, Gu, He, Simon, NIPS 2001; Ding & He, ICML 2004)
Reformulate K-means Clustering
Cluster membership indicators for clusters C1, C2, C3: each column of H is the normalized indicator vector of one cluster, a block of equal positive entries for that cluster's points and zeros elsewhere.
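The connection can be checked numerically: with the normalized indicator matrix H (one column per cluster, H^T H = I), the K-means objective equals Tr(X^T X) − Tr(H^T X^T X H). A small sketch (points stored as columns of X, matching the data-matrix convention earlier in the deck; the clustering is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((3, 6))                    # 6 points in 3-d, one per column
labels = np.array([0, 0, 1, 1, 2, 2])     # an arbitrary assignment into 3 clusters
k = 3

# Normalized cluster-membership indicators: H[i, j] = 1/sqrt(|C_j|) if x_i in C_j.
H = np.zeros((X.shape[1], k))
for j in range(k):
    idx = labels == j
    H[idx, j] = 1.0 / np.sqrt(idx.sum())

# K-means objective computed directly: squared distances to cluster means.
J = sum(((X[:, labels == j] - X[:, labels == j].mean(axis=1, keepdims=True)) ** 2).sum()
        for j in range(k))

# Equivalent trace form used in the relaxation.
J_trace = np.trace(X.T @ X) - np.trace(H.T @ X.T @ X @ H)
```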
Orthogonality in NMF
Strictly orthogonal G: hard clustering.
Non-orthogonal G: soft clustering; ambiguous/outlier points receive fractional memberships.
K-means Clustering Theorem
G-orthogonal NMF is equivalent to relaxed K-means clustering. Requires only G-orthogonality and nonnegativity.
F ⇒ cluster centroids; G ⇒ cluster indicators.
(Ding, Li, Jordan, 2006)
NMF Generalizations
SVD: X_± ≈ U_± Σ V_±^T
Semi-NMF: X_± ≈ F_± G_+^T (Ding, Li, Jordan, 2006)
Convex-NMF: F = X W, with W ≥ 0
Kernel-NMF: φ(X) ≈ φ(X) W G^T
Tri-NMF: X ≈ F S G^T (Ding, Li, Peng, Park, KDD 2006)
Orthogonal Nonnegative Tri-Factorization
3-factor NMF with explicit orthogonality constraints: X ≈ F S G^T with F^T F = I, G^T G = I, and F, S, G ≥ 0.
1. The solution is unique. 2. It can't be reduced to 2-factor NMF.
Simultaneous K-means clustering of rows and columns: F ⇒ row cluster indicators; G ⇒ column cluster indicators.
(Ding, Li, Peng, Park, KDD 2006)
Semi-NMF: for any mixed-sign input data (e.g. centered data), X_± ≈ F_± G_+^T
Clustering and low-rank approximation.
Update F: F = X G (G^T G)^{-1}
Update G: G_ik ← G_ik sqrt( [ (X^T F)^+ + G (F^T F)^- ]_ik / [ (X^T F)^- + G (F^T F)^+ ]_ik ), where A^+ = (|A| + A)/2 and A^- = (|A| − A)/2.
(Ding, Li, Jordan, 2006)
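A NumPy sketch of these alternating Semi-NMF updates (illustrative; `eps` guards the divisions and `pinv` stands in for the inverse for robustness):

```python
import numpy as np

def semi_nmf(X, r, iters=100, eps=1e-9, seed=0):
    """Semi-NMF: X (mixed sign) ~ F G^T with G >= 0 and F unconstrained."""
    rng = np.random.default_rng(seed)
    G = rng.random((X.shape[1], r))
    for _ in range(iters):
        # Exact least-squares update for F.
        F = X @ G @ np.linalg.pinv(G.T @ G)
        # Multiplicative update for G, split into positive/negative parts.
        A, B = X.T @ F, F.T @ F
        Ap, An = (np.abs(A) + A) / 2, (np.abs(A) - A) / 2
        Bp, Bn = (np.abs(B) + B) / 2, (np.abs(B) - B) / 2
        G *= np.sqrt((Ap + G @ Bn + eps) / (An + G @ Bp + eps))
    return F, G

X = np.random.default_rng(2).standard_normal((15, 10))   # mixed-sign data
F, G = semi_nmf(X, r=3)
```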
Proof of the Semi-NMF Algorithm: Correctness
Solve: min_{F, G ≥ 0} ||X − F G^T||^2
Constrained optimization: form the Lagrangian L = ||X − F G^T||^2 − Tr(Λ G^T); the KKT condition requires Λ_ik G_ik = 0.
The update rule satisfies this fixed-point condition.
Proof of the Semi-NMF Algorithm: Convergence
Use an auxiliary function for the objective, as in the Lee–Seung proof: the multiplicative update minimizes the auxiliary function at each step, so the objective is non-increasing.
Other NMF Algorithms
Alternating least squares
Projected gradient descent