Direct Robust Matrix Factorization
Liang Xiong, Xi Chen, Jeff Schneider
Presented by xxx
School of Computer Science, Carnegie Mellon University
Matrix Factorization
Extremely useful…
– Assumes the data matrix is low-rank.
– PCA/SVD, NMF, collaborative filtering…
– Simple, effective, and scalable.
For anomaly detection
– Assumption: the normal data is low-rank, and anomalies are poorly approximated by the factorization.
DRMF: Liang Xiong, Xi Chen, Jeff Schneider
Robustness Issue
Usually not robust (sensitive to outliers)
– Because of the L2 (Frobenius) error measure they use:
  min_L ‖X − L‖_F  s.t.  rank(L) ≤ K
  (minimize the approximation error; keep L low-rank)
For anomaly detection, of course we have outliers.
Why Outliers Matter
Simulation
– We use SVD to find the first basis of 10 sine signals.
– To make it more fun, we turn one point of one signal into a spike (the outlier).
[Figure: input signals and the output basis with no outlier (cool), a moderate outlier (disturbed), and a wild outlier (totally lost).]
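The slide's simulation can be reproduced in a few lines of numpy. This is our sketch: the phases, grid size, and spike amplitude are illustrative choices, not values given on the slide.

```python
import numpy as np

# 10 sine signals with slightly different phases, one signal per row.
t = np.linspace(0, 2 * np.pi, 100)
X = np.vstack([np.sin(t + phase) for phase in np.linspace(0, 1, 10)])

# First right singular vector = the first SVD basis of the clean signals.
clean_basis = np.linalg.svd(X, full_matrices=False)[2][0]

# Turn one point of one signal into a "wild" spike (the outlier).
X_out = X.copy()
X_out[0, 50] += 100.0
spiked_basis = np.linalg.svd(X_out, full_matrices=False)[2][0]

# The clean basis is a smooth sinusoid; the spiked basis is dominated
# by the single outlier coordinate instead of the sinusoidal shape.
```

With the spike present, almost all the energy of the first basis vector concentrates on the outlier's coordinate, which is exactly the "totally lost" panel of the slide.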
Direct Robust Matrix Factorization (DRMF)
Throw the outliers out of the factorization, and the problem is solved!
Mathematically, DRMF is
  min_{L,S} ‖(X − S) − L‖_F  s.t.  rank(L) ≤ K,  ‖S‖_0 ≤ e
– ‖S‖_0: the number of non-zeros in S.
– S is a "trash can" for the outliers; the constraint ‖S‖_0 ≤ e says there should be only a small number of outliers.
DRMF Algorithm
Input: data X. Output: low-rank L; outliers S.
Iterate (block coordinate descent):
– Let C = X − S. Do rank-K SVD: L = SVD(C, K).
– Let E = X − L. Do thresholding: S_ij = E_ij if |E_ij| > t, else 0, where t is the e-th largest element of {|E_ij|}.
That’s it! Everyone can try it at home.
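The two alternating steps above fit in a short numpy function. This is a minimal sketch: the function name, the fixed iteration count, and the absence of a convergence test are our choices; the slide only specifies the two updates.

```python
import numpy as np

def drmf(X, K, e, n_iters=50):
    """Block coordinate descent for DRMF (sketch).

    X: data matrix; K: target rank; e: number of entries allowed as outliers.
    Returns the low-rank approximation L and the sparse outlier matrix S.
    """
    S = np.zeros_like(X)
    for _ in range(n_iters):
        # Step 1: rank-K SVD of the cleaned matrix C = X - S.
        C = X - S
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        L = (U[:, :K] * s[:K]) @ Vt[:K]
        # Step 2: keep only the e largest-magnitude residual entries as outliers.
        E = X - L
        S = np.zeros_like(X)
        if e > 0:
            idx = np.unravel_index(np.argsort(np.abs(E), axis=None)[-e:], E.shape)
            S[idx] = E[idx]
    return L, S
```

On a clean rank-K matrix with a few spiked entries, L converges to the clean matrix and S captures exactly the spikes.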
Related Work
Nuclear norm minimization (NNM)
– Effective methods with nice theoretical properties from compressive sensing.
– NNM is the convex relaxation of DRMF: the rank constraint relaxes to the nuclear norm ‖L‖_*, and the L0 constraint on S relaxes to the L1 norm ‖S‖_1.
A parallel work, GoDec by Zhou et al., appeared in ICML’11.
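The two problems can be written side by side. This is our transcription; the penalized NNM form shown is one common variant, and the trade-off weights λ and γ are not on the slide.

```latex
% DRMF: exact but non-convex constraints
\min_{L,S}\; \| (X - S) - L \|_F
\quad \text{s.t.} \quad \operatorname{rank}(L) \le K,\;\; \|S\|_0 \le e

% NNM relaxation: rank -> nuclear norm, L0 -> L1
\min_{L,S}\; \tfrac{1}{2}\, \| X - L - S \|_F^2
  + \lambda \|L\|_{*} + \gamma \|S\|_{1}
```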
Pros & Cons
Pros:
– No compromise/relaxation => high quality
– Efficient
– Easy to implement and use
Cons:
– Difficult theory, because of the rank constraint and the L0 norm…
– Non-convex; local minima exist. But this can be greatly mitigated by initializing with its convex relaxation, NNM.
Highly Extensible
Structured outliers
– Outlier rows instead of entries? Just use structured measurements: count non-zero rows of S instead of non-zero entries.
Sparse input / missing data
– Useful for recommendation and matrix completion.
Non-negativity, as in NMF
– Still readily solvable with the constraints.
Large-scale problems
– Use approximate SVD solvers.
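For the row-outlier extension, only the thresholding step of the algorithm changes: instead of keeping the e largest residual entries, keep the e residual rows with the largest L2 norms. A hypothetical sketch (the function name is ours):

```python
import numpy as np

def threshold_rows(E, e):
    """Row-outlier variant of the DRMF thresholding step (sketch):
    keep the e rows of the residual E with the largest L2 norms."""
    S = np.zeros_like(E)
    if e > 0:
        norms = np.linalg.norm(E, axis=1)      # per-row residual magnitude
        rows = np.argsort(norms)[-e:]          # e rows with the largest norms
        S[rows] = E[rows]                      # whole rows go into the trash can
    return S
```

This is the variant used in the digit experiment below, where each digit image is one row of the matrix.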
Simulation Study
Factorize noisy low-rank matrices to find entry outliers.
– SVD: plain SVD. RPCA, SPCP: two representative NNM methods.
[Figures: error of recovering normal entries; detection rate of outlier entries; running time (log scale).]
Simulation Study
Sensitivity to outliers
– We examine the recovery error as the outlier amplitude grows.
– Noiseless case; all assumptions made by RPCA hold.
Find Stranger Digits
The USPS dataset is used. We mix a few ‘7’s into many ‘1’s, and then ask DRMF to find those ‘7’s. Unsupervised.
– Treat each digit as a row of the matrix.
– Rank the digits by reconstruction error.
– Use the structured extension of DRMF: row outliers.
[Figure: the resulting ranked list of digits.]
Conclusion
DRMF is a direct and intuitive solution to the robust factorization problem.
– Easy to implement and use.
– Highly extensible.
– Good empirical performance.
Please direct questions to Liang Xiong (lxiong@cs.cmu.edu)