Less is More: Compact Matrix Decomposition for Large Sparse Graphs
Jimeng Sun, Yinglian Xie, Hui Zhang, Christos Faloutsos. Speaker: Jimeng Sun
Motivation
Sparse matrices are everywhere: network forensics, social network analysis, web graph analysis, text mining. The number of nonzeros in an m x n sparse matrix A is O(m+n). How can we summarize sparse matrices in a concise and intuitive manner? Why do we want a concise and intuitive representation? For compression and anomaly detection.
Problem: Network forensics
Input: network flows <src, dst, # of packets> over time.
Output: useful patterns. Summarize the traffic flows; identify abnormal traffic patterns.
Challenges
High volume: a large ISP with 100 POPs, each with 10 Gbps link capacity, generates 450 GB/hour even with compression [Hotnets2004].
Sparsity: the traffic distribution is skewed. (Figure: source-by-destination traffic matrices.)
Outline
Motivation; Problem definition; Proposed mining framework (sparsification, matrix decomposition, error measure); Experiments; Related work; Conclusion.
Mining framework for network forensics: sparsification (load shedding), matrix decomposition (summarization), error measure (anomaly detection).
Sparsification
For each hour's traffic matrix (src x dst): random sampling with probability p, then rescale each kept entry by 1/p.
Sparsification (cont.)
Perform the sampling and rescaling on the original data.
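A minimal sketch of this step, assuming a scipy.sparse input (the function name and seeding are illustrative, not from the talk): keep each nonzero with probability p and divide the survivors by 1/p's reciprocal, so the sparsified matrix is an unbiased estimate of the original.

```python
import numpy as np
from scipy.sparse import coo_matrix

def sparsify(A, p, seed=0):
    """Keep each nonzero of A independently with probability p and
    rescale the survivors by 1/p, so E[sparsify(A)] = A."""
    rng = np.random.default_rng(seed)
    A = coo_matrix(A)
    keep = rng.random(A.nnz) < p
    return coo_matrix((A.data[keep] / p, (A.row[keep], A.col[keep])),
                      shape=A.shape)
```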
Mining framework for network forensics: sparsification (load shedding), matrix decomposition (summarization), error measure (anomaly detection).
Matrix decomposition
Goal: summarize traffic matrices. Why? Anomaly detection. How? Singular Value Decomposition (SVD, existing); CUR decomposition (existing); Compact Matrix Decomposition (CMD, new).
Background: Singular Value Decomposition (SVD)
X = U Σ V^T: the input data X = [x(1) ... x(M)] factors into left singular vectors U = [u1 ... uk], singular values Σ = diag(σ1, ..., σk), and right singular vectors V = [v1 ... vk].
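For reference, a minimal numpy illustration of the rank-k truncated SVD (toy data; nothing here comes from the talk):

```python
import numpy as np

X = np.random.default_rng(0).standard_normal((100, 50))  # toy data matrix
U, s, Vt = np.linalg.svd(X, full_matrices=False)          # X = U @ diag(s) @ Vt
k = 5
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]               # best rank-k approximation
```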
Background: SVD applications
Low-rank approximation. Pseudo-inverse: M+ = V Σ^-1 U^T. Principal component analysis. Latent semantic indexing. Webpage ranking: Kleinberg's HITS score.
Pros and cons of SVD
Pro: optimal low-rank approximation in the L2 and Frobenius norms.
Con, interpretability: a singular vector specifies a linear combination of all input columns or rows.
Con, lack of sparsity: singular vectors are usually dense.
Matrix decomposition
Goal: summarize traffic matrices. Why? Anomaly detection. How? Singular Value Decomposition (SVD, existing); CUR decomposition (existing); Compact Matrix Decomposition (CMD, new).
Background: CUR decomposition
Goal: make ||A-CUR|| small, where C and R hold sampled columns and rows of A, and U is the pseudo-inverse of the intersection of C and R.
Drineas et al., Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition, SIAM Journal on Computing, 2006.
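A bare-bones sketch of the CUR skeleton under these definitions (index selection and the rescaling required by the randomized analysis are omitted; function and variable names are mine):

```python
import numpy as np

def cur_core(A, cols, rows):
    """Form C, R from chosen column/row indices and set U to the
    pseudo-inverse of their intersection W = A[rows][:, cols]."""
    C, R = A[:, cols], A[rows, :]
    U = np.linalg.pinv(A[np.ix_(rows, cols)])
    return C, U, R   # A is approximated by C @ U @ R
```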
CUR: provably good approximation to SVD
Assume A_k is the "best" rank-k approximation to A (through SVD). Thm [Drineas et al.]: CUR in O(mn) time achieves ||A-CUR|| <= ||A-A_k|| + ε·||A||_F with probability at least 1-δ, by picking O(k·log(1/δ)/ε^2) columns and O(k^2·log^3(1/δ)/ε^6) rows.
Drineas et al., Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition, SIAM Journal on Computing, 2006.
Background: CUR applications
DNA SNP data analysis, recommendation systems, fast kernel approximation.
Intra- and interpopulation genotype reconstruction from tagging SNPs, P. Paschou, M. W. Mahoney, A. Javed, J. R. Kidd, A. J. Pakstis, S. Gu, K. K. Kidd, and P. Drineas, Genome Research, 17(1), 2007.
Tensor-CUR Decompositions for Tensor-Based Data, M. W. Mahoney, M. Maggioni, and P. Drineas, Proc. 12th Annual SIGKDD, 2006.
Pros and cons of CUR
Pros: easy interpretation (the basis vectors are actual columns and rows) and a sparse basis.
Con: duplicate columns and rows; columns of large norm will be sampled many times.
Matrix decomposition
Goal: summarize traffic matrices. Why? Anomaly detection. How? Singular Value Decomposition (SVD, existing); CUR decomposition (existing); Compact Matrix Decomposition (CMD, new).
Compact Matrix Decomposition (CMD)
Given a matrix A, find three matrices C, U, R such that ||A-CUR|| is small and there are no duplicates in C and R. CUR keeps the duplicated Cd and Rd, with middle matrix X+, the pseudo-inverse of the intersection X; CMD works with the deduplicated Cs and Rs, where finding U is more involved.
Column sampling: subspace construction
Sample c columns with replacement, biased toward the columns of large norm, with probability p_i = ||A(i)||^2 / sum_j ||A(j)||^2, then rescale each sampled column. (Figure: sampling A into Cd with c = 6.)
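A sketch of this step; the 1/sqrt(c·p_i) rescaling is the standard randomized-CUR choice and is assumed here rather than read off the slide:

```python
import numpy as np

def sample_columns(A, c, seed=0):
    """Sample c columns with replacement, biased by squared norm
    (p_i = ||A(i)||^2 / sum_j ||A(j)||^2), rescaling each copy."""
    rng = np.random.default_rng(seed)
    sq = (A ** 2).sum(axis=0)
    p = sq / sq.sum()
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    return A[:, idx] / np.sqrt(c * p[idx]), idx   # Cd and the sampled indices
```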
Column sampling: duplicate column removal
Remove duplicate columns and scale each surviving column by the square root of its number of duplicates (Cd becomes Cs).
Column sampling: correctness proof
Thm: Cs and Cd have the same singular values and left singular vectors (see our paper for the proof). Implication: duplicate column removal preserves the sampled top-k subspace.
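A quick numerical illustration of the claim (not the paper's proof): collapsing d copies of a column into one copy scaled by sqrt(d) leaves C·C^T unchanged, hence also the left singular vectors and the singular values:

```python
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.standard_normal((5, 1)), rng.standard_normal((5, 1))
Cd = np.hstack([x, x, x, y])          # column x sampled three times
Cs = np.hstack([np.sqrt(3) * x, y])   # one copy, scaled by sqrt(3)
assert np.allclose(Cd @ Cd.T, Cs @ Cs.T)
# nonzero singular values agree:
print(np.linalg.svd(Cd, compute_uv=False)[:2],
      np.linalg.svd(Cs, compute_uv=False)[:2])
```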
CMD construction: low-rank approximation (details)
Project the data onto the top-c column subspace: A ≈ C C+ A, where C (m×c) is sparse, but computing C+ A (via the c×m pseudo-inverse C+) is dense work over the entire matrix; the row-sampling step below avoids this cost.
CMD construction: row sampling (details)
Approximate the matrix multiplication by sampling: sample and rescale rows just as for columns, then remove duplicate rows and scale each surviving row by its number of duplicates. This yields A ≈ C U R with A (m×n), C (m×c), U (c×r), and R (r×n).
CMD summary
Given a matrix A, find three matrices C, U, R such that ||A-CUR|| is small: biased sampling with replacement of columns/rows to construct Cd and Rd; remove duplicates with proper scaling to obtain Cs and Rs; construct a small U. A full pipeline sketch follows.
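Putting the pieces together, a simplified end-to-end sketch in dense numpy (the column/row scalings follow the slides; U is built CUR-style as the pseudo-inverse of the intersection block, which simplifies the paper's more involved construction):

```python
import numpy as np

def biased_indices(sq_norms, k, rng):
    """k indices sampled with replacement, p_i proportional to squared norm."""
    p = sq_norms / sq_norms.sum()
    return rng.choice(len(p), size=k, replace=True, p=p), p

def cmd(A, c, r, seed=0):
    """CMD-style decomposition A ~= C @ U @ R with duplicate removal."""
    rng = np.random.default_rng(seed)

    # Column sampling, CUR rescaling, then dedup with sqrt(d) scaling.
    cols, pc = biased_indices((A ** 2).sum(axis=0), c, rng)
    uc, dc = np.unique(cols, return_counts=True)
    C = A[:, uc] * (np.sqrt(dc) / np.sqrt(c * pc[uc]))

    # Row sampling, rescaling, then dedup with d scaling (per the slides).
    rows, pr = biased_indices((A ** 2).sum(axis=1), r, rng)
    ur, dr = np.unique(rows, return_counts=True)
    rscale = dr / (r * pr[ur])
    R = A[ur, :] * rscale[:, None]

    # Intersection of the kept rows with C; U is its pseudo-inverse.
    U = np.linalg.pinv(C[ur, :] * rscale[:, None])
    return C, U, R
```

For example, C, U, R = cmd(A, c=200, r=200) approximates A as C @ U @ R while storing far fewer duplicated columns and rows than plain CUR sampling would.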
Mining framework for network forensics: sparsification (load shedding), matrix decomposition (summarization), error measure (anomaly detection).
Error Measure
True error: the sum-squared error between A and its approximation over all entries. Approximated error: estimate that sum from a small set S of sampled entries and scale up accordingly.
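A hedged sketch of such an estimator (uniform entry sampling and the m·n/|S| scale factor are my reading of the slide, not quoted from it); each entry of the CMD approximation is cheap to evaluate, so only |S| short dot products are needed:

```python
import numpy as np

def estimated_sse(A, C, U, R, s, seed=0):
    """Estimate SSE = ||A - C @ U @ R||_F^2 from s random entries."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    i = rng.integers(0, m, size=s)
    j = rng.integers(0, n, size=s)
    approx = np.einsum('ij,ji->i', C[i, :] @ U, R[:, j])  # entries (i,j) of CUR
    return (m * n / s) * ((A[i, j] - approx) ** 2).sum()
```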
Outline
Motivation; Problem definition; Proposed mining framework (sparsification, matrix decomposition, error measure); Experiments; Related work; Conclusion.
Experiment datasets
Network flow data: 22k x 22k matrices, one matrix per hour of data; elements are log(packet count + 1); 1200 hours, 500 GB of raw traces.
DBLP bibliographic data: author-conference graphs from 1980 to 2004; 428K authors, 3659 conferences; elements are the numbers of papers published by the authors.
Experiment design
CMD vs. SVD and CUR w.r.t. space, CPU time, and accuracy (accuracy = 1 - relative sum-squared error). Evaluation of the other modules: sparsification and error measure. Case study on network anomaly detection.
1.a Space efficiency (Network, DBLP)
CMD uses up to 100x less space to achieve the same accuracy. CUR limitation: duplicate columns and rows. SVD limitation: singular vectors are dense, since the orthogonal projection densifies the data.
1.b Computational efficiency (Network, DBLP)
CMD is the fastest of the three: CMD and CUR need an SVD only on the sampled columns, CUR is much slower than CMD due to duplicate columns, and SVD is slowest since it runs on the entire data.
2.a Robustness of sparsification
Small accuracy penalty for all algorithms; the differences are small.
2.b Accuracy estimation
Matrix approximation for network flow data (22k-by-22k), varying the number of sampled columns and rows from 200 to 2000.
3. Case study: network anomaly detection
Identified the onset of worm-like hierarchical scanning activities, which traditional volume-monitoring methods cannot detect.
Outline
Motivation; Problem definition; Proposed mining framework (sparsification, matrix decomposition, error measure); Experiments; Related work; Conclusion.
CUR decompositions: related work
Deterministic approaches:
Stewart, Berry, Pulatova (Num. Math. '99, TOMS '05): C from a variant of the QR algorithm; U minimizes ||A-CUR||_F; R from a variant of the QR algorithm; no a priori bounds; solid experimental performance.
Goreinov, Tyrtyshnikov, Zamarashkin (LAA '97, Cont. Math. '01): C: columns that span max volume; U = W+; R: rows that span max volume; existential result; error bounds depend on ||W+||_2; spectral norm bounds.
Monte-Carlo sampling approaches:
Williams, Seeger (NIPS '00): C and R sampled uniformly at random; experimental evaluation; A is assumed PSD; connections to the Nystrom method.
Drineas, Kannan, Mahoney (SODA '03, '04): C sampled w.r.t. column lengths; U computed in linear/constant time; R sampled w.r.t. row lengths; randomized algorithm with provable a priori bounds; explicit dependency on A - A_k.
Drineas, Mahoney, Muthukrishnan ('05, '06): C depends on singular vectors of A; U is (almost) W+; R depends on singular vectors of C; (1+ε) approximation to A - A_k; computable in SVD_k(A) time; CMD can help here!
Acknowledgment to Petros Drineas for this slide.
Other related work
Low-rank approximation: Frieze, Kannan, Vempala (1998); Achlioptas and McSherry (2001); Sarlós (2006); Zhang, Zha, Simon (2002).
Other sparse approximations: Srebro, Jaakkola (2004), max-margin matrix factorization; nonnegative matrix factorization; L1 regularization.
Conclusion
How can we summarize sparse matrices in a concise and intuitive manner? Proposed method: CMD, with a provable accuracy guarantee, 10x to 100x improvement, and interpretability, applied to 500 GB of network forensics data.
Thank you
Contact: Jimeng Sun. Acknowledgment to Petros Drineas and Michael Mahoney for the insightful discussions and help on CUR decomposition.
The sparsity property
SVD: A = U Σ V^T. A is big but sparse, while U and V are big and dense (Σ is small).
CMD: A = C U R. A, C, and R are big but sparse; U is dense but small.
Column sampling: subspace construction
Biased sampling with replacement of the “large” columns
Column sampling: duplicate column removal
Remove duplicate columns and scale each surviving column by the square root of its number of duplicates.
Summary on CMD
CMD: A ≈ C U R, where C and R are sampled and scaled columns and rows without duplicates (sparse) and U is a small matrix (dense).
Properties: interpretability (interpret the matrix through sampled rows and columns); efficiency in computation and space.
Application: network forensics (anomaly detection).
Conclusion
How can we summarize sparse matrices in a concise and intuitive manner? CMD: a low-rank approximation built from sampled and scaled columns and rows without duplicates (sparse) plus a small dense matrix. Theory: provable accuracy guarantee, 10x to 100x improvement, interpretability; applied to 500 GB of network forensics data. Application to network forensics: sparsification through sampling, low-rank approximation, error measure.