Less is More: Compact Matrix Decomposition for Large Sparse Graphs


1 Less is More: Compact Matrix Decomposition for Large Sparse Graphs
Jimeng Sun, Yinglian Xie, Hui Zhang, Christos Faloutsos Speaker: Jimeng Sun

2 Motivation
Sparse matrices are everywhere: network forensics, social network analysis, web graph analysis, text mining. The number of nonzeros in A_{m×n} is O(m+n). Why do we want a concise and intuitive representation?

3 Motivation (cont.)
How can we summarize sparse matrices in a concise and intuitive manner? Why do we want such a representation? For compression and for anomaly detection.

4 Problem: network forensics
Input: network flows <src, dst, # of packets> over time, e.g., < , , 128>, < , , 128>, < , , 128>. Output: useful patterns, i.e., summarize the traffic flows and identify abnormal traffic patterns.

5 Challenges
High volume: a large ISP with 100 PoPs, each with 10 Gbps link capacity [HotNets 2004], produces 450 GB of traffic data per hour even with compression. Sparsity: the traffic distribution is highly skewed.

6 Outline
Motivation; problem definition; proposed mining framework (sparsification, matrix decomposition, error measure); experiments; related work; conclusion.

7 Network forensics
Sparsification → load shedding; matrix decomposition → summarization; error measure → anomaly detection.

8 Sparsification
For each hour's traffic matrix (src × dst), randomly sample each entry with probability p and rescale each kept entry by 1/p, as sketched below.
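A minimal sketch of this step, assuming a scipy.sparse input; the `sparsify` helper and its interface are illustrative, not from the paper:

```python
# Hypothetical sketch of the sparsification step: keep each nonzero
# independently with probability p, then rescale survivors by 1/p so
# the sampled matrix is an unbiased estimate of the original.
import numpy as np
from scipy import sparse

def sparsify(A, p, seed=None):
    """A: scipy.sparse matrix; returns a subsampled, rescaled copy."""
    rng = np.random.default_rng(seed)
    A = A.tocoo()
    keep = rng.random(A.nnz) < p          # Bernoulli(p) per nonzero entry
    return sparse.coo_matrix(
        (A.data[keep] / p, (A.row[keep], A.col[keep])), shape=A.shape)
```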

9 Sparsification (cont.)
Perform the sampling and rescaling on the original data.

10 Network forensics
Sparsification → load shedding; matrix decomposition → summarization; error measure → anomaly detection.

11 Matrix decomposition
Goal: summarize traffic matrices. Why? Anomaly detection. How? Singular Value Decomposition (SVD, existing), CUR decomposition (existing), and Compact Matrix Decomposition (CMD, new).

12 Background: Singular Value Decomposition (SVD)
X = U Σ Vᵀ: the input data X = [x(1), ..., x(M)] is factored into the left singular vectors U = [u1, ..., uk], the singular values Σ = diag(σ1, ..., σk), and the right singular vectors Vᵀ = [v1, ..., vk]ᵀ.

13 Background: SVD applications
Low-rank approximation; pseudo-inverse: M⁺ = V Σ⁻¹ Uᵀ; principal component analysis; latent semantic indexing; webpage ranking (Kleinberg's HITS score).
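A quick numpy sanity check of the pseudo-inverse formula (illustrative only; assumes a full-rank input so Σ is invertible):

```python
# Verify M+ = V Sigma^{-1} U^T against numpy's built-in pinv.
import numpy as np

M = np.random.default_rng(1).random((6, 4))
U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_pinv = Vt.T @ np.diag(1.0 / s) @ U.T
print(np.allclose(M_pinv, np.linalg.pinv(M)))   # True
```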

14 Pros and cons of SVD
Pro: optimal low-rank approximation in the L2 and Frobenius norms. Con (interpretability): a singular vector specifies a linear combination of all input columns or rows. Con (lack of sparsity): singular vectors are usually dense.

15 Matrix decomposition
Goal: summarize traffic matrices. Why? Anomaly detection. How? Singular Value Decomposition (SVD, existing), CUR decomposition (existing), and Compact Matrix Decomposition (CMD, new).

16 Background: CUR decomposition
Goal: make ||A-CUR|| small. Drineas et al., Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition, SIAM Journal on Computing, 2006.

17 Background: CUR decomposition
Goal: make ||A-CUR|| small, where U is the pseudo-inverse of the intersection of C and R. Drineas et al., Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition, SIAM Journal on Computing, 2006.
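A minimal sketch of this construction; the indices are supplied directly (uniform, unscaled sampling), so this is purely illustrative, whereas the cited algorithm samples with carefully chosen probabilities and rescaling:

```python
# CUR sketch: C and R are sampled columns/rows of A, and U is the
# pseudo-inverse of their intersection W.
import numpy as np

def cur(A, col_idx, row_idx):
    C = A[:, col_idx]                   # sampled columns
    R = A[row_idx, :]                   # sampled rows
    W = A[np.ix_(row_idx, col_idx)]     # intersection of C and R
    U = np.linalg.pinv(W)               # U = W+
    return C, U, R

A = np.random.default_rng(2).random((20, 15))
C, U, R = cur(A, col_idx=[0, 3, 7, 9], row_idx=[1, 4, 8, 12])
print(np.linalg.norm(A - C @ U @ R))    # approximation error
```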

18 CUR: provably good approximation to SVD
Assume A_k is the best rank-k approximation to A (through SVD). Thm [Drineas et al.]: CUR in O(mn) time achieves ||A − CUR|| ≤ ||A − A_k|| + ε ||A|| with probability at least 1 − δ, by picking O(k log(1/δ) / ε²) columns and O(k² log³(1/δ) / ε⁶) rows. Drineas et al., Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition, SIAM Journal on Computing, 2006.

19 Background: CUR applications
DNA SNP data analysis, recommender systems, fast kernel approximation. References: Paschou, Mahoney, Javed, Kidd, Pakstis, Gu, Kidd, and Drineas, "Intra- and interpopulation genotype reconstruction from tagging SNPs," Genome Research 17(1), 2007; Mahoney, Maggioni, and Drineas, "Tensor-CUR Decompositions for Tensor-Based Data," Proc. 12th Annual SIGKDD, 2006.

20 Pros and cons of CUR
Pros: easy interpretation and a sparse basis, since the basis vectors are actual columns and rows of the data. Con: duplicate columns and rows, because columns of large norm will be sampled many times.

21 Matrix decomposition
Goal: summarize traffic matrices. Why? Anomaly detection. How? Singular Value Decomposition (SVD, existing), CUR decomposition (existing), and Compact Matrix Decomposition (CMD, new).

22 Compact Matrix Decomposition (CMD)
Given a matrix A, find three matrices C, U, R such that ||A-CUR|| is small and C and R contain no duplicates. Unlike CUR, CMD reduces the sampled C_d and R_d to duplicate-free C_s and R_s; finding U is more involved: U = X⁺, the pseudo-inverse of a small intermediate matrix X.

23 Column sampling: subspace construction
Sample c columns with replacement, biased toward the columns of large norm: column i is picked with probability p_i = ||A(i)||² / Σ_j ||A(j)||². Each sampled column is then rescaled (by 1/√(c·p_i) in the standard construction), yielding C_d. A sketch follows.
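In code, this sampling step might look like the following sketch; `sample_columns` is a hypothetical helper, and the 1/√(c·p_i) rescaling is the standard one in this line of work rather than a detail stated on the slide:

```python
# Biased column sampling: draw c columns with replacement, column i
# with probability proportional to its squared norm, then rescale each
# draw by 1/sqrt(c * p_i).
import numpy as np

def sample_columns(A, c, seed=None):
    rng = np.random.default_rng(seed)
    p = np.sum(A**2, axis=0)
    p = p / p.sum()                          # p_i = ||A(i)||^2 / sum_j ||A(j)||^2
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    Cd = A[:, idx] / np.sqrt(c * p[idx])     # rescaled sampled columns
    return Cd, idx
```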

24 Column sampling: duplicate column removal
Remove duplicate columns from C_d and scale each remaining column by the square root of its number of duplicates, yielding C_s.
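A sketch of the removal step (hypothetical helper; it assumes `Cd` and `idx` come from a sampling step like the one above, so duplicate draws of the same index produce identical columns):

```python
# Duplicate-column removal: keep one representative per distinct
# sampled index and scale it by sqrt(multiplicity). This preserves
# Cd Cd^T, which is why the singular values and left singular vectors
# are unchanged (the theorem on the next slide).
import numpy as np

def remove_duplicates(Cd, idx):
    uniq, counts = np.unique(idx, return_counts=True)
    cols = [np.sqrt(k) * Cd[:, np.where(idx == u)[0][0]]
            for u, k in zip(uniq, counts)]
    return np.column_stack(cols)             # Cs: duplicate-free, rescaled
```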

25 Column sampling: correctness proof
Thm: the matrices C_s and C_d have the same singular values and left singular vectors (see our paper for the proof). Implication: duplicate column removal preserves the sampled top-k subspace.

26 CMD construction: low-rank approximation (details)
Project A onto the top-c column subspace spanned by C: A ≈ C C⁺ A. Here C (m × c) is sparse, but C⁺ (c × m) is big and dense, and the product C⁺ A involves the entire matrix.

27 Row sampling: approximate matrix multiplication (details)
Approximate the expensive product C⁺ A by sampling: sample and rescale columns of C⁺ and the matching rows of A, then remove duplicate rows and scale them by the number of duplicates. The result is A ≈ C U R with C⁺ (c × m), U (c × r), R (r × m), and A (n × m). A sketch of this step follows.
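A hedged sketch of this step under the approximate-matrix-multiplication estimator: U keeps the sampled, rescaled columns of C⁺ and R the matching rescaled rows of A, so that U R estimates C⁺ A. Duplicate-row removal is omitted for brevity, and the helper name is an assumption:

```python
# Build U and R by row sampling so that C @ U @ R approximates
# C @ (C+ @ A). Each sampled term Cp[:, i] A[i, :] / (r * q_i) is an
# unbiased estimate of C+ @ A under probabilities q.
import numpy as np

def build_u_and_r(A, C, r, seed=None):
    rng = np.random.default_rng(seed)
    Cp = np.linalg.pinv(C)                   # C+ is small x n, dense
    q = np.sum(A**2, axis=1)
    q = q / q.sum()                          # row-sampling probabilities
    idx = rng.choice(A.shape[0], size=r, replace=True, p=q)
    scale = np.sqrt(r * q[idx])
    U = Cp[:, idx] / scale                   # sampled, rescaled columns of C+
    R = A[idx, :] / scale[:, None]           # sampled, rescaled rows of A
    return U, R
```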

28 CMD summary
Given a matrix A, find three matrices C, U, R such that ||A-CUR|| is small: (1) biased sampling with replacement of columns/rows to construct C_d and R_d; (2) remove duplicates with proper scaling to obtain C_s and R_s; (3) construct a small U. An end-to-end sketch follows.
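Putting the pieces together, a compact end-to-end sketch; the function name and the exact rescaling constants are illustrative assumptions (duplicate-row removal is again omitted), so consult the paper for the precise algorithm:

```python
# End-to-end CMD sketch: biased column sampling, duplicate removal with
# sqrt scaling, then U and R via row sampling of C+ @ A.
import numpy as np

def cmd(A, c, r, seed=None):
    rng = np.random.default_rng(seed)
    # 1. biased column sampling (with replacement)
    p = np.sum(A**2, axis=0); p = p / p.sum()
    cidx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    uc, kc = np.unique(cidx, return_counts=True)
    # 2. duplicate removal: one copy per column, scaled by
    #    sqrt(multiplicity) / sqrt(c * p_i)
    C = A[:, uc] * (np.sqrt(kc) / np.sqrt(c * p[uc]))
    # 3. row sampling to approximate C+ @ A
    q = np.sum(A**2, axis=1); q = q / q.sum()
    ridx = rng.choice(A.shape[0], size=r, replace=True, p=q)
    s = np.sqrt(r * q[ridx])
    U = np.linalg.pinv(C)[:, ridx] / s
    R = A[ridx, :] / s[:, None]
    return C, U, R

A = np.random.default_rng(3).random((100, 80))
C, U, R = cmd(A, c=30, r=30)
err = np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A)
print(f"relative error: {err:.3f}")
```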

29 Network forensics
Sparsification → load shedding; matrix decomposition → summarization; error measure → anomaly detection.

30 Error measure
True error: the sum-squared error over all entries of the approximation. Approximated error: estimate the same quantity from a small set S of sampled elements. A sketch follows.
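A sketch of both quantities; the estimator shown rescales a uniform sample of entries by mn/|S|, which is one natural unbiased choice and an assumption here, since the slide does not give the exact formula:

```python
# True sum-squared error vs. an estimate from a random sample S of
# matrix entries.
import numpy as np

def true_sse(A, A_hat):
    return np.sum((A - A_hat) ** 2)

def approx_sse(A, A_hat, n_samples, seed=None):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    i = rng.integers(0, m, n_samples)        # uniformly sampled entry set S
    j = rng.integers(0, n, n_samples)
    return (m * n / n_samples) * np.sum((A[i, j] - A_hat[i, j]) ** 2)
```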

31 Outline
Motivation; problem definition; proposed mining framework (sparsification, matrix decomposition, error measure); experiments; related work; conclusion.

32 Experiment datasets
Network flow data: 22k × 22k matrices, each corresponding to 1 hour of data; elements are log(packet count + 1); 1200 hours, 500 GB of raw traces. DBLP bibliographic data: author-conference graphs from 1980 to 2004; 428K authors, 3659 conferences; elements are the numbers of papers published by each author at each conference.

33 Experiment design
Compare CMD vs. SVD and CUR w.r.t. space, CPU time, and accuracy = 1 − relative sum-squared error (see the snippet below); evaluate the other modules (sparsification, error measure); case study on network anomaly detection.
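The accuracy metric as a one-liner, a direct transcription of the definition above:

```python
# accuracy = 1 - relative sum-squared error
import numpy as np

def accuracy(A, A_hat):
    return 1.0 - np.sum((A - A_hat) ** 2) / np.sum(A ** 2)
```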

34 1.a Space efficiency (Network, DBLP)
CMD uses up to 100x less space to achieve the same accuracy. CUR limitation: duplicate columns and rows. SVD limitation: singular vectors are dense, since the orthogonal projection densifies the data.

35 1.b Computational efficiency (Network, DBLP)
CMD is the fastest of the three: CMD and CUR require an SVD only on the sampled columns, but CUR is much slower than CMD due to duplicate columns; SVD is slowest since it operates on the entire matrix.

36 2.a Robustness of sparsification
Sparsification incurs only a small accuracy penalty for all algorithms, and the differences among them are small.

37 2.b Accuracy estimation
Matrix approximation for network flow data (22k-by-22k), varying the number of sampled columns and rows from 200 to 2000.

38 3. Case study: network anomaly detection
Identify the onset of worm-like hierarchical scanning activities; traditional methods based on volume monitoring cannot detect this.

39 Outline
Motivation; problem definition; proposed mining framework (sparsification, matrix decomposition, error measure); experiments; related work; conclusion.

40 CUR decompositions: related approaches
Deterministic approaches:
- Goreinov, Tyrtyshnikov, & Zamarashkin (LAA '97, Cont. Math. '01): C and R are columns/rows that span maximum volume; U = W⁺; existential result; error bounds depend on ||W⁺||₂; spectral norm bounds.
- Stewart, Berry, Pulatova (Num. Math. '99, TOMS '05): C and R via a variant of the QR algorithm; U minimizes ||A-CUR||_F; no a priori bounds, but solid experimental performance.
Monte-Carlo sampling approaches:
- Williams & Seeger (NIPS '00): C and R sampled uniformly at random; experimental evaluation; A is assumed PSD; connections to the Nystrom method.
- Drineas, Kannan, & Mahoney (SODA '03, '04): C sampled w.r.t. column lengths, R w.r.t. row lengths; U computed in linear/constant time; randomized algorithm with provable a priori bounds and explicit dependency on A − A_k.
- Drineas, Mahoney, & Muthukrishnan ('05, '06): C depends on the singular vectors of A, R on the singular vectors of C; U is (almost) W⁺; (1+ε) approximation to A − A_k, computable in SVD_k(A) time. CMD can help here!
(Slide courtesy of Petros Drineas.)

41 Other related work
Low-rank approximation: Frieze, Kannan, Vempala (1998); Achlioptas and McSherry (2001); Sarlós (2006); Zhang, Zha, Simon (2002). Other sparse approximations: Srebro, Jaakkola (2004), max-margin matrix factorization; nonnegative matrix factorization; L1 regularization.

42 Conclusion
How to summarize sparse matrices in a concise and intuitive manner? Proposed method: CMD, with a provable accuracy guarantee, 10x to 100x improvement in space and speed, and interpretability; applied to 500 GB of network forensics data.

43 Thank you
Contact: Jimeng Sun. Acknowledgements to Petros Drineas and Michael Mahoney for insightful discussions and help on the CUR decomposition.

44 The sparsity property
SVD: A = U Σ Vᵀ, where A is big but sparse, U and Vᵀ are big and dense, and Σ is sparse and small. CMD: A = C U R, where A, C, and R are big but sparse and U is dense but small.

45 Column sampling: subspace construction
Biased sampling with replacement of the “large” columns

46 Column sampling: duplicate column removal
Remove duplicate columns and scale each remaining column by the square root of its number of duplicates.

47 Summary of CMD
CMD: A ≈ C U R, where C and R are sampled and scaled columns and rows without duplicates (sparse) and U is a small matrix (dense). Properties: interpretability (interpret the matrix via sampled rows and columns) and efficiency (in computation and space). Application: network forensics and anomaly detection.

48 Conclusion
How to summarize sparse matrices in a concise and intuitive manner? CMD: a low-rank approximation built from sampled and scaled columns and rows without duplicates (sparse) plus a small dense matrix. Theory: provable accuracy guarantee, 10x to 100x improvement, interpretability; applied to 500 GB of network forensics data. Application to network forensics: sparsification through sampling, low-rank approximation, and an error measure for anomaly detection.

