Published by Jason Cole. Modified over 8 years ago.
1
SCS CMU
Colibri: Fast Mining of Large Static and Dynamic Graphs
Speaker: Hanghang Tong
2009-3-3, Speaking Skill Requirement
2
Joint work with Spiros Papadimitriou, Jimeng Sun, Christos Faloutsos, Philip S. Yu
3
Graphs are everywhere!
Q: How to find patterns? e.g., communities, anomalies, etc.
4
Motivation
Q: How to find patterns?
– e.g., communities, anomalies, etc.
A: Low-Rank Approximation (LRA) for the adjacency matrix of the graph: A ≈ L x M x R
5
LRA for Graph Mining: Communities
[Figure: author-conference graph; authors John, Tom, Bob, Carl, Van, Roy; conferences KDD, ICDM, RECOMB, ISMB]
Adj. matrix: A ≈ L x M x R
L: author clusters; M: cluster interaction; R: conference clusters
6
LRA for Graph Mining: Anomalies
[Figure: author-conference graph; authors John, Tom, Bob, Carl, Van, Roy; conferences KDD, ICDM, RECOMB, ISMB]
Adj. matrix: A ≈ L x M x R
L: author clusters; M: cluster interaction; R: conference clusters
Reconstruction error is high: ‘Carl’ is abnormal
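The anomaly idea on this slide can be sketched in a few lines: approximate each author's row from a low-dimensional conference subspace and flag the row that is reconstructed worst. The counts and the two orthonormal cluster vectors below are hypothetical toy values chosen for illustration, not data from the talk.

```python
# Toy author-by-conference counts (hypothetical, for illustration only).
# Column order: KDD, ICDM, RECOMB, ISMB.
A = {
    "John": [1.0, 1.0, 0.0, 0.0],
    "Tom":  [2.0, 2.0, 0.0, 0.0],
    "Bob":  [1.0, 1.0, 0.0, 0.0],
    "Carl": [1.0, 0.0, 0.0, 1.0],
    "Van":  [0.0, 0.0, 1.0, 1.0],
    "Roy":  [0.0, 0.0, 2.0, 2.0],
}

# Two assumed conference clusters, written as orthonormal rows
# (KDD+ICDM and RECOMB+ISMB). Orthonormality lets us project with dot products.
s = 0.5 ** 0.5
R = [[s, s, 0.0, 0.0], [0.0, 0.0, s, s]]

def reconstruct(row, basis):
    """Project a row onto the span of an orthonormal basis."""
    out = [0.0] * len(row)
    for q in basis:
        coef = sum(a * b for a, b in zip(row, q))
        out = [o + coef * qi for o, qi in zip(out, q)]
    return out

def recon_error(row, basis):
    """Euclidean norm of the part of the row the basis cannot explain."""
    approx = reconstruct(row, basis)
    return sum((a - b) ** 2 for a, b in zip(row, approx)) ** 0.5

errors = {name: recon_error(row, R) for name, row in A.items()}
most_abnormal = max(errors, key=errors.get)
```

In this toy setup only Carl publishes across both clusters, so his row is the one the two-cluster model fails to reconstruct.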
7
Challenges
Prob. 1: Given a static graph A,
+ (C1) How to get (L, M, R) efficiently?
  - Both time and space
+ (C2) How to get (L, M, R) intuitively?
  - Easy for interpretation
Prob. 2: Given a dynamic graph A_t (t = 1, 2, …),
+ (C3) How to get (L_t, M_t, R_t) dynamically?
  - Track patterns over time
8
Roadmap
Motivation
Existing Methods
– SVD
– CUR/CX
Proposed Methods: Colibri
Experimental Results
Conclusion
9
Matrix & Vector
Matrix B = [b_1 b_2]; b_1, b_2 are vectors in 3-d space!
[Figure: matrix entries as extracted: 3 1 1 0 (layout lost); labels: SIGMOD, ICML; Philip Yu, John Smith, William Cohen]
10
Column Space of a Matrix
B = [b_1 b_2]; b_1, b_2 are vectors in 3-d space!
[Figure: matrix entries as extracted: 3 1 1 0 (layout lost); labels: SIGMOD, ICML]
VLDB = SIGMOD – ICML = [2 0 0]’
11
Projection, Projection Matrix & Core Matrix
v~ = B x (B^T B)^+ x B^T x v
v: an arbitrary vector
v~: projection of v
B (B^T B)^+ B^T: projection matrix of B
(B^T B)^+: core matrix
[Figure labels: ICML, SIGMOD, KDD, KDD~]
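As a small worked instance of the projection formula v~ = B (B^T B)^+ B^T v, the sketch below uses an assumed 3 x 2 matrix B with linearly independent columns, so the pseudo-inverse reduces to an ordinary 2 x 2 inverse. The entries of B and v are illustrative assumptions, not values recovered from the slide.

```python
def matmul(X, Y):
    """Plain matrix product for lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(r) for r in zip(*X)]

def inverse_2x2(G):
    """Closed-form inverse; valid only when the determinant is non-zero."""
    (a, b), (c, d) = G
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Assumed example: two independent columns in 3-d space.
B = [[3.0, 1.0],
     [1.0, 1.0],
     [0.0, 0.0]]
v = [[1.0], [2.0], [5.0]]          # an arbitrary vector, as a column

Bt = transpose(B)
core = inverse_2x2(matmul(Bt, B))  # core matrix (B^T B)^-1
v_proj = matmul(B, matmul(core, matmul(Bt, v)))  # projection of v
```

Both columns of this B lie in the plane of the first two coordinates, so the projection keeps the first two entries of v and zeroes the third.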
13
Roadmap
Motivation
Existing Methods
– SVD
– CUR/CX
Proposed Methods: Colibri
Experimental Results
Conclusion
14
Singular-Value-Decomposition (SVD)
A ≈ U x Σ x V^T (Σ: singular values)
A: n x m, with columns a_1, a_2, a_3, …, a_m
U: left singular vectors u_1, …, u_k
V: right singular vectors v_1, …, v_k
15
SVD: Characteristic
#1: Find the left matrix U, where [formula lost in extraction]
#2: Project A into the column space of U
[Formula lost in extraction; labeled: projection matrix of column space of U]
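The two steps can be written out under one assumption: U holds the top-k left singular vectors as orthonormal columns. In that case the general projection matrix onto the column space of U simplifies, and the projected matrix coincides with the truncated SVD. This is the standard identity, offered as a reading aid rather than as the slide's lost formula.

```latex
\tilde{A}
  \;=\; U\,(U^{T}U)^{+}\,U^{T} A
  \;=\; U\,U^{T} A
  \;=\; U\,\Sigma\,V^{T},
\qquad \text{since } U^{T}U = I_{k}.
```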
16
SVD: advantages
Optimal Low-Rank Approximation
– In both L_2 and L_F
For any rank-k matrix A_k: ||A – Ã||_{2,F} <= ||A – A_k||_{2,F} (Ã: the rank-k SVD approximation)
17
SVD: drawbacks
Efficiency
– Time
– Space: (U, V) are dense
Interpretation
[Figure: 1st singular vector, 2nd singular vector]
18
SVD: drawbacks
Dynamic: not easy
[Figure: 1st and 2nd singular vectors at time t and at time t+1, with "?" between them; axes X, Y]
19
Roadmap
Motivation
Existing Methods
– SVD
– CUR/CX
Proposed Methods: Colibri
Experimental Results
Conclusion
20
CUR (CX) decomposition [Drineas+ 2005]
A ≈ C x U x R, where A is n x m
– Sample columns from A to form C
– Project A onto the column space of C
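The two CX steps (sample columns, then project) can be sketched as follows. The sampling rule used here, drawing with replacement with probability proportional to squared column norm, is an assumption in the spirit of the cited work; the toy matrix is hypothetical, and the projection is done by Gram-Schmidt instead of forming U and R explicitly.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sample_columns(cols, c, rng):
    """Draw c column indices with replacement, weighted by squared column norm (assumed rule)."""
    weights = [dot(col, col) for col in cols]
    return rng.choices(range(len(cols)), weights=weights, k=c)

def project_onto_span(basis_cols, v, tol=1e-9):
    """Project v onto span(basis_cols); Gram-Schmidt silently skips duplicate columns."""
    ortho = []
    for col in basis_cols:
        r = list(col)
        for q in ortho:
            coef = dot(q, r)
            r = [ri - coef * qi for ri, qi in zip(r, q)]
        norm = dot(r, r) ** 0.5
        if norm > tol:
            ortho.append([ri / norm for ri in r])
    out = [0.0] * len(v)
    for q in ortho:
        coef = dot(q, v)
        out = [o + coef * qi for o, qi in zip(out, q)]
    return out

# Toy matrix stored as a list of columns (hypothetical data).
A_cols = [
    [3.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]
rng = random.Random(0)
picked = sample_columns(A_cols, c=5, rng=rng)   # 5 draws from 4 columns
C = [A_cols[j] for j in picked]                 # sampled matrix C (may repeat columns)
A_approx = [project_onto_span(C, col) for col in A_cols]
sq_error = sum((a - b) ** 2
               for col, app in zip(A_cols, A_approx)
               for a, b in zip(col, app))
```

Because five columns are drawn from four, C necessarily contains at least one repeated column; the repeat adds storage and work without changing the subspace that A is projected onto.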
21
CUR (CX): advantages
Quality: Near-Optimal
Efficiency (better than SVD)
– Time (c is # of sampled col.s)
– Space: (C, R) are sparse
Interpretation
22
CUR (CX): drawbacks
Redundancy in C, wasting both time and space
– 3 copies of green, 2 copies of red, 2 copies of purple
– purple = 0.5*green + red …
23
Redundant Col. Does Not Help
[Figure: KDD projected onto the column space of (ICML, SIGMOD) gives KDD~; projecting onto (ICML, SIGMOD, VLDB) gives the same KDD~]
Observations: VLDB
#1: Does not help KDD~
#2: Wastes time & space
24
CUR (CX): drawbacks
Dynamic: not easy
[Figure: sampled matrix C at time t and at time t+1, with "?" between them]
25
Roadmap
Motivation
Existing Methods
Proposed Method: Colibri
– Colibri-S for static graphs (Prob. 1)
– Colibri-D for dynamic graphs (Prob. 2)
Experimental Results
Conclusion
26
Colibri-S: Basic Idea
[Figure: original matrix; CUR (CX) factors; Colibri-S factors L x M x R]
CUR (CX): 3 copies of green, 2 copies of red, 2 copies of purple; purple = 0.5*green + red …
We want the col.s in L to be linearly independent!
27
A Pictorial Comparison
[Figure: three panels, SVD (1st and 2nd singular vectors), CUR [Drineas+ 2005] (annotated with # of copies), Colibri-S [Tong+ 2008]; X: SVM, Y: Optimization; dark dot: selected]
28
Input: initially sampled matrix C
Output:
– L = linearly independent col.s of C
– M = (L^T x L)^-1 = core matrix
– R = L^T x A
Q: How to find L & M from C efficiently?
29
A: Find L & M iteratively!
For each col. v in the initially sampled matrix C:
– Project it on the current L
– Redundant? If yes, discard v (easy!); if no, expand L & M
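The loop on this slide can be sketched as below. It is a minimal illustration, not the authors' implementation: the function name `colibri_s_sketch`, the tolerance, and the toy columns are assumptions, and the expansion step uses the standard block-inverse (Schur complement) identity to grow M = (L^T L)^-1 by one row and column.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def colibri_s_sketch(C_cols, tol=1e-9):
    """Return (L_cols, M): linearly independent columns of C and M = (L^T L)^-1.

    Hypothetical helper for illustration; columns are plain Python lists.
    """
    L_cols, M = [], []
    for v in C_cols:
        # Project v on the current L: y = M (L^T v), projection = L y.
        Ltv = [dot(col, v) for col in L_cols]
        y = [dot(row, Ltv) for row in M]
        proj = [sum(yk * col[i] for yk, col in zip(y, L_cols)) for i in range(len(v))]
        residual = [vi - pi for vi, pi in zip(v, proj)]
        delta = dot(residual, residual)        # squared norm of the residual
        if delta <= tol:
            continue                           # redundant: v is already in span(L)
        # Expand M with the block-inverse update (no full re-inversion).
        k = len(L_cols)
        M_new = [[M[i][j] + y[i] * y[j] / delta for j in range(k)] + [-y[i] / delta]
                 for i in range(k)]
        M_new.append([-y[j] / delta for j in range(k)] + [1.0 / delta])
        L_cols.append(list(v))
        M = M_new
    return L_cols, M

# Toy sample mimicking the colored columns: purple = 0.5*green + red.
green = [2.0, 0.0, 0.0]
red = [0.0, 1.0, 1.0]
purple = [1.0, 1.0, 1.0]
C = [green, red, green, purple, red, green, purple]
L_cols, M = colibri_s_sketch(C)
```

On this sample only green and red survive in L; the repeated copies and purple are discarded because their residual after projection is zero.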
30
Update Core Matrix
[Figure: M_old is the core matrix for (SIGMOD, ICML); M_new is the core matrix for (SIGMOD, ICML, KDD); KDD~ is the projection of KDD onto (SIGMOD, ICML). How to get M_new from M_old?]
31
Update Core Matrix
Theorem [Tong et al 2008]: M_new is given in closed form as a block matrix built from M_old, KDD and KDD~ [exact equation lost in extraction]
We only need to know KDD~ and [symbol lost in extraction]!
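For orientation, the standard block-inverse identity gives an update of this kind. The notation is assumed: L holds the current columns, v is the new column (KDD in the figure), \tilde{v} is its projection on L (KDD~), and M_old = (L^T L)^{-1}. This is a textbook identity written in that notation, not a transcription of the slide's equation.

```latex
\begin{aligned}
y &= M_{\text{old}}\, L^{T} v, \qquad
\tilde{v} = L\,y, \qquad
\delta = \lVert v - \tilde{v} \rVert_2^{2}, \\[4pt]
M_{\text{new}}
  &= \Bigl( [\,L \;\; v\,]^{T} [\,L \;\; v\,] \Bigr)^{-1}
   = \begin{pmatrix}
       M_{\text{old}} + \dfrac{y\,y^{T}}{\delta} & -\dfrac{y}{\delta} \\[8pt]
       -\dfrac{y^{T}}{\delta} & \dfrac{1}{\delta}
     \end{pmatrix}.
\end{aligned}
```

Here \delta is the Schur complement v^T v - v^T L M_old L^T v, which equals the squared residual norm because \tilde{v} is an orthogonal projection; it is non-zero exactly when v is not redundant.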
32
Colibri-S vs. CUR(CX)
Quality: Colibri-S = CUR(CX)
Time: Colibri-S better than or equal to CUR(CX)
Space: Colibri-S better than or equal to CUR(CX)
Interpretation: Colibri-S = CUR(CX)
33
A Pictorial Comparison
[Figure: scatter plot; X: SVM, Y: Optimization; each dot is a document]
34
A Pictorial Comparison: SVD
[Figure: scatter plot with the 1st and 2nd singular vectors; X: SVM, Y: Optimization; each dot is a document]
35
A Pictorial Comparison: CUR [Drineas+ 2005]
[Figure: scatter plot; X: SVM, Y: Optimization; each dot is a document; dots annotated with the numbers 2, 2, 2, 4, 3, 1]
36
A Pictorial Comparison: Colibri-S [Tong+ 2008]
[Figure: scatter plot; X: SVM, Y: Optimization; each dot is a document]
37
A Pictorial Comparison
[Figure: four panels, SVD (1st and 2nd singular vectors), CUR [Drineas+ 2005], CMD [Sun+ 2007], Colibri-S [Tong+ 2008]]
38
Roadmap
Motivation
Existing Methods
Proposed Method: Colibri
– Colibri-S for static graphs (Prob. 1)
– Colibri-D for dynamic graphs (Prob. 2)
Experimental Results
Conclusion
39
Problem Definitions
Given: A_1, A_2, A_3, … (e.g., author-conference graphs)
Find: (L_1, M_1, R_1), (L_2, M_2, R_2), (L_3, M_3, R_3), …
40
Colibri-D for dynamic graphs
[Figure: initially sampled matrix at time t with (L_t, M_t, R_t), and at time t+1 with (L_{t+1}, M_{t+1}, R_{t+1}) = ?]
Q: How to update L and M efficiently?
41
Colibri-D: How-To
[Figure: initially sampled matrix at time t with (L_t, M_t, R_t), and at time t+1 with (L_{t+1}, M_{t+1}, R_{t+1}) = ?; legend: selected, redundant, changed from t]
42
Colibri-D: How-To
Step 1: (L_t, M_t) to (L~, M~), the subspace by blue cols at t+1 (unchanged cols!)
Step 2: (L~, M~) to (L_{t+1}, M_{t+1})
[Figure: initially sampled matrix at time t and at time t+1; legend: selected, redundant]
43
Get Core Matrix for Un-changed Col.s
At time t: M_t = [(L_t)’ x L_t]^-1
At time t+1, for the unchanged col.s L~_t: M~_t = [(L~_t)’ x L~_t]^-1 = ?
44
Get Core Matrix for Un-changed Col.s
Theorem [Tong et al 2008]: [equation lost in extraction; it expresses M~_t through blocks of M_t]
We only need a matrix inverse the same size as changed columns in L_t!
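One standard identity with exactly this property is the Schur-complement downdate: if M_t is partitioned into blocks for unchanged (1) and changed (2) columns, then the core matrix of the unchanged columns is M11 - M12 M22^-1 M21, and the only inverse needed is of M22. The sketch below checks this numerically on hypothetical data; the helper names are illustrative, and it is not claimed to be the exact form of the theorem on the slide.

```python
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def inverse(G):
    """Gauss-Jordan inverse with partial pivoting (adequate for tiny matrices)."""
    n = len(G)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(G)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def gram(cols):
    """L^T L for a matrix stored as a list of columns."""
    return [[sum(a * b for a, b in zip(u, v)) for v in cols] for u in cols]

def submatrix(M, rows, cols):
    return [[M[i][j] for j in cols] for i in rows]

def core_for_unchanged(M_t, keep, changed):
    """(L~^T L~)^-1 = M11 - M12 M22^-1 M21, with 1 = unchanged, 2 = changed."""
    M11 = submatrix(M_t, keep, keep)
    M12 = submatrix(M_t, keep, changed)
    M21 = submatrix(M_t, changed, keep)
    M22_inv = inverse(submatrix(M_t, changed, changed))  # only |changed| x |changed|
    corr = matmul(matmul(M12, M22_inv), M21)
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(M11, corr)]

# Hypothetical L_t with three independent columns; column 1 changes at t+1.
L_t = [[3.0, 1.0, 0.0, 0.0],
       [1.0, 1.0, 0.0, 1.0],
       [0.0, 2.0, 1.0, 0.0]]
M_t = inverse(gram(L_t))
keep, changed = [0, 2], [1]
M_tilde = core_for_unchanged(M_t, keep, changed)
direct = inverse(gram([L_t[i] for i in keep]))  # reference: recompute from scratch
```

When few columns change between t and t+1, M22 is small, so this route is cheaper than re-inverting the full Gram matrix of the unchanged columns.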
45
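Comparison: SVD, CUR vs. Colibris
Methods: SVD [Golub+ 1989], CUR/CX [Drineas+ 2005], Colibri [Tong+ 2008]
Wish list: Efficiency, Interpretation, Dynamics
[Table: one mark per method and wish-list item; marks lost in extraction, one cell shown as "?"]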
46
Roadmap
Motivation
Existing Methods
Proposed Method: Colibri
Experimental Results
Conclusion
47
Experimental Setup
Datasets
– Network traffic
– 21,837 sources/destinations
– 1,222 consecutive hours (~2 months)
– 22,800 edges per hour
Accuracy: [formula lost in extraction]
Space Cost: [formula lost in extraction]
48
Performance of Colibri-S
[Figure: time and space plots comparing SVD, CUR, CMD and Ours]
Accuracy: same, 91%+
Time: 12x faster than CMD, 28x faster than CUR
Space: ~1/3 of CMD, ~10% of CUR
49
Performance of Colibri-D
[Figure: time vs. # of changed cols for CMD, Colibri-S and Colibri-D; one curve is labeled "(Prior Best Method)"]
Network traffic
- 21,837 nodes
- 1,220 hours
- 22,800 edges/hr
Colibri-D achieves up to 112x speedups
50
Conclusion
Colibri-S
– For static graphs
– Remove redundancy
– Up to 52x speedup; 2/3 space saving
– No quality loss
Colibri-D
– For dynamic graphs
– Leverage “smoothness”
– Up to 112x faster than best known methods
51
Q&A
Thank you!
www.cs.cmu.edu/~htong
52
How many columns do we need?
# of cols/rows: polynomial in k, log(1/epsilon), and 1/delta
With probability 1 – epsilon: ||A – CUR|| <= ||A – A_k|| + delta ||A||