Published by Cleopatra Summers. Modified over 9 years ago.
1
SCS CMU
Colibri: Fast Mining of Large Static and Dynamic Graphs
Joint work by Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip S. Yu, Christos Faloutsos
Speaker: Hanghang Tong
KDD 2008, Aug. 24-27, 2008, Las Vegas
2
Graphs are everywhere!
Q: How to find patterns? e.g., community, anomaly, etc.
3
Motivation
Q: How to find patterns?
–e.g., community, anomaly, etc.
A: Low-Rank Approximation (LRA) of the adjacency matrix of the graph: $A \approx L \times M \times R$
4
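LRA for Graph Mining: Example
[Figure: author-conference graph with authors John, Tom, Bob, Carl, Van, Roy and conferences KDD, ICDM, RECOMB, ISMB]
Adjacency matrix: $A \approx L \times M \times R$, where L gives the author clusters, M the cluster interaction, and R the conference clusters.
The reconstruction error is high for ‘Carl’, so ‘Carl’ is abnormal.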
5
Challenges
How to get (L, M, R)
+ Efficiently (both time and space);
+ Intuitively (easy to interpret);
+ Dynamically (track patterns over time)?
6
Roadmap
Motivation
Existing Methods
–SVD
–CUR/CX
Proposed Methods: Colibri
Experimental Results
Conclusion
7
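Matrix & Column Space
Matrix: $B = [b_1, b_2]$, where $b_1$, $b_2$ are vectors in 3-d space!
Column space of a matrix: the subspace spanned by its columns.
[Figure: example matrix B and the column space spanned by $b_1$ and $b_2$]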
8
Projection, Projection Matrix & Core Matrix
$\tilde{v} = B \times (B^T B)^{+} \times B^T \times v$
v: an arbitrary vector
$\tilde{v}$: projection of v
$B (B^T B)^{+} B^T$: projection matrix of B
$(B^T B)^{+}$: core matrix
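As a minimal sketch of the projection on this slide, the snippet below computes $\tilde{v} = B (B^T B)^{+} B^T v$ for a small made-up matrix. The values of `B` and `v` and the helper names are illustrative assumptions, not taken from the slides. Because this `B` has two linearly independent columns, the pseudo-inverse reduces to an ordinary 2x2 inverse.

```python
# Sketch: project a vector v onto the column space of B using
# v_tilde = B (B^T B)^+ B^T v. B and v are made-up example values.

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def inverse_2x2(G):
    (a, b), (c, d) = G
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("columns of B are linearly dependent")
    return [[d / det, -b / det], [-c / det, a / det]]

B = [[1.0, 0.0],
     [1.0, 1.0],
     [0.0, 1.0]]           # columns b1, b2 are vectors in 3-d space
v = [[1.0], [2.0], [3.0]]  # an arbitrary vector, stored as a column

Bt = transpose(B)
core = inverse_2x2(matmul(Bt, B))  # core matrix (B^T B)^+; a plain inverse here
P = matmul(matmul(B, core), Bt)    # projection matrix of B
v_tilde = matmul(P, v)             # projection of v

# The residual v - v_tilde is orthogonal to every column of B.
residual = [v[i][0] - v_tilde[i][0] for i in range(3)]
print(v_tilde, residual)
```

The residual check is what makes $\tilde{v}$ a projection: nothing of `v` that lies in the column space of `B` is left over.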
9
Singular-Value-Decomposition (SVD)
$A \approx U \times \Sigma \times V^T$
A: n x m, with columns $a_1, a_2, a_3, \ldots, a_m$
U: left singular vectors $u_1, \ldots, u_k$
V: right singular vectors $v_1, \ldots, v_k$
10
SVD: How to
#1: Find the left matrix U, where [equation not recovered]
#2: Project A onto the column space of U, using the projection matrix of the column space of U
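One way to connect step #2 to the projection formula $B (B^T B)^{+} B^T$ from the projection slide: assuming U holds k orthonormal left singular vectors (so $U^T U = I$), the projection matrix simplifies, and projecting A reproduces the rank-k SVD approximation. The symbols $\Sigma$ and $V$ follow standard SVD notation and are assumptions here rather than something read off the slide.

```latex
\tilde{A} \;=\; U\,(U^{T}U)^{+}\,U^{T}A \;=\; U\,U^{T}A \;=\; U\,\Sigma\,V^{T},
\qquad \text{since } U^{T}U = I \text{ and } U^{T}A = \Sigma\,V^{T}.
```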
11
SVD: drawbacks
Efficiency
–Time
–Space: (U, V) are dense
Interpretation
Dynamic: not easy
[Figure: 1st singular vector and 2nd singular vector]
12
CUR (CX) decomposition
$A \approx C \times U \times R$, with A: n x m
Sample columns from A to form C
Project A onto the column space of C
13
CUR (CX): advantages
Efficiency (better than SVD)
–Time (c is the # of sampled columns)
–Space: (C, R) are sparse
Interpretation
14
CUR (CX): drawbacks
Redundancy in C, wasting both time and space
–e.g., 3 copies of green, 2 copies of red, 2 copies of purple; purple = 0.5*green + red…
Dynamic: not easy
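To see where the redundancy comes from, here is a rough sketch of the column-sampling step. It assumes columns are drawn with replacement, with probability proportional to their squared norm. That is a common choice for CUR/CX, but the slides do not spell it out. The matrix `A`, the function name `sample_columns`, and the seed are illustrative.

```python
import random
from collections import Counter

def sample_columns(A, c, seed=0):
    """Draw c column indices of A with replacement, biased by squared column norm."""
    rng = random.Random(seed)
    n_cols = len(A[0])
    weights = [sum(row[j] ** 2 for row in A) for j in range(n_cols)]
    return rng.choices(range(n_cols), weights=weights, k=c)

# Made-up 3 x 3 matrix; column 2 equals 0.5 * column 0 + column 1.
A = [[2.0, 0.0, 1.0],
     [0.0, 1.0, 1.0],
     [1.0, 1.0, 1.5]]

picked = sample_columns(A, c=7)
copies = Counter(picked)                     # how often each column landed in C
C = [[row[j] for j in picked] for row in A]  # n x c sampled matrix
print(copies)
```

Drawing 7 columns from only 3 candidates forces repeats, and even distinct picks can be linear combinations of each other, so C carries more columns than its column space needs.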
15
Roadmap
Motivation
Existing Methods
Colibri
–Colibri-S for static graphs
–Colibri-D for dynamic graphs
Experimental Results
Conclusion
16
Colibri-S: Basic Idea
[Figure: the original matrix; CUR (CX) samples a C with 3 copies of green, 2 copies of red, 2 copies of purple, where purple = 0.5*green + red…; Colibri-S uses $L \times M \times R$ instead]
We want the columns in L to be linearly independent of each other!
17
Input: the initially sampled matrix C
Output:
–L = the linearly independent columns of C
–M = $(L^T L)^{-1}$ = core matrix
–R = $L^T \times A$
Q: How to find L & M from C efficiently?
18
A: Find L & M iteratively!
For each column v in the initially sampled matrix C:
–Project it onto the current L
–Redundant? If yes, discard v
–Otherwise, expand L & M
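The loop on this slide can be sketched in plain Python lists. Two details are assumptions, since the slide gives only the outline: "Redundant?" is implemented as a relative-residual test against a tolerance `eps`, and "Expand L & M" uses the standard block-inverse update so that M stays equal to $(L^T L)^{-1}$ without being recomputed. The colour-named columns are made-up values that mirror the green/red/purple illustration.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def colibri_s_sketch(columns, eps=1e-6):
    """Keep linearly independent columns L and the core matrix M = (L^T L)^-1."""
    L, M = [], []          # L: list of kept columns, M: k x k list of lists
    for v in columns:
        k, vv = len(L), dot(v, v)
        Ltv = [dot(l, v) for l in L]
        y = [dot(row, Ltv) for row in M]                  # y = M L^T v
        proj = [sum(y[j] * L[j][i] for j in range(k)) for i in range(len(v))]
        delta = sum((a - b) ** 2 for a, b in zip(v, proj))  # squared residual norm
        if delta <= eps * eps * vv:
            continue       # redundant (or all-zero): discard v
        # Expand M with the block-inverse formula, then append v to L.
        M = [[M[i][j] + y[i] * y[j] / delta for j in range(k)] + [-y[i] / delta]
             for i in range(k)]
        M.append([-yj / delta for yj in y] + [1.0 / delta])
        L.append(list(v))
    return L, M

green = [2.0, 0.0, 1.0]
red = [0.0, 1.0, 1.0]
purple = [0.5 * g + r for g, r in zip(green, red)]   # purple = 0.5*green + red
C = [green, green, red, green, purple, red, purple]  # 3 green, 2 red, 2 purple

L, M = colibri_s_sketch(C)
print(len(C), "sampled columns ->", len(L), "kept")
```

Each column costs one projection against the current L, and a kept column only triggers a rank-one style update of M, which is why no pseudo-inverse of the full C is ever formed in this sketch.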
19
Colibri-S vs. CUR(CX)
Quality: Colibri-S = CUR(CX)
Time: Colibri-S is at least as good as CUR(CX)
Space: Colibri-S is at least as good as CUR(CX)
[Figure: illustrations of Colibri-S and CUR (CX)]
20
Colibri-D for dynamic graphs
[Figure: the initially sampled matrix at t, with $L_t$, $M_t$, $R_t$, and at t+1, with $L_{t+1}$, $M_{t+1}$, $R_{t+1}$ still unknown (?)]
Q: How to update L and M efficiently?
21
Colibri-D: How-To
[Figure: the initially sampled matrix at t, with $L_t$, $M_t$, $R_t$, and at t+1, with $L_{t+1}$, $M_{t+1}$, $R_{t+1}$ unknown (?); columns are marked as selected, redundant, or changed from t]
22
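Colibri-D: How-To
[Figure: the initially sampled matrix at t, with $L_t$, $M_t$, and at t+1, with $L_{t+1}$, $M_{t+1}$; columns are marked as selected or redundant]
$\tilde{L}$, $\tilde{M}$: the subspace spanned by the blue columns at t+1
Unchanged columns!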
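A rough sketch of this update, under stated assumptions: the kept columns that did not change between t and t+1 are carried over as $\tilde{L}$; $\tilde{M}$ is derived from $M_t$ by deleting the changed columns one at a time with a scalar Schur complement; and every other sampled column is then re-tested with the same expand step used for static graphs. Re-testing the unchanged columns that were redundant at t is a conservative simplification and may differ from what the authors do. The names `expand`, `drop_column`, and `colibri_d_sketch` are hypothetical, and the data is made up.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def expand(L, M, columns, eps=1e-6):
    """Colibri-S style pass: add each non-redundant column to (L, M)."""
    L, M = [list(c) for c in L], [list(r) for r in M]
    for v in columns:
        k, vv = len(L), dot(v, v)
        Ltv = [dot(l, v) for l in L]
        y = [dot(row, Ltv) for row in M]
        proj = [sum(y[j] * L[j][i] for j in range(k)) for i in range(len(v))]
        delta = sum((a - b) ** 2 for a, b in zip(v, proj))
        if delta <= eps * eps * vv:
            continue                             # redundant: discard
        M = [[M[i][j] + y[i] * y[j] / delta for j in range(k)] + [-y[i] / delta]
             for i in range(k)]
        M.append([-yj / delta for yj in y] + [1.0 / delta])
        L.append(list(v))
    return L, M

def drop_column(M, j):
    """Core matrix after deleting kept column j (scalar Schur complement)."""
    keep = [i for i in range(len(M)) if i != j]
    return [[M[a][b] - M[a][j] * M[j][b] / M[j][j] for b in keep] for a in keep]

def colibri_d_sketch(kept_idx, M_t, C_next, changed, eps=1e-6):
    """kept_idx: indices (into the sampled columns) of the columns in L_t."""
    M_tilde, idx = M_t, list(kept_idx)
    for pos in reversed(range(len(idx))):        # remove changed kept columns
        if idx[pos] in changed:
            M_tilde = drop_column(M_tilde, pos)
            del idx[pos]
    L_tilde = [C_next[i] for i in idx]           # unchanged kept columns
    rest = [C_next[i] for i in range(len(C_next)) if i not in idx]
    return expand(L_tilde, M_tilde, rest, eps)

green, red, purple = [2.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 1.5]
C_t = [green, red, purple]
L_t, M_t = expand([], [], C_t)      # time t: keeps green and red (indices 0, 1)
C_next = [green, [0.0, 1.0, 3.0], purple]        # only column 1 changed at t+1
L_next, M_next = colibri_d_sketch([0, 1], M_t, C_next, changed={1})
L_ref, M_ref = expand([], [], C_next)            # from-scratch result to compare
```

In this toy case the update reuses the green column and its share of $M_t$, and arrives at the same L and M as running the static pass from scratch on the t+1 columns.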
23
Roadmap
Motivation
Existing Methods
Colibri
Experimental Results
Conclusion
24
Experimental Setup
Datasets: network traffic
–21,837 sources/destinations
–1,222 consecutive hours
–22,800 edges per hour
Accuracy: Accu = [equation not recovered]
Space Cost: [definition not recovered]
25
Performance of Colibri-S
[Figure: time and space plots comparing Ours, CUR, and CMD]
Accuracy: same (91%+)
Time: 12x faster than CMD, 28x faster than CUR
Space: ~1/3 of CMD, ~10% of CUR
26
More Evaluation on Colibri-S
[Figure: approximation accuracy and log time (sec) for CUR, CMD, and Colibri-S]
27
Performance of Colibri-D
[Figure: time vs. # of changed columns for CMD, Colibri-S, and Colibri-D]
Colibri-D achieves up to 112x speedups
28
A Family of Low-Rank Approximations for Fast Graph Mining
Colibri-S
–For static graphs
–Removes redundancy
–Significant savings in time & space for “free” (same accuracy)
Colibri-D
–For dynamic graphs
–Exploits “smoothness”
–Up to 112x faster than the best known methods
29
Poster tonight!
Thank you!
www.cs.cmu.edu/~htong