SCS CMU Joint Work by Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip S. Yu, Christos Faloutsos Speaker: Hanghang Tong Aug. 24-27, 2008, Las Vegas.

Presentation transcript:

Colibri: Fast Mining of Large Static and Dynamic Graphs
Joint work by Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip S. Yu, Christos Faloutsos (SCS CMU)
Speaker: Hanghang Tong
KDD 2008, Aug. 24-27, 2008, Las Vegas

Graphs are everywhere!
Q: How to find patterns (e.g., communities, anomalies)?

Motivation
Q: How to find patterns (e.g., communities, anomalies)?
A: Low-Rank Approximation (LRA) of the adjacency matrix of the graph: $A \approx L \times M \times R$

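LRA for Graph Mining: Example
[Figure: author-conference bipartite graph. Authors: John, Tom, Bob, Carl, Van, Roy. Conferences: KDD, RECOMB, ISMB, ICDM.]
Adjacency matrix: $A \approx L \times M \times R$, where L gives the author clusters, M the cluster interaction, and R the conference clusters.
The reconstruction error for 'Carl' is high, so 'Carl' is abnormal.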
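The anomaly argument on this slide can be sketched in a few lines. This is only an illustration: the edge list below is hypothetical toy data (the slide's actual edges are not in the transcript), and a truncated SVD stands in for the low-rank factors (L, M, R). The per-row reconstruction error is then used as an abnormality score.

```python
import numpy as np

# Hypothetical toy author-conference adjacency matrix (not the slide's data).
authors = ["John", "Tom", "Bob", "Carl", "Van", "Roy"]
confs = ["KDD", "ICDM", "RECOMB", "ISMB"]
A = np.array([
    [1, 1, 0, 0],  # John: data mining venues
    [1, 1, 0, 0],  # Tom
    [1, 1, 0, 0],  # Bob
    [1, 0, 1, 0],  # Carl: straddles both communities
    [0, 0, 1, 1],  # Van: bioinformatics venues
    [0, 0, 1, 1],  # Roy
], dtype=float)


def row_reconstruction_errors(A, k):
    """Squared reconstruction error of each row under a rank-k approximation.

    Assumption: truncated SVD plays the role of L x M x R here.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_hat = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return ((A - A_hat) ** 2).sum(axis=1)


errors = row_reconstruction_errors(A, k=2)
most_abnormal = authors[int(np.argmax(errors))]
```

With two clusters and k = 2, the row that fits neither cluster keeps most of the residual, which is the sense in which it is flagged as abnormal.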

Challenges
How to get (L, M, R)
+ Efficiently (in both time and space);
+ Intuitively (easy to interpret);
+ Dynamically (tracking patterns over time)?

Roadmap
Motivation
Existing Methods
– SVD
– CUR/CX
Proposed Methods: Colibri
Experimental Results
Conclusion

Matrix & Column Space
Matrix: $B = [b_1, b_2]$, where $b_1$ and $b_2$ are vectors in 3-d space.
Column space of a matrix: [Figure: the space spanned by $b_1$ and $b_2$.]

Projection, Projection Matrix & Core Matrix
$\tilde{v} = B (B^T B)^{+} B^T v$
– $v$: an arbitrary vector
– $\tilde{v}$: the projection of $v$ onto the column space of B
– $B (B^T B)^{+} B^T$: the projection matrix of B
– $(B^T B)^{+}$: the core matrix
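A minimal numerical sketch of the projection on this slide, using NumPy. The function name and the example matrix are illustrative assumptions; `np.linalg.pinv` computes the pseudo-inverse that the "+" denotes.

```python
import numpy as np


def project_onto_column_space(B, v):
    """Project v onto the column space of B (illustrative helper)."""
    core = np.linalg.pinv(B.T @ B)      # core matrix
    projection_matrix = B @ core @ B.T  # projection matrix of B
    return projection_matrix @ v


# Example: the columns of B span the x-y plane in 3-d space.
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
v = np.array([2.0, 3.0, 4.0])
v_tilde = project_onto_column_space(B, v)
```

The residual `v - v_tilde` is orthogonal to every column of B, which is a quick way to check a projection.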

Singular Value Decomposition (SVD)
$A \approx U \times \Sigma \times V^T$
– $A = [a_1, a_2, a_3, \dots, a_m]$: n x m
– $U = [u_1, \dots, u_k]$: left singular vectors
– $V = [v_1, \dots, v_k]$: right singular vectors
– $\Sigma$: the diagonal matrix of singular values

SVD: How to
#1: Find the left matrix U [defining equation missing from the transcript]
#2: Project A onto the column space of U, using the projection matrix of the column space of U
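The two steps can be sketched with NumPy as follows. This is an illustration rather than the speaker's code: `svd_low_rank` is a hypothetical name, and because the columns of U are orthonormal, the projection matrix reduces to $U U^T$.

```python
import numpy as np


def svd_low_rank(A, k):
    """Rank-k approximation of A via projection onto the top-k left singular vectors."""
    # Step 1: find the left matrix U (keep only the first k columns).
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    # Step 2: project A onto the column space of U_k.
    # U_k has orthonormal columns, so its projection matrix is U_k @ U_k.T.
    A_approx = U_k @ (U_k.T @ A)
    return U_k, A_approx
```

The projection form and the usual truncated product of U, the singular values and V give the same matrix, which is why the slide can describe SVD as "find U, then project".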

SVD: drawbacks
– Efficiency: time [formula missing from the transcript]; space: (U, V) are dense
– Interpretation [Figure: 1st singular vector, 2nd singular vector]
– Dynamic: not easy

CUR (CX) decomposition
$A \approx C \times U \times R$, where A is n x m
– Sample columns from A to form C
– Project A onto the column space of C
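A rough sketch of the CX idea on this slide: sample columns, then project. The sampling distribution is an assumption, since the slide does not say how columns are drawn; here they are drawn with replacement with probability proportional to their squared norm, which is one common choice. `cx_sketch` is a hypothetical name.

```python
import numpy as np


def cx_sketch(A, c, seed=0):
    """Sample c columns of A to form C, then project A onto the column space of C."""
    rng = np.random.default_rng(seed)
    # Assumed sampling scheme: probability proportional to squared column norm.
    col_norms = (A ** 2).sum(axis=0)
    probs = col_norms / col_norms.sum()
    idx = rng.choice(A.shape[1], size=c, replace=True, p=probs)
    C = A[:, idx]
    # C @ pinv(C) is the projection matrix of C; the pseudo-inverse also
    # copes with duplicate (redundant) sampled columns.
    A_approx = C @ np.linalg.pinv(C, rcond=1e-10) @ A
    return C, idx, A_approx
```

Sampling with replacement means the same column can be drawn several times, so C may contain duplicated columns.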

CUR (CX): advantages
– Efficiency (better than SVD): time [formula missing from the transcript] (c is the # of sampled columns); space: (C, R) are sparse
– Interpretation

CUR (CX): drawbacks
– Redundancy in C, wasting both time and space [Figure: 3 copies of green, 2 copies of red, 2 copies of purple; purple = 0.5*green + red]
– Dynamic: not easy

Roadmap
Motivation
Existing Methods
Colibri
– Colibri-S for static graphs
– Colibri-D for dynamic graphs
Experimental Results
Conclusion

Colibri-S: Basic Idea
[Figure: the original matrix; its CUR (CX) decomposition, whose sampled columns are redundant (3 copies of green, 2 copies of red, 2 copies of purple; purple = 0.5*green + red); and its Colibri-S decomposition L x M x R.]
We want the columns in L to be linearly independent of each other!

Input: the initially sampled matrix C
Output:
– L = the linearly independent columns of C
– M = $(L^T L)^{-1}$ = core matrix
– R = $L^T \times A$
Q: How to find L & M from C efficiently?

A: Find L & M iteratively!
For each column v in the initially sampled matrix C:
– Project v onto the column space of the current L
– If v is redundant, discard v
– Otherwise, expand L & M
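The loop on this slide might be sketched as below. This is a sketch under stated assumptions, not the authors' implementation: the relative tolerance `eps` for the redundancy test and the name `colibri_s_sketch` are illustrative, and the core matrix is expanded with the standard block-inverse (Schur complement) identity so that no matrix inverse is recomputed from scratch.

```python
import numpy as np


def colibri_s_sketch(A, C, eps=1e-8):
    """Build L (independent columns of C), M = (L^T L)^{-1}, and R = L^T A."""
    n = C.shape[0]
    L = np.zeros((n, 0))
    M = np.zeros((0, 0))
    for j in range(C.shape[1]):
        v = C[:, j]
        norm2 = float(v @ v)
        if norm2 == 0.0:
            continue                      # an all-zero column adds nothing
        y = M @ (L.T @ v)                 # coefficients of v in the current L
        residual = v - L @ y              # part of v outside span(L)
        delta = float(residual @ residual)
        if delta <= eps * norm2:
            continue                      # redundant: discard v
        # Expand M with the block-inverse identity for [L, v]^T [L, v].
        k = M.shape[0]
        M_new = np.empty((k + 1, k + 1))
        M_new[:k, :k] = M + np.outer(y, y) / delta
        M_new[:k, k] = -y / delta
        M_new[k, :k] = -y / delta
        M_new[k, k] = 1.0 / delta
        M = M_new
        L = np.hstack([L, v[:, None]])    # expand L
    R = L.T @ A
    return L, M, R
```

For the coloured-columns illustration (3 green, 2 red, 2 purple with purple = 0.5*green + red), only one green and one red column would survive in L.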

Colibri-S vs. CUR (CX)
– Quality: Colibri-S = CUR (CX)
– Time: Colibri-S is at least as good as CUR (CX)
– Space: Colibri-S is at least as good as CUR (CX)
[Figure: illustrations of Colibri-S and CUR (CX).]

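Colibri-D for dynamic graphs
[Figure: the initially sampled matrix at time t with its decomposition $(L_t, M_t, R_t)$, and the initially sampled matrix at time t+1 with the unknown $(L_{t+1}, M_{t+1}, R_{t+1})$.]
Q: How to update L and M efficiently?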

Colibri-D: How-To
[Figure: the initially sampled matrices at t and t+1, with $(L_t, M_t, R_t)$ and $(L_{t+1}, M_{t+1}, R_{t+1})$. Columns at t are marked as selected or redundant; some columns have changed from t.]

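Colibri-D: How-To (continued)
[Figure: the initially sampled matrices at t and t+1, with $(L_t, M_t)$ and $(L_{t+1}, M_{t+1})$; columns at t are marked as selected or redundant. $\tilde{L}$ and $\tilde{M}$ describe the subspace spanned by the blue columns at t+1, which are unchanged columns!]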
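One way to read these two slides is sketched below; it is an interpretation under assumptions, not the authors' code. The selected columns that did not change are reused: their core matrix $\tilde{M}$ is obtained from $M_t$ with a Schur complement instead of a fresh inverse, and the remaining columns at t+1 are then processed with the same expand step as in the static case. The names `expand` and `colibri_d_sketch`, the tolerance `eps`, and the argument layout are all hypothetical.

```python
import numpy as np


def expand(L, M, C, candidates, eps=1e-8):
    """Add each non-redundant column C[:, j], j in candidates, to L and M."""
    added = []
    for j in candidates:
        v = C[:, j]
        norm2 = float(v @ v)
        if norm2 == 0.0:
            continue
        y = M @ (L.T @ v)
        residual = v - L @ y
        delta = float(residual @ residual)
        if delta <= eps * norm2:
            continue                       # redundant column
        k = M.shape[0]
        M_new = np.empty((k + 1, k + 1))
        M_new[:k, :k] = M + np.outer(y, y) / delta
        M_new[:k, k] = -y / delta
        M_new[k, :k] = -y / delta
        M_new[k, k] = 1.0 / delta
        L, M = np.hstack([L, v[:, None]]), M_new
        added.append(j)
    return L, M, added


def colibri_d_sketch(M_t, selected_t, changed, C_next, eps=1e-8):
    """Update (L, M) at t+1 from M_t; selected_t lists the columns in L_t."""
    changed = set(changed)
    keep = [i for i, j in enumerate(selected_t) if j not in changed]
    drop = [i for i, j in enumerate(selected_t) if j in changed]
    # Core matrix of the unchanged selected columns via a Schur complement.
    M_tilde = M_t[np.ix_(keep, keep)]
    if keep and drop:
        M12 = M_t[np.ix_(keep, drop)]
        M22 = M_t[np.ix_(drop, drop)]
        M_tilde = M_tilde - M12 @ np.linalg.solve(M22, M12.T)
    kept_cols = [selected_t[i] for i in keep]
    L_tilde = C_next[:, kept_cols]
    # Every other column at t+1 is a candidate for expanding L and M.
    rest = [j for j in range(C_next.shape[1]) if j not in kept_cols]
    L, M, added = expand(L_tilde, M_tilde, C_next, rest, eps)
    return L, M, kept_cols + added
```

When few columns change between t and t+1, most of the work done at t is reused and only the changed or previously redundant columns need to be projected again.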

Roadmap
Motivation
Existing Methods
Colibri
Experimental Results
Conclusion

Experimental Setup
Dataset: network traffic
– 21,837 sources/destinations
– 1,222 consecutive hours
– 22,800 edges per hour
Accuracy: Accu = [formula missing from the transcript]
Space cost: [formula missing from the transcript]

Performance of Colibri-S
[Figure: time of ours vs. CUR and CMD; space of ours vs. CMD.]
– Accuracy: same (91%+)
– Time: 12x faster than CMD, 28x faster than CUR
– Space: ~1/3 of CMD, ~10% of CUR

More Evaluation on Colibri-S
[Figure: CUR, CMD and Colibri-S; axes: approximation accuracy and log time (sec).]

Performance of Colibri-D
[Figure: CMD, Colibri-S and Colibri-D; axes: # of changed columns and time.]
Colibri-D achieves up to 112x speedups.

A Family of Low-Rank Approximations for Fast Graph Mining
Colibri-S
– For static graphs
– Removes redundancy
– Significant savings in time & space for "free"
Colibri-D
– For dynamic graphs
– Exploits "smoothness"
– Up to 112x faster than the best known methods

Poster tonight! Thank you!