Large Graph Mining: Power Tools and a Practitioner’s guide


Large Graph Mining: Power Tools and a Practitioner’s guide Task 5: Graphs over time & tensors Faloutsos, Miller, Tsourakakis CMU KDD '09 Faloutsos, Miller, Tsourakakis

Outline: Introduction – Motivation; Task 1: Node importance; Task 2: Community detection; Task 3: Recommendations; Task 4: Connection sub-graphs; Task 5: Mining graphs over time; Task 6: Virus/influence propagation; Task 7: Spectral graph theory; Task 8: Tera/peta graph mining: hadoop; Observations – patterns of real graphs; Conclusions

Thanks to Tamara Kolda (Sandia) for the foils on tensor definitions, and on TOPHITS.

Detailed outline: Motivation; Definitions: PARAFAC and Tucker; Case study: web mining

Examples of Matrices: Authors and terms. [Figure: author x term matrix. Rows: John, Peter, Mary, Nick. Columns: data, mining, classif., tree, ...]

Motivation: Why tensors? Q: What is a tensor?

Motivation: Why tensors? A: N-D generalization of a matrix. [Figure: author x term matrix for KDD’09. Rows: John, Peter, Mary, Nick. Columns: data, mining, classif., tree, ...]

Motivation: Why tensors? A: N-D generalization of a matrix. [Figure: author x term x conference tensor, with one author x term slice each for KDD’07, KDD’08, KDD’09. Rows: John, Peter, Mary, Nick. Columns: data, mining, classif., tree, ...]

Tensors are useful for 3 or more modes. Terminology: ‘mode’ (or ‘aspect’). [Figure: 3-mode tensor with axes labeled Mode (== aspect) #1, Mode #2, Mode #3; term labels: data, mining, classif., tree, ...]

Notice: the 3rd mode does not need to be time, and we can have more than 3 modes. [Figure: tensor with modes IP source, IP destination, Dest. port (125, ..., 80).]

Notice: the 3rd mode does not need to be time, and we can have more than 3 modes. E.g., fMRI: x, y, z, time, person-id, task-id. From DENLAB, Temple U. (Prof. V. Megalooikonomou +): http://denlab.temple.edu/bidms/cgi-bin/browse.cgi

Motivating Applications. Why are tensors useful? Web mining (TOPHITS); environmental sensors; intrusion detection (src, dst, time, dest-port); social networks (src, dst, time, type-of-contact); face recognition; etc.

Tensor basics: multi-mode extensions of SVD. Recall that:

Reminder: SVD. Best rank-k approximation in L2: $A \approx U \Sigma V^T$, where $A$ is $m \times n$.

Reminder: SVD. Best rank-k approximation in L2: $A \approx \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T$, where $A$ is $m \times n$.
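
The SVD reminder above can be illustrated with a minimal NumPy sketch: keep the k largest singular triplets and add up the corresponding rank-1 terms. The author x term counts and the helper name `rank_k_approx` are hypothetical, chosen only for illustration; they are not from the slides.

```python
import numpy as np

def rank_k_approx(A, k):
    """Sum of the k leading rank-1 terms sigma_i * u_i * v_i^T of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Scale the first k left singular vectors by their singular values,
    # then multiply by the first k right singular vectors.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# Hypothetical author x term count matrix (4 authors, 4 terms).
A = np.array([
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 4.0],
    [0.0, 0.0, 4.0, 1.0],
])

A2 = rank_k_approx(A, 2)
s = np.linalg.svd(A, compute_uv=False)   # all singular values, descending
err = np.linalg.norm(A - A2)             # Frobenius norm of the residual
print("singular values:", s)
print("rank-2 approximation error:", err)
```

For the truncated SVD, the Frobenius error equals the square root of the sum of the squared discarded singular values, which is an easy sanity check on any such implementation.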

Goal: extension to >=3 modes. [Figure: an I x J x K tensor ≈ a sum of R rank-1 terms (+…+) = an R x R x R core with factor matrices A (I x R), B (J x R), C (K x R).]

Main points: there are 2 major types of tensor decompositions, PARAFAC and Tucker; both can be solved with "alternating least squares" (ALS).
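
As a rough illustration of ALS for PARAFAC, the sketch below fixes two factor matrices and solves a linear least-squares problem for the third, cycling over the three modes. It is a minimal NumPy sketch under stated assumptions, not the tutorial's or the Tensor Toolbox's implementation: the helper names (`unfold`, `khatri_rao`, `parafac_als`, `reconstruct`), the random initialization, the fixed iteration count, and the toy tensor are all hypothetical.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding: bring `mode` to the front, flatten the rest (C order)."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def khatri_rao(P, Q):
    """Column-wise Kronecker product: (J x R), (K x R) -> (J*K x R)."""
    return np.einsum('jr,kr->jkr', P, Q).reshape(-1, P.shape[1])

def reconstruct(A, B, C):
    """Sum of R rank-1 terms a_r o b_r o c_r."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def parafac_als(X, R, n_iter=500, seed=0):
    """Rank-R PARAFAC of a 3-mode tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, R))
    B = rng.standard_normal((J, R))
    C = rng.standard_normal((K, R))
    for _ in range(n_iter):
        # Each line is the least-squares solution for one factor,
        # with the other two held fixed.
        A = unfold(X, 0) @ np.linalg.pinv(khatri_rao(B, C)).T
        B = unfold(X, 1) @ np.linalg.pinv(khatri_rao(A, C)).T
        C = unfold(X, 2) @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C

# Hypothetical 4 x 3 x 3 tensor built from two known groups (illustration only).
A0 = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
B0 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
C0 = np.array([[2.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
X = reconstruct(A0, B0, C0)

A, B, C = parafac_als(X, R=2)
rel_err = np.linalg.norm(X - reconstruct(A, B, C)) / np.linalg.norm(X)
print("relative reconstruction error:", rel_err)
```

A practical implementation would normally add a convergence check and column normalization; both are omitted here to keep the alternating structure visible.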

Specially Structured Tensors (our notation). Tucker tensor: an I x J x K tensor = an R x S x T “core” with factor matrices U (I x R), V (J x S), W (K x T). Kruskal tensor: an I x J x K tensor = u1 ∘ v1 ∘ w1 +…+ uR ∘ vR ∘ wR, i.e., an R x R x R core with factor matrices U (I x R), V (J x R), W (K x R).

Tucker Decomposition - intuition. An I x J x K tensor (author x keyword x conference) ≈ a core G (R x S x T) with factor matrices A (I x R), B (J x S), C (K x T). A: author x author-group; B: keyword x keyword-group; C: conf. x conf-group; G: how groups relate to each other.
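
To make the roles of A, B, C and the core G concrete, here is a minimal NumPy sketch of a Tucker decomposition. Note the swapped-in technique: it uses the truncated higher-order SVD (one SVD per mode unfolding) rather than ALS, because it is the shortest way to obtain a valid Tucker model. The helper names and the toy author x keyword x conference tensor are hypothetical.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding: bring `mode` to the front, flatten the rest."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_product(X, M, mode):
    """Multiply tensor X by matrix M (new_dim x X.shape[mode]) along `mode`."""
    Y = np.tensordot(M, X, axes=(1, mode))  # contracted axis replaced, new axis first
    return np.moveaxis(Y, 0, mode)

def tucker_hosvd(X, ranks):
    """Truncated higher-order SVD: returns the core G and one factor per mode."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        factors.append(U[:, :r])            # leading left singular vectors
    G = X
    for mode, U in enumerate(factors):
        G = mode_product(G, U.T, mode)      # project onto each mode's basis
    return G, factors

def tucker_reconstruct(G, factors):
    X = G
    for mode, U in enumerate(factors):
        X = mode_product(X, U, mode)
    return X

# Hypothetical group memberships: 4 authors, 4 keywords, 3 conferences, 2 groups each.
A0 = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
B0 = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
C0 = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
# Hypothetical core: how strongly each (author, keyword, conf) group triple interacts.
G0 = np.zeros((2, 2, 2))
G0[0, 0, 0], G0[1, 1, 1], G0[0, 1, 1] = 5.0, 3.0, 1.0
X = np.einsum('rst,ir,js,kt->ijk', G0, A0, B0, C0)

G, (A, B, C) = tucker_hosvd(X, ranks=(2, 2, 2))
X_hat = tucker_reconstruct(G, [A, B, C])
print("core shape:", G.shape)
print("max reconstruction error:", np.abs(X - X_hat).max())
```

Unlike the PARAFAC core, the Tucker core here may have non-zero off-diagonal entries, which is what lets one author group interact with a different keyword or conference group. The recovered factors are orthonormal bases, so they match the group memberships only up to a change of basis.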

Intuition behind core tensor. 2-d case: co-clustering [Dhillon et al., Information-Theoretic Co-clustering, KDD’03].

[Figure: e.g., terms x documents matrix, with dimension labels m, n, k, l.]

[Figure: co-clustering example. Document groups: med. doc, cs doc. Term groups: med. terms, cs terms, common terms. Factors: term x term-group, term group x doc. group, doc x doc group.]

Tensor tools - summary. Two main tools: PARAFAC and Tucker. Both find row-, column-, tube-groups, but in PARAFAC the three groups are identical. (To solve: Alternating Least Squares.)

Web graph mining. How to order the importance of web pages? Kleinberg’s algorithm HITS; PageRank; tensor extension of HITS (TOPHITS).

Kleinberg’s Hubs and Authorities (the HITS method) [Kleinberg, JACM, 1999]. Sparse adjacency matrix and its SVD. [Figure: from x to adjacency matrix ≈ (hub scores for 1st topic) x (authority scores for 1st topic) + (hub scores for 2nd topic) x (authority scores for 2nd topic).]
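
The link between HITS and the SVD can be sketched in a few lines of NumPy: for a from x to adjacency matrix, the leading left singular vector gives hub scores and the leading right singular vector gives authority scores for the dominant topic. The five-page link graph and the function name `hits_svd` are hypothetical, and a dense SVD is used only because the example is tiny; a real web graph would need a sparse solver.

```python
import numpy as np

def hits_svd(adj):
    """Hub and authority scores for the dominant topic of a link graph.

    adj[i, j] = 1 if page i links to page j (rows: from, columns: to).
    """
    U, s, Vt = np.linalg.svd(adj, full_matrices=False)
    # Singular vectors are defined up to sign; take absolute values so
    # that the scores are non-negative.
    hubs = np.abs(U[:, 0])
    authorities = np.abs(Vt[0, :])
    return hubs, authorities

# Hypothetical link graph: pages 0, 1, 2 link to pages 3 and 4.
links = [(0, 3), (0, 4), (1, 3), (1, 4), (2, 3)]
adj = np.zeros((5, 5))
for src, dst in links:
    adj[src, dst] = 1.0

hubs, authorities = hits_svd(adj)
print("hub scores:      ", np.round(hubs, 3))
print("authority scores:", np.round(authorities, 3))
```

In this toy graph page 3 receives the most links from good hubs, so it gets the highest authority score, while pages 0 and 1 tie as the best hubs.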

HITS Authorities on Sample Data. We started our crawl from http://www-neos.mcs.anl.gov/neos, and crawled 4700 pages, resulting in 560 cross-linked hosts. [Figure: from x to matrix with hub and authority scores for the 1st and 2nd topics.]

Three-Dimensional View of the Web [Kolda, Bader, Kenny, ICDM05]. Observe that this tensor is very sparse!


Topical HITS (TOPHITS). Main Idea: Extend the idea behind the HITS model to incorporate term (i.e., topical) information. [Figure: from x to x term tensor ≈ sum over topics of hub scores, authority scores, and term scores, shown for the 1st and 2nd topics.]

TOPHITS Terms & Authorities on Sample Data. TOPHITS uses 3D analysis to find the dominant groupings of web pages and terms. w_k = # unique links using term k. [Figure: from x to x term tensor, decomposed by PARAFAC into hub, authority, and term scores for the 1st and 2nd topics.]
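
The idea of adding a term mode to HITS can be sketched for a single topic. The code below builds a small from x to x term tensor from (from, to, term) triples and finds one set of hub, authority, and term scores by alternating updates, which amounts to a rank-1 PARAFAC fit (the higher-order power method). This is a simplified illustration under stated assumptions, not the published TOPHITS procedure: the triples and the function name are hypothetical, entries are plain 0/1 (the w_k-based weighting mentioned on the slide is not reproduced, since its exact form is not given here), and only the dominant topic is extracted.

```python
import numpy as np

def tophits_rank1(X, n_iter=100):
    """Dominant hub, authority and term scores of a from x to x term tensor."""
    I, J, K = X.shape
    h, a, t = np.ones(I), np.ones(J), np.ones(K)
    for _ in range(n_iter):
        # Update each score vector from the other two, then normalize.
        h = np.einsum('ijk,j,k->i', X, a, t)
        h /= np.linalg.norm(h)
        a = np.einsum('ijk,i,k->j', X, h, t)
        a /= np.linalg.norm(a)
        t = np.einsum('ijk,i,j->k', X, h, a)
        t /= np.linalg.norm(t)
    return h, a, t

# Hypothetical links: (from page, to page, term id of the anchor text).
triples = [
    (0, 2, 0), (1, 2, 0), (0, 2, 1), (1, 2, 1), (0, 3, 0),  # one dense topic
    (3, 1, 2),                                               # an isolated link
]
X = np.zeros((4, 4, 3))   # 4 pages x 4 pages x 3 terms
for src, dst, term in triples:
    X[src, dst, term] = 1.0

hubs, authorities, terms = tophits_rank1(X)
print("hub scores:      ", np.round(hubs, 3))
print("authority scores:", np.round(authorities, 3))
print("term scores:     ", np.round(terms, 3))
```

With these triples the dominant topic is the dense block: page 0 is the top hub, page 2 the top authority, and term 0 the top term, while the isolated link contributes essentially nothing. Further topics would require a rank-R fit instead of this rank-1 iteration.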

Conclusions. Real data are often in high dimensions with multiple aspects (modes). Tensors provide elegant theory and algorithms. PARAFAC and Tucker: discover groups.

References. T. G. Kolda, B. W. Bader and J. P. Kenny. Higher-Order Web Link Analysis Using Multilinear Algebra. In: ICDM 2005, pages 242-249, November 2005. Jimeng Sun, Spiros Papadimitriou, Philip Yu. Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams. In: Proc. of the Int. Conf. on Data Mining (ICDM), Hong Kong, China, December 2006.

Resources. See tutorial on tensors, KDD’07 (w/ Tamara Kolda and Jimeng Sun): www.cs.cmu.edu/~christos/TALKS/KDD-07-tutorial

Tensor tools - resources. Toolbox from Tamara Kolda: csmr.ca.sandia.gov/~tgkolda/TensorToolbox ; T. G. Kolda and B. W. Bader. Tensor Decompositions and Applications. SIAM Review, Volume 51, Number 3, September 2009: csmr.ca.sandia.gov/~tgkolda/pubs/bibtgkfiles/TensorReview-preprint.pdf ; T. Kolda and J. Sun: Scalable Tensor Decomposition for Multi-Aspect Data Mining (ICDM 2008). ICDE’09. Copyright: Faloutsos, Tong (2009).