2007-8-13 KDD 2007, San Jose: Fast Direction-Aware Proximity for Graph Mining. Speaker: Hanghang Tong. Joint work w/ Yehuda Koren, Christos Faloutsos.

Presentation transcript:

2 Proximity on Graph: For an undirected graph, what is the Prox between A and B? (‘How close is Smith to Johnson?’) But many real graphs are directed...

3 Edge Direction w/ Proximity: What is Prox from A to B? What is Prox from B to A?

4 Motivating Questions (Fast DAP: Direction-Aware Proximity): Q1: How to define it? Q2: How to compute it efficiently? Q3: How to benefit real applications?

5 Roadmap: DAP definitions (Escape Probability; Issue #1: ‘degree-1 node’ effect; Issue #2: weakly connected pair); Computational Issues (FastAllDAP: all pairs; FastOneDAP: one pair); Experimental Results; Conclusion

6 Defining DAP: Escape Probability. Define a random walk (RW) on the graph. Esc_Prob(A->B) = Prob(a RW starting at A reaches B before returning to A), i.e., Esc_Prob = Pr(smile before cry). [Figure: nodes A and B within the remaining graph.]

7 Esc_Prob: Example. Esc_Prob(a->b) = 1 > Esc_Prob(b->a) = 0.5
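
To make the definition concrete, here is a minimal Monte Carlo sketch. It assumes a hypothetical 3-node graph (a -> b, b -> a, b -> c, c -> b), not the graph drawn on the slide, chosen so that it reproduces the slide's numbers: from a the walk must step to b, so Esc_Prob(a->b) = 1; from b it steps to a or to c with probability 1/2 each, and c leads straight back to b, so Esc_Prob(b->a) = 0.5. The function name `esc_prob_mc` is ours, and the loop assumes the walk eventually hits the source or the target (true for this strongly connected graph).

```python
import random

# Hypothetical toy graph (not the slide's figure): a -> b, b -> a, b -> c, c -> b.
# Out-edges are followed uniformly at random, i.e. a row-normalized P.
GRAPH = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}


def esc_prob_mc(graph, src, dst, trials=20_000, seed=0):
    """Monte Carlo estimate of Esc_Prob(src -> dst): the probability that a
    random walk starting at src reaches dst before returning to src."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        node = rng.choice(graph[src])  # first step away from src
        while node not in (src, dst):  # walk until we hit dst or come back to src
            node = rng.choice(graph[node])
        if node == dst:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(esc_prob_mc(GRAPH, "a", "b"))  # a's only out-edge goes to b
    print(esc_prob_mc(GRAPH, "b", "a"))  # close to 0.5
```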

8 Esc_Prob is good, but... Issue #1: ‘degree-1 node’ effect. Issue #2: weakly connected pair. We need some practical modifications!

9 Issue #1: ‘Degree-1 node’ effect [Faloutsos+] [Koren+]. Degree-1 nodes (E, F) have no influence! In undirected graphs this is known as the ‘pizza delivery guy’ problem. Solution: Universal Absorbing Boundary! [Figure: Esc_Prob(a->b) = 1.]

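10 Universal Absorbing Boundary. The U-A-B is a black hole: at every step, the random walk flies out into it with the fly-out probability and never returns. Footnote: fly-out probability = 0.1.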

11 Introducing the Universal Absorbing Boundary. [Figure panels: Prox(a->b) = 0.91; Prox(a->b) = 0.74; Esc_Prob(a->b) = 1.] Footnote: fly-out probability = 0.1.

12 Issue #2: Weakly connected pair. Prox(A->B) = Prox(B->A) = 0. Solution: partial symmetry!

13 Practical Modifications: Partial Symmetry. Without it: Prox(A->B) = Prox(B->A) = 0. With it: Prox(A->B) = 0.081 > Prox(B->A) = 0.009.

14 Roadmap: DAP definitions (Escape Probability; Issue #1: ‘degree-1 node’ effect; Issue #2: weakly connected pair); Computational Issues (FastAllDAP: all pairs; FastOneDAP: one pair); Experimental Results; Conclusion

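15 Solving Esc_Prob [Doyle+]: $\mathrm{Esc\_Prob}(i \to j) = P(i,j) + P(i,\mathcal{I})\,\big(I - P(\mathcal{I},\mathcal{I})\big)^{-1}\,P(\mathcal{I},j)$, where P is the transition matrix (row-normalized), n is the number of nodes in the graph, and $\mathcal{I}$ is the set of all nodes except i and j. $P(i,\mathcal{I})$ (1 x (n-2)): the i-th row of P with the i-th and j-th elements removed; $P(\mathcal{I},\mathcal{I})$ ((n-2) x (n-2)): P with the i-th and j-th rows and columns removed; $P(\mathcal{I},j)$: the j-th column of P with the i-th and j-th elements removed. One matrix inversion, one Esc_Prob!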

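16 Example: $\mathrm{Esc\_Prob}(1 \to 5) = P(1,5) + P(1,\mathcal{I})\,\big(I - P(\mathcal{I},\mathcal{I})\big)^{-1}\,P(\mathcal{I},5)$, with $\mathcal{I}$ = all nodes except 1 and 5; P: transition matrix (row-normalized).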
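
The closed form can be checked on a small example. A minimal NumPy sketch, assuming a row-normalized P and that every node outside {i, j} can reach i or j (so that I - P(I,I) is invertible); the function names and the toy graph (a -> b, b -> a, b -> c, c -> b, with a, b, c indexed 0, 1, 2) are hypothetical. It solves the (n-2) x (n-2) system instead of forming the inverse explicitly, which yields the same value.

```python
import numpy as np


def transition_matrix(adj):
    """Row-normalize a nonnegative adjacency matrix (assumes no all-zero rows)."""
    adj = np.asarray(adj, dtype=float)
    return adj / adj.sum(axis=1, keepdims=True)


def esc_prob(P, i, j):
    """Esc_Prob(i -> j) = P(i,j) + P(i,I) (I - P(I,I))^{-1} P(I,j),
    where I is the index set of all nodes except i and j."""
    rest = [k for k in range(P.shape[0]) if k not in (i, j)]
    A = np.eye(len(rest)) - P[np.ix_(rest, rest)]  # (n-2) x (n-2) block
    x = np.linalg.solve(A, P[rest, j])             # (I - P(I,I))^{-1} P(I,j)
    return P[i, j] + P[i, rest] @ x                # 1 x (n-2) row times that vector


P = transition_matrix([[0, 1, 0], [1, 0, 1], [0, 1, 0]])  # hypothetical toy graph
print(esc_prob(P, 0, 1), esc_prob(P, 1, 0))  # 1.0 0.5
```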

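17 Solving DAP (straightforward way): $\mathrm{Prox}(i \to j) = c\,P(i,j) + c^2\,P(i,\mathcal{I})\,\big(I - c\,P(\mathcal{I},\mathcal{I})\big)^{-1}\,P(\mathcal{I},j)$, with $P(i,\mathcal{I})$ of size 1 x (n-2) and $P(\mathcal{I},\mathcal{I})$ of size (n-2) x (n-2); 1-c is the fly-out probability (to the black hole). One matrix inversion, one proximity!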
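
The sketch below adds the fly-out probability 1 - c to the escape-probability computation: each step is taken with probability c, otherwise the walk is absorbed by the universal absorbing boundary. It assumes c = 0.9 (fly-out probability 0.1, as in the footnotes) and a hypothetical toy graph (a -> b, b -> a, b -> c, c -> b); the names are ours. With c = 1 it reduces to the plain escape probability, while with c = 0.9 the direct edge a -> b contributes only c.

```python
import numpy as np


def transition_matrix(adj):
    """Row-normalize a nonnegative adjacency matrix (assumes no all-zero rows)."""
    adj = np.asarray(adj, dtype=float)
    return adj / adj.sum(axis=1, keepdims=True)


def prox_direct(P, i, j, c=0.9):
    """Straightforward DAP: at each step the walk flies out to the absorbing
    boundary with probability 1 - c, otherwise it follows P.
    Prox(i->j) = c P(i,j) + c^2 P(i,I) (I - c P(I,I))^{-1} P(I,j)."""
    rest = [k for k in range(P.shape[0]) if k not in (i, j)]  # the index set I
    A = np.eye(len(rest)) - c * P[np.ix_(rest, rest)]          # (n-2) x (n-2)
    x = np.linalg.solve(A, P[rest, j])                          # one linear system per pair
    return c * P[i, j] + c**2 * (P[i, rest] @ x)


P = transition_matrix([[0, 1, 0], [1, 0, 1], [0, 1, 0]])  # hypothetical toy graph
print(prox_direct(P, 0, 1, c=1.0), prox_direct(P, 0, 1))  # 1.0 0.9
```

Note that each pair (i, j) needs its own (n-2) x (n-2) system, which is exactly the cost the next slides attack.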

18 Challenges. Case 1: Medium-size graph. Matrix inversion is feasible, but what if we want many proximities? Q: How to get all n^2 proximities efficiently? A: FastAllDAP! Case 2: Large graph. Matrix inversion is infeasible. Q: How to get one proximity efficiently? A: FastOneDAP!

19 FastAllDAP. Q1: How to efficiently compute all possible proximities on a medium-size graph? In other words, how to efficiently solve multiple linear systems simultaneously? Goal: reduce the number of matrix inversions!

20 FastAllDAP: Observation. Computed the straightforward way, two different proximities need two different matrix inversions!

21 FastAllDAP: Rescue. There is redundancy among the different linear systems: the two gray parts of P overlap! [Figure: the parts of P used by Prox(1->5) and by Prox(1->6).]

22 FastAllDAP: Theorem. Theorem: Proof: by the Sherman-Morrison (SM) lemma. Example:

23 FastAllDAP: Algorithm. Alg.: compute Q; then, for i, j = 1, ..., n, compute Prox(i->j) from Q. Computational savings: O(1) instead of O(n^2) matrix inversions! Example: with 1,000 nodes, 1M matrix inversions vs. 1 matrix inversion!
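
The following FastAllDAP sketch rests on an assumption about the theorem on slide 22: that Q = (I - cP)^{-1} and, for i != j, Prox(i->j) = Q(i,j) / (Q(i,i) Q(j,j) - Q(i,j) Q(j,i)). We derived this identity from a standard first-passage argument for the absorbing walk (it also agrees with slide 25, which needs only 4 entries of Q from 2 columns); treat it as a reconstruction, not a quote of the paper. One inversion then yields all proximities at once, and the straightforward per-pair solver is included only as a cross-check. All function names are hypothetical.

```python
import numpy as np


def transition_matrix(adj):
    """Row-normalize a nonnegative adjacency matrix (assumes no all-zero rows)."""
    adj = np.asarray(adj, dtype=float)
    return adj / adj.sum(axis=1, keepdims=True)


def prox_direct(P, i, j, c=0.9):
    """Baseline: one (n-2) x (n-2) linear system per pair (i, j)."""
    rest = [k for k in range(P.shape[0]) if k not in (i, j)]
    x = np.linalg.solve(np.eye(len(rest)) - c * P[np.ix_(rest, rest)], P[rest, j])
    return c * P[i, j] + c**2 * (P[i, rest] @ x)


def fast_all_dap(P, c=0.9):
    """All-pairs proximities from a single inverse Q = (I - cP)^{-1}, via
    Prox(i->j) = Q[i,j] / (Q[i,i] Q[j,j] - Q[i,j] Q[j,i]) for i != j (assumed identity)."""
    n = P.shape[0]
    Q = np.linalg.inv(np.eye(n) - c * P)  # the only matrix inversion
    d = np.diag(Q)
    denom = np.outer(d, d) - Q * Q.T      # Q[i,i]Q[j,j] - Q[i,j]Q[j,i] for all pairs
    np.fill_diagonal(denom, 1.0)          # Prox(i->i) is not defined; avoid 0/0
    prox = Q / denom
    np.fill_diagonal(prox, 0.0)
    return prox


if __name__ == "__main__":
    P = transition_matrix([[0, 1, 0], [1, 0, 1], [0, 1, 0]])  # hypothetical toy graph
    print(fast_all_dap(P).round(4))
```

The cross-check against `prox_direct` is what justifies trusting the assumed identity on a given graph; on large graphs only `fast_all_dap` would be run.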

24 FastOneDAP. Q1: How to efficiently compute one single proximity on a large graph? In other words, how to solve one linear system efficiently? Goal: avoid matrix inversion!

25 FastOneDAP: Observation. Partial information of Q is enough: only 4 elements, from 2 columns!

26 FastOneDAP: Observation. Q: How to compute one column of Q? A: Taylor expansion. Reminder: the i-th column of Q is $Q e_i$, with $e_i = [0, \dots, 0, 1, 0, \dots, 0]^T$.

27 FastOneDAP: Observation. Computing the i-th column of Q, $Q e_i$ with $e_i = [0, \dots, 0, 1, 0, \dots, 0]^T$, needs only sparse matrix-vector multiplications!

28 FastOneDAP: Iterative Alg. Algorithm to estimate the i-th column of Q.

29 FastOneDAP: Property. Convergence guaranteed! Computational savings: e.g., for 100K nodes and 1M edges (50 iterations), 10,000,000x faster! Footnote: 1 column is enough (details in paper).
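
A minimal sketch of the FastOneDAP idea, assuming Q = (I - cP)^{-1} (so the Taylor expansion reads Q e_i = sum over k of (cP)^k e_i) and the closed form Prox(i->j) = Q(i,j) / (Q(i,i) Q(j,j) - Q(i,j) Q(j,i)), which is our reconstruction rather than a quote from the slides. Each column of Q is estimated by the fixed-point iteration q <- e_i + cPq, whose k-th iterate is the series truncated after k + 1 terms; it converges geometrically because cP has spectral radius at most c < 1 for a row-stochastic P. P is stored as an edge list, so each step is one sparse matrix-vector product (via `np.add.at`) and no matrix is inverted. This version uses the 2 columns of slide 25; the one-column refinement in the footnote is not reproduced. Names, tolerance and iteration cap are hypothetical.

```python
import numpy as np


def sparse_transition(adj):
    """Edge-list form of the row-normalized transition matrix P:
    P[src[e], dst[e]] = w[e]. Assumes every node has at least one out-edge."""
    adj = np.asarray(adj, dtype=float)
    src, dst = np.nonzero(adj)
    w = adj[src, dst] / adj.sum(axis=1)[src]
    return src, dst, w


def q_column(src, dst, w, n, i, c=0.9, tol=1e-12, max_iter=1000):
    """Estimate column i of Q = (I - cP)^{-1} without inverting anything:
    iterate q <- e_i + c P q, i.e. sum the series (cP)^k e_i term by term."""
    e_i = np.zeros(n)
    e_i[i] = 1.0
    q = e_i.copy()
    for _ in range(max_iter):
        Pq = np.zeros(n)
        np.add.at(Pq, src, w * q[dst])  # sparse mat-vec: (Pq)[u] = sum_v P[u, v] q[v]
        q_next = e_i + c * Pq
        if np.max(np.abs(q_next - q)) < tol:
            return q_next
        q = q_next
    return q


def fast_one_dap(src, dst, w, n, i, j, c=0.9):
    """Prox(i->j) from the 4 entries of Q that lie in columns i and j."""
    qi = q_column(src, dst, w, n, i, c)  # qi[k] = Q[k, i]
    qj = q_column(src, dst, w, n, j, c)  # qj[k] = Q[k, j]
    return qj[i] / (qi[i] * qj[j] - qj[i] * qi[j])
```

For example, on the hypothetical graph a -> b, b -> a, b -> c, c -> b, `fast_one_dap(*sparse_transition([[0, 1, 0], [1, 0, 1], [0, 1, 0]]), 3, 0, 1)` should come out close to 0.9.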

30 Roadmap: DAP definitions (Escape Probability; Issue #1: ‘degree-1 node’ effect; Issue #2: weakly connected pair); Computational Issues (FastAllDAP: all pairs; FastOneDAP: one pair); Experimental Results; Conclusion

31 Datasets (all real)
Name   Node #   Edge #   Directionality
WL     4k       10k      A-links-to-B
PC     36k      64k      Who-contact-whom
EP     76k      509k     Who-trust-whom
CN     28k      353k     A-cites-B
AE     38k      115k     Who- to-whom

32 We want to check... Effectiveness: link prediction (existence; direction). Efficiency: FastAllDAP; FastOneDAP.

33 Link Prediction: Existence. [Figure: density of Prox(i->j) + Prox(j->i) for node pairs with a link vs. with no link.] DAP effectively distinguishes red from blue!

34 Link Prediction: Existence
Dataset   Accuracy (DAP)   Accuracy (UDAP)
WL        65.40%
PC        79.60%           80.78%
AE        81.51%           80.60%
CN        86.71%           84.00%
EP        92.21%           92.09%

35 Link Prediction: Existence
Dataset   Accuracy
WL        65.40%
PC        79.60%
AE        81.51%
CN        86.71%
EP        92.21%

36 Link Prediction: Direction. Q: Given that a link exists, what is its direction? A: Compare Prox(i->j) and Prox(j->i): >70%. [Figure: density of Prox(i->j) - Prox(j->i).]
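
Both link-prediction uses can be sketched given any matrix of proximities `prox` with prox[i, j] = Prox(i->j), for example one produced by an all-pairs solver. The existence score is the symmetric sum plotted on slide 33, and the direction rule is the comparison on slide 36. The function names, the tie-breaking toward i -> j, and ranking candidate pairs by score are our assumptions, not the paper's evaluation protocol; the numbers in the usage example are made up.

```python
import numpy as np


def existence_score(prox, i, j):
    """Symmetric link-existence score: Prox(i->j) + Prox(j->i)."""
    return prox[i, j] + prox[j, i]


def predict_direction(prox, i, j):
    """Given that i and j are linked, guess the direction by comparing
    Prox(i->j) with Prox(j->i); returns (source, target). Ties go to i -> j."""
    return (i, j) if prox[i, j] >= prox[j, i] else (j, i)


def rank_candidate_pairs(prox, pairs):
    """Order candidate (i, j) pairs from most to least likely to be linked."""
    return sorted(pairs, key=lambda p: existence_score(prox, *p), reverse=True)


if __name__ == "__main__":
    prox = np.array([[0.0, 0.9, 0.1], [0.45, 0.0, 0.2], [0.05, 0.3, 0.0]])  # made-up values
    print(rank_candidate_pairs(prox, [(0, 2), (0, 1), (1, 2)]))  # [(0, 1), (1, 2), (0, 2)]
    print(predict_direction(prox, 1, 2))                          # (2, 1)
```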

37 Efficiency: FastAllDAP. [Figure: time (sec) vs. size of graph for Straight-Solver and FastAllDAP.] 1,000x faster!

38 Efficiency: FastOneDAP. [Figure: time (sec) vs. size of graph for FastOneDAP and Straight-Solver.] 10,000x faster!

39 Roadmap: DAP definitions (Escape Probability; Issue #1: ‘degree-1 node’ effect; Issue #2: weakly connected pair); Computational Issues (FastAllDAP: all pairs; FastOneDAP: one pair); Experimental Results; Conclusion

40 Conclusion (Fast DAP). Q1: How to define it? A1: Esc_Prob + practical modifications. Q2: How to compute it efficiently? A2: FastAllDAP & FastOneDAP (100x to 10,000x faster!). Q3: How to benefit real applications? A3: Link prediction (existence & direction).

41 More in the paper... Generalization to group proximity: definitions and fast solutions (e.g., ‘How close are CEOs and accountants?’ vs. ‘How close are CEOs to accountants?’). More applications: Dir-CePS, attributed graphs. [Figure: CePS vs. Dir-CePS; labels: common descendant; common ancestor; descendant of B & common ancestor of A and C...]

42 Cupid uses arrows, so does graph mining! Thank you!

43 Back-up foils

44 DAP: Size Bias [Koren+] We want: Solution: degree preserving! Actually:

45 Practical Modifications: Degree-Preserving. [Figure: original graph and modified graphs. Paths (A->B): A->D->B, A->E->F->B, A->D->G->B. Panel values: Prox(a->b) = 0.875, Prox(a->b) = 1, Prox(a->b) = 0.75.]

46 Practical Modifications: Degree-Preserving. [Figure: proximity vs. size of graph.]

47 Solving DAP [Doyle+]. Key quantity: Pr(a RW starting at k will visit j before i). Q: How to solve for it?

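48 Solving [Doyle+]: set up a linear system for $x_k$ = Pr(RW starting at k will visit j before i). Harmonic property: $x_k = \sum_l P(k,l)\,x_l$ for every k other than i and j. Boundary conditions: $x_i = 0$, $x_j = 1$.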
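
The harmonic formulation can also be solved directly: one n x n linear system gives x_k for every k, and taking one step out of i gives Esc_Prob(i->j) = sum over k of P(i,k) x_k. A minimal NumPy sketch, assuming a row-normalized P with no fly-out (c = 1) and a nonsingular system (every node can reach i or j); the names and the toy graph (a -> b, b -> a, b -> c, c -> b, indexed 0, 1, 2) are hypothetical.

```python
import numpy as np


def transition_matrix(adj):
    """Row-normalize a nonnegative adjacency matrix (assumes no all-zero rows)."""
    adj = np.asarray(adj, dtype=float)
    return adj / adj.sum(axis=1, keepdims=True)


def visit_j_before_i(P, i, j):
    """Solve for x[k] = Pr(RW starting at k visits j before i):
    harmonic property x[k] - sum_l P[k, l] x[l] = 0 for k not in {i, j},
    boundary conditions x[i] = 0 and x[j] = 1."""
    n = P.shape[0]
    A = np.eye(n) - P
    b = np.zeros(n)
    for node, value in ((i, 0.0), (j, 1.0)):
        A[node] = 0.0           # replace the harmonic equation at a boundary node
        A[node, node] = 1.0
        b[node] = value
    return np.linalg.solve(A, b)


def esc_prob_harmonic(P, i, j):
    """One step out of i, then 'visit j before returning to i'."""
    return P[i] @ visit_j_before_i(P, i, j)


P = transition_matrix([[0, 1, 0], [1, 0, 1], [0, 1, 0]])  # hypothetical toy graph
print(esc_prob_harmonic(P, 0, 1), esc_prob_harmonic(P, 1, 0))  # 1.0 0.5
```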

49 Effectiveness: CePS. [Figure: original graph (black: query nodes) and the CePS result.]

50 From CePS to Dir-CePS. [Figure labels: common descendant; common ancestor; descendant of B & common ancestor of A and C.]