Presentation is loading. Please wait.

Presentation is loading. Please wait.

Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins1 Fast Discovery of Connection Subgraphs Christos Faloutsos (CMU) Kevin McCurley (IBM) Andrew Tomkins.

Similar presentations


Presentation on theme: "Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins1 Fast Discovery of Connection Subgraphs Christos Faloutsos (CMU) Kevin McCurley (IBM) Andrew Tomkins."— Presentation transcript:

1 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins1 Fast Discovery of Connection Subgraphs Christos Faloutsos (CMU) Kevin McCurley (IBM) Andrew Tomkins (IBM)

2 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins2 Outline Introduction / Motivation Survey Proposed Method Algorithms Experiments Conclusions

3 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins3 Introduction What are the best paths between ‘Kidman’ and ‘Diaz’? Kidman Diaz

4 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins4 Problem definition Given a graph, and two nodes s and t, and a 'budget' b of nodes Find the best b nodes that capture the relationship between s and t st f

5 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins5 Problem definition Given a graph, and two nodes s and t, and a 'budget' b of nodes Find the best b nodes that capture the relationship between s and t st f

6 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins6 Problem definition Part 1: How to quantify the goodness? Part 2: How to pick ‘best few’ nodes? Part 3: Scalability: large graphs (10**7 nodes) st f

7 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins7 Survey Graph Partitioning –[Karypis+Kumar]; [Newman+]; –[Virtanen]; … Communities –[Flake+]; [Tomkins, Kleinberg+] External distances [Palmer+]

8 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins8 Outline Introduction / Motivation Survey Proposed Method Algorithms Experiments Conclusions

9 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins9 part 1: measuring goodness: – electricity part 2: finding good paths –dynamic programming part 3: scalability –heuristics Proposed method

10 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins10 st f Electricity Why not shortest path?

11 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins11 st f Electricity Why not shortest path? Why not net. flow?

12 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins12 st f Electricity Why not shortest path? Why not net. flow? Why not plain ‘voltages’? +1V 0V

13 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins13 st f Electricity Why not shortest path? Why not net. flow? Why not plain ‘voltages’? +1V 0V +0.5V

14 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins14 st f... Electricity, cont’d Proposed method: voltages with universal sink: –~ ‘tax collector’ goodness of a path: its electric current (*) ! +1V 0V

15 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins15 Outline Introduction / Motivation Survey Proposed Method Algorithms Experiments Conclusions

16 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins16 Electricity – Algorithm Voltages/Amperages can be computed easily ( O(E) ) without universal sink: v(i) = Σum j [v(j) * C(i,j) / C(i,*) ] i != source, sink v(source)=1; v(sink)=0

17 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins17 Electricity – Algorithm With universal sink: v(i) = 1/(1+a) Σum j [v(j) * C(i,j) / C(i,*) ] (~ insensitive to a (=1))

18 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins18 Given the voltages and amperages Which b nodes to keep? (and how to spot them quickly?) Part 2: DisplayGen

19 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins19 Part 2: DisplayGen

20 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins20 Part 2: DisplayGen ‘delivered current’ of a path: –~ ‘how many electrons’ choose this path =4/5 *1/2A

21 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins21 Part 2: DisplayGen find subgraph that max’s delivered current Incrementally, add nodes with max marginal delivered current

22 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins22 Part 3: Scalability ‘CandidateGen’ Starting from the large graph Eliminate nodes that are too far away to matter How?

23 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins23 s t source sink Part 3: Scalability By successive, careful expansions

24 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins24 s t Part 3: Scalability

25 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins25 s t Part 3: Scalability

26 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins26 s t Part 3: Scalability

27 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins27 Pseudo-code Until (stoppingCriterion) use pickHeuristic() to pick a node n expand node n

28 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins28 Pseudo-code pickHeuristic() favors Nearby nodes with Strong connections to source or sink and with Small degree

29 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins29 Outline Introduction / Motivation Survey Proposed Method Algorithms Experiments Conclusions

30 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins30 Experiments on large real graph –~15M nodes, ~100M edges, weighted –‘who co-appears with whom’ (from 500M web pages) Q1: Quality of ‘voltage’ approach? Q2: Speed/accuracy trade-off?

31 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins31 Q1: Quality Actors (A); Computer-Scientists (CS) Kidman-Diaz (A-A) Negreponte-Palmisano (CS-CS) Turing-Stone (CS-A)

32 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins32 (A-A) Kidman-Diaz Strong, direct link What are the best paths between ‘Kidman’ and ‘Diaz’? Kidman Diaz

33 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins33 CS-CS: Negreponte - Palmisano NN SP Mainly: CEOs of major Computer companies (Dell, Gates, Fiorina, ++)

34 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins34 CS-CS: Negreponte - Palmisano NN Esther DysonLouis Gerstner SP

35 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins35 CS-A: Turing - Stone Turing Anderson Stone

36 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins36 Outline Introduction / Motivation... Experiments –Q1: quality –Q2: speed/accuracy trade-off Conclusions

37 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins37 Speed/Accuracy Trade-off number of nodes kept (‘b’) delivered current Kleinberg-Newell Rivest-Hoffman Turing-Stone Kidman-Diaz

38 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins38 Speed/accuracy trade-off 80/20-like rule: the first few nodes/paths contribute the vast majority of ‘delivered current’ Thus: CandidateGen makes sense

39 Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins39 Conclusions Defined the problem Part 1: Electricity-based method to measure quality Part 2: Dynamic programming to spot best paths (‘DisplayGen’) Part 3: Scalability with good accuracy (‘CandidateGen’) Operational system


Download ppt "Carnegie Mellon KDD04Faloutsos, McCurley & Tomkins1 Fast Discovery of Connection Subgraphs Christos Faloutsos (CMU) Kevin McCurley (IBM) Andrew Tomkins."

Similar presentations


Ads by Google