Fast Discovery of Connection Subgraphs. Christos Faloutsos (CMU), Kevin McCurley (IBM), Andrew Tomkins (IBM). Carnegie Mellon, KDD04.

Presentation transcript:

Slide 1. Fast Discovery of Connection Subgraphs. Christos Faloutsos (CMU), Kevin McCurley (IBM), Andrew Tomkins (IBM). Carnegie Mellon, KDD04.

Slide 2. Outline: Introduction / Motivation; Survey; Proposed Method; Algorithms; Experiments; Conclusions

Slide 3. Introduction: What are the best paths between ‘Kidman’ and ‘Diaz’? [Figure: graph with nodes labeled Kidman and Diaz]

Slides 4-5. Problem definition: Given a graph, two nodes s and t, and a ‘budget’ b of nodes, find the best b nodes that capture the relationship between s and t. [Figure: example graph with node labels s, t, f]

Slide 6. Problem definition: Part 1: How to quantify the goodness? Part 2: How to pick the ‘best few’ nodes? Part 3: Scalability: large graphs ($10^7$ nodes). [Figure: example graph with node labels s, t, f]

Slide 7. Survey: Graph partitioning: [Karypis+Kumar]; [Newman+]; [Virtanen]; … Communities: [Flake+]; [Tomkins, Kleinberg+]. External distances: [Palmer+].

Slide 8. Outline: Introduction / Motivation; Survey; Proposed Method; Algorithms; Experiments; Conclusions

Slide 9. Proposed method: Part 1: measuring goodness: electricity. Part 2: finding good paths: dynamic programming. Part 3: scalability: heuristics.

Slide 10. Electricity: Why not shortest path? [Figure: example graph with node labels s, t, f]

Slide 11. Electricity: Why not shortest path? Why not network flow? [Figure: example graph with node labels s, t, f]

Slide 12. Electricity: Why not shortest path? Why not network flow? Why not plain ‘voltages’? [Figure: example graph with node labels s, t, f; voltages +1V and 0V marked]

Slide 13. Electricity: Why not shortest path? Why not network flow? Why not plain ‘voltages’? [Figure: example graph with node labels s, t, f; voltages +1V, 0V and +0.5V marked]

Slide 14. Electricity, cont’d: Proposed method: voltages with a universal sink (~ ‘tax collector’). Goodness of a path: its electric current (*)! [Figure: example graph with node labels s, t, f; voltages +1V and 0V marked]

Slide 15. Outline: Introduction / Motivation; Survey; Proposed Method; Algorithms; Experiments; Conclusions

Slide 16. Electricity – Algorithm: Voltages/amperages can be computed easily ( O(E) ). Without universal sink: $v(i) = \sum_j v(j)\, C(i,j) / C(i,*)$ for $i \neq$ source, sink; $v(\text{source}) = 1$; $v(\text{sink}) = 0$.
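The update rule on this slide can be sketched as a simple relaxation loop. This is a minimal illustration, not the authors' implementation: it reads C(i,*) as the total conductance of the edges at node i, and the function name `solve_voltages`, the dictionary layout, and the fixed iteration count are all assumptions made for the example.

```python
def solve_voltages(conductance, source, sink, iterations=1000):
    """Relax v(i) = sum_j v(j) * C(i,j) / C(i,*) until it settles.

    conductance: dict node -> dict neighbor -> C(i,j), assumed symmetric.
    The source is pinned at 1V and the sink at 0V (hypothetical layout).
    """
    v = {node: 0.0 for node in conductance}
    v[source] = 1.0
    for _ in range(iterations):
        for i, nbrs in conductance.items():
            if i == source or i == sink:
                continue  # boundary voltages stay fixed
            total = sum(nbrs.values())  # C(i,*)
            v[i] = sum(v[j] * c for j, c in nbrs.items()) / total
    return v


# Chain s - f - t with unit conductances: f sits halfway between 1V and 0V.
graph = {
    "s": {"f": 1.0},
    "f": {"s": 1.0, "t": 1.0},
    "t": {"f": 1.0},
}
voltages = solve_voltages(graph, "s", "t")
print(voltages["f"])
```

Each sweep touches every edge once, so one sweep costs time proportional to the number of edges; the number of sweeps needed depends on the graph.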

Slide 17. Electricity – Algorithm: With universal sink: $v(i) = \frac{1}{1+a} \sum_j v(j)\, C(i,j) / C(i,*)$ (~ insensitive to $a$ (= 1)).
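A small sketch may help show what the extra 1/(1+a) factor does. One reading that reproduces the factor is a grounded (0V) universal sink attached to every node i with conductance a * C(i,*); that reading, the function name `solve_voltages_with_sink`, and the example graph are assumptions for illustration only.

```python
def solve_voltages_with_sink(conductance, source, sink, a=1.0, iterations=1000):
    """Relax v(i) = 1/(1+a) * sum_j v(j) * C(i,j) / C(i,*).

    conductance: dict node -> dict neighbor -> C(i,j), assumed symmetric.
    The universal sink is implicit: it only shows up as the 1/(1+a) factor.
    """
    v = {node: 0.0 for node in conductance}
    v[source] = 1.0
    for _ in range(iterations):
        for i, nbrs in conductance.items():
            if i in (source, sink):
                continue  # boundary voltages stay fixed
            total = sum(nbrs.values())  # C(i,*)
            v[i] = sum(v[j] * c for j, c in nbrs.items()) / ((1.0 + a) * total)
    return v


# Two routes from s to t with unit conductances:
# a short one through f and a longer one through g1 and g2.
graph = {
    "s": {"f": 1.0, "g1": 1.0},
    "f": {"s": 1.0, "t": 1.0},
    "g1": {"s": 1.0, "g2": 1.0},
    "g2": {"g1": 1.0, "t": 1.0},
    "t": {"f": 1.0, "g2": 1.0},
}
v = solve_voltages_with_sink(graph, "s", "t", a=1.0)
# Current arriving at t over each route equals the last node's voltage
# (unit conductance, t at 0V).
print(v["f"], v["g2"])
```

In this toy graph the current reaching t over the long route shrinks much more than over the short route once the sink is added, which matches the ‘tax collector’ intuition: every extra hop leaks some current.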

Slide 18. Part 2: DisplayGen: Given the voltages and amperages, which b nodes to keep? (And how to spot them quickly?)

Slide 19. Part 2: DisplayGen

Slide 20. Part 2: DisplayGen: ‘delivered current’ of a path: ~ ‘how many electrons’ choose this path (in the slide's example: = 4/5 * 1/2 A).
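The ‘how many electrons choose this path’ idea can be sketched as follows. The slide's figure is not available, so the edge currents below are made up so that the arithmetic comes out as 4/5 * 1/2 A; the function name `delivered_current`, the data layout, and the exact rule (first-edge current times the fraction of outgoing current that stays on the path at each intermediate node) are assumptions for illustration.

```python
def delivered_current(path, current):
    """Current that leaves the first node and follows `path` to its end.

    current: dict (u, v) -> amps flowing from u to v (downhill edges only).
    At every intermediate node, only the fraction of outgoing current that
    continues along the path is kept.
    """
    out_total = {}
    for (u, _), amps in current.items():
        out_total[u] = out_total.get(u, 0.0) + amps
    delivered = current[(path[0], path[1])]
    for u, w in zip(path[1:], path[2:]):
        delivered *= current[(u, w)] / out_total[u]
    return delivered


# Hypothetical currents: 1/2 A enters u from s; 4/5 of what leaves u
# goes on to t, and the rest leaks to the universal sink z.
current = {
    ("s", "u"): 0.5,
    ("u", "t"): 0.4,
    ("u", "z"): 0.1,
}
amps = delivered_current(["s", "u", "t"], current)
print(amps)
```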

Slide 21. Part 2: DisplayGen: Find the subgraph that maximizes delivered current. Incrementally add nodes with the maximum marginal delivered current.
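The incremental selection can be sketched as a greedy loop. The talk names dynamic programming for this step; the sketch below swaps in brute-force path enumeration, which is only workable on tiny graphs, purely to keep the example short. The names `display_gen` and `all_paths`, the choice to count s and t against the budget, and the example currents are all assumptions.

```python
def delivered_current(path, current, out_total):
    """First-edge current times the on-path fraction at each later node."""
    amps = current[(path[0], path[1])]
    for u, w in zip(path[1:], path[2:]):
        amps *= current[(u, w)] / out_total[u]
    return amps


def all_paths(current, source, target):
    """Enumerate source->target paths along current-carrying edges."""
    succ = {}
    for (u, v) in current:
        succ.setdefault(u, []).append(v)
    stack = [[source]]
    while stack:
        path = stack.pop()
        if path[-1] == target:
            yield path
            continue
        for nxt in succ.get(path[-1], []):
            if nxt not in path:  # guard against cycles in bad input
                stack.append(path + [nxt])


def display_gen(current, source, target, budget):
    """Greedily add the path with the most delivered current per new node."""
    out_total = {}
    for (u, _), amps in current.items():
        out_total[u] = out_total.get(u, 0.0) + amps
    scored = [(p, delivered_current(p, current, out_total))
              for p in all_paths(current, source, target)]
    chosen = {source, target}
    while True:
        best, best_gain = None, 0.0
        for path, amps in scored:
            new_nodes = set(path) - chosen
            if not new_nodes or len(chosen) + len(new_nodes) > budget:
                continue
            gain = amps / len(new_nodes)  # marginal current per added node
            if gain > best_gain:
                best, best_gain = path, gain
        if best is None:
            return chosen
        chosen |= set(best)


# Hypothetical currents; z plays the role of the universal sink.
current = {
    ("s", "a"): 0.5, ("a", "t"): 0.4, ("a", "z"): 0.1,
    ("s", "b"): 0.2, ("b", "c"): 0.2, ("c", "t"): 0.1, ("c", "z"): 0.1,
}
small = display_gen(current, "s", "t", budget=3)
large = display_gen(current, "s", "t", budget=5)
print(sorted(small), sorted(large))
```

With a budget of 3 only the strong route through a fits; raising the budget to 5 admits the weaker two-hop route as well.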

Slide 22. Part 3: Scalability (‘CandidateGen’): Starting from the large graph, eliminate nodes that are too far away to matter. How?

Slide 23. Part 3: Scalability: By successive, careful expansions. [Figure: nodes s (source) and t (sink)]

Slides 24-26. Part 3: Scalability. [Figures: nodes s and t]

Slide 27. Pseudo-code:
    Until (stoppingCriterion)
        use pickHeuristic() to pick a node n
        expand node n

Slide 28. Pseudo-code: pickHeuristic() favors nearby nodes with strong connections to source or sink and with small degree.
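The expansion loop and the three preferences of pickHeuristic() can be sketched with a priority queue. The slides do not give the actual scoring formula or stopping criterion, so the cost used here (accumulated degree divided by edge weight, lowest first) and the cap on the number of expansions are hypothetical stand-ins, as is the function name `candidate_gen`.

```python
import heapq


def candidate_gen(graph, source, sink, max_expansions):
    """Grow a candidate set outward from source and sink.

    graph: dict node -> dict neighbor -> edge weight (symmetric).
    A node's cost adds degree / weight along the way in, so nearby,
    strongly connected, low-degree nodes are expanded first.
    """
    cost = {source: 0.0, sink: 0.0}
    heap = [(0.0, source), (0.0, sink)]
    expanded = set()
    while heap and len(expanded) < max_expansions:  # stoppingCriterion
        c, n = heapq.heappop(heap)                  # pickHeuristic()
        if n in expanded:
            continue
        expanded.add(n)                             # expand node n
        for nbr, weight in graph[n].items():
            new_cost = c + len(graph[nbr]) / weight
            if new_cost < cost.get(nbr, float("inf")):
                cost[nbr] = new_cost
                heapq.heappush(heap, (new_cost, nbr))
    return expanded


# s and t share a strongly connected, low-degree neighbor `a`
# and a weakly connected, high-degree neighbor `hub`.
graph = {
    "s": {"a": 5.0, "hub": 1.0},
    "t": {"a": 5.0, "hub": 1.0},
    "a": {"s": 5.0, "t": 5.0},
    "hub": {"s": 1.0, "t": 1.0, "x1": 1.0, "x2": 1.0, "x3": 1.0},
    "x1": {"hub": 1.0},
    "x2": {"hub": 1.0},
    "x3": {"hub": 1.0},
}
candidates = candidate_gen(graph, "s", "t", max_expansions=3)
print(sorted(candidates))
```

With room for three expansions, the loop takes s, t and then `a`, leaving the high-degree hub and its leaves out of the candidate graph.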

Slide 29. Outline: Introduction / Motivation; Survey; Proposed Method; Algorithms; Experiments; Conclusions

Slide 30. Experiments on a large real graph: ~15M nodes, ~100M edges, weighted; ‘who co-appears with whom’ (from 500M web pages). Q1: Quality of the ‘voltage’ approach? Q2: Speed/accuracy trade-off?

Slide 31. Q1: Quality: Actors (A); Computer Scientists (CS). Test pairs: Kidman-Diaz (A-A); Negroponte-Palmisano (CS-CS); Turing-Stone (CS-A).

Slide 32. (A-A) Kidman-Diaz: Strong, direct link. What are the best paths between ‘Kidman’ and ‘Diaz’? [Figure: graph with nodes labeled Kidman and Diaz]

Slide 33. CS-CS: Negroponte - Palmisano: Mainly CEOs of major computer companies (Dell, Gates, Fiorina, ++). [Figure: graph with end nodes labeled NN and SP]

Slide 34. CS-CS: Negroponte - Palmisano. [Figure: graph with nodes labeled NN, Esther Dyson, Louis Gerstner, SP]

Slide 35. CS-A: Turing - Stone. [Figure: graph with nodes labeled Turing, Anderson, Stone]

Slide 36. Outline: Introduction / Motivation; ...; Experiments (Q1: quality; Q2: speed/accuracy trade-off); Conclusions

Slide 37. Speed/Accuracy Trade-off. [Plot: delivered current vs. number of nodes kept (‘b’); curves for Kleinberg-Newell, Rivest-Hoffman, Turing-Stone, Kidman-Diaz]

Slide 38. Speed/accuracy trade-off: 80/20-like rule: the first few nodes/paths contribute the vast majority of ‘delivered current’. Thus, CandidateGen makes sense.

Slide 39. Conclusions: Defined the problem. Part 1: Electricity-based method to measure quality. Part 2: Dynamic programming to spot the best paths (‘DisplayGen’). Part 3: Scalability with good accuracy (‘CandidateGen’). Operational system.