Finding Top-k Shortest Path Distance Changes in an Evolutionary Network. SSTD 2011, 24th August 2011. Manish Gupta (UIUC), Charu Aggarwal (IBM), Jiawei Han (UIUC).

Presentation transcript:

Networks as evolutionary graphs
- Social networks: new users join and new friendships are created.
- Bibliographic networks: new authors appear, more papers are published and more collaborations take place.
- Transportation/road networks: new roads are constructed.
- Ad hoc networks: army vehicles change positions very frequently and new messages are transmitted.

Analysis of evolutionary networks
- Community formation, using clustering techniques
- Metrics to study evolution: merge/split
- Information diffusion across evolutionary networks
- Link prediction tasks
- Queries over evolving networks

Queries over evolving networks
- Updating the shortest path distance between two nodes as the edge weights change. E.g., in computer networks, routers need to update their shortest path trees when a link goes down.
- Given a time-dependent network (edge weights are functions of time), how to compute SPD(u, v, t).
- Queries incorporating max-flow constraints.

Transportation planning problem
- Given the current set of roads, we want to overlay a network of new roads.
- Civil engineers propose two plans, A and B, with different sets of new roads. Which plan is better?
- Plan A brings cities X and Y very close. X produces a lot of product P, while Y has a rich demand for product P.
- Plan A actually brings lots of “economically important pairs” of cities close to each other.
- Select plan A over B.

Our problem
- Given an evolutionary network with two snapshots, G1 and G2.
- Compute the top few node pairs with the maximum shortest path distance change across the two snapshots.
- For example: between 2005 and 2011, the distance between which pair of cities in Illinois decreased the most, thanks to the new roads built in this time period?

Naïve approach
1. Compute the shortest path distance between every pair of nodes for snapshot G1.
2. Compute the shortest path distance between every pair of nodes for snapshot G2.
3. Compute the distance change for every pair of nodes.
4. Sort the distance change vector.
5. Return the node pairs corresponding to the top few distance change values.
Highly inefficient solution!
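
The five steps above can be sketched in Python as follows. This is only a minimal illustration, not the authors' implementation: the adjacency-dictionary graph format, the undirected-graph assumption, the helper names `undirected`, `dijkstra` and `naive_top_k`, and the toy snapshots are all assumptions introduced here.

```python
import heapq
from itertools import combinations

INF = float("inf")

def undirected(edges):
    """Build {node: {neighbor: weight}} from (u, v, weight) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph

def dijkstra(graph, source):
    """Shortest path distances from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def naive_top_k(g1, g2, k):
    """All-pairs distances on both snapshots, then sort by absolute change."""
    nodes = sorted(set(g1) & set(g2))
    d1 = {u: dijkstra(g1, u) for u in nodes}   # step 1
    d2 = {u: dijkstra(g2, u) for u in nodes}   # step 2
    changes = []
    for u, v in combinations(nodes, 2):        # step 3
        if v in d1[u] and v in d2[u]:          # skip disconnected pairs
            changes.append((abs(d1[u][v] - d2[u][v]), u, v))
    changes.sort(key=lambda c: (-c[0], c[1], c[2]))  # step 4
    return changes[:k]                         # step 5

# Toy snapshots: a path a-b-c-d, then a new edge a-d appears.
G1 = undirected([("a", "b", 1), ("b", "c", 1), ("c", "d", 1)])
G2 = undirected([("a", "b", 1), ("b", "c", 1), ("c", "d", 1), ("a", "d", 1)])
print(naive_top_k(G1, G2, 1))  # [(2.0, 'a', 'd')]
```

The cost is one single-source run per node per snapshot, which is what makes the approach impractical on large graphs.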

Solution
We experiment on three datasets: the DBLP co-authorship graph, the IMDB co-starring graph and the Ontario province road network.
- Throw in more CPUs! Shortest path algorithms are easily parallelizable, so run single-source shortest path computations across thousands of machines. On the Ontario road network dataset, this took around 400 CPU days!
- OR use our algorithm. Our methods are ~50-100X faster than the baseline.

Outline
- Smartly choose a seed set of a few source nodes to run the single-source shortest path algorithm from: the Incidence Algorithm.
- Improve the accuracy of the Incidence Algorithm by intelligently expanding the seed set using the edge importance estimation algorithm.
- Generalize the problem to a node ranking problem. Suggest node ranking strategies.
- Experimental results and analysis.

Incidence Algorithm
- Idea: the maximum distance change will happen for node pairs containing nodes on which new edges, or edges with changed weights, are incident.
- Let V’ be the set of nodes with new edges.
- Algorithm: run a single-source SPD algorithm from each node in V’ on both snapshots, compute the difference (change), sort, and return the top k.
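
A rough Python sketch of the Incidence Algorithm is given below. It is an illustration under assumptions rather than the authors' code: graphs are undirected adjacency dictionaries, and the names `changed_endpoints` and `incidence_top_k` are hypothetical. Deleted edges are ignored, since the slide only mentions new edges and changed weights.

```python
import heapq

INF = float("inf")

def undirected(edges):
    """Build {node: {neighbor: weight}} from (u, v, weight) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph

def dijkstra(graph, source):
    """Shortest path distances from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def changed_endpoints(g1, g2):
    """V': nodes of G2 touching an edge that is new or has a new weight."""
    return {u for u, nbrs in g2.items()
            for v, w in nbrs.items() if g1.get(u, {}).get(v) != w}

def incidence_top_k(g1, g2, k):
    """Run Dijkstra only from V' on both snapshots and rank the changes."""
    best = {}
    for s in changed_endpoints(g1, g2):
        if s not in g1:
            continue  # brand-new node: no distance in the first snapshot
        d1, d2 = dijkstra(g1, s), dijkstra(g2, s)
        for t in d1.keys() & d2.keys():
            if t != s:
                best[tuple(sorted((s, t)))] = abs(d1[t] - d2[t])
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]

# Toy snapshots: a path a-b-c-d-e, then a new edge a-d appears.
BASE = [("a", "b", 1), ("b", "c", 1), ("c", "d", 1), ("d", "e", 1)]
G1 = undirected(BASE)
G2 = undirected(BASE + [("a", "d", 1)])
print(incidence_top_k(G1, G2, 2))
# [(('a', 'd'), 2.0), (('a', 'e'), 2.0)]
```

Only pairs with at least one endpoint in V’ are ever examined, so a pair whose distance changes but which has no endpoint in V’ would be missed by this sketch.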

Is Incidence Algorithm accurate?

How to expand the seed set (V’)?
- Consider the neighbors of all the nodes currently in V’ as potential candidates.
- Expand to a promising neighbor. In particular, expand to a neighbor node a if the edge that connects a to the current set V’ has high importance relative to the other edges incident on node a.
- Terminate when the top k node pairs don’t change.
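
One way to read the expansion rule is sketched below in Python. Several details are assumptions made purely for illustration and are not taken from the slides: edge importance is supplied as a dictionary keyed by `frozenset((u, v))`, "relatively high" is interpreted as a share of at least `ratio` of the total importance of the edges incident on the candidate, and the top-k computation is passed in as a callable `top_k_fn`.

```python
def promising_neighbors(graph, seeds, importance, ratio=0.5):
    """Neighbors of the seed set whose connecting edge carries at least
    `ratio` of the total importance of all edges incident on them."""
    picked = set()
    for s in seeds:
        for a in graph.get(s, {}):
            if a in seeds:
                continue
            total = sum(importance.get(frozenset((a, x)), 0.0)
                        for x in graph[a])
            link = importance.get(frozenset((a, s)), 0.0)
            if total > 0 and link / total >= ratio:
                picked.add(a)
    return picked

def expand_seed_set(graph, seeds, importance, top_k_fn, ratio=0.5):
    """Grow the seed set until the top-k answer stops changing."""
    seeds = set(seeds)
    best = top_k_fn(seeds)
    while True:
        new = promising_neighbors(graph, seeds, importance, ratio)
        if not new:
            return seeds, best
        seeds |= new
        current = top_k_fn(seeds)
        if current == best:  # termination: top k node pairs unchanged
            return seeds, best
        best = current

# Toy graph: seed a has neighbors b and c; b also touches x, c touches y.
graph = {
    "a": {"b": 1, "c": 1},
    "b": {"a": 1, "x": 1},
    "c": {"a": 1, "y": 1},
    "x": {"b": 1},
    "y": {"c": 1},
}
importance = {
    frozenset(("a", "b")): 0.9, frozenset(("b", "x")): 0.1,
    frozenset(("a", "c")): 0.1, frozenset(("c", "y")): 0.9,
}
print(promising_neighbors(graph, {"a"}, importance))  # {'b'}
```

In this toy input, b is added because the edge a-b dominates the importance around b, while c is skipped because most of its importance lies on the edge c-y.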

Edge importance number
- The importance number of an edge is the probability that the edge will lie on a randomly chosen shortest path tree in the graph.
- How to compute the edge importance number for an edge e? First find all shortest path trees, and then count how many of those trees contain edge e.
- Too expensive! As inefficient as the naïve solution itself!
- Hence we estimate the edge importance number using a randomized algorithm.

Edge Importance Estimation Algorithm
1. Randomly sample a few nodes from the graph.
2. Using each of these nodes S as the source, obtain a shortest path tree T using an SPD algorithm (e.g., Dijkstra).
3. For each tree T, perform distance labeling.
4. For each edge in T, obtain multiple trees T’ by replacing a tight edge with an alternative tight edge. An alternative tight edge is an edge that could replace an existing edge of T to give T’.
5. The importance of an edge with respect to T is proportional to its number of descendants.
6. Aggregate I(edge) across all the different SPTs.
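
A simplified Python sketch of the sampling idea follows. It is deliberately incomplete and should not be read as the authors' algorithm: it builds a single shortest path tree per sampled source and scores each tree edge by the number of nodes below it, but it omits the distance labeling and the alternative-tight-edge replacement steps. The function names, the graph format and the averaging over samples are assumptions introduced here.

```python
import heapq
import random

INF = float("inf")

def undirected(edges):
    """Build {node: {neighbor: weight}} from (u, v, weight) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph

def shortest_path_tree(graph, source):
    """Dijkstra returning parent pointers and the order nodes were settled."""
    dist, parent = {source: 0.0}, {source: None}
    heap, done, order = [(0.0, source)], set(), []
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        order.append(u)
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return parent, order

def estimate_edge_importance(graph, num_samples, seed=0):
    """Average, over sampled sources, of the subtree size below each tree edge."""
    rng = random.Random(seed)
    sources = rng.sample(sorted(graph), num_samples)
    totals = {}
    for s in sources:
        parent, order = shortest_path_tree(graph, s)
        size = {u: 1 for u in order}
        # Children are settled after their parents, so a reverse sweep
        # sees every subtree complete before adding it to the parent.
        for u in reversed(order):
            p = parent[u]
            if p is not None:
                size[p] += size[u]
                edge = frozenset((p, u))
                totals[edge] = totals.get(edge, 0.0) + size[u]
    return {edge: total / num_samples for edge, total in totals.items()}

# Toy graph: a path a-b-c-d; sampling all four nodes makes the run deterministic.
G = undirected([("a", "b", 1), ("b", "c", 1), ("c", "d", 1)])
imp = estimate_edge_importance(G, num_samples=4)
print(imp[frozenset(("b", "c"))])  # 2.0
```

On the path graph the middle edge b-c scores highest (2.0 versus 1.5 for the two end edges), which matches the intuition that it carries the most shortest paths.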

Generalizing the problem
- Naïve solution: use all nodes in both snapshots.
- Incidence Algorithm: use only the nodes in V’.
- Generalized solution? A node ranking problem: rank the nodes such that running Dijkstra’s algorithm from just the top few nodes provides high accuracy for the “top-k node pairs with maximum distance change” problem.

How to rank nodes?
- Random: randomly select nodes from the graph.
- RandomNWNE: randomly select nodes from the seed set V’ (nodes with new edges).
- Edge Weight Based Ranking (EWBR)
- Edge Weight Change Based Ranking (EWCBR)
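
The slides name these strategies without defining their scoring functions. As a hedged illustration of what a ranking based on edge weight change could look like, the Python sketch below scores each node by the total absolute weight change of its incident edges, treating a missing edge as weight 0. Both the scoring rule and the name `rank_by_weight_change` are assumptions, not the paper's definition of EWCBR.

```python
def undirected(edges):
    """Build {node: {neighbor: weight}} from (u, v, weight) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph

def rank_by_weight_change(g1, g2):
    """Order nodes by the total absolute weight change on incident edges."""
    scores = {}
    for u in set(g1) | set(g2):
        old, new = g1.get(u, {}), g2.get(u, {})
        # Assumption: an edge absent from a snapshot counts as weight 0.
        scores[u] = sum(abs(new.get(v, 0.0) - old.get(v, 0.0))
                        for v in set(old) | set(new))
    # Highest score first; ties broken by node name for determinism.
    return sorted(scores, key=lambda u: (-scores[u], u))

# Toy snapshots: edge c-d gets heavier (1 -> 4) and a new edge a-d appears.
G1 = undirected([("a", "b", 1), ("b", "c", 1), ("c", "d", 1)])
G2 = undirected([("a", "b", 1), ("b", "c", 1), ("c", "d", 4), ("a", "d", 2)])
print(rank_by_weight_change(G1, G2))  # ['d', 'c', 'a', 'b']
```

Single-source shortest path runs would then start from the front of this list, for as many nodes as the computation budget allows.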

How to rank nodes?
- Importance Number Based Ranking (INBR)
- Importance Number Change Based Ranking (INCBR)
- Ranking Using Edge Weight and Importance Numbers (RUEWIN)

How to rank nodes?
- Clustering Based Ranking (CBR)
- Clustering Based Ranking with Partitions (CBRP)
- Inter-cluster edges are more important than intra-cluster edges.

Clustering Based Ranking

Experiments

Related work
- Shortest path algorithms: Dijkstra [11], Shimbel [20], Johnson [15], Floyd, Warshall [14,21]
- Router networks [8,22]
- Outlier detection [5,13,18]
- Time-dependent shortest paths [25,26]
- Dynamic shortest path computation [3,4,6,19]
- Betweenness measures [23,24]

References

Thanks!