Measuring and Extracting Proximity in Complex Networks Emden Gansner, Yehuda Koren, Stephen North, Chris Volinsky AT&T Labs Research.

Large social networks

data source    |V|     |E|
co-authors     438K    31M
actor-actor    896K    1.1M
phone calls    300M    1000M
IM             200M    800M

Connecting co-authors in DBLP. Adam? Glenn? Emden? [Figure: 18-node subgraph. Proximity: 1.35e+01. Captured: 1.31e+01 (97%).]

95% of communication between… [Figure: 5-node subgraph. Proximity: 7.10e+00. Captured: 6.74e+00 (95%).]

Our goals Measure proximity between nodes. Explain proximity by extracting connection subgraphs that are readily visualized.

What is proximity? proximity [prox·im·i·ty || prɑk'sɪmətɪ / prɒ-] n. adjacency, nearness, closeness, vicinity. Network proximity is an elusive notion! Let’s work by refining a series of definitions.

Measuring proximity. Simplest approach: length of the shortest path. Easily visualized.

Measuring proximity. Simplest approach: length of the shortest path. Easily illustrated. Disregards alternative paths. [Figure labels: Captures 56%; Captures 98%.]

Measuring proximity. Simplest approach: length of the shortest path. Easily visualized. Disregards alternative paths. A naïve calculation will be fooled by high degrees. Example from a telephone call graph…

Which pair is closer? [Figure labels: Lefty, Stephen, Suresh, Shankar.] Both paths are 2 hops, about the same length. But when considering node degrees… Meaningful connection. Random connection?

Measuring proximity, 2nd try. Net network flow between the nodes. Accounts for multiple paths. Indifferent to distance: might favor long paths. High degrees are still an issue.

Measuring proximity, 3rd try. Delivered electric current (effective conductance). Resistor network model: edge weights are conductances (inverse resistances). [Figure labels: 1V, 0V.] Accounts for multiple paths. Penalizes long paths. High degrees?? Getting us closer… “intuitive”. Physical analogy is not perfect!

When is the electric current analogy misleading? [Figure labels: Significant connection; Noise?] What does current flow mean?

When is the electric current analogy misleading? [Figure labels: Noise?; Significant connection.] Same current flow in both cases! Degree-1 nodes are neutral (attract no flow). Degree-1 nodes are very common, due to incomplete information.

Augment the network with a universal sink [Faloutsos, McCurley & Tomkins, KDD 2004]. Connect all nodes to a grounded universal sink (at 0V). Tax each node: deliver a portion of the flow to the sink. No internal nodes of degree 1 (the above problem is solved). Penalizes long paths. A new parameter to worry about: which tax system? Constant tax? Proportional tax? Tax brackets? How much? There is a worse problem…

Universal sink and (non-)monotonicity. In our previous notions of proximity, adding nodes/edges to the network couldn’t decrease proximity. Hmmm… this “blind monotonicity” was part of their shortcoming… [Figure: proximity vs. network size.]

Universal sink and (non-)monotonicity. For all previous measures, adding nodes/edges to the network couldn’t decrease proximity. With a universal sink there is no monotonicity: larger network → proximity tends to zero, as the sink attracts more flow. Even adding s-t paths can decrease proximity! [Figure: proximity vs. network size.]

Universal sink and (non-)monotonicity. Problems with non-monotonicity: –Counter-intuitive and hard to use –Size bias makes proximity comparison across different pairs completely unreliable –Impossible to explain (size-dependent) proximity using a connection subgraph. [Figure: proximity vs. network size.]

A random-walk perspective. The current-flow model has a direct random-walk (r.w.) interpretation. Reminder: we defined proximity by “delivered current” or “effective conductance”. The escape probability, Pesc(s→t), is the probability that a r.w. originating at s will reach t before visiting s again. Let Deg(s) be the number of r.w.’s originating at s. The effective conductance between s and t is Pesc(s→t)*Deg(s).
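The relation between escape probability and effective conductance can be checked numerically on a toy graph. The sketch below is an illustration only, not the authors' code: it assumes a connected undirected graph stored as a dict of dicts (a hypothetical representation), takes Deg(s) to be the sum of edge weights at s, and solves for the hitting probabilities with a plain Gauss-Seidel iteration. The function names are made up for this example.

```python
def escape_probability(graph, s, t, sweeps=10_000, tol=1e-12):
    """Estimate Pesc(s -> t) on a connected, undirected weighted graph.

    graph: dict mapping node -> {neighbour: edge weight}, stored symmetrically
    (an assumed representation, not one taken from the slides).
    """
    # v[u] = probability that a walk started at u reaches t before s.
    v = {u: 0.0 for u in graph}
    v[t] = 1.0
    deg = {u: sum(nbrs.values()) for u, nbrs in graph.items()}
    for _ in range(sweeps):
        change = 0.0
        for u in graph:
            if u in (s, t):
                continue  # boundary values stay fixed at 0 and 1
            new = sum(w * v[x] for x, w in graph[u].items()) / deg[u]
            change = max(change, abs(new - v[u]))
            v[u] = new
        if change < tol:
            break
    # Take one step out of s, then reach t before returning to s.
    return sum(w * v[x] for x, w in graph[s].items()) / deg[s]


def effective_conductance(graph, s, t):
    """Effective conductance as Pesc(s -> t) * Deg(s)."""
    return sum(graph[s].values()) * escape_probability(graph, s, t)


# Two unit resistors in series: s - a - t.
series = {"s": {"a": 1.0}, "a": {"s": 1.0, "t": 1.0}, "t": {"a": 1.0}}
# The same chain with a dead-end node d hanging off a.
dead_end = {"s": {"a": 1.0}, "a": {"s": 1.0, "t": 1.0, "d": 1.0},
            "d": {"a": 1.0}, "t": {"a": 1.0}}
```

On these two graphs the escape probability comes out the same, which is the sense in which a degree-1 node attracts no flow in the electrical model.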

“Dead-end” paths have no influence on escape probability. Both graphs have the same escape probability from red to green. [Figure labels: Lower red→green escape probability; Higher red→green escape probability.] In both cases, higher effective conductance by Rayleigh’s Monotonicity Law.

Extending escape probability. The escape probability, Pesc(s→t), is the probability that a r.w. originating at s will reach t before visiting s again. The cycle-free escape probability, Pc.f.esc(s→t), is the probability that a r.w. originating at s will reach t without visiting any node more than once. Multiply by degree to get an absolute quantity (accounting for the number of "actually initiated" r.w.'s): the c.f. effective conductance between s and t is Pc.f.esc(s→t)*Deg(s).
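On a very small graph the cycle-free escape probability can be computed exactly by brute force: each simple s-t path is a distinct way for the walk to reach t without revisiting a node, so the probabilities of these paths add up. The following is a sketch under assumptions: the dict-of-dicts graph format and all function names are hypothetical, each step from u to x is taken with probability w(u,x)/Deg(u), and the enumeration is exponential, so this is only meant for toy inputs.

```python
def simple_paths(graph, s, t, path=None):
    """Yield every simple (cycle-free) path from s to t as a list of nodes."""
    path = path or [s]
    u = path[-1]
    if u == t:
        yield list(path)
        return
    for x in graph[u]:
        if x not in path:
            path.append(x)
            yield from simple_paths(graph, s, t, path)
            path.pop()


def path_probability(graph, path):
    """Probability that a random walk follows exactly this path."""
    p = 1.0
    for u, x in zip(path, path[1:]):
        p *= graph[u][x] / sum(graph[u].values())
    return p


def cycle_free_escape_probability(graph, s, t):
    """Sum of the probabilities of all simple s-t paths."""
    return sum(path_probability(graph, p) for p in simple_paths(graph, s, t))


def cfec(graph, s, t):
    """Cycle-free effective conductance: Pc.f.esc(s -> t) * Deg(s)."""
    return sum(graph[s].values()) * cycle_free_escape_probability(graph, s, t)


series = {"s": {"a": 1.0}, "a": {"s": 1.0, "t": 1.0}, "t": {"a": 1.0}}
dead_end = {"s": {"a": 1.0}, "a": {"s": 1.0, "t": 1.0, "d": 1.0},
            "d": {"a": 1.0}, "t": {"a": 1.0}}
```

Unlike the plain escape probability, the cycle-free variant drops from 1/2 to 1/3 when the dead-end node is attached, because a walk that wanders into the dead end can no longer reach t without revisiting a node.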

[Figure labels: Higher red→green c.f. escape probability; Lower red→green c.f. escape probability.] The c.f. effective conductance is a good candidate proximity measure:
–Accounts for multiple paths
–Favors short paths
–Penalizes high-degree nodes
–Penalizes dead-end paths
–Parameter free
–Has the “right” monotonicity
–Accommodates edge directions
–Has a natural extension to multiple endpoints

Computing c.f. escape probability. Unlike previous measures, exact computation is infeasible. Practically, we can estimate it extremely well. The probability of paths declines exponentially (e.g., the 100th path is 10^6 times less probable than the first one). Estimate using the k most probable paths R_1, …, R_k: Pc.f.esc(s→t) ≈ Prob(R_1) + … + Prob(R_k).

Finding the k most probable paths. For an edge u-v of weight w(u,v), define its length as length(u→v) = −log(w(u,v)/Deg(u)). Edge lengths are non-negative, since w(u,v) ≤ Deg(u). Exp(−length(path)) = Prob(path). Short path ⇔ highly probable path. Compute the k shortest simple paths in O(k|E|log|E|) time [Katoh, Ibaraki and Mine, 1982]. Stop searching when the path probability drops below 10^-6 of that of the first path.
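The length transform can be illustrated with a short sketch. Note the substitution: this is not the Katoh, Ibaraki and Mine algorithm cited on the slide but a simple best-first enumeration of partial simple paths, which returns paths in order of increasing length because edge lengths are non-negative. It can take exponential time and is meant only for small examples. The graph format, function names and default parameters are assumptions made for illustration.

```python
import heapq
import math


def edge_length(graph, u, x):
    """Length of the step u -> x: minus the log of its transition probability."""
    return -math.log(graph[u][x] / sum(graph[u].values()))


def most_probable_paths(graph, s, t, k=100, cutoff=1e-6):
    """Return up to k (probability, path) pairs, most probable first.

    Stops once paths are less probable than `cutoff` times the first path.
    """
    heap = [(0.0, [s])]   # (length so far, partial simple path)
    found = []            # completed s-t paths as (length, path)
    while heap and len(found) < k:
        length, path = heapq.heappop(heap)
        # Lengths pop in non-decreasing order, so everything left is worse.
        if found and length > found[0][0] - math.log(cutoff):
            break
        u = path[-1]
        if u == t:
            found.append((length, path))
            continue
        for x in graph[u]:
            if x not in path:  # keep the path simple
                heapq.heappush(
                    heap, (length + edge_length(graph, u, x), path + [x]))
    return [(math.exp(-length), path) for length, path in found]


graph = {
    "s": {"a": 1.0, "b": 1.0},
    "a": {"s": 1.0, "t": 1.0},
    "b": {"s": 1.0, "c": 1.0},
    "c": {"b": 1.0, "t": 1.0},
    "t": {"a": 1.0, "c": 1.0},
}
paths = most_probable_paths(graph, "s", "t")
```

Summing the returned probabilities gives the estimate of the cycle-free escape probability from the most probable paths.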

Extracting and explaining proximity

Extracting proximity Cycle free effective conductance (CFEC) depends on the full graph Find a small subgraph that captures the most proximity A tradeoff between “size” and “captured proximity”, can be expressed in alternative ways: –Extract a subgraph with at most B nodes that captures maximal CFEC Maybe with B+1 nodes we can capture much more??? –Extract a minimal-sized subgraph that captures at least P% of total CFEC Maybe we can capture (P-1)% of total CFEC with a much smaller subgraph???

Extracting proximity. Find a small subgraph that captures most of the proximity. Achieve an efficient balance between “size” and “proximity” by maximizing a ratio of captured proximity to subgraph size, controlled by a parameter α. Larger α → emphasizes proximity → larger subgraph. –α=0 → returns only the shortest path –α=∞ → returns all paths. Optionally, explicitly fix lower and upper bounds on subgraph size.

What solutions do we seek? Overlapping paths delivering the most flow

The path merger algorithm. We already have a collection of paths. Find the subset of the paths that maximizes the proximity-to-size ratio. Combine the selected paths into a “proximity subgraph”. Overlapping paths are cheaper to add. An NP-hard problem…
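A brute-force version of the subset selection makes the trade-off concrete. The objective used here is an assumption: the transcript does not preserve the ratio, so the sketch scores a subset as (captured probability)^α divided by the number of distinct nodes, a form chosen because it reproduces the limiting behaviour described for α=0 and large α. Captured proximity is approximated by the sum of the selected paths' probabilities. All names and the example paths are hypothetical.

```python
from itertools import combinations


def score(subset, alpha):
    """Assumed objective: (sum of path probabilities) ** alpha / node count."""
    captured = sum(prob for prob, _ in subset)
    nodes = set().union(*(path for _, path in subset))
    return captured ** alpha / len(nodes)


def best_subset(paths, alpha):
    """Scan every non-empty subset of paths: O(2^k), small k only."""
    best = None
    for r in range(1, len(paths) + 1):
        for subset in combinations(paths, r):
            q = score(subset, alpha)
            if best is None or q > best[0]:
                best = (q, subset)
    return best


# (probability, path) pairs; the second and third overlap the first on s, t
# and, for the third, also on a.
paths = [
    (0.25, ["s", "a", "t"]),
    (0.125, ["s", "b", "c", "t"]),
    (0.2, ["s", "a", "d", "t"]),
]
_, chosen = best_subset(paths, alpha=1)
```

With α=1 the search picks the first and third paths: the third adds only one new node because it overlaps the first, which is the sense in which overlapping paths are cheaper to add.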

Optimal algorithm. Scanning all subsets takes O(2^k) time (can we do better?). Branch-and-bound pruning significantly reduces running time. Huge deviations in path quality make this approach effective, e.g., it is often clear that the best subset must contain the first path(s). Prematurely terminate the exponential algorithm after scanning “too many” subsets.

Agglomerative algorithm. If the optimal algorithm couldn’t finish, improve the current result with an agglomerative algorithm. Iteratively merge the two subsets that maximize the ratio. Record the best subset discovered.
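One possible reading of the agglomerative step is sketched below. The slide gives few details, so the starting point (every path as its own subset), the merge rule (merge the pair whose union scores highest) and the objective (captured probability raised to α, divided by node count) are all assumptions made for illustration, as are the function names.

```python
from itertools import combinations


def score(subset, alpha):
    """Assumed objective: (sum of path probabilities) ** alpha / node count."""
    captured = sum(prob for prob, _ in subset)
    nodes = set().union(*(path for _, path in subset))
    return captured ** alpha / len(nodes)


def agglomerate(paths, alpha):
    """Greedily merge path subsets; return indices of the best subset seen."""
    def q(cluster):
        return score([paths[i] for i in cluster], alpha)

    # Start with every path in its own cluster.
    clusters = [frozenset([i]) for i in range(len(paths))]
    best = max(clusters, key=q)
    while len(clusters) > 1:
        # Pick the pair of clusters whose union has the highest score.
        a, b = max(combinations(clusters, 2), key=lambda ab: q(ab[0] | ab[1]))
        merged = a | b
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        if q(merged) > q(best):
            best = merged  # record the best subset discovered so far
    return sorted(best)


paths = [
    (0.25, ["s", "a", "t"]),
    (0.125, ["s", "b", "c", "t"]),
    (0.2, ["s", "a", "d", "t"]),
]
```

Each round costs a quadratic number of score evaluations, so the whole procedure is polynomial in the number of paths, in contrast to the exhaustive scan.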

Working with large graphs in external storage Dealing with full graph is sometimes infeasible and usually unnecessary Prior to running the algorithm, we construct a candidate graph in main memory We begin by growing increasing neighborhoods around the endpoints

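[Figure: the endpoints S and T.]

[Figure: neighborhoods grown around S and T. Dist(S,i)=2, Dist(T,i)=2.]

[Figure: neighborhoods grown around S and T. Dist(S,i)=3, Dist(T,i)=3.]

[Figure: neighborhoods grown around S and T. Dist(S,i)=4, Dist(T,i)=4.]

[Figure: neighborhoods grown around S and T. Dist(S,i)=5, Dist(T,i)=5. Shortest path of length 10.]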

No use for low-probability paths... Paths longer than “24” are unneeded! The most probable path, of length 10, was found.
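A plausible reading of where the “24” comes from, assuming path length is the negative natural logarithm of path probability and the cutoff is the 10^-6 factor relative to the most probable path: a path is discarded once its length exceeds that of the first path by ln(10^6).

```latex
\[
  \ell_{\max} \;=\; \ell_{1} + \ln\!\left(10^{6}\right)
  \;\approx\; 10 + 13.8 \;\approx\; 24
\]
```

Here \ell_1 = 10 is the length of the most probable path in the example; the symbol names are introduced only for this illustration.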

[Figure: neighborhoods grown around S and T. Dist(S,i)=12, Dist(T,i)=12.]

[Figure: Dist(S,i)=12, Dist(T,i)=12, with an unscanned node i.] Stop adding nodes. Any s-t path through an unscanned node must be longer than “24”, thus useless. Can we prune the resulting graph? Yes! From two circles into an ellipse…

Pruning the candidate graph. We can safely prune a significant portion of the candidate graph. Use the fact: dist(i,s)+dist(i,t) > L → all s-t paths going via i are longer than L. We ignore much less probable paths → paths longer than “24” are not interesting. Take only nodes within the ellipse defined by: dist(i,s)+dist(i,t) ≤ 24.
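The pruning rule can be sketched with two shortest-path computations, one from each endpoint. This is an illustration under assumptions: edge lengths are taken to be symmetric and stored in a dict of dicts (with the direction-dependent lengths defined earlier, the distances to t would need the reversed graph), and the function names and the toy graph are hypothetical.

```python
import heapq


def dijkstra(lengths, source):
    """Shortest distances from source; lengths[u][x] is the length of u-x."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for x, l in lengths[u].items():
            nd = d + l
            if nd < dist.get(x, float("inf")):
                dist[x] = nd
                heapq.heappush(heap, (nd, x))
    return dist


def prune_to_ellipse(lengths, s, t, limit):
    """Keep node i only if dist(i,s) + dist(i,t) does not exceed limit."""
    ds = dijkstra(lengths, s)
    dt = dijkstra(lengths, t)
    inf = float("inf")
    return {i for i in lengths if ds.get(i, inf) + dt.get(i, inf) <= limit}


lengths = {
    "s": {"a": 1.0, "y": 2.0},
    "a": {"s": 1.0, "t": 1.0, "x": 5.0},
    "x": {"a": 5.0},
    "y": {"s": 2.0, "t": 2.0},
    "t": {"a": 1.0, "y": 2.0},
}
kept = prune_to_ellipse(lengths, "s", "t", limit=4.0)
```

Node x is dropped because any s-t path through it has length at least 12, while y survives because its detour has length exactly 4.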

[Figure: from the 2 centers of circles (Dist(S,i)=12, Dist(T,i)=12) to the 2 foci of an ellipse (Dist(S,i)+Dist(T,i)=24).]

Some statistics…

Distribution of proximities in phone-call network

Distribution of #hops in phone-call network

Summary Proposed cycle free effective conductance (CFEC) with a random walk interpretation to measure “proximity” in social networks and other ad-hoc networks Described a way of approximating CFEC Described a way of visualizing CFEC as a subgraph Extended the method to external datasets Showed empirical evidence for its utility

Extensions. Study proximity in other kinds of networks. Extend c.f. effective conductance to: –Multiple endpoints (already demonstrated) –Directed edges (future work: use k-shortest paths in a directed graph, algorithm due to Hershberger et al.).