Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.

Presentation transcript:

Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst

Problem
Given a large, possibly dynamic, network, how does one efficiently sample/crawl it to accurately characterize it?
- degree distribution
- centrality
- clustering
- …

Motivation
- understanding technological networks, social networks
  - Internet, wireless networks
  - on-line social networks such as Facebook, MySpace, Orkut, YouTube, …
- when network dataset not available
  - size, lack of global view, dynamics

Outline
- review of sampling
- random walks (RWs)
- multiple coupled RWs
- results

Sampling methods
- random sampling
  - uniform vertex sampling
    θ_i: fraction of vertices with degree i
    degree i vertex sampled with probability θ_i
  - uniform edge sampling
    π_i: probability degree i vertex sampled
    π_i = θ_i × i / (average degree)
- crawling
  - snowball sampling: commonly used, highly biased
  - random walk
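The relation between the two sampling probabilities can be checked exactly on a small graph. The sketch below is illustrative only: the adjacency-dict representation, the function names, and the star graph are assumptions, not part of the slides. It computes θ_i from the vertex degrees and π_i from the endpoints of all edges, using exact fractions.

```python
from collections import Counter
from fractions import Fraction

def degree_fractions(adj):
    """theta[i]: fraction of vertices with degree i (uniform vertex sampling)."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {i: Fraction(c, n) for i, c in counts.items()}

def edge_sampling_probs(adj):
    """pi[i]: probability that a uniformly chosen edge endpoint has degree i."""
    total = sum(len(nbrs) for nbrs in adj.values())  # equals 2|E|
    pi = Counter()
    for nbrs in adj.values():
        # A vertex of degree d owns d of the 2|E| edge endpoints.
        pi[len(nbrs)] += Fraction(len(nbrs), total)
    return dict(pi)

# Hypothetical toy graph: a star with center 0 and leaves 1..4.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
theta = degree_fractions(star)
pi = edge_sampling_probs(star)
avg_degree = sum(i * t for i, t in theta.items())
```

On this star only one vertex in five has degree 4, yet half of all edge endpoints land on it, which is why edge sampling reaches high degree vertices so much more easily than vertex sampling.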

Random sampling: accuracy of estimates
- estimate θ_i, the fraction of vertices with degree i
- budget: B samples
- accuracy: Normalized root Mean Squared Error (NMSE)
- uniform vertex: head GOOD, tail BAD
- uniform edge: head BAD, tail GOOD
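The transcript names the accuracy metric but does not preserve its formula. A common definition, assumed here, is NMSE(i) = sqrt(E[(θ̂_i − θ_i)²]) / θ_i. The sketch below computes it from repeated estimates. It also gives the closed form for B independent uniform vertex samples, where B·θ̂_i is binomial and so NMSE(i) = sqrt((1 − θ_i) / (B·θ_i)). Function names are hypothetical.

```python
import math

def nmse(estimates, true_value):
    """Normalized root mean squared error of repeated estimates of one theta_i."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse) / true_value

def nmse_uniform_vertex(theta_i, budget):
    """Closed form for `budget` independent uniform vertex samples."""
    return math.sqrt((1.0 - theta_i) / (budget * theta_i))

# With the same budget, a rare degree (tail) has a much larger relative error
# than a common degree (head).
head_error = nmse_uniform_vertex(0.5, 1000)
tail_error = nmse_uniform_vertex(0.001, 1000)
```

The closed form grows as θ_i shrinks, which matches the slide's verdict that uniform vertex sampling is good in the head of the degree distribution and bad in the tail.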

Uniform vertex vs. edge sampling
- Flickr graph (1.7M vertices, 22M edges)
- budget: B = |V|/100
- uniform vertex: head GOOD, tail BAD
- uniform edge: head BAD, tail GOOD
[Figure: NMSE vs. in-degree for uniform edge and uniform vertex sampling]

Pros & Cons
uniform vertex
- Pros:
  - independent sampling
  - OSN needs numeric user IDs. E.g.: Livejournal, Flickr, MySpace, Facebook, ...
- Cons:
  - resource intensive (sparse user ID space)
  - difficult to sample large degree vertices
uniform edge
- Pros:
  - independent sampling
  - easy to sample high degree vertices
- Cons:
  - no public OSN interface to sample edges

Random walk (RW)
- start at node v
- randomly select a neighbor of v
- repeat till collected B samples
- sampling with replacement
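The walk described on this slide can be sketched in a few lines. This is a minimal illustration, assuming the graph is held in memory as an adjacency dict with no isolated vertices; the function name and the toy graph are hypothetical.

```python
import random

def random_walk(adj, start, budget, rng=random):
    """Collect `budget` vertex samples by repeatedly moving to a random neighbor.

    Vertices may be visited more than once (sampling with replacement).
    """
    samples = []
    v = start
    for _ in range(budget):
        v = rng.choice(adj[v])  # uniformly pick one neighbor of the current node
        samples.append(v)
    return samples

# Hypothetical toy graph: triangle 0-1-2 with a pendant vertex 3 attached to 0.
toy = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
walk = random_walk(toy, start=0, budget=20, rng=random.Random(0))
```

In a real crawl, `adj[v]` would be replaced by a query to the network for the neighbors of v, which is the only access the method needs.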

Random walk sampling
- produces biased estimate θ_i^RW of θ_i
- easily corrected:
  θ_i^RW = i × θ_i / (avg. degree)
  θ̂_i = Norm × θ̂_i^RW / i
[Figure: CCDF, RW sampling]
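The correction can be sketched as follows: count how often each degree is visited, divide each frequency by its degree, and renormalize so the estimates sum to one (the role of Norm). The function name, the toy graph, and the walk length are assumptions made for illustration.

```python
import random
from collections import Counter

def corrected_degree_distribution(sampled_degrees):
    """Turn the degrees seen by a random walk into an estimate of theta."""
    n = len(sampled_degrees)
    # Raw visit frequencies: biased toward high degrees by a factor of i.
    theta_rw = {i: c / n for i, c in Counter(sampled_degrees).items()}
    unnormalized = {i: p / i for i, p in theta_rw.items()}
    norm = 1.0 / sum(unnormalized.values())
    return {i: norm * u for i, u in unnormalized.items()}

# Hypothetical toy graph: triangle 0-1-2 with a pendant vertex 3 attached to 0.
toy = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
rng = random.Random(1)
v, degrees = 0, []
for _ in range(100_000):
    v = rng.choice(toy[v])
    degrees.append(len(toy[v]))
theta_hat = corrected_degree_distribution(degrees)
```

For this toy graph the true values are θ_1 = 0.25, θ_2 = 0.5, θ_3 = 0.25, and a long walk should land close to them even though the raw visit frequencies are roughly 1/8, 1/2 and 3/8.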

Pros & Cons
uniform vertex
- Pros:
  - independent sampling
  - OSN needs numeric user IDs. E.g.: Livejournal, Flickr, MySpace, Facebook, ...
- Cons:
  - resource intensive (sparse user ID space)
  - difficult to sample large degree vertices
random walk
- Pros:
  - asymptotically unbiased
  - easy to sample high degree vertices
  - low cost resource-wise
- Cons:
  - graph must be connected
  - large estimation errors when graph loosely connected
  - length of transient?

Hybrid sampling
- uniform vertex samples A and C subgraphs
  - but is expensive
- RW samples A or C
  - but is cheap
- combine advantages of uniform vertex & RWs?
[Figure: graph with subgraphs A and C]

Multiple random walks
- m independent uniformly placed RWs
- split budget B among them
- Pros:
  - cover all components whp as m increases
- Cons:
  - bias due to transient
  - difficult to combine estimates
- Couple the RWs?

Frontier Sampling (FS)
- m coupled walkers
- B: sampling budget
- S = {v_1, …, v_m}: initial set of m vertices; E’ = ∅
(1) select v_r ∈ S w.p. ∝ deg(v_r)
(2) walk one step from v_r
(3) add walked edge to E’ and update v_r
(4) return to (1) (until m + |E’| = B)
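Steps (1) to (4) can be sketched as below. This is a minimal illustration, not the authors' implementation. It assumes an in-memory adjacency dict with no isolated vertices, and it places the m initial walkers uniformly at random (with replacement), which the slide does not specify. All names are hypothetical.

```python
import random

def frontier_sampling(adj, m, budget, rng=random):
    """Run m coupled walkers until m + |E'| reaches the budget; return E'."""
    vertices = list(adj)
    walkers = [rng.choice(vertices) for _ in range(m)]  # S (assumed uniform start)
    edges = []                                          # E'
    while m + len(edges) < budget:
        # (1) pick a walker with probability proportional to its current degree
        weights = [len(adj[v]) for v in walkers]
        r = rng.choices(range(m), weights=weights, k=1)[0]
        # (2) walk one step from the chosen walker's position
        u = walkers[r]
        v = rng.choice(adj[u])
        # (3) record the walked edge and move the walker
        edges.append((u, v))
        walkers[r] = v
    return edges

# Hypothetical toy graph: triangle 0-1-2 with a pendant vertex 3 attached to 0.
toy = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
sampled = frontier_sampling(toy, m=2, budget=12, rng=random.Random(0))
```

The budget counts the m initial vertices plus one unit per walked edge, so this call records 10 edges.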

FS properties
- random walk on G^m
- at steady state
  - samples edges uniformly
  - as m → ∞, walkers uniformly distributed in graph
- m coupled RWs start approximately in steady state
  - short transient

Sample paths for θ_1 estimate (Flickr graph)
- plot evolution of the θ_1 estimate as a function of n, the number of steps

Sampling errors
- large connected component of Flickr graph
- accuracy metric: NMSE of CCDF
[Figure: NMSE vs. in-degree]

Sampling errors: G_AB graph
- 2 Albert-Barabasi graphs with average degrees 2, 10, connected by one edge
[Figure: NMSE vs. in-degree]

Errors: assortativity metric
- assortativity: measure of degree correlation between neighboring vertices

Distributed FS
- m independent walkers
- walker i takes its next step after an exponentially distributed time, with rate given by the current node degree (mean 1/degree)
- walkers run for time T, report to central site
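A small event-driven simulation can illustrate the distributed variant. The sketch assumes each walker waits an exponential time with rate equal to the degree of its current node, so that the next walker to move is chosen with probability proportional to degree, as in the centralized FS step. The function name, the uniform initial placement, and the toy graph are hypothetical.

```python
import heapq
import random

def distributed_fs(adj, m, horizon, rng=random):
    """Simulate m walkers with independent exponential clocks up to time `horizon`."""
    vertices = list(adj)
    events = []  # heap of (time of next step, walker id, current vertex)
    for w in range(m):
        v = rng.choice(vertices)
        # expovariate takes the rate, so the mean waiting time is 1/deg(v)
        heapq.heappush(events, (rng.expovariate(len(adj[v])), w, v))
    edges = []
    while events:
        t, w, u = heapq.heappop(events)
        if t > horizon:
            continue  # this walker's next step falls after T, so it stops
        v = rng.choice(adj[u])
        edges.append((u, v))
        heapq.heappush(events, (t + rng.expovariate(len(adj[v])), w, v))
    return edges

# Hypothetical toy graph: triangle 0-1-2 with a pendant vertex 3 attached to 0.
toy = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
reported = distributed_fs(toy, m=3, horizon=5.0, rng=random.Random(0))
```

The heap only serves the simulation. In a deployment each walker would run its own clock with no coordination and send its walked edges to the central site at time T.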

Future work
- analyzing, speeding up convergence
  - other forms of coupling
- other graph statistics
- study how graph structure affects sampling efficiency
  - power law vs exponential tail
  - spatial correlation, independence vs. SRD vs. LRD
- application to different networks
  - wireless, social, wireless/social