Network Characterization via Random Walks
B. Ribeiro, D. Towsley, UMass-Amherst

Problem
Given a large, possibly dynamic, network, how does one efficiently sample or crawl it to accurately characterize its:
- degree distribution
- centrality
- clustering
- …

Motivation
- understanding technological and social networks
  - the Internet, wireless networks
  - online social networks such as Facebook, MySpace, Orkut, YouTube, …
- needed when a network dataset is not available, because of
  - size, lack of a global view, dynamics

Outline
- review of sampling
- random walks (RWs)
- multiple coupled RWs
- results

Sampling methods
- random sampling
  - uniform vertex sampling: with $\theta_i$ the fraction of vertices with degree $i$, a degree-$i$ vertex is sampled with probability $\theta_i$
  - uniform edge sampling: a degree-$i$ vertex is sampled with probability $\pi_i = i\,\theta_i/\bar{d}$, where $\bar{d}$ is the average degree
- crawling
  - snowball sampling: commonly used, highly biased
  - random walk
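The relation between $\theta_i$ and $\pi_i$ can be checked exactly on a small graph. The sketch below assumes a simple undirected graph given as an edge list (isolated vertices are ignored) and models uniform edge sampling as "pick an edge uniformly, then one of its two endpoints uniformly"; the function name and the toy graph are hypothetical.

```python
from collections import Counter

def degree_distributions(edges):
    """Return (theta, pi, avg_degree) for a simple undirected graph (hypothetical helper).

    theta[i]: fraction of vertices with degree i (uniform vertex sampling).
    pi[i]: probability that uniform edge sampling (pick an edge uniformly,
    then one of its two endpoints uniformly) returns a degree-i vertex.
    Vertices are taken from the edge list, so isolated vertices are ignored.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    n = len(deg)
    theta = {i: c / n for i, c in Counter(deg.values()).items()}
    # Every edge offers two equally likely endpoints: 2|E| outcomes in total.
    endpoint_degrees = Counter(deg[x] for edge in edges for x in edge)
    pi = {i: c / (2 * len(edges)) for i, c in endpoint_degrees.items()}
    avg_degree = 2 * len(edges) / n
    return theta, pi, avg_degree

# Hypothetical toy graph: a star on 5 vertices plus one edge between two leaves.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
theta, pi, avg_degree = degree_distributions(edges)
for i in sorted(theta):
    # pi[i] and i * theta[i] / avg_degree agree for every degree i
    print(i, theta[i], pi[i], i * theta[i] / avg_degree)
```

Counting endpoints rather than edges is what produces the factor $i$: a degree-$i$ vertex is an endpoint of $i$ edges.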

Random sampling: accuracy of estimates
- estimate $\theta_i$, the fraction of vertices with degree $i$
- budget: B samples
- accuracy: normalized root mean squared error (NMSE)
- uniform vertex sampling: head GOOD, tail BAD
- uniform edge sampling: head BAD, tail GOOD
(head: small degrees; tail: large degrees)
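The head/tail trade-off can be seen in the closed-form NMSE of the simplest estimators. This sketch assumes NMSE is defined as $\sqrt{E[(\hat\theta_i-\theta_i)^2]}/\theta_i$, that the B samples are independent, and, for edge sampling, that the average degree $\bar{d}$ is known exactly so that $\hat\theta_i = \bar{d}\,\hat\pi_i/i$. The function names and the numbers in the usage lines are hypothetical.

```python
import math

def nmse_vertex(theta_i: float, budget: int) -> float:
    """NMSE of theta_i estimated from `budget` uniform vertex samples.

    The number of degree-i samples is Binomial(budget, theta_i), so the
    fraction of such samples is unbiased with variance theta_i*(1-theta_i)/budget.
    """
    return math.sqrt((1.0 - theta_i) / (budget * theta_i))

def nmse_edge(theta_i: float, i: int, avg_degree: float, budget: int) -> float:
    """NMSE of theta_i from `budget` uniform edge samples, avg_degree assumed known.

    A degree-i vertex is hit with probability pi_i = i*theta_i/avg_degree, and
    theta_hat = avg_degree*pi_hat/i is a fixed rescaling of pi_hat, so both
    estimates have the same normalized error.
    """
    pi_i = i * theta_i / avg_degree
    return math.sqrt((1.0 - pi_i) / (budget * pi_i))

# Hypothetical numbers: a head degree (i=1) and a tail degree (i=100).
B, d_avg = 10_000, 3.0
print("head:", nmse_vertex(0.5, B), nmse_edge(0.5, 1, d_avg, B))
print("tail:", nmse_vertex(0.001, B), nmse_edge(0.001, 100, d_avg, B))
```

With these numbers vertex sampling has the smaller error at degree 1 and edge sampling the smaller error at degree 100: edge sampling over-samples degrees above $\bar{d}$ and under-samples those below it.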

Uniform vertex vs. edge sampling
[Figure: NMSE vs. in-degree for uniform edge and uniform vertex sampling; vertex sampling: head GOOD, tail BAD; edge sampling: head BAD, tail GOOD]
- Flickr graph (1.7M vertices, 22M edges)
- budget: B = |V|/100

Pros & Cons
uniform vertex
- Pros:
  - independent sampling
  - works for OSNs with numeric user IDs, e.g., LiveJournal, Flickr, MySpace, Facebook, ...
- Cons:
  - resource intensive (sparse user ID space)
  - difficult to sample large-degree vertices
uniform edge
- Pros:
  - independent sampling
  - easy to sample high-degree vertices
- Cons:
  - no public OSN interface to sample edges
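Why uniform vertex sampling is resource intensive on a sparse ID space can be sketched as rejection sampling: draw an ID uniformly from the whole numeric range and keep it only if a user with that ID exists. The `id_exists` lookup stands in for a hypothetical OSN query, and the ID layout in the usage lines is made up.

```python
import random

def sample_uniform_vertices(id_exists, max_id, budget, rng):
    """Uniform vertex sampling by rejection over the numeric ID range [0, max_id].

    id_exists(uid) stands in for a hypothetical "does this user ID exist?" query
    against the OSN. Every draw costs one query, including rejected ones.
    Returns the accepted IDs and the total number of queries spent.
    """
    samples, queries = [], 0
    while len(samples) < budget:
        uid = rng.randint(0, max_id)  # uniform over the whole ID space
        queries += 1
        if id_exists(uid):
            samples.append(uid)
    return samples, queries

# Hypothetical sparse ID space: 10,000 users spread over 1,000,000 IDs.
users = set(range(0, 1_000_000, 100))
rng = random.Random(7)
samples, queries = sample_uniform_vertices(users.__contains__, 999_999, 200, rng)
print(len(samples), queries, queries / len(samples))
```

Each accepted sample costs on average (max_id + 1)/|V| queries, about 100 here; the accepted samples are still independent and uniform over the existing users.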

9  start at node v  randomly select a neighbor of v r repeat till collected B samples r sampling with replacement 9 Random walk (RW)

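Random walk sampling
- produces a biased estimate $\hat\theta_i^{RW}$ of $\theta_i$
- easily corrected:
  - $\theta_i^{RW} = i\,\theta_i/\bar{d}$, where $\bar{d}$ is the average degree
  - $\hat\theta_i = \mathrm{Norm}(\hat\theta_i^{RW}/i)$, where Norm rescales the values to sum to 1, i.e. $\hat\theta_i = \dfrac{\hat\theta_i^{RW}/i}{\sum_j \hat\theta_j^{RW}/j}$
[Figure: CCDF, RW sampling]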
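The correction amounts to dividing by the degree and renormalizing; the unknown $\bar{d}$ cancels in the normalization. The sketch below applies it to a hypothetical steady-state RW distribution of a toy graph (a star on 5 vertices plus one leaf-leaf edge), whose true $\theta$ is {1: 0.4, 2: 0.4, 4: 0.2}; the function name is hypothetical.

```python
def correct_rw_estimate(theta_rw):
    """Undo the degree bias of random-walk sampling (hypothetical helper).

    theta_rw[i] is the (biased) fraction of RW samples with degree i >= 1,
    whose steady-state value is i*theta_i/avg_degree. Dividing by i and
    renormalizing to sum to 1 estimates theta_i; avg_degree cancels out.
    """
    unnormalized = {i: p / i for i, p in theta_rw.items()}
    total = sum(unnormalized.values())
    return {i: w / total for i, w in unnormalized.items()}

# Steady-state RW degree distribution of the hypothetical toy graph.
theta_rw = {1: 0.2, 2: 0.4, 4: 0.4}
print(correct_rw_estimate(theta_rw))
```

In practice `theta_rw` would be the empirical degree histogram of the walk's samples, so the result is only asymptotically unbiased.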

Pros & Cons
uniform vertex
- Pros:
  - independent sampling
  - works for OSNs with numeric user IDs, e.g., LiveJournal, Flickr, MySpace, Facebook, ...
- Cons:
  - resource intensive (sparse user ID space)
  - difficult to sample large-degree vertices
random walk
- Pros:
  - asymptotically unbiased
  - easy to sample high-degree vertices
  - low cost resource-wise
- Cons:
  - graph must be connected
  - large estimation errors when the graph is loosely connected
  - length of the transient?

Hybrid sampling
[Figure: graph made of two subgraphs, A and C]
- uniform vertex sampling samples both subgraphs A and C
  - but is expensive
- a single RW samples A or C
  - but is cheap
Combine the advantages of uniform vertex sampling and RWs?

Multiple random walks
- m independent, uniformly placed RWs
- split the budget B among them
Pros
- cover all components w.h.p. as m increases
Cons
- bias due to the transient
- difficult to combine estimates
Couple the RWs?
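A sketch of m independent walkers with uniformly chosen starting vertices and an evenly split budget, run on a hypothetical disconnected graph to show that each walker stays in the component where it starts. The even split (remainder unused) and the helper name are assumptions, not details from the slides.

```python
import random

def multiple_random_walks(adj, m, budget, rng):
    """m independent random walks started at uniformly chosen vertices.

    The budget is split evenly: each walker collects budget // m samples
    (any remainder is left unused in this sketch). Returns one list per walker.
    """
    vertices = list(adj)
    per_walker = budget // m
    walks = []
    for _ in range(m):
        v = rng.choice(vertices)  # uniform starting vertex
        walk = [v]
        while len(walk) < per_walker:
            v = rng.choice(adj[v])  # uniform neighbor
            walk.append(v)
        walks.append(walk)
    return walks

# Hypothetical disconnected graph: two triangles, {0, 1, 2} and {3, 4, 5}.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
walks = multiple_random_walks(adj, m=4, budget=40, rng=random.Random(3))
for w in walks:
    print(w[0], sorted(set(w)))
```

Only the uniform starting points let the walkers reach both components; the sketch leaves open how to combine the m per-walker estimates.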

Frontier Sampling (FS)
m coupled walkers
- B: sampling budget
- $S = \{v_1, \ldots, v_m\}$: initial set of m vertices; $E' = \emptyset$
(1) choose $v_r \in S$ with probability proportional to $\deg(v_r)$
(2) walk one step from $v_r$
(3) add the walked edge to $E'$ and update $v_r$ to the vertex just reached
(4) return to (1) (until $m + |E'| = B$)
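Steps (1) to (4) can be sketched as follows. Assumptions not stated on the slide: the graph is an undirected adjacency-list dictionary with no isolated vertices, the m initial vertices are drawn uniformly at random (with replacement), and each initial vertex uses one unit of budget, which is how the stopping rule $m + |E'| = B$ reads. The function name and toy graph are hypothetical.

```python
import random

def frontier_sampling(adj, m, budget, rng):
    """Frontier Sampling (FS) sketch: m coupled walkers sharing one budget.

    Assumptions: adj maps each vertex to its neighbor list (undirected,
    no isolated vertices); the m initial vertices are drawn uniformly at
    random, and each counts as one unit of budget.
    Returns the list E' of walked edges, of length budget - m.
    """
    S = [rng.choice(list(adj)) for _ in range(m)]  # initial frontier
    E_prime = []
    while m + len(E_prime) < budget:
        # (1) pick walker r with probability proportional to deg(v_r)
        r = rng.choices(range(m), weights=[len(adj[v]) for v in S])[0]
        # (2) walk one step from v_r to a uniformly chosen neighbor
        u = rng.choice(adj[S[r]])
        # (3) record the walked edge and move walker r
        E_prime.append((S[r], u))
        S[r] = u
        # (4) repeat until m + |E'| = B
    return E_prime

# Hypothetical toy graph: a star on 5 vertices plus one edge between two leaves.
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}
edges = frontier_sampling(adj, m=3, budget=20, rng=random.Random(42))
print(edges)
```

Choosing the walker in proportion to its degree is the coupling: the walkers share the budget instead of each receiving a fixed share.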

FS properties
- FS is a random walk on $G^m$
  - at steady state, it samples edges uniformly
- as $m \to \infty$, the walkers are uniformly distributed in the graph
  - the m coupled RWs start approximately in steady state
  - short transient

Sample paths for the $\theta_1$ estimate (Flickr graph)
[Figure: evolution of $\hat\theta_1(n)$, where n is the number of steps]

Sampling errors
[Figure: NMSE vs. in-degree]
- large connected component of the Flickr graph
- accuracy metric: NMSE of the CCDF
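The accuracy metric can be sketched as the NMSE of the estimated CCDF at each degree, computed over independent runs. The slides do not say whether the CCDF is $P(D \ge i)$ or $P(D > i)$; this sketch assumes $P(D \ge i)$ and uses the same normalization by the true value as the NMSE above. Function names and the distributions in the usage lines are hypothetical.

```python
import math

def ccdf(theta, i):
    """P(D >= i) for a degree distribution theta (dict: degree -> probability).

    Degrees missing from theta count as probability 0.
    """
    return sum(p for j, p in theta.items() if j >= i)

def nmse_ccdf(estimates, theta, i):
    """NMSE of the CCDF at degree i over independent runs (hypothetical helper).

    estimates is a list of estimated degree distributions, one per run;
    theta is the true distribution.
    """
    true_value = ccdf(theta, i)
    mse = sum((ccdf(est, i) - true_value) ** 2 for est in estimates) / len(estimates)
    return math.sqrt(mse) / true_value

# Hypothetical true distribution and two hypothetical per-run estimates.
theta = {1: 0.4, 2: 0.4, 4: 0.2}
runs = [{1: 0.5, 2: 0.3, 4: 0.2}, {1: 0.3, 2: 0.5, 4: 0.2}]
for i in sorted(theta):
    print(i, nmse_ccdf(runs, theta, i))
```

Using the CCDF rather than $\theta_i$ itself smooths over degrees that no run happens to sample.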

Sampling errors: $G_{AB}$ graph
[Figure: NMSE vs. in-degree]
- two Barabási–Albert graphs, with average degrees 2 and 10, connected by a single edge
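A graph of this shape can be sketched with a simplified preferential-attachment generator: attaching m edges per new vertex gives an average degree close to 2m, so m = 1 and m = 5 approximate the average degrees 2 and 10. The slides do not say which generator, graph sizes, or bridge endpoints were used; `preferential_attachment`, n = 1000, and the bridge (0, n) are hypothetical.

```python
import random

def preferential_attachment(n, m, rng, offset=0):
    """Simplified Barabasi-Albert-style graph on vertices offset .. offset+n-1.

    Starts from a clique on m+1 vertices; each later vertex links to m distinct
    existing vertices chosen with probability proportional to their degree.
    The average degree approaches 2m as n grows.
    """
    edges = [(offset + u, offset + v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # A vertex appears once in `ends` per incident edge, so a uniform draw
    # from `ends` is a degree-proportional draw of a vertex.
    ends = [x for e in edges for x in e]
    for new in range(offset + m + 1, offset + n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(ends))
        for t in chosen:
            edges.append((new, t))
            ends.extend((new, t))
    return edges

# Hypothetical sizes: the slide gives only the average degrees (2 and 10).
n = 1000
rng = random.Random(1)
edges_a = preferential_attachment(n, 1, rng)            # average degree close to 2
edges_b = preferential_attachment(n, 5, rng, offset=n)  # average degree close to 10
g_ab = edges_a + edges_b + [(0, n)]                    # one edge joins the two graphs
print(2 * len(edges_a) / n, 2 * len(edges_b) / n)
```

The single bridging edge is what makes the graph loosely connected: a walker rarely crosses it, so a single RW tends to see mostly one of the two degree distributions.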

19 r assortativity  measure of degree correlation between neighboring vertices 19 Errors: assortativity metric

Distributed FS
- m independent walkers
- walker i takes its next step after an exponentially distributed time whose mean is inversely proportional to its current node's degree (rate proportional to degree)
- walkers run for time T, then report to a central site
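A single-process simulation of the distributed version, assuming each walker waits an exponential time with rate equal to its current degree (mean 1/degree). Under that assumption the next walker to move is chosen with probability proportional to its degree, as in step (1) of FS. The heap only simulates the walkers' independent clocks; the function name, uniform starting points, and toy parameters are hypothetical.

```python
import heapq
import random

def distributed_fs(adj, m, T, rng):
    """Distributed FS sketch: m independent walkers, each with its own clock.

    Assumption: walker i waits an exponential time with rate deg(v_i)
    (mean 1/deg(v_i)) before each step, so the next walker to move is
    chosen with probability proportional to its current degree, as in FS.
    Returns each walker's (time, edge) reports collected up to time T.
    """
    vertices = list(adj)
    position = [rng.choice(vertices) for _ in range(m)]  # uniform starting points
    reports = [[] for _ in range(m)]
    events = [(rng.expovariate(len(adj[v])), i) for i, v in enumerate(position)]
    heapq.heapify(events)
    while events:
        t, i = heapq.heappop(events)
        if t > T:
            break  # the earliest pending step is past T, so every walker has stopped
        u = rng.choice(adj[position[i]])
        reports[i].append((t, (position[i], u)))
        position[i] = u
        heapq.heappush(events, (t + rng.expovariate(len(adj[u])), i))
    return reports

# Hypothetical toy graph: a star on 5 vertices plus one edge between two leaves.
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}
reports = distributed_fs(adj, m=3, T=5.0, rng=random.Random(11))
central = sorted(r for walker in reports for r in walker)  # merged at the central site
print(len(central))
```

No coordination is needed while the walkers run: the competing exponential clocks replace FS's central choice of which walker moves next.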

Future work
- analyzing and speeding up convergence
  - other forms of coupling
- other graph statistics
- study how graph structure affects sampling efficiency
  - power-law vs. exponential tails
  - spatial correlation: independence vs. short-range dependence (SRD) vs. long-range dependence (LRD)
- application to different networks
  - wireless, social, wireless/social
