Published by Shawn Cunningham. Modified over 9 years ago
1
Network Characterization via Random Walks
B. Ribeiro, D. Towsley
UMass-Amherst
2
Problem
Given a large, possibly dynamic, network, how does one efficiently sample/crawl it to accurately characterize it?
- degree distribution
- centrality
- clustering
- …
3
Motivation
- understanding technological and social networks
  - Internet, wireless networks
  - online social networks (OSNs) such as Facebook, MySpace, Orkut, YouTube, …
- when the network dataset is not available, because of
  - size, lack of a global view, dynamics
4
Outline
- review of sampling
- random walks (RWs)
- multiple coupled RWs
- results
5
Sampling methods
- random sampling
  - uniform vertex sampling
    - $\theta_i$: fraction of vertices with degree $i$
    - each sample has degree $i$ with probability $\theta_i$
  - uniform edge sampling
    - $\pi_i$: probability that the sampled vertex has degree $i$
    - $\pi_i = i\,\theta_i / \bar{d}$, where $\bar{d} = \sum_j j\,\theta_j$ is the average degree
- crawling
  - snowball sampling: commonly used, highly biased
  - random walk
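The edge-sampling bias can be made concrete with a minimal Python sketch. It assumes an undirected graph stored as an adjacency dict (`adj`, vertex to neighbor list) plus an explicit edge list. All function names are hypothetical illustrations, not the authors' code. The sketch computes $\theta_i$ and $\pi_i = i\,\theta_i/\bar{d}$ exactly and draws samples both ways.

```python
import random
from collections import Counter

def degree_pmf(adj):
    """theta_i: fraction of vertices with degree i."""
    n = len(adj)
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {i: c / n for i, c in counts.items()}

def edge_sampling_pmf(adj):
    """pi_i = i * theta_i / d_bar: chance that uniform edge sampling returns a degree-i vertex."""
    theta = degree_pmf(adj)
    d_bar = sum(i * t for i, t in theta.items())  # average degree
    return {i: i * t / d_bar for i, t in theta.items()}

def uniform_vertex_sample(adj, rng):
    """Uniform vertex sampling: every vertex is equally likely."""
    return rng.choice(list(adj))

def uniform_edge_sample(edges, rng):
    """Uniform edge sampling: pick an edge uniformly, then one endpoint with prob. 1/2."""
    u, v = rng.choice(edges)
    return u if rng.random() < 0.5 else v

# Star graph: hub 0 with four leaves.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(degree_pmf(adj))         # theta
print(edge_sampling_pmf(adj))  # pi
```

On this star, $\theta_4 = 0.2$ but $\pi_4 = 0.5$: uniform edge sampling returns the hub half the time, so high-degree vertices are over-represented by a factor $i/\bar{d}$.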
6
Random sampling: accuracy of estimates
Estimate $\theta_i$, the fraction of vertices with degree $i$
Budget: B samples
- accuracy: normalized root mean squared error (NMSE)
- uniform vertex: head (low degrees) GOOD, tail (high degrees) BAD
- uniform edge: head BAD, tail GOOD
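The accuracy metric can be sketched as follows. This assumes the usual definition $\mathrm{NMSE}(i) = \sqrt{E[(\hat\theta_i - \theta_i)^2]}/\theta_i$, estimated over repeated independent runs. The two estimators are also assumptions:
- for uniform vertex sampling, the plain sample fraction;
- for uniform edge sampling, 1/degree reweighting, which undoes the degree-proportional bias of edge sampling.

Function names are hypothetical.

```python
import math

def nmse(estimates, true_value):
    """Normalized root mean squared error over repeated runs:
    sqrt(mean((estimate - true)^2)) / true."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse) / true_value

def theta_hat_vertex(sample_degrees, i):
    """Uniform vertex sampling: fraction of sampled vertices with degree i."""
    return sum(1 for d in sample_degrees if d == i) / len(sample_degrees)

def theta_hat_edge(sample_degrees, i):
    """Uniform edge sampling: weight each sampled vertex by 1/degree to undo
    the degree-proportional bias, then normalize."""
    total = sum(1.0 / d for d in sample_degrees)
    return sum(1.0 / d for d in sample_degrees if d == i) / total
```

Dividing by $\theta_i$ explains the head/tail split. In the tail, $\theta_i$ is tiny, so uniform vertex sampling rarely sees degree-$i$ vertices at all and its relative error blows up. Edge sampling sees those vertices far more often.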
7
Uniform vertex vs. edge sampling
[Figure: in-degree NMSE for uniform vertex and uniform edge sampling; vertex: head GOOD, tail BAD; edge: head BAD, tail GOOD]
- Flickr graph (1.7M vertices, 22M edges)
- budget: B = |V|/100
8
Pros & Cons
uniform vertex
- Pros:
  - independent sampling
  - works when the OSN uses numeric user IDs, e.g., LiveJournal, Flickr, MySpace, Facebook, ...
- Cons:
  - resource intensive (sparse user ID space)
  - difficult to sample large-degree vertices
uniform edge
- Pros:
  - independent sampling
  - easy to sample high-degree vertices
- Cons:
  - no public OSN interface to sample edges
9
Random walk (RW)
- start at node v
- move to a uniformly random neighbor of v
- repeat until B samples have been collected
- sampling with replacement
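The walk can be written as a minimal Python sketch. It assumes an adjacency dict with no isolated vertices, and it counts the start vertex as the first of the B samples (the slide leaves this open). The function name is hypothetical.

```python
import random

def random_walk(adj, start, budget, rng):
    """Simple RW: from the current vertex move to a uniformly chosen neighbor,
    recording every visited vertex (with replacement) until `budget` samples."""
    if budget <= 0:
        return []
    samples = [start]  # the start vertex counts as the first sample (assumption)
    v = start
    while len(samples) < budget:
        v = rng.choice(adj[v])
        samples.append(v)
    return samples

adj = {0: [1], 1: [0, 2], 2: [1]}  # path graph 0 - 1 - 2
walk = random_walk(adj, start=0, budget=10, rng=random.Random(0))
```

Revisits are kept deliberately: each step is one sample, so the same vertex can appear many times. This is the "sampling with replacement" noted above.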
10
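Random walk sampling
- produces a biased estimate $\hat\theta_i^{RW}$ of $\theta_i$
- easily corrected
  - $\theta_i^{RW} = i\,\theta_i / \bar{d}$, where $\bar{d}$ is the average degree
  - $\hat\theta_i = \mathrm{Norm}(\hat\theta_i^{RW} / i)$, where Norm rescales the values to sum to 1 over all $i$
[Figure: CCDF, RW sampling]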
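The correction can be sketched in a few lines. This assumes `sample_degrees` holds the degree of each vertex visited by the RW. The CCDF is taken here as $P[D > i]$, which is an assumption about the convention used in the plots. Function names are hypothetical.

```python
from collections import Counter

def rw_corrected_pmf(sample_degrees):
    """Undo the RW bias theta_i^RW = i*theta_i/d_bar: divide the RW frequency of
    degree i by i, then normalize so the estimates sum to 1."""
    n = len(sample_degrees)
    rw_pmf = {i: c / n for i, c in Counter(sample_degrees).items()}
    norm = sum(p / i for i, p in rw_pmf.items())
    return {i: (p / i) / norm for i, p in sorted(rw_pmf.items())}

def ccdf(pmf):
    """Complementary CDF P[D > i] at each degree i present in the pmf."""
    out, tail = {}, 1.0
    for i in sorted(pmf):
        tail -= pmf[i]
        out[i] = max(tail, 0.0)  # clamp tiny negative rounding error
    return out
```

For a walk on a star graph that alternates hub (degree 4) and leaf (degree 1), the raw RW frequencies are 0.5 each. The corrected estimates are 0.8 and 0.2, which are the true fractions.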
11
Pros & Cons
uniform vertex
- Pros:
  - independent sampling
  - works when the OSN uses numeric user IDs, e.g., LiveJournal, Flickr, MySpace, Facebook, ...
- Cons:
  - resource intensive (sparse user ID space)
  - difficult to sample large-degree vertices
random walk
- Pros:
  - asymptotically unbiased
  - easy to sample high-degree vertices
  - low cost resource-wise
- Cons:
  - graph must be connected
  - large estimation errors when the graph is loosely connected
  - length of transient?
12
Hybrid sampling
[Figure: graph with two loosely connected subgraphs, A and C]
- uniform vertex sampling samples both subgraphs A and C
  - but is expensive
- a RW samples A or C
  - but is cheap
Combine the advantages of uniform vertex sampling and RWs?
13
Multiple random walks
- m independent, uniformly placed RWs
- split budget B among them
Pros
- cover all components with high probability (whp) as m increases
Cons
- bias due to the transient
- difficult to combine estimates
Couple the RWs?
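The uncoupled baseline can be sketched as follows. It assumes m independent walkers, each started at a uniformly random vertex and given $\lfloor B/m \rfloor$ samples. The function name is hypothetical. The sketch returns one sample path per walker rather than a pooled estimate, since how to combine them is the open issue listed under Cons.

```python
import random

def multiple_random_walks(adj, m, budget, rng):
    """m independent simple RWs, each started at a uniformly random vertex and
    given floor(budget / m) samples. Returns one sample path per walker."""
    if budget < m:
        raise ValueError("budget must give every walker at least one sample")
    vertices = list(adj)
    per_walker = budget // m
    paths = []
    for _ in range(m):
        v = rng.choice(vertices)         # uniform initial placement
        path = [v]
        for _ in range(per_walker - 1):  # remaining steps of this walker
            v = rng.choice(adj[v])
            path.append(v)
        paths.append(path)
    return paths

# Two disconnected components: {0, 1} and {2, 3}.
adj = {0: [1], 1: [0], 2: [3], 3: [2]}
paths = multiple_random_walks(adj, m=4, budget=20, rng=random.Random(3))
```

Each walker stays inside the component where it started. Only the random placement of many walkers gives coverage of both components.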
14
Frontier Sampling (FS)
- m coupled walkers
- B: sampling budget
- $S = \{v_1, \dots, v_m\}$: initial set of m vertices; $E' = \emptyset$
(1) select $v_r \in S$ with probability $\deg(v_r) / \sum_{v \in S} \deg(v)$
(2) walk one step from $v_r$
(3) add the walked edge to $E'$ and update $v_r$
(4) return to (1) until $m + |E'| = B$
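The four FS steps can be sketched in Python. This assumes an adjacency dict without isolated vertices. It also draws the initial vertices of S independently and uniformly, which is an assumption because the slide does not say how S is chosen. `frontier_sampling` is a hypothetical name. `random.Random.choices` with `weights` implements the degree-proportional walker selection in step (1).

```python
import random

def frontier_sampling(adj, m, budget, rng):
    """Frontier Sampling: m coupled walkers sharing budget B (= `budget`).
    Returns the walked edge list E'."""
    if budget < m:
        raise ValueError("budget must cover the m initial vertices")
    vertices = list(adj)
    frontier = [rng.choice(vertices) for _ in range(m)]  # S, uniform placement (assumption)
    walked = []                                          # E' starts empty
    while m + len(walked) < budget:
        # (1) pick walker r with probability deg(v_r) / sum of deg over S
        r = rng.choices(range(m), weights=[len(adj[v]) for v in frontier])[0]
        # (2) walk one step from v_r to a uniformly chosen neighbor
        u = rng.choice(adj[frontier[r]])
        # (3) add the walked edge to E' and update v_r
        walked.append((frontier[r], u))
        frontier[r] = u
    return walked  # (4) loop ended because m + |E'| = B

adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}  # 6-cycle
edges = frontier_sampling(adj, m=3, budget=50, rng=random.Random(7))
```

The walked edges' destination vertices are still degree-biased, as with a single RW. Degree estimates would therefore reuse the 1/degree reweighting. That pairing is an assumption of this sketch, not something stated on the slide.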
15
FS properties
- random walk on $G^m$
  - at steady state, samples edges uniformly
- as $m \to \infty$, walkers are uniformly distributed in the graph
  - coupled RWs start approximately in steady state
  - short transient
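A short derivation shows why FS behaves as a single random walk on $G^m$. Here $G^m$ is taken to be the m-th Cartesian power of G, which is an assumption about the notation. Start from the state $\mathbf{v} = (v_1, \dots, v_m)$. Step (1) followed by step (2) moves walker $r$ to a specific neighbor $u$ with probability:

```latex
\Pr[v_r \to u]
  = \frac{\deg(v_r)}{\sum_{k=1}^{m} \deg(v_k)} \cdot \frac{1}{\deg(v_r)}
  = \frac{1}{\sum_{k=1}^{m} \deg(v_k)}
  = \frac{1}{\deg_{G^m}(\mathbf{v})}
```

Every edge incident to $\mathbf{v}$ in $G^m$ is therefore equally likely, which is exactly a simple random walk on $G^m$. Its stationary distribution is proportional to $\deg_{G^m}(\mathbf{v}) = \sum_k \deg(v_k)$, and it visits the edges of $G^m$ uniformly. Each edge of G corresponds to the same number of $G^m$ edges: one per walker position and per choice of the other $m-1$ coordinates. So FS samples the edges of G uniformly in steady state.

For uniformly placed walkers and large m, $\sum_k \deg(v_k) \approx m\,\bar{d}$, where $\bar{d}$ is the average degree. The stationary distribution is then nearly uniform over $G^m$, which is why uniform starts are already close to steady state.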
16
Sample paths for the $\theta_1$ estimate (Flickr graph)
[Figure: evolution of $\hat\theta_1(n)$, where n is the number of steps]
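A plotted sample path $\hat\theta_1(n)$ can be computed incrementally from one walk. The minimal sketch below assumes the same 1/degree reweighting used for RW samples and a list of visited-vertex degrees. The function name is hypothetical.

```python
def running_theta_hat(sample_degrees, i):
    """theta_hat_i(n) for n = 1, 2, ...: running 1/degree-reweighted estimate
    of the fraction of degree-i vertices along one sample path."""
    path, hit_weight, total_weight = [], 0.0, 0.0
    for d in sample_degrees:
        total_weight += 1.0 / d
        if d == i:
            hit_weight += 1.0 / d
        path.append(hit_weight / total_weight)
    return path

estimates = running_theta_hat([4, 1, 4, 1], i=1)
```

Plotting several such paths against n shows the transient and the variance that the slide compares across methods.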
17
Sampling errors
- largest connected component of the Flickr graph
- accuracy metric: NMSE of the CCDF
[Figure: in-degree NMSE]
18
Sampling errors: $G_{AB}$ graph
- two Barabási-Albert graphs with average degrees 2 and 10, connected by a single edge
[Figure: in-degree NMSE]
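Here is a sketch of how a test graph like $G_{AB}$ could be built. It assumes standard preferential attachment seeded with a small clique, where each new vertex attaches k edges. That gives an average degree of about 2k, so k = 1 and k = 5. It also assumes the single bridging edge joins two uniformly chosen vertices. The slide does not give generator details, so these parameters and the function names are hypothetical.

```python
import random

def barabasi_albert(n, k, rng, offset=0):
    """Preferential attachment: start from a (k+1)-clique; each new vertex links to
    k distinct existing vertices chosen with probability proportional to degree.
    Average degree is about 2k. Vertex ids are offset..offset+n-1."""
    adj = {offset + v: set() for v in range(n)}
    for a in range(k + 1):
        for b in range(a + 1, k + 1):
            adj[offset + a].add(offset + b)
            adj[offset + b].add(offset + a)
    # each vertex appears deg(v) times, so a uniform choice here is degree-proportional
    targets = [v for v in adj for _ in adj[v]]
    for v in range(offset + k + 1, offset + n):
        chosen = set()
        while len(chosen) < k:
            chosen.add(rng.choice(targets))
        for u in chosen:
            adj[v].add(u)
            adj[u].add(v)
            targets.extend([u, v])
    return {v: sorted(nbrs) for v, nbrs in adj.items()}

def two_ba_graphs_joined(n, rng):
    """G_AB sketch: BA graphs with average degree ~2 (k=1) and ~10 (k=5),
    joined by one edge between two uniformly chosen vertices (assumption)."""
    g_a = barabasi_albert(n, 1, rng)
    g_b = barabasi_albert(n, 5, rng, offset=n)
    adj = {**g_a, **g_b}
    a, b = rng.choice(list(g_a)), rng.choice(list(g_b))
    adj[a] = adj[a] + [b]
    adj[b] = adj[b] + [a]
    return adj

g_ab = two_ba_graphs_joined(200, random.Random(0))
```

With a single bridge, a lone RW tends to stay on one side for a long time. This is the "loosely connected" case listed among the RW cons.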
19
Errors: assortativity metric
- assortativity: a measure of the degree correlation between neighboring vertices
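One way to estimate assortativity from a sample is sketched below. It assumes the sampled edges are (approximately) uniform, as FS provides at steady state. It computes the Pearson correlation of endpoint degrees, counting each undirected edge in both directions. This plug-in estimator and its name are assumptions, not necessarily the estimator behind the plotted errors.

```python
import math

def degree_assortativity(edges, degree):
    """Pearson correlation of endpoint degrees over the given edges, counting each
    edge in both directions (the symmetric form used for undirected graphs)."""
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    if sx == 0 or sy == 0:
        raise ValueError("assortativity is undefined when all endpoint degrees are equal")
    return cov / (sx * sy)

# Star graph: every edge joins the degree-4 hub to a degree-1 leaf.
star_edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
star_degree = {0: 4, 1: 1, 2: 1, 3: 1, 4: 1}
r = degree_assortativity(star_edges, star_degree)
```

A star is perfectly disassortative, since high-degree vertices connect only to low-degree ones, so `r` is -1.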
20
Distributed FS
- m independent walkers
- walker i takes its next step after an exponentially distributed time whose rate equals its current node degree (mean 1/degree)
- walkers run for time T, then report to a central site
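A distributed variant can be sketched as follows. It assumes each walker's clock ticks at rate $\deg(v)$, i.e., a mean holding time of $1/\deg(v)$. Under that choice, the next walker to move among independent exponential clocks is picked with probability $\deg(v_r)/\sum_k \deg(v_k)$, matching step (1) of FS. The function name and parameters are hypothetical. `random.Random.expovariate` takes the rate as its argument.

```python
import random

def distributed_fs(adj, starts, horizon, rng):
    """Each walker moves on its own: at vertex v it waits an exponential time with
    rate deg(v) (mean 1/deg(v)), then steps to a uniformly chosen neighbor.
    Walkers stop at time `horizon` (T) and report their walked edges."""
    reports = []
    for v in starts:
        t, walked = 0.0, []
        while True:
            t += rng.expovariate(len(adj[v]))  # rate = current vertex degree
            if t > horizon:
                break
            u = rng.choice(adj[v])
            walked.append((v, u))
            v = u
        reports.append(walked)  # what this walker sends to the central site
    return reports

adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}  # 6-cycle
reports = distributed_fs(adj, starts=[0, 3], horizon=10.0, rng=random.Random(5))
```

Because the walkers never communicate until time T, they can run on separate machines. The central site only has to pool the reported edge lists.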
21
Future work
- analyzing and speeding up convergence
  - other forms of coupling
- other graph statistics
- study how graph structure affects sampling efficiency
  - power-law vs. exponential tail
  - spatial correlation: independence vs. short-range dependence (SRD) vs. long-range dependence (LRD)
- application to different networks
  - wireless, social, wireless/social