Multigraph Sampling of Online Social Networks
Minas Gjoka, Carter Butts, Maciej Kurant, Athina Markopoulou
Outline
Multigraph sampling
– Motivation
– Sampling method
– Internet Measurements
– Conclusion
Problem statement
Obtain a representative sample of OSN users by exploring the social graph.
[Figure: example social graph with nodes A through I]
Motivation for multiple relations
Principled methods for graph sampling
– Metropolis-Hastings Random Walk
– Re-weighted Random Walk
“Walking in Facebook: A Case Study of Unbiased Sampling of OSNs,” INFOCOM ‘10
But graph characteristics affect mixing and convergence
– fragmented social graph
– highly clustered areas
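
As a reminder of what the first of these methods does on a single relation, here is a minimal sketch of a Metropolis-Hastings random walk targeting the uniform distribution over nodes. The toy adjacency dictionary `TOY_GRAPH` and the function names are assumptions made for illustration; they do not come from the talk or the cited paper.

```python
import random

# Hypothetical toy graph (adjacency lists); every node has at least one neighbor.
TOY_GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B"],
}

def mhrw_step(graph, u, rng):
    """One Metropolis-Hastings step with a uniform target distribution."""
    v = rng.choice(graph[u])  # propose a neighbor uniformly at random
    # Accept with probability min(1, deg(u) / deg(v)); otherwise stay at u.
    if rng.random() < min(1.0, len(graph[u]) / len(graph[v])):
        return v
    return u

def mhrw(graph, start, steps, seed=0):
    """Return the list of visited nodes, including the start node."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(steps):
        walk.append(mhrw_step(graph, walk[-1], rng))
    return walk

walk = mhrw(TOY_GRAPH, "A", 200)
```

The walk can only move along existing edges, which is why a fragmented or highly clustered graph slows it down or traps it in one component.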
Fragmented social graph
[Figure: Friendship, Event attendance, and Group membership graphs and their Union; legend: Largest Connected Component, Other Connected Components]
Highly clustered social graph
[Figure: Friendship and Event attendance graphs and their Union]
Proposal
Graph exploration using multiple user relations
– perform random walk
– re-weighting at the end of the walk
– online convergence diagnostics applicable
Theoretical benefits
– faster mixing
– discovery of isolated components
Open questions
– how to combine relations
– implementation efficiency
– evaluation of sampling benefits in a realistic scenario
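
The re-weighting step can be sketched as a ratio estimator: a simple random walk visits a node with probability proportional to its degree, so each sampled value is weighted by the inverse of that degree. This is a minimal illustration under that standard assumption; the function name and the toy numbers are hypothetical, and on a multigraph the degree used would be the total degree across all relations.

```python
def reweighted_mean(samples, degree, value):
    """Estimate the population mean of `value` from random-walk samples.

    samples: visited node ids (repeats allowed)
    degree:  node id -> degree in the graph being walked
    value:   node id -> numeric attribute of interest
    """
    numerator = sum(value[v] / degree[v] for v in samples)
    denominator = sum(1.0 / degree[v] for v in samples)
    return numerator / denominator

# Toy data: node "b" has twice the degree of "a", so the walk sees it twice as often.
toy_samples = ["a", "b", "b"]
toy_degree = {"a": 1, "b": 2}
toy_value = {"a": 10.0, "b": 20.0}
estimate = reweighted_mean(toy_samples, toy_degree, toy_value)
```

In the toy data the plain sample mean would be pulled toward the high-degree node; the weights cancel that over-representation.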
[Figure, two builds: nodes A through K drawn once per relation (Friends, Events, Groups); the second build adds a fourth copy of the node set]
Combination of multiple relations
G = Friends + Events + Groups (G is a union graph)
G* = Friends + Events + Groups (G* is a union multigraph)
Degrees of node F in G*: deg(F, tot) = 8, deg(F, red) = 1, deg(F, blue) = 3, deg(F, green) = 4
[Figure: nodes A through K in G and in G*]
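
To make the difference between the union graph and the union multigraph concrete, the sketch below counts degrees both ways. The edge lists are hypothetical: they were chosen only so that node F has per-relation degrees 1, 3, and 4 as on the slide, and they are not the edges of the original figure. The mapping of relations to colors is likewise not assumed.

```python
from collections import Counter

# Hypothetical edges per relation; only the degree counts of F mirror the slide.
TOY_RELATIONS = {
    "friends": [("F", "D")],
    "events": [("F", "E"), ("F", "H"), ("F", "I")],
    "groups": [("F", "D"), ("F", "E"), ("F", "G"), ("F", "J")],
}

def multigraph_degree(relations, node):
    """Per-relation degree of `node`; parallel edges across relations all count."""
    per_relation = Counter()
    for name, edges in relations.items():
        for u, v in edges:
            if node in (u, v):
                per_relation[name] += 1
    return per_relation

def union_graph_degree(relations, node):
    """Degree of `node` in the union graph: distinct neighbors only."""
    neighbors = set()
    for edges in relations.values():
        for u, v in edges:
            if u == node:
                neighbors.add(v)
            elif v == node:
                neighbors.add(u)
    return len(neighbors)

deg_f = multigraph_degree(TOY_RELATIONS, "F")
deg_f_total = sum(deg_f.values())
deg_f_union = union_graph_degree(TOY_RELATIONS, "F")
```

In this toy data the multigraph degree is the plain sum of the per-relation degrees, while the union graph needs the neighbor lists themselves to remove duplicates such as D and E.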
Multigraph sampling
Implementation efficiency
– Degree information available without enumeration
– Take advantage of pages functionality
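
One way these two points could fit together is a two-stage draw: pick a relation with probability proportional to the node's degree in it, pick a random position within that relation, and fetch only the page that contains the position. This is a sketch under stated assumptions, not the authors' crawler: `fetch_page`, `PAGE_SIZE`, and the in-memory `TOY_NEIGHBORS` are hypothetical stand-ins for a paginated service.

```python
import random

PAGE_SIZE = 3  # hypothetical page length

# Toy stand-in for the remote service: one user's neighbor list per relation.
TOY_NEIGHBORS = {
    "friends": ["D"],
    "events": ["E", "H", "I"],
    "groups": ["D", "E", "G", "J"],
}
fetch_log = []  # records every page request made

def fetch_page(relation, page):
    """Return one page of neighbors (hypothetical API call)."""
    fetch_log.append((relation, page))
    start = page * PAGE_SIZE
    return TOY_NEIGHBORS[relation][start:start + PAGE_SIZE]

def pick_neighbor(degrees, fetch_page, rng):
    """Pick one multigraph edge uniformly using a single page request."""
    names = list(degrees)
    # Stage 1: relation chosen with probability deg(relation) / deg(total).
    relation = rng.choices(names, weights=[degrees[n] for n in names], k=1)[0]
    # Stage 2: uniform position within that relation, mapped to (page, offset).
    index = rng.randrange(degrees[relation])
    page, offset = divmod(index, PAGE_SIZE)
    return relation, fetch_page(relation, page)[offset]

rng = random.Random(0)
degrees = {name: len(members) for name, members in TOY_NEIGHBORS.items()}
picks = [pick_neighbor(degrees, fetch_page, rng) for _ in range(1000)]
```

Each edge is selected with probability (deg in relation / total degree) × (1 / deg in relation) = 1 / total degree, so the step is a uniform choice over multigraph edges without listing all neighbors.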
Multigraph sampling
Internet Measurements
Last.fm, an Internet radio service
– social networking features
– multiple relations
– fragmented graph components and highly clustered users expected
Last.fm relations used
– Friends
– Groups
– Events
– Neighbors
Data Collection
Crawling using Last.fm API and HTML scraping
Sampled node information
– userID
– country
– age
– registration time
– …
Uniform userID Sampling (UNI)
UNI sampling is possible in Last.fm.
– will serve as the ground truth
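
Uniform userID sampling is commonly done by rejection: draw IDs uniformly from the whole ID space and keep only those that belong to existing accounts. The sketch below assumes that approach; `max_id`, the `exists` callback, and the toy ID set are hypothetical and say nothing about how Last.fm actually assigns IDs.

```python
import random

def uni_sample(max_id, exists, n, seed=0):
    """Draw n user IDs uniformly (with replacement) from the valid IDs.

    max_id: largest ID in the assumed ID space 1..max_id
    exists: callable returning True when an ID maps to a real account
    """
    rng = random.Random(seed)
    sample = []
    while len(sample) < n:
        uid = rng.randint(1, max_id)  # uniform over the whole ID space
        if exists(uid):               # reject IDs with no account behind them
            sample.append(uid)
    return sample

# Toy ID space in which only even IDs are valid accounts.
toy_valid_ids = set(range(2, 101, 2))
uni = uni_sample(100, toy_valid_ids.__contains__, 50)
```

Because every valid ID is equally likely to be drawn and accepted, the accepted IDs are a uniform sample of accounts, which is what makes UNI usable as ground truth.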
Summary of datasets
Last.fm - July 2010

Crawl type                         # Total Users   % Unique Users
Friends                            5x50K           71%
Events                             5x50K           58%
Groups                             5x50K           74%
Neighbors                          5x50K           53%
Friends-Events-Groups-Neighbors    5x50K           76%
UNI                                500K            99%

[Figure labels: Friends: 0.3%, Events: 5.4%, Groups: 94.2%, Neighbors: 0.02%]
Comparison to UNI
[Figure: two panels, each with axis label "% of Subscribers"]
Last.fm Charts Estimation
Application of sampling
Last.fm Charts Estimation
Artist Charts
Related Work
Fastest mixing Markov Chain
– Boyd et al., SIAM Review 2004
Sampling in fragmented graphs
– Ribeiro et al., Frontier Sampling, IMC 2010
Last.fm studies
– Konstas et al., SIGIR ‘09
– Schifanella et al., WSDM ‘10
Conclusion
Introduced multigraph sampling
– simple and efficient
– discovers isolated components
– better approximation of distributions and means
– multigraph dataset planned for public release
Future work on multigraph sampling
– selection of relations
– weighted relations
Thank you
Questions?