Multigraph Sampling of Online Social Networks
Minas Gjoka, Carter Butts, Maciej Kurant, Athina Markopoulou




Presentation transcript:


Outline
Multigraph sampling
– Motivation
– Sampling method
– Internet Measurements
– Conclusion
Multigraph sampling | Minas Gjoka

Problem statement
Obtain a representative sample of OSN users by exploring the social graph.
[Figure: example social graph with nodes A through I]

Motivation for multiple relations
Principled methods for graph sampling:
– Metropolis-Hastings Random Walk (MHRW)
– Re-weighted Random Walk (RWRW)
"Walking in Facebook: A Case Study of Unbiased Sampling of OSNs," INFOCOM '10
But graph characteristics affect mixing and convergence:
– fragmented social graph
– highly clustered areas
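For readers unfamiliar with the two methods named above, here is a minimal sketch on an in-memory adjacency dict (a stand-in for a crawled OSN, assumed connected and undirected). MHRW accepts a proposed neighbor w with probability min(1, deg(v)/deg(w)), so its stationary distribution is uniform over nodes; a simple random walk visits nodes in proportion to their degree, and RWRW corrects this afterwards by weighting each sample with 1/deg(v). The function names are illustrative, not taken from the authors' code.

```python
import random

def mhrw(adj, start, n_samples, rng=random):
    """Metropolis-Hastings random walk: targets the uniform distribution over nodes."""
    v, samples = start, []
    for _ in range(n_samples):
        w = rng.choice(adj[v])  # propose a uniformly random neighbor
        # Accept with probability min(1, deg(v)/deg(w)); otherwise stay at v.
        if rng.random() < len(adj[v]) / len(adj[w]):
            v = w
        samples.append(v)
    return samples

def simple_rw(adj, start, n_samples, rng=random):
    """Simple random walk: node v is visited with probability proportional to deg(v)."""
    v, samples = start, []
    for _ in range(n_samples):
        v = rng.choice(adj[v])
        samples.append(v)
    return samples

def rwrw_mean(samples, adj, value):
    """RWRW estimate of the node-average of value(v): weight each sample by 1/deg(v)."""
    num = sum(value(v) / len(adj[v]) for v in samples)
    den = sum(1.0 / len(adj[v]) for v in samples)
    return num / den
```

On a star with hub 0 and four leaves, a degree-proportional sample contains the hub half the time, so the naive fraction of hubs is 0.5, while `rwrw_mean([0, 0, 0, 0, 1, 2, 3, 4], adj, lambda v: v == 0)` recovers the true value 1/5.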

Fragmented social graph
[Figure: panels for Friendship, Event attendance, Group membership, and their Union; legend: Largest Connected Component, Other Connected Components]

Highly clustered social graph
[Figure: panels for Friendship, Event attendance, and their Union]

Proposal
Graph exploration using multiple user relations
– perform random walk
– re-weighting at the end of the walk
– online convergence diagnostics applicable
Theoretical benefits
– faster mixing
– discovery of isolated components
Open questions
– how to combine relations
– implementation efficiency
– evaluation of sampling benefits in a realistic scenario
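The slide says online convergence diagnostics apply, without naming one. As an assumption, here is a simplified Geweke-style check on a single walk. It compares the mean of a node property over the first 10% of samples with its mean over the last 50%, and a Z-score far from 0 suggests the walk has not yet converged. One simplification is labeled in the code: variances are estimated as s²/n, which ignores autocorrelation, whereas the standard Geweke diagnostic uses spectral-density estimates.

```python
import statistics

def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke Z-score comparing the early and late parts of one walk.

    chain: sequence of a numeric node property (e.g. 1 if the sampled user
    is a subscriber, else 0) in walk order.
    Simplification: the variance of each segment mean is s^2 / n (ignores autocorrelation).
    """
    n = len(chain)
    a = chain[: int(first * n)]     # early segment
    b = chain[n - int(last * n):]   # late segment
    var = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / var ** 0.5
```

In an online setting, one would recompute the score periodically as samples arrive and keep walking until |Z| stays below a chosen threshold; the threshold is a tuning choice.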

[Figure: the same nodes A to K connected by three relations: Friends, Events, Groups]

[Figure: four copies of nodes A to K, with Friends, Events, and Groups edges]

Combination of multiple relations
G = Friends + Events + Groups (G is a union graph)
G* = Friends + Events + Groups (G* is a union multigraph)
In G*, each relation contributes its own edges, so the degree of F is the sum of its per-relation degrees: deg(F, tot) = deg(F, red) + deg(F, blue) + deg(F, green) = 1 + 3 + 4 = 8
[Figure: union graph G and union multigraph G* on nodes A to K]
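To make the difference between G and G* concrete, here is a sketch that stores each relation as its own adjacency dict. The neighbor lists for F are hypothetical: they match the slide's per-relation counts (1, 3, 4) but not necessarily the figure's actual edges, and the relation names are arbitrary. The multigraph degree keeps parallel edges, while the union-graph degree counts each distinct neighbor once.

```python
def multigraph_degree(relations, v):
    """deg(v, tot) in G*: per-relation degrees summed (parallel edges counted)."""
    return sum(len(adj.get(v, [])) for adj in relations.values())

def union_degree(relations, v):
    """deg(v) in the union graph G: distinct neighbors across all relations."""
    nbrs = set()
    for adj in relations.values():
        nbrs.update(adj.get(v, []))
    return len(nbrs)

# Hypothetical neighbor lists for F only (other nodes omitted for brevity).
relations = {
    "friends": {"F": ["E"]},
    "events": {"F": ["E", "G", "H"]},
    "groups": {"F": ["E", "G", "I", "J"]},
}
```

Here `multigraph_degree(relations, "F")` is 1 + 3 + 4 = 8, while `union_degree(relations, "F")` is 5, because E and G appear in more than one relation.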

Multigraph sampling
Implementation efficiency
– degree information is available without enumerating neighbors
– take advantage of the existing page functionality
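The efficiency point can be sketched as a two-step neighbor choice: first pick relation r with probability deg(v, r)/deg(v, tot), then pick a uniform neighbor within r. Together these steps select each multigraph edge at v with equal probability, so only the chosen relation's neighbor list has to be fetched. This is a sketch assuming undirected relations held in memory; reading degrees from page counts is an assumption about the crawler, and all names are illustrative.

```python
import random

def deg_tot(relations, v):
    """deg(v, tot): sum of v's degrees over all relations."""
    return sum(len(adj.get(v, [])) for adj in relations.values())

def multigraph_step(relations, v, rng):
    """One step of a simple random walk on the union multigraph G*."""
    names = list(relations)
    # Per-relation degrees; on a real site these could come from counts shown
    # on the user's page, without enumerating neighbors (assumption).
    degs = [len(relations[r].get(v, [])) for r in names]
    # Step 1: relation r with probability deg(v, r) / deg(v, tot).
    r = rng.choices(names, weights=degs)[0]
    # Step 2: uniform neighbor within r; only this one list must be fetched.
    return rng.choice(relations[r][v])

def multigraph_walk(relations, start, n_samples, seed=0):
    rng = random.Random(seed)
    v, samples = start, []
    for _ in range(n_samples):
        v = multigraph_step(relations, v, rng)
        samples.append(v)
    return samples
```

In a toy example where Friends alone splits {A, B} from {C, D} and a Groups edge B-C joins them, a walk started at A reaches D. Re-weighting each sample by 1/deg_tot(v) then yields unbiased node-level estimates, as with RWRW.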

Multigraph sampling
Internet Measurements
Last.fm, an Internet radio service
– social networking features
– multiple relations
– fragmented graph components and highly clustered users expected
Last.fm relations used
– Friends
– Groups
– Events
– Neighbors

Data Collection
Sampled node information: userID, country, age, registration time, …
Crawling using the Last.fm API and HTML scraping

Uniform userID Sampling (UNI)
UNI sampling is possible in Last.fm
– it serves as the ground truth
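The slide does not say how UNI samples are drawn. A common approach, sketched here as an assumption, is rejection sampling over a numeric ID space: draw IDs uniformly from [1, max_id] and keep only those that belong to existing users. The accepted IDs are then uniform over existing users, with replacement. Both `id_exists` and `max_id` are hypothetical stand-ins for a profile lookup and the ID range.

```python
import random

def uni_sample(id_exists, max_id, n_samples, seed=0):
    """Uniform sampling of users by rejection over the ID space [1, max_id].

    id_exists: hypothetical lookup returning True if a user has this ID.
    Accepted IDs are uniform over existing users (sampling with replacement).
    """
    rng = random.Random(seed)
    samples = []
    while len(samples) < n_samples:
        uid = rng.randint(1, max_id)  # uniform draw from the whole ID space
        if id_exists(uid):            # reject IDs that do not correspond to a user
            samples.append(uid)
    return samples
```

The cost of this design is the rejected lookups. When valid IDs are sparse in the ID space, most queries are wasted, which is one reason exploration-based walks are used when no such ID space is available.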

Summary of datasets (Last.fm, July 2010)
Friends: 0.3%, Events: 5.4%, Groups: 94.2%, Neighbors: 0.02%

Crawl type                        # Total Users   % Unique Users
Friends                           5x50K           71%
Events                            5x50K           58%
Groups                            5x50K           74%
Neighbors                         5x50K           53%
Friends-Events-Groups-Neighbors   5x50K           76%
UNI                               500K            99%
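As a reading aid for the table, here is a sketch that assumes "5x50K" means five walks of 50K samples each, and that "% Unique Users" is the number of distinct sampled users divided by the total number of samples across all walks. Both readings are assumptions, not definitions given on the slide.

```python
def percent_unique(walks):
    """Distinct users / total samples across all walks, as a percentage (assumed definition)."""
    samples = [u for walk in walks for u in walk]
    return 100.0 * len(set(samples)) / len(samples)
```

For example, `percent_unique([[1, 2, 3], [3, 4, 4]])` counts 4 distinct users in 6 samples, about 66.7%. Under this reading, a lower value means the walks revisited the same users more often.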

Comparison to UNI
[Figure: two panels, % of Subscribers]

Last.fm Charts Estimation
Application of sampling

Last.fm Charts Estimation
[Figure: Artist Charts]
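The slides do not describe how charts are computed from a sample. One plausible approach, sketched here under explicit assumptions: each sampled user has a hypothetical list `top_artists[u]`, and each sample is re-weighted by 1/deg(u), the RWRW correction for a degree-biased walk. The estimator computes the weighted share of users listing each artist and ranks artists by that share. For UNI samples, all degrees could be set to 1.

```python
from collections import defaultdict

def estimate_chart(samples, deg, top_artists, k=10):
    """Estimate the share of users listing each artist; return the top-k artists.

    samples: sampled user IDs in walk order (repeats allowed).
    deg: walk degree per user (use 1 for uniform samples).
    top_artists: hypothetical per-user artist lists.
    """
    share = defaultdict(float)
    total = 0.0
    for u in samples:
        w = 1.0 / deg[u]  # RWRW weight corrects the degree bias
        total += w
        for artist in set(top_artists[u]):
            share[artist] += w
    ranked = sorted(share.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(a, s / total) for a, s in ranked[:k]]
```

Ties are broken alphabetically so that the output is deterministic.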

Related Work
Fastest mixing Markov chain
– Boyd et al., SIAM Review 2004
Sampling in fragmented graphs
– Ribeiro et al., Frontier Sampling, IMC 2010
Last.fm studies
– Konstas et al., SIGIR '09
– Schifanella et al., WSDM '10

Conclusion
Introduced multigraph sampling
– simple and efficient
– discovers isolated components
– better approximation of distributions and means
– multigraph dataset planned for public release
Future work on multigraph sampling
– selection of relations
– weighted relations

Thank you
Questions?