Cong Ding, Yang Chen*, and Xiaoming Fu University of Göttingen

1 Crowd Crawling: Towards Collaborative Data Collection for Large-scale Online Social Networks
Cong Ding, Yang Chen*, and Xiaoming Fu
University of Göttingen
*Duke University

2 Significance of social network data crawling
Understanding user behaviors
Improving SNS architectures
Handling privacy/security issues
and so on...

3 Current data collection methods (1)
ISP-based measurement [Schneider IMC’09]
Only ISPs can do this

4 Current data collection methods (2)
Cooperating with SNS companies [Yang IMC’11]
Most research groups do not have this opportunity

5 Current data collection methods (3)
Crawling by a single group (which then shares the data with others) [Gjoka INFOCOM’10]
Suffers from request rate limiting

6 Shortcomings of crawling by a single group
Wastes computing and network resources
Introduces overhead for service providers (and may lead to stricter rate limiting)
Lack of ground truth for the research community

7 A new thought
Why not collect data collaboratively?

8 System overview
[Figure: a coordinator and the crawlers]

9 System design
Fetching UIDs (BFS, etc.)
Handling crawling failure (timeout)
Bypassing request rate limiting (massive IP addresses)
Data fidelity (redundant crawling)
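
The design points above can be illustrated with a minimal in-memory sketch of a coordinator. It is not the authors' implementation: the class name `Coordinator`, its methods, the `timeout_s` and `redundancy` parameters, and the majority vote used to reconcile redundant reports are all assumptions made for illustration. UIDs are handed out in BFS order, a task that is not answered before its deadline goes back to the front of the queue, and each UID is given to `redundancy` distinct crawlers before its friend list is accepted. Rate-limit bypassing is not modeled here; in the system it comes from the crawlers running on many hosts with different IP addresses.

```python
import time
from collections import Counter, deque


class Coordinator:
    """Hypothetical coordinator: BFS frontier, timeouts, redundant crawling."""

    def __init__(self, seed_uids, timeout_s=60.0, redundancy=1):
        self.timeout_s = timeout_s
        self.redundancy = redundancy  # distinct crawlers required per UID
        self.frontier = deque()       # BFS queue of UIDs awaiting assignment
        self.seen = set()             # every UID discovered so far
        self.pending = {}             # (uid, crawler_id) -> deadline
        self.reports = {}             # uid -> {crawler_id: frozenset(friends)}
        self.accepted = {}            # uid -> agreed friend set
        for uid in seed_uids:
            self._discover(uid)

    def _discover(self, uid):
        # Enqueue a newly seen UID once per required redundant crawl.
        if uid not in self.seen:
            self.seen.add(uid)
            self.reports[uid] = {}
            self.frontier.extend([uid] * self.redundancy)

    def _requeue_expired(self, now):
        # Crawling failure: tasks past their deadline go back to the queue.
        for (uid, crawler_id), deadline in list(self.pending.items()):
            if now >= deadline:
                del self.pending[(uid, crawler_id)]
                self.frontier.appendleft(uid)

    def next_task(self, crawler_id, now=None):
        """Return the next UID for this crawler, or None if nothing fits."""
        now = time.monotonic() if now is None else now
        self._requeue_expired(now)
        for i, uid in enumerate(self.frontier):
            busy = (uid, crawler_id) in self.pending
            if not busy and crawler_id not in self.reports[uid]:
                del self.frontier[i]
                self.pending[(uid, crawler_id)] = now + self.timeout_s
                return uid
        return None

    def submit(self, crawler_id, uid, friends):
        """Record a crawler's report; accept the UID once enough agree."""
        if self.pending.pop((uid, crawler_id), None) is None:
            return  # late or unsolicited report: ignore it
        self.reports[uid][crawler_id] = frozenset(friends)
        if len(self.reports[uid]) == self.redundancy:
            # Majority vote over the redundant reports (ties: first seen).
            answer, _ = Counter(self.reports[uid].values()).most_common(1)[0]
            self.accepted[uid] = answer
            for friend in sorted(answer):
                self._discover(friend)
```

A crawler in this sketch would loop over `next_task`, fetch the user's data, and call `submit`. With `redundancy=1` the vote is trivial, which loosely mirrors a prototype that leaves the data fidelity part out.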

10 Implementation
A proof-of-concept prototype (without the data fidelity part) that crawls Weibo
472 PlanetLab servers as crawlers

11 Evaluation
In 24 hours, we crawled 2.22M users’ data from Weibo, including user profiles, all posts, and all social connections
Comparison:
Fu et al. (PLOS ONE 2013) got 30K users’ data in 6 days
Guo et al. (PAM 2013) got 1M users’ data in 1 month

             Crowd Crawling   Fu et al.   Guo et al.
#UIDs/day    2.22M            5K          33K
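
The per-day figures in the comparison follow from simple division, which the short check below reproduces. It assumes a 30-day month for Guo et al., since the slide only says "1 month"; the function name `uids_per_day` is illustrative.

```python
def uids_per_day(users, days):
    """Average number of user IDs crawled per day."""
    return users / days


rates = {
    "Crowd Crawling": uids_per_day(2_220_000, 1),  # 2.22M users in 24 hours
    "Fu et al.": uids_per_day(30_000, 6),          # 30K users in 6 days
    "Guo et al.": uids_per_day(1_000_000, 30),     # assumes a 30-day month
}

for name, rate in rates.items():
    speedup = rates["Crowd Crawling"] / rate
    print(f"{name}: {rate:,.0f} UIDs/day ({speedup:.1f}x)")
```

Under that assumption the reported rate is roughly 444 times that of Fu et al. and roughly 67 times that of Guo et al.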

12 Evaluation

13 Evaluation

14 Conclusion and Discussion
Data sharing may violate some providers’ terms of service
  Twitter does not allow data sharing (even for research)
  Weibo allows data sharing among researchers
Unlimited data sharing might cause ethical issues
  The data should be anonymized
We will publish the data crawled in the evaluation
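
The slides do not say how the data would be anonymized. One common first step, sketched below purely as an assumption, is to replace each UID with a keyed hash (HMAC-SHA256) before release, so that the social graph keeps its structure while the original identifiers stay private. The function names and the key are hypothetical. This is pseudonymization only: the graph structure or post content can still allow re-identification, so it would not be sufficient on its own.

```python
import hashlib
import hmac


def pseudonymize_uid(uid: str, secret_key: bytes) -> str:
    """Map a UID to a stable pseudonym using HMAC-SHA256 with a private key."""
    return hmac.new(secret_key, uid.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_edges(edges, secret_key):
    """Rewrite (follower, followee) pairs so only pseudonyms are released."""
    return [
        (pseudonymize_uid(a, secret_key), pseudonymize_uid(b, secret_key))
        for a, b in edges
    ]


# Hypothetical key; in practice it would be kept secret and never published.
key = b"replace-with-a-private-random-key"
released = pseudonymize_edges([("1001", "1002"), ("1002", "1003")], key)
```

Because the same UID always maps to the same pseudonym under one key, edges that share a user still line up in the released data.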

