2011/1/3 1 Reporter: Chun-Chih Wu Adviser: Hsing-Kuo Pao Authors: Marco Balduzzi, Christian Platzer, Thorsten Holz, Engin Kirda, Davide Balzarotti, and Christopher Kruegel.


1 ABUSING SOCIAL NETWORKS FOR AUTOMATED USER PROFILING
2011/1/3. Reporter: Chun-Chih Wu. Adviser: Hsing-Kuo Pao.
Authors: Marco Balduzzi, Christian Platzer, Thorsten Holz, Engin Kirda, Davide Balzarotti, and Christopher Kruegel.

2 OUTLINE: Introduction; Ethical and Legal Considerations; Abusing E-Mail Querying; Evaluation with Real-World Experiments; Conclusion

3 INTRODUCTION Recently, social networks such as Facebook have experienced a huge surge in popularity. The amount of personal information stored on these sites calls for appropriate security precautions to protect this data.

4 These sites also suggest new friends to their users based on certain rules. This kind of information is also of great value to entities with potentially malicious intentions. It is the responsibility of the service provider to ensure that unauthorized access to sensitive profile information is properly restricted.

5 The main problem is twofold:
1. Many users tend to be overly revealing when publishing personal information.
2. Information exists in social networks that a user cannot directly control, and may not even be aware of.

6 We make the following three contributions:
1. We identify a common weakness in eight popular social networks (Facebook, MySpace, Twitter, LinkedIn, Friendster, Badoo, Netlog, and XING) and present a system that automatically exploits this weakness on a large scale.
2. By using e-mail addresses as a unique identifier, we demonstrate that it is possible to correlate the information provided by thousands of users across different social networks.
3. Our findings were confirmed by all social network providers we contacted.
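Contribution 2 boils down to a join on the e-mail address. A minimal sketch, assuming each network's lookup results are already available as a dictionary from e-mail address to profile URL (the input format, the sample addresses, and the function name `correlate_by_email` are hypothetical):

```python
from collections import defaultdict
from typing import Dict


def correlate_by_email(
    results_per_network: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, str]]:
    """Group accounts found on several networks under the e-mail that found them.

    Input:  {network: {email: profile_url}}  (hypothetical format)
    Output: {email: {network: profile_url}}, restricted to addresses that
            matched on at least two networks, since only those can be correlated.
    """
    merged: Dict[str, Dict[str, str]] = defaultdict(dict)
    for network, hits in results_per_network.items():
        for email, profile in hits.items():
            # Assumption: addresses are compared case-insensitively.
            merged[email.strip().lower()][network] = profile
    return {email: accounts for email, accounts in merged.items() if len(accounts) >= 2}


if __name__ == "__main__":
    sample = {
        "Facebook": {"alice@example.com": "fb/alice", "bob@example.com": "fb/bob"},
        "LinkedIn": {"Alice@Example.com": "li/alice-s"},
        "Twitter": {"carol@example.com": "tw/carol"},
    }
    print(correlate_by_email(sample))
    # {'alice@example.com': {'Facebook': 'fb/alice', 'LinkedIn': 'li/alice-s'}}
```

Each combined record is what makes profiling possible: fields that one network hides may be public on another.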

7 ETHICAL AND LEGAL CONSIDERATIONS Crawling and correlating data in social networks is an ethically sensitive area. We therefore took the following precautions:
1. For the crawling and correlation experiments we conducted, we only accessed user information that was publicly available within the social networks. We never broke into any accounts, never obtained any passwords, and never accessed any otherwise protected area or information.
2. The crawler we developed was not powerful enough to affect the performance of any social network we investigated.
3. We hashed the real names of users with MD5 to anonymize them, and handled this data carefully.
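Precaution 3 can be sketched with Python's standard `hashlib` module. The whitespace and case normalization step is an assumption added here, so that the same name typed differently on two networks yields the same digest; the slides only say that MD5 was applied to real names.

```python
import hashlib


def anonymize_name(real_name: str) -> str:
    """Replace a real name with the hex MD5 digest of its normalized form."""
    # Assumption: collapse runs of whitespace and lowercase the name first,
    # so "Alice  Smith" and "alice smith" map to the same digest.
    normalized = " ".join(real_name.split()).lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(anonymize_name("Alice  Smith") == anonymize_name("alice smith"))  # True
```

Because the space of real names is small, a plain digest can be reversed by hashing a list of candidate names; a keyed construction such as `hmac` with a secret key resists that, at the cost of managing the key.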

8 ABUSING E-MAIL QUERYING Many social network providers offer a feature that allows a user to search for her friends by providing a list of e-mail addresses. In return, she receives a list of accounts that are registered with these e-mail addresses.
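The abused feature can be modeled as a function that takes a list of addresses and returns, for each registered one, the matching account. A sketch under that assumption (`find_friends` stands in for the provider's feature and `probe` for the attacker's batching loop; both names and the in-memory dictionary are hypothetical):

```python
from typing import Dict, Iterable, Iterator, List


def chunks(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def find_friends(registered: Dict[str, str], queried: Iterable[str]) -> Dict[str, str]:
    """Hypothetical provider feature: map each queried e-mail address to the
    account registered with it; unknown addresses are silently skipped."""
    return {email: registered[email] for email in queried if email in registered}


def probe(registered: Dict[str, str], candidates: List[str], list_length: int) -> Dict[str, str]:
    """Attacker side: submit the candidate list in batches of `list_length`
    addresses and collect every e-mail -> account hit."""
    hits: Dict[str, str] = {}
    for batch in chunks(candidates, list_length):
        hits.update(find_friends(registered, batch))
    return hits


if __name__ == "__main__":
    registered = {"a@example.org": "profile/1", "c@example.org": "profile/3"}
    candidates = ["a@example.org", "b@example.org", "c@example.org", "d@example.org"]
    print(probe(registered, candidates, list_length=2))
    # {'a@example.org': 'profile/1', 'c@example.org': 'profile/3'}
```

The evaluation slides report per-query list lengths between 500 and 5,000 addresses; the `list_length` parameter plays that role here.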

9 AUTOMATED PROFILING OF USERS

10 IMPLEMENTATION OF THE ATTACK

11 EVALUATION WITH REAL-WORLD EXPERIMENTS As a starting point, we used a set of 10,427,982 e-mail addresses. These addresses had been used for spamming, so they provided a real-world test case for our system.

12
Network | Query method | E-mail list length (size efficiency) | Queried e-mails (speed efficiency) | Identified accounts | Percentage
Facebook | Direct | 5,000 | 10M/day | 517,747 | 4.96%
MySpace | GMail | 1,000 | 500K/day | 209,627 | 2.01%
Twitter | GMail | 1,000 | 500K/day | 124,398 | 1.19%
LinkedIn | Direct | 5,000 | 9M/day | 246,093 | 2.36%
Friendster | GMail | 1,000 | 400K/day | 42,236 | 0.41%
Badoo | Direct | 1,000 | 5M/day | 12,689 | 0.12%
Netlog | GMail | 1,000 | 800K/day | 69,971 | 0.67%
XING | Direct | 500 | 3.5M/day | 5,883 | 0.06%
Total | | | | 1,228,644 | 11.78%
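The percentages in the table are the identified accounts divided by the 10,427,982 queried addresses. A quick check of that arithmetic, plus the number of days each network would need to process the whole list at its reported speed (that second figure is derived here, not reported on the slide):

```python
QUERIED_TOTAL = 10_427_982  # e-mail addresses in the test set

# Identified accounts and reported speed (addresses checked per day), from the table.
IDENTIFIED = {
    "Facebook": 517_747, "MySpace": 209_627, "Twitter": 124_398, "LinkedIn": 246_093,
    "Friendster": 42_236, "Badoo": 12_689, "Netlog": 69_971, "XING": 5_883,
}
PER_DAY = {
    "Facebook": 10_000_000, "MySpace": 500_000, "Twitter": 500_000, "LinkedIn": 9_000_000,
    "Friendster": 400_000, "Badoo": 5_000_000, "Netlog": 800_000, "XING": 3_500_000,
}


def hit_rate(count: int, total: int = QUERIED_TOTAL) -> float:
    """Percentage of queried addresses that resolved to an account, rounded to 2 places."""
    return round(100 * count / total, 2)


def days_needed(per_day: int, total: int = QUERIED_TOTAL) -> float:
    """Days to check the whole list at a constant rate (derived, not reported)."""
    return total / per_day


if __name__ == "__main__":
    for network, count in IDENTIFIED.items():
        print(f"{network:<10} {hit_rate(count):5.2f}%  {days_needed(PER_DAY[network]):6.2f} days")
    print("Total", sum(IDENTIFIED.values()), hit_rate(sum(IDENTIFIED.values())))
```

At these rates, Facebook and LinkedIn could check the entire list in about one day, whereas the GMail-based method for MySpace needs about three weeks.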

13
# of social networks | # of profiles
1 | 608,989
2 | 199,161
3 | 55,660
4 | 11,483
5 | 1,478
6 | 159
7 | 11
8 | 0
Total unique | 876,941

# | Combination | Occurrences
1 | Facebook - MySpace | 57,696
2 | Facebook - LinkedIn | 49,613
3 | Facebook - Twitter | 25,759
4 | Facebook - MySpace - Twitter | 13,754
5 | Facebook - LinkedIn - Twitter | 13,733
6 | Facebook - Netlog | 12,600
7 | Badoo - Friendster | 11,299
8 | Facebook - MySpace - LinkedIn | 9,720
9 | LinkedIn - Twitter | 8,802
10 | MySpace - Twitter | 7,593
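The numbers on this slide are tied to the previous one: if each identified account is one (e-mail, network) pair, weighting each bucket by its number of networks should give back the 1,228,644 identified accounts. A short check under that assumption:

```python
# Number of e-mail addresses found on exactly k networks (slide 13).
PROFILES_ON_K_NETWORKS = {1: 608_989, 2: 199_161, 3: 55_660, 4: 11_483,
                          5: 1_478, 6: 159, 7: 11, 8: 0}

# Each address counts once, whatever the number of networks it appears on.
unique_profiles = sum(PROFILES_ON_K_NETWORKS.values())
# Assumption: one identified account per (address, network) pair.
identified_accounts = sum(k * n for k, n in PROFILES_ON_K_NETWORKS.items())
# Addresses that can be correlated across at least two networks.
multi_network = unique_profiles - PROFILES_ON_K_NETWORKS[1]

print(unique_profiles, identified_accounts, multi_network)  # 876941 1228644 267952
```

The 267,952 addresses seen on two or more networks also match the total of the Name row on slide 16.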

14
Network (%) | Name | Surname | Profiles are open | Photo | Location | Friends | Average friends | Last log in | Profile visitor
Facebook V 99.8976.40.4881.98142na
MySpace V 96.2655.2963.5976.513794.87na
Twitter V 99.9747.5932.8478.2265na
LinkedIn V 96.7911.896.7996.7537na
Friendster V 99.7247.7699.5150.23378.79na
Badoo V 98.6170.8695.23na 92.01na
Netlog V 99.9843.477.5464.8731na73.33
XING V 99.8857.296.0447.253na96.83

Network (%) | Age | Sex | Spoken language | Job | Education | Current relation | Searched relation | Sexual preference
Facebook 0.350.5na0.23 0.440.310.22
MySpace 82.264.87na3.082.728.414.24.07
Twitter na
LinkedIn na 96.7960.680na
Friendster 82.9787.45na30.882.7264.5977.76na
Badoo 98.61 47.8117.0619.9222.48na22.8
Netlog 97.6699.9944.5643.41.6425.7323.1429.3
XING na 84.5499.8749.21na

15
Network | Personal homepage | Phone | Birthday | IMs | Physical appearance | Income | Prof. skills | Interests | Hobbies
Facebook VVVVVV
MySpace VVVV
Twitter VVV
LinkedIn VVVVVV
Friendster V
Badoo VVVV
Netlog VV
XING VVVVVV

16
# of occurrences on X networks:
Information | 2 | 3 | 4 | 5 | 6 | 7 | Total
Name | 199,161 | 55,660 | 11,483 | 1,478 | 159 | 11 | 267,952
Location | 22,583 | 2,102 | 174 | 11 | 3 | | 24,873
Age | 19,135887362,0085
Sex | 17,282 | 854 | 34 | | | | 18,170
Sexual preference | 760 | 13 | | | | | 773
Current relationship | 1,652 | 38 | 1 | | | | 1,691

Information | Value | % total mismatches | % of mismatch values (2, 3, 4+ networks)
Name | string | 72.6562.735.3717.66
Location | City | 53.2751.7416.243.72
Age | 0 < n < 100 | 34.4933.5817.8430.56
Sex | male, female | 12.18
Sexual preference | straight, homosexual, bisexual | 7.63
Current relationship | single, in a relationship, married, complicated | 35.5435.425.13

17
Range | # | %
2 - 10 | 4,163 | 60.17
11 - 30 | 1,790 | 25.87
31+ | 966 | 13.96
Profiles with age: 20,085. Total mismatched: 6,919.
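A sketch of how such a mismatch table could be produced, assuming each correlated profile contributes the list of ages it reports across networks and that the mismatch is the largest difference between them. The slide does not say how the spread was computed or how differences below 2 were treated; both choices below are assumptions.

```python
from collections import Counter
from typing import Dict, List

RANGES = ("2 - 10", "11 - 30", "31 +")


def age_spread(ages: List[int]) -> int:
    """Largest difference between the ages one address reports on different networks."""
    return max(ages) - min(ages)


def bucket(spread: int) -> str:
    """Map a spread to the slide's ranges.
    Assumption: spreads of 0 or 1 are not counted as mismatches."""
    if spread <= 1:
        return "consistent"
    if spread <= 10:
        return "2 - 10"
    if spread <= 30:
        return "11 - 30"
    return "31 +"


def range_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Share (in %) of mismatched profiles in each range; 'consistent' is ignored."""
    total = sum(counts.get(r, 0) for r in RANGES)
    if total == 0:
        return {}
    return {r: round(100 * counts.get(r, 0) / total, 2) for r in RANGES}


def mismatch_table(ages_by_email: Dict[str, List[int]]) -> Dict[str, float]:
    """Bucket every profile's age spread, then compute the per-range shares."""
    return range_shares(Counter(bucket(age_spread(a)) for a in ages_by_email.values()))


if __name__ == "__main__":
    print(range_shares({"2 - 10": 4_163, "11 - 30": 1_790, "31 +": 966}))
    # {'2 - 10': 60.17, '11 - 30': 25.87, '31 +': 13.96}
```

Feeding the slide's counts into `range_shares` reproduces the reported percentages.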

18 CONCLUSION

19 POSSIBLE SOLUTIONS
1. Mitigation from the user's side: Because the e-mail address is the unique ID that identifies a specific user, an effective defense would be to educate users to use a different e-mail address each time they register for, and enter personal information into, a social networking site.
2. CAPTCHAs: When searching for e-mail addresses, a user could be forced to solve a CAPTCHA (a challenge-response test that is hard for a computer to solve). This would prevent automated, large-scale queries to a certain extent, since CAPTCHAs cannot be (easily) solved by a computer.

20
3. Contextual information: Another potential approach to mitigating the problem is to require contextual information for each query. If a user U wishes to search for his friends F1, F2, ..., Fn, he has some contextual information about each of them that he should include in his query.
4. Limiting information exposure: Our attack is possible because the search result contains a mapping between the queried e-mail address and the profile name. Thus, a viable option to prevent our attack is to not return this mapping in the search result. The search could still return the matching accounts, but without revealing which e-mail address belongs to which account.
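Mitigations 3 and 4 can be sketched on the provider side as two alternative search functions. The account store and function names are hypothetical, and the context is assumed to be the friend's display name; the slide leaves the exact kind of context open.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical account store: e-mail -> (profile URL, display name).
Account = Tuple[str, str]


def contextual_search(accounts: Dict[str, Account],
                      queries: List[Tuple[str, str]]) -> List[str]:
    """Mitigation 3 (sketch): each query pairs an e-mail address with context
    the searcher should already know (assumed: the display name). A profile
    is returned only when that context matches."""
    found = []
    for email, expected_name in queries:
        account = accounts.get(email)
        if account is not None and account[1].casefold() == expected_name.casefold():
            found.append(account[0])
    return found


def unlinked_search(accounts: Dict[str, Account], emails: List[str],
                    rng: Optional[random.Random] = None) -> List[str]:
    """Mitigation 4 (sketch): return the matching profiles in random order and
    without any e-mail -> profile mapping."""
    rng = rng or random.Random()
    profiles = [accounts[e][0] for e in set(emails) if e in accounts]
    rng.shuffle(profiles)
    return profiles
```

Hiding the mapping only helps when a query contains several addresses: a query with a single address still reveals which account it belongs to, so mitigation 4 would likely need to be combined with a minimum batch size or with the limits of mitigations 5 and 6.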

21
5. Incremental updates: We could query thousands of e-mail addresses at once, and repeat this process as often as we wished. A natural mitigation is to limit the queries a user is allowed to perform. Initially, a user can search for all of his current friends on the social network; afterwards, each query is restricted to a small number of e-mail addresses. This lets a user extend his network incrementally, while limiting the number of e-mail addresses he can search for.
6. Rate-limiting queries: Restrict the (total) number of queries a user can send to the social network, thereby limiting the number of e-mail addresses a given user can check.
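Mitigations 5 and 6 can be combined into a per-user quota object. A minimal sketch, assuming a one-time bulk import followed by small incremental searches, plus a sliding-window cap on the number of addresses checked; every parameter name and default value is hypothetical, since the slides give no concrete limits.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class LookupQuota:
    """Per-user limits on e-mail lookups (hypothetical parameters).

    - Incremental updates: the first search may contain up to
      `initial_allowance` addresses; later searches at most `incremental_limit`.
    - Rate limiting: at most `max_per_window` addresses within any sliding
      window of `window_seconds`.
    """

    def __init__(self, initial_allowance: int = 1000, incremental_limit: int = 10,
                 max_per_window: int = 50, window_seconds: float = 86_400.0,
                 clock: Callable[[], float] = time.monotonic):
        self.initial_allowance = initial_allowance
        self.incremental_limit = incremental_limit
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self.clock = clock
        self.used_initial = False
        self.history: Deque[Tuple[float, int]] = deque()  # (timestamp, addresses)

    def allow(self, n_addresses: int) -> bool:
        """Return True and record the search if it fits within all limits."""
        now = self.clock()
        # Forget searches that have left the sliding window.
        while self.history and now - self.history[0][0] >= self.window_seconds:
            self.history.popleft()
        if not self.used_initial:
            if n_addresses > self.initial_allowance:
                return False
            # Design choice: the one-time bulk import is exempt from the window.
            self.used_initial = True
            return True
        if n_addresses > self.incremental_limit:
            return False
        recent = sum(n for _, n in self.history)
        if recent + n_addresses > self.max_per_window:
            return False
        self.history.append((now, n_addresses))
        return True
```

Injecting the clock keeps the object testable; exempting the initial import from the window is one possible reading of "search for his current friends in the beginning".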

22 THANKS FOR YOUR ATTENTION

