On the Strength of Weak Identities in Social Computing Systems

On the Strength of Weak Identities in Social Computing Systems Krishna P. Gummadi Max Planck Institute for Software Systems

Social computing systems Online systems that allow people to interact Examples: Social networking sites: Facebook, Google+ Blogging sites: Twitter, LiveJournal Content-sharing sites: YouTube, Flickr Social bookmarking sites: Delicious, Reddit Crowd-sourced opinions: Yelp, eBay seller ratings Peer-production sites: Wikipedia, AMT Distributed systems of people

The Achilles' Heel of Social Computing Systems

Identity infrastructures Most platforms use a weak identity infrastructure Weak identity infrastructure: no verification by trusted authorities required; filling out a simple profile is enough to create an account Pros: Provides some level of anonymity Low entry barrier Cons: Lacks accountability Vulnerable to fake (Sybil) id attacks

Sybil attacks: Attacks using fake identities Fundamental problem in systems with weak user ids Numerous real-world examples: Facebook: Fake likes and ad-clicks for businesses and celebrities Twitter: Fake followers and tweet popularity manipulation YouTube, Reddit: Content owners manipulate popularity Yelp: Restaurants buy fake reviews AMT, Freelancer: Sybil identities offered for hire The Sybil attack is a fundamental problem in distributed systems: an attacker creates multiple fake identities and then uses them to manipulate the system. Many online services, such as webmail, social networks, and p2p systems, are vulnerable to such attacks. For example, whenever a spammer's email account is blacklisted or banned, the spammer can create a new webmail account and resume sending spam. Similarly, there have been recorded instances of Sybil attacks on real-world social networking sites: on user-generated content sharing sites like YouTube and Digg, end users vote on videos and URLs to identify and promote popular content, and one can buy commercial software for a few hundred dollars to create a large number of Sybil accounts and manipulate the voting by casting fake votes.

Sybil attacks are a growing menace There is an incentive to manipulate popularity of ids and information

The emergence of Abuse-As-A-Service

Sybil identities are a growing menace 40% of all newly created Twitter ids are fake!

Sybil identities are a growing menace 50% of all newly created Yelp ids are fake!

The Strength of Weak Identities

Strength of a weak identity Effort needed to forge the weak identity Weak ids come with zero external references Strength is the effort needed to forge the id's activities And thereby, the id's reputation Idea: Could we measure ids' strength by their black-market prices?

Key observation Attackers cannot tamper with the timestamps of activities E.g., join dates, id creation timestamps Older ids are less likely to be fake than newer ids Attackers do not target sites until they reach critical mass Over time, older ids are more curated than newer ids Spam filters have had more time to check older ids

Most active fakes are new ids Older ids are less likely to be fake than newer ids

Assessing strength of weak identities Leverage the temporal evolution of reputation scores [Figure: evolution of a single participant's reputation score over time, from the join-date timestamp to the current reputation score] The attacker cannot forge the timestamps at which the reputation score changed!

Trustworthiness of Weak Identities

Trustworthiness of an identity Probability that its activities are in compliance with the online site’s ToS How to assess trustworthiness? Ability to hold the user behind the identity accountable Via non-anonymous strong ids Economic incentives vs. costs for the attack Strength of weak id determines attacker costs Leverage social behavioral stereotypes

Traditional Sybil defense approaches Catch & suspend ids with bad activities By checking for spam content in posts Can’t catch manipulation of genuine content’s popularity Profile identities to detect suspicious-looking ids Before they even commit fraudulent activities Analyze info available about individual ids, such as Demographic and activity-related info Social network links

Lots of recent work Gather a ground-truth set of Sybil and non-Sybil ids Social turing tests: Human verification of accounts to determine Sybils [NSDI ‘10, NDSS ‘13] Automatically flagging anomalous (rare) user behaviors [Usenix Sec. ’14] Train ML classifiers to distinguish between them [CEAS ’10] Classifiers trained to flag ids with similar profile features Like humans, they look for features that arouse suspicion Does it have a profile photo? Does it have friends who look real? Do the posts look real?
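A minimal sketch of the profiling idea above, not any of the cited classifiers: score an id by how many of its profile attributes match values the later slides describe as common among Sybils. The attribute names and thresholds here are invented for illustration.

```python
# Hypothetical suspicion score over profile features like those named on
# the slide (photo, follower-to-following ratio, location). Thresholds
# are illustrative, not taken from the cited papers.
def sybil_suspicion_score(profile):
    score = 0
    if not profile.get("location_set"):
        score += 1  # location is unset for >90% of Sybils (per a later slide)
    if profile.get("follower_to_following", 0) < 0.1:
        score += 1  # heavily skewed follower-to-following ratio
    if not profile.get("has_photo"):
        score += 1  # missing profile photo
    return score  # 0 (looks genuine) .. 3 (very Sybil-like)

genuine = {"location_set": True, "follower_to_following": 1.2, "has_photo": True}
sybil = {"location_set": False, "follower_to_following": 0.02, "has_photo": False}
print(sybil_suspicion_score(genuine), sybil_suspicion_score(sybil))  # 0 3
```

As the next slide notes, any such feature-based rule invites evasion: an attacker can simply set the attributes a classifier looks for.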

Key idea behind id profiling For many profile attributes, the values assumed by Sybils & non-Sybils tend to be different

Key idea behind id profiling For many profile attributes, the values assumed by Sybils & non-Sybils tend to be different Location field is not set for >90% of Sybils, but <40% of non-Sybils Lots of Sybils have low follower-to-following ratio A much smaller fraction of Sybils have more than 100,000 followers

Limitations of profiling identities Potential discrimination against good users With rare behaviors that are flagged as anomalous With profile attributes that match those of Sybils Sets up an arms race with attackers Sybils can avoid detection by assuming the likely attribute values of good nodes Sybils can set location attributes, or lower their follower-to-following ratios Or, by attacking with new ids that have no prior activity history

Attacks with newly created Sybils All the fake followers we bought were newly created! Existing spam defenses cannot block them

Robust Tamper Detection in Crowd Computations

Is a crowd computation tampered? Does a large computation involve a sizeable fraction of Sybil participants? Examples: a business page whose rating is tampered by Sybil reviewers; a Twitter profile whose follower count is tampered by Sybil followers

Are the following problems equivalent? 1. Detect whether a crowd computation is tampered Does the computation involve a sizeable fraction of Sybil participants? 2. Detect whether an identity is Sybil

Are the following problems equivalent? 1. Detect whether a crowd computation is tampered Does the computation involve a sizeable fraction of Sybil participants? 2. Detect whether an identity is Sybil Our Stamper project: NO! Claim: We can robustly detect tampered computations even when we cannot detect fake ids

Stamper: Detecting tampered crowds Idea: Analyze join date distributions of participants Entropy of tampered computations tends to be lower More generally, temporal evolution of reputation scores Significant fraction of identities dating all the way back to inception of site
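The entropy signal above can be sketched in a few lines. This is an illustrative toy, not the Stamper implementation: it computes the Shannon entropy of a crowd's join-date histogram, which is high when participants' ids span the site's history and low when most ids were created around the same recent time.

```python
# Toy sketch of Stamper's core signal (our simplification, not the paper's
# exact method): Shannon entropy of participants' join-date distribution.
import math
from collections import Counter

def join_date_entropy(join_months):
    counts = Counter(join_months)
    n = len(join_months)
    # H = -sum p_i * log2(p_i) over the observed join-month bins
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Genuine crowd: 8 ids whose join dates spread across the site's history
genuine = ["2009-01", "2009-07", "2010-03", "2010-11",
           "2011-05", "2011-12", "2012-04", "2012-09"]
# Tampered crowd: most ids created in the same recent month
tampered = ["2012-09"] * 6 + ["2012-08", "2012-07"]

print(join_date_entropy(genuine))                              # 3.0 bits
print(join_date_entropy(genuine) > join_date_entropy(tampered))  # True
```

The tampered crowd's mass is concentrated in one bin, so its entropy is far below the genuine crowd's.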

Robustness against adaptive attackers Any malicious identity that gets suspended causes near-permanent damage to the attacker's power! Stamper can fundamentally alter the arms race with attackers What about attacks using compromised or colluding identities? Compromised/colluding identities would have to be selected so that their join dates match the reference distribution

TrulyFollowing: A prototype system Detects popular users (politicians) with fake followers trulyfollowing.app-ns.mpi-sws.org

TrulyTweeting: A prototype system Detects popular hashtags, URLs, tweets with fake promoters trulytweeting.app-ns.mpi-sws.org

DEMO

Detection by Stamper: How it works Assume unbiased participation in a computation The join-date distribution of ids in any large-scale crowd computation must match that of a large random sample of ids on the site Any deviation indicates Sybil tampering The greater the deviation, the more likely the tampering Deviation can be calculated using KL-divergence Rank computations based on their divergence Flag the most anomalous computations
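The detection step above can be sketched as follows. This is a minimal illustration under assumptions of mine (4 yearly join-date bins, a small epsilon for empty bins), not the paper's exact procedure: compute the KL divergence between each computation's join-date histogram and a reference histogram from a random sample of site ids, then rank by divergence.

```python
# Illustrative KL-divergence check of join-date histograms against a
# reference sample; binning and smoothing (eps) are assumptions.
import math

def kl_divergence(p, q, eps=1e-9):
    # D_KL(P || Q) in bits; eps avoids log(0) for empty histogram bins
    return sum(pi * math.log2((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Join-date histograms over 4 yearly bins, normalized to probabilities
reference = [0.25, 0.25, 0.25, 0.25]  # large random sample of site ids
organic   = [0.20, 0.30, 0.25, 0.25]  # close to the reference
tampered  = [0.02, 0.03, 0.05, 0.90]  # almost all participants are new ids

for name, hist in [("organic", organic), ("tampered", tampered)]:
    print(name, round(kl_divergence(hist, reference), 3))
```

The tampered computation's divergence is an order of magnitude larger than the organic one's, so ranking by divergence flags it first.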

Dealing with computations with biased participation When nodes come from a biased user population: All computations suffer high deviations Making the tamper detection process less effective Solution: Compute join dates’ reference distribution from a similarly biased sample user population I.e., select a user population with similar demographics Has the potential to improve accuracy further

Detection accuracy: Yelp case study Case study: Find businesses with tampered reviews on Yelp Experimental set-up: 3,579 businesses with more than 100 reviews "Ground truth" obtained using Yelp's review filter Stamper flags 362 businesses (83% of all those with more than 30% tampering) Stamper flags only 3/54 (5.6%) normal crowds Stamper flags >97% of highly tampered crowds

Take-away lesson Ids are increasingly being profiled to detect Sybils Don’t profile individual identities! Accuracy would be low Can’t prevent tampering of computations Profile groups of ids participating in a computation After all, the goal is to prevent tampering of computations

Take-away questions What should a site do after detecting tampering? How do we know who tampered with the computation? Could a politician / business slander competing politicians / businesses by buying fake endorsements for them? Can we eliminate the effects of tampering? Is it possible to discount tampered votes?

Take-away questions In practice, users have weak identities across multiple sites Such weak ids are increasingly being linked Can we transfer trust between weak identities of a user across domains? Can Gmail help Facebook assess trust in Facebook ids created using Gmail ids? Can a collection of a user’s weak user ids substitute for a strong user id?