WHO ARE YOU? ...HONESTLY!
A study on inferring missing attributes in social networks
Zeinab Mahdavifar
Advisor: Prof. Martine De Cock
We are living in the social network era. Each social network helps us in its own way: we stay in touch with old friends (Facebook), find the companies whose work culture is closest to ours (LinkedIn), and get help from sources we could never reach otherwise (Mechanical Turk).
3,035,749,340 online users; 2,276,812,005 social network users.
[Chart: % of Internet users who use the following social media, by year]
74% of the online adult population uses social networking sites. One third of them are on Facebook, and almost equal portions use other social networks such as Twitter, Instagram, Pinterest, and LinkedIn. Many use more than one: 52% of online adults now use two or more social media sites.
Sources: www.pewinternet.org, www.cloudtweaks.com
… and we can use this data for targeted advertising, reputation monitoring, and sexual predation detection.
Source: http://www.datasciencecentral.com/
Predictive models: using likes, comments, and status updates | using friendship links
Problem: inferring age and gender in a social network
Approach: using friendship links between users
Input: friendship links of 3 million users in Netlog
Algorithm: Label Propagation (a community detection algorithm)
Output: predicted ages and genders for all 3 million users
We start from the users whose attributes are known (shown in color). Let's take an example: imagine we want to know the age of everyone in our network, but we know the ages of only two users, shown in color. We use an iterative approach: at each iteration, we propagate ages from known users to unknown ones.
At each iteration, known users pass their label to neighbors.
…until the whole network is labeled.
[Figure: example network with users Andi and Suzie; gender labels (F, M) propagated across the network]
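The propagation steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the study's implementation: the network is a hypothetical dict mapping each user to a list of friends, known users keep their labels, and each unlabeled user adopts an aggregate of its already-labeled friends' labels. The aggregation rules are our own choices: the most frequent label for gender (in the spirit of classic label propagation) and the median for age. The names `propagate` and `majority` and the toy users `a` to `f` are hypothetical.

```python
from collections import Counter
from statistics import median


def majority(labels):
    # Most frequent neighbor label; on a tie, Counter keeps first-seen order.
    return Counter(labels).most_common(1)[0][0]


def propagate(adj, seeds, aggregate):
    """Spread seed labels through the graph until no new user can be labeled.

    adj: dict mapping each user to a list of friends (undirected graph)
    seeds: dict mapping users with known attributes to their label
    aggregate: hypothetical rule turning a list of neighbor labels into one label
    """
    labels = dict(seeds)  # known users keep their labels throughout
    while True:
        # Synchronous step: unlabeled users only read labels that existed
        # before this iteration, so labels move one hop per iteration.
        new = {}
        for user, friends in adj.items():
            if user in labels:
                continue
            known = [labels[f] for f in friends if f in labels]
            if known:
                new[user] = aggregate(known)
        if not new:  # every reachable user is labeled
            return labels
        labels.update(new)


# Toy network (hypothetical users); only a and e have known attributes.
friends = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "e"],
    "d": ["b", "e"],
    "e": ["c", "d", "f"],
    "f": ["e"],
}
genders = propagate(friends, {"a": "M", "e": "F"}, majority)
ages = propagate(friends, {"a": 25, "e": 40}, median)
print(genders)
print(ages)
```

In this toy run, `c` has one friend of each known age and receives their median, 32.5; for gender it has one M and one F friend, and the tie falls back to the label seen first. Stopping when an iteration labels nobody also ends the loop on graphs with parts that no known user can reach.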
Results
Age: with the attributes of 15% of users known, minimum error ~ 6 years
Gender: with the attributes of 10% of users known, accuracy ~ 80%
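The slide does not say how the error was measured. One common setup, assumed here, is to reveal the attributes of only a fraction of users (e.g. 15% or 10%), infer the rest, and score only the hidden users: mean absolute error in years for age, and accuracy for gender. The helper names below are hypothetical and the metric choice is our assumption.

```python
import random


def split_known(labels, fraction, seed=0):
    """Keep `fraction` of users as known seeds; the rest form the hidden test set."""
    rng = random.Random(seed)
    users = sorted(labels)
    rng.shuffle(users)
    k = int(len(users) * fraction)
    known = {u: labels[u] for u in users[:k]}
    hidden = {u: labels[u] for u in users[k:]}
    return known, hidden


def _shared(true, pred):
    # Only score users that have both a true and a predicted value.
    users = [u for u in true if u in pred]
    if not users:
        raise ValueError("no overlapping users to evaluate")
    return users


def mean_absolute_error(true, pred):
    """Average absolute difference (e.g. in years) over the scored users."""
    users = _shared(true, pred)
    return sum(abs(true[u] - pred[u]) for u in users) / len(users)


def accuracy(true, pred):
    """Fraction of scored users whose predicted label matches the true one."""
    users = _shared(true, pred)
    return sum(true[u] == pred[u] for u in users) / len(users)


# Usage with made-up values:
print(mean_absolute_error({"a": 25, "b": 30}, {"a": 27, "b": 30}))
print(accuracy({"a": "F", "b": "M", "c": "M"}, {"a": "F", "b": "F", "c": "M"}))
```

Scoring only the hidden users matters: the seeds are copied unchanged into the predictions, so including them would inflate the results.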
Misrepresentation of age/gender: sexual predation detection
Public | Private: missing attributes vs. misrepresentation of attributes
From publicly available data to accurate private information. Our model uses publicly available data to infer accurate private information about a user and, in case of suspicious behavior, informs law enforcement authorities. This helps support community safety and social well-being.
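The slides do not spell out how a misrepresentation would be detected. One simple reading, sketched here purely as an assumption, is to compare each user's declared age with the age inferred from their friends and flag large gaps. The function name `flag_misrepresentation` and the 10-year tolerance are hypothetical; the tolerance is chosen to be wider than the ~6-year error reported above so that ordinary prediction error is not flagged on its own.

```python
def flag_misrepresentation(declared, inferred, tolerance=10):
    """Return (user, declared_age, inferred_age) for suspicious gaps.

    declared: dict of self-reported ages
    inferred: dict of ages predicted from the friendship network
    tolerance: hypothetical margin in years, wider than typical model error
    """
    flagged = []
    for user, stated in declared.items():
        guess = inferred.get(user)
        # Users without an inferred age cannot be checked and are skipped.
        if guess is not None and abs(stated - guess) > tolerance:
            flagged.append((user, stated, guess))
    return flagged


# Made-up example: "x" claims to be 15, but the network suggests about 38.
print(flag_misrepresentation({"x": 15, "y": 30, "z": 22}, {"x": 38, "y": 33}))
```

A flag here only means the declared and inferred ages disagree strongly; it is a prompt for human review, not proof of misrepresentation.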
Time’s Up!
About me: Zeinab Mahdavifar, Master's in Computer Science, Institute of Technology, UW Tacoma | zmahdavi@uw.edu | @ZeinabFar
I am a fan of: Data Science, Big Data, Center for Data Science, WomenWhoCode, New Technologies, Bloomberg, Cooking, Cycling, Puzzles, Volunteering