Connecting Users across Social Media Sites: A Behavioral-Modeling Approach Jingchi Zhang.

Slides:



Advertisements
Similar presentations
Google News Personalization: Scalable Online Collaborative Filtering
Advertisements

Spelling Correction for Search Engine Queries Bruno Martins, Mario J. Silva In Proceedings of EsTAL-04, España for Natural Language Processing Presenter:
Large-Scale Entity-Based Online Social Network Profile Linkage.
Analysis and Modeling of Social Networks Foudalis Ilias.
Back to Table of Contents
Chapter 2 Cultural Representation of Gender _________________________.
Internet/Cyber Stalking AND HOW TO AVOID BEING A VICTIM.
Voice Biometric Overview for SfTelephony Meetup March 10, 2011 Dan Miller Opus Research.
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 6 Advanced Data Modeling.
Database Systems: Design, Implementation, and Management Tenth Edition
GENERATING AUTOMATIC SEMANTIC ANNOTATIONS FOR RESEARCH DATASETS AYUSH SINGHAL AND JAIDEEP SRIVASTAVA CS DEPT., UNIVERSITY OF MINNESOTA, MN, USA.
Detecting Computer Intrusions Using Behavioral Biometrics Ahmed Awad E. A, and Issa Traore University of Victoria PST’05 Oct 13,2005.
Mining Social Media: Looking Ahead Arizona State University Data Mining and Machine Learning Lab Arizona State University Data Mining and Machine Learning.
Section – Biometrics 1. Biometrics Biometric refers to any measure used to uniquely identify a person based on biological or physiological traits.
Service Discrimination and Audit File Reduction for Effective Intrusion Detection by Fernando Godínez (ITESM) In collaboration with Dieter Hutter (DFKI)
Intelligent Online Marketing Local Digital Advertising Specific To Your Business Presented by: Dexter Nelson Live Minder Trend Analysis & Marketing.
72% of all parents are concerned that other people could locate their child through their mobile phone using location based services.
Presentation of approach and pilot results Mannheim, March 20-22, 2015 You walk, you travel, you use your phone – differently!
BUSINESS DRIVEN TECHNOLOGY
Overview of Web Data Mining and Applications Part I
Lecture 21: Privacy and Online Advertising. References Challenges in Measuring Online Advertising Systems by Saikat Guha, Bin Cheng, and Paul Francis.
Creating Social ROI Presentation Ryan J. Swindall Whether you are just starting to explore the potential of social media, or have already developed a dynamic,
ENTROPY OF FINGERPRINT SENSORS. Do different fingerprint sensors affect the entropy of a fingerprint? RESEARCH QUESTION/HYPOTHESIS.
 Why would you want to be connected? o To make online connections that will improve your efficiency and speed o To provide a near instant platform.
Creating Online Class Communities Jennifer Dorman Discovery Education
Large-Scale Cost-sensitive Online Social Network Profile Linkage.
Chapter 3 Object-Oriented Analysis of Library Management System(LMS)
AdWords Instructor: Dawn Rauscher. Quality Score in Action 0a2PVhPQhttp:// 0a2PVhPQ.
FALL 2012 DSCI5240 Graduate Presentation By Xxxxxxx.
Attention and Event Detection Identifying, attributing and describing spatial bursts Early online identification of attention items in social media Louis.
Health promotion and health education programs. Assumptions of Health Promotion Relationship between Health education& Promotion Definition of Program.
DATA MODELLING TOOLS FOR ORGANISING DATABASES. For a database to be organised and logical, it must be well-designed and set out. In such cases, the databases.
Getting started on informaworld™ How do I register my institution with informaworld™? How is my institution’s online access activated? What do I do if.
Social Media for Credit Unions? Facebook – Getting Started Adding content Promoting Advertising Summary W E L O O K A T T H I N G S D I F F E R E N T.
 Type of site: Social annotations, highlighting and social bookmarking  Launched: July 24 th, 2006  Institutional video:
Overview: Humans are unique creatures. Everything we do is slightly different from everyone else. Even though many times these differences are so minute.
Extracting Key Terms From Noisy and Multi-theme Documents Maria Grineva, Maxim Grinev and Dmitry Lizorkin Institute for System Programming of RAS.
Copyright R. Weber Machine Learning, Data Mining ISYS370 Dr. R. Weber.
REZA ZAFARANI AND HUAN LIU DATA MINING AND MACHINE LEARNING LABORATORY (DMML) ARIZONA STATE UNIVERSITY KDD 2013 – CHICAGO, ILLINOIS.
Free Powerpoint Templates Page 1 Free Powerpoint Templates Influence and Correlation in Social Networks Azad University KurdistanSocial Network.
Using Internet Information Effectively in Marketing Floyd Memorial Library Greenport, NY Nov. 7, 2011.
Lecture 6: The Ultimate Authorship Problem: Verification for Short Docs Moshe Koppel and Yaron Winter.
Chapter 6: Foundations of Business Intelligence - Databases and Information Management Dr. Andrew P. Ciganek, Ph.D.
Social Media How can we manage social network for children?
Understanding Cross-site Linking in Online Social Networks Yang Chen 1, Chenfan Zhuang 2, Qiang Cao 1, Pan Hui 3 1 Duke University 2 Tsinghua University.
Jeopardy K201 – The Computer In Business Exam Review 2 Manjit Trehan.
Wang-Chien Lee i Pervasive Data Access ( i PDA) Group Pennsylvania State University Mining Social Network Big Data Intelligent.
Recognition of spoken and spelled proper names Reporter : CHEN, TZAN HWEI Author :Michael Meyer, Hermann Hild.
Chapter 8 Data Modeling Advanced Concepts Database Principles: Fundamentals of Design, Implementation, and Management Tenth Edition.
By Gianluca Stringhini, Christopher Kruegel and Giovanni Vigna Presented By Awrad Mohammed Ali 1.
A Scalable Machine Learning Approach for Semi-Structured Named Entity Recognition Utku Irmak(Yahoo! Labs) Reiner Kraft(Yahoo! Inc.) WWW 2010(Information.
Core Publisher: Station Administrator Tools. Training 1: Site Administration Training 2: Programs Training 3: Content Tagging Training 4: Creating Posts.
Linking Organizational Social Networking Profiles PROJECT ID: H JEROME CHENG ZHI KAI (A H ) 1.
By Andrew McDaniel. Bloom’s Revised Taxonomy Remembering Understanding Applying Analyzing Evaluating Creating.
Date: 2015/11/19 Author: Reza Zafarani, Huan Liu Source: CIKM '15
Recognizing Stances in Online Debates Unsupervised opinion analysis method for debate-side classification. Mine the web to learn associations that are.
Click to Add Title A Systematic Framework for Sentiment Identification by Modeling User Social Effects Kunpeng Zhang Assistant Professor Department of.
Communications Technology and Change Tutorial Week 4 Smart Technology Google By Jessica Hickey u
Chapter 13.3: Databases Invitation to Computer Science, Java Version, Second Edition.
I can be You: Questioning the use of Keystroke Dynamics as Biometrics Tey Chee Meng, Payas Gupta, Debin Gao Ke Chen.
Keystroke Dynamics By Hafez Barghouthi.
An Introduction to Content Analysis Communication Research Week 9 Myra Gurney.
Connecting Users across Social Media Sites: A Behavioral- Modeling Approach Reza Zafarani and Huan Liu KDD’13 Presenter: Changqing Luo, Zhihao Cao, and.
Chapter 8: Web Analytics, Web Mining, and Social Analytics
NASBLA Social Media: What is it for? NASBLA is involved in numerous Social Media that all serve a distinct purpose. So, what are they all for?
Database (Microsoft Access). Database A database is an organized collection of related data about a specific topic or purpose. Examples of databases include:
Automated Experiments on Ad Privacy Settings
Location Recommendation — for Out-of-Town Users in Location-Based Social Network Yina Meng.
Using Google Plus Skills: Use Google Plus
Build Network for Professional Up gradation with LinkedIn
Presentation transcript:

Connecting Users across Social Media Sites: A Behavioral-Modeling Approach Jingchi Zhang

Methodology  Modeling Behavior for identifying Users across Sites (MOBIUS)  Identifies users’ unique behavioral patterns that lead to information redundancies across sites  Constructs features that exploit information redundancies due to these behavioral patterns  Employs machine learning for effective user identification

Why important  Verifying ages online is important as it attempts to determine whether someone is “an 11-year-old girl or a 45-yearold man”.  “Skout, a mobile social networking app, discovered that, within two weeks, three adults had masqueraded as 13- to 17-year olds. three separate incidents, they contacted children and, the police say, sexually assaulted them.” New York Times

Problem Statement  Information shared by users on social media sites provides a social fingerprint of them and can help identify users across different sites.  Username  Unique on each site and can help identify individuals  Two general problem  Given two usernames u 1 and u 2, can we determine if they belong to the same individual?  Given a single username u from individual I, can we find other usernames of I?  R. Zafarani and H. Liu. Connecting Corresponding Identities across Communities. In ICWSM, pages 354–357, 2009.

Question 1  Given two usernames u 1 and u 2, can we determine if they belong to the same individual?  we find the set of all usernames C that are likely to belong to individual I. We denote set C as candidate usernames  for all candidate usernames c ∈ C, we check if c and u belong to the same individual.  Identification function  f(U, c) = 1 If c and set U belong to I ;  f(U, c) = 0 Otherwise ;

Question 1 Depending on the learning framework, one can even learn the probability that an individual owns the candidate username, generalizing our binary f function to a probabilistic model (f(U, c) = p)

Analyze behavioral patterns  MOBIUS contains  Behavioral patterns  Features constructed  Learning framework

Behavioral patterns and feature construction  Individuals can avoid such redundancies  short-term memory capacity of 7 ±2 items  Human memory thrives on redundancy  not long, not random, and have abundant redundancy These behavioral patterns can be categorized as follows: 1. Patterns due to Human Limitations 2. Exogenous Factors 3. Endogenous Factors The features designed to capture information generated by these patterns can be divided into three categories: 1. (Candidate) Username Features: 2. Prior-Usernames Features: these 3. Username ↔ Prior-Usernames Features:

Patterns due to Human Limitations  Limited time and memory  59% of individuals prefer to use the same usernames repeatedly  Users commonly have a limited set of potential usernames from which they select  Users often prefer not to create new usernames  approximated by the number of unique usernames (uniq(U)) among prior usernames U  uniqueness = | uniq(U)|/| U|  Limited knowledge  Limited Vocabulary: Our vocabulary is limited in any language  Limited Alphabet: alphabet letters used in the usernames are highly dependent on language.  no Arabic word transcribed in English contains the letter x

Exogenous Factors  Typing Patterns  layout of the keyboard significantly impacts how random usernames are selected  e.g., qwer1234 and aoeusnth are two well-known passwords  we construct the following 15 features for each keyboard layout :  (1 feature) The percentage of keys typed using the same hand used for the previous key  (1 feature) Percentage of keys typed using the same finger used for the previous key.  (8 features) The percentage of keys typed using each finger. Thumbs are not included.  (4 features) The percentage of keys pressed on rows:  (1 feature) The approximate distance (in meters) traveled for typing a username.  Language Patterns  Users often use the same or the same set of languages when selecting usernames.  n-gram statistical language detector over the European Parliament Proceedings Parallel Corpus 3,which consists of text in 21 European languages

Endogenous Factors  Endogenous factors play a major role when individuals select usernames.  Personal attributes (name, age, gender, roles and positions, etc.)  characteristics, e.g., a female selecting username fungirl09, a father selecting geekdad, or a PlayStation 3 fan selecting PS3lover2009.  habits, such as abbreviating usernames or adding prefixes/suffixes.

Personal Attributes and Personality Traits  Personal Information  language detection model is incapable of detecting several languages, as well as specific names, such as locations, or others that are of specific interest to the individual selecting the username  Kalambo, a waterfall in Zambia, or K2 and Rakaposhi, both mountains in Pakistan  patterns in these words can be captured by analyzing the alphabet distribution  Kalambo,  ‘I’ in languages such as Arabic or Tajik, if detection fails  Username Randomness  describe individuals’ level of privacy and help identify them

Habits  Username Modification  Add prefixes or suffixes  e.g., mark.brown → mark.brown2008,  Abbreviate there usernames  e.g., ivan.sears → isears,  Change characters or add characters in between  e.g., beth.smith → b3th.smith  Capture the modifications  detect added prefixes or suffixes  detecting abbreviations, Longest Common Subsequence  swapped letters and added letters, Edit Distance(Lev-enshtein) and Dynamic Time Warping (DTW) distance

Habits

Summary

Experiments  Social Networking Sites:  Google+ or Facebook, list their IDs on other sites,  Blogging and Blog Advertisement Portals  List not only blogs, but also their profiles on other sites  Forums  Content Management Systems: allow users to add their usernames on social media sites to their profiles  Overall, 100,179(c-U) pairs are collected, where c is a username and U is the set of prior usernames. Both c and U belong to the same individual. The dataset contains usernames from 32 sites, such as Flickr, Reddit, StumbleUpon, and YouTube.

Result 100,179 positive + 100,179 negative ≈200,000 instances

Debate  Only focus on username? Not enough.  In real world, no enough database to support this method  Eg. Bunnymartini, litchilover  If we know other than username,  Search history  Interest  …

Q&A