Linking Organizational Social Networking Profiles. Project ID: H0791030. Jerome Cheng Zhi Kai (A0080860H).

Presentation transcript:

Example: Holiday Inn profiles on Twitter and Facebook

Motivation: Individuals
Want to find profiles, but no one place has them
Sometimes on company websites, but there is no standardized location and not all companies bother

Motivation: Organizations
Track competitors’ use of social media
Find imposter profiles

Problem Definition
Given an organization name, the system returns social profiles, each labelled official, affiliate, or unrelated

Related Work
Prior work focused on deduplication for individuals
Relevant here: the profile characteristics that work focused on

Related Work: Usernames
Connecting Corresponding Identities across Communities (Zafarani & Liu, 2009)
Connecting Users across Social Media Sites: A Behavioral-Modeling Approach (Zafarani & Liu, 2013)
Studying User Footprints in Different Online Social Networks (Malhotra et al., 2012)

Related Work: Created Content
Identifying Users Across Social Tagging Systems (Iofciu, Fankhauser, Abel & Bischoff, 2011)

Methodology: System Design
1. Input: organization’s name (query)
2. Search Facebook/Twitter APIs, retrieve profiles
3. Convert profiles into feature vectors
4. Classify the profiles, represented as vectors

Classifier Choice
Evaluated scikit-learn’s Decision Tree, Naïve Bayes, Support Vector, Logistic Regression, and Random Forest classifiers
Features aren’t independent, so trees are well-suited

Feature Breakdown: Name-based
Normalized edit distance: query to username, query to display name
Edit distance: query to username, query to display name
Length of query
Length of username
Length of display name
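
The name-based features listed above can be sketched in a few lines of Python. This is only an illustration: the slides do not say how the edit distance is normalized or whether names are case-folded, so dividing by the longer string's length and lowercasing are both assumptions, and the function and key names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    """Assumed normalization: divide by the length of the longer string."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def name_features(query: str, username: str, display_name: str) -> dict:
    """Build the seven name-based features (key names are hypothetical)."""
    # Lowercasing is an assumption, not something the slides specify.
    q, u, d = query.lower(), username.lower(), display_name.lower()
    return {
        "norm_edit_username": normalized_edit_distance(q, u),
        "norm_edit_display": normalized_edit_distance(q, d),
        "edit_username": edit_distance(q, u),
        "edit_display": edit_distance(q, d),
        "len_query": len(q),
        "len_username": len(u),
        "len_display": len(d),
    }
```

For example, `name_features("Holiday Inn", "HolidayInn", "Holiday Inn")` gives an edit distance of 1 to the username (the missing space) and 0 to the display name.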

Feature Breakdown: Name-based Quirks
Need to handle abbreviations and stopwords: Citigroup versus Citi, General Motors versus GM
Take two edit distances: one on the original string, one on the processed string
Use the better score of the two
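
A rough sketch of the "two edit distances, keep the better one" idea follows. The slides do not describe the processing step, so the stopword list and the `preprocess` helper here are hypothetical, and "better" is taken to mean the lower normalized distance.

```python
# Hypothetical stopword list; the actual list is not given in the slides.
STOPWORDS = {"the", "inc", "corp", "co", "ltd", "company", "group"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def normalized(a: str, b: str) -> float:
    """Assumed normalization: divide by the longer string's length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def preprocess(name: str) -> str:
    """Lowercase, trim trailing punctuation, and drop stopwords."""
    tokens = [t.strip(".,") for t in name.lower().split()]
    kept = [t for t in tokens if t and t not in STOPWORDS]
    # Fall back to the original if every token was a stopword.
    return " ".join(kept) if kept else name.lower()


def best_name_score(query: str, candidate: str) -> float:
    """Score the original and processed strings; keep the lower distance."""
    raw = normalized(query.lower(), candidate.lower())
    processed = normalized(preprocess(query), preprocess(candidate))
    return min(raw, processed)
```

With this sketch, "The Coca-Cola Company" against "Coca-Cola" scores 0.0 after stopword removal. Abbreviations such as General Motors versus GM still score poorly here; matching those would need an extra step that the slides do not spell out.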

Feature Breakdown: Description
Occurrences of query
Cosine similarity: query and description, DuckDuckGo description and profile description
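
The description features above might be computed along these lines. The slides do not state the term weighting, so this sketch assumes raw term counts (a TF-IDF weighting would be an equally plausible choice), and the tokenizer and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase and split into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two bag-of-words count vectors."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_occurrences(query: str, description: str) -> int:
    """Count case-insensitive, non-overlapping occurrences of the query."""
    return description.lower().count(query.lower())
```

The same `cosine_similarity` function would serve both comparisons on the slide: query against profile description, and DuckDuckGo description against profile description.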

Feature Breakdown: Language Models
Construct a bigram language model for each of: official profile descriptions, affiliate profile descriptions, unrelated profile descriptions
Probability that the candidate description belongs to each
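
One way to realize the per-class bigram models is sketched below. The slides do not mention smoothing or tokenization, so add-one smoothing and whitespace tokenization are assumptions, and the class and function names are hypothetical. The sketch returns a log-likelihood per class rather than a normalized probability.

```python
import math
from collections import Counter


class BigramModel:
    """Bigram language model with add-one smoothing (an assumption)."""

    def __init__(self, descriptions):
        self.context_counts = Counter()  # how often each token starts a bigram
        self.bigram_counts = Counter()
        vocab = set()
        for text in descriptions:
            tokens = self._tokens(text)
            vocab.update(tokens)
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(vocab) + 1  # one extra slot for unseen words

    @staticmethod
    def _tokens(text):
        # Whitespace tokenization with sentence boundary markers.
        return ["<s>"] + text.lower().split() + ["</s>"]

    def log_prob(self, text):
        """Smoothed log-likelihood of the text under this model."""
        tokens = self._tokens(text)
        total = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            numerator = self.bigram_counts[(prev, word)] + 1
            denominator = self.context_counts[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def language_model_features(description, models):
    """One score per class; `models` maps class label to BigramModel."""
    return {label: m.log_prob(description) for label, m in models.items()}
```

In use, one model would be trained per class (official, affiliate, unrelated) on the labelled descriptions, and the three scores for a candidate description appended to its feature vector.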

Evaluation: Ground Truth Creation
1. Retrieved organizations from Freebase
2. Searched for profiles on Twitter/Facebook
3. Manually labelled as official/affiliate/unrelated

Evaluation: Ground Truth Breakdown
[Charts: Twitter classes and Facebook classes. Only one label count survives in the transcript: 3413 labels.]

Evaluation: Process
Mainly concerned with the official and affiliate classes
Not interested in the unrelated class
Modified 10-fold cross validation

Evaluation: Modified Cross Validation
1. Generate folds as per normal
2. Train classifier on training set as per normal
3. For each affiliate/official profile in test set:
   1. Input organization’s name to system
   2. Count number of correct results
4. Calculate precision/recall/F1 from counts
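
The fold generation and the final metric step can be sketched as follows. This covers only steps 1 and 4; how "correct results" are counted in step 3 is not detailed in the slides, so the true-positive, false-positive, and false-negative counts are taken as given inputs. The function names are hypothetical.

```python
def k_folds(items, k=10):
    """Yield (train, test) splits; every item lands in exactly one test fold."""
    folds = [items[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def precision_recall_f1(true_pos, false_pos, false_neg):
    """Compute precision, recall, and F1 from raw counts."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

As a worked example, 8 correct results with 2 false positives and 4 misses give a precision of 0.8, a recall of about 0.67, and an F1 of about 0.73.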

Evaluation: Baseline
Normalized edit distance between username/display name and query
Emulates searching the networks manually without examining each profile in detail

Results & Discussion: Twitter

Results & Discussion: Facebook

Discussion
Baseline performs well for the official class on Facebook
Username and display name alone are good indicators for this class
Other features still help, but not as much

Discussion: Facebook Characteristics
Many profile types: people, pages, places, etc.
Finding official pages is simplified
But finding affiliates requires more effort

Discussion: Facebook Characteristics
Facebook doesn’t require a “username” to be specified for pages; it will just use an ID instead
Auto-generated pages also only have IDs, and take their name from Wikipedia or other sources

Limitations
Ground truth proportions: expand and/or balance
Limited number of profiles retrieved for classification

Future Work
Support additional networks
Examine post content
“Preferential” classification