Collating Social Network Profiles
Objective
Input: Company Name → System → Output: Social Network Profiles
Record Linkage + Identity
Agenda
- Introduction
  - Objective
  - Contrast to Existing Work
- Work Done
  - Baseline System
  - Individual Network Approach
  - Machine Learning Experiments
- Next Steps, Q&A
Baseline System

Ground Truth
- Two networks: Facebook and Twitter
- Top seventy companies of the 2013 Fortune 500

Baseline Algorithm
1. Take the company name.
2. Search the Facebook and Twitter APIs using it.
3. Return the first result from each.
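The three baseline steps can be sketched as follows. This is a minimal illustration, not the original implementation: the Facebook and Twitter API calls are replaced by hypothetical `search` callables that return a ranked list of candidate profiles, and `baseline_match` and the fake result data are invented for the example.

```python
from typing import Callable, Dict, List, Optional

Profile = Dict[str, str]
# Hypothetical stand-in for a network's search API: query string -> ranked profiles.
SearchFn = Callable[[str], List[Profile]]


def baseline_match(company_name: str,
                   searchers: Dict[str, SearchFn]) -> Dict[str, Optional[Profile]]:
    """Return the top-ranked search result from each network, or None if empty."""
    matches: Dict[str, Optional[Profile]] = {}
    for network, search in searchers.items():
        # Steps 1-2: query the network with the raw company name.
        results = search(company_name)
        # Step 3: trust the network's own ranking and keep only the first hit.
        matches[network] = results[0] if results else None
    return matches


# Usage with fake searchers (hypothetical data, not real API output).
fake_searchers: Dict[str, SearchFn] = {
    "facebook": lambda q: [{"username": "walmart"}, {"username": "walmartjobs"}],
    "twitter": lambda q: [],
}
result = baseline_match("Walmart", fake_searchers)
```

The weakness is visible in the design: any ranking quirk of the network's search, such as a popular fan page outranking the official account, passes straight through to the output.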
Baseline Performance

Individual Network Approach
New Approach
Score profiles based on:
- Edit distance
  - Company name vs. username
  - Company name vs. display name
- Relative popularity
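Here is a minimal sketch of the scoring features, under stated assumptions. Edit distance is taken to be standard Levenshtein distance, and names are lowercased and stripped of non-alphanumeric characters before comparison. Relative popularity is assumed to be a candidate's follower count divided by the largest follower count among the candidates returned for the same query. The equal weighting of the three features is hypothetical, as are the helper names and the profile fields (`username`, `display_name`, `followers`).

```python
import re
from typing import Dict, List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def normalize(name: str) -> str:
    # Assumption: usernames have no spaces or punctuation, so strip them from both sides.
    return re.sub(r"[^a-z0-9]", "", name.lower())


def name_similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1], where 1.0 means identical after normalization."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def score_profiles(company: str, candidates: List[Dict], weights: Sequence[float] = (1.0, 1.0, 1.0)
                   ) -> List[Tuple[float, Dict]]:
    """Score each candidate on the three features and sort best-first."""
    max_followers = max((c["followers"] for c in candidates), default=0) or 1
    scored = []
    for c in candidates:
        features = (
            name_similarity(company, c["username"]),      # company name vs. username
            name_similarity(company, c["display_name"]),  # company name vs. display name
            c["followers"] / max_followers,               # relative popularity
        )
        scored.append((sum(w * f for w, f in zip(weights, features)), c))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Usage with hypothetical candidates.
candidates = [
    {"username": "walmartjobs", "display_name": "Walmart Careers", "followers": 100},
    {"username": "Walmart", "display_name": "Walmart", "followers": 1000},
]
best_score, best_profile = score_profiles("Walmart", candidates)[0]
```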
[Figure: example profile with the Display Name and Username fields labeled]
Scoring

Best Performing Combination

Machine Learning Experiments
Freebase Ground Truth
- 397,071 business operations
- 1,422 with a social media presence
- 917 with Facebook, 687 with Twitter
- 598 with both
- 553 with valid profiles
Training Set
Correct   | 553
Incorrect | 553
Total     | 1,106
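One way to produce the balanced 553/553 split is to pair every company with its verified profile (label 1) and with the verified profile of a different, randomly chosen company (label 0). The source does not say how the incorrect examples were drawn. Random mismatching is an assumption here, and `build_training_set` is a hypothetical helper.

```python
import random
from typing import Dict, List, Tuple


def build_training_set(truth: Dict[str, str], seed: int = 0) -> List[Tuple[str, str, int]]:
    """truth maps company -> its verified profile id; needs at least two companies.

    Returns (company, profile, label) triples: one correct and one incorrect per company.
    """
    rng = random.Random(seed)
    companies = sorted(truth)
    examples = []
    for company in companies:
        examples.append((company, truth[company], 1))  # correct pairing
        # Assumption: the incorrect example is another company's verified profile.
        other = rng.choice([c for c in companies if c != company])
        examples.append((company, truth[other], 0))    # incorrect pairing
    return examples
```

With 553 companies this yields 553 correct and 553 incorrect pairs, 1,106 in total.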
Cross Validation Results
Columns: Test | Train and Train | Test
Classifiers:
- Linear Regression
- Gaussian Naïve Bayes
- Multinomial Naïve Bayes
- Bernoulli Naïve Bayes
- Decision Tree
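The Test | Train and Train | Test columns suggest two-fold cross validation. The labeled examples are split in half, a classifier is trained on one half and tested on the other, and then the halves swap roles. The sketch below shows that procedure with a small hand-written Gaussian Naive Bayes, so it runs without third-party libraries. It is an illustration, not the classifier implementation used in these experiments. The stratified split (keeping the correct/incorrect balance in each half) and the names `two_fold_cv` and `GaussianNB` here are assumptions. The toy feature vectors stand in for the name-similarity and popularity scores.

```python
import math
import random
from typing import List, Sequence, Tuple


class GaussianNB:
    """Minimal Gaussian Naive Bayes: per-class priors, feature means and variances."""

    def fit(self, X: List[Sequence[float]], y: List[int]) -> "GaussianNB":
        self.stats = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            n = len(rows)
            means = [sum(col) / n for col in zip(*rows)]
            # Small constant avoids zero variance when a feature is constant in a class.
            variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                         for col, m in zip(zip(*rows), means)]
            self.stats[label] = (math.log(n / len(y)), means, variances)
        return self

    def predict(self, X: List[Sequence[float]]) -> List[int]:
        def log_posterior(x, log_prior, means, variances):
            return log_prior + sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                                   for xi, m, v in zip(x, means, variances))
        return [max(self.stats, key=lambda lab: log_posterior(x, *self.stats[lab])) for x in X]


def accuracy(pred: List[int], truth: List[int]) -> float:
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def two_fold_cv(X, y, make_model, seed: int = 0) -> Tuple[float, float]:
    """Stratified 2-fold CV: returns (accuracy training on A, testing on B) and the reverse."""
    rng = random.Random(seed)
    folds: Tuple[List[int], List[int]] = ([], [])
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for k, i in enumerate(idx):
            folds[k % 2].append(i)  # alternate so each half keeps the class balance
    scores = []
    for train, test in ((folds[0], folds[1]), (folds[1], folds[0])):
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy(model.predict([X[i] for i in test]), [y[i] for i in test]))
    return scores[0], scores[1]


# Toy, clearly separable data (hypothetical): label 1 = correct profile, 0 = incorrect.
X = [(0.9 + 0.01 * i, 0.95 - 0.01 * i) for i in range(5)] + \
    [(0.1 + 0.01 * i, 0.2 - 0.01 * i) for i in range(5)]
y = [1] * 5 + [0] * 5
```

Passing the model as a factory (`make_model`) lets the same loop compare several classifiers, which mirrors the per-classifier rows of the results table.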
Next Steps
- Improve training set: provide harder examples
- Incorporate more profile data
- Build system around classifiers
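The first next step could look like the following sketch. Instead of a random mismatch, it picks as the incorrect example the wrong profile whose name is closest to the company name, so the classifier has to learn finer distinctions. The similarity measure is the ratio from Python's standard-library `difflib.SequenceMatcher`, swapped in for brevity rather than taken from the source. `hardest_negative` is a hypothetical helper.

```python
from difflib import SequenceMatcher
from typing import List


def hardest_negative(company: str, correct_profile: str, pool: List[str]) -> str:
    """Return the wrong profile name that looks most like the company name."""
    def similarity(a: str, b: str) -> float:
        # Ratio in [0, 1]; 1.0 means the (lowercased) strings are identical.
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    wrong = [p for p in pool if p != correct_profile]
    return max(wrong, key=lambda p: similarity(company, p))
```

For example, with the pool `["walmart", "walmartjobs", "target", "apple"]` and correct profile `"walmart"`, this picks `"walmartjobs"` rather than an easy, unrelated profile such as `"apple"`.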