Collating Social Network Profiles

Presentation transcript:

Collating Social Network Profiles

Objective
Company Name (input) → System → Social Network Profiles (output)

Record Linkage + Identity

Agenda
• Introduction
• Objective
• Contrast to Existing Work
• Work Done
• Baseline System
• Individual Network Approach
• Machine Learning Experiments
• Next Steps, Q&A

Baseline System

Ground Truth
• Two networks: Facebook and Twitter
• Top seventy 2013 Fortune 500 companies

Baseline Algorithm
1. Take company name.
2. Search Facebook/Twitter API using it.
3. Return first result from each.
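The three baseline steps can be sketched roughly as follows. This is only an illustration: the slides do not show the actual API calls, so the search functions are passed in as plain callables, and `baseline_link`, `make_fake_search`, and the sample index are hypothetical names and data invented for the example.

```python
from typing import Callable, Dict, List, Optional

# A search function takes a query and returns a ranked list of profile IDs.
# Real Facebook/Twitter search calls would be wrapped to fit this shape.
SearchFn = Callable[[str], List[str]]


def baseline_link(company: str, searchers: Dict[str, SearchFn]) -> Dict[str, Optional[str]]:
    """Return the first search hit per network, or None if a search is empty."""
    results: Dict[str, Optional[str]] = {}
    for network, search in searchers.items():
        hits = search(company)                      # step 2: search by company name
        results[network] = hits[0] if hits else None  # step 3: keep the first result
    return results


# Hypothetical stand-in for the real search APIs (made-up data).
_FAKE_INDEX = {
    "facebook": {"Acme": ["acme.official", "acme.fans"]},
    "twitter": {"Acme": ["@acme", "@acme_support"]},
}


def make_fake_search(network: str) -> SearchFn:
    def search(query: str) -> List[str]:
        return _FAKE_INDEX[network].get(query, [])
    return search


searchers = {network: make_fake_search(network) for network in _FAKE_INDEX}
print(baseline_link("Acme", searchers))
```

Passing the search functions in keeps the sketch independent of any particular API client.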

Baseline Performance

Individual Network Approach

New Approach
Score profiles based on
• Edit Distance
  • Company Name – Username
  • Company Name – Display Name
• Relative Popularity
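One way the scoring idea above might look in code is sketched below. The slides do not give the actual formula, so everything here is an assumption: edit distance is taken to be Levenshtein distance normalised by the longer string, relative popularity is taken to be a candidate's share of followers among the candidates, and the equal weights, the `Profile` fields, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means identical (case-insensitive)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


@dataclass
class Profile:
    username: str
    display_name: str
    followers: int  # assumed popularity signal


def score_profiles(company: str, candidates: List[Profile],
                   weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)):
    """Return (score, profile) pairs, best first. Weights are illustrative."""
    total = sum(p.followers for p in candidates) or 1
    w_user, w_display, w_pop = weights
    scored = []
    for p in candidates:
        score = (w_user * name_similarity(company, p.username)
                 + w_display * name_similarity(company, p.display_name)
                 + w_pop * p.followers / total)
        scored.append((score, p))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Changing `weights` is one way to try different combinations of the three signals.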

Profile fields: Display Name, Username

Scoring

Best Performing Combination

Machine Learning Experiments

Freebase Ground Truth
• 397,071 Business Operations
• 1,422 with a social media presence
• 917 with Facebook, 687 with Twitter
• 598 with both
• 553 with valid profiles

Training Set
• 553 Correct
• 553 Incorrect
• 1106 Total
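The slides do not say how the incorrect examples were produced. Purely as an illustration, one simple way to get a balanced set like this is to pair each company once with its own profile (label 1) and once with a randomly chosen other company's profile (label 0). The function name and the sampling strategy below are assumptions, not the author's method.

```python
import random
from typing import Dict, List, Tuple


def build_training_pairs(true_links: Dict[str, str],
                         seed: int = 0) -> List[Tuple[str, str, int]]:
    """Build one correct and one incorrect (company, profile, label) pair per company.

    Assumes at least two companies and that profile IDs are unique per company.
    """
    rng = random.Random(seed)
    companies = list(true_links)
    examples = []
    for company in companies:
        # Correct example: the company's own profile.
        examples.append((company, true_links[company], 1))
        # Incorrect example: another company's profile, chosen at random.
        other = rng.choice([c for c in companies if c != company])
        examples.append((company, true_links[other], 0))
    rng.shuffle(examples)
    return examples
```

With 553 linked companies this would give 553 correct and 553 incorrect pairs, 1106 in total. Random negatives of this kind tend to be easy to separate from the correct pairs.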

Cross Validation Results
Columns: Classifier, "Test | Train", "Train | Test"
Classifiers:
• Linear Regression
• Gaussian Naïve Bayes
• Multinomial Naïve Bayes
• Bernoulli Naïve Bayes
• Decision Tree
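A minimal sketch of k-fold cross-validation for one of the listed classifiers is given below. To stay dependency-free it uses a small hand-written Gaussian Naïve Bayes rather than whatever library the author used. The feature layout (numeric feature vectors per candidate pair), the number of folds, and all names are assumptions.

```python
import math
import random
from collections import defaultdict


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


class GaussianNB:
    """Tiny Gaussian Naive Bayes: per-class prior plus per-feature mean/variance."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.stats = {}
        for label, rows in by_class.items():
            n = len(rows)
            cols = list(zip(*rows))
            means = [sum(col) / n for col in cols]
            # Small constant avoids division by zero for constant features.
            variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                         for col, m in zip(cols, means)]
            self.stats[label] = (math.log(n / len(X)), means, variances)
        return self

    def predict_one(self, row):
        def log_posterior(label):
            log_prior, means, variances = self.stats[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
                for x, m, var in zip(row, means, variances))
        return max(self.stats, key=log_posterior)


def cross_validate(X, y, k=5, seed=0):
    """Return mean held-out accuracy over k folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        model = GaussianNB().fit(train_X, train_y)
        correct = sum(model.predict_one(X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k
```

Here each row of `X` might hold, for example, the two name similarities and a popularity value for one company-profile pair, with `y` set to 1 for correct pairs and 0 for incorrect ones.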

Next Steps
• Improve training set: provide harder examples
• Incorporate more profile data
• Build system around classifiers
