

Our Twitter Profiles, Our Selves: Predicting Personality with Twitter
Daniele Quercia, Michal Kosinski, David Stillwell, Jon Crowcroft
COMP4332 Wong Po Yan

Introduction
▪ Significant correlation between personality and real-world behavior
  – Music taste
  – Formation of social relations
▪ Predicting the personality of Twitter users

Why Twitter?
▪ Previous study on Facebook
  – The nature of online interactions does not differ significantly from that of real-world interactions
▪ A different platform
  – Anyone can see any user's updates unless that user protects them
▪ Popular

Twitter Users
▪ Four types, captured by five measures
  – Listeners: follow many users
  – Popular: are followed by many users
  – Highly-read: are often listed in others' reading lists
  – Influential: scored by two measures
    ▪ Klout score: whether a user's tweets are clicked, replied to, or retweeted
    ▪ TIME score: TIME magazine's ranking measure, which combines one's popularity on both Twitter and Facebook using the formula (2a + b) / 2, where a = number of Twitter followers and b = number of Facebook social contacts
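The TIME score formula above can be written as a one-line function. This is a minimal sketch: the function name `time_score` and the sample counts are illustrative assumptions, not taken from the paper.

```python
def time_score(twitter_followers: int, facebook_contacts: int) -> float:
    """TIME-style influence score as stated on the slide: (2a + b) / 2.

    a = number of Twitter followers, b = number of Facebook social contacts.
    The function name and argument names are hypothetical.
    """
    if twitter_followers < 0 or facebook_contacts < 0:
        raise ValueError("counts must be non-negative")
    return (2 * twitter_followers + facebook_contacts) / 2


# Hypothetical user with 1000 Twitter followers and 500 Facebook contacts.
print(time_score(1000, 500))  # 1250.0
```

Twitter followers are weighted twice as heavily as Facebook contacts before the sum is halved.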

Personality
▪ The Big Five Personality Test
  – An individual is associated with five scores that correspond to the five main personality traits
▪ Traits
  – Openness
  – Conscientiousness
  – Extraversion
  – Agreeableness
  – Neuroticism

myPersonality
▪ Facebook users are able to take a variety of personality and ability tests
▪ Users can give consent to share their personality scores and profile information
  – 40% do so
  – Only a few hundred of those have posted links to their Twitter accounts
▪ The Big Five Personality Test

Goal
Relationship between
▪ Personality traits (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism)
▪ Two additional attributes (age, sex)
and
▪ Five user characteristics (followings, followers, listings, and two influence scores: Klout and TIME)

Data Collection
▪ Sample users: 335
  – Have specified their Twitter accounts on their Facebook profiles
  – Have taken the Big Five Personality Test using myPersonality on Facebook
  – Have shared the results and profiles on Twitter
▪ Data
  – Number of followers
  – Number of users followed
  – Number of times the user has been listed in others' reading lists

Data Processing: Logarithm
Applied to:
▪ Number of followed users
▪ Number of followers
▪ Listings
▪ Two influence scores (Klout, TIME)
▪ Age
Why?
▪ The corresponding distributions are not normal
▪ A logarithmic transformation compensates for the violation of normality
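The effect of the logarithm on a skewed count variable can be illustrated with a short sketch. The follower counts below are made up, and using log10(1 + v) to tolerate zero counts is an assumption; the slides do not say which base or offset the authors used.

```python
import math


def log_transform(values):
    """Return log10(1 + v) for each non-negative value v.

    The +1 offset (so that zero counts stay defined) is an assumption,
    not something stated in the slides.
    """
    return [math.log10(1 + v) for v in values]


def skewness(values):
    """Population skewness: mean cubed deviation divided by sd cubed."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return third / var ** 1.5


# Hypothetical follower counts with a long right tail.
followers = [3, 12, 40, 95, 310, 1200, 25000]
logged = log_transform(followers)

print("skewness before:", round(skewness(followers), 2))
print("skewness after: ", round(skewness(logged), 2))
```

A skewness closer to zero after the transform indicates a more symmetric distribution, which is the property the correlation analysis relies on.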

Pearson Product-Moment Correlation
▪ A measure of the linear relationship between two random variables
▪ Formula: r = cov(X, Y) / (σ_X · σ_Y)
▪ Range: [-1, 1]
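As a rough illustration of how such a coefficient is computed, the sketch below implements the standard sample form of Pearson's r, i.e. the sum of products of deviations divided by the product of the root sums of squared deviations. The function name `pearson_r` and the example numbers are hypothetical and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the root sums of squared deviations.
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


# Hypothetical data: extraversion scores (1-5) and log10 follower counts.
extraversion = [2.1, 3.4, 3.0, 4.2, 3.8, 2.7]
log_followers = [1.2, 2.0, 2.4, 2.9, 2.1, 1.8]
print(round(pearson_r(extraversion, log_followers), 2))
```

Values near +1 or -1 indicate a strong linear relationship; values near 0, such as the 0.13 to 0.39 range reported on the results slides, indicate a weak to moderate one.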

Results: Listener & Popular
▪ Extraversion
  – 0.13 for Listener
  – 0.15 for Popular
  – Extroverts
▪ Neuroticism
  – for Listener
  – for Popular
  – Emotionally stable
▪ Age
  – 0.28 for Listener
  – 0.37 for Popular
  – Tend to be older
Listeners and popular users tend to be extroverted and emotionally stable, and they tend to be older.

Results: Highly-read
▪ Openness: 0.17
Highly-read users tend to be imaginative, spontaneous and adventurous.

Results: Influential
▪ Klout
  – Extraversion: 0.15
  – Neuroticism:
▪ TIME
  – Conscientiousness: 0.18
  – Extraversion: 0.25
  – Neuroticism:
  – Age: 0.39
Influential users tend to be extroverted, emotionally stable, ambitious and resourceful. They also tend to be older.

Model for Prediction
▪ Regression analysis
▪ 10-fold cross-validation using M5' Rule
  – M5' is closely based on M5
  – M5 (a model tree) combines a conventional decision tree with the possibility of linear regression functions at the leaves
  – M5' is an enhanced algorithm that improves the handling of missing values and enumerated attributes
▪ Root Mean Square Error (RMSE)
  – Measures the difference between predicted values and observed values
  – On the score scale [1, 5], maximum RMSE = 0.88
  – Error is low → accurate predictions
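The evaluation procedure (10-fold cross-validation scored by RMSE) can be sketched in plain Python. Note the substitution: this sketch uses a one-variable least-squares line as a stand-in learner, not M5' itself, and the data are synthetic. All function names are illustrative assumptions.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (stand-in for M5')."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope


def rmse(predicted, observed):
    """Root mean square error between predictions and observations."""
    total = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(total / len(observed))


def cross_validated_rmse(xs, ys, k=10, seed=0):
    """Average held-out RMSE over k folds (one common convention)."""
    if len(xs) < k:
        raise ValueError("need at least k samples")
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a + b * xs[i] for i in fold]
        scores.append(rmse(preds, [ys[i] for i in fold]))
    return sum(scores) / len(scores)


# Synthetic data: a log-scaled count predicting a trait score in [1, 5].
rng = random.Random(42)
log_counts = [rng.uniform(0, 5) for _ in range(100)]
scores = [min(5.0, max(1.0, 3 + 0.2 * x + rng.gauss(0, 0.5)))
          for x in log_counts]
print("10-fold RMSE:", round(cross_validated_rmse(log_counts, scores), 2))
```

Each fold is held out once while the model is fitted on the other nine, so every prediction is scored on data the model has not seen. An RMSE well below the width of the [1, 5] scale is what the slide reads as low error.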

Conclusion
▪ All user types are emotionally stable
▪ Most user types are extroverts, except Highly-read people
▪ Listener, Popular and Influential people tend to be older
▪ Influential people tend to be ambitious, but do not seem very agreeable
▪ Highly-read people tend to be adventurous and imaginative
These inferences have long been supported informally by intuition but have been difficult to make precise.

Suggestions
▪ Marketing
  – Marketing strategy is closely related to consumer personality
  – E.g. select ads to which the user is likely to be most receptive
▪ User Interface Design
  – Match the "look and feel" of a social media site to personality traits
▪ Recommender Systems
  – Product recommendation
  – E.g. recommend music to users based on the well-established relationship between personality and music taste

Q & A