Inferring Perceived Demographics from User Emotional Tone and User-Environment Emotional Contrast. Svitlana Volkova (1), Yoram Bachrach (2). (1) Center for Language and Speech Processing, Johns Hopkins University (now at Pacific Northwest National Laboratory); (2) Microsoft Research Cambridge.
Social Media Obsession: personalized; diverse; timely; large volumes; billions of messages; millions of users; multilingual. Automatically inferring demographics, personality, emotions, opinions, interests.
User Attribute Prediction Task: What do they think and feel? Where do they go? What are their demographics and personality? What do they like? What do they buy?
User Attribute Prediction Task. Political Preference: Rao et al., 2010; Conover et al., 2011; Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Cohen and Ruths, 2013; Volkova et al. Communications. Gender: Garera and Yarowsky, 2009; Rao et al., 2010; Burger et al., 2011; Van Durme, 2012; Zamal et al., 2012; Bergsma and Van Durme, 2013. Age: Rao et al., 2010; Zamal et al., 2012; Cohen and Ruths, 2013; Nguyen et al., 2011, 2013; Sap et al., 2014. AAAI 2015 Demo (joint work with Microsoft Research): Income, Education, Ethnicity, Life Satisfaction, Optimism, Personality (stress, excitement).
Research Question: Do emotional differences between users and their neighbours correlate with user demographics? What do they think and feel? What are their demographics? Emotional contagion on Facebook neighbours
Sampling Twitter Data: English-speaking users from the US and Canada. Average Twitter users: no news accounts or celebrities, but still active (on average 4-10 tweets per day; > 20 friends). 10K random users with 200 tweets each = 2M tweets. Randomly sample 15 of their neighbours. Total: U = 123K users, T = 25M tweets.
Approach
Example annotated data:
Gender | Age | … | Income | Tweet | Emo | Sent
Male | ≥25 | … | > $35K | Omg I'm bored an annoyed | Anger | Neg
… | … | … | … | … | … | …
Female | <25 | … | ≤ | ohhh!!thanksss! | Joy | Pos
Demographic classification: crowdsourcing -> train attribute models Φ_A(u) -> demographic predictions (Gender, …, Age, Income) for 10K users + 113K neighbours.
EmoSent classification: #emo + #syns -> train EmoSent models Φ_E(t) -> EmoSent predictions (Joy, Sad, ..., Anger, Pos, Neg) for 25M tweets.
Training set sizes shown in the diagram: 5K, 75K, 19K (labels: Sent, Emo).
How did we get annotations? How did we build the models?
Annotation Schemes. Ways to get ≈"ground truth" annotations. Fun psychological tests (voluntary): myPersonality project. Profile info: Facebook, YouTube, Google+, e.g., relationship, gender, age, but sparse for Twitter. Self-reports: "I am a republican…" (Volkova et al., 2013), "Happy ##th/st/nd/rd birthday to me" (Zamal et al., 2012), "I have been diagnosed with …" (Coppersmith et al., 2014), "I am a writer …" (Beller et al., 2014). Distant supervision: following Obama vs. Romney (Zamal et al., 2012), emotion hashtags (Mohammad et al., 2014), links to blogs (Burger et al., 2011). Crowdsourcing: subjective perceived annotations (Volkova et al., 2015), rationales (Bergsma et al., 2013; Volkova et al., 2014; 2015). Annotation Biases.
Perceived Attribute Annotation via Crowdsourcing: 5K profiles annotated by a trusted crowd ($6/hour, quality control). U_L (5K) -> attribute models Φ_A(u) -> U_P (123K).
Tweets Revealing User Attributes
Predictive Models: supervised text classification. Lexical: binary word bigrams. Socio-linguistic, syntactic and stylistic: smileys, hashtags, mixed punctuation, elongations, capitalization (POS tags, LIWC lexicon features, punctuation). Affect: emotion and sentiment proportions.
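The lexical and affect feature groups on this slide could be computed roughly as follows. This is a minimal sketch, not the authors' pipeline: the function names, the whitespace tokenization, and the example inputs are assumptions made for illustration.

```python
from collections import Counter

def binary_word_bigrams(tweets):
    """Return the set of word bigrams that occur at least once (binary presence)."""
    present = set()
    for tweet in tweets:
        tokens = tweet.lower().split()  # assumption: simple whitespace tokenization
        present.update(zip(tokens, tokens[1:]))
    return present

def affect_proportions(predicted_labels, classes):
    """Share of a user's tweets assigned to each emotion or sentiment class."""
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {c: (counts[c] / total if total else 0.0) for c in classes}

# Hypothetical inputs: two tweets and four per-tweet emotion predictions.
EMOTIONS = ["joy", "sadness", "fear", "surprise", "disgust", "anger"]
bigrams = binary_word_bigrams(["so bored today", "so bored"])
proportions = affect_proportions(["joy", "joy", "anger", "sadness"], EMOTIONS)
print(sorted(bigrams))
print(proportions)
```

Binary presence keeps a frequent bigram from dominating a user's feature vector, while the proportions summarize the per-tweet emotion predictions at the user level.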
Attribute Classification Quality
Predictive Features for Income
Comparison with Other Models. Two train \ test tables (cell values not recoverable): (1) Users; Burger et al., 2011; Gender graph; Perceived. (2) Users; Geo-centric; Cand-centric; Active; Perceived.
Emotion and Sentiment Prediction Quality. Features: BOW + negation; stylistic (+0.3 F1): elongations (Yaay, woooow), capitalization (COOL), mixed punctuation (???!!!), hashtags and emoticons. Emotion, 6 classes (joy, sadness, fear, surprise, disgust, anger): F1 = 0.78 (Roberts’, Qadir’, Mohammad’). Sentiment, 3 classes (positive, negative and neutral): F1 = 0.66 on SemEval-2013 (3K tweets); 1st 0.69, 2nd 0.65 out of 38 teams.
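The stylistic cues named on this slide can be approximated with a few regular expressions. The patterns below are assumptions chosen for illustration, not the features used in the talk. In particular, the elongation threshold of three repeated letters catches "woooow" but not "Yaay".

```python
import re

# All patterns are illustrative assumptions.
ELONGATION = re.compile(r"([A-Za-z])\1{2,}")   # same letter 3+ times in a row
ALL_CAPS = re.compile(r"\b[A-Z]{2,}\b")        # fully capitalized token, e.g. COOL
PUNCT_RUN = re.compile(r"[?!]+")               # run of ? and ! characters
HASHTAG = re.compile(r"#\w+")
EMOTICON = re.compile(r"[:;=][-']?[()DPp]")    # small set of Western emoticons

def stylistic_features(tweet):
    """Return a dictionary of simple stylistic indicators for one tweet."""
    return {
        "elongation": bool(ELONGATION.search(tweet)),
        "all_caps": bool(ALL_CAPS.search(tweet)),
        # mixed punctuation: a single run containing both '?' and '!'
        "mixed_punct": any("?" in run and "!" in run
                           for run in PUNCT_RUN.findall(tweet)),
        "hashtags": len(HASHTAG.findall(tweet)),
        "emoticon": bool(EMOTICON.search(tweet)),
    }

print(stylistic_features("woooow that is COOL ???!!! #happy :)"))
```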
User-Environment Emotional Contrast. Figure labels: Given Latent User; User Emotional Tone; Environment Emotional Tone; Predicted Attributes. Example tweet text from the figure: Happy bday bro FREE100 MISS YOU MY BOY Omg I'm bored an annoyed The Green Mile is your new avi is ohhhhhh!!thanksss!
Methodology (1): Similarities between incoming and outgoing emotion and opinion distributions via Jensen-Shannon Divergence.
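For reference, the standard Jensen-Shannon divergence between two distributions P and Q is JSD(P, Q) = 0.5 KL(P || M) + 0.5 KL(Q || M), with M = (P + Q) / 2. The sketch below implements that textbook definition with base-2 logarithms, so values fall in [0, 1]. The example distributions are made up, and the choice of log base is an assumption since the slide's own formula is not available.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions of equal length."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical emotion distributions over (joy, sadness, fear, surprise, disgust, anger).
outgoing = [0.50, 0.20, 0.05, 0.10, 0.05, 0.10]  # user's own tweets
incoming = [0.30, 0.30, 0.10, 0.10, 0.10, 0.10]  # tweets from the user's neighbours
print(jensen_shannon_divergence(outgoing, incoming))
```

A value of 0 means the user's emotional tone matches the environment exactly, and 1 means the two distributions do not overlap. Note that `scipy.spatial.distance.jensenshannon` returns the square root of this quantity (the Jensen-Shannon distance).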
Methodology (2): Differences between each outgoing and incoming emotion and sentiment. Mann-Whitney test on the JSDs and the emotion differences.
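A Mann-Whitney comparison of two groups of users might look like the sketch below. It computes the U statistic by direct pairwise comparison and a two-sided p-value from the normal approximation without tie or continuity correction. These are simplifying assumptions, and the group values are invented for illustration. In practice `scipy.stats.mannwhitneyu` is the usual choice.

```python
import math

def mann_whitney_u(x, y):
    """Return (U statistic for sample x, two-sided p-value via normal approximation)."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0      # x wins the pair
            elif xi == yj:
                u += 0.5      # ties count as half
    n, m = len(x), len(y)
    mean = n * m / 2
    sd = math.sqrt(n * m * (n + m + 1) / 12)  # no tie correction (assumption)
    z = (u - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p_value

# Hypothetical per-user JSD values for two demographic groups.
group_a = [0.02, 0.05, 0.04, 0.07, 0.03, 0.06]
group_b = [0.09, 0.12, 0.08, 0.15, 0.11, 0.10]
u_stat, p = mann_whitney_u(group_a, group_b)
print(u_stat, p)
```

The same function applies to per-emotion differences (outgoing proportion minus incoming proportion) in place of the JSD values.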
User-Environment Emotional Contrast. Figure: panels for Sadness, Joy, Fear, Surprise, Anger, Disgust, each on an Amplify (+) / Dampen (-) axis. Attribute labels shown: Older; Kids; Dissatisfied; Not Excited; Older, High Income, Degree; Male, Kids; Stressed; Pessimists.
Attribute Prediction Improvement Gain over BOW
Predicting Demographics from User Outgoing Emotions and Opinions (ROC AUC). Figure labels: Satisfied, Optimist, Dissatisfied, Pessimist, No Kids, Below 25 y.o., Female, Male. For 1/3 of the attributes, AUC >= 75%.
Predicting Demographics from User-Environment Emotional Contrast (ROC AUC). For 1/2 of the attributes, AUC >= 60%.
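The ROC AUC reported on these slides can be read as the probability that a randomly chosen positive user receives a higher classifier score than a randomly chosen negative one. The sketch below computes it directly from that pairwise definition. The labels and scores are invented for illustration, and the function name is an assumption.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical binary attribute labels (1 = e.g. "Female") and classifier scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.1]
print(roc_auc(labels, scores))  # 0.75
```

An AUC of 0.5 corresponds to random ranking, so the 60% and 75% thresholds on the slides indicate how far above chance each attribute is predicted.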
Applications: personalized recommendation and search; personalized marketing; online targeted advertising; recruitment and human resource management; real-time, large-scale polling; healthcare analytics.
Summary: I. Different demographics => varied reactions to the emotional tone in user environments. II. Emotions and user-environment emotional contrasts improve attribute prediction.
Questions? Data, code and models: