Influence Detection of Famous Personalities Using Politeness and Likeability
Navita Jain
Motivation
Goal: model the influence of well-known personalities on Twitter.
Hypothesis:
People generally follow personalities who are likeable.
Politeness is a trait people like.
Hence, polite and likeable people are influential.
Data
Twitter data of two different types:
1. For likeability (attitude) detection: a dataset of tweets that refer to each influential or non-influential user. Example tweet: "FOUR Doctors Warn - Trump Has a Narcissistic Personality Disorder! Unstable 2b President? #DonaldsDisorder #VOAV https…"
2. For politeness detection: a dataset of tweets posted by each influential or non-influential user. Example tweet: "@DanScavino Vote trump to save the west. Don't become like Europe - #WakeUpAmerica"
Data Pre-processing
Data collected from the Twitter API, in English.
Converted all text to lowercase for consistency.
Removed statements that do not convey a message (sentiment).
Kept 100 genuine tweets (tweets remaining after pre-processing) for each test user.
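The pre-processing steps on this slide can be sketched in Python as below. The slide does not say how a statement was judged to "not convey a message", so that decision is passed in as a predicate; the names `preprocess`, `has_sentiment_word` and the small word set are hypothetical illustrations, not the author's code.

```python
def preprocess(tweets, conveys_sentiment, limit=100):
    """Lowercase each tweet, drop those the predicate rejects, keep at most `limit`.

    `conveys_sentiment` is a hypothetical stand-in for however the author
    decided that a statement carries a message (sentiment).
    """
    kept = []
    for tweet in tweets:
        if len(kept) >= limit:  # 100 genuine tweets per test user by default
            break
        text = tweet.lower()  # lowercase for consistency
        if conveys_sentiment(text):
            kept.append(text)
    return kept


# Hypothetical predicate: keep a tweet if it contains a word from a small sentiment word set.
SENTIMENT_WORDS = {"proud", "hate", "perfect", "rubbish"}


def has_sentiment_word(text):
    return any(word.strip(".,!?") in SENTIMENT_WORDS for word in text.split())


sample = ["I AM SO PROUD", "RT @someone: see you at 5", "I hate this"]
print(preprocess(sample, has_sentiment_word))  # ['i am so proud', 'i hate this']
```

Passing the filter in as a function keeps the lowercase-filter-truncate order fixed while letting the "conveys sentiment" rule be swapped for a lexicon check or manual labels.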
Annotation: tweets were classified manually. Examples:
"we are becoming a third world country because of jerks like him' Great !"
"On EarthDay, reverence and gratitude to our planet that has given us everything" (label: polite)
Referring to Barack Obama: "I AM SO PROUD that I was able, in my lifetime, to see a Black man, BARACK H. OBAMA, become the President of the United States of America."
Referring to Sarah Palin: "Are you sure, That well known scientist Sarah Palin told me it's all rubbish. Phew! you had me going for a moment." (label: negative)
Tweet2FeatureVector
Used the Stanford NLP tool to automatically parse the dataset.
Politeness feature vector: pronouns, adjectives, verbs, ... e.g. ['greetings', 'thank you', 'please', 'respected', 'go to hell', 'f*ck', …]
Likeability feature vector: adjectives, verbs, ... e.g. ['hate', 'pleased', 'perfect', 'envy', 'lmao']
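A minimal sketch of the keyword part of Tweet2FeatureVector, assuming each vector entry is simply how often one listed term occurs in the tweet. Only the terms visible on the slide are used (the original lists are longer), and the part-of-speech features obtained from Stanford parsing are not reproduced here; the function name `feature_vector` is hypothetical.

```python
import re

# Only the keywords visible on the slide; the original lists are longer.
POLITENESS_TERMS = ["greetings", "thank you", "please", "respected", "go to hell", "f*ck"]
LIKEABILITY_TERMS = ["hate", "pleased", "perfect", "envy", "lmao"]


def feature_vector(tweet, terms):
    """Return one count per term: occurrences of the term as a whole word or phrase."""
    text = tweet.lower()
    # re.escape handles the '*' in 'f*ck'; \b stops 'hate' from matching inside 'whatever'.
    return [len(re.findall(r"\b" + re.escape(term) + r"\b", text)) for term in terms]


print(feature_vector("Thank you, please vote. Thank you!", POLITENESS_TERMS))
# [0, 2, 1, 0, 0, 0]
print(feature_vector("I hate envy, LMAO", LIKEABILITY_TERMS))
# [1, 0, 0, 1, 1]
```

Multi-word entries such as "thank you" and "go to hell" are matched as phrases, which a plain whitespace split would miss.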
Classification
Trained a Naïve Bayes classifier for politeness:
Performed k = 5-fold cross-validation (each fold trains on 80% of the data and tests on the remaining 20%).
Trained a Support Vector Machine (SVM) classifier for likeability:
Kernel: linear
Performed k = 10-fold cross-validation (each fold trains on 90% of the data and tests on the remaining 10%).
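The link between k and the train/test split follows from k-fold cross-validation: each of the k folds is held out once, so every round trains on (k-1)/k of the data. The standard-library sketch below generates the index splits; the classifiers themselves would be fit on each train split (scikit-learn offers the same idea through `KFold` and `cross_val_score`). Contiguous, unshuffled folds are an assumption here; in practice data is usually shuffled first. The function name is hypothetical.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross-validation over n samples.

    Folds are contiguous and unshuffled; when k does not divide n, the first
    n % k folds get one extra sample.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


for k in (5, 10):  # politeness (Naive Bayes) and likeability (linear SVM) settings
    train, test = next(k_fold_indices(100, k))
    print(k, len(train), len(test))
# 5 80 20
# 10 90 10
```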
Spearman's rank correlation coefficient
Measures the strength of the association between two sets of ranks. For a dataset of N test cases with no tied ranks:
$\rho = 1 - \frac{6\sum_{i=1}^{N}(x_i - y_i)^2}{N(N^2 - 1)}$
where x_i and y_i are the ranks of user i under two different influence measures: x_i is the politeness ranking and y_i is the likeability ranking.
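A sketch of Spearman's rank correlation for the no-ties case, using the standard formula rho = 1 - 6 * sum(d_i^2) / (N(N^2 - 1)) with d_i = x_i - y_i. The example ranks are hypothetical and are not the study's results; with tied ranks, the Pearson correlation of the (averaged) ranks should be used instead.

```python
def spearman_rho(x_ranks, y_ranks):
    """Spearman's rho for two rank lists without ties:
    rho = 1 - 6 * sum((x_i - y_i)^2) / (N * (N^2 - 1)).
    """
    n = len(x_ranks)
    if n != len(y_ranks) or n < 2:
        raise ValueError("need two rank lists of equal length >= 2")
    if len(set(x_ranks)) != n or len(set(y_ranks)) != n:
        raise ValueError("tied ranks: use the Pearson correlation of ranks instead")
    d_squared = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


politeness_rank = [1, 2, 3, 4, 5]   # hypothetical ranks of five users
likeability_rank = [2, 1, 4, 3, 5]  # hypothetical ranks of the same users
print(spearman_rho(politeness_rank, likeability_rank))  # 0.8
```

A value near 1 means the politeness and likeability rankings largely agree; a value near -1 means they are reversed.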
Result
Table columns: Test case | Politeness rank/score | Likeability rank/score | Influence rank/score
Test cases: Barack Obama, Donald Trump, Jeb Bush, John Boehner, Kim Kardashian, Narendra Modi, Rahul Gandhi, Sarah Palin, Stephen Smith, Taylor Swift
Ground Truth / Predicted
Influential users: Barack Obama, Kim Kardashian, Donald Trump, Narendra Modi, Narendra Modi, Jeb Bush, Kim Kardashian, Barack Obama, Taylor Swift
Non-influential users: Jeb Bush, Donald Trump, John Boehner, Rahul Gandhi, Sarah Palin, Stephen Smith
Thanks
Influence Detection
Trained on 80% of the data and tested on the remaining 20%.
Politeness correlates positively with influence: the higher a person's politeness score, the more influential they should be.
Likeability is proportional to influence.
First iteration: sum the politeness and likeability scores; if the sum is above some threshold, the user is classified as influential.
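The first-iteration rule can be sketched as below, assuming the summed scores are a user's politeness and likeability scores on comparable scales. The slide gives no threshold value, so it is left to the caller; the function name, user names and scores are hypothetical.

```python
def label_users(scores, threshold):
    """Label each user influential if politeness + likeability exceeds the threshold.

    `scores` maps user -> (politeness_score, likeability_score).
    """
    return {user: (polite + like) > threshold for user, (polite, like) in scores.items()}


labels = label_users({"user_a": (0.7, 0.6), "user_b": (0.2, 0.3)}, threshold=1.0)
print(labels)  # {'user_a': True, 'user_b': False}
```

An unweighted sum treats both traits as equally important; weighting them or tuning the threshold on the training split are natural next iterations.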
Thanks! Suggestions?
Twitter API commands for data collection
# Tweets about a user
raw_tweets = myApi.GetSearch('Donald Trump', lang='en', count=100)
# Tweets by a user
raw_tweets =
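The `myApi.GetSearch` call matches the python-twitter library. A minimal sketch of both collection steps, assuming `myApi` is a python-twitter `twitter.Api` object: the call for tweets by a user is truncated on the slide, so `GetUserTimeline` here is an assumption rather than the author's code, and the helper name `collect_tweets` is hypothetical. The API object is passed in, so the function can be exercised without network access.

```python
def collect_tweets(api, query, screen_name, count=100):
    """Return (tweets about `query`, tweets posted by `screen_name`) as plain text.

    `api` is assumed to be a python-twitter `twitter.Api` instance like `myApi`;
    using GetUserTimeline for the "by user" half is an assumption.
    """
    about = api.GetSearch(term=query, lang="en", count=count)  # tweets about the user
    by = api.GetUserTimeline(screen_name=screen_name, count=count)  # tweets by the user
    return [status.text for status in about], [status.text for status in by]


# Usage with real credentials (placeholders, not runnable as-is):
# import twitter
# myApi = twitter.Api(consumer_key="...", consumer_secret="...",
#                     access_token_key="...", access_token_secret="...")
# about, by = collect_tweets(myApi, "Donald Trump", "realDonaldTrump")
```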
Opinion lexicon: positive and negative word lists
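A sketch of loading positive and negative word lists and turning them into a simple per-tweet score. The slide does not say which lexicon was used or how it was applied: the comment convention (lines starting with ';', as in Hu and Liu's Opinion Lexicon files), the file names in the usage comment, and positive-minus-negative scoring are all assumptions, and the helper names are hypothetical.

```python
import re


def load_word_list(lines):
    """Parse a word list, skipping blank lines and ';' comment lines (assumed format)."""
    return {line.strip().lower() for line in lines
            if line.strip() and not line.lstrip().startswith(";")}


def sentiment_score(tweet, positive, negative):
    """Number of positive tokens minus number of negative tokens."""
    tokens = re.findall(r"[\w'+-]+", tweet.lower())
    return sum(t in positive for t in tokens) - sum(t in negative for t in tokens)


positive = load_word_list([";; comment", "", "perfect", "pleased"])
negative = load_word_list(["; another comment", "hate", "envy"])
print(sentiment_score("Perfect day, but I hate traffic", positive, negative))  # 0

# Loading from files (hypothetical names; latin-1 decodes any byte sequence):
# with open("positive-words.txt", encoding="latin-1") as f:
#     positive = load_word_list(f)
```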
Downloaded 100 tweets for each test user for likeability measurement.
Kept retweets: if a person retweets something, it presumably reflects their sentiment or agreement (not sure about this).
Removed URLs but kept hashtags because they are informative, e.g. "President Donald J. Trump #IrritateMeIn4Words".
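A sketch of the URL-removal step that keeps hashtags and leaves retweets in place. The regular expression is an assumption: it drops tokens beginning with "http"/"https" (including truncated links such as "https…" in the example tweets) and bare "www." links. The function name is hypothetical.

```python
import re

# Assumed URL pattern: http(s) links (also truncated ones like "https…") and bare www. links.
URL_RE = re.compile(r"\bhttps?\S*|\bwww\.\S+")


def clean_tweet(text):
    """Remove URLs, keep hashtags and retweet markers, collapse extra whitespace."""
    return " ".join(URL_RE.sub(" ", text).split())


print(clean_tweet("President Donald J. Trump #IrritateMeIn4Words https://t.co/abc123"))
# President Donald J. Trump #IrritateMeIn4Words
```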