Prediction of Influencers from Word Use
Chan Shing Hei
Introduction: Goal: detect influential users before they show observable signals of influence. Method: psycholinguistic category scores derived from word usage. Data: Twitter. Predictive models of influence are built from these category-based features.
Measuring Influence: a user's influence score is the average number of retweets generated by that user's tweets.
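As a minimal sketch of the definition above, the influence score can be computed from a list of per-tweet retweet counts. The function name `influence_score`, the sample counts, and the choice to return 0.0 for a user with no tweets are illustrative assumptions, not details given in the slides.

```python
def influence_score(retweet_counts):
    """Average number of retweets per tweet for one user.

    Returning 0.0 for a user with no tweets is an assumption of this sketch.
    """
    if not retweet_counts:
        return 0.0
    return sum(retweet_counts) / len(retweet_counts)


# Hypothetical retweet counts for three tweets by one user.
example_counts = [0, 3, 1]
example_score = influence_score(example_counts)
```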
Dataset: Twitter's streaming API, Nov 1, 2013 to Nov 14, 2013; 1000 randomly sampled users; tweets from the last one month (Oct 2013); historical tweets (max 200); average influence score: (not given); SD: 0.098.
Psycholinguistic Analysis from Text: How is word use measured? Users' historical tweets are analyzed with the Linguistic Inquiry and Word Count (LIWC) 2001 dictionary. How are a user's LIWC-based scores computed? The score in each category is the ratio of the number of occurrences of words from that category in the user's tweets to the total number of words in those tweets.
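The ratio described above can be sketched as follows. The LIWC 2001 dictionary is licensed, so `TOY_LEXICON`, its category names, and its word lists are hypothetical stand-ins. The tokenizer is also an assumption, and the sketch ignores the word-stem patterns that the real dictionary uses.

```python
import re
from collections import Counter

# Hypothetical stand-in for the LIWC 2001 dictionary (not the real word lists).
TOY_LEXICON = {
    "posemo": {"happy", "love", "great"},
    "negemo": {"sad", "hate", "awful"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_scores(tweets, lexicon=TOY_LEXICON):
    """Per-category score: category word occurrences / total words in all tweets."""
    words = [word for tweet in tweets for word in tokenize(tweet)]
    total = len(words)
    counts = Counter()
    for word in words:
        for category, vocabulary in lexicon.items():
            if word in vocabulary:
                counts[category] += 1
    # A user with no words gets 0.0 in every category (assumption of this sketch).
    return {c: (counts[c] / total if total else 0.0) for c in lexicon}


# Seven words in total: two in "posemo" (love, great), one in "negemo" (sad).
scores = category_scores(["I love this great day", "so sad"])
```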
Influence and Word Use
Findings from Analysis: LIWC categories negatively correlated with influence score: negative emotion, physical states, inhibition.
Findings from Previous Analysis: LIWC categories positively correlated with influence score: more interactive, positive feelings or emotion, determination and desires for the future.
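The slides do not say which correlation coefficient was used. As an illustration only, the sketch below assumes Pearson's r between one LIWC category score and the influence score across users. The function name `pearson` and the sample values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-user values: one category score and the influence score.
category_score = [0.01, 0.02, 0.03]
influence = [0.30, 0.20, 0.10]
r = pearson(category_score, influence)  # negative: higher score, lower influence
```

A negative r corresponds to the first group of categories, a positive r to the second.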
Prediction Models: built using Weka; a regression analysis and a classification of influence score; evaluation of the performance.
Regression analysis: linear regressions predict influence score using LIWC measures.
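The models in the slides were built in Weka with multiple LIWC measures. The sketch below is not that setup: it is a plain-Python ordinary least squares fit with a single predictor, included only to show the idea of predicting influence score from a LIWC score. The names `fit_simple_ols` and `predict` and the data are hypothetical.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("predictor must not be constant")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Predicted influence score for one LIWC category score x."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical training data: one LIWC category score per user and the
# matching influence score.
model = fit_simple_ols([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
```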
Classification study 1: supervised binary classification algorithms.
Classification study 2: influence scores are divided into 10 equal-sized bins, and supervised classifiers are trained with 10 classes.
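"Equal-sized bins" can mean equal width or equal number of users, and the slides do not say which. The sketch below assumes the second reading (equal frequency) and turns influence scores into class labels 0 to 9. The function name `equal_frequency_bins` is hypothetical.

```python
def equal_frequency_bins(scores, n_bins=10):
    """Assign each score a bin label in 0..n_bins-1 with near-equal bin counts.

    Scores are ranked in ascending order; tied scores may land in different
    bins, which is a simplification of this sketch.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [0] * n
    for rank, index in enumerate(order):
        labels[index] = rank * n_bins // n
    return labels


# Twenty hypothetical influence scores give two users in each of the 10 classes.
example_scores = [i * 0.01 for i in range(20)]
example_labels = equal_frequency_bins(example_scores)
```

The resulting labels would serve as the 10 target classes for a supervised classifier.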
Suggestions: add more criteria for measuring influence; extend to other social media; real application: political campaigns.
Conclusion: word usage correlates with influence behavior; the study identifies a set of psycholinguistic categories; these can help identify early the users likely to become influencers.