1
Who Needs Polls? Gauging Public Opinion from Twitter Data
David Cummings, Haruki Oh, Ningxuan (Jason) Wang
2
From Tweets to Poll Numbers
Motivation:
  People spend millions of dollars on polling every year: politics, economy, entertainment
  Millions of posts appear on Twitter every day
  Can we model public opinion using tweets?
Data:
  476 million tweets from June to December 2009, courtesy of Jure Leskovec
  Public polls from The Gallup Organization (presidential approval, economic confidence) and Rasmussen Reports (generic Congressional ballot)
Goal: high correlation with public opinion polls
All correlation figures are for a 6-day smoothing window unless noted otherwise
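The evaluation described on this slide (smooth a daily tweet signal, then correlate it with a poll series) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the slides do not say whether the window is trailing or centered or whether the poll series is smoothed too, so the trailing average, the function names `smooth` and `pearson`, and the sample numbers are all assumptions.

```python
from math import sqrt

def smooth(values, window=6):
    """Trailing moving average over up to `window` most recent points (assumed form)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical daily values, made up for illustration only.
tweet_signal = [0.10, 0.12, 0.09, 0.15, 0.14, 0.18, 0.17, 0.20]
poll_series = [45.0, 45.5, 45.2, 46.0, 46.1, 46.8, 46.5, 47.0]

r = pearson(smooth(tweet_signal, window=6), poll_series)
print(f"correlation: {r:.3f}")
```

A correlation reported as "52.4%" on the slides corresponds to a coefficient of 0.524 on this scale.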
3
Approach 1: Volume
The simplest metric: percentage of tweets that mention a given topic in a certain time window
Moderate negative correlation (-36.3%, -35.7%) for economy and Congressional ballot: people mention things they want to complain about more often
Higher correlation (52.4%) for Obama
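The volume metric might be computed along these lines. The slides do not describe how a "mention" is detected, so the case-insensitive substring match, the `(day, text)` input shape, and the name `daily_volume` are assumptions made for this sketch.

```python
from collections import defaultdict

def daily_volume(tweets, keywords):
    """Return {day: fraction of that day's tweets mentioning any keyword}.

    `tweets` is an iterable of (day, text) pairs; matching is a plain
    case-insensitive substring test, which is an assumption of this sketch.
    """
    total = defaultdict(int)
    hits = defaultdict(int)
    for day, text in tweets:
        total[day] += 1
        lowered = text.lower()
        if any(k in lowered for k in keywords):
            hits[day] += 1
    return {day: hits[day] / total[day] for day in total}

# Made-up sample tweets, for illustration only.
sample = [
    ("2009-06-01", "Obama speaks today"),
    ("2009-06-01", "lunch was great"),
    ("2009-06-02", "no jobs in this economy"),
    ("2009-06-02", "watching a movie"),
    ("2009-06-02", "Economy worries again"),
    ("2009-06-02", "hello"),
]
volume = daily_volume(sample, ["economy", "jobs"])
print(volume)
```

The resulting per-day fractions would then be smoothed over the chosen window before being compared with poll numbers.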
4
Approach 2: Generic Sentiment
Can we distinguish between positive and negative sentiment of tweets?
University of Pittsburgh OpinionFinder subjective polarity lexicon, with a score assigned to each word:
  Word             Polarity          Score
  “conceited”      strong negative   -10
  “ironic”         weak negative     -5
  “trendy”         weak positive     +5
  “illuminating”   strong positive   +10
Sum the word scores for a tweet to classify it as positive, negative, or neutral; then subtract negative counts from positive counts and normalize over the window
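A minimal sketch of the lexicon scoring described above, using only the four example words from the slide as a stand-in for the full lexicon. The tokenizer, the names `classify` and `net_sentiment`, and the choice to normalize by the total number of tweets in the window are assumptions; the slides do not specify them.

```python
import re

# Toy lexicon holding only the four example entries shown on the slide.
LEXICON = {"conceited": -10, "ironic": -5, "trendy": 5, "illuminating": 10}

def classify(text):
    """Sum the word scores and label the tweet by the sign of the total."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(w, 0) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def net_sentiment(texts):
    """(positive count - negative count) / total tweets in the window (assumed normalization)."""
    labels = [classify(t) for t in texts]
    if not labels:
        return 0.0
    return (labels.count("positive") - labels.count("negative")) / len(labels)

# Made-up tweets for one window, for illustration only.
window = ["so trendy and illuminating", "conceited but trendy", "hello world"]
print([classify(t) for t in window], net_sentiment(window))
```

Because the second tweet scores -10 + 5 = -5, it counts as negative even though it contains a positive word, which shows how the summed scores decide the label.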
5
Approach 2: Generic Sentiment
Good results on economic confidence: 60.4% correlation, 70.1% correlation on a 15-day window
Poor performance on presidential approval and Congressional ballot: -24.5% and 21.5% correlation respectively
Is sentiment about politics expressed differently?
6
Approach 3: LM-based Classification
Train three language models (positive, negative, and neutral) on hand-classified data
Classify each tweet according to the language model that assigns it the highest probability
Applied to the case of Obama: manually classified 3,633 tweets
  “can we all talk about how awesome Obama is?”
  “that Obama sticker on your car might as well say ‘Yes I’m stupid’ #tcot #iamthemob #teaparty #glennbeck”
Then we tested the language models: the best performer was a linearly interpolated bigram model
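The classification step could look something like the sketch below: one linearly interpolated bigram model per class, with each tweet assigned to the class whose model gives it the highest log-probability. The interpolation weight `lam=0.7`, the add-one smoothing of the unigram term, the whitespace tokenizer, and the tiny training sentences are all assumptions for illustration; the slides do not give these details.

```python
import math
from collections import Counter

class InterpolatedBigramLM:
    """P(w | prev) = lam * P_bigram(w | prev) + (1 - lam) * P_unigram(w)."""

    def __init__(self, sentences, lam=0.7):
        self.lam = lam  # interpolation weight (assumed value)
        self.unigrams = Counter()
        self.bigrams = Counter()
        for s in sentences:
            tokens = ["<s>"] + s.lower().split() + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.total = sum(self.unigrams.values())
        self.vocab = len(self.unigrams) + 1  # one extra slot for unseen words

    def log_prob(self, sentence):
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        lp = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            # Add-one smoothed unigram keeps every probability above zero.
            p_uni = (self.unigrams[word] + 1) / (self.total + self.vocab)
            seen_prev = self.unigrams[prev]
            p_bi = self.bigrams[(prev, word)] / seen_prev if seen_prev else 0.0
            lp += math.log(self.lam * p_bi + (1 - self.lam) * p_uni)
        return lp

def classify(text, models):
    """Pick the label whose language model scores the text highest."""
    return max(models, key=lambda label: models[label].log_prob(text))

# Made-up training data standing in for the hand-classified tweets.
models = {
    "positive": InterpolatedBigramLM(["obama is awesome", "love the obama speech"]),
    "negative": InterpolatedBigramLM(["obama is terrible", "hate the obama plan"]),
    "neutral": InterpolatedBigramLM(["obama speaks at noon", "obama visits ohio today"]),
}
print(classify("obama is awesome", models))
```

In practice the interpolation weight would be tuned on held-out labeled tweets rather than fixed by hand.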
7
Approach 3: LM-based Classification
Much-improved results on presidential approval: 49.4% correlation
Throwing out retweets and duplicate tweets helps a little more: 55.9% correlation
Finally, combining both volume and LM-based sentiment gives the best results: 63.3% correlation, or 69.6% correlation on a 15-day window
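One plausible way to throw out retweets and duplicates before classification is sketched below. The slides do not say how retweets were detected, so treating any tweet that begins with "RT @" as a retweet and comparing tweets after lowercasing and collapsing whitespace are assumptions, as is the function name.

```python
def drop_retweets_and_duplicates(texts):
    """Keep the first copy of each tweet and skip anything that looks like a retweet."""
    seen = set()
    kept = []
    for text in texts:
        # Normalize case and whitespace so trivially different copies match.
        normalized = " ".join(text.lower().split())
        if normalized.startswith("rt @"):
            continue  # assumed retweet convention
        if normalized in seen:
            continue  # duplicate of an earlier tweet
        seen.add(normalized)
        kept.append(text)
    return kept

# Made-up tweets, for illustration only.
raw = ["Obama rocks", "RT @bob: Obama rocks", "obama  rocks", "New tweet"]
print(drop_retweets_and_duplicates(raw))
```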