Twitter-Based Research
Benny Bornfeld
Mentors: Professor Sheizaf Rafaeli, Dr. Daphne Raban
Where research meets Bigbird: Research, Twitter, Big Data, My Research & Tools
Research Big Data Twitter
About Twitter
Facts
– Established in 2006
– ~140 million active users
– ~340 million messages per day
Superlatives
– “the stream of the world’s collective consciousness”
– “the first rough draft of history”
How does it work?
[Diagram: Tweet, Retweet]
Reply
Twitter is used for many different purposes
Power Law distribution
Research Big Data Twitter
What is Twitter? Social network! Social Network? Mass Media?
Replace surveys?
Twitter-based predictions: “I Wanted to Predict Elections with Twitter and all I got was this Lousy Paper”
Twitter as a social learning platform
Influence: What’s the influence of Twitter on society?
– Technological determinism
– Clay Shirky
– Malcolm Gladwell: “Why the revolution will not be tweeted”
Influence in Twitter: How do we measure influence?
– Number of followers?
– Centrality?
– Creating action/reaction?
– Viral spreading?
The Message vs the Carrier approaches
Research Twitter Big Data
Online social networks research fields: Computers, Networks, Sociology
Big Data in SN Research
Pros:
– Exploratory research (vs confirmatory research)
– Avoids the sampling reliability issue (power law)
– Collects what people are actually saying
– Non-intrusive
– Allows analysis of many dimensions
– Catches irregular events
Big Data in SN Research
Cons:
– Lots of noise
– It is sometimes hard to map the data to your research question
– Cost of collecting the data
– Lack of tools/knowledge on how to store and analyze the data
– May come at the expense of theory
Where research meets Bigbird: Research, Twitter, Big Data, My Research & Tools
Influence: the capacity or power of persons or things to be a compelling force on or produce effects on the actions, behavior, opinions, etc., of others
Influence in online social networks: Sentiment Valence, Tweet, ReTweet
The research question
Which is more viral? Which is more likely to spread in a social network (Twitter): messages of negative or of positive sentiment valence?
The Data
Collected ~2 million tweets about new movies
Why movies:
– People have opinions about movies
– People share their opinions about movies
– Results can be compared with other studies (benchmarks)
Collecting the Tweets
– Twitter provides an API for collecting tweets
– Until mid-2010, full data streams were available for free; currently, the rate is very limited (~150/hour)
– Full data streams (fire hose) are available via a company called GNIP
Tweet-collecting architecture
[Diagram components: HTTP Streaming, JSON, Rules Filter, Collect App, DB, Files, JSON parser, My App]
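One stage of such a pipeline, the rules filter, might be sketched as below. This is only an illustration: the keyword list, the function names, the `text` field and the choice to filter inside the collecting application (rather than at the data provider) are all assumptions, not details from the slides.

```python
import json

# Hypothetical tracking rules: keywords for the movies being followed.
RULES = ["footloose", "paranormal activity", "jack and jill"]

def matches_rules(text, rules=RULES):
    """Return True if the tweet text mentions any tracked keyword."""
    lowered = text.lower()
    return any(rule in lowered for rule in rules)

def collect(stream_lines, rules=RULES):
    """Keep the raw JSON lines whose 'text' field matches a rule."""
    kept = []
    for line in stream_lines:
        tweet = json.loads(line)
        if matches_rules(tweet.get("text", ""), rules):
            kept.append(line)  # stored raw; parsing for analysis happens later
    return kept

# Made-up sample stream, one JSON document per line.
sample_stream = [
    json.dumps({"id": 1, "text": "Just saw #Footloose with my sisters"}),
    json.dumps({"id": 2, "text": "Nice weather today"}),
]
stored = collect(sample_stream)
```

Keeping the raw lines and parsing them in a later step mirrors the separate "JSON parser" box in the diagram, so a parser bug never costs collected data.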
Data Fields
User Data: #followers, #following, #number of tweets, Klout, tweet rate, creation date, language, name, description, location
Message Data: sender, content, type (original/RT), post time, device
Computed fields: # of RT, Total Exposure, Sentiment
Reading Tasks
– Handle partial messages
– Handle broken messages
– Handle duplicate messages
– Handle special characters
Clean the data
– Non-related messages [build your dream house]
– Spammers
– Gibberish messages
– Normalize the data (e.g. Tweets/Time)
Tools for data analysis
– Sorting
– Filtering
– Counting
– Histograms
– Sentiment analysis
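The fields above could be held in two small record types, sketched here with a subset of the fields. The class and attribute names are illustrative, and the definition of total exposure (the sender's followers plus the followers of every retweeter) is an assumption; the slides do not define it.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class User:
    name: str
    followers: int
    following: int
    tweet_count: int

@dataclass
class Message:
    sender: User
    content: str
    is_retweet: bool
    retweeters: List[User] = field(default_factory=list)

    @property
    def retweet_count(self):
        """Computed field: number of retweets of this message."""
        return len(self.retweeters)

    @property
    def total_exposure(self):
        # Assumed definition: followers of the sender plus followers of
        # each retweeter (ignores overlap between audiences).
        return self.sender.followers + sum(u.followers for u in self.retweeters)

# Made-up users and message for illustration.
alice = User(name="alice", followers=100, following=20, tweet_count=300)
bob = User(name="bob", followers=50, following=80, tweet_count=40)
msg = Message(sender=alice, content="Loved the movie", is_retweet=False,
              retweeters=[bob])
```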
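A minimal sketch of these reading tasks, assuming the collected data is stored as one JSON document per line with `id` and `text` fields (the storage layout and field names are assumptions):

```python
import json
import unicodedata

def read_tweets(lines):
    """Yield parsed tweets, skipping partial/broken lines and duplicates."""
    seen_ids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # partial or broken message
        if not isinstance(tweet, dict) or "id" not in tweet or "text" not in tweet:
            continue  # valid JSON, but not a usable tweet record
        if tweet["id"] in seen_ids:
            continue  # duplicate message
        seen_ids.add(tweet["id"])
        # Special characters: bring text to one canonical Unicode form.
        tweet["text"] = unicodedata.normalize("NFC", tweet["text"])
        yield tweet

# Made-up input: one good line, one truncated line, one duplicate, one junk line.
lines = [
    '{"id": 1, "text": "Cafe\\u0301 scene was great"}',
    '{"id": 2, "text": "cut off',
    '{"id": 1, "text": "Cafe\\u0301 scene was great"}',
    'not json',
]
tweets = list(read_tweets(lines))
```

Silently skipping bad lines is a design choice for a sketch; a real collector would probably count or log them so that data loss is visible.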
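Two of the cleaning steps, dropping unrelated messages and normalizing counts by time, might look like the sketch below. The noise-phrase approach and the function names are assumptions; the example presumes that "build your dream house" is an unrelated phrase caught by a movie-title keyword, which the slides only hint at.

```python
def is_related(text, noise_phrases):
    """Return False if the text contains a phrase known to be off-topic."""
    lowered = text.lower()
    return not any(phrase in lowered for phrase in noise_phrases)

def tweets_per_hour(count, hours):
    """Normalize a raw tweet count by the length of the collection window."""
    return count / hours

# Made-up tweets: the second matches the title keyword but is not about a movie.
tweets = [
    "Dream House was a solid thriller",
    "Build your dream house with our low rates",
]
noise = ["build your dream house"]

related = [t for t in tweets if is_related(t, noise)]
rate = tweets_per_hour(len(related), hours=2)
```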
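The basic operations listed above (counting, histograms, sorting, filtering) can be sketched with the Python standard library. The retweet counts below are made-up illustration data, not results from the study.

```python
from collections import Counter

# Made-up number of retweets received by each of eight tweets.
retweet_counts = [0, 0, 1, 0, 3, 1, 0, 12]

# Counting / histogram: how many tweets received each number of retweets.
histogram = Counter(retweet_counts)
for value, freq in sorted(histogram.items()):
    print(f"{value:>3} retweets: {'#' * freq}")

# Sorting: the three most retweeted tweets.
top = sorted(retweet_counts, reverse=True)[:3]

# Filtering: only tweets that were retweeted at least once.
retweeted = [c for c in retweet_counts if c > 0]
```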
Classifying users
Classifying users with cluster analysis
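The slides do not name the clustering method used. As one common choice, a bare-bones k-means can be sketched as follows; the features (log10 of followers and of following), the data points and the starting centroids are all assumptions for illustration.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its cluster, and repeat."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Made-up users as (log10 followers, log10 following).
users = [(1.0, 2.0), (1.2, 2.1), (5.0, 1.0), (5.2, 1.1)]
start = [(1.0, 2.0), (5.0, 1.0)]  # fixed start keeps the run deterministic

centroids, clusters = kmeans(users, start)
```

With these points, the first cluster collects the low-follower accounts and the second the high-follower ones; on real data the number of clusters and the features would need to be chosen with care.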
Sentiment Analysis
Classify each message as positive/neutral/negative
Classification methods
– Manual (~10 sec per tweet)
– Automatic
Sentiment Analysis: Some challenging Tweets examples
– Just saw #Footloose with my sisters. The movie fab, and I even spotted my karaoke machine! Did you dolls catch it?
– Paranormal Activity 3 seems almost as scary as a level 9 magikarp
– My kids want to see Jack and Jill. Its making it hard to love them.
Automatic classifications
Naïve Bayes classifier
Machine learning – supervised learning
[Diagram: training tweets labeled POS / NEG / NEU; n-gram count table with columns NGRAM, POS, NEG, NEU; NGRAM = 2]
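A supervised Naïve Bayes classifier over n-grams can be sketched in a few lines. This is a generic multinomial version with add-one smoothing, not necessarily the one used in the study; the training tweets are made up, and treating NGRAM = 2 as "unigrams plus bigrams" is an assumption.

```python
import math
from collections import Counter, defaultdict

def ngrams(text, n=2):
    """Return all word n-grams of size 1..n (assumed reading of NGRAM = 2)."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + size])
        for size in range(1, n + 1)
        for i in range(len(tokens) - size + 1)
    ]

class NaiveBayes:
    def __init__(self, n=2):
        self.n = n
        self.label_counts = Counter()               # documents per label
        self.feature_counts = defaultdict(Counter)  # n-gram counts per label
        self.vocabulary = set()

    def train(self, labeled_texts):
        for text, label in labeled_texts:
            self.label_counts[label] += 1
            for gram in ngrams(text, self.n):
                self.feature_counts[label][gram] += 1
                self.vocabulary.add(gram)

    def classify(self, text):
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            score = math.log(doc_count / total_docs)  # log prior
            total_grams = sum(self.feature_counts[label].values())
            for gram in ngrams(text, self.n):
                count = self.feature_counts[label][gram]
                # Add-one smoothing so unseen n-grams do not zero the product.
                score += math.log((count + 1) / (total_grams + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Made-up, manually labeled training tweets.
training = [
    ("loved this movie", "POS"), ("great fun movie", "POS"),
    ("hated this movie", "NEG"), ("boring waste of time", "NEG"),
    ("watching the movie tonight", "NEU"), ("going to the cinema", "NEU"),
]
classifier = NaiveBayes(n=2)
classifier.train(training)
label = classifier.classify("loved it great fun")
```

A bag of n-grams like this has no notion of sarcasm or context, so tweets such as the challenging examples listed earlier would likely still be misclassified.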
References
– Why the revolution will not be tweeted
– Clay Shirky: How social media can make history [TED]
– Looking At The World Through Twitter Data
– Twitter mood predicts the stock market
– Six Provocations for Big Data
– Susan Blackmore on memes and "temes" [TED]