Twitter Based Research. Benny Bornfeld. Mentors: Professor Sheizaf Rafaeli, Dr. Daphne Raban.


Where Research Meets Bigbird: Research, Twitter, Big Data, My Research & Tools

Research Big Data Twitter

About Twitter
Facts
– Established in 2006
– ~140 million active users
– ~340 million messages per day
Superlatives
– “the stream of the world’s collective consciousness”
– “the first rough draft of history”

How does it work?

Tweet and Retweet

Reply

Twitter is used for many different purposes

Power Law distribution

Research Big Data Twitter Research

What is Twitter? Social network! Social Network? Mass Media?

Replace surveys?

Twitter based predictions: “I Wanted to Predict Elections with Twitter and all I got was this Lousy Paper”

Twitter as a social learning platform

Technological determinism. Why the revolution will not be tweeted? Influence: What’s the influence of Twitter on society? Clay Shirky, Malcolm Gladwell.

Influence in Twitter
How do we measure influence?
– Number of followers?
– Centrality?
– Creating action/reaction?
– Viral spreading?

The Message vs the Carrier approaches

Research Twitter Big Data

Online social networks research fields: Computers, Networks, Sociology

Big Data in SN Research
Pros:
– Exploratory research (vs confirmatory research)
– Avoid the sampling reliability issue (power law)
– Collect what people are actually saying
– Non-intrusive
– Allow analysis of many dimensions
– Catch irregular events

Big Data in SN Research
Cons:
– Lots of noise
– It is sometimes hard to map the data to your research question
– Cost of collecting the data
– Lack of tools/knowledge on how to store and analyze the data
– May come at the expense of theory

Where Research Meets Bigbird: Research, Twitter, Big Data, My Research & Tools

Influence: the capacity or power of persons or things to be a compelling force on or produce effects on the actions, behavior, opinions, etc., of others.

Influence in online social networks: Sentiment Valence, Tweet, ReTweet

The research question: Which is more viral? Which is more likely to spread in a social network (Twitter): messages of negative or positive sentiment valence?

The Data
Collected ~2 million tweets about new movies
Why movies:
– People have opinions about movies
– People share their opinions about movies
– Can compare to other research (benchmarks)

Collecting the Tweets
– Twitter provides an API for collecting tweets.
– Until mid-2010, full data streams were available for free; currently the rate is very limited (~150/hour).
– Full data streams (the fire hose) are available via a company called GNIP.

Tweet collecting architecture (diagram components): HTTP Streaming, JSON, Rules Filter, Collect App, DB, Files, JSON parser, My App

Data Fields
User Data: #followers, #following, #number of tweets, klout, tweet rate, creation date, language, name, description, location
Message Data: sender, content, type (original/RT), post time, device
Computed fields: # of RT, Total Exposure, Sentiment

Reading Tasks
– Handle partial messages
– Handle broken messages
– Handle duplicate messages
– Handle special characters
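The reading tasks above can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the talk: it assumes the stream delivers one JSON object per line and that each tweet carries an `id` field; the function name `read_tweets` is hypothetical. Partial messages stay in a buffer until their newline arrives, broken messages are skipped, duplicates are dropped by id, and special characters are handled by the JSON decoder (which turns `\uXXXX` escapes into Unicode text).

```python
import json


def read_tweets(chunks):
    """Yield unique tweet dicts from an iterable of raw text chunks.

    Assumption: one JSON object per line, each with an 'id' field.
    """
    buffer = ""
    seen_ids = set()
    for chunk in chunks:
        buffer += chunk
        # Parse only complete lines; the trailing partial line stays buffered.
        *lines, buffer = buffer.split("\n")
        for line in lines:
            line = line.strip()
            if not line:
                continue  # blank line, nothing to parse
            try:
                tweet = json.loads(line)
            except json.JSONDecodeError:
                continue  # broken message: skip it
            if not isinstance(tweet, dict):
                continue  # valid JSON, but not a tweet object
            tweet_id = tweet.get("id")
            if tweet_id is None or tweet_id in seen_ids:
                continue  # missing id or duplicate message
            seen_ids.add(tweet_id)
            yield tweet
```

Any text still in the buffer when the stream ends is discarded, since it never became a complete message.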

Clean the data
– Non related messages [build your dream house]
– Spammers
– Gibberish messages
– Normalize the data (e.g. Tweets/Time)
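As a rough sketch of the cleaning step, the snippet below drops tweets that contain a known unrelated phrase and tweets from likely spammers, then normalizes a tweet count by time. The field names `sender` and `text`, the function names, and the spam rule (the same sender repeating the same text more than `max_repeats` times) are illustrative assumptions, not the rules used in the talk. Gibberish detection is not covered here.

```python
from collections import Counter


def clean_tweets(tweets, unrelated_phrases, max_repeats=3):
    """Return tweets that are on topic and not from likely spammers.

    Assumed spam heuristic: a sender who posts the same text more than
    max_repeats times is treated as a spammer.
    """
    repeats = Counter((t["sender"], t["text"].lower()) for t in tweets)
    spammers = {sender for (sender, _), n in repeats.items() if n > max_repeats}
    kept = []
    for t in tweets:
        text = t["text"].lower()
        if t["sender"] in spammers:
            continue  # drop everything a flagged sender posted
        if any(phrase in text for phrase in unrelated_phrases):
            continue  # drop messages matching an unrelated phrase
        kept.append(t)
    return kept


def tweets_per_hour(tweet_count, hours):
    """Normalize a raw tweet count by the length of the collection window."""
    return tweet_count / hours
```

For example, passing `["build your dream house"]` as `unrelated_phrases` removes messages that match that phrase while keeping other messages that mention "Dream House".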

Tools for data analysis
– Sorting
– Filtering
– Counting
– Histograms
– Sentiment analysis
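Counting and histograms need very little tooling. The sketch below is an assumed example, not taken from the talk: it buckets users by the order of magnitude of their follower count, which is a convenient view for data with a power-law shape. The function name `follower_histogram` is hypothetical.

```python
from collections import Counter


def follower_histogram(follower_counts):
    """Count users per power-of-ten bucket.

    Bucket 0 holds 0-9 followers, bucket 1 holds 10-99, bucket 2 holds
    100-999, and so on. Counts are assumed to be non-negative integers.
    """
    buckets = Counter()
    for n in follower_counts:
        # Number of digits minus one gives the power-of-ten bucket.
        bucket = len(str(n)) - 1
        buckets[bucket] += 1
    return dict(sorted(buckets.items()))
```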

Classifying users

Classifying users with cluster analysis

Sentiment Analysis
Classify each message as positive/neutral/negative
Classification methods
– Manual (~10 sec per tweet)
– Automatic

Sentiment Analysis: some challenging tweet examples
– Just saw #Footloose with my sisters. The movie fab, and I even spotted my karaoke machine! Did you dolls catch it?
– Paranormal Activity 3 seems almost as scary as a level 9 magikarp
– My kids want to see Jack and Jill. Its making it hard to love them.

Automatic classifications

Naïve Bayes classifier: machine learning, supervised learning. Labels: POS, NEG, NEU

Naïve Bayes classifier: machine learning, supervised learning. Table columns: NGRAM, POS, NEG, NEU

Naïve Bayes classifier: NGRAM = 2. Labels: POS, NEG, NEU
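A supervised Naïve Bayes classifier over word 2-grams can be sketched as follows. This is a minimal illustration under stated assumptions, not the classifier used in the research: tokenization is a plain whitespace split, add-one (Laplace) smoothing is used, 2-grams never seen in training are ignored, and the class and function names are hypothetical. Training counts how often each n-gram appears under each label (the NGRAM / POS / NEG / NEU table); classification picks the label with the highest log probability.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n=2):
    """Return the word n-grams of a text (lowercased, whitespace split)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NaiveBayes:
    def __init__(self, n=2):
        self.n = n
        self.label_counts = Counter()               # training messages per label
        self.feature_counts = defaultdict(Counter)  # n-gram counts per label
        self.vocabulary = set()                     # all n-grams seen in training

    def train(self, examples):
        """examples: iterable of (text, label) pairs labeled by hand."""
        for text, label in examples:
            self.label_counts[label] += 1
            for gram in ngrams(text, self.n):
                self.feature_counts[label][gram] += 1
                self.vocabulary.add(gram)

    def classify(self, text):
        """Return the most probable label for a text."""
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.feature_counts[label].values()) + len(self.vocabulary)
            for gram in ngrams(text, self.n):
                if gram not in self.vocabulary:
                    continue  # unseen n-gram carries no evidence
                # Add-one smoothing avoids zero probabilities.
                score += math.log((self.feature_counts[label][gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A usage sketch with made-up training messages:

```python
classifier = NaiveBayes(n=2)
classifier.train([
    ("i loved this movie", "POS"),
    ("great movie loved it", "POS"),
    ("i hated this movie", "NEG"),
    ("boring movie hated it", "NEG"),
    ("going to see this movie tonight", "NEU"),
])
label = classifier.classify("we loved this movie")
```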

References
– Why the revolution will not be tweeted?
– Clay Shirky: How social media can make history [TED]
– Looking At The World Through Twitter Data
– Twitter mood predicts the stock market
– Six Provocations for Big Data
– Susan Blackmore on memes and “temes” [TED]