Published by Corey Hancock. Modified over 9 years ago.
1
Twitter Analytics: The Sample of the London Olympics
Week Two – INFO 480 – Introduction to Data Science
2
Understanding What’s in Twitter Data
Relationships are unique (Golder & Yardi, 2010)
– 22% are reciprocal (Kwak et al., 2010)
Digging deeper than follower counts (Cha et al., 2010)
Context Collapse (Marwick & boyd, 2011)
Numerous syntactical features
– Retweets
– Reply-to
– Mentions
– Hashtags
Device String
3
Retweets
RT @[username] “tweet text”
Intention and Purpose (boyd et al., 2010)
Frequency (Mustafaraj & Metaxas, 2011)
Message Valence (Gruzd et al., 2011)
Syntactic Structure (Suh et al., 2010)
– Users that follow more users are retweeted more (counterintuitive)
Crisis Informatics (Starbird et al., 2010; Starbird & Palen, 2012)
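The RT @[username] convention above can be detected with a regular expression. The following is a minimal Python sketch, not the course's R code; the pattern, the 15-character username limit, and the function name `parse_retweet` are assumptions made for illustration.

```python
import re

# Assumed pattern for the manual "RT @username text" convention.
# Usernames are taken to be 1-15 letters, digits, or underscores.
RETWEET_RE = re.compile(r'\bRT\s+@(\w{1,15})\b:?\s*(.*)', re.DOTALL)


def parse_retweet(text):
    """Return (retweeted_user, retweeted_text), or None if there is no RT marker."""
    match = RETWEET_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()
```

The marker may appear after a comment added by the retweeting user, so the sketch searches the whole text rather than only the first position.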
4
Conversation (Reply-to & Mentions)
Reply-to: @[username] at first position in tweet text
Mention: @[username] at any position in tweet text
Conversation marker (Honeycutt & Herring, 2009)
3-5 messages
3% of direct addresses did not use @ (Mascaro, Novak & Goggins, 2012)
Engaging over controversy (Yardi & boyd, 2010)
Measure of relationship strength (Bigonha et al., 2010; Bakshy et al., 2011)
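The position rule above (first position means reply-to, anywhere else means mention) can be sketched in Python as follows. The regular expression and the name `classify_addressals` are assumptions for illustration, not part of the course materials.

```python
import re

# "@" must not follow a word character, so e-mail addresses are skipped.
MENTION_RE = re.compile(r'(?<!\w)@(\w{1,15})')


def classify_addressals(text):
    """Split @usernames into the reply-to target (first position) and other mentions."""
    reply_to = None
    mentions = []
    for match in MENTION_RE.finditer(text):
        if match.start() == 0:
            reply_to = match.group(1)  # @username opens the tweet
        else:
            mentions.append(match.group(1))
    return reply_to, mentions
```

Under this rule, the username in "RT @username" is also counted as a mention, so retweets may need to be filtered out first, depending on the analysis.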
5
Hashtags
#[alphanumeric text] no spaces
Discourse marker (Huang et al., 2010)
– Real-time topical identification (Mathioudakis & Koudas, 2010)
Breaks down conversational barriers (Heverin & Zack, 2011; Bruns & Burgess, 2011; Sreenivasan, Lee & Goh, 2011)
Diffusion of discourse (Chang, 2010; Szomszor, Kostkova & St. Louis, 2011; Chew & Eysenbach, 2010)
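A hashtag as described above (# followed by alphanumeric text, no spaces) can be extracted and tallied with a short Python sketch. The pattern, the case folding, and the name `hashtag_counts` are illustrative assumptions.

```python
import re
from collections import Counter

# "#" followed by letters, digits, or underscores; "#" inside a URL fragment is skipped.
HASHTAG_RE = re.compile(r'(?<!\w)#(\w+)')


def hashtag_counts(tweets):
    """Count hashtags across tweet texts, folding case so #London2012 == #london2012."""
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts
```

Folding case is a design choice: it merges spelling variants of the same tag, which usually suits topical identification, but it would hide any deliberate use of capitalization.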
6
Twitter Access Mechanisms
Twitter API identifies device/application used for tweet
Identifying communities of discourse (Black et al., 2012)
Demographic identification (Wohn & Na, 2011)
Human or Bot (Chu et al., 2010)
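As a rough Python sketch of working with the device string: this assumes the field arrives either as an HTML anchor wrapping the client name or as a plain label such as "web". The exact format in the course sample data may differ, and `client_name` and `client_counts` are hypothetical helper names.

```python
import re
from collections import Counter

# Assumed format: <a href="..." rel="nofollow">Client Name</a>, or a bare label.
ANCHOR_RE = re.compile(r'<a\b[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)


def client_name(source):
    """Return the client label, stripping the HTML anchor if one is present."""
    match = ANCHOR_RE.search(source)
    return (match.group(1) if match else source).strip()


def client_counts(sources):
    """Tally how many tweets came from each client."""
    return Counter(client_name(s) for s in sources)
```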
7
URL
http:// until next white space
Twitter uses the t.co shortener
Need to decode URL multiple times
– Other URL shorteners
This process is “costly” with large datasets
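The extraction rule and the repeated decoding can be sketched in Python as below. The `lookup` callable is a hypothetical stand-in for whatever performs one redirect hop (for example an HTTP request), and the cache and `max_hops` limit are assumptions added to keep the cost down on large datasets.

```python
import re

# The slide's rule, extended to https: scheme, then everything up to whitespace.
URL_RE = re.compile(r'https?://\S+')


def extract_urls(text):
    """Return every URL-like token in the tweet text."""
    return URL_RE.findall(text)


def expand_url(url, lookup, cache, max_hops=5):
    """Follow a chain of shorteners (t.co, then others) to the final URL.

    lookup(url) is a caller-supplied function returning the next URL in the
    chain, or None when there is no further redirect. Results are cached
    because each hop is expensive when repeated over many tweets.
    """
    if url in cache:
        return cache[url]
    current = url
    for _ in range(max_hops):
        target = lookup(current)
        if target is None or target == current:
            break
        current = target
    cache[url] = current
    return current
```

Because popular links are shared many times, caching by short URL means each chain is resolved only once per dataset.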
8
Categorizing Twitter Users Politically
Research has categorized users politically by syntactical feature usage and content
– Retweets (Conover et al., 2012)
– URLs and memes (Ratkiewicz et al., 2011)
– Hashtags and Mentions (Livne et al., 2011; Hanna et al., 2011)
“Content Injection”/“poaching” (Livne et al., 2011; Conover et al., 2011)
Conversational networks
– #Hashtag +/- (Jurgens et al., 2011)
– Biased Gatekeepers
9
Syntactical Features
10
The Assignment: Week Two
This is first and foremost an analysis assignment, focused on familiarizing yourself with what R can help you with.
A full, working sample is provided on GitHub. If you download the Full Zip File, you will have access to the data under the “Week2” directory.
– Set your working directory to “Week2”
– Run “Complete.R”. Examine the comments and the resulting files to familiarize yourself with a description of the data.
Analysis Questions. Write up a short essay, with tables or graphs if needed, to describe how you would:
– Build a network using the scripts from week1 against the mention connections in this sample data? The Reply-To connections? What transformations are required? How would you filter the data? Use the actual data to ground your thinking. Feel free to actually write or modify the R code samples from the first two weeks to experiment. Some of you will be more comfortable doing this; some will be more comfortable addressing the question conceptually. This is OK.
– Submit any issues you encounter to GitHub under this repository
I will open a discussion board under our Blackboard Shell regarding the three papers you were assigned to read last week. I expect you to answer the questions and respond to your classmates. Your participation does not need to be long, just thoughtful.
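One possible way to think about the transformation the network question asks for is to turn each tweet into directed edges from author to addressed user, labeled as reply or mention. The sketch below is in Python rather than the course's R scripts, and the (author, text) input shape, the lower-casing, and the self-loop filter are all assumptions for illustration.

```python
import re
from collections import Counter

MENTION_RE = re.compile(r'(?<!\w)@(\w{1,15})')


def build_edges(tweets):
    """Build a weighted edge list from (author, text) pairs.

    Returns a Counter keyed by (source, target, kind), where kind is
    'reply' for an @username at first position and 'mention' otherwise.
    """
    edges = Counter()
    for author, text in tweets:
        source = author.lower()
        for match in MENTION_RE.finditer(text):
            target = match.group(1).lower()
            kind = 'reply' if match.start() == 0 else 'mention'
            if target != source:  # filter: drop self-loops
                edges[(source, target, kind)] += 1
    return edges
```

The resulting counts can be written out as a three-column edge list with weights and loaded into whichever network tool is preferred. Filtering decisions, such as whether to drop retweets or low-weight edges, are left open here because they are part of what the essay is meant to address.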