Piet Daas, Ali Hürriyetoglu


1 Big Data Sources – Web, Social media and Text Analytics
Social media and official statistics (2)
Piet Daas, Ali Hürriyetoglu
THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION

2 ESTP Big Data training course nr. 3
Overview
  Analyse the content of tweets
    If you have not collected any yet, do it now!
  Topic identification
    What are people tweeting about?
  Sentiment analysis
    How do people express their 'feelings' in a tweet?

3 Topic identification
Identify topics in tweets by counting the frequency of the words used in the texts
  Frequency of hashtags used (#word)
  Frequency of words used
    With or without so-called stop words
    How do stop words relate to topics?
Example of English 'stop words':
a,able,about,across,after,all,almost,also,am,among,an,and,any,are,as,at,be,because,been,but,by,can,cannot,could,dear,did,do,does,either,else,ever,every,for,from,get,got,had,has,have,he,her,hers,him,his,how,however,i,if,in,into,is,it,its,just,least,let,like,likely,may,me,might,most,must,my,neither,no,nor,not,of,off,often,on,only,or,other,our,own,rather,said,say,says,she,should,since,so,some,than,that,the,their,them,then,there,these,they,this,tis,to,too,twas,us,wants,was,we,were,what,when,where,which,while,who,whom,why,will,with,would,yet,you,your
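
The counting approach on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`tokenize`, `hashtag_counts`, `word_counts`), the regular expression used for tokenizing, and the shortened stop word set are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumption: a small subset of the English stop word list shown on the slide.
STOP_WORDS = {"a", "about", "after", "all", "and", "are", "be", "could",
              "for", "how", "in", "is", "the", "this", "to"}


def tokenize(text):
    """Lowercase a tweet and return its tokens; hashtags keep their '#'."""
    return re.findall(r"#?\w+", text.lower())


def hashtag_counts(tweets):
    """Count how often each hashtag occurs across all tweets."""
    return Counter(tok for tweet in tweets
                   for tok in tokenize(tweet) if tok.startswith("#"))


def word_counts(tweets, remove_stop_words=True):
    """Count plain words (hashtags excluded), with or without stop words."""
    counts = Counter()
    for tweet in tweets:
        for tok in tokenize(tweet):
            if tok.startswith("#"):
                continue
            if remove_stop_words and tok in STOP_WORDS:
                continue
            counts[tok] += 1
    return counts
```

Calling `word_counts(tweets).most_common(10)` and `hashtag_counts(tweets).most_common(10)` gives top-10 lists; comparing the result with `remove_stop_words=False` shows how much of the raw frequency table is taken up by stop words.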

4 Topic identification (2)
Examples
  Cuba’s underwater paradise could be a model for sustainable tourism [link]
  How This Tiny Island Could Change Sustainable Tourism: [link] … #tourism
  Holiday aka yearly facebook photo shoot
  Ethan: "In the desert and ready to party"
  Good morning!

5 Sentiment analysis
What is sentiment analysis?
  It aims to determine the attitude of a speaker/writer with respect to a particular topic
How?
  1) Try to determine the polarity of a text
     Positive, negative or neutral
  2) Go beyond polarity and try to determine the emotional state
     Sad, happy, angry, etc. (so-called basic emotions), or even go multiscale (combinations of basic emotions)

6 Sentiment: polarity
Determine polarity with lists of positive and negative words
  Positive: “happy”, “nice”, “enjoy”, “positive”
  Negative: “sad”, “depressed”, “I’m down”, “negative”
This may be tricky for short texts!
  But with a large enough volume it may be doable
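
A word-list polarity check of this kind could look as follows. This is a sketch under stated assumptions: the two lists are the ones on the slide, while the function names, the whole-word matching rule, and the tie-breaking rule (equal counts give "neutral") are choices made for this example only.

```python
import re

# Word lists taken from the slide; "i'm down" is matched as a phrase.
POSITIVE = ["happy", "nice", "enjoy", "positive"]
NEGATIVE = ["sad", "depressed", "i'm down", "negative"]


def count_matches(text, terms):
    """Count whole-word (or whole-phrase) occurrences of the terms in text."""
    # Normalise case and curly apostrophes so "I’m down" matches "i'm down".
    text = text.lower().replace("’", "'")
    return sum(len(re.findall(r"\b" + re.escape(term) + r"\b", text))
               for term in terms)


def polarity(text):
    """Return 'positive', 'negative' or 'neutral' from simple word counts."""
    pos = count_matches(text, POSITIVE)
    neg = count_matches(text, NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

The sketch ignores negation ("not happy" still counts as positive), which is one reason short texts are tricky; over a large volume of messages such errors may partly average out.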

7 Sentiment polarity: example
An example of sentiment analysis of a customer's review of a new TV model:
“The TV is wonderful. Great size, great picture, easy interface. It makes a cute little song when you boot it up and when you shut it off. However, it is really annoying that it does not play videos from USB.”
Green: positive
Red: negative
Overall findings: positive 5, negative 1

8 Sentiment: specific emotions
Determine specific emotions with lists of words associated with a specific emotional state
  Happy: “happy”, “fulfilled”, “glad”, “complete”
  Sad: “sad”, “blue”, “down”, “heartbroken”
It is challenging to create a list of words that is specific to the emotions studied and that reflects how those words are used in that context on social media.
How to validate?
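
The same word-list idea extends from polarity to several emotions at once. The sketch below uses the two lists from the slide; the function names and the rule that ties yield no label are assumptions for illustration.

```python
import re
from collections import Counter

# Emotion word lists from the slide.
EMOTION_WORDS = {
    "happy": {"happy", "fulfilled", "glad", "complete"},
    "sad": {"sad", "blue", "down", "heartbroken"},
}


def emotion_counts(text):
    """Count, per emotion, how many tokens of the text are on its word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for emotion, words in EMOTION_WORDS.items():
        counts[emotion] = sum(1 for tok in tokens if tok in words)
    return counts


def dominant_emotion(text):
    """Return the emotion with the most hits, or None if none or tied."""
    counts = emotion_counts(text)
    best = max(counts.values())
    if best == 0:
        return None
    winners = [emotion for emotion, c in counts.items() if c == best]
    return winners[0] if len(winners) == 1 else None
```

A message such as "The sky is blue" scores one hit for "sad" here, which illustrates the validation problem raised on the slide: words on the list are not always used in the emotional sense.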

9 Sentiment specific emotions: example
Examples of specific sentiments: Happy, Sad, Scared

10 Some example studies
Yesterday: Topic identification in Dutch tweets
  Topics in Social Cohesion study
  Sentiment indicator
  Basic emotions

11 1. Topics in Social Cohesion study
Total Tweets:
Period:
64.775
Activity & Sentiment:

12 Top 10 most popular words
Topics: Horst Sevenum Joa Venray Veur Merge Nit Oet Toverland Reindonk nieuws
Hashtags: #horst #gtst #ajax #psv #pvv #twexit #3fm #weer #koerier #dtv

13 Wordcloud of #hashtags

14 2. Sentiment indicator
Determine sentiment in public Dutch social media messages
  Huge amounts of Facebook and Twitter messages
  (#pos - #neg)/#total (day/week/month)
Compared: 5 years of data, Jan March 2015
First findings were very intriguing
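
The indicator (#pos - #neg)/#total can be computed per day, week or month once each message carries a date and a polarity label. The sketch below assumes the input is a list of `(date, label)` pairs with labels "positive", "negative" or "neutral"; that input format, the function names, and the use of ISO weeks are assumptions for illustration, not the setup used in the study.

```python
from collections import defaultdict
from datetime import date


def period_key(d, period):
    """Map a date to the key of the day, ISO week or month it falls in."""
    if period == "day":
        return d.isoformat()
    if period == "week":
        iso = d.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if period == "month":
        return f"{d.year}-{d.month:02d}"
    raise ValueError(f"unknown period: {period}")


def sentiment_indicator(messages, period="day"):
    """Return {period key: (#pos - #neg) / #total} for (date, label) pairs."""
    tallies = defaultdict(lambda: {"pos": 0, "neg": 0, "total": 0})
    for d, label in messages:
        tally = tallies[period_key(d, period)]
        tally["total"] += 1
        if label == "positive":
            tally["pos"] += 1
        elif label == "negative":
            tally["neg"] += 1
    return {key: (t["pos"] - t["neg"]) / t["total"]
            for key, t in tallies.items()}
```

The value ranges from -1 (all messages negative) to +1 (all positive); neutral messages only enlarge the denominator and so pull the indicator towards 0.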

15 Daily, weekly, monthly sentiment

16 Sentiment per platform
(~10%) (~80%)

17 Platform specific results
(weak form)

18 Social media sentiment
Schematic overview (figure). Labels: Previous month, Current month, Day ... 22-28, Consumer Confidence, Publication date (~20th), Social media sentiment

19 Results of comparing various periods
Facebook Facebook Facebook + Twitter * Twitter Consumer Confidence *cointegration (weak form) LOOCV results

20 Overall findings
Correlation and cointegration
  The 1st ‘week’ of Consumer confidence usually has 70% response
  Best correlation and ‘cointegration’ with the 2nd ‘week’ of the month
  Highest correlation 0.93* (all Facebook * specific word filtered Twitter)
Granger causality
  Changes in Consumer confidence precede changes in Social media sentiment
  For all combinations shown!
  Only linear models tried so far
Prediction
  Slightly better than random chance
  Best for the 4th ‘week’ of the month
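
As a first exploratory step before formal cointegration or Granger causality tests, two monthly series can be compared with a plain Pearson correlation, optionally with one series shifted in time. The sketch below is only that: it is not a Granger causality test and not the procedure used in the study, and the function names are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_pearson(x, y, lag):
    """Correlate x[t] with y[t + lag] for a non-negative integer lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])
```

With `x` as monthly Consumer confidence and `y` as monthly social media sentiment, a higher value of `lagged_pearson(x, y, 1)` than of `pearson(x, y)` would be a hint, not proof, that changes in `x` tend to precede changes in `y`.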

21 ‘Sentiment’ indicator for NL (beta-version)
Based on the average sentiment of public Dutch Facebook and Twitter messages

22 3) Basic Emotions in Social Media
A number of basic emotions

23 First findings

24 3. Happy
% messages Happy
  Recurring annotations: New year, Easter

25 3. Angry
% messages Angry
  January 7th 2015: Charlie Hebdo attack

26 3. Sad
% messages Sad
  April 30th 2009: Attack on Queen’s Day
  July 17th 2014: MH17 disaster
  October 7th 2010: Death of Antonie Kamerling
  January 17th 2012: Death of Piet Römer
  May 19th 2013: Death of 3 cyclists in a hit-and-run
  November 15th 2014: Silent march Ferdyan
  January 17th 2015: Execution of a Dutchman in Indonesia

27 3. Scared
  Dec. 17th 2009: Snow
  Dec. 20th 2012: Maya calendar fear
  October 1st 2009: Car bomb attack
  May 5th 2010: Fear of riots
  June 28th 2011: Heavy weather
  July 27th 2013: Heavy weather
  January 7th 2015: Charlie Hebdo attack

28 3. Tender
  May 15th 2013: Sweet, nice girl

29 Exercises (continued)

30 Exercises
1) Connect with the Twitter API (you need the keys)
2) Find a famous Twitter user from your country and extract some basic info: id, number of followers, followers list, description, location, etc.
3) Download n tweets of this user (n = 200); check for duplicates and retweets
4) From the tweets, get the most frequently used hashtags (#word), links and mentions
5) Create a wordcloud of the words in the tweets, with and without stop words, and without hashtags
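
Once tweets have been downloaded as plain text, exercises 3 and 4 need no further API calls. The sketch below works on a list of tweet strings; the function names, the "RT @" prefix test for retweets, and the regular expressions are assumptions for illustration, and no Twitter API call is shown.

```python
import re
from collections import Counter

LINK = re.compile(r"https?://\S+")
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")


def is_retweet(text):
    """Assumption: classic retweets start with 'RT @'."""
    return text.startswith("RT @")


def drop_duplicates(tweets):
    """Keep the first copy of each tweet, ignoring case and extra spaces."""
    seen, unique = set(), []
    for tweet in tweets:
        key = " ".join(tweet.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(tweet)
    return unique


def extract(tweet):
    """Return the hashtags, mentions and links found in one tweet."""
    without_links = LINK.sub("", tweet)  # avoid matching '#' inside URLs
    return {
        "hashtags": [h.lower() for h in HASHTAG.findall(without_links)],
        "mentions": [m.lower() for m in MENTION.findall(without_links)],
        "links": LINK.findall(tweet),
    }


def most_frequent(tweets, kind, n=10):
    """Top-n hashtags, mentions or links (kind is one of those three keys)."""
    counts = Counter(item for tweet in tweets for item in extract(tweet)[kind])
    return counts.most_common(n)
```

A typical sequence is `unique = drop_duplicates(tweets)`, then `most_frequent(unique, "hashtags")`; whether retweets are kept or filtered out with `is_retweet` depends on the question being asked.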

31 Exercises
6) Search for key terms on Twitter and collect n tweets (n = 200)
7) Determine the most frequent hashtags, links and mentions
8) Create a wordcloud of these tweets
9) Topic detection from tweets (either the user's tweets or the key term search results)
10) Sentiment analysis: create your own list of 10 positive and 10 negative words and calculate a count-based score
11) Look for an online classifier (for the language of your tweets), get an access key and test it (watch the rate limit), e.g. MonkeyLearn
12) Study emoticons as an example of basic emotions
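
For exercise 12, one simple starting point is a lookup table from emoticons to basic emotions. The mapping below is an assumption made up for this sketch, not a validated lexicon, and the function name is hypothetical.

```python
from collections import Counter

# Assumption: an illustrative, hand-made mapping of emoticons to emotions.
EMOTICON_EMOTION = {
    ":)": "happy", ":-)": "happy", ":D": "happy",
    ":(": "sad", ":-(": "sad", ":'(": "sad",
    ">:(": "angry",
}


def emoticon_emotions(text):
    """Count emotions signalled by whitespace-separated emoticons in text."""
    return Counter(EMOTICON_EMOTION[tok]
                   for tok in text.split() if tok in EMOTICON_EMOTION)
```

Because tokens are matched exactly, ">:(" is counted as angry and not also as sad; the trade-off is that emoticons glued to a word ("great:)") are missed.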

32 Additional exercises
Additional tasks:
13) Detect place names, person names, organisation names, numbers and dates; derive geolocation/temporal characteristics; find similar tweets
14) Apply the t-distributed stochastic neighbour embedding (t-SNE) visualization technique to tweets
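
For the "find similar tweets" task, the exercise does not prescribe a method. One simple option, chosen here purely for illustration, is the Jaccard similarity between the word sets of two tweets; the function names are assumptions.

```python
import re


def token_set(text):
    """Return the set of lowercase word tokens in a text."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two texts: |shared words| / |all words|."""
    set_a, set_b = token_set(a), token_set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def most_similar(query, tweets, n=3):
    """Return the n (score, tweet) pairs most similar to the query text."""
    scored = [(jaccard(query, tweet), tweet) for tweet in tweets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]
```

Word-set overlap ignores word order and synonyms, so it mainly finds near-duplicates and tweets on the same narrow topic; removing stop words first usually makes the scores more meaningful.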

