Overzicht van ervaringen Social data @ CBS (social media studies) Overzicht van ervaringen Netherlands workshop, 10 Sept Heerlen Piet Daas
Social media Dutch are very active on social media! Around 60% according to a surveyna altijd bij zich en staat vrijwel altijd aan Steeds meer mensen hebben een smartphone! Mogelijke informatiebron voor: Welke onderwerpen zijn actueel: Aantal berichten en sentiment hierover Als meetinstrument te gebruiken voor: . 2 Map by Eric Fischer (via Fast Company)
Social media messages Dutch are very active on Social Media What kind of ‘information’ is shared? Can it be used? It is quickly available! Discuss the results of two studies 1) Content: - Dutch Twitter messages: a ‘sample’ of 12 million 2) Sentiment - Sentiment in Dutch social media messages: ‘all’ > 3.5 billion 3
Dutch Twitter topics 4 12 million messages (3%) (7%) (3%) (10%) (7%) (5%) (46%) 4 12 million messages
Sentiment in Dutch social media About the data Dutch firm that continuously collects ALL public social media messages written in Dutch Dataset of more than 3.5 billion messages! Covering June 2010 till the present Between 3-4 million new messages are added per day About sentiment determination ‘Bag of words’ approach List of Dutch words with their associated sentiment Added social media specific words (‘FAIL’, ‘LOL’, ‘OMG’ etc.) Use overall score to determine sentiment Is either positive, negative or neutral Average sentiment per period (day / week / month) (#positive - #negative)/#total * 100%
Daily, weekly, monthly sentiment
Sentiment per platform (~10%) (~80%)
Platform specific results
Social media sentiment Schematic overview Previous month Current month Vorige maand Maand Day 1-7 Day 8-14 Day 15-21 Day 22-28 Consumer Confidence Publication date (~20th) Sentiment Social media sentiment
Results of comparing various periods Consumer Confidence Facebook Facebook Facebook + Twitter * Twitter *cointegration LOOCV results
Overall findings Correlation and cointegration Granger causality 1st ‘week’ of Consumer confidence usually has 70% response Best correlation and cointegration with 2nd ‘week’ of the month Highest correlation 0.93* (all Facebook * specific word filtered Twitter) Granger causality Changes in Consumer confidence precede changes in Social media sentiment For all combinations shown! Only tried linear models so far Prediction Slightly better than random chance Best for the 4th ‘week’ of month
C C S S I I Overall findings (2) How are Confidence and Sentiment related? ‘Mood of the nation’ and the integral emotion in the Appraisal-Tendency Framework Basis for a rapid sentiment based indicator Could be produced every month on or shortly after the 15th day Could even be produced on a weekly basis (Volatile!) But, what am I comparing here! Units differ for Consumer confidence and Sentiment Confidence: Response of a representative sample of households Sentiment: Messages produced during a specific period ?Representatives? future studies will focus on this phenomenon! I C I C S S
To do Continue to determine development over time Is the association spurious or not? What is the contribution of sentiment data on cons. conf survey data? Message based sentiment -> Username based sentiment - Different? Identify users (profiling techniques; gender, age etc.) - ‘NSA’ related work Experimental figure?
Thank you for your attention ! @pietdaas