Published by Jemimah Morris. Modified over 9 years ago.
1
Barteld Braaksma and Kees Zeelenberg “Re-make / Re-model”: Should big data change the modelling paradigm in official statistics?
2
Lay-out of presentation
– Sources and modes of inference
– Big data examples at Statistics Netherlands
– How to use big data?
  ‐ ‘as is’
  ‐ models
– But how about quality?
– More examples
– Conclusions
3
Sources for official statistics
Always start from observations
– Traditional surveys
  ‐ Statistical populations
  ‐ Owned by statistical offices (full control)
  ‐ Costly and burdensome
– Administrative sources
  ‐ Administrative populations
  ‐ Owned by government bodies (limited control)
  ‐ Cheaper to obtain
– Big (‘organic’) data
  ‐ Unclear populations
  ‐ Owned by private companies (no control)
  ‐ Cost unclear
4
Modes of inference in official statistics
Main approaches for collecting and processing data
– Design-based
  ‐ Stratified sample survey of sales
– Model-assisted
  ‐ Combine tax data with sales survey (regression)
– Model-based
  ‐ Add up all sales from tax declarations
  ‐ (small-area estimates)
  ‐ (seasonal adjustment)
  ‐ (…)
– Sometimes ‘implicit models’
  ‐ Imputation of missing values
  ‐ Preliminary estimates of GDP
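To make the contrast between the first two modes concrete, here is a minimal sketch in Python. It is an illustration only, not the method used at Statistics Netherlands: the function names, the data layout and the choice of a simple one-variable regression estimator are all assumptions. The design-based function scales each stratum's sample mean to the stratum size; the model-assisted function corrects the sample mean of sales using an auxiliary variable (for example, a tax figure) that is known for the whole population.

```python
def stratified_total(strata):
    """Design-based estimate of a population total.

    `strata` is a list of dicts with hypothetical keys:
      "N"      - number of units in the stratum (known from the frame)
      "sample" - observed values for the sampled units
    """
    return sum(s["N"] * sum(s["sample"]) / len(s["sample"]) for s in strata)


def regression_total(y, x, x_pop_total, n_pop):
    """Model-assisted (regression) estimate of the population total of y.

    y, x        - survey values and auxiliary values for the sampled units
    x_pop_total - total of the auxiliary variable over the whole population
    n_pop       - population size
    Assumes simple random sampling and a single auxiliary variable.
    """
    n = len(y)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    # Shift the sample mean by the gap between population and sample means of x.
    return n_pop * (y_bar + slope * (x_pop_total / n_pop - x_bar))


# Made-up numbers, for illustration only.
strata = [
    {"N": 100, "sample": [10, 12, 14]},   # small firms
    {"N": 10, "sample": [100, 120]},      # large firms
]
design_based = stratified_total(strata)
model_assisted = regression_total(
    y=[2, 4, 6], x=[1, 2, 3], x_pop_total=250, n_pop=100
)
```

The model-based mode on the slide needs no estimator of this kind: it simply adds up the values found in the administrative source.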
5
Big data at Statistics Netherlands
Experiments discussed today
– Traffic detection loops
– Social media messages
– Mobile phone data
Other examples, not discussed here
– Scanner data (in production)
– Satellite images
– Financial transactions
– Internet robots (close to production)
– Google Trends
– PM: Administrative data (in production)
6
Traffic detection loops: daily pattern
7
Daytime population based on mobile phone data
8
Big data ‘as is’
– Imperfect, yet timely, indicator of trends
– “These data exist and that’s why they are interesting”
– Example: social media messages
  ‐ Signals of human activity and feelings
[Figure: Dutch social media activity, 2010-2012]
9
What are people talking about on Twitter?
10
Sentiment indicator using social media
11
Big data and statistics
Important issues:
– Undercoverage
– Selectivity
– Volatility
– Interpretation
– Continuity
Traditionalists’ view:
– These sources are useless for producing quality statistics
Modernists’ view:
– We should stop doing surveys, everything is already out there
Déjà-vu:
– Similar discussions when introducing administrative data…
12
How to use big data?
– Many methodological issues
– No linking variables (often)
– Additional information may be available
– Possible approach: combine available information
  ‐ By old or new mathematical methods (often Bayesian)
  ‐ By integration techniques (“National accounts”-style)
– But how about models?
13
Examples of models in official statistics
– Correction by weighting for non-response
– Imputation for item non-response
– Seasonal adjustment
– Estimates for small areas
– Capture-recapture models for hard-to-observe populations
– Preliminary (flash) estimates of GDP
– So we are already using models in official statistics!
– But we should look carefully at principles and conditions
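As one illustration of the models listed above, capture-recapture can be sketched with the classic two-source Lincoln-Petersen estimator and Chapman's bias-corrected variant. The slide does not say which estimator is meant, so this choice, the function names and the numbers are assumptions. The model behind both formulas is that the two sources capture units independently and that matches between them are identified without error.

```python
def lincoln_petersen(n1, n2, m):
    """Estimate population size from two overlapping lists.

    n1 - units found in the first source
    n2 - units found in the second source
    m  - units found in both sources (must be > 0)
    """
    if m <= 0:
        raise ValueError("need at least one unit observed in both sources")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman's variant, less biased for small overlaps and defined for m = 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Made-up counts, for illustration only.
lp_estimate = lincoln_petersen(n1=200, n2=150, m=30)
chapman_estimate = chapman(n1=200, n2=150, m=30)
```

The estimate is only as good as the independence assumption, which is exactly the kind of condition the last bullet on the slide asks us to examine.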
14
Guiding principles of official statistics
European Statistical System, mission statement:
– “We provide the European Union, the world and the public with independent high quality information on the economy and society on European, national and regional levels and make the information available to everyone for decision-making purposes, research and debate.”
ESS Code of Practice, principle 6:
– “Statistical authorities develop, produce and disseminate European Statistics respecting scientific independence and in an objective, professional and transparent manner in which all users are treated equitably.”
ESS Code of Practice, principle 7:
– “Sound methodology underpins quality statistics. This requires adequate tools, procedures and expertise.”
ESS Code of Practice, principle 12:
– “European Statistics accurately and reliably portray reality.”
15
So how about quality?
For use of models this implies:
– Objectivity:
  ‐ Do not move too far from observed data
  ‐ Objects and populations for the model correspond to the statistical phenomenon
  ‐ No forecasting
– Reliability:
  ‐ Extensive specification to guarantee robustness against model failure
  ‐ No behavioural models
16
Some model-based examples
– Relation assumed between observations and phenomena
– Sophisticated modelling
– Trial and error
– Signal and noise
17
Bayesian recursive filter (single traffic loop)
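The slide does not specify the filter, so the following is only a minimal sketch of the general idea, assuming the simplest possible case: a scalar Kalman filter for a local-level (random walk plus noise) model. The function name, the variance parameters and the counts are hypothetical. At each time step the filter first predicts (uncertainty grows), then updates the estimate with the new count, weighting it by how noisy the observations are assumed to be. A missing count is handled by skipping the update.

```python
def local_level_filter(observations, process_var, obs_var, prior_mean, prior_var):
    """Recursive Bayesian (Kalman) filter for a random-walk signal.

    observations - noisy counts per period; None marks a missing reading
    process_var  - assumed variance of the period-to-period change in the signal
    obs_var      - assumed variance of the measurement noise
    Returns the filtered mean after each period.
    """
    mean, var = prior_mean, prior_var
    filtered = []
    for z in observations:
        # Predict: the signal may have drifted, so uncertainty increases.
        var = var + process_var
        if z is not None:
            # Update: blend prediction and observation according to their variances.
            gain = var / (var + obs_var)
            mean = mean + gain * (z - mean)
            var = (1.0 - gain) * var
        filtered.append(mean)
    return filtered


# Made-up vehicle counts with one missing reading, for illustration only.
counts = [120, 118, None, 125, 300, 122]
smoothed = local_level_filter(
    counts, process_var=4.0, obs_var=100.0, prior_mean=120.0, prior_var=100.0
)
```

With a small process variance relative to the observation variance, the outlier of 300 moves the filtered level only part of the way, which is the signal-versus-noise separation the preceding slide refers to.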
18
EMD-filtered monthly rush hour indicator and expected manufacturing development
19
Google Trends for nowcasting (Choi & Varian using a Bayesian regression method)
20
Mobile phone data vs. traffic loops: opportunities for integration?
21
Conclusions
– Big data leads to new opportunities
  ‐ Better accuracy and more details
  ‐ More frequent and more timely estimates
  ‐ Statistics in new areas
– Big data based statistics are useful in their own right
– Don’t be afraid to use models
  ‐ Documented and transparent
  ‐ Well tested
  ‐ Describe, do not judge