Barteld Braaksma and Kees Zeelenberg “Re-make / Re-model”: Should big data change the modelling paradigm in official statistics?

Presentation transcript:


Lay-out of presentation
– Sources and modes of inference
– Big data examples at Statistics Netherlands
– How to use big data?
  ‐ ‘as is’
  ‐ models
– But how about quality?
– More examples
– Conclusions

Sources for official statistics
Always start from observations
– Traditional surveys
  ‐ Statistical populations
  ‐ Owned by statistical offices (full control)
  ‐ Costly and burdensome
– Administrative sources
  ‐ Administrative populations
  ‐ Owned by government bodies (limited control)
  ‐ Cheaper to obtain
– Big (‘organic’) data
  ‐ Unclear populations
  ‐ Owned by private companies (no control)
  ‐ Cost unclear

Modes of inference in official statistics
Main approaches for collecting and processing data
– Design-based
  ‐ Stratified sample survey of sales
– Model-assisted
  ‐ Combine tax data with sales survey (regression)
– Model-based
  ‐ Add up all sales from tax declarations
  ‐ (small-area estimates)
  ‐ (seasonal adjustment)
  ‐ (…)
– Sometimes ‘implicit models’
  ‐ Imputation of missing values
  ‐ Preliminary estimates of GDP
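The difference between the first two modes can be sketched in a few lines. This is a minimal illustration and not taken from the slides: the function names, the strata layout and the use of a simple linear regression estimator are all assumptions made for the example. The design-based function scales each stratum's sample mean by the stratum size; the model-assisted function corrects the sample mean of sales using an auxiliary variable (for instance turnover from tax data) that is known for the whole population.

```python
from statistics import mean

def stratified_total(strata):
    """Design-based estimate of a population total.

    `strata` is a list of (stratum_size, sample_values) pairs (hypothetical layout).
    Each stratum contributes its population size times its sample mean.
    """
    return sum(size * mean(sample) for size, sample in strata)

def regression_total(sample_y, sample_x, population_x_total, population_size):
    """Model-assisted (linear regression) estimate of a population total.

    sample_y: survey values (e.g. sales), sample_x: auxiliary values for the
    same units (e.g. tax turnover). The population total of x is assumed known.
    """
    y_bar, x_bar = mean(sample_y), mean(sample_x)
    # Least-squares slope of y on x within the sample.
    sxx = sum((x - x_bar) ** 2 for x in sample_x)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sample_x, sample_y))
    slope = sxy / sxx
    # Shift the sample mean of y by the gap between population and sample mean of x.
    population_x_mean = population_x_total / population_size
    return population_size * (y_bar + slope * (population_x_mean - x_bar))
```

A purely model-based estimate in the sense of the slide would skip the survey altogether and add up the administrative values directly.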

Big data at Statistics Netherlands
Experiments discussed today
– Traffic detection loops
– Social media messages
– Mobile phone data
Other examples, not discussed here
– Scanner data (in production)
– Satellite images
– Financial transactions
– Internet robots (close to production)
– Google Trends
– PM: Administrative data (in production)

Traffic detection loops: daily pattern

Daytime population based on mobile phone data

Big data ‘as is’
– Imperfect, yet timely, indicator of trends
– “These data exist and that’s why they are interesting”
– Example: social media messages
  ‐ Signals of human activity and feelings
Dutch social media activity

What are people talking about on Twitter?

Sentiment indicator using social media

Big data and statistics
Important issues:
– Undercoverage
– Selectivity
– Volatility
– Interpretation
– Continuity
Traditionalists’ view:
– These sources are useless for producing quality statistics
Modernists’ view:
– We should stop doing surveys, everything is already out there
Déjà-vu:
– Similar discussions when introducing administrative data…

How to use big data?
– Many methodological issues
– No linking variables (often)
– Additional information may be available
– Possible approach: combine available information
  ‐ By old or new mathematical methods (often Bayesian)
  ‐ By integration techniques (“National accounts”-style)
– But how about models?

Examples of models in official statistics
– Correction by weighting for non-response
– Imputation for item non-response
– Seasonal adjustment
– Estimates for small areas
– Capture-recapture models for hard-to-observe populations
– Preliminary (flash) estimates of GDP
– So we are already using models in official statistics!
– But we should look carefully at principles and conditions
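As one concrete instance from the list above, a two-source capture-recapture estimate can be sketched as follows. The slides do not say which estimator is used, so the Lincoln-Petersen formula and its Chapman correction are chosen here only as the textbook forms; the function names and arguments are illustrative.

```python
def lincoln_petersen(n_first, n_second, n_both):
    """Estimate population size from two overlapping lists.

    n_first:  units found in the first source
    n_second: units found in the second source
    n_both:   units found in both sources
    Assumes the two sources capture units independently.
    """
    if n_both == 0:
        raise ValueError("no overlap between sources: estimate is undefined")
    return n_first * n_second / n_both

def chapman(n_first, n_second, n_both):
    """Chapman's variant, which stays defined when the overlap is zero
    and is less biased for small overlaps."""
    return (n_first + 1) * (n_second + 1) / (n_both + 1) - 1
```

For example, 200 units in one register, 150 in another and 30 in both give a Lincoln-Petersen estimate of 1000 units. The independence assumption is exactly the kind of model condition the last bullet asks to examine carefully.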

Guiding principles of official statistics
European Statistical System, mission statement
– “We provide the European Union, the world and the public with independent high quality information on the economy and society on European, national and regional levels and make the information available to everyone for decision-making purposes, research and debate.”
ESS Code of Practice, principle 6:
– “Statistical authorities develop, produce and disseminate European Statistics respecting scientific independence and in an objective, professional and transparent manner in which all users are treated equitably.”
ESS Code of Practice, principle 7:
– “Sound methodology underpins quality statistics. This requires adequate tools, procedures and expertise.”
ESS Code of Practice, principle 12:
– “European Statistics accurately and reliably portray reality.”

So how about quality?
For use of models this implies:
– Objectivity:
  ‐ Do not move too far from observed data
  ‐ Objects and populations for the model correspond to the statistical phenomenon
  ‐ No forecasting
– Reliability:
  ‐ Extensive specification to guarantee robustness against model failure
  ‐ No behavioural models

Some model-based examples
– Relation assumed between observations and phenomena
– Sophisticated modelling
– Trial and error
– Signal and noise

Bayesian recursive filter (single traffic loop)
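The slide shows only a figure, so the exact filter is not known from this transcript. As a hedged illustration of what a Bayesian recursive filter does with a noisy count series, the sketch below uses a scalar Kalman filter for a local-level model; the function name, the parameters and the Gaussian assumptions are hypothetical and may differ from the model behind the slide.

```python
def local_level_filter(observations, process_var, obs_var, init_mean, init_var):
    """Scalar Kalman filter for a random-walk level observed with noise.

    observations: sequence of floats, with None marking a missing reading
    process_var:  variance of the step-to-step change in the true level
    obs_var:      variance of the measurement noise
    Returns a list of (posterior_mean, posterior_variance) per time step.
    """
    level, var = init_mean, init_var
    filtered = []
    for y in observations:
        # Predict: the level may have drifted, so uncertainty grows.
        var = var + process_var
        if y is not None:
            # Update: weigh the new reading against the prediction.
            gain = var / (var + obs_var)
            level = level + gain * (y - level)
            var = (1 - gain) * var
        # A missing reading skips the update, so the prediction is kept.
        filtered.append((level, var))
    return filtered
```

Each step uses only the previous posterior and the new reading, which is what makes the filter recursive and suitable for a continuous stream of loop counts with occasional gaps.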

EMD-filtered monthly rush hour indicator and expected manufacturing development

Google Trends for nowcasting (Choi & Varian using a Bayesian regression method)

Mobile phone data vs. traffic loops: opportunities for integration?

Conclusions
– Big data leads to new opportunities
  ‐ Better accuracy and more details
  ‐ More frequent and more timely estimates
  ‐ Statistics in new areas
– Big data based statistics are useful in their own right
– Don’t be afraid to use models
  ‐ Documented and transparent
  ‐ Well tested
  ‐ Describe, do not judge