The user as data detective


Presentation by Felix Ritchie, Bristol Business School
Budapest, 21.10.16

Pressures on data collection
- More complexity in data sources
  - linked, multiple sources
  - data sourced from administrative systems
  - changing definitions
- Greater demands for detail in aggregates
- Greater demands for microdata
- Limited resources at National Statistics Institutes (NSIs) and others
  - greater use of statistical editing

Quality/resource trade-offs
[Diagram relating aggregate statistics, microdata and resources; labels: End, Means, Means or End?]
Difficult to satisfy all demands

How can the user help?
- Different things matter to microdata users
  - outliers
  - multivariate characteristics and breakdowns
  - measurement error in respect of multivariate bias
  - genuine data, not imputation or estimation
  - subsets
- Users bring different skills
  - no adherence to quality or aggregation guidelines
  - expertise on relationships between variables
  - extended timelines
  - different coding skills

Example: compliance with minimum wages
- Statutory minimum wage in the UK
- 3 survey datasets for checking compliance
  - ONS: employer and employee surveys
  - Department for Business: survey of apprentice pay
- ONS validates its own data as usual
  - 1 extra rule: re-check response if wage appears to fall below the minimum
- Low Pay Commission (LPC) analyses validated ONS data
  - complex code to break down data into sub-population estimates
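
The extra validation rule can be sketched in a few lines of Python. This is an illustration only: the function name, the treatment of zero hours and the example figures are assumptions, not part of the ONS production system.

```python
from decimal import Decimal

def needs_recheck(gross_pay, hours, minimum_wage):
    """Flag a response for re-checking if the derived hourly wage
    appears to fall below the statutory minimum."""
    if hours <= 0:
        # Assumption: a wage cannot be derived, so ask the respondent again.
        return True
    return gross_pay / hours < minimum_wage

# Illustrative minimum of 6.70 per hour (not taken from the presentation).
MINIMUM = Decimal("6.70")
low_paid = needs_recheck(Decimal("250.00"), Decimal("40"), MINIMUM)
well_paid = needs_recheck(Decimal("280.00"), Decimal("40"), MINIMUM)
```

A flag only triggers a second look at the response; it does not decide whether the employer is non-compliant.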

Why use minimum wage compliance to study quality?
- three different datasets to triangulate
- yes/no nature makes data problems stand out more
- measurement error per se matters

Things we’ve found: 1
Machine precision matters
[Figure: estimated rate of non-compliance against number of decimal places used in calculation]
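
A minimal Python sketch of how the number of decimal places can move the estimated rate, assuming the hourly wage is derived as pay divided by hours and rounded before it is compared with the minimum. The records, the 6.70 minimum and the function name are made up for illustration and are not taken from the presentation.

```python
from decimal import Decimal, ROUND_HALF_UP

def non_compliance_rate(records, minimum, places):
    """Share of records whose derived hourly wage, rounded to `places`
    decimal places, falls below `minimum`."""
    quantum = Decimal(10) ** -places  # e.g. places=2 gives 0.01
    below = 0
    for pay, hours in records:
        hourly = (pay / hours).quantize(quantum, rounding=ROUND_HALF_UP)
        if hourly < minimum:
            below += 1
    return below / len(records)

MINIMUM = Decimal("6.70")  # illustrative value
RECORDS = [  # (weekly pay, weekly hours), invented for the example
    (Decimal("234.45"), Decimal("35")),  # 6.69857... per hour
    (Decimal("267.90"), Decimal("40")),  # 6.6975 per hour
    (Decimal("280.00"), Decimal("40")),  # 7.00 per hour
    (Decimal("200.00"), Decimal("40")),  # 5.00 per hour
]

for places in range(5):
    print(places, non_compliance_rate(RECORDS, MINIMUM, places))
```

With these invented records the rate is 0.25 when wages are rounded to two decimal places and 0.75 when four are kept, because two workers sit a fraction of a penny below the minimum.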

Things we’ve found: 2
Data sources can give very different answers

Things we’ve found: 3
Data quality is a function of other variables

Things we’ve found: 4
Some errors can be obvious (when you draw the pictures)

Things we’ve found: 5
Errors can be predictable

Things we’ve found: 6
Definitions need to reflect data
- LPC defines ‘minimum wage worker’ as earning less than NMW+5p
- We define it as earning up to the next 10p boundary
[Figure: effect on MWW counts using a “next 10p” rule]
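
The two definitions can be compared with a short Python sketch. It rests on assumptions: wages are held in whole pence, "next 10p boundary" is read as the smallest multiple of 10p strictly above the NMW, and "up to" is read as strictly below that boundary. The wage list and the 631p rate are illustrative only.

```python
def lpc_threshold(nmw_pence):
    """LPC-style cut-off: less than NMW + 5p."""
    return nmw_pence + 5

def next_10p_threshold(nmw_pence):
    """Assumed reading of the 'next 10p' rule: the smallest
    multiple of 10p strictly above the NMW."""
    return (nmw_pence // 10 + 1) * 10

def count_mww(wages_pence, threshold):
    """Count workers whose hourly wage is strictly below the threshold."""
    return sum(1 for wage in wages_pence if wage < threshold)

NMW = 631  # illustrative rate in pence
WAGES = [631, 633, 635, 636, 639, 640, 650]  # invented hourly wages in pence

lpc_count = count_mww(WAGES, lpc_threshold(NMW))
next_10p_count = count_mww(WAGES, next_10p_threshold(NMW))
print(lpc_count, next_10p_count)
```

On this invented list the LPC rule counts 3 minimum wage workers and the "next 10p" rule counts 5; the extra two are paid 636p and 639p, just above the LPC cut-off but below the 640p boundary.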

Things we’ve found: 7
We need to understand data collection
- ONS employer survey asks for data to 2 decimal places
- For monthly paid workers, employers multiply weekly hours by 4.348
- Apprentices paid monthly at the minimum wage rate almost always recorded as ‘below minimum wage’
[Figure: effect of rounding in monthly hours calculation]
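
One way such a result could arise is sketched below in Python. The mechanism is an assumption made for illustration, not necessarily the one found in the survey data: the employer computes monthly hours as weekly hours times 4.348, pays exactly the minimum rate on the unrounded hours, and then reports both pay and hours rounded to 2 decimal places. The 3.30 rate and the 37-hour week are illustrative values.

```python
from decimal import Decimal, ROUND_HALF_UP

TWO_DP = Decimal("0.01")
WEEKS_PER_MONTH = Decimal("4.348")  # multiplier quoted on the slide

def reported_values(weekly_hours, hourly_rate):
    """Return (reported monthly pay, reported monthly hours), each
    rounded to 2 decimal places as the survey form requests."""
    exact_hours = weekly_hours * WEEKS_PER_MONTH
    hours = exact_hours.quantize(TWO_DP, rounding=ROUND_HALF_UP)
    pay = (hourly_rate * exact_hours).quantize(TWO_DP, rounding=ROUND_HALF_UP)
    return pay, hours

MINIMUM = Decimal("3.30")  # illustrative apprentice rate
pay, hours = reported_values(Decimal("37"), MINIMUM)
derived_rate = pay / hours  # what an analyst would compute from the form

print(pay, hours, derived_rate < MINIMUM)
```

Here the hours round up from 160.876 to 160.88 while the pay rounds down to 530.89, so the derived rate comes out a fraction of a penny under 3.30 even though the worker was paid the minimum. Other hour values can round the opposite way, so this shows one instance rather than the general pattern.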

Lessons from other areas
In other work we’ve found
- observations missing
- values systematically missing
- ‘impossible’ values occurring
- conflicts between sources
- some data has no value
- documentation lacking
- institutional knowledge lost
but generally microdata analysis confirms data quality
No reason to believe ONS better or worse than any other NSI…

What have we learned?
- Problems with
  - data
  - aggregation
  - interpretation
- Not amenable to NSI production systems
  - resources
  - dimensionality
  - purpose
- Microdata users are
  - expert
  - persistent
  - responsive to positive engagement
  - cheap!