Errors in Surveys
Training Course «Quality Management and Survey Quality Measurement», Rome, 24-27 September 2013
Marcello D’Orazio, Giovanna Brancato, Istat, {madorazi, brancato}@istat.it

Outline
- Quality definition
- Accuracy
- Sampling and nonsampling errors
- Main sources of nonsampling errors
- Nature of errors
- Bias and variance
- Impact of errors on survey estimates
- Reliability and data revision

Eurostat Quality dimensions
The quality of the statistics “produced” by a process is evaluated in terms of:
- Relevance
- Accuracy & Reliability
- Timeliness & Punctuality
- Accessibility & Clarity
- Comparability & Coherence

Types of statistical production processes
- Traditional surveys (sample surveys or censuses)
- Processes involving the use of one or more administrative registers
- Statistical compilations (e.g. those in the National Accounts division)

Errors in the production processes
A statistical production process consists of a series of phases performed more or less in sequence. In each phase there are various potential sources of error that may affect the quality of the final “product” (point estimates, tables, ...).
Usually the errors decrease the accuracy of the final statistics. “Accuracy: the difference between the estimate and the true parameter value”. It reflects the “closeness” between the estimate and the true (unknown) value of a parameter of the population under study.
Moreover, the errors affect, directly or indirectly, other components of quality (comparability, timeliness, …).

Errors in statistical production processes (cont.)
Users often believe that the quality of the final estimates depends directly on the amount of errors in the collected data:
few errors in the data = high accuracy of the final estimates
This thinking is only partly true. The quality of the estimates also depends on the data that were NOT collected, either intentionally (because only a sample was observed) or unintentionally, due to a series of errors (nonresponse, etc.).

Errors: sampling and nonsampling
The accuracy of the final estimates depends on two types of error:
- error due to sampling (sampling error)
- errors that do NOT depend on the sampling (nonsampling errors)
The sampling error disappears in a census.

Sampling error
It is an intentional error: it is decided to observe just a subset of the units belonging to the population under investigation.
The amount of sampling error depends directly on the sample size: it decreases as the sample size grows. By fixing in advance the maximum acceptable sampling error (the CV, when expressed in relative terms) it is possible to determine the size of the sample to be drawn (or vice versa).
Sampling error disappears if the sample size equals the size of the population (e.g. a census).
In most cases the sampling error can be estimated by using the data collected on the sample.
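The link between a maximum acceptable CV and the sample size can be sketched for the simplest case. The example below assumes simple random sampling without replacement and the estimation of a population mean; the function names and the planning value `pop_cv` (the unit-level coefficient of variation S / Ybar, e.g. taken from a past survey) are illustrative assumptions, not part of the course material.

```python
import math

def srs_sample_size(pop_size, pop_cv, target_cv):
    """Smallest SRS size whose estimated mean has CV <= target_cv.

    pop_cv is S / Ybar for the study variable (an assumed planning value).
    """
    n0 = (pop_cv / target_cv) ** 2      # size needed for an infinite population
    n = n0 / (1 + n0 / pop_size)        # finite population correction
    return min(pop_size, math.ceil(n))

def srs_cv(pop_size, pop_cv, n):
    """CV of the sample mean under SRS without replacement (the 'vice versa')."""
    return pop_cv * math.sqrt((1 - n / pop_size) / n)

n = srs_sample_size(pop_size=10_000, pop_cv=1.0, target_cv=0.05)
print(n, srs_cv(10_000, 1.0, n))
```

With n equal to the population size the factor (1 - n/N) is zero, which reproduces the statement that the sampling error disappears in a census.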

Nonsampling errors
These are errors that can occur at any stage of the production process (data collection, processing, ...).
They can NOT be planned in advance and are difficult to keep under control. Their extent tends to increase with the number of units to be observed and the number of phases in the process.
Estimating their impact on the final estimates requires the application of ad hoc methods or additional surveys (e.g. control surveys).

Nonsampling errors
Typically, nonsampling errors have a greater impact on the final estimates than sampling error.

Trade-off between sampling and nonsampling error
- Small sample: high sampling error, small nonsampling error
- Large sample: small sampling error, high nonsampling error
Optimal survey design: balance between sampling and nonsampling error so as to minimize the total survey error given the budget constraints and the available resources.

Types of nonsampling errors*
Errors of non-observation
- Frame errors: errors in the list (frame) of units belonging to the target population of the survey
- Nonresponse errors: errors due to the non-observation of information on units which should be surveyed
Errors of observation
- Measurement errors: differences between the observed values and the corresponding true values in surveyed units
- Processing errors: errors which are introduced in the collected data during one or more of the data processing phases
*Biemer and Lyberg (2003) consider the “specification error” too.

Frame errors
The frame should contain all and only the units of the target population (eligible units) and, for each unit, the auxiliary information necessary to locate/contact it.
Frame errors arise from discrepancies between the target population and the population listed in the frame due to:
- omissions, erroneous inclusions, duplications, misclassifications of the units in the frame
- incorrect, incomplete, not up-to-date information on units’ characteristics
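As a rough illustration of the first group of discrepancies, the sketch below compares a frame listing with a target population using plain set operations. Both lists of unit identifiers are hypothetical; in practice the target population is not fully known, so a comparison of this kind usually relies on an external source or a coverage survey.

```python
from collections import Counter

def frame_coverage(target_ids, frame_ids):
    """Compare a frame listing with the target population (both hypothetical)."""
    target = set(target_ids)
    counts = Counter(frame_ids)          # how many times each unit is listed
    listed = set(counts)
    return {
        "omissions": sorted(target - listed),              # undercoverage
        "erroneous_inclusions": sorted(listed - target),   # overcoverage
        "duplications": sorted(u for u, c in counts.items() if c > 1),
    }

print(frame_coverage(target_ids=[1, 2, 3, 4], frame_ids=[2, 3, 3, 5]))
```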

Nonresponse errors
Nonresponse error represents a failed attempt to obtain the desired information from an eligible unit (in sample surveys, a unit selected in the sample).
Two types of nonresponse:
- Unit nonresponse corresponds to a failed attempt to obtain any information from a sample/population unit
- Partial (or item) nonresponse occurs when a respondent provides some, but not all, of the information required, or when the information provided cannot be considered valid

Measurement errors
A measurement error is defined as the deviation between the value of a variable observed for a respondent in the data collection phase and the true value of that variable for that unit (sometimes referred to as response error).
Deviations may originate from the data collection mode, the respondent, the interviewer or the questionnaire. They include:
- errors resulting from confusion, ignorance, negligence or dishonesty of the respondent;
- errors attributable to the interviewer, which may be due to poor or inadequate training, a priori expectations about the answers, or deliberate errors;
- errors attributable to the wording of the questions in the questionnaire, the order or the context in which the questions are presented, and the technique used to get the answers.

Processing errors
Processing errors (or measurement errors in a broad sense) are errors that are introduced in the data once they have been collected, during the steps of coding, data entry, editing and imputation, etc., before the final estimates are produced.
Main sources of processing errors:
- Typing errors (data entry and data coding)
- Errors due to misinterpretation (data coding)
- Errors in the localisation (editing) and correction of errors
- Errors in applying a model to the data (processing)

Impact on accuracy of sampling and nonsampling errors
In order to understand how the errors affect the accuracy it is important to identify:
- The quantity to estimate
- The characteristics and the complexity of the production process
- The estimator applied (the function of the available data used to derive the estimate): linear or nonlinear function
- The nature of the error: random or systematic

The nature of the errors
Example in the case of measurement error:
- Systematic term of the error: in a hypothetical large number of repetitions of the measurement, the errors are always in the same direction, below or above the true value. Example: intentional underreporting of income.
- Random term of the error: “fluctuations” around the true measure due to randomness; in a hypothetical large number of repetitions of the measurement, the errors sum up to zero.

Example with random measurement errors
Consider a large number T of repetitions, independent and under the same conditions, of the measurement on a given unit i:
- $\mu_i$: true value
- $y_{it}$: measurement obtained at the t-th occasion
In the presence of just random errors it is expected that $y_{it} \neq \mu_i$, but averaging over the T measurements obtained gives:
$\bar{y}_i = \frac{1}{T} \sum_{t=1}^{T} y_{it} = \mu_i$
The random fluctuations of $y_{it}$ around $\mu_i$ are measured in terms of variance:
$\sigma_i^2 = \frac{1}{T} \sum_{t=1}^{T} (y_{it} - \mu_i)^2$

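Example with systematic measurement errors
In the presence of just systematic errors it is expected that the measurements $y_{it}$ always fall on the same side of the true value $\mu_i$, i.e. $y_{it} > \mu_i$ or $y_{it} < \mu_i$, and averaging over the T measurements obtained gives:
$\bar{y}_i = \frac{1}{T} \sum_{t=1}^{T} y_{it} \neq \mu_i$
The average distance between the observed values and the true value is called bias:
$B_i = \bar{y}_i - \mu_i$
The evaluation of the bias requires knowledge of the true value.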

Summarizing random and systematic measurement errors
In the presence of both random and systematic errors, we have to deal with variance and bias.
They are usually summarized by the Mean Square Error (MSE):
MSE = Variance + [Bias]^2
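A small numerical sketch may help to see how variance, bias and MSE relate for repeated measurements on one unit. The true value and the three sets of measurements below are invented for illustration; the variance is taken around the mean of the measurements and divided by T, so that the identity MSE = Variance + [Bias]^2 holds exactly.

```python
def measurement_summary(true_value, measurements):
    """Variance, bias and MSE of T repeated measurements on one unit."""
    T = len(measurements)
    mean = sum(measurements) / T
    variance = sum((y - mean) ** 2 for y in measurements) / T   # random component
    bias = mean - true_value                                    # systematic component
    mse = sum((y - true_value) ** 2 for y in measurements) / T
    return {"mean": mean, "variance": variance, "bias": bias, "mse": mse}

true_value = 100                                  # hypothetical true value
only_random = [98, 102, 99, 101]                  # errors sum to zero
only_systematic = [90, 90, 90, 90]                # constant underreporting
both = [88, 92, 89, 91]                           # underreporting plus fluctuations

for label, data in [("random", only_random),
                    ("systematic", only_systematic),
                    ("both", both)]:
    print(label, measurement_summary(true_value, data))
```

In the third case the variance (2.5) and the squared bias (100) add up to the MSE (102.5).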

MSE decomposition in surveys
Suppose that the objective of the sample survey is to estimate the population parameter $\theta$. Assume that $\hat{\theta}$ is the estimator chosen to obtain an estimate of $\theta$ with the data available in a given survey occasion.
In the hypothetical case of independent repetitions of the survey process under the same conditions, the MSE associated with $\hat{\theta}$ is:
$MSE(\hat{\theta}) = E_{survey}[(\hat{\theta} - \theta)^2] = Var_{survey}(\hat{\theta}) + [B_{survey}(\hat{\theta})]^2$
where $E_{survey}$ refers to the average over the hypothetical independent repetitions of the survey process under the same conditions.
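For readers who want the intermediate step, the decomposition can be derived as follows. The notation is an assumption made here: $\theta$ is the population parameter, $\hat{\theta}$ the estimator, and $E_{survey}$ the average over hypothetical repetitions of the survey process.

```latex
\begin{aligned}
MSE(\hat{\theta})
  &= E_{survey}\big[(\hat{\theta} - \theta)^2\big] \\
  &= E_{survey}\big[(\hat{\theta} - E_{survey}\hat{\theta}
       + E_{survey}\hat{\theta} - \theta)^2\big] \\
  &= E_{survey}\big[(\hat{\theta} - E_{survey}\hat{\theta})^2\big]
     + 2\,(E_{survey}\hat{\theta} - \theta)\,
       E_{survey}\big[\hat{\theta} - E_{survey}\hat{\theta}\big]
     + (E_{survey}\hat{\theta} - \theta)^2 \\
  &= Var_{survey}(\hat{\theta}) + \big[B_{survey}(\hat{\theta})\big]^2
\end{aligned}
```

The cross term vanishes because $E_{survey}[\hat{\theta} - E_{survey}\hat{\theta}] = 0$, and the bias is $B_{survey}(\hat{\theta}) = E_{survey}\hat{\theta} - \theta$.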

Impact of errors on survey estimates
The impact of random errors (due to sampling or not) can be attenuated by increasing the sample size.
Increasing the number of units to be observed, on the other hand, does not reduce systematic errors. On the contrary, we expect the amount of systematic error to increase when the number of observed units increases.
For this reason, when the objective is to estimate averages, percentages, etc., the main concern is the non-random component, which introduces bias in the estimates.
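The point can be illustrated with a minimal calculation for a sample mean. The sketch assumes independent unit-level random errors with standard deviation `sd` and a systematic error `bias` that stays constant as the sample grows (a simplification, since the bias may in fact increase); the numbers are invented.

```python
import math

def rmse_of_mean(n, sd, bias):
    """Root MSE of a sample mean with random error `sd` and fixed `bias`."""
    variance = sd ** 2 / n              # random component shrinks as 1/n
    return math.sqrt(variance + bias ** 2)

for n in (100, 1_000, 10_000, 100_000):
    print(n, round(rmse_of_mean(n, sd=10.0, bias=0.5), 3))
```

As n grows the root MSE keeps decreasing but levels off at the size of the bias, so beyond a certain point a larger sample no longer improves accuracy.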

Trade-off variance-bias
[Figure with two panels: “Large variance and small bias” and “Large bias and small variance”]

Impact of survey errors in terms of variance and bias
Potential impact of survey errors on the final survey estimates in terms of systematic and variable errors when the estimators are linear functions of observed values.

Impact of errors on survey estimates with nonlinear estimators
For nonlinear estimators (e.g. correlation coefficient, regression estimates, …) it is not simple to identify the most damaging errors. For these types of estimates, both systematic and variable errors can lead to bias.
It is known that in the presence of variable errors the estimates of regression (and correlation) coefficients are attenuated, i.e. the estimated relationship among variables is weaker than it really is.
The effect of systematic errors on the estimation of regression coefficients cannot be easily predicted.
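The attenuation effect can be reproduced with a small simulation. This is a sketch under the classical assumptions (variable error in the explanatory variable only, independent of everything else); the true slope, the variances and the seed are arbitrary choices made for the illustration.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(2013)               # fixed seed so the run is repeatable
n, beta = 20_000, 2.0                   # hypothetical sample size and true slope
x_true = [rng.gauss(0, 1) for _ in range(n)]
y = [beta * x + rng.gauss(0, 1) for x in x_true]
x_obs = [x + rng.gauss(0, 1) for x in x_true]   # variable measurement error, variance 1

slope_true = ols_slope(x_true, y)       # regression on the error-free variable
slope_obs = ols_slope(x_obs, y)         # regression on the mismeasured variable
print(slope_true, slope_obs)
```

With these settings the slope on the mismeasured variable should be close to beta multiplied by var(x) / (var(x) + var(error)) = 0.5, i.e. about half the true slope.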

Reliability
When dealing with outputs from a complex statistical process involving multiple data sources (administrative registers and/or survey data) it can be difficult to assess the accuracy because of the many sampling and nonsampling errors involved in the different sources being considered.
When the different input data sources are available or updated at different times, it is common practice to provide preliminary estimates and then update them when new input data become available.
In this context, one can look at the closeness of the estimates initially released to the estimates released subsequently, since it is reasonable to assume that estimates converge towards the true value as they are based on more and more reliable sources.

Reliability: Data revision
The analysis of the degree of closeness of initial estimates to subsequent or final estimates gives rise to the analysis of revisions.
Assessing reliability is based on the analysis of revisions.
The revisions of estimates being considered are those explicitly foreseen by the revision policy.

Reliability: causes of revisions
Causes of “routine” revisions are:
- Incorporation of more complete, updated and error-free data sources
- Revision due to time series adjustments (seasonal adjustment model)
- Improvements in statistical methods
Causes of exceptional revisions are:
- Change in concepts, classifications, definitions (e.g. rebasing)
- Methodological improvements (e.g. change in the estimation method; use of new data sources, etc.)

Reliability: Data revision indicators
In the analysis of revisions, the interest is in general in:
- Direction of the revisions, usually measured in terms of average size of revisions: a positive sign denotes a tendency to underestimation, a negative sign indicates a tendency to overestimation
- Stability of revisions, usually measured in terms of average size of revisions in absolute terms
Looking at the direction of revisions is a sort of evaluation of bias. Stability is the equivalent of looking at the variance.
The OECD has considerable experience with revision indicators (http://www.oecd.org/std/oecdeurostatguidelinesonrevisionspolicyandanalysis.htm).
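The two indicators can be computed in a few lines. The sketch assumes that a revision is defined as the later estimate minus the initial one, so that a positive mean indicates initial underestimation, consistent with the sign convention above; the function name and the figures are invented.

```python
def revision_indicators(initial, later):
    """Mean revision (direction) and mean absolute revision (stability)."""
    revisions = [l - p for p, l in zip(initial, later)]   # later minus initial
    n = len(revisions)
    return {
        "mean_revision": sum(revisions) / n,
        "mean_absolute_revision": sum(abs(r) for r in revisions) / n,
    }

initial = [1.0, 2.0, 1.5, 0.5]    # hypothetical first releases (e.g. growth rates, %)
later = [1.5, 2.0, 1.0, 1.5]      # hypothetical revised releases
print(revision_indicators(initial, later))
```

Here the mean revision is 0.25 (initial releases tended to be too low) and the mean absolute revision is 0.5.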

Selected references
Biemer, P.P., Lyberg, L.E. (2003) Introduction to Survey Quality. John Wiley & Sons, Hoboken, New Jersey.
Biemer, Groves, Lyberg, Mathiowetz, Sudman (1991) Measurement Errors in Surveys. John Wiley & Sons.
Eurostat (2003) “Definition of Quality in Statistics”. Eurostat Working Group on Assessment of Quality in Statistics, Luxembourg, October 2-3.
Eurostat (2009) “ESS Handbook for Quality Reports”. Methodologies and working papers, Luxembourg.
FCSM (2001) “Measuring and Reporting Sources of Error in Surveys”. Federal Committee on Statistical Methodology, Statistical Policy Working Paper 31. http://www.fcsm.gov/01papers/SPWP31_final.pdf
Lessler, J., Kalsbeek, W. (1992) Nonsampling Errors in Surveys. Wiley, New York.
Särndal, C.E., Swensson, B., Wretman, J. (1992) Model Assisted Survey Sampling. Springer-Verlag, New York.