Published by Aino Salo. Modified over 5 years ago.
1
Errors in Surveys
Training Course «Quality Management and Survey Quality Measurement»
Rome, September 2013
Marcello D’Orazio, Giovanna Brancato
Istat {madorazi,
2
Outline
- Quality definition
- Accuracy
- Sampling and nonsampling errors
- Main sources of nonsampling errors
- Nature of errors
- Bias and variance
- Impact of errors on survey estimates
- Reliability and data revision
3
Eurostat Quality dimensions
The quality of the statistics “produced” by a process is evaluated in terms of:
- Relevance
- Accuracy & Reliability
- Timeliness & Punctuality
- Accessibility & Clarity
- Comparability & Coherence
4
Types of statistical production processes
- Traditional surveys (sample surveys or censuses)
- Processes involving the use of one or more Administrative Registers
- Statistical compilations (e.g. those in the National Accounts division)
5
Errors in the production processes
A statistical production process consists of a series of phases performed more or less in sequence. In each phase there are various potential sources of error that may affect the quality of the final “product” (point estimates, tables, ...). Usually the errors decrease the accuracy of the final statistics, where accuracy is “the difference between the estimate and the true parameter value”: it reflects the closeness between the estimate and the true (unknown) value of a parameter of the population under study. Moreover, the errors affect, directly or indirectly, other components of quality (comparability, timeliness, …).
6
Errors in statistical production processes (cont.)
Users often believe that the quality of the final estimates depends directly on the amount of error in the collected data: few errors in the data = high accuracy of the final estimates. This is only partly true. The quality of the estimates also depends on the data that were NOT collected, either intentionally (because only a sample was observed) or unintentionally, owing to a series of errors (nonresponse, etc.).
7
Errors: sampling and nonsampling
The accuracy of the final estimates depends on two types of error:
- error due to sampling (sampling error)
- errors that do NOT depend on the sampling (nonsampling errors)
The sampling error disappears in a census.
8
Sampling error
It is an intentional error: the decision is made to observe only a subset of the units belonging to the population under investigation. The amount of sampling error depends directly on the sample size: the larger the sample, the smaller the error. By fixing in advance the maximum acceptable sampling error (the coefficient of variation, CV, when expressed in relative terms), it is possible to determine the size of the sample to be drawn (or vice versa). Sampling error disappears if the sample size equals the size of the population (e.g. a census). In most cases the sampling error can be estimated using the data collected on the sample.
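The link between a maximum acceptable CV and the sample size can be sketched as follows. This is a minimal illustration that assumes simple random sampling without replacement and a known (or previously estimated) population mean and standard deviation; the function names `srs_sample_size` and `srs_cv` are hypothetical, and real designs (stratified, clustered) need design-specific formulas.

```python
import math


def srs_cv(pop_size, pop_mean, pop_sd, n):
    """CV of the sample mean under simple random sampling without replacement."""
    # Var(mean) = (1 - n/N) * S^2 / n; the factor (1 - n/N) is the
    # finite population correction, which is zero for a census (n = N).
    variance = (1.0 - n / pop_size) * pop_sd ** 2 / n
    return math.sqrt(variance) / pop_mean


def srs_sample_size(pop_size, pop_mean, pop_sd, target_cv):
    """Smallest n whose CV does not exceed target_cv (assumed SRS design)."""
    # Sample size ignoring the finite population correction.
    n0 = (pop_sd / (target_cv * pop_mean)) ** 2
    # Adjust for the finite population and round up to a whole unit.
    n = n0 / (1.0 + n0 / pop_size)
    return min(pop_size, math.ceil(n))


# Illustrative figures only: N = 10000 units, mean 50, standard deviation 20.
n_needed = srs_sample_size(10000, 50.0, 20.0, target_cv=0.02)
print(n_needed, srs_cv(10000, 50.0, 20.0, n_needed))
```

Read the other way round ("or vice versa"), `srs_cv` gives the CV to expect from a sample size that is already fixed by the budget.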
9
Nonsampling errors
These are errors that can occur at any stage of the production process (data collection, processing, ...). They can NOT be planned in advance and are difficult to keep under control. Their extent tends to increase with the number of units to be observed and the number of phases in the process. Estimating their impact on the final estimates requires the application of ad hoc methods or additional surveys (e.g. control surveys).
10
Nonsampling errors
Typically, nonsampling errors have a greater impact on the final estimates than sampling error.
11
Trade-off between sampling and nonsampling error
- Small sample: high sampling error, small nonsampling error
- Large sample: small sampling error, high nonsampling error
Optimal survey design: a balance between sampling and nonsampling error that minimizes the total survey error given the budget constraints and the available resources.
12
Types of nonsampling errors*
Errors of non-observation
- Frame errors: errors in the list (frame) of units belonging to the target population of the survey
- Nonresponse errors: errors due to the failure to observe information on units that should be surveyed
Errors of observation
- Measurement errors: differences between the observed values and the corresponding true values in the surveyed units
- Processing errors: errors introduced in the collected data during one or more of the data processing phases
*Biemer and Lyberg (2003) consider the “specification error” too.
13
Frame errors
The frame should contain all and only the units of the target population (eligible units) and, for each unit, the auxiliary information necessary to locate or contact it. Frame errors arise from discrepancies between the target population and the population listed in the frame, due to:
- omissions, erroneous inclusions, duplications and misclassifications of the units in the frame
- incorrect, incomplete or out-of-date information on the units’ characteristics
14
Nonresponse errors
Nonresponse error results from a failed attempt to obtain the desired information from an eligible unit (in sample surveys, a unit selected in the sample). There are two types of nonresponse:
- Unit nonresponse is a failed attempt to obtain any information from a sample/population unit
- Partial (or item) nonresponse occurs when a respondent provides some, but not all, of the information required, or when the information provided cannot be considered valid
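The two types of nonresponse can be quantified with simple rates. The sketch below is only illustrative: it assumes unweighted rates, represents a unit nonrespondent as `None` and a missing or invalid item as a `None` value, and the function name `nonresponse_rates` is hypothetical.

```python
def nonresponse_rates(records, items):
    """Unweighted unit and item nonresponse rates.

    records: one entry per eligible unit; None marks unit nonresponse,
             a dict holds the answers of a respondent.
    items:   names of the questionnaire items to check.
    """
    eligible = len(records)
    if eligible == 0:
        raise ValueError("at least one eligible unit is required")
    respondents = [r for r in records if r is not None]
    # Unit nonresponse: share of eligible units with no information at all.
    unit_rate = (eligible - len(respondents)) / eligible
    item_rates = {}
    for item in items:
        # Item nonresponse: a respondent with a missing or invalid answer,
        # computed over the units that did respond.
        missing = sum(1 for r in respondents if r.get(item) is None)
        item_rates[item] = missing / len(respondents) if respondents else None
    return unit_rate, item_rates


# Made-up sample of five eligible units.
sample = [
    {"age": 34, "income": 21000},
    {"age": 51, "income": None},
    None,
    {"age": None, "income": 18000},
    {"age": 29, "income": None},
]
unit_rate, item_rates = nonresponse_rates(sample, ["age", "income"])
print(unit_rate, item_rates)
```

In practice weighted versions of these rates are often reported as well; the unweighted form is used here only to keep the example short.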
15
Measurement errors
A measurement error is defined as the deviation between the value of a variable observed for a respondent in the data collection phase and the true value of that variable for that unit (measurement errors are sometimes referred to as response errors). Deviations may originate from the data collection mode, the respondent, the interviewer or the questionnaire. They include:
- errors resulting from confusion, ignorance, negligence or dishonesty on the part of the respondent;
- errors attributable to the interviewer, which may be due to poor or inadequate training, a priori expectations about the answers, or deliberate errors;
- errors attributable to the wording of the questions in the questionnaire, the order or the context in which the questions are presented, and the technique used to obtain the answers
16
Processing errors
Processing errors (or measurement errors in a broad sense) are errors introduced in the data after they have been collected, during the steps of coding, data entry, editing, imputation, etc., before the final estimates are produced. Main sources of processing errors:
- Typing errors (data entry and data coding)
- Errors due to misinterpretation (data coding)
- Errors in the localisation (editing) and correction of errors
- Errors in applying a model to the data (processing)
17
Impact on accuracy of sampling and nonsampling errors
In order to understand how the errors affect the accuracy it is important to identify:
- The quantity to estimate
- The characteristics and the complexity of the production process
- The estimator applied (the function of the available data used to derive the estimate): linear or nonlinear function
- The nature of the error: random or systematic
18
The nature of the errors
Example in the case of measurement error:
- Systematic term of the error: in a hypothetical large number of repetitions of the measurement, the errors are always in the same direction, below or above the true value. Example: intentional underreporting of income.
- Random term of the error: ‘fluctuations’ around the true measure due to randomness; in a hypothetical large number of repetitions of the measurement, the errors sum to zero.
19
Example with random measurement errors
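Consider a large number T of independent repetitions, under the same conditions, of the measurement on a given unit i, where $y_i$ is the true value and $y_{it}$ is the measurement obtained on the t-th occasion (t = 1, ..., T). In the presence of random errors only, we expect $y_{it} \neq y_i$ in general, but averaging over the T measurements gives $\frac{1}{T}\sum_{t=1}^{T} y_{it} = y_i$, i.e. the errors $y_{it} - y_i$ sum to zero. The random fluctuations of $y_{it}$ around $y_i$ are measured in terms of variance: $\frac{1}{T}\sum_{t=1}^{T} (y_{it} - y_i)^2$.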
20
Example with systematic measurement errors
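In the presence of systematic errors only, we expect the measurements $y_{it}$ of the true value $y_i$ to satisfy either $y_{it} > y_i$ or $y_{it} < y_i$ on every occasion t, so that averaging over the T measurements gives $\frac{1}{T}\sum_{t=1}^{T} y_{it} \neq y_i$. The average distance between the observed values and the true value is called the bias: $\frac{1}{T}\sum_{t=1}^{T} (y_{it} - y_i)$. Evaluating the bias requires knowledge of the true value.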
21
Summarizing random and systematic measurement errors
In the presence of both random and systematic errors, we have to deal with both variance and bias. They are usually summarized by the Mean Square Error (MSE): MSE = Variance + [bias]^2
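The decomposition can be checked numerically on simulated repeated measurements of one unit. The sketch below rests on assumed figures (a true income of 30000, a systematic underreporting of 2000 and random noise with standard deviation 500); the helper name `error_summary` is hypothetical.

```python
import random


def error_summary(measurements, true_value):
    """Bias, variance and MSE of repeated measurements of one true value."""
    T = len(measurements)
    mean = sum(measurements) / T
    # Bias: average signed distance from the true value.
    bias = mean - true_value
    # Variance: random fluctuation around the average measurement.
    variance = sum((y - mean) ** 2 for y in measurements) / T
    # MSE: average squared distance from the true value.
    mse = sum((y - true_value) ** 2 for y in measurements) / T
    return bias, variance, mse


rng = random.Random(2013)  # fixed seed so the run is reproducible
true_income = 30000.0
# Each measurement = true value + systematic term (-2000) + random term.
measurements = [true_income - 2000.0 + rng.gauss(0.0, 500.0) for _ in range(10000)]

bias, variance, mse = error_summary(measurements, true_income)
print(bias, variance, mse, variance + bias ** 2)
```

With these settings the squared bias dominates the MSE, which is why a systematic error of this size matters far more than the random noise around it.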
22
MSE decomposition in surveys
Suppose that the objective of the sample survey is to estimate the population parameter $\theta$, and that $\hat{\theta}$ is the estimator chosen to obtain an estimate of $\theta$ from the data available on a given survey occasion. In the hypothetical case of independent repetitions of the survey process under the same conditions, the MSE associated with $\hat{\theta}$ is $\mathrm{MSE}(\hat{\theta}) = E_{survey}[(\hat{\theta} - \theta)^2] = \mathrm{Var}(\hat{\theta}) + [\mathrm{Bias}(\hat{\theta})]^2$, where $E_{survey}$ denotes the average over those hypothetical independent repetitions of the survey process.
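For readers who want the intermediate step, the survey-level decomposition of the MSE into variance and squared bias can be derived as follows. The notation is an assumption made for this sketch: $\theta$ is the population parameter, $\hat{\theta}$ its estimator, and $E_{survey}$ the average over hypothetical independent repetitions of the survey process.

```latex
\begin{align*}
\mathrm{MSE}(\hat{\theta})
  &= E_{survey}\big[(\hat{\theta} - \theta)^2\big] \\
  &= E_{survey}\big[(\hat{\theta} - E_{survey}[\hat{\theta}]
       + E_{survey}[\hat{\theta}] - \theta)^2\big] \\
  &= E_{survey}\big[(\hat{\theta} - E_{survey}[\hat{\theta}])^2\big]
     + 2\,\big(E_{survey}[\hat{\theta}] - \theta\big)\,
       E_{survey}\big[\hat{\theta} - E_{survey}[\hat{\theta}]\big]
     + \big(E_{survey}[\hat{\theta}] - \theta\big)^2 \\
  &= \mathrm{Var}(\hat{\theta}) + \big[\mathrm{Bias}(\hat{\theta})\big]^2
\end{align*}
```

The cross term vanishes because $E_{survey}[\hat{\theta} - E_{survey}[\hat{\theta}]] = 0$, and $\mathrm{Bias}(\hat{\theta}) = E_{survey}[\hat{\theta}] - \theta$.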
23
Impact of errors on survey estimates
The impact of random errors (due to sampling or not) can be attenuated by increasing the sample size. On the other hand, increasing the number of units to be observed does not reduce systematic errors. On the contrary, we expect the amount of systematic error to increase when the number of observed units increases. For this reason, when the objective is to estimate averages, percentages, etc., the main concern is the non-random component, which introduces bias in the estimates.
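A small simulation illustrates why a larger sample helps against random errors but not against systematic ones. It is a sketch under stated assumptions: values are drawn from a normal distribution with mean 100 and standard deviation 10, every observation carries the same systematic error of -3, and the function names are hypothetical. The systematic error is held constant here, so the possible growth of systematic error with the number of units is not modelled.

```python
import random


def simulate_mean_estimates(n, replications, true_mean, sd, systematic_error, rng):
    """Sample means from repeated surveys of size n with a constant systematic error."""
    estimates = []
    for _ in range(replications):
        # Each observed value = true value (random draw) + systematic error.
        sample = [rng.gauss(true_mean, sd) + systematic_error for _ in range(n)]
        estimates.append(sum(sample) / n)
    return estimates


def bias_and_variance(estimates, true_mean):
    """Bias and variance of the estimates across the repeated surveys."""
    k = len(estimates)
    average = sum(estimates) / k
    variance = sum((e - average) ** 2 for e in estimates) / k
    return average - true_mean, variance


rng = random.Random(42)  # fixed seed so the run is reproducible
small = bias_and_variance(simulate_mean_estimates(25, 2000, 100.0, 10.0, -3.0, rng), 100.0)
large = bias_and_variance(simulate_mean_estimates(400, 2000, 100.0, 10.0, -3.0, rng), 100.0)

print("n = 25:  bias, variance =", small)
print("n = 400: bias, variance =", large)
```

The variance of the estimates shrinks roughly in proportion to 1/n, while the bias stays close to -3 for both sample sizes.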
24
Trade-off variance-bias
[Figure: two panels, “Large variance and small bias” and “Large bias and small variance”]
25
Impact of survey errors in terms of variance and bias
[Table: Potential impact of survey errors on the final survey estimates, in terms of systematic and variable errors, when the estimators are linear functions of observed values]
26
Impact of errors on survey estimates with nonlinear estimators
For nonlinear estimators (e.g. correlation coefficient, regression estimates, …) it is not simple to identify the most damaging errors. For these types of estimates, both systematic and variable errors can lead to bias. It is known that in the presence of variable errors the estimates of regression (and correlation) coefficients are attenuated, i.e. the estimated relationship among variables is weaker than it really is. The effect of systematic errors on the estimation of regression coefficients cannot be easily predicted.
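The attenuation effect can be reproduced with a short simulation. This sketch assumes the simplest setting: a single explanatory variable measured with independent random error, a true slope of 2, and error variance equal to the variance of the true variable, so the fitted slope is expected to be roughly halved. The function name `ols_slope` and all figures are illustrative.

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(7)  # fixed seed so the run is reproducible
n = 20000
true_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
# True relationship: y = 2 * x + noise.
y = [2.0 * v + rng.gauss(0.0, 0.5) for v in true_x]
# Observed x = true x + random (variable) measurement error.
observed_x = [v + rng.gauss(0.0, 1.0) for v in true_x]

slope_true = ols_slope(true_x, y)
slope_observed = ols_slope(observed_x, y)
print(slope_true, slope_observed)
```

In this setting the attenuation factor is var(true x) / (var(true x) + var(error)), which equals 0.5 with the values chosen above.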
27
Reliability
When dealing with outputs from a complex statistical process involving multiple data sources (administrative registers and/or survey data), it can be difficult to assess accuracy because of the many sampling and nonsampling errors involved in the different sources being considered. When the different input data sources become available or are updated at different times, it is common practice to provide preliminary estimates and then update them when new input data become available. In this context, one can look at the closeness of the initially released estimates to the subsequently released estimates, since it is reasonable to assume that estimates converge towards the true value as they are based on more and more reliable sources.
28
Reliability: Data revision
Studying the degree of closeness of initial estimates to subsequent or final estimates gives rise to the analysis of revisions. Assessing reliability is based on the analysis of revisions. The revisions considered are those explicitly foreseen by the revision policy.
29
Reliability: causes of revisions
Causes of ‘routine’ revisions are:
- Incorporation of more complete, updated and error-free data sources
- Revisions due to time series adjustments (seasonal adjustment model)
- Improvements in statistical methods
Causes of exceptional revisions are:
- Changes in concepts, classifications, definitions (e.g. rebasing)
- Methodological improvements (e.g. change in the estimation method, use of new data sources, etc.)
30
Reliability: Data revision indicators
In the analysis of revisions, the interest is generally in:
- The direction of revisions, usually measured by the average revision: a positive sign denotes a tendency to underestimation, a negative sign a tendency to overestimation.
- The stability of revisions, usually measured by the average size of revisions in absolute terms.
Looking at the direction of revisions is a sort of evaluation of bias; looking at stability is the equivalent of looking at the variance. The OECD has considerable experience with revision indicators.
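The two indicators can be computed in a few lines. This is a minimal sketch that assumes a revision is defined as the later estimate minus the initial one, so that a positive mean revision signals initial underestimation; the function name `revision_indicators` and the figures are made up for illustration.

```python
def revision_indicators(initial, later):
    """Mean revision (direction) and mean absolute revision (stability)."""
    if len(initial) != len(later) or not initial:
        raise ValueError("need two non-empty series of equal length")
    # Revision = later estimate - initial estimate (assumed sign convention).
    revisions = [l - p for p, l in zip(initial, later)]
    n = len(revisions)
    mean_revision = sum(revisions) / n
    mean_absolute_revision = sum(abs(r) for r in revisions) / n
    return mean_revision, mean_absolute_revision


# Hypothetical growth rates: first release and later release.
initial_estimates = [1.0, 2.0, 0.5, 1.5]
later_estimates = [1.5, 2.0, 1.0, 1.0]

mean_rev, mean_abs_rev = revision_indicators(initial_estimates, later_estimates)
print(mean_rev, mean_abs_rev)
```

A mean revision close to zero together with a large mean absolute revision would indicate initial estimates that are not biased on average but are unstable.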
31
Selected references
Biemer, P.P., Lyberg, L.E. (2003). Introduction to Survey Quality. John Wiley & Sons, Hoboken, New Jersey.
Biemer, P.P., Groves, R.M., Lyberg, L.E., Mathiowetz, N.A., Sudman, S. (1991). Measurement Errors in Surveys. John Wiley & Sons.
Eurostat (2003). “Definition of Quality in Statistics”. Eurostat Working Group on Assessment of Quality in Statistics, Luxembourg, October 2-3.
Eurostat (2009). “ESS Handbook for Quality Reports”. Methodologies and working papers, Luxembourg.
FCSM (2001). “Measuring and Reporting Sources of Error in Surveys”. Federal Committee on Statistical Methodology, Statistical Policy Working Paper.
Lessler, J., Kalsbeek, W. (1992). Nonsampling Errors in Surveys. Wiley, New York.
Särndal, C.E., Swensson, B., Wretman, J. (1992). Model Assisted Survey Sampling. Springer-Verlag, New York.