MEASUREMENT OF THE QUALITY OF STATISTICS

1 MEASUREMENT OF THE QUALITY OF STATISTICS
Quality of Administrative Data
Orietta Luzi, Istat – Department for National Accounts and Economic Statistics

2 Quality of Administrative Data (1)
Definitions. An administrative record can be defined as a piece of information related to an individual person, business or other entity (e.g. a document stored in a file). An administrative register (AR) is a set of administrative records organised according to a given criterion, usually in a computer database. Data in ARs are collected by government agencies or other institutions for administrative purposes, most often to meet requirements of registering particular events (births and deaths) or administering benefits (pensions) or duties (taxation)…

3 Quality of Administrative Data (2)
These data are collected with a specific, non-statistical aim; the primary aim is to identify the unit corresponding to each record in the AR. In recent years, statisticians have shown growing interest in administrative registers and in using them, directly or indirectly, in the production of statistics. ARs already exist: they require neither a costly data collection phase nor any response burden on respondents. Moreover, technological innovations have made it possible to overcome past limitations in the storage and processing of large databases.

4 Quality of Administrative Data (3)
Unfortunately, ARs are not originally developed for statistical purposes, so using them to this end requires a further processing phase, which sometimes cannot remedy basic defects (e.g. the absence of some variables). Elvers (2002) gives a summary list of the possible advantages and disadvantages of using ARs.

5 Quality of Administrative Data (4)
Advantages
+ lower response burden and costs
+ lower production costs
+ no sampling error (compared to sample surveys)
+ statistics for small areas and groups, produced more frequently in time
+ some variables observed more accurately (e.g. income)
+ broader subject coverage by linking data
+ use of data from previous years (if available and "linkable")
+ possibilities for longitudinal studies

6 Quality of Administrative Data (5)
Disadvantages
- possible problems with privacy
- limited number of available variables
- different definitions of the variables
- possible changes in the definitions due to changes in legislation
- dependence on other entities for data collection and editing
- delays in reporting and in making the registers available

7 Quality of Administrative Data (6)
The current definition of quality for statistics puts emphasis on users' needs. Speaking about quality when administrative registers are used for statistical purposes requires identifying the possible purposes. Petterson (1992) identifies five:
1. Statistics based entirely on ARs.
2. Use of administrative records to supplement sample survey data.
3. Use of administrative records to evaluate sample survey data.
4. Use of an AR as a sampling frame.
5. Linking of administrative records to investigate particular phenomena.

8 Quality of Administrative Data (7)
Point 1. The ARs are used to estimate directly the characteristics of the studied population. This is the most demanding task.
Point 2. Sample survey data can be supplemented with administrative records:
- to make auxiliary variables available for post-stratification, or to use them in the survey estimation process (a minimal weighting sketch follows below);
- to impute missing values;
- to collect information on the characteristics of unit nonrespondents.
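The following minimal sketch illustrates how register counts could feed post-stratification: the strata, the register totals and the sample sizes are illustrative assumptions, not values from the presentation.

```python
# Hypothetical sketch: post-stratification weights from register counts.
# Stratum labels, register totals (N_h) and realised sample sizes (n_h) are assumptions.
register_counts = {"age_0_29": 18000, "age_30_59": 25000, "age_60_plus": 12000}
sample_counts   = {"age_0_29": 180,   "age_30_59": 300,   "age_60_plus": 95}

# Post-stratified weight for stratum h: w_h = N_h / n_h
weights = {h: register_counts[h] / sample_counts[h] for h in register_counts}

for h, w in weights.items():
    print(f"{h}: weight {w:.1f}")
```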

9 Quality of Administrative Data (8)
Zanutto and Zaslavsky (2002) investigated the use of administrative records to impute missing values in survey data. They warn against directly substituting missing values with the administrative record: if the administrative records are affected by errors, their use as substitutes for missing values is likely to introduce an uncontrolled bias into the final estimates. Zanutto and Zaslavsky suggest instead using the administrative records as covariates in the models introduced to deal with nonresponse. This usage requires linking the administrative records with the survey data.
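A minimal sketch of this idea (not the authors' own method) is shown below: a linked administrative variable is used as a covariate in a simple imputation model for a survey variable with nonresponse. The variable names and values are assumptions.

```python
# Minimal sketch: use an administrative variable as a covariate in an
# imputation model, instead of substituting the raw AR value directly.
import numpy as np

admin_income  = np.array([21.0, 34.0, 28.0, 45.0, 39.0, 52.0])      # from the linked AR
survey_income = np.array([20.5, 33.0, np.nan, 44.0, np.nan, 50.0])  # survey values, with nonresponse

observed = ~np.isnan(survey_income)
# Fit a simple linear model survey ~ admin on the respondents only.
slope, intercept = np.polyfit(admin_income[observed], survey_income[observed], deg=1)

# Impute nonrespondents from the model.
imputed = survey_income.copy()
imputed[~observed] = intercept + slope * admin_income[~observed]
print(imputed)
```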

10 Quality of Administrative Data (9)
Point 3. The evaluation of sample survey data by means of administrative records can be performed at two levels:
- aggregate level
- microdata level
At the aggregate level, estimates, averages etc. for subgroups of units are compared. At the microdata level, two types of comparisons can be done:
- direct record checks
- reverse record checks

11 Quality of Administrative Data (10)
Direct record checks: the sample data are matched against the administrative records. This comparison aims at investigating the response and processing errors (broad measurement errors) that occurred in the survey. These checks require a linkage step; if the linkage is not error-free, much care should be used in making comparisons. Neter et al. (1965) investigated the effect of matching errors on the estimation of response bias and response variance.
Reverse record checks: a sample of administrative records is selected, and these units are included as part of the sample to be observed in a survey.
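As a rough illustration of a direct record check (assuming an already linked, error-free set of pairs), the matched survey and administrative values can be compared and the discrepancy summarised. The identifiers and values below are assumptions.

```python
# Illustrative direct record check on matched (survey, admin) pairs.
matched_pairs = [
    # (unit_id, survey_value, admin_value)
    (1, 20.0, 21.0),
    (2, 33.0, 33.0),
    (3, 27.5, 28.0),
    (4, 46.0, 44.0),
]

differences = [survey - admin for _, survey, admin in matched_pairs]
bias = sum(differences) / len(differences)                        # estimated mean response bias
discrepancy_rate = sum(d != 0 for d in differences) / len(differences)
print(f"estimated bias: {bias:.2f}, fraction of discrepant records: {discrepancy_rate:.2f}")
```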

12 Quality of Administrative Data (11)
Point 4. The use of ARs as sampling frames is very common. This issue has been explored in the section on coverage errors.
Point 5. The linkage of ARs with survey data can be performed to study particular phenomena, such as a given disease, or to investigate the characteristics of small subgroups of population units.

13 From Administrative Registers to Statistical Registers (1)
An administrative register can rarely be used for statistical purposes as it is: various processing steps must be carried out before it can serve the statistical purposes listed above. One or more Administrative Registers must be transformed into a Statistical Register (SR) in order to be used for statistical purposes. A model has been proposed by Statistics Sweden (1999 and 2001); it refers to the use of an SR to produce statistics directly.

14 From Administrative Registers to Statistical Registers (2)

15 From Administrative Registers to Statistical Registers (3)
When dealing with administrative data, the evaluation of product quality can be carried out at different levels:
Level I) the AR released by an agency to external users;
Level II) the SR obtained by processing one or more ARs;
Level III) the statistics computed directly from an SR (when it is used for this purpose).

16 From Administrative Registers to Statistical Registers (4)
If the focus is on the SR, we can say that the more it satisfies the users' needs (the statisticians'), the higher its quality. Obviously, the quality of the final Statistical Register will depend mainly on:
1) the "quality" of the original AR (or ARs);
2) the "quality" of the processing steps.

17 “Quality” of the original AR (1)
RELEVANCE
The capability of the AR to cover all the user's needs. "Does the register contain all the variables needed to use it as a sampling frame?" The processing needed to transform one or more ARs into the SR can increase the relevance of the final result: variables not observed in an AR can be extracted from another AR, referring to the same population, that contains them. To this aim, a link among the ARs has to be created.

18 “Quality” of the original AR (2)
TIMELINESS
The time lag between the date to which records in the AR refer and the date the SR is available for statistical use. Two components can be distinguished:
1) the interval between the date to which the data in the AR refer and the date the AR is made available by the Administration to external users;
2) the time it takes to process one AR to derive the final SR.
Component (1) is strictly related to the time it takes before an event is registered and to how often the data are updated. Moreover, it can take a long time to obtain the various ARs from the different agencies that keep them.

19 “Quality” of the original AR (3)
ACCESSIBILITY
It may be difficult to evaluate for the original AR. What should be evaluated is how easy it is for an external user to:
- know that an AR is available for external use
- find it
- use it
Accessibility of an AR depends mainly on:
- its cost
- the availability of auxiliary information to better understand its contents, the definitions of variables/objects, the processing steps carried out, …

20 “Quality” of the original AR (4)
COMPARABILITY
The extent to which data contained in an AR can be compared over time and across space. It strictly depends on:
- administrative procedures;
- definitions.
Comparability can be seriously affected if the definitions or the administrative procedures are changed (e.g. because of changes in legislation). Co-operation with the suppliers of the AR can help in planning these changes in the administrative system.

21 “Quality” of the original AR (5)
COHERENCE
Difficult to define.

22 “Quality” of the original AR (6)
ACCURACY - 1
Accuracy usually refers to the closeness between an estimate and the true, unknown parameter value. There is no clear definition of the accuracy of an AR; it is better to analyse the factors that affect accuracy.
Sampling error: in ARs, sampling error does not exist.

23 “Quality” of the original AR (7)
ACCURACY - 2
Nonsampling error: coverage. If ARs are considered, two levels of evaluation exist:
- coverage with respect to the "administrative" population;
- coverage with respect to the "statistical" population.
The coverage of the final SR should be evaluated only at the second level.

24 “Quality” of the original AR (8)
ACCURACY - 3: Coverage
"Administrative" population: the units (individuals or businesses) for which the administrative function is relevant.
"Statistical" population: the units (individuals or businesses) that have to be investigated for statistical purposes.
There are fewer problems when the two populations coincide. An AR is believed to cover the administrative population well: if a unit (business or household) is not in the AR, no actions (taxation, insurance, …) can be performed for it.

25 “Quality” of the original AR (9)
ACCURACY - 4: Coverage
On the other hand, individuals may have an incentive to avoid inclusion in the register (to evade taxation, illegal immigration, illegal buildings, …). ARs that are not updated may contain a certain amount of overcoverage: units that no longer belong to the administrative population. When the administrative population is a subset of the statistical population, linkage with other registers, when possible, may solve the undercoverage problem. The coverage of an AR can be evaluated only by means of comparisons with other external sources (aggregate comparisons, case-by-case matching, …) or by checks on a sample basis.
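A minimal sketch of such a comparison, assuming an error-free common identifier between the AR and an external source, could compute under- and overcoverage as set differences. The identifiers are purely illustrative.

```python
# Sketch: evaluating AR coverage against an external source by matching unit identifiers.
ar_ids       = {"U01", "U02", "U03", "U05", "U09"}   # units in the administrative register
external_ids = {"U01", "U02", "U04", "U05", "U06"}   # units in the external (statistical) source

undercoverage = external_ids - ar_ids   # in the statistical population, missing from the AR
overcoverage  = ar_ids - external_ids   # in the AR, but not in the statistical population

print(f"undercoverage rate: {len(undercoverage) / len(external_ids):.2f}")
print(f"overcoverage rate:  {len(overcoverage) / len(ar_ids):.2f}")
```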

26 “Quality” of the original AR (10)
ACCURACY - 5: Nonresponse
Item nonresponse: an AR rarely contains missing values for the variables that are most important from the administrative viewpoint. The fraction of missing values can be high for variables considered unimportant for administrative purposes (a small measurement sketch follows below).
Unit nonresponse: not easy to define! Usually it refers to units with a high rate of missing values, or with missing values for the most important variables. (Units not included in an AR represent a coverage problem rather than a unit nonresponse problem.)
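The sketch below measures item nonresponse as the fraction of missing values per variable; the records, field names and the convention that None marks a missing value are assumptions.

```python
# Sketch: item nonresponse rate per variable in a small, hypothetical AR extract.
records = [
    {"tax_code": "A1", "taxable_income": 25000, "education": None},
    {"tax_code": "A2", "taxable_income": 31000, "education": "secondary"},
    {"tax_code": "A3", "taxable_income": None,  "education": None},
]

for var in ("taxable_income", "education"):
    missing = sum(rec[var] is None for rec in records)
    print(f"{var}: item nonresponse rate {missing / len(records):.2f}")
```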

27 “Quality” of the original AR (11)
ACCURACY - 6: Response errors
Errors in the observed values due to false declarations or to errors in the observation process. ARs are expected to show a high fraction of true responses for the variables that are more important from the administrative viewpoint: respondents may give more accurate responses to administrative agencies, and for the same reason there may be fewer missing values. Less accuracy can be expected for variables not viewed as important. Editing techniques can help in identifying inconsistencies in the data, and therefore the values that have a high probability of being wrong.
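A small sketch of such editing is given below: simple edit rules flag records whose values are mutually inconsistent. The rules and field names are illustrative assumptions, not rules from the presentation.

```python
# Sketch: edit rules flagging inconsistent records in a hypothetical AR extract.
records = [
    {"id": 1, "age": 34, "employment_status": "employed",   "annual_income": 28000},
    {"id": 2, "age": 7,  "employment_status": "employed",   "annual_income": 15000},
    {"id": 3, "age": 45, "employment_status": "unemployed", "annual_income": -500},
]

def edit_checks(rec):
    """Return the list of edit rules violated by a record."""
    failures = []
    if rec["age"] < 15 and rec["employment_status"] == "employed":
        failures.append("employed but below working age")
    if rec["annual_income"] < 0:
        failures.append("negative income")
    return failures

for rec in records:
    problems = edit_checks(rec)
    if problems:
        print(rec["id"], problems)
```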

28 “Quality” of the original AR (12)
ACCURACY - 7: Processing errors
Two levels:
1. "administrative processing": the operations carried out at the administrative level to produce the AR;
2. "statistical processing": the processing steps carried out to transform one or more ARs into an SR.
It may be difficult to understand how administrative processing has affected the data in the AR: sometimes no documentation exists about the processing steps carried out on the AR, or it may not be available to external users.

29 Evaluating the Accuracy of AR (1)
In evaluating the accuracy of an SR, it is better to focus on the "statistical" process carried out to transform an AR into an SR. Several processing steps can be necessary to obtain an SR starting from one or more ARs:
- variable harmonisation
- unit harmonisation
- checking of basic data (editing)
- missing value handling
- linking, matching and joint processing
- processing of time references
- creation of derived objects
- creation of derived variables

30 Evaluating the Accuracy of AR (2)
Variable and unit harmonisation: carried out when the variables and/or the units in different registers are not defined using the same classification.
Editing of basic data: necessary to identify possible inconsistencies in the registers' data. Micro-data editing is preferred to macro editing.
Handling of missing data: analogous to what is done in the context of survey data, and the same imputation methods can be used. Imputation may improve when it is possible to link different registers: the same register referred to different times, or different registers with common variables.

31 Evaluating the Accuracy of AR (3)
Linking and matching – 1
Register limitations (limited coverage, limited number of variables) can be overcome by combining different registers pertaining to the same target population but maintained by different agencies. Combining registers is a relatively simple operation if the units within them are exactly identified by a unique code, recorded without errors. In this case the ARs are combined by means of merging or exact matching procedures, as in the sketch below.
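The following sketch shows exact matching of two hypothetical ARs on the same business population through an assumed error-free unique identifier; the register names and contents are illustrative.

```python
# Sketch of exact matching: two ARs are combined through a unique, error-free identifier.
tax_register = {
    "IT001": {"turnover": 120_000},
    "IT002": {"turnover": 80_000},
}
employment_register = {
    "IT001": {"employees": 12},
    "IT003": {"employees": 4},
}

# Exact match: keep only the identifiers present in both registers and merge their variables.
combined = {
    unit_id: {**tax_register[unit_id], **employment_register[unit_id]}
    for unit_id in tax_register.keys() & employment_register.keys()
}
print(combined)  # {'IT001': {'turnover': 120000, 'employees': 12}}
```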

32 Evaluating the Accuracy of AR (4)
Linking and matching – 2
A probabilistic linkage should be performed when no unique identifiers are available or when the identifier is affected by errors. Such a procedure (called record linkage) aims at identifying pairs of administrative records that correspond to the same entity (individual or business). It is based on comparisons of the set of matching variables (name, surname, address, …) common to both registers to be integrated. Record linkage can also be used to identify duplicate units within a register. Linking two or more registers can help in identifying possible measurement errors for some variables (inconsistencies between the same variable observed in two different ARs).
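A highly simplified sketch of the comparison step is given below: candidate pairs are scored by exact agreement on the matching variables and a threshold separates likely matches from non-matches. A real record-linkage system would instead use the Fellegi-Sunter model with estimated agreement weights and approximate string comparison; the records and the threshold here are assumptions.

```python
# Simplified linkage sketch: score candidate pairs by agreement on matching variables.
register_a = [{"name": "anna",  "surname": "rossi",   "address": "via roma 1"}]
register_b = [{"name": "anna",  "surname": "rossi",   "address": "via roma 1"},
              {"name": "marco", "surname": "bianchi", "address": "corso italia 5"}]

MATCHING_VARIABLES = ("name", "surname", "address")

def agreement_score(rec_a, rec_b):
    """Count the matching variables on which the two records agree exactly."""
    return sum(rec_a[v] == rec_b[v] for v in MATCHING_VARIABLES)

for rec_a in register_a:
    for rec_b in register_b:
        score = agreement_score(rec_a, rec_b)
        status = "likely match" if score >= 2 else "non-match"   # assumed threshold
        print(rec_b, score, status)
```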

33 Evaluating the Accuracy of AR (5)
Processing of time references: the information concerning the time references of the different register events should be processed and the corresponding result included in the final SR. This is crucial information when using registers for statistical purposes.

34 Evaluating the Accuracy of AR (6)
Creating derived objects: the elementary administrative records are processed to form the derived objects for which statistics are requested. For instance, administrative records referring to single individuals can be processed to derive households. For businesses, the levels of objects can be: Juridical Unit, Business Unit, Operational Unit, Local Unit.
Creating derived variables: the operation of building a statistically meaningful variable by processing a number of related administrative variables. A small sketch of both operations follows.
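The sketch below groups individual records into household objects through a hypothetical household identifier and computes a derived variable (household size); the field names and records are assumptions.

```python
# Sketch: derive household objects from individual records and a derived variable (size).
from collections import defaultdict

individuals = [
    {"person_id": 1, "household_id": "H1", "age": 42},
    {"person_id": 2, "household_id": "H1", "age": 40},
    {"person_id": 3, "household_id": "H1", "age": 9},
    {"person_id": 4, "household_id": "H2", "age": 71},
]

households = defaultdict(list)
for person in individuals:
    households[person["household_id"]].append(person)

# Derived variable: household size.
household_size = {hid: len(members) for hid, members in households.items()}
print(household_size)  # {'H1': 3, 'H2': 1}
```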

35 A model of quality assurance for register-based statistics

36 A model of quality assurance for register-based statistics
This model, introduced by Statistics Sweden, describes the different steps needed to derive a statistical register. Each step must be identified and documented. Documentation and metadata play an important role when, at the end, the statistical register is processed to derive statistics.

37 References
Blom, E., and Carlsson, F. (1999) Registers in Official Statistics: A Swedish Perspective. Joint ECE/Eurostat Work Session on Registers and Administrative Records for Social and Demographic Statistics, Geneva, 1-3 March, Working Paper N.
Elvers, E. (2002) Comparison of survey and register statistics. International Conference on Improving Surveys, ICIS.
EUROSTAT (2003) Business Register Recommendations Manual – 2003 Edition. European Communities, Luxembourg.
Lahiri, P., and Larsen, M. (2004) Regression analysis with linked data. Dept. of Statistics Preprint, 04-9, Iowa State University.
Neter, J., Maynes, E.S., and Ramanathan, R. (1965) The effect of mismatching on the measurement of response errors. Journal of the American Statistical Association, 60.
Petterson, H. (1992) Quality Control in Statistics from Administrative Registers and Records. EUSTAT, Cuaderno 26.
Scheuren, F., and Winkler, W. E. (1993) Regression analysis of data files that are computer matched. Survey Methodology, 19.
Scheuren, F., and Winkler, W. E. (1997) Regression analysis of data files that are computer matched – Part II. Survey Methodology, 23. Papers/scheuren_part2.pdf
Statistics Canada (1998) Quality Guidelines, 3rd Edition. Ottawa. bnc.ca/100/201/301/statcan/stats_can_quality_guide/
Statistics Finland (2002) Quality Guidelines for Official Statistics. Helsinki.

38 References (continued)
Statistics Sweden (2001) "The future development of the Swedish register system". R&D Reports, 2001:1.
Wallgren, A., and Wallgren, B. (2007) Register-based Statistics: Administrative Data for Statistical Purposes. John Wiley & Sons.
Winkler, W. E. (1995) Matching and Record Linkage. In B. G. Cox et al. (eds.) Business Survey Methods, Wiley, New York.
Winkler, W. E. (1999a) The State of Record Linkage and Current Research Problems. Statistical Society of Canada, Proceedings of the Survey Methods Section.
Winkler, W. E. (1999b) Issues with Linking Files and Performing Analyses on the Merged Files. Proceedings of the Sections on Government Statistics and Social Statistics, American Statistical Association.
Zanutto, E., and Zaslavsky, A. (2002) Using administrative records to impute for nonresponse. In Groves, R.M., Dillman, D.A., Eltinge, J.L., and Little, R.J.A. (eds.) Survey Nonresponse. Wiley, New York. stat.wharton.upenn.edu/~zanutto/bio/papers/zanutto.zaslavsky.adrec.public.pdf

