Published by Franklin Mosley. Modified over 5 years ago.
1
Preliminaries Training Course «Statistical Matching» Rome, 6-8 November 2013
Mauro Scanu Dept. Integration, Quality, Research and Production Networks Development, Istat scanu [at] istat.it
2
Notation and statistical matching phases
Outline
- Notation and statistical matching phases
  - Notation
  - Matching phases
- Harmonising the sources
  - Metadata consistency
  - Accuracy
  - Statistical coherence
3
Notation
- target variables (Y, Z): variables that are not jointly observed but whose joint distribution is the target
- common variables: variables available in both samples
- matching variables (X): the subset of the common variables used for “building” joint information on the target variables
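The notation can be made concrete with a small sketch. It assumes the usual statistical matching setting, in which one sample observes (X, Y) and the other observes (X, Z). The toy records, the variable names and the choice of `age_class` as matching variable are all hypothetical and serve only as illustration.

```python
# Hypothetical toy samples: A observes (X, Y), B observes (X, Z).
sample_a = [
    {"age_class": "30-44", "sex": "F", "income": 28000},
    {"age_class": "45-64", "sex": "M", "income": 35000},
]
sample_b = [
    {"age_class": "30-44", "sex": "M", "expenditure": 21000},
    {"age_class": "65+", "sex": "F", "expenditure": 15000},
]


def variable_names(sample):
    """Return the set of variable names observed in a sample."""
    names = set()
    for record in sample:
        names.update(record)
    return names


vars_a = variable_names(sample_a)
vars_b = variable_names(sample_b)

# Common variables: available in both samples.
common_variables = sorted(vars_a & vars_b)

# Candidate target variables: observed in one sample only, never jointly.
target_y = sorted(vars_a - vars_b)
target_z = sorted(vars_b - vars_a)

# Matching variables X: a subset of the common variables.
# Which subset to use is a statistical decision; this choice is illustrative.
matching_variables = ["age_class"]
assert set(matching_variables) <= set(common_variables)

print(common_variables, target_y, target_z)
```

Variables observed in only one sample are merely candidates here; which of them are actual targets depends on the purpose of the matching exercise.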
4
Matching phases
5
Harmonising the sources: 3 main topics
To harmonise the two data sources, it is necessary to:
- check for metadata consistency;
- check for overall accuracy;
- check for statistical coherence.
6
1) Metadata consistency
There are some very fortunate cases. One example was the Dutch Household Survey on Living Conditions (POLS), a survey composed of different modules:
- first module: a questionnaire on demographic aspects (age, nationality, birth place, ...) and socio-economic aspects (education, income, ...);
- second module: screening questions on living conditions;
- third module: specific questionnaires on different aspects of living conditions (culture, work, health, ...).
The first two questionnaires were filled in by all the units in the sample. The sample was then split into distinct subsamples, and each subsample answered only one specific questionnaire on living conditions.
7
1) Metadata consistency
If joint information on a pair of variables observed in different subsamples is of interest, statistical matching can be used. In a case such as POLS, the metadata are already perfectly harmonised.
In general, in order to harmonise metadata, these steps should be considered:
[a] harmonisation of unit definition
[b] harmonisation of reference periods
[c] completeness of the reference populations
[d] variable harmonisation
[e] classification harmonisation
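Steps [d] and [e] often come down to recoding a variable from each source into one shared classification. The following is a minimal sketch of that idea. The education codes, the three-level common classification and the `harmonise` helper are hypothetical and not taken from any actual survey.

```python
# Hypothetical source-specific education codes mapped to a common,
# coarser classification shared by both sources.
EDUCATION_A_TO_COMMON = {
    "primary": "low",
    "lower secondary": "low",
    "upper secondary": "medium",
    "bachelor": "high",
    "master or higher": "high",
}
EDUCATION_B_TO_COMMON = {1: "low", 2: "medium", 3: "high"}


def harmonise(records, variable, mapping):
    """Return copies of the records with `variable` recoded via `mapping`.

    An unmapped category raises KeyError, so gaps in the mapping
    are reported instead of being silently passed through.
    """
    return [{**record, variable: mapping[record[variable]]} for record in records]


sample_a = [{"id": 1, "education": "bachelor"}, {"id": 2, "education": "primary"}]
sample_b = [{"id": 7, "education": 2}, {"id": 8, "education": 3}]

harmonised_a = harmonise(sample_a, "education", EDUCATION_A_TO_COMMON)
harmonised_b = harmonise(sample_b, "education", EDUCATION_B_TO_COMMON)

print([r["education"] for r in harmonised_a])
print([r["education"] for r in harmonised_b])
```

Recoding to the coarser of the two classifications loses detail, which is the usual price of making a common variable comparable across sources.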
8
2) Overall accuracy
The selection of the matching variables from the set of common variables is essentially a statistical phase, which will be illustrated later. A "common sense" rule is to use only high-accuracy variables. In this sense, it is preferable to select the matching variables among those that are completely observed. If this is not possible, the number of missing items should be minimal.
As far as missing data treatment is concerned, it is preferable to avoid imputed data. If necessary:
- split the data set according to the missing data structure;
- use different statistical matching strategies (for the selection of the matching variables) in the different sub-data sets.
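The splitting step can be sketched as follows. This is a minimal illustration that assumes missing items are coded as `None`; the records, the variable names and the `split_by_missing_pattern` helper are hypothetical.

```python
from collections import defaultdict


def split_by_missing_pattern(records, variables):
    """Group records by which of the given variables are missing.

    Each key is the tuple of variable names that are missing (None or
    absent) in the records of that group; the empty tuple is the
    completely observed group.
    """
    groups = defaultdict(list)
    for record in records:
        pattern = tuple(v for v in variables if record.get(v) is None)
        groups[pattern].append(record)
    return dict(groups)


common_variables = ["age_class", "sex", "education"]
records = [
    {"age_class": "30-44", "sex": "F", "education": "high"},
    {"age_class": None, "sex": "M", "education": "low"},
    {"age_class": "45-64", "sex": "M", "education": None},
    {"age_class": "65+", "sex": "F", "education": "medium"},
]

subsets = split_by_missing_pattern(records, common_variables)

for pattern, group in subsets.items():
    # Only variables observed for every record of the group remain
    # candidates as matching variables in that sub-data set.
    candidates = [v for v in common_variables if v not in pattern]
    print(pattern, len(group), candidates)
```

Each sub-data set can then be given its own selection of matching variables, restricted to the variables that are fully observed within it.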
9
3) Statistical coherence
A second "common sense" rule relates to the statistical information contained in the matching variables. In order to have meaningful results, the statistical content of the matching variables should be the same in the two sources, as if they referred to the same population. More precisely, it is important to check whether the distribution of the common variables is the same in both. To do this, one can use:
- statistical tests for "homogeneity" (not appropriate for large data sets, and care should be taken with complex designs);
- ad hoc rules (for instance, cell relative frequencies should not differ by more than 5%).
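The ad hoc rule can be sketched as a simple comparison of relative frequencies. The sketch reads the 5% threshold as an absolute difference of 0.05 between cell relative frequencies, which is an assumption, since the rule could also be read as a relative difference. It uses unweighted counts, so with complex designs survey weights would need to be taken into account. The data and function names are hypothetical.

```python
from collections import Counter


def relative_frequencies(values):
    """Return the relative frequency of each category (values must be non-empty)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def flag_incoherent_cells(values_a, values_b, threshold=0.05):
    """Return the categories whose relative frequencies in the two samples
    differ, in absolute value, by more than `threshold`."""
    freq_a = relative_frequencies(values_a)
    freq_b = relative_frequencies(values_b)
    flagged = {}
    for category in sorted(set(freq_a) | set(freq_b)):
        diff = abs(freq_a.get(category, 0.0) - freq_b.get(category, 0.0))
        if diff > threshold:
            flagged[category] = diff
    return flagged


# Hypothetical distributions of two common variables in samples A and B.
sex_a = ["F"] * 52 + ["M"] * 48
sex_b = ["F"] * 50 + ["M"] * 50
age_a = ["young"] * 30 + ["old"] * 70
age_b = ["young"] * 45 + ["old"] * 55

print(flag_incoherent_cells(sex_a, sex_b))
print(flag_incoherent_cells(age_a, age_b))
```

In this toy data the distribution of sex passes the rule, while both age cells are flagged, which would suggest harmonising or dropping that variable before using it for matching.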
10
Selected references
Laan, P. van der, Integrating Administrative Registers and Household Surveys. Netherlands Official Statistics, Vol. 15 (Summer 2000): Special Issue, Integrating Administrative Registers and Household Surveys, eds. P.G. Al and B.F.M. Bakker, pp