Published by Justin Roberts. Modified over 9 years ago.
1
Eurostat
Statistical matching when samples are drawn according to complex survey designs
Training Course «Statistical Matching», Rome, 6-8 November 2013
Mauro Scanu, Dept. Integration, Quality, Research and Production Networks Development, Istat
scanu [at] istat.it
2
Outline
Renssen: calibration
  – What does CIA mean?
  – Estimates under the CIA
      Macro approach
      Micro approach
  – Auxiliary information: file C
      Incomplete two-way stratification
      Synthetic two-way stratification
Rubin (1986): File concatenation
Weight-split algorithm
3
The problem
The presence of survey weights is usually a problem in the statistical matching context: should we use survey weights or not? The answer is: yes! However, survey weights can be included in a statistical matching procedure in different ways. There are essentially two approaches:
Renssen (1998): the two samples are made as homogeneous as possible in their statistical content. This approach is mainly based on calibration procedures.
Rubin (1986): this approach is more traditional, in the sense that it reconstructs a unique sample A ∪ B with a unique system of survey weights.
Let's start from Renssen's approach, which is easily comparable with the techniques already shown for i.i.d. samples in the last two days.
4
The CIA in a finite population context: the case of two data archives
Let A and B be two archives on the same population of N units, observing some common variables X and some specific variables: Y in A and Z in B.
The records in A and B have no identifiers (PIN), and the common variables X cannot be considered as unit identifiers.
This is still a statistical matching problem (examples are in DeGroot et al. (1971)).
5
Notation
Let s=1,…,N denote the units in the population. Assume that X, Y, and Z are categorical, with I, J, and K categories respectively. The categories taken by each unit in the population are described by these vectors
6
Notation
7
The statistical matching problem
As in the i.i.d. context, statistical matching can have a micro or macro purpose.
MACRO APPROACH: the objective is the estimation of the contingency matrix
8
The conditional independence assumption
9
Linear dependence: properties
One property is that the marginal distributions are preserved. This follows from the normal equations.
10
Linear dependence: properties
11
The conditional independence assumption
The (Y, Z) contingency table under the conditional independence assumption (CIA) is
The true, but unknown, contingency table would be the table under the CIA plus a residual matrix.
The residual matrix is null when Y and X, or Z and X, are perfectly correlated.
Note that the table under the CIA also preserves the observed marginal distributions.
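The formula on this slide did not survive extraction. For a single categorical X, the CIA means that within each X category the joint (Y, Z) counts are the product of the two conditional distributions, i.e. the cell (j, k) is the sum over i of N_ij N_ik / N_i. The following is a minimal sketch of that computation; the function name and the toy counts are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def cia_table(n_xy, n_xz):
    """(Y, Z) table implied by the CIA, given the (X, Y) and (X, Z) tables.

    n_xy: I x J counts, n_xz: I x K counts; both must share the same X margin.
    """
    n_xy = np.asarray(n_xy, dtype=float)
    n_xz = np.asarray(n_xz, dtype=float)
    n_x = n_xy.sum(axis=1)
    if not np.allclose(n_x, n_xz.sum(axis=1)):
        raise ValueError("the two tables disagree on the X margin")
    # Empty X categories contribute nothing; avoid dividing by zero.
    safe_n_x = np.where(n_x > 0, n_x, 1.0)
    # Cell (j, k) = sum_i N_ij * N_ik / N_i
    return n_xy.T @ (n_xz / safe_n_x[:, None])

# Toy population counts (hypothetical): 2 X categories, 2 Y, 2 Z.
n_xy = np.array([[30, 10], [20, 40]])
n_xz = np.array([[20, 20], [15, 45]])
n_yz_cia = cia_table(n_xy, n_xz)
print(n_yz_cia)
```

In this toy case the row sums of the result equal the Y margin and the column sums equal the Z margin, which illustrates the margin-preservation property mentioned on the slide.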
12
From archives to samples
Let A and B be two samples drawn from the same finite population according to a complex survey design with the following first and second order inclusion probabilities
Let X be defined by two different kinds of variables:
U: variables for which N_U is known
V: variables for which N_V is unknown
X corresponds to the categorical variable whose categories are defined by the Cartesian product of all the common variables.
13
Estimates under the CIA
14
Estimates under the CIA
15
Estimates under the CIA: macro approach
5. Estimate combining the estimates obtained from A and B with their final weights
6. The regression coefficients are estimated respectively from A and B:
16
Estimates under the CIA: macro approach
7. The estimate of the contingency table under the CIA, i.e. is:
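Steps 5 to 7 can be sketched as follows. The formulas are missing from this transcript, so the sketch rests on assumptions: the weights passed in are already the final (calibrated) weights, the pooled X table is a convex combination of the two sample estimates with a hypothetical mixing constant lam, and the coefficients are weighted least squares fits on one-hot indicators. All names and data are illustrative.

```python
import numpy as np

def one_hot(codes, n_cat):
    """Indicator (dummy) matrix for integer category codes 0..n_cat-1."""
    return np.eye(n_cat)[np.asarray(codes)]

def weighted_beta(x, t, w):
    """Weighted least squares coefficients of t on x: (X'WX)^-1 X'Wt."""
    xtw = x.T * w  # scales the column of x.T belonging to unit s by w[s]
    return np.linalg.solve(xtw @ x, xtw @ t)

def cia_estimate(x_a, y_a, w_a, x_b, z_b, w_b, lam=0.5):
    """Estimated (Y, Z) table under the CIA from two weighted samples.

    Assumes every X category is observed in both samples; otherwise
    X'WX is singular and the solve step fails.
    """
    n_xx = lam * (x_a.T * w_a) @ x_a + (1 - lam) * (x_b.T * w_b) @ x_b  # step 5
    beta_yx = weighted_beta(x_a, y_a, w_a)                              # step 6, from A
    beta_zx = weighted_beta(x_b, z_b, w_b)                              # step 6, from B
    return beta_yx.T @ n_xx @ beta_zx                                   # step 7

# Hypothetical toy samples with final weights.
x_a, y_a = one_hot([0, 0, 1, 1], 2), one_hot([0, 1, 0, 0], 2)
w_a = np.array([10.0, 10.0, 20.0, 20.0])
x_b, z_b = one_hot([0, 1, 1], 2), one_hot([1, 0, 1], 2)
w_b = np.array([20.0, 20.0, 20.0])

n_yz_cia = cia_estimate(x_a, y_a, w_a, x_b, z_b, w_b)
print(n_yz_cia)
```

With one-hot X the fitted coefficients are simply the weighted conditional distributions of Y given X (from A) and of Z given X (from B).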
17
Estimates under the CIA: micro approach
8. Assuming A as the recipient, a preliminary imputed value for the missing Z is obtained through the estimated regression function
9. As we already know, this value is not a live value and can be unrealistic. In this case, a live value can be obtained through an additional hot deck procedure (hence, a mixed procedure is used). Note that, given that step 8 produces a complete data set, we can use a distance hot deck procedure with a distance applied on (X, Y, Z) or (Y, Z).
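A minimal sketch of steps 8 and 9 follows. It assumes that the coefficient matrices are already available (the values below are hypothetical), that B is completed with a predicted Y in the same way A is completed with a predicted Z, and that the hot deck uses squared Euclidean distance on (Y, Z). None of these choices is confirmed by the slides.

```python
import numpy as np

def one_hot(codes, n_cat):
    """Indicator (dummy) matrix for integer category codes 0..n_cat-1."""
    return np.eye(n_cat)[np.asarray(codes)]

def nearest_donor(recipient, donor):
    """Index of the closest donor row for each recipient row."""
    diff = recipient[:, None, :] - donor[None, :, :]
    d2 = (diff ** 2).sum(axis=2)  # n_recipient x n_donor squared distances
    return d2.argmin(axis=1)      # ties go to the first donor

# Hypothetical coefficients: row i holds the conditional distribution given X = i.
beta_yx = np.array([[0.5, 0.5], [1.0, 0.0]])
beta_zx = np.array([[0.0, 1.0], [0.5, 0.5]])

x_a, y_a = one_hot([0, 0, 1, 1], 2), one_hot([0, 1, 0, 0], 2)  # recipient file A
x_b, z_b = one_hot([0, 1, 1], 2), one_hot([1, 0, 1], 2)        # donor file B

# Step 8: preliminary (non-live) values from the regression functions.
z_tilde_a = x_a @ beta_zx
y_tilde_b = x_b @ beta_yx

# Step 9: distance hot deck on (Y, Z); each A record takes a live Z from B.
donor_idx = nearest_donor(np.hstack([y_a, z_tilde_a]), np.hstack([y_tilde_b, z_b]))
z_live_a = z_b[donor_idx]
print(donor_idx, z_live_a)
```

Adding the X indicators to both stacked matrices would give the (X, Y, Z) variant of the distance.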
18
Auxiliary information: presence of an additional file C
19
Incomplete two-way stratification
20
Synthetic two-way stratification
21
Synthetic two-way stratification
7. The synthetic two-way estimate is
This method uses C only to correct what was estimated via A and B under the CIA.
22
Rubin (1986): file concatenation
23
Rubin (1986): file concatenation
The new weights become
This approach can be difficult to apply, for different reasons
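The weight formula is missing from this transcript. A common way to write the weight of a unit in the concatenated file A ∪ B is the inverse of its probability of being included in at least one of the two samples. The sketch below assumes that this probability is known for every record under both designs, which is one reason the approach can be hard to apply in practice; names and values are illustrative.

```python
import numpy as np

def concatenated_weights(pi_a, pi_b, pi_ab=None):
    """Weights for the concatenated file A ∪ B.

    pi_a, pi_b: inclusion probabilities of each record under the design
                of A and under the design of B.
    pi_ab:      probability of being selected in both samples; treated
                as negligible (zero) when not supplied.
    """
    pi_a = np.asarray(pi_a, dtype=float)
    pi_b = np.asarray(pi_b, dtype=float)
    if pi_ab is None:
        pi_ab = np.zeros_like(pi_a)
    return 1.0 / (pi_a + pi_b - np.asarray(pi_ab, dtype=float))

# Hypothetical inclusion probabilities for two records.
pi_a = np.array([0.01, 0.02])
pi_b = np.array([0.03, 0.02])

w_approx = concatenated_weights(pi_a, pi_b)               # overlap ignored
w_indep = concatenated_weights(pi_a, pi_b, pi_a * pi_b)   # independent samples
print(w_approx, w_indep)
```

When the overlap is ignored, the new weight equals 1 / (1/w_A + 1/w_B), where w_A and w_B are the design weights the record would carry in each survey.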
24
File concatenation: comments
25
Hot deck and complex survey designs
26
The weight-split algorithm
For simplicity assume that
Compute
The method consists of these three steps
27
The weight-split algorithm
28
The weight-split algorithm
29
The weight-split algorithm: properties
The marginal and joint distributions for (X, Y) are those observed in A.
The marginal distribution of Z is that observed in B.
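The three steps of the method are not recoverable from this transcript. The sketch below is therefore not a reconstruction of them. It is a simplified greedy illustration of the weight-splitting idea, under the assumptions that the two files carry the same total weight and that records are already ordered (for example within an X class). It shows why the two properties above can hold: every A record keeps its full weight, and every B record's weight is used exactly once.

```python
def split_weights(w_a, w_b, tol=1e-9):
    """Link records of A and B by splitting weights.

    w_a, w_b: non-empty lists of weights with equal totals.
    Returns (index_in_a, index_in_b, shared_weight) triples.
    """
    if abs(sum(w_a) - sum(w_b)) > tol:
        raise ValueError("total weights of the two files must agree")
    links = []
    i = j = 0
    rem_a, rem_b = w_a[0], w_b[0]
    while i < len(w_a) and j < len(w_b):
        share = min(rem_a, rem_b)  # largest weight both records can still give
        links.append((i, j, share))
        rem_a -= share
        rem_b -= share
        if rem_a <= tol:           # A record exhausted: move to the next one
            i += 1
            rem_a = w_a[i] if i < len(w_a) else 0.0
        if rem_b <= tol:           # B record exhausted: move to the next one
            j += 1
            rem_b = w_b[j] if j < len(w_b) else 0.0
    return links

# Hypothetical weights: two records in A, two in B, both totalling 5.
w_a = [3, 2]
w_b = [1, 4]
links = split_weights(w_a, w_b)
print(links)
```

Here the first A record is split between both B records, so it appears twice in the matched file with weights 1 and 2.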
30
Selected references
DeGroot M H, Feder P I, Goel P K (1971) “Matchmaking”, The Annals of Mathematical Statistics, 42(2), 578–593
Renssen R H (1998) “Use of Statistical Matching Techniques in Calibration Estimation”, Survey Methodology, 24, 171–183
Rubin D B (1986) “Statistical Matching Using File Concatenation with Adjusted Weights and Multiple Imputations”, Journal of Business and Economic Statistics, 4, 87–94
Liu T P, Kovacevic M S (1994) “Statistical matching of survey datafiles: a simulation study”, Proceedings of the Section on Survey Research Methods of the American Statistical Association, 479–484