Eurostat Statistical matching when samples are drawn according to complex survey designs Training Course «Statistical Matching» Rome, 6-8 November 2013.

Eurostat Statistical matching when samples are drawn according to complex survey designs Training Course «Statistical Matching» Rome, 6-8 November 2013 Mauro Scanu Dept. Integration, Quality, Research and Production Networks Development, Istat scanu [at] istat.it

Eurostat Outline
- Renssen: calibration
  - What does CIA mean?
  - Estimates under the CIA
    - Macro approach
    - Micro approach
  - Auxiliary information: file C
    - Incomplete two-way stratification
    - Synthetic two-way stratification
- Rubin (1986): File concatenation
- Weight-split algorithm

Eurostat The problem
The presence of survey weights is usually a problem in the statistical matching context: should we use survey weights or not? The answer is: yes! However, survey weights can be included in a statistical matching procedure in different ways. There are essentially two approaches:
- Renssen (1998): the two samples are made as homogeneous as possible in their statistical content. This approach is mainly based on calibration procedures.
- Rubin (1986): this approach is more traditional, in the sense that it reconstructs a single sample A ∪ B with a single system of survey weights.
Let's start with Renssen's approach, which is easily comparable with the techniques already shown for i.i.d. samples in the last two days.

Eurostat Let A and B be two archives:
- on the same population consisting of N units
- observing some common variables X and specific variables, Y in A and Z in B
- the records in A and B have no identifiers (PIN), and the common variables X cannot be considered as unit identifiers
This is still a statistical matching problem (examples are in DeGroot et al. (1971)).
The CIA in a finite population context: the case of two data archives

Eurostat Let s=1,…,N denote the units in the population. Assume that X, Y, and Z are categorical, with I, J, and K categories respectively. The variable categories assumed by each unit in the population are described by these vectors Notation

Eurostat Notation

Eurostat As in the i.i.d. context, statistical matching can have a micro or macro purpose MACRO APPROACH: The objective is the estimation of the contingency matrix The statistical matching problem

Eurostat The conditional independence assumption

Eurostat One property is that the marginal distributions are preserved From the normal equations Linear dependence: properties

Eurostat Linear dependence: properties

Eurostat The (Y, Z) contingency table under the conditional independence assumption (CIA) is The true, but unknown, contingency table would be The residual matrix is null when Y and X or Z and X are perfectly correlated Note that also preserves the observed marginal distributions The conditional independence assumption
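The formulas of this slide did not survive in the transcript. As a sketch only, the standard form of the CIA table for categorical variables can be written as follows; the symbols N^{XY}_{ij}, N^{XZ}_{ik}, N^{X}_{i} (population counts in the cells of (X, Y), (X, Z) and X) are notation assumed here, not taken from the slide. The second equality shows why the observed Y marginal is preserved; the same argument holds for Z.

```latex
N^{YZ,\mathrm{CIA}}_{jk}
  = \sum_{i=1}^{I} \frac{N^{XY}_{ij}\, N^{XZ}_{ik}}{N^{X}_{i}},
\qquad
\sum_{k=1}^{K} N^{YZ,\mathrm{CIA}}_{jk}
  = \sum_{i=1}^{I} \frac{N^{XY}_{ij}\, N^{X}_{i}}{N^{X}_{i}}
  = N^{Y}_{j}.
```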

Eurostat Let A and B be two samples drawn from the same finite population according to a complex survey design with the following first and second order inclusion probabilities
Let X be defined by two different kinds of variables:
- U: variables for which N_U is known
- V: variables for which N_V is unknown
X corresponds to the categorical variable whose categories are defined by the Cartesian product of all the common variables
From archives to samples
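To illustrate how known totals N_U can enter the weights, here is a minimal sketch of post-stratification, the simplest special case of calibration. It is not Renssen's full procedure, which the transcript does not preserve. The function name `poststratify` and the toy numbers are assumptions made for illustration; the design weights are taken as the inverses of the first order inclusion probabilities.

```python
import numpy as np

def poststratify(design_weights, u, known_totals):
    """Rescale weights so that weighted counts of U match the known totals N_U.

    Hypothetical helper: u holds integer category codes 0..len(known_totals)-1,
    and every category is assumed to be observed in the sample.
    """
    w = np.asarray(design_weights, dtype=float)
    u = np.asarray(u, dtype=int)
    totals = np.asarray(known_totals, dtype=float)
    # Weighted estimate of each U total before calibration
    estimated = np.bincount(u, weights=w, minlength=len(totals))
    if np.any(estimated == 0):
        raise ValueError("a category of U is not observed in the sample")
    # One adjustment factor per category, applied to every unit in it
    return w * (totals / estimated)[u]

# Toy sample: inclusion probabilities, U codes, known population totals
pi = np.array([0.1, 0.1, 0.05, 0.05])
d = 1.0 / pi                      # design weights
u = np.array([0, 0, 1, 1])
n_u = np.array([30.0, 30.0])
w = poststratify(d, u, n_u)
```

Applying the same adjustment to both A and B makes the two samples agree on the distribution of U, which is the sense in which calibration makes them homogeneous.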

Eurostat Estimates under the CIA

Eurostat 5. Estimate combining the estimates obtained from A and B with their final weights 6. The regression coefficients are estimated respectively from A and B: Estimates under the CIA: macro approach

Eurostat 7. The estimate of the contingency table under the CIA, i.e. is: Estimates under the CIA: macro approach
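The macro estimate under the CIA can be sketched from two weighted samples as follows. This is an illustration under stated assumptions, not the slide's exact estimator: the function names, the integer coding of categories, and the mixing parameter `lam` used to pool the two estimates of the X totals are all hypothetical, and every X category is assumed to be observed in both samples.

```python
import numpy as np

def weighted_table(rows, cols, w, n_rows, n_cols):
    """Weighted two-way table: sum of weights in each (row, col) cell."""
    table = np.zeros((n_rows, n_cols))
    np.add.at(table, (np.asarray(rows), np.asarray(cols)), np.asarray(w, float))
    return table

def cia_table(x_a, y_a, w_a, x_b, z_b, w_b, I, J, K, lam=0.5):
    """Estimate the (Y, Z) table under the CIA from samples A and B."""
    n_xy = weighted_table(x_a, y_a, w_a, I, J)   # (X, Y) estimated from A
    n_xz = weighted_table(x_b, z_b, w_b, I, K)   # (X, Z) estimated from B
    n_x_a = n_xy.sum(axis=1)
    n_x_b = n_xz.sum(axis=1)
    # Pooled estimate of the X totals (assumed: convex combination)
    n_x = lam * n_x_a + (1.0 - lam) * n_x_b
    p_y_given_x = n_xy / n_x_a[:, None]
    p_z_given_x = n_xz / n_x_b[:, None]
    # N_yz = sum over x of N_x * P(y | x) * P(z | x)
    return np.einsum("i,ij,ik->jk", n_x, p_y_given_x, p_z_given_x)

table = cia_table(
    x_a=[0, 0, 1, 1], y_a=[0, 1, 0, 0], w_a=[1, 1, 1, 1],
    x_b=[0, 0, 1, 1], z_b=[0, 0, 0, 1], w_b=[1, 1, 1, 1],
    I=2, J=2, K=2,
)
```

In this toy case the two samples already agree on the X totals, so the row and column sums of `table` reproduce the Y marginal from A and the Z marginal from B.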

Eurostat 8. Assuming A as the recipient, a preliminary imputed value for the missing Z is obtained through the estimated regression function
9. As we already know, this value is not a live value and can be unrealistic. In this case, a live value can be obtained through an additional hot deck procedure (hence, a mixed procedure is used). Note that, since step 8 yields a complete data set, we can use a distance hot deck procedure with a distance applied on (X, Y, Z) or (Y, Z)
Estimates under the CIA: micro approach

Eurostat Auxiliary information: presence of an additional file C

Eurostat Incomplete two-way stratification

Eurostat Synthetic two-way stratification

Eurostat 7. The synthetic two-way estimate is
This method uses C only to correct what was estimated via A and B under the CIA
Synthetic two-way stratification

Eurostat Rubin (1986): file concatenation

Eurostat The new weights become
Rubin (1986): file concatenation
This approach can be difficult to apply, for several reasons
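The weight formula itself is missing from this transcript. As a hedged sketch, the form usually associated with file concatenation takes the inclusion probability of a unit in A ∪ B as the sum of its inclusion probabilities under the two designs, minus the probability of being selected in both (often treated as negligible). The function name and the example values are assumptions.

```python
import numpy as np

def concatenated_weights(pi_a, pi_b, pi_ab=None):
    """Weights for the concatenated file A ∪ B.

    pi_a, pi_b: inclusion probability of each unit under design A and design B.
    pi_ab: probability of being selected in both samples (assumed 0 if omitted).
    """
    pi_a = np.asarray(pi_a, dtype=float)
    pi_b = np.asarray(pi_b, dtype=float)
    if pi_ab is None:
        pi_ab = np.zeros_like(pi_a)
    pi_union = pi_a + pi_b - np.asarray(pi_ab, dtype=float)
    return 1.0 / pi_union

# Two units, each with a probability under both designs
w_union = concatenated_weights([0.01, 0.02], [0.01, 0.03])
```

The sketch also shows one practical difficulty: for every record in A we need its inclusion probability under the design of B, and vice versa, which requires the design variables of both surveys to be available in both files.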

Eurostat File concatenation: comments

Eurostat Hot deck and complex survey designs

Eurostat For simplicity assume that Compute The method consists of these three steps The weight-split algorithm

Eurostat The weight-split algorithm

Eurostat
- The marginal and joint distribution for (X, Y) are those observed in A
- The marginal distribution of Z is that observed in B
The weight-split algorithm: properties
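The two properties above can be illustrated with a small sketch of weight splitting inside one donation class. This is not claimed to reproduce the three steps of the slide, which are lost in the transcript: the greedy pairing order, the function name `weight_split` and the equal-totals requirement are assumptions. Each recipient record in A is split across one or more donors in B, so that the pieces of a recipient add up to its weight in A and the pieces attached to a donor add up to its weight in B.

```python
def weight_split(w_a, w_b, tol=1e-9):
    """Split recipient weights (A) across donors (B) in the given order.

    Returns a list of (recipient_index, donor_index, weight) records.
    Assumes both files carry the same total weight within the class.
    """
    if abs(sum(w_a) - sum(w_b)) > tol:
        raise ValueError("total weights of A and B must coincide")
    if not w_a or not w_b:
        return []
    records = []
    i = j = 0
    rem_a, rem_b = w_a[0], w_b[0]       # weight still to be allocated
    while i < len(w_a) and j < len(w_b):
        share = min(rem_a, rem_b)
        if share > tol:
            records.append((i, j, share))
        rem_a -= share
        rem_b -= share
        if rem_a <= tol:                # recipient fully allocated
            i += 1
            if i < len(w_a):
                rem_a = w_a[i]
        if rem_b <= tol:                # donor fully used
            j += 1
            if j < len(w_b):
                rem_b = w_b[j]
    return records

splits = weight_split([3, 2], [1, 4])
```

In the toy call, recipient 0 (weight 3) is split between donors 0 and 1, so the synthetic file has three records rather than two; summing the pieces by recipient or by donor returns the original weights of A and B.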

Eurostat Selected references
DeGroot M H, Feder P I, Goel P K (1971) "Matchmaking", The Annals of Mathematical Statistics, 42(2), 578–593
Renssen R H (1998) "Use of Statistical Matching Techniques in Calibration Estimation", Survey Methodology, 24, 171–183
Rubin D B (1986) "Statistical Matching Using File Concatenation with Adjusted Weights and Multiple Imputations", Journal of Business and Economic Statistics, 4, 87–94
Liu T P, Kovacevic M S (1994) "Statistical matching of survey datafiles: a simulation study", Proceedings of the Section on Survey Research Methods of the American Statistical Association, 479–484
