Eurostat Statistical matching when samples are drawn according to complex survey designs Training Course «Statistical Matching» Rome, 6-8 November 2013.

1 Eurostat Statistical matching when samples are drawn according to complex survey designs Training Course «Statistical Matching» Rome, 6-8 November 2013 Mauro Scanu Dept. Integration, Quality, Research and Production Networks Development, Istat scanu [at] istat.it

2 Eurostat Outline
Renssen: calibration
– What does CIA mean?
– Estimates under the CIA
    Macro approach
    Micro approach
– Auxiliary information: file C
    Incomplete two-way stratification
    Synthetic two-way stratification
Rubin (1986): File concatenation
Weight-split algorithm

3 Eurostat The problem
The presence of survey weights is usually a problem in the statistical matching context: should we use survey weights or not? The answer is yes. However, survey weights can be included in a statistical matching procedure in different ways. There are essentially two approaches:
– Renssen (1998): statistical matching is obtained by making the two samples as homogeneous as possible in their statistical content. This approach is mainly based on calibration procedures.
– Rubin (1986): this approach is more traditional, in the sense that it reconstructs a unique sample A ∪ B with a unique system of survey weights.
Let’s start with Renssen’s approach, which is easily comparable with the techniques already shown for i.i.d. samples over the last two days.

4 Eurostat The CIA in a finite population context: the case of two data archives
Let A and B be two archives:
– on the same population consisting of N units
– observing some common variables X and some specific variables, Y in A and Z in B
– whose records have no unit identifiers (PIN), and whose common variables X cannot be considered unit identifiers
This is still a statistical matching problem (examples are in DeGroot et al. (1971)).

5 Eurostat Notation
Let s = 1, …, N denote the units in the population. Assume that X, Y, and Z are categorical, with I, J, and K categories respectively. The categories taken by each unit in the population are described by vectors.

6 Eurostat Notation

7 Eurostat The statistical matching problem
As in the i.i.d. context, statistical matching can have a micro or a macro purpose.
MACRO APPROACH: the objective is the estimation of the (Y, Z) contingency matrix.

8 Eurostat The conditional independence assumption

9 Eurostat Linear dependence: properties
One property is that the marginal distributions are preserved; this follows from the normal equations.

10 Eurostat Linear dependence: properties

11 Eurostat The conditional independence assumption
Under the conditional independence assumption (CIA), the (Y, Z) contingency table is determined by the (X, Y) and (X, Z) tables. The true, but unknown, contingency table would differ from it by a residual matrix. The residual matrix is null when Y and X, or Z and X, are perfectly correlated. Note that the CIA table also preserves the observed marginal distributions.
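As a small numerical illustration of the statement above, the sketch below builds a (Y, Z) table under the CIA from an (X, Y) table and an (X, Z) table, assuming the usual form in which each cell is the sum over x of N(x, y) N(x, z) / N(x). The function name `cia_table` and the toy counts are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def cia_table(n_xy: np.ndarray, n_xz: np.ndarray) -> np.ndarray:
    """Return the J x K (Y, Z) table implied by conditional independence given X.

    n_xy: I x J counts of (X, Y); n_xz: I x K counts of (X, Z).
    Both tables must share the same X margin.
    """
    n_x = n_xy.sum(axis=1)
    if not np.allclose(n_x, n_xz.sum(axis=1)):
        raise ValueError("the two tables disagree on the X margin")
    # 1 / N(x), set to 0 for empty X categories so they contribute nothing.
    inv = np.divide(1.0, n_x, out=np.zeros_like(n_x, dtype=float), where=n_x > 0)
    # Cell (y, z) = sum over x of N(x, y) * N(x, z) / N(x).
    return n_xy.T @ (inv[:, None] * n_xz)

# Toy population counts (hypothetical): I = 2, J = 2, K = 3.
n_xy = np.array([[30, 10], [20, 40]])
n_xz = np.array([[10, 20, 10], [30, 15, 15]])
n_yz_cia = cia_table(n_xy, n_xz)
```

Summing `n_yz_cia` over its columns returns the Y margin of `n_xy`, and summing over its rows returns the Z margin of `n_xz`, which is the marginal-preservation property mentioned on the slide.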

12 Eurostat From archives to samples
Let A and B be two samples drawn from the same finite population according to complex survey designs, each with its own first and second order inclusion probabilities. Let X be defined by two different kinds of variables:
– U: variables for which N_U is known
– V: variables for which N_V is unknown
X corresponds to the categorical variable whose categories are defined by the Cartesian product of all the common variables.

13 Eurostat Estimates under the CIA

14 Eurostat Estimates under the CIA

15 Eurostat Estimates under the CIA: macro approach
5. Estimate combining the estimates obtained from A and B with their final weights.
6. The regression coefficients are estimated from A and B respectively.

16 Eurostat Estimates under the CIA: macro approach
7. The contingency table under the CIA is then estimated by combining the quantities from steps 5 and 6.
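Steps 5 to 7 can be sketched as follows, under loudly stated assumptions: the quantity pooled in step 5 is taken to be the weighted X cross-product matrix, the pooling uses a hypothetical mixing parameter `lam`, the categorical variables are coded as indicator (one-hot) matrices, and the CIA table is formed as B_YX' N_XX B_ZX. All function names and the toy data are illustrative, not the course's own code.

```python
import numpy as np

def one_hot(codes, n_categories):
    """Indicator matrix with one row per unit."""
    return np.eye(n_categories)[np.asarray(codes)]

def weighted_cross(w, left, right):
    """Weighted cross-product: sum over units of w_i * left_i' right_i."""
    return left.T @ (w[:, None] * right)

def cia_estimate(x_a, y_a, w_a, x_b, z_b, w_b, lam=0.5):
    xx_a = weighted_cross(w_a, x_a, x_a)
    xx_b = weighted_cross(w_b, x_b, x_b)
    # Step 5 (assumed form): pool the two estimates of N_XX.
    n_xx = lam * xx_a + (1.0 - lam) * xx_b
    # Step 6: regression coefficients of Y on X from A and of Z on X from B.
    # This requires every X category to be observed in both samples.
    beta_yx = np.linalg.solve(xx_a, weighted_cross(w_a, x_a, y_a))
    beta_zx = np.linalg.solve(xx_b, weighted_cross(w_b, x_b, z_b))
    # Step 7 (assumed form): (Y, Z) table under the CIA.
    return beta_yx.T @ n_xx @ beta_zx

# Toy samples with final weights (hypothetical values).
x_a, y_a = one_hot([0, 0, 1, 1], 2), one_hot([0, 1, 1, 1], 2)
w_a = np.array([10.0, 10.0, 15.0, 15.0])
x_b, z_b = one_hot([0, 1, 1], 2), one_hot([0, 1, 2], 3)
w_b = np.array([20.0, 10.0, 20.0])
table = cia_estimate(x_a, y_a, w_a, x_b, z_b, w_b)
```

In this toy case both samples already agree on the X margin (20 and 30 units), as they would after calibration, so the choice of `lam` does not affect the result.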

17 Eurostat Estimates under the CIA: micro approach
8. Assuming A as the recipient, a preliminary imputed value for the missing Z is obtained through the estimated regression function.
9. As we already know, this value is not a live value and can be unrealistic. In this case, a live value can be obtained through an additional hot deck procedure (hence, a mixed procedure is used). Note that, given that step 8 yields a complete data set, we can use a distance hot deck procedure with a distance computed on (X, Y, Z) or (Y, Z).
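The mixed procedure of steps 8 and 9 might look like the sketch below. It assumes numeric (or indicator-coded) variables, weighted least squares for the regressions, and a squared Euclidean distance on (Y, Z). To compute that distance the donors in B also need a Y value, so the sketch gives them a regression prediction of Y estimated on A; this extra step, like every name in the code, is an assumption made for illustration.

```python
import numpy as np

def wls_coef(x, t, w):
    """Weighted least squares coefficients of t on x."""
    xtw = x.T * w  # scales the column of each unit by its weight
    return np.linalg.solve(xtw @ x, xtw @ t)

def mixed_impute(x_a, y_a, w_a, x_b, z_b, w_b):
    beta_zx = wls_coef(x_b, z_b, w_b)  # Z on X, estimated on B
    beta_yx = wls_coef(x_a, y_a, w_a)  # Y on X, estimated on A
    z_pred_a = x_a @ beta_zx           # step 8: preliminary Z for recipients
    y_pred_b = x_b @ beta_yx           # assumed: intermediate Y for donors
    recipients = np.hstack([y_a, z_pred_a])
    donors = np.hstack([y_pred_b, z_b])
    # Step 9: squared Euclidean distance on (Y, Z), nearest donor wins.
    diff = recipients[:, None, :] - donors[None, :, :]
    donor = (diff ** 2).sum(axis=2).argmin(axis=1)
    return z_b[donor], donor

# Hypothetical toy data: X holds an intercept and one covariate.
rng = np.random.default_rng(0)
x_a = np.column_stack([np.ones(6), rng.normal(size=6)])
y_a = rng.normal(size=(6, 1))
w_a = np.full(6, 10.0)
x_b = np.column_stack([np.ones(5), rng.normal(size=5)])
z_b = rng.normal(size=(5, 1))
w_b = np.full(5, 12.0)
z_live, donor = mixed_impute(x_a, y_a, w_a, x_b, z_b, w_b)
```

Every row of `z_live` is a value actually observed in B, which is what makes it a live value, while the regression prediction only serves to select the donor.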

18 Eurostat Auxiliary information: presence of an additional file C

19 Eurostat Incomplete two-way stratification

20 Eurostat Synthetic two-way stratification

21 Eurostat Synthetic two-way stratification
7. The synthetic two-way estimate is then computed. This method uses C only to correct what was estimated via A and B under the CIA.

22 Eurostat Rubin (1986): file concatenation

23 Eurostat Rubin (1986): file concatenation
The units of the concatenated file receive new weights. This approach can be difficult to apply, for several reasons.
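A minimal sketch of the concatenation weights, assuming the form commonly attributed to Rubin (1986): the weight of a unit in the concatenated file is the inverse of its probability of being included in A or in B, that is 1 / (π_A + π_B − π_AB), where the joint term π_AB is often treated as negligible. The function name and the toy probabilities are illustrative assumptions.

```python
import numpy as np

def concatenation_weights(pi_a, pi_b, pi_ab=None):
    """Weights for the concatenated file A ∪ B.

    pi_a, pi_b: inclusion probability of each record under the design of A
    and under the design of B. pi_ab: probability of inclusion in both
    samples; treated as zero when omitted (an approximation).
    """
    pi_a = np.asarray(pi_a, dtype=float)
    pi_b = np.asarray(pi_b, dtype=float)
    if pi_ab is None:
        pi_ab = np.zeros_like(pi_a)
    return 1.0 / (pi_a + pi_b - np.asarray(pi_ab, dtype=float))

# Hypothetical probabilities for two records of the concatenated file.
weights = concatenation_weights([0.01, 0.02], [0.03, 0.02])
```

The sketch needs, for every record, its inclusion probability under both designs, including the design of the survey in which it was not selected. That information is often unavailable, which is one reason the approach can be hard to apply.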

24 Eurostat File concatenation: comments

25 Eurostat Hot deck and complex survey designs

26 Eurostat For simplicity assume that Compute The method consists of these three steps The weight-split algorithm

27 Eurostat The weight-split algorithm

28 Eurostat The weight-split algorithm

29 Eurostat The weight-split algorithm: properties
– The marginal and joint distributions for (X, Y) are those observed in A
– The marginal distribution of Z is that observed in B
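The two properties above can be illustrated with a generic greedy weight-splitting scheme. This is not necessarily the algorithm of the previous slides: it is a simple stand-in which assumes that the records of A and B are already sorted so that neighbouring records are similar on X, and that the two files have the same total weight. Each link (i, j, w) attaches the Z value of record j in B to record i in A with weight w; the function name and the toy weights are hypothetical.

```python
import numpy as np

def split_weights(w_a, w_b, tol=1e-9):
    """Greedily pair records in order, splitting weights across links."""
    rem_a = np.asarray(w_a, dtype=float).copy()
    rem_b = np.asarray(w_b, dtype=float).copy()
    if abs(rem_a.sum() - rem_b.sum()) > tol:
        raise ValueError("A and B must have the same total weight")
    links = []
    i = j = 0
    while i < len(rem_a) and j < len(rem_b):
        # The link takes the smaller of the two remaining weights.
        w = min(rem_a[i], rem_b[j])
        if w > tol:
            links.append((i, j, w))
        rem_a[i] -= w
        rem_b[j] -= w
        # Move past any record whose weight has been used up.
        if rem_a[i] <= tol:
            i += 1
        if rem_b[j] <= tol:
            j += 1
    return links

# Hypothetical weights: both files represent 10 population units.
w_a = [3.0, 2.0, 5.0]
w_b = [4.0, 6.0]
links = split_weights(w_a, w_b)
used_a = np.zeros(len(w_a))
used_b = np.zeros(len(w_b))
for i, j, w in links:
    used_a[i] += w
    used_b[j] += w
```

Because the link weights add up to the original weight of every record in A and of every record in B, the weighted distribution of (X, Y) in the matched file is the one observed in A and the weighted distribution of Z is the one observed in B.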

30 Eurostat Selected references
DeGroot M H, Feder P I, Goel P K (1971) “Matchmaking”, The Annals of Mathematical Statistics, 42(2), 578–593
Renssen R H (1998) “Use of Statistical Matching Techniques in Calibration Estimation”, Survey Methodology, 24, 171–183
Rubin D B (1986) “Statistical Matching Using File Concatenation with Adjusted Weights and Multiple Imputations”, Journal of Business and Economic Statistics, 4, 87–94
Liu T P, Kovacevic M S (1994) “Statistical matching of survey datafiles: a simulation study”, Proceedings of the Section on Survey Research Methods of the American Statistical Association, 479–484

