Eurostat Statistical matching when samples are drawn according to complex survey designs Training Course «Statistical Matching» Rome, 6-8 November 2013.

Presentation transcript:

Eurostat Statistical matching when samples are drawn according to complex survey designs Training Course «Statistical Matching» Rome, 6-8 November 2013 Mauro Scanu Dept. Integration, Quality, Research and Production Networks Development, Istat scanu [at] istat.it

Eurostat Outline
Renssen: calibration
– What does CIA mean?
– Estimates under the CIA
  Macro approach
  Micro approach
– Auxiliary information: file C
  Incomplete two-way stratification
  Synthetic two-way stratification
Rubin (1986): File concatenation
Weight-split algorithm

Eurostat The problem
Survey weights are usually a complication in the statistical matching context: should we use them or not? The answer is yes. However, survey weights can enter a statistical matching procedure in different ways. There are essentially two approaches:
- Renssen (1998): the two samples are made as homogeneous as possible in their statistical content. This approach relies mainly on calibration procedures.
- Rubin (1986): this approach is more traditional, in that it reconstructs a single sample A ∪ B with a single system of survey weights.
We start with Renssen's approach, which is easily comparable with the techniques already shown for i.i.d. samples in the last two days.

Eurostat The CIA in a finite population context: the case of two data archives
Let A and B be two archives:
- on the same population consisting of N units;
- observing some common variables X and specific variables, Y in A and Z in B;
- whose records have no unit identifiers (PIN), and whose common variables X cannot be considered as unit identifiers.
This is still a statistical matching problem (examples are in DeGroot et al. (1971)).

Eurostat Notation
Let s = 1, …, N denote the units in the population. Assume that X, Y, and Z are categorical, with I, J, and K categories respectively. The variable categories assumed by each unit in the population are described by these vectors

Eurostat Notation

Eurostat The statistical matching problem
As in the i.i.d. context, statistical matching can have a micro or a macro purpose.
MACRO APPROACH: the objective is the estimation of the contingency matrix

Eurostat The conditional independence assumption

Eurostat Linear dependence: properties
One property is that the marginal distributions are preserved.
From the normal equations

Eurostat Linear dependence: properties

Eurostat The conditional independence assumption
The (Y, Z) contingency table under the conditional independence assumption (CIA) is
The true, but unknown, contingency table would be
The residual matrix is null when Y and X or Z and X are perfectly correlated.
Note that the CIA table also preserves the observed marginal distributions.
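The formulas on this slide did not survive in the transcript. As a hedged reconstruction in a standard notation (the symbols N_{YX}, N_{XX}, N_{XZ}, R_Y and R_Z are assumptions and may differ from those on the slide), with x_s, y_s, z_s the indicator vectors of unit s and R_Y, R_Z the matrices of residuals from the linear regressions of Y on X and of Z on X, the two tables can be written as:

```latex
% Assumed notation: N_{YX} = \sum_s y_s x_s', N_{XX} = \sum_s x_s x_s' (diagonal), N_{XZ} = \sum_s x_s z_s'
\begin{align}
  N_{YZ}^{\mathrm{CIA}} &= N_{YX}\, N_{XX}^{-1}\, N_{XZ}, \\
  N_{YZ} &= N_{YZ}^{\mathrm{CIA}} + R_Y' R_Z .
\end{align}
```

Cell by cell, the first line says that the CIA count for (Y = j, Z = k) is the sum over the categories i of X of N_{ij}^{XY} N_{ik}^{XZ} / N_i^{X}. The cross terms between fitted values and residuals vanish because of the normal equations, which is why the difference between the true table and the CIA table reduces to R_Y' R_Z.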

Eurostat From archives to samples
Let A and B be two samples drawn from the same finite population according to a complex survey design with the following first- and second-order inclusion probabilities
Let X be defined by two different kinds of variables:
- U: variables for which N_U is known;
- V: variables for which N_V is unknown.
X corresponds to the categorical variable whose categories are defined by the Cartesian product of all the common variables.
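Since Renssen's approach relies on calibration, and the known totals N_U are what the weights can be calibrated to, a minimal sketch of linear calibration may help fix ideas. It is an illustration under stated assumptions, not the exact procedure of the course: the function name `linear_calibration`, the toy design weights and the totals are all hypothetical.

```python
import numpy as np

def linear_calibration(d, X, totals):
    """Linear calibration: return w = d * (1 + X @ lam) with X.T @ w == totals.

    d      : design weights, shape (n,)
    X      : auxiliary variables (here indicators of the U categories), shape (n, p)
    totals : known population totals N_U, shape (p,)
    """
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    totals = np.asarray(totals, dtype=float)
    T = X.T @ (d[:, None] * X)                  # weighted cross-product matrix
    lam = np.linalg.solve(T, totals - X.T @ d)  # Lagrange multipliers
    return d * (1.0 + X @ lam)

# Hypothetical toy sample: 5 units, U has two categories coded as indicators.
d = np.array([10.0, 10.0, 10.0, 10.0, 10.0])
X = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
N_U = np.array([30.0, 20.0])                    # assumed known totals

w = linear_calibration(d, X, N_U)
print(w)        # calibrated weights
print(X.T @ w)  # reproduces N_U
```

With a single categorical U, as in this toy case, linear calibration coincides with post-stratification: each weight is rescaled by the ratio of the known total to the estimated total of its category.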

Eurostat Estimates under the CIA

Eurostat Estimates under the CIA

Eurostat Estimates under the CIA: macro approach
5. Estimate combining the estimates obtained from A and B with their final weights
6. The regression coefficients are estimated respectively from A and B:

Eurostat Estimates under the CIA: macro approach
7. The estimate of the contingency table under the CIA is:
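The macro steps can be sketched numerically. The following is a simplified illustration, assuming a single categorical X, weights that are already final, and a plain average of the two X distributions as the pooled estimate (the slides do not say how the pooling is done, so that choice is an assumption). All data and names are hypothetical.

```python
import numpy as np

def weighted_table(rows, cols, w, shape):
    """Weighted two-way contingency table from category codes and weights."""
    t = np.zeros(shape)
    np.add.at(t, (rows, cols), w)
    return t

# Hypothetical sample A (observes X and Y) with final weights
x_a = np.array([0, 0, 1, 1, 1])
y_a = np.array([0, 1, 0, 1, 1])
w_a = np.array([20.0, 20.0, 10.0, 25.0, 25.0])

# Hypothetical sample B (observes X and Z) with final weights
x_b = np.array([0, 1, 1, 0])
z_b = np.array([1, 0, 1, 0])
w_b = np.array([30.0, 30.0, 20.0, 20.0])

n_xy = weighted_table(x_a, y_a, w_a, (2, 2))   # I x J table estimated from A
n_xz = weighted_table(x_b, z_b, w_b, (2, 2))   # I x K table estimated from B

# Pooled estimate of the X distribution (assumption: simple average of A and B)
n_x = 0.5 * (n_xy.sum(axis=1) + n_xz.sum(axis=1))

# Conditional distributions; for indicator variables these play the role of
# the regression coefficients of Y on X (from A) and of Z on X (from B)
p_y_x = n_xy / n_xy.sum(axis=1, keepdims=True)
p_z_x = n_xz / n_xz.sum(axis=1, keepdims=True)

# CIA estimate: sum over the X categories of N_x * P(y | x) * P(z | x)
n_yz_cia = p_y_x.T @ (n_x[:, None] * p_z_x)    # J x K table
print(n_yz_cia)
```

The Y margin of the resulting table equals the Y distribution implied by A and the pooled X totals, and the Z margin is the analogous quantity from B, which is the sense in which the CIA estimate preserves the marginal distributions.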

Eurostat Estimates under the CIA: micro approach
8. Assuming A is the recipient, a preliminary imputed value for the missing Z is obtained through the estimated regression function.
9. As we already know, this value is not a live value and can be unrealistic. In this case, a live value can be obtained through an additional hot deck procedure (hence, a mixed procedure is used). Note that, since step 8 yields a complete data set, we can use a distance hot deck procedure with a distance applied on (X, Y, Z) or (Y, Z).
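Steps 8 and 9 can be sketched as a two-stage mixed procedure. This is a deliberately simplified illustration: it assumes continuous X and Z instead of the categorical setting of the slides, and it computes the hot deck distance on Z only, whereas the slide uses (X, Y, Z) or (Y, Z). The function name `mixed_match` and the data are hypothetical.

```python
import numpy as np

def mixed_match(x_a, x_b, z_b, w_b):
    """Regression step followed by a nearest-neighbour distance hot deck.

    x_a : common variable in the recipient file A, shape (n_a,)
    x_b : common variable in the donor file B, shape (n_b,)
    z_b : target variable observed in B, shape (n_b,)
    w_b : survey weights of B, used in the regression fit, shape (n_b,)
    """
    # Step 8: weighted least squares of Z on X in B, then prediction in A
    design_b = np.column_stack([np.ones(len(x_b)), x_b])
    sw = np.sqrt(w_b)
    beta, *_ = np.linalg.lstsq(design_b * sw[:, None], z_b * sw, rcond=None)
    z_hat = np.column_stack([np.ones(len(x_a)), x_a]) @ beta

    # Step 9: for each recipient, take the donor whose observed Z is closest
    # to the preliminary value, so that the imputed value is a live one
    donors = np.abs(z_hat[:, None] - z_b[None, :]).argmin(axis=1)
    return z_hat, z_b[donors], donors

# Hypothetical toy data
x_b = np.array([1.0, 2.0, 3.0, 4.0])
z_b = np.array([2.1, 3.9, 6.2, 7.8])
w_b = np.array([1.0, 2.0, 1.0, 2.0])
x_a = np.array([1.5, 3.5])

z_hat, z_live, donors = mixed_match(x_a, x_b, z_b, w_b)
print(z_hat)   # preliminary (possibly unrealistic) values
print(z_live)  # live values actually observed in B
```

The design choice is that the regression carries the information about the relationship between X and Z, while the hot deck only guarantees that every imputed value was actually observed on some unit of B.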

Eurostat Auxiliary information: presence of an additional file C

Eurostat Incomplete two-way stratification

Eurostat Synthetic two-way stratification

Eurostat Synthetic two-way stratification
7. The synthetic two-way estimate is
This method uses C only to correct what is estimated via A and B under the CIA.
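The formula for the synthetic two-way estimate is missing from the transcript. A commonly quoted form, given here as a hedged reconstruction whose notation is an assumption (w_c are the weights of file C, and the regression coefficients are those estimated from A and B), adds to the CIA estimate the cross-product of the regression residuals computed on the units of C:

```latex
% Assumed notation; the slide's own symbols are not recoverable from the transcript
\begin{equation}
  \hat N_{YZ}^{\mathrm{STWS}}
  = \hat N_{YZ}^{\mathrm{CIA}}
  + \sum_{c \in C} w_c \,
    \bigl(y_c - \hat\beta_{YX}'\, x_c\bigr)
    \bigl(z_c - \hat\beta_{ZX}'\, x_c\bigr)' .
\end{equation}
```

In this form C contributes only through the residual term, which is consistent with the remark that C is used solely to correct the CIA estimate.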

Eurostat Rubin (1986): file concatenation

Eurostat Rubin (1986): file concatenation
The new weights become
This approach can be difficult to apply, for different reasons.
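The weight formula itself is missing from the transcript. In the usual presentation of file concatenation, a unit's weight in A ∪ B is the reciprocal of its probability of being included in at least one of the two samples. The sketch below assumes that form; the function name and the numbers are hypothetical.

```python
def concatenated_weight(pi_a, pi_b, pi_ab=0.0):
    """Weight of a unit in the concatenated file A ∪ B.

    pi_a  : inclusion probability of the unit under the design of A
    pi_b  : inclusion probability of the unit under the design of B
    pi_ab : probability of being in both samples (often treated as negligible)
    """
    pi_union = pi_a + pi_b - pi_ab
    if pi_union <= 0:
        raise ValueError("the unit must have a positive inclusion probability")
    return 1.0 / pi_union

# Hypothetical unit: sampled in A with probability 0.01; had it been subject
# to the design of B, its inclusion probability would have been 0.04.
print(concatenated_weight(0.01, 0.04))
```

One source of difficulty is visible in the signature: for every unit in A one needs its inclusion probability under the design of B, and vice versa, which requires the design variables of each survey to be available in the other file.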

Eurostat File concatenation: comments

Eurostat Hot deck and complex survey designs

Eurostat For simplicity assume that Compute The method consists of these three steps The weight-split algorithm

Eurostat The weight-split algorithm

Eurostat The weight-split algorithm

Eurostat The weight-split algorithm: properties
- The marginal and joint distributions for (X, Y) are those observed in A.
- The marginal distribution of Z is that observed in B.
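The three steps of the algorithm are not recoverable from this transcript. Purely as an illustration of how splitting weights can deliver the two properties above, here is a sketch of one simple scheme. It assumes that both files are already sorted in the order in which records should be paired and that the two weight totals are equal. It is not claimed to be the algorithm of the slides, and the function name `split_weights` is hypothetical.

```python
def split_weights(w_a, w_b, tol=1e-9):
    """Pair recipients (A) with donors (B), splitting weights where needed.

    Returns a list of (recipient index, donor index, weight) triples such that
    each recipient's pieces sum to its weight in A and each donor's pieces sum
    to its weight in B. Assumes sum(w_a) == sum(w_b) and non-empty inputs.
    """
    if abs(sum(w_a) - sum(w_b)) > tol:
        raise ValueError("total weights of A and B must coincide")
    pairs = []
    i = j = 0
    rem_a, rem_b = float(w_a[0]), float(w_b[0])
    while i < len(w_a) and j < len(w_b):
        m = min(rem_a, rem_b)            # weight assigned to the pair (i, j)
        if m > tol:
            pairs.append((i, j, m))
        rem_a -= m
        rem_b -= m
        if rem_a <= tol:                 # recipient exhausted: move to the next
            i += 1
            rem_a = float(w_a[i]) if i < len(w_a) else 0.0
        if rem_b <= tol:                 # donor exhausted: move to the next
            j += 1
            rem_b = float(w_b[j]) if j < len(w_b) else 0.0
    return pairs

# Hypothetical weights with equal totals (10 and 10)
pairs = split_weights([2.0, 3.0, 5.0], [4.0, 6.0])
print(pairs)  # [(0, 0, 2.0), (1, 0, 2.0), (1, 1, 1.0), (2, 1, 5.0)]
```

Because every recipient keeps its full weight, the weighted (X, Y) distribution of the matched file is the one observed in A; because every donor's weight is fully used, the weighted distribution of Z is the one observed in B.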

Eurostat Selected references
DeGroot M H, Feder P I, Goel P K (1971) “Matchmaking”, The Annals of Mathematical Statistics, 42, No. 2
Renssen R H (1998) “Use of Statistical Matching Techniques in Calibration Estimation”, Survey Methodology, 24, 171–183
Rubin D B (1986) “Statistical Matching Using File Concatenation with Adjusted Weights and Multiple Imputations”, Journal of Business and Economic Statistics, 4, 87–94
Liu T P, Kovacevic M S (1994) “Statistical matching of survey datafiles: a simulation study”, Proceedings of the Section on Survey Research Methods of the American Statistical Association, 479–484