The European Statistical Training Programme (ESTP)


Chapter 8: Weighting adjustment
Handbook: chapter 8
  What is weighting adjustment?
  Post-stratification
  Linear weighting
  Multiplicative weighting
  Calibration estimation
  Other weighting issues
  An example

Introduction
What is weighting adjustment?
  Assignment of weights to observed (responding) persons.
  Use of weighted values to compute estimates.
Why weighting?
  Reducing the bias due to nonresponse.
  Increasing the precision of estimates (decreasing the variance).
Required ingredients: auxiliary variables
  Usually categorical variables.
  Strongly correlated with the target variables of the survey.
  Individual values are measured in the survey.
  Distribution in the population (or full sample) must be available.

Introduction
Principle
  Make the response representative with respect to the auxiliary variables.
  If the auxiliary variables are correlated with the target variable, then the weighted response will also be representative with respect to the target variable.
  Use as many auxiliary variables as possible.
Auxiliary variables
  Usually only a limited number is available (statistical institute).
  Examples: age, gender, marital status, region. These are not the most effective ones.
  Statistics Netherlands has many more auxiliary variables in the Social Statistical Database (SSD).

Introduction
Adjustment weighting to correct for unit nonresponse
  Use of auxiliary information: a set of variables that have been measured in the survey and for which information on the population distribution is available.
  Calculate adjustment weights.
Example: inclusion weight ci = 1 / πi
The estimator can also be written in terms of weights wi, where wi is the inclusion weight ci times a correction weight di (wi = ci × di).

Post-stratification
Suppose auxiliary variable X has L categories. It divides the population U into L strata U1, U2, …, UL.
The number of elements in stratum Uh is denoted by Nh for h = 1, 2, ..., L. So N = N1 + N2 + ... + NL.
The sample consists of n elements that can be divided into the same strata, so n = n1 + n2 + ... + nL.
In case of simple random sampling without replacement, with inclusion probabilities πi = n / N, the post-stratification estimator becomes
  (N1 × y1 + N2 × y2 + ... + NL × yL) / N,
where yh is the sample mean of the target variable in stratum h.
The estimator is equal to a weighted sum of sample stratum means.

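Post-stratification
In case of nonresponse, the post-stratification estimator is computed from the response means per stratum instead of the sample means per stratum.
The bias of this estimator is a weighted sum, with weights Nh / N, of the biases of the response means within the strata.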
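The formulas on this slide did not survive in the transcript. A sketch of what they usually look like under a random response model follows. All notation here is an assumption: \bar{y}_R^{(h)} is the response mean in stratum h, \bar{\rho}^{(h)} the mean response probability, S_\rho^{(h)} and S_Y^{(h)} the standard deviations of the response probabilities and of the target variable, and R_{\rho Y}^{(h)} their correlation, all within stratum h.

```latex
\bar{y}_{PS,R} = \frac{1}{N}\sum_{h=1}^{L} N_h\,\bar{y}_R^{(h)}

B\left(\bar{y}_{PS,R}\right)
  = \frac{1}{N}\sum_{h=1}^{L} N_h\,B\left(\bar{y}_R^{(h)}\right)
  \approx \frac{1}{N}\sum_{h=1}^{L} N_h\,
    \frac{R_{\rho Y}^{(h)}\,S_{\rho}^{(h)}\,S_{Y}^{(h)}}{\bar{\rho}^{(h)}}
```

Written this way, each stratum contributes a product of three factors, so its contribution vanishes as soon as any one of them is close to zero.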

Post-stratification
A closer look at the bias:
The bias is small if the biases within the strata are small.
This is the case when, within strata,
  there is little or no relationship between the target variable and the response behaviour, or
  all response probabilities are more or less equal, or
  all values of the target variable are more or less equal.
Therefore, it is important to construct homogeneous strata.

Post-stratification
Example: Sex × Age
Weight of a young female: 0.209 / 0.150 = 1.393

           Population               Sample
           Male   Female  Total     Male   Female  Total
Young      226    209     435       23     15      38
Middle     152    144     296       16     17      33
Elderly    133    136     269       13     16      29
Total      511    489     1000      52     48      100

Weights    Male   Female
Young      0.983  1.393
Middle     0.950  0.847
Elderly    1.023  0.850
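The weights in this example can be reproduced with a few lines of Python. This is a minimal sketch, not the course's own code: the function and variable names are illustrative, and the elderly female sample count of 16 is inferred from the sample totals (48 females, 29 elderly).

```python
# Cell counts from the Sex x Age example (population per 1000, sample of 100).
population = {
    ("Male", "Young"): 226, ("Female", "Young"): 209,
    ("Male", "Middle"): 152, ("Female", "Middle"): 144,
    ("Male", "Elderly"): 133, ("Female", "Elderly"): 136,
}
sample = {
    ("Male", "Young"): 23, ("Female", "Young"): 15,
    ("Male", "Middle"): 16, ("Female", "Middle"): 17,
    ("Male", "Elderly"): 13, ("Female", "Elderly"): 16,  # 16 inferred from totals
}


def post_stratification_weights(population, sample):
    """Weight per stratum = population share / sample share."""
    pop_total = sum(population.values())
    sample_total = sum(sample.values())
    return {
        cell: (population[cell] / pop_total) / (sample[cell] / sample_total)
        for cell in population
    }


weights = post_stratification_weights(population, sample)
for cell, weight in sorted(weights.items()):
    print(cell, round(weight, 3))
```

Multiplying each sample count by its weight and summing returns the sample size again, which is a quick check that the weights only redistribute the sample over the strata.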

Linear weighting
Why not post-stratification?
  Many auxiliary variables: too few (or no) observations per stratum.
  Lack of sufficient population information.
Solution
  Linear or multiplicative weighting.

           Population               Sample
           Male   Female  Total     Male   Female  Total
Young      ?      ?       435       23     15      38
Middle     ?      ?       296       16     17      33
Elderly    ?      ?       269       13     16      29
Total      511    489     1000      52     48      100

Weights    ?

Linear weighting
Generalised regression estimator, full response
The estimator is built from:
  the vector of population means of the auxiliary variables,
  the vector of sample means of the auxiliary variables,
  b, the vector of regression coefficients.
The estimator is asymptotically design unbiased (ADU).
Its variance is determined by the residuals of the regression model.
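The formula itself is missing from the transcript. In its usual textbook form, under simple random sampling, the generalised regression estimator can be sketched as below. The symbols are assumptions: \bar{X} is the vector of population means, \bar{x} the vector of sample means, x_i the vector of auxiliary values of element i, and e_i the regression residual.

```latex
\bar{y}_{GR} = \bar{y} + \left(\bar{X} - \bar{x}\right)' b,
\qquad
b = \left(\sum_{i=1}^{n} x_i x_i'\right)^{-1} \sum_{i=1}^{n} x_i y_i

V\left(\bar{y}_{GR}\right) \approx \frac{1 - n/N}{n}\, S_E^2,
\qquad
e_i = y_i - x_i' b
```

Here S_E^2 denotes the population variance of the residuals, so a well-fitting model reduces the variance compared with the plain sample mean.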

Linear weighting
Generalised regression estimator, nonresponse
The bias is small if
  the residuals are small, i.e. the regression model fits well,
  there is little correlation between the residuals and the response behaviour (MAR).

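Linear weighting
Generalised regression estimator = weighting
Under general conditions, the estimator can be rewritten as a weighted mean of the observed values.
Consequently, the weight of an element is a linear combination of its auxiliary variable values, with v a vector of weight coefficients.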
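The rewritten form is not preserved in the transcript. A sketch of the usual derivation follows, assuming the notation x_i for the auxiliary vector of element i and \bar{X} for the population means. The "general condition" assumed here is that a fixed vector c exists with c'x_i = 1 for every element, which holds whenever the model contains a constant term or a complete set of dummy variables.

```latex
\bar{y}_{GR} = \bar{X}' b
  = \bar{X}' \left(\sum_{i=1}^{n} x_i x_i'\right)^{-1} \sum_{i=1}^{n} x_i y_i
  = \frac{1}{n} \sum_{i=1}^{n} w_i\, y_i

w_i = v' x_i,
\qquad
v = n \left(\sum_{i=1}^{n} x_i x_i'\right)^{-1} \bar{X}
```

The weights depend only on the auxiliary variables, not on the target variable, so one set of weights can be used for all estimates from the survey.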

Linear weighting
Post-stratification
Replace each qualitative auxiliary variable by a set of dummy variables.
Use these dummy variables in the regression model.
Example: one qualitative variable with L categories (strata)
  Introduce L dummy variables X1, X2, …, XL.
  Xh = 1 for an observation in stratum h, and 0 otherwise.
  Vector of population means: (N1 / N, N2 / N, ..., NL / N)
  Vector of weight coefficients: the h-th coefficient is (Nh / N) / (nh / n), the post-stratification weight of stratum h.

Linear weighting
Post-stratification - example
Two auxiliary variables: Sex × AgeClass
Weight for young female = 1.393

Sex      AgeClass   X1     X2     X3     X4     X5     X6
Male     Young      1      0      0      0      0      0
Male     Middle     0      1      0      0      0      0
Male     Elderly    0      0      1      0      0      0
Female   Young      0      0      0      1      0      0
Female   Middle     0      0      0      0      1      0
Female   Elderly    0      0      0      0      0      1

Population means    0.226  0.152  0.133  0.209  0.144  0.136
Weight coefficients 0.983  0.950  1.023  1.393  0.847  0.850

Linear weighting
Use only marginal distributions
Two auxiliary variables: Sex + AgeClass
Weight for young female = 0.991 + 0.033 + 0.161 = 1.185

Sex      AgeClass   X1     X2      X3     X4     X5      X6
Male     Young      1      1       0      1      0       0
Male     Middle     1      1       0      0      1       0
Male     Elderly    1      1       0      0      0       1
Female   Young      1      0       1      1      0       0
Female   Middle     1      0       1      0      1       0
Female   Elderly    1      0       1      0      0       1

Population means    1.000  0.511   0.489  0.435  0.296   0.269
Weight coefficients 0.991  -0.033  0.033  0.161  -0.095  -0.066
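The Sex + AgeClass weights can be checked numerically. The sketch below assumes the regression form w = v'x with v = n (sum of x x')^-1 times the vector of population means, and uses the sample counts from the post-stratification example (elderly females: 16, inferred from the totals). Because the six dummy variables are linearly dependent, it uses a least-squares solve instead of a matrix inverse. The individual coefficients in v may therefore differ from those on the slide, but the resulting weights are the same.

```python
import numpy as np

sexes = ["Male", "Female"]
ages = ["Young", "Middle", "Elderly"]

# Sample counts per cell (n = 100).
sample_counts = {
    ("Male", "Young"): 23, ("Male", "Middle"): 16, ("Male", "Elderly"): 13,
    ("Female", "Young"): 15, ("Female", "Middle"): 17, ("Female", "Elderly"): 16,
}

# Population means of X1..X6: constant, Male, Female, Young, Middle, Elderly.
pop_means = np.array([1.0, 0.511, 0.489, 0.435, 0.296, 0.269])


def dummies(sex, age):
    """Auxiliary vector (X1..X6) for one person."""
    return np.array(
        [1.0]
        + [float(sex == s) for s in sexes]
        + [float(age == a) for a in ages]
    )


n = sum(sample_counts.values())

# Sum of x x' over all sampled persons, accumulated per cell.
xtx = np.zeros((6, 6))
for cell, count in sample_counts.items():
    x = dummies(*cell)
    xtx += count * np.outer(x, x)

# Minimum-norm solution of (sum x x') v = n * pop_means; xtx is singular.
v, *_ = np.linalg.lstsq(xtx, n * pop_means, rcond=1e-10)

weights = {cell: float(dummies(*cell) @ v) for cell in sample_counts}
for cell, weight in weights.items():
    print(cell, round(weight, 3))
```

The weighted sample means of X1 to X6 reproduce the population means, which is the defining property of this weighting model.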

Linear weighting
Many possible weighting models with more than two variables
For example, three variables: Sex, AgeClass, MarStat
Models:
  Sex × AgeClass × MarStat
  (Sex × AgeClass) + (AgeClass × MarStat) + (Sex × MarStat)
  Sex + AgeClass + MarStat
And many more …
  (Sex × MarStat) + AgeClass
  …

Linear weighting
Qualitative and quantitative auxiliary variables
Examples: Age, Age + Sex, Age × Sex

                      X1     X2      X3     X4     X5      X6
Population means      1.000  34.369  0.511  0.489  33.509  35.268

Example rows (Age, Sex, X1 to X6), in transcript order: 65 Male 1 36 73 Female 6 33 82 2 32 66
Weight coefficients 1: 1.101 -0.003
Weight coefficients 2: -0.032 0.032
Weight coefficients 3: 1.087 -0.001 -0.004

Multiplicative weighting
Alternative to linear weighting.
Qualitative auxiliary variables only.
Difference: the weight is not a linear combination of weight coefficients but a product of weight factors.
Iterative process:
Weight model: (A × B × C) + (D × E) + (F × G × H)
  Step 1: Introduce weight factors for each stratum in each cross-classification term. Set all factors to 1.
  Step 2: Adjust the factors for term 1, so that the weighted sample distribution is equal to the population distribution for the variables involved.
  Step 3: Adjust the factors for the next term. This may disturb the fit obtained for previous terms.
  Step 4: Do this for all terms in the model.
  Step 5: Repeat steps 2-4 until the factors do not change any more.

Multiplicative weighting
Example: Sex + AgeClass
Starting situation
Weight for young female = 1.000 × 1.000 = 1.000

                Male    Female   Weight factor   Weighted sum   Population distribution
Young           0.230   0.150    1.000           0.380          0.435
Middle          0.160   0.170    1.000           0.330          0.296
Elderly         0.130   0.160    1.000           0.290          0.269
Weight factor   1.000   1.000
Weighted sum    0.520   0.480                    1.000
Popul. distr.   0.511   0.489

Multiplicative weighting
Example: Sex + AgeClass
Step 1: adjustment for Age
Weight for young female = 1.000 × 1.145 = 1.145

                Male    Female   Weight factor   Weighted sum   Population distribution
Young           0.230   0.150    1.145           0.435          0.435
Middle          0.160   0.170    0.897           0.296          0.296
Elderly         0.130   0.160    0.928           0.269          0.269
Weight factor   1.000   1.000
Weighted sum    0.527   0.473                    1.000
Popul. distr.   0.511   0.489

Multiplicative weighting
Example: Sex + AgeClass
Step 2: adjustment for Sex
Weight for young female = 1.035 × 1.145 = 1.185

                Male    Female   Weight factor   Weighted sum   Population distribution
Young           0.230   0.150    1.145           0.433          0.435
Middle          0.160   0.170    0.897           0.297          0.296
Elderly         0.130   0.160    0.928           0.270          0.269
Weight factor   0.969   1.035
Weighted sum    0.511   0.489                    1.000
Popul. distr.   0.511   0.489

Multiplicative weighting
Example: Sex + AgeClass
Final situation after convergence
Weight for young female = 1.151 × 1.035 = 1.191

                Male    Female   Weight factor   Weighted sum   Population distribution
Young           0.230   0.150    1.151           0.435          0.435
Middle          0.160   0.170    0.895           0.296          0.296
Elderly         0.130   0.160    0.923           0.269          0.269
Weight factor   0.968   1.035
Weighted sum    0.511   0.489                    1.000
Popul. distr.   0.511   0.489
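The iterative process for the Sex + AgeClass example can be sketched in Python as follows. This is an illustrative implementation of iterative proportional fitting for two marginal terms only; the function name `rake`, the tolerance, and the iteration cap are assumptions, and the elderly female sample fraction of 0.160 is inferred from the weighted sum of 0.290.

```python
import numpy as np

# Sample fractions: rows are Young, Middle, Elderly; columns are Male, Female.
sample = np.array([
    [0.23, 0.15],
    [0.16, 0.17],
    [0.13, 0.16],
])
pop_age = np.array([0.435, 0.296, 0.269])  # population distribution of AgeClass
pop_sex = np.array([0.511, 0.489])         # population distribution of Sex


def rake(sample, pop_rows, pop_cols, tol=1e-10, max_iter=1000):
    """Return row and column weight factors for a two-term model."""
    row_factors = np.ones(sample.shape[0])
    col_factors = np.ones(sample.shape[1])
    for _ in range(max_iter):
        previous = np.concatenate([row_factors, col_factors])
        # Adjust the row (age) factors so the weighted row sums match.
        row_factors = pop_rows / (sample @ col_factors)
        # Adjust the column (sex) factors; this may disturb the row fit.
        col_factors = pop_cols / (sample.T @ row_factors)
        current = np.concatenate([row_factors, col_factors])
        if np.max(np.abs(current - previous)) < tol:
            break
    return row_factors, col_factors


age_factors, sex_factors = rake(sample, pop_age, pop_sex)
weights = np.outer(age_factors, sex_factors)  # weight = product of factors
print(np.round(age_factors, 3), np.round(sex_factors, 3))
print("young female:", round(float(weights[0, 1]), 3))
```

The first pass through the loop reproduces the factors of step 1 and step 2 in the example. After convergence the weighted table matches both marginal distributions.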

Linear or multiplicative weighting?
Advantages of linear weighting:
  Linear weighting is based on a regression model.
  Analytic formula for the variances of estimators.
  Both qualitative and quantitative auxiliary variables can be used.
Disadvantage of linear weighting:
  Resulting weights may be negative.
However:
  Both weighting methods produce estimates that are asymptotically equal.

Calibration estimation
How to compare different weighting methods?
Calibration estimation offers a general framework for weighting methods
  General formulas for properties of the methods, such as asymptotic distributions.
  Linear and multiplicative weighting are special cases.
Idea:
  Calibrate on known auxiliary characteristics while affecting the response as little as possible.
Ingredients
  A distance measure D.
  Calibration of the auxiliary variables X.

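Calibration estimation
Strategy
  Minimise the distance D between the new weights and the original weights, under the constraint that the weighted auxiliary variables X reproduce their known population values.
Examples
  Linear weighting and multiplicative weighting each correspond to a particular choice of the distance measure.
Without nonresponse, both methods have the same asymptotic properties.
With nonresponse, the effectiveness depends on the validity of the underlying model.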
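The minimisation problem and the two distance measures are missing from the transcript. A sketch in the commonly used Deville and Särndal formulation is given below; it is an assumption that the slide used this form. Here c_i is the inclusion weight, w_i the calibrated weight, x_i the auxiliary vector of element i, and t_X the vector of known population totals.

```latex
\min_{w_1,\dots,w_n} \; \sum_{i=1}^{n} c_i\, G\!\left(\frac{w_i}{c_i}\right)
\quad \text{subject to} \quad
\sum_{i=1}^{n} w_i\, x_i = t_X

\text{Linear weighting:} \quad G(u) = \tfrac{1}{2}\,(u - 1)^2

\text{Multiplicative weighting:} \quad G(u) = u \log u - u + 1
```

Both distance functions are zero at u = 1, so weights that already satisfy the constraints are left unchanged.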

Other weighting issues
Consistency between person and household weights
  A survey may be used to make statistics about persons as well as about households.
  The person weights need not sum up to the household weights.
  Two sets of weights are impractical.
Unsatisfactory solutions
  The household weight equals the weight of a reference person or of a randomly selected person in the household.
  Use the average person weight for the household.
Solution
  Generalised regression estimation

Other weighting issues
Generalised regression estimation
  Auxiliary information at person level, X: one row per person (1, 2, 3, 4), columns 0-20, 20-60, > 60, Male, Female.
  Household membership, H: one row per person (1, 2, 3, 4), columns H1, H2, H3, ...
  Auxiliary information at household level, Z = H'X: one row per household (1, 2), columns 0-20, 20-60, > 60, Male, Female.
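The aggregation Z = H'X can be illustrated with a small NumPy example. The cell values of the slide's tables are not preserved in the transcript, so the four persons and two households below are hypothetical data chosen only to show the matrix operation.

```python
import numpy as np

# Hypothetical person-level auxiliary matrix X.
# Columns: 0-20, 20-60, >60, Male, Female.
X = np.array([
    [0, 1, 0, 1, 0],  # person 1: 20-60, male
    [0, 1, 0, 0, 1],  # person 2: 20-60, female
    [1, 0, 0, 1, 0],  # person 3: 0-20, male
    [0, 0, 1, 0, 1],  # person 4: >60, female
])

# Hypothetical household membership matrix H (persons x households).
H = np.array([
    [1, 0],  # person 1 lives in household 1
    [1, 0],  # person 2 lives in household 1
    [1, 0],  # person 3 lives in household 1
    [0, 1],  # person 4 lives in household 2
])

# Household-level auxiliary information: counts per category per household.
Z = H.T @ X
print(Z)
```

Each row of Z counts the members of one household per category. Weighting on Z at household level gives every member of a household the same weight, which is what makes person and household estimates consistent.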

A practical example
Selection of a weighting model
  Identification of auxiliary variables
  Collection of population totals
  Selection of weighting variables
  Weighting
Example - Step 1
  Available auxiliary variables: sex, age, marital status, province of residence and degree of urbanisation.

A practical example
Collection of population totals
  Ideal: complete crossing of all auxiliary variables.
  In practice, totals of the complete crossing are often not available.
  Empty cells.
Example - Step 2
  Available: Age × Sex × Marital status, Age × Province and Age × Degree of urbanisation

         Male                                     Female
Age      Unmarried  Married  Widowed  Divorced    Unmarried  Married  Widowed  Divorced
12-19    752.4      0.4      0.0      ...         716.5      3.5      ...      ...
20-29    981.5      185.7    0.2      10.2        785.0      330.6    0.7      22.7
30-39    445.4      795.1    1.9      72.1        283.5      879.3    5.7      93.8
40-49    164.7      899.0    6.9      113.9       103.1      882.9    21.5     138.4
50-59    67.3       732.9    15.8     86.3        44.4       675.9    56.1     98.8
60-69    42.0       519.2    31.7     42.6        41.4       458.9    140.0    51.5
70-79    21.4       308.4    52.5     16.6        43.0       239.9    254.3    27.9
80+      8.0        84.0     50.4     4.0         35.0       49.6     243.9    12.4

A practical example
How to select weighting variables?
  Relation to nonresponse
  Relation to key survey topics
  Variables that are used as marginal variables in statistical tables
Relation to nonresponse
  Compute contingency tables (and possibly test for independence).
  Build a model for nonresponse using auxiliary variables.
Relation to key survey topics
  Select survey variables that cover the range of topics in the survey.
  Compute contingency tables.
  Build models for the survey variables using auxiliary variables.
  Take the union of the sets of auxiliary variables in the models.

A practical example
Example - Step 3

Age      Resp   Pop    Diff
12-19    12.8   11.1   1.7
20-29    15.9   17.5   -1.6
30-39    20.5   19.4   1.1
40-49    17.9   17.6   0.3
50-59    14.0   13.4   0.6
60-69    10.0   10.0   0.0
70-79    6.5    7.3    -0.8
80+      2.5    3.7    -1.2

Province     Resp   Pop    Diff
Groningen    2.7    3.5    -0.8
Friesland    4.3    3.9    0.4
Drenthe      2.3    3.0    -0.7
Overijssel   6.8    6.7    0.1
Flevoland    1.8    ...    ...
Gelderland   15.4   12.1   3.3
Utrecht      5.4    6.9    -1.5
N-Holland    14.0   16.1   -2.1
Z-Holland    18.0   21.5   -3.5
Zeeland      2.4    ...    ...
N-Brabant    17.6   14.8   2.8
Limburg      9.1    7.4    1.7

A practical example
Example - Step 3

Mar. status   Resp   Pop    Diff
Unmarried     32.7   34.2   -1.5
Married       57.2   53.2   4.0
Widowed       5.2    6.0    -0.8
Divorced      4.9    6.7    -1.8

Urbanisation  Resp   Pop    Diff
Very strong   11.8   18.0   -6.2
Strong        24.0   23.8   0.2
Moderate      23.2   20.5   2.7
Little        23.3   21.1   2.2
Non           17.7   16.5   1.2

Sex           Resp   Pop    Diff
Male          48.6   49.1   -0.5
Female        51.4   50.9   0.5

A practical example
Computation of weights
  Construct weighting models with the candidate auxiliary variables as ingredients, for a number of key survey topics.
  Control the variance of the estimates.
Example - Step 4

No.  Weighting model                                              Parameters  Estimate  Standard error
1    No weighting                                                 ...         43.4      1.2
2    Sex                                                          ...         ...       ...
3    Prov                                                         12          43.3      ...
4    MarStat                                                      ...         42.9      ...
5    Urban                                                        ...         ...       1.0
6    Age8                                                         8           42.8      ...
7    Age5 × Province                                              60          ...       ...
8    (Sex × Age8) + (Sex × MarStat)                               22          42.3      1.1
9    Age5 × Urban                                                 25          42.5      ...
10   Sex + Age8 + MarStat + Urban + Province                      23          42.1      ...
11   (Sex × Age8) + (Sex × MarStat) + (Age5 × Urban) + Province   53          42.0      0.9