Nonparametric estimation of non-response distribution in the Israeli Social Survey. Yury Gubman, Dmitri Romanov. JSM 2009, Washington DC, 4/8/2009.

Presentation transcript:

Nonparametric estimation of non-response distribution in the Israeli Social Survey
Yury Gubman, Dmitri Romanov
JSM 2009, Washington DC, 4/8/2009

2 Outline
1. Missing data generating mechanisms
2. Sharp bounds for conditional mean
3. Tests for MCAR and MAR assumptions
4. Empirical results - Israeli Social Survey
5. Conclusions

3 Missing data generating mechanisms
• Missing Completely At Random (MCAR): non-respondents' data are ignorable: […] (1)
• Missing At Random (MAR): conditional on some set of covariates, non-respondents' data are ignorable.
  1. Non-respondents' data provide no additional information about the conditional distribution of y: […] (2)
  2. Given a set of survey design covariates X, the probability that y is missing does not depend on y: […] (3)
• The MAR assumption cannot be tested statistically using the survey data alone, because non-respondents' data are not available.
• We overcome this difficulty by conditioning on fully known administrative covariates of census type.
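
Equations (1) to (3) are not preserved in this transcript. As a hedged sketch only, the usual textbook forms of these conditions are given below. They assume a response indicator z (z = 1 for respondents, 0 otherwise) and a covariate vector x, and the notation on the original slide may differ.

```latex
% Assumed textbook forms, not the authors' exact notation. Requires amsmath.
\begin{align}
  P(y \mid z = 1) &= P(y \mid z = 0)
    && \text{MCAR} \tag{1} \\
  P(y \mid x, z = 1) &= P(y \mid x, z = 0)
    && \text{MAR, first form} \tag{2} \\
  P(z = 0 \mid x, y) &= P(z = 0 \mid x)
    && \text{MAR, second form} \tag{3}
\end{align}
```

Under these assumed forms, (2) and (3) express the same conditional independence of y and z given x, read from the two sides of Bayes' theorem.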

4 Sharp bounds for conditional mean
• Let z = 1 for interview respondents, and 0 otherwise.
• Let w = 1 for item respondents, and 0 otherwise.
• Let […] be the survey strata, y a survey variable and x a covariate.
• By the Law of Iterated Expectations: […] (4)
• By Bayes' theorem: […] (5)
• Using covariates from census-type administrative sources allows us to assume:
  1. There is no item non-response on the covariates x.
  2. P(x), the overall population distribution of x, is known.
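
Equations (4) and (5) are missing from the transcript. The following is a sketch of what the two named results give when applied to the interview-response indicator z alone. Treating item non-response (w) separately and omitting the strata are simplifying assumptions made here, so this is not necessarily the form shown on the slide.

```latex
% Assumed forms using only the interview indicator z. Requires amsmath.
\begin{align}
  E[y \mid x] &= E[y \mid x, z = 1]\, P(z = 1 \mid x)
               + E[y \mid x, z = 0]\, P(z = 0 \mid x)
    && \text{(iterated expectations)} \\
  P(z = 1 \mid x) &= \frac{P(x \mid z = 1)\, P(z = 1)}{P(x)}
    && \text{(Bayes' theorem)}
\end{align}
```

The second line shows why a known P(x) matters: with P(x) taken from administrative data, P(z = 1 | x) can be computed from quantities observed for respondents.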

5 Sharp bounds for conditional mean
• In addition: […] (6)
• Combine (4), (5) and (6).
• The data reveal nothing at all about […] and […].
• The lower and upper bounds are obtained by minimizing and maximizing, respectively, the resulting expression with respect to all unobserved values.
• The minimum and maximum exist because all survey variables are bounded.
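
The minimization and maximization step can be illustrated with a minimal Python sketch of worst-case bounds on a mean within one covariate cell. It simplifies the slides' setup: the response probability is estimated by the sample share of observed values rather than via Bayes' theorem with a known P(x), and the function name `sharp_bounds` and the example data are hypothetical.

```python
import math


def sharp_bounds(observed, n_missing, y_min, y_max):
    """Worst-case bounds on a mean when n_missing values are unobserved.

    Every missing value is set to y_min for the lower bound and to
    y_max for the upper bound; y must be bounded for this to work.
    """
    n_obs = len(observed)
    n_total = n_obs + n_missing
    if n_total == 0:
        raise ValueError("the cell is empty")
    p_obs = n_obs / n_total  # estimated share of observed values
    mean_obs = sum(observed) / n_obs if n_obs else 0.0
    lower = mean_obs * p_obs + y_min * (1 - p_obs)
    upper = mean_obs * p_obs + y_max * (1 - p_obs)
    return lower, upper


# Hypothetical cell: four observed answers on a 1-3 scale, one missing.
lower, upper = sharp_bounds([1, 2, 2, 3], n_missing=1, y_min=1, y_max=3)
interval_width = upper - lower
print(lower, upper, interval_width)
```

In this sketch the width of the interval equals (y_max - y_min) times the share of missing values, so it grows only with the amount of non-response and not with the observed answers.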

6 Sharp bounds for conditional mean
• For the full survey data: […], where: […]
• The width of the interval between the bounds reflects both item and survey non-response.

7 Sharp bounds for conditional mean
• For item non-response analysis, only the interview respondents' data are analyzed. In this case, the formula for the sharp bounds can be simplified: […]
• The width of the interval between the bounds reflects item non-response only.
• Nothing was assumed about the true missing data generating mechanism.
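
The simplified formula is not preserved in the transcript. As a hedged illustration, the generic worst-case bounds restricted to interview respondents (z = 1) take the form below. The bounds y_min and y_max on the survey variable are notation assumed here, and the authors' expression may be written differently.

```latex
% Generic worst-case bounds among interview respondents (assumed notation).
\begin{align}
  \text{lower} &= E[y \mid x, z = 1, w = 1]\, P(w = 1 \mid x, z = 1)
                + y_{\min}\, P(w = 0 \mid x, z = 1) \\
  \text{upper} &= E[y \mid x, z = 1, w = 1]\, P(w = 1 \mid x, z = 1)
                + y_{\max}\, P(w = 0 \mid x, z = 1) \\
  \text{upper} - \text{lower} &= (y_{\max} - y_{\min})\, P(w = 0 \mid x, z = 1)
\end{align}
```

The last line makes the slide's point explicit: in this form the width depends only on the item non-response rate among interview respondents.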

8 Testing MCAR and MAR
• Define: […]
• The explicit expression for […] is given by: […]
• and for […] by: […]
• […] and […] are asymptotically normal, and do not depend on […] or on the sample size.
• Their standard deviations can be estimated using the bootstrap.
• A t-test for equal means of two populations with unknown and different variances is used to check the null hypothesis in the following cases.
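
The testing recipe (bootstrap standard deviations, then an unequal-variance test) can be sketched in Python as follows. The Conclusions slide states that the test statistics are based on the width of the interval between the estimated sharp bounds, so this sketch assumes the statistic is the width for a bounded variable. The function names, the normal approximation for the p-value and the example data are assumptions, not the authors' implementation.

```python
import math
import random
import statistics


def width(responded, y_min, y_max):
    """Width of worst-case bounds: (y_max - y_min) * share of non-respondents."""
    share_missing = 1 - sum(responded) / len(responded)
    return (y_max - y_min) * share_missing


def bootstrap_sd(responded, y_min, y_max, n_boot=500, seed=0):
    """Bootstrap estimate of the standard deviation of the width."""
    rng = random.Random(seed)
    n = len(responded)
    widths = []
    for _ in range(n_boot):
        resample = [responded[rng.randrange(n)] for _ in range(n)]
        widths.append(width(resample, y_min, y_max))
    return statistics.stdev(widths)


def unequal_variance_test(est_a, sd_a, est_b, sd_b):
    """Two-sided test of equal means with different standard deviations.

    Uses a normal approximation for the p-value (an assumption that
    relies on the asymptotic normality stated on the slide).
    """
    t = (est_a - est_b) / math.sqrt(sd_a ** 2 + sd_b ** 2)
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
    return t, p_value


# Hypothetical response indicators (1 = responded) in two cells of x.
cell_a = [1] * 90 + [0] * 10
cell_b = [1] * 70 + [0] * 30
t_stat, p_value = unequal_variance_test(
    width(cell_a, 0, 1), bootstrap_sd(cell_a, 0, 1, seed=1),
    width(cell_b, 0, 1), bootstrap_sd(cell_b, 0, 1, seed=2),
)
print(t_stat, p_value)
```

A small p-value in this sketch would indicate that the two cells have different non-response shares, which is the kind of evidence the following slides use against MCAR.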

9 Testing MCAR
• H0 for testing overall non-response is given by: […] and for item non-response: […]
• If H0 is rejected for some i, j, we conclude that the probability of being a non-respondent depends on x. In particular, the MCAR assumption is violated.

10 Testing MAR (1)
• Let […] be a variable from the administrative database that is strongly correlated with the key survey variable y and/or with the survey topic. In that case, […] may be treated as a survey variable, and it is known for all sampled units.
• Under MAR, the respondents' data are sufficient to estimate the conditional population distributions of all survey variables, and in particular of […].
• The null hypothesis is given by: […]
• If the null hypothesis is rejected, the survey non-response distribution depends on the survey topic and/or on the survey variable, which contradicts MAR.

11 Testing MAR (2)
• Use the MAR definition: […].
• Let X be a set of survey design covariates that "controls" the bias in survey variables due to non-response.
• Let […] be a categorical, fully observed covariate that is orthogonal to X. Assuming MAR and conditional on X, the survey non-response rates should be independent of […].
• H0 for testing the MAR assumption is given by: […] for overall non-response, and […] for item non-response.
• Rejection of H0 means that, conditional on the set of survey design covariates X, the MAR assumption is violated. If […] is strongly correlated with some survey variable, rejection of H0 means that non-response depends on the survey variable and/or the survey topic.
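
The idea that non-response rates should not vary with an additional covariate inside a design cell can be illustrated with a plain two-proportion z-test. This is a simpler stand-in technique, not the width-based t-test used in the presentation, and the function name and counts below are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z(nonresp_a, n_a, nonresp_b, n_b):
    """Two-sided z-test for equal non-response rates in two groups."""
    p_a = nonresp_a / n_a
    p_b = nonresp_b / n_b
    pooled = (nonresp_a + nonresp_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("rates are all 0 or all 1; the test is undefined")
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts inside ONE design cell of X, split by two
# categories of the additional administrative covariate.
z_stat, p_val = two_proportion_z(nonresp_a=30, n_a=200, nonresp_b=60, n_b=200)
print(z_stat, p_val)
```

Repeating such a comparison within each cell of X keeps the conditioning on the design covariates; rejecting equality in a cell would point the same way as rejecting H0 on the slide.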

12 Israeli Social Survey 2006
• The Israeli Social Survey (ISS) has been conducted annually since 2002 on a sample of persons aged 20 and older. The main purpose of the ISS is to provide up-to-date information on the welfare of Israelis and on their living conditions. The ISS is the first survey conducted by ICBS using the Population Register as a sampling frame.
• The sample size in 2006 was 9,499 persons. Of these, 562 persons did not belong to the sample frame (deceased or abroad for over a year), so the final sample included 8,937 persons. 1,648 persons did not respond to the survey (18.4 percent of the final sample).
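
The sample figures on this slide are mutually consistent, as a short check shows. The variable names are illustrative, and the respondent count is derived here rather than quoted from the slide.

```python
initial_sample = 9_499
out_of_frame = 562
non_respondents = 1_648

final_sample = initial_sample - out_of_frame
respondents = final_sample - non_respondents  # derived, not stated on the slide
non_response_rate = non_respondents / final_sample

print(final_sample, respondents, f"{non_response_rate:.1%}")
# prints: 8937 7289 18.4%
```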

13 Israeli Social Survey 2006
• Four key ISS variables were chosen:
  1. Worked last week (4 categories) - no item non-response;
  2. Optimism – general (3 categories) - item non-response rate is 11.0 percent;
  3. Gross salary from all places of work (10 categories) - item non-response rate is 5.3 percent;
  4. Degree of religiosity – Jews (5 categories) - item non-response rate is 0.5 percent.
• We use three administrative covariates that are highly correlated with the survey topic and with some important survey variables:
  1. Reported income from work (Tax Authority);
  2. Work status (Tax Authority);
  3. Degree of religiosity of Jews (derived from the educational databases).

14 Empirical results – testing covariates' conditional distributions
Respondents' and non-respondents' distributions differ significantly.

15 Empirical results: testing MAR for interview non-response
H0 is rejected, p-value < 0.01 for the three covariates.

16 Empirical results: testing MAR for item non-response
H0 is rejected, p-value < 0.01 for the three covariates.

17 Conclusions
• We propose nonparametric statistical tests for checking the validity of the MCAR and MAR assumptions, where the test statistics are based on the width of the interval between the estimated sharp bounds for the conditional mean.
• Significant departures from the MAR assumption were found in the ISS 2006 data. Non-response propensity varies significantly between population groups assumed to be homogeneous according to the survey design.
• The ISS survey design can be improved using available administrative covariates, such as income, labor market status, and degree of religiosity of Jews.

19 Yury Gubman Senior Coordinator Israeli Central Bureau of Statistics Tel. 972 (2) Fax 972 (2)