Evaluating Methods of Standard Error Estimation for Use with the Current Population Survey’s Public Use Data The Hawaii Coverage For All Technical Workshop.

Presentation transcript:

Evaluating Methods of Standard Error Estimation for Use with the Current Population Survey’s Public Use Data
The Hawaii Coverage For All Technical Workshop, Honolulu, Hawaii, February 7, 2003
Presented by: Michael Davern, Ph.D., University of Minnesota, Division of Health Services Research and Policy, School of Public Health
Supported by a grant from The Robert Wood Johnson Foundation

This paper is a work in progress
Paper is co-authored with:
– James Lepkowski, University of Michigan
– Gestur Davidson, University of Minnesota/SHADAC
– Arthur Jones Jr., US Census Bureau
– Lynn A. Blewett, University of Minnesota/SHADAC
Estimates have not cleared final Census review
– Estimates are therefore PRELIMINARY
– We hope to present the paper at AAPOR in May 2003

The Problem:
CPS is a complex survey
– Sample design information is necessary to estimate appropriate standard errors
– Important components of the sampling design are not released to the public
Public use data are widely used by policy-makers and academics
– Significance tests in research are likely biased because the standard errors are estimated without the full design information
– These significance tests provide important rules for “evidence” in the policy analysis and academic literature

The Result:
Thus what constitutes “evidence” in policy analysis and academic journals, and the inferences drawn from that evidence, may not be valid
In other words: what we know from research using Census Bureau public use data products may not be usefully accurate
In a quick search we found over 50 articles in the top social science journals that used Census Bureau public use data

The Analysis:
We identified four approaches to estimating the standard error on the public use data
– The Simple Random Sample (SRS) approach
– Generalized variance parameter (GVP) approach (Census Bureau’s standard)
– Robust variance estimation (aka sandwich estimator or Huber-White estimator)
– Taylor Series with a stratum and cluster variable defined

The Data:
The CPS uses a complex sampling design with the following features:
– The country is divided into Primary Sampling Units (PSUs)
   A PSU is a county or group of contiguous counties
   “Self-representing” PSUs are metro areas that are selected with certainty
   Non-self-representing PSUs are sampled through a stratification process within each state
– Within PSUs, groups of housing units are identified and called Ultimate Sampling Units (USUs)

The Data:
– On average 4 housing units are selected from a USU using a systematic sampling method
– Information is collected on everyone within a selected household
– Due to the rotation schedule, about 45 percent of the households interviewed in a given month of the CPS were also interviewed in the same month of the previous year

The Variables and Standard Error Estimation
We estimate state rates of health insurance coverage and poverty, and state average income
We estimate the standard errors for these rates/averages in the following manner:
– SRS uses normalized weights and conventional calculations to determine standard errors
– GVP approach uses the parameters in the Source and Accuracy Statement from the Census Bureau to correct for the complex sampling design (this is the technique used by the Census Bureau)
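The two calculations above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes that "normalized weights" means weights rescaled to sum to the sample size, and that the GVP calculation takes the general form sqrt((b / x) * p * (100 - p)) given for percentages in Census Bureau Source and Accuracy Statements. The function names and the value of b used in any call are hypothetical, and state-level adjustment factors are left out.

```python
import math

def srs_se_proportion(values, weights):
    """SRS-style standard error of a weighted proportion.

    Assumption: weights are rescaled to sum to the sample size n, then the
    conventional formula sqrt(p * (1 - p) / n) is applied.
    values are 0/1 indicators (e.g. 1 = uninsured).
    """
    n = len(values)
    scale = n / sum(weights)                  # normalize weights to sum to n
    norm_w = [w * scale for w in weights]
    p = sum(w * y for w, y in zip(norm_w, values)) / n
    return p, math.sqrt(p * (1.0 - p) / n)

def gvp_se_percent(p_percent, base, b):
    """Generalized variance SE of a percentage (in percentage points).

    p_percent: estimated percentage (0-100)
    base:      weighted population count in the denominator
    b:         generalized variance parameter (hypothetical value here;
               real values come from the Source and Accuracy Statement)
    """
    return math.sqrt((b / base) * p_percent * (100.0 - p_percent))
```

Neither function uses stratum or cluster information, which is why both can be run on the public use file as released.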

Standard Error Estimation
– Robust standard errors use the person weights to account for the degree of heterogeneity in the probability of selection
– Taylor Series on the public use file uses the “lowest” level of identifiable geography as the stratum variable and household as the cluster variable
   Lowest level of identifiable geography is one of:
   (1) largest 250 MSAs,
   (2) other counties with over 100,000 in population,
   (3) non-MSA and non-identified county within a state
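A rough sketch of how both estimators can be computed for a weighted mean or rate follows. It is an illustration under stated assumptions, not the software used for the paper: it uses the usual with-replacement Taylor series (linearization) variance for a ratio estimator, treats clusters as nested within strata, and skips strata that contain a single cluster. The function name and arguments are hypothetical. Calling it without stratum and cluster arguments treats every person as their own cluster in one stratum, which gives a weight-based robust (sandwich-type) standard error.

```python
import numpy as np

def linearized_se_mean(y, w, stratum=None, cluster=None):
    """Weighted mean and its Taylor-series (linearized) standard error.

    y, w:     outcome and person weight for each record
    stratum:  stratum label per record (default: one stratum)
    cluster:  cluster label per record (default: each record is a cluster)
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n = len(y)
    stratum = np.zeros(n, dtype=int) if stratum is None else np.asarray(stratum)
    cluster = np.arange(n) if cluster is None else np.asarray(cluster)

    total_w = np.sum(w)
    mean = np.sum(w * y) / total_w
    z = w * (y - mean) / total_w              # linearized contribution per record

    variance = 0.0
    for h in np.unique(stratum):
        in_h = stratum == h
        ids, idx = np.unique(cluster[in_h], return_inverse=True)
        n_h = len(ids)
        if n_h < 2:
            continue                          # single-cluster stratum: skipped in this sketch
        totals = np.bincount(idx, weights=z[in_h], minlength=n_h)  # cluster totals of z
        variance += n_h / (n_h - 1) * np.sum((totals - totals.mean()) ** 2)
    return mean, np.sqrt(variance)
```

For four equally weighted people in two households where everyone in a household shares the same outcome (y = [1, 1, 0, 0], cluster = [0, 0, 1, 1]), the clustered call returns a standard error of 0.5, while the unclustered call returns about 0.289. Identical weights and data, but recognizing the household cluster raises the standard error.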

The Standard Error “Standard”
Ultimate Cluster Method is the current standard way to estimate standard errors for survey data
– Taylor series combined with an identified ultimate cluster and stratum variable
– The ultimate cluster for the CPS is the PSU
– We used the Census internal data, which have the PSU identifiers
In the Taylor Series, the state is the stratum and the PSU is the cluster (except DC)

These Results are Preliminary and Subject to Internal Census Bureau Review Please do not cite our work without permission

Findings
Health Insurance Coverage on Average:
– Robust is 8% larger than SRS
– Taylor Series public use file is 54% larger than SRS
– GVP is 17% smaller than SRS
– Taylor Series on internal file is 138% larger than SRS

Findings
Percent in Poverty on Average:
– Robust is 7% larger than SRS
– Taylor Series public use file is 77% larger than SRS
– GVP is 81% larger than SRS
– Taylor Series on internal file is 190% larger than SRS

Findings
Individual (adult) Income on Average:
– Robust is 6% larger than SRS
– Taylor Series public use file is 7% larger than SRS
– GVP is 154% larger than SRS
– Taylor Series on internal file is 123% larger than SRS
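One way to read these percentages is as design effects. The ratio of a design-based standard error to the SRS standard error is often called deft, and its square, deff, is the factor by which the variance is inflated. The helper below is a hypothetical convenience function, not part of the original analysis. The reported figures are averages across states, so squaring them is only indicative.

```python
def design_effect(pct_larger_than_srs):
    """Convert 'X% larger than SRS' into (deft, deff).

    deft = SE_design / SE_srs, deff = deft ** 2 (variance inflation).
    """
    deft = 1.0 + pct_larger_than_srs / 100.0
    return deft, deft ** 2
```

Applied to the internal-file Taylor Series results: 138% larger (health insurance) gives a deft of 2.38 and a deff of about 5.66; 190% larger (poverty) gives 2.90 and 8.41; 123% larger (income) gives 2.23 and about 4.97.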

Discussion
GVPs are inconsistent compared to the Standard Error “Standard”
– Std. errors for income are too high, for poverty too low, and for health insurance way too low
Robust std. error estimates are consistently too small
– The main cause of standard error inflation is not differential probability of selection but rather intra-cluster correlation
To the extent that households have a high intra-cluster correlation, the Taylor Series is better than the 3 other public use file estimates
– Poverty and health insurance have high intra-household correlations, but individual income does not

Discussion
Larger states are likely to have more PSUs in the Census internal file than are recognized in the public use file (where we only see their aggregation)
By their very construction, the increased number of PSUs results in more “within-PSU” homogeneity being recognized:
– States with more PSUs in the internal data have much higher std. errors (using the “Standard”) than are currently being estimated
– Greater homogeneity within PSUs or households reduces the “effective” sample size (there is less “independent” information than the full sample size would suggest)
As expected, the consequences are greatest for the health insurance and poverty estimates
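The link between within-cluster homogeneity and effective sample size can be made concrete with Kish's standard approximation. It is not taken from the slides, and it assumes equal-sized clusters of m units with intra-cluster correlation ρ. The values n = 1000 and ρ = 0.5 in the worked line are hypothetical; m = 4 echoes the average of 4 housing units per USU.

```latex
\[
\mathrm{deff} \approx 1 + (m - 1)\,\rho,
\qquad
n_{\mathrm{eff}} = \frac{n}{\mathrm{deff}}
\]
% Worked example with hypothetical values n = 1000, m = 4, rho = 0.5:
\[
\mathrm{deff} \approx 1 + (4 - 1)(0.5) = 2.5,
\qquad
n_{\mathrm{eff}} = \frac{1000}{2.5} = 400
\]
```

Under these assumptions, a sample of 1,000 carries about as much independent information as a simple random sample of 400, and standard errors computed as if n = 1000 would be too small by a factor of about 1.58 (the square root of 2.5).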

Conclusion
Census is not going to release PSU identifiers to the public
The data are widely used for important policy and academic research
– Work done on the public use file has biased standard errors and may not support inferences that meet the statistical standard for evidence
Therefore, I feel it is the responsibility of the Census Bureau to improve its GVPs or come up with a better substitute
– What is currently offered is inadequate

SHADAC Contact Information
University Avenue, Suite 345, Minneapolis, Minnesota (612)
Principal Investigator: Lynn Blewett, Ph.D.
Co-Principal Investigator: Kathleen Call, Ph.D.
Center Director: Kelli Johnson, M.B.A.
Senior Research Associate: Timothy Beebe, Ph.D.
Research Associate: Michael Davern, Ph.D.