1 Optimal Number of Replicates for Variance Estimation Mansour Fahimi, Darryl Creel, Peter Siegel, Matt Westlake, Ruby Johnson, and Jim Chromy Third International.

Presentation transcript:

1 Optimal Number of Replicates for Variance Estimation Mansour Fahimi, Darryl Creel, Peter Siegel, Matt Westlake, Ruby Johnson, and Jim Chromy Third International Conference on Establishment Surveys (ICES-III) June 21, 2007

2 Variance Estimation
• Two general approaches for variance estimation with weighted data obtained under complex designs:
  • Linearization
  • Replication

3 Linearization
• Approximate a complex statistic in terms of L linear statistics
• Estimate the variance of the complex statistic from the variance of this linear approximation
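As an illustration of the linearization idea, the sketch below handles one simple case: a weighted ratio of two totals (so L = 2). It assumes a single stratum whose primary sampling units (PSUs) are treated as sampled with replacement. The function name and arguments are hypothetical, and this is not the SUDAAN implementation referred to later in the slides.

```python
import numpy as np

def linearized_ratio_variance(y, x, w, psu):
    """Taylor-linearized variance of the weighted ratio R = sum(w*y) / sum(w*x).

    Assumption: one stratum, PSUs treated as sampled with replacement.
    """
    y, x, w, psu = map(np.asarray, (y, x, w, psu))
    x_total = np.sum(w * x)
    ratio = np.sum(w * y) / x_total
    # Weighted linearized variable: first-order Taylor term of the ratio.
    z = w * (y - ratio * x) / x_total
    # Sum the linearized values within each PSU.
    _, idx = np.unique(psu, return_inverse=True)
    z_psu = np.bincount(idx, weights=z)
    n = z_psu.size
    # Between-PSU variance of the linearized totals.
    var = n / (n - 1) * np.sum((z_psu - z_psu.mean()) ** 2)
    return ratio, var
```

With unit weights, `x = 1` for every element, and one element per PSU, the result reduces to the familiar s²/n for a sample mean, which is a convenient sanity check.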

4 Replication
• Partition the full sample into R subsamples (replicates)
• Obtain a separate estimate of the parameter of interest from each replicate
• Estimate the variance of the full-sample estimate from the variability of the R replicate estimates
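The formula on this slide did not survive extraction. A common general form for replication variance estimators is a scaled sum of squared deviations of the replicate estimates from the full-sample estimate. The sketch below assumes that form; the function name and the default scale of 1/R (typical for bootstrap and balanced repeated replication) are assumptions, not taken from the slides.

```python
import numpy as np

def replicate_variance(theta_full, theta_reps, scale=None):
    """Replication variance: scale * sum((theta_r - theta_full)^2).

    `scale` depends on the replication method. The default 1/R is an
    assumption suited to bootstrap or BRR replicates; a delete-one
    jackknife would typically use (R - 1) / R instead.
    """
    theta_reps = np.asarray(theta_reps, dtype=float)
    n_reps = theta_reps.size
    if scale is None:
        scale = 1.0 / n_reps
    return scale * np.sum((theta_reps - theta_full) ** 2)

# Usage: four replicate estimates around a full-sample estimate of 10.
v = replicate_variance(10.0, [9.0, 11.0, 10.0, 12.0])
```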

5 How Many Replicates?
• Recommendations regarding the optimal number of replicates for variance estimation are at variance:
  • Computational resources required can be intensive
  • For certain statistics, a larger number of replicates might be needed to produce stable estimates of variance
• What is the point of diminishing returns?

6 Research Methodology
• Relying on two complex establishment surveys, this work presents an array of empirical results regarding the number of bootstrap replicates for variance estimation:
  • National Study of Postsecondary Faculty (NSOPF:04)
  • National Postsecondary Student Aid Study (NPSAS:04)

7 General Design Specifications: National Study of Postsecondary Faculty (NSOPF:04)
• Survey of about 35,000 faculty and instructional staff
• Across a sample of 1,080 institutions
• In the 50 States and the District of Columbia

8 Sampling Methodology
• Institutions selected with probability proportional to a measure of size to over-represent:
  • Hispanic
  • Non-Hispanic Black
  • Asian and Pacific Islander
  • Full-time other female
• Used RTI’s cost/variance optimization procedure for sample allocation
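The slide does not say which probability-proportional-to-size (PPS) scheme was used. As a minimal sketch of one common option, systematic PPS selection can be written as follows. The function name and arguments are hypothetical, and the size measure is taken as given (the composite measure used to over-represent the listed groups is not reconstructed here).

```python
import numpy as np

def pps_systematic(size, n, rng):
    """Systematic PPS selection of n units from a list of size measures.

    Returns the selected indices and each unit's inclusion probability.
    Units larger than the sampling interval can be hit more than once;
    their inclusion probability is capped at 1 (certainty units).
    """
    size = np.asarray(size, dtype=float)
    cum = np.cumsum(size)
    step = cum[-1] / n                      # sampling interval
    start = rng.uniform(0, step)            # random start in [0, step)
    points = start + step * np.arange(n)    # equally spaced selection points
    idx = np.searchsorted(cum, points, side="right")
    idx = np.minimum(idx, size.size - 1)    # guard against rounding at the end
    prob = np.minimum(1.0, n * size / cum[-1])
    return idx, prob

# Usage: select 2 of 4 institutions with sizes 10, 20, 30, 40.
idx, prob = pps_systematic([10, 20, 30, 40], 2, np.random.default_rng(0))
```

Sorting the frame by stratum before selection gives implicit stratification, which is one reason systematic PPS is popular for institution samples.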

9 Institution Sampling Frame
Degree Granting   Carnegie Code                Public   Private   Total
Doctor’s          15, 16,
Master’s          21,
Bachelor’s        31, 32,
Associate’s       40, 601, ,180
Other/Unknown     51, 53 – 59, unclassified
Total                                          1,700    1,680     3,380

10 Institution Sample
Degree Granting   Public   Private   Total
Doctor’s
Master’s
Bachelor’s
Associate’s
Other
Total                                1,080

11 Expected Faculty Counts From Sampled Institutions by Strata
NSOPF stratum                         Black    Hispanic  Asian    OFTF     OFTM     OPT      Total
Public, doctor’s                      10,720   8,660     32,630   58,870   115,830  51,110   277,820
Public, master’s                      4,670    3,150     4,950    14,120   20,440   22,130   69,460
Public, bachelor’s                    ,4302,1103,8809,090
Public, associate’s                   12,250   9,240     6,100    21,100   21,700   82,570   152,960
Public, other                         ,150
Private not-for-profit, doctor’s      6,060    3,760     13,110   21,490   47,370   33,280   125,080
Private not-for-profit, master’s      1, ,0204,9307,02012,53027,550
Private not-for-profit, bachelor’s    1, ,9206,2705,44018,050
Private not-for-profit, associate’s   ,180
Private not-for-profit, other         ,6802,7005,880
Total                                 37,480   26,710    59,460   127,120  223,500  214,940  689,210

12 Target Number of Respondents by Institution and Faculty Strata
Institution stratum                   Respondents   Faculty stratum          Respondents
Public doctor’s                       6,200         Non-Hispanic Black       1,600
Public master’s                       2,700         Hispanic                 1,300
Public bachelor’s                     600           Asian                    900
Public associate’s                    7,500         Other full-time female   4,600
Public other                          500           Other full-time male     8,300
Private not-for-profit doctor’s       2,600         Other part-time          7,800
Private not-for-profit master’s       1,900
Private not-for-profit bachelor’s     1,700
Private not-for-profit associate’s    100
Private not-for-profit other          700
Total                                 24,500

13 Distribution of Respondents (by institution and faculty strata)
Institution stratum                   Respondents   Faculty stratum          Respondents
Public doctor’s                       7,460         Non-Hispanic Black       2,060
Public master’s                       2,680         Hispanic                 1,700
Public bachelor’s                     450           Asian                    1,610
Public associate’s                    6,410         Other full-time female   5,850
Public other                          110           Other full-time male     8,500
Private not-for-profit doctor’s       3,160         Other part-time          6,380
Private not-for-profit master’s       2,270
Private not-for-profit bachelor’s     2,520
Private not-for-profit associate’s    190
Private not-for-profit other          850
Total                                 26,110        Total                    26,110

14 Variance Estimation Methodology (NSOPF:04)
• Used methodology developed by Kaufman (2004) to create bootstrap replicate weights:
  • Reflected finite population correction adjustment for the first-stage (institution) selection.
  • Second-stage (faculty selection) finite population correction factors were close to one and not reflected.
• Produced 65 bootstrap replicates to meet Data Analysis System (DAS) requirements of NCES.
• Calculated standard errors of several statistics using the above bootstrap replicates and the Taylor linearization method in SUDAAN.
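The slides do not describe Kaufman's (2004) procedure in detail, so the sketch below substitutes a different, widely used technique to show what bootstrap replicate weights look like: the Rao-Wu rescaled bootstrap. Within each stratum with n_h PSUs it draws n_h − 1 PSUs with replacement and rescales the weights by n_h / (n_h − 1) times the number of hits. It omits the first-stage finite population correction mentioned on the slide. All names are hypothetical, and PSU identifiers are assumed to be unique within a stratum.

```python
import numpy as np

def rescaled_bootstrap_weights(w, stratum, psu, n_reps, rng):
    """Rao-Wu rescaled bootstrap replicate weights (no fpc adjustment).

    Returns an array of shape (n_reps, len(w)).
    """
    w = np.asarray(w, dtype=float)
    stratum = np.asarray(stratum)
    psu = np.asarray(psu)
    factors = np.zeros((n_reps, w.size))
    for h in np.unique(stratum):
        in_h = stratum == h
        psus = np.unique(psu[in_h])          # sorted PSU ids in stratum h
        n_h = psus.size
        if n_h < 2:
            factors[:, in_h] = 1.0           # single-PSU stratum: keep weights
            continue
        pos = np.searchsorted(psus, psu[in_h])   # PSU position of each unit
        for r in range(n_reps):
            draws = rng.integers(0, n_h, size=n_h - 1)   # with replacement
            hits = np.bincount(draws, minlength=n_h)     # times each PSU drawn
            factors[r, in_h] = n_h / (n_h - 1) * hits[pos]
    return factors * w
```

Each row of the result is one set of replicate weights; re-estimating a statistic with every row yields the replicate estimates needed for the variance calculation.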

15 Comparisons of Variance Estimates SE of Percent Teaching as Principal Activity by Rank (Bootstrap vs. Linearization)

16 Comparisons of Variance Estimates SE of Percent Research as Principal Activity by Rank (Bootstrap vs. Linearization)

17 Comparisons of Variance Estimates SE of Percent Administration as Principal Activity by Rank (Bootstrap vs. Linearization)

18 Comparisons of Variance Estimates SE of Percent Full-time by Institution Type (Bootstrap vs. Linearization)

19 Revised Variance Estimation Methodology (NSOPF:04)
• Used methodology developed by Kaufman (2004) to create 200 bootstrap replicate weights.
• Used 10, 11, …, 200 replicates to estimate the relative standard error (RSE) of different statistics.
• Repeated the above using 9 random permutations of replicates to estimate the RSE of the same statistics.
• Used Taylor linearization to estimate the relative standard error of estimates via SUDAAN.
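The procedure above can be sketched roughly as follows, assuming the replicate estimates of a statistic have already been computed. The function names are hypothetical, and the variance form (mean squared deviation from the full-sample estimate) is an assumption rather than something stated on the slides.

```python
import numpy as np

def rse_by_replicate_count(theta_full, theta_reps, k_min=10):
    """RSE computed from the first k replicates, for k = k_min, ..., R."""
    theta_reps = np.asarray(theta_reps, dtype=float)
    sq_dev = (theta_reps - theta_full) ** 2
    k = np.arange(1, sq_dev.size + 1)
    var_k = np.cumsum(sq_dev) / k            # variance using the first k replicates
    rse = np.sqrt(var_k) / abs(theta_full)
    return k[k_min - 1:], rse[k_min - 1:]

def rse_paths(theta_full, theta_reps, n_perm, rng, k_min=10):
    """Repeat the calculation over random orderings of the replicates."""
    paths = []
    for _ in range(n_perm):
        shuffled = rng.permutation(theta_reps)
        paths.append(rse_by_replicate_count(theta_full, shuffled, k_min)[1])
    return np.vstack(paths)                   # shape (n_perm, R - k_min + 1)

# Usage with simulated replicate estimates (illustrative numbers only).
rng = np.random.default_rng(0)
reps = 50.0 + rng.normal(0.0, 2.0, size=200)
paths = rse_paths(50.0, reps, n_perm=9, rng=rng)
```

Every path ends at the same value because all orderings use the full set of replicates at k = R; the spread between paths at smaller k shows how much the RSE depends on which replicates happen to be included.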

20 RSE of Percent Asians by Number of Replicates

21 RSE of Percent Asians by Number of Replicates (Taylor Linearization and Permutations of Replicates)

22 RSE of Percent Age < 35 by Number of Replicates

23 RSE of Percent Age < 35 by Number of Replicates (Taylor Linearization and Permutations of Replicates)

24 RSE of Percent Citizen by Number of Replicates

25 RSE of Percent Citizen by Number of Replicates (Taylor Linearization and Permutations of Replicates)

26 RSE of Percent Full-time by Number of Replicates

27 RSE of Percent Full-time by Number of Replicates (Taylor Linearization and Permutations of Replicates)

28 RSE of Percent Master’s by Number of Replicates

29 RSE of Percent Master’s by Number of Replicates (Taylor Linearization and Permutations of Replicates)

30 RSE of Percent Teaching as Principal Activity by Number of Replicates

31 RSE of Percent Teaching as Principal Activity by Number of Replicates (Taylor Linearization and Permutations of Replicates)

32 RSE of Mean Income by Number of Replicates

33 RSE of Mean Income by Number of Replicates (Taylor Linearization and Permutations of Replicates)

34 RSE of Median Income by Number of Replicates

35 RSE of Median Income by Number of Replicates (Taylor Linearization and Permutations of Replicates)

36 RSE of Regression Intercept: Income = Hours + Race + Hours × Race

37 RSE of Regression Intercept: Income = Hours + Race + Hours × Race (Taylor Linearization and Permutations of Replicates)

38 RSE of Regression Slope (Hours): Income = Hours + Race + Hours × Race

39 RSE of Regression Slope (Hours): Income = Hours + Race + Hours × Race (Taylor Linearization and Permutations of Replicates)

40 RSE of Regression Slope (Race): Income = Hours + Race + Hours × Race

41 RSE of Regression Slope (Race): Income = Hours + Race + Hours × Race (Taylor Linearization and Permutations of Replicates)

42 RSE of Regression Slope (Hours × Race): Income = Hours + Race + Hours × Race

43 RSE of Regression Slope (Hours × Race): Income = Hours + Race + Hours × Race (Taylor Linearization and Permutations of Replicates)

44 Conclusions (Rough & Interim)
• Complex statistics do require more replicates for stable variance estimation
• It seems that:
  • 64 replicates might be inadequate
  • 200 replicates seem to be overkill
  • Somewhere between 100 and 200 replicates might be sufficient