
1 Optimal Number of Replicates for Variance Estimation
Mansour Fahimi, Darryl Creel, Peter Siegel, Matt Westlake, Ruby Johnson, and Jim Chromy
Third International Conference on Establishment Surveys (ICES-III)
June 21, 2007

2 Variance Estimation
• Two general approaches for variance estimation with weighted data obtained under complex designs:
  • Linearization
  • Replication

3 Linearization
• Approximate complex statistics in terms of L linear statistics
• Estimate the variance of the estimator from: [formula not preserved in the transcript]
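A sketch of the standard first-order Taylor (delta-method) form may help here. It assumes the estimator can be written as a smooth function f of L estimated totals; the symbols are introduced for illustration and are not taken from the slide.

```latex
\hat{\theta} = f(\hat{Y}_1, \dots, \hat{Y}_L)
\approx \theta + \sum_{l=1}^{L} \frac{\partial f}{\partial Y_l}\,(\hat{Y}_l - Y_l),
\qquad
\widehat{\operatorname{Var}}(\hat{\theta}) \approx
\sum_{l=1}^{L} \sum_{m=1}^{L}
\frac{\partial f}{\partial Y_l}\,\frac{\partial f}{\partial Y_m}\,
\widehat{\operatorname{Cov}}(\hat{Y}_l, \hat{Y}_m)
```

In practice the partial derivatives are evaluated at the estimated totals, and the covariances are estimated under the sample design.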

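4 Replication
• Partition the full sample into R subsamples (replicates)
• Obtain separate estimates of the parameter of interest from each replicate: [formula not preserved in the transcript]
• Estimate the variance of the full-sample estimate by: [formula not preserved in the transcript]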
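The replication steps can be sketched in Python as follows. This is a minimal illustration, not the authors' formula: it assumes the generic form v = c · Σ (θ̂_r − θ̂)², where the function name and the `scale` parameter are hypothetical. The default c = 1/R is a common choice for bootstrap replicates; other replication schemes use other constants (for example, (R − 1)/R for a delete-one jackknife).

```python
from typing import Optional, Sequence


def replicate_variance(full_estimate: float,
                       replicate_estimates: Sequence[float],
                       scale: Optional[float] = None) -> float:
    """Generic replication variance: scale * sum((theta_r - theta)^2).

    `scale` depends on the replication scheme (an assumption here);
    it defaults to 1/R, a common choice for bootstrap replicates.
    """
    R = len(replicate_estimates)
    if R == 0:
        raise ValueError("need at least one replicate estimate")
    if scale is None:
        scale = 1.0 / R
    # Squared deviations of each replicate estimate from the full-sample estimate
    return scale * sum((t - full_estimate) ** 2 for t in replicate_estimates)
```

For example, `replicate_variance(10.0, [9.0, 11.0, 10.0, 10.0])` averages the squared deviations 1, 1, 0, 0 over four replicates.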

5 How Many Replicates?
• Recommendations regarding the optimal number of replicates for variance estimation are at variance:
  • Computational resources required can be intensive
  • For certain statistics a larger number of replicates might be needed to produce stable estimates of variance
• What is the point of diminishing returns?
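A rough rule of thumb gives a feel for the diminishing returns. It is not from the slides and rests on a strong assumption: the R replicate estimates behave like independent, approximately normal draws, so the variance estimate is proportional to a chi-square variable with R − 1 degrees of freedom. Under that assumption the coefficient of variation of the variance estimate v is approximately:

```latex
\operatorname{CV}\bigl(v(\hat{\theta})\bigr) \approx \sqrt{\frac{2}{R-1}},
\qquad
R = 64:\ \approx 0.18, \quad
R = 100:\ \approx 0.14, \quad
R = 200:\ \approx 0.10
```

The gain from each additional replicate shrinks as R grows. Statistics whose replicate estimates are far from normal (medians, for instance) may stabilize more slowly than this approximation suggests.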

6 Research Methodology
• Relying on two complex establishment surveys, this work presents an array of empirical results regarding the number of bootstrap replicates for variance estimation:
  • National Study of Postsecondary Faculty (NSOPF:04)
  • National Postsecondary Student Aid Study (NPSAS:04)

7 General Design Specifications
National Study of Postsecondary Faculty (NSOPF:04)
• Survey of about 35,000 faculty and instructional staff
• Across a sample of 1,080 institutions
• In the 50 States and the District of Columbia

8 Sampling Methodology
• Institutions selected with probability proportional to a measure of size to over-represent:
  • Hispanic
  • Non-Hispanic Black
  • Asian and Pacific Islander
  • Full-time other female
• Used RTI’s cost/variance optimization procedure for sample allocation
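Selection with probability proportional to size (PPS) can be illustrated with a short Python sketch. The slide does not say which PPS algorithm was used; systematic PPS selection is assumed here only because it is a common implementation, and the function name and arguments are hypothetical. It does not reproduce the composite size measure or RTI's allocation procedure.

```python
import random
from typing import List, Optional, Sequence


def pps_systematic(sizes: Sequence[float], n: int,
                   rng: Optional[random.Random] = None) -> List[int]:
    """Systematic PPS selection (illustrative sketch).

    Returns indices of selected units. A unit whose size exceeds the
    sampling interval can be hit more than once (a certainty unit).
    """
    if n <= 0 or any(s <= 0 for s in sizes):
        raise ValueError("n and all sizes must be positive")
    rng = rng or random.Random()
    step = float(sum(sizes)) / n          # sampling interval
    start = rng.random() * step           # random start in [0, step)
    picks: List[int] = []
    cumulative = 0.0
    k = 0                                 # index of the next selection point
    for i, size in enumerate(sizes):
        cumulative += size
        # Select unit i for every selection point that falls in its size range
        while k < n and start + k * step < cumulative:
            picks.append(i)
            k += 1
    return picks
```

With sizes `[1, 1, 98]` and `n=2`, the interval is 50, so the third unit is always selected: its size range covers the second selection point whatever the random start.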

9 Institution Sampling Frame
Degree Granting   Carnegie Code              Public   Private    Total
Doctor’s          15, 16, 52                    190       110      300
Master’s          21, 22                        270       320      590
Bachelor’s        31, 32, 33                     90       480      570
Associate’s       40, 60                      1,030       150    1,180
Other/Unknown     51, 53–59, unclassified       110       620      730
Total                                         1,700     1,680    3,380

10 Institution Sample
Degree Granting   Public   Private    Total
Doctor’s             190       110      300
Master’s             120        80      200
Bachelor’s            30       130      160
Associate’s          340        10      350
Other                 10        60       70
Total                680       400    1,080

11 Expected Faculty Counts From Sampled Institutions by Strata
(OFTF = other full-time female; OFTM = other full-time male; OPT = other part-time)
NSOPF stratum                             Black  Hispanic     Asian      OFTF      OFTM       OPT     Total
Public, doctor’s                         10,720     8,660    32,630    58,870   115,830    51,110   277,820
Public, master’s                          4,670     3,150     4,950    14,120    20,440    22,130    69,460
Public, bachelor’s                          810       340       520     1,430     2,110     3,880     9,090
Public, associate’s                      12,250     9,240     6,100    21,100    21,700    82,570   152,960
Public, other                               150        80       170       290       630       830     2,150
Private not-for-profit, doctor’s          6,060     3,760    13,110    21,490    47,370    33,280   125,080
Private not-for-profit, master’s          1,110       950     1,020     4,930     7,020    12,530    27,550
Private not-for-profit, bachelor’s        1,360       390       670     3,920     6,270     5,440    18,050
Private not-for-profit, associate’s          20 [missing]        40       180       450       480     1,180
Private not-for-profit, other               330       120       250       790     1,680     2,700     5,880
Total                                    37,480    26,710    59,460   127,120   223,500   214,940   689,210

12 Target Number of Respondents by Institution and Faculty Strata
Institution stratum                    Respondents   Faculty stratum           Respondents
Public doctor’s                              6,200   Non-Hispanic Black              1,600
Public master’s                              2,700   Hispanic                        1,300
Public bachelor’s                              600   Asian                             900
Public associate’s                           7,500   Other full-time female          4,600
Public other                                   500   Other full-time male            8,300
Private not-for-profit doctor’s              2,600   Other part-time                 7,800
Private not-for-profit master’s              1,900
Private not-for-profit bachelor’s            1,700
Private not-for-profit associate’s             100
Private not-for-profit other                   700
Total                                       24,500

13 Distribution of Respondents (by institution and faculty strata)
Institution stratum                    Respondents   Faculty stratum           Respondents
Public doctor’s                              7,460   Non-Hispanic Black              2,060
Public master’s                              2,680   Hispanic                        1,700
Public bachelor’s                              450   Asian                           1,610
Public associate’s                           6,410   Other full-time female          5,850
Public other                                   110   Other full-time male            8,500
Private not-for-profit doctor’s              3,160   Other part-time                 6,380
Private not-for-profit master’s              2,270
Private not-for-profit bachelor’s            2,520
Private not-for-profit associate’s             190
Private not-for-profit other                   850
Total                                       26,110   Total                          26,110

14 Variance Estimation Methodology (NSOPF:04)
• Used methodology developed by Kaufman (2004) to create bootstrap replicate weights:
  • Reflected finite population correction adjustment for the first stage (institution) selection.
  • Second stage (faculty selection) finite population correction factors were close to one and not reflected.
• Produced 65 bootstrap replicates to meet Data Analysis System (DAS) requirements of NCES.
• Calculated standard error of several statistics using the above bootstrap replicates and Taylor linearization method in SUDAAN.
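Once replicate weights exist, a standard error for a weighted percentage can be computed by re-estimating the statistic under each set of weights. The sketch below is an illustration under stated assumptions, not the Kaufman (2004) procedure or the SUDAAN/DAS implementation: the function names are hypothetical, and the variance uses a 1/R scaling, which is a common convention for bootstrap replicates.

```python
import math
from typing import Sequence


def weighted_percent(flags: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted percentage of units with flag == 1."""
    total = sum(weights)
    return 100.0 * sum(w for f, w in zip(flags, weights) if f) / total


def bootstrap_se(flags: Sequence[int],
                 full_weights: Sequence[float],
                 replicate_weights: Sequence[Sequence[float]]) -> float:
    """Standard error from replicate weights (1/R scaling assumed)."""
    theta = weighted_percent(flags, full_weights)
    # Re-estimate the percentage once per set of replicate weights
    reps = [weighted_percent(flags, w) for w in replicate_weights]
    variance = sum((t - theta) ** 2 for t in reps) / len(reps)
    return math.sqrt(variance)
```

The same pattern applies to means, medians, or regression coefficients: only the estimator inside the loop changes, which is why replication is convenient for complex statistics.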

15 Comparisons of Variance Estimates
SE of Percent Teaching as Principal Activity by Rank (Bootstrap vs. Linearization)

16 Comparisons of Variance Estimates
SE of Percent Research as Principal Activity by Rank (Bootstrap vs. Linearization)

17 Comparisons of Variance Estimates
SE of Percent Administration as Principal Activity by Rank (Bootstrap vs. Linearization)

18 Comparisons of Variance Estimates
SE of Percent Full-time by Institution Type (Bootstrap vs. Linearization)

19 Revised Variance Estimation Methodology (NSOPF:04)
• Used methodology developed by Kaufman (2004) to create 200 bootstrap replicate weights.
• Used 10, 11, ..., 200 replicates to estimate relative standard error (RSE) of different statistics.
• Repeated the above using 9 random permutations of replicates to estimate RSE of the same statistics.
• Used Taylor linearization to estimate relative standard error of estimates via SUDAAN.
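The procedure above can be sketched in Python: compute the RSE from the first r replicates for r = 10, ..., R, then repeat after shuffling the replicate order. This is a minimal illustration with hypothetical function names; it assumes the replicate estimates have already been computed, a 1/r variance scaling, and RSE defined as the standard error divided by the absolute value of the full-sample estimate.

```python
import math
import random
from typing import Dict, List, Sequence


def rse_by_replicate_count(full_estimate: float,
                           replicate_estimates: Sequence[float],
                           min_r: int = 10) -> Dict[int, float]:
    """RSE computed from the first r replicates, for r = min_r, ..., R."""
    if full_estimate == 0:
        raise ValueError("RSE is undefined for a zero estimate")
    curve = {}
    for r in range(min_r, len(replicate_estimates) + 1):
        first_r = replicate_estimates[:r]
        variance = sum((t - full_estimate) ** 2 for t in first_r) / r
        curve[r] = math.sqrt(variance) / abs(full_estimate)
    return curve


def rse_curves(full_estimate: float,
               replicate_estimates: Sequence[float],
               n_permutations: int = 9,
               seed: int = 0) -> List[Dict[int, float]]:
    """Original-order curve plus one curve per random permutation."""
    rng = random.Random(seed)
    curves = [rse_by_replicate_count(full_estimate, list(replicate_estimates))]
    for _ in range(n_permutations):
        shuffled = list(replicate_estimates)
        rng.shuffle(shuffled)  # same replicates, different order
        curves.append(rse_by_replicate_count(full_estimate, shuffled))
    return curves
```

All curves meet at r = R because every permutation uses the same full set of replicates. The spread between curves at smaller r indicates how much the RSE depends on which replicates happen to be included.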

20 RSE of Percent Asians by Number of Replicates

21 RSE of Percent Asians by Number of Replicates (Taylor Linearization and Permutations of Replicates)

22 RSE of Percent Age < 35 by Number of Replicates

23 RSE of Percent Age < 35 by Number of Replicates (Taylor Linearization and Permutations of Replicates)

24 RSE of Percent Citizen by Number of Replicates

25 RSE of Percent Citizen by Number of Replicates (Taylor Linearization and Permutations of Replicates)

26 RSE of Percent Full-time by Number of Replicates

27 RSE of Percent Full-time by Number of Replicates (Taylor Linearization and Permutations of Replicates)

28 RSE of Percent Master’s by Number of Replicates

29 RSE of Percent Master’s by Number of Replicates (Taylor Linearization and Permutations of Replicates)

30 RSE of Percent Teaching as Principal Activity by Number of Replicates

31 RSE of Percent Teaching as Principal Activity by Number of Replicates (Taylor Linearization and Permutations of Replicates)

32 RSE of Mean Income by Number of Replicates

33 RSE of Mean Income by Number of Replicates (Taylor Linearization and Permutations of Replicates)

34 RSE of Median Income by Number of Replicates

35 RSE of Median Income by Number of Replicates (Taylor Linearization and Permutations of Replicates)

36 RSE of Regression Intercept
Income = Hours + Race + Hours × Race
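Written out, the model named in the regression slide titles takes the form below. The coefficient symbols are labels introduced here, and treating Race as a single indicator variable is an assumption; the slides do not state how Race was coded.

```latex
\text{Income}_i = \beta_0 + \beta_1\,\text{Hours}_i + \beta_2\,\text{Race}_i
+ \beta_3\,(\text{Hours}_i \times \text{Race}_i) + \varepsilon_i
```

In this notation the regression slides track the RSE of the estimated intercept (β0) and of the slopes for Hours (β1), Race (β2), and the Hours × Race interaction (β3).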

37 RSE of Regression Intercept (Taylor Linearization and Permutations of Replicates)
Income = Hours + Race + Hours × Race

38 RSE of Regression Slope (Hours)
Income = Hours + Race + Hours × Race

39 RSE of Regression Slope (Hours) (Taylor Linearization and Permutations of Replicates)
Income = Hours + Race + Hours × Race

40 RSE of Regression Slope (Race)
Income = Hours + Race + Hours × Race

41 RSE of Regression Slope (Race) (Taylor Linearization and Permutations of Replicates)
Income = Hours + Race + Hours × Race

42 RSE of Regression Slope (Hours × Race)
Income = Hours + Race + Hours × Race

43 RSE of Regression Slope (Hours × Race) (Taylor Linearization and Permutations of Replicates)
Income = Hours + Race + Hours × Race

44 Conclusions (Rough & Interim)
• Complex statistics do require more replicates for stable variance estimation
• It seems that:
  • 64 replicates might be inadequate
  • 200 replicates seem to be overkill
  • Somewhere between 100 and 200 replicates might be sufficient

