Published by Horace Nichols. Modified over 9 years ago.

1 Optimal Number of Replicates for Variance Estimation
Mansour Fahimi, Darryl Creel, Peter Siegel, Matt Westlake, Ruby Johnson, and Jim Chromy
Third International Conference on Establishment Surveys (ICES-III), June 21, 2007

2 Variance Estimation
Two general approaches for variance estimation with weighted data obtained under complex designs:
- Linearization
- Replication

3 Linearization
- Approximate complex statistics in terms of L linear statistics.
- Estimate the variance of the complex statistic from: [equation not recovered from the slide]

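For orientation, a common textbook form of this step is sketched below. It is an assumption about what the slide showed, not a recovery of it: the symbols $\hat{\theta}$, $f$, and $\hat{Y}_l$ are introduced here for illustration. Writing the complex statistic as a smooth function $\hat{\theta} = f(\hat{Y}_1, \ldots, \hat{Y}_L)$ of $L$ estimated totals, a first-order Taylor expansion gives:

```latex
\hat{\theta} \approx \theta + \sum_{l=1}^{L} \frac{\partial f}{\partial Y_l}\,\bigl(\hat{Y}_l - Y_l\bigr),
\qquad
\widehat{\operatorname{Var}}\bigl(\hat{\theta}\bigr) \approx
\sum_{l=1}^{L} \sum_{m=1}^{L}
\frac{\partial f}{\partial Y_l}\,\frac{\partial f}{\partial Y_m}\,
\widehat{\operatorname{Cov}}\bigl(\hat{Y}_l, \hat{Y}_m\bigr)
```

In practice the partial derivatives are evaluated at the estimated totals, and the covariances are estimated under the sample design.
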
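4 Replication
- Partition the full sample into R subsamples (replicates).
- Obtain a separate estimate of the statistic from each replicate: [equation not recovered from the slide]
- Estimate the variance of the full-sample estimate by: [equation not recovered from the slide]
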
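The replication steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the weighted-mean estimator, and the default scale factor of 1/R (a common choice for bootstrap replicates; jackknife and BRR use different factors) are all choices made here for the example.

```python
import numpy as np

def weighted_mean(y, w):
    """Weighted mean of y under weights w (illustrative estimator)."""
    return float(np.sum(w * y) / np.sum(w))

def replicate_variance(y, full_weight, replicate_weights,
                       estimator=weighted_mean, scale=None):
    """Replication variance estimate for a weighted statistic.

    replicate_weights has shape (R, n): one row of weights per replicate.
    scale defaults to 1/R, which is an assumption suited to bootstrap
    replicates; other replication methods need a different factor.
    """
    theta_full = estimator(y, full_weight)
    # Re-compute the statistic once per replicate weight vector.
    theta_reps = np.array([estimator(y, w_r) for w_r in replicate_weights])
    if scale is None:
        scale = 1.0 / len(theta_reps)
    # Sum of squared deviations of replicate estimates from the full-sample one.
    return scale * float(np.sum((theta_reps - theta_full) ** 2))
```

Passing a different `estimator` (a median, a ratio, a regression coefficient) leaves the variance step unchanged, which is the main practical appeal of replication over linearization.
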
5 How Many Replicates?
Recommendations regarding the optimal number of replicates for variance estimation are at variance:
- Computational resources required can be intensive.
- For certain statistics, a larger number of replicates might be needed to produce stable estimates of variance.
- What is the point of diminishing returns?

6 Research Methodology
Relying on two complex establishment surveys, this work presents an array of empirical results regarding the number of bootstrap replicates for variance estimation:
- National Study of Postsecondary Faculty (NSOPF:04)
- National Postsecondary Student Aid Study (NPSAS:04)

7 General Design Specifications
National Study of Postsecondary Faculty (NSOPF:04):
- Survey of about 35,000 faculty and instructional staff
- Across a sample of 1,080 institutions
- In the 50 States and the District of Columbia

8 Sampling Methodology
- Institutions selected with probability proportional to a measure of size to over-represent:
  - Hispanic
  - Non-Hispanic Black
  - Asian and Pacific Islander
  - Full-time other female
- Used RTI’s cost/variance optimization procedure for sample allocation

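Selection with probability proportional to size (PPS) can be illustrated with a short sketch. The slides do not say which PPS algorithm or measure of size was used, so everything below is an assumption for illustration: the systematic selection scheme, the function names, and the `sizes` input are hypothetical and are not RTI's procedure.

```python
import numpy as np

def inclusion_probabilities(sizes, n):
    """Approximate PPS inclusion probabilities: n * size / total, capped at 1."""
    sizes = np.asarray(sizes, dtype=float)
    return np.minimum(1.0, n * sizes / sizes.sum())

def pps_systematic(sizes, n, rng):
    """Systematic PPS selection of n units (illustrative sketch).

    Units are laid end to end on a line whose segment lengths are their
    sizes; n equally spaced points with a random start pick the units.
    A unit larger than the sampling interval can be hit more than once.
    """
    sizes = np.asarray(sizes, dtype=float)
    cum = np.cumsum(sizes)
    step = cum[-1] / n                      # sampling interval
    start = rng.uniform(0.0, step)          # random start in [0, step)
    points = start + step * np.arange(n)
    idx = np.searchsorted(cum, points, side="right")
    return np.minimum(idx, len(sizes) - 1)  # guard against float round-off
```

A call such as `pps_systematic([10, 20, 30, 40], 2, np.random.default_rng(0))` returns the indices of the two selected units. Choosing a measure of size that grows with the count of targeted groups is one way a design can over-represent them.
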
9 Institution Sampling Frame

| Degree Granting | Carnegie Code | Public | Private | Total |
|---|---|---|---|---|
| Doctor’s | 15, 16, 52 | 190 | 110 | 300 |
| Master’s | 21, 22 | 270 | 320 | 590 |
| Bachelor’s | 31, 32, 33 | 90 | 480 | 570 |
| Associate’s | 40, 60 | 1,030 | 150 | 1,180 |
| Other/Unknown | 51, 53–59, unclassified | 110 | 620 | 730 |
| Total | | 1,700 | 1,680 | 3,380 |

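10 Institution Sample

| Degree Granting | Public | Private | Total |
|---|---|---|---|
| Doctor’s | 190 | 110 | 300 |
| Master’s | 120 | 80 | 200 |
| Bachelor’s | 30 | 130 | 160 |
| Associate’s | 340 | 10 | 350 |
| Other | 10 | 60 | 70 |
| Total | 680 | 400 | 1,080 |
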
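11 Expected Faculty Counts From Sampled Institutions by Strata

| NSOPF stratum | Black | Hispanic | Asian | OFTF | OFTM | OPT | Total |
|---|---|---|---|---|---|---|---|
| Public, doctor’s | 10,720 | 8,660 | 32,630 | 58,870 | 115,830 | 51,110 | 277,820 |
| Public, master’s | 4,670 | 3,150 | 4,950 | 14,120 | 20,440 | 22,130 | 69,460 |
| Public, bachelor’s | 810 | 340 | 520 | 1,430 | 2,110 | 3,880 | 9,090 |
| Public, associate’s | 12,250 | 9,240 | 6,100 | 21,100 | 21,700 | 82,570 | 152,960 |
| Public, other | 150 | 80 | 170 | 290 | 630 | 830 | 2,150 |
| Private not-for-profit, doctor’s | 6,060 | 3,760 | 13,110 | 21,490 | 47,370 | 33,280 | 125,080 |
| Private not-for-profit, master’s | 1,110 | 950 | 1,020 | 4,930 | 7,020 | 12,530 | 27,550 |
| Private not-for-profit, bachelor’s | 1,360 | 390 | 670 | 3,920 | 6,270 | 5,440 | 18,050 |
| Private not-for-profit, associate’s | 20 | | 40 | 180 | 450 | 480 | 1,180 |
| Private not-for-profit, other | 330 | 120 | 250 | 790 | 1,680 | 2,700 | 5,880 |
| Total | 37,480 | 26,710 | 59,460 | 127,120 | 223,500 | 214,940 | 689,210 |

OFTF = other full-time female; OFTM = other full-time male; OPT = other part-time.
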
12 Target Number of Respondents by Institution and Faculty Strata

| Institution stratum | Respondents | Faculty stratum | Respondents |
|---|---|---|---|
| Public doctor’s | 6,200 | Non-Hispanic Black | 1,600 |
| Public master’s | 2,700 | Hispanic | 1,300 |
| Public bachelor’s | 600 | Asian | 900 |
| Public associate’s | 7,500 | Other full-time female | 4,600 |
| Public other | 500 | Other full-time male | 8,300 |
| Private not-for-profit doctor’s | 2,600 | Other part-time | 7,800 |
| Private not-for-profit master’s | 1,900 | | |
| Private not-for-profit bachelor’s | 1,700 | | |
| Private not-for-profit associate’s | 100 | | |
| Private not-for-profit other | 700 | | |
| Total | 24,500 | | |

13 Distribution of Respondents (by institution and faculty strata)

| Institution stratum | Respondents | Faculty stratum | Respondents |
|---|---|---|---|
| Public doctor’s | 7,460 | Non-Hispanic Black | 2,060 |
| Public master’s | 2,680 | Hispanic | 1,700 |
| Public bachelor’s | 450 | Asian | 1,610 |
| Public associate’s | 6,410 | Other full-time female | 5,850 |
| Public other | 110 | Other full-time male | 8,500 |
| Private not-for-profit doctor’s | 3,160 | Other part-time | 6,380 |
| Private not-for-profit master’s | 2,270 | | |
| Private not-for-profit bachelor’s | 2,520 | | |
| Private not-for-profit associate’s | 190 | | |
| Private not-for-profit other | 850 | | |
| Total | 26,110 | Total | 26,110 |

14 Variance Estimation Methodology (NSOPF:04)
- Used the methodology developed by Kaufman (2004) to create bootstrap replicate weights:
  - Reflected the finite population correction adjustment for the first-stage (institution) selection.
  - Second-stage (faculty selection) finite population correction factors were close to one and were not reflected.
- Produced 65 bootstrap replicates to meet the Data Analysis System (DAS) requirements of NCES.
- Calculated the standard error of several statistics using the above bootstrap replicates and the Taylor linearization method in SUDAAN.

15 Comparisons of Variance Estimates: SE of Percent Teaching as Principal Activity by Rank (Bootstrap vs. Linearization)
16 Comparisons of Variance Estimates: SE of Percent Research as Principal Activity by Rank (Bootstrap vs. Linearization)
17 Comparisons of Variance Estimates: SE of Percent Administration as Principal Activity by Rank (Bootstrap vs. Linearization)
18 Comparisons of Variance Estimates: SE of Percent Full-time by Institution Type (Bootstrap vs. Linearization)

19 Revised Variance Estimation Methodology (NSOPF:04)
- Used the methodology developed by Kaufman (2004) to create 200 bootstrap replicate weights.
- Used 10, 11, …, 200 replicates to estimate the relative standard error (RSE) of different statistics.
- Repeated the above using 9 random permutations of the replicates to estimate the RSE of the same statistics.
- Used Taylor linearization, via SUDAAN, to estimate the RSE of the same statistics.

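The procedure above (RSE from the first k replicates for k = 10, …, 200, repeated over random orderings of the replicates) can be sketched as follows. This is an illustration under assumptions rather than the authors' code: the function names are hypothetical, the variance uses a 1/k scale factor, and the replicate estimates are expected as a plain array with one value per replicate.

```python
import numpy as np

def rse_by_replicate_count(theta_full, theta_reps, k_min=10):
    """RSE (in percent) using the first k replicates, for k = k_min..R.

    Assumes the variance estimate is the mean squared deviation of the
    first k replicate estimates from the full-sample estimate.
    """
    theta_reps = np.asarray(theta_reps, dtype=float)
    sq_dev = (theta_reps - theta_full) ** 2
    ks = np.arange(k_min, len(theta_reps) + 1)
    var_k = np.cumsum(sq_dev)[k_min - 1:] / ks   # running mean of squared deviations
    return ks, 100.0 * np.sqrt(var_k) / abs(theta_full)

def rse_paths_over_permutations(theta_full, theta_reps, n_perm=9,
                                k_min=10, seed=0):
    """Repeat the RSE-by-k curve over random orderings of the replicates."""
    rng = np.random.default_rng(seed)
    theta_reps = np.asarray(theta_reps, dtype=float)
    ks = np.arange(k_min, len(theta_reps) + 1)
    paths = [rse_by_replicate_count(theta_full, rng.permutation(theta_reps), k_min)[1]
             for _ in range(n_perm)]
    return ks, np.vstack(paths)                  # shape (n_perm, R - k_min + 1)
```

Every permutation gives the same RSE once all replicates are used, so the spread between the curves at smaller k shows how much the estimate depends on which replicates happen to come first.
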
20 RSE of Percent Asians by Number of Replicates
21 RSE of Percent Asians by Number of Replicates (Taylor Linearization and Permutations of Replicates)
22 RSE of Percent Age < 35 by Number of Replicates
23 RSE of Percent Age < 35 by Number of Replicates (Taylor Linearization and Permutations of Replicates)
24 RSE of Percent Citizen by Number of Replicates
25 RSE of Percent Citizen by Number of Replicates (Taylor Linearization and Permutations of Replicates)
26 RSE of Percent Full-time by Number of Replicates
27 RSE of Percent Full-time by Number of Replicates (Taylor Linearization and Permutations of Replicates)
28 RSE of Percent Master’s by Number of Replicates
29 RSE of Percent Master’s by Number of Replicates (Taylor Linearization and Permutations of Replicates)
30 RSE of Percent Teaching as Principal Activity by Number of Replicates
31 RSE of Percent Teaching as Principal Activity by Number of Replicates (Taylor Linearization and Permutations of Replicates)
32 RSE of Mean Income by Number of Replicates
33 RSE of Mean Income by Number of Replicates (Taylor Linearization and Permutations of Replicates)
34 RSE of Median Income by Number of Replicates
35 RSE of Median Income by Number of Replicates (Taylor Linearization and Permutations of Replicates)
36 RSE of Regression Intercept, Income = Hours + Race + Hours × Race
37 RSE of Regression Intercept, Income = Hours + Race + Hours × Race (Taylor Linearization and Permutations of Replicates)
38 RSE of Regression Slope (Hours), Income = Hours + Race + Hours × Race
39 RSE of Regression Slope (Hours), Income = Hours + Race + Hours × Race (Taylor Linearization and Permutations of Replicates)
40 RSE of Regression Slope (Race), Income = Hours + Race + Hours × Race
41 RSE of Regression Slope (Race), Income = Hours + Race + Hours × Race (Taylor Linearization and Permutations of Replicates)
42 RSE of Regression Slope (Hours × Race), Income = Hours + Race + Hours × Race
43 RSE of Regression Slope (Hours × Race), Income = Hours + Race + Hours × Race (Taylor Linearization and Permutations of Replicates)

44 Conclusions (Rough & Interim)
- Complex statistics do require more replicates for stable variance estimation.
- It seems that:
  - 64 replicates might be inadequate.
  - 200 replicates seem to be overkill.
  - Somewhere between 100 and 200 replicates might be sufficient.