Published by Francis Chapman; modified over 9 years ago
1 Ratio estimation under SRS
Assume:
- Absence of nonsampling error
- SRS of size n from a population of size N
Ratio estimation is an alternative to the sample mean $\bar{y}$ under SRS that uses “auxiliary” information (X).
- Sample data: observe $y_i$ and $x_i$ for each sampled unit
- Population information: either
  - $x_i$ for all individual units, or
  - summary statistics from the population distribution of X, such as the population mean $\bar{x}_U$ or total $t_x$ of X
Ratio estimation is also used to estimate a population parameter called a ratio (B).
2 Uses
- Estimate a ratio
  - Tree volume or bushels per acre
  - Per capita income
  - Liability-to-asset ratio
- More precise estimators of population parameters
  - If X and Y are correlated, can improve upon $\bar{y}$
- Estimating totals when the population size N is unknown
  - Avoids the need to know N in the formula $\hat{t} = N\bar{y}$
- Domain estimation
  - Obtaining estimates for subpopulations
- Incorporating known information into estimates
  - Poststratification
  - Adjusting for nonresponse
3 Estimating a ratio, B
Population parameter for the ratio: $B = \frac{t_y}{t_x} = \frac{\bar{y}_U}{\bar{x}_U}$
Examples
- Number of bushels harvested (y) per acre (x)
- Number of children (y) per single-parent household (x)
- Total usable weight (y) relative to total shipment weight (x) for chickens
4 Estimating a ratio
- SRS of n observation units (OUs)
- Collect data on y and x for each OU
- Natural estimator for B: $\hat{B} = \frac{\bar{y}}{\bar{x}} = \frac{\hat{t}_y}{\hat{t}_x}$
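5 Estimating a ratio – 2
The estimator $\hat{B} = \bar{y}/\bar{x}$ is a biased estimator of B because $\hat{B}$ is a ratio of random variables: in general $E[\bar{y}/\bar{x}] \neq E[\bar{y}]/E[\bar{x}]$.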
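6 Bias of $\hat{B}$
$$\mathrm{Bias}(\hat{B}) = E[\hat{B} - B] \approx \frac{1}{\bar{x}_U^2}\left(1 - \frac{n}{N}\right)\frac{1}{n}\left(B S_x^2 - R S_x S_y\right)$$
where $S_x$ and $S_y$ are the population standard deviations of x and y, and R is the population correlation between X and Y.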
7 Bias of $\hat{B}$ – 2
The bias is small if
- the sample size n is large
- the sampling fraction n/N is large
- $\bar{x}_U$ is large
- $S_x$ (the population standard deviation of x) is small
- the correlation between X and Y is high and positive (see Lohr p. 67)
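The rules of thumb can be checked numerically with the first-order bias approximation $\frac{1}{\bar{x}_U^2}(1 - \frac{n}{N})\frac{1}{n}(BS_x^2 - RS_xS_y)$. This is a minimal sketch: the helper `approx_bias_B` and all population values are hypothetical, chosen only to show that each change in the list shrinks the magnitude of the approximate bias.

```python
def approx_bias_B(B, R, S_x, S_y, xbar_U, n, N):
    """First-order approximation to Bias(B_hat) under SRS.

    B: population ratio t_y / t_x;  R: population correlation of x and y;
    S_x, S_y: population standard deviations;  xbar_U: population mean of x;
    n, N: sample and population sizes.
    """
    fpc = 1 - n / N
    return fpc / (n * xbar_U ** 2) * (B * S_x ** 2 - R * S_x * S_y)


# Hypothetical population values, chosen only to illustrate the rules of thumb.
base = dict(B=2.0, R=0.5, S_x=4.0, S_y=10.0, xbar_U=20.0, n=25, N=1000)

print("baseline:          ", approx_bias_B(**base))
print("larger n:          ", approx_bias_B(**{**base, "n": 100}))
print("larger xbar_U:     ", approx_bias_B(**{**base, "xbar_U": 40.0}))
print("smaller S_x:       ", approx_bias_B(**{**base, "S_x": 2.0}))
print("higher correlation:", approx_bias_B(**{**base, "R": 0.7}))
```

Setting n = N makes the finite population correction, and hence the approximate bias, exactly zero.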
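8 Estimated variance of $\hat{B}$
Estimator for $V(\hat{B})$:
$$\hat{V}(\hat{B}) = \left(1 - \frac{n}{N}\right)\frac{s_e^2}{n\,\bar{x}_U^2}, \qquad s_e^2 = \frac{1}{n-1}\sum_{i \in S} e_i^2, \qquad e_i = y_i - \hat{B}x_i$$
If $\bar{x}_U$ is unknown, substitute the sample mean $\bar{x}$.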
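A minimal sketch of $\hat{B} = \bar{y}/\bar{x}$ and $\hat{V}(\hat{B}) = (1 - n/N)\,s_e^2/(n\bar{x}_U^2)$, with residuals $e_i = y_i - \hat{B}x_i$. The helper name `ratio_B_hat` and the sample values are hypothetical; when `xbar_U` is omitted, the sample mean $\bar{x}$ is used in its place.

```python
from statistics import mean


def ratio_B_hat(y, x, N, xbar_U=None):
    """Ratio estimate B_hat = ybar / xbar and its estimated variance under SRS.

    If xbar_U (population mean of x) is not supplied, the sample mean xbar
    is substituted.
    """
    n = len(y)
    xbar, ybar = mean(x), mean(y)
    B_hat = ybar / xbar
    # Residuals about the line through the origin with slope B_hat.
    e = [yi - B_hat * xi for yi, xi in zip(y, x)]
    s2_e = sum(ei ** 2 for ei in e) / (n - 1)
    x_ref = xbar if xbar_U is None else xbar_U
    v_hat = (1 - n / N) * s2_e / (n * x_ref ** 2)
    return B_hat, v_hat


# Hypothetical sample of n = 5 units from N = 100 (not data from the slides).
y = [12, 15, 9, 20, 14]
x = [10, 14, 8, 18, 12]
B_hat, v_hat = ratio_B_hat(y, x, N=100, xbar_U=13.0)
print(B_hat, v_hat ** 0.5)  # estimate and its standard error
```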
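9 Variance of $\hat{B}$
$$V(\hat{B}) \approx \frac{1}{\bar{x}_U^2}\left(1 - \frac{n}{N}\right)\frac{S_y^2 - 2BRS_xS_y + B^2S_x^2}{n}$$
The variance is small if
- the sample size n is large
- the sampling fraction n/N is large
- the deviations about the line $e = y - Bx$ are small
- the correlation between X and Y is close to 1
- $\bar{x}_U$ is large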
10 Ag example – 1
- Frame: 1987 Agricultural Census
- Take an SRS of 300 counties from the 3078 counties to estimate conditions in 1992
- Collect data on y (1992 farm acres); have data on x (1987 farm acres) for the sample
- Existing knowledge about the population: the 1987 values of x from the census frame
11 Ag example – 2
Estimate $\hat{B} = \bar{y}/\bar{x} = 0.9866$: about 0.9866 farm acres in 1992 for each farm acre in 1987.
12 Ag example – 3
Need to calculate the variance of the residuals $e_i = y_i - \hat{B}x_i$, i.e., $s_e^2$.
13 Ag example – 4
- For each county i, calculate $e_i = y_i - \hat{B}x_i$ (example: Coffee Co., AL)
- Sum of squares for the $e_i$: $\sum_{i \in S} e_i^2$
14 Ag example – 5
15 Estimating proportions
If the denominator variable is random, use a ratio estimator to estimate the proportion p: $\hat{p} = \hat{B} = \bar{y}/\bar{x}$.
Example (p. 72)
- 10 plots under protected oak trees were used to assess the effect of feral pigs on native vegetation on Santa Cruz Island, CA
- Count live seedlings y and total number of seedlings x per plot
- Y and X are correlated because of common environmental factors
- Estimate the proportion of live seedlings among all seedlings
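Because the denominator (total seedlings) is random, the proportion is estimated as a ratio, $\hat{p} = \sum y_i / \sum x_i$, with the ratio-estimator standard error. A minimal sketch, assuming the fpc can be ignored (population of plots treated as large) and substituting $\bar{x}$ for the unknown $\bar{x}_U$; the helper `ratio_proportion` and the counts are hypothetical, not the Santa Cruz Island data.

```python
from statistics import mean


def ratio_proportion(y, x, N=None):
    """p_hat = sum(y) / sum(x) with an estimated standard error.

    N=None means the fpc is ignored; xbar stands in for the unknown
    population mean of x.
    """
    n = len(y)
    p_hat = sum(y) / sum(x)
    s2_e = sum((yi - p_hat * xi) ** 2 for yi, xi in zip(y, x)) / (n - 1)
    fpc = 1.0 if N is None else 1 - n / N
    se = (fpc * s2_e / (n * mean(x) ** 2)) ** 0.5
    return p_hat, se


# Hypothetical counts for 10 plots (NOT the Santa Cruz Island data).
live = [1, 0, 2, 3, 1, 0, 4, 2, 1, 2]    # y_i: live seedlings per plot
total = [5, 3, 6, 8, 4, 2, 9, 7, 3, 6]   # x_i: all seedlings per plot
p_hat, se = ratio_proportion(live, total)
print(round(p_hat, 3), round(se, 3))
```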
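16 Estimating population mean
Estimator for $\bar{y}_U$: $\hat{\bar{y}}_r = \hat{B}\,\bar{x}_U = \bar{y}\,\frac{\bar{x}_U}{\bar{x}}$
“Adjustment factor” $\bar{x}_U/\bar{x}$ for the sample mean
- A measure of the discrepancy between sample and population information, and
- Improves precision if X and Y are positively correlated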
17 Underlying model with B > 0
- The model line is y = Bx, so B is a slope
- B > 0 indicates X and Y are positively correlated
- Absence of an intercept implies the line must pass through the origin (0, 0)
[Figure: y plotted against x, with a line through the origin]
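18 Using population mean of X to adjust sample mean
A discrepancy between the sample and population information for X ($\bar{x}$ vs. $\bar{x}_U$) is viewed as evidence that the same relative discrepancy exists between $\bar{y}$ and $\bar{y}_U$:
$$\frac{\hat{\bar{y}}_r}{\bar{y}} = \frac{\bar{x}_U}{\bar{x}}$$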
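19 Bias of $\hat{\bar{y}}_r$
- The ratio estimator of the population mean is biased: since $\hat{\bar{y}}_r = \bar{x}_U\hat{B}$ and $\bar{y}_U = \bar{x}_U B$, $\mathrm{Bias}(\hat{\bar{y}}_r) = \bar{x}_U\,\mathrm{Bias}(\hat{B})$
- The rules of thumb for the bias of $\hat{B}$ apply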
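20 Estimator for variance of $\hat{\bar{y}}_r$
$$\hat{V}(\hat{\bar{y}}_r) = \bar{x}_U^2\,\hat{V}(\hat{B}) = \left(1 - \frac{n}{N}\right)\frac{s_e^2}{n}, \qquad e_i = y_i - \hat{B}x_i$$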
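A minimal sketch of the ratio estimator of the mean, $\hat{\bar{y}}_r = \hat{B}\bar{x}_U$, with standard error $\sqrt{(1 - n/N)s_e^2/n}$, where $s_e^2$ uses the residuals $e_i = y_i - \hat{B}x_i$. The helper `ratio_mean`, the sample, and the value of $\bar{x}_U$ are hypothetical.

```python
from statistics import mean


def ratio_mean(y, x, xbar_U, N):
    """Ratio estimator of the population mean and its estimated standard error."""
    n = len(y)
    B_hat = mean(y) / mean(x)
    ybar_r = B_hat * xbar_U                  # = ybar * (xbar_U / xbar)
    s2_e = sum((yi - B_hat * xi) ** 2 for yi, xi in zip(y, x)) / (n - 1)
    se = ((1 - n / N) * s2_e / n) ** 0.5     # = xbar_U * SE(B_hat)
    return ybar_r, se


# Hypothetical sample and known population mean of x (illustration only).
y = [12, 15, 9, 20, 14]
x = [10, 14, 8, 18, 12]
ybar_r, se = ratio_mean(y, x, xbar_U=13.0, N=100)
print(ybar_r, se)   # ybar = 14 is scaled up because xbar = 12.4 < xbar_U = 13
```

When the sample happens to satisfy $\bar{x} = \bar{x}_U$, the adjustment factor is 1 and the estimate reduces to $\bar{y}$.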
21 Ag example – 6
22 Ag example - 8
23 Ag example – 9
- Expect a linear relationship between X and Y (Figure 3.1)
- Note that the sample mean $\bar{x}$ is not equal to the population mean $\bar{x}_U$ for X
24 MSE under ratio estimation
- Recall: MSE = Variance + Bias²
- SRS estimators are unbiased, so MSE = Variance
- Ratio estimators are biased, so MSE > Variance
- Use MSE to compare design/estimation strategies
- Example: compare the sample mean $\bar{y}$ under SRS with the ratio estimator $\hat{\bar{y}}_r$ of the population mean under SRS
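25 Sample mean vs. ratio estimator of mean
To first order (the squared bias is of smaller order in n than the variance), $\mathrm{MSE}(\hat{\bar{y}}_r)$ is smaller than $V(\bar{y})$ if and only if
$$R > \frac{B S_x}{2 S_y} = \frac{1}{2}\,\frac{CV(x)}{CV(y)}, \qquad CV(x) = \frac{S_x}{\bar{x}_U}, \quad CV(y) = \frac{S_y}{\bar{y}_U}$$
For example, if $CV(x) = CV(y)$ and $R > 1/2$, ratio estimation will be better than SRS.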
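The condition follows from comparing $V(\bar{y}) = (1 - n/N)S_y^2/n$ with the approximate $V(\hat{\bar{y}}_r) = (1 - n/N)(S_y^2 - 2BRS_xS_y + B^2S_x^2)/n$: the ratio estimator wins exactly when $B^2S_x^2 < 2BRS_xS_y$. A minimal sketch, with a hypothetical helper `compare_srs_ratio` and a hypothetical population where $CV(x) = CV(y) = 0.2$, so the threshold is $R > 1/2$.

```python
def compare_srs_ratio(S_y, S_x, R, ybar_U, xbar_U, n, N):
    """Approximate V(ybar) and V(ratio estimator of the mean) from population values."""
    fpc = 1 - n / N
    B = ybar_U / xbar_U
    v_srs = fpc * S_y ** 2 / n
    v_ratio = fpc * (S_y ** 2 - 2 * B * R * S_x * S_y + B ** 2 * S_x ** 2) / n
    cv_x, cv_y = S_x / xbar_U, S_y / ybar_U
    ratio_wins = R > cv_x / (2 * cv_y)       # the slide's condition
    return v_srs, v_ratio, ratio_wins


# Hypothetical population with CV(x) = CV(y) = 0.2, so the threshold is R > 1/2.
for R in (0.4, 0.6):
    print(R, compare_srs_ratio(S_y=10.0, S_x=4.0, R=R, ybar_U=50.0, xbar_U=20.0, n=25, N=1000))
```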
26 Estimating the MSE
- Estimate the MSE with sample estimates of the bias and variance of the estimator
- This tends to underestimate the MSE; the formulas for $\mathrm{Bias}(\hat{B})$ and $V(\hat{B})$ are approximations
- The estimated MSE is less biased if
  - $\mathrm{Bias}(\hat{B})$ is small (see the earlier slide): large sample size or sampling fraction, high positive correlation between X and Y
  - $\bar{x}$ is a precise estimate of $\bar{x}_U$ (small CV for $\bar{x}$)
  - the sample size is reasonably large (n > 30)
27 Ag example – 10
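28 Estimating population total t
- Estimator for $t_y$: $\hat{t}_{yr} = \hat{B}\,t_x = N\hat{\bar{y}}_r$
- Is $\hat{t}_{yr}$ biased? Yes: it inherits the bias of $\hat{B}$, with $\mathrm{Bias}(\hat{t}_{yr}) = t_x\,\mathrm{Bias}(\hat{B})$
- Estimator for $V(\hat{t}_{yr})$: $\hat{V}(\hat{t}_{yr}) = t_x^2\,\hat{V}(\hat{B}) = N^2\left(1 - \frac{n}{N}\right)\frac{s_e^2}{n}$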
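A minimal sketch of $\hat{t}_{yr} = \hat{B}t_x$ with $\hat{V}(\hat{t}_{yr}) = t_x^2\hat{V}(\hat{B})$. It illustrates that the point estimate needs only $t_x$, not N; N still enters the variance through the fpc and $\bar{x}_U = t_x/N$. The helper `ratio_total` and the values $t_x = 1300$, N = 100 are hypothetical.

```python
def ratio_total(y, x, t_x, N):
    """Ratio estimator of t_y and its estimated standard error."""
    n = len(y)
    B_hat = sum(y) / sum(x)                 # same as ybar / xbar
    t_hat = B_hat * t_x                     # point estimate: no N needed
    s2_e = sum((yi - B_hat * xi) ** 2 for yi, xi in zip(y, x)) / (n - 1)
    xbar_U = t_x / N
    v_hat = t_x ** 2 * (1 - n / N) * s2_e / (n * xbar_U ** 2)   # = N^2 (1 - n/N) s_e^2 / n
    return t_hat, v_hat ** 0.5


# Hypothetical sample, with known population total t_x = 1300 over N = 100 units.
y = [12, 15, 9, 20, 14]
x = [10, 14, 8, 18, 12]
t_hat, se = ratio_total(y, x, t_x=1300, N=100)
print(t_hat, se)
```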
29 Ag example – 11
30 Summary of ratio estimation
31 Summary of ratio estn – 2
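32 Regression estimation
What if the relationship between y and x is linear but does NOT pass through the origin?
A better model in this case is $y = B_0 + B_1 x$, with intercept $B_0$ and slope $B_1$.
[Figure: line y = B_0 + B_1 x with intercept B_0 and slope B_1]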
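33 Regression estimation – 2
- The new estimator is a regression estimator
- To estimate $\bar{y}_U$, $\hat{\bar{y}}_{reg}$ is the predicted value from the regression of y on x at $x = \bar{x}_U$
- The adjustment to the sample mean is additive (linear) rather than multiplicative: $\hat{\bar{y}}_{reg} = \bar{y} + \hat{B}_1(\bar{x}_U - \bar{x})$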
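34 Estimating population mean
Regression estimator:
$$\hat{\bar{y}}_{reg} = \hat{B}_0 + \hat{B}_1\bar{x}_U = \bar{y} + \hat{B}_1(\bar{x}_U - \bar{x})$$
Estimating the regression parameters (least squares):
$$\hat{B}_1 = \frac{\sum_{i \in S}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i \in S}(x_i - \bar{x})^2} = \frac{r\,s_y}{s_x}, \qquad \hat{B}_0 = \bar{y} - \hat{B}_1\bar{x}$$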
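35 Estimating pop mean – 2
Sample variances, covariance, and correlation:
$$s_x^2 = \frac{1}{n-1}\sum_{i \in S}(x_i - \bar{x})^2, \quad s_y^2 = \frac{1}{n-1}\sum_{i \in S}(y_i - \bar{y})^2, \quad s_{xy} = \frac{1}{n-1}\sum_{i \in S}(x_i - \bar{x})(y_i - \bar{y}), \quad r = \frac{s_{xy}}{s_x s_y}$$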
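These sample quantities are all that is needed for the regression estimator: $\hat{B}_1 = s_{xy}/s_x^2$, $\hat{B}_0 = \bar{y} - \hat{B}_1\bar{x}$, and $\hat{\bar{y}}_{reg} = \hat{B}_0 + \hat{B}_1\bar{x}_U$. A minimal sketch using Python's `statistics` module; the helper `regression_mean` and the data (lying exactly on y = 3 + 2x, so the fit can be checked by eye) are hypothetical.

```python
from statistics import mean, variance


def regression_mean(y, x, xbar_U):
    """Regression estimator of the population mean from an SRS.

    Returns (ybar_reg, B0_hat, B1_hat, r).
    """
    n = len(y)
    xbar, ybar = mean(x), mean(y)
    s2_x, s2_y = variance(x), variance(y)          # divisor n - 1
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    r = s_xy / (s2_x * s2_y) ** 0.5
    B1_hat = s_xy / s2_x                            # = r * s_y / s_x
    B0_hat = ybar - B1_hat * xbar
    ybar_reg = B0_hat + B1_hat * xbar_U             # = ybar + B1_hat * (xbar_U - xbar)
    return ybar_reg, B0_hat, B1_hat, r


# Hypothetical sample lying exactly on y = 3 + 2x, so the fit is recovered exactly.
print(regression_mean(y=[5, 7, 9, 11, 13], x=[1, 2, 3, 4, 5], xbar_U=3.5))
```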
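36 Bias in regression estimator
$$\mathrm{Bias}(\hat{\bar{y}}_{reg}) = E[\hat{\bar{y}}_{reg} - \bar{y}_U] = -\mathrm{Cov}(\hat{B}_1, \bar{x})$$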
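37 Estimating variance
$$\hat{V}(\hat{\bar{y}}_{reg}) = \left(1 - \frac{n}{N}\right)\frac{s_e^2}{n}, \qquad s_e^2 = \frac{1}{n-1}\sum_{i \in S} e_i^2, \qquad e_i = y_i - (\hat{B}_0 + \hat{B}_1 x_i)$$
Note: this is a different residual than in ratio estimation (the predicted values differ).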
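38 Estimating the MSE
Plugging sample estimates into Lohr, equation 3.13, $\mathrm{MSE}(\hat{\bar{y}}_{reg}) \approx \left(1 - \frac{n}{N}\right)\frac{1}{n}S_y^2(1 - R^2)$:
$$\widehat{\mathrm{MSE}}(\hat{\bar{y}}_{reg}) = \left(1 - \frac{n}{N}\right)\frac{1}{n}s_y^2(1 - r^2)$$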
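For a least-squares fit with an intercept, the residual sum of squares equals $(n-1)s_y^2(1 - r^2)$, so when $s_e^2$ uses the $n - 1$ divisor the residual-based variance $(1 - n/N)s_e^2/n$ and the plug-in MSE $(1 - n/N)s_y^2(1 - r^2)/n$ coincide. (With an $n - 2$ divisor they differ by the factor $(n-1)/(n-2)$.) A minimal sketch checking this numerically; the helper `regression_variance_two_ways` and the data are hypothetical.

```python
from statistics import mean, variance


def regression_variance_two_ways(y, x, N):
    """Compare the residual-based variance estimate with the plug-in MSE (1 - r^2 form)."""
    n = len(y)
    xbar, ybar = mean(x), mean(y)
    s2_x, s2_y = variance(x), variance(y)
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    B1_hat = s_xy / s2_x
    B0_hat = ybar - B1_hat * xbar
    # Residuals from the fitted line (not from a line through the origin).
    s2_e = sum((yi - B0_hat - B1_hat * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)
    v_resid = (1 - n / N) * s2_e / n
    r2 = s_xy ** 2 / (s2_x * s2_y)
    mse_plugin = (1 - n / N) * s2_y * (1 - r2) / n
    return v_resid, mse_plugin


# Hypothetical sample of n = 6 from N = 50.
print(regression_variance_two_ways(y=[4, 9, 7, 12, 10, 15], x=[1, 2, 3, 4, 5, 6], N=50))
```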
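39 Estimating population total t
- Estimator: $\hat{t}_{yreg} = N\hat{\bar{y}}_{reg}$, with $\hat{V}(\hat{t}_{yreg}) = N^2\hat{V}(\hat{\bar{y}}_{reg})$
- Is the regression estimator for t unbiased? No: like $\hat{\bar{y}}_{reg}$, it is biased, with bias $-N\,\mathrm{Cov}(\hat{B}_1, \bar{x})$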
40 Tree example
Goal: obtain a precise estimate of the number of dead trees in an area
- Sample: select n = 25 of the N = 100 plots; make a field determination of the number of dead trees per plot, $y_i$
- Population: for all N = 100 plots, have a photo determination of the number of dead trees per plot, $x_i$; calculate $\bar{x}_U = 11.3$ dead trees per plot
41 Tree example – 2
Lohr, pp. 77–78:
- Data
- Plot of y vs. x
- Output from PROC REG
- Components for calculating the estimators and estimating the variance of the estimators
We will use PROC SURVEYREG, which gives the correct output for regression estimators.
42 Tree example – 3
- Estimated mean number of dead trees per plot: $\hat{\bar{y}}_{reg}$
- Estimated total number of dead trees: $\hat{t}_{yreg} = N\hat{\bar{y}}_{reg} = 100\,\hat{\bar{y}}_{reg}$
43 Tree example – 4
- Due to the small sample size, Lohr uses a t distribution with n − 2 = 23 degrees of freedom
- Half-width of the 95% CI: $t_{0.025,\,23}\,\mathrm{SE}(\hat{t}_{yreg})$
- Approximate 95% CI for $t_y$: (1115, 1283) dead trees
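The reported interval (1115, 1283) is centered at 1199 with half-width 84. A minimal sketch of the same construction, $\hat{t}_{yreg} \pm t_{\alpha/2,\,n-2}\,N\sqrt{(1 - n/N)s_e^2/n}$, is below. The helper `regression_total_ci` and the six plots of data are hypothetical (not Lohr's tree data), and the critical value is passed in by the caller; for n = 25 it would be about 2.069, and `scipy.stats.t.ppf(0.975, n - 2)` is one way to obtain it.

```python
from statistics import mean, variance


def regression_total_ci(y, x, xbar_U, N, t_crit):
    """Regression estimator of t_y with a t-based confidence interval.

    t_crit is the upper critical value of a t distribution with n - 2 df,
    supplied by the caller (e.g. from a table).
    """
    n = len(y)
    xbar, ybar = mean(x), mean(y)
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    B1_hat = s_xy / variance(x)
    B0_hat = ybar - B1_hat * xbar
    ybar_reg = ybar + B1_hat * (xbar_U - xbar)
    s2_e = sum((yi - B0_hat - B1_hat * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)
    se_total = N * ((1 - n / N) * s2_e / n) ** 0.5
    t_hat = N * ybar_reg
    half_width = t_crit * se_total
    return t_hat, (t_hat - half_width, t_hat + half_width)


# Hypothetical photo (x) and field (y) counts for 6 plots; NOT Lohr's tree data.
x = [10, 12, 7, 13, 13, 6]
y = [15, 14, 9, 14, 8, 5]
# t_{0.025} with n - 2 = 4 df is about 2.776 (from a t table).
print(regression_total_ci(y, x, xbar_U=11.3, N=100, t_crit=2.776))
```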
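44 Related estimators
- Ratio estimator: $B_0 = 0$ (ratio model); the ratio estimator is a regression estimator with no intercept
- Difference estimator: $B_1 = 1$ (the slope is assumed to be 1), giving $\hat{\bar{y}}_{diff} = \bar{y} + (\bar{x}_U - \bar{x})$
[Figure: line y = B_0 + B_1 x with intercept B_0 and slope B_1]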
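The three estimators differ only in how the line through $(\bar{x}, \bar{y})$ is used to adjust $\bar{y}$: the ratio estimator forces the line through the origin, the difference estimator fixes the slope at 1, and the regression estimator fits the slope. A minimal sketch, with a hypothetical helper `related_estimators` and hypothetical data: data lying on y = x + 5 make the difference and regression estimates agree, and data on y = 2x make the ratio and regression estimates agree.

```python
from statistics import mean, variance


def related_estimators(y, x, xbar_U):
    """Ratio, difference, and regression estimators of the population mean."""
    n = len(y)
    xbar, ybar = mean(x), mean(y)
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    B1_hat = s_xy / variance(x)
    return {
        "ratio": ybar * xbar_U / xbar,                    # no intercept, slope ybar / xbar
        "difference": ybar + (xbar_U - xbar),             # slope fixed at 1
        "regression": ybar + B1_hat * (xbar_U - xbar),    # slope estimated by least squares
    }


# Hypothetical data on the line y = x + 5: difference and regression agree.
print(related_estimators(y=[6, 7, 9, 12], x=[1, 2, 4, 7], xbar_U=5.0))
# Hypothetical data on the line y = 2x (through the origin): ratio and regression agree.
print(related_estimators(y=[2, 4, 8, 14], x=[1, 2, 4, 7], xbar_U=5.0))
```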
45 Domain estimation under SRS
- We are usually interested in estimates and inferences for subpopulations, called domains
- If we have not used stratification to set the sample size for each domain, then we should use domain estimation
- We assume SRS for this discussion
- If we use stratified sampling with strata = domains, then use the stratum estimators (Ch. 4)
- To use stratification, we need to know the domain of each unit in the sampling frame before sampling
46 Stratification vs. domain estimation
In stratified random sampling
- Define the sample size in each stratum before collecting data
- The sample size in stratum h is fixed (known); in other words, $n_h$ is the same for every sample selected under the specified design
In domain estimation
- $n_d$ = sample size in domain d is random
- We don't know $n_d$ until after the data have been collected
- The value of $n_d$ changes from sample to sample
47 Population partitioned into domains
- Recall U = index set for the population = {1, 2, …, N}
- For domain d = 1, 2, …, D: $U_d$ = index set of the units in domain d, with $N_d$ = number of OUs in domain d in the population
- In a sample of size n:
  - $n_d$ = number of sample units from domain d
  - $S_d$ = index set for the sample units belonging to domain d
[Figure: population partitioned into domains d = 1, 2, …, D]
48 Boat owner example
- Population: N = 400,000 boat owners (currently licensed)
- Sample: n = 1,500 owners selected using SRS
- Divide the universe (population) into 2 domains
  - d = 1: own an open motor boat > 16 ft. (large boat)
  - d = 2: do not own this type of boat
- Of the n = 1,500 sample owners:
  - $n_1$ = 472 own an open motor boat > 16 ft.
  - $n_2$ = 1,028 do not own this kind of boat
49 New population parameters
- Domain mean: $\bar{y}_{U_d} = \frac{1}{N_d}\sum_{i \in U_d} y_i$
- Domain total: $t_d = \sum_{i \in U_d} y_i$
50 Boat owner example – 2
Estimate a population domain mean:
- the average number of children for boat owners in domain 1
- the proportion of boat owners in domain 1 who have children
Estimate a population domain total:
- the total number of children for large-boat owners (domain 1)
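51 New population parameter – 2
Ratio form of the domain mean: $\bar{y}_{U_d} = \frac{t_u}{t_x} = \frac{\sum_{i \in U} u_i}{\sum_{i \in U} x_i}$
- Numerator variable: $u_i = y_i$ if $i \in U_d$, and $u_i = 0$ otherwise
- Denominator variable: $x_i = 1$ if $i \in U_d$, and $x_i = 0$ otherwise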
52 Boat owner example – 3
Estimate the mean number of children for owners from domain 1:
- Set $u_i = 0$ for OUs that are not in domain 1, so that $u_i$ is defined for the whole population
53 Boat example – 4
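54 Estimator for population domain mean
$$\bar{y}_d = \frac{\bar{u}}{\bar{x}} = \frac{\sum_{i \in S} u_i}{\sum_{i \in S} x_i} = \frac{1}{n_d}\sum_{i \in S_d} y_i$$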
55 Boat example – 5
Domain 1 data
56 Boat example – 6
Domain 1 and domain 2 data combined: 1104 zeros = 76 zeros from domain 1 (owners with no children) + 1028 zeros from domain 2 ($u_i = 0$)
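57 Boat example – 7: two ways of estimating the mean
- Whole data set: $\bar{y}_1 = \bar{u}/\bar{x}$
- Domain 1 data only: $\bar{y}_1 = \frac{1}{n_1}\sum_{i \in S_1} y_i$
Both give the same estimate.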
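The equality of the two computations can be seen on a small example: coding $x_i$ as the domain indicator and $u_i = y_i x_i$, the whole-sample ratio $\bar{u}/\bar{x}$ equals the plain mean of y over the domain units. As in the boat example, the zeros in u come both from domain-1 owners with no children and from every unit outside domain 1. The eight records below are hypothetical.

```python
from statistics import mean

# Hypothetical sample of 8 boat owners: (in domain 1?, number of children).
sample = [(1, 2), (0, 0), (1, 0), (0, 3), (1, 1), (0, 1), (1, 3), (0, 0)]

x = [int(d) for d, _ in sample]              # x_i = 1 if in domain 1, else 0
u = [y if d else 0 for d, y in sample]       # u_i = y_i in domain 1, else 0

whole_data = mean(u) / mean(x)               # ratio form on the whole data set
domain_only = mean([y for d, y in sample if d])   # domain 1 data only
print(whole_data, domain_only)
```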
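58 Estimator for variance of $\bar{y}_d$
Treat $\bar{y}_d$ as a ratio estimator with $\hat{B} = \bar{y}_d$, using $\bar{x} = n_d/n$ in place of $\bar{x}_U = N_d/N$:
$$\hat{V}(\bar{y}_d) = \left(1 - \frac{n}{N}\right)\frac{s_e^2}{n\,\bar{x}^2}, \qquad e_i = u_i - \bar{y}_d\,x_i, \qquad s_e^2 = \frac{1}{n-1}\sum_{i \in S} e_i^2$$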
59 Boat example – 8
60 Boat example – 9
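61 Approximation for estimator of variance of $\bar{y}_d$
Using domain 1 data only:
$$\hat{V}(\bar{y}_d) = \left(1 - \frac{n}{N}\right)\frac{n}{n_d}\,\frac{n_d - 1}{n - 1}\,\frac{s_{y_d}^2}{n_d} \approx \left(1 - \frac{n}{N}\right)\frac{s_{y_d}^2}{n_d}$$
where $s_{y_d}^2$ is the sample variance of y among the $n_d$ sample units in domain d.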
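A minimal sketch comparing the whole-sample (ratio-form) variance of $\bar{y}_d$ with the domain-only approximation $(1 - n/N)s_{y_d}^2/n_d$. The two differ exactly by the factor $\frac{n}{n_d}\frac{n_d - 1}{n - 1}$, which is close to 1 when $n_d$ and n are large. The helper `domain_mean_variance` and the ten records are hypothetical; N = 400,000 is borrowed from the boat example only for scale.

```python
from statistics import variance


def domain_mean_variance(y, in_domain, N):
    """Domain mean with the ratio-form variance and its domain-only approximation."""
    n = len(y)
    x = [1 if d else 0 for d in in_domain]
    u = [yi * xi for yi, xi in zip(y, x)]
    n_d = sum(x)
    ybar_d = sum(u) / n_d
    # Ratio form on the whole sample, with xbar = n_d / n.
    s2_e = sum((ui - ybar_d * xi) ** 2 for ui, xi in zip(u, x)) / (n - 1)
    v_ratio_form = (1 - n / N) * s2_e / (n * (n_d / n) ** 2)
    # Approximation using only the domain data.
    s2_yd = variance([yi for yi, xi in zip(y, x) if xi])
    v_approx = (1 - n / N) * s2_yd / n_d
    return ybar_d, v_ratio_form, v_approx


# Hypothetical data: y = number of children, flag = owns a large boat.
y = [2, 0, 0, 3, 1, 1, 3, 0, 2, 1]
flag = [True, False, True, False, True, False, True, False, False, True]
print(domain_mean_variance(y, flag, N=400_000))
```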
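62 Estimated variance of $\hat{B}$ and estimator for $V(\bar{y}_d)$
The domain variance estimator is directly related: it is $\hat{V}(\hat{B})$ with $u_i$ in place of $y_i$ and $\bar{x} = n_d/n$ in place of $\bar{x}_U$.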
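63 Relationship to estimating a ratio
- Population mean of X: $\bar{x}_U = N_d/N$, the proportion of the population in domain d
- Residual: $e_i = u_i - \bar{y}_d x_i$, which equals $y_i - \bar{y}_d$ for $i \in S_d$ and 0 otherwise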
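64 Relationship to estimating a ratio – 2
Residual variance:
$$s_e^2 = \frac{1}{n-1}\sum_{i \in S} e_i^2 = \frac{1}{n-1}\sum_{i \in S_d}(y_i - \bar{y}_d)^2 = \frac{n_d - 1}{n - 1}\,s_{y_d}^2$$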
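65 Estimator for variance of $\bar{y}_d$
$$\hat{V}(\bar{y}_d) = \left(1 - \frac{n}{N}\right)\frac{s_e^2}{n\,\bar{x}^2} = \left(1 - \frac{n}{N}\right)\frac{n}{n_d}\,\frac{n_d - 1}{n - 1}\,\frac{s_{y_d}^2}{n_d}$$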
66 Estimating a population domain total
If we know the domain sizes $N_d$: $\hat{t}_d = N_d\,\bar{y}_d$, with $\hat{V}(\hat{t}_d) = N_d^2\,\hat{V}(\bar{y}_d)$
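67 Estimating a population domain total – 2
If we do NOT know the domain sizes, use the standard SRS estimator with u as the variable:
$$\hat{t}_d = N\bar{u} = \frac{N}{n}\sum_{i \in S} u_i, \qquad \hat{V}(\hat{t}_d) = N^2\left(1 - \frac{n}{N}\right)\frac{s_u^2}{n}$$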
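When $N_d$ is unknown, the domain total is simply the SRS estimator of the total of $u_i$ (y inside the domain, 0 outside), $N\bar{u}$, with the usual SRS variance. A minimal sketch; the helper `domain_total_unknown_Nd` and the ten records are hypothetical, with N = 400,000 taken from the boat example only for scale.

```python
from statistics import mean, variance


def domain_total_unknown_Nd(y, in_domain, N):
    """Estimate a domain total when N_d is unknown: SRS estimator N * ubar."""
    n = len(y)
    u = [yi if d else 0 for yi, d in zip(y, in_domain)]   # zero outside the domain
    t_hat = N * mean(u)
    se = N * ((1 - n / N) * variance(u) / n) ** 0.5
    return t_hat, se


# Hypothetical data: y = number of children, flag = owns a large boat.
y = [2, 0, 0, 3, 1, 1, 3, 0, 2, 1]
flag = [True, False, True, False, True, False, True, False, False, True]
t_hat, se = domain_total_unknown_Nd(y, flag, N=400_000)
print(t_hat, se)
```

If $N_d$ were known, $N_d\bar{y}_d$ would be used instead, which avoids the extra variability from the random domain sample size.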
68 Boat example – 10
We do not know the domain size $N_1$.
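69 Comparing 2 domain means
Suppose we want to test the hypothesis $H_0: \bar{y}_{U_1} = \bar{y}_{U_2}$ that two domain means are equal.
- Construct a z-test with Type I error rate α (the probability of falsely rejecting the null hypothesis)
- Test statistic: $z = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\hat{V}(\bar{y}_1) + \hat{V}(\bar{y}_2)}}$
- Critical value: $z_{\alpha/2}$
- Reject $H_0$ if $|z| > z_{\alpha/2}$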
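A minimal sketch of the two-sided z-test, assuming (as the test statistic does) that the covariance between the two domain-mean estimators can be ignored. The helper `compare_domain_means` and its inputs are hypothetical, not the boat survey results; `statistics.NormalDist().inv_cdf` supplies $z_{\alpha/2}$, about 1.96 for α = 0.05.

```python
from statistics import NormalDist


def compare_domain_means(ybar_1, v_1, ybar_2, v_2, alpha=0.05):
    """Two-sided z-test of H0: equal domain means.

    Assumes the covariance between the two domain-mean estimators can be
    ignored, so V(ybar_1 - ybar_2) is taken as v_1 + v_2.
    """
    z = (ybar_1 - ybar_2) / (v_1 + v_2) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96 for alpha = 0.05
    return z, z_crit, abs(z) > z_crit


# Hypothetical estimates and estimated variances (not the boat survey results).
print(compare_domain_means(1.40, 0.004, 1.50, 0.005))
```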
70 Boat example – 10
- Large-boat owners (d = 1)
- Other boat owners (d = 2)
71 Boat example – 11
- Test whether the domain means are equal at α = 0.05
- Calculate the z statistic
- Critical value: $z_{\alpha/2} = z_{0.025} = 1.96$
- Apply the rejection rule: $|z| = |-1.04| = 1.04 < 1.96 = z_{0.025}$
- Fail to reject $H_0$
72 Overview
Population parameters
- Mean
- Total
- Proportion (with fixed denominator)
- Ratio, including a proportion with a random denominator
- Domain mean
- Domain total
73 Overview – 2
Estimation strategies
- No auxiliary information
- Auxiliary information X, no intercept
  - Y and X positively correlated
  - Linear relationship passes through the origin
- Auxiliary information X, with intercept
  - Y and X positively correlated
  - Linear relationship does not pass through the origin
74 Overview – 3
Make a table of population parameters (rows) by estimation strategy (columns). In each cell, write down the
- estimator for the population parameter
- estimator for the variance of the estimated parameter
- residual $e_i$
Notes
- Some cells will be blank
- Look for the relationships between mean and total, and between mean and proportion
- Look at how the variance formulas for many of the estimators have essentially the same form