Published by Fanny Agusalim. Modified over 6 years ago.
1
Ratio and regression estimation STAT262, Fall 2017
2
STAT262: Ratio estimation
3
Motivating Example: California Schools
api99 and api00
4
Motivating Example: California Schools
Suppose that we know api99 for the whole population and api00 for a simple random sample (SRS). We would like to estimate the population mean of api00. Two options: use the SRS only, or use api99 as an auxiliary variable.
5
Ratio Estimator The same sampling method: Simple Random Sampling (SRS)
Sometimes we are interested in the ratio of two population characteristics, e.g., the average yield of corn per acre. At other times, Y is a characteristic we are interested in and X is a characteristic related to Y; ratio estimators then use X to increase the precision of estimated means or totals.
6
api99 vs api00
7
api99 vs api00 In the SRS In the population
mean of api99 = (value not shown); mean of api00 = (value not shown). The sample mean is an unbiased estimator of the population mean. The ratio = (value not shown). In the population: mean of api99 = (value not shown). What is a good guess for the mean of api00 in the population?
8
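api99 vs api00: ratio estimation, ? = 664.1821
The true mean of api00 in the population: (value not shown). The SRS estimate: (value not shown). Ratio estimation equates the ratio of the api00 mean to the api99 mean in the SRS with the same ratio in the population, $\bar{y}_{\text{SRS}} / \bar{x}_{\text{SRS}} = \,?\, / \bar{x}_{\text{population}}$, and solves for the unknown population mean of api00.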
9
api99 vs api00 Ratio estimator vs the unbiased estimator?
Simulation: for each of 1000 simulations, generate an SRS and estimate the population mean of api00 by two methods: the sample mean of api00 and the ratio estimator.
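The simulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population below is synthetic (a made-up stand-in for api99 and api00, not the California schools data), and the names `ratio_estimate` and `simulate`, the population size, and the sample size are all hypothetical choices.

```python
import random
import statistics

def ratio_estimate(xs, ys, x_pop_mean):
    """Ratio estimator of the population mean of y: (ybar / xbar) * mean of x."""
    return statistics.fmean(ys) / statistics.fmean(xs) * x_pop_mean

def simulate(pop_x, pop_y, n, n_sims=1000, seed=1):
    """Draw n_sims SRSs of size n; return both estimates from every sample."""
    rng = random.Random(seed)
    x_pop_mean = statistics.fmean(pop_x)
    srs_est, ratio_est = [], []
    for _ in range(n_sims):
        idx = rng.sample(range(len(pop_x)), n)  # SRS without replacement
        xs = [pop_x[i] for i in idx]
        ys = [pop_y[i] for i in idx]
        srs_est.append(statistics.fmean(ys))
        ratio_est.append(ratio_estimate(xs, ys, x_pop_mean))
    return srs_est, ratio_est

# Synthetic population (assumption): y is roughly proportional to x.
pop_rng = random.Random(0)
pop_x = [pop_rng.uniform(400, 900) for _ in range(6000)]
pop_y = [1.05 * x + pop_rng.gauss(0, 25) for x in pop_x]

srs_est, ratio_est = simulate(pop_x, pop_y, n=200)
true_mean = statistics.fmean(pop_y)
for name, est in (("SRS mean", srs_est), ("ratio", ratio_est)):
    bias = statistics.fmean(est) - true_mean
    print(f"{name:8s} bias={bias:8.3f}  sd={statistics.stdev(est):7.3f}")
```

Comparing the two printed rows shows the bias versus variance trade-off: the sample mean is unbiased, while the ratio estimator accepts a small bias in exchange for a smaller spread when x and y are strongly related.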
10
api99 vs api00
11
api99 vs api00: bias vs variance
12
More examples E.g. 1: The average yield of corn per acre
E.g. 2: The number of hummingbirds in a national forest. Sample a few regions and record the number of hummingbirds ($y_i$) and the area ($x_i$) for each region. Calculate the sample ratio $\hat{B} = \bar{y}/\bar{x}$. The total area of the national forest is $t_x$. An estimate of $t_y$ is $\hat{t}_{yr} = \hat{B}\, t_x$.
13
Examples E.g. 3: Laplace wanted to know the number of persons living in France in a year for which there was no census. Two candidate estimators. Which was Laplace’s choice?
                      # persons      # registered births
Sample: 30 counties   2,037,615      71,866
France: N (known)     ty (???)       tx (known)
14
Examples Laplace reasoned that using the ratio estimator is more accurate.
Large counties have more registered births. The number of registered births and the number of persons are positively correlated. Thus, using the information in x is likely to improve our estimate of y.
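The arithmetic behind Laplace's ratio estimate can be sketched as follows. The two sample totals are the figures quoted in the example; the function name `ratio_total` and the value passed for the national number of registered births are hypothetical, since the transcript does not give tx.

```python
# Sample totals from the 30 sampled counties (figures quoted in the example).
sample_persons = 2_037_615
sample_births = 71_866

# Sample ratio: persons per registered birth.
b_hat = sample_persons / sample_births

def ratio_total(t_x):
    """Ratio estimate of the population total t_y, given the known total t_x."""
    return b_hat * t_x

print(round(b_hat, 2))  # 28.35

# Hypothetical t_x for illustration only; the real value is not in the transcript.
print(round(ratio_total(1_000_000)))
```

The estimate simply scales the known national total of registered births by the persons-per-birth ratio observed in the sample.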
15
Examples E.g. 4: McDonald's Corp. The average annual sales for this year
One can use information from last year. Details will be discussed later.
16
Ratio estimators in SRS
Sampling method: SRS. Two quantities (xi, yi) are measured on each sampled unit, where xi is an auxiliary variable.
17
Population quantities
Size: N Totals: Means: Ratio: Variances and covariance: Correlation coefficient:
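The formulas for these quantities did not survive in the transcript. A standard set of definitions, written in commonly used survey-sampling notation, is given below; the symbols $\bar{x}_U$, $\bar{y}_U$, $B$, $S_x$, $S_y$ and $S_{xy}$ are assumptions about the slide's notation, while $N$, $t_x$, $t_y$ and $R$ appear elsewhere in the slides.

```latex
\begin{align*}
\text{Totals:} \quad & t_x = \sum_{i=1}^{N} x_i, \qquad t_y = \sum_{i=1}^{N} y_i \\
\text{Means:} \quad & \bar{x}_U = \frac{t_x}{N}, \qquad \bar{y}_U = \frac{t_y}{N} \\
\text{Ratio:} \quad & B = \frac{t_y}{t_x} = \frac{\bar{y}_U}{\bar{x}_U} \\
\text{Variances:} \quad & S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x}_U)^2, \qquad
                          S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{y}_U)^2 \\
\text{Covariance:} \quad & S_{xy} = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x}_U)(y_i - \bar{y}_U) \\
\text{Correlation:} \quad & R = \frac{S_{xy}}{S_x S_y}
\end{align*}
```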
18
Example of population quantities
19
Bias Ratio estimators are usually biased
20
Bias – the exact expression
21
Bias – the exact expression
The expression is exact. But it involves quantities we don’t know
22
Bias – an approximation
We want to get rid of random items in the denominators
23
Bias – an approximation
24
The bias is usually small if
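The bias expressions themselves were lost in extraction. A standard derivation, in assumed notation ($\hat{B} = \bar{y}/\bar{x}$, $\hat{\bar{y}}_r = \hat{B}\,\bar{x}_U$, sample size $n$), runs as follows and may differ in form from what the slides showed.

```latex
\begin{align*}
\text{Exact:} \quad
  & E[\hat{\bar{y}}_r] - \bar{y}_U = -\operatorname{Cov}(\hat{B}, \bar{x}),
  \quad \text{since } \bar{y}_U = E[\hat{B}\,\bar{x}]
  = \operatorname{Cov}(\hat{B}, \bar{x}) + E[\hat{B}]\,\bar{x}_U . \\
\text{Approximate:} \quad
  & \hat{\bar{y}}_r - \bar{y}_U
  = (\bar{y} - B\bar{x})\left(1 - \frac{\bar{x} - \bar{x}_U}{\bar{x}}\right)
  \approx (\bar{y} - B\bar{x})\left(1 - \frac{\bar{x} - \bar{x}_U}{\bar{x}_U}\right), \\
  & E[\hat{\bar{y}}_r] - \bar{y}_U
  \approx \left(1 - \frac{n}{N}\right)\frac{1}{n\,\bar{x}_U}
  \left(B S_x^2 - R\, S_x S_y\right).
\end{align*}
```

Replacing the random $\bar{x}$ in the denominator by the constant $\bar{x}_U$ is the step that removes the random item from the denominator. Read term by term, the approximation suggests the bias is small when $n$ is large, when the sampling fraction $n/N$ is large, when $\bar{x}_U$ is large, when $S_x$ is small, or when $B S_x^2$ is close to $R\,S_x S_y$, as happens when the points lie close to a straight line through the origin.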
25
Variance and Mean Squared Error
The bias is usually small, thus can be ignored and MSE≈Var
26
Estimate the variance Alternative expression of the variance
They are not the same
27
Estimate the variance We can implement the formula in the previous slide using “residuals”: It is not difficult to show that
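A residual-based variance estimate for the ratio estimator can be sketched as below. The formula used is a common textbook form, $\hat{V}(\hat{\bar{y}}_r) = (1 - n/N)\,(\bar{x}_U/\bar{x})^2\, s_e^2/n$ with $e_i = y_i - \hat{B}x_i$, and is an assumption about what the slide shows; the function name `ratio_mean_and_se` is hypothetical.

```python
import math

def ratio_mean_and_se(xs, ys, x_pop_mean, N):
    """Ratio estimate of the population mean of y and its estimated standard error."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b_hat = y_bar / x_bar
    # Residuals from the line through the origin with slope b_hat.
    # They sum to zero, so s2_e below is their sample variance.
    resid = [y - b_hat * x for x, y in zip(xs, ys)]
    s2_e = sum(e * e for e in resid) / (n - 1)
    # Finite population correction (1 - n/N); dropping the factor
    # (x_pop_mean / x_bar) ** 2 gives the alternative form of the estimate.
    var_hat = (1 - n / N) * (x_pop_mean / x_bar) ** 2 * s2_e / n
    return b_hat * x_pop_mean, math.sqrt(var_hat)

# Made-up sample in which y is exactly twice x, so every residual is zero.
est, se = ratio_mean_and_se([1, 2, 3], [2, 4, 6], x_pop_mean=2.5, N=10)
print(est, se)
```

When the sample points fall exactly on a line through the origin, the residuals vanish and the estimated standard error is zero, which matches the intuition that ratio estimation works best in that setting.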
28
Efficiency of ratio estimation
Consider the ratio and the unbiased estimators. Which one has a smaller variance?
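One way to answer the question is to compare the two variances directly. The expressions below are the usual large-sample forms in assumed notation and are offered as a sketch, not as the slide's own derivation.

```latex
\begin{align*}
V(\bar{y}) &= \left(1 - \frac{n}{N}\right)\frac{S_y^2}{n}, \\
V(\hat{\bar{y}}_r) &\approx \left(1 - \frac{n}{N}\right)
  \frac{S_y^2 - 2 B R\, S_x S_y + B^2 S_x^2}{n}, \\
V(\hat{\bar{y}}_r) < V(\bar{y})
  &\iff B^2 S_x^2 < 2 B R\, S_x S_y
  \iff R > \frac{B S_x}{2 S_y}
  = \frac{1}{2}\,\frac{S_x / \bar{x}_U}{S_y / \bar{y}_U}
  \quad (B > 0).
\end{align*}
```

Under this approximation, the ratio estimator has the smaller variance when the correlation $R$ exceeds half the ratio of the coefficient of variation of x to that of y.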
29
A hypothetical example
Population. N=8
30
A hypothetical example
31
A hypothetical example
Mean estimate = Bias = Bias approx: Mean estimate = 40
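With a population this small, the exact bias can be found by enumerating every possible sample. The sketch below assumes a made-up population of N = 8 units whose y total is 40 and an assumed sample size of n = 4; the transcript does not show the slide's actual data or sample size, so both are illustrative.

```python
from itertools import combinations

# Hypothetical population of N = 8 units (illustrative values; t_y = 40).
x = [4, 5, 5, 6, 8, 7, 7, 5]
y = [1, 2, 4, 4, 7, 7, 7, 8]
N, n = len(x), 4  # n = 4 is an assumption
t_x, t_y = sum(x), sum(y)

def mean(values):
    return sum(values) / len(values)

srs_totals, ratio_totals = [], []
for s in combinations(range(N), n):      # every possible SRS of size n
    sx = sum(x[i] for i in s)
    sy = sum(y[i] for i in s)
    srs_totals.append(N * sy / n)         # unbiased estimator: N * ybar
    ratio_totals.append(sy / sx * t_x)    # ratio estimator: B_hat * t_x

print("number of samples:", len(srs_totals))
print("true t_y:", t_y)
print("mean of N * ybar:", mean(srs_totals))
print("exact bias of ratio estimator:", mean(ratio_totals) - t_y)
```

Averaging over all equally likely samples gives the exact expectation of each estimator, so the unbiased estimator averages to the true total while the ratio estimator shows its exact bias.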
32
api99 vs api00 Consider the built-in SRS: apisrs
33
Another example
34
Another example
35
Another example
36
Assignment 3: Problem 1 We have used the California schools example to illustrate different sampling and estimation strategies. What if we combine stratified sampling and ratio estimation to estimate the population mean of api00? Will this combination be better than using only one strategy? Use simulations to answer this question. Please choose reasonable sample size(s), describe your conclusion clearly, and support your conclusion using tables and/or figures.
37
Assignment 3: Problem 2
38
STAT262: Regression estimation
39
Regression estimation
Ratio estimation works well if the data are well fit by a straight line through the origin. Often, data are scattered around a straight line that does not go through the origin.
40
Regression estimation
The regression estimator of the population mean is
41
Bias For large SRS, the bias is usually small
42
Variance and MSE Bias is small
43
Variance
44
Standard error
45
California Schools: api99 vs api00
46
California Schools: api99 vs api00
Step 1: fit a regression model, api00 ~ api99, and compute the residuals $e_i = y_i - (\hat{B}_0 + \hat{B}_1 x_i)$, i = 1, …, n
Step 2: calculate the regression estimate
Step 3: calculate the standard error
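The three steps can be sketched in plain Python as follows. The formulas are common textbook forms and are assumptions about the slide's content: the estimate is $\bar{y} + \hat{B}_1(\bar{x}_U - \bar{x})$ and the standard error is $\sqrt{(1 - n/N)\, s_e^2/n}$ with $s_e^2$ computed using divisor $n - 2$ (some texts use $n - 1$). The function name `regression_mean_and_se` is hypothetical.

```python
import math

def regression_mean_and_se(xs, ys, x_pop_mean, N):
    """Regression estimate of the population mean of y and its standard error."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Step 1: least-squares fit of y on x, then residuals.
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    # Step 2: regression estimate (equal to b0 + b1 * x_pop_mean).
    y_reg = y_bar + b1 * (x_pop_mean - x_bar)
    # Step 3: standard error with finite population correction.
    s2_e = sum(e * e for e in resid) / (n - 2)  # divisor n - 2 is a choice
    se = math.sqrt((1 - n / N) * s2_e / n)
    return y_reg, se

# Made-up sample lying exactly on y = 1 + 2x.
y_reg, se = regression_mean_and_se([1, 2, 3, 4], [3, 5, 7, 9], x_pop_mean=5.0, N=100)
print(y_reg, se)
```

Unlike the ratio estimator, this adjustment does not force the fitted line through the origin, so it still works when the data have a nonzero intercept.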
47
California Schools: api99 vs api00
48
California Schools: api99 vs api00
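95% confidence intervals, each of the form estimate +/- 1.96 * standard error (the point estimates are not shown):
SRS: +/- 1.96*9.2
Ratio estimator: +/- 1.96*2.25
Regression estimator: +/- 1.96*2.04
The true population mean: 664.7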
49
The McDonald Example
50
The McDonald Example
51
Relative Efficiencies
52
Relative Efficiencies
53
Relative Efficiencies
54
Relative Efficiencies
55
Relative efficiency: California Schools
56
Summary We introduced two new estimators:
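Ratio estimator: $\hat{\bar{y}}_r = \hat{B}\,\bar{x}_U$ with $\hat{B} = \bar{y}/\bar{x}$. Regression estimator: $\hat{\bar{y}}_{reg} = \bar{y} + \hat{B}_1(\bar{x}_U - \bar{x})$. Both exploit the association between x and y. The regression estimator is the most efficient (asymptotically). The ratio estimator is more efficient than the SRS estimator if R is large.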
57
Estimation in Domains: A motivating example
We are often interested in separate estimates for subpopulations (also called domains). E.g., after taking an SRS of 1000 persons, we want to estimate the average salary for men and the average salary for women.
58
Estimation in Domains: A motivating example
59
Estimation in Domains: A motivating example
The calculation in the previous slide treats the domain sample size as a constant, but it is not: it varies from sample to sample. We should take this randomness into consideration. The formulas we derived for ratio estimators can be used.
60
Estimations in Domains
61
Estimations in Domains
62
Estimations in Domains
If the sample is large
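A domain mean can be treated as a ratio estimator by letting x be the domain indicator and u = x * y, which gives the sketch below. The variance formula is the ratio-estimator form $(1 - n/N)\, s_e^2 / (n\,\bar{x}^2)$ applied to these variables; the function name `domain_mean_and_se` and the example data are hypothetical.

```python
import math

def domain_mean_and_se(ys, in_domain, N):
    """Domain mean from an SRS of size n, with a ratio-estimator standard error.

    ys: y values for all n sampled units; in_domain: matching booleans.
    """
    n = len(ys)
    yd = [y for y, d in zip(ys, in_domain) if d]
    n_d = len(yd)                 # random: changes from sample to sample
    y_bar_d = sum(yd) / n_d       # ratio of sum(u_i) to sum(x_i)
    # Residuals u_i - y_bar_d * x_i are zero outside the domain.
    s2_e = sum((y - y_bar_d) ** 2 for y in yd) / (n - 1)
    x_bar = n_d / n               # sample proportion in the domain
    var_hat = (1 - n / N) * s2_e / (n * x_bar ** 2)
    return y_bar_d, math.sqrt(var_hat)

# Made-up sample: the first three of six sampled units belong to the domain.
mean_d, se_d = domain_mean_and_se(
    [10, 20, 30, 40, 50, 60], [True, True, True, False, False, False], N=60
)
print(mean_d, se_d)
```

For large samples this variance is close to $(1 - n/N)\, s_{yd}^2 / n_d$, where $s_{yd}^2$ is the sample variance of y within the domain, so the result resembles an SRS formula with $n_d$ in place of $n$.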