Ratio and regression estimation STAT262, Fall 2017

1 Ratio and regression estimation STAT262, Fall 2017

2 STAT262: Ratio estimation

3 Motivating Example: California Schools
api99 and api00

4 Motivating Example: California Schools
Suppose that we know api99 for the whole population and api00 for a simple random sample. We would like to estimate the population mean of api00. Two options: use the SRS only, or use api99 as an auxiliary variable.

5 Ratio Estimator The same sampling method: Simple Random Sampling (SRS)
Sometimes we are interested in the ratio of two population characteristics, e.g., the average yield of corn per acre. At other times, Y is a characteristic we are interested in and X is a characteristic that is related to Y. In both cases, ratio estimators can be used to increase the precision of estimated means or totals.

6 api99 vs api00

7 api99 vs api00: in the SRS and in the population
In the SRS: mean of api99 = [value missing], mean of api00 = [value missing], ratio = [value missing]. The sample mean is an unbiased estimator for the population mean. In the population: mean of api99 = [value missing]. What is a good guess of the mean of api00 in the population?

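8 api99 vs api00 Ratio estimation
Set the ratio in the population equal to the ratio in the SRS: ? / (population mean of api99) = (SRS mean of api00) / (SRS mean of api99). Solving gives ? = 664.1821. The slide compares this with the true mean of api00 in the population and with the SRS estimate.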
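The calculation on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's code: the function name `ratio_estimate_mean` and the four data points are hypothetical, and only the formula (sample ratio times the known population mean of the auxiliary variable) is taken from the slide.

```python
from statistics import mean

def ratio_estimate_mean(y_sample, x_sample, x_pop_mean):
    """Ratio estimate of the population mean of y.

    Multiplies the sample ratio y_bar / x_bar by the known
    population mean of the auxiliary variable x.
    """
    b_hat = mean(y_sample) / mean(x_sample)  # sample ratio
    return b_hat * x_pop_mean

# Hypothetical values standing in for api00 (y) and api99 (x).
y = [620, 700, 655, 580]
x = [600, 690, 640, 570]
est = ratio_estimate_mean(y, x, x_pop_mean=630.0)
print(round(est, 2))
```

The sample mean of y alone ignores the fact that this particular sample has a lower mean of x than the population; the ratio estimate adjusts for that.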

9 api99 vs api00 Ratio estimator vs the unbiased estimator?
Simulation: for each of the 1000 simulations, generate an SRS and estimate the population mean of api00 in two ways: (1) use the sample mean of api00; (2) use the ratio estimator.
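A simulation of this kind might look like the following sketch. The population here is synthetic (the real api data are not reproduced in this transcript), and the names `simulate`, `pop_x`, `pop_y`, the sample size 50, and the seeds are all assumptions made for illustration.

```python
import random
from statistics import mean, pvariance

def simulate(pop_x, pop_y, n, reps=1000, seed=0):
    """Draw `reps` simple random samples and return both sets of estimates."""
    rng = random.Random(seed)
    N = len(pop_x)
    x_pop_mean = mean(pop_x)  # known for the whole population
    srs_est, ratio_est = [], []
    for _ in range(reps):
        idx = rng.sample(range(N), n)  # SRS without replacement
        xs = [pop_x[i] for i in idx]
        ys = [pop_y[i] for i in idx]
        srs_est.append(mean(ys))                            # sample mean
        ratio_est.append(mean(ys) / mean(xs) * x_pop_mean)  # ratio estimator
    return srs_est, ratio_est

# Synthetic population: y is roughly proportional to x (an assumption).
gen = random.Random(1)
pop_x = [gen.uniform(400, 900) for _ in range(2000)]
pop_y = [1.05 * x + gen.gauss(0, 20) for x in pop_x]

srs_est, ratio_est = simulate(pop_x, pop_y, n=50)
true_mean = mean(pop_y)
for name, est in [("sample mean", srs_est), ("ratio estimator", ratio_est)]:
    print(name, "bias:", mean(est) - true_mean, "variance:", pvariance(est))
```

With a population built this way, the ratio estimator should show a much smaller variance than the sample mean, at the cost of a small bias.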

10 api99 vs api00

11 api99 vs api00: bias vs variance

12 More examples E.g.1: The average yield of corn per acre
E.g.2: The number of hummingbirds in a national forest. Sample a few regions and record the number of hummingbirds (yi) and the area (xi) for each region. Calculate the sample ratio $\hat{B} = \bar{y}/\bar{x}$. If the total area of the national forest is tx, an estimate of ty is $\hat{t}_{yr} = \hat{B}\, t_x$.

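13 Examples E.g.3: Laplace wanted to know the number of persons living in France in a year for which there was no census.
In a sample of 30 counties there were 2,037,615 persons and 71,866 registered births. For France as a whole, the number of counties N and the total number of registered births tx were known; the total number of persons ty was unknown. There were two candidate estimators. Which was Laplace’s choice?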
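The sample ratio in Laplace's example can be computed directly from the two sample totals on the slide. In the sketch below, the national birth total `hypothetical_births_france` is a made-up placeholder (the slide only says tx was known), and `ratio_total` is an illustrative helper name.

```python
# Sample totals from the slide (30 counties).
persons_sampled = 2_037_615
births_sampled = 71_866

# Sample ratio: persons per registered birth.
b_hat = persons_sampled / births_sampled

def ratio_total(b_hat, t_x):
    """Ratio estimate of the population total t_y given the known total t_x."""
    return b_hat * t_x

# Placeholder only: NOT the historical figure.
hypothetical_births_france = 1_000_000
estimate = ratio_total(b_hat, hypothetical_births_france)
print(round(b_hat, 3), round(estimate))
```

The ratio works out to roughly 28.35 persons per registered birth, so the ratio estimate of the population is about 28.35 times the known number of births.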

14 Examples Laplace reasoned that using the ratio estimator is more accurate.
Large counties have more registered births, so the number of registered births and the number of persons are positively correlated. Thus, using the information in x is likely to improve our estimate of y.

15 Examples E.g. 4. McDonald's Corp.: the average annual sales for this year
One can use information from last year. Details will be discussed later.

16 Ratio estimators in SRS
Sampling method: SRS. Two quantities (xi, yi) are measured on each sampled unit, where xi is an auxiliary variable.

17 Population quantities
Size: N Totals: Means: Ratio: Variances and covariance: Correlation coefficient:
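The formulas for these quantities did not survive in the transcript. The block below writes them out in the notation commonly used in sampling textbooks; the symbols ($t_y$, $\bar{y}_U$, $B$, $S_y^2$, $S_{xy}$, $R$) are an assumption about the slide's notation, not a copy of it.

```latex
\begin{align*}
t_y &= \sum_{i=1}^{N} y_i, & t_x &= \sum_{i=1}^{N} x_i, \\
\bar{y}_U &= \frac{t_y}{N}, & \bar{x}_U &= \frac{t_x}{N}, \\
B &= \frac{t_y}{t_x} = \frac{\bar{y}_U}{\bar{x}_U}, \\
S_y^2 &= \frac{1}{N-1}\sum_{i=1}^{N}\left(y_i-\bar{y}_U\right)^2, &
S_x^2 &= \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i-\bar{x}_U\right)^2, \\
S_{xy} &= \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i-\bar{x}_U\right)\left(y_i-\bar{y}_U\right), &
R &= \frac{S_{xy}}{S_x S_y}.
\end{align*}
```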

18 Example of population quantities

19 Bias Ratio estimators are usually biased

20 Bias – the exact expression

21 Bias – the exact expression
The expression is exact, but it involves quantities we do not know.

22 Bias – an approximation
We want to get rid of the random quantities in the denominator.

23 Bias – an approximation

24 The bias is usually small if
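The list of conditions is missing from the transcript. A standard large-sample approximation to the bias of the ratio estimator of the mean under SRS, written here as an assumption about what the slides derive, is shown below. In it, $B = \bar{y}_U/\bar{x}_U$ is the population ratio, $S_x$ and $S_y$ are the population standard deviations, and $R$ is the population correlation between x and y.

```latex
\[
\operatorname{Bias}\!\left(\hat{\bar{y}}_r\right)
\approx \left(1-\frac{n}{N}\right)\frac{1}{n\,\bar{x}_U}
\left(B S_x^2 - R\,S_x S_y\right)
\]
```

Reading off this expression, the bias tends to be small when the sample size $n$ is large, the sampling fraction $n/N$ is large, $\bar{x}_U$ is large, $S_x$ is small, or $R$ is close to 1.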

25 Variance and Mean Squared Error
The bias is usually small and can therefore be ignored, so MSE ≈ Var.

26 Estimate the variance Alternative expression of the variance
They are not the same

27 Estimate the variance We can implement the formula in the previous slide using "residuals". It is not difficult to show that
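As a sketch of the residual-based calculation, the code below assumes the usual definitions: residuals $e_i = y_i - \hat{B} x_i$, $s_e^2 = \sum e_i^2/(n-1)$, and standard error $\sqrt{(1-n/N)\,(\bar{x}_U/\bar{x})^2\, s_e^2/n}$. The function name and data are hypothetical, and the slide's own formula may differ in detail.

```python
from math import sqrt
from statistics import mean

def ratio_mean_with_se(y, x, x_pop_mean, N):
    """Ratio estimate of the population mean of y and its standard error.

    Uses residuals e_i = y_i - b_hat * x_i, which sum to zero by construction.
    """
    n = len(y)
    x_bar, y_bar = mean(x), mean(y)
    b_hat = y_bar / x_bar
    residuals = [yi - b_hat * xi for yi, xi in zip(y, x)]
    s2_e = sum(e * e for e in residuals) / (n - 1)  # residual variance
    fpc = 1 - n / N                                 # finite population correction
    se = sqrt(fpc * (x_pop_mean / x_bar) ** 2 * s2_e / n)
    return b_hat * x_pop_mean, se

# Hypothetical sample of 4 units from a population of 1000.
est, se = ratio_mean_with_se([620, 700, 655, 580], [600, 690, 640, 570], 630.0, 1000)
print(round(est, 2), round(se, 3))
```

When y is exactly proportional to x in the sample, every residual is zero and the estimated standard error is zero as well.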

28 Efficiency of ratio estimation
Consider the ratio and the unbiased estimators. Which one has a smaller variance?
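One way to answer the question, assuming the standard large-sample variance approximations under SRS (with $B = \bar{y}_U/\bar{x}_U > 0$, population standard deviations $S_x$, $S_y$, and population correlation $R$), is to compare the two variances directly:

```latex
\begin{align*}
V(\bar{y}) &= \left(1-\frac{n}{N}\right)\frac{S_y^2}{n}, \\
V\!\left(\hat{\bar{y}}_r\right) &\approx \left(1-\frac{n}{N}\right)\frac{1}{n}
\left(S_y^2 - 2BR\,S_x S_y + B^2 S_x^2\right), \\
V\!\left(\hat{\bar{y}}_r\right) - V(\bar{y}) &\approx
\left(1-\frac{n}{N}\right)\frac{1}{n}\,B S_x\left(B S_x - 2R\,S_y\right).
\end{align*}
```

Under these approximations the ratio estimator has the smaller variance when $R > \dfrac{B S_x}{2 S_y} = \dfrac{\mathrm{CV}(x)}{2\,\mathrm{CV}(y)}$, where $\mathrm{CV}(x) = S_x/\bar{x}_U$ and $\mathrm{CV}(y) = S_y/\bar{y}_U$.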

29 A hypothetical example
Population. N=8

30 A hypothetical example

31 A hypothetical example
Mean estimate = Bias = Bias approx: Mean estimate = 40
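For a population this small, the exact bias of the ratio estimator can be found by enumerating every possible SRS. The sketch below does this for a made-up population of N = 8 units; the data values and the function name `exact_bias` are hypothetical and are not the slide's example.

```python
from itertools import combinations
from statistics import mean

def exact_bias(pop_x, pop_y, n):
    """Exact bias of the ratio estimator of the mean over all SRSs of size n."""
    N = len(pop_x)
    x_pop_mean, y_pop_mean = mean(pop_x), mean(pop_y)
    estimates = []
    for idx in combinations(range(N), n):  # every equally likely sample
        x_bar = mean(pop_x[i] for i in idx)
        y_bar = mean(pop_y[i] for i in idx)
        estimates.append(y_bar / x_bar * x_pop_mean)
    # The expected value is the plain average over all samples.
    return mean(estimates) - y_pop_mean, len(estimates)

# Hypothetical population of 8 units.
pop_x = [10, 12, 15, 18, 20, 22, 25, 30]
pop_y = [11, 14, 15, 20, 19, 25, 27, 31]
bias, n_samples = exact_bias(pop_x, pop_y, n=4)
print(n_samples, bias)
```

Because all samples are equally likely under SRS, averaging the estimate over the enumerated samples gives its exact expectation, which can then be compared with the approximate bias formula.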

32 api99 vs api00 Consider the built-in SRS: apisrs

33 Another example

34 Another example

35 Another example

36 Assignment 3: Problem 1 We have used the California schools example to illustrate different sampling and estimation strategies. What if we combine stratified sampling and ratio estimation to estimate the population mean of api00? Will this combination be better than using only one strategy? Use simulations to answer this question. Please (1) choose reasonable sample size(s), (2) describe your conclusion clearly, and (3) support your conclusion using tables and/or figures.

37 Assignment 3: Problem 2

38 STAT262: Regression estimation

39 Regression estimation
Ratio estimation works well if the data are well fit by a straight line through the origin. Often, however, data are scattered around a straight line that does not go through the origin.

40 Regression estimation
The regression estimator of the population mean is
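The formula itself is missing from the transcript. In the usual textbook notation (assumed here), with $\hat{B}_0$ and $\hat{B}_1$ the least-squares intercept and slope from regressing y on x in the sample, the regression estimator is:

```latex
\begin{align*}
\hat{\bar{y}}_{\mathrm{reg}} &= \hat{B}_0 + \hat{B}_1\,\bar{x}_U
 = \bar{y} + \hat{B}_1\left(\bar{x}_U - \bar{x}\right), \\
\hat{B}_1 &= \frac{\sum_{i \in \mathcal{S}} (x_i-\bar{x})(y_i-\bar{y})}
                  {\sum_{i \in \mathcal{S}} (x_i-\bar{x})^2},
\qquad
\hat{B}_0 = \bar{y} - \hat{B}_1\,\bar{x}.
\end{align*}
```

The second form shows the idea: start from the sample mean $\bar{y}$ and adjust it according to how far the sample mean of x is from the known population mean $\bar{x}_U$.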

41 Bias For large SRS, the bias is usually small

42 Variance and MSE Bias is small
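Since the bias is small, the MSE is approximately the variance. A standard large-sample approximation, stated here as an assumption about what the following slides show, with $S_y^2$ the population variance of y and $R$ the population correlation between x and y, is:

```latex
\[
\operatorname{MSE}\!\left(\hat{\bar{y}}_{\mathrm{reg}}\right)
\approx V\!\left(\hat{\bar{y}}_{\mathrm{reg}}\right)
\approx \left(1-\frac{n}{N}\right)\frac{S_y^2}{n}\left(1-R^2\right)
\]
```

Compared with the variance of the sample mean, $(1-n/N)\,S_y^2/n$, the factor $(1-R^2)$ shows that the gain grows as the correlation gets stronger.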

43 Variance

44 Standard error

45 California Schools: api99 vs api00

46 California Schools: api99 vs api00
Step 1: fit a regression model: api00~api99. Residuals: e_i = y_i - (fitted value), i = 1, …, n.
Step 2: calculate the regression estimate.
Step 3: calculate the standard error.
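The three steps can be sketched in plain Python as follows. This is an illustration under stated assumptions: the standard error uses $\sqrt{(1-n/N)\,s_e^2/n}$ with $s_e^2 = \sum e_i^2/(n-2)$ (one common convention), and the function name and data are hypothetical rather than the api sample.

```python
from math import sqrt
from statistics import mean

def regression_estimate(y, x, x_pop_mean, N):
    """Regression estimate of the population mean of y and its standard error."""
    n = len(y)
    x_bar, y_bar = mean(x), mean(y)
    # Step 1: least-squares fit of y on x, then residuals.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Step 2: regression estimate of the population mean.
    est = y_bar + b1 * (x_pop_mean - x_bar)
    # Step 3: standard error from the residual variance (n - 2 convention).
    s2_e = sum(e * e for e in residuals) / (n - 2)
    se = sqrt((1 - n / N) * s2_e / n)
    return est, se

# Hypothetical sample of 5 units from a population of 500.
est, se = regression_estimate(
    y=[610, 640, 700, 735, 790], x=[590, 630, 680, 720, 760],
    x_pop_mean=650.0, N=500,
)
print(round(est, 2), round(se, 3))
```

Unlike the ratio estimator, this estimator does not force the fitted line through the origin, which is why it suits data scattered around a line with a nonzero intercept.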

47 California Schools: api99 vs api00

48 California Schools: api99 vs api00
SRS: [estimate missing] +/- 1.96*9.2
Ratio estimator: [estimate missing] +/- 1.96*2.25
Regression estimator: [estimate missing] +/- 1.96*2.04
The true population mean: 664.7
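The three standard errors on this slide are enough to compare the widths of the confidence intervals. The short sketch below computes the 95% half-widths and, as a rough indication only, the ratio of squared standard errors relative to the SRS estimate. These are estimates from a single sample, so the ratios are not exact relative efficiencies; the dictionary names are illustrative.

```python
# Standard errors as given on the slide.
standard_errors = {"SRS": 9.2, "ratio": 2.25, "regression": 2.04}

# Half-width of each 95% confidence interval: 1.96 * SE.
half_widths = {name: 1.96 * se for name, se in standard_errors.items()}

# Rough variance ratio relative to the SRS estimate: (SE_srs / SE)^2.
variance_ratio = {
    name: (standard_errors["SRS"] / se) ** 2
    for name, se in standard_errors.items()
}

for name in standard_errors:
    print(name, round(half_widths[name], 2), round(variance_ratio[name], 1))
```

On these numbers the ratio and regression intervals are roughly a quarter of the width of the SRS interval.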

49 The McDonald Example

50 The McDonald Example

51 Relative Efficiencies

52 Relative Efficiencies

53 Relative Efficiencies

54 Relative Efficiencies

55 Relative efficiency: California Schools

56 Summary We introduced two new estimators:
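Ratio estimator: $\hat{\bar{y}}_r = \hat{B}\,\bar{x}_U$, with $\hat{B} = \bar{y}/\bar{x}$. Regression estimator: $\hat{\bar{y}}_{\mathrm{reg}} = \bar{y} + \hat{B}_1(\bar{x}_U - \bar{x})$. Both exploit the association between x and y. The regression estimator is the most efficient (asymptotically). The ratio estimator is more efficient than the SRS estimator if R is large.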

57 Estimation in Domains: A motivating example
We are often interested in separate estimates for subpopulations (also called domains). E.g., after taking an SRS of 1000 persons, we want to estimate the average salary for men and the average salary for women.

58 Estimation in Domains: A motivating example

59 Estimation in Domains: A motivating example
The calculation in the previous slide treats the number of sampled units in the domain as a constant, but it is not: it varies from sample to sample. We should take this randomness into consideration. The formulas we derived for ratio estimators can be used.

60 Estimations in Domains

61 Estimations in Domains

62 Estimations in Domains
If the sample is large
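For a large sample, a commonly used approximation (assumed here, since the slide's formula is not in the transcript) is $\mathrm{SE}(\bar{y}_d) \approx \sqrt{(1-n/N)\,s_{yd}^2/n_d}$, where $n_d$ is the number of sampled units in domain d and $s_{yd}^2$ is their sample variance. The sketch below uses hypothetical data and a hypothetical function name.

```python
from math import sqrt
from statistics import mean, variance

def domain_mean(y, in_domain, N):
    """Estimate a domain mean from an SRS, with a large-sample standard error.

    The domain mean is a ratio: (sum of y over sampled domain units) / n_d,
    where n_d itself is random because domain membership is only observed
    after sampling.
    """
    n = len(y)
    y_d = [yi for yi, d in zip(y, in_domain) if d]
    n_d = len(y_d)
    est = mean(y_d)
    # Large-sample approximation; requires at least two domain units.
    se = sqrt((1 - n / N) * variance(y_d) / n_d)
    return est, se, n_d

# Hypothetical SRS of 6 persons from a population of 600.
salaries = [10, 20, 30, 40, 50, 60]
is_in_domain = [True, False, True, False, True, False]
est, se, n_d = domain_mean(salaries, is_in_domain, N=600)
print(est, round(se, 3), n_d)
```

The finite population correction uses the overall sampling fraction n/N, while the variance is divided by the domain sample size, so small domains give noticeably wider intervals.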

