STAT262: Lecture 5 (Ratio estimation)
Introduction: Today we introduce a new class of estimation methods, ratio estimators. Ratio estimators involve two characteristics: Y, a characteristic we are interested in, and X, a characteristic that is related to Y. The sampling method is unchanged: Simple Random Sampling (SRS).
Why consider ratio estimation? (1) In many situations we want to estimate the ratio of two population characteristics, e.g., the average yield of corn per acre. (2) A ratio can assist the estimation of a correlated quantity, e.g., when the population size N is unknown and must itself be estimated. (3) Ratio estimators can increase the precision of estimated means or totals.
Examples. E.g. 1: The average yield of corn per acre. E.g. 2: The number of hummingbirds in a national forest. Sample a few regions and record the number of birds (yi) and the area (xi) for each region. Calculate the sample ratio $\hat{B} = \bar{y}/\bar{x} = \sum y_i / \sum x_i$. The total area of the national forest is tx, so the estimate of ty is $\hat{t}_{yr} = \hat{B}\, t_x$.
Examples. E.g. 3: Laplace wanted to know the number of persons living in France in 1802. There was no census in that year. Two candidate estimators: the SRS expansion estimator $N\bar{y}$ and the ratio estimator $\hat{B}\, t_x$. Which was Laplace’s choice?
                       # persons    # registered births
Sample: 30 counties    2,037,615    71,866
France: N (known)      ty (???)     tx (known)
Examples. Laplace reasoned that the ratio estimator is more accurate. Large counties have more registered births, so the number of registered births and the number of persons are positively correlated. Thus, using the information in x is likely to improve our estimate of y.
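The ratio in Laplace's sample can be worked out directly from the two sample totals quoted above. The sketch below is only illustrative: the value used for the national number of registered births (`t_x_hypothetical`) is a made-up placeholder, since the slides state that tx is known but do not give it.

```python
# Sample totals from the 30 sampled counties (taken from the slide).
sample_persons = 2_037_615
sample_births = 71_866

# Sample ratio: persons per registered birth.
b_hat = sample_persons / sample_births


def ratio_total(b_hat, t_x):
    """Ratio estimate of the population total t_y, given the known total t_x."""
    return b_hat * t_x


# HYPOTHETICAL national number of registered births, for illustration only.
t_x_hypothetical = 1_000_000
estimate = ratio_total(b_hat, t_x_hypothetical)

print(f"persons per registered birth: {b_hat:.3f}")
print(f"ratio estimate of the population (hypothetical t_x): {estimate:,.0f}")
```

The estimator needs only the sample ratio and the known total of x. It does not use the number of counties N.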
Examples. E.g. 4: McDonald's Corp. The average annual sales for this year. One can use information from last year. Details will be discussed later.
Ratio estimators in SRS. Sampling method: SRS. Two quantities (xi, yi) are measured on each sampled unit, where xi is an auxiliary variable.
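Population quantities
Size: N
Totals: $t_y = \sum_{i=1}^{N} y_i$ and $t_x = \sum_{i=1}^{N} x_i$
Means: $\bar{y}_U = t_y/N$ and $\bar{x}_U = t_x/N$
Ratio: $B = t_y/t_x = \bar{y}_U/\bar{x}_U$
Variances and covariance: $S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(y_i-\bar{y}_U)^2$, $S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x}_U)^2$, $S_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x}_U)(y_i-\bar{y}_U)$
Correlation coefficient: $R = S_{xy}/(S_x S_y)$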
Example of population quantities
Contents: ratio estimator in SRS; bias – the exact expression; bias – an approximate formula; MSE and variance; examples; efficiency.
Bias – the exact expression. Ratio estimators are usually biased.
Bias – the exact expression
Bias – the exact expression. The expression is exact, but not easy to use with data.
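For reference, one standard way to write the exact bias is sketched below. The notation is assumed here rather than taken from the slides: $\hat{B}=\bar{y}/\bar{x}$, $B=\bar{y}_U/\bar{x}_U$, and $\hat{\bar{y}}_r=\hat{B}\,\bar{x}_U$. The derivation uses the identity $\bar{y}=\hat{B}\bar{x}$ and the fact that $\bar{y}$ and $\bar{x}$ are unbiased under SRS.

```latex
\begin{align*}
\operatorname{Cov}(\hat{B},\bar{x})
  &= E[\hat{B}\,\bar{x}] - E[\hat{B}]\,E[\bar{x}]
   = \bar{y}_U - \bar{x}_U\,E[\hat{B}] \\
E[\hat{B}] - B
  &= -\frac{\operatorname{Cov}(\hat{B},\bar{x})}{\bar{x}_U} \\
E[\hat{\bar{y}}_r] - \bar{y}_U
  &= -\operatorname{Cov}(\hat{B},\bar{x})
\end{align*}
```

The covariance depends on the full sampling distribution of $\hat{B}$, which is why this form is hard to evaluate from a single sample.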
Bias – an approximation
Bias – an approximation
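A commonly quoted first-order approximation comes from a Taylor expansion of $\hat{B}$ around $(\bar{x}_U, \bar{y}_U)$. The sketch below assumes standard notation that is not shown on the slide: $S_x^2$ and $S_y^2$ are the population variances (divisor $N-1$), $R$ is the population correlation, $B=\bar{y}_U/\bar{x}_U$, and $\hat{\bar{y}}_r=\hat{B}\,\bar{x}_U$.

```latex
\[
E[\hat{\bar{y}}_r] - \bar{y}_U
  \;\approx\; \left(1-\frac{n}{N}\right)\frac{1}{n\,\bar{x}_U}
  \left(B\,S_x^2 - R\,S_x S_y\right)
\]
```

Each factor can be read off separately: the finite population correction, the sample size, the mean of x, and the term $B S_x^2 - R S_x S_y$, which vanishes when the points lie exactly on a line through the origin.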
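The bias is usually small if the sample size n is large, the sampling fraction n/N is large, the population mean of x is large, the population standard deviation of x is small, or the correlation between x and y is close to 1.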
Variance and Mean Squared Error. The bias is usually small, and thus can be ignored.
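When the bias is ignored, the MSE and the variance of the ratio estimator share the same first-order approximation. A sketch in assumed standard notation ($d_i = y_i - B x_i$, $S_d^2$ the population variance of the $d_i$, $R$ the population correlation, $\hat{\bar{y}}_r=\hat{B}\,\bar{x}_U$):

```latex
\begin{align*}
\operatorname{MSE}(\hat{\bar{y}}_r)
  &\approx \left(1-\frac{n}{N}\right)\frac{S_d^2}{n} \\
  &= \left(1-\frac{n}{N}\right)\frac{1}{n}
     \left(S_y^2 - 2\,B\,R\,S_x S_y + B^2 S_x^2\right)
\end{align*}
```

The second line follows because the $d_i$ sum to zero over the population, so $S_d^2$ expands into the variances and covariance of x and y.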
A hypothetical example Population. N=8
A hypothetical example
A hypothetical example Mean estimate = 39.85036 Bias = -0.003178 Bias approx: Mean estimate = 40
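With a population this small, the exact bias can be found by brute force: list every possible SRS, compute the ratio estimate for each, and average. The sketch below shows the idea. The x and y values are invented for illustration and are not the data from the lecture's example, so the numbers it prints will differ from those on the slide.

```python
from itertools import combinations
from statistics import mean

# HYPOTHETICAL population of N = 8 units (not the lecture's data).
x = [2, 3, 5, 6, 8, 9, 11, 12]
y = [5, 7, 11, 14, 17, 21, 24, 25]


def ratio_total_estimates(x, y, n):
    """Ratio estimate of t_y for every possible SRS of size n."""
    t_x = sum(x)
    estimates = []
    for idx in combinations(range(len(x)), n):
        b_hat = sum(y[i] for i in idx) / sum(x[i] for i in idx)
        estimates.append(b_hat * t_x)
    return estimates


estimates = ratio_total_estimates(x, y, n=4)
t_y = sum(y)

# Under SRS every sample is equally likely, so the expectation of the
# estimator is the plain average over all samples.
exact_bias = mean(estimates) - t_y

print("number of samples:", len(estimates))
print("mean of estimates:", mean(estimates))
print("true total:", t_y)
print("exact bias:", exact_bias)
```

Enumeration is feasible only for very small N. For realistic populations the approximate bias formula is used instead.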
Estimate variance
Estimate variance
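A minimal sketch of how the variance is usually estimated from a single SRS, assuming the standard residual-based formulas: $s_e^2 = \sum (y_i - \hat{B}x_i)^2/(n-1)$, $SE(\hat{B}) = \sqrt{(1-n/N)\, s_e^2/(n\bar{x}^2)}$, and $SE(\hat{\bar{y}}_r) = \bar{x}_U\, SE(\hat{B})$. The function name and the data are hypothetical.

```python
import math


def ratio_estimate_with_se(x, y, N, x_bar_U):
    """Ratio estimate of the mean of y and its standard error under SRS.

    x, y    : sample values (same length n)
    N       : population size
    x_bar_U : known population mean of x
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b_hat = y_bar / x_bar

    # Residual variance around the fitted line through the origin.
    s_e2 = sum((yi - b_hat * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)

    fpc = 1 - n / N  # finite population correction
    se_b = math.sqrt(fpc * s_e2 / (n * x_bar ** 2))

    y_r = b_hat * x_bar_U
    se_y_r = x_bar_U * se_b
    return b_hat, se_b, y_r, se_y_r


# HYPOTHETICAL sample of n = 5 units from a population of N = 100.
x = [10, 12, 8, 15, 11]
y = [22, 25, 15, 31, 21]
b_hat, se_b, y_r, se_y_r = ratio_estimate_with_se(x, y, N=100, x_bar_U=11.0)
print(b_hat, se_b, y_r, se_y_r)
```

The standard error depends on how tightly the points cluster around the line through the origin, not on the variability of y itself.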
Example 1
Example 1
Example 2
Example 2
Example 2
Efficiency of ratio estimation
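One way to make the efficiency comparison concrete is to subtract the variance of the SRS mean from the usual approximate MSE of the ratio estimator. The sketch below assumes standard notation not shown on the slide: $S_x$ and $S_y$ are the population standard deviations, $R$ is the population correlation, $B=\bar{y}_U/\bar{x}_U>0$, $\hat{\bar{y}}_r=\hat{B}\,\bar{x}_U$, and CV denotes the coefficient of variation.

```latex
\begin{align*}
\operatorname{MSE}(\hat{\bar{y}}_r) - V(\bar{y})
  &\approx \left(1-\frac{n}{N}\right)\frac{1}{n}
     \left(B^2 S_x^2 - 2\,B\,R\,S_x S_y\right) \\
\operatorname{MSE}(\hat{\bar{y}}_r) < V(\bar{y})
  &\iff R > \frac{B\,S_x}{2\,S_y}
   = \frac{\mathrm{CV}(x)}{2\,\mathrm{CV}(y)}
\end{align*}
```

Under this approximation, ratio estimation beats the plain SRS mean only when the correlation is large enough relative to the two coefficients of variation.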
STAT262: Lecture 6 (Regression estimation)
Regression estimation. Ratio estimation works well if the data are well fit by a straight line through the origin. Often, however, the data are scattered around a straight line that does not go through the origin.
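Regression estimation. The regression estimator of the population mean is $\hat{\bar{y}}_{reg} = \hat{B}_0 + \hat{B}_1 \bar{x}_U = \bar{y} + \hat{B}_1(\bar{x}_U - \bar{x})$, where $\hat{B}_0$ and $\hat{B}_1$ are the least-squares intercept and slope fitted to the sample.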
Bias. For a large SRS, the bias is usually small.
Variance and MSE. The bias is small.
Variance
Standard error
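A minimal sketch of the regression estimator and a standard error for it, assuming the usual large-sample form $SE = \sqrt{(1-n/N)\, s_e^2/n}$ with $s_e^2$ the residual variance of the least-squares fit. The divisor $n-2$ used here is one common convention, and the function name and data are hypothetical.

```python
import math


def regression_estimate_with_se(x, y, N, x_bar_U):
    """Regression estimate of the mean of y and its standard error under SRS."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Least-squares slope and intercept.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    # Predict at the known population mean of x.
    y_reg = b0 + b1 * x_bar_U

    # Residual variance (divisor n - 2 is an assumed convention).
    s_e2 = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se = math.sqrt((1 - n / N) * s_e2 / n)
    return b0, b1, y_reg, se


# HYPOTHETICAL sample: last year's value (x) and this year's value (y).
x = [10, 12, 8, 15, 11, 9]
y = [13, 16, 10, 19, 15, 11]
b0, b1, y_reg, se = regression_estimate_with_se(x, y, N=200, x_bar_U=11.5)
print(b0, b1, y_reg, se)
```

Unlike the ratio estimator, the fitted line is allowed a nonzero intercept, which matters when the scatter does not pass through the origin.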
The McDonald Example
The McDonald Example
Relative Efficiencies
Relative Efficiencies
Relative Efficiencies
Relative Efficiencies
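The three estimators can be compared through their large-sample approximations. The sketch below assumes standard notation not shown on these slides: $S_x$ and $S_y$ are the population standard deviations, $R$ is the population correlation, $B=\bar{y}_U/\bar{x}_U$, and $\hat{\bar{y}}_r$ and $\hat{\bar{y}}_{reg}$ are the ratio and regression estimators of the mean.

```latex
\begin{align*}
V(\bar{y}) &= \left(1-\frac{n}{N}\right)\frac{S_y^2}{n} \\
\operatorname{MSE}(\hat{\bar{y}}_{reg})
  &\approx \left(1-\frac{n}{N}\right)\frac{S_y^2\,(1-R^2)}{n} \\
\operatorname{MSE}(\hat{\bar{y}}_r) - \operatorname{MSE}(\hat{\bar{y}}_{reg})
  &\approx \left(1-\frac{n}{N}\right)\frac{(R\,S_y - B\,S_x)^2}{n} \;\ge\; 0
\end{align*}
```

Since $1-R^2 \le 1$, the regression estimator is, to this order of approximation, at least as efficient as the SRS mean. The last line shows it is also at least as efficient as the ratio estimator, with equality when $R\,S_y = B\,S_x$.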
Summary. We introduced two new estimators. Ratio estimator: $\hat{\bar{y}}_r = \hat{B}\,\bar{x}_U$. Regression estimator: $\hat{\bar{y}}_{reg} = \bar{y} + \hat{B}_1(\bar{x}_U - \bar{x})$. Both exploit the association between x and y. The regression estimator is the most efficient (asymptotically). The ratio estimator is more efficient than the SRS estimator if the correlation R is large.
Estimation in Domains: A motivating example. We are often interested in separate estimates for subpopulations (also called domains). E.g., after taking an SRS of 1000 persons, we want to estimate the average salary for men and the average salary for women.
Estimation in Domains: A motivating example
Estimation in Domains: A motivating example. The calculation in the previous slide treats the number of sampled units in the domain as a constant, but it is not: it is a random variable. We should take this randomness into consideration. The formulas we derived for ratio estimators can be used.
Estimations in Domains
Estimations in Domains
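Estimations in Domains. If the sample is large, the standard error of the domain mean is approximately $\sqrt{(1-n/N)\, s_{yd}^2/n_d}$, where $n_d$ is the number of sampled units in the domain and $s_{yd}^2$ is the sample variance of y within the domain.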
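A minimal sketch of how a domain mean can be treated as a ratio estimator, assuming the usual construction: $x_i$ is an indicator of domain membership, $u_i = y_i x_i$, and the domain mean is $\sum u_i / \sum x_i$. The standard error then comes from the general ratio-estimator formula. The function name and the salary data are hypothetical.

```python
import math


def domain_mean_with_se(y, in_domain, N):
    """Domain mean under SRS, treated as a ratio estimator.

    y         : sample values for all n sampled units
    in_domain : booleans, True if the unit belongs to the domain
    N         : population size
    """
    n = len(y)
    x = [1 if d else 0 for d in in_domain]    # domain indicator
    u = [yi * xi for yi, xi in zip(y, x)]     # y inside the domain, 0 outside
    n_d = sum(x)                              # random domain sample size
    y_d = sum(u) / n_d                        # ratio of two sample totals

    # Residuals u_i - y_d * x_i are zero outside the domain.
    s_e2 = sum((ui - y_d * xi) ** 2 for ui, xi in zip(u, x)) / (n - 1)
    x_bar = n_d / n
    se = math.sqrt((1 - n / N) * s_e2 / (n * x_bar ** 2))
    return y_d, se, n_d


# HYPOTHETICAL salaries (in thousands) for an SRS of n = 8 from N = 1000.
y = [52, 61, 48, 70, 55, 66, 45, 58]
in_domain = [True, False, True, False, True, False, True, True]
y_d, se, n_d = domain_mean_with_se(y, in_domain, N=1000)
print(y_d, se, n_d)
```

Because the residuals are zero for units outside the domain, the standard error depends only on the spread of y inside the domain and on the domain sample size.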
A new sampling method: stratified sampling. In stratified sampling, we conduct an SRS in each stratum. Outline: definition and motivation; statistical inference (theory of stratified sampling); advantages of stratified sampling; sample size calculation.
Stratified sampling: definition and motivation. A motivating example: the average number of words in the saved messages of people in this room. What is stratified sampling? Stratify: make layers. Strata: subpopulations. Strata do not overlap. Each sampling unit belongs to exactly one stratum. Strata constitute the whole population.
Why do we use stratified sampling? To be protected from obtaining a really bad sample. Example: the population size is N = 500 (250 women and 250 men) and we take an SRS of size n = 50. It is possible to obtain a sample with no men or only a few men: Pr(15 or fewer men in an SRS) = 0.003 and Pr(20 or fewer men in an SRS) = 0.10. In stratified sampling, we can sample 25 men and 25 women.
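The two probabilities quoted above are hypergeometric tail probabilities, since an SRS draws without replacement. A minimal sketch of the calculation using only the standard library (the function name is hypothetical):

```python
from math import comb


def prob_at_most(k, pop_men, pop_women, n):
    """P(at most k men) in an SRS of size n drawn without replacement."""
    total = comb(pop_men + pop_women, n)
    favourable = sum(
        comb(pop_men, j) * comb(pop_women, n - j) for j in range(k + 1)
    )
    return favourable / total


p15 = prob_at_most(15, pop_men=250, pop_women=250, n=50)
p20 = prob_at_most(20, pop_men=250, pop_women=250, n=50)
print(f"P(15 or fewer men) = {p15:.4f}")
print(f"P(20 or fewer men) = {p20:.4f}")
```

The printed values can be compared with the rounded figures on the slide.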
Why do we use stratified sampling? Stratified sampling allows us to compare subgroups. It can be convenient, reduce cost, and be easy to carry out. It can be more precise; see the following example.
Total number of farm acres (3078 counties). An SRS of 300 counties from the Census of Agriculture gives an estimate and its standard error. Stratified sampling: sample about 10% of the counties in each stratum (region).
Total number of farm acres (3078 counties) Estimate: Standard error:
Theory of stratified sampling
Notation for Stratification: Population
Notation for Stratification: Sample
Stratified sampling: estimation
Statistical Properties: Bias and Variance
Variance Estimates for stratified samples
Confidence intervals for stratified samples. Some books use a t distribution with n-H degrees of freedom.
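A minimal sketch of stratified estimation, assuming the standard formulas: $\hat{t}_{str} = \sum_h N_h \bar{y}_h$, $\bar{y}_{str} = \hat{t}_{str}/N$, and $\hat{V}(\hat{t}_{str}) = \sum_h (1 - n_h/N_h)\, N_h^2\, s_h^2/n_h$. The interval uses a normal critical value (1.96 by default). The function name and data are hypothetical.

```python
import math


def stratified_estimates(strata, z=1.96):
    """Stratified estimates from an SRS taken independently in each stratum.

    strata : list of (N_h, sample_values) pairs, one per stratum
    z      : critical value for the confidence interval (normal by default)
    """
    N = sum(N_h for N_h, _ in strata)
    t_hat = 0.0
    var_t = 0.0
    for N_h, ys in strata:
        n_h = len(ys)
        y_bar_h = sum(ys) / n_h
        s2_h = sum((v - y_bar_h) ** 2 for v in ys) / (n_h - 1)
        t_hat += N_h * y_bar_h
        # Strata are sampled independently, so their variances add.
        var_t += (1 - n_h / N_h) * N_h ** 2 * s2_h / n_h

    y_bar_str = t_hat / N
    se_total = math.sqrt(var_t)
    se_mean = se_total / N
    ci_mean = (y_bar_str - z * se_mean, y_bar_str + z * se_mean)
    return t_hat, se_total, y_bar_str, se_mean, ci_mean


# HYPOTHETICAL data: two strata of sizes 100 and 300.
strata = [(100, [10, 12, 14]), (300, [20, 22, 24, 26])]
t_hat, se_total, y_bar_str, se_mean, ci_mean = stratified_estimates(strata)
print(t_hat, se_total, y_bar_str, se_mean, ci_mean)
```

To use a t critical value with n-H degrees of freedom instead, pass it as `z`.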
Sampling probabilities and weights. Consider a population with 1600 men and 400 women, where the stratified design specifies sampling 200 men and 200 women. Each man in the sample has weight 8 and each woman has weight 2. Each woman in the sample represents herself and 1 other woman not selected. Each man represents himself and 7 other men not in the sample.
Sampling probabilities and weights. The sampling probability for the jth unit in the hth stratum is $\pi_{hj} = n_h/N_h$. Sampling weight: $w_{hj} = 1/\pi_{hj} = N_h/n_h$. The sum of the sampling weights over the sample is N.
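The men and women example can be checked numerically. The sketch below assumes the usual definition of the weight as the inverse of the sampling probability, $N_h/n_h$. The function name is hypothetical.

```python
def stratum_weights(N_h, n_h):
    """Sampling weight N_h / n_h for every unit sampled in stratum h."""
    return {h: N_h[h] / n_h[h] for h in N_h}


# Population and sample sizes from the example in the slides.
N_h = {"men": 1600, "women": 400}
n_h = {"men": 200, "women": 200}

weights = stratum_weights(N_h, n_h)

# Each of the n_h sampled units in stratum h carries the same weight,
# so the weights over the whole sample add up to the population size.
total_weight = sum(weights[h] * n_h[h] for h in N_h)

print(weights)
print(total_weight)
```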
Sampling probabilities and weights
Sampling probabilities and weights example
Sampling probabilities and weights in proportional allocation. In proportional allocation, the number of sampled units in each stratum is proportional to the size of the stratum, i.e., $n_h/N_h = n/N$ for every stratum h. Every unit in the sample then has the same weight and represents the same number of units in the population. The sample is called self-weighting.
Sampling probabilities and weights in proportional allocation. The sampling probability for all units is about 10%, so all the weights are the same: 10.
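A small sketch of proportional allocation with a 10% sampling fraction. The stratum sizes are hypothetical and chosen so that the allocation comes out in whole numbers. In practice rounding may make the stratum sample sizes sum to slightly more or less than n, and the weights are then only approximately equal.

```python
def proportional_allocation(N_h, n):
    """Allocate a total sample of size n to strata in proportion to N_h."""
    N = sum(N_h.values())
    return {h: round(n * size / N) for h, size in N_h.items()}


# HYPOTHETICAL stratum sizes; the overall sampling fraction is 300/3000 = 10%.
N_h = {"A": 1000, "B": 1500, "C": 500}
allocation = proportional_allocation(N_h, n=300)

# Weight N_h / n_h for each stratum: identical under proportional allocation.
weights = {h: N_h[h] / allocation[h] for h in N_h}

print(allocation)
print(weights)
```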