Ratio and regression estimation STAT262, Fall 2017

STAT262: Ratio estimation

Motivating Example: California Schools api99 and api00

Motivating Example: California Schools Suppose that we know api99 for the whole population and api00 only for a simple random sample. We would like to estimate the population mean of api00. Two approaches: use the SRS only, or use api99 as an auxiliary variable.

Ratio Estimator The sampling method is the same: Simple Random Sampling (SRS). Sometimes we are interested in the ratio of two population characteristics, e.g., the average yield of corn per acre. At other times, Y is a characteristic we are interested in and X is a characteristic related to Y. In both cases, ratio estimators can be used to increase the precision of estimated means or totals.

api99 vs api00

api99 vs api00 In the SRS: the mean of api99 is 624.685 and the mean of api00 is 656.585 (the sample mean is an unbiased estimator of the population mean). The ratio is 656.585 / 624.685 = 1.051066. In the population: the mean of api99 is 631.913. What is a good guess for the mean of api00 in the population?

api99 vs api00 Ratio estimation: matching the SRS ratio to the population, $\frac{656.585}{624.685} = 1.051066 = \frac{?}{631.913}$, so $? = 1.051066 \times 631.913 = 664.1821$. The SRS estimate: 656.585. The true mean of api00 in the population: 664.7126.
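The arithmetic on this slide can be reproduced in a few lines of Python. This is a sketch that uses only the summary statistics quoted above; the variable names are our own.

```python
# Ratio estimate of the population mean of api00 from the summary statistics on the slide
sample_mean_api99 = 624.685      # mean of api99 in the SRS
sample_mean_api00 = 656.585      # mean of api00 in the SRS
population_mean_api99 = 631.913  # mean of api99 in the whole population (known)

# Sample ratio: ybar / xbar
ratio_hat = sample_mean_api00 / sample_mean_api99

# Ratio estimator: scale the known population mean of x by the sample ratio
ratio_estimate = ratio_hat * population_mean_api99

print(f"ratio = {ratio_hat:.6f}")
print(f"ratio estimate of mean(api00) = {ratio_estimate:.4f}")
```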

api99 vs api00 Which is better, the ratio estimator or the unbiased estimator? Simulation: in each of 1000 simulations, generate an SRS and estimate the population mean of api00 in two ways: (1) use the sample mean of api00; (2) use the ratio estimator.
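The simulation described above can be sketched as follows. The API data are not reproduced here, so the code uses a hypothetical synthetic population in which y is roughly proportional to x; the population size, the sample size n = 200, the noise level and the seeds are assumptions chosen for illustration, not the settings used in the course.

```python
import random
import statistics

def simulate(pop_x, pop_y, n, n_sims=1000, seed=1):
    """Draw n_sims SRSs of size n; return the SRS means and the ratio estimates of mean(y)."""
    rng = random.Random(seed)
    N = len(pop_x)
    xbar_U = statistics.fmean(pop_x)           # known population mean of the auxiliary x
    srs_means, ratio_ests = [], []
    for _ in range(n_sims):
        idx = rng.sample(range(N), n)          # simple random sample without replacement
        xbar = statistics.fmean([pop_x[i] for i in idx])
        ybar = statistics.fmean([pop_y[i] for i in idx])
        srs_means.append(ybar)                 # method 1: sample mean of y
        ratio_ests.append(ybar / xbar * xbar_U)  # method 2: ratio estimator
    return srs_means, ratio_ests

# Hypothetical population standing in for the API data: y is roughly 1.05 * x plus noise
rng = random.Random(0)
pop_x = [rng.uniform(350, 950) for _ in range(6000)]
pop_y = [1.05 * x + rng.gauss(0, 25) for x in pop_x]
true_mean = statistics.fmean(pop_y)

srs_means, ratio_ests = simulate(pop_x, pop_y, n=200)
for name, est in [("SRS mean", srs_means), ("ratio", ratio_ests)]:
    bias = statistics.fmean(est) - true_mean
    var = statistics.variance(est)
    print(f"{name:8s} bias = {bias:8.3f}  variance = {var:9.3f}")
```

For a population like this one, where the points lie close to a line through the origin, the ratio estimator should show a small bias but a much smaller variance than the sample mean.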

api99 vs api00

api99 vs api00: bias vs variance

More examples E.g. 1: The average yield of corn per acre. E.g. 2: The number of hummingbirds in a national forest. Sample a few regions and record the number of hummingbirds ($y_i$) and the area ($x_i$) of each region. Calculate the sample ratio $\hat{B} = \sum y_i / \sum x_i$. The total area of the national forest, $t_x$, is known, so an estimate of $t_y$ is $\hat{t}_y = \hat{B}\, t_x$.
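A minimal sketch of the ratio estimator of a total, $\hat{t}_y = \hat{B}\, t_x$ with $\hat{B} = \sum y_i / \sum x_i$, applied to the hummingbird setting. The function name and the numbers in the example call are hypothetical.

```python
def ratio_total_estimate(y, x, t_x):
    """Ratio estimate of the population total t_y.

    y   : hummingbird counts in the sampled regions
    x   : areas of the same sampled regions
    t_x : known total area of the whole forest
    """
    b_hat = sum(y) / sum(x)  # sample ratio: birds per unit area
    return b_hat * t_x

# Hypothetical data: three sampled regions, total forest area 400 (same units as x)
print(ratio_total_estimate(y=[12, 30, 7], x=[2.0, 5.0, 1.5], t_x=400.0))
```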

Examples E.g. 3: Laplace wanted to know the number of persons living in France in 1802, but there was no census that year. Two candidate estimators are the expansion estimator $N\bar{y}$ and the ratio estimator $\hat{B}\, t_x$. Which was Laplace's choice?

                          # persons          # registered births
  Sample (30 counties)    2,037,615          71,866
  France (N known)        t_y (unknown)      t_x (known)
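Using the sample figures from the table, the two candidate estimates, the expansion estimator $N\bar{y}$ and the ratio estimator $\hat{B}\, t_x$, can be written out. The number of counties $N$ and the total $t_x$ of registered births in France are known but their values are not given here, so they are left symbolic.

```latex
\bar{y} = \frac{2{,}037{,}615}{30} \approx 67{,}920.5 \text{ persons per county},
\qquad N\bar{y} \approx 67{,}920.5\, N

\hat{B} = \frac{2{,}037{,}615}{71{,}866} \approx 28.35 \text{ persons per registered birth},
\qquad \hat{t}_y = \hat{B}\, t_x \approx 28.35\, t_x
```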

Examples Laplace reasoned that the ratio estimator would be more accurate: large counties have more registered births, so the number of registered births and the number of persons are positively correlated. Thus, using the information in x is likely to improve our estimate of y.

Examples E.g. 4: McDonald Corp. We want to estimate the average annual sales for this year, and we can use information from last year. Details will be discussed later.

Ratio estimators in SRS Sampling method: SRS Two quantities (xi, yi) are measured in each sampled unit, where xi is an auxiliary variable

Population quantities Size: N Totals: Means: Ratio: Variances and covariance: Correlation coefficient:
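The population quantities listed on this slide can be computed directly when the whole population is available. This sketch assumes the usual conventions: $B = t_y / t_x$ is the population ratio, variances and the covariance use the divisor $N - 1$, and $R$ is the population correlation coefficient. The function name and the example data are hypothetical.

```python
import math
import statistics

def population_quantities(x, y):
    """Population quantities for ratio estimation (x: auxiliary variable, y: study variable)."""
    N = len(x)
    t_x, t_y = sum(x), sum(y)                  # totals
    xbar_U, ybar_U = t_x / N, t_y / N          # means
    B = t_y / t_x                              # ratio (equals ybar_U / xbar_U)
    S2_x = statistics.variance(x)              # variance with divisor N - 1
    S2_y = statistics.variance(y)
    S_xy = sum((xi - xbar_U) * (yi - ybar_U) for xi, yi in zip(x, y)) / (N - 1)
    R = S_xy / math.sqrt(S2_x * S2_y)          # correlation coefficient
    return dict(N=N, t_x=t_x, t_y=t_y, xbar_U=xbar_U, ybar_U=ybar_U,
                B=B, S2_x=S2_x, S2_y=S2_y, S_xy=S_xy, R=R)

# Hypothetical population of four units
print(population_quantities([3, 5, 6, 8], [7, 10, 14, 15]))
```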

Example of population quantities

Bias Ratio estimators are usually biased

Bias – the exact expression

Bias – the exact expression The expression is exact. But it involves quantities we don’t know

Bias – an approximation We want to get rid of the random quantities in the denominator.

Bias – an approximation
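A sketch of the standard derivation, in notation assumed here: $\hat{\bar{y}}_r = \hat{B}\,\bar{x}_U$ with $\hat{B} = \bar{y}/\bar{x}$, $B = \bar{y}_U/\bar{x}_U$, population standard deviations $S_x$, $S_y$, and $R$ the population correlation. The first line is the exact bias (it uses $\hat{B}\bar{x} = \bar{y}$); the remaining lines replace the random $\bar{x}$ in the denominator by $\bar{x}_U$ to first order.

```latex
\operatorname{Bias}(\hat{\bar{y}}_r) = E[\hat{B}]\,\bar{x}_U - \bar{y}_U = -\operatorname{Cov}(\hat{B}, \bar{x})

\hat{B} - B = \frac{\bar{y} - B\bar{x}}{\bar{x}}
            = \frac{\bar{y} - B\bar{x}}{\bar{x}_U}\left(1 - \frac{\bar{x} - \bar{x}_U}{\bar{x}}\right)

E[\hat{B} - B] \approx \frac{1}{\bar{x}_U^{2}}\bigl[B\,V(\bar{x}) - \operatorname{Cov}(\bar{x}, \bar{y})\bigr]
              = \left(1 - \frac{n}{N}\right)\frac{1}{n\,\bar{x}_U^{2}}\bigl(B S_x^2 - R S_x S_y\bigr)
```

Multiplying the last line by $\bar{x}_U$ gives the approximate bias of $\hat{\bar{y}}_r$. Read this way, the bias shrinks when $n$ is large, the sampling fraction $n/N$ is large, $\bar{x}_U$ is large, $S_x$ is small, or $R S_y$ is close to $B S_x$ (for example, when the points lie close to a line through the origin).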

The bias is usually small if

Variance and Mean Squared Error Because the bias is usually small, it can be ignored, so MSE ≈ Var.
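Under a first-order approximation, $\hat{\bar{y}}_r - \bar{y}_U \approx \bar{y} - B\bar{x}$, which is the sample mean of $d_i = y_i - B x_i$ (a variable whose population mean is zero). This gives the approximate variance below, in the assumed notation $B$ (population ratio), $R$ (population correlation), $S_x$, $S_y$ (population standard deviations):

```latex
\operatorname{MSE}(\hat{\bar{y}}_r) \approx V(\hat{\bar{y}}_r)
  \approx E\bigl[(\bar{y} - B\bar{x})^2\bigr]
  = \left(1 - \frac{n}{N}\right)\frac{S_y^2 - 2BRS_xS_y + B^2S_x^2}{n}
  = \left(1 - \frac{n}{N}\right)\frac{S_d^2}{n},
\qquad d_i = y_i - B x_i
```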

Estimate the variance Alternative expression of the variance They are not the same

Estimate the variance We can implement the formula in the previous slide using the "residuals" $e_i = y_i - \hat{B} x_i$. It is not difficult to show that
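A sketch of the residual-based variance estimate under SRS. It assumes the form $\hat{V}(\hat{\bar{y}}_r) = (1 - n/N)(\bar{x}_U/\bar{x})^2 s_e^2/n$ with $s_e^2 = \sum e_i^2/(n-1)$; some texts drop the factor $(\bar{x}_U/\bar{x})^2$, which gives a slightly different estimate. Because the residuals $e_i = y_i - \hat{B}x_i$ sum to zero, $s_e^2 = s_y^2 - 2\hat{B}s_{xy} + \hat{B}^2 s_x^2$. The function name is hypothetical.

```python
import math
import statistics

def ratio_mean_estimate(y, x, xbar_U, N):
    """Ratio estimate of the population mean of y and its estimated SE under SRS."""
    n = len(y)
    ybar, xbar = statistics.fmean(y), statistics.fmean(x)
    b_hat = ybar / xbar                                    # sample ratio
    estimate = b_hat * xbar_U                              # ratio estimator of the mean
    residuals = [yi - b_hat * xi for xi, yi in zip(x, y)]  # e_i; they sum to zero
    s2_e = sum(e * e for e in residuals) / (n - 1)         # sample variance of residuals
    var_hat = (1 - n / N) * (xbar_U / xbar) ** 2 * s2_e / n
    return estimate, math.sqrt(var_hat)

# Hypothetical SRS of n = 4 units from a population of N = 100 with known xbar_U = 2.6
print(ratio_mean_estimate([3.0, 5.0, 8.0, 10.0], [1.0, 2.0, 3.0, 4.0], xbar_U=2.6, N=100))
```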

Efficiency of ratio estimation Consider the ratio and the unbiased estimators. Which one has a smaller variance?
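Comparing the approximate variance of the ratio estimator, $(1 - n/N)(S_y^2 - 2BRS_xS_y + B^2S_x^2)/n$, with $V(\bar{y}) = (1 - n/N) S_y^2/n$, and assuming $B > 0$, gives a first-order condition for the ratio estimator to have the smaller variance ($B$ is the population ratio, $R$ the population correlation, CV the coefficient of variation):

```latex
V(\hat{\bar{y}}_r) \le V(\bar{y})
\iff S_y^2 - 2BRS_xS_y + B^2S_x^2 \le S_y^2
\iff R \ge \frac{B\,S_x}{2\,S_y} = \frac{1}{2}\,\frac{\mathrm{CV}(x)}{\mathrm{CV}(y)}
```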

A hypothetical example Population. N=8

A hypothetical example

A hypothetical example Mean estimate = 39.85036 Bias = -0.003178 Bias approx: Mean estimate = 40
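With a population this small, the exact bias can be found by enumerating every possible SRS rather than simulating. The sketch below does this with `itertools.combinations`; the N = 8 values and the sample size n = 4 are hypothetical and are not the population used on the slide.

```python
import itertools
import statistics

def exact_sampling_distribution(x, y, n):
    """Enumerate every SRS of size n; return all ratio estimates and SRS means of y."""
    xbar_U = statistics.fmean(x)                 # known population mean of x
    ratio_ests, srs_means = [], []
    for idx in itertools.combinations(range(len(x)), n):
        xbar = statistics.fmean([x[i] for i in idx])
        ybar = statistics.fmean([y[i] for i in idx])
        srs_means.append(ybar)
        ratio_ests.append(ybar / xbar * xbar_U)
    return ratio_ests, srs_means

# Hypothetical population of N = 8 units
x = [3, 5, 6, 8, 9, 11, 12, 14]
y = [7, 10, 14, 15, 20, 23, 24, 30]
ybar_U = statistics.fmean(y)
ratio_ests, srs_means = exact_sampling_distribution(x, y, n=4)
print("number of samples:", len(ratio_ests))  # C(8, 4) = 70
print("exact bias of ratio estimator:", statistics.fmean(ratio_ests) - ybar_U)
print("exact bias of SRS mean:", statistics.fmean(srs_means) - ybar_U)
```

Every sample is equally likely under SRS, so averaging over all of them gives exact expectations. The SRS mean is unbiased, so its bias should be zero up to rounding, while the ratio estimator's bias is generally small but not zero.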

api99 vs api00 Consider the built-in SRS apisrs (part of the api data in the R survey package).

Another example

Assignment 3: Problem 1 We have used the California schools example to illustrate different sampling and estimation strategies. What if we combine stratified sampling and ratio estimation to estimate the population mean of api00? Will this combination perform better than using only one strategy? Use simulations to answer this question. Please choose reasonable sample size(s), describe your conclusion clearly, and support your conclusion using tables and/or figures.

Assignment 3: Problem 2

STAT262: Regression estimation

Regression estimation Ratio estimation works well if the data are well fit by a straight line through the origin. Often, however, the data are scattered around a straight line that does not go through the origin.

Regression estimation The regression estimator of the population mean is
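The standard form of the regression estimator, written in notation assumed here ($\hat{B}_0$ and $\hat{B}_1$ are the least-squares intercept and slope fitted to the sample, $\bar{x}_U$ is the known population mean of x):

```latex
\hat{\bar{y}}_{\text{reg}} = \hat{B}_0 + \hat{B}_1\,\bar{x}_U = \bar{y} + \hat{B}_1(\bar{x}_U - \bar{x}),
\qquad
\hat{B}_1 = \frac{\sum_{i \in S}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i \in S}(x_i - \bar{x})^2},
\qquad
\hat{B}_0 = \bar{y} - \hat{B}_1\bar{x}
```

Setting $\hat{B}_1 = \bar{y}/\bar{x}$ in the second form recovers the ratio estimator $(\bar{y}/\bar{x})\,\bar{x}_U$, and setting $\hat{B}_1 = 0$ recovers the SRS mean.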

Bias For large SRS, the bias is usually small

Variance and MSE Because the bias is small, MSE ≈ Var.

Variance

Standard error

California Schools: api99 vs api00

California Schools: api99 vs api00 Step 1: fit the regression model api00 ~ api99 and compute the residuals $e_i = y_i - (\hat{B}_0 + \hat{B}_1 x_i)$, $i = 1, \ldots, n$. Step 2: calculate the regression estimate. Step 3: calculate the standard error.
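The three steps can be sketched in plain Python. This assumes the standard-error form $\sqrt{(1 - n/N)\, s_e^2/n}$ with $s_e^2$ the sample variance of the residuals (divisor $n - 1$; some texts use $n - 2$). It does not load the API data, and the function name and example numbers are hypothetical.

```python
import math
import statistics

def regression_estimate(y, x, xbar_U, N):
    """Regression estimate of the population mean of y and its SE under SRS."""
    n = len(y)
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    # Step 1: least-squares fit of y on x, then residuals
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Step 2: regression estimate, evaluated at the known population mean of x
    estimate = b0 + b1 * xbar_U
    # Step 3: standard error from the residual variance (residuals have mean 0)
    s2_e = sum(e * e for e in residuals) / (n - 1)
    se = math.sqrt((1 - n / N) * s2_e / n)
    return estimate, se

# Hypothetical sample of n = 5 from a population of N = 100 with known xbar_U = 6.0
print(regression_estimate([12.0, 15.0, 19.0, 20.0, 26.0], [4.0, 5.0, 6.5, 7.0, 9.0], xbar_U=6.0, N=100))
```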

California Schools: api99 vs api00

California Schools: api99 vs api00 Estimate ± 1.96 × SE:
SRS: 656.6 ± 1.96 × 9.2
Ratio estimator: 664.2 ± 1.96 × 2.25
Regression estimator: 663.4 ± 1.96 × 2.04
The true population mean: 664.7
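The 95% intervals implied by these estimates and standard errors can be computed directly; this sketch uses only the numbers on the slide.

```python
# 95% normal-approximation intervals from the estimates and SEs on the slide
results = {
    "SRS": (656.6, 9.2),
    "Ratio": (664.2, 2.25),
    "Regression": (663.4, 2.04),
}
true_mean = 664.7
z = 1.96
intervals = {}
for name, (est, se) in results.items():
    lower, upper = est - z * se, est + z * se
    intervals[name] = (lower, upper)
    covers = lower <= true_mean <= upper
    print(f"{name:10s} [{lower:7.2f}, {upper:7.2f}]  width = {upper - lower:5.2f}  covers {true_mean}: {covers}")
```

All three intervals contain the true mean here, but the ratio and regression intervals are roughly four times narrower than the SRS interval.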

The McDonald Example

Relative Efficiencies
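To first order, the three variances differ only in the factor multiplying $(1 - n/N)/n$, which makes the efficiency comparison explicit. The notation is assumed here: $B$ is the population ratio, $R$ the population correlation, and $S_x$, $S_y$ the population standard deviations.

```latex
V(\bar{y}) \approx \left(1 - \frac{n}{N}\right)\frac{S_y^2}{n}, \qquad
V(\hat{\bar{y}}_r) \approx \left(1 - \frac{n}{N}\right)\frac{S_y^2 - 2BRS_xS_y + B^2S_x^2}{n}, \qquad
V(\hat{\bar{y}}_{\text{reg}}) \approx \left(1 - \frac{n}{N}\right)\frac{S_y^2(1 - R^2)}{n}

V(\hat{\bar{y}}_r) - V(\hat{\bar{y}}_{\text{reg}}) \approx \left(1 - \frac{n}{N}\right)\frac{(R S_y - B S_x)^2}{n} \ge 0,
\qquad
V(\bar{y}) - V(\hat{\bar{y}}_{\text{reg}}) \approx \left(1 - \frac{n}{N}\right)\frac{R^2 S_y^2}{n} \ge 0
```

Both differences are non-negative, so asymptotically the regression estimator is at least as efficient as both the ratio estimator and the SRS mean.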

Relative efficiency: California Schools

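Summary We introduced two new estimators: the ratio estimator $\hat{\bar{y}}_r = (\bar{y}/\bar{x})\,\bar{x}_U$ and the regression estimator $\hat{\bar{y}}_{\text{reg}} = \bar{y} + \hat{B}_1(\bar{x}_U - \bar{x})$. Both exploit the association between x and y. The regression estimator is the most efficient of the three (asymptotically). The ratio estimator is more efficient than the SRS estimator if the correlation R is large enough (to first order, if $R > B S_x / (2 S_y)$).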

Estimation in Domains: A motivating example We are often interested in separate estimates for subpopulations (also called domains). E.g., after taking an SRS of 1000 persons, we want to estimate the average salary for men and the average salary for women.

Estimation in Domains: A motivating example

Estimation in Domains: A motivating example The calculation in the previous slide treats the domain sample size (e.g., the number of men in the sample) as a constant, but it is not: it varies from sample to sample. We should take this randomness into account, and the formulas we derived for ratio estimators can be used.

Estimations in Domains

Estimations in Domains If the sample is large
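A sketch of domain estimation as a ratio estimator: define $x_i = 1$ if unit $i$ is in the domain and 0 otherwise, and $u_i = y_i x_i$, so that $\bar{y}_d = \sum u_i / \sum x_i$ has the random denominator $n_d$. The variance below uses the ratio-estimator formula with the unknown domain proportion replaced by its sample value $n_d/n$, and is compared with the large-sample simplification $(1 - n/N)\, s_{yd}^2 / n_d$. The function and argument names are hypothetical.

```python
import math
import statistics

def domain_mean(y, in_domain, N):
    """Domain mean from an SRS, treated as a ratio estimator; returns (mean, SE_ratio, SE_large)."""
    n = len(y)
    x = [1.0 if d else 0.0 for d in in_domain]   # domain indicator
    u = [yi * xi for yi, xi in zip(y, x)]        # y_i inside the domain, 0 outside
    n_d = sum(x)                                 # random domain sample size
    ybar_d = sum(u) / n_d
    xbar = n_d / n                               # sample estimate of the domain proportion
    residuals = [ui - ybar_d * xi for ui, xi in zip(u, x)]
    s2_e = sum(e * e for e in residuals) / (n - 1)
    se_ratio = math.sqrt((1 - n / N) * s2_e / (n * xbar ** 2))
    # Large-sample simplification: treat the domain units like an SRS of size n_d
    s2_yd = statistics.variance([yi for yi, d in zip(y, in_domain) if d])
    se_large = math.sqrt((1 - n / N) * s2_yd / n_d)
    return ybar_d, se_ratio, se_large

# Hypothetical SRS of 6 persons from a population of 1000; True marks the domain
print(domain_mean([10.0, 12.0, 14.0, 20.0, 22.0, 30.0], [True, True, True, False, False, False], N=1000))
```

The two squared standard errors differ by the factor $n(n_d - 1)/\{(n - 1)\, n_d\}$, which tends to 1 as the sample grows.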