12.1 4.5.2018.

Inference for Linear Regression
Today we will apply inference procedures to linear regression. We have been using inference procedures (confidence intervals and hypothesis tests) for the past several chapters. We covered linear regression in Chapter 3, so it has been a while.

Linear Regression Refresher
The idea behind linear regression is to estimate a line of best fit between two variables: an independent variable and a dependent variable. The slope tells us how many units the dependent variable changes when the independent variable changes by one unit.

Linear Regression Refresher
Old Faithful: duration of an eruption vs. the time before the next eruption. The slope is 10.36 and the y-intercept is 33.97.
ŷ = 33.97 + 10.36x
interval = 33.97 + 10.36(duration)
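As a quick illustration of how the fitted line is used, the sketch below plugs a duration into the line with the slope and y-intercept given above. The function name and the input value of 2 are made up for illustration; the slide does not state the units, so none are assumed here.

```python
# Fitted line from the slide: slope 10.36, y-intercept 33.97.
INTERCEPT = 33.97
SLOPE = 10.36

def predicted_interval(duration):
    """Predicted time before the next eruption for a given eruption duration."""
    return INTERCEPT + SLOPE * duration

# Hypothetical input: an eruption lasting 2 units of time.
print(round(predicted_interval(2), 2))

# An eruption one unit longer raises the prediction by the slope.
print(round(predicted_interval(3) - predicted_interval(2), 2))
```

The second printed value is the slope itself, which is the "change in the dependent variable per one-unit change in the independent variable" reading of the slope.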

Inference
When we do a linear regression using a sample of data, we are really ESTIMATING the true population values (slope and y-intercept). We don’t know the population values, but the estimates from the regression are unbiased estimators of the true population values. That does not mean that they are exactly correct. When we do a regression on sample data, we get a regression line ŷ = a + bx.
a is an unbiased estimator of the true population y-intercept (sometimes called α)
b is an unbiased estimator of the true population slope (sometimes called β)

Sample vs Population
Sample regression equation: ŷ = a + bx
Population regression equation: y = α + βx
a estimates α, and b estimates β.

Sampling Distribution
If we want a sampling distribution for the slope, we already have an unbiased estimate of its mean: whatever our estimated slope is. But we also need the standard deviation of the sampling distribution. Because we don’t know it, we estimate it. That estimate is called a standard error.
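To make the idea of a sampling distribution concrete, here is a small simulation sketch. It assumes a made-up population line (ALPHA, BETA) with normal noise (SIGMA), draws many samples, and fits a least-squares slope to each one. All of the numbers are hypothetical and chosen only for illustration.

```python
import random
import statistics

def fit_slope(xs, ys):
    """Least-squares slope b for the line y-hat = a + b*x."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical population: y = ALPHA + BETA*x plus normal noise.
ALPHA, BETA, SIGMA = 5.0, 2.0, 2.0
rng = random.Random(0)  # fixed seed so the run is repeatable

slopes = []
for _ in range(2000):
    xs = [rng.uniform(0, 10) for _ in range(20)]
    ys = [ALPHA + BETA * x + rng.gauss(0, SIGMA) for x in xs]
    slopes.append(fit_slope(xs, ys))

mean_slope = statistics.fmean(slopes)   # should land close to BETA
sd_slope = statistics.stdev(slopes)     # spread of the sampling distribution
print(f"mean of sample slopes: {mean_slope:.3f} (true slope {BETA})")
print(f"SD of sample slopes:   {sd_slope:.3f}")
```

Individual sample slopes miss the true slope, but their average sits very near it, which is what "unbiased" means. The spread of the slopes is the quantity a standard error tries to estimate from a single sample.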

Standard Error of the Slope
The good news is that we rarely need to calculate this by hand. When we perform a regression on a computer, it is included in the output, but not when you do the regression on your calculator.

Standard Error of the Slope
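SE_b = s / (s_x √(n − 1)), where s is the standard deviation of the residuals, s_x is the standard deviation of the x-values, and n is the sample size.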
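As a rough sketch of what the computer does to produce this number, the code below uses the common textbook form SE_b = s / (s_x √(n − 1)), where s is the standard deviation of the residuals (with n − 2 in the denominator) and s_x is the sample standard deviation of the x-values. The data set is invented for illustration.

```python
import math
import statistics

def slope_and_standard_error(xs, ys):
    """Return (b, s, SE_b) for a least-squares line fitted to xs and ys."""
    n = len(xs)
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    # s: standard deviation of the residuals, using n - 2 degrees of freedom.
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    s_x = statistics.stdev(xs)  # sample standard deviation of x
    se_b = s / (s_x * math.sqrt(n - 1))
    return b, s, se_b

# Hypothetical data, made up for this example.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
b, s, se_b = slope_and_standard_error(xs, ys)
print(f"b = {b:.3f}, s = {s:.3f}, SE_b = {se_b:.4f}")
```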

Confidence Interval for the Slope
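A confidence interval for the slope has the usual form: estimate ± (critical value)(standard error), that is, b ± t*·SE_b, with t* taken from the t distribution with n − 2 degrees of freedom. A minimal sketch of that arithmetic follows; the function name and the input values are hypothetical and are not taken from the examples in these notes.

```python
def slope_confidence_interval(b, se_b, t_star):
    """Return (lower, upper) for the interval b ± t* · SE_b."""
    margin = t_star * se_b
    return b - margin, b + margin

# Hypothetical inputs, chosen only to show the calculation.
lower, upper = slope_confidence_interval(b=2.0, se_b=0.5, t_star=2.179)
print(f"({lower:.4f}, {upper:.4f})")
```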

Example

Example
90% confidence interval:
± (1.761)( ) ( , )
Interpretation: We are 90% confident that one additional calorie of non-exercise activity corresponds to a decrease in fat gain of between kg and kg.

You try
The following regression relates the number of Peruvian anchovies caught (millions of metric tons) per year to the price (US $) of fish meal in that year. Calculate a 95% confidence interval for the true slope of the regression line.
Predictor   Coef   SE Coef   T   P
Constant
Catch
S=   R-Sq= 73.5%   n=14

Answers ± (2.179)(5.091) ± ( , )
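The margin of error in this answer can be checked with a couple of lines. The sketch uses only the values shown above (n = 14, t* = 2.179, SE = 5.091); the variable names are made up.

```python
N = 14            # sample size from the regression output
DF = N - 2        # degrees of freedom for inference about the slope
T_STAR = 2.179    # t* for 95% confidence with DF degrees of freedom
SE_B = 5.091      # standard error of the slope

margin_of_error = T_STAR * SE_B
print(DF, round(margin_of_error, 3))
```

The interval is then the estimated slope minus and plus this margin.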

Example
Hypotheses: H₀: β = 0, Hₐ: β > 0
Check conditions: for now let’s assume that they are met (we’ll talk about this in a minute).
Test statistic: t = (b − 0)/SE_b = 3.07
P-value: tcdf(3.07, BIG, 36) = 0.002
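For anyone without a calculator handy, the tcdf step can be approximated in plain Python. The sketch below is a stand-in, not the calculator's method: it integrates the t density from 0 to t with Simpson's rule and subtracts the result from 0.5 to get the upper-tail area. The function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t

# One-sided p-value for t = 3.07 with 36 degrees of freedom.
p_value = t_upper_tail(3.07, 36)
print(round(p_value, 3))
```

Because the alternative is β > 0, only the upper tail is counted.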

Look back at the regression output.
P-value = 0.002. The output gives us the test statistic, and it gives us a (wrong) p-value. Why is the p-value wrong?

Back to the Anchovies
Does the number of fish caught affect the price?
What is the test statistic? What is the p-value? What is our interpretation?

Back to the Anchovies
What is the test statistic? t = −5.78
What is the p-value? Very small.
What is our interpretation? We reject the null hypothesis that the true slope is zero and conclude instead that the true slope (effect) is different from zero.

Anchovies Again (last example)
Now let’s test a null value other than zero. Let’s test whether the slope is below −20.
H₀: β = −20, Hₐ: β < −20
Test statistic: t = (b − (−20))/SE_b = −1.847
P-value: 0.0448
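One way to see where this test statistic comes from: since t = (b − β₀)/SE_b = b/SE_b − β₀/SE_b, the statistic for a new null value is the zero-null statistic shifted by β₀/SE_b. The sketch below applies that shift using the rounded values that appear earlier in these notes (t = −5.78 and SE = 5.091), so a small rounding difference from −1.847 is expected. The variable names are made up.

```python
T_ZERO_NULL = -5.78   # test statistic for H0: beta = 0 (rounded, from the earlier slide)
SE_B = 5.091          # standard error of the slope
BETA_0 = -20          # null value being tested now

# t = (b - beta_0) / SE_b = b / SE_b - beta_0 / SE_b
t_new = T_ZERO_NULL - BETA_0 / SE_B
print(round(t_new, 2))
```

The result agrees with the stated −1.847 to about two decimal places, which is as close as the rounded inputs allow.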
