Lecture 10 Chapter 23. Inference for regression

Objectives (PSLS Chapter 23) Inference for regression (NHST Regression Inference Award) [B level award]

• The regression model
• Confidence interval for the regression slope β
• Testing the hypothesis of no linear relationship
• Inference for prediction
• Conditions for inference

Common Questions Asked of Bivariate Relationships

Most scatterplots are created from sample data.

• Is the observed relationship statistically significant (not explained by chance events due to random sampling alone)?
• What is the population mean response µy as a function of the explanatory variable x?  µy = α + βx

The regression model

The least-squares regression line ŷ = a + bx is a mathematical model of the relationship between two quantitative variables: "observed data = fit + residual." The regression line is the fit. For each data point in the sample, the residual is the difference (y − ŷ).

At the population level, the model becomes

y_i = (α + β x_i) + ε_i

with the residuals ε_i independent and Normally distributed N(0, σ). The population mean response µy is

µy = α + β x

ŷ is an unbiased estimate of the mean response µy, a is an unbiased estimate of the intercept α, and b is an unbiased estimate of the slope β.
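To make the population model concrete, here is a minimal simulation sketch in Python (not from the slides; the values of α, β, σ and the range of x are arbitrary assumptions) showing that the least-squares estimates a and b track the population intercept and slope.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed population parameters (illustrative only)
alpha, beta, sigma = 2.0, 0.5, 1.0
n = 50

x = rng.uniform(10, 30, size=n)        # explanatory variable
eps = rng.normal(0, sigma, size=n)     # residuals, Normal(0, sigma)
y = alpha + beta * x + eps             # population model y = alpha + beta*x + eps

# Least-squares fit: polyfit returns [slope, intercept] for degree 1
b, a = np.polyfit(x, y, deg=1)
print(a, b)                            # close to alpha and beta for large n
```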

The regression standard error, s, for n sample data points is computed from the residuals (y_i − ŷ_i):

s = √[ Σ(y_i − ŷ_i)² / (n − 2) ]

s is an unbiased estimate of the regression standard deviation σ.

• The model has constant variance of Y (σ is the same for all values of x).
• For any fixed x, the responses y follow a Normal distribution with standard deviation σ.
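A minimal sketch of this computation (the data are made up for illustration and are not the frog data used later in the lecture):

```python
import numpy as np

# Toy data (made up for illustration)
x = np.array([15., 17., 19., 21., 23., 25.])
y = np.array([28., 33., 36., 41., 44., 50.])
n = len(x)

b, a = np.polyfit(x, y, deg=1)     # least-squares slope and intercept
y_hat = a + b * x                  # fitted values
resid = y - y_hat                  # residuals

# Regression standard error: s = sqrt( sum(residuals^2) / (n - 2) )
s = np.sqrt(np.sum(resid**2) / (n - 2))
print(s)
```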

Frog mating call

Scientists recorded the call frequency of 20 gray tree frogs, Hyla chrysoscelis, as a function of field air temperature.

MINITAB output, Regression Analysis. The regression equation is Frequency = … + 2.33 Temperature

Predictor      Coef     SE Coef    T       P
Constant       …        …          …       …
Temperature    2.3308   0.3468     6.72    0.000

S = …    R-Sq = 71.5%    R-Sq(adj) = 69.9%

S is the regression standard error s (the square root of the MSE), an unbiased estimate of σ.

Confidence interval for the slope β

Estimating the regression parameter β for the slope is a case of one-sample inference with σ unknown. Hence we rely on t distributions.

The standard error of the slope b is:

SE_b = s / √[ Σ(x_i − x̄)² ]   (s is the regression standard error)

Thus, a level C confidence interval for the slope β is:

estimate ± t* SE_estimate, i.e. b ± t* SE_b

where t* is the critical value for the t(df = n − 2) density curve with area C between −t* and +t*.
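A minimal sketch of the slope confidence interval computation (made-up data; scipy is assumed to be available for the t critical value):

```python
import numpy as np
from scipy import stats

# Toy data (made up for illustration)
x = np.array([15., 17., 19., 21., 23., 25.])
y = np.array([28., 33., 36., 41., 44., 50.])
n = len(x)

b, a = np.polyfit(x, y, deg=1)
resid = y - (a + b * x)
s = np.sqrt(np.sum(resid**2) / (n - 2))          # regression standard error

se_b = s / np.sqrt(np.sum((x - x.mean())**2))    # standard error of the slope
t_star = stats.t.ppf(0.975, df=n - 2)            # t* for a 95% confidence level
print(b - t_star * se_b, b + t_star * se_b)      # 95% CI for the slope beta
```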

Frog mating call (continued)

From the MINITAB output: Temperature Coef = 2.3308, SE Coef = 0.3468 (the standard error of the slope, SE_b).

n = 20, df = 18, t* = 2.101

The 95% CI for the slope β is: b ± t* SE_b = 2.3308 ± (2.101)(0.3468) = 2.3308 ± 0.7286, or 1.60 to 3.06.

We are 95% confident that a 1 degree Celsius increase in temperature results in an increase in mating call frequency of between 1.60 and 3.06 notes/second.
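A quick arithmetic check of this interval, using only the numbers reported in the output (slope estimate, its standard error, and df = 18):

```python
from scipy import stats

b, se_b, df = 2.3308, 0.3468, 18             # values from the MINITAB output
t_star = stats.t.ppf(0.975, df)              # about 2.101
print(b - t_star * se_b, b + t_star * se_b)  # about (1.60, 3.06)
```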

Testing the hypothesis of no relationship

To test for a significant relationship, we ask whether the parameter for the slope β is equal to zero, using a one-sample t test.

The standard error of the slope b is: SE_b = s / √[ Σ(x_i − x̄)² ]

We test the statistical hypotheses H0: β = 0 versus a one-sided or two-sided Ha. We compute the test statistic t = b / SE_b, which has the t(n − 2) distribution, to find the P-value of the test.

Frog mating call (continued)

We test H0: β = 0 versus Ha: β ≠ 0.

t = b / SE_b = 2.3308 / 0.3468 = 6.721, with df = n − 2 = 18.

From Table C, the two-sided P-value for H0: β = 0 is P < 0.001, highly significant. There is a significant relationship between temperature and call frequency.
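The same P-value can also be computed directly from the t statistic and its degrees of freedom, for example:

```python
from scipy import stats

t_stat, df = 6.721, 18
p_two_sided = 2 * stats.t.sf(t_stat, df)   # two-sided P-value
print(p_two_sided)                         # far smaller than 0.001, as the table indicates
```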

Testing for lack of correlation

The regression slope b and the correlation coefficient r are related, and b = 0 ⇔ r = 0. Likewise, the population slope β is related to the population correlation coefficient ρ, and β = 0 ⇔ ρ = 0.

Thus, testing the hypothesis H0: β = 0 is the same as testing the hypothesis of no correlation between x and y in the population from which our data were drawn.
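This equivalence is easy to verify numerically: in simple linear regression the P-value of the slope t test matches the P-value of the test for zero correlation. A sketch with made-up data:

```python
import numpy as np
from scipy import stats

# Toy data (made up for illustration)
x = np.array([15., 17., 19., 21., 23., 25.])
y = np.array([28., 33., 36., 41., 44., 50.])

slope_test = stats.linregress(x, y)     # includes the t test of H0: beta = 0
r, p_corr = stats.pearsonr(x, y)        # test of H0: rho = 0

print(slope_test.pvalue, p_corr)        # the two P-values agree
```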

Hand length and body height for a random sample of 21 adult men

MINITAB output, Regression Analysis: Hand length (mm) vs Height (m). The regression equation is Hand length (mm) = … + … Height (m)

Predictor     Coef   SE Coef   T   P
Constant      …      …         …   …
Height (m)    …      …         …   …

S = …    R-Sq = 12.7%    R-Sq(adj) = 8.1%

→ No significant relationship.

Objectives (PSLS Chapter 23) Inference for regression (NHST Regression Inference Award) [B level award]

• The regression model
• Confidence interval for the regression slope β
• Testing the hypothesis of no linear relationship
• Inference for prediction
• Conditions for inference

Two types of inference for prediction

One use of regression is prediction within the range of the data: ŷ = a + bx. But this prediction depends on the particular sample drawn, so we need statistical inference to generalize our conclusions. There are two types of interval:

1. To estimate the population mean response µy for a given value of x, we use a confidence interval.
2. To estimate an individual response y for a given value of x, we use a prediction interval. If we randomly sampled many times, there would be many different values of y obtained for a particular x, varying Normally, N(0, σ), around the mean response µy.

Confidence interval for µy

Predicting the population mean value of y, µy, for any value of x within the range (domain) of the data tested.

Using inference, we calculate a level C confidence interval for the population mean µy of all responses y when x takes the value x*. This interval is centered on ŷ, the unbiased estimate of µy. The true value of the population mean µy at a given value of x will be within our confidence interval for C% of all intervals computed from many different random samples.

A level C prediction interval for a single observation on y when x takes the value x* is:  ŷ ± t* SE_ŷ

A level C confidence interval for the mean response µy at a given value x* of x is:  ŷ ± t* SE_µ̂

Use t* for a t distribution with df = n − 2.
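The slides do not reproduce the two standard-error formulas themselves; the usual forms are SE_µ̂ = s √(1/n + (x* − x̄)² / Σ(x − x̄)²) for the mean response and SE_ŷ = s √(1 + 1/n + (x* − x̄)² / Σ(x − x̄)²) for an individual observation, which is why prediction intervals are wider than confidence intervals. A minimal sketch under that assumption, with made-up data:

```python
import numpy as np
from scipy import stats

# Toy data (made up for illustration)
x = np.array([15., 17., 19., 21., 23., 25.])
y = np.array([28., 33., 36., 41., 44., 50.])
n = len(x)

b, a = np.polyfit(x, y, deg=1)
s = np.sqrt(np.sum((y - (a + b * x))**2) / (n - 2))   # regression standard error
sxx = np.sum((x - x.mean())**2)

x_star = 22.0                        # value of x at which we predict
y_hat = a + b * x_star               # point estimate

se_mu = s * np.sqrt(1/n + (x_star - x.mean())**2 / sxx)      # SE for the mean response
se_y = s * np.sqrt(1 + 1/n + (x_star - x.mean())**2 / sxx)   # SE for an individual y

t_star = stats.t.ppf(0.975, df=n - 2)
print("95% CI for mean response:", y_hat - t_star * se_mu, y_hat + t_star * se_mu)
print("95% prediction interval: ", y_hat - t_star * se_y, y_hat + t_star * se_y)
```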

Minitab: Predicted Values for New Observations (Temperature = 22)

NewObs   Fit   SE Fit   95% CI         95% PI
1        …     …        (43.273, …)    (38.888, …)

Confidence interval for the mean call frequency at 22 degrees C: µy within 43.3 and 46.9 notes/sec.
Prediction interval for individual call frequencies at 22 degrees C: y within 38.9 and 51.3 notes/sec.

Blood alcohol content (BAC) as a function of alcohol consumed (number of beers), for a random sample of 16 adults.

What BAC do you predict for 5 beers consumed? What are typical values of BAC when 5 beers are consumed? Give a range.

Conditions for inference

• The observations are independent.
• The relationship is indeed linear (in the parameters).
• The standard deviation of y, σ, is the same for all values of x.
• The response y varies Normally around its mean for a given x.

Using residual plots to check for regression validity

The residuals (y − ŷ) give useful information about the contribution of individual data points to the overall pattern of scatter. We view the residuals in a residual plot. If the residuals are scattered randomly around 0 with uniform variation, this indicates that the data fit a linear model, have Normally distributed residuals for each value of x, and have constant standard deviation σ.
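One way to produce such a residual plot (a sketch with made-up data, not the frog data; matplotlib is assumed to be available):

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy data (made up for illustration)
rng = np.random.default_rng(1)
x = rng.uniform(15, 30, size=40)
y = 5 + 2.3 * x + rng.normal(0, 3, size=40)

b, a = np.polyfit(x, y, deg=1)
resid = y - (a + b * x)                      # residuals

plt.scatter(x, resid)
plt.axhline(0, color="gray", linestyle="--")
plt.xlabel("x")
plt.ylabel("residual (y - y_hat)")
plt.title("Residual plot: look for random scatter around 0")
plt.show()
```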

Residuals are randomly scattered → good!

Curved pattern → the shape of the relationship is not modeled correctly.

Change in variability across the plot → σ is not equal for all values of x.

Frog mating call: checking the conditions

The data are a random sample of frogs. The relationship is clearly linear. The residuals are roughly Normally distributed. The spread of the residuals around 0 is fairly homogeneous across all values of x.

Objectives (PSLS Chapter 23) Inference for regression (NHST Regression Inference Award) [B level award]

• The regression model
• Confidence interval for the regression slope β
• Testing the hypothesis of no linear relationship
• Inference for prediction
• Conditions for inference