23. Inference for regression The Practice of Statistics in the Life Sciences Third Edition © 2014 W. H. Freeman and Company
Objectives (PSLS Chapter 23) Inference for regression
- The regression model
- Confidence interval for the regression slope β
- Testing the hypothesis of no linear relationship
- Inference for prediction
- Conditions for inference
Most scatterplots are created from sample data. Is the observed relationship statistically significant (not entirely explained by chance events due to random sampling)? What is the population mean response μy as a function of the explanatory variable x? μy = α + βx
The regression model The least-squares regression line ŷ = a + bx is a mathematical model of the relationship between two quantitative variables: “sample data = fit + residual” The regression line is the fit. For each data point in the sample, the residual is the difference (y - ŷ).
At the population level, the model becomes yi = (α + βxi) + εi, with residuals εi independent and Normally distributed N(0, σ). The population mean response μy is μy = α + βx. Then ŷ is an unbiased estimate of the mean response μy, a is an unbiased estimate of the intercept α, and b is an unbiased estimate of the slope β.
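The population model can be illustrated with a short simulation. This is a sketch with hypothetical parameter values (loosely inspired by the frog example later in the chapter, but not the actual data): each response is the straight-line mean plus Normal noise, and least squares recovers estimates a and b of α and β.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters (for illustration only)
alpha, beta, sigma = -6.19, 2.33, 2.82   # intercept, slope, regression sd

# Simulate one sample of n = 20 from the model y_i = alpha + beta*x_i + eps_i
x = rng.uniform(18, 27, size=20)          # e.g. temperatures in degrees C
eps = rng.normal(0, sigma, size=20)       # residuals ~ N(0, sigma)
y = alpha + beta * x + eps

# Least-squares fit: b and a are unbiased estimates of beta and alpha
b, a = np.polyfit(x, y, 1)
```

Rerunning with different seeds gives different a and b each time; their long-run averages are α and β, which is what "unbiased" means here.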
For any fixed x, the responses y follow a Normal distribution with standard deviation σ; regression assumes equal variance of y (σ is the same for all values of x). The regression standard error s, for n sample data points, is computed from the residuals (yi − ŷi): s = √[Σ(yi − ŷi)² / (n − 2)]. s is an unbiased estimate of the regression standard deviation σ.
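The computation of s can be sketched in a few lines. The data here are a small hypothetical set (not from the text); the formula s = √[Σ(yi − ŷi)²/(n − 2)] is applied to the least-squares residuals.

```python
import numpy as np

# Hypothetical small data set (for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

n = len(x)
b, a = np.polyfit(x, y, 1)         # least-squares slope and intercept
y_hat = a + b * x                  # fitted values
resid = y - y_hat                  # residuals (y - y_hat)

# Regression standard error, with n - 2 degrees of freedom
s = np.sqrt(np.sum(resid**2) / (n - 2))
```

Note the divisor n − 2 rather than n − 1: two degrees of freedom are spent estimating the intercept and the slope.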
Frog mating call
Scientists recorded the call frequency of 20 gray tree frogs, Hyla chrysoscelis, as a function of field air temperature.
Minitab Regression Analysis: The regression equation is Frequency = -6.19 + 2.33 Temperature
Predictor      Coef     SE Coef    T      P
Constant      -6.190    8.243     -0.75   0.462
Temperature    2.3308   0.3468     6.72   0.000
S = 2.82159   R-Sq = 71.5%   R-Sq(adj) = 69.9%
S is the regression standard error s, an unbiased estimate of σ.
Confidence interval for the slope β
Estimating the regression parameter β for the slope is a case of one-sample inference with σ unknown, so we rely on t distributions. The standard error of the slope b is SEb = s / √Σ(x − x̄)², where s is the regression standard error. Thus, a level C confidence interval for the slope β is estimate ± t* SEestimate, that is, b ± t* SEb, where t* is the critical value of the t(df = n − 2) density curve with area C between −t* and +t*.
Frog mating call
Regression Analysis: The regression equation is Frequency = -6.19 + 2.33 Temperature
Predictor      Coef     SE Coef    T      P
Constant      -6.190    8.243     -0.75   0.462
Temperature    2.3308   0.3468     6.72   0.000
S = 2.82159   R-Sq = 71.5%   R-Sq(adj) = 69.9%
The standard error of the slope is SEb = 0.3468. With n = 20, df = 18, and t* = 2.101, the 95% CI for the slope β is:
b ± t* SEb = 2.3308 ± (2.101)(0.3468) = 2.3308 ± 0.7286, or 1.6022 to 3.0594
We are 95% confident that a 1 degree Celsius increase in temperature results in an increase in mating call frequency of between 1.60 and 3.06 notes/second.
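The arithmetic above can be checked directly from the Minitab summary values; this sketch uses scipy to look up the t critical value instead of Table C.

```python
from scipy import stats

# Values from the Minitab output: slope, its SE, and sample size
b, se_b, n = 2.3308, 0.3468, 20

t_star = stats.t.ppf(0.975, df=n - 2)   # 95% critical value, df = 18
margin = t_star * se_b
ci = (b - margin, b + margin)
# ci is approximately (1.602, 3.059), matching the hand computation
```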
Testing the hypothesis of no relationship
To test for a significant relationship, we ask whether the parameter for the slope β is equal to zero, using a one-sample t test. The standard error of the slope is SEb = s / √Σ(x − x̄)². We test the hypotheses H0: β = 0 versus a one-sided or two-sided Ha. We compute t = b / SEb, which has the t(n − 2) distribution, to find the P-value of the test.
Frog mating call
Regression Analysis: The regression equation is Frequency = -6.19 + 2.33 Temperature
Predictor      Coef     SE Coef    T      P
Constant      -6.190    8.243     -0.75   0.462
Temperature    2.3308   0.3468     6.72   0.000
S = 2.82159   R-Sq = 71.5%   R-Sq(adj) = 69.9%
We test H0: β = 0 versus Ha: β ≠ 0.
t = b / SEb = 2.3308 / 0.3468 = 6.72, with df = n − 2 = 18
Table C: t > 3.922, so P < 0.001 (two-sided test), highly significant. There is a significant relationship between temperature and call frequency.
[Figure: the two-sided and one-sided P-value areas for H0: β = 0 under the t(18) curve]
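The same test can be run numerically from the summary values; the two-sided P-value is twice the upper-tail area of the t(18) distribution beyond |t|.

```python
from scipy import stats

# Values from the Minitab output
b, se_b, n = 2.3308, 0.3468, 20

t = b / se_b                                  # about 6.72
p_two_sided = 2 * stats.t.sf(abs(t), df=n - 2)
# p_two_sided is far below 0.001, consistent with Table C and
# Minitab's reported P = 0.000
```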
Testing for lack of correlation
The regression slope b and the correlation coefficient r are related: b = 0 exactly when r = 0. Similarly, the population parameter for the slope β is related to the population correlation coefficient ρ: β = 0 exactly when ρ = 0. Thus, testing the hypothesis H0: β = 0 is the same as testing the hypothesis of no correlation between x and y in the population from which our data were drawn.
Researchers examined the relationship between straightforwardness (using a personality inventory test score) and brain levels of serotonin transporters (in the dorsal raphe nucleus) in 20 adult subjects. With no obvious explanatory variable, we choose a test for the population correlation ρ. We test H0: ρ = 0 versus Ha: ρ ≠ 0. With a two-sided test, we get 0.01 < P < 0.02 from Table E (P = 0.014 from software), significant.
Correlations: serotonin, straightf
Pearson correlation of serotonin and straightf = -0.538, P-Value = 0.014
The P-value is small enough that we reject the null hypothesis: there is a significant relationship between serotonin transporter levels and straightforwardness. However, the relationship is only moderate to weak (|r| = 0.538). Notice that the t test for the slope gives exactly the same P-value (0.014).
Regression Analysis: serotonin versus straightf
Predictor    Coef       SE Coef    T      P
Constant     4.8544     0.6814     7.12   0.000
straightf   -0.04114    0.01519   -2.71   0.014
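The equivalence of the two tests can be verified numerically: the standard t statistic for a correlation, t = r√(n − 2)/√(1 − r²) with df = n − 2, reproduces both the correlation P-value and the slope t statistic reported by Minitab.

```python
import math
from scipy import stats

# Values from the Minitab output: sample correlation and sample size
r, n = -0.538, 20

# t statistic for testing H0: rho = 0
t = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)
p_two_sided = 2 * stats.t.sf(abs(t), df=n - 2)
# t is about -2.71 and p about 0.014, matching both Minitab analyses
```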
Inference for prediction
One use of regression is for prediction within the range of the data: ŷ = a + bx. But this prediction depends on the particular sample drawn, so we need statistical inference to generalize our conclusions. To estimate an individual response y for a given value of x, we use a prediction interval. If we randomly sampled many times, we would obtain many different values of y for a particular x, varying Normally with standard deviation σ around the mean response μy. The prediction interval contains C% of all the individual values taken by y at a particular value of x.
Confidence interval for μy
We may also want to predict the population mean value of y, μy, for any value of x within the range of the data tested. Using inference, we calculate a level C confidence interval for the population mean μy of all responses y when x takes the value x*: ŷ ± t* SEμ̂. This interval is centered on ŷ, the unbiased estimate of μy. The true value of the population mean μy at a given value of x will indeed be within our confidence interval for C% of all intervals computed from many different random samples.
A level C prediction interval for a single observation on y when x takes the value x* is: ŷ ± t* SEŷ. A level C confidence interval for the mean response μy at a given value x* of x is: ŷ ± t* SEμ̂. (In practice, use technology.) The prediction interval for a single observation will always be larger than the confidence interval for the mean response. Use t* for a t distribution with df = n − 2.
Frog mating call
Minitab Predicted Values for New Observations: Temperature = 22
NewObs   Fit      SE Fit   95% CI              95% PI
1        45.088   0.863    (43.273, 46.902)    (38.888, 51.287)
Confidence interval for the mean call frequency at 22 degrees C: μy within 43.3 and 46.9 notes/second.
Prediction interval for individual call frequencies at 22 degrees C: ŷ within 38.9 and 51.3 notes/second.
Graphically, a series of confidence intervals is shown for the whole range of x values.
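Both Minitab intervals can be reconstructed from the reported Fit and SE Fit, using the fact that the prediction standard error adds the regression standard error in quadrature: SEŷ = √(s² + SEμ̂²). This sketch takes s, Fit, and SE Fit from the output above.

```python
import math
from scipy import stats

# Values from the Minitab output at x* = 22
fit, se_fit, s, n = 45.088, 0.863, 2.82159, 20
t_star = stats.t.ppf(0.975, df=n - 2)

# Confidence interval for the mean response uses SE of the fit
ci = (fit - t_star * se_fit, fit + t_star * se_fit)

# Prediction interval for an individual response adds the regression
# standard error: SE_pred = sqrt(s^2 + SE_fit^2)
se_pred = math.sqrt(s**2 + se_fit**2)
pi = (fit - t_star * se_pred, fit + t_star * se_pred)
# ci is about (43.27, 46.90) and pi about (38.89, 51.29), matching Minitab
```

The extra s² term is why the prediction interval is always wider: it must cover the scatter of individual frogs around the mean, not just the uncertainty in the mean itself.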
Is the relationship linear?
Blood alcohol content (BAC) as a function of alcohol consumed (number of beers) for a random sample of 16 adults. Is the relationship linear? What do you predict for five beers consumed? The scatterplot is clearly linear, and the t test for the slope gives t = 7.47, P < 0.0001, highly significant. The confidence interval for the mean BAC for five beers consumed is 0.066 to 0.088. The prediction interval for individual BAC values for five beers consumed is 0.032 to 0.122.
Conditions for inference
- The observations are independent.
- The relationship is indeed linear.
- The standard deviation of y, σ, is the same for all values of x.
- The response y varies Normally around its mean.
Using residual plots to check for regression validity The residuals (y – ŷ) give useful information about the contribution of individual data points to the overall pattern of scatter. We view the residuals in a residual plot: If residuals are scattered randomly around 0 with uniform variation, it indicates that the data fit a linear model, have Normally distributed residuals for each value of x, and constant standard deviation σ.
Residuals are randomly scattered → good! Curved pattern → the relationship is not linear. Change in variability across the plot → σ is not equal for all values of x.
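A residual plot is easy to construct by hand: fit the line, subtract the fitted values, and plot the residuals against x (or against the fitted values). This sketch uses hypothetical data that satisfy the linear model, so the resulting plot would show random scatter around 0.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data (for illustration): linear trend + constant-sd noise
x = rng.uniform(0, 10, size=50)
y = 1.5 + 2.0 * x + rng.normal(0, 1.0, size=50)

b, a = np.polyfit(x, y, 1)
resid = y - (a + b * x)            # residuals (y - y_hat)

# For the residual plot, graph resid against x, with a horizontal line
# at 0 (e.g. with matplotlib: plt.scatter(x, resid); plt.axhline(0)).
# A valid linear model shows random scatter with uniform spread.
```

Replacing the trend with a curved one (say, adding an x² term to y) would produce the curved residual pattern described above.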
Frog mating call
The data are a random sample of frogs. The relationship is clearly linear. The residuals are roughly Normally distributed. The spread of the residuals around 0 is fairly homogeneous along all values of x. Minitab here displays the residuals in four different graphs: a Normal quantile plot and histogram (top and bottom left) to assess Normality; a residual plot against the fitted values (top right) to assess the spread of the residuals; and a residual plot against the order of the observations (bottom right) to determine whether a pattern might have arisen during data collection.