Simple linear regression: linear regression with one predictor variable.


What is simple linear regression? A way of evaluating the relationship between two continuous variables. One variable is regarded as the predictor, explanatory, or independent variable (x). The other variable is regarded as the response, outcome, or dependent variable (y).

A deterministic (or functional) relationship

Other deterministic relationships:
Circumference = π × diameter.
Hooke’s Law: Y = α + βX, where Y = amount of stretch in a spring and X = applied weight.
Ohm’s Law: I = V/r, where V = voltage applied, r = resistance, and I = current.
Boyle’s Law: for a constant temperature, P = α/V, where P = pressure, α = a constant for each gas, and V = volume of gas.

A statistical relationship A relationship with some “trend”, but also with some “scatter.”

Other statistical relationships Height and weight Alcohol consumed and blood alcohol content Vital lung capacity and pack-years of smoking Driving speed and gas mileage

Which is the “best fitting line”?

Notation: y_i is the observed response for the i-th experimental unit; x_i is the predictor value for the i-th experimental unit; ŷ_i is the predicted response (or fitted value) for the i-th experimental unit. Equation of the best fitting line: ŷ_i = b0 + b1·x_i.

Prediction error (or residual error) In using ŷ_i to predict the actual response y_i, we make a prediction error (or residual error) of size e_i = y_i − ŷ_i. A line that fits the data well is one for which the n prediction errors are as small as possible in some overall sense.

The “least squares criterion” Choose the values b0 and b1 that minimize the sum of the squared prediction errors. That is, for the line ŷ_i = b0 + b1·x_i, find b0 and b1 that minimize Q = Σ_{i=1}^{n} (y_i − (b0 + b1·x_i))².
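As a sketch of the criterion (using a made-up height–weight sample, not the slide's data), the following compares the sum of squared prediction errors Q for two candidate lines; the function name `sse` and both sets of coefficients are illustrative assumptions:

```python
# Hypothetical heights (inches) and weights (pounds) -- illustrative only.
xs = [63, 64, 66, 69, 69, 71, 71, 72, 73, 75]
ys = [127, 121, 142, 157, 162, 156, 169, 165, 181, 208]

def sse(b0, b1, xs, ys):
    """Sum of squared prediction errors Q for the line yhat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Two candidate lines: the least squares criterion prefers the one
# with the smaller Q.
q_a = sse(-266.5, 6.14, xs, ys)   # a line close to the data's trend
q_b = sse(0.0, 2.0, xs, ys)       # a clearly poorer line
```

With these numbers the first candidate has a far smaller Q, so the criterion declares it the better fit.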

Which is the “best fitting line”?

(Two candidate lines of the form w = b0 + b1·h are plotted against the height–weight data.)

The least squares regression line Using calculus, minimize Q (take the derivative with respect to b0 and with respect to b1, set each to 0, and solve) and get the least squares estimates: b1 = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)², and b0 = ȳ − b1·x̄.
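A minimal sketch of these closed-form estimates (the function name and the tiny data set are made up for illustration):

```python
def least_squares(xs, ys):
    """Least squares estimates: b1 = Sxy / Sxx, b0 = ybar - b1*xbar."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Points lying near the line y = 2x should give a slope near 2
# and an intercept near 0.
b0, b1 = least_squares([1, 2, 3, 4], [2.1, 3.9, 6.1, 7.9])
```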

Fitted line plot in Minitab

Regression analysis in Minitab

The regression equation is
weight = … height

Predictor        Coef   SE Coef   T   P
Constant          …        …      …   …
height            …        …      …   …

S = …   R-Sq = 89.7%   R-Sq(adj) = 88.4%

Analysis of Variance
Source           DF   SS   MS   F   P
Regression        …    …    …   …   …
Residual Error    …    …    …
Total             …    …

Prediction of future responses A common use of the estimated regression line. Predict the mean weight of 66-inch-tall people. Predict the mean weight of 67-inch-tall people.

What do the “estimated regression coefficients” b0 and b1 tell us? We can expect the mean response to increase or decrease by b1 units for every one-unit increase in x. If the “scope of the model” includes x = 0, then b0 is the predicted mean response when x = 0. Otherwise, b0 is not meaningful.

So, the estimated regression coefficients b0 and b1 tell us… We predict the mean weight to increase by 6.14 pounds for every additional one-inch increase in height. A height of 0 inches is not meaningful; that is, the scope of the model does not include x = 0, so here the intercept b0 is not meaningful.

What do b0 and b1 estimate?

The simple linear regression model: Y_i = β0 + β1·x_i + ε_i, for i = 1, …, n.

The mean of the responses, E(Y i ), is a linear function of the x i. The errors, ε i, and hence the responses Y i, are independent. The errors, ε i, and hence the responses Y i, are normally distributed. The errors, ε i, and hence the responses Y i, have equal variances (σ 2 ) for all x values.
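These four conditions can be made concrete by simulating responses from the model; the parameter values below are invented for illustration:

```python
import random

random.seed(1)

beta0, beta1, sigma = -266.5, 6.14, 8.0   # hypothetical "true" parameters
xs = [60 + i for i in range(15)]          # fixed predictor values

# Each response is its mean E(Y_i) = beta0 + beta1*x_i plus an independent
# N(0, sigma^2) error: linear mean, independence, normality, equal variance.
ys = [beta0 + beta1 * x + random.gauss(0.0, sigma) for x in xs]
```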

What about (unknown) σ 2 ? It quantifies how much the responses (y) vary around the (unknown) mean regression line E(Y) = β 0 + β 1 x.

Will this thermometer yield more precise future predictions …?

… or this one?

Recall the “sample variance”: s² = Σ(y_i − ȳ)² / (n − 1). The sample variance estimates σ², the variance of the one population.

Estimating σ² in the regression setting The mean square error, MSE = Σ(y_i − ŷ_i)² / (n − 2), estimates σ², the common variance of the many populations (one population of responses at each x value).
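A sketch of the MSE computation (the function name, data, and fitted coefficients are all made up for illustration):

```python
def mean_square_error(xs, ys, b0, b1):
    """MSE = sum((y_i - yhat_i)^2) / (n - 2); the divisor is n - 2 because
    two coefficients (b0 and b1) were estimated from the data."""
    n = len(xs)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return sse / (n - 2)

# Hypothetical data near y = 2x, with fitted line yhat = 0.1 + 1.96*x.
est_var = mean_square_error([1, 2, 3, 4], [2.1, 3.9, 6.1, 7.9], 0.1, 1.96)
```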

Estimating σ 2 from Minitab’s fitted line plot

Estimating σ² from Minitab’s regression analysis

The regression equation is
weight = … height

Predictor        Coef   SE Coef   T   P
Constant          …        …      …   …
height            …        …      …   …

S = …   R-Sq = 89.7%   R-Sq(adj) = 88.4%

Analysis of Variance
Source           DF   SS   MS   F   P
Regression        …    …    …   …   …
Residual Error    …    …    …
Total             …    …

Inference for (or drawing conclusions about) β0 and β1: confidence intervals and hypothesis tests.

Relationship between state latitude and skin cancer mortality? Mortality rate of white males due to malignant skin melanoma. LAT = degrees (north) latitude of the center of the state; MORT = mortality rate due to malignant skin melanoma per 10 million people. The data set contains 49 states (#1 Alabama, Arizona, Arkansas, California, Colorado, …, #49 Wyoming), each with its LAT and MORT values.

(1−α)100% t-interval for the slope parameter β1 Formula in words: sample estimate ± (t-multiplier × standard error). Formula in notation: b1 ± t(α/2, n−2) × SE(b1), where SE(b1) = √(MSE / Σ(x_i − x̄)²).

Hypothesis test for the slope parameter β1 Null hypothesis H0: β1 = β* (some number). Alternative hypothesis HA: β1 ≠ β*. Test statistic: t* = (b1 − β*) / SE(b1). P-value = How likely is it that we’d get a test statistic t* as extreme as we did if the null hypothesis were true? The P-value is determined by referring to a t-distribution with n − 2 degrees of freedom.
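The test of H0: β1 = 0 can be sketched end to end. The data below are hypothetical, and the hard-coded 2.306 is the standard t(0.025, 8) critical value for n = 10, i.e. n − 2 = 8 degrees of freedom:

```python
import math

def slope_test(xs, ys):
    """Return (b1, se_b1, t_star) for testing H0: beta1 = 0."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    mse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    se_b1 = math.sqrt(mse / sxx)
    return b1, se_b1, b1 / se_b1

# Hypothetical height (in) and weight (lb) data.
xs = [63, 64, 66, 69, 69, 71, 71, 72, 73, 75]
ys = [127, 121, 142, 157, 162, 156, 169, 165, 181, 208]
b1, se_b1, t_star = slope_test(xs, ys)

# Two-sided test at alpha = 0.05: reject H0 if |t*| > t(0.025, 8) = 2.306.
reject = abs(t_star) > 2.306
```

In practice the exact P-value would come from a t-distribution routine (e.g. SciPy's `scipy.stats.t.sf`) rather than a table lookup.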

Inference for the slope parameter β1 in Minitab

The regression equation is
Mort = … Lat

Predictor        Coef   SE Coef   T   P
Constant          …        …      …   …
Lat               …        …      …   …

S = …   R-Sq = 68.0%   R-Sq(adj) = 67.3%

Analysis of Variance
Source           DF   SS   MS   F   P
Regression        …    …    …   …   …
Residual Error    …    …    …
Total             …    …

(1−α)100% t-interval for the intercept parameter β0 Formula in words: sample estimate ± (t-multiplier × standard error). Formula in notation: b0 ± t(α/2, n−2) × SE(b0).

Hypothesis test for the intercept parameter β0 Null hypothesis H0: β0 = β* (some number). Alternative hypothesis HA: β0 ≠ β*. Test statistic: t* = (b0 − β*) / SE(b0). P-value = How likely is it that we’d get a test statistic t* as extreme as we did if the null hypothesis were true? The P-value is determined by referring to a t-distribution with n − 2 degrees of freedom.

Inference for the intercept parameter β0 in Minitab

The regression equation is
Mort = … Lat

Predictor        Coef   SE Coef   T   P
Constant          …        …      …   …
Lat               …        …      …   …

S = …   R-Sq = 68.0%   R-Sq(adj) = 67.3%

Analysis of Variance
Source           DF   SS   MS   F   P
Regression        …    …    …   …   …
Residual Error    …    …    …
Total             …    …

What assumptions? The intervals and tests depend on the assumption that the error terms (and thus the responses) follow a normal distribution. This is not a big deal if the error terms (and thus responses) are only approximately normal. If we have a large sample, the error terms can even deviate far from normality.

Basic regression analysis output in Minitab: select Stat > Regression > Regression…, specify the Response (y) and Predictor (x), and click OK.