Chapter 11 Linear Regression Straight Lines, Least-Squares and More Chapter 11A Can you pick out the straight lines and find the least-square?

The Chapter in Outline It’s all about explaining variability

11-1 Empirical Models Empirical: derived from observation rather than theory. Regression analysis: a mathematical optimization technique which, when given a series of observed data, attempts to find a function that closely approximates the data, a "best fit." I let the data do the talking.

The General Idea - Example

A Straight Line The linear equation: y = β0 + β1x + ε, where y is the dependent variable, β0 the constant (y-intercept), β1 the slope, x the independent variable, and ε the random error.

The Statistical Model This model is linear in β0 and β1, not necessarily in the predictor variable x. Assume ε is a random variable with E[ε] = 0 and V[ε] = σ².

The Problem Given n (paired) data points (x1, y1), (x2, y2), …, (xn, yn), fit a straight line to the data. That is, find values of β0 and β1 such that yi (actual) and ŷi (predicted) are somehow "close."

The Error Terms

The Method of Least Squares As in maximum likelihood methods, we treat the x's as constants and the parameters as the variables.

Let’s do some more math… Setting the partial derivatives to zero yields the normal equations: nβ̂0 + β̂1Σxi = Σyi and β̂0Σxi + β̂1Σxi² = Σxiyi.

and even more math …

More Method of Least Squares A useful way to think of the solution for β1.

Some Notation SXX = Σ(xi − x̄)² and SXY = Σ(xi − x̄)(yi − ȳ).

The Least-Squares Estimates β̂1 = SXY / SXX and β̂0 = ȳ − β̂1x̄.
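The closed-form estimates can be sketched in a few lines of Python. The data points below are made up purely for illustration; nothing here comes from the text's examples.

```python
# Least-squares fit for simple linear regression (illustrative sketch).
# The (x, y) data are hypothetical, not from the text.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# S_XX and S_XY as in the notation slide
sxx = sum((x - xbar) ** 2 for x in xs)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))

b1 = sxy / sxx           # slope estimate, beta1-hat = S_XY / S_XX
b0 = ybar - b1 * xbar    # intercept estimate, beta0-hat = ybar - b1*xbar
print(b0, b1)
```

For these made-up points the slope comes out near 2 and the intercept near 0, which is what the data were constructed to show.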

Estimating σ² SSE = Σ(yi − ŷi)², which means that an unbiased estimator of the variance is σ̂² = SSE / (n − 2), the mean square error (MSE). The text says that it would be tedious to compute SSE as in the equation at the top, so we are offered a computational form, SSE = SST − β̂1SXY. My comment: the computational form is conceptually important as well.
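Both routes to SSE, the "tedious" sum of squared residuals and the computational shortcut SSE = SST − β̂1·SXY, give the same number; a sketch on the same hypothetical data as before:

```python
# Estimating sigma^2 via the mean square error (illustrative sketch;
# data are hypothetical, not from the text).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = ybar - b1 * xbar

# Direct ("tedious") form: sum of squared residuals
sse_direct = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Computational shortcut: SSE = SST - b1 * S_XY
sst = sum((y - ybar) ** 2 for y in ys)
sse_short = sst - b1 * sxy

mse = sse_direct / (n - 2)   # unbiased estimator of sigma^2
```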

Partitioning and Explaining Variability The objective in predictive modeling is to explain the variability in the data. The computational form for SSE is our first example of this. SST is the total variability of the data about the mean. It gets broken into two parts: variability explained by the regression and that left unexplained as error. A good predictor explains a lot of variability.

Partitioning Variability SST = SSR + SSE. This identity comes up in many contexts. SST: total variability about the mean in the data. SSR: variability explained by the model. SSE: variability left unexplained, attributed to error. Note that under our distributional assumptions SSR and SSE are sums of squares of normal variates.
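The partition can be checked numerically: for a least-squares fit, the cross term vanishes and SST equals SSR plus SSE exactly. A sketch on hypothetical data:

```python
# Checking the partition SST = SSR + SSE on made-up data
# (illustrative only; not from the text).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * x for x in xs]

sst = sum((y - ybar) ** 2 for y in ys)                 # total variability
ssr = sum((yh - ybar) ** 2 for yh in yhat)             # explained by model
sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))    # left unexplained
print(sst, ssr + sse)
```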

Bonus Slide For the discriminating and overachieving student: derive the above by first starting with the identity (yi − ȳ) = (ŷi − ȳ) + (yi − ŷi). Then rearrange terms, square both sides, and simplify.

Problem 11-5 (y = pounds, in 1,000s, of steam used per month; n = 12; Σxi = 558)
SXY = Σxiyi − 558 · Σyi / 12 = …
SXX = Σxi² − 558 · 558 / 12 = 3309
SST = Σyi² − (Σyi)² / 12 = …
SSE = SST − 9.21 · SXY = 37.75 ⇒ MSE = 37.75 / 10 ≈ 3.8
b1 = SXY / 3309 = 9.21
b0 = (Σyi / 12) − 9.21 · (558 / 12) = …

Problem 11-5, Minitab
The regression equation is Usage = … + … Temp
Predictor   Coef   SE Coef   T   P
Constant    …
Temp        …
S = …   R-Sq = 100.0%   R-Sq(adj) = 100.0%
Analysis of Variance
Source           DF   SS   MS   F   P
Regression        …   (SSR)
Residual Error    …   (SSE)
Total             …   (SST)
Note: (272.64)² = …

Bias and Variance of the Estimators Since the betas are functions of the observations, they are random variables. Properties of the regression parameters are implied by the properties of the observations and the algebra of expected values and variances. Our assumption is that the error term has a mean of zero and a variance of σ². Remember we treat the x values as fixed constants.

Expected Values The ki terms come in handy now.

Variance Terms – β1 The ki terms still come in handy.

Variance Terms – β0 But the Yi are all independent, so that
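Plugging σ̂² = MSE into the variance formulas V[β̂1] = σ²/SXX and V[β̂0] = σ²(1/n + x̄²/SXX) gives the standard errors that regression software reports. A sketch on the same hypothetical data used earlier:

```python
# Standard errors of the least-squares estimates (illustrative sketch;
# data are hypothetical, not from the text).
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = ybar - b1 * xbar
mse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)

# V[b1] = sigma^2 / S_XX;  V[b0] = sigma^2 * (1/n + xbar^2 / S_XX)
se_b1 = math.sqrt(mse / sxx)
se_b0 = math.sqrt(mse * (1.0 / n + xbar ** 2 / sxx))
```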

Variance Terms (Problem 11-5, Minitab output as above)
σ² estimate = 3.8
SXX from Excel version = 3309
x̄ = 558/12 = 46.5
se(β̂1) = sqrt(3.8 / 3309) ≈ 0.034
se(β̂0) = sqrt(3.8 · [1/12 + 46.5² / 3309]) = 1.67

Introducing Distributional Assumptions NID(0, σ²) is the basic assumption. This implies that the error terms in our equation are independent, normally distributed with mean 0, and constant variance σ². Our work at the end of this chapter will focus on how we can verify or test some of the assumptions by examining the estimates of the error. When these assumptions are valid, we can build confidence and prediction intervals on the regression line and new predictions. We can also create hypothesis tests on the coefficients: we know the estimates are unbiased, and we know their standard error and can estimate it.

Hypothesis Tests on the Coefficients Two-sided test Recall the definition of a t variate as a standard normal divided by the square root of a chi-square divided by its degrees of freedom.

Hypothesis Tests on the Coefficients Similarly, we can write out a test statistic for β0. Does the regression have any predictive value? Test H0: β1 = 0 versus H1: β1 ≠ 0. Failing to reject can mean x is not a good predictor, or the relation is not linear.

Use of t-Tests An important special case of these hypotheses is H0: β1 = 0 versus H1: β1 ≠ 0. These hypotheses relate to the significance of regression. Failure to reject H0 is equivalent to concluding that there is no linear relationship between x and Y.
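The significance test uses T0 = β̂1 / sqrt(σ̂²/SXX), compared with a t critical value on n − 2 degrees of freedom. A sketch on the hypothetical data used earlier; the two-sided 5% critical value for 3 degrees of freedom, 3.182, comes from a standard t table:

```python
# t-test for significance of regression, H0: beta1 = 0
# (illustrative sketch; data are hypothetical, not from the text).
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
b0 = ybar - b1 * xbar
mse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)

t0 = b1 / math.sqrt(mse / sxx)   # test statistic on n-2 = 3 d.f.
t_crit = 3.182                   # t_{0.025, 3} from a t table
reject = abs(t0) > t_crit        # True: slope is significantly nonzero
```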

Figure 11-5 The hypothesis H0: β1 = 0 is not rejected.

Figure 11-6 The hypothesis H0: β1 = 0 is rejected.

Analysis of Variance Approach to Test Significance of Regression If the null hypothesis H0: β1 = 0 is true, the statistic F0 = MSR / MSE follows the F1,n−2 distribution, and we would reject H0 if f0 > fα,1,n−2.

F-Test of Regression Significance Note that the numerator and denominator are divided by their degrees of freedom. As pointed out in the text, the F-test on the regression and the t-test on the b1 coefficient are completely equivalent, not just close or likely to produce the same answer.
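The equivalence is easy to verify numerically: since SSR = β̂1·SXY = β̂1²·SXX, the F statistic equals the square of the t statistic. A sketch on the hypothetical data used earlier:

```python
# Equivalence of the F-test and the t-test on the slope
# (illustrative sketch; data are hypothetical, not from the text).
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = ybar - b1 * xbar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
mse = sse / (n - 2)

msr = b1 * sxy                   # SSR on 1 degree of freedom
f0 = msr / mse                   # F statistic on (1, n-2) d.f.
t0 = b1 / math.sqrt(mse / sxx)   # t statistic on n-2 d.f.
print(f0, t0 ** 2)               # identical up to rounding
```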

Analysis of Variance Approach to Test Significance of Regression The quantities MSR and MSE are called mean squares. Analysis of variance table:

Problem 11-24/27
The regression equation is y = … + … x
Predictor   Coef   SE Coef   T   P
Constant    …
x           …
S = …   R-Sq = 80.0%   R-Sq(adj) = 78.2%
Analysis of Variance
Source           DF   SS   MS   F   P
Regression        …
Residual Error    …
Total             …
Unusual Observations
Obs   x   y   Fit   SE Fit   Residual   St Resid
…   R
R denotes an observation with a large standardized residual

A Complete Example – Problem 11-3 The following are NFL quarterback ratings for the 2004 season. It is suspected that the rating (y) is related to the average number of yards gained per pass attempt (x). A prob-stat student generating sample data.

Problem 11-3 – some calculations

More of a Complete Example

Problem 11-3 – a graph

Problem 11-3 – some questions (b) Find an estimate for the mean rating if a quarterback averages 7.5 yards per attempt. (c) What change in the mean rating is associated with a decrease of one yard per attempt? (d) To increase the mean rating by 10 points, how much increase in the average yards per attempt must be generated? (e) Given that x = 7.21 yards (M. Vick), find the fitted value of y and the corresponding residual.

Excel Regression Output
SUMMARY OUTPUT
Regression Statistics
Multiple R          …
R Square            …
Adjusted R Square   …
Standard Error      …
Observations        30
ANOVA
             df   SS   MS   F   Significance F
Regression    …    …    …   …   …E-11
Residual      …    …    …
Total         …    …
              Coefficients   Standard Error   t Stat   P-value
Intercept              …                …        …         …
Yds per Att            …                …        …     …E-11

Problem (a) Test for significance of the regression at the 1% level. From prob-calculator: F0.01,1,28 ≈ 7.64.

Problem (b) Estimate the standard error of the slope and intercept

Problem (c) Test H0: β1 = 10 (two-tailed).
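For a null value other than zero, the numerator of the statistic shifts: t0 = (β̂1 − β1,0) / se(β̂1), still on n − 2 degrees of freedom. A sketch with made-up summary values (the transcript does not reproduce the numbers for this problem, so everything below is hypothetical):

```python
# Two-sided t-test of H0: beta1 = 10 (illustrative sketch;
# all summary values below are made up, NOT from Problem (c)).
import math

b1 = 4.28       # fitted slope (hypothetical)
mse = 3.1       # mean square error (hypothetical)
sxx = 9.9       # S_XX (hypothetical)
n = 20          # sample size (hypothetical)
beta1_0 = 10.0  # hypothesized slope under H0

se_b1 = math.sqrt(mse / sxx)
t0 = (b1 - beta1_0) / se_b1   # compare |t0| with t_{alpha/2, n-2}
```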

Next Time Restoring our Confidence (intervals) Join us next time when we return to an old favorite – the confidence interval.