Simple Linear Regression

Chapter 4: Simple Linear Regression

Learning Objectives: Understand the goals of simple linear regression analysis; consider what the error term contains; define the population regression model and the sample regression function; estimate the sample regression function; interpret the estimated sample regression function; predict outcomes based on our estimated sample regression function; assess the goodness-of-fit of the estimated sample regression function; understand how to read regression output in Excel; understand the difference between correlation and causation.

Understand the Goals of Simple Linear Regression Analysis: Regression analysis is used to (1) obtain the marginal effect that a one-unit change in the independent variable has on the dependent variable, and (2) predict the value of the dependent variable based on the value of the independent variable. Dependent (or explained) variable: the variable we wish to explain. Independent (or explanatory) variable: the variable used to explain the dependent variable.

Simple Linear Regression Model: The term "simple" means that there is only one independent variable, x. The relationship between x and y is described by a linear function. "Regression" refers to the manner in which the relationship is estimated. Changes in y are assumed to be caused by changes in x (although this is not typically the case).

Types of Regression Models [Figure: four scatter plots showing a positive linear relationship, a negative linear relationship, a relationship that is not linear, and no relationship.]

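Population Linear Regression Model: The population regression model is \( y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \), where \( y_i \) is the dependent variable, \( \beta_0 \) is the population y intercept, \( \beta_1 \) is the population slope coefficient, \( x_i \) is the independent variable, and \( \varepsilon_i \) is the random error term. The term \( \beta_0 + \beta_1 x_i \) is the linear component and \( \varepsilon_i \) is the random error component.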
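To make the two components concrete, here is a minimal sketch that simulates observations from a population model of this form. The parameter values, the sample size, and the choice of normally distributed errors are illustrative assumptions, not values from the slides.

```python
import random

# Assumed (hypothetical) population parameters; not taken from the slides.
BETA_0 = 5.0   # population y intercept
BETA_1 = 2.0   # population slope coefficient
SIGMA = 1.5    # standard deviation of the random error (assumed normal)

random.seed(0)  # fixed seed so the sketch is reproducible

x = [float(i) for i in range(1, 11)]
epsilon = [random.gauss(0.0, SIGMA) for _ in x]            # random error component
linear_part = [BETA_0 + BETA_1 * xi for xi in x]           # linear component
y = [lin + eps for lin, eps in zip(linear_part, epsilon)]  # observed values

for xi, lin, eps, yi in zip(x, linear_part, epsilon, y):
    print(f"x={xi:4.1f}  linear={lin:6.2f}  error={eps:6.2f}  y={yi:6.2f}")
```

Each observed y equals its point on the population line plus its own error draw, which is why observations scatter around the line rather than falling on it.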

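Population Linear Regression [Figure: the population regression line with intercept \( \beta_0 \) and slope \( \beta_1 \), plotted with x on the horizontal axis and y on the vertical axis. For a given \( x_i \), the random error \( \varepsilon_i \) is the vertical distance between the observed value of y and the predicted value of y on the line.]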

Consider What the Random Error Component, ε, Contains: Omitted Variables – independent variables that are related to the dependent variable, y, but are not in the regression model (i.e. they are omitted). Measurement Error – the difference between the measured value of the observation and the true value. This can occur if there is a data entry error or if a person, firm, etc. does not know the true value and instead reports an incorrect value.

Consider What the Random Error Component, ε, Contains (continued): Incorrect Functional Form – the wrong model is fit to the data. For example, a linear function is fit between y and x but the true relationship is quadratic. Random Component – the variable being studied is inherently random. Even if two people have the same number of years of education, they may earn different salaries due to random factors aside from the omitted factors listed above.

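Estimated Regression Function: The sample regression line provides an estimate of the population regression line: \( \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \), where \( \hat{y}_i \) is the estimated (or predicted) y value, \( \hat{\beta}_0 \) is the estimate of the regression intercept, \( \hat{\beta}_1 \) is the estimate of the regression slope, and \( x_i \) is the independent variable.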

What is a Residual? A residual is the difference between the observed value of y and the predicted value of y, \( y_i - \hat{y}_i \). It is the sample estimate of the error term, ε: the error term belongs to the population model, while the residual is computed from the sample.
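As a small illustration, the sketch below computes residuals for five made-up observations. The data and the coefficient values are hypothetical (they are not from salary.xls); the coefficients happen to be the least squares estimates for these five points.

```python
# Hypothetical data and fitted coefficients; not taken from salary.xls.
x = [10, 12, 14, 16, 18]          # e.g. years of education
y = [20, 30, 45, 50, 70]          # e.g. salary in thousands of dollars
b0_hat, b1_hat = -41.0, 6.0       # least squares estimates for these five points

# Predicted value for each observation, then observed minus predicted.
y_hat = [b0_hat + b1_hat * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

print(residuals)       # [1.0, -1.0, 2.0, -5.0, 3.0]
print(sum(residuals))  # 0.0
```

Positive residuals mark observations above the fitted line and negative residuals mark observations below it. With an intercept in the model, least squares residuals always sum to zero.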

Graph of the Sample Regression Function

Graph of Predictions and Residuals for Multiple Observations

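Estimate the Sample Regression Function: \( \hat{\beta}_0 \) and \( \hat{\beta}_1 \) are obtained by minimizing the sum of the squared residuals, \( \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 \), with respect to \( \hat{\beta}_0 \) and \( \hat{\beta}_1 \).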
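The minimization can be sketched with the usual first-order conditions. The symbol S for the sum of squared residuals is introduced here for illustration and does not appear in the slides.

```latex
\begin{align*}
S(\hat{\beta}_0, \hat{\beta}_1) &= \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right)^2 \\
\frac{\partial S}{\partial \hat{\beta}_0} &= -2 \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0 \\
\frac{\partial S}{\partial \hat{\beta}_1} &= -2 \sum_{i=1}^{n} x_i \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0
\end{align*}
```

The first condition says the residuals sum to zero. The second says the residuals are uncorrelated with x in the sample. Solving the two equations jointly gives the least squares intercept and slope.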

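The Least Squares Equation: The formulas for \( \hat{\beta}_1 \) and \( \hat{\beta}_0 \) are: \( \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \) and \( \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \).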
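The closed-form least squares formulas for the slope and intercept can be sketched in a few lines of plain Python. The function name and the five data points are hypothetical; the data are not from salary.xls.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) from the closed-form least squares formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: years of education and salary in thousands of dollars.
education = [10, 12, 14, 16, 18]
salary = [20, 30, 45, 50, 70]

b0_hat, b1_hat = fit_simple_ols(education, salary)
print(b0_hat, b1_hat)  # -41.0 6.0
```

For these points the sample means are 14 and 43, the sum of cross-deviations is 240, and the sum of squared x deviations is 40, so the slope is 240 / 40 = 6 and the intercept is 43 - 6(14) = -41.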

Interpretation of the Slope and the Intercept: \( \hat{\beta}_0 \) is, on average, the estimated value of y when x is equal to zero. \( \hat{\beta}_1 \) is, on average, the estimated change in the value of y as a result of a one-unit change in x.

Salary (y) vs. Education (x) Example in salary.xls

Example continued

A Graphical Representation of the Estimated Regression Line

Using Excel to Compute the Estimated Regression Equation in a Scatter Plot: Create a scatter diagram in Excel. Position the mouse over any data point and right-click. Select the Add Trendline option. When the Add Trendline dialog box appears: on the Type tab, select Linear (it is the default); on the Options tab, select the "Display equation on chart" box (note that the equation is displayed with the slope first and the intercept second). Click OK.

Interpret the Estimated Sample Regression Function: \( \hat{\beta}_1 = 11{,}257.58 \): On average, if education goes up by one year, then salary will go up by $11,257.58. \( \hat{\beta}_0 = -121{,}321.58 \): On average, if an individual has 0 years of education, then their estimated salary is -$121,321.58 (this estimate is obviously ridiculous).

Predict Outcomes Based on our Estimated Sample Regression Function: Say we want to predict salary for a person with 12 years of education. We put this value of x into the sample regression function as \( \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 (12) \). We predict a salary of $13,769.75 for a person with 12 years of education.
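The prediction step is a single substitution, sketched below using the intercept and slope as reported on the interpretation slide. Because those reported values are rounded to two decimals, the sketch gives about $13,769.38 rather than exactly $13,769.75; the assumption here is that the slide's figure was computed from unrounded coefficients. The function name is hypothetical.

```python
# Coefficients as reported (rounded to two decimals) on the interpretation slide.
intercept = -121_321.58
slope = 11_257.58


def predict_salary(years_of_education):
    """Predicted salary from the estimated sample regression function."""
    return intercept + slope * years_of_education


prediction = predict_salary(12)
print(f"${prediction:,.2f}")  # $13,769.38
```

The same function also shows why extrapolating far outside the observed range of education is unreliable: at 0 years it simply returns the negative intercept.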

Assess the Goodness-of-Fit of the Estimated Regression Function: Goodness-of-fit is how well the regression model describes the observed data. There are two measures of goodness-of-fit: R-squared and the standard error of the regression.

Comparing the Goodness-of-Fit of Two Hypothetical Data Sets

A Venn Diagram Demonstrating Joint Variation between y and x

The Sample Regression Function Explains None of the Variation in y

The Sample Regression Function Explains All of the Variation in y

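Explained and Unexplained Variation: Total variation in the dependent variable is made up of two parts: \( TSS = ESS + USS \), where the total sum of squares is \( TSS = \sum (y_i - \bar{y})^2 \), the explained sum of squares is \( ESS = \sum (\hat{y}_i - \bar{y})^2 \), and the unexplained sum of squares is \( USS = \sum (y_i - \hat{y}_i)^2 \). Here \( \bar{y} \) = average value of the dependent variable, \( y_i \) = observed values of the dependent variable, and \( \hat{y}_i \) = estimated value of y for the given x value.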

Explained and Unexplained Variation: TSS = total sum of squares. Measures the total variation of the \( y_i \) values around the mean of y. This is the numerator of the variance of y. ESS = explained sum of squares. Variation in y that is explained by the independent variable x. USS = unexplained sum of squares. Variation in y attributable to factors other than the relationship between x and y.

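Explained and Unexplained Variation [Figure: for an observation at \( x_i \), the total deviation \( y_i - \bar{y} \) splits into an unexplained part \( y_i - \hat{y}_i \) and an explained part \( \hat{y}_i - \bar{y} \). Labels: \( TSS = \sum (y_i - \bar{y})^2 \), \( USS = \sum (y_i - \hat{y}_i)^2 \), \( ESS = \sum (\hat{y}_i - \bar{y})^2 \).]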
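The decomposition of total variation into explained and unexplained parts can be checked numerically. The sketch below uses five hypothetical observations (not from salary.xls), fits the least squares line, and computes the three sums of squares.

```python
# Hypothetical data; not taken from salary.xls.
x = [10, 12, 14, 16, 18]
y = [20, 30, 45, 50, 70]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept.
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)              # total sum of squares
ess = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained sum of squares
uss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) # unexplained sum of squares

print(tss, ess, uss)  # 1480.0 1440.0 40.0
```

Here 1440 + 40 = 1480, so the explained and unexplained parts add up to the total. This identity holds for a least squares line that includes an intercept.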

Coefficient of Determination, \( R^2 \): The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable. The coefficient of determination is also called R-squared and is denoted as \( R^2 \): \( R^2 = \frac{ESS}{TSS} = 1 - \frac{USS}{TSS} \), where \( 0 \le R^2 \le 1 \).

Coefficient of Determination, \( R^2 \): Note: In simple linear regression, \( R^2 \) is equal to the correlation coefficient squared: \( R^2 = r_{xy}^2 \), where \( R^2 \) = coefficient of determination and \( r_{xy} \) = correlation coefficient between x and y.

How are the Correlation Coefficient and the Coefficient of Determination Related? \( R^2 = r_{xy}^2 \). Note that this relationship only occurs with simple linear regression.
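A quick numerical check of this relationship is sketched below. The function name and data are hypothetical. The coefficient of determination is computed from the sums of squares and the correlation coefficient from its own formula, so the agreement is not built in by construction.

```python
import math


def r_and_r_squared(x, y):
    """Return (correlation coefficient, coefficient of determination)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Least squares fit and unexplained sum of squares.
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    uss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

    r_xy = sxy / math.sqrt(sxx * syy)
    r_squared = 1 - uss / syy
    return r_xy, r_squared


# Hypothetical data with an upward trend, then the same data flipped in sign.
x = [10, 12, 14, 16, 18]
y_up = [20, 30, 45, 50, 70]
y_down = [-v for v in y_up]

r_up, r2_up = r_and_r_squared(x, y_up)
r_down, r2_down = r_and_r_squared(x, y_down)
print(round(r_up, 4), round(r2_up, 4))
print(round(r_down, 4), round(r2_down, 4))
```

Flipping the sign of y flips the sign of the correlation coefficient but leaves the coefficient of determination unchanged, because squaring discards the direction of the relationship.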

What is the Intuition Behind This Relationship? In the case of a linear relationship between two variables, both the coefficient of determination and the sample correlation coefficient provide measures of the strength of the relationship. The coefficient of determination provides a measure between 0 and 1, whereas the correlation coefficient provides a measure between -1 and 1. The coefficient of determination can be used for nonlinear relationships and for relationships that have two or more independent variables. Why might the correlation coefficient be preferred to the coefficient of determination?

Examples of Approximate \( R^2 \) Values [Figure: two scatter plots, one with a positive slope and one with a negative slope, each with \( R^2 = 1 \).] Perfect linear relationship between x and y: 100% of the variation in y is explained by variation in x. Note that \( R^2 \) is positive even if the line has a negative slope.

Examples of Approximate \( R^2 \) Values [Figure: two scatter plots with \( 0 < R^2 < 1 \).] Weaker linear relationship between x and y: some but not all of the variation in y is explained by variation in x.

Examples of Approximate \( R^2 \) Values [Figure: scatter plot with \( R^2 = 0 \).] No linear relationship between x and y: the value of y does not depend on x (none of the variation in y is explained by variation in x).

What does \( R^2 \) mean? A given value of \( R^2 \) means that \( R^2 \times 100\% \) of the variation in y is explained by x. For example, if \( R^2 = 0.85 \), we would say that 85% of the variation in y is explained by x.

Calculating \( R^2 \) for the salary.xls example: \( R^2 = 0.6373 \). This says 63.73% of the variation in salary is explained by education.

Using Excel to Compute the Coefficient of Determination: Position the mouse pointer over any data point in the scatter diagram and right-click to display the chart menu. Select the Add Trendline option. When the Add Trendline dialog box appears: on the Options tab, select the "Display R-squared value on chart" box and click OK.

The Standard Error of the Estimated Sample Regression Function: The standard error of the regression function measures, on average, how far the points fall away from the regression line: \( s = \sqrt{\frac{USS}{n - k - 1}} \), where n = the number of observations and k = the number of explanatory variables. In simple linear regression k = 1.
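A minimal sketch of the calculation follows, assuming the standard error is the square root of the unexplained sum of squares divided by n - k - 1. The function name and data are hypothetical (the data are not from salary.xls).

```python
import math


def regression_standard_error(x, y, k=1):
    """Standard error of a simple least squares regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    # Unexplained sum of squares: squared residuals around the fitted line.
    uss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(uss / (n - k - 1))


# Hypothetical data: five observations, one explanatory variable.
x = [10, 12, 14, 16, 18]
y = [20, 30, 45, 50, 70]

s = regression_standard_error(x, y)
print(round(s, 4))
```

For these points the unexplained sum of squares is 40 and n - k - 1 = 3, so the result is the square root of 40 / 3. The value is in the same units as y, which makes it easier to judge against the scale of the data than \( R^2 \) alone.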

Calculation of the Standard Error for the salary.xls Example

Reading Regression Output in Excel: Intercept and Slope

Reading Regression Output in Excel: \( R^2 \). 63.73% of the variation in salary is explained by the variation in education. [Figure: Excel regression output, with the sum-of-squares rows labeled Explained, Unexplained, and Total.]

Reading Regression Output in Excel: Standard Error [Figure: Excel regression output, with the sum-of-squares rows labeled Explained, Unexplained, and Total.]

Excel’s Regression Tool: Select the Tools menu. Choose the Data Analysis option. Choose Regression from the list of Analysis Tools. Enter the y data in the Input Y Range. Enter the x data in the Input X Range. Select Labels. Select an Output Range in the sheet. Click OK.

Understand the Difference between Correlation and Causation: Correlation is when there is a linear relationship between two random variables. Causation occurs between two random variables when changes in one variable (say x) cause changes in another variable (say y). Spurious correlation occurs when the correlation between two random variables results from the relationship each has with a third random variable.

Understand the Difference between Correlation and Causation: Just because there is correlation between two random variables, it does not mean there is causation. Examples: A larger number of firemen at a fire is linked to greater monetary damages from the fire. The number of shark attacks and ice cream sales are positively related. Students who are tutored tend to get worse grades than students who are not tutored. See Google Correlate for more real-world examples of this phenomenon.
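The shark attack and ice cream example can be sketched as a small simulation. Everything here is an illustrative assumption: the slides do not name the third variable, so temperature is used as a hypothetical common cause, and all coefficients and noise levels are made up.

```python
import math
import random


def correlation(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def residuals_from_fit(x, y):
    """Residuals from a simple least squares regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical third variable that drives both outcomes.
temperature = [random.uniform(10, 35) for _ in range(200)]
# Neither outcome depends on the other; both depend only on temperature.
ice_cream_sales = [5.0 * t + random.gauss(0, 10) for t in temperature]
shark_attacks = [0.3 * t + random.gauss(0, 1) for t in temperature]

raw_r = correlation(ice_cream_sales, shark_attacks)
# Correlation left over once temperature is accounted for in both series.
adjusted_r = correlation(
    residuals_from_fit(temperature, ice_cream_sales),
    residuals_from_fit(temperature, shark_attacks),
)
print(f"raw correlation: {raw_r:.2f}")
print(f"after removing temperature: {adjusted_r:.2f}")
```

In this setup the raw correlation is strongly positive even though neither series affects the other, and it largely disappears once the common driver is removed from both.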