Chapter 17 Simple Linear Regression and Correlation
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Linear Regression Analysis…
Regression analysis is used to predict the value of one variable (the dependent variable) on the basis of other variables (the independent variables).
Dependent variable: denoted Y.
Independent variables: denoted X1, X2, …, Xk.
If we have only ONE independent variable, the model is y = β0 + β1x + ε, which is referred to as simple linear regression. We are interested in estimating β0 and β1 from the data we collect.

Linear Regression Analysis…
Variables:
X = independent variable (we provide this)
Y = dependent variable (we observe this)
Parameters:
β0 = y-intercept
β1 = slope
ε ~ normal random variable (με = 0, σε = ???) [noise]
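To make the model concrete, here is a minimal sketch that simulates data from y = β0 + β1x + ε. The parameter values and the house-size setting are invented for illustration (the slides leave σε unspecified).

```python
# Minimal sketch: simulate data from y = beta0 + beta1*x + epsilon.
# All parameter values here are made up for illustration.
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma = 25.0, 0.075, 5.0     # hypothetical intercept, slope, noise sd
x = rng.uniform(1000, 3000, size=100)      # e.g. house size in square feet
eps = rng.normal(0.0, sigma, size=100)     # epsilon ~ Normal(mean 0, sd sigma)
y = beta0 + beta1 * x + eps                # observed dependent variable
```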

Effect of Larger Values of σε
House Price = 25,000 + 75(Size) + ε
[Figure: house price vs. house size under lower vs. higher variability.] The same square footage can sell at different price points (e.g. décor options, cabinet upgrades, lot location…).

Theoretical Linear Model
y = β0 + β1x + ε

Building the Model – Collect Data
Test 2 Grade = β0 + β1(Test 1 Grade) + ε
From the data we estimate β0, estimate β1, and estimate σε.

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc Linear Regression Analysis…

Correlation Analysis… (–1 < ρ < 1)
If we are interested only in determining whether a relationship exists, we employ correlation analysis. Example: a student's height and weight.

Correlation Analysis… (–1 < ρ < 1)
If the correlation coefficient is close to +1, there is a strong positive relationship.
If the correlation coefficient is close to –1, there is a strong negative relationship.
If the correlation coefficient is close to 0, there is no correlation.
We have the ability to test the hypothesis H0: ρ = 0.
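As a sketch of this test, scipy's pearsonr returns both the sample correlation and the two-tail p-value for H0: ρ = 0; the height/weight numbers below are invented for illustration.

```python
# Sketch: estimate rho and test H0: rho = 0 with scipy.
import numpy as np
from scipy import stats

height = np.array([160, 165, 170, 175, 180, 185, 190])  # illustrative data
weight = np.array([55, 62, 66, 70, 78, 83, 90])

r, p_value = stats.pearsonr(height, weight)  # sample r and two-tail p-value
print(f"r = {r:.3f}, p = {p_value:.4f}")     # small p => reject H0: rho = 0
```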

Regression: Model Types… (X = size of house, Y = cost of house)
Deterministic model: an equation or set of equations that allows us to fully determine the value of the dependent variable from the values of the independent variables. E.g. y = $25,000 + ($75/ft²)(x); area of a circle: A = πr².
Probabilistic model: a method used to capture the randomness that is part of a real-life process. E.g. y = 25,000 + 75x + ε. (Do all houses of the same size, measured in square feet, sell for exactly the same price?)

Simple Linear Regression Model… Meaning of β0 and β1
β1 > 0 [positive slope]; β1 < 0 [negative slope]
β1 = slope (= rise/run); β0 = y-intercept
[Figure: a line on x–y axes annotated with rise, run, and the y-intercept.]

Which line has the best "fit" to the data?
[Figure: scatter plot with several candidate lines.]

Estimating the Coefficients…
In much the same way we base estimates of μ on x̄, we estimate β0 with b0 and β1 with b1, the y-intercept and slope (respectively) of the least squares or regression line given by:
ŷ = b0 + b1x
(This is an application of the least squares method, and it produces a straight line that minimizes the sum of the squared differences between the points and the line.)

Least Squares Line…
This line minimizes the sum of the squared differences between the points and the line; these differences are called residuals or errors.
…but where did the line equation come from? How did we get .934 for the y-intercept, and the stated value for the slope?

Least Squares Line… [sure glad we have computers now!]
The coefficients b1 and b0 for the least squares line are calculated as:
b1 = s_xy / s_x², where s_xy = Σ(x_i – x̄)(y_i – ȳ) / (n – 1) and s_x² = Σ(x_i – x̄)² / (n – 1)
b0 = ȳ – b1·x̄
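A minimal sketch of these formulas on a small invented dataset (the slide's own data table did not survive extraction):

```python
# Sketch: least squares slope and intercept from sample covariance/variance.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative data
y = np.array([2.1, 2.9, 4.2, 5.1, 5.8])

s_xy = np.cov(x, y, ddof=1)[0, 1]   # sample covariance s_xy
s_x2 = np.var(x, ddof=1)            # sample variance s_x^2
b1 = s_xy / s_x2                    # slope
b0 = y.mean() - b1 * x.mean()       # intercept
print(f"y-hat = {b0:.3f} + {b1:.3f} x")
```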

Least Squares Line… See if you can estimate the y-intercept and slope from this data.
[The slide's table of data points (x, y) and the fitted line equation did not survive extraction.]
Recall the formulas for b1 and b0 above…

Least Squares Line… See if you can estimate the y-intercept and slope from the plot.
[Figure: scatter plot of the same data with the fitted line.]

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc Excel: Data Analysis - Regression
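The same output can be produced outside Excel; for instance, Python's statsmodels prints a comparable regression summary. The odometer/price numbers below are placeholders, not the textbook's dataset.

```python
# Sketch: a regression summary comparable to Excel's Data Analysis > Regression.
import numpy as np
import statsmodels.api as sm

x = np.array([19.1, 24.5, 30.2, 35.8, 40.1, 44.3, 49.2])  # hypothetical odometer (000s miles)
y = np.array([16.0, 15.6, 15.1, 14.8, 14.4, 14.1, 13.7])  # hypothetical price ($000s)

X = sm.add_constant(x)        # adds the intercept column
results = sm.OLS(y, X).fit()  # ordinary least squares fit
print(results.summary())      # coefficients, standard errors, t-stats, p-values, R-squared
```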

Excel: Plotted Regression Model – you will need to play around with the chart options to get the plot to look good.

Required Conditions…
For these regression methods to be valid, the following four conditions for the error variable (ε) must be met:
The probability distribution of ε is normal.
The mean of the distribution is 0; that is, E(ε) = 0.
The standard deviation of ε is σε, which is a constant regardless of the value of x.
The value of ε associated with any particular value of y is independent of the ε associated with any other value of y.

Assessing the Model…
The least squares method will always produce a straight line, even if there is no relationship between the variables, or if the relationship is something other than linear. Hence, in addition to determining the coefficients of the least squares line, we need to assess it to see how well it "fits" the data. We'll see these evaluation methods now; they're based on what is called the sum of squares for errors (SSE).

Sum of Squares for Error (SSE – another thing to calculate)…
The sum of squares for error is calculated as:
SSE = Σ(y_i – ŷ_i)²
and is used in the calculation of the standard error of estimate:
s_ε = √( SSE / (n – 2) )
If s_ε is zero, all the points fall on the regression line.
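A short sketch of these two calculations on invented data:

```python
# Sketch: SSE and the standard error of estimate s_eps = sqrt(SSE / (n - 2)).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative data
y = np.array([2.1, 2.9, 4.2, 5.1, 5.8])

b1, b0 = np.polyfit(x, y, 1)              # least squares slope, intercept
residuals = y - (b0 + b1 * x)             # e_i = y_i - y-hat_i
sse = np.sum(residuals ** 2)              # sum of squares for error
s_eps = np.sqrt(sse / (len(x) - 2))       # standard error of estimate
print(f"SSE = {sse:.4f}, s_eps = {s_eps:.4f}")
```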

Standard Error…
If s_ε is small, the fit is excellent and the linear model should be used for forecasting. If s_ε is large, the model is poor… But what is small and what is large?

Standard Error…
Judge the value of s_ε by comparing it to the sample mean of the dependent variable (ȳ). In this example, s_ε = .3265, which is small relative to ȳ, so (relatively speaking) it appears to be "small"; hence our linear regression model of car price as a function of odometer reading is "good".

Testing the Slope… (Excel output does this for you.)
If no linear relationship exists between the two variables, we would expect the regression line to be horizontal, that is, to have a slope of zero. We want to see if there is a linear relationship, i.e. we want to see if the slope (β1) is something other than zero.
Our research hypothesis becomes H1: β1 ≠ 0, and thus the null hypothesis becomes H0: β1 = 0. (Already discussed!)

Testing the Slope…
We can use this test statistic to test our hypotheses (H0: β1 = 0):
t = (b1 – β1) / s_b1
where s_b1 is the standard deviation of b1, defined as:
s_b1 = s_ε / √( (n – 1)·s_x² )
If the error variable (ε) is normally distributed, the test statistic has a Student t-distribution with n – 2 degrees of freedom. The rejection region depends on whether we're doing a one- or two-tail test (a two-tail test is most typical).
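Putting the pieces together, a sketch of the slope t-test on invented data (not the textbook's used car dataset):

```python
# Sketch: t = b1 / s_b1 under H0: beta1 = 0, with n - 2 degrees of freedom.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative data
y = np.array([2.1, 2.9, 4.2, 5.1, 5.8])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
sse = np.sum((y - (b0 + b1 * x)) ** 2)
s_eps = np.sqrt(sse / (n - 2))                        # standard error of estimate
s_b1 = s_eps / np.sqrt((n - 1) * np.var(x, ddof=1))   # standard deviation of b1

t = b1 / s_b1                                 # test statistic
p = 2 * stats.t.sf(abs(t), df=n - 2)          # two-tail p-value
print(f"t = {t:.2f}, p = {p:.4f}")
```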

Example 17.4…
Test to determine whether the slope is significantly different from 0 (at the 5% significance level).
We want to test H1: β1 ≠ 0 against H0: β1 = 0. (If the null hypothesis is true, no linear relationship exists.)
The rejection region is |t| > t_{α/2, n–2}, OR check the p-value.

Example 17.4…
We can compute t manually or refer to our Excel output. We see that the t statistic for "odometer" (i.e. the slope, b1) is –13.49, which falls well beyond the critical value, and the p-value is essentially 0. There is overwhelming evidence to infer that a linear relationship between odometer reading and price exists.

Testing the Slope…
We can also estimate (to some level of confidence) an interval for the slope parameter, β1. Recall that our estimate for β1 is b1. The confidence interval estimator is given as:
b1 ± t_{α/2} · s_b1
That is, we estimate that the slope coefficient lies between –.0768 and –.0570.
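The interval is a one-liner once s_b1 is in hand; a sketch, again on invented data:

```python
# Sketch: 95% confidence interval for the slope, b1 +/- t_crit * s_b1.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative data
y = np.array([2.1, 2.9, 4.2, 5.1, 5.8])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
s_eps = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
s_b1 = s_eps / np.sqrt((n - 1) * np.var(x, ddof=1))

t_crit = stats.t.ppf(0.975, df=n - 2)     # two-tail 95% critical value
print(f"95% CI for beta1: [{b1 - t_crit * s_b1:.4f}, {b1 + t_crit * s_b1:.4f}]")
```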

Coefficient of Determination…
Tests thus far have shown whether a linear relationship exists; it is also useful to measure the strength of that relationship. This is done by calculating the coefficient of determination, R².
The coefficient of determination is the square of the coefficient of correlation (r), hence R² = r². (r will be computed shortly; this identity holds only for models with one independent variable.)

Coefficient of Determination…
R² has a value of .6483. This means 64.83% of the variation in the auction selling prices (y) is explained by the regression model; the remaining 35.17% is unexplained, i.e. due to error.
Unlike the value of a test statistic, the coefficient of determination does not have a critical value that enables us to draw conclusions. In general, the higher the value of R², the better the model fits the data.
R² = 1: perfect match between the line and the data points.
R² = 0: no linear relationship between x and y.
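A sketch verifying that R² = 1 – SSE/SST matches r² for a one-variable model:

```python
# Sketch: coefficient of determination two ways.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative data
y = np.array([2.1, 2.9, 4.2, 5.1, 5.8])

b1, b0 = np.polyfit(x, y, 1)
sse = np.sum((y - (b0 + b1 * x)) ** 2)    # unexplained variation
sst = np.sum((y - y.mean()) ** 2)         # total variation in y
r2 = 1 - sse / sst
r = np.corrcoef(x, y)[0, 1]               # coefficient of correlation
print(f"R^2 = {r2:.4f}, r^2 = {r ** 2:.4f}")  # the two should match
```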

Remember Excel's Output…
An analysis of variance (ANOVA) table for the simple linear regression model:

Source      Degrees of Freedom   Sums of Squares        Mean Squares        F-Statistic
Regression  1                    SSR                    MSR = SSR/1         F = MSR/MSE
Error       n – 2                SSE                    MSE = SSE/(n – 2)
Total       n – 1                Variation in y (SST)

Using the Regression Equation…
We could use our regression equation, ŷ = 17.250 – .0669x, to predict the selling price of a car with 40 (i.e. 40,000) miles on it:
ŷ = 17.250 – .0669(40) = 14.574
We call this value ($14,574) a point prediction (estimate). Chances are, though, that the actual selling price will be different, hence we can estimate the selling price in terms of an interval.

Prediction Interval
The prediction interval is used when we want to predict one particular value of the dependent variable, given a specific value of the independent variable:
ŷ ± t_{α/2, n–2} · s_ε · √( 1 + 1/n + (x_g – x̄)² / ((n – 1)·s_x²) )
(x_g is the given value of x we're interested in)

Confidence Interval Estimator for Mean of Y…
The confidence interval estimate for the expected value of y (the mean of y) is used when we want an interval we are pretty sure contains the true "regression line". In this case, we are estimating the mean of y given a value of x:
ŷ ± t_{α/2, n–2} · s_ε · √( 1/n + (x_g – x̄)² / ((n – 1)·s_x²) )
(Technically this formula is used for infinitely large populations. However, we can interpret our problem as attempting to determine the average selling price of all Ford Tauruses with 40,000 miles on the odometer.)
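A sketch computing both intervals at a given x_g on invented data; note the prediction interval's extra 1 under the square root:

```python
# Sketch: 95% prediction interval and confidence interval for the mean of y.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative data
y = np.array([2.1, 2.9, 4.2, 5.1, 5.8])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
s_eps = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
t_crit = stats.t.ppf(0.975, df=n - 2)

x_g = 3.5                                  # the given value of x
y_hat = b0 + b1 * x_g                      # point prediction
core = 1 / n + (x_g - x.mean()) ** 2 / ((n - 1) * np.var(x, ddof=1))
pi = t_crit * s_eps * np.sqrt(1 + core)    # prediction interval half-width
ci = t_crit * s_eps * np.sqrt(core)        # mean-of-y interval half-width
print(f"one y: {y_hat:.2f} +/- {pi:.2f}; mean of y: {y_hat:.2f} +/- {ci:.2f}")
```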

What's the Difference?
Prediction interval: has the extra 1 under the square root; used to estimate one value of y (at a given x).
Confidence interval: no extra 1; used to estimate the mean value of y (at a given x).
The confidence interval estimate of the expected value of y will be narrower than the prediction interval for the same given value of x and confidence level. This is because there is less error in estimating a mean value than in predicting an individual value.

Regression Diagnostics…
Three conditions are required in order to perform a regression analysis:
The error variable must be normally distributed,
The error variable must have a constant variance, and
The errors must be independent of each other.
How can we diagnose violations of these conditions? Residual analysis: examine the differences between the actual data points and those predicted by the linear equation…

Nonnormality…
We can take the residuals and put them into a histogram to visually check for normality… we're looking for a bell-shaped histogram with the mean close to zero (our old test for normality).

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc Heteroscedasticity… When the requirement of a constant variance is violated, we have a condition of heteroscedasticity. We can diagnose heteroscedasticity by plotting the residual against the predicted y.

Heteroscedasticity…
If the variance of the error variable (ε) is not constant, then we have "heteroscedasticity". Here's the plot of the residuals against the predicted values of y: there doesn't appear to be a change in the spread of the plotted points, therefore no heteroscedasticity.
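A sketch of the two visual checks described above, on simulated data: a histogram of residuals for normality and a residuals-versus-fitted plot for constant variance.

```python
# Sketch: residual histogram (normality) and residuals vs. fitted (variance).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, 80)                  # simulated data
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, 80)

b1, b0 = np.polyfit(x, y, 1)
fitted = b0 + b1 * x
residuals = y - fitted

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3.5))
ax1.hist(residuals, bins=12)                # want a bell shape centered at 0
ax1.set_title("Residual histogram")
ax2.scatter(fitted, residuals)              # want even spread, no funnel shape
ax2.axhline(0, linestyle="--")
ax2.set_title("Residuals vs. fitted")
plt.show()
```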

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc Nonindependence of the Error Variable If we were to observe the auction price of cars every week for, say, a year, that would constitute a time series. When the data are time series, the errors often are correlated. Error terms that are correlated over time are said to be autocorrelated or serially correlated. We can often detect autocorrelation by graphing the residuals against the time periods. If a pattern emerges, it is likely that the independence requirement is violated.

Nonindependence of the Error Variable
Patterns in the appearance of the residuals over time indicate that autocorrelation exists:
Note the runs of positive residuals, replaced by runs of negative residuals.
Note the oscillating behavior of the residuals around zero.

Outliers… [problem worked earlier]
An outlier is an observation that is unusually small or unusually large. E.g. our used car example had odometer readings from 19.1 to 49.2 thousand miles. Suppose we have a value of only 5,000 miles (i.e. a car driven by an old person only on Sundays); this point is an outlier.

Outliers…
Possible reasons for the existence of outliers include:
There was an error in recording the value.
The point should not have been included in the sample.
Perhaps the observation is indeed valid.
Outliers can be easily identified from a scatter plot. If the absolute value of the standardized residual is > 2, we suspect the point may be an outlier and investigate further. Outliers need to be dealt with, since they can easily influence the least squares line…
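A crude sketch of the |standardized residual| > 2 screen, scaling residuals by s_ε (texts often use a leverage-adjusted standard error instead); the data are invented:

```python
# Sketch: flag points whose residual exceeds 2 standard errors in magnitude.
import numpy as np

x = np.array([19.1, 24.5, 30.2, 35.8, 40.1, 44.3, 49.2, 5.0])   # last point: 5,000 miles
y = np.array([16.0, 15.6, 15.1, 14.8, 14.4, 14.1, 13.7, 17.5])  # hypothetical prices ($000s)

b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)
s_eps = np.sqrt(np.sum(residuals ** 2) / (len(x) - 2))
standardized = residuals / s_eps                 # crude standardization by s_eps
print(np.where(np.abs(standardized) > 2)[0])     # indices of suspect points
```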

Procedure for Regression Diagnostics…
1. Develop a model that has a theoretical basis.
2. Gather data for the two variables in the model.
3. Draw the scatter diagram to determine whether a linear model appears to be appropriate. Identify possible outliers.
4. Determine the regression equation.
5. Calculate the residuals and check the required conditions.
6. Assess the model's fit.
7. If the model fits the data, use the regression equation to predict a particular value of the dependent variable and/or estimate its mean.