Chapter 4 Part I - Introduction to Simple Linear Regression Applied Management Science for Decision Making, 2e © 2014 Pearson Learning Solutions Philip A. Vaccaro, PhD MGMT E-5070
Linear Regression
Viewing the relationships that exist between two or more numerical variables:
- Level of Education and Income
- Square Footage and House Price
- Advertising Expenditures and Sales Volume
- Outside Temperature and Heating Bills
Introduction
Regression analysis is used primarily for prediction. The goal is to develop a statistical model that predicts the values of a dependent (response) variable from the values of at least one explanatory (independent) variable.
[Figure: education (1st ‘X’ variable) and income (2nd ‘X’ variable) determine level of prestige (‘Y’ variable).]
Introduction
Simple Linear Regression: a regression model that uses a single numerical independent variable X to predict the numerical dependent variable Y.
Multiple Regression: a regression model that uses several independent variables, $X_1, X_2, X_3, \ldots, X_p$, to predict a numerical dependent variable Y.
The Linear Relationship: The Slope of the Regression Line
[Figure: a regression line with the independent variable X on the horizontal axis and the dependent variable Y on the vertical axis. The line crosses the Y axis at $\beta_0$ and has slope $\beta_1$; $\Delta X$ = change in X, $\Delta Y$ = change in Y.]
In this model the slope of the line, $\beta_1$, represents the expected change in Y per unit change in X. It is the average amount by which Y changes (positively or negatively) for a one-unit change in X. The Y intercept, $\beta_0$, represents the average value of Y when X equals zero.
Simple Linear Regression Model
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where
$\beta_0$ = Y intercept for the population
$\beta_1$ = slope for the population
$\varepsilon_i$ = random error in Y for observation i
$\hat{Y}_i$ = the predicted value of Y ("Y-hat")
Error is defined as (actual value) − (predicted value), that is, $\varepsilon = Y - \hat{Y}$.
This is the population-based simple regression model.
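To make the roles of the intercept, the slope, and the random error concrete, the following Python sketch generates observations from the population model. The parameter values (intercept 2.0, slope 0.5, error standard deviation 1.0), the normally distributed errors, and the function name are hypothetical, chosen only for illustration; they do not come from the course material.

```python
import random

def simulate_population(xs, beta0, beta1, sigma, seed=0):
    """Generate Y_i = beta0 + beta1 * X_i + eps_i with normally distributed errors.

    All parameter values are illustrative assumptions, not course data.
    """
    rng = random.Random(seed)
    rows = []
    for x in xs:
        eps = rng.gauss(0.0, sigma)      # random error for this observation
        y = beta0 + beta1 * x + eps      # observed value of Y
        rows.append((x, y, eps))
    return rows

rows = simulate_population(xs=[1, 2, 3, 4, 5], beta0=2.0, beta1=0.5, sigma=1.0)
for x, y, eps in rows:
    line_value = 2.0 + 0.5 * x           # value on the population line
    # error = actual value - value on the line
    print(f"X={x}  Y={y:.3f}  error={y - line_value:.3f}")
```

Each printed error is the vertical distance between an observation and the population line, which is the quantity the model calls the random error.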
Scatter Diagrams
Scatter diagrams plot the relationship between an X variable on the horizontal axis and a Y variable on the vertical axis. The relationship can take many forms, ranging from simple ones to extremely complicated mathematical functions.
[Figure: NYS Math Regents Pass Rate and School District Poverty, 2014. As the poverty index decreases, the mathematics passing rate increases.]
Simple Regression Analysis: Problem Statement
The marketing department of a retail chain wishes to determine whether there is a relationship between sales volume for a product and the number of radio ads broadcast in a sample of ten large cities. The data are given as follows:
Regression Using Excel
Microsoft EXCEL 2010 Worksheet Layout
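Regression Calculations

| City | Xi (Ads) | Yi (Sales) | Xi² | Yi² | XiYi |
|------|----------|------------|-----|-----|------|
| A | 11 | 8,000 | 121 | 64,000,000 | 88,000 |
| B | 7 | 5,000 | 49 | 25,000,000 | 35,000 |
| C | 12 | 9,000 | 144 | 81,000,000 | 108,000 |
| D | 8 | 4,000 | 64 | 16,000,000 | 32,000 |
| E | 10 | 8,000 | 100 | 64,000,000 | 80,000 |
| F | 13 | 10,000 | 169 | 100,000,000 | 130,000 |
| G | 8 | 5,000 | 64 | 25,000,000 | 40,000 |
| H | 10 | 7,000 | 100 | 49,000,000 | 70,000 |
| I | 14 | 10,000 | 196 | 100,000,000 | 140,000 |
| J | 7 | 4,000 | 49 | 16,000,000 | 28,000 |
| Σ | 100 | 70,000 | 1,056 | 540,000,000 | 751,000 |

The column totals (shown in red on the original slide) are used in the formulas.
$\bar{X} = 10$, $\bar{Y} = 7{,}000$, $n = 10$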
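The Basic Calculations
$n = 10$, $\bar{X} = 10$, $\bar{Y} = 7{,}000$
$$\sum_{i=1}^{n} X_i = 100 \qquad \sum_{i=1}^{n} X_i^2 = 1{,}056 \qquad \sum_{i=1}^{n} Y_i = 70{,}000 \qquad \sum_{i=1}^{n} Y_i^2 = 540{,}000{,}000 \qquad \sum_{i=1}^{n} X_i Y_i = 751{,}000$$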
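The totals above can be checked with a few lines of Python. This is a sketch under one assumption: the sales figures are taken as whole thousands of units, which is what the XiYi column and the stated totals imply. The variable names are illustrative.

```python
# (ads per day, annual sales in units) for cities A through J;
# sales values are assumed to be whole thousands, consistent with the totals
data = [(11, 8000), (7, 5000), (12, 9000), (8, 4000), (10, 8000),
        (13, 10000), (8, 5000), (10, 7000), (14, 10000), (7, 4000)]

n = len(data)
sum_x = sum(x for x, _ in data)
sum_y = sum(y for _, y in data)
sum_x2 = sum(x * x for x, _ in data)
sum_y2 = sum(y * y for _, y in data)
sum_xy = sum(x * y for x, y in data)
mean_x = sum_x / n
mean_y = sum_y / n

print(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy, mean_x, mean_y)
# 10 100 70000 1056 540000000 751000 10.0 7000.0
```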
Requirements
1. Set up the scatter diagram.
2. Assuming a linear relationship, use the least squares method to compute the regression coefficients $b_0$ and $b_1$.
3. Interpret the meaning of the intercept $b_0$ and the slope $b_1$.
4. Predict the annual sales volume in a city in which eight (8) ads are broadcast daily.
5. Compute the standard error of the estimate.
6. Compute the coefficient of determination and interpret it.
Requirements
7. Compute the coefficient of correlation (r).
8. Set up 95% and 99% confidence interval estimates of the average annual sales volume in a city in which eight (8) ads are broadcast daily.
9. At the α = .01 and α = .05 levels of significance, is there a relationship between sales volume and the number of radio ads broadcast?
10. Set up the 99% confidence interval estimate of the true slope.
11. Discuss why you should not predict annual sales volume in a city that has fewer than 7 or more than 14 broadcasts daily.
The Scatter Diagram
[Figure: scatter diagram of radio ads broadcast daily (X) versus sales volume in units (Y).]
More radio ads seem to result in more sales. This positive linear relationship is not perfect, but there is a relationship.
The Linear Relationship
[Figure: the same scatter diagram of radio ads (X) versus sales volume in units (Y), with a regression line drawn through the points.]
The points do not all lie on the regression line, so there would be some error involved if we tried to predict sales from radio ads broadcast, using this or any other line.
The Linear Relationship
[Figure: scatter diagram of radio ads broadcast daily (X) versus units sold (Y), with candidate regression lines.]
The best regression line is the one that minimizes the sum of the squared errors, where an error is the difference between an actual value of Y and its predicted value.
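The least squares criterion can be illustrated with a short Python sketch: it computes the sum of squared errors (SSE) for the fitted line and for a few competing lines and confirms that the fitted line gives the smallest value. The competing lines are hypothetical, the sales figures are assumed to be whole thousands as the totals imply, and the helper names are illustrative. The fit here uses the deviation-from-the-mean form of the least squares formulas, which is algebraically equivalent to the computational form based on column totals.

```python
# (ads per day, annual sales in units) for cities A through J
data = [(11, 8000), (7, 5000), (12, 9000), (8, 4000), (10, 8000),
        (13, 10000), (8, 5000), (10, 7000), (14, 10000), (7, 4000)]

def sse(b0, b1, points):
    """Sum of squared errors: sum of (actual Y - predicted Y) squared."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in points)

def least_squares(points):
    """Return (intercept, slope) using deviations from the means."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

b0, b1 = least_squares(data)
best = sse(b0, b1, data)
print(f"least squares line: SSE = {best:,.2f}")

# Hypothetical competing lines: shifted intercept, steeper slope, arbitrary guess
for alt_b0, alt_b1 in [(b0 + 500, b1), (b0, b1 + 50), (0.0, 700.0)]:
    alt = sse(alt_b0, alt_b1, data)
    print(f"b0={alt_b0:,.2f}, b1={alt_b1:,.2f}: SSE = {alt:,.2f}, larger: {alt > best}")
```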
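The Slope of the Regression Line
$$b_1 = \frac{n \sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2} = \frac{(10)(751{,}000) - (100)(70{,}000)}{(10)(1{,}056) - (100)^2} = \frac{7{,}510{,}000 - 7{,}000{,}000}{10{,}560 - 10{,}000} = \frac{510{,}000}{560} \approx 910.71$$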
Slope of the Regression Line
$b_1 = 910.71$ units
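As a check on the arithmetic, the slope can be computed directly from the column totals. This minimal Python sketch takes the totals from the Basic Calculations slide as given; the variable names are illustrative.

```python
# Column totals from the Basic Calculations slide
n = 10
sum_x, sum_y = 100, 70_000
sum_x2, sum_xy = 1_056, 751_000

numerator = n * sum_xy - sum_x * sum_y      # 7,510,000 - 7,000,000
denominator = n * sum_x2 - sum_x ** 2       # 10,560 - 10,000
b1 = numerator / denominator

print(numerator, denominator, round(b1, 2))  # 510000 560 910.71
```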
The Y Intercept
The average value of Y when X equals zero:
$$b_0 = \bar{Y} - b_1 \bar{X} = 7{,}000 - (910.714)(10) = 7{,}000 - 9{,}107.14 = -2{,}107.14$$
$b_0 = -2{,}107.14$
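The intercept follows from the two sample means and the slope. The Python sketch below uses the unrounded slope 510,000 / 560 so that the result matches the value above to two decimals; the variable names are illustrative.

```python
mean_x, mean_y = 10, 7_000      # sample means of X (ads) and Y (sales)
b1 = 510_000 / 560              # unrounded slope, about 910.714

b0 = mean_y - b1 * mean_x
print(round(b0, 2))             # -2107.14

# The fitted line passes through the point of means (X-bar, Y-bar)
print(abs((b0 + b1 * mean_x) - mean_y) < 1e-9)   # True
```

Carrying the unrounded slope avoids a small discrepancy: with the slope rounded to 910.71, the product with 10 would be 9,107.10 rather than 9,107.14.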
Regression Coefficient Interpretation
The Y intercept is −2,107.14. It represents the value of Y when X equals zero. We can interpret it as the portion of annual sales that varies with factors other than radio ads broadcast. Because X = 0 lies outside the observed range of 7 to 14 ads, the value should not be read as a literal sales forecast.
The slope is 910.71. This means that for each additional radio ad broadcast daily, annual sales are expected to rise by 910.71 units on average.
Prediction
Sample-based simple linear regression model:
$$\hat{Y}_i = b_0 + b_1 X_i = -2{,}107.14 + (910.714)(8) = -2{,}107.14 + 7{,}285.71 = 5{,}178.57$$
The predicted average annual sales volume is 5,178.57 units when 8 radio ads are broadcast daily.
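The prediction step can be sketched as a small Python function. The function name and the choice to raise an error outside the sampled range are assumptions made for illustration; the range limits of 7 and 14 ads come from requirement 11, which cautions against predicting outside the observed data.

```python
B1 = 510_000 / 560          # unrounded slope, about 910.714
B0 = 7_000 - B1 * 10        # intercept, about -2,107.14
X_MIN, X_MAX = 7, 14        # smallest and largest ads-per-day values in the sample

def predict_sales(ads_per_day):
    """Predicted annual sales (units) for a given number of daily radio ads.

    Raises ValueError outside the observed range of X, because the fitted
    line is only supported by data between X_MIN and X_MAX.
    """
    if not X_MIN <= ads_per_day <= X_MAX:
        raise ValueError(
            f"{ads_per_day} ads is outside the observed range {X_MIN} to {X_MAX}"
        )
    return B0 + B1 * ads_per_day

print(round(predict_sales(8), 2))   # 5178.57
```

Raising an error is one way to enforce the no-extrapolation rule; a warning would be a reasonable alternative where an out-of-range estimate is still wanted.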