Simple Linear Regression


Simple Linear Regression

Chapter Topics: Types of Regression Models; Determining the Simple Linear Regression Equation; Measures of Variation; Assumptions of Regression and Correlation; Residual Analysis; Measuring Autocorrelation; Inferences about the Slope

Chapter Topics (continued): Correlation, Measuring the Strength of the Association; Estimation of Mean Values and Prediction of Individual Values; Pitfalls in Regression and Ethical Issues

Purpose of Regression Analysis. Regression analysis is used primarily to model causality and provide prediction: predict the values of a dependent (response) variable based on values of at least one independent (explanatory) variable, and explain the effect of the independent variables on the dependent variable.

Types of Regression Models Positive Linear Relationship Negative Linear Relationship Relationship NOT Linear No Relationship

Simple Linear Regression Model Relationship between Variables is Described by a Linear Function The Change of One Variable Causes the Other Variable to Change A Dependency of One Variable on the Other

Simple Linear Regression Model (continued): Population Regression Line (Conditional Mean). The population regression line is a straight line that describes the dependence of the average value (conditional mean) of one variable on the other: Y_i = β0 + β1 X_i + ε_i, where β0 is the population Y intercept, β1 the population slope coefficient, ε_i the random error, Y the dependent (response) variable, and X the independent (explanatory) variable.

Simple Linear Regression Model (continued). Figure: the observed value of Y_i lies above or below the conditional mean on the population regression line by the random error ε_i.

Linear Regression Equation. The sample regression line provides an estimate of the population regression line as well as a predicted value of Y. The simple regression equation (fitted regression line, predicted value) is Ŷ_i = b0 + b1 X_i, where b0 is the sample Y intercept, b1 the sample slope coefficient, and e_i = Y_i − Ŷ_i the residual.

Linear Regression Equation (continued). b0 and b1 are obtained by finding the values that minimize the sum of the squared residuals; b0 provides an estimate of β0, and b1 provides an estimate of β1.
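The least-squares estimates have closed-form solutions that can be computed directly. A minimal Python sketch (the function name and the toy data are illustrative, not the produce-store sample from the deck):

```python
# Least-squares estimates b0, b1 that minimize the sum of squared
# residuals (closed-form solution for simple linear regression).
def fit_simple_ols(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx              # sample slope coefficient
    b0 = y_bar - b1 * x_bar     # sample Y intercept
    return b0, b1

# Toy data lying exactly on Y = 3 + 2X, so the fit recovers b0 = 3, b1 = 2.
b0, b1 = fit_simple_ols([1, 2, 3, 4, 5], [5, 7, 9, 11, 13])
```

This is the same computation Excel's regression tool performs behind the scenes for the slope and intercept.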

Linear Regression Equation (continued). Figure: scatter of observed values around the fitted line Ŷ_i = b0 + b1 X_i.

Interpretation of the Slope and Intercept. β0 is the average value of Y when the value of X is zero; β1 measures the change in the average value of Y as a result of a one-unit change in X.

Interpretation of the Slope and Intercept (continued). b0 is the estimated average value of Y when the value of X is zero; b1 is the estimated change in the average value of Y as a result of a one-unit change in X.

Simple Linear Regression: Example. You wish to examine the linear dependency of the annual sales of produce stores on their sizes in square footage. Sample data for 7 stores were obtained. Find the equation of the straight line that fits the data best. [Data table: store number, square feet, and annual sales in $1,000s for the 7 stores; most figures were lost in extraction. Store 1: 1,726 square feet; Store 7: 1,313 square feet, $3,760.]

Scatter Diagram: Example Excel Output

Simple Linear Regression Equation: Example From Excel Printout:

Graph of the Simple Linear Regression Equation: Example. Figure: scatter diagram of the 7 stores with the fitted line Ŷ_i = b0 + b1 X_i drawn through it.

Interpretation of Results: Example. The slope of 1.487 means that for each increase of one unit in X, the average value of Y is predicted to increase by an estimated 1.487 units. The equation estimates that for each increase of 1 square foot in the size of the store, the expected annual sales are predicted to increase by $1,487.

Simple Linear Regression in PHStat In Excel, use PHStat | Regression | Simple Linear Regression … Excel Spreadsheet of Regression Sales on Footage

Measures of Variation: The Sum of Squares SST = SSR + SSE Total Sample Variability = Explained Variability + Unexplained Variability

Measures of Variation: The Sum of Squares (continued). SST = total sum of squares: measures the variation of the Y_i values around their mean Ȳ. SSR = regression sum of squares: explained variation attributable to the relationship between X and Y. SSE = error sum of squares: variation attributable to factors other than the relationship between X and Y.
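The decomposition SST = SSR + SSE can be verified numerically. A small Python sketch with illustrative data (the function name, toy data, and fitted coefficients are mine, not from the deck):

```python
# SST, SSR, SSE for a fitted simple regression line; SST = SSR + SSE.
def sums_of_squares(x, y, b0, b1):
    y_bar = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained
    return sst, ssr, sse

# Toy data; b0 = 0.5, b1 = 1.4 are the least-squares estimates here.
sst, ssr, sse = sums_of_squares([1, 2, 3, 4], [2, 3, 5, 6], 0.5, 1.4)
```

Note the identity only holds when b0, b1 are the least-squares estimates; with any other line the cross term does not vanish.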

Measures of Variation: The Sum of Squares (continued). Figure: SST = Σ(Y_i − Ȳ)², SSR = Σ(Ŷ_i − Ȳ)², SSE = Σ(Y_i − Ŷ_i)².

Venn Diagrams and Explanatory Power of Regression Sales Sizes Variations in Sales explained by Sizes or variations in Sizes used in explaining variation in Sales Variations in Sales explained by the error term or unexplained by Sizes Variations in store Sizes not used in explaining variation in Sales

The ANOVA Table in Excel

Source     | df    | SS  | MS                | F       | Significance F
Regression | k     | SSR | MSR = SSR/k       | MSR/MSE | p-value of the F test
Residual   | n−k−1 | SSE | MSE = SSE/(n−k−1) |         |
Total      | n−1   | SST |                   |         |

Measures of Variation, The Sum of Squares: Example. Excel output for produce stores: the ANOVA section reports the regression (explained) degrees of freedom, the error (residual) degrees of freedom, the total degrees of freedom, and the SSR, SSE, and SST values.

The Coefficient of Determination Measures the proportion of variation in Y that is explained by the independent variable X in the regression model
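The coefficient of determination is simply the explained share of the total sum of squares. A one-line Python sketch using the toy sums of squares SSR = 9.8, SST = 10 (illustrative values, not the produce-store output):

```python
# r2 = SSR / SST: proportion of variation in Y explained by X.
def r_squared(ssr, sst):
    return ssr / sst

r2 = r_squared(9.8, 10.0)   # 98% of the variation explained
```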

Venn Diagrams and Explanatory Power of Regression Sales Sizes

Coefficients of Determination (r 2 ) and Correlation (r) r 2 = 1, r 2 =.81, r 2 = 0, Y Y i =b 0 +b 1 X i X ^ Y Y i =b 0 +b 1 X i X ^ Y Y i =b 0 +b 1 X i X ^ Y Y i =b 0 +b 1 X i X ^ r = +1 r = -1 r = +0.9 r = 0

Standard Error of Estimate. Measures the standard deviation (variation) of the Y values around the regression equation: S_YX = sqrt(SSE / (n − 2)).
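Computing the standard error of the estimate from SSE is a one-liner. A Python sketch with illustrative inputs (SSE = 0.2, n = 4 are toy values, not the store example):

```python
import math

# Standard error of the estimate for simple regression: df = n - 2.
def std_error_estimate(sse, n):
    return math.sqrt(sse / (n - 2))

s_yx = std_error_estimate(0.2, 4)
```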

Measures of Variation: Produce Store Example. Excel output for produce stores: r² = .94, so 94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage. The printout also reports S_YX and n.

Linear Regression Assumptions Normality Y values are normally distributed for each X Probability distribution of error is normal Homoscedasticity (Constant Variance) Independence of Errors

Consequences of Violation of the Assumptions. Violations of the assumptions: non-normality (error not normally distributed); heteroscedasticity (variance not constant), which usually happens in cross-sectional data; autocorrelation (errors are not independent), which usually happens in time-series data. Consequences of any violation: predictions and estimations obtained from the sample regression line will not be accurate, and hypothesis testing results will not be reliable. It is important to verify the assumptions.

Variation of Errors Around the Regression Line. Y values are normally distributed around the regression line. For each X value, the "spread" or variance around the regression line is the same. Figure: normal curves f(e) of equal spread at X1 and X2 around the sample regression line.

Residual Analysis Purposes Examine linearity Evaluate violations of assumptions Graphical Analysis of Residuals Plot residuals vs. X and time

Residual Analysis for Linearity. Figure: when the relationship is not linear, the residual plot against X shows a curved pattern; when it is linear, the residuals scatter randomly around zero.

Residual Analysis for Homoscedasticity. Figure: residuals that fan out as X increases indicate heteroscedasticity; residuals with constant spread across X indicate homoscedasticity.

Residual Analysis: Excel Output for the Produce Stores Example.

Residual Analysis for Independence The Durbin-Watson Statistic Used when data is collected over time to detect autocorrelation (residuals in one time period are related to residuals in another period) Measures violation of independence assumption Should be close to 2. If not, examine the model for autocorrelation.
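The Durbin-Watson statistic is D = Σ(e_t − e_{t−1})² / Σe_t². A Python sketch (the residual series are illustrative; values near 2 indicate no autocorrelation, near 0 positive, near 4 negative):

```python
# Durbin-Watson statistic over a time-ordered residual series.
def durbin_watson(residuals):
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

dw_alternating = durbin_watson([1, -1, 1, -1])   # sign flips: D > 2
dw_trending = durbin_watson([1, 1, -1, -1])      # runs of same sign: D < 2
```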

Durbin-Watson Statistic in PHStat PHStat | Regression | Simple Linear Regression … Check the box for Durbin-Watson Statistic

Obtaining the Critical Values of the Durbin-Watson Statistic (Table 13.4).

Using the Durbin-Watson Statistic. H0: no autocorrelation (error terms are independent); H1: there is autocorrelation (error terms are not independent). Decision regions on the scale from 0 to 4: reject H0 (positive autocorrelation) if D < dL; inconclusive if dL ≤ D ≤ dU; accept H0 (no autocorrelation) if dU < D < 4 − dU; inconclusive if 4 − dU ≤ D ≤ 4 − dL; reject H0 (negative autocorrelation) if D > 4 − dL.

Residual Analysis for Independence: Graphical Approach. The residual is plotted against time to detect any autocorrelation. A cyclical pattern indicates the residuals are not independent; no particular pattern indicates independence.

Inference about the Slope: t Test. t test for a population slope: is there a linear dependency of Y on X? Null and alternative hypotheses: H0: β1 = 0 (no linear dependency); H1: β1 ≠ 0 (linear dependency). Test statistic: t = b1 / S_b1, with n − 2 degrees of freedom.
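The slope t statistic can be computed from three summary quantities, since the standard error of the slope is S_b1 = S_YX / sqrt(SSX). A Python sketch with toy summary values (not the produce-store printout):

```python
import math

# t statistic for H0: beta1 = 0, where s_b1 = S_YX / sqrt(SSX); df = n - 2.
def slope_t_stat(b1, s_yx, ssx):
    return b1 / (s_yx / math.sqrt(ssx))

# Toy values from a 4-point fit: b1 = 1.4, S_YX = sqrt(0.1), SSX = 5.
t = slope_t_stat(1.4, math.sqrt(0.1), 5.0)
```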

Example: Produce Store. Data for 7 stores (square footage and annual sales in $000, as in the earlier example table). Estimated regression equation: Ŷ_i = b0 + b1 X_i with slope 1.487. Does square footage affect annual sales?

Inferences about the Slope: t Test Example. H0: β1 = 0; H1: β1 ≠ 0; α = .05; df = 7 − 2 = 5. Critical values: ±2.5706 (.025 in each tail). Decision: reject H0 (the test statistic from the Excel printout falls in the rejection region and the p-value is below .05). Conclusion: there is evidence that square footage affects annual sales.

Inferences about the Slope: Confidence Interval Example. Confidence interval estimate of the slope (Excel printout for produce stores): at the 95% level of confidence, the confidence interval for the slope is (1.062, 1.911), which does not include 0. Conclusion: there is a significant linear dependency of annual sales on the size of the store.

Inferences about the Slope: F Test. F test for a population slope: is there a linear dependency of Y on X? Null and alternative hypotheses: H0: β1 = 0 (no linear dependency); H1: β1 ≠ 0 (linear dependency). Test statistic: F = MSR / MSE, with numerator d.f. = 1 and denominator d.f. = n − 2.

Relationship between a t Test and an F Test. Null and alternative hypotheses: H0: β1 = 0 (no linear dependency); H1: β1 ≠ 0 (linear dependency). The p-value of a t test and the p-value of an F test are exactly the same; the rejection region of an F test is always in the upper tail.

Inferences about the Slope: F Test Example. H0: β1 = 0; H1: β1 ≠ 0; α = .05; numerator df = 1, denominator df = 5. Decision: reject H0 (the test statistic from the Excel printout exceeds the critical value; p-value < .05). Conclusion: there is evidence that square footage affects annual sales.

Purpose of Correlation Analysis Correlation Analysis is Used to Measure Strength of Association (Linear Relationship) Between 2 Numerical Variables Only strength of the relationship is concerned No causal effect is implied

Purpose of Correlation Analysis (continued). The population correlation coefficient ρ (rho) is used to measure the strength of the association between the variables.

Purpose of Correlation Analysis (continued). The sample correlation coefficient r is an estimate of ρ and is used to measure the strength of the linear relationship in the sample observations.

Sample Observations from Various r Values. Scatter plots for r = −1, r = −.6, r = 0, r = .6, and r = 1.

Features of ρ and r. Unit free; range between −1 and 1. The closer to −1, the stronger the negative linear relationship; the closer to 1, the stronger the positive linear relationship; the closer to 0, the weaker the linear relationship.

t Test for Correlation. Hypotheses: H0: ρ = 0 (no correlation); H1: ρ ≠ 0 (correlation). Test statistic: t = r sqrt(n − 2) / sqrt(1 − r²), with n − 2 degrees of freedom.
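The correlation t statistic needs only r and n. A Python sketch with toy values chosen so r² = .98 on n = 4 observations (illustrative, not the store data); for simple regression this t equals the slope-test t exactly:

```python
import math

# t statistic for H0: rho = 0; df = n - 2.
def corr_t_stat(r, n):
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t = corr_t_stat(math.sqrt(0.98), 4)   # equals sqrt(98), same as the slope t
```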

Example: Produce Stores. Is there any evidence of a linear relationship between the annual sales of a store and its square footage at the .05 level of significance? H0: ρ = 0 (no association); H1: ρ ≠ 0 (association); α = .05, df = 5. The sample correlation r is read from the Excel printout.

Example: Produce Stores Solution. Critical values: ±2.5706 (.025 in each tail). Decision: reject H0. Conclusion: there is evidence of a linear relationship at the 5% level of significance. The value of the t statistic is exactly the same as the t statistic value for the test on the slope coefficient.

Estimation of Mean Values. Confidence interval estimate for μ_{Y|X=Xi}, the mean of Y given a particular Xi: Ŷ_i ± t_{n−2} S_YX sqrt(1/n + (Xi − X̄)² / SSX), using the t value from the table with df = n − 2 and the standard error of the estimate. The size of the interval varies according to the distance of Xi from the mean X̄.

Prediction of Individual Values. Prediction interval for an individual response Y_i at a particular Xi: Ŷ_i ± t_{n−2} S_YX sqrt(1 + 1/n + (Xi − X̄)² / SSX). The addition of 1 under the square root increases the width of the interval relative to that for the mean of Y.
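The only difference between the two intervals is the extra "+1" under the square root. A Python sketch of the two half-widths (the function names and input values are illustrative; t = 4.303 is the .025 critical value for 2 degrees of freedom, matching n = 4):

```python
import math

# Half-width of the confidence interval for the mean of Y at Xi.
def ci_half_width(t_crit, s_yx, n, xi, x_bar, ssx):
    return t_crit * s_yx * math.sqrt(1 / n + (xi - x_bar) ** 2 / ssx)

# Half-width of the prediction interval for an individual Y at Xi:
# identical except for the extra "+1" under the square root.
def pi_half_width(t_crit, s_yx, n, xi, x_bar, ssx):
    return t_crit * s_yx * math.sqrt(1 + 1 / n + (xi - x_bar) ** 2 / ssx)

ci = ci_half_width(4.303, 0.316, 4, 2.0, 2.5, 5.0)
pi = pi_half_width(4.303, 0.316, 4, 2.0, 2.5, 5.0)
```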

Interval Estimates for Different Values of X. Figure: around the fitted line Ŷ_i = b0 + b1 X_i, the prediction interval for an individual Y_i at a given X is wider than the confidence interval for the mean of Y, and both widen as X moves away from X̄.

Example: Produce Stores. Data for 7 stores (square footage and annual sales in $000, as in the earlier example table). Regression model obtained: Ŷ_i = b0 + b1 X_i with slope 1.487. Consider a store with 2,000 square feet.

Estimation of Mean Values: Example. Find the 95% confidence interval for the average annual sales for stores of 2,000 square feet. Predicted sales: Ŷ_i = b0 + b1 X_i evaluated at X = 2,000 ($000); with S_YX from the printout and t_{n−2} = t_5 from the table, form the confidence interval estimate for the mean of Y given X = 2,000.

Prediction Interval for Y: Example. Find the 95% prediction interval for the annual sales of one particular store of 2,000 square feet. Predicted sales: Ŷ_i = b0 + b1 X_i evaluated at X = 2,000 ($000); with S_YX and t_{n−2} = t_5, form the prediction interval for an individual Y_i.

Estimation of Mean Values and Prediction of Individual Values in PHStat In Excel, use PHStat | Regression | Simple Linear Regression … Check the “Confidence and Prediction Interval for X=” box Excel Spreadsheet of Regression Sales on Footage

Pitfalls of Regression Analysis. Lacking an awareness of the assumptions underlying least-squares regression; not knowing how to evaluate the assumptions; not knowing what the alternatives to least-squares regression are if a particular assumption is violated; using a regression model without knowledge of the subject matter.

Strategy for Avoiding the Pitfalls of Regression. Start with a scatter plot of X and Y to observe a possible relationship. Perform residual analysis to check the assumptions. Use a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot of the residuals to uncover possible non-normality.

Strategy for Avoiding the Pitfalls of Regression (continued). If any assumption is violated, use alternative methods to least-squares regression (e.g., least absolute deviation regression or least median of squares regression) or alternative least-squares models (e.g., curvilinear or multiple regression). If there is no evidence of assumption violation, then test for the significance of the regression coefficients and construct confidence intervals and prediction intervals.

Chapter Summary Introduced Types of Regression Models Discussed Determining the Simple Linear Regression Equation Described Measures of Variation Addressed Assumptions of Regression and Correlation Discussed Residual Analysis Addressed Measuring Autocorrelation

Chapter Summary Described Inference about the Slope Discussed Correlation - Measuring the Strength of the Association Addressed Estimation of Mean Values and Prediction of Individual Values Discussed Pitfalls in Regression and Ethical Issues (continued)

Introduction to Multiple Regression

Chapter Topics The Multiple Regression Model Residual Analysis Testing for the Significance of the Regression Model Inferences on the Population Regression Coefficients Testing Portions of the Multiple Regression Model Dummy-Variables and Interaction Terms

The Multiple Regression Model. The relationship between 1 dependent and 2 or more independent variables is a linear function: Y_i = β0 + β1 X_1i + β2 X_2i + … + βk X_ki + ε_i, where β0 is the population Y-intercept, β1 … βk the population slopes, ε_i the random error, Y the dependent (response) variable, and X_1 … X_k the independent (explanatory) variables.

Multiple Regression Model. Bivariate model (two independent variables): Y_i = β0 + β1 X_1i + β2 X_2i + ε_i.

Multiple Regression Equation. Bivariate model: Ŷ_i = b0 + b1 X_1i + b2 X_2i.
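Multiple regression coefficients come from solving a least-squares problem over a design matrix. A NumPy sketch on hypothetical data constructed to lie exactly on Y = 500 − 5·X1 − 20·X2 (loosely mimicking the heating-oil setting with X1 = temperature, X2 = insulation; the numbers are mine, not the textbook's):

```python
import numpy as np

# Hypothetical data on an exact plane Y = 500 - 5*X1 - 20*X2.
x1 = np.array([30.0, 40.0, 50.0, 20.0, 60.0, 35.0])   # e.g., temperature
x2 = np.array([3.0, 3.0, 6.0, 6.0, 10.0, 4.0])        # e.g., insulation
y = 500 - 5 * x1 - 20 * x2

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)             # b = [b0, b1, b2]
```

Because the data are noise-free, the fit recovers the generating coefficients exactly; with real data, b estimates the population β's.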

Too complicated by hand! Ouch!

Interpretation of Estimated Coefficients. Slope (b_j): the average value of Y is estimated to change by b_j for each 1-unit increase in X_j, holding all other variables constant (ceteris paribus). Example: if b1 = −2, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1-degree increase in temperature (X1), given the inches of insulation (X2). Y-intercept (b0): the estimated average value of Y when all X_j = 0.

Multiple Regression Model: Example. Develop a model for estimating the heating oil used for a single-family home in the month of January, based on average temperature (°F) and the amount of insulation in inches.

Multiple Regression Equation: Example Excel Output For each degree increase in temperature, the estimated average amount of heating oil used is decreased by gallons, holding insulation constant. For each increase in one inch of insulation, the estimated average use of heating oil is decreased by gallons, holding temperature constant.

Multiple Regression in PHStat PHStat | Regression | Multiple Regression … Excel spreadsheet for the heating oil example

Venn Diagrams and Explanatory Power of Regression Oil Temp Variations in Oil explained by Temp or variations in Temp used in explaining variation in Oil Variations in Oil explained by the error term Variations in Temp not used in explaining variation in Oil

Venn Diagrams and Explanatory Power of Regression Oil Temp (continued)

Venn Diagrams and Explanatory Power of Regression. Overlapping variation in both Temp and Insulation is used in explaining the variation in Oil, but NOT in the estimation of b1 or b2. Some variation in Oil is explained by neither Temp nor Insulation.

Coefficient of Multiple Determination Proportion of Total Variation in Y Explained by All X Variables Taken Together Never Decreases When a New X Variable is Added to Model Disadvantage when comparing among models

Venn Diagrams and Explanatory Power of Regression Oil Temp Insulation

Adjusted Coefficient of Multiple Determination. The proportion of variation in Y explained by all the X variables, adjusted for the sample size and the number of X variables used. It penalizes excessive use of independent variables, is smaller than r², is useful in comparing among models, and can decrease if an insignificant new X variable is added to the model.
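The adjustment is adj r² = 1 − (1 − r²)(n − 1)/(n − k − 1). A Python sketch using the heating-oil figures quoted later in the deck (r² = .9656, n = 15, k = 2, which reproduces the slide's 95.99%):

```python
# Adjusted r-squared; k = number of independent variables.
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Heating-oil example figures from the slides: r2 = .9656, n = 15, k = 2.
adj = adjusted_r2(0.9656, 15, 2)   # ~0.9599, matching the slide
```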

Coefficient of Multiple Determination. Excel output: the adjusted r² reflects the number of explanatory variables and the sample size, and is smaller than r².

Interpretation of Coefficient of Multiple Determination 96.56% of the total variation in heating oil can be explained by temperature and amount of insulation 95.99% of the total fluctuation in heating oil can be explained by temperature and amount of insulation after adjusting for the number of explanatory variables and sample size

Simple and Multiple Regression Compared simple The slope coefficient in a simple regression picks up the impact of the independent variable plus the impacts of other variables that are excluded from the model, but are correlated with the included independent variable and the dependent variable multiple Coefficients in a multiple regression net out the impacts of other variables in the equation Hence, they are called the net regression coefficients They still pick up the effects of other variables that are excluded from the model, but are correlated with the included independent variables and the dependent variable

Simple and Multiple Regression Compared: Example Two Simple Regressions: Multiple Regression:

Simple and Multiple Regression Compared: Slope Coefficients

Simple and Multiple Regression Compared: r 2 

Example: Adjusted r 2 Can Decrease Adjusted r 2 decreases when k increases from 2 to 3 Color is not useful in explaining the variation in oil consumption.

Using the Regression Equation to Make Predictions. Predict the amount of heating oil used for a home if the average temperature is 30 °F and the insulation is 6 inches: substitute these values into the fitted equation to obtain the predicted gallons of heating oil used.

Predictions in PHStat PHStat | Regression | Multiple Regression … Check the “Confidence and Prediction Interval Estimate” box Excel spreadsheet for the heating oil example

Residual Plots. Residuals vs. Ŷ: may need to transform the Y variable. Residuals vs. X1: may need to transform the X1 variable. Residuals vs. X2: may need to transform the X2 variable. Residuals vs. time: may have autocorrelation.

Residual Plots: Example. No discernible pattern in one plot; maybe some non-linear relationship in the other.

Testing for Overall Significance. Shows whether Y depends linearly on all of the X variables together as a group; uses the F test statistic. Hypotheses: H0: β1 = β2 = … = βk = 0 (no linear relationship); H1: at least one βi ≠ 0 (at least one independent variable affects Y). The null hypothesis is a very strong statement and is almost always rejected.

Testing for Overall Significance (continued). Test statistic: F = MSR / MSE = (SSR / k) / (SSE / (n − k − 1)), where F has k numerator and (n − k − 1) denominator degrees of freedom.
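The overall F statistic follows directly from the ANOVA sums of squares. A Python sketch on toy values from a simple regression (k = 1), where F also equals the square of the slope t statistic:

```python
# Overall F statistic: F = MSR / MSE with k and n - k - 1 df.
def overall_f(ssr, sse, n, k):
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse

# Simple-regression toy values (k = 1): SSR = 9.8, SSE = 0.2, n = 4.
f = overall_f(9.8, 0.2, 4, 1)   # equals t**2 from the slope t test
```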

Test for Overall Significance. Excel output for the example: k = 2 explanatory variables, n − 1 total degrees of freedom, and the p-value of the F test.

Test for Overall Significance: Example Solution. H0: β1 = β2 = 0; H1: at least one βj ≠ 0; α = .05; df = 2 and 12. Critical value: F = 3.89. Decision: reject H0 at α = 0.05 (the test statistic from the Excel output exceeds the critical value). Conclusion: there is evidence that at least one independent variable affects Y.

Test for Significance: Individual Variables. Shows whether Y depends linearly on a single X_j individually while holding the effects of the other X's fixed; uses the t test statistic. Hypotheses: H0: β_j = 0 (no linear relationship); H1: β_j ≠ 0 (linear relationship between X_j and Y).

t Test Statistic Excel Output: Example t Test Statistic for X 1 (Temperature) t Test Statistic for X 2 (Insulation)

t Test: Example Solution. Does temperature have a significant effect on monthly consumption of heating oil? Test at α = .05. H0: β1 = 0; H1: β1 ≠ 0; df = 12. Critical values: ±2.1788. Decision: reject H0 at α = .05 (the t test statistic from the Excel output falls in the rejection region). Conclusion: there is evidence of a significant effect of temperature on oil consumption, holding constant the effect of insulation.

Venn Diagrams and Estimation of Regression Model. Only the Temp information that does not overlap with Insulation is used in the estimation of b1; the overlapping information is NOT used in the estimation of b1 or b2.

Confidence Interval Estimate for the Slope. Provide the 95% confidence interval for the population slope β1 (the effect of temperature on oil consumption): −6.17 ≤ β1 ≤ −4.70. We are 95% confident that the estimated average consumption of oil is reduced by between 4.70 and 6.17 gallons for each increase of 1 °F, holding insulation constant. We can also perform the test for the significance of individual variables, H0: β1 = 0 vs. H1: β1 ≠ 0, using this confidence interval.

Contribution of a Single Independent Variable. Let X_j be the independent variable of interest. SSR(X_j | all others) measures the additional contribution of X_j in explaining the total variation in Y with the inclusion of all the remaining independent variables.

Contribution of a Single Independent Variable (continued). SSR(X1 | X2, X3) = SSR(X1, X2, X3) − SSR(X2, X3) measures the additional contribution of X1 in explaining Y with the inclusion of X2 and X3; both terms come from the ANOVA sections of the corresponding regressions.

Coefficient of Partial Determination of X_j. Measures the proportion of variation in the dependent variable that is explained by X_j while controlling for (holding constant) the other independent variables.

Coefficient of Partial Determination (continued). Example, model with two independent variables: r²_{Y1.2} = SSR(X1 | X2) / (SST − SSR(X1, X2) + SSR(X1 | X2)).

Venn Diagrams and Coefficient of Partial Determination. Figure: the coefficient of partial determination for Temp equals the part of Oil's variation explained only by Temp, divided by the variation in Oil not explained by Insulation.

Coefficient of Partial Determination in PHStat PHStat | Regression | Multiple Regression … Check the “Coefficient of Partial Determination” box Excel spreadsheet for the heating oil example

Contribution of a Subset of Independent Variables. Let X_s be the subset of independent variables of interest. SSR(X_s | all others) measures the contribution of the subset X_s in explaining SST with the inclusion of the remaining independent variables.

Contribution of a Subset of Independent Variables: Example. Let X_s be X1 and X3: SSR(X1, X3 | X2) = SSR(X1, X2, X3) − SSR(X2), with both terms from the ANOVA sections of the corresponding regressions.

Testing Portions of Model Examines the Contribution of a Subset X s of Explanatory Variables to the Relationship with Y Null Hypothesis: Variables in the subset do not improve the model significantly when all other variables are included Alternative Hypothesis: At least one variable in the subset is significant when all other variables are included

Testing Portions of Model One-Tailed Rejection Region Requires Comparison of Two Regressions One regression includes everything Another regression includes everything except the portion to be tested (continued)

Partial F Test for the Contribution of a Subset of X Variables. Hypotheses: H0: the variables X_s do not significantly improve the model, given that all other variables are included; H1: the variables X_s significantly improve the model, given that all others are included. Test statistic: F = [SSR(all) − SSR(all except X_s)] / m ÷ MSE(all), with df = m and (n − k − 1), where m = the number of variables in the subset X_s.

Partial F Test for the Contribution of a Single Variable X_j. Hypotheses: H0: the variable X_j does not significantly improve the model, given that all others are included; H1: the variable X_j significantly improves the model, given that all others are included. Test statistic: F = [SSR(all) − SSR(all except X_j)] / MSE(all), with df = 1 and (n − k − 1); m = 1 here.
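The partial F statistic compares the full model's SSR with that of the model omitting the subset. A Python sketch with hypothetical sums of squares (the values are illustrative, not the heating-oil output):

```python
# Partial F statistic for m variables in the subset Xs,
# given all other variables are included in the model.
def partial_f(ssr_full, ssr_reduced, m, mse_full):
    return (ssr_full - ssr_reduced) / m / mse_full

# Hypothetical values: adding one variable raises SSR by 80
# while the full model's MSE is 10, so F = 8.
f = partial_f(ssr_full=300.0, ssr_reduced=220.0, m=1, mse_full=10.0)
```

Compare f against the F critical value with m and (n − k − 1) degrees of freedom.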

Testing Portions of Model: Example Test at the  =.05 level to determine if the variable of average temperature significantly improves the model, given that insulation is included.

Testing Portions of Model: Example. H0: X1 (temperature) does not improve the model with X2 (insulation) included; H1: X1 does improve the model. α = .05, df = 1 and 12; critical value = 4.75. Using SSR from the regression with X1 and X2 and from the regression with X2 alone, the partial F statistic exceeds the critical value. Conclusion: reject H0; X1 does improve the model.

Testing Portions of Model in PHStat PHStat | Regression | Multiple Regression … Check the “Coefficient of Partial Determination” box Excel spreadsheet for the heating oil example

Do We Need to Do This for One Variable? The F Test for the Contribution of a Single Variable After All Other Variables are Included in the Model is IDENTICAL to the t Test of the Slope for that Variable The Only Reason to Perform an F Test is to Test Several Variables Together

Dummy-Variable Models. A categorical explanatory variable with 2 or more levels (e.g., yes or no, on or off, male or female) is handled with dummy variables coded as 0 or 1. Only the intercepts differ: equal slopes are assumed across categories. The number of dummy variables needed is (number of levels − 1). The regression model has the same form.

Dummy-Variable Models (with 2 Levels). Given: Y = assessed value of house; X1 = square footage of house; X2 = desirability of neighborhood, coded 1 if desirable and 0 if undesirable. Same slopes for both neighborhoods.

Dummy-Variable Models (with 2 Levels) (continued). Figure: assessed value (Y) against square footage (X1); two parallel lines, with intercept b0 for undesirable locations and b0 + b2 for desirable locations. Same slopes, different intercepts.

Interpretation of the Dummy-Variable Coefficient (with 2 Levels). Example: X1 = GPA; X2 = 0 for a non-business degree, 1 for a business degree; Y = annual salary of a college graduate in thousand $. With the same GPA, college graduates with a business degree are making an estimated 6 thousand dollars more than graduates with a non-business degree, on average.

Dummy-Variable Models (with 3 Levels)

Interpretation of the Dummy-Variable Coefficients (with 3 Levels). With the same footage, a Split-level will have an estimated average assessed value (in thousand dollars) higher than a Condo's by its dummy coefficient; with the same footage, a Ranch will likewise have an estimated average assessed value higher than a Condo's by its own dummy coefficient.

Regression Model Containing an Interaction Term Hypothesizes Interaction between a Pair of X Variables Response to one X variable varies at different levels of another X variable Contains a Cross-Product Term Can Be Combined with Other Models E.g., Dummy-Variable Model

Effect of Interaction. Given a model with an interaction term: without the interaction term, the effect of X1 on Y is measured by β1; with the interaction term, the effect of X1 on Y is measured by β1 + β3 X2, so the effect changes as X2 changes.

Interaction Example. Y = 1 + 2X1 + 3X2 + 4X1X2. When X2 = 1: Y = 1 + 2X1 + 3(1) + 4X1(1) = 4 + 6X1. When X2 = 0: Y = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1. The effect (slope) of X1 on Y depends on the X2 value.
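The slide's interaction model can be evaluated directly to see the slope of X1 change with the level of X2:

```python
# The slide's model Y = 1 + 2*X1 + 3*X2 + 4*X1*X2:
# the slope of X1 is 2 + 4*X2, so it depends on X2.
def y(x1, x2):
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

# At X2 = 0 the line is 1 + 2*X1 (slope 2);
# at X2 = 1 the line is 4 + 6*X1 (slope 6).
slope_at_x2_0 = y(1, 0) - y(0, 0)
slope_at_x2_1 = y(1, 1) - y(0, 1)
```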

Interaction Regression Model Worksheet. Multiply X1 by X2 to get X1X2, then run the regression with Y, X1, X2, and X1X2. Worksheet columns: case i, Y_i, X_1i, X_2i, X_1i X_2i.

Interpretation When There Are 3+ Levels MALE = 0 if female and 1 if male MARRIED = 1 if married; 0 if not DIVORCED = 1 if divorced; 0 if not MALEMARRIED = 1 if male married; 0 otherwise = (MALE times MARRIED) MALEDIVORCED = 1 if male divorced; 0 otherwise = (MALE times DIVORCED)

Interpretation When There Are 3+ Levels (continued)

Interpreting Results. Table rows for FEMALE (single, married, divorced) and MALE (single, married, divorced), with a difference column. Main effects: MALE, MARRIED, and DIVORCED. Interaction effects: MALEMARRIED and MALEDIVORCED.

Evaluating the Presence of Interaction with a Dummy-Variable. Suppose X1 and X2 are numerical variables and X3 is a dummy variable. To test whether the slope of Y with X1 and/or X2 is the same for the two levels of X3, include the cross-product terms and test the hypotheses H0: β4 = β5 = 0 (no interaction between X1 and X3 or X2 and X3) vs. H1: β4 and/or β5 ≠ 0 (X1 and/or X2 interacts with X3), using a partial F test.

Evaluating the Presence of Interaction with Numerical Variables. Suppose X1, X2, and X3 are numerical variables. To test whether the independent variables interact with each other, include the pairwise cross-product terms and test H0: β4 = β5 = β6 = 0 (no interaction among X1, X2, and X3) vs. H1: at least one of β4, β5, β6 ≠ 0 (at least one pair of X1, X2, X3 interact with each other), using a partial F test.

Chapter Summary Developed the Multiple Regression Model Discussed Residual Plots Addressed Testing the Significance of the Multiple Regression Model Discussed Inferences on Population Regression Coefficients Addressed Testing Portions of the Multiple Regression Model Discussed Dummy-Variables and Interaction Terms