
Regression Analysis, Part A: Basic Linear Regression Analysis and Estimation of Parameters
Read Chapters 3, 4 and 5 of Forecasting and Time Series: An Applied Approach.

Regression Analysis Modules
Part A – Basic Model & Parameter Estimation
Part B – Calculation Procedures
Part C – Inference: Confidence Intervals & Hypothesis Testing
Part D – Goodness of Fit
Part E – Model Building
Part F – Transformed Variables
Part G – Standardized Variables
Part H – Dummy Variables
Part I – Eliminating Intercept
Part J – Outliers
Part K – Regression Example #1
Part L – Regression Example #2
Part N – Non-linear Regression
Part P – Non-linear Example

Definition of Regression Analysis
The derivation of an algebraic equation to describe the relationship of one or more variables (X1, X2, …, Xp) with respect to another variable (Y).
- The X's can be quantitative or qualitative.
- The Y must be quantitative.

Types of Regression Analysis
- Linear Regression – the algebraic equation is linear in the parameters. Estimates of the parameters are very easy to derive.
- Non-Linear Regression – the algebraic equation is not linear in the parameters, and estimates of the parameters are more difficult to derive.
Note that an equation that is linear in the parameters can still produce a non-linear curve.

Other Regression Categorizations
Simple Regression versus Multiple Regression – depends on whether the regression equation has one independent variable (X1) or more than one (X1, X2, …, Xp). The theory remains the same in both situations, but:
1) the numerical computations are slightly more complex in the multiple regression case (a matrix formulation of the problem is needed);
2) some sets of independent variables are computationally invalid (a condition called multicollinearity).

Other Regression Categorizations
Cross-Sectional Regression versus Time Series Regression – depends on whether the dependent and independent variables are jointly "ordered" in time: Yj and the Xj's are recorded at a specific point in time, and Yj+k and the Xj+k's are recorded k time periods later. The theory remains the same in both situations. However, for time series data, prior values of Y and X may be used in the regression equation for future values of Y.

Notation for Regression Data
- The one Y variable is called the dependent variable or the response variable.
- The set of X variables are called the independent variables or explanatory variables.
- If there are p independent variables and a data sample of size N, the variables can be denoted as Yi and Xi,1, Xi,2, …, Xi,p for i = 1, 2, …, N.

Notation for Regression Equation
- The value of Yi for a given set of X values equals the mean value of Y, μi, at that set of X values plus some random error, εi: Yi = μi + εi. The 'i' in this case refers to the specific set of X values.
- Regression analysis determines an algebraic equation that can be used to predict μi.

Notation for Regression Equation (continued)
- If the regression equation is denoted as Y = α + βX1 + γX2 + … + ηXp + ε, the estimates of the regression parameters (a, b, c, …, h) and the estimated regression equation are then denoted as Ŷ = a + bX1 + cX2 + … + hXp.
- An alternative notation that is frequently used is Y = β0 + β1X1 + β2X2 + … + βpXp + ε, which is estimated as Ŷ = b0 + b1X1 + b2X2 + … + bpXp.

Notation for Regression Equation (continued)
Sometimes a distinction is made between Y and X1, X2, …, Xp, the random variables, and Yi and Xi,1, Xi,2, …, Xi,p, the observed values in a sample. If this distinction is desired, the observed sample values are frequently denoted by lower case yi and xi,1, xi,2, …, xi,p.
p is the number of variables; n is the number of observations.

Example of Regression Data (single independent variable)

Plotted Regression Equations

Uses of Regression Analysis
1) Summarizes the data – seeing the data plotted on an X-Y chart is informative, and seeing the equation plotted on the graph may be even more valuable. The plotted curve summarizes the scatter of the points, and all observers are focused on the same interpretation of the data.
2) Allows predictions – inserting a set of Xi into the right-hand side of the equation allows the predicted Yi to be calculated. Predictions made from the equation are better than reviewing historical data to find a value that was close to the desired Xi value and seeing what the corresponding Yi value was. The selected Yi may by chance have a large residual and not be representative of the majority of the data.
3) Interprets the coefficients – in many cases the regression parameters have a useful physical interpretation. The coefficient for Xi indicates how much change occurs in Y for each unit change in Xi.

Regression Line Fitting Procedure – Least Squares Criterion (Teaching Point)
The fitted line is the one that minimizes the sum of squared errors, SSE = Σ(Yi − Ŷi)².

Derivation of Regression Parameters (single independent variable)
Minimizing SSE = Σ(Yi − a − bXi)² by setting the partial derivatives with respect to a and b equal to zero yields the normal equations.

Derivation of Regression Parameters (single independent variable, continued)
Solving the normal equations gives
b = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² and a = Ȳ − bX̄.
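The closed-form estimates above are easy to compute directly. The following is a minimal sketch in Python with NumPy; the x and y arrays are hypothetical illustrations, not data from the slides.

```python
# Least-squares estimates for simple linear regression, computed from
# the closed-form solutions b = Sxy/Sxx and a = ybar - b*xbar.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical X values
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])  # hypothetical Y values

x_bar, y_bar = x.mean(), y.mean()
Sxy = np.sum((x - x_bar) * (y - y_bar))
Sxx = np.sum((x - x_bar) ** 2)

b = Sxy / Sxx           # slope estimate
a = y_bar - b * x_bar   # intercept estimate

y_hat = a + b * x       # fitted values
sse = np.sum((y - y_hat) ** 2)
print(f"a = {a:.4f}, b = {b:.4f}, SSE = {sse:.4f}")
```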

Relationship between Regression Coefficient and Correlation Coefficient (Teaching Point)
For simple linear regression, the slope estimate and the sample correlation coefficient are related by b = r · (sY / sX), where sY and sX are the sample standard deviations of Y and X.
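A quick numerical check of this teaching point, reusing the hypothetical data from the previous sketch; np.corrcoef supplies r and np.std the sample standard deviations.

```python
# Verify that b = r * (s_Y / s_X) matches the direct slope estimate.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

r = np.corrcoef(x, y)[0, 1]                       # sample correlation
b_from_r = r * (y.std(ddof=1) / x.std(ddof=1))    # slope via r and s's

# Direct least-squares slope b = Sxy / Sxx for comparison.
b_direct = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
print(b_from_r, b_direct)  # the two values agree
```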

Derivation of Regression Parameters (multiple independent variables)
In matrix form the model is Y = Xβ + ε, and algebraic manipulation of the normal equations X′Xb = X′Y gives b = (X′X)⁻¹(X′Y). Not shown is the fact that this algebraic manipulation is also the least squares solution. (Teaching Point)

Characteristics of Parameter Estimates, b = (X′X)⁻¹(X′Y)
- Least Squares Estimates – matrix differentiation is outside the scope of the course, but it can be shown that b = (X′X)⁻¹(X′Y) minimizes SSE = ε′ε = (Y − Xβ)′(Y − Xβ).
- Unbiased, Minimum Variance Estimates.
- Maximum Likelihood Estimates – if the errors are independent and normally distributed, that is, εi ~ N(0, σ²).
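A minimal sketch of the matrix solution b = (X′X)⁻¹(X′Y), again with hypothetical data. The design matrix gets a leading column of ones so the first coefficient is the intercept.

```python
# Multiple-regression coefficient estimates via the normal equations.
import numpy as np

X_raw = np.array([[1.0, 2.0],
                  [2.0, 1.0],
                  [3.0, 4.0],
                  [4.0, 3.0],
                  [5.0, 6.0]])               # hypothetical X1, X2 columns
Y = np.array([5.0, 6.0, 11.0, 12.0, 17.0])   # hypothetical responses

# Prepend the intercept column so b[0] is the intercept estimate.
X = np.column_stack([np.ones(len(Y)), X_raw])

# Solve (X'X) b = X'Y; np.linalg.solve is numerically preferable to
# forming the explicit inverse, but yields the same estimates.
b = np.linalg.solve(X.T @ X, X.T @ Y)
print("coefficient estimates:", b)
```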

Limitations on Regression Analysis
- Restriction – the basic model must be linear in the parameters.
- Restriction – the sample must be representative of the population for the inferences and predictions to be valid.
- Assertion – the equation being fit to the data is a correct (valid) representation of the underlying process.
- Consequence – if there are a large number of Yi values for the same sets of Xi values, the regression equation will predict the mean Yi values; that is, the regression equation will pass through the points μi.

Assumptions underlying Regression
The required assumptions are:
1. The dependent variable is subject to error. This error is assumed to be a random variable with a mean of zero, E(ε) = 0.
2. The independent variables are error-free.
3. The predictors must be linearly independent, i.e. it must not be possible to express any predictor as a linear combination of the others. See Multicollinearity.
4. The errors are uncorrelated; that is, the variance-covariance matrix of the errors is diagonal and each non-zero element is the variance of the error.
5. The variance of the errors is constant (homoscedasticity).
6. The errors follow a normal distribution.
(Teaching Points – to be discussed later.)

Multicollinearity Condition
- Correlation between Y and the predictors X1, X2, X3, …, Xp: moderate correlation is good; large correlation is great.
- Correlation among the predictors X1, X2, X3, …, Xp themselves: moderate correlation is a possible problem; large correlation is a disaster.

Multicollinearity Condition
A multicollinearity condition exists if the independent variables are linearly related. Example – using stature (standing height) and sitting height as independent variables when estimating weight.
Transformed data examples:
Definite multicollinearity condition:
- X1 and X2 = 2X1
- X1, X2 and X3 = X1 − X2
- X1, X2 and any X3 = aX1 + bX2
Potential (possible) multicollinearity:
- X1 and X2 = X1²
- X1 and X2 = X1^(1/2)
particularly if:
1) N is small compared to p;
2) the original X's have moderate correlation;
3) many transformations of one original variable are being used.

Basic Test for Multicollinearity
Calculate a correlation matrix for all of the independent variables (original variables and transformed variables). Any correlation greater than .9 represents a possible multicollinearity condition. A ρi,j > .97 indicates a highly likely multicollinearity condition.
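A sketch of this basic screen: build the correlation matrix of the independent variables and flag pairs past the thresholds above. The three columns here are hypothetical, with x2 deliberately constructed to be nearly collinear with x1.

```python
# Flag pairs of independent variables with |r| > .9 (possible) or
# |r| > .97 (highly likely multicollinearity).
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=50)  # nearly collinear with x1
x3 = rng.normal(size=50)
X = np.column_stack([x1, x2, x3])

R = np.corrcoef(X, rowvar=False)  # pairwise correlations of the columns
p = X.shape[1]
for i in range(p):
    for j in range(i + 1, p):
        if abs(R[i, j]) > 0.9:
            flag = "highly likely" if abs(R[i, j]) > 0.97 else "possible"
            print(f"X{i+1} vs X{j+1}: r = {R[i, j]:.3f} ({flag} multicollinearity)")
```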

Sophisticated Test for Multicollinearity
Calculate the Coefficient of Determination p times, each time using the jth variable as the dependent variable and the p−1 remaining variables as the independent variables (ignore the original dependent variable). Denote these partial R² values as Rj² for j = 1 to p. Calculate the Variance Inflation Factor and the Mean Variance Inflation Factor as
VIFj = 1 / (1 − Rj²) and Mean VIF = (1/p) Σ VIFj.
Potential multicollinearity condition:
- if the largest Rj² > .9
- if the largest VIFj > 10
- if the Mean VIF is much greater than 1 (a very loose rule of thumb; it cannot be less than 1)
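A sketch of the VIF procedure just described: regress each independent variable on the others, take that regression's R² as Rj², and compute VIFj = 1/(1 − Rj²). The same hypothetical X matrix as the previous sketch is rebuilt here so the snippet is self-contained.

```python
# Variance Inflation Factors via p auxiliary regressions.
import numpy as np

def vif(X):
    """Return the variance inflation factor for each column of X."""
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y_j = X[:, j]                       # jth variable as the response
        others = np.delete(X, j, axis=1)    # remaining p-1 variables
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y_j, rcond=None)
        resid = y_j - A @ coef
        r2 = 1.0 - resid.var() / y_j.var()  # partial R_j^2
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=50)
x3 = rng.normal(size=50)
X = np.column_stack([x1, x2, x3])

v = vif(X)
print("VIFs:", np.round(v, 1), "mean VIF:", round(v.mean(), 1))
```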

Differences in Multicollinearity Tests
- Correlation coefficients measure the multicollinearity between each pair of independent variables.
- VIF measures the multicollinearity between one independent variable and the remaining independent variables taken as a group.
- A multicollinearity condition does not necessarily mean that a variable should be removed from the database. It does indicate that caution should be exercised when fitting equations with these variables.

Multicollinearity Example
This implies Size could be removed from the database: Corr(Size, Size²) = .997, Corr(Size, 1/Size) = −.985, and VIF(Size) = 47,669. The effect of Size is being accounted for by Size², 1/Size, and in general all of the remaining variables. Similarly, Age^.9 could be removed from the database because it is being accounted for by Age and the cross product (Size)(Age).

Consequences of Extreme Multicollinearity (Teaching Point)
- The computer program may crash: the matrix being inverted will be singular and will cause a "divide by zero" error.
- The derived regression coefficients will be very sensitive to small changes in the data; the coefficients are unstable.
- The derived regression coefficients may have unexpected physical interpretations. For example, increasing 'Advertising' could incorrectly imply a reduction in 'Sales'.
- Confusion can occur when deriving the "best" regression model. Two variables may seem of marginal value when they are both in the regression equation; yet, if one of these variables is removed from the regression equation, the remaining variable becomes highly significant (desirable).
- The t-tests of the significance of the regression coefficients may be incorrect, and variables that are not needed may be retained in the regression equation.

Skip's Approach to a Multicollinearity Condition
- I do not eliminate potential multicollinearity variables before doing the regression analysis.
- If the computer crashes, I eliminate the multicollinearity variables one at a time, by trial and error.
- If the computer does not crash, I continue my regression analysis. Usually, typical regression analysis procedures will by themselves eliminate multicollinearity variables. When I have a final model, I do a correlation and VIF analysis to make sure that a multicollinearity condition has not survived the analysis.
- Based on the sets of multicollinearity variables, I try different subsets of variables as the starting point in my regression analysis.