Purpose of Regression Analysis
Regression analysis is used primarily to model causality and provide prediction
– Predicts the value of a dependent (response) variable based on the value of at least one independent (explanatory) variable
– Explains the effect of the independent variables on the dependent variable

Types of Regression Models
– Positive linear relationship
– Negative linear relationship
– Relationship NOT linear
– No relationship

Simple Linear Regression Model
– Relationship between variables is described by a linear function
– The change of one variable causes the change in the other variable
– A dependency of one variable on the other

Population Linear Regression
The population regression line is a straight line that describes the dependence of the average value (conditional mean) of one variable on the other:

Yi = β0 + β1Xi + εi

where Y is the dependent (response) variable, X is the independent (explanatory) variable, β0 is the population Y intercept, β1 is the population slope coefficient, and εi is the random error.

Population Linear Regression (continued)
[Figure: scatter of Y against X; each observed value of Y equals the conditional mean β0 + β1Xi plus the random error εi.]

Sample Linear Regression
The sample regression line provides an estimate of the population regression line as well as a predicted value of Y:

Ŷi = b0 + b1Xi

where b0 is the sample Y intercept, b1 is the sample slope coefficient, and ei = Yi − Ŷi is the residual. The sample regression line is also called the fitted regression line; Ŷi is the predicted value.

Sample Linear Regression (continued)
b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of the squared residuals. b0 provides an estimate of β0; b1 provides an estimate of β1.
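As a sketch of how the least-squares criterion yields b0 and b1, the closed-form formulas can be evaluated directly. The data here are made up for illustration only:

```python
import numpy as np

# Illustrative (made-up) data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares estimates: b1 = S_xy / S_xx, b0 = ybar - b1 * xbar
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

residuals = y - (b0 + b1 * x)
print(round(b1, 3), round(b0, 3))
# With least-squares estimates, the residuals sum to (numerically) zero
print(round(residuals.sum(), 10))
```

Any other choice of b0 and b1 would give a larger sum of squared residuals, which is exactly the minimization described above.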

Sample Linear Regression (continued)
[Figure: observed values Yi scattered around the fitted line Ŷi = b0 + b1Xi; the vertical distance from each point to the line is the residual ei.]

Interpretation of the Slope and the Intercept
– β0 is the average value of Y when the value of X is zero.
– β1 measures the change in the average value of Y as a result of a one-unit change in X.

Interpretation of the Slope and the Intercept (continued)
– b0 is the estimated average value of Y when the value of X is zero.
– b1 is the estimated change in the average value of Y as a result of a one-unit change in X.

Simple Linear Regression: Example
You want to examine the linear dependency of the annual sales of produce stores on their size in square footage. Sample data for seven stores were obtained. Find the equation of the straight line that fits the data best.

Store  Square Feet  Annual Sales ($1000)
1      1,726        3,681
2      1,542        3,395
3      2,816        6,653
4      5,555        9,543
5      1,292        3,318
6      2,208        5,563
7      1,313        3,760

Scatter Diagram: Example
[Excel output: scatter plot of annual sales against square footage.]

Equation for the Sample Regression Line: Example
From the Excel printout:

Ŷi = 1636.415 + 1.487Xi

Excel Output

Regression Statistics
Multiple R          0.9706
R Square            0.9420
Adjusted R Square   0.9304
Standard Error      611.75
Observations        7

ANOVA        df   SS              MS              F       Significance F
Regression   1    30,380,456.12   30,380,456.12   81.18   0.000281
Residual     5    1,871,199.59    374,239.92
Total        6    32,251,655.71

             Coefficients   Standard Error   t Stat   P-value    Lower 95%   Upper 95%
Intercept    1636.4147      451.4953         3.6244   0.01515    475.810     2797.019
X Variable   1.4866         0.1650           9.0099   0.000281   1.0623      1.9109

Graph of the Sample Regression Line: Example
[Graph of the fitted line Ŷi = 1636.415 + 1.487Xi over the scatter of the seven stores.]

Interpretation of Results: Example
The slope of 1.487 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units. The model estimates that for each increase of one square foot in the size of the store, the expected annual sales are predicted to increase by $1,487.
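The fitted line for this example can be reproduced numerically. The seven (square feet, sales) pairs below are the store observations used in this example:

```python
import numpy as np

# Produce-store example: square footage and annual sales ($1000s)
sqft  = np.array([1726, 1542, 2816, 5555, 1292, 2208, 1313], dtype=float)
sales = np.array([3681, 3395, 6653, 9543, 3318, 5563, 3760], dtype=float)

# Least-squares slope and intercept
b1 = np.sum((sqft - sqft.mean()) * (sales - sales.mean())) / np.sum((sqft - sqft.mean()) ** 2)
b0 = sales.mean() - b1 * sqft.mean()
print(round(b0, 3), round(b1, 4))  # ≈ 1636.4, 1.487
```

The slope of about 1.487 (in $1000s per square foot of the scaled data) matches the interpretation given above.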

How Good Is the Regression?
– r²
– Confidence intervals
– Residual plots
– Analysis of variance
– Hypothesis (t) tests

Measure of Variation: The Sum of Squares
SST = SSR + SSE
Total sample variability = Explained variability + Unexplained variability

Measure of Variation: The Sum of Squares (continued)
SST = total sum of squares
– Measures the variation of the Yi values around their mean Ȳ
SSR = regression sum of squares
– Explained variation attributable to the relationship between X and Y
SSE = error sum of squares
– Variation attributable to factors other than the relationship between X and Y

Measure of Variation: The Sum of Squares (continued)
[Figure: SST = Σ(Yi − Ȳ)², SSE = Σ(Yi − Ŷi)², SSR = Σ(Ŷi − Ȳ)², shown as vertical distances around the fitted line.]

The Coefficient of Determination
r² = SSR / SST
Measures the proportion of variation in Y that is explained by the independent variable X in the regression model
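The sum-of-squares decomposition and r² can be checked numerically. The data here are made up for illustration:

```python
import numpy as np

# Illustrative (made-up) data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)       # total variation
ssr = np.sum((y_hat - y.mean()) ** 2)   # explained variation
sse = np.sum((y - y_hat) ** 2)          # unexplained variation
r2 = ssr / sst                          # coefficient of determination

print(round(sst - (ssr + sse), 10), round(r2, 4))
```

The first printed value is (numerically) zero, confirming SST = SSR + SSE; r² is the explained share of the total variation.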

Coefficients of Determination (r²) and Correlation (r)
[Panels showing Ŷi = b0 + b1Xi for: r² = 1 (r = +1), r² = 1 (r = −1), r² = .8 (r = +0.9), r² = 0 (r = 0).]

Linear Regression Assumptions
1. Linearity
2. Normality
– Y values are normally distributed for each X
– Probability distribution of error is normal
3. Homoscedasticity (constant variance)
4. Independence of errors

Residual Analysis
Purposes
– Examine linearity
– Evaluate violations of assumptions
Graphical analysis of residuals
– Plot residuals vs. Xi, Ŷi, and time

Residual Analysis for Linearity
[Plots: residuals vs. X showing a curved pattern (not linear) and a patternless band (linear).]

Residual Analysis for Homoscedasticity
[Plots: residuals fanning out as X increases (heteroscedasticity) and residuals in a constant band (homoscedasticity).]

Variation of Errors around the Regression Line
– Y values are normally distributed around the regression line.
– For each X value, the "spread" or variance around the regression line is the same.
[Figure: normal error distributions f(e) of equal variance at X1 and X2 along the sample regression line.]

Residual Analysis: Excel Output for Produce Stores Example
[Excel output: residual plot against square footage.]

Residual Analysis for Independence
Graphical approach: the residual is plotted against time to detect any autocorrelation.
[Plots: cyclical pattern (not independent) vs. no particular pattern (independent).]

Inference about the Slope: t Test
t test for a population slope
– Is there a linear dependency of Y on X?
Null and alternative hypotheses
– H0: β1 = 0 (no linear dependency)
– H1: β1 ≠ 0 (linear dependency)
Test statistic
– t = (b1 − β1) / S_b1, with n − 2 degrees of freedom
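A sketch of this t test on the produce-store data, computing the slope's standard error from the residuals (the critical value 2.5706 is the two-tailed t table value for α = .05, df = 5):

```python
import numpy as np

sqft  = np.array([1726, 1542, 2816, 5555, 1292, 2208, 1313], dtype=float)
sales = np.array([3681, 3395, 6653, 9543, 3318, 5563, 3760], dtype=float)
n = len(sqft)

s_xx = np.sum((sqft - sqft.mean()) ** 2)
b1 = np.sum((sqft - sqft.mean()) * (sales - sales.mean())) / s_xx
b0 = sales.mean() - b1 * sqft.mean()

sse = np.sum((sales - (b0 + b1 * sqft)) ** 2)
s_e  = np.sqrt(sse / (n - 2))   # standard error of the estimate
s_b1 = s_e / np.sqrt(s_xx)      # standard error of the slope
t_stat = (b1 - 0.0) / s_b1      # test statistic under H0: beta1 = 0

print(round(t_stat, 2))  # ≈ 9.01; critical value t(.025, df=5) = 2.5706
```

Since |t| far exceeds 2.5706, H0: β1 = 0 is rejected, matching the conclusion on the following slides.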

Example: Produce Store
Data for seven stores:

Store  Square Feet  Annual Sales ($000)
1      1,726        3,681
2      1,542        3,395
3      2,816        6,653
4      5,555        9,543
5      1,292        3,318
6      2,208        5,563
7      1,313        3,760

Estimated regression equation: Ŷi = 1636.415 + 1.487Xi
The slope of this model is 1.487. Is square footage of the store affecting its annual sales?

Inferences about the Slope: t Test Example
H0: β1 = 0    H1: β1 ≠ 0    α = .05    df = 7 − 2 = 5
Critical values: t = ±2.5706
Test statistic: from the Excel printout, t = 9.0099 (p-value = .000281)
Decision: reject H0, since t = 9.0099 > 2.5706
Conclusion: there is evidence that square footage affects annual sales.

The Multiple Regression Model
Relationship between 1 dependent & 2 or more independent variables is a linear function:

Yi = β0 + β1X1i + β2X2i + … + βkXki + εi

where β0 is the population Y-intercept, β1, …, βk are the population slopes, and εi is the random error. In the sample model, Y is the dependent (response) variable, X1, …, Xk are the independent (explanatory) variables, and ei is the residual.

Population Multiple Regression Model
Bivariate model: Yi = β0 + β1X1i + β2X2i + εi
[Figure: response plane in (X1, X2, Y) space with observed points scattered around it.]

Sample Multiple Regression Model
Bivariate model: Ŷi = b0 + b1X1i + b2X2i (sample regression plane)

Simple and Multiple Regression Compared Coefficients in a simple regression pick up the impact of that variable plus the impacts of other variables that are correlated with it and the dependent variable. Coefficients in a multiple regression net out the impacts of other variables in the equation.

Simple and Multiple Regression Compared: Example
Two simple regressions:
– Ŷ = b0 + b1X1
– Ŷ = b0 + b2X2
Multiple regression:
– Ŷ = b0 + b1X1 + b2X2

Multiple Linear Regression Equation
Ŷi = b0 + b1X1i + b2X2i + … + bkXki
Too complicated by hand! Ouch!
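Too complicated by hand, but not by software. A minimal sketch with made-up data: build a design matrix with a column of ones for the intercept and let least squares solve for all coefficients at once:

```python
import numpy as np

# Made-up data generated from an exact linear relationship (no noise)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = 10.0 + 2.0 * x1 - 3.0 * x2

# Design matrix: intercept column plus the explanatory variables
X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 6))  # ≈ [10.  2. -3.]
```

Because the data were generated without error, least squares recovers b0 = 10, b1 = 2, b2 = −3 exactly.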

Interpretation of Estimated Coefficients
Slope (bi)
– The average value of Y is estimated to change by bi for each one-unit increase in Xi, holding all other variables constant (ceteris paribus)
– Example: if b1 = −2, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1 degree increase in temperature (X1), given the inches of insulation (X2)
Y-intercept (b0)
– The estimated average value of Y when all Xi = 0

Multiple Regression Model: Example
Develop a model for estimating heating oil used (gallons) for a single-family home in the month of January based on average temperature (°F) and amount of insulation (inches).

Sample Multiple Regression Equation: Example
From the Excel output: Ŷi = 562.151 − 5.437X1i − 20.012X2i
For each degree increase in temperature, the estimated average amount of heating oil used is decreased by 5.437 gallons, holding insulation constant. For each increase of one inch of insulation, the estimated average use of heating oil is decreased by 20.012 gallons, holding temperature constant.

Confidence Interval Estimate for the Slope
Provide the 95% confidence interval for the population slope β1 (the effect of temperature on oil consumption):

−6.169 ≤ β1 ≤ −4.704

The estimated average consumption of oil is reduced by between 4.704 gallons and 6.169 gallons for each increase of 1°F.

Coefficient of Multiple Determination
r² = SSR / SST
Proportion of total variation in Y explained by all X variables taken together
– Never decreases when a new X variable is added to the model
– Disadvantage when comparing models

Adjusted Coefficient of Multiple Determination
r²_adj = 1 − (1 − r²)(n − 1) / (n − k − 1)
Proportion of variation in Y explained by all X variables, adjusted for the number of X variables used
– Penalizes excessive use of independent variables
– Smaller than r²
– Useful in comparing among models
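The adjustment formula is a one-liner. The sample size n = 15 below is not stated on the slides; it is inferred from the r² and adjusted-r² figures quoted later for the heating-oil example:

```python
def adjusted_r2(r2, n, k):
    # r2_adj = 1 - (1 - r^2) * (n - 1) / (n - k - 1)
    # n = number of observations, k = number of X variables
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Heating-oil example: r2 = .9656, two predictors, n = 15 (assumed,
# implied by the slide's figures) gives an adjusted r2 near .9599
print(round(adjusted_r2(0.9656, 15, 2), 4))
```

Note that r²_adj < r² whenever k ≥ 1, which is why it is the fairer yardstick when comparing models with different numbers of predictors.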

Coefficient of Multiple Determination
[Excel output: r² = .9656, adjusted r² = .9599.]
Adjusted r²
– reflects the number of explanatory variables and sample size
– is smaller than r²

Interpretation of Coefficient of Multiple Determination
– r² = .9656: 96.56% of the total variation in heating oil can be explained by temperature and amount of insulation
– adjusted r² = .9599: 95.99% of the total fluctuation in heating oil can be explained by temperature and amount of insulation, after adjusting for the number of explanatory variables and sample size

Using the Model to Make Predictions
Predict the amount of heating oil used for a home if the average temperature is 30°F and the insulation is six inches:

Ŷi = 562.151 − 5.437(30) − 20.012(6) = 278.97

The predicted heating oil used is 278.97 gallons.

Testing for Overall Significance
Shows if there is a linear relationship between all of the X variables together and Y
Use the F test statistic
Hypotheses:
– H0: β1 = β2 = … = βk = 0 (no linear relationship)
– H1: at least one βi ≠ 0 (at least one independent variable affects Y)
The null hypothesis is a very strong statement
Almost always reject the null hypothesis
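A sketch of the overall F statistic, F = (SSR / k) / (SSE / (n − k − 1)), on made-up data with two predictors and a small amount of noise:

```python
import numpy as np

# Made-up data: y = 1 + 2*x1 + x2 plus small noise
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y  = np.array([5.1, 5.9, 11.2, 11.8, 17.1, 17.9, 23.2, 23.8])

X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

n, k = len(y), 2
ssr = np.sum((y_hat - y.mean()) ** 2)          # explained
sse = np.sum((y - y_hat) ** 2)                 # unexplained
f_stat = (ssr / k) / (sse / (n - k - 1))       # overall F statistic
print(round(f_stat, 1))
```

Because the relationship is strong, F comes out very large, and H0 (all slopes zero) would be rejected against any reasonable F critical value.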

Test for Significance: Individual Variables
Shows if there is a linear relationship between the variable Xi and Y
Use the t test statistic
Hypotheses:
– H0: βi = 0 (no linear relationship)
– H1: βi ≠ 0 (linear relationship between Xi and Y)

Residual Plots
Residuals vs. Ŷ
– May need to transform the Y variable
Residuals vs. Xi
– May need to transform the Xi variable
Residuals vs. time
– May have autocorrelation

Residual Plots: Example
[Residual plots: no discernible pattern against one variable; maybe some non-linear relationship against the other.]

The Quadratic Regression Model
Relationship between the response variable and an explanatory variable is a quadratic polynomial function
Useful when the scatter diagram indicates a non-linear relationship
Quadratic model:
– Yi = β0 + β1X1i + β2X1i² + εi
– The second explanatory variable is the square of the first variable

Quadratic Regression Model (continued)
Quadratic models may be considered when the scatter diagram takes on one of the following shapes:
[Panels: curves opening upward (β2 > 0) and downward (β2 < 0), where β2 is the coefficient of the quadratic term.]
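Operationally, the quadratic model is just a multiple regression where the "second variable" is X1 squared. A sketch on made-up data generated from an exact quadratic:

```python
import numpy as np

# Made-up data from an exact quadratic (beta2 > 0, so it opens upward)
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 2.0 + 1.0 * x + 0.5 * x ** 2

# Add X^2 as a second column and fit by least squares
X = np.column_stack([np.ones_like(x), x, x ** 2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 6))  # ≈ [2.  1.  0.5]
```

The fitted coefficient on the squared term recovers β2 = 0.5; its sign governs whether the curve opens upward or downward.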

Testing for Significance: Quadratic Model
Testing for overall relationship
– Similar to the test for the linear model
– F test statistic = MSR / MSE
Testing the quadratic effect
– Compare the quadratic model with the linear model
– Hypotheses: H0: β2 = 0 (no 2nd-order polynomial term) vs. H1: β2 ≠ 0 (2nd-order polynomial term is needed)

Dummy Variable Models
Categorical explanatory variable (dummy variable) with two or more levels: yes or no, on or off, male or female
Coded as 0 or 1
Only the intercepts are different
Assumes equal slopes across categories
The number of dummy variables needed is (number of levels − 1)
The regression model has the same form: Ŷi = b0 + b1X1i + b2X2i + …

Dummy-Variable Models (with 2 Levels)
Given: Ŷ = b0 + b1X1 + b2X2
Y = assessed value of house
X1 = square footage of house
X2 = desirability of neighborhood: 0 if undesirable, 1 if desirable
Desirable (X2 = 1): Ŷ = (b0 + b2) + b1X1
Undesirable (X2 = 0): Ŷ = b0 + b1X1
Same slopes

Dummy-Variable Models (with 2 Levels) (continued)
[Figure: two parallel lines of assessed value vs. square footage; intercept b0 + b2 for a desirable location, b0 for an undesirable location. Same slopes, intercepts different.]

Interpretation of the Dummy Variable Coefficient (with 2 Levels)
Example: Ŷ = b0 + b1X1 + 6X2
Y: annual salary of college graduate in thousand $
X1: GPA
X2: 0 female, 1 male
On average, male college graduates are making an estimated six thousand dollars more than female college graduates with the same GPA.
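A dummy variable enters the regression like any other column. This sketch uses hypothetical salary data constructed so the dummy coefficient is exactly 6 (thousand dollars), mirroring the example above:

```python
import numpy as np

# Hypothetical data: salary ($1000s) vs. GPA plus a 0/1 gender dummy
gpa    = np.array([2.0, 2.5, 3.0, 3.5, 4.0, 2.0, 2.5, 3.0, 3.5, 4.0])
male   = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0])
salary = 20.0 + 5.0 * gpa + 6.0 * male  # built so the dummy coefficient is 6

X = np.column_stack([np.ones_like(gpa), gpa, male])
b, *_ = np.linalg.lstsq(X, salary, rcond=None)
print(np.round(b, 6))  # ≈ [20.  5.  6.]
```

The two groups share the slope on GPA (5); the dummy coefficient (6) is the vertical shift between the two parallel lines, exactly the "same slopes, different intercepts" picture.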

Dummy-Variable Models (with 3 Levels)
Example: Y = assessed value, X1 = square footage, house style with three levels (Condo, Ranch, Split-level) coded with two dummies:
Ŷ = b0 + b1X1 + b2X2 + b3X3, where X2 = 1 if Ranch (0 otherwise), X3 = 1 if Split-level (0 otherwise); Condo is the base level.

Interpretation of the Dummy Variable Coefficients (with 3 Levels)
With the same footage, a Split-level will have an estimated average assessed value of b3 thousand dollars more than a Condo. With the same footage, a Ranch will have an estimated average assessed value of b2 thousand dollars more than a Condo.

Dummy Variables
Predict weekly sales in a grocery store
Possible independent variables:
– Price
– Grocery chain
Data set: Grocery.xls
Interaction effect?

Interaction Regression Model
Hypothesizes interaction between pairs of X variables
– Response to one X variable varies at different levels of another X variable
Contains two-way cross product terms
– Yi = β0 + β1X1i + β2X2i + β3X1iX2i + εi
Can be combined with other models
– e.g., the dummy variable model

Effect of Interaction
Given: Yi = β0 + β1X1i + β2X2i + β3X1iX2i + εi
– Without the interaction term, the effect of X1 on Y is measured by β1
– With the interaction term, the effect of X1 on Y is measured by β1 + β3X2
– The effect changes as X2 increases

Interaction Example
Y = 1 + 2X1 + 3X2 + 4X1X2
When X2 = 1: Y = 1 + 2X1 + 3(1) + 4X1(1) = 4 + 6X1
When X2 = 0: Y = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1
The effect (slope) of X1 on Y does depend on the X2 value.
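The slide's arithmetic can be verified directly: the slope of X1 is β1 + β3X2, so it is 2 when X2 = 0 and 6 when X2 = 1:

```python
# The slide's interaction model: Y = 1 + 2*X1 + 3*X2 + 4*X1*X2
def y(x1, x2):
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

# Slope of X1 at a fixed X2 is beta1 + beta3*X2
slope_at_x2_0 = y(1, 0) - y(0, 0)   # 2 + 4*0 = 2
slope_at_x2_1 = y(1, 1) - y(0, 1)   # 2 + 4*1 = 6
print(slope_at_x2_0, slope_at_x2_1)  # 2 6
```

Without the X1X2 term (β3 = 0), the two slopes would be identical; the interaction term is exactly what lets the lines diverge.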

Interaction Regression Model Worksheet
Multiply X1 by X2 to get X1X2. Run the regression with Y, X1, X2, and X1X2.

Evaluating Presence of Interaction
Hypothesize interaction between pairs of independent variables
Contains two-way product terms

Using Transformations Requires data transformation Either or both independent and dependent variables may be transformed Can be based on theory, logic or scatter diagrams

Inherently Linear Models Non-linear models that can be expressed in linear form –Can be estimated by least squares in linear form Require data transformation

Transformed Multiplicative Model (Log-Log)
Original multiplicative model: Yi = β0 X1i^β1 X2i^β2 εi
Transformed into linear form: ln Yi = ln β0 + β1 ln X1i + β2 ln X2i + ln εi
[Panels: shapes of Y vs. X1 for different values of β1; similarly for X2.]

Square Root Transformation
Yi = β0 + β1√X1i + εi
[Panels: increasing concave curve for β1 > 0, decreasing for β1 < 0; similarly for X2.]
Transforms one of the above models to one that appears linear. Often used to overcome heteroscedasticity.

Linear-Logarithmic Transformation
Yi = β0 + β1 ln X1i + εi
[Panels: increasing curve for β1 > 0, decreasing for β1 < 0; similarly for X2.]
Transformed from an original multiplicative model.

Exponential Transformation (Log-Linear)
Original model: Yi = e^(β0 + β1X1i + β2X2i) εi
[Panels: exponential growth for β1 > 0, decay for β1 < 0.]
Transformed into: ln Yi = β0 + β1X1i + β2X2i + ln εi
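A sketch of the log-linear case: generate data from an exact exponential model, take logs, and the ordinary least-squares formulas recover the original parameters. The data are made up:

```python
import numpy as np

# Made-up data from Y = exp(0.5 + 0.3*X) with no error term
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.exp(0.5 + 0.3 * x)

# Taking logs linearizes the model: ln Y = b0 + b1*X
ln_y = np.log(y)
b1 = np.sum((x - x.mean()) * (ln_y - ln_y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = ln_y.mean() - b1 * x.mean()
print(round(b0, 6), round(b1, 6))  # ≈ 0.5, 0.3
```

The same recipe applies to the other inherently linear models above: transform first, then fit by ordinary least squares.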

Model Building / Model Selection
Find "the best" set of explanatory variables among all the ones given.
"Best subset" regression (only linear models)
– Requires a lot of computation (2^N regressions)
"Stepwise regression"
"Common sense" methodology
– Run the regression with all variables
– Throw out variables that are not statistically significant
– "Adjust" the model by including some excluded variables, one at a time
Tradeoff: parsimony vs. fit

Association ≠ Causation !

Regression Limitations
– R² measures the association between independent and dependent variables
– Association ≠ Causation!
– Be careful about making predictions that involve extrapolation
– Inclusion / exclusion of independent variables is subject to a type I / type II error

Multicollinearity
What?
– When one independent variable is highly correlated ("collinear") with one or more other independent variables
– Examples: square feet and square meters as independent variables to predict house price (1 sq ft is roughly 0.09 sq m); "total rooms" and bedrooms plus bathrooms for a house
How to detect?
– Run a regression with the "not-so-independent" independent variable (in the examples above: square feet and total rooms) as a function of all other remaining independent variables, e.g.: X1 = β0 + β2X2 + … + βkXk
– If R² of the above regression is > 0.8, then one suspects multicollinearity to be present
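The detection recipe above can be sketched directly: regress the suspect variable on the remaining predictors and inspect R². The square-feet / square-meters pair below is hypothetical, built to be nearly collinear:

```python
import numpy as np

# Hypothetical predictors: square feet, and square meters derived from it
sqft = np.array([1500.0, 1800.0, 2100.0, 2400.0, 2700.0, 3000.0])
sqm  = sqft * 0.0929 + np.array([0.5, -0.3, 0.2, -0.4, 0.1, -0.1])  # nearly collinear

# Regress the suspect variable on the remaining independent variable(s)
X = np.column_stack([np.ones_like(sqft), sqft])
b, *_ = np.linalg.lstsq(X, sqm, rcond=None)
resid = sqm - X @ b
r2 = 1.0 - np.sum(resid ** 2) / np.sum((sqm - sqm.mean()) ** 2)
print(round(r2, 4))  # close to 1 -> multicollinearity suspected (> 0.8 rule of thumb)
```

An R² this close to 1 says sqft explains nearly all of sqm, so keeping both in one regression would make the coefficient estimates unreliable, as the next slide notes.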

Multicollinearity (continued)
What effect?
– Coefficient estimates are unreliable
– The model can still be used for predicting values of Y
– If possible, delete the "not-so-independent" independent variable
When to check?
– When one suspects that two variables measure the same thing, or when the two variables are highly correlated
– When one suspects that one independent variable is a (linear) function of the other independent variables