1
Chapter 5: Regression Analysis Part 1: Simple Linear Regression
2
Regression Analysis
Building models that characterize the relationship between a dependent variable and one (simple regression) or more (multiple regression) independent variables, all of which are numerical. For example:
Sales = a + b*Price + c*Coupons + d*Advertising + e*Price*Advertising
Applies to cross-sectional data and to time series data (forecasting).
3
Simple Linear Regression Single independent variable Linear relationship
4
SLR Model
Y = β₀ + β₁X + ε, where β₀ is the intercept, β₁ is the slope, and ε is the error term.
E(Y|X): conditional mean of Y given X; f(Y|X): conditional distribution of Y given X.
5
Error Terms (Residuals)
εᵢ = Yᵢ − β₀ − β₁Xᵢ
6
Estimation of the Regression Line
True regression line (unknown): Y = β₀ + β₁X + ε
Estimated regression line: Ŷ = b₀ + b₁X
Observable errors: eᵢ = Yᵢ − b₀ − b₁Xᵢ
7
Least Squares Regression
Choose b₀ and b₁ to minimize the sum of squared errors Σ(Yᵢ − b₀ − b₁Xᵢ)².
Solution: b₁ = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σ(Xᵢ − X̄)² and b₀ = Ȳ − b₁X̄
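A minimal pure-Python sketch of the least squares formulas on this slide; the X and Y values are a made-up toy dataset, not from the slides:

```python
def least_squares(x, y):
    """Return (b0, b1) minimizing the sum of squared errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b1 = sum((Xi - X_bar)(Yi - Y_bar)) / sum((Xi - X_bar)^2)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar   # b0 = Y_bar - b1 * X_bar
    return b0, b1

# Toy dataset used for illustration
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(X, Y)   # b0 ≈ 2.2, b1 ≈ 0.6
```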
8
Excel Trendlines
Construct a scatter diagram.
Method 1: select Chart > Add Trendline.
Method 2: select the data series and right-click.
Example prediction from a fitted trendline: Ŷ = 0.0854(6000) − 108.59 = 403.81
9
Without Regression
The best estimate for Y is the mean Ȳ, independent of the value of X.
A measure of total variation is SST = Σ(Yᵢ − Ȳ)² (unexplained variation).
10
With Regression
Observed values Yᵢ and fitted values Ŷᵢ on the fitted line Ŷ = b₀ + b₁X.
Variation unexplained after regression: Y − Ŷ. Variation explained by regression: Ŷ − Ȳ.
11
Sums of Squares
SST = Σ(Yᵢ − Ȳ)² = Σ(Ŷᵢ − Ȳ)² + Σ(Yᵢ − Ŷᵢ)² = SSR + SSE
SSR is the explained variation; SSE is the unexplained variation.
12
Coefficient of Determination
R² = SSR/SST = (SST − SSE)/SST = 1 − SSE/SST, the coefficient of determination: the proportion of variation explained by the independent variable (regression model). 0 ≤ R² ≤ 1.
Adjusted R² incorporates sample size and the number of explanatory variables (in multiple regression models). Useful for comparing models with different numbers of explanatory variables.
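The sums-of-squares decomposition and R² can be sketched in pure Python. The data and the coefficients b₀ = 2.2, b₁ = 0.6 are an illustrative toy example (the coefficients are the least squares estimates for this data, so SST = SSR + SSE holds):

```python
def r_squared(x, y, b0, b1):
    """R^2 = SSR/SST = 1 - SSE/SST for the fitted line Y_hat = b0 + b1*X."""
    n = len(y)
    y_bar = sum(y) / n
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by regression
    assert abs(sst - (ssr + sse)) < 1e-9                   # SST = SSR + SSE
    return ssr / sst

# Toy data; b0 = 2.2 and b1 = 0.6 are its least squares estimates
r2 = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 2.2, 0.6)
```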
13
Correlation Coefficient
Sample correlation coefficient: R = ±√R², taking the sign of the slope b₁.
Properties: −1 ≤ R ≤ 1
R = 1 ⇒ perfect positive correlation; R = −1 ⇒ perfect negative correlation; R = 0 ⇒ no correlation.
14
Standard Error of the Estimate
MSE = SSE/(n − 2), an unbiased estimate of the variance of the errors about the regression line.
The standard error of the estimate, S_YX = √MSE = √(SSE/(n − 2)), measures the spread of the data about the line.
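The standard error of the estimate can be computed directly from the residuals; same illustrative toy data and fitted coefficients as before:

```python
import math

def standard_error_of_estimate(x, y, b0, b1):
    """S_YX = sqrt(MSE) = sqrt(SSE / (n - 2))."""
    n = len(y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)    # unbiased estimate of the error variance
    return math.sqrt(mse)

s_yx = standard_error_of_estimate([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 2.2, 0.6)
```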
15
Confidence Bands
Analogous to confidence intervals, but the width depends on the specific value of the independent variable:
Ŷ ± t(α/2, n−2) · S_YX · √hᵢ, where hᵢ grows with the distance of Xᵢ from X̄.
16
Regression as ANOVA
Testing for significance of regression: H₀: β₁ = 0 vs. H₁: β₁ ≠ 0
SST = SSR + SSE; the null hypothesis implies that SST = SSE, i.e., SSR = 0.
MSR = SSR/1 = variance explained by regression
F = MSR/MSE
If F > the critical value, it is likely that β₁ ≠ 0, i.e., the regression line is significant.
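The F statistic for significance of regression can be sketched as follows, reusing the toy data and its least squares coefficients from earlier (illustrative only):

```python
def f_statistic(x, y, b0, b1):
    """F = MSR/MSE for H0: beta1 = 0, with 1 and n-2 degrees of freedom."""
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    msr = (sst - sse) / 1   # MSR = SSR / 1
    mse = sse / (n - 2)
    return msr / mse

f = f_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 2.2, 0.6)
```

The computed F would then be compared against the F critical value with (1, n−2) degrees of freedom.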
17
t-test for Significance of Regression
t = (b₁ − β₁)/s_b₁ with n − 2 degrees of freedom, where s_b₁ is the standard error of the slope.
This allows you to test hypotheses about specific slope values, e.g., H₀: β₁ = 1 vs. H₁: β₁ ≠ 1.
18
Excel Regression Tool Excel menu > Tools > Data Analysis > Regression Input variable ranges Check appropriate boxes Select output options
19
Regression Output
Correlation coefficient, S_YX, b₀, b₁
p-value for significance of regression
t-test for slope
Confidence interval for slope
20
Residuals
Standard (standardized) residuals are residuals divided by their standard error, expressed in units independent of the units of the data.
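A simplified sketch of this scaling, dividing each residual by the standard error of the estimate (a full studentized residual would also adjust for each point's leverage); toy data and coefficients as in the earlier examples:

```python
import math

def standard_residuals(x, y, b0, b1):
    """Residuals scaled by the standard error of the estimate (unit-free)."""
    n = len(y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s_yx = math.sqrt(sum(e * e for e in resid) / (n - 2))  # S_YX
    return [e / s_yx for e in resid]

sr = standard_residuals([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 2.2, 0.6)
```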
21
Assumptions Underlying Regression
Linearity: check with a scatter diagram of the data or with the residual plot.
Normally distributed errors for each X, with mean 0 and constant variance: examine a histogram of the standardized residuals or use goodness-of-fit tests.
Homoscedasticity (constant variance about the regression line for all values of the independent variable): examine by plotting residuals and looking for differences in variance at different values of X.
No autocorrelation: residuals should be independent for each value of the independent variable. Especially important if the independent variable is time (forecasting models).
22
Residual Plot
23
Histogram of Residuals
24
Evaluating Homoscedasticity
[Two residual plots: one with roughly constant spread (OK), one heteroscedastic.]
25
Autocorrelation
The Durbin–Watson statistic D tests for autocorrelation in the residuals:
D < 1 suggests positive autocorrelation
1.5 < D < 2.5 suggests no autocorrelation
D > 2.5 suggests negative autocorrelation
The PHStat tool calculates this statistic.
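The Durbin–Watson statistic is straightforward to compute by hand; the residuals below are from the illustrative toy fit used in the earlier examples (not actual time series data):

```python
def durbin_watson(residuals):
    """D = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2; values near 2 suggest no autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Residuals from the toy least squares fit used earlier
d = durbin_watson([-0.8, 0.6, 1.0, -0.6, -0.2])
```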
26
Regression and Investment Risk Systematic risk – variation in stock price explained by the market Measured by beta Beta = 1: perfect match to market movements Beta < 1: stock is less volatile than market Beta > 1: stock is more volatile than market
27
Systematic Risk (Beta) Beta is the slope of the regression line
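Since beta is just the least squares slope of stock returns regressed on market returns, it can be sketched with the same formula as before. The return series below are hypothetical, constructed so the stock moves 1.5x the market:

```python
def beta(stock_returns, market_returns):
    """Slope of the regression of stock returns on market returns."""
    n = len(market_returns)
    m_bar = sum(market_returns) / n
    s_bar = sum(stock_returns) / n
    cov = sum((m - m_bar) * (s - s_bar)
              for m, s in zip(market_returns, stock_returns))
    var = sum((m - m_bar) ** 2 for m in market_returns)
    return cov / var

# Hypothetical returns: the stock moves 1.5x the market, so beta > 1 (more volatile)
market = [0.01, -0.02, 0.03, 0.005]
stock = [1.5 * r for r in market]
b = beta(stock, market)
```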