Published by Delilah Joseph. Modified over 8 years ago.
Simple Linear Regression and Correlation (continued)
Reference: Chapter 17 of Statistics for Management and Economics, 7th Edition, Gerald Keller.
Using the regression equation (17.6)
Why regression?
1. Analyze specific relations between Y and X: how is Y related to X?
2. Forecast or predict the variable Y with the help of X.
In this case, the relationship is assumed to be linear.
Two kinds of prediction
Point predictions: do not provide any information about how closely the predicted value will match the true value.
Interval predictions:
– Prediction intervals: for predicting y for a given value of x.
– Confidence intervals: for the average of y for a given x.
Example: fast food company
Make a prediction for one restaurant's sales if it has an advertising budget of $750,000.
The prediction interval
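The formula on this slide was an image and is not preserved in the text. As a reminder, the usual prediction interval for simple linear regression can be written as below. The notation is an assumption based on common textbook usage, not recovered from the slide: $x_g$ is the given value of x, $s_\varepsilon$ is the standard error of estimate, and $s_x^2$ is the sample variance of x.

```latex
\hat{y} \;\pm\; t_{\alpha/2,\,n-2}\; s_\varepsilon
\sqrt{\,1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1)\,s_x^2}\,}
```

The leading 1 under the square root accounts for the variability of a single observation around its mean, which is why this interval is wider than the one for the expected value.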
Example: fast food company
Make a 95% prediction interval for one restaurant's sales if it has an advertising budget of $750,000.
Confidence interval for the expected value
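The formula on this slide was also an image. In the usual textbook form, the confidence interval for the expected value of y at a given $x_g$ looks as follows. The symbols are assumptions: $s_\varepsilon$ is the standard error of estimate and $s_x^2$ is the sample variance of x.

```latex
\hat{y} \;\pm\; t_{\alpha/2,\,n-2}\; s_\varepsilon
\sqrt{\,\frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1)\,s_x^2}\,}
```

Both terms under the square root shrink as n grows, so the interval for the mean narrows with more data, and it is narrowest when $x_g$ is close to $\bar{x}$.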
Example: fast food company
Make a 95% confidence interval for the mean sales of restaurants with an advertising budget of $750,000.
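The fast food data set is not reproduced here, so the following is only a minimal sketch of how the point prediction and the two intervals could be computed. The function name `interval_estimates`, the data values, and the evaluation point are all made up for illustration. The critical value `t_crit` is passed in by the caller (here 2.776, the table value for 4 degrees of freedom at the 95% level).

```python
import math

def interval_estimates(x, y, x_g, t_crit):
    """Return (y_hat, prediction_interval, confidence_interval) at x_g.

    t_crit is the critical value t_{alpha/2, n-2}, looked up by the caller.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)  # equals (n - 1) * s_x^2
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Least squares slope and intercept.
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # Standard error of estimate.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_eps = math.sqrt(sse / (n - 2))

    y_hat = b0 + b1 * x_g
    core = 1 / n + (x_g - x_bar) ** 2 / sxx
    half_pred = t_crit * s_eps * math.sqrt(1 + core)  # single value of y
    half_conf = t_crit * s_eps * math.sqrt(core)      # mean of y
    return (y_hat,
            (y_hat - half_pred, y_hat + half_pred),
            (y_hat - half_conf, y_hat + half_conf))

# Made-up data, NOT the fast food data set.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
y_hat, pred, conf = interval_estimates(x, y, x_g=4.5, t_crit=2.776)
print(y_hat, pred, conf)
```

Both intervals are centered on the same point prediction; the prediction interval is always the wider of the two.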
17.7 Regression Diagnostics - I
The three conditions required for the validity of the regression analysis are:
– The error variable is normally distributed.
– The error variance is constant for all values of x.
– The errors are independent of each other.
How can we diagnose violations of these conditions?
Residual Analysis
Most departures from the required conditions can be diagnosed by residual analysis.
For the food company example:
– First observation: ADVER = 276, SALES = 115.0, predicted SALES = 118.0087, so residual = 115.0 − 118.0087 ≈ −3.0087.
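As a small sketch of the calculation above: a residual is the observed value minus the value predicted by the regression equation. The helper name `residuals` is hypothetical; the only numbers used are the ones quoted on the slide.

```python
def residuals(actual, predicted):
    """Observed minus predicted, one residual per observation."""
    return [a - p for a, p in zip(actual, predicted)]

# First observation from the slide: SALES = 115.0, predicted SALES = 118.0087.
first = residuals([115.0], [118.0087])[0]
print(round(first, 4))
```

A negative residual means the model over-predicted sales for that observation.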
Non-normality
Non-normality of the residuals can be checked by drawing a histogram of the residuals.
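One rough way to draw such a histogram without plotting software is to bin the residuals and print a bar of asterisks per bin. This is only a sketch: the function name `text_histogram`, the number of bins, and the residual values are made up for illustration.

```python
def text_histogram(values, bins=5):
    """Return a list of (bin_start, bin_end, count) for equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width if all values are equal
    counts = [0] * bins
    for v in values:
        # The largest value belongs in the last bin, not one past it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c)
            for i, c in enumerate(counts)]

# Made-up residuals, for illustration only.
resid = [-3.0, -1.2, -0.4, 0.1, 0.3, 0.8, 1.1, 2.3]
hist = text_histogram(resid, bins=5)
for start, end, count in hist:
    print(f"{start:6.2f} to {end:6.2f} | {'*' * count}")
```

A roughly bell-shaped, symmetric picture is consistent with the normality condition; strong skew or several peaks suggests a problem.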
Heteroscedasticity: the variance of the errors is not constant (a violation of the requirement).
Homoscedasticity: the variance of the errors is constant (no violation of the requirement).
Check: plot the residuals against the values of Y predicted by the model.
Non-independence of the error variable
Outliers
An outlier is an observation that is unusually small or large. Several possibilities need to be investigated when an outlier is observed:
– There was an error in measuring or recording the value.
– The point does not belong in the sample.
– The observation is valid.
Identify outliers from the scatter diagram. It is customary to suspect that an observation is an outlier if |standardized residual| > 2.
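The rule of thumb above can be sketched in a few lines. This version assumes the simplest definition, residual divided by the standard error of estimate $s_\varepsilon = \sqrt{SSE/(n-2)}$; statistical software may instead adjust each residual for leverage, so its values can differ slightly. The function name and the residual values are made up for illustration.

```python
import math

def flag_outliers(resid, threshold=2.0):
    """Return indices whose |residual / s_eps| exceeds the threshold.

    Assumes a simple linear regression, so s_eps uses n - 2 degrees of freedom.
    """
    n = len(resid)
    s_eps = math.sqrt(sum(r ** 2 for r in resid) / (n - 2))
    standardized = [r / s_eps for r in resid]
    return [i for i, z in enumerate(standardized) if abs(z) > threshold]

# Made-up residuals, for illustration only.
resid = [0.5, -0.3, 0.2, -0.4, 0.1, -0.2, 0.3, -0.6, -2.6, 3.0]
suspects = flag_outliers(resid)
print(suspects)
```

A flagged observation is only a suspect: it still needs to be checked against the possibilities listed for outliers before it is corrected or dropped.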
Influential observations
Testing the coefficient of correlation
The coefficient of correlation measures the strength of the linear association between two variables. Its values range between −1 and +1. If r = −1 (negative association) or r = +1 (positive association), every point falls on the regression line. If r = 0, there is no linear pattern. The coefficient can be used to test for a linear relationship between two variables.
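As a sketch of how the sample coefficient is obtained, the function below computes r as the sample covariance divided by the product of the two sample standard deviations. The function name and the data are made up for illustration.

```python
import math

def sample_correlation(x, y):
    """Sample coefficient of correlation r = cov(x, y) / (s_x * s_y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    return cov / (s_x * s_y)

# Made-up data: a perfect increasing line and a perfect decreasing line.
r_up = sample_correlation([1, 2, 3, 4], [2, 4, 6, 8])
r_down = sample_correlation([1, 2, 3, 4], [8, 6, 4, 2])
print(r_up, r_down)
```

The two made-up series lie exactly on a line, so they illustrate the extreme cases r = +1 and r = −1 described above.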
To test the coefficient of correlation for a linear relationship between X and Y:
– X and Y must be observational.
– X and Y must be bivariate normally distributed.
When no linear relationship exists between the two variables, $\rho = 0$.
The hypotheses are:
$H_0: \rho = 0$
$H_1: \rho \neq 0$
The test statistic is:
$t = r\sqrt{\dfrac{n-2}{1-r^2}}$
The statistic is Student t distributed with d.f. = n − 2, provided the variables are bivariate normally distributed.
Food Company Example
Sample correlation coefficient between ADVER and SALES
Test statistic
Testing the coefficient of correlation
Foreign Index Funds (Index)
– A certain investor prefers to invest in index mutual funds, which are constructed by buying a wide assortment of stocks.
– The investor decides to avoid investing in a Japanese index fund if it is strongly correlated with an American index fund that he owns.
– From the data in Index, should he avoid investing in the Japanese index fund?
Testing the Coefficient of Correlation
Solution
– Problem objective: analyze the relationship between two interval variables.
– The two variables are observational (the return for each fund was not controlled).
– We are interested in whether there is a linear relationship between the two variables; thus, we need to test the coefficient of correlation.
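The sample coefficient of correlation: $r = \mathrm{cov}(x,y)/(s_x s_y) = .491$ (with $\mathrm{cov}(x,y) = .001279$, $s_x = .0509$, $s_y = .0512$).
The value of the t statistic is $t = r\sqrt{\dfrac{n-2}{1-r^2}} = .491\sqrt{\dfrac{59-2}{1-.491^2}} \approx 4.26$.
The rejection region: $|t| > t_{\alpha/2,\,n-2} = t_{.025,\,59-2} \approx 2.000$.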
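The arithmetic on this slide can be checked with a short script. It uses only the summary figures quoted above and takes the critical value 2.000 from the slide rather than computing it. The variable names are illustrative. Depending on whether r is rounded to .491 before the t statistic is computed, t comes out at roughly 4.25 to 4.26; either way it is well inside the rejection region.

```python
import math

# Summary statistics quoted on the slide.
cov_xy = 0.001279
s_x = 0.0509
s_y = 0.0512
n = 59

r = cov_xy / (s_x * s_y)                       # sample coefficient of correlation
t = r * math.sqrt((n - 2) / (1 - r ** 2))      # t statistic with n - 2 d.f.

t_crit = 2.000                                 # critical value as given on the slide
reject_h0 = abs(t) > t_crit

print(round(r, 3), round(t, 2), reject_h0)
```

Because |t| exceeds the critical value, the null hypothesis of no linear relationship is rejected at the 5% level.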
Conclusion: There is sufficient evidence at $\alpha = 5\%$ to infer that there is a linear relationship between the two variables.
Procedure for Regression Diagnostics
1. Develop a model that has a theoretical basis.
2. Gather data for the two variables in the model.
3. Draw the scatter diagram to determine whether a linear model appears to be appropriate.
4. Determine the regression equation.
5. Check the required conditions for the errors.
6. Check for outliers and influential observations.
7. Assess the model fit.
8. If the model fits the data, use the regression equation.