1
BCOR 1020 Business Statistics Lecture 24 – April 17, 2008
2
Overview: Chapter 12 – Linear Regression: Visual Displays and Correlation Analysis; Bivariate Regression; Regression Terminology
3
Chapter 12 – Visual Displays: Begin the analysis of bivariate data (i.e., two variables) with a scatter plot. A scatter plot displays each observed data pair (x_i, y_i) as a dot on an X/Y grid and indicates visually the strength of the relationship between the two variables.
4
Chapter 12 – Visual Displays: The price of Regular Unleaded appears to have a positively sloped linear relationship with the price of Diesel. These variables appear to be correlated.
5
Chapter 12 – Correlation Analysis: The sample correlation coefficient (r) measures the degree of linearity in the relationship between X and Y. It always satisfies -1 ≤ r ≤ +1. Values near -1 indicate a strong negative relationship, values near +1 indicate a strong positive relationship, and r = 0 indicates no linear relationship. Correlation functions are available in Excel, MegaStat and on your calculators.
6
Chapter 12 – Correlation Analysis (Computing r): r = SS_xy / sqrt(SS_xx · SS_yy), where SS_xy = Σ(x_i - x̄)(y_i - ȳ), SS_xx = Σ(x_i - x̄)², and SS_yy = Σ(y_i - ȳ)². This value can be calculated on your calculator or using a software package like Excel or MegaStat.
7
Chapter 12 – Correlation Analysis, Example: Data set for problem 12.3 (“CallWait”). Y = hold time (minutes) for concert tickets, X = number of operators.
Operators (X)   Wait Time (Y)
4               385
5               335
6               383
7               344
8               288
There appears to be “some” negative correlation between the variables. Does this make sense? We can calculate the sample correlation coefficient: r = -0.733 (overhead).
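As a check on the overhead calculation, here is a minimal sketch that computes r for the CallWait data from the sums of squared deviations. The function and variable names are illustrative, not from the slides; Excel's CORREL function should give the same value.

```python
import math

# CallWait data from the slide: number of operators (X), wait time in minutes (Y).
operators = [4, 5, 6, 7, 8]
wait_time = [385, 335, 383, 344, 288]

def sample_correlation(x, y):
    """Sample correlation r = SS_xy / sqrt(SS_xx * SS_yy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

r = sample_correlation(operators, wait_time)
print(round(r, 3))  # -0.733
```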
8
Excel/MegaStat Demo…
9
Chapter 12 – Correlation Analysis: [Scatter plot panels: Strong Positive Correlation; Weak Positive Correlation; Weak Negative Correlation; Strong Negative Correlation]
10
Chapter 12 – Correlation Analysis: [Scatter plot panels: No Correlation; Nonlinear Relation]
11
Chapter 12 – Correlation Analysis, Tests for Significance: r is an estimate of the population correlation coefficient ρ (rho). To test the hypothesis H0: ρ = 0, the test statistic is t = r · sqrt((n - 2) / (1 - r²)). The critical value t_α is obtained from Appendix D using ν = n - 2 degrees of freedom for any α. We can bound the p-value for this test using the t table, or we can find it exactly using Excel or MegaStat.
12
Chapter 12 – Correlation Analysis, Tests for Significance: Equivalently, you can calculate the critical value for the correlation coefficient using r_α = t_α / sqrt(t_α² + n - 2). This method gives a benchmark for the correlation coefficient. However, it yields no p-value and is inflexible if you change your mind about α.
13
Chapter 12 – Correlation Analysis, Steps in Testing if ρ = 0:
Step 1: State the Hypotheses. Determine whether you are using a one- or two-tailed test and the level of significance (α). For a two-tailed test, H0: ρ = 0 vs. H1: ρ ≠ 0.
Step 2: Calculate the Critical Value. For degrees of freedom ν = n - 2, look up the critical value t_α in Appendix D, then calculate r_α = t_α / sqrt(t_α² + n - 2).
Step 3: Make the Decision. For a two-tailed test, reject H0 if the absolute value of the sample correlation coefficient, |r|, exceeds the critical value r_α. If using the t statistic method, reject H0 if |t| > t_α or if the p-value < α.
14
Chapter 12 – Correlation Analysis, Example: In our earlier example on the data set “CallWait”, we calculated the sample correlation, r = -0.733, based on n = 5 data points. Calculate the critical value, r_α, to test the hypothesis H0: ρ = 0 vs. H1: ρ ≠ 0 at the 10% level of significance. With ν = 3 degrees of freedom, t_.05 = 2.353, so r_α = 2.353 / sqrt(2.353² + 3) ≈ 0.805. Since |r| = 0.733 is not greater than r_α, we cannot reject H0. There is not a significant correlation between these variables at the 10% level of significance.
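The critical-value comparison in this example can be sketched in a few lines. The table value 2.353 is the standard two-tailed 10% critical value of t with 3 degrees of freedom; the function name is illustrative rather than anything from the slides.

```python
import math

def critical_r(t_alpha, n):
    """Critical correlation r_alpha = t_alpha / sqrt(t_alpha^2 + n - 2)."""
    return t_alpha / math.sqrt(t_alpha ** 2 + n - 2)

n = 5            # CallWait sample size
r = -0.733       # sample correlation from the slides
t_alpha = 2.353  # t table value: 3 d.f., 10% two-tailed (0.05 in each tail)

r_alpha = critical_r(t_alpha, n)
reject_h0 = abs(r) > r_alpha

print(round(r_alpha, 3))  # 0.805
print(reject_h0)          # False: cannot reject H0 at the 10% level
```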
15
Clickers: For our example on the data set “CallWait”, we calculated the sample correlation, r = -0.733, based on n = 5 data points. Instead of calculating the critical value, r_α, to test the hypothesis H0: ρ = 0 vs. H1: ρ ≠ 0, we could have calculated the test statistic t = r · sqrt((n - 2) / (1 - r²)), which follows a t distribution with ν = n - 2 d.f. under H0. What are the bounds for the p-value on this test statistic? (A) 0.10 < p-value < 0.20 (B) 0.025 < p-value < 0.05 (C) 0.05 < p-value < 0.10
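To check a clicker answer after committing to it, the following sketch computes the test statistic and brackets the two-tailed p-value the way one would with a printed t table. The three critical values are standard table entries for 3 degrees of freedom; the helper names are illustrative.

```python
import math

def t_statistic(r, n):
    """Test statistic for H0: rho = 0, t = r * sqrt((n - 2) / (1 - r^2))."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Two-tailed t table row for 3 d.f.: (two-tailed alpha, critical value).
TWO_TAIL_TABLE_DF3 = [(0.20, 1.638), (0.10, 2.353), (0.05, 3.182)]

def bound_p_value(t, table):
    """Return (lower, upper) bounds on the two-tailed p-value from table rows."""
    lower, upper = 0.0, 1.0
    for alpha, crit in table:      # rows ordered from largest alpha to smallest
        if abs(t) > crit:
            upper = alpha          # |t| beyond this critical value: p < alpha
        else:
            lower = alpha          # |t| short of this critical value: p > alpha
            break
    return lower, upper

t = t_statistic(-0.733, 5)
bounds = bound_p_value(t, TWO_TAIL_TABLE_DF3)
print(round(t, 2))  # -1.87
print(bounds)       # (0.1, 0.2)
```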
16
Chapter 12 – Correlation Analysis, Role of Sample Size: As sample size increases, the critical value of r becomes smaller. This makes it easier for smaller values of the sample correlation coefficient to be considered significant. A larger sample does not mean that the correlation is stronger, nor does its significance imply importance.
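A short sketch makes the sample-size effect concrete. It applies the critical-value formula r_α = t_α / sqrt(t_α² + n - 2) at a two-tailed 5% level, using t values copied from a standard t table; the particular sample sizes are chosen only for illustration.

```python
import math

# Two-tailed 5% critical values of t, keyed by degrees of freedom (df = n - 2).
T_TABLE_025 = {3: 3.182, 8: 2.306, 18: 2.101, 28: 2.048, 60: 2.000}

def critical_r(t_alpha, n):
    """Critical correlation r_alpha = t_alpha / sqrt(t_alpha^2 + n - 2)."""
    return t_alpha / math.sqrt(t_alpha ** 2 + n - 2)

critical_values = []
for df, t_alpha in sorted(T_TABLE_025.items()):
    n = df + 2
    r_alpha = critical_r(t_alpha, n)
    critical_values.append((n, r_alpha))
    print(f"n = {n:3d}  critical r = {r_alpha:.3f}")
```

With n = 5 a correlation must exceed roughly 0.88 in absolute value to be significant, while with n = 62 a correlation of about 0.25 is enough.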
17
Chapter 12 – Bivariate Regression, What is Bivariate Regression? Bivariate regression analyzes the relationship between two variables. It specifies one dependent (response) variable and one independent (predictor) variable. The hypothesized relationship may be linear, quadratic, or of some other form.
18
Chapter 12 –Bivariate Regression Some Model Forms:
19
Chapter 12 – Bivariate Regression, Prediction Using Regression: The intercept and slope of a fitted regression can provide useful information. For example, consider the fitted regression model Sales(Y) = 268 + 7.37 Ads(X), with both variables in millions of dollars.
- Each extra $1 million of advertising will generate $7.37 million of sales on average.
- The firm would average $268 million of sales with zero advertising.
- However, the intercept may not be meaningful because Ads = 0 may be outside the range of the observed data.
20
Chapter 12 – Bivariate Regression, Prediction Using Regression: One of the main uses of regression is to make predictions. Once you have a fitted regression equation that shows the estimated relationship between X and Y, you can plug in any value of X to make a prediction for Y. Consider our example, Sales(Y) = 268 + 7.37 Ads(X). If the firm spends $10 million on advertising, its expected sales would be Sales(Y) = 268 + 7.37(10) = $341.7 million.
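The plug-in prediction can be written as a one-line function. This is a minimal sketch; the function name and default arguments are illustrative, and the coefficients are the ones from the fitted model on the slide (both variables in millions of dollars).

```python
def predict_sales(ads_millions, intercept=268.0, slope=7.37):
    """Expected sales ($ millions) for a given advertising spend ($ millions)."""
    return intercept + slope * ads_millions

print(round(predict_sales(10), 1))  # 341.7
print(round(predict_sales(0), 1))   # 268.0, the intercept
```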
21
Chapter 12 – Regression Terminology, Models and Parameters: The unknown parameters that we will estimate are β0 = intercept and β1 = slope. The assumed model for a linear relationship is y_i = β0 + β1 x_i + ε_i for all observations (i = 1, 2, …, n). The error term ε_i is not observable, but is assumed normally distributed with mean 0 and standard deviation σ.
22
Chapter 12 – Regression Terminology, Models and Parameters: The fitted model used to predict the expected value of Y for a given value of X is ŷ_i = b0 + b1 x_i. The fitted coefficients are b0, the estimated intercept, and b1, the estimated slope. The residual is e_i = y_i - ŷ_i. Residuals may be used to estimate σ, the standard deviation of the errors. We will discuss how b0 and b1 are found next lecture.
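As an illustration of fitted values and residuals, the sketch below uses the CallWait data together with the coefficients b0 = 458 and b1 = -18.5 reported for that data set later in this lecture. The slides do not state how σ is estimated from the residuals; the formula s = sqrt(SSE / (n - 2)) used here is the usual standard error of the regression and should be read as an assumption.

```python
import math

# CallWait data: number of operators (X), wait time in minutes (Y).
operators = [4, 5, 6, 7, 8]
wait_time = [385, 335, 383, 344, 288]

b0, b1 = 458.0, -18.5  # fitted intercept and slope from the lecture

fitted = [b0 + b1 * x for x in operators]              # y-hat_i = b0 + b1 * x_i
residuals = [y - y_hat for y, y_hat in zip(wait_time, fitted)]  # e_i = y_i - y-hat_i

# Assumed estimator of sigma: s = sqrt(SSE / (n - 2)).
sse = sum(e ** 2 for e in residuals)
s = math.sqrt(sse / (len(operators) - 2))

print(residuals)    # [1.0, -30.5, 36.0, 15.5, -22.0]
print(round(s, 1))  # 31.4
```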
23
Chapter 12 – Regression Terminology, Fitting a Regression on a Scatter Plot in Excel:
- Highlight the data columns.
- Click on the Chart Wizard and choose Scatter Plot.
- In the completed graph, click once on the points in the scatter plot to select the data.
- Right-click and choose Add Trendline.
- Choose Options and check Display Equation.
24
Chapter 12 – Regression Terminology, Example: Data set for problem 12.3 (“CallWait”). Y = hold time (minutes) for concert tickets, X = number of operators.
Operators (X)   Wait Time (Y)
4               385
5               335
6               383
7               344
8               288
From this output, we have the linear model y = 458 - 18.5x, with b0 = 458 and b1 = -18.5. Discussion…
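The lecture defers the derivation of b0 and b1 to the next session. As a preview, and assuming the Excel linear trendline is an ordinary least squares fit, the coefficients can be reproduced with b1 = SS_xy / SS_xx and b0 = ȳ - b1·x̄. The function name below is illustrative.

```python
# CallWait data: number of operators (X), wait time in minutes (Y).
operators = [4, 5, 6, 7, 8]
wait_time = [385, 335, 383, 344, 288]

def least_squares_line(x, y):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = least_squares_line(operators, wait_time)
print(b0, b1)  # 458.0 -18.5
```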
25
Clickers: For our example on the data set “CallWait”, we have now calculated the regression model: Wait time (Y) = 458 - 18.5 Operators (X). If there are 7 operators, what is the expected wait time? (A) 458 (B) 129.5 (C) 328.5 (D) 587.5
26
Chapter 12 –Regression Terminology Regression Caveats: The “fit” of the regression does not depend on the sign of its slope. The sign of the fitted slope merely tells whether X has a positive or negative association with Y. View the intercept with skepticism unless X = 0 is logically possible and was actually observed in the data set. Be wary of extrapolating the model beyond the observed range in the data. Regression does not demonstrate cause-and-effect between X and Y. A good fit shows that X and Y vary together. Both could be affected by another variable or by the way the data are defined.
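The extrapolation caveat can be illustrated with the CallWait model, Wait time = 458 - 18.5 Operators, which was fitted on 4 to 8 operators. The sketch below returns a prediction together with a flag that says whether X lies inside the observed range; the function name and the flag are illustrative choices, not part of the lecture.

```python
OBSERVED_OPERATORS = [4, 5, 6, 7, 8]  # X values in the CallWait data

def predict_wait(n_operators, b0=458.0, b1=-18.5, observed=OBSERVED_OPERATORS):
    """Return (predicted wait in minutes, True if X is within the observed range)."""
    within_range = min(observed) <= n_operators <= max(observed)
    return b0 + b1 * n_operators, within_range

print(predict_wait(6))   # (347.0, True)
print(predict_wait(30))  # (-97.0, False): a negative wait time is nonsense
```

The second call shows why predictions far outside the observed range should not be trusted: the fitted line keeps falling and eventually predicts an impossible negative wait.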