Slide 1: The Simple Linear Regression Model
- Simple linear regression model: $y = \beta_0 + \beta_1 x + \varepsilon$
- Simple linear regression equation: $E(y) = \beta_0 + \beta_1 x$
- Estimated simple linear regression equation: $\hat{y} = b_0 + b_1 x$
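Slide 2: The Least Squares Line (Best Prediction Line)
- Among all straight lines through the data points of the scatter diagram, the one that minimizes the sum of squared prediction errors is called the least squares line, and the method is called the least squares method.
- What is the sum of squared errors? Let $(x_1, y_1), \dots, (x_n, y_n)$ be $n$ data points, and let $\hat{y} = b_0 + b_1 x$ be the line used to predict Y from X. When $X = x_1$, the difference between the predicted value $\hat{y}_1 = b_0 + b_1 x_1$ and the observed $y_1$, namely $y_1 - \hat{y}_1$, is called the prediction error. The sum of squared errors is defined as $f(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$.
- The values of $b_0$ and $b_1$ that minimize $f$ are found with the calculus method for maxima and minima: set both partial derivatives of $f$ to zero.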
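Slide 3: The Least Squares Line
- Solve the simultaneous equations: $\partial f / \partial b_0 = -2 \sum (y_i - b_0 - b_1 x_i) = 0$ and $\partial f / \partial b_1 = -2 \sum x_i (y_i - b_0 - b_1 x_i) = 0$.
- This gives $b_1 = \dfrac{\sum x_i y_i - (\sum x_i)(\sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n}$ and $b_0 = \bar{y} - b_1 \bar{x}$.
- Hence the least squares line is $\hat{y} = b_0 + b_1 x$.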
Slide 4: Example: Reed Auto Sales
- Simple linear regression: Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 6 previous sales are shown below.
[Table: Number of TV Ads | Number of Cars Sold]
Slide 5: Example: Reed Auto Sales
- Slope for the estimated regression equation, with $n = 6$, $\sum x_i = 12$, $\sum y_i = 122$: $b_1 = \dfrac{\sum x_i y_i - (12)(122)/6}{\sum x_i^2 - (12)^2/6} = 5$
- y-intercept for the estimated regression equation: $b_0 = \bar{y} - b_1 \bar{x} = 20.333 - 5(2) = 10.333$
- Estimated regression equation: $\hat{y} = 10.333 + 5x$
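The slope and intercept formulas can be sketched in a few lines of Python. The six (ads, cars) pairs below are an assumption: the data table is not reproduced in this transcript, so the values were chosen only to agree with the sums quoted on the slide (sum of x = 12, sum of y = 122) and with the stated slope of 5. The function name `least_squares_line` is also hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) that minimize sum((y - b0 - b1*x)**2)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Computational form of the slope, then the intercept from the means.
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

# ASSUMED data: consistent with the slide's summary figures, not taken from it.
ads = [1, 3, 2, 1, 3, 2]
cars = [14, 24, 18, 17, 27, 22]

b0, b1 = least_squares_line(ads, cars)
print(f"y_hat = {b0:.3f} + {b1:.3f} x")
```

Dividing by n = 6 in both correction terms is what makes the slope come out to 5 for data with these sums.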
Slide 6: Example: Reed Auto Sales
- Scatter diagram
[Figure: scatter diagram of Number of Cars Sold against Number of TV Ads]
Slide 7: The Coefficient of Determination
- Relationship among SST, SSR, SSE: SST = SSR + SSE, that is, $\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$
- Coefficient of determination: $r^2 = \mathrm{SSR}/\mathrm{SST}$
where:
  SST = total sum of squares
  SSR = sum of squares due to regression
  SSE = sum of squares due to error
Slide 8: The Coefficient of Determination
- Definition: $r^2 = \mathrm{SSR}/\mathrm{SST}$
- It expresses the proportion of the variation in Y that is explained by X.
  The larger $r^2$ is, the more closely the least squares line fits the data.
  $1 - r^2$ is the remaining proportion of the total variation (SST) that cannot be explained by X.
- Example: Reed Auto Sales: $r^2 = \mathrm{SSR}/\mathrm{SST} = 100/117.333 = 0.852$
  That is, 85.2% of the variation in the number of cars sold can be explained by the number of TV ads, and 14.8% cannot.
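As a rough check of the decomposition SST = SSR + SSE, the three sums of squares can be computed directly. The data rows are again an assumption (the original table is not in this transcript), picked to match the quoted SSR = 100 and the 85.2% figure. The helper name `sums_of_squares` is hypothetical.

```python
def sums_of_squares(xs, ys, b0, b1):
    """Return (SST, SSR, SSE) for the fitted line y_hat = b0 + b1*x."""
    y_bar = sum(ys) / len(ys)
    fitted = [b0 + b1 * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)               # total variation
    ssr = sum((f - y_bar) ** 2 for f in fitted)           # explained by the line
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # left unexplained
    return sst, ssr, sse

# ASSUMED data, consistent with the slide's summary figures.
ads = [1, 3, 2, 1, 3, 2]
cars = [14, 24, 18, 17, 27, 22]
b1 = 5.0
b0 = sum(cars) / len(cars) - b1 * sum(ads) / len(ads)

sst, ssr, sse = sums_of_squares(ads, cars, b0, b1)
r_squared = ssr / sst
print(f"SST={sst:.3f} SSR={ssr:.3f} SSE={sse:.3f} r^2={r_squared:.3f}")
```

The identity SST = SSR + SSE holds here because the line is the least squares fit; for an arbitrary line the cross term does not vanish.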
Slide 9: The Correlation Coefficient
- Sample correlation coefficient: $r_{xy} = (\text{sign of } b_1)\sqrt{\text{coefficient of determination}} = (\text{sign of } b_1)\sqrt{r^2}$
where:
  $b_1$ = the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$
Slide 10: Example: Reed Auto Sales
- Sample correlation coefficient: the sign of $b_1$ in the equation $\hat{y} = 10.333 + 5x$ is "+", so
  $r_{xy} = +\sqrt{0.8523} = +0.9232$
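A minimal sketch of this step, using only summary figures that appear on the slides (SSR = 100, MSE = 4.333, n = 6, slope 5). The function name `correlation_from_r2` is hypothetical.

```python
import math

def correlation_from_r2(r_squared, b1):
    """Sample correlation: sqrt(r^2) carrying the sign of the slope b1."""
    return math.copysign(math.sqrt(r_squared), b1)

n = 6
ssr = 100.0
sse = 4.333 * (n - 2)        # SSE recovered from MSE = SSE / (n - 2)
r_squared = ssr / (ssr + sse)

r_xy = correlation_from_r2(r_squared, b1=5.0)
print(f"r_xy = {r_xy:+.4f}")
```

The square root alone loses the direction of the relationship, which is why the sign has to be taken from the slope.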
Slide 11: Model Assumptions
- Assumptions about the error term $\varepsilon$
  1. The error $\varepsilon$ is a random variable with a mean of zero.
  2. The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable.
  3. The values of $\varepsilon$ are independent.
  4. The error $\varepsilon$ is a normally distributed random variable.
Slide 12: Testing for Significance
- To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero.
- Two tests are commonly used:
  - t test
  - F test
- Both tests require an estimate of $\sigma^2$, the variance of $\varepsilon$ in the regression model.
Slide 13: Testing for Significance
- An estimate of $\sigma^2$: the mean square error (MSE) provides the estimate of $\sigma^2$, and the notation $s^2$ is also used.
  $s^2 = \mathrm{MSE} = \mathrm{SSE}/(n - 2)$
where:
  $\mathrm{SSE} = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$
Slide 14: Testing for Significance
- An estimate of $\sigma$: to estimate $\sigma$ we take the square root of $s^2$, the estimate of $\sigma^2$.
  $s = \sqrt{\mathrm{MSE}} = \sqrt{\mathrm{SSE}/(n - 2)}$
- The resulting $s$ is called the standard error of the estimate.
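The two estimates can be sketched from the residuals of a fitted line. As before, the data rows are an assumption chosen to reproduce the MSE of 4.333 quoted later in the deck, and `standard_error_of_estimate` is a hypothetical name.

```python
import math

def standard_error_of_estimate(xs, ys, b0, b1):
    """Return (MSE, s) where MSE = SSE/(n-2) and s = sqrt(MSE)."""
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / (len(xs) - 2)   # two degrees of freedom used by b0 and b1
    return mse, math.sqrt(mse)

# ASSUMED data, consistent with the slide's summary figures.
ads = [1, 3, 2, 1, 3, 2]
cars = [14, 24, 18, 17, 27, 22]
b1 = 5.0
b0 = sum(cars) / len(cars) - b1 * sum(ads) / len(ads)

mse, s = standard_error_of_estimate(ads, cars, b0, b1)
print(f"MSE = {mse:.3f}, s = {s:.4f}")
```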
Slide 15: Testing for Significance: t Test
- Hypotheses: $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$
- Test statistic: $t = b_1 / s_{b_1}$, where $s_{b_1} = s / \sqrt{\sum (x_i - \bar{x})^2}$
- Rejection rule: reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$,
  where $t_{\alpha/2}$ is based on a t distribution with $n - 2$ degrees of freedom.
Slide 16: Example: Reed Auto Sales
- t test
  - Hypotheses: $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$
  - Rejection rule: for $\alpha = .05$ and d.f. = 4, $t_{.025} = 2.776$. Reject $H_0$ if $t > 2.776$ (or $t < -2.776$).
  - Test statistic: $t = 5/1.0408 = 4.804$
  - Conclusion: reject $H_0$.
  - p-value: $2P\{T > 4.804\} \approx 0.0086 < 0.05$, so reject $H_0$.
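The test can be sketched with the standard library alone. The critical value 2.776 is taken from a t table rather than computed. For the p-value, this sketch uses the closed-form CDF of the t distribution that holds for exactly 4 degrees of freedom; that shortcut is a choice made for this example and does not generalize to other sample sizes. The name `t_cdf_4df` is hypothetical.

```python
import math

def t_cdf_4df(t):
    """CDF of Student's t distribution with exactly 4 degrees of freedom."""
    u = 1 + t * t / 4
    return 0.5 + 0.375 * (t / math.sqrt(u)) * (1 - (t * t / u) / 12)

b1 = 5.0          # estimated slope
s_b1 = 1.0408     # standard error of the slope
t_crit = 2.776    # t_.025 with 4 d.f., from a t table

t_stat = b1 / s_b1
p_value = 2 * (1 - t_cdf_4df(abs(t_stat)))   # two-sided
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

For general degrees of freedom a statistics library would be the usual route; the closed form is used here only to keep the example dependency-free.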
Slide 17: Confidence Interval for $\beta_1$
- We can use a 95% confidence interval for $\beta_1$ to test the hypotheses just used in the t test.
- $H_0$ is rejected if the hypothesized value of $\beta_1$ is not included in the confidence interval for $\beta_1$.
Slide 18: Confidence Interval for $\beta_1$
- The form of a confidence interval for $\beta_1$ is: $b_1 \pm t_{\alpha/2}\, s_{b_1}$
where:
  $b_1$ is the point estimate
  $t_{\alpha/2}\, s_{b_1}$ is the margin of error
  $t_{\alpha/2}$ is the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with $n - 2$ degrees of freedom
Slide 19: Example: Reed Auto Sales
- Rejection rule: reject $H_0$ if 0 is not included in the confidence interval for $\beta_1$.
- 95% confidence interval for $\beta_1$: $b_1 \pm t_{\alpha/2}\, s_{b_1} = 5 \pm 2.776(1.0408) = 5 \pm 2.89$, or 2.11 to 7.89
- Conclusion: reject $H_0$, because 0 lies outside the interval.
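The interval arithmetic is short enough to verify directly. This sketch uses the figures quoted on the slide (slope 5, standard error 1.0408) and the table value 2.776 for 4 degrees of freedom; the function name `slope_confidence_interval` is hypothetical.

```python
def slope_confidence_interval(b1, s_b1, t_crit):
    """Return (low, high) for b1 +/- t_crit * s_b1."""
    margin = t_crit * s_b1
    return b1 - margin, b1 + margin

low, high = slope_confidence_interval(b1=5.0, s_b1=1.0408, t_crit=2.776)
zero_excluded = not (low <= 0.0 <= high)   # equivalent to rejecting H0: beta1 = 0

print(f"95% CI for beta1: {low:.2f} to {high:.2f}; reject H0: {zero_excluded}")
```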
Slide 20: Testing for Significance: F Test
- Hypotheses: $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$
- Test statistic: $F = \mathrm{MSR}/\mathrm{MSE}$
- Rejection rule: reject $H_0$ if $F > F_{\alpha}$,
  where $F_{\alpha}$ is based on an F distribution with 1 d.f. in the numerator and $n - 2$ d.f. in the denominator.
Slide 21: Example: Reed Auto Sales
- F test
  - Hypotheses: $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$
  - Rejection rule: for $\alpha = .05$ and d.f. = 1, 4: $F_{.05} = 7.71$. Reject $H_0$ if $F > 7.71$.
  - Test statistic: $F = \mathrm{MSR}/\mathrm{MSE} = 100/4.333 = 23.08$
  - Conclusion: we can reject $H_0$.
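A short sketch of the F statistic from the quoted figures. With one independent variable, MSR equals SSR because the regression has a single degree of freedom, and F equals the square of the t statistic, so the two tests agree. The critical value 7.71 is taken from an F table, not computed.

```python
ssr = 100.0      # sum of squares due to regression
mse = 4.333      # mean square error, SSE / (n - 2)
f_crit = 7.71    # F_.05 with (1, 4) d.f., from an F table

msr = ssr / 1    # one regression degree of freedom in simple linear regression
f_stat = msr / mse
reject_h0 = f_stat > f_crit

# Cross-check against the t test: F should match t squared up to rounding.
t_stat = 5.0 / 1.0408
print(f"F = {f_stat:.2f}, t^2 = {t_stat ** 2:.2f}, reject H0: {reject_h0}")
```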
Slide 22: Some Cautions about the Interpretation of Significance Tests
- Rejecting $H_0: \beta_1 = 0$ and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.
- Just because we are able to reject $H_0: \beta_1 = 0$ and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.
Slide 23: Using the Estimated Regression Equation for Estimation and Prediction
- Confidence interval estimate of $E(y_p)$: $\hat{y}_p \pm t_{\alpha/2}\, s_{\hat{y}_p}$
- Prediction interval estimate of $y_p$: $\hat{y}_p \pm t_{\alpha/2}\, s_{\mathrm{ind}}$
where:
  the confidence coefficient is $1 - \alpha$ and $t_{\alpha/2}$ is based on a t distribution with $n - 2$ d.f.
  $s_{\hat{y}_p}$ is the standard error of the estimate of $E(y_p)$
  $s_{\mathrm{ind}}$ is the standard error of the individual estimate of $y_p$
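Slide 24: Standard Errors of Estimate of $E(y_p)$ and $y_p$
- $s_{\hat{y}_p} = s \sqrt{\dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
- $s_{\mathrm{ind}} = s \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$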
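Slide 25: Variances of the Estimators of $E(y_p)$ and $y_p$
- Variance of $\hat{y}_p$ as an estimator of $E(y_p)$: $\mathrm{Var}(\hat{y}_p) = \sigma^2 \left[ \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right]$
- Variance of the prediction error for an individual $y_p$: $\mathrm{Var}(y_p - \hat{y}_p) = \sigma^2 \left[ 1 + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right]$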
Slide 26: Example: Reed Auto Sales
- Point estimation: if 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be $\hat{y} = 10.333 + 5(3) = 25.333$ cars.
- Confidence interval for $E(y_p)$: the 95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is $25.333 \pm 2.776(1.3437) = 25.333 \pm 3.730$, or 21.60 to 29.06 cars.
- Prediction interval for $y_p$: the 95% prediction interval estimate of the number of cars sold in one particular week when 3 TV ads are run is $25.333 \pm 2.776(2.4777) = 25.333 \pm 6.878$, or 18.46 to 32.21 cars.
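Both intervals can be sketched from summary statistics. Two inputs here are assumptions rather than figures printed on the slides: the sum of squared deviations of x is taken as 4.0 (the value implied by MSE = 4.333 together with a slope standard error of 1.0408), and the mean of x is taken as 2 (from 12 ads over 6 sales). The function name `interval_estimates` is hypothetical.

```python
import math

def interval_estimates(x_p, b0, b1, s, n, x_bar, sxx, t_crit):
    """Return (y_hat, confidence interval for E(y_p), prediction interval for y_p)."""
    y_hat = b0 + b1 * x_p
    leverage = 1 / n + (x_p - x_bar) ** 2 / sxx
    s_mean = s * math.sqrt(leverage)       # standard error of the estimate of E(y_p)
    s_ind = s * math.sqrt(1 + leverage)    # standard error for an individual y_p
    ci = (y_hat - t_crit * s_mean, y_hat + t_crit * s_mean)
    pi = (y_hat - t_crit * s_ind, y_hat + t_crit * s_ind)
    return y_hat, ci, pi

n, x_bar = 6, 12 / 6
b1 = 5.0
b0 = 122 / 6 - b1 * x_bar
y_hat, ci, pi = interval_estimates(
    x_p=3, b0=b0, b1=b1, s=math.sqrt(4.333), n=n,
    x_bar=x_bar, sxx=4.0,   # ASSUMED: implied by MSE and the slope's standard error
    t_crit=2.776,           # t_.025 with 4 d.f., from a t table
)
print(f"y_hat = {y_hat:.3f}")
print(f"95% CI for E(y_p): {ci[0]:.2f} to {ci[1]:.2f}")
print(f"95% PI for y_p:    {pi[0]:.2f} to {pi[1]:.2f}")
```

The prediction interval is wider because it adds the variance of a single week's sales around the mean to the uncertainty in the estimated mean itself.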