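1-1 Regression Models
Population Deterministic Regression Model: $Y_i = \beta_0 + \beta_1 X_i$
– $Y_i$ depends only on the value of $X_i$; no other factor can affect $Y_i$.
Population Probabilistic Regression Model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $i = 1, 2, \ldots, n$
– $E(Y \mid X_i) = \beta_0 + \beta_1 X_i$. That is, $Y_{ij} = E(Y \mid X_i) + \varepsilon_{ij} = \beta_0 + \beta_1 X_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, N$.
– $\beta_0$ and $\beta_1$ are population parameters.
– $\beta_0$ and $\beta_1$ are estimated by the sample statistics $b_0$ and $b_1$.
Sample Model: $\hat{Y}_i = b_0 + b_1 X_i$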
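The difference between the deterministic and the probabilistic model can be seen by simulating data. The sketch below assumes normally distributed errors with standard deviation `sigma`; the function name and parameter values are illustrative only. Setting `sigma = 0` gives the deterministic model.

```python
import random

def simulate_y(x_values, beta0, beta1, sigma, seed=0):
    """Draw one Y per X from Y = beta0 + beta1 * X + eps, eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in x_values]

x = [1, 2, 3, 4, 5]
deterministic = simulate_y(x, beta0=2.0, beta1=0.5, sigma=0.0)  # points lie exactly on the line
probabilistic = simulate_y(x, beta0=2.0, beta1=0.5, sigma=1.0)  # points scatter around the line
```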
1-2 Assumptions Underlying Linear Regression (for Y)
– For each value of X, there is a group of Y values, and these Y values are normally distributed.
– The means of these normal distributions of Y values all lie on the straight line of regression.
– The error variances of these normal distributions are equal (homoscedasticity). When the error variances are not constant, the condition is called heteroscedasticity.
– The Y values are statistically independent: in the selection of a sample, the Y values chosen for a particular X value do not depend on the Y values for any other X value.
1-3 Equation of the Simple Regression Line
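A common way to write the fitted line, using the sample statistics $b_0$ and $b_1$. The symbols $\bar{X}$ and $\bar{Y}$ for the sample means are notation assumed here.

```latex
\[
\hat{Y} = b_0 + b_1 X, \qquad
b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad
b_0 = \bar{Y} - b_1 \bar{X}
\]
```

Here $b_1$ is the slope (change in $\hat{Y}$ per unit change in $X$) and $b_0$ is the intercept (value of $\hat{Y}$ at $X = 0$).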
1-4 Ordinary Least Squares (OLS) Analysis
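OLS chooses $b_0$ and $b_1$ to minimize the sum of squared residuals. A sketch of the criterion and its first-order conditions (the normal equations), in standard textbook notation assumed here:

```latex
\begin{align*}
\min_{b_0,\, b_1} \; SSE &= \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 \\
\frac{\partial SSE}{\partial b_0} &= -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0
  \;\Longrightarrow\; \sum Y_i = n b_0 + b_1 \sum X_i \\
\frac{\partial SSE}{\partial b_1} &= -2 \sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i) = 0
  \;\Longrightarrow\; \sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2
\end{align*}
```

Solving the two normal equations simultaneously gives the least squares slope and intercept.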
1-6 Least Squares Analysis
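A minimal sketch of the least squares computation in plain Python. The function name `ols_fit` and the data are made up for illustration.

```python
def ols_fit(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares about the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x values must not all be identical")
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1

# Illustrative data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = ols_fit(x, y)
print(f"Y-hat = {b0:.2f} + {b1:.2f} X")  # prints: Y-hat = 2.20 + 0.60 X
```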
1-7 Standard Error of the Estimate
Sum of Squares Error: $SSE = \sum (Y_i - \hat{Y}_i)^2$
Standard Error of the Estimate: $s_e = \sqrt{SSE / (n - 2)}$
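The two quantities can be computed directly from the residuals. This sketch assumes the usual definitions, SSE as the sum of squared residuals and $s_e = \sqrt{SSE/(n-2)}$; the function name is illustrative.

```python
import math

def standard_error_of_estimate(x, y):
    """Return (SSE, s_e) for the least squares line through (x, y)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    # Sum of squared residuals about the fitted line
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Two degrees of freedom are used up by estimating b0 and b1
    s_e = math.sqrt(sse / (n - 2))
    return sse, s_e

sse, s_e = standard_error_of_estimate([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```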
1-8 Proof: Standard Error of the Estimate Sum of Squares Error Standard Error of the Estimate
1-9 Coefficient of Determination
The Coefficient of Determination, $r^2$, is the proportion of the total variation in the dependent variable Y that is explained or accounted for by the variation in the independent variable X.
– The coefficient of determination is the square of the coefficient of correlation, and ranges from 0 to 1.
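As a sketch, $r^2$ can be computed as one minus the unexplained share of the total variation, $r^2 = 1 - SSE/SST$. The function name and data are illustrative.

```python
def coefficient_of_determination(x, y):
    """Return r^2 = 1 - SSE/SST for the least squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                       # total variation
    if sst == 0:
        raise ValueError("y values must not all be identical")
    return 1 - sse / sst

r2 = coefficient_of_determination([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

For this made-up data set, 60% of the variation in Y is accounted for by X.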
1-10 Analysis of Variance (ANOVA)
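The ANOVA for simple regression splits the total variation into an explained and an unexplained part, SST = SSR + SSE, with 1 and n − 2 degrees of freedom. A minimal sketch; the dictionary keys are naming choices made here.

```python
def anova_table(x, y):
    """Return the ANOVA quantities for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # regression (explained)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # error (unexplained)
    msr = ssr / 1          # one regression degree of freedom
    mse = sse / (n - 2)    # n - 2 error degrees of freedom
    return {"SSR": ssr, "SSE": sse, "SST": sst,
            "df_regression": 1, "df_error": n - 2,
            "MSR": msr, "MSE": mse, "F": msr / mse}

table = anova_table([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```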
1-11 Figure: Measures of variation in regression
1-12 Expectation of b 1
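A sketch of the standard argument that $b_1$ is unbiased, assuming the $X_i$ are fixed and $E(\varepsilon_i) = 0$. The weights $k_i$ and the symbol $S_{XX}$ are notation introduced here.

```latex
\begin{align*}
b_1 &= \sum_{i=1}^{n} k_i Y_i, \qquad
k_i = \frac{X_i - \bar{X}}{S_{XX}}, \qquad
S_{XX} = \sum_{i=1}^{n} (X_i - \bar{X})^2 \\
\sum k_i &= 0, \qquad \sum k_i X_i = 1 \\
E(b_1) &= \sum k_i \, E(Y_i) = \sum k_i (\beta_0 + \beta_1 X_i)
        = \beta_0 \sum k_i + \beta_1 \sum k_i X_i = \beta_1
\end{align*}
```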
1-13 Variance of b 1
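A sketch of the usual result, assuming independent errors with constant variance $\sigma^2$ and fixed $X_i$. The weights $k_i = (X_i - \bar{X})/S_{XX}$ with $S_{XX} = \sum (X_i - \bar{X})^2$ are notation introduced here.

```latex
\begin{align*}
\operatorname{Var}(b_1) &= \operatorname{Var}\!\Big(\sum_{i=1}^{n} k_i Y_i\Big)
  = \sum_{i=1}^{n} k_i^2 \operatorname{Var}(Y_i)
  = \sigma^2 \sum_{i=1}^{n} k_i^2 \\
 &= \sigma^2 \, \frac{\sum (X_i - \bar{X})^2}{S_{XX}^2}
  = \frac{\sigma^2}{S_{XX}}
\end{align*}
```

In practice $\sigma^2$ is unknown and is replaced by the mean square error $s_e^2 = SSE/(n-2)$.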
1-14 Expectation of b 0
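A sketch of the standard argument that $b_0$ is unbiased, using $b_0 = \bar{Y} - b_1 \bar{X}$, $E(b_1) = \beta_1$, and fixed $X_i$:

```latex
\begin{align*}
E(\bar{Y}) &= \frac{1}{n} \sum_{i=1}^{n} (\beta_0 + \beta_1 X_i) = \beta_0 + \beta_1 \bar{X} \\
E(b_0) &= E(\bar{Y}) - \bar{X} \, E(b_1)
        = \beta_0 + \beta_1 \bar{X} - \beta_1 \bar{X} = \beta_0
\end{align*}
```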
1-15 Variance of b 0
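A sketch of the usual result, assuming independent errors with constant variance $\sigma^2$ and fixed $X_i$. Here $S_{XX} = \sum (X_i - \bar{X})^2$ and $k_i = (X_i - \bar{X})/S_{XX}$ are notation introduced for this derivation.

```latex
\begin{align*}
\operatorname{Var}(b_0) &= \operatorname{Var}(\bar{Y} - b_1 \bar{X})
  = \operatorname{Var}(\bar{Y}) + \bar{X}^2 \operatorname{Var}(b_1)
    - 2 \bar{X} \operatorname{Cov}(\bar{Y}, b_1) \\
\operatorname{Cov}(\bar{Y}, b_1) &= \sigma^2 \sum_{i=1}^{n} \frac{1}{n} k_i = 0 \\
\operatorname{Var}(b_0) &= \frac{\sigma^2}{n} + \frac{\bar{X}^2 \sigma^2}{S_{XX}}
  = \sigma^2 \left( \frac{1}{n} + \frac{\bar{X}^2}{S_{XX}} \right)
\end{align*}
```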
1-16 Covariance of b 0 and b 1
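A sketch of the usual result under the same assumptions (independent errors, constant variance $\sigma^2$, fixed $X_i$), with $S_{XX} = \sum (X_i - \bar{X})^2$ as notation introduced here:

```latex
\begin{align*}
\operatorname{Cov}(b_0, b_1) &= \operatorname{Cov}(\bar{Y} - b_1 \bar{X},\; b_1)
  = \operatorname{Cov}(\bar{Y}, b_1) - \bar{X} \operatorname{Var}(b_1) \\
 &= 0 - \bar{X} \, \frac{\sigma^2}{S_{XX}}
  = -\frac{\bar{X} \sigma^2}{S_{XX}}
\end{align*}
```

When $\bar{X} > 0$ the two estimates are negatively correlated: a sample that overestimates the slope tends to underestimate the intercept.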
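1-18 Confidence Interval: Predicting the Mean of Y
The confidence interval for the mean value of Y for a given value of X, $X_0$, is given by (p. 483):
$\hat{Y} \pm t_{\alpha/2,\, n-2} \; s_e \sqrt{\dfrac{1}{n} + \dfrac{(X_0 - \bar{X})^2}{\sum (X_i - \bar{X})^2}}$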
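A minimal sketch of the interval for the mean response, assuming the standard form $\hat{Y} \pm t\, s_e \sqrt{1/n + (X_0 - \bar{X})^2 / \sum (X_i - \bar{X})^2}$. The critical value `t_crit` is passed in by the caller (for example, read from a t-table with n − 2 degrees of freedom); the function name and data are illustrative.

```python
import math

def mean_response_interval(x, y, x0, t_crit):
    """Confidence interval for E(Y | X = x0); t_crit is t_{alpha/2, n-2}."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x0
    # The interval is narrowest at x0 = x_bar and widens as x0 moves away
    half_width = t_crit * s_e * math.sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)
    return y_hat - half_width, y_hat + half_width

# 95% interval at X = 3 with n - 2 = 3 degrees of freedom (t = 3.182 from a t-table)
low, high = mean_response_interval([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x0=3, t_crit=3.182)
```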
1-19 Prediction of Y 0
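1-20 Prediction Interval of an Individual Value of $Y_0$
The prediction interval for an individual value of Y for a given value of X, $X_0$, is given by (p. 484):
$\hat{Y} \pm t_{\alpha/2,\, n-2} \; s_e \sqrt{1 + \dfrac{1}{n} + \dfrac{(X_0 - \bar{X})^2}{\sum (X_i - \bar{X})^2}}$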
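A minimal sketch of the interval for a single new observation, assuming the standard form with the extra "1 +" under the square root, which accounts for the variability of an individual Y around its mean. The critical value `t_crit` is supplied by the caller; the function name and data are illustrative.

```python
import math

def prediction_interval(x, y, x0, t_crit):
    """Prediction interval for one new Y at X = x0; t_crit is t_{alpha/2, n-2}."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x0
    # "1 +" adds the variance of the individual observation itself
    half_width = t_crit * s_e * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)
    return y_hat - half_width, y_hat + half_width

low, high = prediction_interval([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x0=3, t_crit=3.182)
```

At the same $X_0$ and confidence level, this interval is always wider than the interval for the mean of Y.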
1-21 Figure: Confidence Intervals for Estimation (axes: Y against X, with X = 6.5 marked; bands shown: confidence intervals for $Y_X$ and confidence intervals for $E(Y_X)$)
1-22 The Coefficient of Correlation, r
The Coefficient of Correlation (r) is a measure of the strength of the relationship between two variables.
– It requires interval or ratio-scaled data (variables).
– It can range from -1.00 to 1.00.
– Values of -1.00 or 1.00 indicate perfect and strong correlation.
– Values close to 0.0 indicate weak correlation.
– Negative values indicate an inverse relationship and positive values indicate a direct relationship.
1-23 (Pearson Product-Moment) Correlation Coefficient: for sample and for population (p. 489)
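A minimal sketch of the sample coefficient, assuming the usual definition $r = \sum (X_i - \bar{X})(Y_i - \bar{Y}) \big/ \sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}$. The function name and data are illustrative.

```python
import math

def pearson_r(x, y):
    """Sample Pearson product-moment correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)

r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```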
1-24 Covariance (p. 493)
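The usual definitions, stated here as an assumption about the notation in use: the sample covariance divides by $n - 1$, and the correlation coefficient is the covariance rescaled by the two standard deviations.

```latex
\begin{align*}
s_{XY} &= \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}),
&\qquad \sigma_{XY} &= E\big[(X - \mu_X)(Y - \mu_Y)\big] \\
r &= \frac{s_{XY}}{s_X \, s_Y},
&\qquad \rho &= \frac{\sigma_{XY}}{\sigma_X \, \sigma_Y}
\end{align*}
```

Unlike $r$, the covariance depends on the units of X and Y, so its magnitude alone does not indicate the strength of the relationship.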
1-25 Coefficient of regression and correlation
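The regression slope and the correlation coefficient are linked through the sample standard deviations. A sketch of the standard identities, with $s_X$, $s_Y$, and $s_{XY}$ denoting the sample standard deviations and covariance (notation assumed here):

```latex
\begin{align*}
b_1 &= \frac{s_{XY}}{s_X^2} = r \, \frac{s_Y}{s_X}, \qquad
r = b_1 \, \frac{s_X}{s_Y} \\
r^2 &= \frac{SSR}{SST} = 1 - \frac{SSE}{SST}
\end{align*}
```

Because $s_X$ and $s_Y$ are positive, $b_1$ and $r$ always have the same sign.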
1-26 F and t statistics
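For testing $H_0\colon \beta_1 = 0$ in simple regression, the t statistic for the slope and the ANOVA F statistic are related by $F = t^2$. A minimal sketch that computes both; the function name and data are illustrative.

```python
import math

def slope_t_and_f(x, y):
    """Return (t, F) for testing H0: beta1 = 0 in simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)
    mse = sse / (n - 2)
    se_b1 = math.sqrt(mse / s_xx)   # estimated standard error of the slope
    t_stat = b1 / se_b1             # compared with t on n - 2 df
    f_stat = (ssr / 1) / mse        # compared with F on (1, n - 2) df
    return t_stat, f_stat

t_stat, f_stat = slope_t_and_f([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

Both tests lead to the same decision, since the F critical value with (1, n − 2) degrees of freedom is the square of the two-tailed t critical value with n − 2 degrees of freedom.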
1-27 The Simple Regression Model in Matrix Form
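In the standard matrix notation (assumed here), the model is $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\mathbf{X}$ is the $n \times 2$ matrix whose rows are $(1, X_i)$, and the least squares estimator is $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$. The sketch below carries this out in plain Python with an explicit 2×2 inverse; the function name and data are illustrative.

```python
def ols_matrix(x, y):
    """Return [b0, b1] computed as (X'X)^{-1} X'y for the design matrix [1, x]."""
    design = [[1.0, float(xi)] for xi in x]   # n x 2 design matrix X
    # X'X (2 x 2) and X'y (2 x 1), built by summing over the rows of X
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(2)]
           for i in range(2)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(2)]
    det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
    if det == 0:
        raise ValueError("X'X is singular: x values must not all be identical")
    # Inverse of a 2 x 2 matrix
    inv = [[xtx[1][1] / det, -xtx[0][1] / det],
           [-xtx[1][0] / det, xtx[0][0] / det]]
    return [inv[i][0] * xty[0] + inv[i][1] * xty[1] for i in range(2)]

b0, b1 = ols_matrix([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

The result agrees with the scalar formulas for the slope and intercept; the matrix form has the advantage of carrying over unchanged to models with more than one X variable.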