1
CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited
Ref: Cohen, Cohen, West, & Aiken (2003), ch. 2
2
Pearson Correlation
$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - m_x)(y_i - m_y)/(n-1)}{s_x s_y} = \frac{s_{xy}}{s_x s_y}$
$= \sum_{i=1}^{n} z_{x_i} z_{y_i} / (n-1)$
$= 1 - \sum_{i=1}^{n} (z_{x_i} - z_{y_i})^2 / (2(n-1))$
$= 1 - \sum_{i=1}^{n} d_{z_i}^2 / (2(n-1))$
= COVARIANCE / (SD_x SD_y)
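The three forms of the Pearson correlation on this slide (covariance over the product of standard deviations, mean product of z-scores, and one minus the scaled sum of squared z-score differences) can be checked numerically. The following is a minimal sketch in Python; the data values and function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical illustrative data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def mean(v):
    return sum(v) / len(v)

def sd(v):
    """Sample standard deviation with the n - 1 divisor."""
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

def pearson_three_ways(x, y):
    n = len(x)
    mx, my, sx, sy = mean(x), mean(y), sd(x), sd(y)
    # Form 1: covariance divided by the product of the standard deviations.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    r_cov = cov / (sx * sy)
    # Form 2: sum of z-score cross products divided by n - 1.
    zx = [(a - mx) / sx for a in x]
    zy = [(b - my) / sy for b in y]
    r_z = sum(a * b for a, b in zip(zx, zy)) / (n - 1)
    # Form 3: one minus the sum of squared z-score differences over 2(n - 1).
    r_d = 1 - sum((a - b) ** 2 for a, b in zip(zx, zy)) / (2 * (n - 1))
    return r_cov, r_z, r_d

r_cov, r_z, r_d = pearson_three_ways(x, y)
print(r_cov, r_z, r_d)
```

All three values should agree up to floating-point rounding, which is the point of the chain of equalities.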
3
Fig. 3.6: Geometric representation of $r^2$ as the overlap of two squares (Variance of X = 1, Variance of Y = 1). $r^2$ = percent overlap in the two squares. Panel a: nonzero correlation. Panel b: zero correlation.
4
Sums of Squares and Cross Product (Covariance)
[Figure: $SS_y$, $SS_x$, and the cross product $S_{xy}$]
5
Figure 3.4: Path model representation of correlation between SAT Math scores and Calculus Grades
Path SAT Math → Calc Grade: .00364 (.40)
Path error → Calc Grade: .932 (.955)
$R^2 = .4^2 = .16$
6
Path Models
path coefficient: standardized coefficient next to the arrow, covariance in parentheses
error coefficient: the correlation between the errors (the discrepancies between observed and predicted Calc Grade scores) and the observed Calc Grade scores
Predicted(Calc Grade) = .00364 SAT-Math + 2.5
errors are sometimes called disturbances
7
Figure 3.2: Path model representations of correlation between X and Y (panels a, b, c)
8
SUPPRESSED SCATTERPLOT: NO APPARENT RELATIONSHIP
[Scatterplot of Y against X with separate prediction lines for MALES and FEMALES]
9
IDEALIZED SCATTERPLOT: POSITIVE CURVILINEAR RELATIONSHIP
[Scatterplot of Y against X with a linear prediction line and a quadratic prediction line]
10
LINEAR REGRESSION- REVISITED
11
Single predictor linear regression.
Regression equations:
$y = {}_{x}b_1 x + {}_{x}b_0$
$x = {}_{y}b_1 y + {}_{y}b_0$
Regression coefficients:
${}_{x}b_1 = r_{xy} s_y / s_x$
${}_{y}b_1 = r_{xy} s_x / s_y$
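Because one slope is r times s_y / s_x and the other is r times s_x / s_y, their product is r squared. A short Python sketch of that check follows; the data and the helper name `slope` are hypothetical, and the slope is computed as the cross product over the predictor's sum of squares, which is algebraically the same as r times the ratio of standard deviations.

```python
# Hypothetical illustrative data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def slope(pred, out):
    """Least squares slope for predicting `out` from `pred`."""
    mp = sum(pred) / len(pred)
    mo = sum(out) / len(out)
    cross = sum((p - mp) * (o - mo) for p, o in zip(pred, out))
    ss_pred = sum((p - mp) ** 2 for p in pred)
    return cross / ss_pred

b1_y_on_x = slope(x, y)  # slope for predicting y from x
b1_x_on_y = slope(y, x)  # slope for predicting x from y
r_squared = b1_y_on_x * b1_x_on_y  # product of the two slopes
print(b1_y_on_x, b1_x_on_y, r_squared)
```

The two regression lines coincide only when the correlation is perfect; otherwise the two slopes describe different lines through the same point of means.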
12
Two variable linear regression
Path model representation: unstandardized
[Path diagram: x → y with coefficient $b_1$; error e → y]
13
Linear regression
$y = b_1 x + b_0$
If the correlation coefficient is calculated, then $b_1$ can be calculated from the equation above:
$b_1 = r_{xy} s_y / s_x$
The intercept, $b_0$, follows by placing the means for x and y into the equation above and solving:
$b_0 = \bar{y} - [r_{xy} s_y / s_x] \bar{x}$
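As a rough illustration of the two steps on this slide, the sketch below computes the slope and intercept from summary statistics alone. The function name and the numeric values are hypothetical, not taken from the slides.

```python
def regression_from_summary(r, sx, sy, mean_x, mean_y):
    """Slope and intercept from the correlation, SDs, and means."""
    b1 = r * sy / sx            # slope: b1 = r * s_y / s_x
    b0 = mean_y - b1 * mean_x   # intercept: b0 = mean_y - b1 * mean_x
    return b1, b0

# Hypothetical summary statistics.
b1, b0 = regression_from_summary(r=0.5, sx=2.0, sy=4.0, mean_x=10.0, mean_y=50.0)

# The fitted line passes through the point of means.
predicted_at_mean = b1 * 10.0 + b0
print(b1, b0, predicted_at_mean)
```

With these assumed inputs the slope is 1.0 and the intercept is 40.0, so the prediction at the mean of x returns the mean of y.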
14
Linear regression
Path model representation: standardized
[Path diagram: $z_x$ → $z_y$ with coefficient $r_{xy}$; error e → $z_y$]
15
Least squares estimation
The best estimate is the one for which the sum of squared differences between each observed score and its estimate is the smallest among all possible linear unbiased estimates (BLUE, or best linear unbiased estimate).
16
Least squares estimation
The residuals $e_i$ are called errors or disturbances. They represent in this case the part of the y score not predictable from x:
$e_i = y_i - (b_1 x_i + b_0)$
The sum of squares for errors follows:
$SS_e = \sum_{i=1}^{n} e_i^2$
17
[Figure: scatterplot of y against x with the prediction line and the error e marked for each point]
$SS_e = \sum e_i^2$
18
Matrix representation of least squares estimation.
We can represent the regression model in matrix form:
$y = Xb + e$
19
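Matrix representation of least squares estimation
$y = Xb + e$
$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ \vdots \end{bmatrix} = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ 1 & x_4 \\ \vdots & \vdots \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ \vdots \end{bmatrix}$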
20
Matrix representation of least squares estimation
$y = Xb + e$
The least squares criterion is satisfied by the following matrix equation:
$b = (X'X)^{-1} X'y$
The term $X'$ is called the transpose of the X matrix. It is the matrix turned on its side, with rows and columns interchanged. When $X'$ and $X$ are multiplied together, the result is a 2 x 2 matrix:
$X'X = \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}$
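For a single predictor, the matrix solution can be written out element by element, since the inverse of a 2 x 2 matrix has a closed form. The following Python sketch does this without a matrix library; the data are hypothetical and the variable names are chosen for illustration.

```python
# Hypothetical illustrative data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

# Entries of X'X when X has a column of ones and a column of x values.
sum_x = sum(x)
sum_x2 = sum(a * a for a in x)

# Entries of X'y.
sum_y = sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))

# The inverse of [[n, sum_x], [sum_x, sum_x2]] is
# (1 / det) * [[sum_x2, -sum_x], [-sum_x, n]].
det = n * sum_x2 - sum_x ** 2

# b = (X'X)^-1 X'y, written out element by element.
b0 = (sum_x2 * sum_y - sum_x * sum_xy) / det
b1 = (n * sum_xy - sum_x * sum_y) / det
print(b0, b1)
```

For these assumed data the intercept is about 2.2 and the slope about 0.6, matching what the correlation-based formulas give for the same values.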
21
SUMS OF SQUARES
$SS_e = (n-2) s_e^2$
$SS_{reg} = \sum (b_1 x_i + b_0 - \bar{y})^2$
$SS_y = SS_{reg} + SS_e$
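The partition of the total sum of squares can be verified directly. The sketch below assumes predicted scores of the form b1 * x + b0 with least squares coefficients; the data and variable names are hypothetical.

```python
# Hypothetical illustrative data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares slope and intercept.
b1 = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
      / sum((a - mean_x) ** 2 for a in x))
b0 = mean_y - b1 * mean_x
y_hat = [b1 * a + b0 for a in x]

ss_y = sum((b - mean_y) ** 2 for b in y)             # total
ss_reg = sum((h - mean_y) ** 2 for h in y_hat)       # regression
ss_e = sum((b - h) ** 2 for b, h in zip(y, y_hat))   # error
s2_e = ss_e / (n - 2)                                # error variance estimate
print(ss_y, ss_reg, ss_e, s2_e)
```

The regression and error sums of squares add up to the total, and dividing the error sum of squares by n - 2 recovers the error variance used in the first equation of the slide.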
22
SUMS OF SQUARES - Venn Diagram
Fig. 8.3: Venn diagram for linear regression with one predictor and one outcome measure (regions labeled $SS_x$, $SS_y$, $SS_{reg}$, and $SS_e$)
23
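STANDARD ERROR OF ESTIMATE
$s_y^2 = s_{\hat{y}}^2 + s_e^2$
$s_{z_y}^2 = 1 = r_{y.x}^2 + s_{e_z}^2$
$s_e = \sqrt{SS_e / (n-2)} \approx s_y \sqrt{1 - r_{y.x}^2}$
(the two expressions agree exactly when $s_y$ is multiplied by $\sqrt{(n-1)/(n-2)}$; the difference is negligible for large n)
Review slide 17: this is the standard deviation of the errors shown there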
24
SUMS OF SQUARES - ANOVA Table
SOURCE   df     Sum of Squares   Mean Square      F
x        1      SS_reg           SS_reg / 1       (SS_reg / 1) / (SS_e / (n-2))
e        n-2    SS_e             SS_e / (n-2)
Total    n-1    SS_y             SS_y / (n-1)
Table 8.1: Regression table for Sums of Squares
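The entries of the table follow mechanically from the two sums of squares and the sample size. A minimal Python sketch, with a hypothetical function name and hypothetical input values:

```python
def anova_table(ss_reg, ss_e, n):
    """Mean squares and F for one predictor and n cases."""
    ms_reg = ss_reg / 1                # df for the predictor is 1
    ms_e = ss_e / (n - 2)              # df for error is n - 2
    ms_total = (ss_reg + ss_e) / (n - 1)
    return {"ms_reg": ms_reg, "ms_e": ms_e,
            "ms_total": ms_total, "F": ms_reg / ms_e}

# Hypothetical sums of squares for a sample of 5 cases.
table = anova_table(ss_reg=3.6, ss_e=2.4, n=5)
print(table)
```

The F ratio is referred to an F distribution with 1 and n - 2 degrees of freedom; with a single predictor it equals the square of the t statistic for the regression weight.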
25
Confidence Intervals Around b and Beta weights
$s_b = (s_y / s_x) \sqrt{(1 - r_{y.x}^2) / (n-2)}$
Standard deviation of the sampling error of the estimate of the regression weight b
$s_\beta = \sqrt{(1 - r_{y.x}^2) / (n-2)}$
Note: this is formally correct only for a regression equation, not for the Pearson correlation
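A sketch of how these standard errors turn into a confidence interval is given below. The function names and input values are hypothetical, and the critical t value (3.182 for 3 degrees of freedom, two-tailed .05) is assumed to be looked up in a t table rather than computed.

```python
import math

def se_b(sx, sy, r, n):
    """Standard error of the unstandardized regression weight b."""
    return (sy / sx) * math.sqrt((1 - r ** 2) / (n - 2))

def se_beta(r, n):
    """Standard error of the standardized weight (beta)."""
    return math.sqrt((1 - r ** 2) / (n - 2))

def confidence_interval(estimate, se, t_crit):
    """Interval estimate +/- t_crit * se."""
    return estimate - t_crit * se, estimate + t_crit * se

# Hypothetical summary statistics for a sample of 5 cases.
n = 5
sx, sy, r = math.sqrt(2.5), math.sqrt(1.5), math.sqrt(0.6)
b = r * sy / sx

s_b = se_b(sx, sy, r, n)
t_crit = 3.182  # assumed table value: df = n - 2 = 3, two-tailed .05
lower, upper = confidence_interval(b, s_b, t_crit)
print(s_b, lower, upper)
```

With so few cases the interval is wide and includes zero, which is consistent with failing to reject the null hypothesis for the same assumed data.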
26
Distribution around parameter estimates: b-weight
[Figure: sampling distribution centered on the b estimate with standard error $s_b$; confidence interval b ± t $s_b$]
27
Hypothesis testing for the regression weight
Null hypothesis: $b_{population} = 0$
Alternative hypothesis: $b_{population} \neq 0$
Test statistic: $t = b_{sample} / s_b$, where $s_b$ is the standard error of b
Student's t-distribution with degrees of freedom = n-2
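The test can be sketched in a few lines of Python. The slope, standard error, and sample size below are hypothetical, and the critical value is an assumed t-table lookup (3.182 for 3 degrees of freedom, two-tailed .05).

```python
def t_for_slope(b_sample, s_b):
    """Test statistic for the null hypothesis that the population b is 0."""
    return b_sample / s_b

# Hypothetical estimates from a sample of n = 5 cases.
n = 5
t = t_for_slope(b_sample=0.6, s_b=0.2828427)
df = n - 2

t_crit = 3.182  # assumed table value for df = 3, two-tailed .05
reject_null = abs(t) > t_crit
print(round(t, 3), df, reject_null)
```

Here the statistic falls short of the critical value, so the null hypothesis would not be rejected for these assumed numbers.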
28
Null hypothesis b = 0 rejected at the .05 level
[SPSS output: Regression Analysis option predicting Social Stress from Locus of Control in a sample of 16-year-olds]
29
Figure 3.4: Path model representation of prediction of Social Stress from Locus of Control
Path Locus of Control → Social Stress: b = .190 (β = .539)
Path error → Social Stress: $s_e$ = 3.12 ($\sqrt{1 - R^2}$ = .842)
$R^2 = .291$
30
Difference between Independent b-weights
Compare two groups' regression weights to see if they differ (e.g., boys vs. girls)
Null hypothesis: $b_{boys} = b_{girls}$
Test statistic: $t = (b_{boys} - b_{girls}) / s_{b_{boys} - b_{girls}}$
$s_{b_{boys} - b_{girls}} = \sqrt{s_{b_{boys}}^2 + s_{b_{girls}}^2}$
Student's t distribution with $n_1 + n_2 - 4$ degrees of freedom
31
boys n = 22, girls n = 12
$t = (.281 - .106) / \sqrt{.081^2 + .058^2} = 1.76$
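The arithmetic on this slide can be reproduced with a short Python sketch. The function name is hypothetical, and the pairing of each standard error with a group (.081 with boys, .058 with girls) is an assumption based on the order in which the values appear.

```python
import math

def t_difference(b1, se1, b2, se2):
    """t for the difference between two independent regression weights."""
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)

# Values from the slide; the standard error pairing is assumed.
t = t_difference(b1=0.281, se1=0.081, b2=0.106, se2=0.058)
df = 22 + 12 - 4  # n_boys + n_girls - 4
print(round(t, 2), df)
```

This gives t of about 1.76 on 30 degrees of freedom. A standard t table lists a two-tailed .05 critical value of roughly 2.04 for 30 degrees of freedom, so under that criterion the two weights would not be judged different.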