1
CORRELATION LECTURE 1 EPSY 640 Texas A&M University
2
Figure 3.1: Graph of Torricelli and Viviani 1643/44 data on altitude and height of a column of mercury. Axes: ALTITUDE, HEIGHT OF COLUMN.
3
TABULAR DATA

HEIGHT   ALT    CHANGE HT   CHANGE ALT
28.04    3000
26.65    3900   -1.39       +900
24.71    4800   -1.94       +900

MEAN CHANGE (predicted): HT -1.67, ALT +900
4
SYMBOLIC REPRESENTATION Mathematical representation: height ∝ 1/altitude, where ∝ means “proportional to,” or $H = b_1 A + b_0$, where H = height of the column of mercury, A = altitude, $b_1$ is a multiplier or coefficient, and $b_0$ is a constant value that makes the data points line up correctly; it is also the value H takes when A is zero.
5
MATH REPRESENTATION For the data above, the best fit produces the following numbers: $H = -.00185 A + 33.682$. Thus, for any altitude in feet, we multiply it by -.00185 and add 33.682. Our approximation was a change in H of -1.67 for a change in A of +900; the fitted change is 900 x (-.00185) = -1.665, close enough.
6
MATH REPRESENTATION Prediction: the outcome of computing an equation such as that for H above. Error: the difference between prediction and observation. Note: by our estimate, going from 3000 to 3900 feet should have dropped the mercury from 28.04 to 26.37, but it only dropped to 26.65, so error = +.28 inches.
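The best-fit numbers and the +.28 error can be checked with a short Python sketch. It assumes an ordinary least-squares fit to the three tabulated points; the variable names and the `predict` helper are illustrative and not part of the lecture.

```python
# Least-squares line through the three altitude/height points from the table.
altitude = [3000.0, 3900.0, 4800.0]   # feet
height = [28.04, 26.65, 24.71]        # inches of mercury

n = len(altitude)
mean_a = sum(altitude) / n
mean_h = sum(height) / n

# Slope = sum of cross-products / sum of squared altitude deviations.
sxy = sum((a - mean_a) * (h - mean_h) for a, h in zip(altitude, height))
sxx = sum((a - mean_a) ** 2 for a in altitude)
b1 = sxy / sxx
b0 = mean_h - b1 * mean_a


def predict(a):
    """Predicted height (inches) at altitude a (feet) from the fitted line."""
    return b1 * a + b0


# Errors of the fitted line: observation minus prediction.
fit_errors = [h - predict(a) for a, h in zip(altitude, height)]

# Error of the rough approximation (mean change of -1.67 per 900 feet)
# when going from 3000 to 3900 feet.
approx_prediction = 28.04 + (-1.67)
approx_error = 26.65 - approx_prediction

print(f"b1 = {b1:.5f}, b0 = {b0:.3f}")
print("fitted-line errors:", [round(e, 3) for e in fit_errors])
print(f"approximation error = {approx_error:+.2f} inches")
```

The fitted-line errors differ from the +.28 figure because that figure comes from the mean-change approximation, not from the least-squares line.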
7
Karl Pearson (1857-1936). (Excerpted from E. S. Pearson, Karl Pearson: An Appreciation of some aspects of his life and works, Cambridge University Press, 1938.)
8
Pearson Correlation Standard deviation (SD): a measure of the spread of scores. For the three data points, $s_A = 900$, $s_H = 1.673$, $m_H = 26.467$, and $m_A = 3900$. The coefficient -.00185 is the amount of change in height per foot of altitude. Re-represent the data in standard score units, or z-scores, as $z_H = -.995 z_A$.
9
Pearson Correlation $z_H = -.995 z_A$. Thus, a 1 standard deviation change in altitude produces a -.995 standard deviation change in height: $-.995 \, SD_H = -.995 \times 1.673 = -1.664635$ inches per 900 feet of altitude.
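As a rough check of the z-score form, the sketch below standardizes the three altitude/height points and regresses one set of z-scores on the other. It assumes the n-1 divisor for the standard deviation; the helper names are illustrative.

```python
import math

altitude = [3000.0, 3900.0, 4800.0]   # feet
height = [28.04, 26.65, 24.71]        # inches of mercury


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Sample standard deviation (n-1 divisor)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


s_a, s_h = sd(altitude), sd(height)

# Convert each variable to z-scores.
z_a = [(a - mean(altitude)) / s_a for a in altitude]
z_h = [(h - mean(height)) / s_h for h in height]

# Slope of z_H on z_A; with standardized variables this equals the correlation.
slope_z = sum(za * zh for za, zh in zip(z_a, z_h)) / sum(za ** 2 for za in z_a)

# One SD of altitude (900 feet) expressed as a change in inches of mercury.
change_inches = slope_z * s_h

print(f"s_A = {s_a:.1f}, s_H = {s_h:.3f}")
print(f"standardized slope = {slope_z:.3f}")
print(f"change per 900 feet = {change_inches:.3f} inches")
```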
10
Pearson Correlation

$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) / (n-1)}{s_x s_y} = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n} z_{x_i} z_{y_i}}{n-1} = \frac{\text{COVARIANCE}}{SD(x)\,SD(y)}$$
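The covariance form and the z-score form of the correlation can be compared numerically. This is a minimal sketch using the altitude and height values from the earlier slides; the function names are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance s_xy with the n-1 divisor."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)


def sd(values):
    """Sample standard deviation: square root of the variance."""
    return math.sqrt(covariance(values, values))


def r_from_covariance(x, y):
    """r = covariance / (SD(x) * SD(y))."""
    return covariance(x, y) / (sd(x) * sd(y))


def r_from_z_scores(x, y):
    """r = sum of z-score products / (n - 1)."""
    mx, my, sx, sy = mean(x), mean(y), sd(x), sd(y)
    zx = [(xi - mx) / sx for xi in x]
    zy = [(yi - my) / sy for yi in y]
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


altitude = [3000.0, 3900.0, 4800.0]
height = [28.04, 26.65, 24.71]

r_cov = r_from_covariance(altitude, height)
r_z = r_from_z_scores(altitude, height)
print(f"r (covariance form) = {r_cov:.4f}")
print(f"r (z-score form)    = {r_z:.4f}")
```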
11
COVARIANCE DEFINED AS CO-VARIATION “UNSTANDARDIZED CORRELATION”
12
Squared correlation “r-squared”
Most squared things are:
– area measures
– variance-related
– often have a chi-square distribution (looks somewhat like a Poisson)
13
Fig. 3.6: Geometric representation of $r^2$ as the overlap of two squares (Variance of X = 1, Variance of Y = 1); $r^2$ = percent overlap in the two squares. Panel a: nonzero correlation. Panel b: zero correlation.
14
Sums of Squares and Cross Product (Covariance). Circles are easier to show than rectangles, but it is still an area concept. Diagram labels: SSx, SSy, Sxy.
15
Table 3.1: Calculation of Pearson correlation coefficient for hypothetical data on SAT Math and Calculus Grades

Student  X (SAT Math)  x = X - Mean  Y (Calc grade)  y = Y - Mean  xy      Role
1        450           -100          D = 1.0         -1.5          +150    Contributor
2        450           -100          C = 2.0         -.5           +50     Contributor
3        500           -50           B = 3.0         +.5           -25     Discrepant
4        550           0             A = 4.0         +1.5          0
5        650           +100          C = 2.0         -.5           -50     Discrepant
6        700           +150          B = 3.0         +.5           +75     Contributor
Sum      3300          0             15.0            0             +200
Mean     550           0             2.5             0             +40 (n-1 divisor)
SD       104.88                      1.05                          110.02 (SD of X times SD of Y)

Correlation = 40/110.02 = .364
b 1 = .00364
b 0 = 2.5 - .00364*550 = .5
y = .00364 SAT + .5
means: 2.5 = 2.0 + .5
Note: prediction always includes the means: Pred(Ymean) = b1 Xmean + b0
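The table's arithmetic can be reproduced column by column. The sketch below uses the six hypothetical students from Table 3.1 and the n-1 divisor; the variable names are illustrative.

```python
import math

# Hypothetical data from Table 3.1 (grades coded D=1.0, C=2.0, B=3.0, A=4.0).
sat = [450, 450, 500, 550, 650, 700]
grade = [1.0, 2.0, 3.0, 4.0, 2.0, 3.0]

n = len(sat)
mean_x = sum(sat) / n
mean_y = sum(grade) / n

# Deviation scores and their cross-products (the x, y, and xy columns).
x_dev = [x - mean_x for x in sat]
y_dev = [y - mean_y for y in grade]
products = [xd * yd for xd, yd in zip(x_dev, y_dev)]

# Covariance and standard deviations, all with the n-1 divisor.
cov = sum(products) / (n - 1)
sd_x = math.sqrt(sum(xd ** 2 for xd in x_dev) / (n - 1))
sd_y = math.sqrt(sum(yd ** 2 for yd in y_dev) / (n - 1))

r = cov / (sd_x * sd_y)        # correlation
b1 = cov / sd_x ** 2           # slope: change in grade per SAT point
b0 = mean_y - b1 * mean_x      # intercept, so the line passes through the means

print(f"covariance = {cov:.1f}, SD(x) = {sd_x:.2f}, SD(y) = {sd_y:.2f}")
print(f"r = {r:.3f}, b1 = {b1:.5f}, b0 = {b0:.2f}")
```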
16
Plot of data of Calc grade by SAT Math
17
Figure 3.4: Path model representation of correlation between SAT Math scores and Calculus Grades. Path SAT Math → Calc Grade: .364 (40), labeled correlation (covariance). Path error → Calc Grade: .932 (.955), labeled $\sqrt{1 - r^2}$ ($s_e$ = standard deviation of errors).
18
Path Models
– path coefficient: the standardized coefficient next to the arrow, with the covariance in parentheses
– error coefficient: the correlation between the errors (the discrepancies between observed and predicted Calc Grade scores) and the observed Calc Grade scores
– Predicted(Calc Grade) = .00364 SAT-Math + .5
– errors are sometimes called disturbances
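The error coefficient can be computed directly as the correlation between the errors and the observed grades. The sketch below, using the hypothetical SAT Math and Calc Grade data, compares that correlation with the square root of 1 minus r squared. The `pearson_r` helper is illustrative.

```python
import math

sat = [450, 450, 500, 550, 650, 700]
grade = [1.0, 2.0, 3.0, 4.0, 2.0, 3.0]


def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Fit the prediction line Predicted(Calc Grade) = b1 * SAT + b0.
mean_x, mean_y = sum(sat) / len(sat), sum(grade) / len(grade)
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(sat, grade))
      / sum((x - mean_x) ** 2 for x in sat))
b0 = mean_y - b1 * mean_x

# Errors: observed grade minus predicted grade.
errors = [y - (b1 * x + b0) for x, y in zip(sat, grade)]

r = pearson_r(sat, grade)
error_coefficient = pearson_r(errors, grade)

print(f"r = {r:.3f}")
print(f"correlation(errors, observed) = {error_coefficient:.3f}")
print(f"sqrt(1 - r^2)                 = {math.sqrt(1 - r ** 2):.3f}")
```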
19
Figure 3.2: Path model representations of correlation. Panels a, b, and c each relate X and Y; panel c also includes an error term e.
20
BIVARIATE DATA 2 VARIABLES QUESTION: DO THEY COVARY? IF SO, HOW DO WE INTERPRET? IF NOT, IS THERE A THIRD INTERVENING (MEDIATING) VARIABLE OR EXOGENOUS VARIABLE THAT SUPPRESSES OR MODERATES THE RELATIONSHIP?
21
IDEALIZED SCATTERPLOT POSITIVE RELATIONSHIP X Y Prediction line
22
IDEALIZED SCATTERPLOT NEGATIVE RELATIONSHIP X Y Prediction line 95% confidence interval around prediction X. Y.
23
IDEALIZED SCATTERPLOT NO RELATIONSHIP X Y Prediction line
24
SUPPRESSED SCATTERPLOT NO APPARENT RELATIONSHIP X Y Prediction lines MALES FEMALES
25
MODERATION AND SUPPRESSION IN A SCATTERPLOT NO APPARENT RELATIONSHIP X Y Prediction lines MALES FEMALES
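One way a grouping variable can hide a relationship is when the two groups have opposite slopes. The sketch below uses made-up values, not the data behind the slide's figure; the group data and the `pearson_r` helper are assumptions for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: Y rises with X for males and falls with X for females.
males_x = [1, 2, 3, 4, 5]
males_y = [1, 2, 3, 4, 5]
females_x = [1, 2, 3, 4, 5]
females_y = [5, 4, 3, 2, 1]

# Correlation within each group, then with the groups pooled together.
r_males = pearson_r(males_x, males_y)
r_females = pearson_r(females_x, females_y)
r_pooled = pearson_r(males_x + females_x, males_y + females_y)

print(f"males:   r = {r_males:+.2f}")
print(f"females: r = {r_females:+.2f}")
print(f"pooled:  r = {r_pooled:+.2f}")
```

In this contrived case the pooled scatterplot shows no apparent relationship even though each group has a perfect linear one, which is why separate prediction lines per group are worth plotting.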
26
IDEALIZED SCATTERPLOT POSITIVE CURVILINEAR RELATIONSHIP X Y Linear prediction line Quadratic prediction line
27
INFLUENCE OF POINTS
SOME POINTS CHANGE THE RELATIONSHIP (outliers, influence points); OTHERS DO LITTLE
ACTIVITY: http://istics.net/stat/PutPoints/
– 1. CONSTRUCT A 10-POINT SCATTERPLOT; TRY TO APPROXIMATE A .6 CORRELATION
– 2. DETERMINE LOCATIONS FOR POINTS THAT CHANGE THE CORRELATION TO .4 OR LESS
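The effect of a single influential point can also be explored offline. The following sketch uses ten made-up points (not taken from the activity site) and moves one of them far from the trend; the data and the `pearson_r` helper are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 10-point scatterplot with a strong positive trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
r_before = pearson_r(x, y)

# Move the last point from (10, 9) to (10, 0): an extreme X value
# paired with a Y value far from the trend.
y_moved = y[:-1] + [0]
r_after = pearson_r(x, y_moved)

print(f"r before moving the point: {r_before:.3f}")
print(f"r after moving the point:  {r_after:.3f}")
```

Moving a point near the middle of the X range by the same amount changes the correlation much less, which is one way to see why points at the extremes carry more influence.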
28
Computing Correlation with SPSS
SPSS data files are organized by ROWS: people or units; COLUMNS: variables
– Select “Analyze/Correlate/Bivariate”
– Highlight a variable, move it to the text box, and repeat for all variables to be correlated
– Select “Pearson” or “Spearman” (ordinal only)
– Select “One” or “Two” tailed for significance testing: do you have theory that says a correlation should be positive (or negative)? If so, test one-tailed; otherwise test two-tailed, which tests whether the correlation is zero or not
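Outside SPSS, the Pearson versus Spearman choice can be illustrated in a few lines of Python. This is a sketch, not SPSS's implementation: it assumes Spearman's coefficient is computed as a Pearson correlation on ranks, with tied values given their average rank. The function names are hypothetical, and the data are the hypothetical SAT Math and Calc Grade values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1   # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


sat = [450, 450, 500, 550, 650, 700]
grade = [1.0, 2.0, 3.0, 4.0, 2.0, 3.0]

print(f"Pearson  r = {pearson_r(sat, grade):.3f}")
print(f"Spearman r = {spearman_r(sat, grade):.3f}")
```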
29
Computing Correlation with SPSS continued Select “Options” and check “Means and Standard Deviations” if you want summary statistics. Output labels: correlation, significance, sample size.