CORRELATION LECTURE 1 EPSY 640 Texas A&M University
Figure 3.1: Graph of Torricelli and Viviani 1643/44 data on Altitude and Height of a column of mercury (axes: ALTITUDE, HEIGHT OF COLUMN)
TABULAR DATA Columns: HEIGHT, ALT, CHANGE (HT, ALT), MN CHNG (HT, ALT). Predicted:
SYMBOLIC REPRESENTATION mathematical representation: height ∝ 1/altitude, where ∝ means “proportional to,” or $H = b_1 A + b_0$, where H = height of the column of mercury, A = altitude, $b_1$ is a multiplier or coefficient, and $b_0$ is a constant value that makes the data points line up correctly; $b_0$ is also the value H takes when A is zero.
MATH REPRESENTATION For the data above the following numbers are produced from the best fit: H = A Thus, for any altitude in feet, we multiply it by and add Our approximation was H = = A(+900) =change = 900 x ( ) close enough
MATH REPRESENTATION Prediction: the outcome of computing an equation such as that for H above. Error: the difference between prediction and observation. Note: going from 3000 to 3900 feet, our estimate says the mercury should have dropped to 26.37 inches, but it only dropped to 26.65, so error = +.28 inches.
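The error arithmetic on this slide can be checked with a short sketch. It uses only the two heights quoted above and assumes error is taken as observed minus predicted, which is the direction that reproduces the slide's +.28; the variable names are illustrative.

```python
# Values quoted on the slide for the move from 3000 to 3900 feet.
predicted_height = 26.37  # inches of mercury predicted by the fitted line
observed_height = 26.65   # inches of mercury actually observed

# Assumption: error = observed - predicted, which matches the slide's +.28.
error = observed_height - predicted_height

print(f"error = {error:+.2f} inches")
```

A positive error here means the column stayed higher than the line predicted.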
Karl Pearson (excerpted from E. S. Pearson, Karl Pearson: An Appreciation of some aspects of his life and works, Cambridge University Press, 1938).
Pearson Correlation standard deviation (SD)- measure of spread of scores SD of the three data points s A = 900 coefficient , the amount of change in height per foot of altitude. s H = m A = , m A = 3900 re-represent the data in standard score units, or z-scores as z H = z A.
Pearson Correlation z H = z A Thus, a 1 standard deviation change in altitude produces a standard deviation change in height Thus, SD A = = x = inches per 900 feet of altitude
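The link between the raw coefficient and the z-score form can be sketched as follows. The altitudes are chosen so that the mean is 3900 and the SD is 900, as on the slide; the heights are illustrative stand-ins, not the historical readings, and all names are hypothetical.

```python
from math import sqrt

# Hypothetical readings: altitudes give mean 3900 and SD 900;
# the heights are illustrative only, not the 1643/44 data.
altitude = [3000.0, 3900.0, 4800.0]
height = [27.4, 26.65, 25.8]

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Sample standard deviation with the n - 1 divisor."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

m_a, m_h = mean(altitude), mean(height)
s_a, s_h = sd(altitude), sd(height)

# Covariance and raw slope b1 (change in height per foot of altitude).
cov = sum((a - m_a) * (h - m_h) for a, h in zip(altitude, height)) / (len(altitude) - 1)
b1 = cov / s_a ** 2

# Standardized slope: SDs of height per 1 SD of altitude.
standardized_slope = b1 * s_a / s_h
r = cov / (s_a * s_h)

print(f"s_A = {s_a:.0f}, b1 = {b1:.6f} inches per foot")
print(f"standardized slope = {standardized_slope:.4f}, r = {r:.4f}")
print(f"change per 900 feet (1 SD of altitude) = {b1 * s_a:.3f} inches")
```

The standardized slope and r agree because, with one predictor, rescaling both variables to z-scores turns the slope into the correlation.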
Pearson Correlation
$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})/(n-1)}{s_x s_y} = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n} z_{x_i} z_{y_i}}{n-1} = \frac{\text{COVARIANCE}}{SD(x)\,SD(y)}$$
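A minimal sketch of the two equivalent forms of the Pearson formula, covariance over the product of SDs and the sum of z-score products over n − 1. The five data pairs are made up purely to exercise the arithmetic.

```python
from math import sqrt

# Small hypothetical data set, used only to exercise the formula.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
s_x = sqrt(sum((v - mean_x) ** 2 for v in x) / (n - 1))
s_y = sqrt(sum((v - mean_y) ** 2 for v in y) / (n - 1))

# Form 1: covariance divided by the product of the standard deviations.
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
r_from_covariance = s_xy / (s_x * s_y)

# Form 2: sum of z-score products divided by n - 1.
z_x = [(v - mean_x) / s_x for v in x]
z_y = [(v - mean_y) / s_y for v in y]
r_from_z_scores = sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)

print(round(r_from_covariance, 4), round(r_from_z_scores, 4))
```

For these data the covariance is 2 and each variance is 2.5, so both forms give r = .8.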
COVARIANCE DEFINED AS CO-VARIATION “UNSTANDARDIZED CORRELATION”
Squared correlation “r-squared” Most squared things are: –area measures –variance-related –Often have a chi-square distribution (looks somewhat like a Poisson)
Fig. 3.6: Geometric representation of $r^2$ as the overlap of two squares (Variance of X = 1, Variance of Y = 1); $r^2$ = percent overlap in the two squares. Panels: a. Nonzero correlation; b. Zero correlation
Sums of Squares and Cross Product (Covariance) Circles are easier to show than rectangles, but this is still an area concept (diagram labels: SSx, SSy, Sxy)
Table 3.1: Calculation of Pearson correlation coefficient for hypothetical data on SAT Math and Calculus Grades. Columns: Student; X (SAT Math); x = X − Mean; Y (Calc grade); y = Y − Mean; xy; Contributor; Discrepant. Rows: six students with Calc grades D, C, B, A, C, B, followed by Sum, Mean (n−1 divisor), and SD. Correlation = 40/($s_x s_y$) = .364. Prediction equation: y = .00364 SAT + .5. Means: 2.5 ≈ .00364 × 550 + .5. Note: prediction always includes the means, Pred(Ymean) = b1 Xmean + b0.
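The note that the prediction always includes the means can be checked with the rounded values quoted on this slide. This is a sketch, not the table's full calculation: the individual SAT scores are not used, and the function name is hypothetical.

```python
# Coefficients and means as quoted on the slide (rounded there).
b1 = 0.00364      # change in Calc grade per SAT Math point
b0 = 0.5          # intercept
mean_sat = 550.0
mean_grade = 2.5

def predict_grade(sat_math):
    """Predicted Calc grade from the fitted line y = b1 * SAT + b0."""
    return b1 * sat_math + b0

# Predicting at the mean SAT score should return (approximately) the mean grade.
prediction_at_mean = predict_grade(mean_sat)

# The intercept can be recovered from the means: b0 = mean(Y) - b1 * mean(X).
intercept_from_means = mean_grade - b1 * mean_sat

print(f"prediction at mean SAT = {prediction_at_mean:.3f}")
print(f"intercept from means   = {intercept_from_means:.3f}")
```

The small gaps from 2.5 and .5 come from rounding b1 to .00364.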
Plot of data of Calc grade by SAT Math
Figure 3.4: Path model representation of correlation between SAT Math scores and Calculus Grades. Path from SAT Math to Calc Grade: correlation .364, covariance (40). Path from error to Calc Grade: $\sqrt{1 - r^2}$ = .932, with $s_e$ = standard deviation of errors (.955).
Path Models path coefficient: the standardized coefficient next to the arrow, with the covariance in parentheses. error coefficient: the correlation between the errors (the discrepancies between observed and predicted Calc Grade scores) and the observed Calc Grade scores. Predicted(Calc Grade) = .00364 SAT-Math + .5. Errors are sometimes called disturbances.
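The error coefficient in the path model follows from the correlation alone. A brief sketch, assuming the rounded r = .364 from the SAT Math example:

```python
from math import sqrt

r = 0.364  # correlation between SAT Math and Calc grade, from the slides

r_squared = r ** 2                       # proportion of variance in Y shared with X
error_coefficient = sqrt(1 - r_squared)  # path from error to Calc grade

print(f"r^2 = {r_squared:.3f}, error coefficient = {error_coefficient:.3f}")
```

With the rounded r the result is about .931, close to the .932 shown in Figure 3.4, which was presumably computed from an unrounded correlation.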
Figure 3.2: Path model representations of correlation. Panels: a (X, Y), b (X, Y), c (X, Y, e)
BIVARIATE DATA 2 VARIABLES QUESTION: DO THEY COVARY? IF SO, HOW DO WE INTERPRET? IF NOT, IS THERE A THIRD INTERVENING (MEDIATING) VARIABLE OR EXOGENOUS VARIABLE THAT SUPPRESSES OR MODERATES THE RELATIONSHIP?
IDEALIZED SCATTERPLOT: POSITIVE RELATIONSHIP (axes X and Y, with prediction line)
IDEALIZED SCATTERPLOT: NEGATIVE RELATIONSHIP (axes X and Y, with prediction line and 95% confidence interval around the prediction)
IDEALIZED SCATTERPLOT: NO RELATIONSHIP (axes X and Y, with prediction line)
SUPPRESSED SCATTERPLOT: NO APPARENT RELATIONSHIP (axes X and Y, with separate prediction lines for MALES and FEMALES)
MODERATION AND SUPPRESSION IN A SCATTERPLOT: NO APPARENT RELATIONSHIP (axes X and Y, with separate prediction lines for MALES and FEMALES)
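How a grouping variable can hide a relationship in the pooled data can be sketched with made-up numbers. The two groups below are hypothetical and deliberately extreme: the slope is positive in one group and negative in the other, so the pooled correlation is zero.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(ss_x * ss_y)

# Hypothetical data: opposite prediction lines for the two groups.
males_x, males_y = [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]
females_x, females_y = [1, 2, 3, 4, 5], [5, 4, 3, 2, 1]

r_males = pearson_r(males_x, males_y)
r_females = pearson_r(females_x, females_y)
r_pooled = pearson_r(males_x + females_x, males_y + females_y)

print(f"males: {r_males:+.2f}, females: {r_females:+.2f}, pooled: {r_pooled:+.2f}")
```

Ignoring the group variable gives "no apparent relationship" even though X predicts Y perfectly within each group.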
IDEALIZED SCATTERPLOT: POSITIVE CURVILINEAR RELATIONSHIP (axes X and Y, with linear prediction line and quadratic prediction line)
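A small sketch of why a linear prediction line can understate a curvilinear relationship. The data are hypothetical (Y is exactly X squared), and correlating Y with the squared predictor stands in for a full quadratic fit.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(ss_x * ss_y)

# Hypothetical positive curvilinear data: y rises faster as x grows.
x = [0, 1, 2, 3, 4, 5]
y = [v ** 2 for v in x]

r_linear = pearson_r(x, y)                        # straight-line relationship
r_quadratic = pearson_r([v ** 2 for v in x], y)   # relationship with x squared

print(f"r with x = {r_linear:.3f}, r with x^2 = {r_quadratic:.3f}")
```

The linear correlation is high but below 1, while the squared term captures the relationship completely in this constructed case.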
INFLUENCE OF POINTS SOME POINTS CHANGE THE RELATIONSHIP (outliers, influence points), OTHERS DO LITTLE. ACTIVITY: –1. CONSTRUCT A 10 POINT SCATTERPLOT, TRY TO APPROXIMATE A .6 CORRELATION –2. DETERMINE LOCATIONS FOR POINTS THAT CHANGE THE CORRELATION TO .4 OR LESS
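One possible answer to the activity, sketched with hypothetical points: ten pairs built to give a correlation of .6, then the same set with a single point moved far off the trend.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(ss_x * ss_y)

# Hypothetical 10-point scatterplot constructed so that r = .6.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [6, 4, 5, 2, 3, 1, 7, 8, 9, 10]
r_base = pearson_r(x, y)

# Move only the last point, from (10, 10) to (10, 1): an influence point.
y_moved = y[:-1] + [1]
r_moved = pearson_r(x, y_moved)

print(f"original r = {r_base:.3f}, after moving one point r = {r_moved:.3f}")
```

The moved point sits at the extreme of X, which is why one observation is enough to pull the correlation well below .4; moving a point near the middle of X changes r much less.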
Computing Correlation with SPSS
SPSS data files are organized by ROWS: people or units; COLUMNS: variables
–Select “Analyze/Correlate/Bivariate”
–Highlight a variable, move it to the text box, repeat for all variables to be correlated
–Select “Pearson” or “Spearman” (ordinal only)
–Select “One” or “Two” tailed for significance testing: do you have theory that says a correlation should be positive (or negative)? If so, test one-tailed; otherwise test two-tailed, which tests if the correlation is zero or not
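The difference between the Pearson and Spearman choices can be sketched outside SPSS. This is not how SPSS computes them internally; it only uses the standard definition of Spearman as a Pearson correlation on ranks, with hypothetical data and no tied values.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(ss_x * ss_y)

def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_r(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: y always increases with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

print(f"Pearson = {pearson_r(x, y):.3f}, Spearman = {spearman_r(x, y):.3f}")
```

Spearman reaches 1 because the ordering is perfectly preserved, while Pearson stays below 1 because the points do not fall on a line.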
Computing Correlation with SPSS continued Select “Options”, check “Means and Standard Deviations” if you want summary statistics. The output reports the correlation, its significance, and the sample size.