Department of Cognitive Science Michael J. Kalsher Adv. Experimental Methods & Statistics PSYC 4310 / COGS 6310 Bivariate Relationships 1 PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2012, Michael Kalsher
PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher
Overview
Back to variance...
Cross-product deviations and covariance
Characteristics of the correlation coefficient
Types of correlation
– Bivariate vs. partial correlation
– Parametric vs. non-parametric
– Partial vs. semi-partial (or part)
Reporting correlation coefficients
Variance ($s^2$ or $\sigma^2$)
The average squared difference from the mean. It tells us, on average, how much a given data point differs from the mean of all data points.
$$s^2 = \frac{SS}{N-1} = \frac{\sum (x_i - \bar{x})^2}{N-1}$$
Where:
$x_i$ = a single data point
$\bar{x}$ = the mean of the sample
$N$ = number of observations
$SS$ = Sum of Squares or, more precisely, the sum of squared deviations from the mean.
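The variance definition can be sketched in a few lines of Python. This is a minimal illustration only; the function name `sample_variance` and the five scores are hypothetical and do not come from the slides.

```python
def sample_variance(data):
    """Sum of squared deviations from the mean (SS) divided by N - 1."""
    n = len(data)
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)  # SS: sum of squared deviations
    return ss / (n - 1)


# Hypothetical scores, chosen only for illustration.
scores = [5, 4, 4, 6, 8]
print(sample_variance(scores))  # approximately 2.8
```

Dividing by N - 1 rather than N gives the sample estimate used throughout these slides.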
Linear Bivariate Relationships
Covariance: Assessing association between two variables
– Measures the extent to which corresponding elements from two sets of ordered data move in the same direction (see example on the next slide).
– Not based on standard scores (more on this later).
– Its value is influenced by:
  the strength of the linear relationship between X and Y
  the size of the standard deviations of X and Y (i.e., $s_x$ and $s_y$)
Example: Imagine we expose five people to a specific number of advertisements promoting a particular type of candy and then measure how many packages of the candy each person purchases the following week.
[Figure: deviation between each data point and the mean. Packs of candy purchased: mean = 11.0. Ads watched: mean = 5.4.]
What’s going on here?
1. The pattern of deviations is similar for both variables.
2. But how do we quantify the level of similarity?
– For a single variable, recall that we calculate the average squared deviation from the mean to determine the level of dispersion in the data (i.e., the variance).
– For two variables, we multiply each deviation for one variable by the corresponding deviation for the second variable to obtain the cross-product deviations, sum them, and then divide by N - 1 to compute the covariance.
Covariance: Conceptual Form
[Figure: diagram with regions labeled Var X, Var Y, and Cov XY.]
Covariance: Equation Form
$$\mathrm{cov}(x,y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$
$$= \frac{(-0.4)(-3) + (-1.4)(-2) + (-1.4)(-1) + (0.6)(2) + (2.6)(4)}{4}$$
$$= \frac{1.2 + 2.8 + 1.4 + 1.2 + 10.4}{4} = \frac{17}{4} = 4.25$$
Positive covariance = as one variable deviates from the mean, the other variable deviates in the same direction.
Negative covariance = as one variable deviates from the mean, the other variable deviates from the mean in the opposite direction.
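The covariance arithmetic can be checked with a short Python sketch. The raw scores below are an assumption: the slides do not list them, so they are reconstructed from the reported means (5.4 ads, 11.0 packs) and the deviations in the worked example. The function name `covariance` is hypothetical.

```python
def covariance(x, y):
    """Sum of cross-product deviations divided by N - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross_products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]
    return sum(cross_products) / (n - 1)


# Assumed raw data, reconstructed from the slide's means and deviations.
ads = [5, 4, 4, 6, 8]        # ads watched (mean 5.4)
packs = [8, 9, 10, 13, 15]   # packs purchased (mean 11.0)

print(covariance(ads, packs))  # approximately 4.25
```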
Standardized Covariance: The Correlation Coefficient
Covariance is useful, but it depends on the scales of measurement used. A more practical approach is to use a unit of measurement into which any scale of measurement can be converted: standard deviation units.
The standardized covariance is known as the correlation coefficient:
$$r = \frac{\mathrm{cov}_{xy}}{s_x s_y} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x s_y}$$
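A small Python sketch can illustrate why standardizing matters: rescaling one variable changes the covariance but leaves r unchanged. The data are assumed (reconstructed from the means and deviations of the candy example, not listed in the slides), and the rescaled variable `packs_x10` is purely hypothetical.

```python
from math import sqrt


def covariance(x, y):
    """Sum of cross-product deviations divided by N - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def pearson_r(x, y):
    """Covariance divided by the product of the two standard deviations."""
    s_x = sqrt(covariance(x, x))  # cov(x, x) is the variance of x
    s_y = sqrt(covariance(y, y))
    return covariance(x, y) / (s_x * s_y)


ads = [5, 4, 4, 6, 8]                  # assumed raw data
packs = [8, 9, 10, 13, 15]             # assumed raw data
packs_x10 = [10 * p for p in packs]    # hypothetical change of measurement scale

print(covariance(ads, packs), covariance(ads, packs_x10))  # covariance grows tenfold
print(pearson_r(ads, packs), pearson_r(ads, packs_x10))    # r is the same, about 0.87
```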
Standardized (z) scores
$$z_x = \frac{x_i - \bar{x}}{s_x}$$
Pearson Correlation Coefficient: Some Characteristics
– The sample correlation coefficient ($r_{XY}$) provides the best sample estimate of the population correlation coefficient, $\rho_{XY}$.
– Values vary between +1.0 and -1.0.
– $r_{XY}$ is a standardized measure of an observed effect.
– The square of the correlation coefficient, $R^2_{XY}$, is termed “r squared” or the index of association and is defined as the proportion of the variance of one variable that is shared by another variable.
Calculating Correlation: A Simple Example
A researcher wonders whether self-esteem is associated with academic performance and collects the following data.
[Table: self-esteem and academic performance scores; values not recoverable.]
Calculating Correlations: Using z-scores
[Equation: $r_{XY}$ computed from the z-scores; the worked terms are not recoverable.]
Result: $r_{XY} = .2281$
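The z-score route to r can be sketched as follows. This assumes the form that follows from the standardized-covariance definition given earlier (sum of z-score products divided by N - 1, with standard deviations computed using N - 1); the slide's own equation is not recoverable, so this may differ from what was shown. The data are hypothetical and are not the self-esteem scores.

```python
from math import sqrt


def z_scores(data):
    """Convert raw scores to z-scores using the sample SD (N - 1)."""
    n = len(data)
    mean = sum(data) / n
    sd = sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return [(x - mean) / sd for x in data]


def r_from_z(x, y):
    """Sum of the products of paired z-scores, divided by N - 1."""
    products = [zx * zy for zx, zy in zip(z_scores(x), z_scores(y))]
    return sum(products) / (len(x) - 1)


# Hypothetical scores, not the data from the slide.
x = [5, 4, 4, 6, 8]
y = [8, 9, 10, 13, 15]
print(r_from_z(x, y))  # about 0.87
```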
Calculating Correlations: Using the Computational Formula
This formula is useful because it does not involve computing means, standard deviations, and z scores.
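The slide's formula did not survive extraction. A common raw-score version, assumed here rather than confirmed by the source, is r = (NΣXY - ΣXΣY) / sqrt([NΣX² - (ΣX)²][NΣY² - (ΣY)²]). The sketch below implements that version on hypothetical data; the function name is also hypothetical.

```python
from math import sqrt


def pearson_r_computational(x, y):
    """Raw-score formula: needs only sums, sums of squares, and sum of products."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical scores, not the data from the slide.
x = [5, 4, 4, 6, 8]
y = [8, 9, 10, 13, 15]
print(pearson_r_computational(x, y))  # about 0.87
```

For these scores the numerator is 5(314) - (27)(55) = 85 and the denominator is sqrt(56 × 170), so no means or standard deviations are computed along the way.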
Calculating Simple Correlation: Using SPSS
[Figure: SPSS screenshots showing Step 1, Step 2, and Step 3.]
The results showed that students’ self-esteem was not significantly related to their academic performance, r = .23, p > .05.
Types of Correlation
Bivariate correlation: Used to assess the relationship between two variables.
Partial correlation: Assesses the relationship between two variables while controlling for the effect that a third variable has on both of them. (Useful for isolating the unique relationship between two variables when other variables are ruled out.)
Semi-partial (or part) correlation: Assesses the relationship between two variables while controlling for the effect that a third variable has on only one of them. (Useful when trying to explain the variance in one particular variable from a set of predictor variables.)
[Figure: three diagrams with regions labeled A, B, and C.]
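For reference, the standard first-order formulas can be written in terms of the three zero-order correlations. The slides do not show these equations, so the notation is an assumption: x and y are the two variables of interest and z is the control variable.

```latex
% Partial correlation: z is removed from both x and y
r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}

% Semi-partial (part) correlation: z is removed from x only
r_{y(x \cdot z)} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{1 - r_{xz}^2}}
```

The two share a numerator; the semi-partial denominator omits the term for the variable that is left unadjusted.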
Bivariate Correlation: Exam Anxiety.sav
A psychologist is interested in the effects of exam stress and revision on exam performance. Exam anxiety is assessed with a standardized measure (EAQ). Revision is defined as the number of hours students spend studying for the exam.
Revise = Revision time
Exam = Exam performance
Anxiety = Exam anxiety
Pearson’s r: Interesting Facts
Data must be score-level.
Significance testing requires that the data be normally distributed.
[Figure: SPSS screenshots showing Step 1, Step 2, and Step 3.]
Bivariate Correlation: SPSS Output
Partial Correlation: Examining the relationship between two variables when the effects of a third variable are held constant
Exam Performance and Exam Anxiety share 19.4% of their variation: r = .441; $R^2$ = .194 = 19.4% shared variance.
Exam Performance and Revision Time share 15.7% of their variation: r = .397; $R^2$ = .157 = 15.7% shared variance.
Exam Anxiety and Revision Time share about 50% of their variation: r = .709; $R^2$ = .502 = 50.2% shared variance.
Two “chunks” of variance in Exam Performance share unique variance with Exam Anxiety and Revision Time.
Calculating Partial Correlation: Using SPSS – Exam Anxiety.sav
Let’s next try assessing the partial correlation between exam anxiety and exam performance while controlling for the effect of revision time.
Note: The partial correlation is still statistically significant, but it is much lower when the effect of Time Spent Revising is held constant.
Zero-order correlation: $R^2$ = 19.4%
Partial correlation: $R^2$ = 6%
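The drop from 19.4% to about 6% can be reproduced from the three zero-order correlations reported earlier, using the standard first-order partial correlation formula (an assumption; SPSS works from the raw data, so its value may differ slightly in the third decimal). The signs below are also an assumption: the slides give magnitudes only, and anxiety is taken here to be negatively related to both performance and revision time.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Magnitudes from the slides; signs are assumed.
r_exam_anxiety = -0.441
r_exam_revise = 0.397
r_anxiety_revise = -0.709

pr = partial_r(r_exam_anxiety, r_exam_revise, r_anxiety_revise)
print(pr, pr ** 2)  # roughly -0.25, and about 0.06 shared variance
```

The squared value is the same whether the assumed signs or the unsigned magnitudes are used, so the 6% figure does not depend on the sign assumption.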
Correlation: Non-Parametric Alternatives
(See TheBiggestLiar dataset; choose Creativity and Position as variables.)
Spearman’s Correlation Coefficient ($r_s$): Used when the data violate parametric assumptions (e.g., the data are not normally distributed) or are ordinal. Works by first ranking the data and then applying Pearson’s equation to these ranks.
Kendall’s tau ($\tau$): Used instead of Spearman’s correlation coefficient when the data set is small and has a large number of tied ranks.
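The two-step description of Spearman's coefficient (rank, then apply Pearson's equation) can be sketched directly. Tied scores are given the average of the ranks they span, which is a common convention and an assumption here. The function names and the data are hypothetical; they are not the TheBiggestLiar values.

```python
from math import sqrt


def average_ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Pearson's r applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical creativity scores and competition placements.
creativity = [53, 36, 31, 43, 30, 41]
position = [1, 3, 4, 2, 4, 3]
print(spearman_rho(creativity, position))
```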
Non-Parametric Correlation: TheBiggestLiar.sav
A researcher gathers 68 past participants of The World’s Biggest Liar Competition. Each person indicates his/her placement (1st, 2nd, etc.) and completes a creativity questionnaire (max. score = 60). Is level of creativity related to a person’s ability to tell tall tales?
Non-Parametric Correlation: Output
Correlation: Applied to Dichotomous Variables (see pbcorr.sav dataset)
Point-Biserial Correlation ($r_{pb}$): Used when one variable is a discrete dichotomy (e.g., pregnancy).
Biserial Correlation ($r_b$): Used when one variable is a continuous dichotomy (e.g., passing or failing an exam).
SPSS calculates $r_{pb}$; you must use the following equation to calculate $r_b$:
$$r_b = \frac{r_{pb}\sqrt{pq}}{y}$$
Where:
p = proportion of cases in largest category
q = proportion of cases in smallest category
y = value obtained from Appendix A.1: Table of the standard normal distribution (the ordinate at the point that splits the distribution into proportions p and q)
(See Appendix A.1: Table of the standard normal distribution.) Note: to obtain values for “p” and “q”, go to Analyze, Descriptive Statistics, and then Frequencies. Select the dichotomous variables of interest, and then run the analysis.
Suppose we are interested in the relationship between the gender of a cat and how much time it spends away from home. How will you test that relationship?
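One way to approach this, sketched under assumptions, is the point-biserial correlation: code the dichotomous variable as 0/1 and compute Pearson's r as usual. The data below are invented for illustration and are not the pbcorr.sav values; which category is coded 1 is arbitrary and only affects the sign of the result.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: gender coded 0 = female, 1 = male; hours away from home per week.
gender = [0, 0, 0, 0, 1, 1, 1, 1]
time_away = [38, 40, 40, 41, 34, 42, 42, 46]

r_pb = pearson_r(gender, time_away)
print(r_pb)

# Reversing the coding flips the sign but not the magnitude.
flipped = [1 - g for g in gender]
print(pearson_r(flipped, time_away))
```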
Converting $r_{pb}$ to $r_b$ (see pbcorr.sav dataset)
$$r_b = \frac{r_{pb}\sqrt{pq}}{y} = 0.475$$
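The conversion can be sketched in Python, computing the ordinate y from the standard normal distribution instead of looking it up in a table. The inputs `r_pb = 0.378` and `p = 0.533` are assumed illustrative values; the extracted slides show only the result, so these are not confirmed by the source. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def biserial_from_point_biserial(r_pb, p):
    """Convert r_pb to r_b, where p is the proportion in the larger category."""
    q = 1 - p
    std_normal = NormalDist()          # mean 0, standard deviation 1
    z_cut = std_normal.inv_cdf(p)      # z value that splits the area into p and q
    y = std_normal.pdf(z_cut)          # ordinate (height) of the curve at that point
    return r_pb * sqrt(p * q) / y


# Assumed inputs for illustration only.
r_b = biserial_from_point_biserial(r_pb=0.378, p=0.533)
print(r_b)  # close to the 0.475 reported on the slide
```

Because y is at most about 0.399 while sqrt(pq) is at most 0.5, r_b is always larger in magnitude than r_pb.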
Problem #1: Correlation
Using the ChickFlick.sav data, is there a relationship between gender and arousal? Using the same data, is there a relationship between the film watched and arousal?
Problem #2: Correlation
Using the data collected in class, is there a relationship between your age, gender, political perspective, and your current level of satisfaction with President Obama’s performance? Does there appear to be any shared variance among any/all of the variables?
Codes:
Gender: 1 = Female; 2 = Male
Political Leaning: 1 = Liberal; 2 = Moderate; 3 = Conservative
President’s Performance Rating: rating scale with labeled anchors Not at All Satisfied, Somewhat Satisfied, Moderately Satisfied, Very Satisfied