Psych 230 Psychological Measurement and Statistics Pedro Wolf September 23, 2009
Correlation
Sometimes our research questions are concerned with finding the relationship between two variables. Usually, these questions seek to observe the variables as they exist naturally in the world: the researcher is not trying to manipulate anything, only observing what occurs. Often this type of research does not allow easy definition of 'levels' of the independent variable.
Correlation: Is coffee drinking related to nervousness? Is sugar consumption related to hyperactivity in children? Are beer and coffee sales related to temperature? These types of questions are suited to a statistical technique known as correlation analysis.
Statistical Testing: 1. Decide which test to use. 2. State the hypotheses (H0 and H1). 3. Calculate the obtained value (calculate r). 4. Calculate the critical value (size of α). 5. Make our conclusion.
Characteristics of Correlation Analyses - 1: With correlational data, we don't calculate a mean score for each condition (we don't figure out mean beer sales in January, February, March and so on). Instead, the correlation coefficient [r] summarizes the entire relationship.
Characteristics of Correlation Analyses - 2: We always examine the relationship between pairs of scores (sugar consumption and hyperactivity; age and income; beer sales and temperature). So, N is the number of pairs of scores in the data.
Characteristics of Correlation Analyses - 3: Neither variable is called the independent or the dependent variable (sugar consumption and hyperactivity; age and income; beer sales and temperature).
Characteristics of Correlation Analyses - 4: We graph the scores differently in correlational research: we use a scatterplot to visualize our data. A scatterplot is a graph that shows the location of each data point formed by a pair of X-Y scores. When a relationship exists, a particular value of Y tends to be paired with one value of X, and another value of Y tends to be paired with a different value of X.
Characteristics of Correlation Analyses - 5: Correlation is not causation. Just because we observe a relationship between two variables does not mean that changes in one of the variables cause changes in the other (example: television watching and aggression).
Scatterplot [Figure: scatterplot of paired scores; axes: Coffee, Nervousness]
Relationships: Two aspects of relationships: the type of relationship (shape, direction) and the strength of the relationship (correlation coefficient, test of significance).
Types of Relationship: The type of relationship in a dataset can be thought of as the overall direction in which the scores on Y change as the X scores change: does knowing about variable 1 help you know something about variable 2? There are two main types of relationship: linear and nonlinear.
Linear Relationships: A linear relationship forms a pattern on a scatterplot that fits a straight line. In a positive linear relationship, as the scores on the X variable increase, the scores on the Y variable also tend to increase. In a negative linear relationship, as the scores on the X variable increase, the scores on the Y variable tend to decrease.
Linear Relationship
Linear Relationships: Positive relationship: more X tends to go with more Y. Negative relationship: more X tends to go with less Y. What is the relationship between study time and test scores? What is the relationship between hours of TV watched and hours slept?
Positive Linear Relationship
Negative Linear Relationship
Nonlinear Relationships: A nonlinear relationship does not fit a straight line. What is the relationship between stress and exam performance? Low stress levels: suboptimal performance. High stress levels: suboptimal performance. Moderate stress levels: optimal performance. Common shapes of nonlinear relationships are U-shaped and inverted U-shaped.
Nonlinear Relationship
Examples: five sets of paired X and Y scores, numbered 1 to 5 [Tables: X and Y columns]
1: Relationship?
Negative Linear Relationship [Figure: scatterplot; axes: Mother's height (inches), Father's height (inches)]
2: Relationship?
Positive Linear Relationship [Figure: scatterplot; axes: Excited about Course (0-10), Willing to ask question (0-7)]
3: Relationship?
No Relationship [Figure: scatterplot; axes: Last Haircut ($), GPA]
4: Relationship?
Positive Linear Relationship [Figure: scatterplot; axes: Height (inches), Shoe size]
5: Relationship?
Nonlinear Relationship [Figure: scatterplot; axes: X, Y]
Strength of the Relationship: The strength of a linear relationship is the degree to which one value of Y is consistently paired with one and only one value of X. We measure the strength of the relationship with the correlation coefficient, r, which can vary between -1 and +1. The larger the absolute value of the correlation coefficient, the stronger the relationship. The sign of the correlation coefficient indicates the direction of a linear relationship: a negative sign means a negative relationship, and a positive sign means a positive relationship.
Strength of the Relationship: Describe the relationships between the variables which have the following correlations (in terms of strong vs. weak, positive vs. negative): A and B: r = 0.05; C and D: r = -0.73; E and F: r = 0.96; G and H: r = 0.39; I and J: r = -0.16.
Strength of the Relationship: Answers: A and B: r = 0.05, none. C and D: r = -0.73, strong negative. E and F: r = 0.96, strong positive. G and H: r = 0.39, moderate positive. I and J: r = -0.16, weak negative.
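The verbal labels in this exercise can be sketched as a small Python helper. The function name `describe_r` and the cutoffs (0.10, 0.30, 0.60) are assumptions chosen only so that the labels agree with the five answers above; the lecture does not state numeric cutoffs.

```python
def describe_r(r: float) -> str:
    """Label a correlation coefficient by strength and direction.

    The cutoffs below are illustrative assumptions, not values from the lecture.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength depends only on the absolute value
    if size < 0.10:
        return "none"
    if size < 0.30:
        strength = "weak"
    elif size < 0.60:
        strength = "moderate"
    else:
        strength = "strong"
    direction = "positive" if r > 0 else "negative"  # the sign gives the direction
    return f"{strength} {direction}"


for pair, r in [("A and B", 0.05), ("C and D", -0.73), ("E and F", 0.96),
                ("G and H", 0.39), ("I and J", -0.16)]:
    print(f"{pair}: r = {r:+.2f} -> {describe_r(r)}")
```

Different textbooks draw these boundaries in different places, so the labels are best treated as rough descriptions.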
Strength of the Relationship: Estimate the correlation of the following relationships: [Figure: two scatterplots; answers: r ≈ +0.90 and r ≈ 0.00]
What is r? The Pearson product-moment correlation coefficient: r = (Σ Zx Zy) / N. Z-scores tell us about distance from the mean. The sum of squared Z-scores for a variable is equal to N. Example: x = 1, 5, 6, 7, 8, 9; Zx = …; Zx² = …; Σ Zx² = 5 = N. Therefore, the closer each Zx is to its paired Zy, the closer the correlation will be to +1. If one Z-score in a pair is negative and the other is positive, their product is negative, which pushes r toward a negative correlation. If both are negative or both are positive, their product is positive, which pushes r toward a positive correlation.
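A minimal Python sketch of the Z-score definition follows. The function names and the data are hypothetical, and the sketch assumes the population standard deviation (dividing by N), which is the version for which the squared Z-scores sum to N.

```python
from math import sqrt


def z_scores(values):
    """Convert raw scores to Z-scores using the population SD (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pearson_r_from_z(xs, ys):
    """r = (sum of Zx * Zy) / N, where N is the number of pairs."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Hypothetical scores, chosen only for illustration.
x = [5, 6, 7, 8, 9]
y = [2, 1, 4, 3, 5]

zx = z_scores(x)
print(round(sum(z * z for z in zx), 6))   # sum of squared Z-scores; equals N
print(round(pearson_r_from_z(x, x), 6))   # a variable paired with itself
print(round(pearson_r_from_z(x, y), 3))   # Zx and Zy mostly share a sign
```

Pairing a variable with itself makes every Zx equal to its Zy, so the formula reduces to Σ Zx² / N, which is why the correlation is then 1.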
Calculating r: To measure the strength of a linear relationship, we will use the Pearson correlation coefficient [r]; this will be the obtained value for the statistical test. The computational formula for the correlation coefficient is: r = [N(ΣXY) - (ΣX)(ΣY)] / √{[N(ΣX²) - (ΣX)²][N(ΣY²) - (ΣY)²]}
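The computational formula can be sketched in Python as follows. The function name `pearson_r` and the example scores are hypothetical; the intermediate sums mirror the X, X², Y, Y² and XY columns used in the hand calculations.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r via the computational (raw-score) formula."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must contain the same number of scores")
    n = len(xs)                                   # N = number of pairs
    sum_x, sum_y = sum(xs), sum(ys)               # ΣX and ΣY
    sum_x2 = sum(x * x for x in xs)               # ΣX²
    sum_y2 = sum(y * y for y in ys)               # ΣY²
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # ΣXY
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical paired scores, for illustration only.
hours_studied = [1, 2, 3, 4, 5]
test_score = [2, 4, 5, 4, 5]
print(round(pearson_r(hours_studied, test_score), 3))
```

If either variable has no variability (all scores identical), the denominator is zero and r is undefined; this sketch does not guard against that case.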
Calculating r: Calculate the correlation coefficient for the following dataset: [Table: paired X and Y scores]
Calculating r: Column totals for the dataset (columns: X, X², Y, Y², XY): ΣX = 21, ΣX² = 91, ΣY = 29, ΣY² = 171, ΣXY = 81
Your Turn: Calculate r for the following dataset: [Table: paired X and Y scores]
Your Turn: Column totals for the dataset (columns: X, X², Y, Y², XY): ΣX = 44, ΣX² = 224, ΣY = 22, ΣY² = 58, ΣXY = 110
Your Turn: ΣX = 44; ΣX² = 224; ΣY = 22; ΣY² = 58; ΣXY = 110; N = 10
r = [10(110) - (44)(22)] / √{[10(224) - (44)²][10(58) - (22)²]}
r = (1100 - 968) / √{[2240 - 1936][580 - 484]}
r = 132 / √{[304][96]}
r = 132 / √29184
r = 132 / 170.83
r = 0.773
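The hand calculation can be checked with a short Python sketch that works directly from the column totals. The function name `pearson_r_from_sums` is hypothetical; the totals and N = 10 are the ones used in the calculation above.

```python
from math import sqrt


def pearson_r_from_sums(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy):
    """Computational formula for r, starting from the column totals."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r_from_sums(n=10, sum_x=44, sum_x2=224, sum_y=22, sum_y2=58, sum_xy=110)
print(f"r = {r:.3f}")  # prints "r = 0.773"
```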
Positive Linear Relationship [Figure: scatterplot; axes: Excited about Course (0-10), Willing to ask question (0-7); r = +0.773]
Statistically testing correlations: The correlation coefficient [r] tells us something about the strength and direction of the linear relationship. But we often want to know whether this relationship could have happened by chance or whether it is a real, significant relationship. We have a correlation coefficient of 0.773 for the relationship between excitement about the class and willingness to ask questions. Does this indicate a real relationship? What are the chances that this could have happened by fluke?
Statistical Testing: 1. Decide which test to use. 2. State the hypotheses (H0 and H1). 3. Calculate the obtained value (calculate r). 4. Calculate the critical value (size of α). 5. Make our conclusion.
1. Decide which test to use Are we looking for the relationship between variables? – Yes: Use the Correlation test
2. State the Hypotheses: Though we are testing samples, again, we are really interested in the total population. The population correlation is described by ρ (rho). The null hypothesis (H0) always states that there is no relationship between the variables. H0: ρ = 0 (excitement about the course is not related to willingness to ask questions). H1: ρ ≠ 0 (excitement about the course is related to willingness to ask questions).
Plotting the correlation [Figure: distribution of r with a rejection region in each tail; horizontal axis: values of correlation coefficient]
r crit and r obt [Figure: same distribution with r crit = -0.67 and r crit = +0.67 marked; r obt = -0.78]
r crit and r obt [Figure: same distribution with r crit = -0.67 and r crit = +0.67 marked; r obt = +0.33]
3. Calculate r obt: We calculate r obt using the formula: r obt = [N(ΣXY) - (ΣX)(ΣY)] / √{[N(ΣX²) - (ΣX)²][N(ΣY²) - (ΣY)²]}
4. Calculate the critical value: Assume α = 0.05. We are looking for any relationship (positive or negative), therefore it will be a two-tailed test. df = N - 2 (where N is the number of pairs in the data). The calculation of r used N = 10 pairs, so df = (10 - 2) = 8. Look up Table 3, critical values of the Pearson Correlation Coefficient (the r-tables): two-tailed test, df = 8, α = .05: r crit = 0.632
r crit and r obt [Figure: distribution of r with r crit = -0.632 and r crit = +0.632 marked; r obt = +0.773; horizontal axis: values of correlation coefficient]
5. Make our Conclusion: r crit = 0.632; r obt = 0.773. As r obt is inside the rejection region, we reject H0 and accept H1. We conclude that there is a significant positive relationship between excitement about a course and a willingness to ask questions in it (p < 0.05).
Significance and Importance: We conclude that there is a significant positive relationship between excitement about a course and a willingness to ask questions in it (p < 0.05). How important is this finding? What proportion of the variability in people's willingness to ask questions is related to excitement about the course (or vice versa)? We can answer this with the effect size, r². r = 0.773; r² = 0.60, or around 60%.
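As a quick arithmetic check, the effect size is just the square of the obtained correlation. The function name `proportion_of_variance` is hypothetical; the r value is the one calculated for the excitement data.

```python
def proportion_of_variance(r: float) -> float:
    """Effect size for a correlation: r squared."""
    return r ** 2


r_obt = 0.773  # obtained correlation for excitement and willingness to ask questions
print(f"r^2 = {proportion_of_variance(r_obt):.2f}")  # prints "r^2 = 0.60"
```

Because r is squared, the sign drops out: a correlation of -0.773 would have the same effect size.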
Your Turn A researcher asks if there is a relationship between the number of errors on a statistics exam and the person’s level of satisfaction with the course. Is there a significant relationship between these variables? Is it important? Errors Satisfaction
1. Decide which test to use Are we looking for the relationship between variables? – Yes: Use the Correlation test
2. State the Hypotheses: H0: ρ = 0 (there is no relationship between errors made on the exam and satisfaction with the course). H1: ρ ≠ 0 (there is a relationship between errors made on the exam and satisfaction with the course).
3. Calculate r obt
Column totals (columns: X, X², Y, Y², XY): ΣX = 49, ΣX² = 371, ΣY = 31, ΣY² = 171, ΣXY = 188
Your Turn: ΣX = 49; ΣX² = 371; (ΣX)² = 2401; ΣY = 31; ΣY² = 171; (ΣY)² = 961; ΣXY = 188; N = 7
r = [7(188) - (49)(31)] / √{[7(371) - 2401][7(171) - 961]}
r = (1316 - 1519) / √{[2597 - 2401][1197 - 961]}
r = -203 / √{[196][236]}
r = -203 / √46256
r = -203 / 215.07
r = -0.94
4. Calculate the critical value: Assume α = 0.05. We are looking for any relationship (positive or negative), therefore it will be a two-tailed test. df = N - 2 (where N is the number of pairs in the data); df = (7 - 2) = 5. Look up Table 3, critical values of the Pearson Correlation Coefficient (the r-tables): two-tailed test, df = 5, α = .05: r crit = 0.754
r crit and r obt [Figure: distribution of r with r crit = -0.754 and r crit = +0.754 marked; r obt = -0.94; horizontal axis: values of correlation coefficient]
5. Make our Conclusion: r crit = 0.754; r obt = -0.94. As r obt is inside the rejection region, we reject H0 and accept H1. We conclude that there is a significant negative relationship between errors made on a test and satisfaction with the course (p < 0.05): the more errors made, the less satisfaction.
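The decision rule in steps 4 and 5 can be sketched in Python. The function name `correlation_decision` and the dictionary are hypothetical; the critical values are a few rows copied from a standard two-tailed r-table at α = .05, so the sketch only covers df = 5 to 10.

```python
# Two-tailed critical values of r at alpha = .05, keyed by df = N - 2.
# A few rows from a standard r-table; other df values are not covered here.
R_CRIT_05_TWO_TAILED = {5: 0.754, 6: 0.707, 7: 0.666, 8: 0.632, 9: 0.602, 10: 0.576}


def correlation_decision(r_obt, n_pairs):
    """Return (df, r_crit, reject_h0) for a two-tailed test at alpha = .05."""
    df = n_pairs - 2
    if df not in R_CRIT_05_TWO_TAILED:
        raise KeyError(f"no critical value stored for df = {df}")
    r_crit = R_CRIT_05_TWO_TAILED[df]
    # Two-tailed: reject H0 when r_obt falls beyond +r_crit or -r_crit.
    reject_h0 = abs(r_obt) > r_crit
    return df, r_crit, reject_h0


df, r_crit, reject_h0 = correlation_decision(r_obt=-0.94, n_pairs=7)
print(f"df = {df}, r crit = ±{r_crit}, reject H0: {reject_h0}")
```

Comparing the absolute value of r obt with r crit is what makes the test two-tailed: a large correlation in either direction lands in a rejection region.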
Significance and Importance: We conclude that there is a significant negative relationship between errors made on a test and satisfaction with the course (p < 0.05). Importance (effect size, r²): r = -0.94; r² = 0.88. Around 88% of the differences in satisfaction scores are related to the errors made on the exam.
Homework Chapter 8: 2, 6, 8