Statistics for the Social Sciences Psychology 340 Fall 2013 Thursday, November 7, 2013 Correlation and Regression
Homework #11 due Thursday 11/14 Ch 15 #1, 2, 5, 6, 8, 9, 14
Exam III Results Will be available on Tuesday, 11/12
Last Few Weeks of the Semester Review of Correlation (CH 15) Review of Simple Linear Regression (CH 16) Multiple Regression (CH 16) Hypothesis testing with Correlation and Regression (CH 15, 16) Chi-Squared Test (CH 17)
Statistical analysis follows design We have finished the top part of the chart! Focus on this section for rest of semester
Correlation Write down what (you think) a correlation is. Write down an example of a correlation. A correlation is an association between scores on two variables. Examples: age and coordination skills in children (as kids get older, their motor coordination tends to improve); price and quality (generally, the more expensive something is, the higher its quality).
Correlation and Causality Correlational research design Correlation as a statistical procedure Correlation as a kind of research design (observational designs)
Another thing to consider about correlation A correlation describes a relationship between two variables, but DOES NOT explain why the variables are related. Suppose that Dr. Steward finds that rates of spilled coffee and severity of plane turbulence are strongly positively correlated. One might argue that turbulence causes coffee spills. One might equally argue that spilling coffee causes turbulence. The correlation by itself cannot tell us which.
Another thing to consider about correlation Suppose that Dr. Cranium finds a positive correlation between head size and digit span (roughly the number of digits you can remember). One might argue that the bigger your head, the larger your digit span. One might instead argue that head size and digit span both increase with AGE, so age is a third variable and head size and digit span aren’t directly related.
Relationships between variables Properties of a correlation: form (linear or non-linear), direction (positive or negative), and strength (none, weak, strong, perfect). To examine this relationship you should: make a scatterplot (a picture of the relationship) and compute the correlation coefficient (a numerical description of the relationship).
Graphing Correlations Steps for making a scatterplot (scatter diagram): 1. Draw axes and assign variables to them. 2. Determine the range of values for each variable and mark it on the axes. 3. Mark a dot for each person’s pair of scores.
Scatterplot Plots one variable against the other. Each point corresponds to a different individual. Both axes run from 1 to 6.
Individual   X   Y
A            6   6
B            1   2
C            5   6
D            3   4
E            3   2
Imagine a line through the data points. Useful for “seeing” the relationship: form, direction, and strength.
Form Linear Non-linear
Direction Positive: X and Y vary in the same direction; as X goes up, Y goes up; Pearson’s r is positive. Negative: X and Y vary in opposite directions; as X goes up, Y goes down; Pearson’s r is negative.
Strength The strength of the relationship shows up as the spread of points around the line (note the axis scales). The correlation coefficient ranges from -1 to +1. Zero means “no linear relationship”. The farther r is from zero, the stronger the relationship.
Strength r = -1.0: “perfect negative corr.”, r² = 100%. r = 0.0: “no relationship”, r² = 0%. r = +1.0: “perfect positive corr.”, r² = 100%. The farther from zero, the stronger the relationship.
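The Correlation Coefficient Formulas for the correlation coefficient: $r = \frac{SP}{\sqrt{SS_X \, SS_Y}}$ (using the sum of products, SP) and $r = \frac{\sum z_X z_Y}{n}$ (using z-scores). You may have used one of these in PSY138; the other is a common alternative.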
Computing Pearson’s r (using SP) Step 1: SP (Sum of the Products)
X     Y     X - mean   Y - mean   Product
6     6      2.4        2.0       4.8
1     2     -2.6       -2.0       5.2
5     6      1.4        2.0       2.8
3     4     -0.6        0.0       0.0
3     2     -0.6       -2.0       1.2
Mean  3.6 (X), 4.0 (Y)
Sum          0.0        0.0       14.0 = SP
Quick check: each column of deviations sums to 0.0.
Computing Pearson’s r (using SP) Step 2: SSX & SSY
X     Y     X - mean   (X - mean)²   Y - mean   (Y - mean)²
6     6      2.4        5.76          2.0        4.0
1     2     -2.6        6.76         -2.0        4.0
5     6      1.4        1.96          2.0        4.0
3     4     -0.6        0.36          0.0        0.0
3     2     -0.6        0.36         -2.0        4.0
Sum          0.0       15.20 = SSX    0.0       16.0 = SSY
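Steps 1 and 2 can be checked with a short Python sketch. The variable names (`dev_x`, `sp`, `ss_x`, and so on) are chosen here for illustration; the data are the five pairs from the slides.

```python
xs = [6, 1, 5, 3, 3]
ys = [6, 2, 6, 4, 2]

mean_x = sum(xs) / len(xs)  # 3.6
mean_y = sum(ys) / len(ys)  # 4.0

# Deviation of each score from its own mean.
dev_x = [x - mean_x for x in xs]
dev_y = [y - mean_y for y in ys]

# Step 1: sum of the products of paired deviations.
sp = sum(dx * dy for dx, dy in zip(dev_x, dev_y))

# Step 2: sums of squared deviations for each variable.
ss_x = sum(dx ** 2 for dx in dev_x)
ss_y = sum(dy ** 2 for dy in dev_y)

print(round(sp, 2), round(ss_x, 2), round(ss_y, 2))  # 14.0 15.2 16.0
```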
Computing Pearson’s r (using SP) Step 3: compute r
With SP = 14.0, SSX = 15.20, and SSY = 16.0:
$r = \frac{SP}{\sqrt{SS_X \, SS_Y}} = \frac{14.0}{\sqrt{15.20 \times 16.0}} = \frac{14.0}{15.59} \approx .898$, or about .90.
On the scatterplot (X and Y both from 1 to 6) the relationship appears linear, positive, and fairly strong: .90 is far from 0, near +1.
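The whole SP procedure fits in one small function. This is a minimal sketch; the name `pearson_r` and the error handling are assumptions added here, not part of the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the SP formula: r = SP / sqrt(SSX * SSY)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        # r is undefined when either variable has no variability.
        raise ValueError("r is undefined when a variable is constant")
    return sp / math.sqrt(ss_x * ss_y)

r = pearson_r([6, 1, 5, 3, 3], [6, 2, 6, 4, 2])
print(round(r, 3))  # 0.898
```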
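The Correlation Coefficient The z-score formula for the correlation coefficient: $r = \frac{\sum z_X z_Y}{n}$ when the z-scores are computed with the population standard deviation (divide by $n - 1$ instead when they are computed with the sample standard deviation).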
Computing Pearson’s r (using z-scores) Step 1: compute standard deviation for X and Y (note: keep track of sample or population). For this example we will assume the data are from a population.
X     Y     X - mean   (X - mean)²   Y - mean   (Y - mean)²
6     6      2.4        5.76          2.0        4.0
1     2     -2.6        6.76         -2.0        4.0
5     6      1.4        1.96          2.0        4.0
3     4     -0.6        0.36          0.0        0.0
3     2     -0.6        0.36         -2.0        4.0
Mean  3.6 (X), 4.0 (Y)
Sum                    15.20 = SSX              16.0 = SSY
Std dev: $\sigma_X = \sqrt{15.20 / 5} = 1.74$, $\sigma_Y = \sqrt{16.0 / 5} = 1.79$
Computing Pearson’s r (using z-scores) Step 2: compute z-scores (deviation divided by the standard deviation: 1.74 for X, 1.79 for Y)
X     Y     X - mean   z_X      Y - mean   z_Y
6     6      2.4        1.38     2.0        1.1
1     2     -2.6       -1.49    -2.0       -1.1
5     6      1.4        0.80     2.0        1.1
3     4     -0.6       -0.34     0.0        0.0
3     2     -0.6       -0.34    -2.0       -1.1
Quick check: each column of z-scores sums to 0.0 (within rounding).
Computing Pearson’s r (using z-scores) Step 3: compute r
z_X      z_Y     z_X * z_Y
 1.38     1.1    1.52
-1.49    -1.1    1.64
 0.80     1.1    0.88
-0.34     0.0    0.0
-0.34    -1.1    0.37
Sum                4.41
$r = \frac{\sum z_X z_Y}{n} = \frac{4.41}{5} \approx .88$
This is slightly lower than the SP-formula result only because the z-scores were rounded before multiplying. On the scatterplot (X and Y both from 1 to 6) the relationship appears linear, positive, and fairly strong: .88 is far from 0, near +1.
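The z-score route can be sketched in Python as well. The helper names `z_scores` and `pearson_r_z` are illustrative, and the sketch assumes population standard deviations (dividing by n), as in the worked example.

```python
import math

def z_scores(values):
    """z-scores using the population standard deviation (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def pearson_r_z(xs, ys):
    """Pearson's r as the mean of the products of paired z-scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

r = pearson_r_z([6, 1, 5, 3, 3], [6, 2, 6, 4, 2])
print(round(r, 3))  # 0.898
```

Because the z-scores here are not rounded along the way, the result matches the SP formula; the hand calculation lands on .88 only because each z-score was rounded before multiplying.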
A few more things to consider about correlation Correlations are greatly affected by the range of scores in the data (consider the relationship between height and age). Extreme scores can have dramatic effects on correlations: a single extreme score can radically change r. When considering "how good" a relationship is, we really should consider r² (the coefficient of determination), not just r.
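A quick way to see the effect of one extreme score is to recompute r after adding a single point. In the sketch below the extra pair (12, 1) is hypothetical, invented purely for illustration; the other five pairs are the worked example's data, and `pearson_r` is a helper name chosen here.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r via r = SP / sqrt(SSX * SSY)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

xs = [6, 1, 5, 3, 3]
ys = [6, 2, 6, 4, 2]

r_before = pearson_r(xs, ys)
# Hypothetical extreme score: very high on X, very low on Y.
r_after = pearson_r(xs + [12], ys + [1])

# r squared is the coefficient of determination.
print(round(r_before, 2), round(r_before ** 2, 2))  # roughly 0.90 and 0.81
print(round(r_after, 2), round(r_after ** 2, 2))    # roughly -0.17 and 0.03
```

One added individual is enough to move a strong positive correlation to a weak negative one, which is why a scatterplot should be inspected before r is interpreted.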
Correlation in Research Articles Correlation matrix: a display of the correlations between more than two variables. Why have a “-”? Why is only half the table filled with numbers?
Hypothesis testing with correlation H0: There is no population correlation. HA: There is a population correlation. (Can also have a directional hypothesis.) Use Table B6 in Appendix B (gives critical values for r at different sample sizes, based on a t/F statistic that can be calculated).
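The t statistic behind such a table can be computed directly with the standard conversion t = r * sqrt(n - 2) / sqrt(1 - r²), with df = n - 2. The sketch below is an illustration under assumptions: the function name `t_from_r` is invented here, the inputs are the worked example's r (about .898) and n = 5, and the critical value 3.182 is the usual two-tailed, alpha = .05 entry for df = 3 from a standard t table rather than a value taken from Table B6.

```python
import math

def t_from_r(r, n):
    """Convert a sample correlation r (n pairs of scores) to t with df = n - 2."""
    if n <= 2:
        raise ValueError("need more than 2 pairs of scores")
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 14.0 / math.sqrt(15.20 * 16.0)  # r from the worked example
n = 5
t = t_from_r(r, n)
df = n - 2

T_CRIT_DF3 = 3.182  # two-tailed, alpha = .05, df = 3 (standard t table)
print(round(t, 2), df, abs(t) > T_CRIT_DF3)  # 3.53 3 True
```

With only five pairs of scores, even a correlation near .90 clears the two-tailed cutoff by a small margin, which illustrates why the critical values for r depend so heavily on sample size.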
Correlation in SPSS 1. Enter the data in two columns (one for X and one for Y), with the scores for each individual in the same row. 2. Analyze --> Correlate --> Bivariate. 3. Move the labels of the two variables into the Variables box. 4. Check the Pearson box. 5. Click OK. Output includes the correlation (r) and the p-value (significance level for the correlation).
Next time Predicting a variable based on other variables Read Chapter 16