Unit 5 Correlation
Relationship Just as it takes two people to have a romantic relationship, you need at least two variables to form a statistical relationship. A relationship can be shown by a plot or by a number.
Scatterplot/Scattergram The plot for displaying a bivariate relationship is called a “scatterplot” or “scattergram”. It is drawn on a Cartesian coordinate system. The left-right (horizontal) direction is commonly called the X axis. The up-down (vertical) direction is commonly called the Y axis.
Scatterplot When x = 4, the horizontal line goes across 4 units. When y = 6, the vertical line goes up by 6 units. The data point is placed at the intersection of the two lines.
Scatter: Spread out. Variations in the data. Scatter everywhere.
Different types of correlation coefficient The number that shows a correlation is called a correlation coefficient. Pearson’s r measures the association between two continuous-scaled variables (our focus). Spearman’s coefficient is suitable for rank-order/ordinal data. We will go through the calculation later.
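The difference between the two coefficients can be sketched in a few lines of Python. This is only an illustration: the data are made up, the function names are hypothetical, and Spearman's coefficient is computed here as Pearson's r applied to the ranks of the data.

```python
import math

def pearson_r(x, y):
    """Pearson's r from two paired lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's coefficient: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Made-up data: y always rises with x, but along a curve, not a line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

r = pearson_r(x, y)        # high, but below 1 because the trend is curved
rho = spearman_rho(x, y)   # the rank order agrees perfectly
print(round(r, 4), round(rho, 4))
```

Because Spearman only looks at rank order, it reaches 1 here while Pearson's r stays slightly below 1.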
Pearson’s correlation coefficient Range: From -1 to +1.
Pearson’s r = 1: Perfect relationship (It may be too good to be true!)
Pearson’s r = 0: No linear relationship
Positive but less than 1: Positive relationship
Negative but greater than -1: Negative relationship
Common mistake: When the coefficient is high, it must be a strong relationship!
Pearson’s r = .83
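The common mistake noted above can be illustrated with a small sketch. The numbers are made up for this example and are not the data behind the slides: five points with no linear trend, plus one extreme case, are enough to produce a very high coefficient.

```python
import math

def pearson_r(x, y):
    """Pearson's r from two paired lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical cluster with no linear trend.
x = [1, 2, 3, 4, 5]
y = [2, 1, 3, 1, 2]
r_cluster = pearson_r(x, y)

# The same cluster plus a single extreme case at (20, 20).
r_with_outlier = pearson_r(x + [20], y + [20])

print(round(r_cluster, 2), round(r_with_outlier, 2))
```

The second coefficient is above .95 even though five of the six points show no relationship at all, which is why the scatterplot should always be checked alongside the number.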
Computation in SPSS Use visualization_data.sav from Unit 4. Analyze > Correlate > Bivariate
Computation in SPSS There is no distinction between DV and IV. Put both college test scores and GPA into the Variables box. The order doesn’t matter. Click OK.
Computation in SPSS The correlation of test scores and GPA = the correlation of GPA and test scores. Don’t worry about what “significance” (Sig.) means for now.
Computation in JMP In Unit 4 folder: visualization_data.jmp. I want to know the relationship between overall GPA and a particular college test score. Analyze > Multivariate Methods > Multivariate (It takes multiple variables [at least two])
In correlation there is no distinction between DV (Y) and IV (X), and thus both variables are put into Y, Columns. What we care about is whether X and Y are related; we don’t care about cause and effect.
Because there is no distinction between DV and IV, the correlation between GPA and test scores (0.5273) is the same as the correlation between the test scores and GPA (0.5273).
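The symmetry follows from the formula itself: swapping the two variables only swaps the order of terms that are multiplied together. A minimal sketch with made-up GPA and test-score values (not the course data set):

```python
import math

def pearson_r(x, y):
    """Pearson's r from two paired lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for five students.
gpa = [2.8, 3.1, 3.4, 3.6, 3.9]
test_score = [510, 480, 560, 600, 590]

r_gpa_first = pearson_r(gpa, test_score)
r_test_first = pearson_r(test_score, gpa)

# The order of the two variables makes no difference.
print(math.isclose(r_gpa_first, r_test_first))
```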
Pearson’s r = .5273. It is fair but it could be better! The ellipse (red line) covers the majority of the data. Any points outside the red line are considered outliers (extreme cases).
I can select these outliers (extreme cases) by holding down the shift key and pointing to them one by one.
When I go back to the table, I can see that the extreme cases are highlighted in blue. Mouse over one of them, right-click and select Hide and Exclude. These cases will be hidden and excluded.
Back to the graph: From the first red triangle, choose Redo > Redo Analysis
Without those outliers, the correlation is stronger. Pearson’s r = .6129
I want to infer from the sample to the population: Red Triangle > Pairwise Correlations
The confidence interval shows that in the population the correlation may be as low as .4618 or as high as .7293.
Confidence interval Why do we need a bracket (confidence interval) to estimate the population correlation coefficient? Because we cannot access the entire population, we are not sure. If you ask me to guess how old you are and I am not sure, I would also put my estimate within a bracket (interval). Because you are in college, I guess your age is between 18 and 22.
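One common way to build such a bracket around Pearson's r is the Fisher z transformation. The sketch below is an assumption about method, not a description of what JMP does internally. The sample size n = 87 is also hypothetical, since the slides do not state how many cases remain after the outliers are excluded. With that n, the result comes out close to the interval reported above for r = .6129.

```python
import math
from statistics import NormalDist

def pearson_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation (Fisher z method)."""
    z = math.atanh(r)                    # transform r to the z scale
    se = 1 / math.sqrt(n - 3)            # standard error on the z scale
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    lower = math.tanh(z - crit * se)     # transform back to the r scale
    upper = math.tanh(z + crit * se)
    return lower, upper

# r is taken from the slides; n = 87 is an assumed value for illustration.
low, high = pearson_ci(0.6129, n=87)
print(round(low, 2), round(high, 2))

# A larger (hypothetical) sample gives a narrower bracket.
low_big, high_big = pearson_ci(0.6129, n=500)
```

The bracket narrows as n grows, which matches the age-guessing analogy: the more information available, the tighter the guess can be.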
Semi-hand calculation To make it easier and faster, use r_hand_calculation.xlsx
Semi-hand calculation The Greek symbol Σ = Sum. Plug the numbers into the equation:
$$ r = \frac{6(20{,}485) - (247 \times 486)}{\sqrt{\left[6(11{,}409) - 247^2\right] \times \left[6(40{,}022) - 486^2\right]}} = 0.5298 $$
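The same arithmetic can be checked in a few lines of Python. The variable names are labels chosen for this sketch. They assume that 6 is the number of cases, 247 and 486 are the sums of X and Y, 11,409 and 40,022 are the sums of their squares, and 20,485 is the sum of the cross-products, which is how the numbers sit in the equation.

```python
import math

# Values taken from the equation on the slide; names are assumed labels.
n = 6             # number of paired observations
sum_xy = 20485    # sum of X * Y
sum_x = 247       # sum of X
sum_y = 486       # sum of Y
sum_x2 = 11409    # sum of X squared
sum_y2 = 40022    # sum of Y squared

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r = numerator / denominator

print(round(r, 4))  # 0.5298
```

Note that the whole numerator is divided by the square root; the square root covers the product of both bracketed terms.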
Assignment 5.1 Mary’s high school GPA is 3.5 and her college GPA is 3.7. Tom’s HS GPA is 3.8 and his college GPA is 4.0. If these data are used for computing correlation, what result would you expect? Write down your expectation before making any computation. Next, use SPSS or JMP to compute the correlation coefficient. Does the result match your expectation? Is it a valid result? Why or why not?
Assignment 5.2 (Canvas) Use visualization_data (Unit 4). Compute Pearson’s r for GPA and SAT. There is no distinction between DV and IV. You can put them in any order. You can use SPSS, JMP, Excel (semi-hand calculation) or hand calculation.