Seminar 4: Covariance and Correlation
(Whole seminar takes about 1.5 hrs.)
You must have heard: Correlation ≠ Causation
This is not completely true, as we will see at the end of the seminar.
Today's questions
What does it mean to say that two variables are associated with one another?
How can we quantify the concept of association?
So far…
We focused on summary (descriptive) statistics: shape, central tendency, and dispersion. Often, in psychology, we ask "How do two variables relate to one another?" Coffee consumption & happiness? Cigarettes & lung cancer? IQ & nutrition?
The concept of bivariate association
What does it mean to say:
X is correlated with Y
X is related (has a relationship) to Y
X is associated with Y
X predicts Y
These are identical statements.
The concept of bivariate association
It is about quantifying the association between two variables. Suppose you collect English (x) and Math (y) scores from 6 individuals, A through F.
[Table: x and y scores for individuals A-F]
[Scatterplot: x vs. y scores for individuals A-F]
People with high scores on x seem to have high scores on y.
[Scatterplot: x vs. y scores for individuals A-F]
Can we define "high scores" more precisely?
Yes, we can. We can study deviations (xd, yd) from the mean:
xd = (X – Mx) and yd = (Y – My)
[Table: x, y, and deviation scores xd, yd for individuals A-F; Mx = 9.78, My = 9.60]
Note: In advanced stats courses, we use the term "centering" to describe "deviations from the mean".
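The centering step can be sketched in a few lines of Python. The slide's actual six scores are not preserved here, so the values below are made up for illustration:

```python
# Hypothetical English (x) and Math (y) scores for individuals A-F
# (made-up values; not the slide's actual data).
x = [12, 8, 14, 7, 11, 10]
y = [13, 7, 12, 8, 12, 9]

mx = sum(x) / len(x)   # Mx, the mean of x
my = sum(y) / len(y)   # My, the mean of y

# "Centering": deviation scores xd = X - Mx, yd = Y - My
xd = [xi - mx for xi in x]
yd = [yi - my for yi in y]

# A useful property: deviation scores always sum to (essentially) zero
print(round(sum(xd), 10), round(sum(yd), 10))
```

Because each score has the mean subtracted from it, the positive and negative deviations exactly cancel out.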
Let's rescale the graph (note the axes)
[Two scatterplots: Raw Scores vs. Deviation Scores]
Now we can ask whether people who are above the mean on x (i.e., "high" on x) are also above the mean on y.
What next? We could do a frequency count of the quadrants, i.e., check whether each point is above or below its x and y means.
[A] both below, [B] both below, [C] both above, [D] both below, [E] both above, [F] both above: 100% match
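The quadrant count above can be written as a sign-match check. The scores are again hypothetical, chosen so that all six individuals match, mirroring the slide's 100% example:

```python
# Quadrant check: does each person fall on the same side of the mean
# on both variables? (hypothetical scores, not the slide's data)
x = [12, 8, 14, 7, 11, 10]
y = [13, 7, 12, 8, 12, 9]
mx, my = sum(x) / len(x), sum(y) / len(y)

# True when a point is both above or both below the two means
matches = [(xi > mx) == (yi > my) for xi, yi in zip(x, y)]
print(f"{sum(matches)}/{len(matches)} points match")
```

With these values every point lands in a matching quadrant, so the count is 6/6.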
But here comes a problem
These two graphs differ! Yet, by sorting points into quadrants, you conclude that they are the same (100% match in both).
One solution
A more precise way to study the association is to multiply each person's deviation scores. Advantage: when there is a match (both + or both −), the product will be +; when there is a mismatch (one + and the other −), the product will be −.
[Table: xd, yd, and products xd·yd for individuals A-F]
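The products of deviation scores can be computed directly (same hypothetical scores as before):

```python
# Products of deviation scores: matching signs give +, mismatches give -
x = [12, 8, 14, 7, 11, 10]   # hypothetical scores
y = [13, 7, 12, 8, 12, 9]
mx, my = sum(x) / len(x), sum(y) / len(y)

products = [(xi - mx) * (yi - my) for xi, yi in zip(x, y)]
for label, p in zip("ABCDEF", products):
    print(label, round(p, 2))
```

Since every individual here deviates in the same direction on both variables, every product comes out positive.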
Average product of deviation scores
The average of these products indicates whether the typical person has the same-signed deviation score on the two variables.
[Table: xd, yd, and products xd·yd for individuals A-F]
Covariance
Cov(X, Y) = Σ (X – Mx)(Y – My) / N
That is, the average product of the deviation scores.
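Putting the previous steps together, covariance is just the mean of the deviation products:

```python
def covariance(x, y):
    """Average product of deviation scores (population form: divide by N).

    Note: the "sample" covariance divides by N - 1 instead; for example,
    numpy's np.cov defaults to N - 1.
    """
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)

x = [12, 8, 14, 7, 11, 10]   # hypothetical scores, as before
y = [13, 7, 12, 8, 12, 9]
print(round(covariance(x, y), 2))
```

For these made-up scores the covariance is positive (about 4.61), consistent with the all-positive products above.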
Features of covariance
When this average product is… Mathematical meaning Interpretation Positive two variables covary positively people who are high on one variable tend to be high on the other Zero two variables do not covary together People who are high on one variable are just as likely to be high on the other as they are to be low on the other. Negative two variables negatively covary together people who are high on one variable tend to be low on the other
17
Positively covariation
Visually… Positively covariation People who drink a lot of coffee tend to be happy. Preview: The line is called a regression line, and represents the estimated linear relationship between the two variables. Also known as “trend line”, “line of best fit” Notice that the slope of the line is positive in this example.
18
Visually… (near) Zero covariation
People who are high on x are just as likely to be high on y as they are low on y The regression line is flat
19
Negatively covariation
Visually… Negatively covariation People high on x tend to be low on y The regression line has a negative slope
20
One problem with covariance
It is very sensitive to the units in which X and Y are measured Imagine this: Happiness and monthly income, with income measured in ₹ lakhs You will end up with different covariances, even though they come from identical data
21
Here’s the proof The plots are identical The covariances are different
Income (Rs) Happiness Income (lakhs) 1 109831 6 2 79854 5 3 69320 0.6932 4 78883 66673 79426 7 71355 8 82067 9 98418 10 82170 0.8217 Rs lakhs Cov 4955.5 Cor The plots are identical The covariances are different The correlations are identical
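The point can be demonstrated in code. The rupee incomes below are the slide's values, but the happiness ratings are hypothetical, since the slide's exact pairing is not fully recoverable here:

```python
import statistics as st

def cov(x, y):
    mx, my = st.mean(x), st.mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def corr(x, y):
    # Pearson's r: covariance standardized by the two SDs
    return cov(x, y) / (st.pstdev(x) * st.pstdev(y))

income_rs = [109831, 79854, 69320, 78883, 66673,
             79426, 71355, 82067, 98418, 82170]   # rupee values from the slide
happiness = [6, 5, 4, 5, 3, 7, 4, 8, 9, 6]        # hypothetical ratings
income_lakhs = [v / 100_000 for v in income_rs]   # 1 lakh = 100,000 Rs

# Covariance shrinks by exactly the unit-change factor...
print(round(cov(income_rs, happiness) / cov(income_lakhs, happiness)))  # 100000
# ...but the correlation is unchanged
print(abs(corr(income_rs, happiness) - corr(income_lakhs, happiness)) < 1e-9)
```

Rescaling X multiplies the covariance by the same factor, while r divides that factor back out, which is exactly why the slide's correlations match across units.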
22
Now we have a solution Covariances are sensitive to the measurement units If units change covariances change Doesn’t make sense because the X-Y relationships remain the same. The solution: Pearson’s r.
23
Pearson’s r We’ve taken the covariance and “standardized” it.
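A minimal sketch of this standardization, using the same hypothetical six scores as earlier:

```python
import statistics as st

def pearson_r(x, y):
    # r = Cov(X, Y) / (sd_x * sd_y). Population vs. sample versions give
    # the same r, because the N (or N - 1) factors cancel out.
    mx, my = st.mean(x), st.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (st.pstdev(x) * st.pstdev(y))

x = [12, 8, 14, 7, 11, 10]   # hypothetical scores, as before
y = [13, 7, 12, 8, 12, 9]

r = pearson_r(x, y)
print(round(r, 3))
assert -1.0 <= r <= 1.0                      # r is always bounded
assert abs(pearson_r(x, x) - 1.0) < 1e-9     # a variable correlates
                                             # perfectly with itself
```

Dividing by the two standard deviations removes the units, which is what gives r its fixed −1 to +1 range.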
24
Interpretation is identical to covariance
Except now there are upper and lower limits r = + 1 r = 0 r = - 1
25
Interpreting magnitude of correlations
Absolute size of the correlation corresponds to the magnitude or strength of the relationship r = + .70 r = + .30 r = + 1 Notice that all rs are positive because the slopes are ___________
26
Quiz Which relationship is stronger? Rank order them. r = +.70
27
Large datasets & Correlation matrix
So far, we examined data from two variables. In the real world, you’d gather data from many variables (e.g., World Values Survey – your project) Suppose you want to know what variables predict academic success of Ashoka students. What variables would you collect? How would you display them? Facebook’s mood manipulation research as an example of real world big data.
28
Large datasets & Correlation matrix
Variables predict academic success Active learning strategies (ALS) Quality of Instruction & College Experience (QICE) Internet and campus technology (ICT) Student-faculty interaction (SFI) Grade point average (GPA) ALS QICE ICT SFI GPA - .14 .34 .47 .05 .12 .41 .22 -.11 .21 Ruggut & Chemosit (2005). Factors that influence college academic achievement. J Ed Res & Pol Stud.
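A correlation matrix is just Pearson's r computed for every pair of variables. The following sketch uses a tiny hypothetical dataset (not the Ruggut & Chemosit values) with three of the slide's variables:

```python
import statistics as st

def pearson_r(x, y):
    mx, my = st.mean(x), st.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (st.pstdev(x) * st.pstdev(y))

# Hypothetical mini-dataset: five students, three variables
data = {
    "ALS": [3, 5, 4, 2, 5],
    "SFI": [2, 4, 5, 1, 4],
    "GPA": [2.9, 3.6, 3.4, 2.5, 3.8],
}

names = list(data)
for row in names:
    cells = [f"{pearson_r(data[row], data[col]):+.2f}" for col in names]
    print(row, *cells)
```

The diagonal is always +1.00 (each variable correlates perfectly with itself), and the matrix is symmetric, which is why published matrices like the one above show only one triangle.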
29
Structural equation modeling
This is graduate level statistics You don’t need to know this. Structural equation modeling Inferring causality based on theoretical and empirical correlations between variables. Also known as “model fit”
30
Factors affecting correlations
Range restriction Heterogenous subsamples Non-linearity “Outliers” (covered Tutorial 6)
Range restriction
Heterogeneous subgroups
Non-linearity
What would the correlation of this be?
[Scatterplot: a non-linear relationship between x and y]
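The slide's figure is lost, but the trap is easy to reproduce. Assume, for instance, a perfectly U-shaped relationship:

```python
# A perfect but non-linear (U-shaped) relationship: y = x**2.
# Pearson's r only measures the LINEAR trend, which cancels out here.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

mx, my = sum(x) / len(x), sum(y) / len(y)
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
print(cov)   # 0.0, even though y is completely determined by x
```

Since r is the covariance divided by the two standard deviations, r is also exactly 0 here: a correlation of zero does not mean "no relationship", only "no linear relationship".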
I doubt you will get these data
The significance of any correlation
Correlation is one of the indicators of effect size. Suppose you get the following: r = 0.10, r = 0.30, r = 0.70. How big is big, how small is small? (Preview of Weeks 13 & 14)
Pearson's r is just one type of correlation
Pearson's r is for continuous X and Y variables. What if one (or both) of your variables is ordinal, dichotomous, etc.? For SRM I, you only need to know Pearson's r.
Var Y \ Var X   Continuous       Nominal          Ordinal
Continuous      Pearson
Nominal         Point biserial   Phi
Ordinal         Biserial         Rank biserial    Spearman
Summary
Your textbook doesn't cover covariance, but it will be an important concept if you take advanced statistics courses. A correlation coefficient has two indices: direction and magnitude. It provides an easy way to quantify the association between two variables. Correlation is the basis for regression (Week 11).
Back to the start: Correlation ≠ Causation is not always true.
When you've found that A correlates with B, it does not necessarily mean A → B (backward inference is problematic; this is typically the reason why we say "Correlation ≠ Causation"). But for A → B to hold, A must correlate with B (forward inference).
Class Discussion (20 min)
What are the principles of causality? How would you know X causes Y?
Principles of causality:
Covariation of cause and effect (but this has its problems)
Temporal precedence (but this has its problems too)
No plausible alternative explanations (also problematic)
My view: Causality will always be a leap of faith, even with experiments.