Lecture 18
Correlations: testing linear relationships between two metric variables
Agenda
- Brief update on data for the final
- Correlations
Some Data Considerations for the Final Exam:
- Two metric variables: correlation
- One binary categorical variable, one metric variable: t-test
- Two categorical variables: chi-square (crosstabulation)
- Metric dependent variable, one polytomous categorical independent variable: ANOVA
- Metric dependent variable, multiple independent variables (categorical or metric): linear regression
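As a quick reference, the sketch below shows one way each of these situations might be tested in Python with numpy and scipy; the data and variable names are invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical example data; names are illustrative only.
rng = np.random.default_rng(0)
income = rng.normal(50, 10, 200)                 # metric
hours = income * 0.5 + rng.normal(0, 5, 200)     # metric
sex = rng.integers(0, 2, 200)                    # binary categorical
region = rng.integers(0, 4, 200)                 # polytomous categorical

# Two metric variables -> Pearson correlation
r, p = stats.pearsonr(income, hours)

# One binary categorical + one metric -> independent-samples t-test
t, p_t = stats.ttest_ind(income[sex == 0], income[sex == 1])

# Two categorical variables -> chi-square on a crosstabulation
table = np.array([[np.sum((sex == i) & (region == j)) for j in range(4)]
                  for i in range(2)])
chi2, p_chi, dof, expected = stats.chi2_contingency(table)

# Metric DV, polytomous categorical IV -> one-way ANOVA
F, p_F = stats.f_oneway(*(income[region == j] for j in range(4)))

# Metric DV, multiple IVs -> linear regression (here via least squares)
X = np.column_stack([np.ones_like(hours), hours, sex])
beta, *_ = np.linalg.lstsq(X, income, rcond=None)
```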
Checking for simple linear relationships
Pearson's correlation coefficient
- Measures the extent to which two metric or interval-type variables are linearly related
- The statistic is Pearson r, also called the linear or product-moment correlation
- Equivalently, the correlation coefficient is the average of the cross products of the corresponding z-scores
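A minimal numeric sketch of that definition, assuming Python with numpy and made-up data: convert each variable to z-scores and average their products.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson r as the average cross product of z-scores (using n - 1)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    zx = (x - x.mean()) / x.std(ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    return np.sum(zx * zy) / (len(x) - 1)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(pearson_r(x, y))          # 0.8
print(np.corrcoef(x, y)[0, 1])  # same value from numpy
```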
Scatterplots!!!
Three ways to summarize the data:
- Point of averages
- Horizontal SD
- Vertical SD
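These three summaries are easy to compute directly; a small sketch assuming numpy, with invented data:

```python
import numpy as np

x = np.array([160, 165, 170, 175, 180], dtype=float)  # e.g., heights
y = np.array([55, 60, 68, 72, 80], dtype=float)       # e.g., weights

point_of_averages = (x.mean(), y.mean())  # centre of the cloud
horizontal_sd = x.std(ddof=1)             # spread along the x-axis
vertical_sd = y.std(ddof=1)               # spread along the y-axis

print(point_of_averages, horizontal_sd, vertical_sd)
```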
The correlation coefficient
r = average of (x in standard units) × (y in standard units)
The correlation coefficient is a pure number without units. Thus, the correlation coefficient is not affected by:
- A change in scale
- Interchanging the two variables
- Adding the same number to all values of one variable
- Multiplying all values of one variable by a positive number
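A quick check of those invariance properties, as a sketch assuming numpy; the data are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)

r = np.corrcoef(x, y)[0, 1]

# Unchanged by interchanging the variables
assert np.isclose(r, np.corrcoef(y, x)[0, 1])
# Unchanged by adding a constant to one variable
assert np.isclose(r, np.corrcoef(x + 100, y)[0, 1])
# Unchanged by multiplying one variable by a positive number
assert np.isclose(r, np.corrcoef(3.7 * x, y)[0, 1])
# (Multiplying by a negative number flips the sign instead.)
assert np.isclose(-r, np.corrcoef(-1 * x, y)[0, 1])
```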
Sidenote: Why n - 1? When we compute the sample variance, deviations are measured from the sample mean rather than the (unknown) population mean, and the sample mean is by construction as close as possible to the sampled values. One result of this: the raw average of squared deviations tends to be lower than the actual population variance. Dividing by n - 1 instead of n corrects this bias when calculating sample statistics.
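A small simulation illustrating the bias, as a sketch assuming numpy: dividing by n underestimates the population variance on average, while dividing by n - 1 roughly removes the bias.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 5, 100_000                     # small samples, many repetitions

samples = rng.normal(0, 2, size=(reps, n))   # population variance = 4
var_n = samples.var(axis=1, ddof=0)          # divide by n
var_n1 = samples.var(axis=1, ddof=1)         # divide by n - 1

print(var_n.mean())   # about 3.2: biased low, (n-1)/n * 4
print(var_n1.mean())  # about 4.0: approximately unbiased
```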
Correlations
- Ranges from -1 to +1, where ±1 = a perfect linear relationship between the two variables (negative relations vs. positive relations)
- The correlation coefficient is a measure of clustering around a single line, relative to the standard deviations
- Remember: correlation ONLY measures linear relationships, not all relationships. This is why the scatterplot is essential; you can get a coefficient even when there is not a good linear fit.
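To see why the scatterplot matters, here is a short sketch, assuming numpy, of a strong but non-linear relationship that still yields r near zero:

```python
import numpy as np

x = np.linspace(-3, 3, 201)
y = x ** 2                      # perfect, but non-linear, relationship

print(np.corrcoef(x, y)[0, 1])  # ~0: the coefficient misses the curve
```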
Interpretation
Correlation is a proportional (unit-free) measure; it does not depend on the specific units of measurement.
Correlation is interpreted on three dimensions:
- Direction (+/-)
- Magnitude of effect (-1 to 1), shown as r
- Statistical significance (p < .05, p < .01, p < .001)
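All three pieces can be read from one call; a sketch assuming scipy, with invented data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
study_hours = rng.normal(10, 3, 50)
exam_score = 5 * study_hours + rng.normal(0, 10, 50)

r, p = stats.pearsonr(study_hours, exam_score)
direction = "positive" if r > 0 else "negative"
print(f"r = {r:.2f} ({direction}), p = {p:.4f}")  # significant if p < .05
```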
Correlation vs. Causality
Recall that correlation is a precondition for causality, but by itself it is not sufficient to show causality.
- Fat in the diet causes cancer? (But fat and sugar are relatively expensive, and reduce grain consumption as a trade-off…)
- Education and unemployment during the Great Depression? (It turns out age is a confounding variable: education was rising among the young, and employers seemed to prefer younger job seekers.)
Factors that limit the correlation coefficient
- Homogeneity of the sample group
- Non-linear relationships
- Censored or limited scales
- Unreliable measurement instrument
- Outliers
Homogeneous Groups
A limited number of education groups will still give you a correlation, but how accurate is it?
Homogeneous Groups: Adding Groups
Homogeneous Groups: Adding More Groups
- The point is NOT that the line changes, because it doesn't: it is still a positive correlation every time.
- The issue is the overall spread, and whether the homogeneous groups are linear in the same way.
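A sketch of this restriction-of-range effect, assuming numpy; the "education" and "income" variables are invented: the correlation within a narrow, homogeneous slice is much weaker than the correlation over the full range, even though the underlying linear relationship is the same.

```python
import numpy as np

rng = np.random.default_rng(3)
education = rng.uniform(8, 20, 2000)             # years of schooling
income = 3 * education + rng.normal(0, 6, 2000)  # same relation throughout

full_r = np.corrcoef(education, income)[0, 1]

narrow = (education >= 12) & (education <= 14)   # one homogeneous group
narrow_r = np.corrcoef(education[narrow], income[narrow])[0, 1]

print(full_r, narrow_r)  # the restricted-range r is noticeably smaller
```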
Separate Groups: non-homogeneous but similar linear relationship
Separate Groups: non-homogeneous and different linear relationship
Non-Linear Relationships
Censored or Limited Scales
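A sketch of how a censored (capped) scale attenuates r, assuming numpy with invented data: scores above the scale maximum are recorded at the ceiling, which flattens the top of the cloud.

```python
import numpy as np

rng = np.random.default_rng(5)
ability = rng.normal(0, 1, 2000)
true_score = 50 + 15 * ability + rng.normal(0, 5, 2000)

# The test only goes up to 60, so higher scores are recorded at the ceiling
observed_score = np.clip(true_score, None, 60)

print(np.corrcoef(ability, true_score)[0, 1])      # stronger
print(np.corrcoef(ability, observed_score)[0, 1])  # weaker after censoring
```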
Unreliable Instrument
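A sketch of how measurement error from an unreliable instrument attenuates the observed correlation, assuming numpy with invented variables:

```python
import numpy as np

rng = np.random.default_rng(11)
true_x = rng.normal(0, 1, 5000)
true_y = 0.8 * true_x + rng.normal(0, 0.6, 5000)

# The instrument adds random noise each time it measures x
noisy_x = true_x + rng.normal(0, 1.0, 5000)

print(np.corrcoef(true_x, true_y)[0, 1])   # correlation with reliable measure
print(np.corrcoef(noisy_x, true_y)[0, 1])  # attenuated by measurement error
```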
Outliers
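A sketch of how a single outlier can distort r, assuming numpy; the data are invented:

```python
import numpy as np

rng = np.random.default_rng(9)
x = rng.normal(0, 1, 30)
y = rng.normal(0, 1, 30)              # essentially uncorrelated

print(np.corrcoef(x, y)[0, 1])        # near 0

# One extreme point can manufacture a strong "correlation"
x_out = np.append(x, 10.0)
y_out = np.append(y, 10.0)
print(np.corrcoef(x_out, y_out)[0, 1])  # much closer to 1
```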
Ecological Correlations
Ecological correlations overstate the association because they eliminate the variance within the units of analysis (e.g., the classroom): the data points are classroom averages rather than individual students. Freedman et al. give sociology and political science a hard time for this, which is a valid critique: when the unit of analysis is a summary statistic, you expect to lose variability.
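A sketch of why averaging first inflates r, assuming numpy; the "classrooms" and scores are invented: the correlation between classroom means is larger than the student-level correlation because within-classroom variability is averaged away.

```python
import numpy as np

rng = np.random.default_rng(2)
n_classes, n_students = 30, 25

class_effect = rng.normal(0, 1, n_classes)  # classroom-level differences
x = class_effect.repeat(n_students) + rng.normal(0, 2, n_classes * n_students)
y = class_effect.repeat(n_students) + rng.normal(0, 2, n_classes * n_students)

student_r = np.corrcoef(x, y)[0, 1]

labels = np.arange(n_classes).repeat(n_students)
x_bar = np.array([x[labels == c].mean() for c in range(n_classes)])
y_bar = np.array([y[labels == c].mean() for c in range(n_classes)])
class_r = np.corrcoef(x_bar, y_bar)[0, 1]

print(student_r, class_r)  # the classroom-level r is much larger
```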
Correlation: Null and Alternative Hypotheses
Null versus alternative hypothesis:
- H0 (the null hypothesis)
- H1, H2, etc. (the alternative hypotheses)
Test statistics and significance level:
- Test statistic: calculated from the data; has a known probability distribution
- Significance level: usually reported as a p-value (the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true)
Example: the correlation between price and mpg, with a reported significance of 0.0000.
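For the correlation itself, the test statistic has a known distribution: under the null hypothesis of no linear association, t = r * sqrt(n - 2) / sqrt(1 - r^2) follows a t distribution with n - 2 degrees of freedom. A sketch assuming scipy, with invented data, checking that this matches the p-value reported by pearsonr:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(size=40)
y = 0.4 * x + rng.normal(size=40)

r, p = stats.pearsonr(x, y)

n = len(x)
t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
p_manual = 2 * stats.t.sf(abs(t), df=n - 2)  # two-sided p-value

print(r, p, p_manual)  # the two p-values agree
```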