Lecture 18
Correlations: testing linear relationships between two metric variables
Agenda
- Brief update on data for the final
- Correlations
Some Data Considerations for the Final Exam:
- Two metric variables: correlation
- One binary categorical variable, one metric variable: t-test
- Two categorical variables: chi-square (crosstabulation)
- Metric dependent variable, one polytomous categorical independent variable: ANOVA
- Metric dependent variable, multiple independent variables (categorical or metric): linear regression
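As a quick reference, the sketch below shows one way each of these situations might be tested in Python with numpy and scipy; the data and variable names are invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical example data; names are illustrative only.
rng = np.random.default_rng(0)
income = rng.normal(50, 10, 200)                 # metric
hours = income * 0.5 + rng.normal(0, 5, 200)     # metric
sex = rng.integers(0, 2, 200)                    # binary categorical
region = rng.integers(0, 4, 200)                 # polytomous categorical

# Two metric variables -> Pearson correlation
r, p = stats.pearsonr(income, hours)

# One binary categorical + one metric -> independent-samples t-test
t, p_t = stats.ttest_ind(income[sex == 0], income[sex == 1])

# Two categorical variables -> chi-square on a crosstabulation
table = np.array([[np.sum((sex == i) & (region == j)) for j in range(4)]
                  for i in range(2)])
chi2, p_chi, dof, expected = stats.chi2_contingency(table)

# Metric DV, polytomous categorical IV -> one-way ANOVA
F, p_F = stats.f_oneway(*(income[region == j] for j in range(4)))

# Metric DV, multiple IVs -> linear regression (here via least squares)
X = np.column_stack([np.ones_like(hours), hours, sex])
beta, *_ = np.linalg.lstsq(X, income, rcond=None)
```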
Checking for simple linear relationships
Pearson's correlation coefficient
- Measures the extent to which two metric or interval-type variables are linearly related
- The statistic is Pearson r, also called the linear or product-moment correlation
- Equivalently, the correlation coefficient is the average of the cross products of the corresponding z-scores
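A minimal numeric sketch of that definition, assuming Python with numpy and made-up data: convert each variable to z-scores and average their products.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson r as the average cross product of z-scores (using n - 1)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    zx = (x - x.mean()) / x.std(ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    return np.sum(zx * zy) / (len(x) - 1)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(pearson_r(x, y))          # 0.8
print(np.corrcoef(x, y)[0, 1])  # same value from numpy
```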
Scatterplots!!!
Three ways to summarize the data:
- Point of averages
- Horizontal SD
- Vertical SD
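These three summaries are easy to compute directly; a small sketch assuming numpy, with invented data:

```python
import numpy as np

x = np.array([160, 165, 170, 175, 180], dtype=float)  # e.g., heights
y = np.array([55, 60, 68, 72, 80], dtype=float)       # e.g., weights

point_of_averages = (x.mean(), y.mean())  # centre of the cloud
horizontal_sd = x.std(ddof=1)             # spread along the x-axis
vertical_sd = y.std(ddof=1)               # spread along the y-axis

print(point_of_averages, horizontal_sd, vertical_sd)
```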
The correlation coefficient
r = average of (x in standard units) × (y in standard units)
The correlation coefficient is a pure number without units. Thus, the correlation coefficient is not affected by:
- A change in scale
- Interchanging the two variables
- Adding the same number to all values of one variable
- Multiplying all values of one variable by a positive number
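A quick check of those invariance properties, as a sketch assuming numpy; the data are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)

r = np.corrcoef(x, y)[0, 1]

# Unchanged by interchanging the variables
assert np.isclose(r, np.corrcoef(y, x)[0, 1])
# Unchanged by adding a constant to one variable
assert np.isclose(r, np.corrcoef(x + 100, y)[0, 1])
# Unchanged by multiplying one variable by a positive number
assert np.isclose(r, np.corrcoef(3.7 * x, y)[0, 1])
# (Multiplying by a negative number flips the sign instead.)
assert np.isclose(-r, np.corrcoef(-1 * x, y)[0, 1])
```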
Sidenote: Why n - 1? When we compute the sample variance, deviations are measured from the sample mean rather than the (unknown) population mean, and the sample mean is by construction as close as possible to the sampled values. One result of this: the raw average of squared deviations tends to be lower than the actual population variance. Dividing by n - 1 instead of n corrects this bias when calculating sample statistics.
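A small simulation illustrating the bias, as a sketch assuming numpy: dividing by n underestimates the population variance on average, while dividing by n - 1 roughly removes the bias.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 5, 100_000                     # small samples, many repetitions

samples = rng.normal(0, 2, size=(reps, n))   # population variance = 4
var_n = samples.var(axis=1, ddof=0)          # divide by n
var_n1 = samples.var(axis=1, ddof=1)         # divide by n - 1

print(var_n.mean())   # about 3.2: biased low, (n-1)/n * 4
print(var_n1.mean())  # about 4.0: approximately unbiased
```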
Correlations
- Ranges from -1 to +1, where ±1 = a perfect linear relationship between the two variables (negative relations vs. positive relations)
- The correlation coefficient is a measure of clustering around a single line, relative to the standard deviations
- Remember: correlation ONLY measures linear relationships, not all relationships. This is why the scatterplot is essential; you can get a coefficient even when there is not a good linear fit.
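To see why the scatterplot matters, here is a short sketch, assuming numpy, of a strong but non-linear relationship that still yields r near zero:

```python
import numpy as np

x = np.linspace(-3, 3, 201)
y = x ** 2                      # perfect, but non-linear, relationship

print(np.corrcoef(x, y)[0, 1])  # ~0: the coefficient misses the curve
```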
Interpretation
Correlation is a proportional (unit-free) measure; it does not depend on the specific units of measurement.
Correlation is interpreted on three dimensions:
- Direction (+/-)
- Magnitude of effect (-1 to 1), shown as r
- Statistical significance (p < .05, p < .01, p < .001)
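All three pieces can be read from one call; a sketch assuming scipy, with invented data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
study_hours = rng.normal(10, 3, 50)
exam_score = 5 * study_hours + rng.normal(0, 10, 50)

r, p = stats.pearsonr(study_hours, exam_score)
direction = "positive" if r > 0 else "negative"
print(f"r = {r:.2f} ({direction}), p = {p:.4f}")  # significant if p < .05
```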
Correlation vs. Causality
Recall that correlation is a precondition for causality, but by itself it is not sufficient to show causality.
- Fat in the diet causes cancer? (But fat and sugar are relatively expensive, and reduce grain consumption as a trade-off…)
- Education and unemployment during the Great Depression? (It turns out age is a confounding variable: education was rising among the young, and employers seemed to prefer younger job seekers.)
Factors that limit the correlation coefficient
- Homogeneity of the sample group
- Non-linear relationships
- Censored or limited scales
- Unreliable measurement instrument
- Outliers
Homogeneous Groups
A limited number of education groups will still give you a correlation, but how accurate is it?
Homogeneous Groups: Adding Groups
Homogeneous Groups: Adding More Groups
- The point is NOT that the line changes, because it doesn't: it is still a positive correlation every time.
- The issue is the overall spread, and whether the homogeneous groups are linear in the same way.
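A sketch of this restriction-of-range effect, assuming numpy; the "education" and "income" variables are invented: the correlation within a narrow, homogeneous slice is much weaker than the correlation over the full range, even though the underlying linear relationship is the same.

```python
import numpy as np

rng = np.random.default_rng(3)
education = rng.uniform(8, 20, 2000)             # years of schooling
income = 3 * education + rng.normal(0, 6, 2000)  # same relation throughout

full_r = np.corrcoef(education, income)[0, 1]

narrow = (education >= 12) & (education <= 14)   # one homogeneous group
narrow_r = np.corrcoef(education[narrow], income[narrow])[0, 1]

print(full_r, narrow_r)  # the restricted-range r is noticeably smaller
```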
Separate Groups: non-homogeneous but similar linear relationship
Separate Groups: non-homogeneous and different linear relationship
Non-Linear Relationships
Censored or Limited Scales
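A sketch of how a censored (capped) scale attenuates r, assuming numpy with invented data: scores above the scale maximum are recorded at the ceiling, which flattens the top of the cloud.

```python
import numpy as np

rng = np.random.default_rng(5)
ability = rng.normal(0, 1, 2000)
true_score = 50 + 15 * ability + rng.normal(0, 5, 2000)

# The test only goes up to 60, so higher scores are recorded at the ceiling
observed_score = np.clip(true_score, None, 60)

print(np.corrcoef(ability, true_score)[0, 1])      # stronger
print(np.corrcoef(ability, observed_score)[0, 1])  # weaker after censoring
```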
Unreliable Instrument
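A sketch of how measurement error from an unreliable instrument attenuates the observed correlation, assuming numpy with invented variables:

```python
import numpy as np

rng = np.random.default_rng(11)
true_x = rng.normal(0, 1, 5000)
true_y = 0.8 * true_x + rng.normal(0, 0.6, 5000)

# The instrument adds random noise each time it measures x
noisy_x = true_x + rng.normal(0, 1.0, 5000)

print(np.corrcoef(true_x, true_y)[0, 1])   # correlation with reliable measure
print(np.corrcoef(noisy_x, true_y)[0, 1])  # attenuated by measurement error
```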
Outliers
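A sketch of how a single outlier can distort r, assuming numpy; the data are invented:

```python
import numpy as np

rng = np.random.default_rng(9)
x = rng.normal(0, 1, 30)
y = rng.normal(0, 1, 30)              # essentially uncorrelated

print(np.corrcoef(x, y)[0, 1])        # near 0

# One extreme point can manufacture a strong "correlation"
x_out = np.append(x, 10.0)
y_out = np.append(y, 10.0)
print(np.corrcoef(x_out, y_out)[0, 1])  # much closer to 1
```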
Ecological Correlations
Ecological correlations overstate the association because they eliminate the variance within the units of analysis (e.g., the classroom): the data points are classroom averages rather than individual students. Freedman et al. give sociology and political science a hard time for this, which is a valid critique: when the unit of analysis is a summary statistic, you expect to lose variability.
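A sketch of why averaging first inflates r, assuming numpy; the "classrooms" and scores are invented: the correlation between classroom means is larger than the student-level correlation because within-classroom variability is averaged away.

```python
import numpy as np

rng = np.random.default_rng(2)
n_classes, n_students = 30, 25

class_effect = rng.normal(0, 1, n_classes)  # classroom-level differences
x = class_effect.repeat(n_students) + rng.normal(0, 2, n_classes * n_students)
y = class_effect.repeat(n_students) + rng.normal(0, 2, n_classes * n_students)

student_r = np.corrcoef(x, y)[0, 1]

labels = np.arange(n_classes).repeat(n_students)
x_bar = np.array([x[labels == c].mean() for c in range(n_classes)])
y_bar = np.array([y[labels == c].mean() for c in range(n_classes)])
class_r = np.corrcoef(x_bar, y_bar)[0, 1]

print(student_r, class_r)  # the classroom-level r is much larger
```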
Correlation: Null and Alternative Hypotheses
Null versus alternative hypothesis:
- H0 (the null hypothesis)
- H1, H2, etc. (the alternative hypotheses)
Test statistics and significance level:
- Test statistic: calculated from the data; has a known probability distribution
- Significance level: usually reported as a p-value (the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true)
Example: the correlation between price and mpg, with a reported significance of 0.0000.
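For the correlation itself, the test statistic has a known distribution: under the null hypothesis of no linear association, t = r * sqrt(n - 2) / sqrt(1 - r^2) follows a t distribution with n - 2 degrees of freedom. A sketch assuming scipy, with invented data, checking that this matches the p-value reported by pearsonr:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(size=40)
y = 0.4 * x + rng.normal(size=40)

r, p = stats.pearsonr(x, y)

n = len(x)
t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
p_manual = 2 * stats.t.sf(abs(t), df=n - 2)  # two-sided p-value

print(r, p, p_manual)  # the two p-values agree
```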