Correlation Chapter 6

What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses across variables.

Scatterplots Plotting reminder – ggplot(data, aes(X, Y, color/fill = categorical)) – + theme coding – + geom_point() to get the dots – + geom_smooth() to get a line – + xlab/ylab
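A minimal sketch of the recipe above. The data frame, variable names, and theme here are assumptions for illustration, not the course dataset:

```r
library(ggplot2)

# Hypothetical stand-in for the exam dataset
examdata <- data.frame(
  Revise = c(2, 5, 8, 12, 15, 20),
  Exam   = c(40, 45, 60, 65, 80, 85),
  Gender = c("M", "F", "F", "M", "F", "M")
)

ggplot(examdata, aes(Revise, Exam, color = Gender)) +  # X, Y, categorical color
  theme_bw() +                                         # theme coding
  geom_point() +                                       # the dots
  geom_smooth(method = "lm", se = FALSE) +             # a fitted line
  xlab("Hours Spent Revising") +
  ylab("Exam Performance (%)")
```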

Scatterplots Not only do you need to check that X and Y have proper labels, you also need to make sure the X and Y axes cover appropriate ranges for your scatterplot – although this rule is less strictly enforced than the bar graph rule.

Fix the X/Y Axes coord_cartesian(xlim = NULL, ylim = NULL) coord_cartesian(xlim = c(0,100), ylim = c(0,100))

Modeling Relationships First, look at some scatterplots of the variables that have been measured. Outcome_i = (model) + error_i Outcome_i = (b·X_i) + error_i The standardized b (β) equals r when you have one predictor.

Modeling Relationships Outcome i = (model ) + error i – Previously, this was Mean + SE – And we used SE to determine if the model “fit” well.

Modeling Relationships Outcome i = (bX i ) + error i Now we are using b or r or β (beta) to determine the strength of the model – Traditionally you don’t see the error values reported (sometimes you see CI)
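A quick check of the claim above: with one predictor and both variables standardized, the regression slope b (β) and the correlation r coincide. The data here are hypothetical:

```r
# Hypothetical data: hours revised (x) predicting exam score (y)
x <- c(2, 5, 8, 12, 15)
y <- c(40, 45, 60, 80, 85)

r <- cor(x, y)                             # correlation coefficient
b_std <- coef(lm(scale(y) ~ scale(x)))[2]  # standardized slope (beta)

# With one predictor, the standardized b equals r
```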

Measuring Relationships We need to see whether as one variable increases, the other increases, decreases or stays the same. This can be done by calculating the covariance. – We look at how much each score deviates from the mean. – If both variables deviate from the mean by the same amount, they are likely to be related.

Revision of Variance The variance tells us by how much scores deviate from the mean for a single variable. It is closely linked to the sum of squares. Covariance is similar – it tells us by how much scores on two variables differ from their respective means.

Problems with Covariance It depends upon the units of measurement. One solution: standardize it! – Divide by the standard deviations of both variables. The standardized version of covariance is known as the correlation coefficient. – It is unaffected by units of measurement.

The Correlation Coefficient
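The formula on this slide did not survive the transcript; reconstructed from the covariance slides above, the standard Pearson definition is:

```latex
r = \frac{\mathrm{cov}_{xy}}{s_x s_y}
  = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x s_y}
```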

Correlation Coefficient Basically, r values have the model + error built into one number – Instead of M + SE – So we can just look at r to determine “model fit” or strength of relationship.

Things to know about the Correlation It varies between -1 and +1 – 0 = no relationship It is an effect size – ±.1 = small effect – ±.3 = medium effect – ±.5 = large effect Coefficient of determination, r 2 – By squaring the value of r you get the proportion of variance in one variable shared by the other.
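Squaring r is all it takes to get the coefficient of determination. With these hypothetical scores, r² works out to exactly 0.60, i.e. 60% shared variance:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

r  <- cor(x, y)  # about .77 -- a large effect by the guidelines above
r2 <- r^2        # coefficient of determination: 0.60
```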

R versus r R = multiple correlation coefficient (3+ variables) r = correlation coefficient for 2 variables r² = coefficient of determination for 2 variables R² = coefficient of determination for 3+ variables – In practice, we use R² for anything effect-size related, even when there are only two variables.

Correlation: Example Anxiety and Exam Performance Participants: – 103 students Measures – Time spent revising (hours) – Exam performance (%) – Exam Anxiety (the EAQ, score out of 100) – Gender

Assumptions Accuracy Missing (exclude pairwise if you don’t fill in) Outliers Normality Linearity** Homogeneity Homoscedasticity

How to Calculate You’ve already been doing these! cor(x,y) But what about p values? We’ve only been looking at the magnitude of the correlation for assumption checks.

How to Calculate cor() function will calculate: – Pearson, Spearman, Kendall, multiple correlations at once rcorr() function will calculate: – Pearson, Spearman, p values, multiple correlations at once cor.test() function will calculate: – Pearson, Spearman, Kendall, p values, CI

cor output cor(examdata, use = "pairwise.complete.obs", method = "pearson") – Remember these all have to be numeric, no factor variables.
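A runnable version of the call above, using a hypothetical numeric-only stand-in for examdata (the course file itself is not reproduced here):

```r
# Hypothetical stand-in for the exam dataset (numeric columns only)
examdata <- data.frame(
  Revise  = c(2, 5, 8, 12, 15),
  Exam    = c(40, 45, 60, 80, 85),
  Anxiety = c(88, 80, 75, 60, 55)
)

# Full correlation matrix; pairwise deletion handles any missing values
cor(examdata, use = "pairwise.complete.obs", method = "pearson")
```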

rcorr output rcorr(as.matrix(examdata), type = "pearson") – Automatically does pairwise deletion. – All things must be numeric, as well as in matrix format – Load the Hmisc library for this function

cor.test output cor.test(x, y, method = "pearson") – Must use single vectors/columns for x, y
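A sketch of cor.test() on hypothetical revision and anxiety vectors, pulling out the pieces the earlier functions don't give you together:

```r
revise  <- c(2, 5, 8, 12, 15)    # hypothetical hours spent revising
anxiety <- c(88, 80, 75, 60, 55) # hypothetical exam anxiety scores

ct <- cor.test(revise, anxiety, method = "pearson")
ct$estimate  # r (strongly negative here)
ct$p.value   # significance test
ct$conf.int  # 95% confidence interval
```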

Correlation Interpretation The third-variable problem: – In any correlation, causality between two variables cannot be assumed because there may be other measured or unmeasured variables affecting the results. Direction of causality: – Correlation coefficients say nothing about which variable causes the other to change.

Nonparametric Correlation Spearman's Rho – Pearson's correlation on the ranked data Kendall's Tau – Better than Spearman's for small samples – or when many scores share the same rank (tied ranks)

Example World’s best Liar Competition – 68 contestants – Measures Where they were placed in the competition (first, second, third, etc.) Creativity questionnaire (maximum score 60) C6 liar.csv

Spearman/Kendall You can calculate: – Spearman and Kendall with cor() but no p values. – Spearman with rcorr() but no Kendall – All the things with cor.test() but one pair at a time.
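Both nonparametric options via cor.test(), one pair at a time as noted above. The placing and creativity scores here are hypothetical, not the C6 liar.csv data:

```r
place      <- c(1, 2, 3, 4, 5)      # competition placing (a rank)
creativity <- c(55, 48, 50, 30, 26) # hypothetical creativity scores

st <- cor.test(place, creativity, method = "spearman")  # rho = -0.9
kt <- cor.test(place, creativity, method = "kendall")   # tau = -0.8
```

Both coefficients are negative: better (lower) placings go with higher creativity.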

Spearman

Kendall

A note All values must be numeric to be able to do any of these correlations – Therefore, if you have any variables that import with labels, you will have to de-factor them. – as.numeric(as.character(x)) – note that as.numeric() alone on a factor returns the underlying level codes, not the labels.

Point vs not point Point/Biserial correlation = correlation with a dichotomous variable Which is which? – Point biserial = true dichotomy, no underlying continuum (e.g. gender) – Biserial = not a true dichotomy, there is an underlying continuum (e.g. pass/fail)
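A point-biserial correlation is just Pearson's r with the true dichotomy coded 0/1, so plain cor() computes it. The data here are hypothetical:

```r
gender <- c(0, 0, 0, 1, 1, 1)       # true dichotomy coded 0/1
score  <- c(10, 12, 14, 20, 22, 24) # hypothetical continuous outcome

r_pb <- cor(gender, score)  # point-biserial correlation
```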

Comparing Correlations A question I am asked a lot – how can I tell if these two correlation coefficients are significantly different? Install package cocor to be able to compare them!

Comparing Correlations First, you have to decide if the correlations are independent or dependent Independent correlations → the correlations come from separate groups of people, but are on the same variables Dependent correlations → the correlations are from the same people, but are different variables (overlapping or not)

Independent Correlations There’s no split/subset function within cocor. Therefore, you have to split up the dataset on your independent variable. – (and then put it back together in list format). Subset the data, then create a list.

Independent Correlations cocor(~ X + Y | X + Y, data = data) Fill in X and Y with your variables you want to correlate – (on these they are likely to be the same). Data = new list we just created.

Dependent Correlations cocor(~ X + Y | Y + Z, data = data) Fill in X, Y, Z with your variables you want to correlate – Overlapping correlation You can also have X + Y | Q + Z – Non-overlapping correlation Data is the dataframe with all of the columns

Dependent Correlations So much output!

Partial and Semi-Partial Correlations Partial correlation: – Measures the relationship between two variables, controlling for the effect that a third variable has on them both. Semi-partial correlation: – Measures the relationship between two variables controlling for the effect that a third variable has on only one of the others.
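Before reaching for a package, a partial correlation can be sketched from the three pairwise rs in base R; this is the textbook formula, shown on hypothetical data:

```r
x <- c(2, 5, 8, 12, 15)    # hypothetical revision hours
y <- c(40, 45, 60, 80, 85) # hypothetical exam scores
z <- c(88, 80, 75, 60, 55) # hypothetical anxiety scores

r_xy <- cor(x, y); r_xz <- cor(x, z); r_yz <- cor(y, z)

# Partial correlation of x and y, controlling z out of both
pr <- (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
```

Controlling for z shrinks the very large raw r_xy substantially, which is exactly the third-variable problem in action.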

Partial Correlations Install ppcor package! pcor(dataset, method = "pearson")

Semipartial Correlations spcor(data, method = "pearson") Notice how the top and bottom halves of the output matrix are not equal – the matrix is asymmetric because the third variable is controlled for only one variable in each pair. Calculated as: the correlation between variables 1 and 2, given that 2 and 3 are correlated. The first column = variable 1; the rest are 2/3.