Covariance and Correlation

Presentation transcript:

Covariance and Correlation Whole seminar takes about 1.5 hrs. Seminar 4

You must have heard… Correlation ≠ Causation. This is not completely true.

Today’s questions What does it mean to say that two variables are associated with one another? How can we quantify the concept of association?

So far… We focused on summary (descriptive) statistics: their shape, central tendency, and dispersion. Often, in psychology, we ask “How do two variables relate to one another?” Coffee consumption & happiness? Cigarettes & lung cancer? IQ & nutrition?

The concept of bivariate association
What does it mean to say:
  X is correlated with Y
  X is related (has a relationship) to Y
  X is associated with Y
  X predicts Y
These are identical statements.

The concept of bivariate association
It is about quantifying the association between two variables. Suppose you collect English (x) and Math (y) scores from 6 individuals:
       x       y
[A]    9.75    9.56
[B]    7.72    7.81
[C]   10.84   10.30
[D]    9.37    8.57
[E]   10.04   10.22
[F]   10.94   11.15

People with high scores on x seem to have high scores on y. Can we define “high scores” more precisely?

Yes, we can. We can study deviations (xd, yd) from the mean: (X - Mx) and (Y - My)
       x       y       xd      yd
[A]    9.75    9.56   -0.03   -0.04
[B]    7.72    7.81   -2.06   -1.79
[C]   10.84   10.30    1.07    0.70
[D]    9.37    8.57   -0.40   -1.03
[E]   10.04   10.22    0.26    0.62
[F]   10.94   11.15    1.16    1.55
Mx = 9.78, My = 9.60
Note: In advanced stats courses, we use the term “centering” to describe taking “deviations from the mean”.
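As a minimal sketch of the centering step, the deviation scores can be recomputed from the six pairs of raw scores. The helper names `mean` and `center` are illustrative, not part of the seminar material. Because the raw scores shown are rounded to two decimals, one or two recomputed deviations may differ from the slide in the last digit.

```python
# English (x) and Math (y) scores for persons A-F, copied from the slide.
x = [9.75, 7.72, 10.84, 9.37, 10.04, 10.94]
y = [9.56, 7.81, 10.30, 8.57, 10.22, 11.15]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def center(values):
    """Return each value's deviation from the mean of the list."""
    m = mean(values)
    return [v - m for v in values]


xd = center(x)
yd = center(y)

print(f"Mx = {mean(x):.2f}, My = {mean(y):.2f}")
for label, dx, dy in zip("ABCDEF", xd, yd):
    print(f"[{label}] {dx:6.2f} {dy:6.2f}")
```

Centered scores always sum to zero, which is a quick way to check the arithmetic.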

Let’s rescale the graph (note the axes). [Figure panels: Raw Scores; Deviation Scores] Now we can ask whether people who are above the mean on x (i.e., “high” on x) are above the mean on y.

What next? We could do a frequency count of the quadrants: is each point above or below the mean on both x and y?
       xd      yd
[A]   -0.03   -0.04   both below
[B]   -2.06   -1.79   both below
[C]    1.07    0.70   both above
[D]   -0.40   -1.03   both below
[E]    0.26    0.62   both above
[F]    1.16    1.55   both above
100% match
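The quadrant count can be sketched in a few lines, using the deviation scores as printed on the slide. The function name `same_side` is a hypothetical helper introduced for illustration.

```python
# Deviation scores for persons A-F, as printed on the slide.
xd = [-0.03, -2.06, 1.07, -0.40, 0.26, 1.16]
yd = [-0.04, -1.79, 0.70, -1.03, 0.62, 1.55]


def same_side(dx, dy):
    """True when both deviations are above the mean or both are below it."""
    return (dx > 0) == (dy > 0)


matches = sum(same_side(dx, dy) for dx, dy in zip(xd, yd))
match_rate = matches / len(xd)
print(f"{matches} of {len(xd)} match ({match_rate:.0%})")
```

Note that this count looks only at the sign of each deviation, so a person barely above the mean counts the same as a person far above it.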

But here comes a problem These two graphs differ! Yet, by fitting them into quadrants, you conclude that they are the same (100% match).

One solution
A precise way to study the association is to multiply each person’s deviations. Advantage: when there is a match (both + or both -), the product will be +. When there is a mismatch (one + and the other -), the product will be -.
       xd      yd     (xd*yd)
[A]   -0.03   -0.04    0.00
[B]   -2.06   -1.79    3.69
[C]    1.07    0.70    0.75
[D]   -0.40   -1.03    0.41
[E]    0.26    0.62    0.16
[F]    1.16    1.55    1.80

Average product of deviation scores
The average of these products indicates whether the typical person has the same signed deviation score on the two variables.
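A short sketch of this calculation, using the deviation scores printed on the slides: multiply each person's two deviations, then average the products. Dividing by the number of people follows the word "average" used here and is an assumption about the intended divisor.

```python
# Deviation scores for persons A-F, as printed on the slides.
xd = [-0.03, -2.06, 1.07, -0.40, 0.26, 1.16]
yd = [-0.04, -1.79, 0.70, -1.03, 0.62, 1.55]

# One product per person: positive when the signs match, negative otherwise.
products = [dx * dy for dx, dy in zip(xd, yd)]

# Average product of deviation scores (divides by N, the number of people).
average_product = sum(products) / len(products)

for label, p in zip("ABCDEF", products):
    print(f"[{label}] {p:5.2f}")
print(f"average product = {average_product:.2f}")  # about 1.13
```

Many textbooks and software packages divide by N - 1 instead of N when working with sample data, which gives a slightly larger value.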

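Covariance: the average product of deviation scores.
\[ \mathrm{cov}(X, Y) = \frac{1}{N} \sum_{i=1}^{N} (X_i - M_x)(Y_i - M_y) \]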

Features of covariance
When this average product is…
  Positive
    Mathematical meaning: the two variables covary positively
    Interpretation: people who are high on one variable tend to be high on the other
  Zero
    Mathematical meaning: the two variables do not covary
    Interpretation: people who are high on one variable are just as likely to be high on the other as they are to be low on the other
  Negative
    Mathematical meaning: the two variables covary negatively
    Interpretation: people who are high on one variable tend to be low on the other

Visually… Positive covariation. People who drink a lot of coffee tend to be happy. Preview: The line is called a regression line, and represents the estimated linear relationship between the two variables. It is also known as a “trend line” or “line of best fit”. Notice that the slope of the line is positive in this example.

Visually… (Near) zero covariation. People who are high on x are just as likely to be high on y as they are to be low on y. The regression line is flat.

Visually… Negative covariation. People high on x tend to be low on y. The regression line has a negative slope.

One problem with covariance
It is very sensitive to the units in which X and Y are measured. Imagine this: happiness and monthly income, with income measured either in rupees (Rs) or in ₹ lakhs (1 lakh = Rs 100,000). You will end up with different covariances, even though they come from identical data.

Here’s the proof
      Income (Rs)   Happiness   Income (lakhs)
 1    109831        6           1.09831
 2     79854        5           0.79854
 3     69320        …           0.6932
 4     78883        …           0.78883
 5     66673        …           0.66673
 6     79426        …           0.79426
 7     71355        …           0.71355
 8     82067        …           0.82067
 9     98418        …           0.98418
10     82170        …           0.8217
       Rs            lakhs
Cov    4955.5        0.049555
Cor    0.19720869    0.19720869
The plots are identical. The covariances are different. The correlations are identical.

Now we have a solution
Covariances are sensitive to the measurement units: if the units change, the covariances change. This doesn’t make sense, because the X-Y relationships remain the same. The solution: Pearson’s r.

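Pearson’s r
\[ r = \frac{\mathrm{cov}(X, Y)}{SD_x \, SD_y} \]
We’ve taken the covariance and “standardized” it by dividing by the standard deviations of X and Y.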
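To make the standardization concrete, here is a minimal sketch that computes the covariance and Pearson's r for the English and Math scores from earlier in the seminar, then repeats the calculation after a change of units. The factor of 100 is a made-up rescaling chosen for illustration, and the function names are hypothetical helpers.

```python
from math import sqrt

# English (x) and Math (y) scores for persons A-F, from the earlier slides.
x = [9.75, 7.72, 10.84, 9.37, 10.04, 10.94]
y = [9.56, 7.81, 10.30, 8.57, 10.22, 11.15]


def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Average product of deviation scores (divides by N)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def pearson_r(a, b):
    """Covariance divided by the product of the two standard deviations."""
    # covariance(a, a) is the variance of a, so its square root is the SD.
    return covariance(a, b) / sqrt(covariance(a, a) * covariance(b, b))


# Hypothetical change of units: express x on a scale 100 times larger.
x_rescaled = [100 * v for v in x]

print("cov:", covariance(x, y), "->", covariance(x_rescaled, y))
print("r:  ", pearson_r(x, y), "->", pearson_r(x_rescaled, y))
```

The covariance grows by the same factor as the units, while r stays at about 0.94 either way. The divisor (N or N - 1) cancels in r as long as the same one is used for the covariance and for both standard deviations.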

Interpretation is identical to covariance, except now there are upper and lower limits (+1 and -1). [Plots: r = +1; r = 0; r = -1]

Interpreting magnitude of correlations Absolute size of the correlation corresponds to the magnitude or strength of the relationship r = + .70 r = + .30 r = + 1 Notice that all rs are positive because the slopes are ___________

Quiz Which relationship is stronger? Rank order them. r = +.70

Large datasets & Correlation matrix So far, we examined data from two variables. In the real world, you’d gather data from many variables (e.g., World Values Survey – your project) Suppose you want to know what variables predict academic success of Ashoka students. What variables would you collect? How would you display them? Facebook’s mood manipulation research as an example of real world big data.

Large datasets & Correlation matrix
Variables that predict academic success:
  Active learning strategies (ALS)
  Quality of Instruction & College Experience (QICE)
  Internet and campus technology (ICT)
  Student-faculty interaction (SFI)
  Grade point average (GPA)
ALS QICE ICT SFI GPA - .14 .34 .47 .05 .12 .41 .22 -.11 .21
Ruggut & Chemosit (2005). Factors that influence college academic achievement. J Ed Res & Pol Stud.
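A correlation matrix is simply Pearson's r computed for every pair of variables. The sketch below shows the idea on a tiny made-up dataset. The variable names `var_a`, `var_b`, `var_c` and their values are hypothetical and are not taken from the study cited above.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson's r: covariance over the product of standard deviations."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)


def correlation_matrix(columns):
    """Return {row_name: {col_name: r}} for every pair of variables."""
    names = list(columns)
    return {
        row: {col: pearson_r(columns[row], columns[col]) for col in names}
        for row in names
    }


# Hypothetical scores for five people on three made-up variables.
data = {
    "var_a": [1, 2, 3, 4, 5],
    "var_b": [2, 1, 4, 3, 5],
    "var_c": [5, 4, 3, 2, 1],
}

matrix = correlation_matrix(data)

# Print the lower triangle only, with "-" on the diagonal.
names = list(data)
for i, row in enumerate(names):
    cells = [f"{matrix[row][col]:5.2f}" for col in names[:i]] + ["    -"]
    print(f"{row:6s}", " ".join(cells))
```

Because r(X, Y) equals r(Y, X), only one triangle of the matrix carries information, which is why published tables usually print half of it.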

Structural equation modeling
This is graduate-level statistics. You don’t need to know this. Inferring causality based on theoretical and empirical correlations between variables. Also known as “model fit”.

Factors affecting correlations
  Range restriction
  Heterogeneous subsamples
  Non-linearity
  “Outliers” (covered in Tutorial 6)

Range restriction
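A minimal simulation sketch of range restriction, using made-up data rather than anything from the seminar. It assumes two positively related variables and then keeps only the cases that score high on x, as might happen if a sample were selected on x. The cutoff of 1.0, the sample size, and the seed are arbitrary choices.

```python
import random
from math import sqrt


def pearson_r(a, b):
    """Pearson's r: covariance over the product of standard deviations."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated data: y is x plus independent noise, so the two are related.
x = [rng.gauss(0, 1) for _ in range(2000)]
y = [xi + rng.gauss(0, 1) for xi in x]

full_r = pearson_r(x, y)

# Restrict the range: keep only cases with x above an arbitrary cutoff.
restricted = [(xi, yi) for xi, yi in zip(x, y) if xi > 1.0]
rx, ry = zip(*restricted)
restricted_r = pearson_r(rx, ry)

print(f"full sample:       r = {full_r:.2f} (n = {len(x)})")
print(f"restricted sample: r = {restricted_r:.2f} (n = {len(rx)})")
```

With less spread in x, the same underlying relationship produces a noticeably smaller r in the restricted sample.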

Heterogeneous subgroups

Non-linearity: What would the correlation of this be?
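One hedged illustration, assuming a curved pattern such as a U shape (the data below are made up for the purpose): y is perfectly determined by x, yet Pearson's r is zero, because r only measures the linear part of a relationship.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson's r: covariance over the product of standard deviations."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)


# Hypothetical U-shaped data: y depends entirely on x, but not linearly.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r = pearson_r(x, y)
print(f"r = {r:.2f}")
```

The positive products on the right-hand side of the U cancel the negative products on the left-hand side, so the average product of deviations is zero.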

I doubt you will get these data

The significance of any correlation Correlation is one of the indicators of effect size Suppose you get the following: r = 0.10 r = 0.30 r = .70 How big is big, how small is small? (Preview to Weeks 13 & 14)

Pearson’s r is just one type of correlation
Pearson’s r is for continuous data for variables X and Y. What if one (or both) of your variables is ordinal, dichotomous, etc.? For SRM I, you only need to know Pearson’s r.
                          Var X
Var Y         Continuous   Nominal          Ordinal
Continuous    Pearson      Point biserial   Biserial
Nominal                    Phi              Rank biserial
Ordinal                                     Spearman

Summary
Your textbook doesn’t cover covariance, but it will be an important concept if you take advanced statistics courses.
A correlation coefficient has two indices: direction and magnitude.
It provides an easy way to quantify the association between two variables.
Correlation is the basis for regression (Week 11).

Back to the start
Correlation ≠ Causation is not always true. When you’ve found that A correlates with B, it does not necessarily mean A → B (backward inference is problematic; this is typically the reason why we say “Correlation ≠ Causation”). But for A → B to hold, A must correlate with B (forward inference).

Class Discussion (20 min): What are the principles of causality? How would you know X causes Y?

Covariation of cause and effect (but this has its problems)
Temporal precedence (but this has its problems too)
No plausible alternative explanations (also problematic)
My view: Causality will always be a leap of faith. Even with experiments.