Correlation. The variance of a variable X provides information on the variability of X. The covariance of two variables X and Y provides information.


1 Correlation

2 • The variance of a variable X provides information on the variability of X.
  • The covariance of two variables X and Y provides information on the related variability of X and Y together.
  • Note the similarity of the structure of the formulas: instead of relating X to itself, as in the variance, the covariance relates X to the other variable Y.
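The formulas this slide refers to were images and did not survive in the transcript. The sketch below assumes the usual sample definitions (dividing by n - 1) and uses hypothetical data; the function names are illustrative. It shows the parallel structure: the variance multiplies each X deviation by itself, while the covariance multiplies it by the matching Y deviation.

```python
def sample_variance(xs):
    """Sample variance: each deviation of X is multiplied by itself."""
    n = len(xs)
    mean_x = sum(xs) / n
    return sum((x - mean_x) * (x - mean_x) for x in xs) / (n - 1)


def sample_covariance(xs, ys):
    """Sample covariance: same structure, but X's deviation is paired with Y's."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(sample_variance(x))        # 2.5
print(sample_covariance(x, y))   # 1.5
```

Note that the covariance of X with itself is the variance of X, which is one way to remember the relationship between the two formulas.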

3 • There is an Excel function to calculate covariance: =COVAR(range1, range2)
  • Unfortunately, for most common purposes, this function does not calculate the sample covariance (which divides by n - 1) but instead calculates what is known as the population covariance (which divides by n).
  • Therefore, to transform Excel's covariance calculation into the more useful sample covariance, it is necessary to multiply Excel's result by the factor n/(n-1).
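As a rough illustration of the n/(n-1) correction, the following sketch computes a population covariance (dividing by n, as the slide says =COVAR does) and rescales it to the sample covariance. The data and function names are hypothetical.

```python
def population_covariance(xs, ys):
    """Covariance that divides by n (the population form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def to_sample_covariance(pop_cov, n):
    """Apply the n/(n-1) correction factor from the slide."""
    return pop_cov * n / (n - 1)


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

pop_cov = population_covariance(x, y)
sample_cov = to_sample_covariance(pop_cov, len(x))

print(round(pop_cov, 4))     # 1.2
print(round(sample_cov, 4))  # 1.5
```

The correction matters most for small samples; as n grows, n/(n-1) approaches 1 and the two versions converge.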

4 • Covariance measures how much Y and X tend to vary in the same direction.
  • A high positive covariance means the highest values of Y tend to occur along with the highest values of X.
  • However, it is hard to interpret because it has no standard scale of reference: its size depends on the units of X and Y. A covariance of 300,000 could be trivial, while another of 2.1 could be fairly substantial.

5 A more useful way to express this relationship between X and Y is as a fraction of the product of the standard deviations of X and Y. This fraction is known as the "standardized" covariance, or the correlation coefficient (correlation for short), and is commonly denoted r. In Excel, the correlation formula is =CORREL(range1, range2)
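A minimal sketch of the standardization described here: the covariance is divided by the product of the two standard deviations. The data are hypothetical and the helper name is illustrative. Rescaling X by a factor of 1,000 changes the covariance but leaves r unchanged, which is the point of putting it on a standard scale.

```python
import math


def correlation(xs, ys):
    """Correlation r = covariance / (std dev of X * std dev of Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
r_rescaled = correlation([1000 * v for v in x], y)  # same X in different units

print(round(r, 3))           # 0.775
print(round(r_rescaled, 3))  # 0.775
```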

6 • The correlation coefficient (r) measures how much Y and X tend to vary in the same direction on a standard scale. (Varying in the same direction is implicitly a linear relationship.)
  • It will always be between -1 and +1:
    r = +1 implies a perfect positive relationship
    r = -1 implies a perfect negative relationship
    r = 0 implies no linear relationship exists!

7 • Since it is unlikely that any real social data will have either a perfect positive correlation (r = 1) or a perfect negative correlation (r = -1), how does an analyst know if there is "enough" correlation?
  • A simple rule of thumb is that a "correlation value" (the size of r, ignoring its sign) of less than 30% suggests no linear relationship, whereas a "correlation value" of more than 70% suggests a strong linear relationship. Everything in between is, say, "somewhat of a relationship".
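The rule of thumb can be written as a small helper. This is only a sketch: it assumes the "correlation value" means the absolute value of r, and the function name and returned labels are illustrative.

```python
def describe_correlation(r):
    """Label a correlation using the 30% / 70% rule of thumb.

    Assumption: the rule applies to the absolute value of r, so that
    r = -0.8 counts as a strong (negative) linear relationship.
    """
    strength = abs(r)
    if strength < 0.30:
        return "no linear relationship"
    if strength > 0.70:
        return "strong linear relationship"
    return "somewhat of a relationship"


print(describe_correlation(0.12))   # no linear relationship
print(describe_correlation(-0.85))  # strong linear relationship
print(describe_correlation(0.435))  # somewhat of a relationship
```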


9 • The hypotheses are: H_0: correlation = 0 versus H_1: correlation ≠ 0
  • Approximate the standard error using the formula: $s_r = \sqrt{(1 - r^2)/(n - 2)}$
  • Calculate the t-statistic, with n - 2 degrees of freedom. The formula is: $t = r / s_r$

10 • Suppose that, for a sample of size 20, the sample covariance between two variables X and Y is 87, the sample variance of X is 100, and the sample variance of Y is 400. Is there a statistically significant linear relationship?

11 A linear relationship between the two variables is statistically significant at the 10% level but not at the 5% level: the t-statistic falls between the two-tailed critical values for 18 degrees of freedom, 1.734 (10% level) and 2.101 (5% level).
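The calculation behind this conclusion was an image and is not in the transcript. The sketch below reworks it under one assumption: that the standard error is approximated by the common formula sqrt((1 - r^2)/(n - 2)). With that formula the numbers reproduce the slide's conclusion. The critical values are the ones quoted on the slide; the variable names are illustrative.

```python
import math

# Values given in the example.
n = 20
cov_xy = 87.0
var_x = 100.0
var_y = 400.0

# Correlation: covariance divided by the product of the standard deviations.
r = cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))

# Assumed standard-error approximation and t-statistic with n - 2 = 18 dof.
se_r = math.sqrt((1 - r ** 2) / (n - 2))
t_stat = r / se_r

# Two-tailed critical values quoted on the slide for 18 degrees of freedom.
T_CRIT_10 = 1.734
T_CRIT_05 = 2.101

print(round(r, 3))                # 0.435
print(round(t_stat, 2))           # 2.05
print(abs(t_stat) > T_CRIT_10)    # True: significant at the 10% level
print(abs(t_stat) > T_CRIT_05)    # False: not significant at the 5% level
```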


13 • Correlation is most useful for quickly considering possible relationships between many different variables.
  • Suppose, for example, that the analysis is examining 10 different variables: X1 … X10
  • Using Excel's CORREL function would require entering 45 such calculations, one for each pair of variables.
  • A better exploratory (one-time) way is to use Excel's built-in Data Analysis ToolPak.
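Outside Excel, the same all-pairs exploration can be sketched in a few lines. The example below uses three hypothetical variables to keep the output short; the data and the names `correlation` and `correlation_table` are illustrative. It also confirms the count of 45 pairs for 10 variables.

```python
import math
from itertools import combinations


def correlation(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_table(columns):
    """Return {(name_a, name_b): r} for every unordered pair of columns."""
    return {
        (a, b): correlation(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


# Hypothetical data, for illustration only.
data = {
    "X1": [1, 2, 3, 4, 5],
    "X2": [2, 4, 5, 4, 5],
    "X3": [5, 3, 4, 2, 1],
}

table = correlation_table(data)
for (a, b), r in table.items():
    print(f"{a} vs {b}: r = {r:+.3f}")

# Number of distinct pairs among 10 variables.
print(math.comb(10, 2))  # 45
```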

14 Complete the dialog box; this leads to the results. [Screenshots: dialog box and results]

15 • The correlation coefficient measures linearity.
  • If there is a nonlinear relationship, r will underestimate the predictive power of the relationship between the two variables.
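A small hypothetical example makes the point concrete. Below, Y is determined exactly by X (Y = X squared), so X predicts Y perfectly, yet the correlation is zero because the relationship is not linear. The data and helper name are illustrative.

```python
import math


def correlation(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: a perfect but nonlinear (quadratic) relationship.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = correlation(x, y)
print(round(abs(r), 6))  # 0.0
```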

16 • Rank correlation measures how two variables are related in a more general way.
  • A high rank correlation says that large values of X tend to occur with large values of Y, and low with low, whether or not the relationship is linear.
  • Generally, this type of correlation test might be applied to data that are highly skewed; in other words, data with a significant number of very extreme values.

17 • Compute the ranks for the set of X values, then for the Y values, low to high.
  • Compute the differences of the ranks and the square of the differences.
  • The statistic then is: $r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$, where $d_i$ is the difference between the X rank and the Y rank of observation i.
  For simplicity, if some values are tied, interpolate the ranks (give each tied value the average of the ranks they occupy) and use the formula above. Technically, in this case the previous correlation calculation should be applied to the rankings rather than the formula above.
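The steps above can be sketched as follows. The statistic's formula was an image in the original slide; this sketch assumes it is the standard Spearman shortcut, 1 - 6*sum(d^2) / (n*(n^2 - 1)). Tied values receive the average of the ranks they occupy. The data and function names are hypothetical.

```python
def average_ranks(values):
    """Rank values low to high (1-based); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend 'end' over the run of values tied with the one at 'start'.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def rank_correlation(xs, ys):
    """Rank correlation via the shortcut formula on squared rank differences."""
    n = len(xs)
    rank_x = average_ranks(xs)
    rank_y = average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y_cubed = [v ** 3 for v in x]   # nonlinear but always increasing
y_mixed = [2, 1, 4, 3, 5]       # mostly increasing, two swaps

print(average_ranks([10, 20, 20, 30]))           # [1.0, 2.5, 2.5, 4.0]
print(rank_correlation(x, y_cubed))              # 1.0
print(round(rank_correlation(x, y_mixed), 3))    # 0.8
```

The cubed example shows why rank correlation is described as more general: the relationship is far from a straight line, but because the ordering is preserved the rank correlation is exactly 1.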

18 • The hypotheses are: H_0: correlation = 0 versus H_1: correlation ≠ 0
  • Approximate the standard error using the formula: $s_{r_s} = \sqrt{(1 - r_s^2)/(n - 2)}$
  • Calculate the t-statistic, with n - 2 degrees of freedom. The formula is: $t = r_s / s_{r_s}$


