2 17 Correlation

3 Chapter 17, p. 399

4 Semimetric distance – Pearson correlation coefficient or Covariance
How about higher-dimensional data? It is useful to have a similar measure of how much the dimensions vary from the mean with respect to each other. Covariance is measured between 2 dimensions; if one has a 3-dimensional data set (X, Y, Z), one can calculate Cov(X,Y), Cov(X,Z) and Cov(Y,Z). To compare heterogeneous pairs of variables, define the correlation coefficient, or Pearson correlation coefficient, rXY = Cov(X,Y) / (σX σY), with -1 ≤ rXY ≤ 1: -1 → perfect anticorrelation, 0 → uncorrelated (no linear dependence), +1 → perfect correlation.
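As a concrete illustration, here is a minimal Python sketch of these definitions (the function names covariance and pearson are my own, not from the chapter):

```python
import math

def covariance(x, y):
    """Cov(X, Y): mean of (x_i - mean_x) * (y_i - mean_y) over the n points."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

def pearson(x, y):
    """r_XY = Cov(X, Y) / (sigma_X * sigma_Y); always lies in [-1, +1]."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# For a 3-dimensional data set (X, Y, Z) one evaluates the pairwise values
# Cov(X, Y), Cov(X, Z) and Cov(Y, Z) with the same helper.
```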

5 Semimetric distance – the squared Pearson correlation coefficient
The Pearson correlation coefficient is useful for examining correlations in the data, but one may imagine an instance, for example, in which the same TF can cause both enhancement and repression of expression. A better alternative is the squared Pearson correlation coefficient (PCC). The squared PCC, rsq = rXY², takes values in the range 0 ≤ rsq ≤ 1: 0 → uncorrelated vectors, 1 → perfectly correlated or anti-correlated vectors. Correlation coefficients are measures of similarity, and similarity and distance have a reciprocal relationship (similarity↑ → distance↓), so d = 1 - r is typically used as a measure of distance.
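A short sketch of the squared PCC and of the distance d = 1 - r, reusing the pearson() helper from the previous sketch (an assumed name):

```python
def squared_pcc(x, y):
    """r_sq in [0, 1]: 1 for perfectly correlated or anti-correlated vectors."""
    r = pearson(x, y)
    return r * r

def correlation_distance(x, y):
    """Semimetric distance d = 1 - r: high similarity gives a small distance."""
    return 1.0 - pearson(x, y)
```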

6 Semimetric distance – Pearson correlation coefficient or Covariance
The resulting rXY value will be greater than 0 if X and Y tend to vary in the same direction, below 0 if they tend to vary in opposite directions, and close to 0 if they are independent. Remark: rXY only tests whether there is a linear dependence, Y = aX + b. If two variables are independent, rXY is low; but a low rXY may or may not indicate independence, since the relation may be non-linear. A high rXY is a sufficient but not a necessary condition for variable dependence.
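The remark can be checked on toy data of my own choosing (reusing the pearson() helper from the earlier sketch): Y = X² is completely determined by X, yet the PCC is essentially zero because the dependence is not linear.

```python
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]          # Y is a deterministic function of X
print(pearson(xs, ys))            # 0.0: no linear dependence is detected
```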

7 Semimetric distance – the squared Pearson correlation coefficient
To test for a non-linear relation among the data, one could make a transformation by variable substitution. Suppose one wants to test the relation u(v) = a·v^n. Taking the logarithm of both sides gives log u = log a + n log v. Setting Y = log u, b = log a, and X = log v yields a linear relation, Y = b + nX, so log u correlates (n > 0) or anti-correlates (n < 0) with log v.
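A sketch of this substitution, again reusing pearson(); the parameters a and n are illustrative values of my own, not from the chapter:

```python
import math

a, n = 2.0, -1.5                          # assumed power-law parameters
vs = [1.0, 2.0, 4.0, 8.0, 16.0]
us = [a * v ** n for v in vs]             # u(v) = a * v**n

log_v = [math.log(v) for v in vs]
log_u = [math.log(u) for u in us]
print(pearson(log_v, log_u))              # -1.0: log u anti-correlates with log v (n < 0)
```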

8 Semimetric distance – Pearson correlation coefficient or Covariance matrix
A covariance matrix is merely a collection of many covariances in the form of a d x d matrix whose (i, j) entry is Cov(Xi, Xj); the diagonal entries Cov(Xi, Xi) are the variances of the individual dimensions.
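A minimal sketch of building the d x d matrix, with the data supplied as one column per dimension and reusing covariance() from the earlier sketch (the toy values are my own):

```python
def covariance_matrix(columns):
    """Return the d x d matrix whose (i, j) entry is Cov(X_i, X_j)."""
    d = len(columns)
    return [[covariance(columns[i], columns[j]) for j in range(d)]
            for i in range(d)]

X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 1.0, 4.0, 3.0]
Z = [0.5, 0.5, 1.5, 1.5]
for row in covariance_matrix([X, Y, Z]):   # 3 x 3 matrix for (X, Y, Z)
    print(row)
```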

9 Spearman’s rank correlation (SRC)
One of the problems with using the PCC is that it is susceptible to being skewed by outliers: a single data point can result in two genes appearing to be correlated even when all the other data points suggest that they are not. Spearman's rank correlation (SRC) is a non-parametric measure of correlation that is robust to outliers; it ignores the magnitude of the changes. The idea of rank correlation is to transform the original values into ranks and then compute the correlation between the series of ranks. First, order the values of genes A and B in ascending order and assign the lowest value rank 1. The SRC between A and B is defined as the PCC between the ranked A and the ranked B. In case of ties, assign mid-ranks: if two values would both occupy ranks 5 and 6, each is assigned a rank of 5.5.
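A sketch of the rank transform described above, including mid-ranks for ties (the helper names are my own, and spearman() reuses the earlier pearson()):

```python
def mid_ranks(values):
    """Rank values ascending (lowest = 1); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid_rank = (i + j) / 2 + 1        # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mid_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """SRC: the PCC between the rank vectors of x and y."""
    return pearson(mid_ranks(x), mid_ranks(y))
```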

10 Spearman’s rank correlation
The SRC can be calculated as the PCC of the rank vectors, where xi and yi denote the rank of the x and y values respectively. When there are no ties, and approximately when ties are present, it can also be computed from the rank differences di = xi - yi as rs = 1 - 6·Σdi² / (n(n² - 1)).
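A sketch of the rank-difference formula, reusing mid_ranks() from the previous sketch:

```python
def spearman_from_differences(x, y):
    """r_s = 1 - 6 * sum(d_i**2) / (n * (n**2 - 1)), with d_i = rank(x_i) - rank(y_i).
    Exact when there are no ties; an approximation otherwise."""
    rx, ry = mid_ranks(x), mid_ranks(y)
    n = len(x)
    d_sq = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d_sq / (n * (n * n - 1))
```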

11 SRC vs. PCC
(Table of Gene A and Gene B expression ratios and their ranks at each time point.) PCC(A, B) = 0.633, SRC(A, B) = -0.086
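The effect can be reproduced with toy numbers of my own choosing (not the values from the slide), reusing pearson() and spearman(): a single shared outlier drives the PCC towards 1 while the SRC of the ranks stays near zero.

```python
A = [0.1, 0.2, 0.15, 0.05, 0.25, 9.0]   # gene A ratios; the last point is an outlier
B = [0.3, 0.1, 0.25, 0.35, 0.2, 8.5]    # gene B ratios; outlier at the same time point
print(pearson(A, B))    # ~1.0: dominated by the outlier
print(spearman(A, B))   # ~-0.09: the ranks are only weakly related
```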

12 Chapter 17, p. 401

13 Chapter 17, p. 408

