Finding associated genes in large collections of microarrays
Produce hypotheses of functional relations between genes. Positive correlation: co-regulated genes or positive modulator. Negative correlation: co-regulated genes or inhibitor. Used to derive networks of gene interactions.
4 simple ways of finding association Pearson correlation coefficient. Spearman’s rank correlation coefficient. Probabilistic approach (Present/Absent). Mutual information (Present/Absent)
Pearson correlation coefficient Varies between -1 and 1: Between 0.6 and 1: strong positive correlation. Between -0.6 and -1: strong negative correlation. -1 is perfect negative correlation; 1 is perfect positive correlation. Assumes linear relation between variables.
Pearson correlation coefficient Step 1: Prepare data. Step 2: Compute Pearson coefficient between pairs of probes of interest. Step 3: Assess significance. Step 4: Multiple testing correction.
Pearson correlation coefficient Step 1: Prepare data: Chips are normalized with MAS 5.0 or other procedure. Scale probes in each chip dividing by mean. Center and standardize each probe distribution: z-scores.
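The preparation step can be sketched as follows. This is a minimal illustration, not the Stembase implementation: the function names and the toy matrix are hypothetical, the input is assumed to be already normalized (e.g. by MAS 5.0), and the z-scores use the sample standard deviation (n - 1).

```python
from statistics import mean, stdev

def scale_chips(matrix):
    """Divide every value on a chip (column) by that chip's mean."""
    n_chips = len(matrix[0])
    chip_means = [mean(row[j] for row in matrix) for j in range(n_chips)]
    return [[row[j] / chip_means[j] for j in range(n_chips)] for row in matrix]

def z_scores(values):
    """Center and standardize one probe across chips (sample SD, n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical toy data: 3 probes (rows) measured on 4 chips (columns).
expression = [
    [100.0, 220.0, 310.0, 400.0],
    [50.0, 90.0, 160.0, 210.0],
    [300.0, 290.0, 130.0, 90.0],
]
scaled = scale_chips(expression)             # each chip now has mean 1
z = [z_scores(probe) for probe in scaled]    # each probe now has mean 0, SD 1
```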
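Pearson correlation coefficient Step 2: Compute Pearson coefficient between pairs of probes. When z-scores are pre-computed: $\rho_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} z_{x,i} z_{y,i}$, where $z_{x,i}$ is the z-score of probe x on chip i (computed with the sample standard deviation). n: number of chips.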
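As a sketch of this step, the coefficient can be computed as the sum of products of z-scores divided by n - 1. The divisor n - 1 assumes the z-scores were built with the sample standard deviation; the function names and data are hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Center and standardize one probe across chips (sample SD, n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def pearson_from_z(zx, zy):
    """Pearson coefficient from two pre-computed z-score vectors."""
    n = len(zx)
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# Hypothetical probes measured on 5 chips.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]   # rises linearly with x
w = [5.0, 4.0, 3.0, 2.0, 1.0]    # falls linearly with x

r_xy = pearson_from_z(z_scores(x), z_scores(y))   # perfect positive correlation
r_xw = pearson_from_z(z_scores(x), z_scores(w))   # perfect negative correlation
```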
Pearson correlation coefficient Step 3: Assess significance: Randomize (permute chip labels) if possible; good for fewer than 20 chips. Otherwise, use Student's t distribution with n-2 degrees of freedom: $t = \rho \sqrt{\frac{n-2}{1-\rho^2}}$. ρ: correlation coefficient. n: number of chips.
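Both options can be sketched in a few lines. The t statistic below uses the standard form t = ρ·sqrt((n - 2)/(1 - ρ²)); the permutation test shuffles one probe across chips and counts how often the shuffled |ρ| reaches the observed one. The number of permutations, the seed, the two-sided comparison, and the data are assumptions made for the illustration.

```python
import math
import random
from statistics import mean

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t value with n - 2 degrees of freedom (undefined for |r| == 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value estimated by shuffling y across chips."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical probes measured on 8 chips.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3, 6.8, 8.4]

r = pearson(x, y)
t = t_statistic(r, len(x))
p_perm = permutation_p_value(x, y)
```

The t value is then compared against a t table (or a statistics library) with n - 2 degrees of freedom to obtain the p-value.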
Pearson correlation coefficient Step 4: Multiple testing correction
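The slide does not name a correction procedure. As an assumption, the sketch below shows two common choices, Bonferroni and Benjamini-Hochberg, applied to a hypothetical list of p-values (one per probe pair tested).

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values from five probe-pair tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
p_bonf = bonferroni(p_values)
p_bh = benjamini_hochberg(p_values)
```

Bonferroni is the more conservative of the two; Benjamini-Hochberg is often preferred when many probe pairs are screened at once.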
Spearman’s rank correlation coefficient Nonparametric method: less power but more robust. Does not assume normal distribution. Also varies between -1 and 1.
Spearman’s rank correlation coefficient Step 1: Prepare data. Step 2: Compute Spearman’s rank correlation coefficient between probe of interest and the rest. Step 3: Assess significance. Step 4: Multiple test correction.
Spearman’s rank correlation coefficient Step 1: Prepare data: Same as Pearson. Order the values of the probes by increasing hybridization values. Construct the rank vectors.
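Spearman’s rank correlation coefficient Step 2: Compute coefficient between probe sets of interest: $\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2-1)}$ (exact when there are no tied ranks). $d_i$: difference between the ranks of the two probes on chip i. n: number of chips.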
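A minimal sketch of the ranking and coefficient steps, assuming no tied values (ties would need average ranks, which this illustration does not handle). Function names and data are hypothetical.

```python
def ranks(values):
    """Rank vector: 1 for the smallest value, n for the largest (no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(x, y):
    """Spearman coefficient from squared rank differences (no ties)."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Hypothetical probes measured on 5 chips.
x = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]   # monotone but nonlinear in x
w = [3.0, 1.0, 2.0, 5.0, 4.0]

rho_xy = spearman(x, y)   # 1.0: ranks agree exactly
rho_xw = spearman(x, w)   # 0.6: sum of squared rank differences is 8
```

Because only ranks are used, the monotone but nonlinear pair x, y still reaches a coefficient of 1.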
Spearman’s rank correlation coefficient Step 3: Assess significance: Same as Pearson. Randomize (permute chip labels) if possible; good for fewer than 20 chips. Otherwise, use Student's t distribution with n-2 degrees of freedom: $t = \rho \sqrt{\frac{n-2}{1-\rho^2}}$. ρ: correlation coefficient. n: number of chips.
Spearman’s rank correlation coefficient Step 4: Multiple testing correction.
Binary probabilistic approach based on Present/Absent Approach adapted from: “Computational methods for the identification of differential and coordinated gene expression.” Claverie JM. Hum Mol Genet. 1999;8(10):1821-32. Use MAS 5.0 calls of Present-Marginal-Absent for each probe. Good for heterogeneous microarray collections.
Binary approach based on Present/Absent Step 1: Prepare data. Step 2: Compute p-value of # of observed matches. Step 3: Multiple test correction.
Binary approach based on Present/Absent Step 1: Obtain P/M/A calls for probes: Each call is associated with a p-value. A filter can be applied. Encode P/M/A calls as binary vectors: P as 1 and M/A as 0.
Binary approach based on Present/Absent Step 2: Compute p-value of # of matches probe x: 1 1 0 0 0 1 1 0 1 0 0 0 probe y: 1 1 0 0 0 0 1 0 1 0 0 0 probe z: 0 0 1 1 1 1 0 0 0 1 1 1 Find improbably high number of matches (or mismatches). probe x & y: 11 out of 12 matches. probe x & z: 10 out of 12 mismatches.
Binary approach based on Present/Absent Step 2: Compute the probability of observing by chance k matches or more from the binomial distribution B(n,p). First, the probability of a match, assuming the two probes are independent: $p = p_x p_y + (1 - p_x)(1 - p_y)$. $p_x$: fraction of 1s (Present) in probe x. $p_y$: fraction of 1s (Present) in probe y.
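Binary approach based on Present/Absent Step 2: Compute the probability of observing by chance k matches or more from the binomial distribution: $P(X \ge k) = \sum_{i=k}^{n} \binom{n}{i} p^i (1-p)^{n-i}$. For n large one can use the normal distribution: $z = \frac{k - np}{\sqrt{np(1-p)}}$. p: probability of a match. n: number of chips.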
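The binomial step can be sketched with the probe x and probe y vectors from the earlier example slide. The match probability is taken as p = p_x·p_y + (1 - p_x)(1 - p_y), which assumes the two probes are independent; the function names are hypothetical.

```python
import math

def match_probability(x, y):
    """Chance that two independent binary probes agree on a chip."""
    px = sum(x) / len(x)   # fraction of 1s (Present) in probe x
    py = sum(y) / len(y)   # fraction of 1s (Present) in probe y
    return px * py + (1 - px) * (1 - py)

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

def normal_z(k, n, p):
    """z-score of k matches under the normal approximation (large n)."""
    return (k - n * p) / math.sqrt(n * p * (1 - p))

probe_x = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0]
probe_y = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0]

n = len(probe_x)
matches = sum(a == b for a, b in zip(probe_x, probe_y))   # 11 of 12
p_match = match_probability(probe_x, probe_y)             # 76/144
p_value = binomial_tail(matches, n, p_match)
z = normal_z(matches, n, p_match)
```

With only 12 chips the exact binomial tail is the safer choice; the z-score is shown only to illustrate the large-n approximation.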
Binary approach based on Present/Absent Step 3: Multiple test correction.
Mutual information based on Present/Absent Step 1: Prepare data. Step 2: Compute MI value for pairs of probes. Step 3: Use of a threshold for MI
Mutual information based on Present/Absent Step 1: Obtain P/M/A calls for probes: Each call is associated with a p-value. A filter can be applied. Encode P/M/A calls as binary vectors: P/M as 1 and A as 0, OR P as 1 and M/A as 0.
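The two encodings can be sketched with a single helper. The function name, the flag, and the example calls are hypothetical; the p-value filter mentioned on the slide is not shown.

```python
def encode_calls(calls, marginal_as_present=False):
    """Turn P/M/A calls into a binary vector.

    By default P -> 1 and M/A -> 0; with marginal_as_present=True,
    P/M -> 1 and A -> 0.
    """
    present = {"P", "M"} if marginal_as_present else {"P"}
    return [1 if call in present else 0 for call in calls]

# Hypothetical calls for one probe across six chips.
calls = ["P", "M", "A", "P", "A", "M"]
strict = encode_calls(calls)                              # [1, 0, 0, 1, 0, 0]
lenient = encode_calls(calls, marginal_as_present=True)   # [1, 1, 0, 1, 0, 1]
```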
Mutual information based on Present/Absent Step 2: Compute MI value for probes X and Y: $MI(X, Y) = \sum_{x} \sum_{y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$. p(.): frequencies of observed Ps and As. p(x,y): frequencies of the joint distribution.
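A minimal sketch of the MI computation on binary vectors, using the standard definition MI = Σ p(x,y) log(p(x,y) / (p(x) p(y))) with observed frequencies. The natural logarithm is an assumption (the slide does not state the base), and the function name and data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """MI of two binary probes from observed frequencies (natural log)."""
    n = len(x)
    joint = Counter(zip(x, y))   # counts of each (x, y) combination
    count_x = Counter(x)
    count_y = Counter(y)
    mi = 0.0
    # Combinations that never occur contribute zero and are skipped.
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((count_x[a] / n) * (count_y[b] / n)))
    return mi

# Hypothetical binary probes on four chips.
probe_a = [1, 1, 0, 0]
probe_b = [1, 1, 0, 0]   # identical to probe_a
probe_c = [1, 0, 1, 0]   # independent of probe_a

mi_same = mutual_information(probe_a, probe_b)    # log(2)
mi_indep = mutual_information(probe_a, probe_c)   # 0.0
```

Note that a perfectly anti-correlated probe gives the same MI as an identical one, so MI detects association but not its sign.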
Mutual information based on Present/Absent Step 3: Use a threshold: probes X and Y are correlated if: $MI(X, Y) > \frac{1}{n} \log \frac{1}{P}$. n: number of chips. $P = 1/p^2$ (with p the number of probes). “A simple method for reverse engineering causal networks.” M. Andrecut and S. A. Kauffman. J. Phys. A: Math. Gen. 39 No 46.
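The threshold can be evaluated directly from the slide's formula. This sketch assumes the natural logarithm (the same base must be used for MI itself); the function name and the collection size are hypothetical.

```python
import math

def mi_threshold(n_chips, n_probes):
    """Threshold (1/n) * log(1/P) with P = 1 / n_probes**2."""
    big_p = 1.0 / n_probes**2
    return math.log(1.0 / big_p) / n_chips

# Hypothetical collection: 100 chips, 1000 probes.
threshold = mi_threshold(100, 1000)   # log(1e6) / 100
```

The threshold falls as chips are added and rises only logarithmically with the number of probes.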
Try Pearson method in Stembase! Implemented by Reatha Sandie