Published by Myra Lucas. Modified over 6 years ago.
1
Finding associated genes in large collections of microarrays
2
Produce hypotheses about functional relations between genes
Positive correlation: co-regulated genes or a positive modulator. Negative correlation: co-regulated genes or an inhibitor. These associations can be used to derive networks of gene interactions.
3
Four simple ways of finding associations
Pearson correlation coefficient. Spearman’s rank correlation coefficient. Probabilistic approach (Present/Absent). Mutual information (Present/Absent)
4
Pearson correlation coefficient
Varies between -1 and 1: 1 is perfect positive correlation and -1 is perfect negative correlation. Values between 0.6 and 1 indicate strong positive correlation; values between -0.6 and -1 indicate strong negative correlation. Assumes a linear relation between the variables.
5
Pearson correlation coefficient
Step 1: Prepare data. Step 2: Compute the Pearson coefficient between pairs of probes of interest. Step 3: Assess significance. Step 4: Multiple testing correction.
6
Pearson correlation coefficient
Step 1: Prepare data: normalize the chips with MAS 5.0 or another procedure. Scale the probe values on each chip by dividing by that chip's mean. Then center and standardize each probe's distribution across chips (z-scores).
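A minimal Python sketch of this preparation step, assuming the normalized signals are held in a hypothetical list of lists `matrix[chip][probe]` and that z-scores use the sample standard deviation. The function names are illustrative only; they are not part of MAS 5.0 or any package.

```python
import statistics

def scale_chip(values):
    """Divide every probe value on one chip by that chip's mean signal."""
    mean = statistics.fmean(values)
    return [v / mean for v in values]

def z_scores(values):
    """Center and standardize one probe across chips (sample standard deviation).
    A probe with constant signal has SD 0 and would raise ZeroDivisionError."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def prepare(matrix):
    """matrix[c][p]: normalized signal of probe p on chip c.
    Returns z[p][c]: z-score of probe p on chip c, after per-chip mean scaling."""
    scaled = [scale_chip(chip) for chip in matrix]
    n_probes = len(scaled[0])
    return [z_scores([chip[p] for chip in scaled]) for p in range(n_probes)]

z = prepare([[100.0, 200.0, 50.0], [120.0, 180.0, 60.0], [90.0, 260.0, 40.0]])
```

Storing the result probe-major (`z[probe]`) makes the pairwise computations in the next steps a simple zip over two rows.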
7
Pearson correlation coefficient
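Step 2: Compute the Pearson coefficient between pairs of probes. When z-scores are pre-computed: $r_{xy} = \frac{1}{n-1}\sum_{i=1}^{n} z_{x,i}\, z_{y,i}$, where n is the number of chips and $z_{x,i}$ is the z-score of probe x on chip i (standardized with the sample standard deviation).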
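Once z-scores exist, Pearson's r reduces to the sum of products of the two z-score vectors divided by n - 1. This Python sketch assumes the z-scores were computed with the sample standard deviation (as `statistics.stdev` does); with a population SD the divisor would be n instead.

```python
import statistics

def z_scores(values):
    """Standardize one probe across chips using the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def pearson_from_z(zx, zy):
    """Pearson r from pre-computed z-scores: sum(z_x * z_y) / (n - 1)."""
    n = len(zx)
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)

r = pearson_from_z(z_scores([1, 2, 3, 4]), z_scores([1, 3, 2, 4]))
```

Pre-computing the z-scores once per probe means each pair then costs only one dot product, which matters when comparing one probe against every other probe on the array.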
8
Pearson correlation coefficient
Step 3: Assess significance: randomize (permute) if possible; this suits collections of fewer than 20 chips. Otherwise use Student's t distribution with n - 2 degrees of freedom: $t = \rho\sqrt{\frac{n-2}{1-\rho^2}}$, where ρ is the correlation coefficient and n the number of chips.
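A minimal sketch of both options in Python. The t statistic is t = ρ·sqrt((n - 2)/(1 - ρ²)); turning it into a p-value needs the t-distribution tail, which a statistics library supplies (for example `2 * scipy.stats.t.sf(abs(t), n - 2)` in SciPy for a two-sided test). The permutation test shuffles one probe's values across chips; the number of shuffles, the fixed seed and the "+1" correction are assumptions of this sketch.

```python
import math
import random

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(rho, n):
    """t = rho * sqrt((n - 2) / (1 - rho^2)); n - 2 degrees of freedom.
    Undefined for |rho| == 1 (division by zero)."""
    return rho * math.sqrt((n - 2) / (1 - rho ** 2))

def permutation_p_value(x, y, n_perm=10000, seed=0):
    """Two-sided p-value: share of shuffles of y whose |r| reaches the observed |r|.
    Adding 1 to numerator and denominator keeps the estimate above zero."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson(x, y_perm)) >= observed - 1e-12:  # tolerance for float ties
            hits += 1
    return (hits + 1) / (n_perm + 1)
```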
9
Pearson correlation coefficient
Step 4: Multiple testing correction
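The slides do not name a correction method. As an assumption, here is a Python sketch of two common choices, Bonferroni (family-wise error) and Benjamini-Hochberg (false discovery rate), applied to the list of p-values from every probe pair tested. Either may or may not match what the original implementation used.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With one probe of interest compared to every other probe, m is the number of probes minus one; with all pairs, it is p(p - 1)/2, which makes Bonferroni very conservative on whole arrays.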
10
Spearman’s rank correlation coefficient
Non-parametric method: less power but more robust. Does not assume a normal distribution. Also varies between -1 and 1.
11
Spearman’s rank correlation coefficient
Step 1: Prepare data. Step 2: Compute Spearman's rank correlation coefficient between the probe of interest and the rest. Step 3: Assess significance. Step 4: Multiple testing correction.
12
Spearman’s rank correlation coefficient
Step 1: Prepare data: same as for Pearson. For each probe, order its values across chips by increasing hybridization signal and construct the rank vector (rank 1 = lowest signal).
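A small Python sketch of building a rank vector for one probe. Tied values receive the average of the ranks they span; that tie rule is a common convention assumed here, since the slide does not say how ties are handled.

```python
def ranks(values):
    """Rank values in increasing order (1 = lowest); ties get their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of positions holding the same value.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

print(ranks([5, 1, 5, 3]))
```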
13
Spearman’s rank correlation coefficient
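Step 2: Compute the coefficient between the probe sets of interest: $\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)}$, where $d_i$ is the difference between the ranks of the two probes on chip i and n is the number of chips. This form is exact when there are no tied values.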
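A Python sketch of the rank-difference formula ρ = 1 - 6Σd²/(n(n² - 1)), assuming no tied values within either probe (the simple ranking below gives tied values distinct ranks, so the formula would then only approximate Spearman's ρ; the usual fix is Pearson's r computed on average ranks).

```python
def ranks(values):
    """Rank values in increasing order (1 = lowest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r

def spearman(x, y):
    """Spearman's rho from squared rank differences d_i^2 over n chips."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

rho = spearman([1, 2, 3, 4, 5], [10, 20, 30, 50, 40])
```

Because only ranks enter the formula, any monotonic relation (for example y = x³) gives ρ = 1, even though it is not linear.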
14
Spearman’s rank correlation coefficient
Step 3: Assess significance: same as for Pearson. Randomize if possible (suited to fewer than 20 chips); otherwise use Student's t distribution with n - 2 degrees of freedom: $t = \rho\sqrt{\frac{n-2}{1-\rho^2}}$, where ρ is the correlation coefficient and n the number of chips.
15
Spearman’s rank correlation coefficient
Step 4: Multiple testing correction.
16
Binary probabilistic approach based on Present/Absent
Approach adapted from: Claverie JM. "Computational methods for the identification of differential and coordinated gene expression." Hum Mol Genet. 1999;8(10). Uses the MAS 5.0 Present/Marginal/Absent (P/M/A) call for each probe. Suited to heterogeneous microarray collections.
17
Binary approach based on Present/Absent
Step 1: Prepare data. Step 2: Compute the p-value of the number of observed matches. Step 3: Multiple testing correction.
18
Binary approach based on Present/Absent
Step 1: Obtain P/M/A calls for the probes. Each call is associated with a p-value, so a filter can be applied. Encode the P/M/A calls as binary vectors: P as 1, M and A as 0.
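A minimal Python sketch of the encoding, assuming each probe's calls arrive as a list of the strings "P", "M" and "A", one per chip (a hypothetical input format). An unexpected call string raises `KeyError` rather than being silently mapped.

```python
def encode_calls(calls):
    """Encode MAS 5.0 detection calls as a binary vector: P -> 1; M and A -> 0."""
    mapping = {"P": 1, "M": 0, "A": 0}
    return [mapping[c] for c in calls]

print(encode_calls(["P", "M", "A", "P"]))
```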
19
Binary approach based on Present/Absent
Step 2: Compute the p-value of the number of matches. The goal is to find an improbably high number of matches (or mismatches) between the binary vectors of two probes. For example, across 12 chips, probes x and y match on 11 of 12 chips, while probes x and z mismatch on 11 of 12.
20
Binary approach based on Present/Absent
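Step 2: Compute the probability of observing k or more matches by chance from the binomial distribution B(n, p). First, the probability of a match: $p = p_x p_y + (1 - p_x)(1 - p_y)$, where $p_x$ is the fraction of 1s (Present) for probe x and $p_y$ the fraction of 1s (Present) for probe y. Then $P(K \ge k) = \sum_{i=k}^{n} \binom{n}{i} p^i (1-p)^{n-i}$.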
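A Python sketch of the exact binomial test on two binary probe vectors. It takes the per-chip match probability as p = p_x·p_y + (1 - p_x)(1 - p_y) from the fractions of Present calls, and sums the upper tail of B(n, p). The mismatch variant, which uses 1 - p as the per-chip probability, is an assumption that mirrors the "or mismatches" case on the previous slide.

```python
import math

def match_probability(bx, by):
    """Chance that two chips agree if the probes were independent."""
    px = sum(bx) / len(bx)
    py = sum(by) / len(by)
    return px * py + (1 - px) * (1 - py)

def binomial_tail(k, n, p):
    """P(K >= k) for K ~ B(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def match_p_value(bx, by):
    """P-value for observing at least this many matches by chance."""
    n = len(bx)
    k = sum(1 for a, b in zip(bx, by) if a == b)
    return binomial_tail(k, n, match_probability(bx, by))

def mismatch_p_value(bx, by):
    """P-value for observing at least this many mismatches by chance."""
    n = len(bx)
    k = sum(1 for a, b in zip(bx, by) if a != b)
    return binomial_tail(k, n, 1 - match_probability(bx, by))
```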
21
Binary approach based on Present/Absent
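Step 2: For large n, the binomial tail can be approximated with the normal distribution: $P(K \ge k) \approx 1 - \Phi\left(\frac{k - np}{\sqrt{np(1-p)}}\right)$, where n is the number of chips, p the match probability and Φ the standard normal cumulative distribution function.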
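A Python sketch of the normal approximation to P(K ≥ k) for K ~ B(n, p), using mean np and standard deviation sqrt(np(1 - p)), with the standard normal upper tail written via `math.erfc`. The optional continuity correction (shifting k by 0.5) is an addition of this sketch, not something stated on the slide.

```python
import math

def normal_tail_approx(k, n, p, continuity=True):
    """Approximate P(K >= k), K ~ B(n, p), by the normal upper tail."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    x = k - 0.5 if continuity else k
    z = (x - mean) / sd
    # 1 - Phi(z) == erfc(z / sqrt(2)) / 2
    return 0.5 * math.erfc(z / math.sqrt(2))
```

For a few hundred chips the corrected approximation is close to the exact tail; for small collections the exact binomial sum is cheap and preferable.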
22
Binary approach based on Present/Absent
Step 3: Multiple testing correction.
23
Mutual information based on Present/Absent
Step 1: Prepare data. Step 2: Compute the MI value for pairs of probes. Step 3: Apply a threshold to the MI.
24
Mutual information based on Present/Absent
Step 1: Obtain P/M/A calls for the probes. Each call is associated with a p-value, so a filter can be applied. Encode the P/M/A calls as binary vectors: either P/M as 1 and A as 0, or P as 1 and M/A as 0.
25
Mutual information based on Present/Absent
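Step 2: Compute the MI value for probes X and Y: $MI(X, Y) = \sum_{x \in \{0,1\}} \sum_{y \in \{0,1\}} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}$, where p(x) and p(y) are the observed frequencies of Present (1) and Absent (0) calls for each probe, and p(x, y) are the frequencies of the joint distribution.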
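A Python sketch of mutual information between two binary probe vectors, estimated from observed frequencies. It uses the natural logarithm, which is an assumption; whatever base is chosen must also be used for the threshold in Step 3. Cells with zero joint count contribute nothing and are simply skipped.

```python
import math
from collections import Counter

def mutual_information(bx, by):
    """MI (in nats) of two binary vectors from their marginal and joint frequencies."""
    n = len(bx)
    joint = Counter(zip(bx, by))
    px = Counter(bx)
    py = Counter(by)
    mi = 0.0
    for (a, b), count in joint.items():
        pab = count / n
        mi += pab * math.log(pab / ((px[a] / n) * (py[b] / n)))
    return mi
```

For binary vectors MI lies between 0 (independent calls) and log 2 (one probe's calls fully determine the other's, with balanced P/A frequencies).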
26
Mutual information based on Present/Absent
Step 3: Use a threshold: probes X and Y are considered correlated if $MI(X, Y) > \frac{1}{n}\log\frac{1}{P}$, where n is the number of chips and $P = 1/p^2$ (p being the number of probes). Threshold from M. Andrecut and S. A. Kauffman, "A simple method for reverse engineering causal networks," J. Phys. A: Math. Gen. 39, No. 46.
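A Python sketch of the threshold MI(X, Y) > (1/n)·log(1/P) with P = 1/p², which simplifies to (2/n)·log(p). The natural logarithm is assumed, so the MI values being compared must also be in nats.

```python
import math

def mi_threshold(n_chips, n_probes):
    """(1/n) * log(1/P) with P = 1/p**2, i.e. (2/n) * log(p)."""
    P = 1.0 / n_probes ** 2
    return math.log(1.0 / P) / n_chips

def is_associated(mi_value, n_chips, n_probes):
    """True when the observed MI exceeds the threshold."""
    return mi_value > mi_threshold(n_chips, n_probes)
```

Since the MI of two binary vectors cannot exceed log 2 ≈ 0.693 nats, small collections may leave no pair above the threshold: with 20,000 probes, (2/n)·ln(20000) drops below log 2 only once n exceeds about 28 chips.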
27
Try the Pearson method in Stembase!
Implemented by Reatha Sandie