
1 Gene Expression Index Stat 115 2012

2 Outline
Gene expression index
  –MAS4, average
  –MAS5, Tukey biweight
  –dChip, model based, multi-array
  –RMA, model based, multi-array
  –Method comparison (Latin Square spike-in experiment)
  –Importance of probe mapping
These are perhaps the most popular of the many methods for normalizing and computing expression measures from Affymetrix data. Currently over 50 methods are described and compared at http://affycomp.biostat.jhsph.edu/.

3 cDNA Microarrays
Fold change: ratio Cy5 / Cy3
Log2(Cy5 / Cy3): negative when the ratio is below 1 (Cy5 < Cy3)

Log2 ratios, genes (rows) by arrays (columns):

Gene   array 1   array 2   array 3   array 4   array 5   …
1        0.46      0.30      0.80      1.51      0.90    ...
2       -0.10      0.49      0.24      0.06      0.46    ...
3        0.15      0.74      0.04      0.10      0.20    ...
4       -0.45     -1.03     -0.79     -0.56     -0.32    ...
5       -0.06      1.06      1.35      1.09     -1.09    ...

4 Affymetrix Microarray Expression Index
How to summarize the probes in a probeset?
A brighter PM usually carries more information, but not always (cross-hybridization can inflate it).

5 MAS4
GeneChip® older software: Microarray Suite 4.0 uses AvgDiff, the average PM − MM difference over a set A
A: a set of suitable probe pairs chosen by the software
  –Remove highest/lowest
  –Calculate mean, sd from remaining probes
  –Eliminate probes more than 3 sd from mean
Drawbacks (naïve algorithm):
  –Can omit 30-40% of probes
  –Can give negative values
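The AvgDiff steps above can be sketched in a few lines. This is one reading of the slide, not Affymetrix's actual implementation: the function name `avg_diff` is hypothetical, and the choice to re-admit the highest and lowest differences when they fall within 3 sd is an assumption.

```python
from statistics import mean, stdev


def avg_diff(pm, mm):
    """Sketch of a MAS4-style AvgDiff for one probe set (assumed reading)."""
    # One PM - MM difference per probe pair.
    diffs = sorted(p - m for p, m in zip(pm, mm))
    # Remove the highest and lowest difference before estimating mean and sd.
    trimmed = diffs[1:-1]
    mu, sd = mean(trimmed), stdev(trimmed)
    # A = pairs whose difference lies within 3 sd of the trimmed mean.
    kept = [d for d in diffs if abs(d - mu) <= 3 * sd]
    return mean(kept)


# One probe set with an obvious outlier pair (difference of 5000).
pm = [300, 310, 305, 295, 302, 298, 5200, 150]
mm = [200, 200, 200, 200, 200, 200, 200, 200]
signal = avg_diff(pm, mm)

# When MM exceeds PM for most pairs, the index goes negative.
negative_signal = avg_diff([10, 12, 11, 13, 9], [20, 22, 21, 23, 19])
```

The second call illustrates the drawback listed on the slide: nothing prevents the averaged difference from being negative.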

6 MAS5
GeneChip® newest version
CT* (change threshold): a version of MM that is never bigger than PM
  –If MM < PM, CT* = MM
  –If MM > PM, estimate the typical-case MM for that PM: Tukey biweight of MMs with similar PM values (~70% of PM)
  –If the typical MM is also > PM, set CT* slightly below PM (PM − δ)
Robust weighting to down-weight outliers
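To make "robust weighting" concrete, here is a minimal sketch of a one-step Tukey biweight average. The tuning constant `c = 5` and the small `eps` guard are assumptions based on commonly quoted defaults, and the function name is illustrative; MAS5 itself is usually described as applying this kind of average to log-scale PM − CT* values.

```python
from statistics import median


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight average (c and eps are assumed defaults)."""
    m = median(values)
    # Median absolute deviation from the median: a robust scale estimate.
    s = median([abs(v - m) for v in values])
    total_w = 0.0
    total_wv = 0.0
    for v in values:
        u = (v - m) / (c * s + eps)
        # Bisquare weight: smoothly decreasing, zero beyond |u| >= 1.
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        total_w += w
        total_wv += w * v
    return total_wv / total_w


robust = tukey_biweight([1, 2, 3, 4, 100])    # the outlier 100 gets weight 0
symmetric = tukey_biweight([1, 2, 3, 4, 5])   # symmetric data: the center
```

Compared with the plain mean of `[1, 2, 3, 4, 100]` (22), the biweight estimate stays near the bulk of the data.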

7 Li & Wong (dChip)
Important observation: the relative values of probes within a probeset are very stable across multiple samples.

8 Model-Based Expression Index
Look at multiple samples at a time, and give different probes a different weight
Each probe signal is proportional to
  –Amount of target in the sample: $\theta_i$
  –Affinity of the specific probe sequence to the target: $\phi_j$
[Figure: probes 1, 2, 3 measured in sample 1 and sample 2, with sample amounts $\theta_1$, $\theta_2$]

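9 Li & Wong (dChip) Model
Probe signal for sample i and probe j $= \theta_i \phi_j + \epsilon_{ij}$
  –$\theta_i$: concentration
  –$\phi_j$: probe affinity
  –$\epsilon_{ij}$: error
Iteratively estimate $\theta_i$ and $\phi_j$ to minimize $\epsilon_{ij}$
Try to minimize the sum of errors
[Figure: samples 1, 2, 3, … by probes 1, 2, 3, …, with probe affinities $\phi_1$, $\phi_2$, $\phi_3$]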
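The iterative estimation can be illustrated with plain alternating least squares on a samples-by-probes matrix. This is a sketch only: it omits dChip's outlier handling, the function name `fit_multiplicative_model` is hypothetical, and scaling the affinities so that their squares sum to the number of probes is an assumed identifiability constraint.

```python
def fit_multiplicative_model(y, n_iter=50):
    """Alternating least squares for y[i][j] ~ theta[i] * phi[j] (sketch)."""
    n_samples, n_probes = len(y), len(y[0])
    phi = [1.0] * n_probes

    def solve_theta(phi):
        # Least-squares theta_i for fixed phi.
        ss = sum(p * p for p in phi)
        return [sum(y[i][j] * phi[j] for j in range(n_probes)) / ss
                for i in range(n_samples)]

    for _ in range(n_iter):
        theta = solve_theta(phi)
        # Least-squares phi_j for fixed theta.
        ss = sum(t * t for t in theta)
        phi = [sum(y[i][j] * theta[i] for i in range(n_samples)) / ss
               for j in range(n_probes)]
        # Assumed constraint: sum of phi_j^2 equals the number of probes.
        scale = (sum(p * p for p in phi) / n_probes) ** 0.5
        phi = [p / scale for p in phi]
    return solve_theta(phi), phi


# Noise-free toy data: 3 samples, 4 probes.
true_theta = [100.0, 200.0, 400.0]
true_phi = [0.5, 1.0, 1.5, 1.0]
y = [[t * p for p in true_phi] for t in true_theta]
theta, phi = fit_multiplicative_model(y)
```

On noise-free data the fit recovers the sample ratios and probe ratios exactly; only the overall scale depends on the constraint.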

10 RMA = Robust Multi-chip Analysis
Irizarry & Speed, 2003
Eliminates MM probes
Probe intensity background adjustment
Quantile normalize the background-adjusted PM
Take log of PM
Robust probe summary
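The quantile normalization step forces every array to share the same empirical distribution. A minimal sketch follows, assuming equal-length arrays and arbitrary tie-breaking; production implementations treat ties more carefully, and the function name is illustrative.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays, one list per chip (sketch)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across chips.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays)
                 for k in range(n)]
    normalized = []
    for a in arrays:
        # Rank each probe within its chip, then substitute the reference value.
        order = sorted(range(n), key=lambda k: a[k])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(chips)
```

After normalization each chip contains the same set of values, and each probe keeps the rank it had within its own chip.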

11 RMA Background Subtraction
Signal + BG = PM (Signal + Noise = Observed)
Signal ~ exponential; BG ~ normal
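Under this model (exponential signal with rate α, normal background with mean μ and standard deviation σ), the background-adjusted value is the expected signal given the observed PM. The sketch below uses the closed form commonly quoted for this convolution model. The parameter values are made up for illustration; in practice μ, σ and α are estimated from each array, which this sketch does not attempt.

```python
import math


def _pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_signal(o, mu, sigma, alpha):
    """E[signal | observed = o] for exponential signal + normal background."""
    a = o - mu - sigma ** 2 * alpha
    b = sigma
    # Mean of a normal(a, b) distribution truncated to the interval [0, o].
    num = _pdf(a / b) - _pdf((o - a) / b)
    den = _cdf(a / b) + _cdf((o - a) / b) - 1.0
    return a + b * num / den


# Hypothetical parameters, chosen only to show the behavior.
mu, sigma, alpha = 100.0, 20.0, 0.01
observed = [50.0, 100.0, 200.0, 1000.0]
adjusted = [expected_signal(o, mu, sigma, alpha) for o in observed]
```

Unlike a plain subtraction of the background mean, the adjusted values stay positive even when the observed intensity is below μ, so the later log transform is always defined.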

12 RMA Background Subtraction
[Figure: BG distribution]

13 Why Log(PM)
Captures the fact that higher-value probes are more variable
Assume probe noise is comparable on the log scale

14 RMA
For each probe set, $PM_{ij} = \theta_i \phi_j$
Fit the model: $\log_2 n(PM_{ij}) = a_i + b_j + \epsilon_{ij}$
  –$a_i$ is the expression index, $b_j$ is the probe effect
  –$\log_2 n()$ stands for the logarithm after quantile normalization of n samples
Iteratively refit $a_i$ and $b_j$ (similar to dChip)
  –Main difference is to minimize the error on the log PM scale

15 RMA model fitting: Median Polish
For a given probe set with J probe pairs, let $y_{ij}$ denote the background-adjusted, base-2-logged, and quantile-normalized value for GeneChip i and probe j.
Assume $y_{ij} = \mu_i + \alpha_j + e_{ij}$, where $\alpha_1 + \alpha_2 + \dots + \alpha_J = 0$.
  –$\mu_i$: gene expression of the probe set on GeneChip i
  –$\alpha_j$: probe affinity effect for the j-th probe in the probe set
  –$e_{ij}$: residual
Perform Tukey's Median Polish on the matrix of $y_{ij}$ values, with $y_{ij}$ in the i-th row and j-th column.

16 An Example (from Dan Nettleton)
Suppose the following are background-adjusted, log2-transformed, quantile-normalized PM intensities for a single probe set. Determine the final RMA expression measures for this probe set.

                     Probe
GeneChip     1    2    3    4    5
   1         4    3    6    4    7
   2         8    1   10    5   11
   3         6    2    7    8    8
   4         9    4   12    9   12
   5         7    5    9    6   10

17 An Example (continued)

Original matrix            row medians
   4    3    6    4    7        4
   8    1   10    5   11        8
   6    2    7    8    8        7
   9    4   12    9   12        9
   7    5    9    6   10        7

Matrix after removing row medians
   0   -1    2    0    3
   0   -7    2   -3    3
  -1   -5    0    1    1
   0   -5    3    0    3
   0   -2    2   -1    3

18 An Example (continued)

   0   -1    2    0    3
   0   -7    2   -3    3
  -1   -5    0    1    1
   0   -5    3    0    3
   0   -2    2   -1    3
   -------------------------
   0   -5    2    0    3     column medians

Matrix after subtracting column medians
   0    4    0    0    0
   0   -2    0   -3    0
  -1    0   -2    1   -2
   0    0    1    0    0
   0    3    0   -1    0

19 An Example (continued)

                           row medians
   0    4    0    0    0        0
   0   -2    0   -3    0        0
  -1    0   -2    1   -2       -1
   0    0    1    0    0        0
   0    3    0   -1    0        0

Matrix after removing row medians
   0    4    0    0    0
   0   -2    0   -3    0
   0    1   -1    2   -1
   0    0    1    0    0
   0    3    0   -1    0

20 An Example (continued)

   0    4    0    0    0
   0   -2    0   -3    0
   0    1   -1    2   -1
   0    0    1    0    0
   0    3    0   -1    0
   -------------------------
   0    1    0    0    0     column medians

Matrix after subtracting column medians
   0    3    0    0    0
   0   -3    0   -3    0
   0    0   -1    2   -1
   0   -1    1    0    0
   0    2    0   -1    0

21 An Example (continued)

   0    3    0    0    0
   0   -3    0   -3    0
   0    0   -1    2   -1
   0   -1    1    0    0
   0    2    0   -1    0

All row medians and column medians are 0. Thus the median polish procedure has converged. The above is the residual matrix that we will subtract from the original matrix to obtain the fitted values.

22 An Example (continued)

Original matrix
   4    3    6    4    7
   8    1   10    5   11
   6    2    7    8    8
   9    4   12    9   12
   7    5    9    6   10

Residuals from median polish
   0    3    0    0    0
   0   -3    0   -3    0
   0    0   -1    2   -1
   0   -1    1    0    0
   0    2    0   -1    0

Matrix of fitted values (original minus residuals)    row means
   4    0    6    4    7                              4.2 = $\hat{\mu}_1$
   8    4   10    8   11                              8.2 = $\hat{\mu}_2$
   6    2    8    6    9                              6.2 = $\hat{\mu}_3$
   9    5   11    9   12                              9.2 = $\hat{\mu}_4$
   7    3    9    7   10                              7.2 = $\hat{\mu}_5$

The row means are the RMA expression measures for the 5 GeneChips.
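The worked example can be checked with a short script. This is a minimal sketch of the procedure as walked through on the slides (alternately removing row and column medians until all of them are zero); the function names are illustrative, not taken from any package.

```python
from statistics import median


def median_polish_residuals(y, max_iter=10):
    """Alternately sweep out row and column medians; return the residuals."""
    resid = [[float(v) for v in row] for row in y]
    for _ in range(max_iter):
        row_med = [median(row) for row in resid]
        resid = [[v - m for v in row] for row, m in zip(resid, row_med)]
        col_med = [median(col) for col in zip(*resid)]
        resid = [[v - m for v, m in zip(row, col_med)] for row in resid]
        # Converged once every row median and column median is zero.
        if all(m == 0 for m in row_med) and all(m == 0 for m in col_med):
            break
    return resid


def rma_expression(y):
    """Row means of the fitted values (original minus residuals)."""
    resid = median_polish_residuals(y)
    fitted = [[v - r for v, r in zip(row, res_row)]
              for row, res_row in zip(y, resid)]
    return [sum(row) / len(row) for row in fitted]


# Rows are GeneChips 1-5, columns are probes 1-5 (the example matrix).
y = [
    [4, 3, 6, 4, 7],
    [8, 1, 10, 5, 11],
    [6, 2, 7, 8, 8],
    [9, 4, 12, 9, 12],
    [7, 5, 9, 6, 10],
]
residuals = median_polish_residuals(y)
expression = rma_expression(y)
```

Here `expression` reproduces the five row means from the last slide of the example, and `residuals` matches the converged residual matrix.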

23 Method Comparison Standard
Spike-ins: introduce markers with known concentration (intensity) into RNA samples
  –Should cover a broad range of concentrations
  –Run two samples, with and without the spike-in, and see whether the algorithm can detect the spike-in (differential expression)
Dilutions:
  –Serial dilutions: 1:2, 1:4, 1:8…
Latin square spike-in captures both approaches above
Compare both detection accuracy (qualitatively) and the expression index (quantitatively)

24 Latin Square Spike-ins

25 Method Comparison of Spike-in
[Figure: panels for MAS4, MAS 5, dChip, RMA; red numbers indicate spiked genes]

26 Method Comparison Conclusion
No one uses MAS4 now
With fold change, RMA > dChip > MAS5
With p-value, RMA ~ MAS5 > dChip
MAS 5.0 does a good job on abundant genes
dChip and RMA do better on less abundant genes
Affy developed the multi-chip, model-based PLIER, currently open source, although without documentation
All five models are implemented in Bioconductor (open source R packages)

27 214019_at: CCND1....


29 Probe Mapping in Affymetrix Expression Arrays
Inconsistencies in ~5% of NetAffx probe-to-gene annotations (Perez-Iratxeta et al. 2005).
Remapping all the probes with documented human transcripts resulted in the redefinition of ~37% of probes in Affy's newest U133 Plus 2.0 array (Harbig et al. 2005).
  –Provide a new and better .cdf file for probe mapping
Evolving gene/transcript definitions can cause a ~30% difference in the differentially expressed genes (Dai et al. 2005).

30 Acknowledgment
Terry Speed, Rafael Irizarry & group
Kevin Coombes & Keith Baggerly
Erick Rouchka
Wing Wong & Cheng Li
Mark Reimers
Erin Conlon
Larry Hunter
Zhijin Wu
Wei Li

