(2) Ratio statistics of gene expression levels and applications to microarray data analysis. Bioinformatics, Vol. 18, No. 9, 2002. Yidong Chen, Vishnu Kamat, Edward R. Dougherty, Michael L. Bittner, Paul S. Meltzer, and Jeffrey M. Trent
Outline
Introduction
Ratio Statistics
Quality Metric for Ratio Statistics
Conclusion
Introduction: Motivation
Expression-based analysis for large families of genes has recently become possible owing to the development of cDNA microarrays, which allow simultaneous measurement of transcript levels for thousands of genes. For each spot on a microarray, signals in two channels must be extracted from their backgrounds. This requires algorithms to extract signals arising from tagged mRNA hybridized to arrayed cDNA locations and algorithms to determine the significance of signal ratios.
Introduction: Results
1. Estimation of signal ratios from the two channels and of the significance of those ratios.
2. A refined hypothesis test in which the measured intensities forming the ratio are assumed to be combinations of signal and background. The new method involves a signal-to-noise ratio; for a high signal-to-noise ratio, the new test reduces (to a close approximation) to the original test. The effect of a low signal-to-noise ratio on the ratio statistics constitutes the main theme of the paper.
3. A quality metric formulated for spots.
Ratio Statistics
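Ratio Statistics assuming a constant coefficient of variation
Consider a microarray having n genes, with red and green fluorescent expression values labeled by $R_1, R_2, \dots, R_n$ and $G_1, G_2, \dots, G_n$, respectively.
Hypothesis test: $H_0: \mu_{R_k} = \mu_{G_k}$
Assumption: $\sigma_{R_k} = c\,\mu_{R_k}$ and $\sigma_{G_k} = c\,\mu_{G_k}$, where $c$ is the common coefficient of variation (cv).
Ratio Statistics assuming a constant coefficient of variation (cont.)
Ratio test statistic: $T_k = R_k / G_k$
Assuming $R_k$ and $G_k$ to be normally and identically distributed, $T_k$ has the density function
$$f_{T_k}(t; c) = \frac{(1+t)\sqrt{1+t^2}}{c\,(1+t^2)^2\sqrt{2\pi}} \exp\left[-\frac{(t-1)^2}{2c^2(1+t^2)}\right]$$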
Figure panels: self-self experiment; duplicate
Ratio Statistics assuming a constant coefficient of variation (cont.)
Confidence interval
1. The confidence interval is obtained by integrating the ratio density function.
2. The confidence interval is determined by the parameter c; one can estimate c either from preselected housekeeping genes or from a set of duplicate genes.
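Integrating the ratio density to obtain a confidence interval can be sketched numerically. The sketch below assumes the constant-cv ratio density $f_T(t; c) = \frac{(1+t)\sqrt{1+t^2}}{c(1+t^2)^2\sqrt{2\pi}} \exp\left[-\frac{(t-1)^2}{2c^2(1+t^2)}\right]$; the function names, the integration grid, and the value c = 0.2 are illustrative assumptions, not values taken from the slides.

```python
import math

def ratio_density(t, c):
    """Assumed density of T = R/G under H0 with a constant cv c."""
    s = 1.0 + t * t
    coeff = (1.0 + t) * math.sqrt(s) / (c * s * s * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((t - 1.0) ** 2) / (2.0 * c * c * s))

def ratio_confidence_interval(c, level=0.99, t_max=50.0, step=1e-3):
    """Integrate the density on (0, t_max) with the trapezoid rule
    and return the equal-tail interval at the requested level."""
    n = int(t_max / step)
    ts = [i * step for i in range(n + 1)]
    fs = [ratio_density(t, c) for t in ts]
    cdf = [0.0]
    for i in range(1, n + 1):
        cdf.append(cdf[-1] + 0.5 * (fs[i - 1] + fs[i]) * step)
    total = cdf[-1]  # renormalize: the grid is truncated at t_max

    def quantile(p):
        target = p * total
        for i in range(1, n + 1):
            if cdf[i] >= target:
                # linear interpolation inside the grid cell
                frac = (target - cdf[i - 1]) / (cdf[i] - cdf[i - 1])
                return ts[i - 1] + frac * step
        return ts[-1]

    alpha = (1.0 - level) / 2.0
    return quantile(alpha), quantile(1.0 - alpha)

# Hypothetical cv value, e.g. one estimated from housekeeping genes.
lower, upper = ratio_confidence_interval(c=0.2)
print(lower, upper)
```

Because T and 1/T have the same distribution under this density, the two endpoints are approximately reciprocal, which gives a quick sanity check on the integration.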
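Ratio Statistics for low signal-to-noise ratio
The actual expression intensity measurement is of the form
$$R_k = (SR_k + BR_k) - \mu_{BR_k}, \qquad G_k = (SG_k + BG_k) - \mu_{BG_k},$$
where $SR_k$ and $SG_k$ are the signal intensities, $BR_k$ and $BG_k$ are the background levels, and $\mu_{BR_k}$ and $\mu_{BG_k}$ are the mean background levels.
Ratio Statistics for low signal-to-noise ratio (cont.)
Null hypothesis of interest: $H_0: \mu_{SR_k} = \mu_{SG_k}$
Test statistic: $T_k = \dfrac{R_k}{G_k} = \dfrac{SR_k + BR_k - \mu_{BR_k}}{SG_k + BG_k - \mu_{BG_k}}$
Ratio Statistics for low signal-to-noise ratio (cont.)
Major differences:
1. The assumption of a constant cv applies to $SR_k$ and $SG_k$, not to $R_k$ and $G_k$.
2. The density of $T_k$ derived under the constant-cv assumption is not applicable.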
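SNR (signal-to-noise ratio)
Assuming that the signal and the background in each channel are independent, the variance of a measured intensity is the sum of the signal variance and the background variance.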
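The effect of a low signal-to-noise ratio on the ratio can be illustrated with a small Monte Carlo sketch. It assumes each measured intensity is a normal signal with constant cv plus an independent, zero-mean normal background residual (the background after subtracting its mean), and it takes SNR to mean mean signal divided by background standard deviation. That definition, the function names, and all numeric values are assumptions made for illustration.

```python
import random

def simulate_ratios(mu_signal, c, sigma_bg, n=20000, seed=0):
    """Simulate T = R/G under H0 (equal signal means in both channels)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n):
        # signal with constant cv, plus background residual with mean zero
        r = rng.gauss(mu_signal, c * mu_signal) + rng.gauss(0.0, sigma_bg)
        g = rng.gauss(mu_signal, c * mu_signal) + rng.gauss(0.0, sigma_bg)
        if r > 0 and g > 0:  # keep only spots with positive intensities
            ratios.append(r / g)
    ratios.sort()
    return ratios

def empirical_interval(sorted_values, level=0.99):
    """Equal-tail empirical interval from sorted samples."""
    alpha = (1.0 - level) / 2.0
    n = len(sorted_values)
    return sorted_values[int(alpha * n)], sorted_values[int((1.0 - alpha) * n) - 1]

# Hypothetical settings: SNR = 100 versus SNR = 2, both with cv = 0.2.
high = empirical_interval(simulate_ratios(mu_signal=10000.0, c=0.2, sigma_bg=100.0))
low = empirical_interval(simulate_ratios(mu_signal=200.0, c=0.2, sigma_bg=100.0))
print("high SNR interval:", high)
print("low SNR interval:", low)
```

At high SNR the background term is negligible and the interval stays close to the constant-cv result; at low SNR the background variance dominates and the interval widens.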
The Expression intensity scatter plot
Confidence interval for the test statistics Assumption:
Confidence interval for the test statistics (cont.) Under the assumption of constant cv for the signal (without the background),
The 99% confidence interval for the ratio statistic
Correction of background estimation Owing to interaction between the fluorescent signal and background, local-background estimation is often biased. To estimate the bias difference, we find the relationship between the red and green intensities under the null hypothesis by assuming a linear relation, G = aR+b.
Correction of background estimation (cont.)
Simulation
1. Generate 10,000 data points from an exponential distribution with mean 2,000 to simulate 10,000 gene expression levels.
2. Simulate the intensity measurement for each channel using a normal distribution whose mean is the intensity drawn from the exponential distribution, with a constant cv.
3. Simulate the background level by a normal distribution:
(1) no bias: background level ~ N(0, 100)
(2) some bias: background level ~ N(b, 100)
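The simulation steps can be sketched as follows. Several choices here are assumptions made only so the sketch runs: the cv value of 0.2, reading the 100 in N(b, 100) as a standard deviation, applying the bias b to the green channel only, and the value b = 50. The function name is hypothetical.

```python
import random

def simulate_channels(n=10000, mean_level=2000.0, cv=0.2,
                      bg_sd=100.0, bias=0.0, seed=0):
    """Simulate red/green intensities under the null hypothesis."""
    rng = random.Random(seed)
    red, green = [], []
    for _ in range(n):
        # step 1: true expression level from an exponential distribution
        level = rng.expovariate(1.0 / mean_level)
        # step 2: per-channel measurement with a constant cv
        r_signal = rng.gauss(level, cv * level)
        g_signal = rng.gauss(level, cv * level)
        # step 3: background residual; bias shifts the green channel (assumed)
        red.append(r_signal + rng.gauss(0.0, bg_sd))
        green.append(g_signal + rng.gauss(bias, bg_sd))
    return red, green

red0, green0 = simulate_channels(bias=0.0, seed=1)    # (1) no bias
red1, green1 = simulate_channels(bias=50.0, seed=1)   # (2) some bias
mean = lambda xs: sum(xs) / len(xs)
print("mean G - R, no bias:", mean(green0) - mean(red0))
print("mean G - R, biased:", mean(green1) - mean(red1))
```

Plotting green against red on log axes for the biased case is one way to look for the bend at low intensities that a background offset produces.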
Scatter plot of simulated expression data: dog-leg effect
Correction of background estimation (cont.) To fit G = aR + b, we employ a chi-square fitting method that minimizes
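A minimal sketch of a chi-square line fit follows. It assumes the common errors-in-both-coordinates objective $\chi^2(a, b) = \sum_k (G_k - aR_k - b)^2 / (\sigma_{G_k}^2 + a^2\sigma_{R_k}^2)$; whether this matches the exact objective used in the paper is an assumption, as are the function names, the slope search bracket, and the synthetic data. For a fixed slope the optimal intercept is a weighted mean, so only a one-dimensional search over the slope is needed.

```python
import math
import random

def chi_square(a, r, g, var_r, var_g):
    """Return (chi2, b) for slope a, with b at its optimum for that slope."""
    w = [1.0 / (vg + a * a * vr) for vr, vg in zip(var_r, var_g)]
    b = sum(wi * (gi - a * ri) for wi, ri, gi in zip(w, r, g)) / sum(w)
    chi2 = sum(wi * (gi - a * ri - b) ** 2 for wi, ri, gi in zip(w, r, g))
    return chi2, b

def fit_line(r, g, var_r, var_g, a_lo=0.2, a_hi=5.0, iters=80):
    """Golden-section search over the slope.
    Assumes chi2 has a single minimum inside [a_lo, a_hi]."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = a_hi - phi * (a_hi - a_lo)
    x2 = a_lo + phi * (a_hi - a_lo)
    f1 = chi_square(x1, r, g, var_r, var_g)[0]
    f2 = chi_square(x2, r, g, var_r, var_g)[0]
    for _ in range(iters):
        if f1 < f2:  # minimum lies in [a_lo, x2]
            a_hi, x2, f2 = x2, x1, f1
            x1 = a_hi - phi * (a_hi - a_lo)
            f1 = chi_square(x1, r, g, var_r, var_g)[0]
        else:        # minimum lies in [x1, a_hi]
            a_lo, x1, f1 = x1, x2, f2
            x2 = a_lo + phi * (a_hi - a_lo)
            f2 = chi_square(x2, r, g, var_r, var_g)[0]
    a = 0.5 * (a_lo + a_hi)
    return a, chi_square(a, r, g, var_r, var_g)[1]

# Synthetic check: true relation G = 1.0 * R + 50 with noise in both channels.
rng = random.Random(1)
truth = [100.0 + 5.0 * i for i in range(2000)]
red = [x + rng.gauss(0.0, 100.0) for x in truth]
green = [x + 50.0 + rng.gauss(0.0, 100.0) for x in truth]
variances = [100.0 ** 2] * len(truth)
a_hat, b_hat = fit_line(red, green, variances, variances)
print(a_hat, b_hat)
```

The fitted intercept estimates the bias difference between the channels; with real data the per-spot variances would have to be estimated rather than known.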
Quality Metric for Ratio Statistics For a given cDNA target, the following factors affect ratio measurement quality: (1) Weak fluorescent intensities (2) A smaller than normal detected target area (3) A very high local background level (4) A high standard deviation of target intensity
(1) Fluorescent intensity measurement quality Under the null hypothesis, the signal means are equal, so that
(2) Target area measurement quality
(3) Background flatness quality Define background flatness
(4) Signal intensity consistency quality
Typical target shapes (figure panels), labeled by cv: 0.48, 0.45, 0.31, 0.81, 0.98, 0.59
(4) Signal intensity consistency quality (cont.)
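The cv values attached to the target shapes can be computed from the pixel intensities inside a target. A minimal sketch follows, assuming the cv is the population standard deviation of the target pixels divided by their mean; the function name and the pixel values are hypothetical.

```python
import statistics

def intensity_cv(pixels):
    """Coefficient of variation of the pixel intensities within one target."""
    mean = statistics.mean(pixels)
    if mean <= 0:
        raise ValueError("mean intensity must be positive")
    # population standard deviation (an assumption; the sample form is similar)
    return statistics.pstdev(pixels) / mean

# Hypothetical targets: a fairly uniform spot and a patchy one.
uniform_spot = [980, 1010, 1005, 995, 1000, 1012]
patchy_spot = [150, 2100, 300, 1800, 90, 1600]
print(intensity_cv(uniform_spot), intensity_cv(patchy_spot))
```

A higher cv indicates a less consistent target, matching factor (4) in the list of quality factors.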