Published by Tamsyn Shelton. Modified over 9 years ago.
1
(2) Ratio statistics of gene expression levels and applications to microarray data analysis
Bioinformatics, Vol. 18, No. 9, 2002
Yidong Chen, Vishnu Kamat, Edward R. Dougherty, Michael L. Bittner, Paul S. Meltzer, and Jeffrey M. Trent
2
Outline
1. Introduction
2. Ratio Statistics
3. Quality Metric for Ratio Statistics
4. Conclusion
3
Introduction: Motivation
Expression-based analysis for large families of genes has recently become possible owing to the development of cDNA microarrays, which allow simultaneous measurement of transcript levels for thousands of genes.
For each spot on a microarray, signals in two channels must be extracted from their backgrounds. This requires algorithms to extract signals arising from tagged mRNA hybridized to arrayed cDNA locations and algorithms to determine the significance of signal ratios.
4
Introduction: Results
1. Estimation of signal ratios from the two channels, and of the significance of those ratios.
2. A refined hypothesis test in which the measured intensities forming the ratio are assumed to be combinations of signal and background. The new method involves a signal-to-noise ratio (SNR); for a high SNR the new test reduces, to a close approximation, to the original test. The effect of a low SNR on the ratio statistics is the main theme of the paper.
3. A quality metric formulated for spots.
5
Ratio Statistics
6
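Ratio Statistics assuming a constant coefficient of variation
Consider a microarray having n genes, with red and green fluorescent expression values labeled by $R_1, \dots, R_n$ and $G_1, \dots, G_n$, respectively.
Hypothesis test: $H_0: \mu_{R_k} = \mu_{G_k}$ versus $H_1: \mu_{R_k} \neq \mu_{G_k}$
Assumption: $\sigma_{R_k} = c\,\mu_{R_k}$ and $\sigma_{G_k} = c\,\mu_{G_k}$, where $c$ is the common coefficient of variation (cv).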
7
Ratio Statistics assuming a constant coefficient of variation (cont.)
Ratio test statistic: $T_k = R_k / G_k$
Assuming $R_k$ and $G_k$ to be normally and identically distributed, $T_k$ has the density function
$f_T(t; c) = \dfrac{(1+t)\sqrt{1+t^2}}{c\,(1+t^2)^2\sqrt{2\pi}} \exp\!\left[-\dfrac{(t-1)^2}{2c^2(1+t^2)}\right]$
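The ratio density under a constant cv can be sketched numerically as follows. This is a minimal illustration, not the authors' code: it assumes the closed form that follows from treating $R - tG$ as normal with mean $\mu(1-t)$ and variance $c^2\mu^2(1+t^2)$, and the function names (`ratio_density`, `ratio_cdf`, `trapezoid`) are hypothetical.

```python
import math
from statistics import NormalDist


def ratio_density(t: float, c: float) -> float:
    """Assumed density of T = R/G under H0 with common cv c."""
    s = 1.0 + t * t
    # (1+t)*sqrt(1+t^2) / (c*(1+t^2)^2) simplifies to (1+t) / (c*(1+t^2)^1.5)
    scale = (1.0 + t) / (c * s ** 1.5 * math.sqrt(2.0 * math.pi))
    return scale * math.exp(-((t - 1.0) ** 2) / (2.0 * c * c * s))


def ratio_cdf(t: float, c: float) -> float:
    """Matching CDF: standard normal CDF of (t-1) / (c*sqrt(1+t^2))."""
    return NormalDist().cdf((t - 1.0) / (c * math.sqrt(1.0 + t * t)))


def trapezoid(f, a: float, b: float, n: int = 20000) -> float:
    """Plain trapezoidal rule, used here only as a sanity check."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


if __name__ == "__main__":
    c = 0.2  # illustrative cv, not a value taken from the slides
    area = trapezoid(lambda t: ratio_density(t, c), 0.0, 50.0)
    print("mass of the density on (0, 50):", area)
```

For small c almost all of the probability mass lies on positive ratios, so the numerical integral is close to 1; for large c the normal approximation lets the denominator go negative and the closed form becomes less trustworthy.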
8
[Figure panels: self-self experiment; duplicate]
9
Ratio Statistics assuming a constant coefficient of variation (cont.)
Confidence interval
1. The confidence interval is obtained by integrating the ratio density function.
2. The confidence interval is determined by the parameter c, which can be estimated either from pre-selected housekeeping genes or from a set of duplicate genes.
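The two steps above can be sketched in code. This is a hedged illustration under stated assumptions: the ratio CDF is taken to be $\Phi\big((t-1)/(c\sqrt{1+t^2})\big)$, the estimator of c is the maximum-likelihood estimate implied by that model, and the names `estimate_c` and `ratio_confidence_interval` are hypothetical.

```python
import math
from statistics import NormalDist


def estimate_c(ratios):
    """Estimate the cv c from ratios of genes expected to satisfy H0
    (for example housekeeping or duplicate genes).

    Assumption: maximizing the likelihood of the ratio density gives
    c^2 = mean((t - 1)^2 / (t^2 + 1)).
    """
    n = len(ratios)
    return math.sqrt(sum((t - 1.0) ** 2 / (t * t + 1.0) for t in ratios) / n)


def ratio_confidence_interval(c, level=0.99):
    """Two-sided interval for T = R/G under H0, by inverting the assumed CDF.

    Solving (t - 1)^2 = (z*c)^2 * (1 + t^2) gives two roots whose product is 1.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    k = (z * c) ** 2
    if k >= 1.0:
        raise ValueError("c is too large for a finite upper bound at this level")
    root = math.sqrt(1.0 - (1.0 - k) ** 2)
    lower = (1.0 - root) / (1.0 - k)
    upper = (1.0 + root) / (1.0 - k)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical ratios from duplicate spots, for illustration only.
    duplicate_ratios = [0.9, 1.1, 1.05, 0.8, 1.3, 0.95]
    c_hat = estimate_c(duplicate_ratios)
    print("estimated c:", c_hat)
    print("99% interval:", ratio_confidence_interval(c_hat))
```

Because the two bounds are reciprocals of each other, the interval is symmetric on a log-ratio scale, which matches the usual practice of treating a k-fold increase and a k-fold decrease alike.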
10
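Ratio Statistics for low signal-to-noise ratio
The actual expression intensity measurement is of the form
$R_k = (SR_k + BR_k) - \mu_{BR_k}$,
where $SR_k$ is the signal of gene $k$, $BR_k$ is the background level, and $\mu_{BR_k}$ is the mean background level; the green-channel measurement $G_k$ is defined in the same way.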
11
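Ratio Statistics for low signal-to-noise ratio (cont.)
Null hypothesis of interest: $H_0: \mu_{SR_k} = \mu_{SG_k}$ (the mean signals in the two channels are equal)
Test statistic: $T_k = \dfrac{R_k}{G_k} = \dfrac{SR_k + BR_k - \mu_{BR_k}}{SG_k + BG_k - \mu_{BG_k}}$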
12
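Ratio Statistics for low signal-to-noise ratio (cont.)
Major differences:
1. The assumption of a constant cv applies to the signals $SR_k$ and $SG_k$, not to the background-subtracted measurements $R_k$ and $G_k$.
2. The density of $T_k$ derived under a constant cv for $R_k$ and $G_k$ is not applicable.
SNR (signal-to-noise ratio): $\mathrm{SNR}_k = \mu_{SR_k} / \sigma_{BR_k}$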
13
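SNR (signal-to-noise ratio) (cont.)
Assuming that the signal $SR_k$ and the background $BR_k$ are independent, the variance of the background-subtracted measurement $R_k$ is
$\sigma_{R_k}^2 = \sigma_{SR_k}^2 + \sigma_{BR_k}^2 = c^2\mu_{SR_k}^2 + \sigma_{BR_k}^2$,
so that, with $\mathrm{SNR}_k = \mu_{SR_k}/\sigma_{BR_k}$, the coefficient of variation of $R_k$ is $\sqrt{c^2 + 1/\mathrm{SNR}_k^2}$.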
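A small sketch of how the background inflates the cv of a measurement at low SNR. It assumes a measurement of the form signal plus background minus mean background, a signal with constant cv c, an independent background, and SNR defined as mean signal divided by background standard deviation; the function name `effective_cv` is hypothetical.

```python
import math


def effective_cv(c: float, snr: float) -> float:
    """cv of a background-subtracted measurement.

    Assumed model: variance = (c * mean_signal)^2 + background_sd^2 and
    mean = mean_signal, so cv = sqrt(c^2 + 1 / snr^2).
    """
    return math.sqrt(c * c + 1.0 / (snr * snr))


if __name__ == "__main__":
    c = 0.2  # illustrative signal cv
    for snr in (1.0, 2.0, 5.0, 10.0, 100.0):
        print(f"SNR = {snr:6.1f}  effective cv = {effective_cv(c, snr):.4f}")
```

As the SNR grows, the effective cv approaches c, which is consistent with the statement that the refined test reduces to the original constant-cv test for strong signals.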
14
The expression intensity scatter plot
15
Confidence interval for the test statistic
Assumption:
16
Confidence interval for the test statistic (cont.)
Under the assumption of a constant cv for the signal (without the background),
17
The 99% confidence interval for the ratio statistic
18
Correction of background estimation
Owing to interaction between the fluorescent signal and background, local-background estimation is often biased. To estimate the bias difference, we find the relationship between the red and green intensities under the null hypothesis by assuming a linear relation, $G = aR + b$.
19
Correction of background estimation (cont.)
Simulation
1. Generate 10,000 data points from an exponential distribution with mean 2,000 to simulate 10,000 gene expression levels.
2. Simulate the intensity measurement for each channel with a normal distribution whose mean is the intensity drawn from the exponential distribution and whose cv is a constant 0.2.
3. Simulate the background level with a normal distribution:
(1) no bias: background level ~ N(0, 100)
(2) some bias: background level ~ N(b, 100)
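The simulation steps above can be sketched as follows. Several details are assumptions rather than facts from the slides: 2,000 is read as the mean of the exponential distribution, the 100 in N(0, 100) is read as a variance (standard deviation 10), both channels share the same true expression level (the null hypothesis), and the bias b is applied to the green channel only. The function name `simulate_channels` is hypothetical.

```python
import random


def simulate_channels(n_genes=10_000, mean_expr=2_000.0, cv=0.2,
                      bg_sd=10.0, bias=0.0, seed=0):
    """Return (red, green) lists of simulated intensities under H0."""
    rng = random.Random(seed)
    red, green = [], []
    for _ in range(n_genes):
        # Step 1: true expression level from an exponential distribution.
        mu = rng.expovariate(1.0 / mean_expr)
        # Step 2: per-channel measurement, normal with constant cv.
        signal_r = rng.gauss(mu, cv * mu)
        signal_g = rng.gauss(mu, cv * mu)
        # Step 3: residual background; a nonzero bias shifts green only (assumed).
        red.append(signal_r + rng.gauss(0.0, bg_sd))
        green.append(signal_g + rng.gauss(bias, bg_sd))
    return red, green


if __name__ == "__main__":
    red, green = simulate_channels(bias=0.0)
    red_b, green_b = simulate_channels(bias=50.0)  # bias value is illustrative
    print("mean red (no bias):", sum(red) / len(red))
    print("mean green (no bias):", sum(green) / len(green))
    print("mean green (bias=50):", sum(green_b) / len(green_b))
```

Plotting green against red on log axes for the biased case is one way to look for the curvature at low intensities that an offset between the channels would produce.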
20
Scatter plot of simulated expression data, showing the dog-leg effect
21
Correction of background estimation (cont.)
To fit $G = aR + b$, we employ a chi-square fitting method that minimizes
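The objective being minimized is not preserved in this transcript. As a hedged sketch only, the code below assumes a common chi-square form for straight-line fits with errors in both variables, $\chi^2(a, b) = \sum_k (G_k - aR_k - b)^2 / (\sigma_{G_k}^2 + a^2\sigma_{R_k}^2)$, with per-spot standard deviations supplied by the caller. The function names, the search bracket for a, and the use of a golden-section search are all illustrative choices, not the authors' method.

```python
import math


def chi_square(a, b, r, g, sd_r, sd_g):
    """Assumed objective: squared residuals weighted by errors in both channels."""
    return sum((gi - a * ri - b) ** 2 / (sgi ** 2 + (a * sri) ** 2)
               for ri, gi, sri, sgi in zip(r, g, sd_r, sd_g))


def best_intercept(a, r, g, sd_r, sd_g):
    """For a fixed slope a, the minimizing b is a weighted mean of (g - a*r)."""
    weights = [1.0 / (sgi ** 2 + (a * sri) ** 2) for sri, sgi in zip(sd_r, sd_g)]
    num = sum(w * (gi - a * ri) for w, ri, gi in zip(weights, r, g))
    return num / sum(weights)


def fit_line(r, g, sd_r, sd_g, a_lo=0.1, a_hi=10.0, iters=80):
    """Golden-section search over the slope; assumes one minimum in [a_lo, a_hi]."""
    def profile(a):
        return chi_square(a, best_intercept(a, r, g, sd_r, sd_g), r, g, sd_r, sd_g)

    phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = a_lo, a_hi
    x1, x2 = hi - phi * (hi - lo), lo + phi * (hi - lo)
    f1, f2 = profile(x1), profile(x2)
    for _ in range(iters):
        if f1 < f2:
            hi, x2, f2 = x2, x1, f1
            x1 = hi - phi * (hi - lo)
            f1 = profile(x1)
        else:
            lo, x1, f1 = x1, x2, f2
            x2 = lo + phi * (hi - lo)
            f2 = profile(x2)
    a = 0.5 * (lo + hi)
    return a, best_intercept(a, r, g, sd_r, sd_g)


if __name__ == "__main__":
    # Toy data lying exactly on G = 2R + 5, for illustration only.
    r = [10.0, 50.0, 100.0, 400.0, 900.0]
    g = [2.0 * x + 5.0 for x in r]
    sd = [1.0] * len(r)
    print(fit_line(r, g, sd, sd))
```

In this sketch the fitted intercept b plays the role of the bias difference between the two channels, and a departure of a from 1 would indicate a gain difference.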
22
Quality Metric for Ratio Statistics
For a given cDNA target, the following factors affect ratio measurement quality:
(1) Weak fluorescent intensities
(2) A smaller than normal detected target area
(3) A very high local background level
(4) A high standard deviation of target intensity
23
(1) Fluorescent intensity measurement quality
Under the null hypothesis, the signal means are equal, so that
24
(2) Target area measurement quality
25
(3) Background flatness quality
Define background flatness
26
(4) Signal intensity consistency quality
Typical target shapes [figure panels], with cv = 0.48, 0.45, 0.31, 0.81, 0.98, 0.59
27
(4) Signal intensity consistency quality (cont.)