(2) Ratio statistics of gene expression levels and applications to microarray data analysis Bioinformatics, Vol. 18, no. 9, 2002 Yidong Chen, Vishnu Kamat,


(2) Ratio statistics of gene expression levels and applications to microarray data analysis. Bioinformatics, Vol. 18, no. 9, 2002. Yidong Chen, Vishnu Kamat, Edward R. Dougherty, Michael L. Bittner, Paul S. Meltzer, and Jeffery M. Trent

Outline
- Introduction
- Ratio Statistics
- Quality Metric for Ratio Statistics
- Conclusion

Introduction
- Motivation: Expression-based analysis for large families of genes has recently become possible owing to the development of cDNA microarrays, which allow simultaneous measurement of transcript levels for thousands of genes. For each spot on a microarray, signals in two channels must be extracted from their backgrounds. This requires algorithms to extract signals arising from tagged mRNA hybridized to arrayed cDNA locations and algorithms to determine the significance of signal ratios.

Introduction
- Results:
  1. Estimation of signal ratios from the two channels, and of the significance of those ratios.
  2. A refined hypothesis test in which the measured intensities forming the ratio are assumed to be combinations of signal and background. The new method involves a signal-to-noise ratio; for a high signal-to-noise ratio, the new test reduces (to a close approximation) to the original test. The effect of a low signal-to-noise ratio on the ratio statistics constitutes the main theme of the paper.
  3. A quality metric formulated for spots.

Ratio Statistics

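Ratio Statistics assuming a constant coefficient of variation
- Consider a microarray having n genes, with red and green fluorescent expression values labeled by $R_1, \dots, R_n$ and $G_1, \dots, G_n$, respectively.
- Hypothesis test: $H_0: \mu_{R_k} = \mu_{G_k}$ against $H_1: \mu_{R_k} \neq \mu_{G_k}$.
- Assumption: a constant coefficient of variation $c$ in both channels, $\sigma_{R_k} = c\,\mu_{R_k}$ and $\sigma_{G_k} = c\,\mu_{G_k}$.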

Ratio Statistics assuming a constant coefficient of variation (cont.)
- Ratio test statistic: $T_k = R_k / G_k$.
- Assuming $R_k$ and $G_k$ to be normally and identically distributed, $T_k$ has the density function
  $f_{T_k}(t; c) = \frac{(1+t)\sqrt{1+t^2}}{c\,(1+t^2)^2\sqrt{2\pi}} \exp\left[-\frac{(t-1)^2}{2c^2(1+t^2)}\right]$

- Self-self experiment
- Duplicate

Ratio Statistics assuming a constant coefficient of variation (cont.)
- Confidence interval:
  1. The interval is obtained by integrating the ratio density function.
  2. The confidence interval is determined by the parameter c, which can be derived either from pre-selected housekeeping genes or from a set of duplicate genes.
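The two steps above can be sketched numerically. This is a minimal sketch, not the authors' implementation: it assumes the normal approximation for a ratio of two normal variables with equal means and a common cv c, under which the standardized quantity (t - 1) / (c * sqrt(1 + t^2)) is approximately standard normal, so the density integrates in closed form. The function names are hypothetical, and the estimator of c is simply the maximum-likelihood estimate under that same approximation, applied to ratios from genes expected to be unchanged (for example housekeeping or duplicate spots).

```python
import math


def _phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ratio_density(t, c):
    """Approximate density of T = R/G under H0 with common cv c (t >= 0)."""
    return ((1.0 + t) / (c * (1.0 + t * t) ** 1.5 * math.sqrt(2.0 * math.pi))
            * math.exp(-(t - 1.0) ** 2 / (2.0 * c * c * (1.0 + t * t))))


def mass_below(t, c):
    """Integral of ratio_density from 0 to t, in closed form."""
    z = (t - 1.0) / (c * math.sqrt(1.0 + t * t))
    return _phi(z) - _phi(-1.0 / c)


def confidence_interval(c, level=0.99):
    """Ratio interval (lower, upper) with tail mass (1 - level) / 2 on each side."""
    alpha = 1.0 - level
    lo, hi = 0.0, 1.0
    for _ in range(200):  # bisection on [0, 1]; mass_below is increasing there
        mid = 0.5 * (lo + hi)
        if mass_below(mid, c) < alpha / 2.0:
            lo = mid
        else:
            hi = mid
    lower = 0.5 * (lo + hi)
    # Under this approximation the upper limit is the reciprocal of the lower one.
    return lower, 1.0 / lower


def estimate_c(ratios):
    """Maximum-likelihood estimate of c under the approximate density."""
    return math.sqrt(sum((t - 1.0) ** 2 / (t * t + 1.0) for t in ratios) / len(ratios))


if __name__ == "__main__":
    c_hat = estimate_c([0.9, 1.1, 1.05, 0.95, 1.2, 0.85])  # made-up ratios
    print(c_hat, confidence_interval(c_hat, level=0.99))
```

A ratio falling outside the returned interval would be flagged as significantly changed at the chosen level; the interval widens as c grows.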

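Ratio Statistics for low signal-to-noise ratio
- The actual expression intensity measurement is of the form $R_k = (SR_k + BR_k) - \mu_{BR_k}$ and $G_k = (SG_k + BG_k) - \mu_{BG_k}$, where $SR_k$ and $SG_k$ are the expression signals, $BR_k$ and $BG_k$ the background levels, and $\mu_{BR_k}$, $\mu_{BG_k}$ the mean background levels.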

Ratio Statistics for low signal-to-noise ratio (cont.)
- Null hypothesis of interest (equal signal means): $H_0: \mu_{SR_k} = \mu_{SG_k}$
- Test statistic: $T_k = R_k / G_k$

Ratio Statistics for low signal-to-noise ratio (cont.)
- Major differences:
  1. The assumption of a constant cv applies to the signals $SR_k$ and $SG_k$, not to the measured intensities $R_k$ and $G_k$.
  2. The density of $T_k$ derived under the constant-cv assumption is not applicable.
- SNR (signal-to-noise ratio)

SNR (signal-to-noise ratio)
- Assuming that the signal and background components are independent,
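The statement in the Results that the refined test reduces to the original one at high signal-to-noise ratio can be illustrated numerically. A minimal sketch under assumptions that are not stated in this transcript: the measured intensity is signal plus independent zero-mean background noise, the signal standard deviation is c times the signal mean, and SNR is taken to be the mean signal divided by the background standard deviation. Under those assumptions the variances add, so the cv of the measured intensity is sqrt(c^2 + 1/SNR^2). The function name is hypothetical.

```python
import math


def effective_cv(c, snr):
    """cv of a measured intensity = signal + independent zero-mean background.

    Assumed definitions (illustrative only):
      c   = sd(signal) / mean(signal)
      snr = mean(signal) / sd(background)
    Var(measured) = (c * mean)^2 + sd_bg^2, hence cv = sqrt(c^2 + 1 / snr^2).
    """
    return math.sqrt(c * c + 1.0 / (snr * snr))


if __name__ == "__main__":
    for snr in (1, 2, 5, 10, 100):
        print(snr, effective_cv(0.2, snr))
```

For large SNR the effective cv approaches c, which is the constant-cv setting of the original test; for weak spots the background term dominates and the ratio becomes much less reliable.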

The expression intensity scatter plot

Confidence interval for the test statistics
- Assumption:

Confidence interval for the test statistics (cont.)
- Under the assumption of constant cv for the signal (without the background),

The 99% confidence interval for ratio statistic

Correction of background estimation
- Owing to interaction between the fluorescent signal and background, local-background estimation is often biased.
- To estimate the bias difference, we find the relationship between the red and green intensities under the null hypothesis by assuming a linear relation, G = aR + b.

Correction of background estimation (cont.)
- Simulation:
  1. Generate 10,000 data points from an exponential distribution with mean 2,000 to simulate 10,000 gene expression levels.
  2. Simulate the intensity measurement for each channel using a normal distribution whose mean is the intensity drawn from the exponential distribution, with a constant cv.
  3. Simulate the background level by a normal distribution:
     (1) no bias: background level ~ N(0, 100)
     (2) some bias: background level ~ N(b, 100)
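The simulation steps can be sketched as follows. This is an illustration only: the cv value (0.2), the reading of 100 as a standard deviation, the choice to place the bias b in the green channel, and all names are assumptions, since the transcript does not preserve these details.

```python
import random


def simulate_channels(n=10_000, mean_expr=2000.0, cv=0.2,
                      bias=0.0, bg_sd=100.0, seed=0):
    """Simulate red/green intensities under the null hypothesis.

    Assumed, not from the slides: cv=0.2, bg_sd treated as a standard
    deviation, and the background bias applied to the green channel only.
    """
    rng = random.Random(seed)
    red, green = [], []
    for _ in range(n):
        # Step 1: true expression level, shared by both channels under H0.
        mu = rng.expovariate(1.0 / mean_expr)
        # Step 2: channel measurements with constant cv around that level.
        r_signal = rng.gauss(mu, cv * mu)
        g_signal = rng.gauss(mu, cv * mu)
        # Step 3: residual background, unbiased in red, biased in green.
        red.append(r_signal + rng.gauss(0.0, bg_sd))
        green.append(g_signal + rng.gauss(bias, bg_sd))
    return red, green


if __name__ == "__main__":
    red, green = simulate_channels(bias=50.0)
    mean_diff = sum(g - r for r, g in zip(red, green)) / len(red)
    print(mean_diff)
```

With a non-zero bias, low-intensity points are displaced from the diagonal far more, in relative terms, than high-intensity points, which is what bends a log-scale scatter plot of the two channels.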

Scatter plot of simulated expression data, showing the dog-leg effect

Correction of background estimation (cont.)
- To fit G = aR + b, we employ a chi-square fitting method that minimizes
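The objective minimized on this slide is not preserved in the transcript. As a stand-in, the sketch below uses the simplest chi-square line fit, which places all measurement error on G and minimizes the sum over spots of ((G_k - a R_k - b) / sigma_k)^2; the original method may well also account for error in R. The function name and the per-spot sigma values are hypothetical.

```python
def fit_line_chi2(r, g, sigma):
    """Closed-form minimizer of sum(((g_k - a*r_k - b) / sigma_k) ** 2).

    Simplifying assumption: errors only in g, with known per-point sigma.
    Returns (a, b) for the line g = a * r + b.
    """
    w = [1.0 / (s * s) for s in sigma]
    s_w = sum(w)
    s_x = sum(wi * x for wi, x in zip(w, r))
    s_y = sum(wi * y for wi, y in zip(w, g))
    s_xx = sum(wi * x * x for wi, x in zip(w, r))
    s_xy = sum(wi * x * y for wi, x, y in zip(w, r, g))
    delta = s_w * s_xx - s_x * s_x
    a = (s_w * s_xy - s_x * s_y) / delta
    b = (s_xx * s_y - s_x * s_xy) / delta
    return a, b


if __name__ == "__main__":
    red = [100.0, 500.0, 1000.0, 4000.0]
    green = [150.0, 560.0, 1040.0, 4100.0]
    print(fit_line_chi2(red, green, sigma=[100.0] * 4))
```

The fitted intercept b plays the role of the background bias difference between the channels, and a captures any overall scale difference.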

Quality Metric for Ratio Statistics
- For a given cDNA target, the following factors affect ratio measurement quality:
  (1) Weak fluorescent intensities
  (2) A smaller than normal detected target area
  (3) A very high local background level
  (4) A high standard deviation of target intensity

(1) Fluorescent intensity measurement quality
- Under the null hypothesis, the signal means are equal, so that

(2) Target area measurement quality

(3) Background flatness quality
- Define background flatness

(4) Signal intensity consistency quality
- Typical target shapes, with panel labels cv = 0.48, 0.45, 0.31 and cv = 0.81, 0.98, 0.59

(4) Signal intensity consistency quality (cont.)
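The cv values quoted for the typical target shapes suggest that consistency is judged by the coefficient of variation of the pixel intensities within a target. A minimal sketch of that quantity, assuming a flat list of pixel intensities for one target; how the cv is mapped to a quality score is not preserved in this transcript, so none is attempted here. The function name is hypothetical.

```python
import statistics


def target_cv(pixels):
    """Coefficient of variation (population sd / mean) of a target's pixels."""
    mean = statistics.fmean(pixels)
    if mean == 0:
        raise ValueError("mean intensity is zero; cv is undefined")
    return statistics.pstdev(pixels) / mean


if __name__ == "__main__":
    uniform_spot = [980, 1010, 1000, 995, 1015]
    ragged_spot = [200, 1800, 950, 40, 2100]
    print(target_cv(uniform_spot), target_cv(ragged_spot))
```

A uniformly hybridized target gives a small cv, while doughnut-shaped or speckled targets give values closer to 1, in line with the range shown in the panel labels.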