ECS 289A Presentation, Jimin Ding


Outline: Problem & Motivation; Two-Component Model; Estimation of the Model Parameters; Defining Low- and High-Level Gene Expression; Comparing Expression Levels; Limitations of the Model and Method; Other Possible Solutions; References

A Model for Measurement Error for Gene Expression Arrays. David Rocke & Blythe Durbin, Journal of Computational Biology, Nov. 2001

Problem & Motivation
Standard statistical inference assumes normally distributed data with constant variance, so hypothesis tests for the difference between control and treatment require a variance that does not depend on the mean of the data.
Measurement error for gene expression rises in proportion to the expression level, so linear regression fails and a log transformation has been tried.
However, for genes that are expressed at a low level or not at all, the measurement error does not fall proportionately. The log transformation therefore fails by inflating the variance of observations near background, which motivates the two-component model.

Example: Mice. From: Bartosiewicz et al., 2000

From Durbin et al., 2002

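Two-Component Model
$Y = \alpha + \mu e^{\eta} + \varepsilon$
$Y$ is the intensity measurement, $\mu$ is the expression level in arbitrary units, and $\alpha$ is the mean intensity of unexpressed genes.
Error terms: $\eta \sim N(0, \sigma_\eta^2)$ (multiplicative) and $\varepsilon \sim N(0, \sigma_\varepsilon^2)$ (additive).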
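A minimal simulation sketch may help to see how the two error terms behave. It assumes the model has the form y = alpha + mu * exp(eta) + eps with normally distributed eta and eps; the function names and all parameter values (alpha = 100, sigma_eta = 0.2, sigma_eps = 25) are illustrative assumptions, not values from the slides.

```python
import math
import random
import statistics


def simulate_intensity(mu, alpha, sigma_eta, sigma_eps, rng):
    """Draw one intensity from y = alpha + mu * exp(eta) + eps."""
    eta = rng.gauss(0.0, sigma_eta)  # multiplicative error
    eps = rng.gauss(0.0, sigma_eps)  # additive error
    return alpha + mu * math.exp(eta) + eps


def simulated_sd(mu, n=20000, seed=0):
    """Empirical SD of n simulated intensities at expression level mu."""
    rng = random.Random(seed)
    # Illustrative parameter values (assumptions, not from the slides).
    ys = [simulate_intensity(mu, 100.0, 0.2, 25.0, rng) for _ in range(n)]
    return statistics.stdev(ys)


sd_unexpressed = simulated_sd(0.0)   # only the additive error contributes
sd_high = simulated_sd(10000.0)      # the multiplicative error dominates
rsd_high = sd_high / 10000.0         # SD relative to the expression level
print(sd_unexpressed, sd_high, rsd_high)
```

For an unexpressed gene the spread stays near the additive error, while at a high expression level the spread grows roughly in proportion to the level.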

Estimation of the Background ($\alpha$, $\sigma_\varepsilon$)
Estimation of the background using negative controls
Estimation of the background with replicate measurements
Estimation of the background without replicates

Estimation of the Background with Replicate Measurements
Begin with a small subset of genes with low intensity (10%).
Define a new subset consisting of the genes whose intensity values fall within the interval determined by the current subset.
Repeat the first and second steps until the set of genes does not change.
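The iteration above can be sketched as follows. The transcript does not preserve the interval used in the second step, so this sketch assumes "subset mean plus or minus two standard deviations of the gene means"; that interval, the `width` parameter, the function name, and the simulated data are all assumptions for illustration, not the authors' procedure. The additive error is then estimated from the pooled within-gene standard deviation of the selected genes.

```python
import math
import random
import statistics


def estimate_background(replicates, start_fraction=0.10, width=2.0, max_iter=200):
    """replicates: one list of replicate intensities per gene."""
    means = [statistics.mean(r) for r in replicates]
    order = sorted(range(len(means)), key=lambda g: means[g])
    # Step 1: start from the genes with the lowest mean intensity.
    subset = order[: max(2, int(start_fraction * len(order)))]
    for _ in range(max_iter):
        vals = [means[g] for g in subset]
        center, spread = statistics.mean(vals), statistics.stdev(vals)
        # Step 2 (ASSUMED interval): keep genes within center +/- width * spread.
        lo, hi = center - width * spread, center + width * spread
        new_subset = [g for g in order if lo <= means[g] <= hi]
        if len(new_subset) < 2 or new_subset == subset:
            break  # Step 3: stop when the set no longer changes.
        subset = new_subset
    alpha_hat = statistics.mean([means[g] for g in subset])
    # Pooled within-gene variance of the selected (background-like) genes.
    num = sum((len(replicates[g]) - 1) * statistics.variance(replicates[g])
              for g in subset)
    den = sum(len(replicates[g]) - 1 for g in subset)
    return alpha_hat, math.sqrt(num / den), len(subset)


# Simulated data: 900 unexpressed genes and 100 clearly expressed genes,
# three replicates each (all values are illustrative assumptions).
rng = random.Random(1)
data = [[100.0 + rng.gauss(0.0, 25.0) for _ in range(3)] for _ in range(900)]
for _ in range(100):
    mu = rng.uniform(2000.0, 20000.0)
    data.append([100.0 + mu * math.exp(rng.gauss(0.0, 0.2)) + rng.gauss(0.0, 25.0)
                 for _ in range(3)])

alpha_hat, sigma_eps_hat, n_selected = estimate_background(data)
print(alpha_hat, sigma_eps_hat, n_selected)
```

The `max_iter` cap is a safeguard, since the slides themselves note that convergence depends on the data and the initial selection.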

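Estimation of the High-Level RSD
The variance of the intensity in the two-component model is $\mathrm{Var}(Y) = \mu^2 S_\eta^2 + \sigma_\varepsilon^2$, where $S_\eta^2 = e^{\sigma_\eta^2}(e^{\sigma_\eta^2} - 1)$, $\sigma_\eta$ is the standard deviation of the multiplicative error, and $\sigma_\varepsilon$ is the standard deviation of the additive error.
At high expression levels only the multiplicative error term is noticeable, so the ratio of the standard deviation to the expression level $\mu$ is constant, i.e. RSD $= S_\eta$.
For each replicated gene at a high expression level, compute the mean and the standard deviation of the background-corrected log intensities $\ln(y - \hat\alpha)$.
Then use the pooled standard deviation to estimate $\sigma_\eta$.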
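The pooled estimate can be sketched as below. It assumes that the quantity being averaged is the log of the background-corrected intensity, ln(y - alpha_hat), and that the high-level RSD is S = sqrt(exp(s^2) * (exp(s^2) - 1)) for a log-scale standard deviation s; the function names and the simulated values are hypothetical.

```python
import math
import random
import statistics


def pooled_log_sd(replicates, alpha_hat):
    """Pooled SD of ln(y - alpha_hat) across replicated high-level genes."""
    num, den = 0.0, 0
    for reps in replicates:
        logs = [math.log(y - alpha_hat) for y in reps]
        num += (len(logs) - 1) * statistics.variance(logs)
        den += len(logs) - 1
    return math.sqrt(num / den)


def high_level_rsd(sigma_eta):
    """RSD implied by a lognormal multiplicative error with SD sigma_eta."""
    v = sigma_eta ** 2
    return math.sqrt(math.exp(v) * (math.exp(v) - 1.0))


# Simulated high-level genes: 300 genes, 4 replicates each (illustrative
# values: alpha = 100, sigma_eta = 0.2, sigma_eps = 25).
rng = random.Random(2)
genes = []
for _ in range(300):
    mu = rng.uniform(5000.0, 50000.0)
    genes.append([100.0 + mu * math.exp(rng.gauss(0.0, 0.2)) + rng.gauss(0.0, 25.0)
                  for _ in range(4)])

sigma_eta_hat = pooled_log_sd(genes, alpha_hat=100.0)
rsd_hat = high_level_rsd(sigma_eta_hat)
print(sigma_eta_hat, rsd_hat)
```

Pooling across genes is what makes the estimate usable when each gene has only a few replicates.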

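Define "High" and "Low"
Low expression level: most of the variance is due to the additive error component (standard deviation $\sigma_\varepsilon$). Approximate 95% CI for the expression level: $(y - \hat\alpha) \pm 1.96\,\hat\sigma_\varepsilon$, where $\hat\alpha$ is the estimated background.
High expression level: most of the variance is due to the multiplicative error component (log-scale standard deviation $\sigma_\eta$). Approximate 95% CI for the expression level: $\left((y - \hat\alpha)\,e^{-1.96\,\hat\sigma_\eta},\ (y - \hat\alpha)\,e^{1.96\,\hat\sigma_\eta}\right)$.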
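As a rough illustration of the two regimes, the sketch below computes approximate 95% intervals for the expression level. It assumes an additive interval (y - alpha) +/- 1.96 * sigma_eps at low levels and a multiplicative interval (y - alpha) * exp(+/- 1.96 * sigma_eta) at high levels; these forms follow from keeping only one error term of the two-component model and are assumptions here, as are the function names and example numbers.

```python
import math


def low_level_ci(y, alpha_hat, sigma_eps_hat, z=1.96):
    """Additive interval: only the background error is considered."""
    corrected = y - alpha_hat
    return corrected - z * sigma_eps_hat, corrected + z * sigma_eps_hat


def high_level_ci(y, alpha_hat, sigma_eta_hat, z=1.96):
    """Multiplicative interval: only the proportional error is considered."""
    corrected = y - alpha_hat
    return (corrected * math.exp(-z * sigma_eta_hat),
            corrected * math.exp(z * sigma_eta_hat))


low = low_level_ci(130.0, alpha_hat=100.0, sigma_eps_hat=25.0)
high = high_level_ci(10100.0, alpha_hat=100.0, sigma_eta_hat=0.2)
print(low, high)
```

The low-level interval is symmetric and can include zero, whereas the high-level interval is asymmetric and always positive.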

Comparing Expression Levels
Common method: a standard t-test on the ratio of expression for treatment and control (low level), or on its logarithm (high level).
Problem: this is less effective when a gene is expressed at a low level in one condition and at a high level in the other.

Solution
Treat the treatment and control measurements as correlated. The paired model is then described by its variation, its background, and its high-level RSD.

Hypothesis Testing (Comparison)
Assume the data have been adjusted.
Test the null hypothesis that the gene has the same expression level under control and treatment.
Then use an approximate variance to carry out a standard t-test on the log ratio of the raw data.

Limitations
No theoretical results (consistency, asymptotic distribution) are available for the estimators above.
The cutoff between high and low expression levels is fairly artificial.
Convergence of the background estimation depends heavily on the data and on the initial selection.

Literature & Other Possible Solutions for Measurement Error
Chen et al. (1997): measurement error is normally distributed with a constant coefficient of variation (CV), in accord with experience.
Ideker et al. (2000) introduce a multiplicative error component (normal).
Newton et al. (2001) propose a gamma model for measurement error.
Durbin et al. (2002) suggest the transformation $h(y) = \ln\left(y - \alpha + \sqrt{(y - \alpha)^2 + c}\right)$, where $c = \sigma_\varepsilon^2 / S_\eta^2$.
Huber et al. (2002) introduce the transformation $h(y) = \operatorname{arsinh}(a + b y)$.
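To make the transformation idea concrete, here is a hedged sketch of a generalized-log transform of the form ln(z + sqrt(z^2 + c)) with z = y - alpha and c = sigma_eps^2 / S^2, which is the form usually associated with Durbin et al. (2002); treat the exact form, the function names, and the parameter values as assumptions. The sketch also checks numerically that this transform differs from an arsinh transform only by a constant, and that the transformed spread is similar at background and at a high expression level.

```python
import math
import random
import statistics


def glog(y, alpha, c):
    """Generalized log: ln(z + sqrt(z^2 + c)) with z = y - alpha."""
    z = y - alpha
    return math.log(z + math.sqrt(z * z + c))


# Illustrative parameters (assumptions, not from the slides).
ALPHA, SIGMA_ETA, SIGMA_EPS = 100.0, 0.2, 25.0
v = SIGMA_ETA ** 2
S2 = math.exp(v) * (math.exp(v) - 1.0)  # squared high-level RSD
C = SIGMA_EPS ** 2 / S2


def transformed_sd(mu, n=20000, seed=3):
    """SD of glog-transformed intensities simulated at expression level mu."""
    rng = random.Random(seed)
    ys = [ALPHA + mu * math.exp(rng.gauss(0.0, SIGMA_ETA)) + rng.gauss(0.0, SIGMA_EPS)
          for _ in range(n)]
    return statistics.stdev([glog(y, ALPHA, C) for y in ys])


sd_background = transformed_sd(0.0)
sd_high_level = transformed_sd(10000.0)

# glog equals arsinh(z / sqrt(c)) plus the constant ln(sqrt(c)).
y_example = 350.0
offset = glog(y_example, ALPHA, C) - math.asinh((y_example - ALPHA) / math.sqrt(C))
print(sd_background, sd_high_level, offset)
```

Unlike the plain logarithm, this transform is defined for intensities at or below the background, which is where the log transformation breaks down.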

References
Blythe Durbin, Johanna Hardin, Douglas Hawkins, and David Rocke. "A variance-stabilizing transformation for gene-expression microarray data", Bioinformatics, ISMB.
Chen, Y., Dougherty, E.R., and Bittner, M.L. (1997) "Ratio-based decisions and the quantitative analysis of cDNA microarray images", J. Biomed. Opt., 2.
Wolfgang Huber, Anja von Heydebreck, and Martin Vingron (Dec. 2002) "Analysis of microarray gene expression data", Preprint.
Wolfgang Huber, Anja von Heydebreck, Holger Sültmann, Annemarie Poustka, and Martin Vingron. "Variance stabilization applied to microarray data calibration and to the quantification of differential expression", Bioinformatics, 18 Suppl. 1:S96–S104, ISMB 2002.