Parametric Empirical Bayes Methods for Microarrays


Parametric Empirical Bayes Methods for Microarrays 4/6/2009 Copyright © 2009 Dan Nettleton

Parametric Empirical Bayes Methods for Microarrays

Kendziorski, C. M., Newton, M. A., Lan, H., and Gould, M. N. (2003). On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Statistics in Medicine 22, 3899-3914.

Newton, M. A. and Kendziorski, C. M. (2003). Parametric empirical Bayes methods for microarrays. Chapter 11 of The Analysis of Gene Expression Data. Springer, New York.

The Gamma Distribution

If X ~ Gamma(α, β), then X has density

f(x) = β^α x^(α-1) e^(-βx) / Γ(α) for x > 0,

with E(X) = α/β and Var(X) = α/β².

[Figure: gamma densities for (α, β) = (1, 0.0001), (2, 0.00008), (33, 0.0008), and (340, 0.006).]
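The moment formulas above can be checked by simulation. A minimal sketch in Python, noting that NumPy parameterizes the gamma by shape and *scale*, so a rate β corresponds to scale 1/β (the shape and rate values here are illustrative, not from the slides):

```python
import numpy as np

# Monte Carlo sanity check of E(X) = alpha/beta and Var(X) = alpha/beta^2
# for X ~ Gamma(alpha, beta) with beta a *rate* parameter.
rng = np.random.default_rng(0)

alpha, beta = 12.0, 0.05
x = rng.gamma(shape=alpha, scale=1.0 / beta, size=1_000_000)

print(x.mean())   # close to alpha / beta  = 240
print(x.var())    # close to alpha / beta2 = 4800
```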

A Model for the Data from a Two-Treatment Experiment Assume there are J genes indexed by j=1, 2, ..., J. Data for gene j is xj = (xj1, xj2, ..., xjI) where xji is the normalized measure of expression on the original scale for the jth gene and ith experimental unit. Let s1 denote the subset of the indices {1,...,I} corresponding to treatment 1. Let s2 denote the subset of the indices {1,...,I} corresponding to treatment 2.

The Model (continued) Assume that each gene is differentially expressed (DE) with an unknown probability p, and equivalently expressed (EE) with probability 1-p. If gene j is equivalently expressed, then xj1, xj2, ..., xjI are i.i.d. Gamma(α, λj) with mean α/λj, where λj ~ Gamma(α0, ν).

The Model (continued) If gene j is differentially expressed, then {xji : i in s1} are i.i.d. Gamma(α, λj1) with mean α/λj1, and {xji : i in s2} are i.i.d. Gamma(α, λj2) with mean α/λj2, where λj1 and λj2 are independent Gamma(α0, ν) random variables. All random variables are assumed to be independent. p, α, α0, and ν are unknown parameters to be estimated from the data.

An example of how the model is imagined to generate the data for the jth gene. Suppose p=0.05, α=12, α0=0.9, and ν=36. Generate a Bernoulli random variable with success probability 0.05. If the result is a success, the gene is DE; otherwise the gene is EE. If EE, generate λj from Gamma(α0=0.9, ν=36), then generate i.i.d. expression values from Gamma(α=12, λj).

If gene is EE...

[Figure: λj = 0.05 drawn from the Gamma(α0=0.9, ν=36) density; expression values for the jth gene under Trt 1 and Trt 2 then drawn from the single Gamma(α=12, λj=0.05) density.]

Example Continued If the gene is DE, generate λj1 and λj2 independently from Gamma(α0=0.9, ν=36). Then generate treatment 1 expression values i.i.d. from Gamma(α=12, λj1), and treatment 2 expression values i.i.d. from Gamma(α=12, λj2).
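The two-step generative recipe for a single gene can be sketched directly. This is a simulation of the model as described on the slides, not the EBArrays software; the replicate counts n1 and n2 are assumptions for illustration:

```python
import numpy as np

# Generative model from the slides: with probability p a gene is DE
# (a separate rate lambda per treatment), otherwise EE (one shared rate).
# Rates are drawn from Gamma(alpha0, nu); nu is treated as a rate parameter.
rng = np.random.default_rng(1)

def simulate_gene(p=0.05, alpha=12.0, alpha0=0.9, nu=36.0, n1=5, n2=5):
    de = rng.random() < p                          # Bernoulli(p): DE or EE?
    if de:
        lam1 = rng.gamma(alpha0, 1.0 / nu)         # treatment-specific rates
        lam2 = rng.gamma(alpha0, 1.0 / nu)
    else:
        lam1 = lam2 = rng.gamma(alpha0, 1.0 / nu)  # one shared rate
    x1 = rng.gamma(alpha, 1.0 / lam1, size=n1)     # i.i.d. expression values
    x2 = rng.gamma(alpha, 1.0 / lam2, size=n2)
    return de, x1, x2

de, x1, x2 = simulate_gene()
```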

If gene is DE...

[Figure: λj1 = 0.02 and λj2 = 0.07 drawn independently from the Gamma(α0=0.9, ν=36) density, giving the two expression densities Gamma(α=12, λj1=0.02) and Gamma(α=12, λj2=0.07).]

If gene is DE...

[Figure: the simulated Trt 1 and Trt 2 data overlaid on the Gamma(α=12, λj1=0.02) expression density and the Gamma(α0=0.9, ν=36) density for λ.]

Coefficient of Variation is Constant across Gene-Treatment Combinations Coefficient of Variation = CV = sd / mean. Conditional on the mean for a gene-treatment combination, say α/λjk, the CV for the expression data is the CV of Gamma(α, λjk), which is (α^(1/2)/λjk) / (α/λjk) = 1/α^(1/2). Note that α is assumed to be the same for all gene-treatment combinations.
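The constant-CV property is easy to verify empirically: the sample CV stays near 1/√α no matter which rate λ is used. A quick check, reusing the α = 12 and the λ values from the example slides:

```python
import numpy as np

# CV of Gamma(alpha, lambda) is 1/sqrt(alpha) for *any* rate lambda,
# so every gene-treatment combination shares one CV under the model.
rng = np.random.default_rng(2)

alpha = 12.0
for lam in (0.02, 0.05, 0.07):              # rates from the example slides
    x = rng.gamma(alpha, 1.0 / lam, size=500_000)
    cv = x.std() / x.mean()
    print(lam, cv)                          # each close to 1/sqrt(12) ~ 0.289
```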

Marginal Density for Gene j

f(xj) = p fDE(xj) + (1-p) fEE(xj)

Marginal Likelihood for the Observed Data

f(x1) f(x2) ··· f(xJ)

Use the EM algorithm to find values of p, α, α0, and ν that make the log likelihood as large as possible.

The posterior probability of differential expression for gene j is obtained by replacing p, α, α0, and ν in

p fDE(xj) / [p fDE(xj) + (1-p) fEE(xj)]

with their maximum likelihood estimates. Software for EBArrays is available at http://www.biostat.wisc.edu/~kendzior.
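Because the gamma prior on λ is conjugate to the gamma likelihood, λj integrates out in closed form, so the posterior probability of DE can be evaluated directly once parameter values are in hand. A sketch of that computation, not the EBArrays implementation; in practice p, α, α0, and ν would come from the EM fit, and the values used here are the illustrative ones from the example slides:

```python
from math import lgamma, log, exp

def log_marginal(x, alpha, alpha0, nu):
    """log of: integral over lambda ~ Gamma(alpha0, nu) of
    prod_i Gamma(x_i | alpha, lambda)  (gamma-gamma conjugacy)."""
    n, s = len(x), sum(x)
    return ((alpha - 1) * sum(log(xi) for xi in x)
            - n * lgamma(alpha)
            + alpha0 * log(nu) - lgamma(alpha0)
            + lgamma(n * alpha + alpha0)
            - (n * alpha + alpha0) * log(s + nu))

def posterior_de(x1, x2, p, alpha, alpha0, nu):
    # EE: one shared rate for all values; DE: a separate rate per group.
    log_ee = log_marginal(list(x1) + list(x2), alpha, alpha0, nu)
    log_de = (log_marginal(x1, alpha, alpha0, nu)
              + log_marginal(x2, alpha, alpha0, nu))
    # p*fDE / (p*fDE + (1-p)*fEE), computed stably on the log scale
    a, b = log(p) + log_de, log(1 - p) + log_ee
    m = max(a, b)
    return exp(a - m) / (exp(a - m) + exp(b - m))

# Well-separated treatment means should give a posterior near 1:
print(posterior_de([300, 320, 310], [100, 95, 105], 0.05, 12, 0.9, 36))
```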

Extension to Multiple Treatment Groups

If there are 3 treatment groups, each gene can be classified into 5 categories rather than just the two categories EE and DE:

a) 1=2=3
b) 1=2≠3
c) 1≠2=3
d) 1=3≠2
e) 1≠2, 2≠3, 1≠3

Extensions to more than 3 groups can be handled similarly.
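The expression patterns for k treatment groups are exactly the set partitions of {1, ..., k} (5 of them for k = 3, the Bell number B(3)), which is why categories (a)-(e) exhaust the possibilities. A small enumeration sketch:

```python
def partitions(items):
    """Yield all set partitions of a list, as lists of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # put `first` into each existing block, or into a new block of its own
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        yield [[first]] + smaller

pats = list(partitions([1, 2, 3]))
print(len(pats))   # 5 patterns, matching categories (a)-(e) above
```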