1 QTL mapping in mice, cont. Lecture 11, Statistics 246 February 26, 2004

2 Maximum likelihood fitting of a two-component normal mixture model using the EM algorithm. This is the simplest case. After dealing with it, we'll turn to our QTL mixture models. Suppose that y₁, …, yₙ are i.i.d. random variables having the normal mixture density (1-p)σ⁻¹φ((y-μ₀)/σ) + pσ⁻¹φ((y-μ₁)/σ), where φ is the standard normal density. Our interest is usually in estimating p, μ₀, μ₁ and σ, though in our QTL problem we know p. One approach to the problem is to introduce i.i.d. Bernoulli random variables z₁, …, zₙ such that pr(zᵢ = 1) = p and pr(yᵢ | zᵢ) = σ⁻¹φ((yᵢ - μ_zᵢ)/σ), i.e. the mean is μ₀ if zᵢ = 0 and μ₁ if zᵢ = 1. In this case, the marginal distribution of y₁, …, yₙ is the above normal mixture. In the spirit of the EM algorithm, we consider the full data to be (y₁, z₁), …, (yₙ, zₙ) with z₁, …, zₙ missing; equivalently, the z₁, …, zₙ are viewed as random variables augmenting the y₁, …, yₙ.
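To make the setup concrete, here is a minimal Python sketch of the data-generating process just described; the sample size, seed and parameter values (n, p, μ₀, μ₁, σ) are illustrative assumptions, not values from the lecture.

```python
import numpy as np

# Sketch of the latent-variable formulation: z_i ~ Bernoulli(p),
# y_i | z_i ~ N(mu_{z_i}, sigma^2); only y is observed, z is "missing".
rng = np.random.default_rng(0)

n, p = 200, 0.5                  # illustrative values
mu = np.array([0.0, 2.0])        # mu_0, mu_1
sigma = 1.0

z = rng.binomial(1, p, size=n)   # latent component labels
y = rng.normal(mu[z], sigma)     # observed phenotypes

# Marginally, each y_i has the two-component normal mixture density given above.
```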

3 EM generalities. Write y = (y₁, …, yₙ), z = (z₁, …, zₙ), and θ = (p, μ₀, μ₁, σ). The EM proceeds by considering the expectation of the full-data log-likelihood log pr(y, z; θ), given y, rather than the observed-data log-likelihood log pr(y; θ). It alternates between the E-step, which involves forming Q(θ, θ_old) = E_old{log pr(y, z; θ) | y}, where E_old denotes the expectation (given y) taken with an initial value θ_old, and the M-step, which involves calculating θ_new = argmax_θ Q(θ, θ_old). The key fact, which justifies the widespread use of the EM algorithm, is that if we follow this two-step process, log pr(y; θ_new) ≥ log pr(y; θ_old). Exercise. Prove this last fact. (Hint: consider the ratio pr(y; θ_new)/pr(y; θ_old). Multiply top and bottom by pr(y, z; θ_old), sum over z, and use Jensen's inequality.) The EM algorithm could not be presented as having anything to do with maximum likelihood were it not for the fact that, under certain conditions, E_θ{log pr(y, z; θ) | y} = log pr(y; θ). Exercise. Describe circumstances under which this is true, and prove it.
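As a sketch of the alternation just described, here is a generic EM driver in Python; the callable names (e_step, m_step, loglik), the tolerance and the iteration cap are all illustrative, and the observed-data log-likelihood is evaluated only to monitor the ascent property stated above.

```python
def em(theta0, e_step, m_step, loglik, tol=1e-8, max_iter=500):
    """Generic EM driver (sketch).

    e_step(theta)  -> the conditional expectations needed to form Q(., theta)
    m_step(expect) -> the theta maximizing Q(., theta_old)
    loglik(theta)  -> observed-data log-likelihood, used only for monitoring
    """
    theta, ll = theta0, loglik(theta0)
    for _ in range(max_iter):
        theta = m_step(e_step(theta))   # one E-step followed by one M-step
        ll_new = loglik(theta)
        assert ll_new >= ll - 1e-10     # ascent property: log pr(y; theta) never decreases
        if ll_new - ll < tol:           # stop when the increase is negligible
            break
        ll = ll_new
    return theta
```

For the normal mixture, the concrete E- and M-steps are the ones worked out on the next two slides.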

4 EM generalities, cont. Of course the whole point of using the EM algorithm is that the E and M steps should both be "easy", e.g. given by simple closed-form expressions, while direct maximization should be "hard". As we will shortly see, this is the case in our normal mixture model problem, up to a point. But for the reasons which follow, not everyone will deal with normal mixture models via the EM. As always, a price is paid for "ease". If we do not have ready access to the observed-data log-likelihood, log pr(y; θ) - and usually when we do, we would not consider using the EM algorithm - we have to work harder to get the asymptotic standard errors and confidence intervals for the MLE, θ̂. There are also a number of important details to consider with the EM: starting points, stopping rules, the rate of convergence, the issue of local vs global maxima, and so on. Some of these are issues whatever approach one adopts, while others are EM-specific. The EM is not a magic bullet, but its conceptual simplicity and its ease of coding have made it very popular in genetics and bioinformatics. Many problems do indeed benefit from the missing or augmented data perspective.

5 Normal mixture models via the EM. In the normal mixture problem with known p, the complete-data quantity -2 log pr(y, z; θ) takes the form
2n log σ + σ⁻² Σᵢ [ I(zᵢ = 0)(yᵢ - μ₀)² + I(zᵢ = 1)(yᵢ - μ₁)² ],
up to terms not involving the unknown parameters (μ₀, μ₁, σ). Because of the linearity of this expression in the indicators, the E-step simply consists of replacing I(zᵢ = 0) by pr(zᵢ = 0 | yᵢ; θ_old), and similarly for I(zᵢ = 1). This is a quite general phenomenon. Exercise: Check these assertions and carry out the calculations.
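In code, the E-step is just Bayes' rule applied to each yᵢ; a minimal sketch (function and variable names are my own):

```python
import numpy as np

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def e_step(y, p, mu0, mu1, sigma):
    """Replace I(z_i = 1) by w_i = pr(z_i = 1 | y_i; theta_old), and I(z_i = 0) by 1 - w_i."""
    num1 = p * normal_pdf(y, mu1, sigma)
    num0 = (1.0 - p) * normal_pdf(y, mu0, sigma)
    return num1 / (num0 + num1)
```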

6 Normal mixture models via the EM, cont. Having completed this, the M-step leads to
μ₀_new = Σᵢ pr(zᵢ = 0 | yᵢ; θ_old) yᵢ / Σᵢ pr(zᵢ = 0 | yᵢ; θ_old),
with a similar formula for μ₁_new. Exercise: Derive these formulae, and also one for σ_new. With many data sets, these quantities converge after several hundred iterations. Exercise: Repeat this exercise for the case of unknown mixing probability p, obtaining the MLE of p. (Guess the answer first.)
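Putting the E- and M-steps together gives the following sketch of the full algorithm for the two-component mixture with known p; the starting values, iteration count and simulated example are illustrative assumptions.

```python
import numpy as np

def normal_pdf(y, mu, sigma):
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def em_mixture(y, p, mu0, mu1, sigma, n_iter=500):
    """EM for the two-component normal mixture with known mixing probability p."""
    for _ in range(n_iter):
        # E-step: w_i = pr(z_i = 1 | y_i; theta_old)
        num1 = p * normal_pdf(y, mu1, sigma)
        num0 = (1.0 - p) * normal_pdf(y, mu0, sigma)
        w = num1 / (num0 + num1)
        # M-step: weighted means, then a pooled standard deviation
        # (each y_i contributes with weight 1 - w_i to component 0 and w_i to component 1)
        mu0 = np.sum((1.0 - w) * y) / np.sum(1.0 - w)
        mu1 = np.sum(w * y) / np.sum(w)
        sigma = np.sqrt(np.mean((1.0 - w) * (y - mu0) ** 2 + w * (y - mu1) ** 2))
    return mu0, mu1, sigma

# Illustrative example on simulated data (parameter values are made up)
rng = np.random.default_rng(1)
z = rng.binomial(1, 0.5, size=500)
y = rng.normal(np.where(z == 1, 2.0, 0.0), 1.0)
print(em_mixture(y, p=0.5, mu0=-1.0, mu1=1.0, sigma=2.0))
```

A fixed iteration count is used here for brevity; in practice one would monitor the change in the observed-data log-likelihood, as in the generic driver sketched earlier.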

7 Putting it together: the EM algorithm in mouse QTL mapping. Now let's work out the contribution of one mouse to the full-data log-likelihood function log pr(y, m, g; θ) for an F₂ intercross. Here y denotes all the phenotype (QT) data, m all the marker data, and g all the "missing" genotype data at a putative QTL located at a position z on a chromosome, and θ = (μ_A, μ_H, μ_B, σ) are the parameters in the single-QTL model. Let non-bold symbols denote the same data for just one mouse. First, note that we can factorize this mouse's contribution as follows: pr(y, m, g; θ) = pr(m) × pr(g | m) × pr(y | g; θ), since pr(y | m, g; θ) = pr(y | g; θ) (why?). Now pr(m) can be ignored, as it does not involve the parameters, while pr(g | m) gives the probabilities of this mouse having genotype A, H or B at z, given all its marker data, i.e. the mixture component probabilities for this mouse. As previously mentioned, this is an HMM calculation, i.e. another EM. Finally, pr(y | g; θ) is just the mixture distribution for that mouse, and, as in the previous discussion, the current probabilities of the mouse being A, H or B, given its y (and its marker data), weight its contributions to Q(θ, θ_old).
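The following Python sketch shows this per-mouse weighting for the three-component (A/H/B) mixture; the n × 3 array of genotype probabilities pr(g | m) is assumed to have been computed already by the HMM step mentioned above, and the function names, starting values and iteration count are illustrative.

```python
import numpy as np

def normal_pdf(y, mu, sigma):
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def qtl_em(y, prior, mu, sigma, n_iter=200):
    """EM for the single-QTL F2 mixture at a fixed putative QTL position.

    y     : (n,) phenotypes
    prior : (n, 3) array of pr(g = A, H, B | marker data) for each mouse,
            assumed precomputed by the HMM calculation mentioned in the slide
    mu    : starting values (mu_A, mu_H, mu_B); sigma : starting s.d.
    """
    mu = np.asarray(mu, dtype=float)
    for _ in range(n_iter):
        # E-step: w[i, g] proportional to pr(g | m_i) * pr(y_i | g; theta_old)
        like = normal_pdf(y[:, None], mu[None, :], sigma)   # (n, 3)
        w = prior * like
        w /= w.sum(axis=1, keepdims=True)
        # M-step: weighted means per genotype class, pooled standard deviation
        mu = (w * y[:, None]).sum(axis=0) / w.sum(axis=0)
        sigma = np.sqrt((w * (y[:, None] - mu[None, :]) ** 2).sum() / len(y))
    return mu, sigma
```

For a backcross rather than an intercross there are only two genotype classes, and this sketch reduces to the two-component code above.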

8 Multiple QTL methods. Why consider multiple QTL at once?
- To separate linked QTL. If two QTL are close together on the same chromosome, our one-at-a-time strategy may have problems finding either (e.g. if they work in opposite directions, or interact). Our LOD scores won't make sense either.
- To permit the investigation of interactions. It may be that interactions greatly strengthen our ability to find QTL, though this is not clear.
- To reduce residual variation - the main reason for considering multiple QTL simultaneously. If QTL exist at loci other than the one we are currently considering, they should be in our model, for if they are not, they will be in the error, and hence reduce our ability to detect the current one. See below.

9 Abstractions/Simplifications. First pass: for a BC (backcross), assume
- complete marker data;
- QTL are at marker loci;
- QTL act additively.

10 The problem. n backcross mice; M markers in all, with at most a handful expected to be near QTL.
x_ij = genotype (0/1) of mouse i at marker j
y_i = phenotype (trait value) of mouse i
Y_i = μ + Σ_{j=1}^{M} β_j x_ij + ε_i
Which β_j ≠ 0? ⇒ Variable selection in linear models (regression)
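To make the formulation concrete, here is a sketch that simulates a small backcross under this model and runs a naive forward-selection pass to ask which β_j ≠ 0; the marker spacing, effect sizes, and the choice of forward selection are illustrative assumptions, not the method developed in the lecture.

```python
import numpy as np

rng = np.random.default_rng(2)
n, M = 250, 50          # mice and markers (illustrative)
r = 0.1                 # recombination fraction between adjacent markers

# Backcross genotypes x_ij in {0, 1}, Markov along the chromosome
x = np.empty((n, M), dtype=int)
x[:, 0] = rng.binomial(1, 0.5, n)
for j in range(1, M):
    recomb = rng.random(n) < r
    x[:, j] = np.where(recomb, 1 - x[:, j - 1], x[:, j - 1])

# Phenotype model Y_i = mu + sum_j beta_j x_ij + eps_i, with two true QTL
beta = np.zeros(M)
beta[10], beta[30] = 1.0, -0.8
y = 0.5 + x @ beta + rng.normal(0.0, 1.0, n)

def rss(X, y):
    """Residual sum of squares after a least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ coef) ** 2)

# Naive forward selection: repeatedly add the marker that most reduces the RSS
selected = []
for _ in range(5):
    X0 = np.column_stack([np.ones(n)] + [x[:, j] for j in selected])
    best = min((j for j in range(M) if j not in selected),
               key=lambda j: rss(np.column_stack([X0, x[:, j]]), y))
    selected.append(best)
print(selected)  # markers 10 and 30 (the true QTL) should appear early
```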

11 Variable selection in linear models: a few generalities. There is a huge literature on the selection of variables in linear models. Why? First, if we leave out variables which should be in our model, we risk biasing the parameter estimates of those we include, and our "error" has systematic components, reducing our ability to identify those which should be there. Second, when we include variables in a model which do not need to be there, we reduce the precision of the estimates of the ones which are there. We need to include all the relevant (perhaps important is a better word) variables, and no irrelevant ones, in our model. This is an instance of the classic statistical trade-off of bias and variance: too few variables, higher bias; too many, higher variance. The literature contains a host of methods for addressing this problem, known as model or variable selection methods. Some are general, i.e. not just for linear models, e.g. the Akaike information criterion (AIC), the Bayes information criterion (BIC), or cross-validation (CV). Others are specifically designed for linear models, e.g. ridge regression, Mallows' Cp, and the lasso. We don't have the time to offer a thorough review of this topic, but we'll comment on and compare a few approaches.
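As a small illustration of how such criteria are used, the sketch below computes AIC and BIC (each up to an additive constant shared by all models) for two nested Gaussian linear models on simulated data; the data, the candidate models, and the decision to compare only two subsets are illustrative assumptions.

```python
import numpy as np

def rss(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ coef) ** 2)

def aic_bic(X, y):
    """Gaussian AIC and BIC, each up to an additive constant shared by all models."""
    n, k = X.shape
    s = rss(X, y)
    return n * np.log(s / n) + 2 * k, n * np.log(s / n) + k * np.log(n)

# Toy comparison (illustrative): does the irrelevant covariate x2 get penalised away?
rng = np.random.default_rng(3)
n = 100
x1 = rng.binomial(1, 0.5, n)
x2 = rng.binomial(1, 0.5, n)
y = 1.0 + 0.8 * x1 + rng.normal(0.0, 1.0, n)   # x2 plays no role

for label, X in [("x1 only", np.column_stack([np.ones(n), x1])),
                 ("x1 + x2", np.column_stack([np.ones(n), x1, x2]))]:
    aic, bic = aic_bic(X, y)
    print(label, round(aic, 2), round(bic, 2))  # smaller is better for both criteria
```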

12 Special features of our problem.
- We have a known joint distribution of the covariates (marker genotypes), and these are Markovian.
- We may have lots of missing covariate information, but it can be dealt with.
- Our aim is to identify the key players (markers), not to minimize prediction error or find the "true model".
Conclusion: our problem is not a generic linear model selection problem, and this offers the hope that we can do better than we might in general.
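One way to see why the known, Markovian genotype distribution helps with missing covariate information: for a backcross, the genotype at a putative QTL between two flanking markers can be filled in probabilistically by a short calculation. The sketch below is a simplified illustration assuming no crossover interference; the recombination fractions in the example are made up.

```python
import numpy as np

def qtl_given_flanking(gL, gR, r1, r2):
    """pr(QTL genotype | flanking marker genotypes) for a backcross.

    Genotypes are coded 0/1; r1 and r2 are the recombination fractions between
    the left marker and the QTL, and between the QTL and the right marker.
    Assuming no interference, genotypes along the chromosome form a Markov chain,
    so pr(g | gL, gR) is proportional to pr(g | gL) * pr(gR | g).
    """
    def step(a, b, r):                 # one-interval transition probability
        return 1.0 - r if a == b else r
    w = np.array([step(gL, g, r1) * step(g, gR, r2) for g in (0, 1)])
    return w / w.sum()

# Example: both flanking markers have genotype 1, QTL midway with r1 = r2 = 0.1
print(qtl_given_flanking(1, 1, 0.1, 0.1))   # heavily favours genotype 1
```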