Introduction to Markov chain Monte Carlo (MCMC) methods

Course overview. Tuesday lecture –Those not presenting turn in a short review of a paper that uses the method being discussed. Thursday computer lab –Turn in a short write-up from the previous computer lab. –Some methods take some computational time, so bring something else along.

Maximum Likelihood, MCMC, and Bayes Theorem Pop Quiz

Maximum Likelihood, MCMC, and Bayes Theorem

Maximum likelihood methods find the parameter values that make the observed data most probable under a specific model.

The likelihood (L) is the probability of the data given the hypothesis (or parameter value). L = P(data|hypothesis)

We will use ML for a variety of calculations: What is the ML estimate of dN/dS for a codon alignment? What parameters optimize conserved versus neutral regions? What number of populations maximizes the likelihood?

What is maximum likelihood? Comparison to probability theory: the probability of x heads in n = 5 coin tosses is P(x) = [n! / (x!(n-x)!)] p^x q^(n-x), where p is the probability of heads and q = 1 - p. [Table: probability of each number of heads in 5 tosses.]
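
A minimal sketch of this binomial calculation in Python (added for illustration, not part of the original slides); it tabulates the probability of each number of heads in five tosses of a fair coin (p = q = 0.5):

```python
from math import comb

def binom_prob(x, n, p):
    # P(x heads in n tosses) = C(n, x) * p^x * (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Probabilities for 0-5 heads in 5 tosses of a fair coin; they sum to 1.
for x in range(6):
    print(x, round(binom_prob(x, 5, 0.5), 4))
```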

Same calculation for coins with different bias. [Figure: probability of each number of heads (x-axis) for coins with different bias toward heads.]

Same calculation for coins with different bias; for any given bias, the probabilities sum to 1. [Figure as above.]

Same calculation for coins with different bias; the probabilities sum to 1. Now we want to determine the bias from the observed number of heads: if we observe 1 head in 5 coin tosses, what is the maximum likelihood estimate of the coin's bias? The likelihood (L) is the probability of the data given the hypothesis (or parameter value): L = P(data|hypothesis). [Figure as above.]

Same calculation for coins with different bias; we want to determine the bias from the observed number of heads. The likelihood (L) is the probability of the data given the hypothesis (or parameter value): L = P(data|hypothesis). Unlike probabilities, likelihoods do not sum to 1 across hypotheses; the maximum identifies the parameter value that best fits the observed data. [Figure as above.]

Same calculation for coins with different bias; we want to determine the bias from the observed number of heads: L = P(data|bias). Likelihoods are usually represented as ln(L), so we look for the least negative value: ln(0.33) = -1.1, ln(0.36) = -1.0, ln(0.16) = -1.8. [Figure as above.]
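
As a sketch (the grid search below is an illustration; the lnL values above come from the slide's own figure), maximizing the likelihood of one head in five tosses over candidate bias values recovers the expected ML estimate of 0.2:

```python
from math import comb, log

n, x = 5, 1  # observed data: 1 head in 5 tosses

def likelihood(p):
    # L = P(data | bias p), the binomial probability of the observed outcome
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Evaluate L over a grid of candidate bias values and take the maximum.
grid = [i / 100 for i in range(1, 100)]
best = max(grid, key=likelihood)
print("ML estimate of bias:", best)              # 0.2, i.e. x/n
print("lnL at the maximum: ", log(likelihood(best)))
```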

One use of maximum likelihood is phylogeny inference. The likelihood (L) of a tree is the probability of the data given the tree and model (the hypothesis, or parameter values): L = P(data|tree). The problem is that there are LOTS of possible trees (hypotheses).
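
To see just how many trees there are: the number of distinct unrooted binary trees for n taxa is (2n - 5)!!, a standard combinatorial result (this sketch is added for illustration):

```python
# Number of unrooted binary trees for n taxa: 3 * 5 * 7 * ... * (2n - 5).
# Each added taxon can attach to any existing branch, hence the product.
def num_unrooted_trees(n):
    count = 1
    for k in range(4, n + 1):
        count *= 2 * k - 5
    return count

print(num_unrooted_trees(10))   # 2,027,025 trees for just 10 taxa
print(num_unrooted_trees(50))   # ~2.8e74 -- exhaustive search is hopeless
```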

How do we calculate/estimate likelihoods?

One way is to estimate likelihoods using Markov chain Monte Carlo (MCMC) methods (an analogy: walking up a hill).

Parameter estimation proceeds by changing parameter values, estimating the likelihood, and repeating until the function has been maximized. [Figure: likelihood plotted against the parameter estimate.]

(Graphs from John Huelsenbeck.)

Markov Chain Monte Carlo:
–Start with a proposed state.
–Perturb the old state and calculate the probability of the new state.
–Test whether the new state is better than the old: accept if the ratio of new to old exceeds a number drawn uniformly at random between 0 and 1.
–Move to the new state if accepted; otherwise stay at the old state.
–Start over.

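A minimal Metropolis sampler following these steps, applied to the coin example from earlier (one head in five tosses, flat prior on the bias); a sketch for illustration, not the lecture's own code:

```python
import random

def target(theta):
    # Unnormalized posterior for 1 head in 5 tosses with a flat prior.
    if not 0 < theta < 1:
        return 0.0
    return theta * (1 - theta) ** 4

def metropolis(n_steps, step=0.1):
    theta = 0.5                                    # start with a proposed state
    samples = []
    for _ in range(n_steps):
        new = theta + random.uniform(-step, step)  # perturb the old state
        ratio = target(new) / target(theta)        # new height / old height
        if ratio > random.random():                # accept if ratio beats U(0,1)
            theta = new                            # move to the new state...
        samples.append(theta)                      # ...otherwise stay put
    return samples

chain = metropolis(100_000)
print(sum(chain) / len(chain))   # posterior mean, ~0.29 (exactly 2/7)
```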

The circle represents the amount of potential proposed change (the proposal step size).

Repeat steps until you find the peak.

What is the “answer”?
–Peak = maximum likelihood
–Mean
–Mode
–Median
–Credible set (i.e., with a confidence interval)
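
Each of these summaries is easy to read off posterior samples. A sketch: for the coin example above the exact posterior is Beta(2, 5), so stand-in samples are drawn from it directly rather than from an actual chain:

```python
import numpy as np

samples = np.random.default_rng(1).beta(2, 5, size=100_000)  # stand-in chain

print("mean:  ", samples.mean())                 # ~0.286 (= 2/7)
print("median:", np.median(samples))
print("95% credible interval:", np.percentile(samples, [2.5, 97.5]))
# The mode corresponds to the peak of the surface (here 0.2, the ML estimate).
```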

How do you know if you reached the “peak” (maximum likelihood)?

Convergence = tested all of the likelihood surface and found the maximum. [Figure: an example that did not converge (the chain was caught in a local optimum).]

Convergence = tested all of the likelihood surface and found the maximum. Check convergence by:
–starting from different initial estimates
–increasing the amount by which parameter values are altered during optimization
–rerunning the analysis several times
–running for a very long time

MC-Robot demo.

Any other methods to explore parameter space?

Heated chains. Heating refers to raising the acceptance-rejection ratio (new height/old height) to a power. –If the ratio is greater than 1, always accept. –If it is less than 1 and the cold chain accepts with p = 0.25, a chain twice as hot accepts with p^(1/2) = 0.5.
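
A sketch of the heated acceptance rule reproducing the slide's numbers; the 1/T exponent is one common parameterization (assumed here):

```python
def accept_prob(ratio, temperature=1.0):
    # Heating raises the acceptance ratio (new height / old height) to a
    # power 1/T, flattening the surface so hot chains cross valleys easily.
    return min(1.0, ratio ** (1.0 / temperature))

print(accept_prob(0.25))        # cold chain:         accept with p = 0.25
print(accept_prob(0.25, 2.0))   # twice-as-hot chain: 0.25**0.5 = 0.5
```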

Cold chain

Hot chain: notice the peaks are lower.

Swap between chains

MC-Robot demo

Now, what to do with likelihoods?
Compare models:
–Likelihood ratio test (LRT)
–Is the dN/dS ratio for gene A significantly different from 1?
–Does a selection model fit better than a neutral model?
Compute posterior probabilities:
–Bayes Theorem
–Determine confidence

Compare two nested models (LRT). For each additional parameter added, the likelihood should always improve. –How do you determine whether the improvement is significant?

Compare two nested models.
Free estimate (H1): estimate dN/dS (ω) from the alignment.
–1 parameter estimated
–lnL = …
Null model (H0): fix dN/dS (ω) to be 1.
–0 parameters estimated (all fixed)
–lnL = -90

Compare two nested models.
Free estimate (H1): estimate dN/dS (ω) from the alignment.
–1 parameter estimated
–lnL = …
Null model (H0): fix dN/dS (ω) to be 1.
–0 parameters estimated (all fixed)
–lnL = -90
The LRT statistic is 2[lnL(H1) - lnL(H0)]. Degrees of freedom = difference in the number of parameters. Compare to a χ² distribution.
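
A sketch of the test in Python using SciPy; the H1 log-likelihood below is hypothetical, since the slide's value did not survive transcription:

```python
from scipy.stats import chi2

lnL_H1, lnL_H0 = -88.0, -90.0        # H1 value is hypothetical
lrt = 2 * (lnL_H1 - lnL_H0)          # LRT statistic: 4.0
df = 1                               # H1 estimates one extra parameter
p_value = chi2.sf(lrt, df)           # upper tail of the chi-square: ~0.046
print(lrt, p_value)                  # significant at the 0.05 level
```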

[Chi-square table: critical values by significance level and degrees of freedom (DF).]

Is it really so simple?

M7 is the null model (H0); M8 is the alternative (H1), with 2 additional parameters.

What other information can you get from MCMC methods?

Bayes's theorem and posterior probability. Example from Box 3 of Lewis 2001: Urn A contains 40% black marbles; Urn B contains 80% black marbles. What is the likelihood that a black marble came from Urn A? Urn B? What is the posterior probability that a black marble came from Urn A? Urn B?
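
A direct application of Bayes's theorem to the urns, assuming for illustration that the two urns are equally likely a priori:

```python
prior = {"A": 0.5, "B": 0.5}    # assumed equal priors
lik = {"A": 0.4, "B": 0.8}      # likelihood: P(black marble | urn)

evidence = sum(prior[u] * lik[u] for u in prior)             # P(black) = 0.6
posterior = {u: prior[u] * lik[u] / evidence for u in prior}
print(posterior)                # {'A': 0.333..., 'B': 0.666...}
```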

As with likelihoods, this is difficult to calculate. Can we use MCMC?

For an appropriately constructed and adequately run Markov chain, the proportion of the time any parameter value is visited is a valid approximation of the posterior probability of that parameter value (= the Bayesian answer).
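
To make this concrete, a two-state Metropolis chain over the urn example (equal priors, as above) visits each urn in proportion to its posterior probability; a sketch for illustration:

```python
import random

lik = {"A": 0.4, "B": 0.8}      # P(black marble | urn), equal priors assumed
state, visits = "A", {"A": 0, "B": 0}

for _ in range(100_000):
    proposal = "B" if state == "A" else "A"        # propose the other urn
    if lik[proposal] / lik[state] > random.random():
        state = proposal                           # accept, or stay put
    visits[state] += 1

# Visit frequencies approximate the posterior: ~1/3 for A, ~2/3 for B.
print({u: v / 100_000 for u, v in visits.items()})
```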

Red dots = “burn in” period. [Figure: lnL plotted against generations, with the burn-in period marked at the start of the run.]

Advantages of the Bayesian perspective: it gives the probability of the hypothesis of interest, whereas the likelihood is the probability of the data given the hypothesis. Compare to bootstrap support. The disadvantage is the subjectivity of the prior.

Two types of Bayesian inference we will see:
–Naïve Empirical Bayes (NEB): assumes the parameter estimates from likelihood are exact.
–Bayes Empirical Bayes (BEB): takes into account error in the likelihood estimates of the parameters.

Some things to consider when running MCMC analyses:
–Number of generations
–Number of chains
–Burn-in period
–Convergence

Some uses of MCMC:
–Phylogeny
–Models of codon evolution (PAML, April 6)
–Conserved regions (Shadowing, April 13)
–Mutation rates
–Migration rates
–Population structure (Structure, April 20)