Reverend Thomas Bayes (1702-1761)

Presentation transcript:

Reverend Thomas Bayes (1702-1761): Bayes' Theorem
$$P(\text{model} \mid \text{data}, I) = \frac{P(\text{model} \mid I)\, P(\text{data} \mid \text{model}, I)}{P(\text{data} \mid I)}$$
Posterior probability, $P(\text{model} \mid \text{data}, I)$: represents the degree to which we believe a given model accurately describes the situation, given the available data and all of our prior information I.
Prior probability, $P(\text{model} \mid I)$: describes the degree to which we believe the model accurately describes reality based on all of our prior information.
Likelihood, $P(\text{data} \mid \text{model}, I)$: describes how well the model predicts the data.
Normalizing constant: $P(\text{data} \mid I)$.
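As a small worked illustration of the theorem above, the following Python sketch computes posterior probabilities for a discrete set of candidate models. The function name `posterior` and all numbers are hypothetical and chosen only for illustration; the normalizing constant is taken to be the sum of prior times likelihood over the models considered.

```python
def posterior(priors, likelihoods):
    """Return the posterior probability of each model in a discrete set."""
    # Numerator of Bayes' theorem: prior times likelihood, per model.
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Normalizing constant: sum of the numerators over all models considered.
    evidence = sum(joint)
    return [j / evidence for j in joint]


# Hypothetical example: three models with equal prior probability.
priors = [1 / 3, 1 / 3, 1 / 3]
likelihoods = [0.02, 0.01, 0.01]
post = posterior(priors, likelihoods)
print(post)
```

With equal priors, the posterior is simply proportional to the likelihood, so the first model receives twice the posterior probability of each of the other two.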

Alternative Approaches to Estimate Posterior Probabilities: Bayesian Posterior Probability Mapping with MrBayes (Huelsenbeck and Ronquist, 2001)
Problem: Strimmer's formula $p_i = \frac{L_i}{L_1 + L_2 + L_3}$ only considers 3 trees (those that maximize the likelihood for the three topologies).
Solution: Exploration of the tree space by sampling trees using a biased random walk (implemented in the MrBayes program). Trees with higher likelihoods will be sampled more often, so $p_i \approx \frac{N_i}{N_{total}}$, where $N_i$ is the number of sampled trees of topology $i$ ($i = 1, 2, 3$) and $N_{total}$ is the total number of sampled trees (has to be large).
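The two estimates on this slide can be sketched in Python as follows. This is only an illustration: the function names, the log-likelihood values, and the list of sampled topologies are hypothetical, and the code works with log-likelihoods (subtracting the maximum before exponentiating) because raw tree likelihoods are typically too small to represent directly.

```python
import math
from collections import Counter


def strimmer_probabilities(log_likelihoods):
    """p_i = L_i / (L_1 + L_2 + L_3), computed from log-likelihoods."""
    # Subtract the largest value so that exp() does not underflow to zero.
    largest = max(log_likelihoods)
    weights = [math.exp(ll - largest) for ll in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]


def sampled_frequencies(sampled_topologies, topologies=(1, 2, 3)):
    """p_i is approximated by N_i / N_total from the sampled trees."""
    counts = Counter(sampled_topologies)
    n_total = len(sampled_topologies)
    return [counts[t] / n_total for t in topologies]


# Hypothetical maximized log-likelihoods for the three topologies.
p_strimmer = strimmer_probabilities([-1000.0, -1002.0, -1005.0])

# Hypothetical (far too short) sample of topologies visited by the walk.
p_sampled = sampled_frequencies([1, 1, 2, 1, 3, 1, 2, 1])
print(p_strimmer, p_sampled)  # p_sampled is [0.625, 0.25, 0.125]
```

In a real analysis the sample would contain many thousands of trees, and only trees sampled after the burn-in would be counted.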

Illustration of a biased random walk. Figures generated using the MCRobot program (Paul Lewis, 2001).

One needs to remove the burn-in, i.e., the samples collected while the robot initially runs around in parameter space before it has found the high-probability regions of the landscape.
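The robot's biased random walk and the removal of the burn-in can be sketched with a one-dimensional Metropolis sampler. This is a toy example, not the MCRobot or MrBayes implementation: the landscape (a standard normal density), the starting point, the step size, and the burn-in length are all assumptions made for illustration.

```python
import math
import random


def log_target(x):
    # Unnormalized log density of a standard normal: a stand-in landscape.
    return -0.5 * x * x


def metropolis(n_steps, start, step_size, rng):
    """Biased random walk: uphill steps are always taken, downhill
    steps only with probability equal to the density ratio."""
    x = start
    chain = []
    for _ in range(n_steps):
        proposal = x + rng.uniform(-step_size, step_size)
        log_ratio = log_target(proposal) - log_target(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        chain.append(x)
    return chain


rng = random.Random(1)
# Start far from the peak so that the early samples are unrepresentative.
chain = metropolis(n_steps=20000, start=20.0, step_size=1.0, rng=rng)

burn_in = 2000  # assumed length; in practice chosen by inspecting the trace
kept = chain[burn_in:]
mean_kept = sum(kept) / len(kept)
print(len(kept), mean_kept)
```

Only `kept` would be used to summarize the posterior; the discarded samples reflect the arbitrary starting point rather than the landscape.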

One complication is that the robot might be stuck on a local optimum.

One complication is that the robot might be stuck on a local optimum. (Same as the last slide, but in side view.)

One solution is to run multiple chains, some of which are "heated", so that the probability landscape is melted down and the robots move away from the optimum more easily. The different chains compare their sampled probabilities and swap places if the heated chain has found a better position.
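A minimal sketch of this idea (Metropolis-coupled MCMC) in Python is given below. It is a toy under stated assumptions, not the MrBayes implementation: the two-peaked landscape, the heating values in `betas`, and the function name `mc3` are hypothetical. A heated chain samples the landscape raised to a power beta < 1, which flattens it. The swap step uses the standard Metropolis acceptance rule, so a swap is always accepted when it moves the better position to the colder chain and is otherwise accepted with some probability, which is slightly more general than the slide's description.

```python
import math
import random


def log_target(x):
    # Log of a landscape with two well-separated peaks at -4 and +4.
    a = -0.5 * (x + 4.0) ** 2
    b = -0.5 * (x - 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def mc3(n_steps, betas, start, step_size, rng):
    """Run coupled chains; betas[0] must be 1.0 (the cold chain)."""
    states = [start] * len(betas)
    cold_samples = []
    for _ in range(n_steps):
        # Each chain takes one Metropolis step on its flattened landscape.
        for k, beta in enumerate(betas):
            proposal = states[k] + rng.uniform(-step_size, step_size)
            log_ratio = beta * (log_target(proposal) - log_target(states[k]))
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                states[k] = proposal
        # Propose swapping the positions of two neighbouring chains.
        i = rng.randrange(len(betas) - 1)
        j = i + 1
        log_swap = (betas[i] - betas[j]) * (
            log_target(states[j]) - log_target(states[i])
        )
        if log_swap >= 0 or rng.random() < math.exp(log_swap):
            states[i], states[j] = states[j], states[i]
        # Only the cold chain contributes to the posterior sample.
        cold_samples.append(states[0])
    return cold_samples


rng = random.Random(7)
cold = mc3(n_steps=20000, betas=[1.0, 0.5, 0.25, 0.1],
           start=-4.0, step_size=1.0, rng=rng)
print(min(cold), max(cold))
```

Although every chain starts on the left peak, the cold chain also reaches the right peak, because the heated chains cross the valley easily and hand their positions down through swaps.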

4 chains, same as previous, but with many more generations; the trajectory is not plotted, only the points visited by each of the four chains.

4 chains, same as previous, but even more generations

Why could a gene tree be different from the species tree? Lack of resolution; lineage sorting; gene duplications/gene loss (paralogs/orthologs); gene transfer; systematic artifacts (e.g., compositional bias and long branch attraction).

Trees: what might they mean? Calculating a tree is comparatively easy; figuring out what it might mean is much more difficult. If this is the probable organismal tree [figure: species A, species B, species C, species D], what could be the reason for obtaining this gene tree [figure: seq. from A, seq. from D, seq. from C, seq. from B]?

Lack of resolution: e.g., 60% bootstrap support for bipartition (AD)(CB). [Figure: gene tree with tips seq. from A, seq. from D, seq. from C, seq. from B.]

Long branch attraction artifact: the two longest branches join together, e.g., 100% bootstrap support for bipartition (AD)(CB). [Figure: gene tree with tips seq. from A, seq. from D, seq. from C, seq. from B.] What could you do to investigate if this is a possible explanation? Use only slow positions, or use an algorithm that corrects for ASRV (among-site rate variation).

Gene transfer. Organismal tree: species A, species B, species C, species D, with a gene transfer event marked. Molecular tree: seq. from B, seq. from A, seq. from C, seq. from D. [Figure legend: speciation, gene transfer.]

Lineage sorting. Organismal tree: species A, species B, species C, species D. Genes diverge and coexist in the organismal lineage. Molecular tree: seq. from B, seq. from A, seq. from C, seq. from D.

Gene duplication. Organismal tree: species A, species B, species C, species D. [Figure: molecular tree with tips seq. from D, seq. from A, seq.' from B, seq.' from C; gene duplication marked.] [Figure: molecular tree with tips seq. from D, seq. from A, seq. from C, seq. from B, seq.' from D, seq.' from C, seq.' from B; gene duplication marked.]

Gene duplication and gene transfer are equivalent explanations. The more relatives of C are found that do not have the blue type of gene, the less likely the duplication-loss scenario is.
Scenario A: horizontal or lateral gene transfer (1 HGT with orthologous replacement).
Scenario B: ancient duplication followed by gene loss (1 gene duplication followed by 4 independent gene loss events).
Note that scenario B involves many more individual events than A.

Function, ortho- and paralogy. [Figure: molecular tree with tips seq. from A, seq.' from B, seq.' from C, seq.' from D, seq. from B, seq. from C, seq. from D; gene duplication marked.] The presence of the duplication is a taxonomic character (a shared derived character in species B, C, and D). The phylogeny suggests that seq' and seq have similar functions, and that this function was important in the evolution of the clade BCD. seq' in B and seq' in C and D are orthologs and probably have the same function, whereas seq and seq' in BCD probably have different functions (the difference might lie in subfunctionalization of functions that seq had in A, e.g., organ-specific expression).