Phylogenetic Trees, Lecture 13

This class consists of parts of Prof. Joe Felsenstein’s lectures 4 and 5 taken from: and on Chapter 8.2 of Durbin et al. Edited by Dan Geiger. Background reading: Durbin et al., Chapter 8. Note: the PDF format includes more slides.

2 Three Methods of Tree Construction
- Distance: a tree built by recursively combining the two nodes with the smallest distance between them.
- Parsimony: a tree with the minimum total number of character changes between nodes.
- Maximum likelihood: finding the best Bayesian network with a tree shape. This is the method of choice nowadays. The best-known and most useful software package, PHYLIP, uses this method.
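As a rough illustration of the distance approach, here is a minimal sketch that repeatedly merges the two closest clusters. The average-linkage rule (UPGMA style), the function name `cluster_by_distance`, and the toy distances are assumptions made for this example; the slide does not say how distances between merged nodes are recomputed.

```python
from itertools import combinations


def cluster_by_distance(names, dist):
    """Build a tree shape by repeatedly merging the two closest clusters.

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    Distances between clusters are averaged over leaf pairs (an assumption).
    Returns the tree as nested tuples of leaf names.
    """
    # Each cluster is (subtree, list of leaf names below it).
    clusters = [(name, [name]) for name in names]

    def average_distance(c1, c2):
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Toy distances (made up): A and B are close, C and D are close.
toy = {
    frozenset("AB"): 2, frozenset("CD"): 4,
    frozenset("AC"): 10, frozenset("AD"): 10,
    frozenset("BC"): 10, frozenset("BD"): 10,
}
tree = cluster_by_distance(["A", "B", "C", "D"], toy)
```

With these toy distances, A and B are joined first, then C and D, and the two pairs are joined last.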

3 Maximum Likelihood Approach
Consider the phylogenetic tree to be a stochastic process.
[Figure: a tree whose nodes carry the sequences AGA, GGA, AAA, AAG, AAA, AGA, AAA, with the leaves marked "Observed" and the internal nodes marked "Unobserved".]
The probability of a transition from character a to character b is given by the parameters $\theta_{b|a}$. The probability of letter a at the root is $q_a$ (written $\pi_a$ in Felsenstein’s slides). These parameters are defined via rates of change per time unit, multiplied by the length of time. Given the complete tree, the probability of the data is determined by the values of the $\theta_{b|a}$’s and the $q_a$’s.
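To make the last statement concrete, the sketch below multiplies the root probabilities by one transition probability per edge and per site, for a tree in which every node is observed. It assumes a single transition table shared by all edges, which simplifies the slide's setting where the parameters depend on edge length; the names `complete_tree_probability`, `parent`, `sequences` and the numeric values are illustrative only.

```python
ALPHABET = "ACGT"


def complete_tree_probability(parent, sequences, q, theta):
    """Probability of fully observed sequences on a rooted tree.

    parent[v] is the parent of node v (None for the root).
    sequences[v] is the sequence at node v.
    q[a] is the probability of letter a at the root.
    theta[a][b] is the probability of changing a into b along one edge
    (assumed identical on every edge, a simplification).
    """
    prob = 1.0
    for node, seq in sequences.items():
        up = parent[node]
        for i, letter in enumerate(seq):
            if up is None:
                prob *= q[letter]                         # root letter
            else:
                prob *= theta[sequences[up][i]][letter]   # edge up -> node
    return prob


# Illustrative parameters: uniform root, 0.7 to stay, 0.1 for each change.
q = {a: 0.25 for a in ALPHABET}
theta = {a: {b: 0.7 if a == b else 0.1 for b in ALPHABET} for a in ALPHABET}
parent = {"root": None, "x": "root", "y": "root"}
sequences = {"root": "AA", "x": "AA", "y": "AG"}
p = complete_tree_probability(parent, sequences, q, theta)
```

Here the result is the product of two root terms, three "no change" terms and one "change" term.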

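4 Maximum Likelihood Approach
Assume each site evolves independently of the others, so that $\Pr(D \mid \mathrm{Tree}, \theta) = \prod_i \Pr(D^{(i)} \mid \mathrm{Tree}, \theta)$, where $D^{(i)}$ is the data at site i.
[Figure: example trees with a single letter, A or G, at each node.]
Write down the likelihood of the data (the leaf sequences) given each tree. Use EM to estimate the $\theta_{b|a}$ parameters. When the tree is not given, search for the tree that maximizes $\Pr(D \mid \mathrm{Tree}, \theta_{EM}) = \prod_i \Pr(D^{(i)} \mid \mathrm{Tree}, \theta_{EM})$.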
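The per-site product can be sketched as follows for a fixed tree and fixed parameters. The unobserved internal letters are summed out bottom-up (Felsenstein's pruning recursion); the EM estimation and the tree search are not shown. The tree encoding (nested tuples of leaf names), the shared transition table for all edges, and all numeric values are assumptions made for this example.

```python
ALPHABET = "ACGT"


def conditional(node, column, theta):
    """Return {x: Pr(leaf letters below node | node has letter x)}."""
    if isinstance(node, str):  # a leaf, identified by its name
        return {x: 1.0 if x == column[node] else 0.0 for x in ALPHABET}
    result = {x: 1.0 for x in ALPHABET}
    for child in node:
        below = conditional(child, column, theta)
        for x in ALPHABET:
            # Sum over the unobserved letter y at the child.
            result[x] *= sum(theta[x][y] * below[y] for y in ALPHABET)
    return result


def likelihood(tree, alignment, q, theta):
    """Pr(D | Tree, theta) as a product over independent sites."""
    n_sites = len(next(iter(alignment.values())))
    total = 1.0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        cond = conditional(tree, column, theta)
        total *= sum(q[x] * cond[x] for x in ALPHABET)
    return total


# Illustrative parameters: uniform root, 0.7 to stay, 0.1 for each change.
q = {a: 0.25 for a in ALPHABET}
theta = {a: {b: 0.7 if a == b else 0.1 for b in ALPHABET} for a in ALPHABET}
tree = ("s1", "s2")
alignment = {"s1": "AA", "s2": "AG"}
value = likelihood(tree, alignment, q, theta)
```

In a tree search, a function like `likelihood` would be evaluated once per candidate tree and the tree with the largest value kept.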

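5 The Jukes-Cantor model (1969)
We need to develop a formula for DNA evolution via $\Pr(y \mid x, t)$, where x and y are taken from {A, C, G, T} and t is the time length. Jukes and Cantor assume an equal rate of change $\alpha$ between any two distinct nucleotides, which gives the rate matrix (rows and columns ordered A, C, G, T):
$$R = \begin{pmatrix} -3\alpha & \alpha & \alpha & \alpha \\ \alpha & -3\alpha & \alpha & \alpha \\ \alpha & \alpha & -3\alpha & \alpha \\ \alpha & \alpha & \alpha & -3\alpha \end{pmatrix}$$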

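6 The Jukes-Cantor model (Cont)
We denote by S(t) the matrix of transition probabilities, whose entry in row x and column y is $\Pr(y \mid x, t)$. Because all rates of change are equal, it has the form:
$$S(t) = \begin{pmatrix} r_t & s_t & s_t & s_t \\ s_t & r_t & s_t & s_t \\ s_t & s_t & r_t & s_t \\ s_t & s_t & s_t & r_t \end{pmatrix}$$
We assume the matrix is multiplicative, in the sense that $S(t+s) = S(t)\,S(s)$ for any time lengths s and t.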

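7 The Jukes-Cantor model (Cont)
For a short time period $\varepsilon$, we write $S(\varepsilon) \approx I + R\varepsilon$, where R is the rate matrix:
$$S(\varepsilon) \approx \begin{pmatrix} 1-3\alpha\varepsilon & \alpha\varepsilon & \alpha\varepsilon & \alpha\varepsilon \\ \alpha\varepsilon & 1-3\alpha\varepsilon & \alpha\varepsilon & \alpha\varepsilon \\ \alpha\varepsilon & \alpha\varepsilon & 1-3\alpha\varepsilon & \alpha\varepsilon \\ \alpha\varepsilon & \alpha\varepsilon & \alpha\varepsilon & 1-3\alpha\varepsilon \end{pmatrix}$$
By multiplicativity: $S(t+\varepsilon) = S(t)\,S(\varepsilon) \approx S(t)(I + R\varepsilon)$.
Hence: $[S(t+\varepsilon) - S(t)]/\varepsilon \approx S(t)R$.
Letting $\varepsilon \to 0$ leads to the linear differential equation $S'(t) = S(t)R$, with the additional condition that, in the limit as t goes to infinity, every entry of S(t) tends to 1/4.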
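The small-step relation can be checked numerically by applying it many times. The sketch below starts from S(0) = I (an assumption not stated on the slide) and repeatedly multiplies by I + R times the step size; the function names and the values of the rate, time, and step count are illustrative.

```python
def jc_rate_matrix(alpha):
    """Jukes-Cantor rate matrix: -3*alpha on the diagonal, alpha elsewhere."""
    return [[-3.0 * alpha if i == j else alpha for j in range(4)] for i in range(4)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def integrate(alpha, t, steps):
    """Approximate S(t) by applying S <- S (I + R*eps) `steps` times."""
    eps = t / steps
    rate = jc_rate_matrix(alpha)
    step = [[(1.0 if i == j else 0.0) + rate[i][j] * eps for j in range(4)]
            for i in range(4)]
    s = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # S(0) = I
    for _ in range(steps):
        s = matmul(s, step)
    return s


S_short = integrate(alpha=1.0, t=1.0, steps=1000)
S_long = integrate(alpha=1.0, t=20.0, steps=20000)
```

For the long time span, every entry of `S_long` comes out close to 1/4, in line with the limit condition.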

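8 The Jukes-Cantor model (Cont)
Write $r_t$ for the diagonal entries of S(t) and $s_t$ for the off-diagonal entries. Substituting S(t) into the differential equation $S'(t) = S(t)R$ yields:
$$r_t' = -3\alpha r_t + 3\alpha s_t, \qquad s_t' = \alpha r_t - \alpha s_t$$
Together with the initial condition S(0) = I, this yields the unique solution, which is known as the Jukes-Cantor model:
$$r_t = \tfrac{1}{4}\left(1 + 3e^{-4\alpha t}\right), \qquad s_t = \tfrac{1}{4}\left(1 - e^{-4\alpha t}\right)$$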
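As a sanity check, the sketch below builds the standard Jukes-Cantor closed form (diagonal entries (1 + 3e^{-4αt})/4, off-diagonal entries (1 − e^{-4αt})/4, as in Durbin et al., Section 8.2) and tests the multiplicative property and the limiting behaviour numerically. The function names and the chosen values of the rate and the times are illustrative.

```python
import math


def jc_transition_matrix(t, alpha):
    """Closed-form Jukes-Cantor transition probabilities for time t."""
    decay = math.exp(-4.0 * alpha * t)
    r = 0.25 * (1.0 + 3.0 * decay)   # probability of ending on the same letter
    s = 0.25 * (1.0 - decay)         # probability of each specific other letter
    return [[r if i == j else s for j in range(4)] for i in range(4)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(4) for j in range(4))


alpha = 0.3
lhs = jc_transition_matrix(0.5 + 1.2, alpha)
rhs = matmul(jc_transition_matrix(0.5, alpha), jc_transition_matrix(1.2, alpha))
gap = max_abs_diff(lhs, rhs)              # should be at rounding-error level
far = jc_transition_matrix(100.0, alpha)  # entries should approach 1/4
```

The value of `gap` measures how far S(t+s) is from S(t)S(s), and `far` shows the equilibrium reached for large t.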

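9 Kimura’s K2P model (1980)
The Jukes-Cantor model does not take into account that the rates of transitions (between purines, $A \leftrightarrow G$, and between pyrimidines, $C \leftrightarrow T$) differ from the rates of transversions ($A \leftrightarrow C$, $A \leftrightarrow T$, $C \leftrightarrow G$, $G \leftrightarrow T$). Kimura used a different rate matrix, with rate $\alpha$ for transitions and rate $\beta$ for transversions (rows and columns ordered A, C, G, T):
$$R = \begin{pmatrix} -2\beta-\alpha & \beta & \alpha & \beta \\ \beta & -2\beta-\alpha & \beta & \alpha \\ \alpha & \beta & -2\beta-\alpha & \beta \\ \beta & \alpha & \beta & -2\beta-\alpha \end{pmatrix}$$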

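10 Kimura’s K2P model (Cont)
Similar methods lead to the transition matrix (rows and columns ordered A, C, G, T):
$$S(t) = \begin{pmatrix} r_t & s_t & u_t & s_t \\ s_t & r_t & s_t & u_t \\ u_t & s_t & r_t & s_t \\ s_t & u_t & s_t & r_t \end{pmatrix}$$
where $\alpha$ is the transition rate, $\beta$ is the transversion rate, and:
$$s_t = \tfrac{1}{4}\left(1 - e^{-4\beta t}\right), \qquad u_t = \tfrac{1}{4}\left(1 + e^{-4\beta t} - 2e^{-2(\alpha+\beta)t}\right), \qquad r_t = 1 - 2s_t - u_t$$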
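The sketch below evaluates the standard K2P closed form (as given in Durbin et al., Section 8.2) and checks two properties numerically: the matrices are multiplicative, and setting the transition rate equal to the transversion rate recovers equal off-diagonal entries as in Jukes-Cantor. The nucleotide order A, C, G, T, the function names, and the numeric values are assumptions made for this example.

```python
import math


def k2p_transition_matrix(t, alpha, beta):
    """K2P transition probabilities; alpha = transition rate, beta = transversion rate."""
    s = 0.25 * (1.0 - math.exp(-4.0 * beta * t))             # each transversion
    u = 0.25 * (1.0 + math.exp(-4.0 * beta * t)
                - 2.0 * math.exp(-2.0 * (alpha + beta) * t))  # the transition
    r = 1.0 - 2.0 * s - u                                     # no net change

    def entry(i, j):
        if i == j:
            return r
        # With order A, C, G, T, the transitions A<->G and C<->T
        # are exactly the index pairs that differ by 2.
        return u if abs(i - j) == 2 else s

    return [[entry(i, j) for j in range(4)] for i in range(4)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


alpha, beta = 0.6, 0.2
lhs = k2p_transition_matrix(0.4 + 0.9, alpha, beta)
rhs = matmul(k2p_transition_matrix(0.4, alpha, beta),
             k2p_transition_matrix(0.9, alpha, beta))
gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(4) for j in range(4))
equal_rates = k2p_transition_matrix(1.0, 0.2, 0.2)
```

When the two rates are equal, the A to G entry of `equal_rates` matches the A to C entry.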

11 Hasegawa, Kishino & Yano model (1985)
In Kimura’s model the equilibrium probabilities are still all 1/4, despite the fact that many organisms show a strong bias in their AT to CG ratio. The HKY model takes care of this, as does Felsenstein’s F84 model. There are other models as well, the most general of which is a matrix where all rates of change are distinct (12 parameters). The following chart shows the relationships among the most commonly used models.
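The slide does not give the HKY rate matrix, so the sketch below uses a commonly quoted parameterization as an assumption: the rate from a to b is proportional to the equilibrium frequency of b, multiplied by a factor kappa when the change is a transition. The names `pi` and `kappa` and the numeric values are illustrative. The check confirms that the chosen frequencies are an equilibrium of the resulting matrix, which is how the model accommodates base-composition bias.

```python
NUCLEOTIDES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky_rate_matrix(pi, kappa):
    """Rate matrix with rate[a][b] = pi[b] (transversion) or kappa * pi[b] (transition)."""
    rate = {}
    for a in NUCLEOTIDES:
        row = {b: (kappa if (a, b) in TRANSITIONS else 1.0) * pi[b]
               for b in NUCLEOTIDES if b != a}
        row[a] = -sum(row.values())  # each row sums to zero
        rate[a] = row
    return rate


def equilibrium_residual(pi, rate):
    """Entries of the vector pi * R; all zeros means pi is an equilibrium."""
    return {b: sum(pi[a] * rate[a][b] for a in NUCLEOTIDES) for b in NUCLEOTIDES}


# Illustrative AT-rich frequencies and a transition bias of 2.
pi = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
R = hky_rate_matrix(pi, kappa=2.0)
residual = equilibrium_residual(pi, R)
```

With uniform frequencies of 1/4 this matrix has the same structure as Kimura's, with one rate for transitions and another for transversions.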