Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic.

Presentation transcript:

Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic packages

More models of sequence evolution
Currently, more than 60 models have been described, plus extensions for gamma-distributed rate variation and invariable sites.
- accuracy of models decreases rapidly for highly divergent sequences
- problem: more complicated models require more parameters to be estimated from the same data, so they tend to be less accurate (and slower)
How to pick an appropriate model?
- use a likelihood ratio test
- implemented in Modeltest 3.06 (Posada & Crandall, 1998)

More models of sequence evolution: example of a Modeltest output file
A. Likelihood scores (-lnL) are listed for each model: JC, F81, K80, HKY, TrNef, TrN, K81, K81uf, TIMef, TIM, TVMef, TVM, SYM, GTR
B. Likelihood ratio test of equal base frequencies: null model = JC (-lnL0), alternative model = F81 (-lnL1), test statistic 2(lnL1-lnL0), df = 3, with the resulting P-value
C. Model selected: TVM+G, with its -lnL
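The likelihood ratio test reported in such a file can be sketched in a few lines of Python. This is a minimal illustration, not Modeltest's own code: the two -lnL values are hypothetical, and `chi2_sf` and `likelihood_ratio_test` are helper names introduced here. The test statistic 2(lnL1-lnL0) is compared against a chi-square distribution whose degrees of freedom equal the number of extra free parameters (3 for JC versus F81).

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Start from the closed form for df = 2 (even) or df = 1 (odd) ...
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # ... and step up with the recurrence Q(a+1, z) = Q(a, z) + z^a e^-z / Gamma(a+1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(neg_lnl_null, neg_lnl_alt, df):
    """Return (statistic, p_value) for nested models given their -lnL scores."""
    statistic = 2.0 * (neg_lnl_null - neg_lnl_alt)  # equals 2(lnL1 - lnL0)
    return statistic, chi2_sf(statistic, df)

# Hypothetical scores, chosen only for illustration.
neg_lnl_jc = 2500.0   # null model (JC)
neg_lnl_f81 = 2480.0  # alternative model (F81)
stat, p = likelihood_ratio_test(neg_lnl_jc, neg_lnl_f81, df=3)
print(f"2(lnL1-lnL0) = {stat:.1f}, P = {p:.3g}")
```

A small P-value means the extra parameters of the alternative model improve the fit significantly, so the simpler null model is rejected.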

More models of sequence evolution: amino acid sequences
Amino acid sequences are considerably more complicated to model than nucleotide sequences:
- some amino acids can replace one another with relatively little effect on the structure and function of the final protein, while other replacements can be functionally devastating
- from the standpoint of the genetic code, some amino acid changes can be made by a single DNA mutation, while others require two or even three changes in the DNA sequence
- in practice, tables of the frequencies of all amino acid replacements have been calculated within families of related protein sequences in the databanks, e.g. PAM and BLOSUM

Phylogenetic Inference II: Disclaimers
Before describing any theoretical or practical aspects of phylogenetics, it is necessary to give some disclaimers. This area of computational biology is an intellectual minefield! Neither the theory nor the practical applications of any algorithms are universally accepted throughout the scientific community. The application of different software packages to a data set is very likely to give different answers; minor changes to a data set are also likely to profoundly change the result.

CS 177 Phylogenetics II
- Tree building methods: some examples
- Assessing phylogenetic data
- Popular phylogenetic software packages

Phylogenetic Inference II: Are there correct trees?
Despite all of these problems, it is actually quite simple to use computer programs to calculate phylogenetic trees for data sets.
Provided the data are clean, outgroups are correctly specified, appropriate algorithms are chosen, no assumptions are violated, etc., can the true, correct tree be found and proven to be scientifically valid?
Unfortunately, it is impossible to ever conclusively state what is the "true" tree for a group of sequences (or a group of organisms); taxonomy is constantly under revision as new data are gathered.

Phenetics versus cladistics
- Phenetic methods construct trees (phenograms) by considering the current states of characters without regard to the evolutionary history that brought the species to their current phenotypes; phenograms are based on overall similarity.
- Cladistic methods construct trees (cladograms) that rely on assumptions about ancestral relationships as well as on current data; cladograms are based on character evolution (e.g. shared derived characters).

Tree building methods
- Data type: genetic distance / character state
- Computational method: optimality criterion / clustering algorithm

Tree building (distance based): UPGMA
- The simplest of the distance methods is UPGMA (Unweighted Pair Group Method using Arithmetic averages)
- Many multiple alignment programs, such as PILEUP, use a variant of UPGMA to create a dendrogram of DNA sequences, which is then used to guide the multiple alignment algorithm

UPGMA example: pairwise distance matrix for taxa A to G (surviving entries: A-B = 63, A-C = 94, B-C = 79). At each step the closest pair of clusters is merged and the matrix is recalculated. Order of merges: D with G; C with DG; A with F; B with E. The remaining clusters AF, BE and CDG are then joined down to the root.
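The UPGMA procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the four-taxon distance matrix is hypothetical (it is not the A to G matrix from the slides), and `upgma` is a helper name introduced here. At each step the two closest clusters are merged, and the distance from the new cluster to every other cluster is the size-weighted average of the old distances.

```python
from itertools import combinations

def upgma(names, distances):
    """Cluster taxa by UPGMA.

    names: list of taxon labels.
    distances: dict mapping frozenset({a, b}) to the distance between a and b.
    Returns (tree as a nested parenthesised string, list of merges).
    """
    dist = dict(distances)
    size = {n: 1 for n in names}  # number of leaves in each cluster
    active = list(names)
    merges = []
    while len(active) > 1:
        # Find the closest pair of clusters.
        a, b = min(combinations(active, 2), key=lambda p: dist[frozenset(p)])
        d_ab = dist[frozenset((a, b))]
        new = f"({a},{b})"
        size[new] = size[a] + size[b]
        merges.append((a, b, d_ab))
        active = [c for c in active if c not in (a, b)]
        # Distance to the merged cluster: average weighted by cluster size.
        for c in active:
            dist[frozenset((new, c))] = (
                size[a] * dist[frozenset((a, c))]
                + size[b] * dist[frozenset((b, c))]
            ) / size[new]
        active.append(new)
    return active[0], merges

# Hypothetical distances, chosen only for illustration.
taxa = ["A", "B", "C", "D"]
d = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}
tree, merges = upgma(taxa, d)
print(tree)
for a, b, d_ab in merges:
    print(f"merge {a} + {b} at distance {d_ab}")
```

Each merge distance divided by two gives the height of the corresponding node, which is how UPGMA places the root under its molecular clock assumption.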

Maximum Parsimony (MP)
- Parsimony involves evaluating all possible trees for each vertical column of sequence characters (nucleotide position)
- only informative sites are considered
- each tree is given a score based on the number of evolutionary changes that are needed to explain the observed data
- finally, those trees that produce the smallest number of changes (shortest trees) overall for all sequence positions are identified
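The scoring step can be sketched with Fitch's algorithm, a standard way to count the minimum number of changes a tree needs at one site. This is a minimal sketch under stated assumptions: the four sequences are hypothetical, trees are written as nested tuples, and `fitch` and `parsimony_score` are helper names introduced here.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    tree: a leaf name (str) or a pair (left_subtree, right_subtree).
    states: dict mapping each leaf name to its character at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can share a state: no extra change at this node.
        return common, left_cost + right_cost
    # Otherwise one change is needed on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total

# Hypothetical aligned sequences, chosen only for illustration.
alignment = {"A": "AGC", "B": "AGC", "C": "ATT", "D": "ATT"}

# The three possible groupings of four taxa.
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))
tree3 = (("A", "D"), ("B", "C"))
scores = {name: parsimony_score(t, alignment)
          for name, t in [("tree1", tree1), ("tree2", tree2), ("tree3", tree3)]}
print(scores)
```

The first column is constant and therefore uninformative: it adds zero changes to every tree. The other two columns favour the tree that groups A with B, which is the shortest tree here.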

Maximum Likelihood (ML)
- Maximum Likelihood uses probability calculations based on a specific model of sequence evolution to find a tree that best accounts for the variation in a set of sequences
- all possible trees for each nucleotide position are considered
- the fewer mutations needed to fit a tree to the data, the more likely the tree
- ML resembles MP in that trees requiring fewer changes will generally be more likely
- however, ML evaluates trees using explicit evolutionary models
- thus, the method can be used to explore relationships among more diverse taxa
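The idea of a likelihood calculation under an explicit model can be illustrated on the smallest possible case: two aligned sequences under the Jukes-Cantor model. This is a sketch under stated assumptions, not a tree search: the site counts are hypothetical, and the function names are introduced here. Under Jukes-Cantor, a site stays the same over distance d with probability 1/4 + 3/4 e^(-4d/3), and changes to one specific other base with probability 1/4 - 1/4 e^(-4d/3).

```python
import math

def jc69_log_likelihood(d, n_same, n_diff):
    """Log-likelihood of distance d (substitutions per site) under Jukes-Cantor."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # site unchanged
    p_diff = 0.25 - 0.25 * e   # site changed to one specific other base
    # Each site also carries the equilibrium base frequency of 1/4.
    return (n_same * math.log(0.25 * p_same)
            + n_diff * math.log(0.25 * p_diff))

def jc69_distance(p):
    """Closed-form Jukes-Cantor distance for an observed proportion p of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Hypothetical counts: 100 aligned sites, 10 of which differ.
n_same, n_diff = 90, 10

# Crude grid search for the distance that maximises the likelihood.
grid = [i / 1000.0 for i in range(1, 1001)]
d_ml = max(grid, key=lambda d: jc69_log_likelihood(d, n_same, n_diff))
d_closed = jc69_distance(n_diff / (n_same + n_diff))
print(f"grid maximum: {d_ml:.3f}, closed form: {d_closed:.4f}")
```

The grid maximum agrees with the closed-form Jukes-Cantor correction, which is the maximum likelihood estimate for this two-sequence case. Full ML tree inference applies the same principle to every branch of every candidate tree.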

Computational methods for finding optimal trees
Number of possible unrooted evolutionary trees for n taxa: (2n-5)! / (2^(n-3) (n-3)!)
For example, n = 10 taxa give 2,027,025 unrooted trees.
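The growth in the number of trees is easy to check numerically. The sketch below evaluates the unrooted formula given above; the rooted count, (2n-3)! / (2^(n-2) (n-2)!), is the standard companion result and is included here as an addition. The function names are introduced for illustration.

```python
import math

def unrooted_trees(n):
    """Number of unrooted bifurcating trees for n labelled taxa (n >= 3)."""
    if n < 3:
        return 1
    return math.factorial(2 * n - 5) // (2 ** (n - 3) * math.factorial(n - 3))

def rooted_trees(n):
    """Number of rooted bifurcating trees for n labelled taxa (n >= 2)."""
    if n < 2:
        return 1
    return math.factorial(2 * n - 3) // (2 ** (n - 2) * math.factorial(n - 2))

for n in (4, 5, 10, 20):
    print(n, unrooted_trees(n), rooted_trees(n))
```

Already at 20 taxa the count exceeds 10^20, which is why exhaustive evaluation of all trees is only feasible for small data sets.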

Computational methods for finding optimal trees
Exact algorithms
- "Guarantee" to find the optimal or "best" tree for the method of choice
- Two types used in tree building:
  - Exhaustive search: evaluates all possible unrooted trees, choosing the one with the best score for the method
  - Branch-and-bound search: eliminates parts of the search tree that contain only suboptimal solutions
Heuristic algorithms
- Approximate or "quick-and-dirty" methods that attempt to find the optimal tree for the method of choice, but cannot guarantee to do so
- Often operate by "hill-climbing" methods

Heuristic algorithms
Heuristic search algorithms are input order dependent and can get stuck in local minima or maxima.
Rerunning heuristic searches using different input orders of taxa can help find global minima or maxima.
[Figure: search for a global minimum and search for a global maximum, with local and global optima marked. From NHGRI lecture, C.-B. Stewart]
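The effect of starting points on a hill-climbing search can be shown with a toy example. This is a sketch under stated assumptions: the one-dimensional list of scores is a hypothetical stand-in for tree space, where each position would be a tree and its neighbours the trees reachable by one rearrangement. `hill_climb` is a helper name introduced here.

```python
def hill_climb(scores, start):
    """Move to the better neighbour until no neighbour improves the score."""
    i = start
    while True:
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(scores)]
        best = max(neighbours, key=lambda j: scores[j])
        if scores[best] <= scores[i]:
            return i  # local (possibly global) maximum
        i = best

# Hypothetical landscape with a local maximum at index 2 and the global one at index 6.
scores = [1, 3, 5, 4, 2, 6, 9, 7, 3]

for start in (0, 4, 8):
    end = hill_climb(scores, start)
    print(f"start {start} -> stops at {end} (score {scores[end]})")

# Restarting from several points and keeping the best result.
best_end = max((hill_climb(scores, s) for s in (0, 4, 8)), key=lambda i: scores[i])
print("best over restarts:", best_end)
```

The search started at index 0 stops at the local maximum, while the other two starts reach the global one. Rerunning with different input orders of taxa plays the same role as the restarts here.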

Assessing Phylogenetic Data
Most data include potentially misleading evidence of relationships.
One should not only construct phylogenetic hypotheses but should also assess what 'confidence' can be placed in these hypotheses.
Question: How much support is there for a particular clade?

Assessing Phylogenetic Data
How much support is there for a particular clade?
Bootstrapping/jackknifing: many randomized data sets are produced by sampling the real data with replacement (or, in jackknifing, by removing some random proportion of the data); the frequencies with which groups occur are a measure of support for those groups.
Problems:
- bootstrap proportions aren't easily interpretable
- they give no indication of how good the data are, only of how well the tree fits the data
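The resampling step can be sketched as follows. This is a minimal illustration under stated assumptions: the three sequences are hypothetical, and instead of rebuilding a full tree for each replicate, the sketch asks a simpler question (is A closer to B than either is to C, by proportion of differing sites?) as a stand-in for "does the group A+B appear in the tree?". The function names are introduced here.

```python
import random

def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}

def p_distance(s, t):
    """Proportion of sites at which two aligned sequences differ."""
    return sum(a != b for a, b in zip(s, t)) / len(s)

# Hypothetical aligned sequences, chosen only for illustration.
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "ACTTAGGTTA",
}

rng = random.Random(42)  # fixed seed so the run is repeatable
replicates = 200
hits = 0
for _ in range(replicates):
    rep = bootstrap_columns(alignment, rng)
    d_ab = p_distance(rep["A"], rep["B"])
    d_ac = p_distance(rep["A"], rep["C"])
    d_bc = p_distance(rep["B"], rep["C"])
    if d_ab < min(d_ac, d_bc):
        hits += 1  # this replicate supports grouping A with B

support = hits / replicates
print(f"bootstrap support for A+B: {support:.2f}")
```

In a real analysis the grouping test would be replaced by a full tree reconstruction on each replicate, and the support value would be reported for every clade in the tree.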

Popular phylogenetic software packages
Review available at: