Summary and Recommendations

Avoid the “Black Box”
- Researchers invest considerable resources in producing molecular sequence data
- They should also be ready to invest the time and effort needed to get the most out of their data
- Modern phylogenetic software makes it easy to align and produce trees from sequence data, but phylogenetic inference should not be treated as a “black box”

Choices are Unavoidable
- There are many different phylogenetic methods
- Thus the investigator is confronted with unavoidable choices
- Not all methods are equally good for all data
- Although we need not understand all the details of the various phylogenetic methods, an understanding of the basic properties is essential for informed choice of method and interpretation of results

Data are not Perfect
- Most data include misleading evidence of relationships, and we need to have a cautious attitude to the quality of data and trees
- Data can be subject to both systematic biases and noise that affect our chances of getting the correct tree
- For example:
  - Saturation (noise)
  - Alignment artefacts
  - Base compositional biases (e.g. thermophilic convergence)
  - Branch length or rate asymmetries leading to long branch attraction
- Different methods may be more or less sensitive to some of these problems
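One quick way to look for base compositional bias is to compare the GC content of each sequence in the alignment. The sketch below is only an illustration: the function names and the toy alignment are hypothetical, and a large spread in GC content is a warning sign rather than a formal test.

```python
from collections import Counter


def base_composition(seq):
    """Return the fraction of A, C, G and T among the unambiguous bases in seq."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {base: counts[base] / total for base in "ACGT"}


def gc_content(seq):
    """Return the combined fraction of G and C, ignoring gaps and ambiguity codes."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


# Hypothetical toy alignment, used only to show the call pattern.
alignment = {
    "taxon_A": "ATGCATGCAT",
    "taxon_B": "GCGCGGCCGC",
    "taxon_C": "ATATGCATTA",
}
gc_by_taxon = {name: gc_content(seq) for name, seq in alignment.items()}
for name, gc in sorted(gc_by_taxon.items()):
    print(f"{name}: GC = {gc:.2f}")
```

Taxa that share an unusual composition but are not expected to be related are candidates for artefactual grouping.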

Alignment - Homology
- The data determines the results
- The alignment determines the data (hypotheses of homology)
- Be aware of potential alignment artefacts
- If using multiple alignment software, explore the sensitivity of the alignment to variations in the parameters used
- Eliminate regions that cannot be aligned with confidence
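As a rough illustration of eliminating poorly aligned regions, the sketch below drops alignment columns in which more than a chosen fraction of sequences have a gap. This is an assumption made for the example: gap fraction is only a crude proxy for alignment confidence, and the function name and threshold are hypothetical.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Return a copy of the alignment without columns that are mostly gaps.

    alignment maps taxon names to aligned sequences of equal length.
    A column is kept when its fraction of '-' characters is <= max_gap_fraction.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        i for i in range(length)
        if sum(seq[i] == "-" for seq in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in zip(names, seqs)}


# Hypothetical toy alignment: the third column is all gaps and is removed.
trimmed = drop_gappy_columns({"a": "AC-T", "b": "A--T", "c": "AG-T"})
print(trimmed)
```

Rerunning the tree search with several thresholds is one way to see how sensitive the result is to the excluded regions.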

Models
- Simple models (in ML and distance analyses) often perform poorly because the data does not fit the model
- Explore the data for potential biases and deviations from the assumptions of the model
- Be prepared to use more complex models that better approximate the evolution of the sequences and therefore might be expected to give more accurate results

Choice of Models
- More complex models require the estimation of more parameters, each of which is subject to some error
- Thus there is a trade-off between more realistic and complex models and their power to discriminate between alternative hypotheses
- By comparing likelihoods of trees under different models we can determine if a more complex model gives a significantly better fit to the data
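For nested models, this comparison is commonly done with a likelihood ratio test: twice the difference in log-likelihood is compared with a chi-square distribution whose degrees of freedom equal the number of extra parameters. The sketch below assumes the log-likelihoods come from an ML program run on the same tree and data; the function names and the example values are hypothetical, and the chi-square approximation can be unreliable when an extra parameter sits on a boundary (for example a proportion of invariant sites of zero).

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-square distribution with integer df."""
    if df < 1 or x < 0:
        raise ValueError("df must be >= 1 and x must be >= 0")
    t = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-t) * sum_{i=0}^{df/2 - 1} t^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= t / i
            total += term
        return math.exp(-t) * total
    # Odd df: start from df = 1 (erfc) and add one term per extra two df.
    total = math.erfc(math.sqrt(t))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-t) * t ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def likelihood_ratio_test(lnl_simple, lnl_complex, extra_parameters):
    """Return (test statistic, p-value) for two nested models."""
    statistic = 2.0 * (lnl_complex - lnl_simple)
    p_value = chi2_sf(max(statistic, 0.0), extra_parameters)
    return statistic, p_value


# Hypothetical log-likelihoods for a simple model and one with 1 extra parameter.
stat, p = likelihood_ratio_test(-2300.5, -2290.1, 1)
print(f"2*delta lnL = {stat:.1f}, p = {p:.2g}")
```

A small p-value suggests the extra parameters are justified; a large one suggests the simpler model fits about as well.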

Choice of Method
- Not all methods deal with all known problems
- LogDet is useful when there are strong base compositional biases but does not deal with rate heterogeneity (need to remove invariant sites)
- ML with gamma distribution is useful when there are strong rate heterogeneities across sites
- Gamma shape and proportions of invariant sites can be estimated from the data

An Experimental Science
- Phylogenetics differs from many sciences in its historical focus
- The classical experimental method is not applicable
- However, we can perform experiments in the analysis of data
- Experiments (multiple analyses) help us to understand the behaviour of the data
- The only cost is the time invested!

Some Additional Experiments
- Vary the included taxa
  - You may be able to minimise the effects of biases by appropriate taxon sampling to break long branches or reduce base compositional biases by introducing intermediate taxa
- Vary the characters included
  - You may be able to improve the fit of data to a model by removing the fastest evolving sites or the slowest evolving sites

Is the data any good?
- Explore the data for phylogenetic signal:
  - Randomization tests will identify data that cannot be used to generate reliable phylogenetic inferences
- Be ready to explore data partitions or ways of treating the data
  - For example, in protein coding genes, systematic biases or noise may differentially affect 3rd positions in codons and might be avoided by excluding this data or by translating DNA sequences and analysing amino acid sequences
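Excluding 3rd codon positions can be sketched in a few lines. The example assumes the sequence is in frame, starts at codon position 1, and has no gaps that shift the frame; the function name is hypothetical.

```python
def codon_positions(seq, positions=(1, 2)):
    """Keep only the requested codon positions (1, 2 or 3) of an in-frame sequence."""
    return "".join(
        base for i, base in enumerate(seq) if (i % 3) + 1 in positions
    )


coding = "ATGGCC"  # hypothetical two-codon sequence: ATG GCC
first_and_second = codon_positions(coding)        # drops 3rd positions
third_only = codon_positions(coding, (3,))        # keeps only 3rd positions
print(first_and_second, third_only)
```

Analysing the two partitions separately shows whether the 3rd positions support a different tree from the 1st and 2nd positions.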

Measure support for groups
- Evaluate relationships shown in trees with bootstrap or other resampling techniques
- Appreciate that such measures may be misleading if the data is misleading (particularly if subject to systematic biases)
- Explore the sensitivity of these results to methods of analyses - disagreements should limit confidence in results unless they can be explained as a result of undesirable properties of methods/characteristics of the data
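The resampling step of the nonparametric bootstrap draws alignment columns with replacement until the replicate has the original length. The sketch below shows only that step; building a tree from each replicate and counting how often a group appears is left to the phylogenetic software. Function names and the toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns resampled with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    # The same column indices are used for every taxon, so columns stay intact.
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Return a list of bootstrap replicates, reproducible for a given seed."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Hypothetical toy alignment.
toy = {"a": "ACGT", "b": "TGCA"}
replicates = bootstrap_replicates(toy, n_replicates=5, seed=1)
print(len(replicates), "replicates of length", len(replicates[0]["a"]))
```

Because whole columns are resampled, any systematic bias present in the columns is resampled too, which is why high support can still be misleading.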

Hypothesis testing
- Alternative evolutionary hypotheses may be supported by alternative phylogenetic trees
- We can test alternative hypotheses by determining if any of the alternative trees are significantly better explanations of the data
- Use constrained analyses to find alternative trees
- Use KH (a priori) or AU tests (a posteriori) to evaluate alternative trees
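To give a feel for what a KH-style comparison computes, the sketch below uses the normal-approximation form of the test on per-site log-likelihoods for two trees. It assumes the site log-likelihoods were exported from an ML program, that both trees were specified before the analysis, and that a normal approximation is acceptable; real analyses usually rely on the resampling versions implemented in phylogenetic packages. The function name and the example values are hypothetical.

```python
import math


def kh_test(site_lnl_tree1, site_lnl_tree2):
    """Compare two trees from per-site log-likelihoods (normal approximation).

    Returns (delta, z, p): the total log-likelihood difference, its z-score,
    and a two-sided p-value.
    """
    n = len(site_lnl_tree1)
    if n != len(site_lnl_tree2) or n < 2:
        raise ValueError("need two equal-length lists with at least two sites")
    diffs = [a - b for a, b in zip(site_lnl_tree1, site_lnl_tree2)]
    delta = sum(diffs)
    mean = delta / n
    # Variance of the summed difference, estimated from the per-site differences.
    var_delta = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    if var_delta == 0:
        raise ValueError("per-site differences are all identical")
    z = delta / math.sqrt(var_delta)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return delta, z, p


# Hypothetical per-site log-likelihoods for two candidate trees.
tree1 = [-1.0, -2.0, -1.0, -2.0]
tree2 = [-1.5, -1.5, -1.5, -1.5]
delta, z, p = kh_test(tree1, tree2)
print(f"delta = {delta:.2f}, z = {z:.2f}, p = {p:.2f}")
```

Here the per-site differences cancel out, so neither tree is a significantly better explanation of the toy data.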

Gene trees and species trees
- Remember that molecular systematics yields gene trees
- Accurate gene trees may not be accurate organismal trees
- Gene duplications and paralogy, lateral transfer, and lineage sorting of plastid genomes can produce mismatches between gene and organismal phylogenies
- Use congruence between separate gene trees to identify robust organismal phylogenies or mismatches that require further information
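A simple way to summarise congruence between two gene trees is to compare the groups (clades) each one contains. The sketch below assumes each rooted tree has already been reduced to a list of clades, each given as a set of taxon names; extracting clades from a tree file is left to a tree-handling library. The function name and the toy clades are hypothetical. The number of unshared clades corresponds to a Robinson-Foulds style distance on rooted trees.

```python
def compare_clades(clades_a, clades_b):
    """Return (shared, unshared) clades for two trees given as lists of taxon sets."""
    a = {frozenset(clade) for clade in clades_a}
    b = {frozenset(clade) for clade in clades_b}
    shared = a & b      # groups found in both gene trees
    unshared = a ^ b    # groups found in only one of the two trees
    return shared, unshared


# Hypothetical clades from two gene trees on taxa A, B, C and D.
gene_tree_1 = [{"A", "B"}, {"A", "B", "C"}]
gene_tree_2 = [{"A", "B"}, {"C", "D"}]
shared, unshared = compare_clades(gene_tree_1, gene_tree_2)
print(len(shared), "shared clade(s);", len(unshared), "unshared clade(s)")
```

Groups recovered by several independent genes are better candidates for the organismal phylogeny than groups seen in a single gene tree.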

Thank You! (and EMBO!) ………..and don’t panic!!