Darwin’s Tree of Life, July 1837. https://tree.opentreeoflife.org: 2.3 million species. Phylogenetic inference from genomic data.

Historical context. Data used for phylogenetics, from earlier to later: physical traits; simple genetic markers (AFLP, RFLP); a few SNPs; single gene sequences (or part of a gene); multiple genes (from a few to many); many SNPs (tens of thousands to millions); whole mitochondrial genomes. Multiple genes or SNPs are concatenated to form a single string. Complexity has increased as computing capability has increased and sequencing costs have dropped.

How do you construct a phylogeny? Parsimony: the tree with the fewest changes is the best. Several trees can be the same length (equally parsimonious). The number of trees you have to investigate grows rapidly as you include more sequences, so you often have to do heuristic searches (many trees are never actually examined during analysis). This was the primary method used for homologous morphological traits.
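To make the parsimony slide concrete, here is a minimal Python sketch, not taken from the presentation. It assumes strictly bifurcating trees written as nested tuples of tip names and a single unordered character; the function names are illustrative. It counts the possible trees for a given number of tips and scores one character on a fixed tree with Fitch's small-parsimony algorithm.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a tip name or a pair of subtrees (assumed representation).
Tree = Union[str, Tuple["Tree", "Tree"]]


def count_rooted_trees(n_taxa: int) -> int:
    """Number of rooted binary trees on n labeled tips: (2n - 3)!!."""
    count = 1
    for odd in range(3, 2 * n_taxa - 2, 2):
        count *= odd
    return count


def count_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted binary trees on n labeled tips: (2n - 5)!!."""
    if n_taxa < 3:
        return 1
    return count_rooted_trees(n_taxa - 1)


def _fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (possible ancestral states, minimum changes) for a subtree."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = _fitch(tree[0], states)
    right_set, right_cost = _fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # The children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def fitch_score(tree: Tree, states: Dict[str, str]) -> int:
    """Minimum number of state changes for one character on a fixed tree."""
    return _fitch(tree, states)[1]


if __name__ == "__main__":
    for n in (4, 10, 20):
        print(n, count_unrooted_trees(n), count_rooted_trees(n))
    site = {"A": "G", "B": "G", "C": "T", "D": "T"}
    print(fitch_score((("A", "B"), ("C", "D")), site))
    print(fitch_score((("A", "C"), ("B", "D")), site))
```

The tree counts show why exhaustive search stops being practical after a handful of taxa. A full parsimony analysis would sum `fitch_score` over every site and compare that total across candidate trees.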

How do you construct a phylogeny? Distance methods: calculate pairwise distances between all sequences, then use the distance matrix to infer the phylogeny (neighbor joining or minimum evolution). These methods are fast and can be bootstrapped more easily than other methods (less computation required).
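The first two steps of a distance analysis can be sketched in Python as follows. This is an illustration under stated assumptions, not the presenter's workflow: sequences are already aligned and of equal length, distances use the Jukes-Cantor (JC69) correction, and the function names are hypothetical. The tree-building step itself (neighbor joining or minimum evolution) is left out.

```python
import math
import random
from typing import Dict


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites, ignoring gaps and ambiguous bases."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(p: float) -> float:
    """JC69-corrected distance in expected substitutions per site."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """All pairwise JC69 distances for an alignment {name: sequence}."""
    names = list(alignment)
    return {
        a: {b: jc69_distance(p_distance(alignment[a], alignment[b])) for b in names}
        for a in names
    }


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


if __name__ == "__main__":
    toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACGA"}
    print(distance_matrix(toy))
    print(bootstrap_alignment(toy, random.Random(1)))
```

Each bootstrap replicate only requires recomputing a small matrix and rerunning a fast tree-building step, which is why distance methods are cheap to bootstrap.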

How do you construct a phylogeny? Likelihood methods: based on probability and computationally intensive. They utilize models of DNA evolution, which describe the rate at which one nucleotide replaces another during evolution. Models may have equal rates or different rates (transitions vs. transversions). Examples: JC69, GTR, HKY, etc. Maximum likelihood: picks the tree under which your data are most probable, given a specific evolutionary model. Bayesian methods: use posterior probabilities to find the tree that best fits your data.
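As a small worked example of the likelihood idea, the Python sketch below uses the equal-rates JC69 model on just two aligned sequences. It is an illustration, not the presentation's method: the only parameter is the branch length `t` (expected substitutions per site), it is estimated by a simple grid search, and the function names and grid settings are assumptions. Real programs optimise over tree topologies and many parameters at once.

```python
import math
from typing import Tuple


def jc69_probs(t: float) -> Tuple[float, float]:
    """JC69 probabilities after branch length t.

    Returns (P[site unchanged], P[site changed to one specific other base]).
    """
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay, 0.25 - 0.25 * decay


def log_likelihood(n_sites: int, n_diffs: int, t: float) -> float:
    """Log-likelihood of a pairwise alignment under JC69 (requires t > 0)."""
    p_same, p_diff = jc69_probs(t)
    n_same = n_sites - n_diffs
    # Each base has equilibrium frequency 1/4 under JC69.
    return n_same * math.log(0.25 * p_same) + n_diffs * math.log(0.25 * p_diff)


def ml_branch_length(n_sites: int, n_diffs: int,
                     step: float = 0.001, t_max: float = 2.0) -> float:
    """Grid-search estimate of the branch length that maximises the likelihood."""
    grid = [step * i for i in range(1, int(t_max / step) + 1)]
    return max(grid, key=lambda t: log_likelihood(n_sites, n_diffs, t))


if __name__ == "__main__":
    # 10 differences in 100 sites (made-up numbers for illustration).
    estimate = ml_branch_length(100, 10)
    analytic = -0.75 * math.log(1.0 - 4.0 * 0.10 / 3.0)
    print(estimate, analytic)
```

For this two-sequence case the grid-search estimate should land next to the closed-form JC69 distance, which is a useful sanity check on the likelihood function.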

Methods of phylogenomic inference. Supermatrix: build one tree from the concatenated genes. The matrix can be partitioned so that each gene utilizes a different model/rate. Combined sequences will be different lengths because of genes that do not exist across taxa. Supertree: build the optimal tree for each gene (each tree will not include all taxa), then combine those trees into one supertree. Figure from Delsuc, Brinkmann and Philippe 2005.
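The supermatrix step can be sketched in Python as below. This is a minimal illustration that assumes each gene is already aligned and that a taxon lacking a gene is padded with `?`, a common convention that the slide does not specify. The gene names, taxon names and function name are hypothetical.

```python
from typing import Dict, List, Tuple


def build_supermatrix(
    genes: Dict[str, Dict[str, str]], missing: str = "?"
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Concatenate per-gene alignments into one matrix.

    genes maps gene name -> {taxon: aligned sequence}.
    Returns the concatenated matrix and 1-based, inclusive partition
    boundaries (gene, start, end) so each gene can get its own model.
    """
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    matrix = {taxon: "" for taxon in taxa}
    partitions = []
    start = 1
    for gene, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"gene {gene!r} is not aligned")
        length = lengths.pop()
        for taxon in taxa:
            # Pad taxa that lack this gene with the missing-data symbol.
            matrix[taxon] += aln.get(taxon, missing * length)
        partitions.append((gene, start, start + length - 1))
        start += length
    return matrix, partitions


if __name__ == "__main__":
    toy = {
        "geneX": {"taxonA": "ACGT", "taxonB": "ACGA"},
        "geneY": {"taxonA": "GG", "taxonC": "GA"},
    }
    matrix, parts = build_supermatrix(toy)
    for taxon, seq in matrix.items():
        print(taxon, seq)
    print(parts)
```

Padding makes every row the same length, so the unequal lengths mentioned on the slide appear as blocks of missing data instead.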

Number of characters (i.e. genes) vs. number of species. Many species, few genes: if you want to sample the most species, you usually have to focus on a few genes, limited by budget and time constraints. Few species, many genes: it’s easy to get more genes if you have the whole genome available, and as more genomes are sequenced, these will increase. For sequence-based methods, the goal is many species, many genes, but this isn’t always realistic. How will you deal with missing data? What if certain genes don’t exist in all taxa? Studies have shown that phylogenies can be very tolerant of missing data. Is it better to include a species that only has part of the data you are looking at? Which strategy is most appropriate for your study? What consequences will choosing poorly have?
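One way to put numbers on the missing-data questions above is to tabulate how many genes each taxon actually has. The Python sketch below is illustrative only: the 50% cutoff in the example is an arbitrary assumption, not a recommendation from the presentation, and the function names and toy data are hypothetical.

```python
from typing import Dict, List, Set


def taxon_occupancy(genes: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of genes in which each taxon is present.

    genes maps gene name -> set of taxa sequenced for that gene.
    """
    taxa = set().union(*genes.values())
    return {
        taxon: sum(taxon in present for present in genes.values()) / len(genes)
        for taxon in sorted(taxa)
    }


def filter_taxa(occupancy: Dict[str, float], min_fraction: float) -> List[str]:
    """Taxa whose occupancy meets a user-chosen threshold."""
    return [taxon for taxon, frac in occupancy.items() if frac >= min_fraction]


if __name__ == "__main__":
    toy = {
        "g1": {"A", "B", "C"},
        "g2": {"A", "B"},
        "g3": {"A"},
        "g4": {"A", "B"},
    }
    occ = taxon_occupancy(toy)
    print(occ)
    print(filter_taxa(occ, 0.5))
```

Comparing trees built with and without the low-occupancy taxa is one practical way to check how much the missing data matter for a given study.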

Potential pitfalls? Incomplete taxon sampling; incomplete lineage sorting; recombination across genomes; horizontal gene transfer. Example: based on complete mitochondrial genome sequences. “If one is interested in inferring the evolutionary history of life, a much broader sample of taxa (perhaps sequenced for far less than full genomes) will result in a much more accurate estimate of phylogeny than will complete genomes of only a small number of taxa.” (quote from David Hillis et al.)

Incomplete lineage sorting & gene flow

Multiple lines of evidence from genomic data. Remember from Monday the example from crocodilians. Three datasets generated from genomes: UCEs (ultraconserved elements), protein-coding genes, and transposable elements. The morphological phylogeny results in a different tree. Figure labels: Alligatoridae; Crocodylidae; Tomistoma schlegelii; Gavialis gangeticus; ~80 my, ~8 species; ~20 my, ~13 species.

The root of mammals? Mammalian classification in 1945: three main divisions in mammals, based primarily on non-dental skeletal morphology. Class Mammalia: Prototheria; Theria, comprising Metatheria (marsupials) and Eutheria (placental mammals). Then comes the ‘bushy’ bit of the mammalian tree.

The root of mammals? Jump to

The root of mammals? Jump to 2004

The root of mammals? Retrotransposons suggest that these diverged nearly simultaneously.

Darwin’s Finches. “Evolution of Darwin’s finches and their beaks revealed by genome sequencing” (Nature 2015).

Darwin’s Finches. Key findings: based on about 45 million variable sites; morphology ≠ genetics; extensive interspecific gene flow; some species are a result of hybridization; the 240 kb ALX1 gene is a transcription factor strongly associated with beak shape.

Darwin’s Finches

Tree of life? Nature 2004: genome fusions and horizontal gene transfers make building a prokaryote/eukaryote tree difficult. Using 34 prokaryote and eukaryote genomes, the study found evidence that eukaryotes resulted from a fusion of a photosynthetic prokaryote and another prokaryote.

The Tree of Life. Published 11 Apr 2016. Constructed with 16 ribosomal protein sequences, which gives more resolution than using 1 gene. 3,081 genomes were used, of which 1,011 were new genomes. Includes 1 representative per genus for all genera with high-quality genomes available. New genomes: uncultivable bacteria.

Papers and websites used for this presentation:
Hug, L. A., et al. (2016) A new view of the tree of life. Nature Microbiology.
Delsuc, F., H. Brinkmann, and H. Philippe (2005) Phylogenomics and the reconstruction of the tree of life. Nature Reviews.
Simpson, G. G. (1945) The principles of classification and a classification of mammals. Bull. Am. Museum Nat. Hist. 85.
Shoshani, J. (1986) Mammalian phylogeny: comparison of morphological and molecular results. Mol. Biol. Evol. 3(3).
Edwards, S., et al. (2016) Implementing and testing the multispecies coalescent model: a valuable paradigm for phylogenomics. Molecular Phylogenetics and Evolution.