Advanced Methods in Reconstructing Phylogenetic Relationships 2010 Practical Course: March 8th to 13th, 2010, Rio de Janeiro.



Darwin’s letter to Thomas Huxley, 1857: “The time will come I believe, though I shall not live to see it, when we shall have fairly true genealogical (phylogenetic) trees of each great kingdom of nature.” [Figure: Haeckel’s pedigree of man]

Aims of the course: to introduce the theory and practice of phylogenetic inference from molecular data; to introduce some of the most useful methods and computer programmes; to encourage a critical attitude to data and its analysis.

Some definitions

Richard Owen

Owen’s definition of homology (Richard Owen, 1843). Homologue: the same organ under every variety of form and function (true or essential correspondence). Analogy: superficial or misleading similarity.

Charles Darwin

Darwin and homology: “The natural system is based upon descent with modification ... the characters that naturalists consider as showing true affinity (i.e. homologies) are those which have been inherited from a common parent, and, in so far as all true classification is genealogical; that community of descent is the common bond that naturalists have been seeking” (Charles Darwin, Origin of Species, 1859, p. 413).

Homology is... similarity that is the result of inheritance from a common ancestor. The identification and analysis of homologies is central to phylogenetic systematics.

Phylogenetic systematics: sees homology as evidence of common ancestry; uses tree diagrams to portray relationships based upon recency of common ancestry; recognises monophyletic groups (clades), which contain species that are more closely related to each other than to any species outside the group.

Cladograms and phylograms. Cladograms show branching order; branch lengths are meaningless. Phylograms show branch order and branch lengths. [Figure: the same tree of Bacterium 1 to 3 and Eukaryote 1 to 4, drawn once as a cladogram and once as a phylogram]
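The difference between the two kinds of diagram can be seen in the Newick text format that most tree programs read and write. The following is a minimal sketch, assuming an invented topology and invented branch lengths (only the style of the taxon names comes from the slide): removing the length annotations from a phylogram leaves a cladogram.

```python
import re

# Hypothetical phylogram: topology and branch lengths are invented for illustration.
phylogram = ("((Bacterium_1:0.12,Bacterium_2:0.30):0.05,"
             "(Eukaryote_1:0.08,Eukaryote_2:0.45):0.10);")

def to_cladogram(newick: str) -> str:
    """Drop every ':length' annotation, keeping only the branching order."""
    return re.sub(r":[0-9.eE+-]+", "", newick)

cladogram = to_cladogram(phylogram)
print(cladogram)
```

The regular expression is only adequate for simple strings like this one; quoted labels or comments in real Newick files would need a proper parser.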

Rooting using an outgroup. [Figure: an unrooted tree of archaea, eukaryotes and bacteria alongside the same tree rooted by an outgroup; labels: outgroup, root, monophyletic group]

What kind of data?

Fossil skulls

Family tree for humans

Microbial morphologies - some are complex but many are simple - for example look at a drop of lake water:

Linus Pauling

Molecules as documents of evolutionary history: “We may ask the question where in the now living systems the greatest amount of information of their past history has survived and how it can be extracted” “Best fit are the different types of macromolecules (sequences) which carry the genetic information”

Small subunit ribosomal RNA 18S or 16S rRNA

An alignment involves hypotheses of positional homology between bases or amino acids. [Figure: alignment of 16S rRNA sequences from different bacteria]

Automated progressive alignment of sequences: essentially a heuristic method, and as such not guaranteed to find the ‘optimal’ alignment. The most successful implementation is Clustal (Des Higgins). This software is cited 3,000 times per year in the scientific literature.
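Progressive methods build a multiple alignment out of many pairwise comparisons. The pairwise step can be sketched with the classic Needleman-Wunsch dynamic programme for global alignment, shown here computing only the optimal score. The scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration, not the parameters used by Clustal, and this is not a description of how Clustal is implemented.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch score of the best global alignment of a and b."""
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against nothing but gaps
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal,            # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]

print(global_alignment_score("ACGT", "AGT"))
```

Unlike the progressive heuristic, this pairwise procedure is guaranteed to find the best score under its scoring scheme; the guarantee is lost once many sequences are combined step by step.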

Des Higgins is very famous

Automatic alignment programs: a variety are available; Clustal W 2.0, Muscle and T-Coffee are among the most popular. All are easy to use and relatively quick (though this depends on how many sequences there are and how similar they are). Output files are produced which can be read by most phylogenetic analysis programmes. All can fail badly with highly divergent sequences.

James McInerney is not here, but he has produced a nice lecture on some background issues for multiple alignment. This can be downloaded from the EMBO World 2009 directory on our lab webpage:

Advice on alignments: treat cautiously. They can (usually) be improved by eye, and it often helps to have colour-coding. Depending on the use, the user should be able to make a judgement on which regions are reliable and which are not. For phylogeny reconstruction, only use those positions whose hypothesis of positional homology is unimpeachable (or do experiments).

Patterns in sequence data

Exploring patterns in sequence data 1: Which sequences should we use? Do the sequences contain phylogenetic signal for the relationships of interest? (They might be too conserved or too variable.) Are there features of the data which might mislead us about evolutionary relationships?

Is there a molecular clock? The idea of a molecular clock was initially suggested by Zuckerkandl and Pauling in 1962. They noted that rates of amino acid replacement in animal haemoglobins were roughly proportional to time, as judged against the fossil record.

Rate Heterogeneity

Rates of amino acid replacement in different proteins

There is no universal molecular clock. The initial proposal saw the clock as a Poisson process with a constant rate. It is now known to be more complex; differences in rates occur for:
– different sites in a molecule
– different genes
– different regions of genomes
– different genomes in the same cell
– different taxonomic groups for the same gene

Small subunit ribosomal RNA 18S or 16S rRNA

Failure To Accommodate Rate Heterogeneity Can Lead To Problems When Making Trees

Unequal rates in different lineages may cause problems for phylogenetic analysis. Felsenstein (1978) made a simple model phylogeny including four taxa and a mixture of short and long branches. All methods are susceptible to “long branch” problems. Methods which assume that all sites change at the same rate are particularly poor at recovering the true tree. [Figure: true tree and wrong tree for taxa A, B, C and D, with long branches p and short branches q, p > q]
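The long-branch effect can be reproduced numerically. The sketch below assumes a two-state symmetric model and a particular labelling that the transcript does not preserve: the true tree is ((A,C),(B,D)), with A and B on long branches (change probability p) and C, D and the internal branch short (q). It computes the expected share of sites that favour each of the three possible groupings under parsimony.

```python
from itertools import product

def branch(child: int, parent: int, change: float) -> float:
    """Probability of the child state given the parent state along one branch."""
    return change if child != parent else 1.0 - change

def pattern_probability(a: int, b: int, c: int, d: int, p: float, q: float) -> float:
    """Probability of one site pattern on the assumed true tree ((A,C),(B,D)).

    x is the internal node joining A and C, y the one joining B and D.
    A and B have long branches (p); C, D and the internal branch are short (q).
    """
    total = 0.0
    for x, y in product((0, 1), repeat=2):
        total += (0.5 * branch(a, x, p) * branch(c, x, q)
                  * branch(y, x, q)
                  * branch(b, y, p) * branch(d, y, q))
    return total

def support(p: float, q: float) -> dict:
    """Expected share of sites favouring each grouping (factor 2: both state labellings)."""
    return {
        "AC|BD": 2 * pattern_probability(0, 1, 0, 1, p, q),  # the true grouping
        "AB|CD": 2 * pattern_probability(0, 0, 1, 1, p, q),  # long branches together
        "AD|BC": 2 * pattern_probability(0, 1, 1, 0, p, q),
    }

for p, q in [(0.1, 0.1), (0.4, 0.05)]:
    print(p, q, support(p, q))
```

With equal branch lengths the true grouping AC|BD has the most support. With p = 0.4 and q = 0.05 the grouping that joins the two long branches has more, so adding more sites of the same kind would only strengthen the wrong answer.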

Chaperonin 60 protein maximum likelihood tree (PROTML; Roger et al. 1998, PNAS 95: 229), with the longest branches marked. Bootstrap values are a common way of assessing support for relationships.
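Bootstrap values come from resampling the columns of the alignment with replacement, rebuilding a tree from each pseudo-replicate, and reporting how often each group reappears. A minimal sketch of the resampling step only follows; the toy alignment is invented for illustration, and the tree-building and counting steps are left out.

```python
import random

def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Resample alignment columns with replacement to make one pseudo-replicate."""
    length = len(next(iter(alignment.values())))
    # The same sampled column indices are applied to every sequence,
    # so each resampled column is an intact column of the original alignment.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}

# Toy alignment, invented for illustration.
alignment = {"A": "ACGTACGT", "B": "ACGTTCGT", "C": "ACCTACGA"}
replicate = bootstrap_alignment(alignment, random.Random(42))
for name, seq in replicate.items():
    print(name, seq)
```

Because resampling reuses the original columns, any systematic bias in the data is resampled too, which is one reason a high bootstrap value does not by itself show that a grouping is correct.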

High bootstrap values can be misleading - adding a single new sequence

A proposal for three domains of life (Woese, Kandler and Wheelis 1990 PNAS 87, 4576)

The 3-domains tree of life: concatenated LSU+SSU rRNA analyzed using a standard (GTR plus gamma*2) model (Cox et al., PNAS). [Figure: tree with labels archaebacteria, bacteria, eukaryotes and eocyte archaebacteria; the two longest branches are marked]

The same RNA data analyzed using better models (Cox et al. 2008). [Figure: trees from the NDCH (GTR+g+2cv)*2 model, heterogeneous across the tree, and from the CAT model; labels: bacteria, eukaryotes, eocytes, other archaebacteria]

Saturation in sequence data: saturation is due to multiple changes at the same site subsequent to lineage splitting. Most data will contain some fast-evolving sites which are potentially saturated (e.g. in protein-coding genes, often the third codon position). In severe cases the data become essentially random and all information about relationships can be lost.

Multiple changes at a single site: hidden changes. [Figure: successive substitutions at one site in Seq 1 and Seq 2, plotted against the number of changes] Seq 1: AGCGAG. Seq 2: GCGGAC.
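One standard way to allow for hidden changes is to convert the observed proportion of differing sites into an estimated number of substitutions per site. The sketch below uses the Jukes-Cantor correction, which is brought in here for illustration and is not named on the slides. It treats the two six-base sequences above as an ungapped alignment, which is also an assumption.

```python
import math

def p_distance(s1: str, s2: str) -> float:
    """Observed proportion of aligned sites at which the two sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(s1, s2)) / len(s1)

def jukes_cantor(p: float) -> float:
    """Estimated substitutions per site: d = -(3/4) * ln(1 - 4p/3)."""
    if p >= 0.75:
        # At or beyond the saturation limit the correction is undefined.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

p = p_distance("AGCGAG", "GCGGAC")
print(p, jukes_cantor(p))
```

Four of the six sites differ, so the observed distance is about 0.67, while the corrected estimate is about 1.65 substitutions per site. As the observed proportion approaches 0.75, the value expected for random sequences under this model, the estimate grows without bound, which is the numerical face of saturation.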

Exploring patterns in sequence data: do sequences manifest biased base compositions (e.g. thermophilic convergence) or biased codon usage patterns which may obscure phylogenetic signal?

A case study in phylogenetic analysis: Deinococcus and Thermus. Deinococcus are radiation-resistant bacteria; Thermus are thermophilic bacteria. BUT:
– Both have the same very unusual cell wall based upon ornithine
– Both have the same menaquinones (Mk 9)
– Both have the same unusual polar lipids
Congruence between these complex characters supports a phylogenetic relationship between Deinococcus and Thermus.

% Guanine + Cytosine in 16S rRNA genes from mesophiles and thermophiles. Thermophiles: Thermotoga maritima, Thermus thermophilus, Aquifex pyrophilus. Mesophiles: Deinococcus radiodurans, Bacillus subtilis. [Table: %GC at all sites and at variable sites for each species]
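Both quantities in the table are easy to compute from an alignment. A minimal sketch follows, assuming a toy set of aligned sequences invented for illustration (not the 16S rRNA data); here a "variable site" is taken to mean any column in which the sequences are not all identical.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G + C among the unambiguous bases of one sequence."""
    bases = [b for b in seq.upper() if b in "ACGTU"]  # skips gaps and N
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)

def variable_sites(alignment: list) -> list:
    """Indices of alignment columns in which the sequences are not all identical."""
    return [i for i in range(len(alignment[0]))
            if len({seq[i] for seq in alignment}) > 1]

# Toy aligned sequences, invented for illustration.
alignment = ["GCATGC", "GCGCGC", "GCATGC"]
sites = variable_sites(alignment)
for seq in alignment:
    variable_only = "".join(seq[i] for i in sites)
    print(seq, gc_percent(seq), gc_percent(variable_only))
```

Restricting the calculation to variable sites matters because constant sites dilute the compositional differences between sequences, while the variable sites are the ones that carry the signal used to build the tree.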

Shared nucleotide or amino acid composition biases can also cause problems for phylogenetic analysis. [Figure: true tree and wrong tree for 16S rRNA from Aquifex (73% G+C), Thermus (72%), Bacillus (50%) and Deinococcus (52%)] The correct tree can be obtained if a model is used which allows base/aa composition to vary between sequences: LogDet/Paralinear distances or heterogeneous maximum likelihood.

Gene trees and species trees: we often assume that gene trees give us species trees. [Figure: a gene tree for genes a, b and c beside a species tree for species A, B and C]

Orthologues and paralogues: duplication of an ancestral gene gives 2 copies on the same genome, which are paralogues of each other. [Figure: gene tree with copies a, b, c and A, B, C, marking orthologous and paralogous relationships; sampling A*, b* and C* gives a mixture of orthologues and paralogues]

The malic enzyme gene tree contains a mixture of orthologues and paralogues. (Anas = a duck!) [Figure: gene tree with labels: gene duplication, plant chloroplast, plant mitochondrion]

Summary: There may be conflicting patterns in data which can potentially mislead us about evolutionary relationships. Our methods of analysis need to be able to deal with the complexities of sequence evolution and to recover any underlying phylogenetic signal. Some methods may do this better than others, depending on the properties of individual data sets. All trees are simply hypotheses!

Phylogenetic analysis requires careful thought. Phylogenetic analysis is frequently treated as a black box into which data are fed (often gathered at considerable cost) and out of which “The Tree” springs (Hillis, Moritz & Mable 1996, Molecular Systematics).