Phylogenomics: Using complete genomes to determine the phylogeny of species
Bas E. Dutilh


Tree of life: Bacteria, Archaea, Eukaryota

Evolution
– We can only observe present-day species
– Offspring resemble their parents
– Mutations affect
  – phenotype
  – genotype
– Nature selects: survival of the fittest

Phenotype
Which properties should be compared?
Watanabe's Ugly Duckling Theorem: “All things have an infinite number of features. So any two things share an infinite number of features. Therefore two things cannot be of the same kind because they share more features than they do with things of a different kind.”

Evolution

Genotype
– The genome sequence is finite, so you do not have to choose which features to compare
– Genetic properties:
  – word frequency
  – sequence (nt/aa)
  – gene content
  – gene order
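
Word frequency is the simplest of these genetic properties to compute. The sketch below is a minimal illustration, not the method used in this lecture. It makes three assumptions: "words" are overlapping DNA k-mers, k = 2 is an arbitrary choice, and profiles are compared with a Euclidean distance. Published word-frequency methods use other word lengths and distance measures.

```python
from collections import Counter
from itertools import product
import math

def kmer_profile(seq: str, k: int = 2) -> dict[str, float]:
    """Relative frequency of every DNA word of length k (overlapping k-mers)."""
    seq = seq.upper()
    words = (seq[i:i + k] for i in range(len(seq) - k + 1))
    # Ignore words containing anything other than A, C, G, T (e.g. N or gaps).
    counts = Counter(w for w in words if set(w) <= set("ACGT"))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no valid words of length k")
    # List all 4**k possible words so that profiles of different genomes line up.
    return {"".join(w): counts["".join(w)] / total for w in product("ACGT", repeat=k)}

def profile_distance(a: str, b: str, k: int = 2) -> float:
    """Euclidean distance between the k-mer frequency profiles of two sequences."""
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    return math.sqrt(sum((pa[w] - pb[w]) ** 2 for w in pa))
```

Because every genome yields a vector of the same 4^k dimensions, no alignment or orthology assignment is needed. This is both the attraction and the weakness of word-frequency methods.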

Why sequence similarity works
– Every residue (nt/aa) is a separate dimension
  – Human: 3 billion nucleotides
– Most mutations are …
– Sequences never converge

Evolution: mutation and selection
– Mutation is responsible for change
– Selection is responsible for continuity
– The more differences, the more distantly related two sequences are
– Unlike structure or phenotype, sequences do not converge
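
"The more differences, the more distantly related" can be made concrete as the proportion of differing sites between two aligned sequences (the p-distance). When several mutations hit the same site, p underestimates the true number of changes. The Jukes-Cantor (JC69) correction, d = -3/4 ln(1 - 4p/3), is one standard way to compensate for this in nucleotide data. This is a minimal sketch that assumes pre-aligned input and skips gapped columns.

```python
import math

def p_distance(a: str, b: str) -> float:
    """Proportion of aligned, ungapped positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned (equal length)")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

def jukes_cantor(p: float) -> float:
    """JC69 estimate of substitutions per site from a nucleotide p-distance."""
    if p >= 0.75:
        return math.inf  # saturated: random sequences already differ at ~75% of sites
    return -0.75 * math.log(1 - 4 * p / 3)
```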

Phylogenetics: inferring the evolution of a gene
– Distance matrix, followed by hierarchical clustering
– Maximum likelihood: evaluate the likelihood of all possible trees

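Substitution matrix
– Describes the rate at which each character in a sequence changes into each of the other character states
– The BLOck SUbstitution Matrix (BLOSUM) is based on substitutions observed in conserved alignment blocks. For BLOSUM62, for example, sequences sharing ≥62% identity are first clustered and counted as one, so the scores reflect substitutions between sequences less than 62% identical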
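
The BLOSUM scores themselves are log-odds values. They compare how often residues a and b are observed aligned (q_ab) with how often they would be expected to pair by chance (p_a p_b, or 2 p_a p_b when a ≠ b), scaled to half-bits. The minimal sketch below starts from pair counts that are assumed to have already been collected from the clustered blocks; the clustering and weighting step is not shown. The input format (unordered pairs written once, e.g. only ("A", "S") and never ("S", "A")) is a hypothetical choice for illustration.

```python
import math
from collections import defaultdict

def blosum_scores(pair_counts: dict[tuple[str, str], float], scale: float = 2.0) -> dict[tuple[str, str], int]:
    """Log-odds scores from counts of aligned residue pairs (each unordered pair listed once)."""
    total = sum(pair_counts.values())
    q = {pair: c / total for pair, c in pair_counts.items()}  # observed pair frequencies
    # Background frequency of each residue: p_a = q_aa + sum over b != a of q_ab / 2
    p = defaultdict(float)
    for (a, b), f in q.items():
        if a == b:
            p[a] += f
        else:
            p[a] += f / 2
            p[b] += f / 2
    scores = {}
    for (a, b), f in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(scale * math.log2(f / expected))  # half-bit units when scale = 2
    return scores
```

Positive scores mark substitutions that occur more often than chance in related proteins; negative scores mark substitutions that occur less often.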

Neighbour joining
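
Neighbour joining builds a tree from a distance matrix by repeatedly joining the pair of nodes that minimises the Q criterion, Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of i's distances. The sketch below is a plain textbook version for illustration, not the implementation used in any particular package. Ties are broken by input order, and branch lengths can come out negative when the distances are far from tree-like.

```python
def neighbour_joining(names: list[str], matrix: list[list[float]]) -> str:
    """Build an unrooted tree from a distance matrix; returns Newick with branch lengths."""
    # Distances keyed by node label; each new internal node is labelled by its Newick string.
    dist = {a: {b: matrix[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(dist[a][b] for b in nodes) for a in nodes}  # net divergence of each node
        # Join the pair that minimises Q(i, j) = (n - 2) d(i, j) - r_i - r_j
        i, j = min(
            ((a, b) for k, a in enumerate(nodes) for b in nodes[k + 1:]),
            key=lambda pair: (n - 2) * dist[pair[0]][pair[1]] - r[pair[0]] - r[pair[1]],
        )
        # Branch lengths from the new internal node u to i and to j
        li = 0.5 * dist[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = dist[i][j] - li
        u = f"({i}:{li:.3f},{j}:{lj:.3f})"
        dist[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                dist[u][k] = dist[k][u] = 0.5 * (dist[i][k] + dist[j][k] - dist[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return f"({a}:{dist[a][b]:.3f},{b}:0.000);"
```

Usage: `neighbour_joining(["a", "b", "c"], [[0, 3, 4], [3, 0, 5], [4, 5, 0]])` returns a Newick string that any tree viewer can read.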

Maximum likelihood
– Make all possible trees
– Calculate the likelihood that the alignment evolved along each tree
– The tree with the highest likelihood is the maximum likelihood tree
– Very computer intensive
– PhyML instead searches "around" a starting tree (e.g. NJ)
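
The step "calculate the likelihood that the alignment evolved along this tree" is usually done with Felsenstein's pruning algorithm. The minimal sketch below scores one fixed tree under several assumptions: the Jukes-Cantor (JC69) model, uniform base frequencies at the root, and fixed branch lengths. It does no tree search or branch-length optimisation; that search is what programs such as PhyML add. The tree format (a leaf name, or a list of (child, branch length) pairs) is a hypothetical choice for illustration.

```python
import math

BASES = "ACGT"

def jc_prob(same: bool, t: float) -> float:
    """JC69 probability of ending in the same (or one specific other) base after branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e

def partial_likelihood(node, column: dict[str, str]) -> list[float]:
    """Felsenstein pruning: P(data below node | base at node), one value per base."""
    if isinstance(node, str):  # leaf
        base = column[node]
        if base not in BASES:  # gap or ambiguity: compatible with every base
            return [1.0] * 4
        return [1.0 if b == base else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:  # internal node: list of (child, branch length) pairs
        below = partial_likelihood(child, column)
        for x in range(4):
            result[x] *= sum(jc_prob(x == y, t) * below[y] for y in range(4))
    return result

def log_likelihood(tree, alignment: dict[str, str]) -> float:
    """Log-likelihood of an alignment on one fixed tree (uniform base frequencies at the root)."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for site in range(n_sites):
        column = {name: seq[site].upper() for name, seq in alignment.items()}
        total += math.log(sum(0.25 * p for p in partial_likelihood(tree, column)))
    return total
```

Comparing `log_likelihood` across candidate trees (and branch lengths) and keeping the highest value is the maximum likelihood criterion. Doing this for every possible tree is what makes the exhaustive approach so expensive.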

Maximum parsimony
– Parsimony is a special case of likelihood
– The tree requiring the smallest number of mutations is the maximum parsimony tree
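
Counting "the smallest number of mutations" on a given tree can be done with Fitch's algorithm. At each internal node it intersects the possible states of the two children, and if the intersection is empty it takes their union and charges one change. The sketch assumes rooted binary trees written as nested 2-tuples of leaf names (a hypothetical format for illustration). It scores one tree; finding the best tree still requires searching over trees.

```python
def fitch_changes(tree, states: dict[str, str]) -> int:
    """Minimum number of changes for one character on a rooted binary tree (Fitch's algorithm)."""
    def visit(node):
        if isinstance(node, str):  # leaf: its observed state
            return {states[node]}, 0
        (left, cost_l), (right, cost_r) = visit(node[0]), visit(node[1])
        common = left & right
        if common:  # children can agree: no extra change needed
            return common, cost_l + cost_r
        return left | right, cost_l + cost_r + 1  # disagreement costs one change
    return visit(tree)[1]

def parsimony_score(tree, alignment: dict[str, str]) -> int:
    """Total number of changes over all alignment columns; lower is more parsimonious."""
    n_sites = len(next(iter(alignment.values())))
    return sum(fitch_changes(tree, {name: seq[i] for name, seq in alignment.items()}) for i in range(n_sites))
```

For example, with the alignment A = AC, B = AC, C = GC, D = GT, the tree ((A,B),(C,D)) needs 2 changes, while ((A,C),(B,D)) needs 3.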

SSU rRNA (Fox et al., Science 1980)
– Present in all species
– Constant function
– Slowly evolving

SSU rRNA (Olsen et al., J Bacteriol 1994)
– The phylogeny of SSU rRNA revealed the three domains: Bacteria, Archaea, Eukaryota
– It is considered representative of the evolutionary history of species

Conflict between trees based on single genes
– Unrecognized paralogy
– Horizontal gene transfer
– Mutation saturation, biases, divergent rates
Different genes tell different stories.
[Figure: gene tree descending from an ancestor into species A, B and C, showing orthologs and paralogs]

Is a tree the right representation?
– Genomes are chimeras of genes with different origins
  – Endosymbiosis (mitochondrion, chloroplast)
  – Horizontal gene transfer (many examples, often adaptations to the environment)

More data = more consistent trees
– Combining information from more genes averages out these anomalies
– Complete genomes contain the maximum phylogenetic information

Fungi
– Yeasts, filamentous and dimorphic fungi
– Fungi are the eukaryotic clade with the largest number of completely sequenced genomes
– S. cerevisiae is a well-studied model organism
– Broad consensus about their phylogeny

Consensus phylogeny (literature): 19 target nodes

Orthology
Which genes should be compared between species?
– Homologs (descended from a common gene that originated "de novo")
– Orthologs (originated at a speciation event)
Orthology has a higher resolution than homology:
– Pairwise orthology
– Cluster orthology
– Tree-based orthology
[Figure: gene tree descending from an ancestor into species A, B and C]

Pairwise orthology (Inparanoid)
– Compare all proteins in species A with all proteins in species B to find homologs
– Find the bidirectional best hits (seed orthologs)
– Proteins in the same species that score higher against a seed ortholog than the bidirectional best hit does are in-paralogs
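
The bidirectional best hit step and the in-paralog rule can be sketched as follows. This is a minimal illustration under stated assumptions, not Inparanoid itself: similarity scores (e.g. BLAST bit scores) are given as hypothetical dictionaries keyed by (query, subject). Inparanoid additionally applies score and overlap cutoffs and assigns confidence values, none of which are shown.

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """For every query protein, the subject with the highest score."""
    best: dict[str, tuple[str, float]] = {}
    for (query, subject), s in scores.items():
        if query not in best or s > best[query][1]:
            best[query] = (subject, s)
    return {q: subj for q, (subj, _) in best.items()}

def bidirectional_best_hits(a_vs_b, b_vs_a) -> set[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    ab, ba = best_hits(a_vs_b), best_hits(b_vs_a)
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}

def inparalogs(seed_a: str, seed_b: str, a_vs_a, a_vs_b) -> set[str]:
    """Proteins of A scoring higher against seed_a than the seed pair scores against each other."""
    cutoff = a_vs_b[(seed_a, seed_b)]
    return {other for (q, other), s in a_vs_a.items() if q == seed_a and other != seed_a and s > cutoff}
```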

Cluster orthology (COG)
– First group the in-paralogs within every species
– Find bidirectional best hits between in-paralogous groups
– Join in-paralogous groups into orthologous groups:
  – link all pairs of in-paralogous groups
  – but only if the link is confirmed by a third species (triangle)

Tree-based orthology
– Build a phylogenetic tree of the homologs
– Find the gene duplication nodes
– Two homologous genes are orthologs if their last common ancestor is a speciation node rather than a duplication node
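
One simple way to find duplication nodes in a gene tree is the species-overlap rule: a node is a duplication if two of its child subtrees contain genes from the same species. The sketch below uses that rule, which may differ from how the lecture's trees were annotated. It also assumes a hypothetical gene-naming convention ("spA_1" belongs to species "spA") and gene trees written as nested tuples.

```python
def species_of(gene: str) -> str:
    """Hypothetical naming convention: 'spA_1' belongs to species 'spA'."""
    return gene.split("_")[0]

def leaves(node) -> set[str]:
    return {node} if isinstance(node, str) else set().union(*(leaves(c) for c in node))

def is_duplication(node) -> bool:
    """Species-overlap rule: a duplication if two child subtrees share a species."""
    seen: set[str] = set()
    for child in node:
        sp = {species_of(g) for g in leaves(child)}
        if seen & sp:
            return True
        seen |= sp
    return False

def last_common_ancestor(node, a: str, b: str):
    """Smallest subtree that contains both genes."""
    for child in ([] if isinstance(node, str) else node):
        if {a, b} <= leaves(child):
            return last_common_ancestor(child, a, b)
    return node

def are_orthologs(tree, a: str, b: str) -> bool:
    """Orthologs if the last common ancestor is a speciation (not a duplication) node."""
    if a == b:
        raise ValueError("need two different genes")
    return not is_duplication(last_common_ancestor(tree, a, b))
```

In the gene tree `(("spA_1", "spB_1"), ("spA_2", ("spB_2", "spC_1")))`, the root is a duplication. So spA_1 and spB_1 are orthologs, while spA_1 and spB_2 are paralogs.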

Gene content methods
– Presence/absence matrix (0/1) of orthologous groups (OGs) in each species
– Similarity: number of shared orthologous groups
  – Genomes that share few OGs are distantly related
  – Genomes that share many OGs are closely related
– But…
[Table: species (rows) × orthologous groups OG1, OG2, OG3, OG4, … (columns), entries 0/1]

Genome size correction (Korbel et al., Trends Genet 2002)
– Large genomes have more genes, so they also share more genes
– Divide the number of shared genes by:
  – the average genome size
  – the smaller of the two genome sizes
  – the weighted average genome size
[Figure: number of shared genes vs. genome size; P. chrysosporium labelled]

Gene content methods
– Similarity: corrected number of shared genes
– Distance: 1 – similarity
– Tree: neighbour joining on the distance matrix
$$d(\mathrm{sp}_A, \mathrm{sp}_B) = 1 - \frac{\#\,\text{shared OGs}(\mathrm{sp}_A, \mathrm{sp}_B)}{\text{weighted average size}(\mathrm{sp}_A, \mathrm{sp}_B)}$$
[Table: symmetric distance matrix over sp1, sp2, sp3, sp4, …]
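
A sketch of this gene content distance follows. It assumes each row of the 0/1 matrix is stored as the set of OGs present in that species, and that genome sizes (numbers of genes) are supplied separately. Only the average and smaller-genome normalisations are shown. The slides do not specify how the weighted average is weighted, so that option is left out rather than guessed.

```python
def average(x: int, y: int) -> float:
    return (x + y) / 2

def smallest(x: int, y: int) -> float:
    return min(x, y)

def gene_content_distances(presence: dict[str, set[str]], genome_size: dict[str, int], normalise=average):
    """presence: species -> set of orthologous groups (one row of the 0/1 matrix);
    genome_size: species -> number of genes. Returns (names, distance matrix)."""
    names = sorted(presence)
    matrix = [
        [0.0 if a == b else 1 - len(presence[a] & presence[b]) / normalise(genome_size[a], genome_size[b])
         for b in names]
        for a in names
    ]
    return names, matrix
```

The resulting matrix is in the form a neighbour-joining implementation expects. The choice of `normalise` matters most when genome sizes differ strongly, as with the large P. chrysosporium genome.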

Gene content methods
Dollo parsimony
– Gaining a complex character (gene) is rare and happens only once
– Losing it is relatively easy
– Minimize the number of gene losses to obtain the maximum parsimony tree
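
Under the Dollo rule, the number of losses a tree implies can be counted gene by gene. Each gene is placed as a single gain at the last common ancestor of the species that carry it. Losses are then the smallest number of clades below that point in which the gene is absent. This minimal sketch scores one given rooted tree; it assumes nested-tuple trees and an OG → set-of-species input, both hypothetical formats for illustration.

```python
def leaves(node) -> set[str]:
    return {node} if isinstance(node, str) else set().union(*(leaves(c) for c in node))

def gain_node(node, present: set[str]):
    """Deepest node whose subtree contains every species that has the gene (the single gain)."""
    if isinstance(node, str):
        return node
    for child in node:
        if present <= leaves(child):
            return gain_node(child, present)
    return node

def count_losses(node, present: set[str]) -> int:
    """Number of maximal subtrees in which no species carries the gene."""
    if not (leaves(node) & present):
        return 1
    if isinstance(node, str):
        return 0
    return sum(count_losses(child, present) for child in node)

def dollo_losses(tree, presence: dict[str, set[str]]) -> int:
    """Total gene losses implied by a tree; the tree with the fewest is the Dollo parsimony tree."""
    return sum(count_losses(gain_node(tree, sp), sp) for sp in presence.values() if sp)
```

For OG1 in {A, B} and OG2 in {A, B, C}, the tree ((A,B),(C,D)) needs 1 loss, while ((A,C),(B,D)) needs 3.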

Superalignment methods
– Multiple alignment of each gene family
– Concatenate the alignments of 1:1:1 orthologs (one gene per species)
– A missing gene in a certain species (row) can be treated as a gap in the alignment
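
The concatenation step, including the "missing gene = gap" rule, can be sketched as below. It assumes each gene family is already aligned and given as a hypothetical dictionary mapping species to its single aligned sequence (the 1:1:1 case).

```python
def superalignment(families: list[dict[str, str]], species: list[str]) -> dict[str, str]:
    """Concatenate per-family alignments; a species missing from a family gets a row of gaps."""
    rows: dict[str, list[str]] = {sp: [] for sp in species}
    for family in families:
        lengths = {len(seq) for seq in family.values()}
        if len(lengths) != 1:
            raise ValueError("every family must be a proper alignment (equal-length rows)")
        width = lengths.pop()
        for sp in species:
            rows[sp].append(family.get(sp, "-" * width))
    return {sp: "".join(parts) for sp, parts in rows.items()}
```

The concatenated rows can then be analysed like any single alignment, for example with a maximum likelihood program.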

Superdistance methods
– Combine the distance matrices of separate gene families, e.g. by averaging them
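
Averaging is the simplest way to combine distance matrices. A minimal sketch, assuming each gene family's matrix is a hypothetical dictionary that lists every species pair at most once. Each pair is averaged only over the families in which both species occur; other combination schemes (e.g. weighting families by alignment length) are possible but not shown.

```python
def superdistance(matrices: list[dict[tuple[str, str], float]]) -> dict[tuple[str, str], float]:
    """Average each species pair's distance over the gene families in which both species occur."""
    sums: dict[tuple[str, str], float] = {}
    counts: dict[tuple[str, str], int] = {}
    for m in matrices:
        for pair, d in m.items():
            key = tuple(sorted(pair))  # (A, B) and (B, A) are the same pair
            sums[key] = sums.get(key, 0.0) + d
            counts[key] = counts.get(key, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}
```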

Supertree methods
– Make a phylogenetic tree for each gene family separately
– Combine them with Matrix Representation using Parsimony (MRP)
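
In MRP, every clade of every input tree becomes one binary character. Taxa inside the clade get 1, taxa in that tree but outside the clade get 0, and taxa absent from that tree get "?". The resulting matrix is then analysed with a parsimony program, which is not shown here. A minimal sketch, assuming rooted input trees written as nested tuples (a hypothetical format for illustration).

```python
def leaves(node) -> frozenset:
    return frozenset([node]) if isinstance(node, str) else frozenset().union(*(leaves(c) for c in node))

def internal_clades(node) -> list[frozenset]:
    """Leaf sets of all internal nodes below the root (the root clade carries no information)."""
    found: list[frozenset] = []
    if not isinstance(node, str):
        for child in node:
            if not isinstance(child, str):
                found.append(leaves(child))
                found.extend(internal_clades(child))
    return found

def mrp_matrix(trees, taxa: list[str]) -> dict[str, str]:
    """One binary character per clade of every input tree: 1 inside, 0 outside, ? if absent."""
    columns = []
    for tree in trees:
        in_tree = leaves(tree)
        for clade in internal_clades(tree):
            columns.append({t: ("1" if t in clade else "0") if t in in_tree else "?" for t in taxa})
    return {t: "".join(col[t] for col in columns) for t in taxa}
```

For the trees ((A,B),(C,D)) and ((A,C),E), this gives A = 101, B = 10?, C = 011, D = 01? and E = ??0.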

[Figure: panels labelled "13 trees", "14 trees", "15 trees", "12 trees"]

Gene content vs. sequence-based trees
– Gene content supertrees differ from sequence-based supertrees

Consensus phylogeny (literature): 19 target nodes

Gene content
– Low-dimensional compared to genotype
– Intermediate between genotype and phenotype
  – The main dichotomy is between yeasts and filamentous fungi, not between Ascomycota and Basidiomycota
  – Dimorphic Basidiomycota exclude the filamentous P. chrysosporium
[Figure: gene content tree; label "10.38"]

Superalignment and supertree
– Sequence-based trees agree better with the literature
– However, the literature itself is dominated by sequence-based trees

Hyperthermophiles

Nanoarchaeum: where does it belong?
– Nanoarchaeota: Waters et al., PNAS 2003; Di Giulio, J Theor Biol 2006
– Crenarchaeota: Ciccarelli et al., Science 2006
– Euryarchaeota: Brochier et al., Genome Biol 2005
[Figure: gene content tree with Crenarchaeota (Cren) and Euryarchaeota (Eury) clades]

Assignment
– Make a gene content tree
– Compare it with other phylogenetic trees
– Describe the differences
  – Can you find literature that specifically studies these species?
  – What do you think is going on? Why are the trees different?
– Write a paper about some of your most interesting findings, including references