Phylogenetic genome analysis, phylogenomics


Phylogenetic genome analysis, phylogenomics
Bas E. Dutilh

What we can see

Family tree → species tree
Offspring look like their parents. Darwin: species evolve like families.

Species tree

Tree of life
Archaea, Bacteria, Eukaryota

Phylogeny
Term coined by Ernst Haeckel (1866). Phylon (Greek: φυλον): tribe, race. Genus (Latin): birth, origin.

Phenotype ↔ genotype
Phenotype: an infinite number of features, a subjective choice of which ones to use, and values that can depend on the observation (etc.).
Gene/genome: finite, so the choice is objective; a sequence is absolute.

Convergence
Unlike phenotype or structure, sequences do not converge: they are highly dimensional, with every residue a dimension.

Phylogenetic markers
A good marker is available/easy to sequence, present in all species, has a constant function, and evolves slowly. Examples: cytochrome c (Fitch, Science 1967) and SSU rRNA (Woese et al., PNAS 1977).

SSU rRNA
The phylogeny of SSU rRNA revealed the three domains (Archaea, Bacteria, Eukaryota) and is taken as representative of the evolutionary history of species.

Phylogenetic assumptions
Sequences are homologous (they have a common ancestor); sequences diverge in a binary fashion; each position evolves independently.

Phylogenetics
Methods: neighbor joining, maximum parsimony, maximum likelihood.
Maximum parsimony: which tree assumes the fewest mutations?
Maximum likelihood: for a given model, which tree has the highest probability of generating the observed alignment?
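The parsimony question can be made concrete by counting the minimum number of changes a fixed tree needs for one alignment column. The sketch below uses Fitch's (1971) small-parsimony algorithm, which the slide does not name; trees are written as nested 2-tuples of leaf names, and the four-taxon alignment is a hypothetical toy example.

```python
def fitch(tree, states):
    """Fitch small-parsimony pass for one alignment column.

    tree:   rooted binary tree as nested 2-tuples of leaf names
    states: leaf name -> character state in this column
    Returns (candidate state set at this node, number of changes below it).
    """
    if isinstance(tree, str):                 # leaf: observed state, no changes
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:                                # children agree: no extra change
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1   # one extra change


def parsimony_score(tree, alignment):
    """Sum the Fitch change counts over all columns of the alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical toy alignment; compare two candidate topologies
alignment = {"sp1": "AAG", "sp2": "AAG", "sp3": "GTG", "sp4": "GTC"}
tree_a = (("sp1", "sp2"), ("sp3", "sp4"))
tree_b = (("sp1", "sp3"), ("sp2", "sp4"))
print(parsimony_score(tree_a, alignment), parsimony_score(tree_b, alignment))  # 3 5
```

Maximum parsimony prefers tree_a here because it explains the alignment with fewer changes. A real search would also have to explore the space of topologies, which this sketch does not do.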

Bootstrapping and jackknifing
Bootstrapping: randomly resample all columns in the alignment with replacement, re-create the trees, and count the presence of each branch.
Jackknifing: delete a fraction of the columns and re-create the tree.
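The resampling steps can be sketched as follows, assuming the alignment is a dict of equal-length strings. Tree building itself is left out: any method can be applied to each replicate, and branches are then represented as splits (frozensets of the taxa on one side) so that `branch_support` can count them. The function names are illustrative, not taken from any particular package.

```python
import random
from collections import Counter


def bootstrap_columns(alignment, rng):
    """Bootstrap replicate: draw as many columns as the original, with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def jackknife_columns(alignment, fraction, rng):
    """Jackknife replicate: delete a random fraction of columns (no replacement)."""
    length = len(next(iter(alignment.values())))
    n_keep = round(length * (1 - fraction))
    keep = sorted(rng.sample(range(length), n_keep))   # keep original column order
    return {name: "".join(seq[c] for c in keep) for name, seq in alignment.items()}


def branch_support(replicate_splits):
    """Fraction of replicate trees that contain each branch (split).

    replicate_splits: one collection of splits per replicate tree, each split
    given as a frozenset of the taxa on one side of the branch.
    """
    counts = Counter(split for splits in replicate_splits for split in set(splits))
    return {split: n / len(replicate_splits) for split, n in counts.items()}


alignment = {"sp1": "ACGTACGT", "sp2": "ACGAACGA", "sp3": "TCGAACTA"}
rng = random.Random(42)
boot = bootstrap_columns(alignment, rng)
jack = jackknife_columns(alignment, 0.25, rng)
```

For `branch_support` to count correctly, each split must be written in a canonical form, for example always as the side that does not contain a fixed reference taxon.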

Different genes tell different stories
Trees based on single genes conflict with each other. Causes: unrecognized paralogy, horizontal gene transfer, mutation saturation, biases, divergent rates.
[Figure: gene tree with an ancestor and species A, B and C, illustrating paralogs and orthologs.]
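Mutation saturation can be illustrated with a simple substitution model. The sketch below uses the Jukes-Cantor (1969) correction, chosen here only as an example (the slide names no model): as the true number of substitutions grows, the observed fraction of differing sites (p-distance) levels off towards 0.75, so each additional difference carries more and more uncertainty.

```python
import math


def p_distance(seq_a, seq_b):
    """Observed fraction of differing sites between two aligned sequences (no gaps)."""
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def jc69_distance(p):
    """Jukes-Cantor (1969) estimate of substitutions per site from the p-distance."""
    if p >= 0.75:
        return math.inf          # saturated: observed differences carry no distance signal
    return -0.75 * math.log(1 - 4 * p / 3)


for p in (0.1, 0.5, 0.7):
    print(f"p = {p:.2f}  ->  JC69 distance = {jc69_distance(p):.3f}")
```

This prints about 0.107, 0.824 and 2.031: a small rise in observed differences near saturation corresponds to a large jump in estimated substitutions.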

More data → more consistent trees
Combine information from more genes to average out these anomalies. Complete genomes contain the maximum phylogenetic information (Dutilh et al., Bioinformatics 2007).

Chimeric genomes
Is a tree the right representation of the evolutionary history of a genome? Complications: endosymbiosis (mitochondrion, chloroplast) and horizontal gene transfer (many examples, often adaptations to the environment). (Darwin, 1859; Doolittle, Science 1999)

DensiTree
“Fuzzy” trees: draw the tree many times (e.g. bootstrap trees, or trees from different genes) and use transparency to show the fuzziness.

SplitsTree
Tries to accommodate non-bifurcating nodes. Some positions evolve independently; parallel edges are related (they represent the same split).
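Why a network is sometimes needed can be shown with split compatibility. Each branch splits the taxa into two sides; a standard result is that a set of splits fits into a single tree exactly when every pair is compatible, and incompatible splits are what split networks draw as boxes of parallel edges. The sketch below checks compatibility; the function names are illustrative and not part of SplitsTree.

```python
def is_compatible(side_a, side_b, taxa):
    """Two splits A|taxa-A and B|taxa-B are compatible (can coexist in one tree)
    iff at least one of the four pairwise intersections of their sides is empty."""
    a, b = frozenset(side_a), frozenset(side_b)
    a_rest, b_rest = taxa - a, taxa - b
    return not (a & b) or not (a & b_rest) or not (a_rest & b) or not (a_rest & b_rest)


def all_compatible(splits, taxa):
    """True if every pair of splits is compatible, i.e. the splits fit in a single tree."""
    return all(
        is_compatible(splits[i], splits[j], taxa)
        for i in range(len(splits)) for j in range(i + 1, len(splits))
    )


taxa = frozenset({"sp1", "sp2", "sp3", "sp4"})
print(is_compatible({"sp1", "sp2"}, {"sp1", "sp3"}, taxa))        # False: conflicting splits
print(is_compatible({"sp1", "sp2"}, {"sp1", "sp2", "sp3"}, taxa))  # True: nested splits
```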

Genomic properties
Word frequency; sequence (nt/aa); gene content; gene order.
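Of these properties, word frequency is the simplest to compute directly from raw sequence. The sketch below counts overlapping words of length k and compares two profiles with a Euclidean distance; both the word length and the distance measure are assumptions for illustration, not necessarily the choices used by Dutilh et al.

```python
import math
from collections import Counter


def word_frequencies(seq, k):
    """Relative frequencies of all overlapping words (k-mers) of length k in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def profile_distance(freq_a, freq_b):
    """Euclidean distance between two word-frequency profiles (absent words count as 0)."""
    words = set(freq_a) | set(freq_b)
    return math.sqrt(sum((freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) ** 2 for w in words))


a = word_frequencies("ACGTACGT", 2)
b = word_frequencies("ACGTTTGT", 2)
print(profile_distance(a, b))
```

Because no alignment is needed, this kind of profile can be compared between whole genomes, at the cost of ignoring where in the genome the words occur.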

Fungi (Dutilh et al., Bioinformatics 2007)
Yeasts, filamentous and dimorphic fungi. Fungi are the eukaryotic clade with the largest number of completely sequenced genomes; S. cerevisiae is a well-studied model organism; there is much consensus about the phylogeny.

Consensus phylogeny (literature)
19 target nodes (Dutilh et al., Bioinformatics 2007)

[Figure: four panels labelled 13 trees, 14 trees, 15 trees and 12 trees.]

Gene content methods
Presence/absence matrix (0/1). Similarity: number of shared orthologous groups (OGs). Genomes that share few OGs are distantly related; genomes that share many OGs are closely related (Snel et al., Nat Genet 1999; Tekaia et al., Genome Res 1999).

      OG1  OG2  OG3  OG4  …
sp1   1    1    0    1    …
sp2   0    1    0    0    …
sp3   0    0    1    1    …
…     …    …    …    …    …

But…
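Counting shared OGs from a presence/absence matrix is a pairwise AND over the rows. The sketch below uses only the three species and four OG columns visible in the slide's matrix (the rest is elided there); the dict layout is an assumption.

```python
from itertools import combinations

# Presence/absence matrix from the slide (rows: species; columns: OG1..OG4)
presence = {
    "sp1": [1, 1, 0, 1],
    "sp2": [0, 1, 0, 0],
    "sp3": [0, 0, 1, 1],
}


def shared_ogs(matrix):
    """Number of orthologous groups present in both genomes, for every species pair."""
    return {
        (a, b): sum(x & y for x, y in zip(matrix[a], matrix[b]))
        for a, b in combinations(matrix, 2)
    }


print(shared_ogs(presence))
# {('sp1', 'sp2'): 1, ('sp1', 'sp3'): 1, ('sp2', 'sp3'): 0}
```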

Genome size correction
Large genomes have more genes, so they also share more genes. Divide the number of shared genes by the average genome size, by the size of the smaller of the two genomes, or by a weighted average genome size (Korbel et al., TiG 2002).
[Figure: number of shared genes vs. genome size, for P. chrysosporium.]
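Two of the three normalisations are simple enough to write down directly. The sketch below implements the average and smallest-genome variants; the weighted-average variant is deliberately left out because its weights are not given on the slide. The gene counts in the example are hypothetical.

```python
def size_corrected_similarity(shared, size_a, size_b, method="average"):
    """Shared-gene count divided by a genome-size term.

    method="average":  mean genome size of the two genomes
    method="smallest": size of the smaller genome
    (The weighted-average variant is not reproduced: its weights are not given here.)
    """
    if method == "average":
        denominator = (size_a + size_b) / 2
    elif method == "smallest":
        denominator = min(size_a, size_b)
    else:
        raise ValueError(f"unknown method: {method}")
    return shared / denominator


# Hypothetical numbers: 1,500 shared genes between genomes of 6,000 and 4,000 genes
print(size_corrected_similarity(1500, 6000, 4000, "average"))   # 0.3
print(size_corrected_similarity(1500, 6000, 4000, "smallest"))  # 0.375
```

Dividing by the smaller genome stops a small genome from looking distant merely because it has few genes to share; dividing by the average penalises large differences in genome size instead.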

Gene content methods
Similarity: corrected number of shared genes. Distance: 1 − similarity. Tree: neighbour joining (Saitou and Nei, Mol Biol Evol 1987).

$$\mathrm{dist}(sp_A, sp_B) = 1 - \frac{\#\,\text{shared OGs}(sp_A, sp_B)}{\text{weighted average size}(sp_A, sp_B)}$$

Similarity (s):
      sp1  sp2  sp3  sp4  …
sp1   1    0.2  0.4  0.2  …
sp2        1    0.9  0.1  …
sp3             1    0.3  …
sp4                  1    …

Distance (d = 1 − s):
      sp1  sp2  sp3  sp4  …
sp1   0
sp2   0.8  0
sp3   0.6  0.1  0
sp4   0.8  0.9  0.7  0
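The last step, from a distance matrix to a tree, can be sketched with a plain neighbour-joining implementation applied to the slide's four-species similarity values. The matrix layout (list of lists) and Newick output are choices made for this sketch; real analyses would typically use an established phylogenetics package.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Neighbour joining on a symmetric distance matrix (list of lists, zero diagonal).

    Returns an unrooted tree in Newick format, with branch lengths.
    """
    nodes = list(names)                       # Newick text of each current subtree
    d = [row[:] for row in dist]              # working copy of the distances
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]           # total distance from each node
        # Join the pair that minimises the Q criterion
        i, j = min(combinations(range(n), 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from the new internal node u to i and to j
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:.3f},{nodes[j]}:{lj:.3f})"
        # Distances from u to every remaining node, then shrink the matrix
        keep = [k for k in range(n) if k not in (i, j)]
        du = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [du[x]] for x, a in enumerate(keep)]
        d.append(du + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Connect the last three subtrees to one central node
    la = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    lb = d[0][1] - la
    lc = d[0][2] - la
    return f"({nodes[0]}:{la:.3f},{nodes[1]}:{lb:.3f},{nodes[2]}:{lc:.3f});"


# Similarity values from the slide (upper triangle), converted to d = 1 - s
names = ["sp1", "sp2", "sp3", "sp4"]
sim = {("sp1", "sp2"): 0.2, ("sp1", "sp3"): 0.4, ("sp1", "sp4"): 0.2,
       ("sp2", "sp3"): 0.9, ("sp2", "sp4"): 0.1, ("sp3", "sp4"): 0.3}
dist = [[0.0 if a == b else 1 - sim.get((a, b), sim.get((b, a))) for b in names]
        for a in names]
print(neighbor_joining(names, dist))
```

The resulting tree groups sp2 with sp3 (and sp1 with sp4). Because these distances are not exactly tree-like, sp3 receives a small negative branch length (about -0.05), a known artefact of neighbour joining.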

Superalignment methods
Make a multiple alignment of each gene and concatenate the alignments (1:1:1 orthologs). A missing gene in a certain species (row) can be treated as a gap in the alignment.
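Concatenation with gap filling is a small bookkeeping step. The sketch below assumes each gene alignment is a dict from species to an aligned sequence; a species absent from a gene gets an all-gap block of that gene's alignment length, as the slide describes. The function name is illustrative.

```python
def concatenate(gene_alignments, species):
    """Build a superalignment from per-gene alignments of 1:1 orthologs.

    gene_alignments: list of dicts (species -> aligned sequence), one per gene
    species: all species (rows) of the superalignment
    A species missing from a gene gets an all-gap block of that gene's length.
    """
    rows = {sp: [] for sp in species}
    for aln in gene_alignments:
        length = len(next(iter(aln.values())))
        for sp in species:
            rows[sp].append(aln.get(sp, "-" * length))
    return {sp: "".join(blocks) for sp, blocks in rows.items()}


genes = [{"sp1": "AC", "sp2": "AG"}, {"sp1": "TTT", "sp3": "TAT"}]
print(concatenate(genes, ["sp1", "sp2", "sp3"]))
# {'sp1': 'ACTTT', 'sp2': 'AG---', 'sp3': '--TAT'}
```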

Select positions
Filter alignment columns by percentage of gaps, percentage of conservation, Gblocks (Castresana, Mol Biol Evol 2000), or slow-fast (Brinkmann et al., Mol Biol Evol 1999).
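The two simplest criteria, percentage of gaps and percentage of conservation, can be sketched as a column filter. This is not Gblocks or the slow-fast method; the conservation measure (count of the most common non-gap residue divided by the number of rows) and the default thresholds are assumptions for illustration.

```python
from collections import Counter


def select_positions(alignment, max_gap_fraction=0.5, min_conservation=0.6):
    """Keep alignment columns with few gaps and enough conservation.

    Conservation = count of the most common non-gap residue / number of rows.
    Thresholds are illustrative defaults, not values from Gblocks or slow-fast.
    """
    names = list(alignment)
    columns = list(zip(*(alignment[n] for n in names)))
    keep = []
    for i, col in enumerate(columns):
        gap_fraction = col.count("-") / len(col)
        residues = Counter(c for c in col if c != "-")
        top = max(residues.values(), default=0)
        if gap_fraction <= max_gap_fraction and top / len(col) >= min_conservation:
            keep.append(i)
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


print(select_positions({"sp1": "A-CT", "sp2": "A-GT", "sp3": "AAGC"}))
# {'sp1': 'ACT', 'sp2': 'AGT', 'sp3': 'AGC'}
```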

Gene content vs. sequence
Gene content supertrees are different from sequence-based supertrees (Dutilh et al., Bioinformatics 2007).

“Hot” origin of life?
[Figure panels: Protein sequence; Gene content.]

Other evidence
Membrane composition (Plötz et al., J Biol Chem 2000); gene structure (Gribaldo et al., J Bact 1999).

Light from different angles
Sequence: phylogenetic trees (marker genes); phylogenomic trees.
Gene content: gene content trees; signature genes.
Phenotype: morphology; metabolism/chemistry.

Highly similar strains
Almost identical gene content and a low recombination rate. Use whole-genome alignment (Mauve, Nucmer) and extract the positions that are not completely conserved from the genome alignment: SNPs and small indels. Abundance. Recombination rate.
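Once a whole-genome aligner has produced aligned blocks, extracting the positions that are not completely conserved is a column scan. The sketch below assumes one aligned block given as a dict of equal-length strings with "-" for gaps; it does not parse Mauve or Nucmer output, and it reports each gapped column separately rather than merging runs into single indel events.

```python
def variable_positions(block):
    """List (column index, kind) for every column that is not completely conserved.

    block: strain -> aligned sequence (equal lengths, '-' for gaps), e.g. one
    aligned block from a whole-genome alignment. A variable column containing a
    gap is reported as 'indel', otherwise as 'SNP'.
    """
    result = []
    for i, column in enumerate(zip(*block.values())):
        if len(set(column)) == 1:
            continue                          # completely conserved: skip
        result.append((i, "indel" if "-" in column else "SNP"))
    return result


print(variable_positions({"s1": "ACGTA-A", "s2": "ACCTAGA"}))
# [(2, 'SNP'), (5, 'indel')]
```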