1
Bas E. Dutilh
Phylogenomics
Using complete genomes to determine the phylogeny of species
2
Tree of life
Bacteria, Archaea, Eukaryota
3
Evolution
What we can see are the present-day species
Offspring look like their parents
Mutations
–Phenotype
–Genotype
Nature selects: survival of the fittest
4
Phenotype
Which properties to compare?
Watanabe's Ugly Duckling Theorem: “All things have an infinite number of features. So any two things share an infinite number of features. Therefore two things cannot be of the same kind because they share more features than they do with things of a different kind.”
5
Evolution
6
Genotype
Genome sequence is finite and you do not have to choose
Genetic properties
–Word frequency
–Sequence (nt/aa)
–Gene content
–Gene order
7
Why sequence similarity works
Every residue (nt/aa) is a separate dimension
–Human: 3 billion nucleotides
Most mutations are …
Sequences never converge
8
Evolution: mutation and selection
Mutation is responsible for changes
Selection is responsible for continuity
The more differences, the more distantly related two sequences are
Contrary to structure or phenotype, sequences do not converge
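The idea that more differences mean a more distant relationship can be sketched as a simple proportion of differing sites. This is a minimal illustration, assuming the two sequences are already aligned and that "-" marks a gap; the function name `p_distance` is chosen here for illustration and is not from the slides.

```python
def p_distance(a, b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where neither sequence has a gap (assumed gap symbol: "-")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


close = p_distance("ACGTACGT", "ACGTACGA")    # one difference in eight sites
distant = p_distance("ACGTACGT", "TCGAACTA")  # four differences in eight sites
```

The raw proportion tends to underestimate the true amount of change once sites have mutated more than once, which is one reason substitution models are used in practice.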
9
Phylogenetics
Inferring the evolution of a gene
Distance matrix
Hierarchical clustering
Evaluate likelihood of all possible trees
Maximum likelihood
10
Substitution matrix
Describes the rate at which one character in a sequence changes to other character states
BLOcks SUbstitution Matrix (BLOSUM) is based on substitutions observed in aligned blocks of related proteins; e.g. for BLOSUM62, sequences that are 62% or more identical are clustered and counted as one
11
Neighbour joining
12
Maximum likelihood
Make all possible trees
Calculate likelihood that the alignment evolved in this tree
Maximum likelihood tree
Very computationally intensive
PhyML searches “around” starting tree (e.g. NJ)
13
Maximum parsimony
Parsimony is a special case of likelihood
The tree with the smallest number of mutations is the maximum parsimony tree
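Counting the smallest number of mutations a given tree requires can be sketched with Fitch's algorithm, a standard way to score a fixed tree under parsimony. The slides do not say which algorithm was used, so this is an illustration only. Assumptions: trees are rooted, binary, and written as nested tuples of species names; the toy alignment is invented.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum number of changes) for one site."""
    if isinstance(tree, str):              # leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:                             # children can agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {sp: seq[i] for sp, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical toy alignment of four species
alignment = {"A": "AAG", "B": "AAG", "C": "ACT", "D": "ACT"}
score_1 = parsimony_score((("A", "B"), ("C", "D")), alignment)
score_2 = parsimony_score((("A", "C"), ("B", "D")), alignment)
```

Here the first tree needs fewer changes than the second, so it would be preferred. A full parsimony search would also have to explore the space of tree topologies, which this sketch does not do.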
14
SSU rRNA
Present in all species
Constant function
Slowly evolving
Fox et al, Science 1980
15
SSU rRNA
Phylogeny of SSU rRNA revealed the three domains: Bacteria, Archaea, Eukaryota
Representative of the evolutionary history of species
Olsen et al, J Bacteriol 1994
16
Conflict between trees based on single genes
Unrecognized paralogy
Horizontal gene transfer
Mutation saturation, biases, divergent rates
Different genes tell different stories
[Figure: gene tree from ancestor to spec A, spec B and spec C; legend: orthologs, paralogs]
17
Is a tree the right representation?
Genomes are chimeras with genes from different origins
–Endosymbiosis (mitochondrion, chloroplast)
–Horizontal gene transfer (many examples, often adaptations to environment)
18
More data = more consistent trees
Combine information from more genes to average out these anomalies
Complete genomes contain the maximum phylogenetic information
19
Fungi
Yeasts, filamentous and dimorphic fungi
Fungi are the eukaryotic clade with the largest number of completely sequenced genomes
S. cerevisiae is a well-studied model organism
Much consensus about phylogeny
20
Consensus phylogeny (literature) 19 target nodes
22
Orthology
Which genes to compare between species
–Homologs (originated “de novo”)
–Orthologs (originated at speciation)
Orthology has higher resolution
–Pairwise orthology
–Cluster orthology
–Tree-based orthology
[Figure: tree from ancestor to spec A, spec B and spec C]
23
Pairwise orthology (Inparanoid)
Compare all proteins in species A to all proteins in species B to find homologs
Find bi-directional best hit
All proteins closer than bi-directional best hit are (in-)paralogs
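The bi-directional best hit step can be sketched as follows. This is not Inparanoid's implementation, only a minimal illustration of the idea. Assumptions: similarity scores (for example BLAST bit scores) are already available as dictionaries keyed by (query, subject), and the protein names and scores are invented. The in-paralog step is left out.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: {(query, subject): similarity score}; on ties the first hit seen wins.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return sorted((a, b) for a, b in a_best.items() if b_best.get(b) == a)


# Hypothetical scores between proteins of species A and species B
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b1"): 180, ("a2", "b2"): 90}
b_vs_a = {("b1", "a1"): 200, ("b1", "a2"): 180, ("b2", "a1"): 150, ("b2", "a2"): 90}
pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
```

In this toy case only a1 and b1 pick each other; a2 prefers b1 but is not b1's best hit, so it is not reported as an ortholog.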
24
Cluster orthology (COG)
First group in-paralogs in every species
Find bi-directional best hits between in-paralogous groups
Join in-paralogs to orthologous groups
–Link all pairs of in-paralogous groups
–Only if link is confirmed by third species (triangle)
25
Tree-based orthology
Phylogenetic tree of homologs
Find gene duplication nodes
Two homologous genes are orthologs if their last common ancestor is not a duplication node but a speciation node
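The last-common-ancestor rule can be sketched on a small gene tree. Assumptions: each internal node is already labelled "S" (speciation) or "D" (duplication), which in practice requires comparing the gene tree with a species tree; the tuple encoding and gene names are invented for illustration.

```python
def leaves(node):
    """Set of gene names below a node; a leaf is a plain string."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def ortholog_pairs(node):
    """Gene pairs whose last common ancestor is a speciation ('S') node."""
    if isinstance(node, str):
        return set()
    kind, left, right = node
    pairs = ortholog_pairs(left) | ortholog_pairs(right)
    if kind == "S":
        # Genes on opposite sides of this node have it as their last common ancestor
        pairs |= {frozenset((x, y)) for x in leaves(left) for y in leaves(right)}
    return pairs


# Hypothetical gene tree: a duplication before species A and B split
gene_tree = ("D", ("S", "A1", "B1"), ("S", "A2", "B2"))
orthologs = ortholog_pairs(gene_tree)
```

A1 and B1 are orthologs, as are A2 and B2, while A1 and B2 trace back to the duplication node and are therefore paralogs.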
27
Gene content methods
Presence/absence matrix (0/1)
Similarity: number of shared orthologous groups
–Genomes that share few OGs are distantly related
–Genomes that share many OGs are closely related

      OG1  OG2  OG3  OG4  …
sp1    1    1    0    1   …
sp2    0    1    0    0   …
sp3    0    0    1    1   …
…      …    …    …    …   …

but…
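Counting shared orthologous groups from a presence/absence matrix can be sketched like this. The rows below are only the four columns visible on the slide; the function name `shared_ogs` is chosen here for illustration.

```python
# Visible part of the presence/absence matrix (columns OG1..OG4)
presence = {
    "sp1": [1, 1, 0, 1],
    "sp2": [0, 1, 0, 0],
    "sp3": [0, 0, 1, 1],
}


def shared_ogs(row_a, row_b):
    """Number of orthologous groups present (1) in both genomes."""
    return sum(x * y for x, y in zip(row_a, row_b))


shared_12 = shared_ogs(presence["sp1"], presence["sp2"])  # OG2 only
shared_13 = shared_ogs(presence["sp1"], presence["sp3"])  # OG4 only
shared_23 = shared_ogs(presence["sp2"], presence["sp3"])  # nothing in common
```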
28
Genome size correction
Large genomes have more genes, so they also share more genes
Divide number of shared genes by
–Average genome size
–Smallest of two genomes
–Weighted average genome size
[Figure: # shared genes vs. genome size; P. chrysosporium labelled]
Korbel et al, Trends Genet 2002
29
Gene content methods
Similarity: corrected number of shared genes
Distance: (1 – similarity)
dist(spA, spB) = 1 – (# shared OGs(spA, spB)) / (weighted average size(spA, spB))

Similarity matrix (s):
      sp1   sp2   sp3   sp4   …
sp1   1     0.2   0.4   0.2   …
sp2         1     0.9   0.1   …
sp3               1     0.3   …
sp4                     1     …
…     …     …     …     …     …

Distance matrix (d):
      sp1   sp2   sp3   sp4
sp1   0
sp2   0.8   0
sp3   0.6   0.1   0
sp4   0.8   0.9   0.7   0

Neighbour joining
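The size-corrected distance can be sketched as below. Assumptions: genome size is measured as the number of genes (or orthologous groups) per genome, and only the "smallest" and "average" corrections are implemented, because the slides do not give the formula for the weighted average. The example numbers are invented.

```python
def gene_content_distance(shared, size_a, size_b, correction="smallest"):
    """Distance = 1 - (shared genes / genome-size correction)."""
    if correction == "smallest":
        denominator = min(size_a, size_b)
    elif correction == "average":
        denominator = (size_a + size_b) / 2
    else:
        raise ValueError("correction must be 'smallest' or 'average'")
    return 1 - shared / denominator


# Hypothetical pair: 1200 shared genes, genomes of 2000 and 4000 genes
d_smallest = gene_content_distance(1200, 2000, 4000, "smallest")
d_average = gene_content_distance(1200, 2000, 4000, "average")
```

The two corrections give different distances for the same pair, so the choice of denominator can change the resulting tree.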
30
Gene content methods
Dollo parsimony
–Gaining a complex character (gene) is rare and happens once
–Losing it is relatively easy
–Minimize the number of gene losses for maximum parsimony
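For a fixed tree, the number of losses implied by a single gain can be counted as in the sketch below. Assumptions: the tree is rooted and written as nested tuples of species names, the gene is gained once at the last common ancestor of all species that carry it, and every subtree without the gene below that point counts as one loss. The tree and gene distributions are invented.

```python
def leaves(node):
    """Set of species names below a node; a leaf is a plain string."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def count_losses(node, present):
    """Losses below a node that is known to have inherited the gene."""
    if not (leaves(node) & present):
        return 1                      # whole subtree lacks the gene: one loss
    if isinstance(node, str):
        return 0                      # a leaf that still carries the gene
    return sum(count_losses(child, present) for child in node)


def dollo_losses(tree, present):
    """Minimum number of losses when the gene is gained exactly once."""
    if not present:
        return 0
    node = tree
    # Walk down to the last common ancestor of all carriers (the single gain)
    while not isinstance(node, str):
        carrying = [child for child in node if leaves(child) & present]
        if len(carrying) != 1:
            break
        node = carrying[0]
    return count_losses(node, present)


species_tree = ((("A", "B"), "C"), "D")
losses_ac = dollo_losses(species_tree, {"A", "C"})  # lost in B
losses_ad = dollo_losses(species_tree, {"A", "D"})  # lost in B and in C
```

Summing this count over all orthologous groups gives a score for one tree; comparing candidate trees by that score is the search step, which is not shown here.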
31
Superalignment methods
Multiple alignment
Concatenate alignments (1:1:1)
A missing gene in a certain species (row) can be seen as a gap in the alignment
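Concatenation with gaps for missing genes can be sketched as follows. Assumptions: each gene family is given as a dictionary from species name to an aligned sequence of equal length, and "-" is the gap symbol; the sequences are invented.

```python
def concatenate(alignments, species):
    """Join per-gene alignments into one row per species.

    A species missing from a gene alignment gets a run of gaps of that
    alignment's length.
    """
    parts = {sp: [] for sp in species}
    for alignment in alignments:
        length = len(next(iter(alignment.values())))
        for sp in species:
            parts[sp].append(alignment.get(sp, "-" * length))
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


# Hypothetical gene families; species B lacks the second gene
gene1 = {"A": "MKV", "B": "MRV", "C": "MKI"}
gene2 = {"A": "GG-A", "C": "GGTA"}
superalignment = concatenate([gene1, gene2], ["A", "B", "C"])
```

Every row of the result has the same length, so it can be passed to any method that expects a single alignment.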
32
Superdistance methods
Combine distance matrices from separate gene families, e.g. average
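Averaging distance matrices across gene families can be sketched like this. Assumptions: each matrix is a dictionary keyed by an unordered species pair, and a pair is averaged only over the families in which both species occur; the distances are invented.

```python
def pair(a, b):
    """Unordered species pair used as a dictionary key."""
    return frozenset((a, b))


def average_distances(matrices):
    """Average each pairwise distance over the families that contain the pair."""
    totals, counts = {}, {}
    for matrix in matrices:
        for key, distance in matrix.items():
            totals[key] = totals.get(key, 0.0) + distance
            counts[key] = counts.get(key, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}


# Hypothetical distances from two gene families; B and C co-occur only in the first
family1 = {pair("A", "B"): 0.2, pair("A", "C"): 0.6, pair("B", "C"): 0.5}
family2 = {pair("A", "B"): 0.4, pair("A", "C"): 0.8}
superdistance = average_distances([family1, family2])
```

The averaged matrix can then be clustered in the same way as a single-gene distance matrix, for example with neighbour joining.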
33
Supertree methods
Make phylogenetic trees for all gene families separately
Matrix Representation using Parsimony (MRP)
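The matrix representation step of MRP can be sketched as follows: each clade of each source tree becomes one binary character. Assumptions: trees are rooted nested tuples of species names; species inside the clade are coded 1, other species in that tree 0, and species absent from the tree "?"; the root clade is skipped because it carries no grouping information. The trees are invented. The parsimony analysis of the resulting matrix is not shown.

```python
def clades(tree):
    """Return (leaf set of this subtree, leaf sets of all internal nodes in it)."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    members, found = frozenset(), []
    for child in tree:
        child_members, child_found = clades(child)
        members |= child_members
        found += child_found
    return members, found + [members]


def mrp_matrix(trees):
    """Code every non-root clade of every tree as one column of 1/0/?."""
    all_taxa = sorted(set().union(*(clades(tree)[0] for tree in trees)))
    rows = {taxon: "" for taxon in all_taxa}
    for tree in trees:
        in_tree, found = clades(tree)
        for clade in found:
            if clade == in_tree:
                continue  # root clade contains every taxon of this tree
            for taxon in all_taxa:
                if taxon in clade:
                    rows[taxon] += "1"
                elif taxon in in_tree:
                    rows[taxon] += "0"
                else:
                    rows[taxon] += "?"
    return rows


# Hypothetical gene trees with partly overlapping species sets
tree1 = ((("A", "B"), "C"), "D")
tree2 = (("A", "C"), "E")
matrix = mrp_matrix([tree1, tree2])
```

The columns here are the clades {A, B} and {A, B, C} from the first tree and {A, C} from the second; a parsimony search on this matrix would then yield the supertree.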
34
[Figure panels: 13 trees, 14 trees, 15 trees, 12 trees]
35
Gene content vs. sequence based
Gene content supertrees are different from sequence-based supertrees
36
Consensus phylogeny (literature) 19 target nodes
37
Gene content 10.38
Low-dimensional compared to genotype
Intermediate between genotype and phenotype
–Main dichotomy between yeasts and filamentous Fungi, not Ascomycota and Basidiomycota
–Dimorphic Basidiomycota exclude filamentous P. chrysosporium
38
Superalignment 18.21
Supertree 17.50
Sequence-based trees agree better with literature
Literature is dominated by sequence-based trees
39
Hyperthermophiles
40
Nanoarchaeum
Nanoarchaeota: Waters et al. PNAS 2003; Di Giulio, J Theor Biol 2006
Crenarchaeota: Ciccarelli et al. Science 2006
Euryarchaeota: Brochier et al. Genome Biol 2005
[Figure: gene content tree; labels: Cren, Eury]
41
Assignment
Make a gene content tree
Compare with other phylogenetic trees
Describe the differences
–Can you find literature that specifically studies these species?
–What do you think is going on? Why are the trees different?
Write a paper about some of your most interesting findings, include references
www.cmbi.ru.nl/edu/seminars