1
Phylogenetic genome analysis, phylogenomics
Bas E. Dutilh
2
What we can see
3
Family tree → species tree
Offspring look like their parents. Darwin: species evolve like families
4
Species tree
5
Tree of life: Archaea, Bacteria, Eukaryota
6
Phylogeny: term coined by Ernst Haeckel (1866)
Phylon (Greek: φῦλον): tribe, race
Genus (Latin): birth, origin
7
Phenotype ↔ genotype
Phenotype: an infinite number of features; subjective choice of features; a value can depend on the observation (etc.)
Genotype: a gene/genome is finite; objective choice of features; a sequence is absolute
8
Convergence: unlike phenotypes or structures, sequences do not converge. They are highly dimensional: every residue is a dimension.
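The following minimal Python sketch makes "every residue is a dimension" concrete. It computes the uncorrected p-distance, which is the fraction of aligned positions at which two sequences differ. The choice of p-distance is an assumption made for illustration, and so is the rule of skipping gapped positions. Neither is necessarily what the presentation uses. Real analyses usually correct such distances with a substitution model.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two sequences differ.

    Every aligned residue is one dimension of the comparison; positions
    where either sequence has a gap ('-') are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # ignore gapped positions
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else 0.0


print(p_distance("ACGTACGT", "ACGTTCGA"))  # 2 of 8 positions differ: 0.25
```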
9
Phylogenetic markers: available/easy to sequence, present in all species
Cytochrome c and SSU rRNA: present in all species, constant function, slowly evolving
Fitch, Science 1967; Woese et al, PNAS 1977
10
SSU rRNA: the phylogeny of SSU rRNA revealed the three domains (Archaea, Bacteria, Eukaryota)
Representative of the evolutionary history of species
11
Phylogenetic assumptions
Sequences are homologous: they have a common ancestor
Sequences diverge in a binary (bifurcating) fashion
Each position evolves independently
12
Phylogenetics: neighbor joining, maximum parsimony, maximum likelihood
Maximum parsimony: which tree assumes the fewest mutations?
Maximum likelihood: for a given model, which tree has the highest probability of generating the observed alignment?
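Maximum parsimony scores a tree by the minimum number of substitutions it requires. The following minimal Python sketch implements Fitch's small-parsimony algorithm for a fixed, fully resolved tree. Several pieces are hypothetical: the nested-tuple tree format, the toy alignment, and the function names `fitch` and `parsimony_score`. The sketch does not search over topologies, which is the expensive part of a real parsimony analysis. Columns are scored separately and summed, following the assumption that each position evolves independently.

```python
from typing import Dict, Set, Tuple, Union

# A rooted, fully resolved tree as nested tuples; leaves are species names
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, column: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's small-parsimony pass for a single alignment column.

    Returns the candidate state set at this node and the minimum number
    of substitutions needed in the subtree below it.
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, column)
    right_set, right_cost = fitch(right, column)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint child sets: one extra substitution is required here
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Total substitutions, scoring each column independently."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Toy alignment and two candidate topologies (hypothetical data)
alignment = {"A": "AAG", "B": "AAT", "C": "GCT", "D": "GCT"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_1, alignment), parsimony_score(tree_2, alignment))
# 3 5  -> tree_1 needs fewer mutations, so parsimony prefers it
```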
13
Bootstrapping and jackknifing
Bootstrapping: randomly resample all columns in the alignment with replacement, re-create the trees, and count the presence of each branch
Jackknifing: delete a fraction of the columns and re-create the tree
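The resampling steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions. The helper names are hypothetical. The tree-building step is left out, because any tree method could be plugged in. Each replicate tree is represented only by its set of branches, and each branch (split) is reduced to the frozenset of taxa on one side, which is a simplification.

```python
import random
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Alignment = Dict[str, str]  # species name -> aligned sequence


def _n_columns(alignment: Alignment) -> int:
    return len(next(iter(alignment.values())))


def bootstrap_replicate(alignment: Alignment, rng: random.Random) -> Alignment:
    """Draw as many columns as the original, with replacement."""
    n = _n_columns(alignment)
    picks = [rng.randrange(n) for _ in range(n)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def jackknife_replicate(alignment: Alignment, delete_fraction: float,
                        rng: random.Random) -> Alignment:
    """Delete a random fraction of columns, keeping the rest in order."""
    n = _n_columns(alignment)
    n_keep = n - int(delete_fraction * n)
    keep = sorted(rng.sample(range(n), n_keep))
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


def branch_support(replicate_splits: List[Set[FrozenSet[str]]]) -> Dict[FrozenSet[str], float]:
    """Fraction of replicate trees that contain each branch (split)."""
    counts = Counter(split for splits in replicate_splits for split in splits)
    return {split: c / len(replicate_splits) for split, c in counts.items()}


rng = random.Random(1)
aln = {"A": "ACGTACGTAC", "B": "ACGTTCGTAC", "C": "TCGAACGGAC", "D": "TCGATCGGAT"}
boot = bootstrap_replicate(aln, rng)
jack = jackknife_replicate(aln, 0.3, rng)
print(len(boot["A"]), len(jack["A"]))  # 10 7
```

In a full analysis you would build one tree per replicate, for example hundreds of bootstrap replicates. You would then pass their split sets to `branch_support` to get the support value for each branch.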
14
Different genes tell different stories
Conflicts between trees based on single genes arise from unrecognized paralogy, horizontal gene transfer, and mutational saturation, biases, or divergent rates
(Figure: paralogs and orthologs in species A, B and C, descending from a common ancestor)
15
More data → more consistent trees
Combine information from more genes to average out these anomalies
Complete genomes contain the maximum phylogenetic information
Dutilh et al, Bioinformatics 2007
16
Chimeric genomes
Is a tree the right representation of the evolutionary history of a genome?
Endosymbiosis (mitochondrion, chloroplast)
Horizontal gene transfer (many examples, often adaptations to the environment)
Darwin, 1859; Doolittle, Science 1999
17
DensiTree: “fuzzy” trees
Draw the tree many times, e.g. from bootstrap replicates or from different genes, and use transparency to show the fuzziness
18
SplitsTree: tries to accommodate non-bifurcating nodes
Some positions evolve independently; parallel edges are related (they represent the same split)
19
Genomic properties: word frequency, sequence (nt/aa), gene content, gene order
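Word frequency is the simplest of these properties to compute. The following minimal Python sketch builds a relative k-mer (word) frequency profile for a nucleotide sequence and compares two profiles. Several choices here are assumptions for illustration, not the method behind the presentation's trees: the word length, the ACGT alphabet, and the use of Euclidean distance between profiles.

```python
from collections import Counter
from itertools import product
from typing import Dict


def word_frequencies(seq: str, k: int = 2, alphabet: str = "ACGT") -> Dict[str, float]:
    """Relative frequency of every length-k word over the given alphabet.

    Words containing other characters (e.g. 'N') are not counted.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[w] for w in words) or 1  # avoid division by zero
    return {w: counts[w] / total for w in words}


def profile_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Euclidean distance between two word-frequency profiles of the same k."""
    return sum((p[w] - q[w]) ** 2 for w in p) ** 0.5


f = word_frequencies("ACGTACGT", k=2)
print(f["AC"], f["TA"], f["AA"])  # 2/7, 1/7 and 0.0
```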
20
Fungi: yeasts, filamentous and dimorphic fungi
Fungi are the eukaryotic clade with the largest number of completely sequenced genomes
S. cerevisiae is a well-studied model organism
Much consensus about their phylogeny
Dutilh et al, Bioinformatics 2007
21
Consensus phylogeny (literature)
19 target nodes (Dutilh et al, Bioinformatics 2007)
22
(Figure panels: 13 trees, 14 trees, 15 trees, 12 trees)
23
Gene content methods: presence/absence matrix (0/1)
Similarity: number of shared orthologous groups (OGs)
Genomes that share few OGs are distantly related; genomes that share many OGs are closely related… but…
(Matrix: species sp as rows, OG1, OG2, OG3, OG4, … as columns)
Snel et al, Nat Genet 1999; Tekaia et al, Genome Res 1999
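The shared-OG similarity can be read directly off the presence/absence matrix. The following Python sketch counts, for every pair of species, the OGs present in both. The matrix values and the function names are hypothetical, and no genome size correction is applied yet.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Toy presence/absence matrix (hypothetical): species -> 0/1 per OG
matrix: Dict[str, List[int]] = {
    "sp1": [1, 1, 1, 0, 1],
    "sp2": [1, 1, 0, 0, 1],
    "sp3": [1, 0, 0, 1, 0],
}


def shared_ogs(row_a: List[int], row_b: List[int]) -> int:
    """Number of orthologous groups present (1) in both species."""
    return sum(1 for a, b in zip(row_a, row_b) if a and b)


def shared_og_table(m: Dict[str, List[int]]) -> Dict[Tuple[str, str], int]:
    """Shared-OG counts for every pair of species."""
    return {(s, t): shared_ogs(m[s], m[t]) for s, t in combinations(sorted(m), 2)}


print(shared_og_table(matrix))
# {('sp1', 'sp2'): 3, ('sp1', 'sp3'): 1, ('sp2', 'sp3'): 1}
```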
24
Genome size correction
Large genomes have more genes, so they also share more genes. Divide the number of shared genes by the average genome size, the size of the smaller of the two genomes, or the weighted average genome size.
(Figure: number of shared genes vs. genome size; P. chrysosporium)
Korbel et al, TiG 2002
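The first two corrections are straightforward to sketch in Python. Genome sizes are taken as numbers of genes, and the example numbers are hypothetical. The weighted-average variant is deliberately left out, because its weighting is not specified here. The function name and its `method` parameter are illustrative only.

```python
def corrected_similarity(shared: int, size_a: int, size_b: int,
                         method: str = "average") -> float:
    """Number of shared genes divided by a genome-size term.

    Genome sizes are numbers of genes. Only the 'average' and 'smallest'
    variants are sketched; the weighted average is not implemented.
    """
    if method == "average":
        denominator = (size_a + size_b) / 2
    elif method == "smallest":
        denominator = min(size_a, size_b)
    else:
        raise ValueError(f"unknown method: {method}")
    return shared / denominator


# Hypothetical numbers: 3000 shared genes, genomes of 5000 and 10000 genes
print(corrected_similarity(3000, 5000, 10000, "average"))   # 0.4
print(corrected_similarity(3000, 5000, 10000, "smallest"))  # 0.6
```

Dividing by the smaller genome gives the higher similarity here. The smaller genome can share at most all of its own genes, so it sets the ceiling on the number of shared genes.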
25
Gene content methods
Similarity: corrected number of shared genes
Distance: 1 − similarity
$$d(\mathrm{sp}_A, \mathrm{sp}_B) = 1 - \frac{\#\,\text{shared OGs}(\mathrm{sp}_A, \mathrm{sp}_B)}{\text{weighted average size}(\mathrm{sp}_A, \mathrm{sp}_B)}$$
(Figure: distance matrix of sp1, sp2, sp3, sp4, …)
Neighbour joining (Saitou et al, Mol Biol Evol 1987)
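The distance matrix is then turned into a tree with neighbour joining. The following Python sketch uses the common Q-matrix formulation of the algorithm on a small, additive toy matrix. Several elements are assumptions for illustration: the function name, the Newick output, the 3-significant-digit formatting of branch lengths, and the example distances. In practice you would use an established phylogenetics package.

```python
from typing import List


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Build an unrooted neighbour-joining tree and return it as Newick."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [list(row) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # net divergence of each node
        # Pick the pair (i, j) that minimises Q(i, j) = (n - 2) d_ij - r_i - r_j
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to i and j
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:.3g},{nodes[j]}:{lj:.3g})"
        # Distances from the new node to every remaining node
        keep = [k for k in range(n) if k not in (i, j)]
        to_new = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [to_new[x]] for x, a in enumerate(keep)]
        d.append(to_new + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Three nodes left: join them at a single (unrooted) centre
    a, b, c = nodes
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = (d[0][1] + d[1][2] - d[0][2]) / 2
    lc = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({a}:{la:.3g},{b}:{lb:.3g},{c}:{lc:.3g});"


# Toy additive distance matrix (hypothetical species a to e)
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(names, dist))
# (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

For gene content trees, each entry of `dist` would be a distance of the form 1 − similarity, computed for every pair of genomes.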
26
Superalignment methods
Make a multiple alignment of each gene, then concatenate the alignments of one-to-one orthologs (1:1:1)
A missing gene in a certain species (row) can be treated as a gap in the alignment
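The following Python sketch shows a superalignment in which a missing gene becomes a block of gaps. It assumes that each input is already a multiple alignment of one-to-one orthologs, keyed by species name. The toy protein sequences and the function name are hypothetical.

```python
from typing import Dict, List


def concatenate(gene_alignments: List[Dict[str, str]],
                species: List[str]) -> Dict[str, str]:
    """Build a superalignment; a gene missing in a species becomes gaps."""
    parts: Dict[str, List[str]] = {sp: [] for sp in species}
    for aln in gene_alignments:
        width = len(next(iter(aln.values())))  # alignment length of this gene
        for sp in species:
            parts[sp].append(aln.get(sp, "-" * width))
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


gene1 = {"spA": "MKV", "spB": "MRV", "spC": "MKI"}
gene2 = {"spA": "LLE", "spC": "LIE"}  # no ortholog in spB
print(concatenate([gene1, gene2], ["spA", "spB", "spC"]))
# {'spA': 'MKVLLE', 'spB': 'MRV---', 'spC': 'MKILIE'}
```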
27
Select positions by: percentage of gaps, percentage conservation, Gblocks (Castresana, Mol Biol Evol 2000), slow-fast (Brinkmann et al, Mol Biol Evol 1999)
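The two simplest criteria, percentage of gaps and percentage conservation, can be sketched as column filters in Python. This is neither Gblocks nor the slow-fast method. It is a plain threshold filter, and the thresholds, the conservation measure (the share of the most common non-gap residue), and the function name are all assumptions.

```python
from collections import Counter
from typing import Dict


def filter_columns(alignment: Dict[str, str], max_gap_fraction: float = 0.5,
                   min_conservation: float = 0.0) -> Dict[str, str]:
    """Keep alignment columns that pass simple gap and conservation thresholds.

    Conservation = share of the most common residue among non-gap characters.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    keep = []
    for i in range(length):
        column = [alignment[n][i] for n in names]
        if column.count("-") / len(column) > max_gap_fraction:
            continue  # too many gaps
        residues = [c for c in column if c != "-"]
        if not residues:
            continue  # all-gap column
        conservation = Counter(residues).most_common(1)[0][1] / len(residues)
        if conservation < min_conservation:
            continue  # too variable
        keep.append(i)
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


aln = {"A": "AC-GT", "B": "AC-GA", "C": "ATTG-", "D": "AGTGC"}
print(filter_columns(aln, max_gap_fraction=0.3, min_conservation=0.5))
# {'A': 'ACG', 'B': 'ACG', 'C': 'ATG', 'D': 'AGG'}
```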
28
Gene content vs. sequence
Gene content supertrees differ from sequence-based supertrees
Dutilh et al, Bioinformatics 2007
29
“Hot” origin of life? Protein sequence vs. gene content
30
Other evidence
Membrane composition (Plötz et al, J Biol Chem 2000)
Gene structure (Gribaldo et al, J Bact 1999)
31
Light from different angles
Sequence: phylogenetic trees (marker genes), phylogenomic trees
Gene content: gene content trees, signature genes
Phenotype: morphology, metabolism/chemistry
32
Highly similar strains
Almost identical gene content; low recombination rate
Whole genome alignment (Mauve, Nucmer)
Extract the positions that are not completely conserved from the genome alignment: SNPs, small indels
Abundance, recombination rate
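The following Python sketch illustrates extracting the positions that are not completely conserved. It assumes the whole genome alignment has already been produced (for example by Mauve or Nucmer) and loaded as equal-length strings. Positions are alignment columns, not coordinates in any one genome. Consecutive indel columns are not merged into single events. The strain names, sequences and function name are hypothetical.

```python
from typing import Dict, List, Tuple


def variable_positions(alignment: Dict[str, str]) -> List[Tuple[int, str]]:
    """List alignment columns that are not completely conserved.

    Columns with a gap in any genome are labelled 'indel', others 'snp'.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    sites = []
    for i in range(length):
        states = {alignment[n][i].upper() for n in names}
        if len(states) == 1:
            continue  # completely conserved column
        sites.append((i, "indel" if "-" in states else "snp"))
    return sites


# Toy genome alignment of three hypothetical strains
genomes = {"s1": "ACGTACGTTA", "s2": "ACGTTCGTTA", "s3": "AC-TACGTCA"}
print(variable_positions(genomes))
# [(2, 'indel'), (4, 'snp'), (8, 'snp')]
```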