Building and visualizing phylogeny. Henrik Lantz, Dept. of Medical Biochemistry and Microbiology, BMC, Uppsala University
Me: PhD in Plant Systematics, postdoc in Fungal Systematics. Now a bioinformatician working with genomic and transcriptomic data, mostly annotation of genomes.
You: How many of you have experience with inferring phylogenies? How many of you have experience with working with sequence data on the computer?
This lecture: Basic facts about phylogenies and the nomenclature used. How to infer a phylogeny from sequence data. How to visualize phylogenies.
What is phylogeny? The evolutionary relationship of organisms or genes – anything related by descent. Often visualized as a phylogenetic tree.
From individual to phylogeny
Zoooooming out…
Overview
Organism based phylogeny
Gene families
A simple phylogeny (figure labels: branch, node, root, time)
Clades (figure: rooted trees with tips A, B, C and D)
Sister groups (figure: rooted trees with tips A, B, C and D)
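The terms clade and sister group can be made concrete with a small sketch. The tree below, ((A,B),(C,D)) written as nested Python tuples, is a hypothetical topology chosen for illustration; the function names are likewise made up for this example and do not come from any phylogenetics library.

```python
# Hypothetical four-taxon tree ((A,B),(C,D)) as nested tuples.
# A string is a tip; a tuple is an internal node holding its children.
tree = (("A", "B"), ("C", "D"))

def leaves(node):
    """Return the set of tip names below a node."""
    if isinstance(node, str):
        return {node}
    tips = set()
    for child in node:
        tips |= leaves(child)
    return tips

def clades(node):
    """List the tip set of every internal node, starting at the root."""
    if isinstance(node, str):
        return []
    found = [leaves(node)]
    for child in node:
        found.extend(clades(child))
    return found

def sister_groups(node):
    """List pairs of tip sets that descend from the same bifurcating node."""
    if isinstance(node, str):
        return []
    pairs = []
    if len(node) == 2:
        pairs.append((leaves(node[0]), leaves(node[1])))
    for child in node:
        pairs.extend(sister_groups(child))
    return pairs

for clade in clades(tree):
    print("clade:", sorted(clade))
for left, right in sister_groups(tree):
    print("sisters:", sorted(left), "and", sorted(right))
```

In this sketch every internal node defines one clade (the node plus everything below it), and the two children of a node are each other's sister groups.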
Sister-group relationships (figure: tree with tips A to H)
(Figure: tree with tips Mouse, Rabbit, Chimp, Human, Birds, Kangaroo, Horse, Dog)
Monophyletic and paraphyletic groups (figure: tree with tips A to H)
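A monophyletic group contains an ancestor and all of its descendants, so on a tree it corresponds exactly to the tips below one node. A minimal sketch of that test follows; the eight-taxon topology and the function names are assumptions made for illustration only.

```python
# Hypothetical eight-taxon tree (((A,B),(C,D)),((E,F),(G,H))) as nested tuples.
tree = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))

def leaves(node):
    """Return the set of tip names below a node."""
    if isinstance(node, str):
        return {node}
    tips = set()
    for child in node:
        tips |= leaves(child)
    return tips

def is_monophyletic(node, group):
    """True if some node in the tree has exactly `group` as its tips."""
    group = set(group)
    if leaves(node) == group:
        return True
    if isinstance(node, str):
        return False
    return any(is_monophyletic(child, group) for child in node)

print(is_monophyletic(tree, {"A", "B", "C", "D"}))  # all tips below one node
print(is_monophyletic(tree, {"A", "B", "C"}))       # D is left out
```

A group that fails this test is not monophyletic. In the second call the group leaves out one descendant (D) of the ancestor it shares, which is the situation described as paraphyletic; the sketch itself does not distinguish paraphyletic from polyphyletic groups.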
Support values
Branch lengths
Phylogeny: 1. Sequence data, nucleotides or amino acids, in FASTA format. 2. Align the sequences. 3. Run the alignment in the phylogeny program. 4. Visualize the results in a tree viewer. Phylogeny.fr does all of this!
Expasy
Phylogeny.fr
1. FASTA format
>CO1_species1
ACGTGTCCGA...
>CO1_species2
TCCGATGAAC...
>CO1_species3
GTGTCCGATC...
Etc.
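Reading FASTA takes only a few lines: a line starting with ">" begins a new record, and every following line belongs to that record's sequence. The parser below is a minimal sketch, not a replacement for an established library; `parse_fasta` is a name made up here, and the input uses the three ten-base example sequences without the trailing "...".

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()  # header line: everything after ">"
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()  # sequences may span several lines
    return records

example = """>CO1_species1
ACGTGTCCGA
>CO1_species2
TCCGATGAAC
>CO1_species3
GTGTCCGATC
"""

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq), seq)
```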
2. Alignment
From:
>CO1_species1
ACGTGTCCGA
>CO1_species2
TCCGATGAAC
>CO1_species3
GTGTCCGATC
To:
>CO1_species1
ACGTGTCCGA-----
>CO1_species2
-----TCCGATGAAC
>CO1_species3
--GTGTCCGATC---
2. Alignment
>CO1_species1
ACGTGTCCGA-----
>CO1_species2
-----TCCGATGAAC
>CO1_species3
--GTGTCCGATC---
>CO1_species4
ACGTGACCGATC---
>CO1_species5
-CGTGACCGATCAAC
>CO1_species6
ACGTGTCCGATGAAC
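In an alignment every row has the same length, and each column holds positions assumed to be homologous. The sketch below checks the first property and lists the columns where the sequences differ. It uses the six example sequences; the second row is written with five leading gaps, an assumption that gives it the same length as the other rows. The function names are made up for this example.

```python
# Example alignment; "-" marks a gap.
# Assumption: CO1_species2 carries five leading gaps so that all rows are 15 long.
alignment = {
    "CO1_species1": "ACGTGTCCGA-----",
    "CO1_species2": "-----TCCGATGAAC",
    "CO1_species3": "--GTGTCCGATC---",
    "CO1_species4": "ACGTGACCGATC---",
    "CO1_species5": "-CGTGACCGATCAAC",
    "CO1_species6": "ACGTGTCCGATGAAC",
}

def alignment_length(aln):
    """Return the shared row length, or raise if the rows differ in length."""
    lengths = {len(seq) for seq in aln.values()}
    if len(lengths) != 1:
        raise ValueError(f"rows differ in length: {sorted(lengths)}")
    return lengths.pop()

def variable_columns(aln):
    """Return 0-based indices of columns with more than one base, ignoring gaps."""
    n = alignment_length(aln)
    rows = list(aln.values())
    variable = []
    for i in range(n):
        states = {row[i] for row in rows if row[i] != "-"}
        if len(states) > 1:
            variable.append(i)
    return variable

print("alignment length:", alignment_length(alignment))
print("variable columns:", variable_columns(alignment))
```

Only the variable columns carry information about how the sequences are related; a phylogeny program works from these column patterns.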
Homology and orthology. Homology: traits shared due to common ancestry, e.g., fingered forelimbs in birds and mammals. Analogy: traits of similar function that are not due to shared ancestry, e.g., wings in birds and insects. Orthology: sequences that were separated by a speciation event. Paralogy: sequences that were separated by a duplication event.
Outgroup: Should be something outside of the study group. Used to orient the tree. If you can, pick several taxa as outgroup. Most phylogenetic programs pick the uppermost sequence in your input file as the outgroup unless you tell the program otherwise.
3. Build the phylogeny - Phylogenetic methods: UPGMA (do not use!). Neighbor joining (not recommended). Parsimony (outdated). Likelihood methods (use one of these): Maximum Likelihood (PhyML, RAxML) and Bayesian methods (MrBayes).
Substitution models. Jukes-Cantor: the same rate for all nucleotide substitutions. Kimura: different rates for transitions (purine to purine, A/G, or pyrimidine to pyrimidine, C/T) and transversions (purine to pyrimidine or the reverse). GTR: a different rate for each type of substitution. Used by Phylogeny.fr. Amino acid models become much more complex as there are 20 states rather than 4.
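The difference between the three nucleotide models comes down to how many distinct substitution rates they allow. The sketch below shows only that part; base frequencies and everything else a real implementation needs are left out. The value of `kappa` and the six GTR rates are hypothetical numbers chosen for illustration, and the function names are made up.

```python
from itertools import combinations

BASES = "ACGT"
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(x, y):
    """True for A<->G or C<->T; any other change is a transversion."""
    pair = {x, y}
    return x != y and (pair <= PURINES or pair <= PYRIMIDINES)

def jukes_cantor_rate(x, y):
    """Jukes-Cantor: one rate shared by every substitution."""
    return 1.0

def kimura_rate(x, y, kappa=2.0):
    """Kimura: transitions occur kappa times as fast as transversions.
    kappa=2.0 is a hypothetical value."""
    return kappa if is_transition(x, y) else 1.0

# GTR: one rate per unordered pair of bases (hypothetical values).
EXAMPLE_GTR_RATES = {
    frozenset("AC"): 1.0,
    frozenset("AG"): 4.0,
    frozenset("AT"): 0.8,
    frozenset("CG"): 1.2,
    frozenset("CT"): 5.0,
    frozenset("GT"): 0.9,
}

def gtr_rate(x, y, rates=EXAMPLE_GTR_RATES):
    """GTR: look up the rate for this particular pair of bases."""
    return rates[frozenset((x, y))]

def distinct_rates(rate_fn):
    """Collect the different rate values a model assigns across all base pairs."""
    return {rate_fn(x, y) for x, y in combinations(BASES, 2)}

for name, fn in [("Jukes-Cantor", jukes_cantor_rate),
                 ("Kimura", kimura_rate),
                 ("GTR", gtr_rate)]:
    print(name, "uses", len(distinct_rates(fn)), "distinct rate(s)")
```

With four bases there are six unordered pairs, so GTR has up to six rates, Kimura groups them into two classes, and Jukes-Cantor uses one. With 20 amino acids there are 190 pairs, which is why amino acid models are so much larger.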
4. Visualization