1
Lecture 19: Evolution/Phylogeny
Introduction to Bioinformatics
2
Bioinformatics
“Nothing in Biology makes sense except in the light of evolution” (Theodosius Dobzhansky)
“Nothing in bioinformatics makes sense except in the light of Biology”
3
Evolution
Most of bioinformatics is comparative biology.
Comparative biology is based upon evolutionary relationships between the compared entities.
Evolutionary relationships are normally depicted in a phylogenetic tree.
4
Where can phylogeny be used
For example:
Finding out about orthology versus paralogy
Predicting secondary structure of RNA
Studying host-parasite relationships
Mapping cell-bound receptors onto their binding ligands
Multiple sequence alignment (e.g. Clustal)
5
Phylogenetic tree (unrooted)
[Figure: unrooted tree with four leaves (human, mouse, fugu, Drosophila). Labels: leaf (OTU, operational taxonomic unit), internal node, edge.]
7
Phylogenetic tree (rooted)
[Figure: rooted tree with leaves Drosophila, human, fugu and mouse, drawn along a time axis. Labels: leaf (OTU, operational taxonomic unit), internal node (ancestor), edge.]
8
How to root a tree
Outgroup: place the root between a distant sequence and the rest of the group
Midpoint: place the root at the midpoint of the longest path (sum of branches between any two OTUs)
Gene duplication: place the root between paralogous gene copies
[Figure: example trees over h (human), m (mouse), f (fugu) and D (Drosophila), with branch lengths, illustrating the three rooting methods.]
9
Combinatoric explosion
# sequences   # unrooted trees   # rooted trees
2             1                  1
3             1                  3
4             3                  15
5             15                 105
6             105                945
7             945                10,395
8             10,395             135,135
9             135,135            2,027,025
10            2,027,025          34,459,425
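The growth in the number of trees can be reproduced with a short calculation. The sketch below assumes strictly bifurcating trees, for which the number of unrooted trees on n sequences is (2n-5)!! and the number of rooted trees is (2n-3)!!; the function names are illustrative only.

```python
def count_unrooted_trees(n_taxa: int) -> int:
    """Number of distinct unrooted bifurcating trees on n_taxa leaves: (2n - 5)!!"""
    if n_taxa < 2:
        raise ValueError("need at least two taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (empty product for n = 2 or 3).
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


def count_rooted_trees(n_taxa: int) -> int:
    """Number of distinct rooted bifurcating trees on n_taxa leaves: (2n - 3)!!

    A rooted tree on n leaves corresponds to an unrooted tree on n + 1 leaves,
    with the root playing the role of the extra leaf.
    """
    return count_unrooted_trees(n_taxa + 1)


if __name__ == "__main__":
    for n in range(2, 11):
        print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

Because the counts grow faster than exponentially, exhaustively scoring every tree is only practical for a handful of sequences.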
10
Tree distances
Evolutionary (sequence) distance = sequence dissimilarity
[Figure: tree over human, mouse, fugu and Drosophila with branch lengths, and the matching lower-triangular pairwise distance matrix.]
11
Phylogeny methods
Parsimony: fewest number of evolutionary events (mutations); relatively often fails to reconstruct the correct phylogeny, but methods have improved recently
Distance based: pairwise distances
Maximum likelihood: L = Pr[Data|Tree]
12
Parsimony & Distance
Sequences:
             1 2 3 4 5 6 7
Drosophila   t t a t t a a
fugu         a a t t t a a
mouse        a a a a a t a
human        a a a a a a t
[Figure, parsimony: tree over Drosophila, fugu, mouse and human with the changes at alignment positions 1 to 7 mapped onto its branches.]
[Figure, distance: lower-triangular pairwise distance matrix for human, mouse, fugu and Drosophila, and the resulting tree with branch lengths.]
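Both views of this toy alignment can be checked by hand or in code. The sketch below is an illustration, not the lecture's own procedure: it assumes the distance is the plain count of mismatching columns, and it scores tree topologies with Fitch's small-parsimony algorithm. Trees are written as nested pairs of names, a representation chosen here for brevity.

```python
from itertools import combinations

# The four sequences from the slide, columns 1 to 7.
ALIGNMENT = {
    "Drosophila": "ttattaa",
    "fugu": "aatttaa",
    "mouse": "aaaaata",
    "human": "aaaaaat",
}


def hamming(a: str, b: str) -> int:
    """Number of alignment columns in which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(alignment):
    """Pairwise mismatch counts, keyed by the unordered pair of names."""
    return {
        frozenset((p, q)): hamming(alignment[p], alignment[q])
        for p, q in combinations(alignment, 2)
    }


def fitch_column(tree, states):
    """Fitch pass for one column: returns (possible ancestral states, changes)."""
    if isinstance(tree, str):  # a leaf
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_column(left, states)
    right_set, right_cost = fitch_column(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one change is needed on this pair of branches.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment) -> int:
    """Minimum number of changes the tree needs to explain the alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_column(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


if __name__ == "__main__":
    for pair, d in distance_matrix(ALIGNMENT).items():
        print(sorted(pair), d)
    print(parsimony_score((("Drosophila", "fugu"), ("mouse", "human")), ALIGNMENT))
    print(parsimony_score((("Drosophila", "mouse"), ("fugu", "human")), ALIGNMENT))
```

The topology that groups mouse with human and fugu with Drosophila needs one change per column, which is fewer than either alternative grouping.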
13
Maximum likelihood
If data = alignment and hypothesis = tree, then under a given evolutionary model, maximum likelihood selects the hypothesis (tree) that maximises the probability of the observed data
Extremely time-consuming method
We can also test the relative fit to the tree of different models (Huelsenbeck & Rannala, 1997)
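To make the idea concrete, consider the smallest possible case: two aligned sequences, so that the "tree" is a single branch and the only parameter is its length d. The sketch below assumes the Jukes-Cantor (JC69) substitution model, which is a choice made here for illustration and is not named in the lecture. It finds the d that maximises Pr[Data | d] by grid search and compares it with the known closed-form JC69 estimate.

```python
import math


def jc69_log_likelihood(d: float, n_sites: int, n_diff: int) -> float:
    """log Pr[two aligned sequences | branch length d] under JC69.

    d is the expected number of substitutions per site; n_diff of the
    n_sites columns differ between the two sequences.
    """
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay  # a site shows the same base at both ends
    p_diff = 0.25 - 0.25 * decay  # a site shows one specific different base
    # Each site also contributes the stationary base frequency 1/4.
    return ((n_sites - n_diff) * math.log(0.25 * p_same)
            + n_diff * math.log(0.25 * p_diff))


def ml_distance_grid(n_sites: int, n_diff: int,
                     step: float = 0.001, max_d: float = 3.0) -> float:
    """Branch length on a regular grid that maximises the likelihood."""
    grid = [step * i for i in range(1, int(max_d / step) + 1)]
    return max(grid, key=lambda d: jc69_log_likelihood(d, n_sites, n_diff))


def ml_distance_closed_form(n_sites: int, n_diff: int) -> float:
    """Analytical JC69 estimate: d = -3/4 * ln(1 - 4p/3), valid for p < 3/4."""
    p = n_diff / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Hypothetical data: 100 aligned sites, 20 of which differ.
    print(ml_distance_grid(100, 20), ml_distance_closed_form(100, 20))
```

For a real tree the same principle applies, but the likelihood must be maximised over every branch length and over all topologies, which is why the method is so time consuming.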
14
Bayesian methods
Calculate the posterior probability of a tree (Huelsenbeck et al., 2001): the probability that the tree is the true tree, given the data and the evolutionary model
Most computer-intensive technique
Feasible thanks to Markov chain Monte Carlo (MCMC), a numerical technique for integrating over probability distributions
Gives a confidence number (posterior probability) per node
15
Distance methods: fastest
Clustering criterion using a distance matrix
Distance matrix filled with pairwise alignment measures (sequence identity, alignment scores, E-values, etc.)
Cluster criterion
16
Phylogenetic tree by Distance methods (Clustering)
[Figure: multiple alignment of sequences 1 to 5 → similarity criterion → 5×5 similarity matrix of scores → phylogenetic tree]
17
Lactate dehydrogenase multiple alignment
Human         -KITVVGVGAVGMACAISILMKDLADELALVDVIEDKLKGEMMDLQHGSLFLRTPKIVSGKDYNVTANSKLVIITAGARQ
Chicken       -KISVVGVGAVGMACAISILMKDLADELTLVDVVEDKLKGEMMDLQHGSLFLKTPKITSGKDYSVTAHSKLVIVTAGARQ
Dogfish       -KITVVGVGAVGMACAISILMKDLADEVALVDVMEDKLKGEMMDLQHGSLFLHTAKIVSGKDYSVSAGSKLVVITAGARQ
Lamprey       SKVTIVGVGQVGMAAAISVLLRDLADELALVDVVEDRLKGEMMDLLHGSLFLKTAKIVADKDYSVTAGSRLVVVTAGARQ
Barley        TKISVIGAGNVGMAIAQTILTQNLADEIALVDALPDKLRGEALDLQHAAAFLPRVRI-SGTDAAVTKNSDLVIVTAGARQ
Maizey        TKVSVIGAGNVGMAIAQTILTRDLADEIALVDAVPDKLRGEMLDLQHAAAFLPRTRLVSGTDMSVTRGSDLVIVTAGARQ
Lacto_casei   -KVILVGDGAVGSSYAYAMVLQGIAQEIGIVDIFKDKTKGDAIDLSNALPFTSPKKIYSA-EYSDAKDADLVVITAGAPQ
Bacillus_stea -RVVVIGAGFVGASYVFALMNQGIADEIVLIDANESKAIGDAMDFNHGKVFAPKPVDIWHGDYDDCRDADLVVICAGANQ
Lacto_plant   QKVVLVGDGAVGSSYAFAMAQQGIAEEFVIVDVVKDRTKGDALDLEDAQAFTAPKKIYSG-EYSDCKDADLVVITAGAPQ
Therma_mari   MKIGIVGLGRVGSSTAFALLMKGFAREMVLIDVDKKRAEGDALDLIHGTPFTRRANIYAG-DYADLKGSDVVIVAAGVPQ
Bifido        -KLAVIGAGAVGSTLAFAAAQRGIAREIVLEDIAKERVEAEVLDMQHGSSFYPTVSIDGSDDPEICRDADMVVITAGPRQ
Thermus_aqua  MKVGIVGSGFVGSATAYALVLQGVAREVVLVDLDRKLAQAHAEDILHATPFAHPVWVRSGW-YEDLEGARVVIVAAGVAQ
Mycoplasma    -KIALIGAGNVGNSFLYAAMNQGLASEYGIIDINPDFADGNAFDFEDASASLPFPISVSRYEYKDLKDADFIVITAGRPQ
18
Distance Matrix
[Figure: 13×13 pairwise distance matrix over 1 Human, 2 Chicken, 3 Dogfish, 4 Lamprey, 5 Barley, 6 Maizey, 7 Lacto_casei, 8 Bacillus_stea, 9 Lacto_plant, 10 Therma_mari, 11 Bifido, 12 Thermus_aqua, 13 Mycoplasma]
19
Cluster analysis – (dis)similarity matrix
[Figure: raw table (objects 1 to 5 in rows, variables C1, C2, ..., C6, .. in columns) → similarity criterion → 5×5 similarity matrix of scores]
Minkowski metrics: $D_{i,j} = \left( \sum_k |x_{ik} - x_{jk}|^r \right)^{1/r}$
r = 2: Euclidean distance
r = 1: City block distance
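The Minkowski metric translates directly into code. The sketch below assumes the raw table is a list of equal-length numeric rows, one per object; the table values are made up for illustration.

```python
def minkowski(x, y, r: float) -> float:
    """Minkowski distance between two rows: (sum_k |x_k - y_k|^r)^(1/r)."""
    if len(x) != len(y):
        raise ValueError("rows must have the same number of columns")
    if r < 1:
        raise ValueError("r must be at least 1 for a proper metric")
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)


def dissimilarity_matrix(rows, r: float):
    """Full square matrix of pairwise Minkowski distances between rows."""
    return [[minkowski(a, b, r) for b in rows] for a in rows]


if __name__ == "__main__":
    # Hypothetical raw table: 3 objects described by 2 variables each.
    table = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
    print(dissimilarity_matrix(table, r=2))  # Euclidean
    print(dissimilarity_matrix(table, r=1))  # city block
```

With r = 2 the distance between the first two hypothetical objects is the straight-line length of the (3, 4) offset; with r = 1 it is the sum of the per-variable differences.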
20
Cluster analysis – Clustering criteria
[Figure: 5×5 similarity matrix of scores → cluster criterion → phylogenetic tree]
Single linkage: nearest neighbour
Complete linkage: furthest neighbour
Group averaging: UPGMA
Ward
Neighbour joining: global measure
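The first three criteria differ only in how the distance between two clusters is derived from the distances between their members. The sketch below is a minimal agglomerative clusterer under that reading; the nested-tuple tree representation is a choice made here, and the example distances are the column mismatch counts between the four toy sequences on the Parsimony & Distance slide. Ward and neighbour joining are not covered by this sketch.

```python
def agglomerate(names, dist, linkage="average"):
    """Repeatedly merge the two closest clusters; return the tree as nested tuples.

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    linkage: "single" (nearest neighbour), "complete" (furthest neighbour)
    or "average" (group averaging, UPGMA).
    """
    # Each cluster is keyed by its subtree and stores its member leaves.
    clusters = {name: [name] for name in names}

    def between(members_a, members_b):
        pairwise = [dist[frozenset((p, q))] for p in members_a for q in members_b]
        if linkage == "single":
            return min(pairwise)
        if linkage == "complete":
            return max(pairwise)
        if linkage == "average":
            return sum(pairwise) / len(pairwise)
        raise ValueError(f"unknown linkage: {linkage}")

    while len(clusters) > 1:
        keys = list(clusters)
        a, b = min(
            ((x, y) for i, x in enumerate(keys) for y in keys[i + 1:]),
            key=lambda pair: between(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters.pop(a) + clusters.pop(b)
        clusters[(a, b)] = merged
    return next(iter(clusters))


NAMES = ["human", "mouse", "fugu", "Drosophila"]
PAIRS = {
    ("human", "mouse"): 2, ("human", "fugu"): 4, ("human", "Drosophila"): 5,
    ("mouse", "fugu"): 4, ("mouse", "Drosophila"): 5, ("fugu", "Drosophila"): 3,
}
DIST = {frozenset(pair): value for pair, value in PAIRS.items()}

if __name__ == "__main__":
    for method in ("single", "complete", "average"):
        print(method, agglomerate(NAMES, DIST, method))
```

On this small example all three criteria agree; on larger, noisier matrices they can produce different trees.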
21
Neighbour joining
Global measure: tends to produce a tree with minimal total branch length
At each step, join the two nodes whose joining keeps the total branch length minimal (criterion of minimal evolution)
Agglomerative algorithm
Leads to an unrooted tree
22
Neighbour joining
[Figure: panels (a) to (f) showing successive joining steps, with the joined nodes labelled x and y.]
At each step all possible ‘neighbour joinings’ are checked and the one corresponding to the minimal total tree length (calculated by adding all branch lengths) is taken.
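A compact sketch of the procedure follows. It uses the standard Q-criterion formulation of neighbour joining, which selects the same pair as minimising the total tree length without rebuilding the tree for every candidate; that formulation, the internal node names and the edge-list output are choices made here, not taken from the slides. The input distances are the column mismatch counts between the four toy sequences on the Parsimony & Distance slide.

```python
def neighbour_joining(names, dist):
    """Return the unrooted NJ tree as a list of (node, neighbour, branch_length).

    dist maps frozenset({a, b}) to the distance between a and b.
    """
    d = dict(dist)
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node: sum of its distances to all others.
        total = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[frozenset((a, b))] - total[a] - total[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        d_ab = d[frozenset((a, b))]
        new = f"node{next_id}"  # hypothetical label for the new internal node
        next_id += 1
        # Branch lengths from the joined pair to the new internal node.
        len_a = 0.5 * d_ab + (total[a] - total[b]) / (2 * (n - 2))
        len_b = d_ab - len_a
        edges.append((a, new, len_a))
        edges.append((b, new, len_b))
        # Distances from the new node to every remaining node.
        for c in nodes:
            if c not in (a, b):
                d[frozenset((new, c))] = 0.5 * (
                    d[frozenset((a, c))] + d[frozenset((b, c))] - d_ab
                )
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


NAMES = ["human", "mouse", "fugu", "Drosophila"]
PAIRS = {
    ("human", "mouse"): 2, ("human", "fugu"): 4, ("human", "Drosophila"): 5,
    ("mouse", "fugu"): 4, ("mouse", "Drosophila"): 5, ("fugu", "Drosophila"): 3,
}
DIST = {frozenset(pair): value for pair, value in PAIRS.items()}

if __name__ == "__main__":
    for edge in neighbour_joining(NAMES, DIST):
        print(edge)
```

Ties in the criterion are broken by input order in this sketch. The result is an unrooted tree, so a rooting method such as an outgroup is still needed to give it direction.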
23
How to assess confidence in tree
Bayesian method: time consuming
The Bayesian posterior probabilities (BPP) are assigned to internal branches in the consensus tree
Bayesian Markov chain Monte Carlo (MCMC) analytical software such as MrBayes (Huelsenbeck and Ronquist, 2001) and BAMBE (Simon and Larget, 1998) is now commonly used
Uses all the data
Distance method: bootstrap
Select multiple alignment columns with replacement
Recalculate the tree
Compare branches with the original (target) tree
Repeat N times, so calculate N different trees
How often is the branching (point between 3 nodes) preserved for each internal node?
Uses samples of the data
24
The Bootstrap
[Figure: an original alignment with columns 1 to 8 and the tree built from it (nodes 1 to 5), next to a scrambled alignment in which columns are drawn with replacement (some appear 2x or 3x) and its tree, in which one branching is marked non-supportive.]
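The resampling loop can be sketched in a few lines. To stay self-contained, the example below reuses the four toy sequences from the Parsimony & Distance slide and replaces the tree-building step with a simple stand-in: for four taxa there is one internal branch, and the sketch picks the split whose two within-pair distances have the smallest sum. That stand-in, the function names and the treatment of ties as non-supportive are assumptions of this sketch.

```python
import random

ALIGNMENT = {
    "Drosophila": "ttattaa",
    "fugu": "aatttaa",
    "mouse": "aaaaata",
    "human": "aaaaaat",
}


def hamming(a: str, b: str) -> int:
    """Number of alignment columns in which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def resample_columns(alignment, rng):
    """Draw as many columns as the alignment has, with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def best_quartet_split(alignment):
    """Internal branch of the 4-taxon tree, or None if the choice is tied."""
    a, b, c, d = sorted(alignment)
    candidates = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

    def cost(split):
        (p, q), (r, s) = split
        return hamming(alignment[p], alignment[q]) + hamming(alignment[r], alignment[s])

    ranked = sorted(candidates, key=cost)
    if cost(ranked[0]) == cost(ranked[1]):
        return None  # ambiguous replicate, counted as non-supportive
    return frozenset(frozenset(side) for side in ranked[0])


def bootstrap_support(alignment, n_replicates: int, seed: int = 0):
    """Fraction of resampled alignments that recover the original branching."""
    rng = random.Random(seed)
    target = best_quartet_split(alignment)
    hits = sum(
        best_quartet_split(resample_columns(alignment, rng)) == target
        for _ in range(n_replicates)
    )
    return target, hits / n_replicates


if __name__ == "__main__":
    split, support = bootstrap_support(ALIGNMENT, n_replicates=1000)
    print(sorted(sorted(side) for side in split), support)
```

With larger trees the same loop applies, except that every internal branch of the original tree gets its own support count.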