Published by Elisabeth Floyd. Modified over 8 years ago.
1 CAP5510 – Bioinformatics
Phylogeny
Tamer Kahveci
CISE Department, University of Florida
2 Goals
Understand phylogenetic trees
Learn
– distance-matrix-based methods
– the maximum likelihood method
– character-based methods
3 What is phylogeny?
4 Phylogeny
Shows the ancestral relationships between genes or organisms
Infers these relationships from genotype rather than phenotype
5 Why Phylogeny?
Understand the history of organisms
Understand how various functions evolved
Multiple sequence alignment
Gene function prediction
6 Phylogenetic Tree (1)
Node = taxonomic unit
– Leaf nodes = genes or organisms
– Internal nodes = inferred ancestors
Bifurcating = an internal node splits into two lineages
Multifurcating = an internal node splits into more than two lineages
Branch = ancestral relationship
7 Phylogenetic Tree (2)
Rooted = a single node (the root) is the common ancestor of all taxa
Unrooted = provides no information about the direction of evolution
(Figure: phylogeny of viruses of the family Reoviridae)
8 Phylogenetic Tree (3)
n = number of taxa (leaves)
Exercise: find the number of rooted trees for n = 3.
Rooted: $N_R = \dfrac{(2n-3)!}{2^{n-2}\,(n-2)!}$
Unrooted: $N_U = \dfrac{(2n-5)!}{2^{n-3}\,(n-3)!}$

n     N_R           N_U
2     1             1
3     3             1
5     105           15
10    34×10^6       2×10^6
15    213×10^12     7.9×10^12
20    8×10^21       0.2×10^21

The three rooted trees for n = 3, in Newick format:
((1, 2), 3)
((1, 3), 2)
((3, 2), 1)
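A small Python sketch that evaluates both formulas and, as a cross-check, enumerates the rooted trees by attaching each new leaf to every edge (including one above the current root). The function names are our own, and leaves are assumed to be non-tuple labels such as integers.

```python
from math import factorial

def n_rooted(n):
    """Rooted bifurcating trees on n labeled leaves: (2n-3)! / (2^(n-2) (n-2)!)."""
    if n <= 2:
        return 1
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

def n_unrooted(n):
    """Unrooted bifurcating trees on n labeled leaves: (2n-5)! / (2^(n-3) (n-3)!).
    This equals n_rooted(n - 1), which avoids (-1)! for n = 2."""
    return n_rooted(n - 1) if n > 2 else 1

def rooted_trees(leaves):
    """Yield every rooted bifurcating tree on `leaves` as a Newick string."""
    leaves = list(leaves)

    def attach(tree, k):
        yield (tree, k)                      # new leaf above this subtree
        if isinstance(tree, tuple):
            left, right = tree
            for t in attach(left, k):        # new leaf inside the left subtree
                yield (t, right)
            for t in attach(right, k):       # new leaf inside the right subtree
                yield (left, t)

    def newick(tree):
        if isinstance(tree, tuple):
            return "(" + ",".join(newick(t) for t in tree) + ")"
        return str(tree)

    trees = [(leaves[0], leaves[1])]
    for k in leaves[2:]:
        trees = [bigger for t in trees for bigger in attach(t, k)]
    for t in trees:
        yield newick(t) + ";"
```

For n = 3, `list(rooted_trees([1, 2, 3]))` gives the three topologies listed above, written as `((1,2),3);`, `((1,3),2);` and `(1,(2,3));`, and enumerating five leaves yields `n_rooted(5)` = 105 trees.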
9 Distance Matrix Methods
UPGMA (Unweighted Pair Group Method with Arithmetic Mean)
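10 UPGMA (1)
Create a distance matrix between all pairs of taxa
Repeat until all taxa are merged:
– merge the pair (x, y) with the smallest distance d(x, y) into a new cluster xy
– set d(z, xy) = (d(z, x) + d(z, y))/2 for every other cluster z
(Strictly, this simple average of the two merged clusters is WPGMA; UPGMA proper weights by cluster size, d(z, xy) = (|x| d(z, x) + |y| d(z, y))/(|x| + |y|). The two rules coincide when x and y are single taxa. The following example uses the simple average.)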
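11 UPGMA (2)
Choose the two clusters with minimum distance and combine them: d(B, C) = 4
(d(B, D) = 4 as well; ties may be broken either way, and this example merges B and C.)
     A    B    C    D    E
A    0   10   12    9    7
B         0    4    4   14
C              0    6   16
D                   0   13
E                        0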
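12 UPGMA (3)
Update the distance matrix: the distance from the new cluster BC to every other cluster is the average of its distances to B and to C, e.g., d(A, BC) = (10 + 12)/2 = 11.
The new node sits at height d(B, C)/2 = 2, so B and C each get a branch of length 2.
      A   BC    D    E
A     0   11    9    7
BC         0    5   15
D               0   13
E                    0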
13 UPGMA (4)
Choose the minimum entry: d(BC, D) = 5
      A   BC    D    E
A     0   11    9    7
BC         0    5   15
D               0   13
E                    0
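14 UPGMA (5)
Merge BC and D at height 5/2 = 2.5: D gets a branch of length 2.5 and the BC node a branch of length 2.5 - 2 = 0.5.
       A   BCD    E
A      0    10    7
BCD          0   14
E                 0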
15 UPGMA (6)
Choose the minimum entry: d(A, E) = 7
       A   BCD    E
A      0    10    7
BCD          0   14
E                 0
16 UPGMA (7)
Merge A and E at height 7/2 = 3.5 (each gets a branch of length 3.5).
      AE   BCD
AE     0    12
BCD          0
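17 UPGMA (8)
Produced tree: (((B, C), D), (A, E))
The root sits at height 12/2 = 6, so the BCD node gets a branch of 6 - 2.5 = 3.5 and the AE node a branch of 6 - 3.5 = 2.5. With all branch lengths: (((B:2, C:2):0.5, D:2.5):3.5, (A:3.5, E:3.5):2.5)
Original distance matrix:
     A    B    C    D    E
A    0   10   12    9    7
B         0    4    4   14
C              0    6   16
D                   0   13
E                        0
The tree is not additive: path lengths may not indicate the actual distances. E.g., the path from C to D has length 2 + 0.5 + 2.5 = 5, whereas d(C, D) = 6.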
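The UPGMA steps above can be sketched in Python as follows. This is a minimal sketch under our own assumptions: the name `upgma`, the distance dict keyed by `frozenset({x, y})`, and the `size_weighted` flag are hypothetical. With the default it applies the simple average from slide 10 (which reproduces the numbers in this example); `size_weighted=True` gives size-weighted UPGMA. Cluster labels are made by concatenating taxon names, so single-character names are assumed, and ties are broken by `itertools.combinations` order, which happens to merge B and C first as the slides do.

```python
from itertools import combinations

def upgma(names, dist, size_weighted=False):
    """Agglomerative clustering on a distance dict keyed by frozenset({x, y}).
    Returns (Newick string with branch lengths, list of merges (x, y, distance))."""
    clusters = {n: (n, 0.0, 1) for n in names}   # label -> (newick, height, size)
    d = dict(dist)
    merges = []
    while len(clusters) > 1:
        # closest pair of current clusters
        x, y = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        dxy = d[frozenset((x, y))]
        (nx, hx, sx), (ny, hy, sy) = clusters.pop(x), clusters.pop(y)
        h = dxy / 2                              # new node at half the distance
        xy = x + y
        for z in clusters:                       # distances to the new cluster
            dzx, dzy = d[frozenset((z, x))], d[frozenset((z, y))]
            if size_weighted:
                d[frozenset((z, xy))] = (sx * dzx + sy * dzy) / (sx + sy)
            else:
                d[frozenset((z, xy))] = (dzx + dzy) / 2
        # branch length = new height minus the child's height
        clusters[xy] = (f"({nx}:{h - hx:g},{ny}:{h - hy:g})", h, sx + sy)
        merges.append((x, y, dxy))
    newick, _, _ = next(iter(clusters.values()))
    return newick + ";", merges
```

On the slide 11 matrix the merges are B+C at 4, D+BC at 5, A+E at 7 and DBC+AE at 12, and the result is `((D:2.5,(B:2,C:2):0.5):3.5,(A:3.5,E:3.5):2.5);`, the same tree and branch lengths as slide 17 with the children listed in a different order.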
18 Other distance-based methods
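19 Neighbor Relation Method (1)
Consider all possible arrangements (pairings) of four taxa
Choose the one that satisfies the distance relation
(Figure: unrooted tree on A, B, C, D with branches a, b, c, d, e)
AC + BD = AD + BC
AB + CD < AC + BD
(Here A and B are neighbors, as are C and D.)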
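Why these relations hold, as a short derivation: assume the distances are exactly the path lengths in the tree (additive distances), and read the figure as pendant branches a, b, c, d leading to A, B, C, D plus an internal branch e separating {A, B} from {C, D}. This labeling of the figure is our assumption.

```latex
\begin{aligned}
AB + CD &= (a + b) + (c + d) \\
AC + BD &= (a + e + c) + (b + e + d) = a + b + c + d + 2e \\
AD + BC &= (a + e + d) + (b + e + c) = a + b + c + d + 2e
\end{aligned}
```

So AC + BD = AD + BC = (AB + CD) + 2e, which exceeds AB + CD whenever e > 0; the pairing with the smallest sum identifies the neighbors.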
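20 Neighbor Relation Method (2) (Sattath, Tversky, 1977)
For a set of taxa {A, B, C, D, E, F, …}, examine every quartet:
– {A, B, C, D}: compare 1. AB + CD, 2. AC + BD, 3. AD + BC and take the minimum
– {A, B, C, E}: compare 1. AB + CE, 2. AE + BC, 3. AC + BE, and so on
Tally the votes for neighbor pairs in a taxon-by-taxon table (A, B, C, D, E, F, G, H, …)
Run UPGMA on the votes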
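A hedged Python sketch of the quartet voting: for every quartet it picks the pairing with the smallest sum of within-pair distances and gives one vote to each of the two pairs. The one-vote-per-pair rule is our reading of the slide, and the function names and frozenset-keyed distance dict are hypothetical; the final UPGMA step on the vote table is not shown.

```python
from collections import Counter
from itertools import combinations

def quartet_split(q, d):
    """Pairing of quartet q = (w, x, y, z) whose within-pair distances sum to the
    minimum, e.g. (("A", "B"), ("C", "D")) when AB + CD is smallest."""
    w, x, y, z = q
    pairings = [((w, x), (y, z)), ((w, y), (x, z)), ((w, z), (x, y))]
    return min(pairings, key=lambda p: d[frozenset(p[0])] + d[frozenset(p[1])])

def neighbor_votes(names, d):
    """Over all quartets, count how often each pair is chosen as neighbors."""
    votes = Counter()
    for q in combinations(names, 4):
        for pair in quartet_split(q, d):
            votes[frozenset(pair)] += 1
    return votes
```

With additive distances from a tree in which A, B and C, D are neighbors (for instance AB = 3, CD = 7, AC = 9, BD = 11, AD = BC = 10), the quartet {A, B, C, D} votes for the pairs AB and CD.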
21 Neighbor Joining Method
Start with a star tree
Repeatedly merge the pair of nodes whose joining minimizes the total sum of branch lengths
(Figure: star tree on A, B, C, D, E and the tree after one pair is joined)
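A Python sketch of neighbor joining as it is commonly implemented: at each step it joins the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of i's distances to all other nodes; minimizing Q is equivalent to the slide's criterion of minimizing the total branch length. The function name, the frozenset-keyed distance dict, and the `#k` labels for new internal nodes are our own assumptions. NJ produces an unrooted tree, so the returned Newick is rooted arbitrarily at the midpoint of the last edge.

```python
from itertools import combinations, count

def neighbor_joining(names, dist):
    """names: taxon labels; dist: dict frozenset({x, y}) -> distance. Returns Newick."""
    d = dict(dist)

    def D(x, y):
        return d[frozenset((x, y))]

    subtree = {n: n for n in names}      # node id -> Newick text of its subtree
    fresh = count()                      # ids for new internal nodes: "#0", "#1", ...
    while len(subtree) > 2:
        n = len(subtree)
        r = {i: sum(D(i, k) for k in subtree if k != i) for i in subtree}
        # Q criterion: its minimizing pair is the join with the smallest total length
        f, g = min(combinations(subtree, 2),
                   key=lambda p: (n - 2) * D(*p) - r[p[0]] - r[p[1]])
        lf = D(f, g) / 2 + (r[f] - r[g]) / (2 * (n - 2))   # branch from f to new node
        lg = D(f, g) - lf                                    # branch from g to new node
        u = f"#{next(fresh)}"
        for k in subtree:
            if k not in (f, g):
                d[frozenset((u, k))] = (D(f, k) + D(g, k) - D(f, g)) / 2
        subtree[u] = f"({subtree.pop(f)}:{lf:g},{subtree.pop(g)}:{lg:g})"
    (x, tx), (y, ty) = subtree.items()
    half = D(x, y) / 2
    return f"({tx}:{half:g},{ty}:{half:g});"
```

On additive distances made up from a tree with pendant branches A:1, B:2, C:3, D:4 and an internal branch of 5 between {A, B} and {C, D}, the sketch recovers every branch length: `((A:1,B:2):2.5,(C:3,D:4):2.5);`, where the two 2.5 halves together form the internal branch.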
22 Maximum Likelihood Method
23 Maximum Likelihood Method
Generate all possible trees
Compute the likelihood of each tree
– using substitution probabilities (e.g., the Jukes-Cantor model)
Choose the tree with the highest likelihood
Exhaustive search: very slow
Requires computation over the inferred ancestors
(Figure: example tree over the sequences ACGCTAFKI, GCGCTAFKI, ACGCTAFKL, GCGCTGFKI, GCGCTLFKI, ASGCTAFKL, ACACTAFKL)
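To show how the likelihood of one tree can be computed, here is a Python sketch of Felsenstein's pruning algorithm under the Jukes-Cantor model for nucleotides (the slide's example uses protein sequences, which would need a 20-state model). It sums over all possible states of the inferred ancestors instead of fixing them. The tree representation (a leaf is a one-letter string, an internal node is a list of `(child, branch_length)` pairs), the uniform root frequencies, and the function names are our assumptions.

```python
import math

BASES = "ACGT"

def jc_prob(a, b, t):
    """Jukes-Cantor probability that base a is observed as base b after a branch
    of length t (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def partial_likelihood(node):
    """Pruning step: {s: P(leaf data below node | node is in state s)}."""
    if isinstance(node, str):                   # observed leaf base
        return {s: 1.0 if s == node else 0.0 for s in BASES}
    result = {s: 1.0 for s in BASES}
    for child, t in node:
        below = partial_likelihood(child)
        for s in BASES:
            # sum over every possible state x of the (unobserved) child
            result[s] *= sum(jc_prob(s, x, t) * below[x] for x in BASES)
    return result

def site_likelihood(root):
    """Likelihood of one alignment column, with base frequencies 1/4 at the root."""
    return sum(0.25 * p for p in partial_likelihood(root).values())
```

For example, `site_likelihood([([("A", 0.1), ("G", 0.2)], 0.05), ("A", 0.3)])` evaluates one column on the tree ((A:0.1, G:0.2):0.05, A:0.3). Assuming independent sites, a tree's log-likelihood is the sum of the log site likelihoods over columns; a full ML search would also optimize branch lengths and compare topologies, which this sketch leaves out.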
24 Character-Based Methods
25 Parsimony (1)
Various trees could explain the phylogeny of the sequences AAG, AAA, GGA, AGA.
(Figure: two candidate trees for these sequences)
Parsimony prefers the second tree because it requires fewer substitution events.
26 Parsimony (2)
Multiply align the sequences
For each column of the alignment
– generate all possible trees
– compute the number of substitutions each tree requires
– vote for the tree with the smallest number of substitutions
Pick the tree with the most votes
Example alignment:
1: G G G G G G
2: G G G A G T
3: G G A T A G
4: G A T C A T
(Figure: candidate trees for one column, with leaves 1: G, 2: G, 3: A, 4: A)
27 How can we infer the ancestors?
(Figure: tree whose ancestors are marked "?")
28 Inferring Ancestor (1/3)
(Figure: tree with leaves A, T, G, G, A)
For an internal node Z whose children have candidate state sets X and Y:
If X ∩ Y ≠ ∅, then Z = X ∩ Y
Else Z = X ∪ Y
29 Inferring Ancestor (2/3)
(Figure: the rule applied bottom-up to three trees with leaves A, A, T, G, G; internal nodes receive sets such as {G, A}, {G, T}, {G} and {G, A, T})
If X ∩ Y ≠ ∅, then Z = X ∩ Y
Else Z = X ∪ Y
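30 Inferring Ancestor (3/3)
(Figure: the same three trees)
Minimum number of substitutions for a column = number of distinct characters in the column - 1
(This is the best any tree can achieve; on a given tree, the count is the number of internal nodes where X ∩ Y = ∅.)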
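The ancestor-inference rule is Fitch's small-parsimony algorithm; a Python sketch follows, together with the per-column vote from Parsimony (2). Assumptions of ours: rooted binary trees are nested pairs, a tree "shape" has sequence indices at its leaves, tied columns give a vote to every tied shape, and all names are hypothetical.

```python
from collections import Counter

def fitch(tree):
    """Fitch's algorithm on one column. A leaf is a one-character string; an internal
    node is a pair (left, right). Returns (state set at the root, substitutions)."""
    if isinstance(tree, str):
        return {tree}, 0
    x, cx = fitch(tree[0])
    y, cy = fitch(tree[1])
    if x & y:                        # X ∩ Y non-empty: keep the shared states
        return x & y, cx + cy
    return x | y, cx + cy + 1        # disjoint: take the union, one more substitution

def label(shape, column):
    """Replace the sequence indices at the leaves of `shape` by a column's characters."""
    if isinstance(shape, int):
        return column[shape]
    return tuple(label(s, column) for s in shape)

def column_votes(shapes, seqs):
    """Each column votes for the shape(s) needing the fewest substitutions."""
    votes = Counter()
    for column in zip(*seqs):
        costs = {s: fitch(label(s, column))[1] for s in shapes}
        best = min(costs.values())
        votes.update(s for s, c in costs.items() if c == best)
    return votes
```

For the leaves A, T, G, G, A arranged as ((A, T), ((G, G), A)), `fitch` infers {A} at the root with 2 substitutions, matching the minimum of 3 distinct characters - 1. On the Parsimony (2) alignment, the shape ((1, 2), (3, 4)) needs 0, 1, 2, 3, 1, 2 substitutions per column (indices shifted to 0-based in code); uninformative columns vote for every shape.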
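31 Branch and Bound Method
1. Find an upper bound L on the tree length (e.g., the length of the UPGMA tree)
2. Start with a small tree
3. Incrementally add more branches (taxa) to the tree
– exclude any partial tree with length > L: adding taxa can never shorten a tree, so none of its extensions can beat L
– whenever a complete tree shorter than L is found, lower L to its length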
32 Branch and Bound Example
(Figure: the tree on A, B, C, and the three trees obtained by attaching D to each of its branches)
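A Python sketch of branch and bound for maximum parsimony, under our own assumptions: tree length is the Fitch parsimony score summed over columns, an unrooted tree is stored as a nested tuple (0, S) with taxon 0 hanging off the root and S holding the rest, and taxa are added in index order by attaching each one to every branch (as in the figure). Function names are hypothetical; `upper_bound` plays the role of L, and when left at infinity the first complete tree sets it.

```python
import math

def fitch_cost(tree):
    """Fitch parsimony on one column; leaves are characters, internal nodes are pairs."""
    if isinstance(tree, str):
        return {tree}, 0
    (x, cx), (y, cy) = fitch_cost(tree[0]), fitch_cost(tree[1])
    return (x & y, cx + cy) if x & y else (x | y, cx + cy + 1)

def label(shape, column):
    """Replace the sequence indices at the leaves of `shape` by a column's characters."""
    if isinstance(shape, int):
        return column[shape]
    return tuple(label(s, column) for s in shape)

def insertions(subtree, k):
    """Attach taxon k to every branch of `subtree`, including the branch above it."""
    yield (subtree, k)
    if isinstance(subtree, tuple):
        left, right = subtree
        for new_left in insertions(left, k):
            yield (new_left, right)
        for new_right in insertions(right, k):
            yield (left, new_right)

def branch_and_bound(seqs, upper_bound=math.inf):
    """Exact maximum parsimony for small inputs (at least 3 sequences).
    Returns (best length, list of all optimal trees as (0, S))."""
    columns = list(zip(*seqs))
    best_len, best_trees = upper_bound, []

    def length(subtree):
        return sum(fitch_cost(label((0, subtree), col))[1] for col in columns)

    def search(subtree, k):
        nonlocal best_len, best_trees
        cost = length(subtree)
        if cost > best_len:          # adding taxa never shortens a tree: prune
            return
        if k == len(seqs):           # complete tree
            if cost < best_len:      # strictly better: lower the bound
                best_len, best_trees = cost, []
            best_trees.append((0, subtree))
            return
        for bigger in insertions(subtree, k):
            search(bigger, k + 1)

    search((1, 2), 3)
    return best_len, best_trees
```

On the four sequences from Parsimony (2), the search finds two equally parsimonious trees of length 9, (0, ((1, 3), 2)) and (0, (1, (2, 3))), which is the situation the consensus trees below are meant to summarize.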
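33 Consensus Trees
There may be many equally parsimonious trees
A consensus tree summarizes them by collapsing nodes
– the resulting tree may not be bifurcating
Strict consensus: keep only the groups (clades) present in all trees
T% majority-rule consensus: keep the groups present in more than T% of the trees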
34 Consensus
1: (A, ((B, (C, D)), (E, (F, G))))
2: ((A, (C, (B, D))), (E, (F, G)))
3: ((A, (D, (B, C))), (E, (F, G)))
Strict: (A, (B, C, D), (E, (F, G)))
50%: ((A, (B, C, D)), (E, (F, G)))
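A Python sketch that computes both consensus trees from clade (leaf-set) counts. Assumptions of ours: trees are nested tuples of leaf-name strings sharing the same leaves, `majority` is a fraction (0.5 for 50%) that a clade's frequency must exceed, and the function names are hypothetical. It relies on the fact that strict and majority-rule clades are always mutually compatible, so they nest into a single tree.

```python
from collections import Counter

def leaf_sets(tree, out):
    """Append the leaf set of every internal node to `out`; return the tree's leaves."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves

def consensus_clades(trees, majority=None):
    """Strict consensus (majority=None) or clades in MORE than `majority` of the trees.
    Returns (all leaves, kept clades)."""
    counts = Counter()
    for tree in trees:
        found = []
        root = leaf_sets(tree, found)
        counts.update(set(found) - {root})      # the all-leaves group is trivial
    n = len(trees)
    if majority is None:
        kept = {c for c, k in counts.items() if k == n}
    else:
        kept = {c for c, k in counts.items() if k / n > majority}
    return root, kept

def build(leaves, clades):
    """Newick text of the (possibly multifurcating) tree defined by compatible clades."""
    inner = [c for c in clades if c < leaves]
    tops = [c for c in inner if not any(c < other for other in inner)]
    loose = [frozenset([x]) for x in leaves - frozenset().union(*tops)]
    children = sorted(tops + loose, key=min)    # order children by smallest label
    parts = [next(iter(c)) if len(c) == 1 else build(c, clades) for c in children]
    return "(" + ", ".join(parts) + ")"
```

On the three trees above, `build(*consensus_clades(trees))` returns `(A, (B, C, D), (E, (F, G)))` and `build(*consensus_clades(trees, majority=0.5))` returns `((A, (B, C, D)), (E, (F, G)))`: the clade {A, B, C, D} appears in 2 of 3 trees, enough for 50% majority rule but not for strict consensus.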
35 Tree Confidence
Is the resulting tree reliable?
Usually a confidence value is computed for each part (clade) of the tree
– bootstrapping
36 Bootstrapping
Given a phylogenetic tree T:
1. Multiply align the sequences from which T was built
2. Randomly select columns from the alignment (with replacement) to create a new dataset of the same size
3. Find the phylogenetic tree T' for the resampled dataset
4. Repeat steps 2-3 many times
5. For each part (clade) of T, compute the fraction of the trees T' that also contain it
Original alignment (column indices above):
   0 1 2 3 4 5 6 7 8 9
1: G G G A G G A T C A
2: G G G A G T A T C A
3: G G A T A G A C A T
4: G A T C A T G T A T
5: G T T C A T A T C T
Resampled alignment (indices of the selected columns above):
   0 0 2 4 4 4 5 8 8 8
1: G G G G G G G C C C
2: G G G G G T T C C C
3: G G A A A G G A A A
4: G G T A A T T A A A
5: G G T A A T T A C C
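A Python sketch of steps 2-5. Here `infer_clades` is a hypothetical placeholder for whichever tree-building method is used (UPGMA, neighbor joining, parsimony, ...), assumed to return the clades of its tree as frozensets of sequence indices; the other names are our own as well. Column indices are sorted only to make a replicate easy to read, which does not affect these methods.

```python
import random

def resample_columns(seqs, rng):
    """One bootstrap replicate: draw as many columns as the alignment has, with replacement."""
    m = len(seqs[0])
    cols = sorted(rng.choices(range(m), k=m))
    return cols, ["".join(s[c] for c in cols) for s in seqs]

def bootstrap_support(seqs, tree_clades, infer_clades, replicates=100, seed=0):
    """Fraction of replicates whose inferred tree contains each clade of the original tree."""
    rng = random.Random(seed)
    hits = {c: 0 for c in tree_clades}
    for _ in range(replicates):
        _, replicate = resample_columns(seqs, rng)
        found = infer_clades(replicate)
        for c in tree_clades:
            hits[c] += c in found
    return {c: h / replicates for c, h in hits.items()}
```

A call such as `resample_columns(["GGGAGGATCA", ...], random.Random(1))` returns the chosen column indices and the replicate alignment, in the same layout as the example above; support values near 1 mean a clade of T is recovered from almost every replicate.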
37 Reading Assignment
Krane, Chapters 4 and 5
Mount, Chapter 7