Phylogenetic trees as a visualization tool for evolutionary classification.

1 Phylogenetic trees as a visualization tool for evolutionary classification

2 Trees. [Figure: several drawings of the same tree on Human, Chimp and Gorilla, with the leaves arranged differently and joined by "=" signs.]

3 Same thing… [Figure: two drawings of the same tree on s1 to s5, joined by "=".]

4 Bifurcating / Multifurcating. A multifurcation is a polytomy; a bifurcation is a dichotomy. There are two types of polytomies: soft (lack of information to resolve the tree) and hard (multiple divergences within a short evolutionary time). [Figure: a multifurcating tree and a bifurcating tree on s1 to s5.]

5 A “comb”. [Figure: a comb-shaped tree on s1 to s5.]

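6 Terminology. A branch = an edge. An external node = a leaf. The root. Internal nodes. [Figure: a tree on Human, Chimp, Gorilla and Chicken with these parts labeled.]

7 Ingroup / Outgroup. [Figure: a tree on Human, Chimp, Gorilla and Chicken with the labels INGROUP and OUTGROUP.]

8 Subtrees. [Figure: a tree on Human, Chimp, Gorilla, Chicken and Duck with one part marked "A subtree".]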

9 Monophyletic groups. Gorilla + Human + Chimp form a monophyletic group. A clade is a monophyletic group. [Figure: a tree on Human, Chimp, Gorilla and Chicken.]

10 Paraphyletic = non-monophyletic groups. Zebrafish + Whale form a paraphyletic group. [Figure: a tree on Whale, Chimp, Zebrafish and Drosophila.]
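The two definitions above can be checked mechanically: a group is monophyletic exactly when its members are the complete leaf set of some node. The sketch below is only an illustration. The nested-tuple trees are assumed topologies (the slide figures did not survive extraction), and the helper names `leaves`, `clades` and `is_monophyletic` are hypothetical.

```python
def leaves(tree):
    """Leaf set of a tree given as nested tuples with string leaves."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def clades(tree):
    """Leaf sets of every subtree; each node defines exactly one clade."""
    found = {leaves(tree)}
    if not isinstance(tree, str):
        for child in tree:
            found |= clades(child)
    return found

def is_monophyletic(tree, group):
    """True if `group` is exactly the leaf set of some node in `tree`."""
    return frozenset(group) in clades(tree)

# ASSUMED topologies, chosen to match the species named on slides 9 and 10.
primates = ((("Human", "Chimp"), "Gorilla"), "Chicken")
animals = ((("Whale", "Chimp"), "Zebrafish"), "Drosophila")

print(is_monophyletic(primates, {"Gorilla", "Human", "Chimp"}))  # True
print(is_monophyletic(animals, {"Zebrafish", "Whale"}))          # False
```

Because children are iterated generically, the same functions also accept polytomies written as tuples with more than two entries.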

12 The maximum parsimony principle. 3. Tree building

13 Genes: 0 = absence, 1 = presence.

species  g1  g2  g3  g4  g5  g6
s1        1   0   0   1   1   0
s2        0   0   1   0   0   0
s3        1   1   0   0   0   0
s4        1   1   0   1   1   1
s5        0   0   1   1   1   0

14 Evaluate this tree… [Figure: a tree with its leaves in the order s1, s4, s3, s2, s5.]

15 Gene number 1. Leaf states in the order s1, s4, s3, s2, s5: 1 1 1 0 0. Internal-node states as extracted from the figure: 1 0 1.

16 Gene number 1, option number 1. Leaf states: 1 1 1 0 0. Internal-node states as extracted: 1 0 1 1.

17 Gene number 1, option number 2. Leaf states: 1 1 1 0 0. Internal-node states as extracted: 1 0 0 1. Number of changes for gene 1 (character 1) = 1.

18 Gene number 2, option number 1. Leaf states: 0 1 1 0 0. Internal-node states as extracted: 1 0 0 1.

19 Gene number 2, option number 2. Leaf states: 0 1 1 0 0. Internal-node states as extracted: 1 0 1 1.

20 Gene number 2, option number 3. Leaf states: 0 1 1 0 0. Internal-node states as extracted: 0 0 0 0. Number of changes for gene 2 (character 2) = 2.

21 Gene number 3, option number 1. Leaf states: 0 0 0 1 1. Internal-node states as extracted: 0 1 0 0.

22 Gene number 3, option number 2. Leaf states: 0 0 0 1 1. Internal-node states as extracted: 0 1 1 0. Number of changes for gene 3 (character 3) = 1.

23 Gene number 4, option number 1. Leaf states: 1 1 0 0 1. Internal-node states as extracted: 1 1 1 1.

24 Gene number 4, option number 2. Leaf states: 1 1 0 0 1. Internal-node states as extracted: 0 0 0 1. Number of changes for gene 4 (character 4) = 2.

25 Gene number 5 is the same as gene number 4. Number of changes for gene 5 (character 5) = 2.

26 Gene number 6, one option only. Leaf states: 0 1 0 0 0. Internal-node states as extracted: 0 0 0 0. Number of changes for gene 6 (character 6) = 1.

27 Sum of changes.
Number of changes for gene 1 (character 1) = 1
Number of changes for gene 2 (character 2) = 2
Number of changes for gene 3 (character 3) = 1
Number of changes for gene 4 (character 4) = 2
Number of changes for gene 5 (character 5) = 2
Number of changes for gene 6 (character 6) = 1
Sum of changes for this tree topology = 1 + 2 + 1 + 2 + 2 + 1 = 9
Can we do better?

28 The MP (most parsimonious) tree: sum of changes for this tree topology = 8. [Figure: a tree with its leaves in the order s1, s4, s3, s2, s5.]
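The option-by-option counting on slides 15 to 27 amounts to trying every 0/1 labeling of the internal nodes and keeping the one with the fewest changes along the edges. A minimal brute-force sketch of that idea follows. The gene matrix is the one from slide 13. Both topologies are assumptions, since the figures are lost: `EVALUATED`, (((s1,s4),s3),(s2,s5)), was chosen because it reproduces the per-gene counts on the slides, and `ALTERNATIVE`, ((s1,(s4,s3)),(s2,s5)), is merely one topology that reaches 8; it may or may not be the tree drawn on slide 28. All function and node names are hypothetical.

```python
from itertools import product

# Presence/absence matrix from slide 13 (columns g1..g6).
GENES = {
    "s1": "100110",
    "s2": "001000",
    "s3": "110000",
    "s4": "110111",
    "s5": "001110",
}

INTERNAL = ["root", "x", "y", "z"]  # hypothetical names for the internal nodes

# ASSUMED topology (((s1,s4),s3),(s2,s5)), written as (parent, child) edges.
EVALUATED = [("root", "y"), ("root", "z"), ("y", "x"), ("y", "s3"),
             ("x", "s1"), ("x", "s4"), ("z", "s2"), ("z", "s5")]

# ASSUMED alternative ((s1,(s4,s3)),(s2,s5)).
ALTERNATIVE = [("root", "y"), ("root", "z"), ("y", "s1"), ("y", "x"),
               ("x", "s4"), ("x", "s3"), ("z", "s2"), ("z", "s5")]

def min_changes(edges, leaf_state):
    """Fewest state changes over all 0/1 labelings of the internal nodes."""
    best = None
    for combo in product("01", repeat=len(INTERNAL)):
        state = dict(leaf_state, **dict(zip(INTERNAL, combo)))
        changes = sum(state[a] != state[b] for a, b in edges)
        if best is None or changes < best:
            best = changes
    return best

def per_gene_changes(edges):
    """Minimum number of changes for each gene (column) on the given tree."""
    n_genes = len(next(iter(GENES.values())))
    return [
        min_changes(edges, {sp: row[g] for sp, row in GENES.items()})
        for g in range(n_genes)
    ]

print(per_gene_changes(EVALUATED), sum(per_gene_changes(EVALUATED)))      # [1, 2, 1, 2, 2, 1] 9
print(per_gene_changes(ALTERNATIVE), sum(per_gene_changes(ALTERNATIVE)))  # [1, 1, 1, 2, 2, 1] 8
```

Enumerating all labelings costs 2^(number of internal nodes) per gene, which is fine for five species but is exactly the cost that the Fitch algorithm later in the deck avoids.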

30 How to efficiently compute the MP score of a tree

31 The Fitch algorithm (1971). Scan the tree in postorder. At each internal node, if the intersection of its children's state sets is empty, assign the union of the two sets; otherwise, assign their intersection. [Figure: a tree on Human, Chimp, Gorilla, Chicken and Duck with leaf states A, G, C, C, A (as extracted) and internal-node sets {A,G}, {A,C,G}, {A,C}.]

32 Number of changes. Total number of changes = number of union operations. [Figure: the same tree and sets as on slide 31.]
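The postorder rule and the union count can be sketched in a few lines. This is an illustration under stated assumptions, not the slide author's code: the topology and the leaf-to-state assignment below are guesses chosen so that the internal sets come out as {A,G}, {A,C,G} and {A,C}, and the function name `fitch` is hypothetical.

```python
def fitch(tree, states, sets=None):
    """Postorder Fitch pass over a binary tree given as nested tuples.

    Returns (state set at this node, number of union operations in its subtree).
    If `sets` is a list, each internal node's set is appended in postorder.
    """
    if isinstance(tree, str):                  # leaf: its observed state(s)
        return set(states[tree]), 0
    left, right = tree
    left_set, left_changes = fitch(left, states, sets)
    right_set, right_changes = fitch(right, states, sets)
    common = left_set & right_set
    if common:                                 # non-empty intersection: no change
        node_set, changes = common, left_changes + right_changes
    else:                                      # empty intersection: union, one change
        node_set = left_set | right_set
        changes = left_changes + right_changes + 1
    if sets is not None:
        sets.append(node_set)
    return node_set, changes

# ASSUMED topology and leaf assignment (the figure is lost).
tree = ((("Human", "Chimp"), "Gorilla"), ("Chicken", "Duck"))
states = {"Human": "A", "Chimp": "G", "Gorilla": "C", "Chicken": "C", "Duck": "A"}

node_sets = []
root_set, changes = fitch(tree, states, node_sets)
print(node_sets)   # sets for (Human,Chimp), (..,Gorilla), (Chicken,Duck), root
print(changes)     # 3
```

Leaf states are given as strings so that an ambiguous leaf can list several letters, for example `"AG"` for the purine code R.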

33 Patterns. The site pattern CACAG requires the same number of changes as CACAT or, in general, as any position with the pattern XYXYZ. [Figure: the same tree and sets as on slide 31.]
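One way to spot identical patterns is to relabel each alignment column by order of first appearance, so that columns with the same pattern map to the same string and need to be scored only once. The helper name `site_pattern` is hypothetical, and the sketch assumes at most four distinct states per column.

```python
def site_pattern(column):
    """Relabel a column by order of first appearance, e.g. 'CACAG' -> 'XYXYZ'."""
    labels = "XYZW"          # assumes at most four distinct states in a column
    seen = {}
    out = []
    for state in column:
        if state not in seen:
            seen[state] = labels[len(seen)]
        out.append(seen[state])
    return "".join(out)

print(site_pattern("CACAG"))  # XYXYZ
print(site_pattern("CACAT"))  # XYXYZ
```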

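34 Exercise. Find the minimum number of changes, and point out all identical patterns. Leaf sequences as extracted (the species each one belongs to cannot be recovered from the transcript): GACAGGGA CAAG GCGA GAAA. [Figure: a tree on Human, Chimp, Gorilla, Chicken and Duck.]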

35 Ambiguous characters. R = {A,G} = purine. [Figure: a tree on Human, Chimp, Gorilla, Chicken and Duck with leaf states A, G, C, C, R (as extracted) and internal-node sets {A,G}, {A,C,G}, {A,C,G}, {A,C,G}.]

36 Subtrees. Each node has an ID. [Figure: a tree on Human, Chimp, Gorilla, Chicken and Duck with node IDs 7, 8, 2, 5, 3 and 6, 4, 1, 0 (as extracted); the subtree of node 4 is marked.]

37 The Sankoff algorithm. Generalization: assume a cost C_ij for changing from character i to character j. If C_ij = 1 for every i ≠ j (and C_ii = 0), the cost just counts the number of changes. We now search for the tree with the minimum cost. Definition: S_i(k) = minimum cost of the subtree of node i, given that node i is assigned character k.

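38 S_i(k) is easy to compute for the leaves. For example, S_2(A) = 0 (an A there costs nothing) and S_2(C) = S_2(G) = S_2(T) = ∞ (those characters just can't be there). [Figure: a tree with leaf states A, G, A, A, C and node IDs 7, 8, 2, 5, 3 and 6, 4, 1, 0 (as extracted).]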

39 [Figure: the tree from slide 38 with leaf states A, G, A, A, C and the leaf cost vectors [0, ∞, ∞, ∞], [∞, 0, ∞, ∞], [0, ∞, ∞, ∞], [∞, ∞, 0, ∞] (as extracted).]

40 Node 0 has children 1 and 2, with cost vectors [S_1(A), S_1(C), S_1(G), S_1(T)] and [S_2(A), S_2(C), S_2(G), S_2(T)].

Costs:
     A  C  G  T
A    0  3  1  2
C    3  0  2  1
G    1  2  0  3
T    2  1  3  0

S_0(A) = min_X (C_AX + S_1(X)) + min_Y (C_AY + S_2(Y))
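Written for a general internal node, the recurrence for S_0(A) above takes the following form, together with the leaf case from slide 38 and the final tree cost. The child labels l and r are notation introduced here for illustration; they do not appear on the slides.

```latex
S_i(k) \;=\; \min_{x}\bigl[\,C_{kx} + S_{l}(x)\,\bigr] \;+\; \min_{y}\bigl[\,C_{ky} + S_{r}(y)\,\bigr],
\qquad
S_{\mathrm{leaf}}(k) \;=\;
\begin{cases}
0 & \text{if } k \text{ is the observed character},\\
\infty & \text{otherwise},
\end{cases}
\qquad
\text{cost of the tree} \;=\; \min_{k} S_{\mathrm{root}}(k).
```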

41 With child vectors S_1 = [13, 17, 22, 14] and S_2 = [15, 14, 21, 17] and the costs from slide 40: S_0(A) = min {13, 17 + 3, 22 + 1, 14 + 2} + min {15, 14 + 3, 21 + 1, 17 + 2} = 13 + 15 = 28.

42 Vector at node 0 entering this step: [28, x, y, z]. S_0(C) = min {13 + 3, 17, 22 + 2, 14 + 1} + min {15 + 3, 14, 21 + 2, 17 + 1} = 15 + 14 = 29.

43 Vector at node 0 entering this step: [28, 29, y, z]. S_0(G) = min {13 + 1, 17 + 2, 22, 14 + 3} + min {15 + 1, 14 + 2, 21, 17 + 3} = 14 + 16 = 30.

44 Vector at node 0 entering this step: [28, 29, 30, z]. S_0(T) = min {13 + 2, 17 + 1, 22 + 3, 14} + min {15 + 2, 14 + 1, 21 + 3, 17} = 14 + 15 = 29.

45 Vector at node 0: [28, 29, 30, 29]. The cost of the tree is the minimum of this vector, which is 28.
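The step worked through on slides 41 to 45 can be reproduced with a short sketch. The cost matrix and the two child vectors are taken from the slides; the names `COST`, `leaf_vector` and `combine` are hypothetical, and only this single step is shown because the full tree topology is not recoverable from the transcript.

```python
INF = float("inf")
ALPHABET = "ACGT"

# Cost matrix from slide 40: COST[i][j] is the cost of changing i to j.
COST = {
    "A": {"A": 0, "C": 3, "G": 1, "T": 2},
    "C": {"A": 3, "C": 0, "G": 2, "T": 1},
    "G": {"A": 1, "C": 2, "G": 0, "T": 3},
    "T": {"A": 2, "C": 1, "G": 3, "T": 0},
}

def leaf_vector(observed):
    """Cost vector of a leaf: 0 for the observed character, infinity otherwise."""
    return [0 if k == observed else INF for k in ALPHABET]

def combine(child_vectors):
    """S_i(k) = sum over children c of min over x of (C_kx + S_c(x))."""
    return [
        sum(
            min(COST[k][x] + vec[j] for j, x in enumerate(ALPHABET))
            for vec in child_vectors
        )
        for k in ALPHABET
    ]

s1 = [13, 17, 22, 14]   # vector of node 1, as given on slide 41
s2 = [15, 14, 21, 17]   # vector of node 2, as given on slide 41
s0 = combine([s1, s2])

print(leaf_vector("A"))  # [0, inf, inf, inf]
print(s0, min(s0))       # [28, 29, 30, 29] 28
```

Applying `combine` from the leaves upward in postorder fills in every node's vector, which is the dynamic-programming structure the next slide describes.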

46 Dynamic programming. This is an example of dynamic programming: you first solve some small subproblems and then, recursively, use their solutions to build a solution to a larger problem.

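47 Exercise. Compute the minimal cost for this tree. [Figure: a tree with leaf states A, G, A, C, C and node IDs 7, 8, 2, 5, 3 and 6, 4, 1, 0 (as extracted).] Cost matrix as extracted (partly illegible): "ACGT A02.51 C 0 1 G1 0 T 1 0". Solution: the vector at the root should be [6, 6, 7, 8]; thus, the answer is 6.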

