Phylogenetic Reconstruction. A simple way to understand evolution…

Presentation on theme: "Phylogenetic Reconstruction. A simple way to understand evolution…" Presentation transcript:

1 Phylogenetic Reconstruction

2 A simple way to understand evolution…

3 DATA: Alignment of gene sequences. How can we transform this information into a historical context?

4

5

6 Phylogeny inference
1. Distance-based methods
- Pairwise distance matrix
- Adjust tree branch lengths to fit the distance matrix (e.g. least squares, Neighbor joining)
2. Character-based methods
- Parsimony
- Maximum likelihood or model-based evolution

7 In 1866, Ernst Haeckel coined the word “phylogeny” and presented phylogenetic trees for most known groups of living organisms.

8 The Tree of Life project. Surf the tree of life at: http://tolweb.org/tree/phylogeny.html

9 What is a tree? A tree consists of nodes connected by branches. Terminal nodes represent sequences or organisms for which we have data. Each is typically called an “Operational Taxonomic Unit” or OTU. Internal nodes represent hypothetical ancestors. The ancestor of all the sequences is the root of the tree. A tree is a mathematical structure which is used to model the actual evolutionary history of a group of sequences or organisms, i.e. an evolutionary hypothesis.

10 Types of Trees: Rooted vs. Unrooted
                      Nodes     Branches
Rooted    Interior    M - 1     M - 2
          Total       2M - 1    2M - 2
Unrooted  Interior    M - 2     M - 3
          Total       2M - 2    2M - 3
M is the number of OTUs.

11 The number of rooted and unrooted trees (OTU: Operational Taxonomic Unit)
Number of OTUs   Possible number of rooted trees   Possible number of unrooted trees
2                1                                 1
3                3                                 1
4                15                                3
5                105                               15
6                945                               105
7                10395                             945
8                135135                            10395
9                2027025                           135135
10               34459425                          2027025
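The counts in this table follow the standard closed forms for strictly bifurcating trees: (2M - 3)!! rooted trees and (2M - 5)!! unrooted trees for M OTUs. The sketch below is not part of the original slides; the function names are chosen here for illustration.

```python
def double_factorial(n):
    """Product n * (n - 2) * (n - 4) * ... down to 1; equals 1 for n <= 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def count_rooted_trees(m):
    """Number of rooted bifurcating trees for m >= 2 OTUs: (2m - 3)!!"""
    return double_factorial(2 * m - 3)


def count_unrooted_trees(m):
    """Number of unrooted bifurcating trees for m >= 2 OTUs: (2m - 5)!!"""
    return double_factorial(2 * m - 5)


if __name__ == "__main__":
    for m in range(2, 11):
        print(m, count_rooted_trees(m), count_unrooted_trees(m))
```

The number of unrooted trees for M OTUs equals the number of rooted trees for M - 1 OTUs, which is why the two columns of the table are shifted copies of each other.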

12 Types of Trees: Bifurcating vs. Multifurcating (Polytomy). Polytomies: Soft vs. Hard. Soft: designates a lack of information about the order of divergence. Hard: the hypothesis that multiple divergences occurred simultaneously.

13 Types of Trees: Trees vs. Networks. Trees: only one path between any pair of nodes. Networks: more than one path between any pair of nodes.

14 A shorthand for trees: the Newick format
Tree with tips 1 2 3 4 5 6: (((1,2),((3,4),5)),6)
Tree with tips 1 2 3 4: ((1,2),(3,4))
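To make the bracket notation concrete, here is a minimal sketch of a parser that turns a Newick string into nested Python tuples. It is an illustration only: it assumes plain tip labels with no branch lengths, no internal node labels and no quoting, and the names `parse_newick` and `leaves` are introduced here, not taken from the slides.

```python
def parse_newick(text):
    """Parse a Newick string without branch lengths into nested tuples."""
    text = "".join(text.split()).rstrip(";")  # drop whitespace and trailing ';'
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else ""

    def parse_node():
        nonlocal pos
        if peek() == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while peek() == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if peek() != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1  # consume ")"
            return tuple(children)
        start = pos
        while peek() and peek() not in "(),":
            pos += 1
        if start == pos:
            raise ValueError("expected a tip label at position %d" % pos)
        return text[start:pos]

    tree = parse_node()
    if pos != len(text):
        raise ValueError("unexpected trailing text at position %d" % pos)
    return tree


def leaves(tree):
    """List the tip labels of a nested-tuple tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


if __name__ == "__main__":
    tree = parse_newick("(((1,2),((3,4),5)),6)")
    print(tree)
    print(leaves(tree))
```

Each pair of parentheses becomes one internal node, so the nesting depth of a tip in the tuple mirrors its depth in the drawn tree.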

15 Comments on Trees. Trees give insights into underlying data. Identical trees can appear differently depending upon the method of display. Information may be lost when creating the tree. The tree is not the underlying data.

16 Different kinds of trees can be used to depict different aspects of evolutionary history: 1. Cladogram: simply shows relative recency of common ancestry. 2. Additive trees: a cladogram with branch lengths, also called phylograms or metric trees. 3. Ultrametric trees (dendrograms): a special kind of additive tree in which the tips of the tree are all equidistant from the root. (Figure: example trees with numeric branch lengths.)

17 Making trees according to morphological features. Ridley, New Scientist (Dec. 1983) 100, 647-51.

18 Given a multiple alignment, how do we construct the tree?
A - GCTTGTCCGTTACGAT
B - ACTTGTCTGTTACGAT
C - ACTTGTCCGAAACGAT
D - ACTTGACCGTTTCCTT
E - AGATGACCGTTTCGAT
F - ACTACACCCTTATGAG

19 Distance methods. General method: Evolutionary distances are computed for all pairs of taxa. A phylogenetic tree is constructed by considering the relationships among these distance data (fitting a tree to the matrix). Logic: Evolutionary distance is a tree metric and hence defines a tree. Methods we’ll talk about: UPGMA (Unweighted Pair Group Method with Arithmetic Mean) and Neighbor Joining.
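As an illustration of the first step (computing a distance for every pair of taxa), the sketch below counts mismatching positions between the six aligned sequences from the earlier slide. The slides do not say which distance measure was used, so the raw mismatch count is an assumption made here; its values need not reproduce the distance matrix shown later in the deck.

```python
from itertools import combinations

# Aligned sequences copied from the "Given a multiple alignment" slide.
SEQUENCES = {
    "A": "GCTTGTCCGTTACGAT",
    "B": "ACTTGTCTGTTACGAT",
    "C": "ACTTGTCCGAAACGAT",
    "D": "ACTTGACCGTTTCCTT",
    "E": "AGATGACCGTTTCGAT",
    "F": "ACTACACCCTTATGAG",
}


def mismatch_count(seq1, seq2):
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq1, seq2) if x != y)


def pairwise_distances(sequences):
    """Symmetric distance table: result[a][b] == result[b][a], result[a][a] == 0."""
    table = {name: {name: 0} for name in sequences}
    for a, b in combinations(sequences, 2):
        dist = mismatch_count(sequences[a], sequences[b])
        table[a][b] = dist
        table[b][a] = dist
    return table


if __name__ == "__main__":
    table = pairwise_distances(SEQUENCES)
    for name in SEQUENCES:
        print(name, [table[name][other] for other in SEQUENCES])
```

In practice a corrected distance (one that accounts for multiple substitutions at the same site) is usually preferred over the raw count, but the matrix-building loop stays the same.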

20 Distance methods

21 Metric distances must obey 4 rules:
Non-negativity: d(a,b) >= 0
Symmetry: d(a,b) = d(b,a)
Triangle inequality: d(a,c) <= d(a,b) + d(b,c)
Distinctness: d(a,b) = 0 iff a = b
Ultrametric Trees (Figure: example trees on taxa a, b, c with numeric branch lengths.)
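The four rules can be checked mechanically on any distance table. The sketch below does so for the lower-triangular matrix used in the UPGMA example of this deck; the helper names and the dictionary layout are choices made here for illustration.

```python
from itertools import permutations

NAMES = "ABCDEF"
# Lower triangle of the UPGMA example matrix (row label -> distances to A, B, ...).
LOWER = {"B": [2], "C": [4, 4], "D": [6, 6, 6], "E": [6, 6, 6, 4], "F": [8, 8, 8, 8, 8]}


def full_table(names, lower):
    """Expand a lower triangle into a table keyed by ordered pairs (a, b)."""
    table = {(a, a): 0 for a in names}
    for row, values in lower.items():
        for col, value in zip(names, values):
            table[(row, col)] = value
            table[(col, row)] = value
    return table


def is_metric(names, table):
    """Check non-negativity, symmetry, distinctness and the triangle inequality."""
    for a in names:
        if table[(a, a)] != 0:
            return False
    for a, b in permutations(names, 2):
        if table[(a, b)] <= 0:  # non-negativity, and distinctness for a != b
            return False
        if table[(a, b)] != table[(b, a)]:  # symmetry
            return False
    for a, b, c in permutations(names, 3):
        if table[(a, c)] > table[(a, b)] + table[(b, c)]:  # triangle inequality
            return False
    return True


if __name__ == "__main__":
    print(is_metric(NAMES, full_table(NAMES, LOWER)))
```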

22 Construction of a distance tree using clustering with the Unweighted Pair Group Method with Arithmetic Mean (UPGMA). From http://www.icp.ucl.ac.be/~opperd/private/upgma.html
A - GCTTGTCCGTTACGAT
B - ACTTGTCTGTTACGAT
C - ACTTGTCCGAAACGAT
D - ACTTGACCGTTTCCTT
E - AGATGACCGTTTCGAT
F - ACTACACCCTTATGAG
First, construct a distance matrix:
    A  B  C  D  E
B   2
C   4  4
D   6  6  6
E   6  6  6  4
F   8  8  8  8  8

23 UPGMA
Choose the most similar pair, cluster them together and calculate the new distance matrix.
First round:
dist(A,B),C = (distAC + distBC) / 2 = 4
dist(A,B),D = (distAD + distBD) / 2 = 6
dist(A,B),E = (distAE + distBE) / 2 = 6
dist(A,B),F = (distAF + distBF) / 2 = 8
Matrix before clustering A and B:
    A  B  C  D  E
B   2
C   4  4
D   6  6  6
E   6  6  6  4
F   8  8  8  8  8
Matrix after clustering A and B:
    A,B  C  D  E
C   4
D   6    6
E   6    6  4
F   8    8  8  8

24 UPGMA
Second round:
      A,B  C  D  E
C     4
D     6    6
E     6    6  4
F     8    8  8  8
Third round:
      A,B  C  D,E
C     4
D,E   6    6
F     8    8  8

25 UPGMA
Fourth round:
      AB,C  D,E
D,E   6
F     8     8
Fifth round:
      ABC,DE
F     8
Note that this method identifies the root of the tree.
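The five rounds above can be reproduced with a short UPGMA sketch. It is a minimal illustration, not the code behind the slides: the distance between two clusters is taken as the average of all original tip-to-tip distances between them, and each merge is placed at half that distance above the tips. When two pairs tie (as in the second round, where both distances equal 4), this sketch simply takes the first pair it encounters; the final set of clusters is the same either way for this matrix.

```python
from itertools import combinations

NAMES = "ABCDEF"
# Lower triangle of the distance matrix from the UPGMA slides.
LOWER = {"B": [2], "C": [4, 4], "D": [6, 6, 6], "E": [6, 6, 6, 4], "F": [8, 8, 8, 8, 8]}
DIST = {frozenset((row, col)): value
        for row, values in LOWER.items()
        for col, value in zip(NAMES, values)}


def upgma(names, dist):
    """Return (newick_string, heights), where heights maps each merged
    cluster (a frozenset of tip names) to its height above the tips."""
    clusters = {frozenset(name): name for name in names}  # cluster -> Newick text
    heights = {}

    def average_distance(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Choose the most similar pair of clusters.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average_distance(*pair))
        merged = c1 | c2
        heights[merged] = average_distance(c1, c2) / 2
        clusters[merged] = "(%s,%s)" % (clusters.pop(c1), clusters.pop(c2))
    (newick,) = clusters.values()
    return newick, heights


if __name__ == "__main__":
    newick, heights = upgma(NAMES, DIST)
    print(newick)
    for cluster, height in heights.items():
        print("".join(sorted(cluster)), height)
```

The last merge joins F to everything else at height 4, which is the root that the note on this slide refers to.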

26

27 A tree of human mitochondria sequences. http://www.genpat.uu.se/mtDB/ The mitochondrial genome has 16,500 base-pairs. In 2000, Gyllensten and colleagues sequenced the mitochondrial genomes of 53 people of diverse geographical, racial and linguistic backgrounds. A molecular clock seems to hold, with these sequences diverging at a rate of 1.7 x 10^-8 substitutions per site per year. Ingman, M., Kaessmann, H., Pääbo, S. & Gyllensten, U. (2000) Nature 408: 708-713.

28 A tree of 86 mitochondrial sequences. Downloaded from http://www.genpat.uu.se/mtDB/sequences.html and analyzed using MEGA, method: UPGMA. The deepest branches lead exclusively to sub-Saharan mtDNAs, with the second branch containing both Africans and non-Africans. (Figure label: sub-Sahara mtDNA.)

29 Rooting the tree with an outgroup. Ingman, M., Kaessmann, H., Pääbo, S. & Gyllensten, U. (2000) Nature 408: 708-713. (Figure labels: Root, Outgroup.)

30 Phylogeny based upon the molecular clock. Evidence for a human mitochondrial origin in Africa: African sequence diversity is twice as large as that of non-Africans. Gyllensten and colleagues estimate that the divergence of Africans and non-Africans occurred 52,000 to 28,000 years ago. Ingman, M., Kaessmann, H., Pääbo, S. & Gyllensten, U. (2000) Nature 408: 708-713.

31 UPGMA assumes a molecular clock. The UPGMA clustering method is very sensitive to unequal evolutionary rates (it assumes that the evolutionary rate is the same for all branches). Clustering works only if the data are ultrametric. Ultrametric distances are defined by the satisfaction of the 'three-point condition'. The three-point condition: for any three taxa A, B, C, the two greatest distances are equal.

32 UPGMA fails when rates of evolution are not constant. A tree in which the evolutionary rates are not equal. From http://www.icp.ucl.ac.be/~opperd/private/upgma.html
    A  B   C  D  E
B   5
C   4  7
D   7  10  7
E   6  9   6  5
F   8  11  8  9  8
(Neighbor joining will get the right tree in this case.)
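The three-point condition is easy to test on a distance matrix. The sketch below checks every triple of taxa in the two matrices shown in this deck: the one from the UPGMA walk-through and the one with unequal rates. Function and variable names are chosen here for illustration.

```python
from itertools import combinations

NAMES = "ABCDEF"
# Matrix from the UPGMA walk-through (equal rates).
CLOCKLIKE = {"B": [2], "C": [4, 4], "D": [6, 6, 6], "E": [6, 6, 6, 4], "F": [8, 8, 8, 8, 8]}
# Matrix from the "UPGMA fails" slide (unequal rates).
UNEQUAL = {"B": [5], "C": [4, 7], "D": [7, 10, 7], "E": [6, 9, 6, 5], "F": [8, 11, 8, 9, 8]}


def to_dict(names, lower):
    """Map each unordered pair of taxa to its distance."""
    return {frozenset((row, col)): value
            for row, values in lower.items()
            for col, value in zip(names, values)}


def is_ultrametric(names, dist, tolerance=1e-9):
    """True if, for every triple of taxa, the two greatest distances are equal."""
    for a, b, c in combinations(names, 3):
        three = sorted([dist[frozenset((a, b))],
                        dist[frozenset((a, c))],
                        dist[frozenset((b, c))]])
        if abs(three[2] - three[1]) > tolerance:
            return False
    return True


if __name__ == "__main__":
    print(is_ultrametric(NAMES, to_dict(NAMES, CLOCKLIKE)))
    print(is_ultrametric(NAMES, to_dict(NAMES, UNEQUAL)))
```

For the unequal-rates matrix the triple A, B, C already violates the condition (distances 5, 4 and 7), which is why clustering by smallest distance is not reliable there.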

33 Neighbors. (Figure: tree with tips A, B, C, D, terminal branches a, b, c, d and internal branch x.) A and B are neighbors because they are connected through a single internal node. C and D are also neighbors, but A and D are not neighbors.

34 The Four Point Condition. For tips A, B, C, D with terminal branches a, b, c, d and internal branch x:
Non-neighbors: $d_{AC} + d_{BD} = d_{AD} + d_{BC} = a + b + c + d + 2x = d_{AB} + d_{CD} + 2x$
Neighbors: $d_{AB} + d_{CD} < d_{AC} + d_{BD}$ and $d_{AB} + d_{CD} < d_{AD} + d_{BC}$
The 4-point condition can be used to identify neighbors. It basically states that neighbors are closer than non-neighbors.
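A small numerical check of the four-point condition, as a sketch: distances are generated from a tree ((A,B),(C,D)) with terminal branches a, b, c, d and internal branch x, and the pairing with the smallest sum is reported as the pair of neighbor pairs. The branch lengths used in the example are arbitrary values chosen here, not taken from the slides.

```python
def tree_distances(a, b, c, d, x):
    """Tip-to-tip distances on the tree ((A,B),(C,D)) with terminal
    branches a, b, c, d and internal branch x."""
    return {
        frozenset("AB"): a + b,
        frozenset("CD"): c + d,
        frozenset("AC"): a + x + c,
        frozenset("AD"): a + x + d,
        frozenset("BC"): b + x + c,
        frozenset("BD"): b + x + d,
    }


def pairing_sums(dist):
    """The three sums compared by the four-point condition."""
    pairings = [("AB", "CD"), ("AC", "BD"), ("AD", "BC")]
    return {p: dist[frozenset(p[0])] + dist[frozenset(p[1])] for p in pairings}


def neighbor_pairing(dist):
    """Pairing with the smallest sum: its two pairs are the neighbors."""
    sums = pairing_sums(dist)
    return min(sums, key=sums.get)


if __name__ == "__main__":
    dist = tree_distances(a=1, b=2, c=3, d=4, x=5)  # arbitrary example lengths
    print(pairing_sums(dist))
    print(neighbor_pairing(dist))
```

With these lengths the two larger sums are both a + b + c + d + 2x = 20, and the smaller sum is 10, exceeded by exactly 2x.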

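35 Neighbor Joining: an algorithm for finding the shortest tree. Start with a star (no hierarchical structure). (Figure: star tree with tips a, b, c, d, and a formula, not preserved in this transcript, with the annotations "the length of the tree", "pair-wise distances" and "number of OTUs".)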

36 Neighbor Joining (Saitou and Nei, 1987)

37 Neighbor Joining (Saitou and Nei, 1987)

38 Neighbor Joining (Saitou and Nei, 1987)

39 Neighbor Joining (Saitou and Nei, 1987)
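The figures for these Neighbor Joining slides are not preserved in the transcript, so the following is a sketch of a single joining step in the commonly used Q-criterion formulation of the algorithm, applied to the unequal-rates matrix from the "UPGMA fails" slide. It is an assumption that this formulation matches what the slides showed; names such as `first_nj_join` are introduced here.

```python
from itertools import combinations

NAMES = "ABCDEF"
# Lower triangle of the unequal-rates matrix from the "UPGMA fails" slide.
LOWER = {"B": [5], "C": [4, 7], "D": [7, 10, 7], "E": [6, 9, 6, 5], "F": [8, 11, 8, 9, 8]}
DIST = {frozenset((row, col)): float(value)
        for row, values in LOWER.items()
        for col, value in zip(NAMES, values)}


def d(a, b):
    """Distance between two taxa (0 for a taxon and itself)."""
    return 0.0 if a == b else DIST[frozenset((a, b))]


def first_nj_join(names):
    """Pick the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b),
    where r is the row sum, and return the pair with its two limb lengths."""
    n = len(names)
    row_sum = {a: sum(d(a, b) for b in names) for a in names}

    def q(pair):
        a, b = pair
        return (n - 2) * d(a, b) - row_sum[a] - row_sum[b]

    a, b = min(combinations(names, 2), key=q)  # ties: first pair encountered
    limb_a = d(a, b) / 2 + (row_sum[a] - row_sum[b]) / (2 * (n - 2))
    limb_b = d(a, b) - limb_a
    return (a, b), (limb_a, limb_b)


if __name__ == "__main__":
    closest = min(combinations(NAMES, 2), key=lambda pair: d(*pair))
    print("smallest raw distance:", closest)
    print("first neighbor-joining step:", first_nj_join(NAMES))
```

The pair with the smallest raw distance is A and C, which is what a clustering method such as UPGMA would merge first, whereas the Q criterion selects A and B and assigns them unequal branch lengths.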

40 Character state methods: MAXIMUM PARSIMONY. Logic: Examine each column in the multiple alignment of the sequences. Examine all possible trees and choose among them according to some optimality criterion. Method we’ll talk about: Maximum parsimony.

41 Maximum Parsimony. Simpler hypotheses are preferable to more complicated ones, and ad hoc hypotheses should be avoided whenever possible (Occam’s Razor). Thus, find the tree that requires the smallest number of evolutionary changes.
    0123456789012345
W - ACTTGACCCTTACGAT
X - AGCTGGCCCTGATTAC
Y - AGTTGACCATTACGAT
Z - AGCTGGTCCTGATGAC
(Figure: tree with tips W, Y, X, Z.)

42 Maximum Parsimony. Start by classifying the sites:
             123456789012345678901
Mouse        CTTCGTTGGATCAGTTTGATA
Rat          CCTCGTTGGATCATTTTGATA
Dog          CTGCTTTGGATCAGTTTGAAC
Human        CCGCCTTGGATCAGTTTGAAC
------------------------------------
Invariant    *  * ******** *****
Variant       ** *        *     **
------------------------------------
Informative   **                **
Non-inform.      *        *
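The site classification on this slide can be reproduced with a few lines of code. The sketch below uses the usual definition of a parsimony-informative site (at least two different states, each present in at least two sequences); the slide does not spell the definition out, so that is stated here as an assumption. Site numbers are 1-based, as in the ruler above the alignment.

```python
from collections import Counter

ALIGNMENT = {
    "Mouse": "CTTCGTTGGATCAGTTTGATA",
    "Rat":   "CCTCGTTGGATCATTTTGATA",
    "Dog":   "CTGCTTTGGATCAGTTTGAAC",
    "Human": "CCGCCTTGGATCAGTTTGAAC",
}


def classify_sites(alignment):
    """Return 1-based site numbers grouped as invariant, informative
    and variant-but-non-informative."""
    result = {"invariant": [], "informative": [], "non_informative": []}
    for site, column in enumerate(zip(*alignment.values()), start=1):
        counts = Counter(column)
        if len(counts) == 1:
            result["invariant"].append(site)
        elif sum(1 for n in counts.values() if n >= 2) >= 2:
            result["informative"].append(site)
        else:
            result["non_informative"].append(site)
    return result


if __name__ == "__main__":
    for kind, sites in classify_sites(ALIGNMENT).items():
        print(kind, sites)
```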

43
             123456789012345678901
Mouse        CTTCGTTGGATCAGTTTGATA
Rat          CCTCGTTGGATCATTTTGATA
Dog          CTGCTTTGGATCAGTTTGAAC
Human        CCGCCTTGGATCAGTTTGAAC
              ** *
(Figure: sites 2, 3 and 5 mapped onto each of the three possible unrooted trees for Mouse, Rat, Dog and Human, with the nucleotide states shown at the tips and internal nodes.)

44 Maximum Parsimony
             123456789012345678901
Mouse        CTTCGTTGGATCAGTTTGATA
Rat          CCTCGTTGGATCATTTTGATA
Dog          CTGCTTTGGATCAGTTTGAAC
Human        CCGCCTTGGATCAGTTTGAAC
Informative   **                **
(Figure: the three possible unrooted trees for Mouse, Rat, Dog and Human, labeled 3, 0 and 1.)

45 Maximum Parsimony. The situation is more complicated when there are more than four units. (Figure: trees with tip states and candidate ancestral state sets such as (CT), (GT), (AGT), (AT), (AG), (TAG), (TAGC).) Problems with maximum parsimony: Only uses “informative” sites. Long branches “attract”.
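The state sets in the figure look like the intermediate sets of Fitch's small-parsimony algorithm, which is one standard way to count the minimum number of changes a tree requires. Whether the slides used exactly this procedure is an assumption. The sketch below applies it to the Mouse, Rat, Dog, Human alignment and scores the three possible four-taxon trees; the trees are written as nested tuples with an arbitrary root, which does not affect the count.

```python
ALIGNMENT = {
    "Mouse": "CTTCGTTGGATCAGTTTGATA",
    "Rat":   "CCTCGTTGGATCATTTTGATA",
    "Dog":   "CTGCTTTGGATCAGTTTGAAC",
    "Human": "CCGCCTTGGATCAGTTTGAAC",
}

# The three possible unrooted trees for four taxa, written with an arbitrary root.
TREES = {
    "((Mouse,Rat),(Dog,Human))": (("Mouse", "Rat"), ("Dog", "Human")),
    "((Mouse,Dog),(Rat,Human))": (("Mouse", "Dog"), ("Rat", "Human")),
    "((Mouse,Human),(Rat,Dog))": (("Mouse", "Human"), ("Rat", "Dog")),
}


def fitch(tree, states):
    """Return (candidate state set, minimum number of changes) for one site."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    common = left_set & right_set
    if common:  # children agree on at least one state: no extra change
        return common, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Total number of changes the tree requires over all alignment columns."""
    names = list(alignment)
    total = 0
    for column in zip(*alignment.values()):
        _, changes = fitch(tree, dict(zip(names, column)))
        total += changes
    return total


if __name__ == "__main__":
    for label, tree in TREES.items():
        print(label, parsimony_score(tree, ALIGNMENT))
```

The tree grouping Mouse with Rat needs the fewest changes. The non-informative sites add the same number of changes to every tree, which is why only informative sites affect the ranking.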

46

47

48

49 Maximum Likelihood Analysis. Similar to Maximum Parsimony, except that different nucleotide substitutions are not assumed to be equally probable. All possible unrooted trees are evaluated (same as for Parsimony). Each column of the alignment is processed (same as for Parsimony). The substitution A -> T will have a different probability than the substitution G -> C. Start with a frequency distribution table that specifies the probability of one base being substituted for another base. See probabilities of nucleotide substitution (Table 6.5, pg 275). The probability that each unrooted tree predicts each column of the alignment is calculated. The probabilities for the columns are multiplied together for each tree (equivalently, their logarithms are summed). The unrooted tree with the highest probability is chosen.

50 Maximum Likelihood Example. Four sequences are compared (w, x, y and z). All unrooted trees are shown. In this example we will examine the first unrooted tree.

51 Maximum Likelihood Example Continued. L(Tree x) = L0 * L1 * L2 * L3 * L4 * L5 * L6. L0: base probability of the nucleotide at node 0 (0.25). L1: probability of the nucleotide changing from the value at node 0 to the value at node 1. L2: probability of the nucleotide changing from the value at node 0 to the value at node 2. L3: probability of the nucleotide changing from the value at node 1 to the value at node 3 (T). L4, L5, L6: probability of the nucleotide changing to the value at the leaf.

52 Maximum Likelihood Example Continued. There are 64 likelihood trees to evaluate: (number of bases) ^ (number of internal nodes), or 4^3. We will show the evaluation of the internal-node combination TTG against the first unrooted tree for the column TTAG. Determine values for L0, … L6. Values are determined by looking up probabilities in the substitution probability table. The probability L2 is that of T -> G. The probability L5 is that of G -> A. The probability L3 is that of T -> T. Determine the combined probability L0 * L1 * L2 * … * L6.

53 Maximum Likelihood Example Continued. Determine the probability for the combination TGG. Determine the probability for the other 62 combinations. Sum all the trees together: L(Tree) = L(Tree1) + L(Tree2) + … + L(Tree64). Move to the next column and repeat the same procedure. Once all columns are complete, multiply the column probabilities together (equivalently, sum their logarithms). This is the likelihood of the first unrooted tree. Continue this process for the other unrooted trees. Pick the unrooted tree with the highest probability. This is the most likely unrooted tree.
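The procedure described in this example can be sketched as a brute-force sum over the 4^3 = 64 internal-node combinations. Several things below are assumptions made for illustration: node 0 is taken as the parent of nodes 1 and 2, node 1 as the parent of leaves 3 and 4, and node 2 as the parent of leaves 5 and 6; and the substitution table is a made-up one in which every specific change has probability 0.1 (the table referred to in the slides is not reproduced in the transcript).

```python
import math
from itertools import product

BASES = "ACGT"


def sub_prob(parent, child, p_change=0.1):
    """Hypothetical substitution table: each specific change has probability
    p_change, and no change has probability 1 - 3 * p_change."""
    return 1.0 - 3 * p_change if parent == child else p_change


def column_likelihood(column, prior=0.25):
    """Likelihood of one alignment column (the states at leaves 3, 4, 5, 6),
    summed over all 64 assignments of bases to internal nodes 0, 1, 2."""
    leaf3, leaf4, leaf5, leaf6 = column
    total = 0.0
    for n0, n1, n2 in product(BASES, repeat=3):
        total += (prior                      # L0
                  * sub_prob(n0, n1)         # L1
                  * sub_prob(n0, n2)         # L2
                  * sub_prob(n1, leaf3)      # L3
                  * sub_prob(n1, leaf4)      # L4
                  * sub_prob(n2, leaf5)      # L5
                  * sub_prob(n2, leaf6))     # L6
    return total


def tree_log_likelihood(columns):
    """Columns are treated as independent, so their likelihoods multiply;
    summing logarithms avoids numerical underflow for long alignments."""
    return sum(math.log(column_likelihood(column)) for column in columns)


if __name__ == "__main__":
    print(column_likelihood("TTAG"))
    print(tree_log_likelihood(["TTAG", "TTTT"]))
```

Practical implementations avoid the explicit 64-term sum by working from the leaves toward the root, but the brute-force version mirrors the steps listed on the slides most directly.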

54

55

56 IN VITRO EVOLUTION BY MEANS OF PCR

57

58

59

60

61

62 Conclusion
Phylogenetic prediction can be used for more than evolutionary distance:
– Verification of taxonomy
– Identification of unknowns
– Techniques work for genetic and non-genetic data (fatty acid).
Use multiple methods for verification:
– Pick at least two different types of methods from Parsimony, Distance and Likelihood.
– If the analyses are in agreement, there is a higher level of confidence that the analysis is correct.

63

64 BOOTSTRAPPING. How confident are we in this tree?

65 Bootstrapping: a statistical method that can be used to place confidence intervals on phylogenies.

66

67 Resampling from the Data
Original data:
human_myoglobin        -GLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHL...
pig_myoglobin          -GLSDGEWQLVLNVWGKVEADVAGHGQEVLIRLFKGHPETLEKFDKFKHL...
horse_myoglobin        -GLSDGEWQQVLNVWGKVEADIAGHGQEVLIRLFTGHPETLEKFDKFKHL...
common_seal_myoglobin  -GLSEGEWQLVLNVWGKVEADLAGHGQDVLIRLFKGHPETLEKFDKFKHL...
sperm_whale_myoglobin  MVLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHL...
sea_hare_myoglobin     -SLSAAEADLAGKSWAPVFANKDANGDAFLVALFEKFPDSANFFADFKG-...
Pick with replacement.
Resampled data number 1:
human_myoglobin        LQKWDQKHNVHTEFGAEELQGDKLSWKKLDQGKKVVKKELGLDEDEWLGE
pig_myoglobin          LQKWDQKHNVHTEFGAEELQGDKLSWKKLDQGKKVVKKELGLDEDEWLGE
horse_myoglobin        LQKWDQTHNVHTEFGAEELQGDKLSWKTLDQGKKVVTKELGQDEDEWLGE
common_seal_myoglobin  LQKWEQKHNVHTEFGADELQGDKLSWKKLDQGKKVVKKELGLDEDDWLGE
sperm_whale_myoglobin  LQRWEQKHHVHTEFAADELQGDKLSWKKLDQGRKVVKKELGLDEDDWLGE
sea_hare_myoglobin     LDDWADENKSNSNFAAAELDANFASAPELNDGDKVAEKFAALNNAAWAAN
Repeat 99 more times (or 999, 9999, ...).
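The resampling step can be sketched as follows: columns of the alignment are drawn with replacement until the pseudo-alignment has the same length as the original, and the same column indices are used for every sequence so that each resampled column is an intact column of the original data. For brevity the example uses the short Mouse, Rat, Dog, Human alignment from the parsimony slides rather than the myoglobin data; the function names and the seed are choices made here.

```python
import random

ALIGNMENT = {
    "Mouse": "CTTCGTTGGATCAGTTTGATA",
    "Rat":   "CCTCGTTGGATCATTTTGATA",
    "Dog":   "CTGCTTTGGATCAGTTTGAAC",
    "Human": "CCGCCTTGGATCAGTTTGAAC",
}


def bootstrap_alignment(alignment, rng):
    """Draw columns with replacement; every sequence gets the same picks."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Generate n_replicates pseudo-alignments from one seeded generator."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    replicate = bootstrap_replicates(ALIGNMENT, n_replicates=1, seed=1)[0]
    for name, seq in replicate.items():
        print(name.ljust(6), seq)
```

A tree is then built from each pseudo-alignment with the same method that was used for the original data.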

68 Given the following tree, estimate the confidence of the two internal branches. (Figure: tree with tips Chimpanzee, Gorilla, Human, Orang-utan, Gibbon.)

69 Estimating Confidence from the Resamplings
1. Of the 100 trees: In 100 of the 100 trees, gibbon and orang-utan are split from the rest. In 41 of the 100 trees, chimp and gorilla are split from the rest. (Figure: three alternative trees for Chimpanzee, Gorilla, Human, Orang-utan, Gibbon, found in 41/100, 28/100 and 31/100 of the resamplings.)
2. Upon the original tree we superimpose bootstrap values. (Figure: the original tree with bootstrap values 100 and 41 on its two internal branches.)
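Counting bootstrap support amounts to asking, for each grouping in the original tree, how many replicate trees contain it. The sketch below represents each replicate tree as a set of groupings. The replicate list is constructed by hand to match the counts on this slide (41, 28 and 31 out of 100); which of the two alternative groupings corresponds to 28 and which to 31 cannot be recovered from the transcript, so that assignment is an arbitrary assumption made here.

```python
from collections import Counter

APES = frozenset(["Chimpanzee", "Gorilla", "Human"])  # split from gibbon and orang-utan


def make_tree(pair):
    """A replicate tree, reduced to the groupings it contains."""
    return {APES, frozenset(pair)}


# Hand-built replicates matching the counts on the slide (assignment of the
# 28 and 31 counts to the two alternative groupings is hypothetical).
REPLICATES = ([make_tree(["Chimpanzee", "Gorilla"])] * 41
              + [make_tree(["Chimpanzee", "Human"])] * 28
              + [make_tree(["Gorilla", "Human"])] * 31)


def bootstrap_support(original_groupings, replicates):
    """For each grouping of the original tree, count the replicates containing it."""
    counts = Counter()
    for tree in replicates:
        counts.update(tree)
    return {grouping: counts[grouping] for grouping in original_groupings}


if __name__ == "__main__":
    original = [APES, frozenset(["Chimpanzee", "Gorilla"])]
    for grouping, count in bootstrap_support(original, REPLICATES).items():
        print(sorted(grouping), "%d/%d" % (count, len(REPLICATES)))
```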

70

71

72

73

74 THE TREE OF LIFE. Relationships between 16S ribosomal RNAs. (Figure labels: distant relationships, close relationships; bacteria, eukaryotes, archaea.)

75 The three domains of Life as identified by phylogenetic analysis of the highly conserved 16S ribosomal RNA (Woese and Fox 1977) 16S ribosomal RNA

76 Where is the root of the tree of life? (by definition there is no outgroup)

77 An ancient gene duplication can root a tree. Graur & Li. Fundamentals of Molecular Evolution (1999). (Figure labels: Gene duplication; Speciation of 3 and 1-2; Speciation of 1 and 2; Root of 1,2,3; Outgroups for A 2; Outgroups for A 1.)

78 The root of the tree of life as inferred from EF-Tu and EF-G. Both trees show Archaea and Eucarya as sister taxa. Graur & Li. Fundamentals of Molecular Evolution (1999)

79 Horizontal Gene Transfer: Mn-dependent transcriptional regulator (Tatusov, 1996). (Figure labels: eubacteria, archaea.)

80 What is the origin of the mitochondria? http://www.mitomap.org/

81 The endosymbiotic theory. The evidence: Both mitochondria and chloroplasts can arise only from preexisting mitochondria and chloroplasts. They cannot be formed in a cell that lacks them because nuclear genes encode only some of the proteins of which they are made. Both mitochondria and chloroplasts have their own genome. Both genomes consist of a single circular molecule of DNA. There are no histones associated with the DNA.

82 The mitochondria sit with the proteobacteria in the tree of life. Mitochondrial (MT) small-subunit (SSU) ribosomal RNA tree. Gray MW Nature. 1998 Nov 12;396(6707):109-10.

83 mitochondrion chloroplast Lack mitochondria (?)

84 Andersson SG Nature 1998 Nov 12;396(6707):133-40 The genome sequence of Rickettsia prowazekii and the origin of mitochondria.

85 Mitochondrial ribosomal proteins are most similar to those of R. prowazekii Andersson SG Nature 1998 Nov 12;396(6707):133-40

86 Mitochondrial proteins involved in ATP synthesis are most similar to those of R. prowazekii Andersson SG Nature 1998 Nov 12;396(6707):133-40

87 Mitochondria derive from α-purple bacteria. Chloroplasts derive from cyanobacteria. Graur & Li. Fundamentals of Molecular Evolution (1999)

88 The tree of life with mitochondria and chloroplast endosymbiotic events (Doolittle, 1999)

89 Horizontal transfer is a dominant feature of the “tree” of life (Doolittle, 1999)

