What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: (1) Phylogeny inference or “tree building”


1 What is phylogenetic analysis and why should we perform it?
Phylogenetic analysis has two major components:
(1) Phylogeny inference or “tree building”: the inference of the branching orders, and ultimately the evolutionary relationships, between “taxa” (entities such as genes, populations, species, etc.)
(2) Analyzing change in traits (phenotypes, genes), using phylogenies as analytical frameworks for a rigorous understanding of the evolution of traits or conditions of interest
Germline and somatic evolution included!

2 Uses of Phylogenetics in the Study of Health & Disease
(1) Evolutionary history of humans, between and within species
(2) Analysis of evolution of phenotypic and genetic traits in humans, especially human-specific traits: evolved when, where, why, how
(3) Evolution of parasites and pathogens, in relation to their hosts (us)
(4) Evolution of cancer cell lineages, and somatic evolution more generally
(5) Study of adaptation in humans and other taxa

3 What you will learn in this lecture
(1) About phylogenies: terminology, what they are, how they work, ‘tree thinking’
(2) How to infer phylogenies
(3) How we can use phylogenies to answer questions related to human adaptation, health and disease

4 Common Phylogenetic Tree Terminology
[Figure: rooted tree with terminal nodes A, B, C, D, E and the following labels]
Ancestral node or ROOT of the tree
Internal nodes or divergence points (represent hypothetical ancestors of the taxa)
Branches or lineages
Terminal nodes: represent the TAXA (genes, populations, species, etc.) used to infer the phylogeny

5 Phylogenetic trees diagram the evolutionary relationships between the taxa
[Figure: rooted tree with terminal nodes Taxon A, Taxon B, Taxon C, Taxon E, Taxon D]
((A,(B,C)),(D,E)) = the above phylogeny as nested parentheses
There is no meaning to the spacing between the taxa, or to the order in which they appear from top to bottom.
The branch-length dimension either can have no scale (for ‘cladograms’), can be proportional to genetic distance or amount of change (for ‘phylograms’ or ‘additive trees’), or can be proportional to time (for ‘ultrametric trees’ or true evolutionary trees).
Both the tree and the nested parentheses say that B and C are more closely related to each other than either is to A, and that A, B, and C form a clade that is a sister group to the clade composed of D and E. If the tree has a time scale, then D and E are the most closely related.
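The nested-parentheses notation on this slide can be read mechanically. The following is a minimal sketch, assuming a topology-only string with single-token taxon names and no branch lengths; the function names `parse_nested` and `collect_clades` are illustrative and not from the lecture. It turns the string into nested tuples and lists the set of taxa below every internal node, which is one way to check the statement that B and C group together before either joins A.

```python
def parse_nested(text):
    """Parse a topology-only string such as "((A,(B,C)),(D,E))" into nested tuples."""
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            pos += 1  # skip ")"
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]  # a taxon name

    return parse_node()


def collect_clades(node, out):
    """Append the taxon set below every internal node to `out`; return this node's taxon set."""
    if isinstance(node, str):
        return frozenset([node])
    taxa = frozenset().union(*(collect_clades(child, out) for child in node))
    out.append(taxa)
    return taxa


tree = parse_nested("((A,(B,C)),(D,E))")
clade_list = []
collect_clades(tree, clade_list)
for clade in clade_list:
    print(sorted(clade))
```

The printed clades include B+C, A+B+C, and D+E, but not A+B, matching the reading of the tree given on the slide.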

6 Three types of trees
Cladogram: branch lengths have no meaning
Phylogram: branch lengths proportional to genetic change (branch lengths shown in the figure: 1, 1, 1, 6, 3, 5)
Ultrametric tree: branch lengths proportional to time
[Figure: the same tree of Taxon A, Taxon B, Taxon C, Taxon D drawn in each of the three styles]
All show the same evolutionary relationships, or branching orders, between the taxa.

7 A major goal of phylogeny inference is to resolve the branching orders of lineages in evolutionary trees: RESOLUTION AND SUPPORT for nodes
[Figure: three trees of taxa A, B, C, D, E: a completely unresolved or "star" phylogeny, a partially resolved phylogeny, and a fully resolved, bifurcating phylogeny, with a polytomy (multifurcation) and a bifurcation labeled]

8 There are three possible unrooted trees for four taxa (A, B, C, D)
Tree 1: A and B on one side of the internal branch, C and D on the other
Tree 2: A and C on one side, B and D on the other
Tree 3: A and D on one side, B and C on the other
Phylogenetic tree building (or inference) methods are aimed at discovering which of the possible unrooted trees is "correct". We would like this to be the “true” biological tree, that is, one that accurately represents the evolutionary history of the taxa. However, we must settle for discovering the computationally correct or optimal tree for the phylogenetic method of choice.

9 The number of unrooted trees increases in a greater than exponential manner with the number of taxa
(2N - 5)!! = number of unrooted trees for N taxa
[Figure: example unrooted trees for 3 taxa (A, B, C), 4 taxa (A to D), 5 taxa (A to E), and 6 taxa (A to F)]
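To get a feel for how quickly (2N - 5)!! grows, here is a small sketch that evaluates the formula from this slide. The double factorial k!! is taken to mean k × (k - 2) × (k - 4) × ... down to 1; the function names are illustrative only.

```python
def double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def num_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees for n_taxa >= 3, i.e. (2N - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n_taxa - 5)


for n in (3, 4, 5, 6, 10):
    print(n, num_unrooted_trees(n))
# 3 taxa -> 1, 4 taxa -> 3, 5 taxa -> 15, 6 taxa -> 105, 10 taxa -> 2027025
```

Four taxa give the three trees of the previous slide, while ten taxa already give more than two million, which is why exhaustive search is only practical for small data sets.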

10 Inferring evolutionary relationships between the taxa requires rooting the tree
To root a tree mentally, imagine that the tree is made of string. Grab the string at the root and tug on it until the ends of the string (the taxa) fall opposite the root.
[Figure: unrooted tree of taxa A, B, C, D with the root position marked, and the resulting rooted tree drawn along a TIME axis]
Note that in this rooted tree, taxon A is no more closely related to taxon B than it is to C or D.

11 Now, try it again with the root at another position
[Figure: the same unrooted tree of taxa A, B, C, D with the root at a different position, and the resulting rooted tree drawn along a TIME axis]
Note that in this rooted tree, taxon A is most closely related to taxon B, and together they are equally distantly related to taxa C and D.

12 An unrooted, four-taxon tree theoretically can be rooted in five different places to produce five different rooted trees
[Figure: unrooted tree 1 (taxa A, B, C, D) with root positions numbered 1 to 5, and the corresponding rooted trees 1a, 1b, 1c, 1d, 1e]
These trees show five different evolutionary relationships among the taxa!
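The five rootings can be enumerated by placing the root on each branch in turn. The sketch below assumes that unrooted tree 1 joins A and B on one side of the internal branch and C and D on the other; the internal-node labels "x" and "y" and the function names are hypothetical. The output order is not meant to match the 1a to 1e labels of the figure.

```python
# Unrooted tree 1 as an adjacency list; "x" and "y" are hypothetical internal nodes.
ADJACENCY = {
    "A": ["x"], "B": ["x"], "C": ["y"], "D": ["y"],
    "x": ["A", "B", "y"],
    "y": ["C", "D", "x"],
}


def subtree(node, parent):
    """Nested-tuple subtree hanging from `node` when it is reached from `parent`."""
    children = [n for n in ADJACENCY[node] if n != parent]
    if not children:
        return node  # terminal node (a taxon)
    return tuple(subtree(child, node) for child in children)


def all_rootings():
    """Place the root on each branch in turn and return the resulting rooted trees."""
    edges = sorted({tuple(sorted((u, v))) for u in ADJACENCY for v in ADJACENCY[u]})
    return [(subtree(u, v), subtree(v, u)) for u, v in edges]


rootings = all_rootings()
for rooted in rootings:
    print(rooted)
```

A four-taxon unrooted tree has 2N - 3 = 5 branches, so the loop yields five rooted trees: four with one taxon splitting off first, and one, (("A", "B"), ("C", "D")), with the root on the internal branch.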

13 All of these rearrangements show the same evolutionary relationships between the taxa
[Figure: rooted tree 1a (taxa A, B, C, D) redrawn several times with the taxa in different top-to-bottom orders]
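One way to check that two drawings are the same tree is to compare them in a form that ignores the order of branches at each node. This is a minimal sketch, assuming rooted trees written as nested tuples of taxon names; the example trees are illustrative and are not read off the figure.

```python
def canonical(node):
    """Order-independent form of a rooted tree given as nested tuples of taxon names."""
    if isinstance(node, str):
        return node
    return frozenset(canonical(child) for child in node)


drawing_1 = (("B", "A"), ("C", "D"))
drawing_2 = (("D", "C"), ("A", "B"))   # same tree, branches rotated at every node
other_tree = (("A", "C"), ("B", "D"))  # a genuinely different branching order

print(canonical(drawing_1) == canonical(drawing_2))   # True
print(canonical(drawing_1) == canonical(other_tree))  # False
```

Rotating branches around a node changes only the drawing, so the first comparison holds; regrouping taxa changes the branching order, so the second does not.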

14 Main way to root trees: outgroup
By outgroup: uses taxa (the “outgroup”) that are known to fall outside of the group of interest (the “ingroup”). Requires some prior knowledge about the relationships among the taxa.

15 Molecular phylogenetic tree building methods
Are mathematical and/or statistical methods for inferring the divergence order of taxa, as well as the lengths of the branches that connect them. There are many phylogenetic methods available today, each having strengths and weaknesses. Most can be classified by DATA TYPE and COMPUTATIONAL METHOD as follows:
DATA TYPE     Optimality criterion                Clustering algorithm
Characters    PARSIMONY, MAXIMUM LIKELIHOOD
Distances     MINIMUM EVOLUTION, LEAST SQUARES    UPGMA, NEIGHBOR-JOINING

16 Types of data used in phylogenetic inference
Character-based methods: use the aligned characters, such as DNA or protein sequences, directly during tree inference.
Taxa         Characters
Species A    ATCGCTAGTCCTATAGTGCA
Species B    ATCGCTAGTCCTATATTGCA
Species C    TTCGCTAGACCTGTGGTCCA
Species D    TTGACCAGACCTGTGGTCCG
Species E    TTGACCAGTTCTGTGGTCCG
ETC          ETC
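Distance-based methods start from pairwise distances rather than from the characters themselves. As a rough sketch of how the aligned characters on this slide could be turned into distance data, the code below counts differing sites between each pair of sequences. This raw count is the simplest possible distance and is only an illustration; it applies no correction for multiple substitutions at the same site.

```python
from itertools import combinations

ALIGNMENT = {
    "A": "ATCGCTAGTCCTATAGTGCA",
    "B": "ATCGCTAGTCCTATATTGCA",
    "C": "TTCGCTAGACCTGTGGTCCA",
    "D": "TTGACCAGACCTGTGGTCCG",
    "E": "TTGACCAGTTCTGTGGTCCG",
}


def count_differences(seq1, seq2):
    """Number of aligned sites at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2))


distances = {
    (x, y): count_differences(ALIGNMENT[x], ALIGNMENT[y])
    for x, y in combinations(sorted(ALIGNMENT), 2)
}
for pair, d in distances.items():
    print(pair, d)
```

For example, Species A and B differ at 1 site, D and E at 2 sites, and A and C at 5 sites.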

17 Similarity vs. Evolutionary Relationship
Similarity and relationship are not the same thing, even though evolutionary relationship is inferred from certain types of similarity.
Similar: having likeness or resemblance (an observation)
Related: genetically connected (an historical fact)
Two taxa can be most similar without being most closely related:
[Figure: tree of Taxon A, Taxon B (e.g., HUMANS!), Taxon C, Taxon D with branch lengths 1, 1, 1, 6, 3, 5]
C is more similar in sequence to A (d = 3) than to B (d = 7), but C and B are most closely related (that is, C and B shared a common ancestor more recently than either did with A).
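The distances quoted on this slide can be reproduced by adding branch lengths along the path between two taxa. The sketch below uses one assignment of branch lengths that is consistent with d(A, C) = 3 and d(B, C) = 7; that assignment, and the internal-node names "bc" and "abc", are assumptions rather than a transcription of the figure. Taxon D is left out because its placement cannot be recovered from the text.

```python
# child -> (parent, branch length); an assumed assignment consistent with the slide.
PARENT = {
    "A": ("abc", 1),
    "B": ("bc", 6),    # long branch leading to B
    "C": ("bc", 1),
    "bc": ("abc", 1),  # B and C share the ancestor "bc"
}


def distances_to_ancestors(taxon):
    """Map the taxon and each of its ancestors to the path length from the taxon."""
    dist = {taxon: 0}
    node, total = taxon, 0
    while node in PARENT:
        node, length = PARENT[node]
        total += length
        dist[node] = total
    return dist


def path_distance(x, y):
    """Sum of branch lengths on the path between x and y (via their latest common ancestor)."""
    up_x, up_y = distances_to_ancestors(x), distances_to_ancestors(y)
    return min(up_x[n] + up_y[n] for n in up_x if n in up_y)


print(path_distance("A", "C"))  # 3
print(path_distance("B", "C"))  # 7
```

C is closest to A in distance even though it shares its most recent ancestor with B, because the branch leading to B is long. A method that simply groups the most similar pair would therefore join A and C.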

18 Main computational approach: optimality approaches
Use either character or distance data. First define an optimality criterion (minimum branch lengths, fewest number of events, highest likelihood), and then use a specific algorithm for finding trees with the best value for the objective function. Can identify many equally optimal trees, if such exist.
Warning: Finding an optimal tree is not necessarily the same as finding the “true” tree. Random data will give you an ‘optimal’ (best) tree!

19 Parsimony methods
Optimality criterion: the ‘most-parsimonious’ tree is the one that requires the fewest number of evolutionary events (e.g., nucleotide substitutions, amino acid replacements) to explain the sequences.
Advantages:
Are simple, intuitive, and logical (many possible by ‘pencil-and-paper’).
Can be used on molecular and non-molecular (e.g., morphological) data.
Can be used for character (can infer the exact substitutions) and rate analysis.
Can be used to infer the sequences of the extinct (hypothetical) ancestors.
Disadvantages:
Not explicitly statistical.
Can be fooled by high levels of parallel evolution.

20 Use parsimony to infer the optimal (best) tree
Character-based methods: use the aligned characters, such as DNA or protein sequences, directly during tree inference.
Taxa         Characters
Species A    ATCG CTAGACCTATAGTGCA
Species B    ATCG CTAGACCTATATTGCA
Species C    TTCG CTAGACCTGTGGTCCA
Species D    TTGA CCAGACCTGTGGTCCG
Species E    TTGA CCAGTTGTGTGGTCCG
OUTGROUP     TTAC CCATTTGTGTCCTCCG
Infer the maximum parsimony tree using the first four characters.
Quality of trees (how likely it is that they reflect the one True Tree) can be evaluated in various ways (random data will give you a low-quality ‘best’ tree).
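The exercise on this slide can be checked with a short script. The sketch below scores a candidate tree by counting the minimum number of changes it requires at each of the first four characters, using Fitch's small-parsimony counting procedure. The slide does not say which counting method the lecture used, and the two candidate trees are illustrative choices, so treat both as assumptions. It only scores trees that are supplied; it does not search all possible trees.

```python
# First four characters of each taxon, taken from the slide.
FIRST_FOUR = {
    "A": "ATCG", "B": "ATCG", "C": "TTCG",
    "D": "TTGA", "E": "TTGA", "OUT": "TTAC",
}


def fitch_site(node, site):
    """Return (possible ancestral states, number of changes) for one site."""
    if isinstance(node, str):
        return {FIRST_FOUR[node][site]}, 0
    left, right = node  # bifurcating trees only
    left_states, left_changes = fitch_site(left, site)
    right_states, right_changes = fitch_site(right, site)
    shared = left_states & right_states
    if shared:  # the two descendants can agree: no extra change needed here
        return shared, left_changes + right_changes
    return left_states | right_states, left_changes + right_changes + 1


def parsimony_score(tree):
    """Total number of changes the tree requires over the four characters."""
    return sum(fitch_site(tree, site)[1] for site in range(4))


candidate_1 = ("OUT", ((("A", "B"), "C"), ("D", "E")))
candidate_2 = ("OUT", ((("A", "D"), "C"), ("B", "E")))

print(parsimony_score(candidate_1))  # 5
print(parsimony_score(candidate_2))  # 8
```

The first candidate needs 5 changes and the second needs 8, so parsimony prefers the first. Other trees may tie with the best score, which is why the optimality approach can return several equally parsimonious trees.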

21 We can statistically compare alternative trees, corresponding to specific biological hypotheses about the history of some set of lineages

22 Timescales on trees: molecular clocks
[Figure: % genetic divergence (25% to 100%) plotted against time since divergence (300 to 1500 Myr) for fibrinopeptides, hemoglobin, cytochrome c, and histone IV]
Why such different profiles? Variation in mutation rate? Variation in selection? Genes coding for some molecules are under very strong stabilizing selection.

23 Dates for calibrating molecular clocks can come from geology, fossils, or historical data
[Figure: calibration from known ages of islands, for two genes]

24 Calibrating using fossil data
[Figure: tree of chimps, humans, whales, and hippos, labeled with 56 mya, 60 substitutions, and 6 substitutions]
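The numbers on this slide are enough for a back-of-the-envelope calibration. The sketch below assumes that the 56 mya date and the 60 substitutions both refer to the whale-hippo split, that the 6 substitutions separate chimps and humans, and that substitutions accumulate at the same constant rate in both pairs (a strict molecular clock). These pairings are a reading of the figure labels, not something the text states.

```python
def clock_rate(substitutions, divergence_time_myr):
    """Substitutions accumulated between two lineages per million years."""
    return substitutions / divergence_time_myr


def estimate_divergence_time(substitutions, rate_per_myr):
    """Time since divergence implied by a substitution count and a clock rate."""
    return substitutions / rate_per_myr


# Calibration pair (assumed): whales vs. hippos, 60 substitutions over 56 Myr.
rate = clock_rate(60, 56)

# Pair to date (assumed): chimps vs. humans, 6 substitutions.
chimp_human_time = estimate_divergence_time(6, rate)
print(round(chimp_human_time, 2))  # 5.6 (million years)
```

With one tenth as many substitutions, the chimp-human split comes out at one tenth of the calibration date. The estimate is only as good as the assumption that both pairs evolve at the same rate.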

25 Calibrating from known dates of the ages of samples: for very fast-evolving taxa such as HIV

26 Uses of Phylogenetics in the Study of Health & Disease
(1) Evolutionary history of humans, between and within species
(2) Analysis of evolution of phenotypic and genetic traits in humans, especially human-specific traits: evolved when, where, why, how
(3) Taxonomy and evolution of parasites and pathogens, and evolution in relation to their hosts
(4) Evolution of cancer cell lineages, and somatic evolution more generally
(5) Study of adaptation in humans and other taxa, via analysis of divergence and convergence

27 VIRUS - what IS it? Sequence its genome and relate the sequence to known viruses
Evolution of SIV and HIV viruses: multiple transfers to humans, from chimps and from sooty mangabeys
EMERGING VIRUSES - THE GREATEST KNOWN HEALTH THREAT TO HUMANITY

28 SARS (severe acute respiratory syndrome): what causes it and where did it come from?

30 HIV phylogeny within humans in different regions: Haiti as stepping stone to North America

31 HIV evolves very rapidly WITHIN hosts, as a result of interactions with the immune system
Can do phylogenetics of:
- Pathogens within individuals
- Pathogens between individuals (e.g., in different or the same regions)
How do they originate? From other species? How do they spread?
How does resistance to antibiotics evolve in pathogens, and resistance to chemotherapeutic agents evolve in cancer?

33 Cancer evolves genetically in the body during carcinogenesis, allowing the inference of ‘oncogenetic trees’
Cytogenetic data: gains and losses of chromosomal regions during the evolution of cancers; tumor suppressor gene copies are lost and oncogene copies are gained
Involves losses of heterozygosity and losses of imprinting

37 Cancer Evolutionary Phylogenomics
Compare primary cancer with metastatic tumors

38 What you learned in this lecture
(1) About phylogenies: terminology, what they are, how they work, ‘tree thinking’
(2) How to infer and evaluate phylogenies
(3) How to use phylogenies to answer questions related to human adaptation, health and disease (viruses, cancer, etc.)
(4) How to THINK in terms of evolutionary trees (historical patterns of evolution), within and between species

