Based on lectures by C-B Stewart, and by Tal Pupko Phylogenetic Analysis based on two talks, by Caro-Beth Stewart, Ph.D. Department of Biological Sciences.

Phylogenetic Analysis
Based on two talks, by Caro-Beth Stewart, Ph.D., Department of Biological Sciences, University at Albany, SUNY, and Tal Pupko, Ph.D., Faculty of Life Science, Tel-Aviv University

What is phylogenetic analysis and why should we perform it?
Phylogenetic analysis has two major components:
1. Phylogeny inference or “tree building”: the inference of the branching orders, and ultimately the evolutionary relationships, between “taxa” (entities such as genes, populations, species, etc.)
2. Character and rate analysis: using phylogenies as analytical frameworks for rigorous understanding of the evolution of various traits or conditions of interest

Common Phylogenetic Tree Terminology
Ancestral Node or ROOT of the Tree
Internal Nodes or Divergence Points (represent hypothetical ancestors of the taxa)
Branches or Lineages
Terminal Nodes (A, B, C, D, E): represent the TAXA (genes, populations, species, etc.) used to infer the phylogeny

Phylogenetic trees diagram the evolutionary relationships between the taxa
[Figure: tree with tips Taxon A, Taxon B, Taxon C, Taxon E, Taxon D]
((A,(B,C)),(D,E)) = the above phylogeny as nested parentheses
There is no meaning to the spacing between the taxa, or to the order in which they appear from top to bottom.
The other dimension either can have no scale (for ‘cladograms’), can be proportional to genetic distance or amount of change (for ‘phylograms’ or ‘additive trees’), or can be proportional to time (for ‘ultrametric trees’ or true evolutionary trees).
The tree and the nested parentheses both say that B and C are more closely related to each other than either is to A, and that A, B, and C form a clade that is a sister group to the clade composed of D and E. If the tree has a time scale, then D and E are the most closely related.
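The nested-parentheses notation can be read mechanically. The following is a minimal sketch, not part of the original slides, that parses the string ((A,(B,C)),(D,E)) into nested tuples and lists the clades it implies. The function names `parse` and `clades` are hypothetical, and the parser assumes plain taxon names with no branch lengths.

```python
def parse(newick):
    """Parse a nested-parentheses tree (no branch lengths) into nested tuples."""
    pos = 0

    def node():
        nonlocal pos
        if newick[pos] == "(":
            pos += 1                      # skip "("
            children = [node()]
            while newick[pos] == ",":
                pos += 1                  # skip ","
                children.append(node())
            pos += 1                      # skip ")"
            return tuple(children)
        start = pos                       # a taxon name runs up to the next delimiter
        while pos < len(newick) and newick[pos] not in ",()":
            pos += 1
        return newick[start:pos]

    return node()


def clades(tree):
    """Return the set of taxa below every internal node, in post-order."""
    found = []

    def walk(t):
        if isinstance(t, str):
            return frozenset([t])
        taxa = frozenset().union(*(walk(child) for child in t))
        found.append(taxa)
        return taxa

    walk(tree)
    return found


tree = parse("((A,(B,C)),(D,E))")
for clade in clades(tree):
    print(sorted(clade))
```

The clade {B, C} appears but {A, B} does not, which is the sense in which B and C are closer to each other than either is to A.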

A few examples of what can be inferred from phylogenetic trees built from DNA or protein sequence data:
Which species are the closest living relatives of modern humans?
Did the infamous Florida Dentist infect his patients with HIV?
What were the origins of specific transposable elements?
Plus countless others...

Which species are the closest living relatives of modern humans?
Mitochondrial DNA, most nuclear DNA-encoded genes, and DNA/DNA hybridization all show that bonobos and chimpanzees are related more closely to humans than either is to gorillas.
The pre-molecular view was that the great apes (chimpanzees, gorillas and orangutans) formed a clade separate from humans, and that humans diverged from the apes at least MYA.
[Figure: two trees relating humans, bonobos, chimpanzees, gorillas, and orangutans, each drawn against a time axis in MYA]

Did the Florida Dentist infect his patients with HIV?
[Figure: Phylogenetic tree of HIV sequences from the DENTIST, his patients, and local HIV-infected people. Tips: DENTIST; Patients A to G; Local controls 2, 3, 9, and 35. From Ou et al. (1992) and Page & Holmes (1998).]
The figure marks patients “Yes” or “No”. Yes: The HIV sequences from these patients fall within the clade of HIV sequences found in the dentist.

A few examples of what can be learned from character analysis using phylogenies as analytical frameworks:
When did specific episodes of positive Darwinian selection occur during evolutionary history?
Which genetic changes are unique to the human lineage?
What was the most likely geographical location of the common ancestor of the African apes and humans?
Plus countless others...

The number of unrooted trees increases in a greater than exponential manner with number of taxa
(2N - 5)!! = # unrooted trees for N taxa
[Figure: example unrooted trees for 3 taxa (A to C), 4 taxa (A to D), 5 taxa (A to E), and 6 taxa (A to F)]
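To get a feel for how quickly (2N - 5)!! grows, here is a small sketch that evaluates the double factorial directly. The helper names are hypothetical, and the formula is applied only for N of 3 or more.

```python
def double_factorial(n):
    """n!! = n * (n - 2) * (n - 4) * ... down to 1 (or 2)."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def count_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees for n_taxa >= 3, i.e. (2N - 5)!!."""
    if n_taxa < 3:
        raise ValueError("the formula is used here for 3 or more taxa")
    return double_factorial(2 * n_taxa - 5)


for n in (3, 4, 5, 6, 10, 20):
    print(n, count_unrooted_trees(n))
```

Three taxa give a single unrooted tree, four give 3, five give 15, and ten already give more than two million.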

Inferring evolutionary relationships between the taxa requires rooting the tree:
To root a tree mentally, imagine that the tree is made of string. Grab the string at the root and tug on it until the ends of the string (the taxa) fall opposite the root.
[Figure: an unrooted tree of taxa A, B, C, D with the root position marked, and the resulting rooted tree]
Note that in this rooted tree, taxon A is no more closely related to taxon B than it is to C or D.

Now, try it again with the root at another position:
[Figure: the same unrooted tree of taxa A, B, C, D with the root at a different position, and the resulting rooted tree]
Note that in this rooted tree, taxon A is most closely related to taxon B, and together they are equally distantly related to taxa C and D.

An unrooted, four-taxon tree theoretically can be rooted in five different places to produce five different rooted trees.
[Figure: the unrooted tree 1 (taxa A, B, C, D) with root positions 1 to 5, and the resulting rooted trees 1a, 1b, 1c, 1d, and 1e]
These trees show five different evolutionary relationships among the taxa!
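The five rootings can be enumerated by placing the root on each branch in turn. This is an illustrative sketch only: the unrooted tree is assumed to group A with B and C with D, and the internal node names `x` and `y` are hypothetical labels introduced for the adjacency list.

```python
# Unrooted four-taxon tree: A and B attach to internal node x, C and D to y.
ADJACENT = {
    "A": ["x"], "B": ["x"], "C": ["y"], "D": ["y"],
    "x": ["A", "B", "y"],
    "y": ["C", "D", "x"],
}
BRANCHES = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]


def subtree(node, parent):
    """Nested-parentheses string for everything hanging below node, away from parent."""
    children = [n for n in ADJACENT[node] if n != parent]
    if not children:
        return node
    return "(" + ",".join(sorted(subtree(c, node) for c in children)) + ")"


def root_on(u, v):
    """Place the root on the branch u-v and return the rooted tree."""
    return "(" + ",".join(sorted([subtree(u, v), subtree(v, u)])) + ")"


rooted_trees = [root_on(u, v) for u, v in BRANCHES]
for t in rooted_trees:
    print(t)
```

Each of the five branches yields a different nested-parentheses string, so the five rooted trees really are distinct hypotheses.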

There are two major ways to root trees:
By outgroup: Uses taxa (the “outgroup”) that are known to fall outside of the group of interest (the “ingroup”). Requires some prior knowledge about the relationships among the taxa. The outgroup can either be species (e.g., birds to root a mammalian tree) or previous gene duplicates (e.g.,  -globins to root  -globins).
By midpoint or distance: Roots the tree at the midway point between the two most distant taxa in the tree, as determined by branch lengths. Assumes that the taxa are evolving in a clock-like manner. This assumption is built into some of the distance-based tree building methods.
[Figure: tree of taxa A, B, C, D with an outgroup; d(A,D) = 18, midpoint = 18 / 2 = 9]
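Midpoint rooting can be sketched in a few lines. The branch lengths below are hypothetical, chosen only so that the two most distant taxa are A and D with d(A,D) = 18 as on the slide; the internal node names `x` and `y` and the function names are likewise assumptions.

```python
from itertools import combinations

# Hypothetical branch lengths for an unrooted tree of A, B, C, D.
BRANCH_LENGTHS = {("A", "x"): 10, ("B", "x"): 2, ("x", "y"): 3,
                  ("C", "y"): 4, ("D", "y"): 5}

ADJ = {}
for (a, b), w in BRANCH_LENGTHS.items():
    ADJ.setdefault(a, {})[b] = w
    ADJ.setdefault(b, {})[a] = w


def find_path(start, goal, parent=None):
    """Return the unique list of nodes from start to goal in the tree."""
    if start == goal:
        return [start]
    for nxt in ADJ[start]:
        if nxt != parent:
            rest = find_path(nxt, goal, start)
            if rest:
                return [start] + rest
    return None


def path_length(path):
    return sum(ADJ[a][b] for a, b in zip(path, path[1:]))


def midpoint_root():
    """Return (branch holding the root, distance from its first node, longest distance)."""
    taxa = [n for n in ADJ if len(ADJ[n]) == 1]          # terminal nodes
    longest = max((find_path(a, b) for a, b in combinations(taxa, 2)),
                  key=path_length)
    half = path_length(longest) / 2
    walked = 0
    for a, b in zip(longest, longest[1:]):
        if walked + ADJ[a][b] >= half:                   # midpoint lies on this branch
            return (a, b), half - walked, path_length(longest)
        walked += ADJ[a][b]


branch, offset, distance = midpoint_root()
print(f"root on branch {branch}, {offset} from {branch[0]} (d = {distance})")
```

With these lengths the root falls on the branch leading to A, 9 units from A, which only makes sense if the taxa really are evolving in a clock-like manner.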

Each unrooted tree theoretically can be rooted anywhere along any of its branches
(# unrooted trees) x (# branches per tree) = # rooted trees
(2N - 3)!! = # rooted trees for N taxa
[Figure: example unrooted trees for 4 taxa (A to D), 5 taxa (A to E), and 6 taxa (A to F)]
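The link between the two counts can be written out as a short derivation. The symbols U, B, and R are notation introduced here for illustration, not taken from the slides; the step relies on an unrooted bifurcating tree of N taxa having 2N - 3 branches.

```latex
\begin{align*}
U(N) &= (2N-5)!! && \text{unrooted trees for } N \text{ taxa} \\
B(N) &= 2N-3 && \text{branches in each unrooted tree} \\
R(N) &= U(N)\,B(N) = (2N-3)\,(2N-5)!! = (2N-3)!! && \text{rooted trees for } N \text{ taxa} \\
R(4) &= 3 \times 5 = 15
\end{align*}
```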

Molecular phylogenetic tree building methods:
Are mathematical and/or statistical methods for inferring the divergence order of taxa, as well as the lengths of the branches that connect them. There are many phylogenetic methods available today, each having strengths and weaknesses. Most can be classified as follows:
                          COMPUTATIONAL METHOD
DATA TYPE     Optimality criterion                 Clustering algorithm
Characters    PARSIMONY, MAXIMUM LIKELIHOOD        (none)
Distances     MINIMUM EVOLUTION, LEAST SQUARES     UPGMA, NEIGHBOR-JOINING

Types of data used in phylogenetic inference:
Character-based methods: Use the aligned characters, such as DNA or protein sequences, directly during tree inference.
Taxa        Characters
Species A   ATGGCTATTCTTATAGTACG
Species B   ATCGCTAGTCTTATATTACA
Species C   TTCACTAGACCTGTGGTCCA
Species D   TTGACCAGACCTGTGGTCCG
Species E   TTGACCAGTTCTCTAGTTCG
Distance-based methods: Transform the sequence data into pairwise distances (dissimilarities), and then use the matrix during tree building.
[Table: 5 x 5 matrix of pairwise distances among Species A to E]
Example 1: Uncorrected “p” distance (=observed percent sequence difference)
Example 2: Kimura 2-parameter distance (estimate of the true number of substitutions between taxa)
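Both example distances can be computed from the five aligned sequences on this slide. The sketch below is illustrative: the function names are hypothetical, the p distance is expressed as a proportion rather than a percent, and the Kimura 2-parameter distance uses the standard formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites that differ by a transition and by a transversion.

```python
import math

ALIGNMENT = {
    "A": "ATGGCTATTCTTATAGTACG",
    "B": "ATCGCTAGTCTTATATTACA",
    "C": "TTCACTAGACCTGTGGTCCA",
    "D": "TTGACCAGACCTGTGGTCCG",
    "E": "TTGACCAGTTCTCTAGTTCG",
}
PURINES = {"A", "G"}


def p_distance(s1, s2):
    """Uncorrected distance: proportion of aligned sites that differ."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def k2p_distance(s1, s2):
    """Kimura 2-parameter distance from transition (P) and transversion (Q) proportions."""
    n = len(s1)
    # A transition keeps the base class (purine or pyrimidine); a transversion changes it.
    transitions = sum(a != b and (a in PURINES) == (b in PURINES) for a, b in zip(s1, s2))
    transversions = sum(a != b and (a in PURINES) != (b in PURINES) for a, b in zip(s1, s2))
    p, q = transitions / n, transversions / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


names = sorted(ALIGNMENT)
for i, x in enumerate(names):
    for y in names[:i]:
        print(f"{x}-{y}: p = {p_distance(ALIGNMENT[x], ALIGNMENT[y]):.2f}, "
              f"K2P = {k2p_distance(ALIGNMENT[x], ALIGNMENT[y]):.2f}")
```

The corrected distance is never smaller than the uncorrected one, because it tries to account for multiple substitutions at the same site.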

Computational methods for finding optimal trees:
Exact algorithms: "Guarantee" to find the optimal or "best" tree for the method of choice. Two types used in tree building:
Exhaustive search: Evaluates all possible unrooted trees, choosing the one with the best score for the method.
Branch-and-bound search: Eliminates the parts of the search tree that only contain suboptimal solutions.
Heuristic algorithms: Approximate or “quick-and-dirty” methods that attempt to find the optimal tree for the method of choice, but cannot guarantee to do so. Heuristic searches often operate by “hill-climbing” methods.

Exact searches become increasingly difficult, and eventually impossible, as the number of taxa increases:
(2N - 5)!! = # unrooted trees for N taxa
[Figure: example unrooted trees for 3 taxa (A to C), 4 taxa (A to D), 5 taxa (A to E), and 6 taxa (A to F)]

Heuristic search algorithms are input order dependent and can get stuck in local minima or maxima.
Rerunning heuristic searches using different input orders of taxa can help find global minima or maxima.
[Figure: two search landscapes, one for a search for the global minimum and one for a search for the global maximum, each marking the GLOBAL MINIMUM, the GLOBAL MAXIMUM, and a local minimum or local maximum]
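The effect of the starting point on a hill-climbing search can be shown on a toy landscape. This is a sketch under stated assumptions: the list `SCORES` is a made-up one-dimensional landscape standing in for tree space, and each position plays the role of one starting condition (for example, one input order of taxa).

```python
# Hypothetical scores along a one-dimensional "landscape"; higher is better.
SCORES = [1, 3, 2, 5, 9, 4, 6, 7, 3]


def hill_climb(scores, start):
    """Move to the better neighbour until no neighbour improves; return the final position."""
    here = start
    while True:
        neighbours = [j for j in (here - 1, here + 1) if 0 <= j < len(scores)]
        best = max(neighbours, key=lambda j: scores[j])
        if scores[best] <= scores[here]:
            return here                  # stuck on a (possibly local) maximum
        here = best


# A single search may stop on a local maximum ...
print("start 0 stops at position", hill_climb(SCORES, 0))

# ... so rerun the search from several starting points and keep the best result.
finishes = {start: hill_climb(SCORES, start) for start in range(len(SCORES))}
best_position = max(finishes.values(), key=lambda j: SCORES[j])
print("best over all starts: position", best_position, "score", SCORES[best_position])
```

Restarting does not guarantee the global optimum in a real tree search, since not every starting point can be tried there; it only improves the odds.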

Classification of phylogenetic inference methods
                          COMPUTATIONAL METHOD
DATA TYPE     Optimality criterion                 Clustering algorithm
Characters    PARSIMONY, MAXIMUM LIKELIHOOD        (none)
Distances     MINIMUM EVOLUTION, LEAST SQUARES     UPGMA, NEIGHBOR-JOINING

Parsimony methods:
Optimality criterion: The ‘most-parsimonious’ tree is the one that requires the fewest evolutionary events (e.g., nucleotide substitutions, amino acid replacements) to explain the sequences.
Advantages:
Are simple, intuitive, and logical (many possible by ‘pencil-and-paper’).
Can be used on molecular and non-molecular (e.g., morphological) data.
Can tease apart types of similarity (shared-derived, shared-ancestral, homoplasy).
Can be used for character (can infer the exact substitutions) and rate analysis.
Can be used to infer the sequences of the extinct (hypothetical) ancestors.
Disadvantages:
Are simple, intuitive, and logical (derived from “Medieval logic”, not statistics!)
Can be fooled by high levels of homoplasy (‘same’ events).
Can become positively misleading in the “Felsenstein Zone”.
[See Stewart (1993) for a simple explanation of parsimony analysis, and Swofford et al. (1996) for a detailed explanation of various parsimony methods.]
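One common way to count the minimum number of changes a tree requires is Fitch's algorithm; the slides do not say which counting method they use, so treat this as an illustrative sketch. Trees are written as nested tuples, the alignment is the five-species example shown earlier in the transcript, and the names `fitch`, `parsimony_score`, `TREE_1`, and `TREE_2` are hypothetical.

```python
ALIGNMENT = {
    "A": "ATGGCTATTCTTATAGTACG",
    "B": "ATCGCTAGTCTTATATTACA",
    "C": "TTCACTAGACCTGTGGTCCA",
    "D": "TTGACCAGACCTGTGGTCCG",
    "E": "TTGACCAGTTCTCTAGTTCG",
}


def fitch(tree, states):
    """Return (candidate ancestral states, minimum number of changes) for one site."""
    if isinstance(tree, str):                       # terminal node: observed state
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:                                      # children agree: no extra change
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, seqs):
    """Sum the per-site minimum number of changes over the whole alignment."""
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch(tree, {taxon: seq[i] for taxon, seq in seqs.items()})[1]
               for i in range(n_sites))


TREE_1 = (("A", ("B", "C")), ("D", "E"))
TREE_2 = (("A", "B"), ("C", ("D", "E")))
for name, tree in (("tree 1", TREE_1), ("tree 2", TREE_2)):
    print(name, "requires", parsimony_score(tree, ALIGNMENT), "changes")
```

Under the parsimony criterion, the tree with the smaller total is preferred; an exact or heuristic search repeats this scoring over many candidate trees.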

Branch and Bound
Tal Pupko, Tel-Aviv University

There are many trees, and we cannot go over all of them. We will look for a way to find the best tree. There are approximate solutions, but what if we want to make sure we find the global optimum? There is a more efficient way than going over all possible trees. It is called BRANCH AND BOUND, a general technique in computer science that can be applied to phylogeny.

BRANCH AND BOUND
To illustrate the BRANCH AND BOUND (BNB) method, we will use an example not connected to evolution. Later, when the general BNB method is understood, we will see how to apply it to finding the MP tree. We will present the traveling salesperson path problem (TSP).

THE TSP PROBLEM (especially adapted to Israel)
A guard has to visit n check-points whose locations on a map are known. The problem is to find the shortest path that goes through all points exactly once (no need to come back to the starting point).
Naïve approach (say, for 5 points): You have 5 starting points. For each starting point you have 4 possible “next steps”. For each combination of starting point and first step, you have 3 possible second steps, etc. Altogether we have 5*4*3*2*1 = 5! possible solutions.

THE TSP TREE

THE SHP NAÏVE APPROACH
Each solution can be represented as a permutation: (1,2,3,4,5) (1,2,3,5,4) (1,2,4,3,5) (1,2,4,5,3) (1,2,5,3,4) …
We can go over the list and find the one giving the best score, that is, the shortest total path length.

THE SHP NAÏVE APPROACH
However, for 15 points, for example, there are 15! = 1,307,674,368,000 possible solutions. The number of solutions grows too fast for this approach to be practical.
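The naïve approach translates directly into a loop over permutations. The sketch below is illustrative only: the check-point coordinates in `CHECKPOINTS` are made up, and the function names are hypothetical.

```python
import math
from itertools import permutations

# Hypothetical map coordinates for 5 check-points.
CHECKPOINTS = [(0, 0), (4, 1), (1, 3), (5, 4), (2, 6)]


def path_length(points, order):
    """Total length of the open path that visits the points in the given order."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))


def shortest_path_naive(points):
    """Score every permutation and keep the shortest one (n! candidates)."""
    best = min(permutations(range(len(points))),
               key=lambda order: path_length(points, order))
    return best, path_length(points, best)


order, length = shortest_path_naive(CHECKPOINTS)
print("visit order:", order, "length:", round(length, 3))
print("candidates for 5 points:", math.factorial(5))
print("candidates for 15 points:", math.factorial(15))
```

Five points mean 120 candidates, which is instant; the 15-point count printed at the end shows why the same loop stops being usable.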

A TSP GREEDY HEURISTIC
Start from a random point. Go to the closest unvisited point. Go to its closest unvisited point, and so on.
This approach doesn’t work so well (but a reasonably close heuristic, based on simulated annealing, will be presented in a couple of lectures).
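A small example shows how the greedy rule can go wrong. In this sketch the four points are hypothetical and lie on a line, and the starting point is fixed rather than random so that the result is reproducible.

```python
import math

# Hypothetical check-points on a line; index 0 is the fixed starting point.
POINTS = [(0, 0), (1, 0), (-1.5, 0), (3, 0)]


def path_length(points, order):
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))


def greedy_path(points, start=0):
    """Always step to the closest point that has not been visited yet."""
    order = [start]
    unvisited = set(range(len(points))) - {start}
    while unvisited:
        here = points[order[-1]]
        nearest = min(unvisited, key=lambda i: math.dist(here, points[i]))
        order.append(nearest)
        unvisited.remove(nearest)
    return order


greedy = greedy_path(POINTS)
print("greedy order:", greedy, "length:", path_length(POINTS, greedy))
print("a better order:", [2, 0, 1, 3], "length:", path_length(POINTS, [2, 0, 1, 3]))
```

The greedy walk heads right first and then has to cross the whole line to reach the point it left behind, giving a path of length 7.5 where 4.5 is possible.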

BNB SOLUTION TO SHP
[Figure: the search tree of partial paths]
Shortest path found so far = 15. Score here already 16: no point in expanding the rest of the subtree.
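The pruning rule on this slide can be sketched as a depth-first search that abandons a partial path as soon as it is no shorter than the best complete path found so far. The function name and the example points are hypothetical; the bound used here is simply the length of the partial path, which is the rule described on the slide.

```python
import math


def shortest_path_bnb(points):
    """Branch and bound over partial paths; returns (best order, best length)."""
    n = len(points)
    best_length = math.inf
    best_order = None

    def extend(order, length):
        nonlocal best_length, best_order
        if length >= best_length:
            return                       # bound: this subtree cannot beat the record
        if len(order) == n:
            best_length, best_order = length, order
            return
        for i in range(n):               # branch: try every unvisited point next
            if i not in order:
                step = math.dist(points[order[-1]], points[i]) if order else 0.0
                extend(order + [i], length + step)

    extend([], 0.0)
    return best_order, best_length


order, length = shortest_path_bnb([(0, 0), (1, 0), (-1.5, 0), (3, 0)])
print("best order:", order, "length:", length)
```

The result is still guaranteed to be optimal, because a partial path can only get longer as points are added; only the amount of work is reduced.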

Back to finding the MP tree
Finding the MP tree is NP-Hard (will see shortly)…
BNB helps, though it is still exponential…

The MP search tree
[Figure: the search tree; each level adds the next taxon to one branch of the current partial tree (e.g., to branch 2). There are 5 branches.]

The MP search tree
[Figure: the search tree, with 4 added to a branch]

MP-BNB
[Figure: the search tree, with 4 added to a branch]
Best (minimum) value = 52

MP-BNB
[Figure: the search tree, with 4 added to a branch]
Best record = 52

MP-BNB
Best record = 52

MP-BNB
Best record =

MP-BNB
Best TREE. MP score = 42
Total # trees visited: 14

Order of Evaluation Matters
Evaluate all 3 first.
Total trees visited: 9
The bound after searching this subtree will be 42.
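The whole MP-BNB procedure can be sketched for a handful of taxa. This is an illustration under several assumptions, not the lecture's implementation: partial trees are scored with Fitch's algorithm, taxa are added in alphabetical order, the data are the five-species alignment shown earlier in the transcript (not the data set behind the scores 52 and 42 on the slides), and all function names are hypothetical. The bound works because adding a taxon to a partial tree can never lower its parsimony score.

```python
import math

ALIGNMENT = {
    "A": "ATGGCTATTCTTATAGTACG",
    "B": "ATCGCTAGTCTTATATTACA",
    "C": "TTCACTAGACCTGTGGTCCA",
    "D": "TTGACCAGACCTGTGGTCCG",
    "E": "TTGACCAGTTCTCTAGTTCG",
}


def fitch(tree, states):
    """(candidate ancestral states, minimum changes) for one site on a nested-tuple tree."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], states), fitch(tree[1], states)
    return (ls & rs, lc + rc) if ls & rs else (ls | rs, lc + rc + 1)


def tree_length(tree, seqs):
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch(tree, {t: s[i] for t, s in seqs.items()})[1] for i in range(n_sites))


def insertions(tree, leaf):
    """Yield every tree obtained by attaching leaf to one branch of tree."""
    yield (tree, leaf)                              # branch above this node
    if isinstance(tree, tuple):
        left, right = tree
        for t in insertions(left, leaf):
            yield (t, right)
        for t in insertions(right, leaf):
            yield (left, t)


def mp_search(seqs, use_bound=True):
    taxa = sorted(seqs)
    first = taxa[0]                                 # the tree hangs from the first taxon
    record = {"length": math.inf, "tree": None, "visited": 0}

    def extend(subtree, remaining):
        record["visited"] += 1
        tree = (first, subtree)
        length = tree_length(tree, seqs)
        if use_bound and length >= record["length"]:
            return                                  # bound: completions cannot beat the record
        if not remaining:
            if length < record["length"]:
                record["length"], record["tree"] = length, tree
            return
        for bigger in insertions(subtree, remaining[0]):
            extend(bigger, remaining[1:])           # branch: next taxon on every branch

    extend((taxa[1], taxa[2]), taxa[3:])
    return record


bounded = mp_search(ALIGNMENT)
exhaustive = mp_search(ALIGNMENT, use_bound=False)
print("best tree:", bounded["tree"], "MP score:", bounded["length"])
print("trees visited with bound:", bounded["visited"], "without:", exhaustive["visited"])
```

Changing the order in which branches or taxa are tried changes how early a good record is found, and therefore how many partial trees are visited, which is the point of this slide.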

And Now
Maximum Parsimony is Computationally Intractable
Felsenstein’s Dynamic Programming Algorithm for tiny maximum likelihood
and more, time permitting