Phylogenetic Analysis

Presentation transcript:

Phylogenetic Analysis

2 Introduction
Intention
–Use powerful algorithms to reconstruct the evolutionary history of all known organisms.
Phylogenetic tree
–It helps us understand the evolutionary relationships among species of organisms.
–But the evolutionary history has to be inferred from current organisms.

3 (Figures: example phylogenies of the Campanulaceae (bluebell) family and of the herpesviruses.)

4 Common Phylogenetic Tree Terminology
–Terminal nodes (A, B, C, D, E): represent the TAXA (genes, populations, species, etc.) used to infer the phylogeny
–Branches or lineages
–Internal nodes or divergence points: represent hypothetical ancestors of the taxa
–Ancestral node or ROOT of the tree

5 Three types of trees
–Cladogram: branch lengths have no meaning
–Phylogram: branch lengths show genetic change
–Ultrametric tree: branch lengths show time
All show the same evolutionary relationships, or branching orders, between the taxa (Taxon A, Taxon B, Taxon C, Taxon D).

6 Phylogenetic trees diagram the evolutionary relationships between the taxa
(Figure: a rooted tree with Taxon A, Taxon B, Taxon C, Taxon D and Taxon E at the tips.)
((A,(B,C)),(D,E)) = the above phylogeny as nested parentheses
There is no meaning to the spacing between the taxa, or to the order in which they appear from top to bottom.
The other dimension (along the branches) either can have no scale (for ‘cladograms’), can be proportional to genetic distance or amount of change (for ‘phylograms’ or ‘additive trees’), or can be proportional to time (for ‘ultrametric trees’ or true evolutionary trees).
The tree and the nested parentheses both say that B and C are more closely related to each other than either is to A, and that A, B, and C form a clade that is a sister group to the clade composed of D and E. If the tree has a time scale, then D and E are the most closely related.
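The nested-parentheses notation can be read mechanically. The following is a minimal Python sketch, assuming names without spaces, branch lengths or a trailing semicolon; the function names `parse_newick`, `leaves` and `clades` are illustrative and not part of the slides. It parses the string into nested tuples and lists the clades, which also shows that the order of the taxa carries no meaning.

```python
def parse_newick(text):
    """Parse a parenthesized tree such as '((A,(B,C)),(D,E))' into nested tuples."""
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume '('
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # consume ','
                children.append(parse_node())
            pos += 1  # consume ')'
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]  # a taxon name

    return parse_node()


def leaves(node):
    """Return the set of taxon names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clades(node):
    """Return every clade (the taxa under each internal node, root included)."""
    if isinstance(node, str):
        return set()
    found = {leaves(node)}
    for child in node:
        found |= clades(child)
    return found


tree = parse_newick("((A,(B,C)),(D,E))")
tree_clades = clades(tree)

# Swapping the order of children changes the drawing, not the relationships.
same_tree = parse_newick("((E,D),((C,B),A))")
print(clades(same_tree) == tree_clades)
```

The clade sets here are {B, C}, {A, B, C}, {D, E} and the full set of five taxa, matching the reading given on the slide.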

7 The goal of phylogeny inference is to resolve the branching orders of lineages in evolutionary trees:
–Completely unresolved or "star" phylogeny
–Partially resolved phylogeny
–Fully resolved, bifurcating phylogeny
(Figure: the three phylogenies drawn for taxa A, B, C, D and E, with labels "Polytomy or multifurcation" and "A bifurcation".)

8 C-B Stewart, NHGRI lecture, 12/5/00
There are three possible unrooted trees for four taxa (A, B, C, D)
–Tree 1: A and B on one side of the internal branch, C and D on the other
–Tree 2: A and C on one side, B and D on the other
–Tree 3: A and D on one side, B and C on the other
Phylogenetic tree building (or inference) methods are aimed at discovering which of the possible unrooted trees is "correct". We would like this to be the “true” biological tree, that is, one that accurately represents the evolutionary history of the taxa. However, we must settle for discovering the computationally correct or optimal tree for the phylogenetic method of choice.

9 The number of unrooted trees increases in a greater than exponential manner with the number of taxa
(2N - 5)!! = number of unrooted trees for N taxa
(2N - 3)!! = number of rooted trees for N taxa
(Figure: example unrooted trees on taxa A to F.)

10 Introduction
NP-hard optimization problem
–Number of unrooted trees on n organisms = TU(n)
–Number of edges in an unrooted tree on n organisms = E(n) = 2n-3, n>=2
–TU(n) = TU(n-1)*E(n-1) = Π_{i=2}^{n-1} E(i) = Π_{i=3}^{n} (2i-5)
–Ex. Adding t to the single unrooted tree on x, y, z: t can attach to any of its 3 edges, giving 3 unrooted trees on x, y, z, t.
–Number of rooted trees on n organisms = TR(n) = TU(n)*E(n) = TU(n+1)
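The recurrence can be checked numerically. This is a small Python sketch; the function names `unrooted_trees` and `rooted_trees` are illustrative, and the base case TU(3) = 1 is taken from the x, y, z example on the slide.

```python
def unrooted_trees(n):
    """TU(n): number of unrooted bifurcating trees on n labeled taxa (n >= 2)."""
    count = 1  # TU(2) = TU(3) = 1
    for i in range(4, n + 1):
        # Taxon i attaches to any of the E(i-1) = 2(i-1)-3 = 2i-5 existing edges.
        count *= 2 * i - 5
    return count


def rooted_trees(n):
    """TR(n) = TU(n) * E(n): a root can be placed on any of the 2n-3 edges."""
    return unrooted_trees(n) * (2 * n - 3)


for n in range(3, 11):
    print(n, unrooted_trees(n), rooted_trees(n))
```

For four taxa this gives 3 unrooted and 15 rooted trees, and for ten taxa there are already 2,027,025 unrooted trees, which is why exhaustive search over topologies quickly becomes infeasible.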

11 Inferring evolutionary relationships between the taxa requires rooting the tree
To root a tree mentally, imagine that the tree is made of string. Grab the string at the root and tug on it until the ends of the string (the taxa) fall opposite the root.
(Figure: the unrooted tree of taxa A, B, C, D with a root position marked, and the resulting rooted tree.)
Note that in this rooted tree, taxon A is no more closely related to taxon B than it is to C or D.

12 Now, try it again with the root at another position
(Figure: the same unrooted tree of taxa A, B, C, D with the root marked at a different position, and the resulting rooted tree.)
Note that in this rooted tree, taxon A is most closely related to taxon B, and together they are equally distantly related to taxa C and D.

13 An unrooted, four-taxon tree theoretically can be rooted in five different places to produce five different rooted trees
(Figure: unrooted tree 1 with root positions numbered 1 to 5, and the corresponding rooted trees 1a, 1b, 1c, 1d and 1e.)
These trees show five different evolutionary relationships among the taxa!
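The five rootings can be enumerated by placing a root on each edge in turn. The sketch below is in Python and assumes unrooted tree 1 joins A and B at one internal node and C and D at the other; the internal node names `u` and `v` and the function name `rootings` are hypothetical.

```python
def rootings(edges):
    """Yield one rooted tree (as nested tuples) per edge of an unrooted tree."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    def subtree(node, parent):
        """The part of the tree hanging from `node` when viewed from `parent`."""
        children = [subtree(n, node) for n in neighbors[node] if n != parent]
        if not children:
            return node  # a terminal node (taxon)
        return tuple(sorted(children, key=str))  # sorted so child order is ignored

    for a, b in edges:
        # Placing the root on edge (a, b) splits the tree into two subtrees.
        yield tuple(sorted((subtree(a, b), subtree(b, a)), key=str))


# Unrooted tree 1: A and B join at internal node u, C and D at internal node v.
tree1_edges = [("A", "u"), ("B", "u"), ("u", "v"), ("C", "v"), ("D", "v")]
rooted = list(rootings(tree1_edges))
for tree in rooted:
    print(tree)
```

A four-taxon unrooted tree has 2*4-3 = 5 edges, so the loop produces exactly five rooted trees, and all five are distinct.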

14 All of these rearrangements show the same evolutionary relationships between the taxa
(Figure: rooted tree 1a redrawn several times with the taxa A, B, C, D in different orders.)

15 Molecular phylogenetic tree building methods
These are mathematical and/or statistical methods for inferring the divergence order of taxa, as well as the lengths of the branches that connect them. There are many phylogenetic methods available today, each having strengths and weaknesses. Most can be classified by COMPUTATIONAL METHOD (optimality criterion or clustering algorithm) and by DATA TYPE (characters or distances):
–Characters, optimality criterion: PARSIMONY, MAXIMUM LIKELIHOOD
–Distances, optimality criterion: MINIMUM EVOLUTION, LEAST SQUARES
–Distances, clustering algorithm: UPGMA, NEIGHBOR-JOINING

16 Parsimony
–Model complexity vs. sample size
–Minimize the Hamming distance summed over all edges of the tree
–Justification: the minimum possible number of evolutionary events
–The subject of serious dispute among systematic biologists

17 Method
–Maximum parsimony (MP)
Seek the tree that minimizes the total number of evolutionary events on the edges of the tree.
Ex. (Figure: candidate trees over the sequences AAA, AAG, AAA, GGA and AGA, with the number of changes marked on each edge.)
Requires two algorithms
–A search over tree topologies
–The computation of a cost for a given tree
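The second of the two algorithms, the cost of a given tree, can be sketched with Fitch's small-parsimony algorithm. The slides do not name an algorithm, so this choice is an assumption. The two four-leaf topologies below are hypothetical and only borrow sequences from the slide; the function names are illustrative.

```python
def fitch(node, site):
    """Return (candidate ancestral states, number of changes) for one site.

    A node is either a leaf (its sequence as a string) or a pair of child nodes.
    """
    if isinstance(node, str):
        return {node[site]}, 0
    left, right = node
    left_states, left_cost = fitch(left, site)
    right_states, right_cost = fitch(right, site)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    # No state in common: at least one change is needed at this node.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_cost(tree):
    """Minimum number of changes over all sites for a fixed rooted binary tree."""
    first_leaf = tree
    while not isinstance(first_leaf, str):
        first_leaf = first_leaf[0]
    return sum(fitch(tree, site)[1] for site in range(len(first_leaf)))


# Hypothetical topologies over four of the slide's sequences.
tree_1 = (("AAG", "AAA"), ("GGA", "AGA"))
tree_2 = (("AAG", "GGA"), ("AAA", "AGA"))
print(parsimony_cost(tree_1), parsimony_cost(tree_2))
```

Under these assumptions the first topology needs 3 changes and the second 4, so a search over topologies would prefer the first.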

18 Maximum likelihood
–Estimate the probability that a specific evolutionary model will produce a particular phylogeny yielding the observed sequences
–There are many evolutionary models

19 Method
–Maximum likelihood (ML)
Seek the tree that maximizes the likelihood P(data|tree).
Ex.
–Compute the likelihood P(x1, x2, x3 | T, t1, t2, t3, t4)
–x: a set of sequences
–T: a tree
–t: edge lengths of the tree
Requires two algorithms
–A search over tree topologies
–A search over all possible edge lengths t to compute the likelihood
(Figure: a rooted tree with nodes x1 to x5 and edge lengths t1 to t4.)
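The search over edge lengths can be illustrated on the smallest possible tree: two sequences joined by a single edge of length t. The Python sketch below assumes the Jukes-Cantor (JC69) substitution model, which the slides do not specify, and uses a plain grid search in place of a numerical optimizer. The sequences and function names are hypothetical.

```python
import math


def jc69_log_likelihood(seq_a, seq_b, t):
    """Log-likelihood of two aligned sequences joined by one edge of length t.

    t is measured in expected substitutions per site (JC69 assumption).
    """
    decay = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * decay  # probability a site shows the same base
    p_diff = 0.25 - 0.25 * decay  # probability of one specific different base
    total = 0.0
    for a, b in zip(seq_a, seq_b):
        # 0.25 is the equilibrium frequency of the base at one end of the edge.
        total += math.log(0.25 * (p_same if a == b else p_diff))
    return total


def best_edge_length(seq_a, seq_b, step=0.001, upper=2.0):
    """Grid search for the edge length that maximizes the likelihood."""
    grid = [step * k for k in range(1, int(upper / step) + 1)]
    return max(grid, key=lambda t: jc69_log_likelihood(seq_a, seq_b, t))


x1 = "ACGTACGTACGTACGTACGT"
x2 = "ACGTACGAACGTACGTACCT"  # differs from x1 at 2 of 20 sites
t_hat = best_edge_length(x1, x2)
print(t_hat)
```

On a full tree the same idea applies to every edge at once, combined with the search over topologies, which is what makes ML computationally expensive.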

20 Distance Matrix Methods
–Produce a tree such that the path distance between leaves i and j (the sum of edge weights on the path between i and j) equals D_ij
–This is the additive property for a distance matrix; of course, real distance matrices may not be additive
–Most methods use agglomerative clustering: successively choosing pairs of nodes to combine
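Whether a matrix is additive can be tested without building a tree, using the four-point condition: for every four leaves, the two largest of the three pairwise sums D_ij + D_kl, D_ik + D_jl, D_il + D_jk must be equal. The slides do not mention this test; it is a standard characterization added here as an illustration. The matrix is the S1 to S4 example used later in the neighbor-joining slides, and the function and variable names are hypothetical.

```python
from itertools import combinations


def is_additive(names, d, tol=1e-9):
    """Four-point condition; d maps frozenset({a, b}) to the distance a-b."""
    def dist(a, b):
        return d[frozenset((a, b))]

    for i, j, k, l in combinations(names, 4):
        sums = sorted([dist(i, j) + dist(k, l),
                       dist(i, k) + dist(j, l),
                       dist(i, l) + dist(j, k)])
        if abs(sums[2] - sums[1]) > tol:  # the two largest sums must tie
            return False
    return True


names = ["S1", "S2", "S3", "S4"]
additive = {
    frozenset(("S1", "S2")): 4, frozenset(("S1", "S3")): 4,
    frozenset(("S1", "S4")): 3, frozenset(("S2", "S3")): 6,
    frozenset(("S2", "S4")): 5, frozenset(("S3", "S4")): 2,
}
# Perturb one entry to mimic a noisy, non-additive measurement.
noisy = dict(additive)
noisy[frozenset(("S2", "S3"))] = 7

print(is_additive(names, additive), is_additive(names, noisy))
```

For the unperturbed matrix the three sums are 6, 9 and 9, so a tree with matching path distances exists; changing a single entry breaks the tie.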

21 Ultrametric trees
–The path distance from the root to each leaf is the same
–Strong molecular clock assumption: distance is proportional to evolutionary time
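A distance matrix fits an ultrametric tree exactly when it passes the three-point condition: for every three leaves, the two largest of the three pairwise distances are equal. This test is not on the slides and is added as an illustration. Both matrices and all names in the Python sketch below are hypothetical.

```python
from itertools import combinations


def is_ultrametric(names, d, tol=1e-9):
    """Three-point condition; d maps frozenset({a, b}) to the distance a-b."""
    for i, j, k in combinations(names, 3):
        smallest, middle, largest = sorted([d[frozenset((i, j))],
                                            d[frozenset((i, k))],
                                            d[frozenset((j, k))]])
        if abs(largest - middle) > tol:  # the two largest distances must tie
            return False
    return True


names = ["A", "B", "C", "D"]
# Distances read off a clock-like tree: A and B split most recently, then C, then D.
clock_like = {
    frozenset(("A", "B")): 2, frozenset(("A", "C")): 6,
    frozenset(("B", "C")): 6, frozenset(("A", "D")): 10,
    frozenset(("B", "D")): 10, frozenset(("C", "D")): 10,
}
# Same matrix with the lineage leading to B evolving faster.
fast_b = dict(clock_like)
fast_b[frozenset(("B", "C"))] = 8

print(is_ultrametric(names, clock_like), is_ultrametric(names, fast_b))
```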

22 Example Tree and Additive Matrix

23 Distance Matrix Methods
–UPGMA
–Neighbor Joining
–Fitch-Margoliash
–Quartet Puzzling
–Witness-Antiwitness
–Double Pivot
Many are “not yet in use by the systematic biology community”.

24 Distance Measures
–DNA hybridization amounts
–Immunological distances
–Genetic distances
–Sequence distances (DNA, RNA, protein)

25 … what distance?
–We need a distance measure that reflects the actual number of point mutations on the path between the leaves
–A particular problem with sequence data: the Hamming distance and its assumption of no reversals
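The gap between observed differences and actual mutations can be made concrete. The Python sketch below computes the Hamming-based proportion of differing sites and then applies the Jukes-Cantor correction, one common way to account for repeated and reverse mutations at the same site. The correction is not named on the slide and is used here as an assumption, as are the function names.

```python
import math


def p_distance(seq_a, seq_b):
    """Hamming distance divided by the alignment length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


def jukes_cantor_distance(seq_a, seq_b):
    """Estimated substitutions per site under the Jukes-Cantor model."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        return math.inf  # saturated: the sequences look unrelated under this model
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


a = "ACGTACGTACGTACGTACGT"
b = "ACTTACGAACGGACGTCCGT"
print(p_distance(a, b), jukes_cantor_distance(a, b))
```

The corrected distance is always at least as large as the observed proportion, and the difference grows as the sequences diverge, which is the effect the slide warns about.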

26 UPGMA Unweighted Pair-Group Method with Arithmetic mean

27 UPGMA Step 1 combine B and C

28 UPGMA step 2 combine BC and D (10+12)/2 (4+6)/2

29 UPGMA step 3 combine A and E a e c b d

30 UPGMA step 4 combine AE and BCD

31 UPGMA Result 3.5

32 UPGMA Result 3.5
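The UPGMA steps can be sketched as a short agglomerative loop in Python. The slides' distance matrix did not survive in the transcript, so the matrix below is hypothetical; it was chosen only so that the merge order matches the slides (B with C, then BC with D, then A with E, then AE with BCD). The function and variable names are illustrative.

```python
from itertools import combinations


def upgma(names, dist):
    """UPGMA clustering; dist maps frozenset({a, b}) to the leaf distance a-b.

    Returns the tree as nested tuples and the list of merges with node heights.
    Cluster labels are made by concatenation, so leaf names must not collide.
    """
    clusters = {name: (name, 1) for name in names}  # label -> (subtree, size)
    d = dict(dist)
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2  # new node sits halfway between a and b
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        merged = a + b
        for other in clusters:
            # Size-weighted mean = arithmetic mean over all leaf pairs.
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = ((tree_a, tree_b), size_a + size_b)
        merges.append((a, b, height))
    (tree, _), = clusters.values()
    return tree, merges


# Hypothetical distances (not recovered from the slides).
pairs = {
    ("A", "B"): 10, ("A", "C"): 12, ("A", "D"): 11, ("A", "E"): 7,
    ("B", "C"): 2, ("B", "D"): 4, ("B", "E"): 12,
    ("C", "D"): 6, ("C", "E"): 12, ("D", "E"): 12,
}
dist = {frozenset(pair): value for pair, value in pairs.items()}
tree, merges = upgma(["A", "B", "C", "D", "E"], dist)
for step in merges:
    print(step)
```

Because each new node is placed at half the distance between the clusters it joins, the result is always an ultrametric tree, whether or not the input distances are clock-like.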

33 Method
Phylogenetic reconstruction techniques
–NJ (neighbor-joining method)
Starting from a star tree, NJ successively inserts a branch between a pair of closest neighbors and the remaining terminals in the tree.
Characteristics
–Among the fastest reconstruction methods
–Poor accuracy when the distance matrix contains large values

34 Method
Ex.
–The cost saved by pairing S1 and S2 = old connection cost (OC) - new connection cost (NC) = 8.67 - 6.33 = 2.34
OC = average(S1) + average(S2) = 8.67
NC = ½(average(S1) + average(S2) + d(S1,S2)) = 6.33
Here average(Si) is the mean distance from Si to the other three sequences.
–The largest cost saving comes from pairing S3 and S4 = 2.67
Thus we pair S3 and S4.
Distance matrix
      S1  S2  S3  S4
S1     0   4   4   3
S2         0   6   5
S3             0   2
S4                 0
(Figure: the star tree joining S1, S2, S3 and S4 at a central node X, and the tree after pairing S1 and S2.)
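The cost-saving calculation can be repeated for every pair. The Python sketch below implements the OC and NC formulas exactly as stated on the slide, not the Q-matrix form found in most neighbor-joining implementations; the function name `pairing_savings` is illustrative.

```python
from itertools import combinations


def pairing_savings(names, d):
    """Cost saved by pairing each two sequences; d maps frozenset({a, b}) to d(a, b)."""
    n = len(names)
    average = {
        a: sum(d[frozenset((a, b))] for b in names if b != a) / (n - 1)
        for a in names
    }
    savings = {}
    for a, b in combinations(names, 2):
        old_cost = average[a] + average[b]                   # OC
        new_cost = 0.5 * (old_cost + d[frozenset((a, b))])   # NC
        savings[(a, b)] = old_cost - new_cost
    return savings


names = ["S1", "S2", "S3", "S4"]
d = {
    frozenset(("S1", "S2")): 4, frozenset(("S1", "S3")): 4,
    frozenset(("S1", "S4")): 3, frozenset(("S2", "S3")): 6,
    frozenset(("S2", "S4")): 5, frozenset(("S3", "S4")): 2,
}
savings = pairing_savings(names, d)
best_pair = max(savings, key=savings.get)
for pair, value in savings.items():
    print(pair, round(value, 2))
```

Without intermediate rounding the saving for S1 and S2 is 2.33 (the slide's 2.34 comes from subtracting the rounded values 8.67 and 6.33), and S3 with S4 gives the largest saving, 2.67.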

35 Neighbor-Joining Result

36 Genome Rearrangement
–Generalized Nadeau-Taylor (GNT) evolution model
P(transposition) = α
P(inverted transposition) = β
P(inversion) = 1-(α+β)
Number of events on an edge: according to a Poisson distribution
f(x) = λ^x e^(-λ) / x!; x = 0, 1, 2, ...
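The two ingredients of the model, the Poisson probability for the number of events and the choice of event type, can be sketched in Python as follows. The parameter values and function names are hypothetical, and the sketch does not apply the rearrangements to an actual gene order.

```python
import math
import random


def poisson_pmf(x, lam):
    """f(x) = lam^x * e^(-lam) / x!, the probability of x events on an edge."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


def sample_event_types(count, alpha, beta, rng):
    """Draw `count` event types with the GNT probabilities alpha, beta, 1-(alpha+beta)."""
    kinds = ["transposition", "inverted transposition", "inversion"]
    weights = [alpha, beta, 1.0 - (alpha + beta)]
    return rng.choices(kinds, weights=weights, k=count)


rng = random.Random(0)  # fixed seed so the run is repeatable
lam = 3.0               # hypothetical expected number of events on an edge
print([round(poisson_pmf(x, lam), 4) for x in range(6)])
print(sample_event_types(5, alpha=0.2, beta=0.3, rng=rng))
```

Setting alpha and beta to zero reduces the model to inversions only.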

37 Improving reconstruction algorithms

38 Improving reconstruction algorithms
–Estimators of true evolutionary distance
Exact-IEBP (inverting the expected breakpoint distance): ML estimate of the breakpoint distance after K rearrangements
Approx-IEBP: approximates Exact-IEBP
EDE (empirically derived estimator): an empirical estimate of the inversion distance after K rearrangements, yielding a nonlinear regression formula that computes the expected distance given K random rearrangements

39 Conclusion
A new generation of phylogenetic software needs
–More sophisticated models of evolution
–Faster optimization algorithms
–High performance algorithm engineering
–Powerful modes of user interaction