Chapter 5 The Evolution Trees.

The Evolution Trees From: Computational Biology by R. C. T. Lee S. J. Shyu Department of Computer Science Ming Chuan University.
Presentation transcript:

Chapter 5 The Evolution Trees

An Evolution Tree
(Figure: an evolution tree whose leaves are siamang, gibbon, orangutan, human, gorilla, and chimpanzee.)

Tree Topology
Rooted trees
Unrooted trees

Properties of an Evolution Tree
Leaf nodes represent species.
In a rooted tree, the degree of each internal node is 3, except for the root, whose degree is 2.
In an unrooted tree, the degree of each internal node is 3.
In a rooted tree, the distances from the root to all leaf nodes are the same.

Distance Matrix and Rooted Tree
(Figure: a distance matrix over species s1 to s5 and the corresponding rooted tree; the values 50, 10, and 30 are shown.)

Distance
d(si, sj): the distance between species si and sj in the distance matrix
dt(si, sj): the distance between species si and sj in an evolution tree
d(si, sj) ≤ dt(si, sj)
Example: s1 = agctccca and s2 = agccccca differ in one position, so d(s1, s2) = 1. If s1 evolved into s2 through the intermediate s'1 = agcaccca, two changes occurred, so dt(s1, s2) = 2.
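The example above can be checked with a minimal sketch. It assumes the matrix distance is the Hamming distance (the number of mismatched positions), which matches the numbers on the slide but is not stated there; the function name hamming and the variable names are illustrative only.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

s1 = "agctccca"
s2 = "agccccca"
s1_prime = "agcaccca"  # intermediate sequence from the slide (assumed to lie on the path s1 -> s2)

# Assumption: d is the Hamming distance between the observed sequences.
d_matrix = hamming(s1, s2)
# Distance along the path through the intermediate sequence.
d_tree = hamming(s1, s1_prime) + hamming(s1_prime, s2)

print(d_matrix, d_tree)
```

Under this assumption the matrix distance can only undercount the changes along the tree path, which is why d(si, sj) ≤ dt(si, sj).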

Number of Unrooted Trees
Number of edges in an unrooted evolution tree: NE(n) = 2n - 3
Number of unrooted evolution trees for n species:
TU(n + 1) = (2n - 3) × TU(n)
TU(n) = (2n - 5) × (2n - 7) × ... × 1

Number of Rooted Trees
TR(n) = (2n - 3) × TU(n) = (2n - 3) × (2n - 5) × (2n - 7) × ... × 1 = TU(n + 1)
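The two counting formulas can be evaluated directly. The sketch below follows the recurrence for unrooted trees and the rooted-tree identity; the function names num_unrooted and num_rooted are illustrative, and n is assumed to be at least 3.

```python
def num_unrooted(n: int) -> int:
    """TU(n): number of unrooted evolution trees on n species (n >= 3)."""
    if n < 3:
        raise ValueError("n must be at least 3")
    count = 1  # TU(3) = 1
    for k in range(3, n):
        # A tree on k species has 2k - 3 edges; species k + 1 can attach to any of them.
        count *= 2 * k - 3
    return count

def num_rooted(n: int) -> int:
    """TR(n): the root can be placed on any of the 2n - 3 edges of an unrooted tree."""
    return (2 * n - 3) * num_unrooted(n)

counts = {n: (num_unrooted(n), num_rooted(n)) for n in range(3, 8)}
print(counts)
```

The rapid growth of these counts is the reason exhaustive search over topologies is impractical beyond a handful of species.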

Different Tree Specifications
Minimax evolution trees: the maximum of (dt(si, sj) - d(si, sj)) over all pairs is minimized.
Minisum evolution trees: the total sum of the tree distances between all pairs of leaf nodes is minimized.
Minisize evolution trees: the total length of the tree is minimized.

Complexities of Evolution Tree Problems
Columns: Minimax, Minisum, Minisize
Unrooted: NP-complete, Unknown
Rooted: O(n^2)

The Rooted Minimax Evolution Tree Algorithm
Step 1: Find the longest distance in the distance matrix: d(s2, s4).
(Figure: species s1, s2, s3, s4 with pairwise distances 2, 3, 3.1, 3.6, 5, and 1; the longest distance, 5, is d(s2, s4).)

Step 2: Construct a minimal spanning tree.

Step 3: Break the longest edge in the path connecting s2 and s4.

Step 4: Construct rooted subtrees recursively.

Step 5: Combine the two subtrees. The distance of each leaf to the root is d(s2, s4)/2. That is, dt(s2, s4) = d(s2, s4).
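The five steps can be sketched as a short recursive procedure. This is a minimal illustration, not the authors' implementation: it recomputes the minimal spanning tree at every level for simplicity (so it is slower than O(n^2)), all function names are hypothetical, and the input is an assumed four-species example (the distances used in the UPGMA example of this chapter), not the figure for this algorithm.

```python
from itertools import combinations

def w(d, u, v):
    """Distance between two species, looked up by unordered pair."""
    return d[frozenset((u, v))]

def mst_edges(nodes, d):
    """Prim's algorithm on the complete graph over `nodes`."""
    in_tree, edges = [nodes[0]], []
    while len(in_tree) < len(nodes):
        u, v = min(((a, b) for a in in_tree for b in nodes if b not in in_tree),
                   key=lambda e: w(d, *e))
        in_tree.append(v)
        edges.append((u, v))
    return edges

def path_edges(edges, src, dst):
    """Edges on the unique src-to-dst path of a tree (iterative DFS)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    stack, seen = [(src, [])], {src}
    while stack:
        node, path = stack.pop()
        if node == dst:
            return path
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, path + [(node, nxt)]))
    raise ValueError("dst is not reachable from src")

def minimax_rooted(nodes, d):
    """Return a leaf name, or a tuple (root_height, subtree_a, subtree_b)."""
    if len(nodes) == 1:
        return nodes[0]
    a, b = max(combinations(nodes, 2), key=lambda p: w(d, *p))   # Step 1
    edges = mst_edges(nodes, d)                                   # Step 2
    cut = max(path_edges(edges, a, b), key=lambda e: w(d, *e))    # Step 3
    kept = [e for e in edges if set(e) != set(cut)]
    side = {a}                     # nodes still connected to a after the cut
    changed = True
    while changed:
        changed = False
        for u, v in kept:
            if (u in side) != (v in side):
                side |= {u, v}
                changed = True
    left = [n for n in nodes if n in side]
    right = [n for n in nodes if n not in side]
    # Steps 4 and 5: recurse, then join under a root at height d(a, b) / 2.
    return (w(d, a, b) / 2, minimax_rooted(left, d), minimax_rooted(right, d))

# Assumed example input, not the figure's edge weights.
pairs = {("s1", "s2"): 4, ("s1", "s3"): 4, ("s1", "s4"): 3,
         ("s2", "s3"): 6, ("s2", "s4"): 5, ("s3", "s4"): 2}
d = {frozenset(p): v for p, v in pairs.items()}
tree = minimax_rooted(["s1", "s2", "s3", "s4"], d)
print(tree)
```

Each returned tuple stores the height of its root, so two leaves separated at a root of height h are at tree distance 2h, which equals the largest matrix distance within that subtree.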

Weights Determination for a Tree with a Given Topology Suppose we want to construct a minisize unrooted evolution tree. Suppose the following is the best tree topology. We can determine the weights with the linear programming approach.

Suppose we want to construct a minisize rooted evolution tree. Suppose the following is the best tree topology.

UPGMA for Rooted Evolution Trees
UPGMA: unweighted pair group method with arithmetic mean.
Finds a rooted evolution tree topology for a given distance matrix.
A greedy, heuristic method.

UPGMA
Step 1: Select the pair of species with the smallest distance: (s3, s4).
      s1   s2   s3   s4
s1    0    4    4    3
s2    4    0    6    5
s3    4    6    0    2
s4    3    5    2    0

Step 2: Consider (s3, s4) as a new species.
d(s1, (s3, s4)) = (d(s1, s3) + d(s1, s4))/2 = (4 + 3)/2 = 3.5
d(s2, (s3, s4)) = (d(s2, s3) + d(s2, s4))/2 = (6 + 5)/2 = 5.5
d(s1, s2) = 4
            s1    s2    (s3, s4)
s1          0     4     3.5
s2          4     0     5.5
(s3, s4)    3.5   5.5   0

(Repeat Steps 1 and 2) Select the pair of species with the smallest distance: (s1, (s3, s4)), at distance 3.5.

Obtain the final evolution tree. Then use a linear programming technique to produce an evolution tree for a given criterion.
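The merging loop of UPGMA can be sketched as follows, using the pairwise distances from the worked example, with d(s3, s4) = 2 as the smallest entry. The sketch weights each average by cluster size, which is the usual UPGMA rule and an assumption here: the example only shows the first merge, where both clusters have size 1 and the rule reduces to the plain average. The function name upgma and the tuple representation of clusters are illustrative.

```python
from itertools import combinations

def upgma(names, dist):
    """Return the list of merges as ((cluster_a, cluster_b), distance) pairs."""
    d = {frozenset(p): v for p, v in dist.items()}
    clusters = {name: 1 for name in names}  # cluster -> number of leaves
    merges = []
    while len(clusters) > 1:
        # Step 1: pick the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        merges.append((merged, d[frozenset((a, b))]))
        na, nb = clusters.pop(a), clusters.pop(b)
        # Step 2: treat the merged pair as a new species (size-weighted mean).
        for c in clusters:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        clusters[merged] = na + nb
    return merges

dist = {("s1", "s2"): 4, ("s1", "s3"): 4, ("s1", "s4"): 3,
        ("s2", "s3"): 6, ("s2", "s4"): 5, ("s3", "s4"): 2}
merges = upgma(["s1", "s2", "s3", "s4"], dist)
for pair, value in merges:
    print(pair, value)
```

The nested tuples in the last merge give the rooted topology; edge weights are left to the linear programming step.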

The Neighbor Joining Method for Unrooted Evolution Trees
Finds an unrooted evolution tree topology for a given distance matrix.
A greedy, heuristic method.

Neighbor Joining Method
Step 1: Construct a 1-star: create an internal node x.
Distance matrix: d(s1, s2) = 4, d(s1, s3) = 4, d(s1, s4) = 3, d(s2, s3) = 6, d(s2, s4) = 5, d(s3, s4) = 2.

Step 2: Find a good pair to put in the same branch.
Step 2.1: Try to select a pair of species (s1, s2) and insert an internal node x1.
Step 2.2: Formulate the following equations:

Step 2.3 Calculate the new connection cost NC. Step 2.4: Calculate the weights of the edges.

(Repeat Step 2.1) Try to select another pair of species (s1, s3) and insert an internal node x1.
(Repeat Steps 2.2 through 2.4) Recalculate the weights of the edges.

Step 2.5: Calculate the saved cost of each pair.
The cost saved by pairing s1 with s2:
Old cost OC = average(s1) + average(s2) = 5 + 3.67 = 8.67
Cost saved = OC - NC
The cost saved by the other pairs: (s1, s3) = 1.835, (s1, s4) = 2, (s2, s3) = 1.5, (s2, s4) = 1.67, (s3, s4) = 2.67
Step 2.6: Pair (s3, s4) has the maximum cost saving.

Step 3: Put s3 and s4 in the same branch and insert an internal node.
Repeat Steps 2 and 3 until the degree of x is 3.
The final tree structure: (figure)
After the tree topology has been found, we can apply linear programming to find the final distance of each edge.
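For comparison, the pair selection can also be sketched with the standard neighbor-joining criterion of Saitou and Nei, Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of distances from species i to all others. This is a different formulation from the cost-saving computation in the steps above and is swapped in here only as an illustration; the function name nj_q_values is hypothetical, and the distances are those of the four-species example.

```python
from itertools import combinations

def nj_q_values(names, dist):
    """Q-criterion of standard neighbor joining for every pair of species."""
    d = {frozenset(p): v for p, v in dist.items()}
    n = len(names)
    # r[i]: total distance from species i to all other species.
    r = {i: sum(d[frozenset((i, j))] for j in names if j != i) for i in names}
    return {(i, j): (n - 2) * d[frozenset((i, j))] - r[i] - r[j]
            for i, j in combinations(names, 2)}

names = ["s1", "s2", "s3", "s4"]
dist = {("s1", "s2"): 4, ("s1", "s3"): 4, ("s1", "s4"): 3,
        ("s2", "s3"): 6, ("s2", "s4"): 5, ("s3", "s4"): 2}
q = nj_q_values(names, dist)
best = min(q.values())
candidates = [pair for pair, value in q.items() if value == best]
print(q)
print(candidates)
```

With four species, joining (s3, s4) and joining (s1, s2) describe the same unrooted split, so both pairs attain the minimum Q value; this agrees with the choice of (s3, s4) in Step 2.6.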

An Approximation Algorithm for an Unrooted Minisize Evolution Tree
Finds an unrooted evolution tree for a given distance matrix.
The algorithm is based on the minimal spanning tree.
The size of the approximate solution is at most twice the size of an optimal solution.

Step 1: Construct a minimal spanning tree.
Step 2: Find a BFS (breadth-first search) order, with any node as the root: s4, s3, s1, s2. (See the BFS example on the next slide.)
Distance matrix: d(s1, s2) = 4, d(s1, s3) = 4, d(s1, s4) = 3, d(s2, s3) = 6, d(s2, s4) = 5, d(s3, s4) = 2.

Breadth First Search
BFS order with e as the root: e, b, g, j, f, a, c, d, h, i
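Steps 1 and 2 of the approximation algorithm can be sketched on the four-species example. The sketch uses Prim's algorithm for the minimal spanning tree and visits the neighbors of each node in order of increasing distance; that tie-breaking rule is an assumption chosen so the order is deterministic, and the function names prim_mst and bfs_order are illustrative.

```python
from collections import deque

def prim_mst(names, d):
    """Prim's algorithm; returns (adjacency lists, total length) of the MST."""
    in_tree = [names[0]]
    adj = {name: [] for name in names}
    total = 0
    while len(in_tree) < len(names):
        u, v = min(((a, b) for a in in_tree for b in names if b not in in_tree),
                   key=lambda e: d[frozenset(e)])
        adj[u].append(v)
        adj[v].append(u)
        total += d[frozenset((u, v))]
        in_tree.append(v)
    return adj, total

def bfs_order(adj, root, d):
    """Breadth-first order from `root`, nearer neighbors first (assumed rule)."""
    order, queue, seen = [], deque([root]), {root}
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in sorted(adj[node], key=lambda n: d[frozenset((node, n))]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

pairs = {("s1", "s2"): 4, ("s1", "s3"): 4, ("s1", "s4"): 3,
         ("s2", "s3"): 6, ("s2", "s4"): 5, ("s3", "s4"): 2}
d = {frozenset(p): v for p, v in pairs.items()}
adj, mst_length = prim_mst(["s1", "s2", "s3", "s4"], d)
order = bfs_order(adj, "s4", d)
print(mst_length, order)
```

The resulting order is the sequence in which Step 3 adds species to the evolution tree.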

Approximation Algorithm (Cont.)
Step 3: Add the nodes one by one in the BFS order: s4, s3, s1, s2.

An unrooted evolution tree transformed from the minimal spanning tree (BFS order: s4, s3, s1, s2).

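Proof of the Approximation Ratio
The total length of this unrooted evolution tree is at most twice the length of an optimal unrooted minisize evolution tree (approximation ratio = 2).
APP = |MST| < |TSP|
Here APP is the length of the approximate tree, MST is a minimal spanning tree, and TSP is an optimal traveling salesperson tour of the species. Deleting one edge from a tour leaves a spanning path, which is a spanning tree, so |MST| < |TSP|.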

Original (optimal) evolution tree, of length |OPT|
Duplicate every edge in the tree; then there exists an Euler cycle.
|ET| = total cost of the Euler cycle = 2|OPT|
|TSP| ≤ |ET| = 2|OPT|
APP = |MST| < |TSP|
Therefore APP < 2|OPT|.