Brandon Andrews CS6030. • What is a phylogenetic tree? • Goals in a phylogenetic tree generator • Distance based method • Fitch-Margoliash Method Example.

Presentation transcript:

Brandon Andrews CS6030

• What is a phylogenetic tree?
• Goals in a phylogenetic tree generator
• Distance based method
• Fitch-Margoliash Method Example
• Verification
• Demo

• Also known as an evolutionary tree
• Attempts to map the genetic similarity of organisms into a tree where longer branches indicate more dissimilarity

[Figure: example trees over the leaves A, B, and C. Captions: "B and C are similar"; "A and B are more similar than A and C, which have a longer distance"]

• Given the sequences and their calculated or known dissimilarity, construct a tree which correctly maps this data
• Naïve method: generate every possible tree and grade its quality
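To give a feel for why the naïve method does not scale, here is a small Python sketch that counts the candidate trees. It relies on the standard combinatorial result that there are (2n - 5)!! unrooted and (2n - 3)!! rooted binary trees on n labeled leaves; that formula and the function names are not from the slides.

```python
def double_factorial(k: int) -> int:
    """Product k * (k - 2) * (k - 4) * ... down to 1 (1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n_leaves: int) -> int:
    """Number of unrooted binary trees on n labeled leaves: (2n - 5)!!."""
    if n_leaves < 3:
        return 1
    return double_factorial(2 * n_leaves - 5)


def count_rooted_trees(n_leaves: int) -> int:
    """Number of rooted binary trees on n labeled leaves: (2n - 3)!!."""
    if n_leaves < 2:
        return 1
    return double_factorial(2 * n_leaves - 3)


# Print how quickly the search space grows with the number of sequences.
for n in (5, 10, 20):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

Even the five sequences used in the example later in these slides already allow 15 unrooted trees, and the count grows faster than exponentially from there.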

• Take a distance matrix that stores the distance from every sequence to every other sequence
• Construct a tree which preserves these distances
  Most methods do not preserve the distances exactly

• Clustering algorithm that works bottom-up to create an unrooted tree
• Weights are used to help lower the error rate for long paths

• Calculate a distance matrix
  Hamming distance can be used, but a better dissimilarity function is advised

     A    B    C    D    E
A    …    …    …    …    …
B    0    0    …    …    43
C    …    …    …    …    …
D    …    …    …    …    …
E    0    0    0    0    0
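A minimal Python sketch of the Hamming-distance option, assuming the sequences are already aligned to the same length. The sequences and function names are hypothetical, and the matrix is filled only above the diagonal to mirror the layout used in these slides.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs: list[str]) -> list[list[int]]:
    """Upper-triangular matrix of pairwise Hamming distances."""
    n = len(seqs)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = hamming(seqs[i], seqs[j])
    return matrix


# Hypothetical toy sequences, only for illustration.
sequences = ["ACGT", "ACGA", "TCGA"]
matrix = distance_matrix(sequences)
for row in matrix:
    print(row)
```

A dissimilarity function that accounts for substitution types or multiple substitutions at one site would slot in where `hamming` is called.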

• Add all the sequences to an array of nodes and mark them as leaves
• Select the closest nodes by scanning the distance matrix
• Those two nodes, in our example D and E, will make up two of the branches in a 3-branch calculation to find the branch lengths

[Figure: three branches d, e, and abc meeting at one point and leading to D, E, and the composite node A, B, C]

dist(ABC, D) is the average distance from ABC to D
dist(ABC, E) is the average distance from ABC to E

d = (dist(D, E) + (dist(ABC, D) - dist(ABC, E))) / 2;
e = dist(D, E) - d;
abc = dist(ABC, D) - d;

• dist(ABC, D) and dist(ABC, E)
  Calculate by taking the distance from each of the elements A, B, and C and averaging them

d = (10 + (32.6… - 34.6…)) / 2 = 4
e = 10 - 4 = 6
abc = 32.6… - 4 = 28.6…

       ABC    D        E
ABC    0      32.6…    34.6…
D      0      0        10
E      0      0        0
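The 3-branch calculation can be sketched in Python as follows. The pairwise distances in `DIST` are an assumption: the slide's own table did not survive, so these values were chosen to reproduce the averages quoted above (32.6…, 34.6…, and dist(D, E) = 10). The function names are hypothetical as well.

```python
# ASSUMPTION: illustrative pairwise distances, not recovered from the slides.
# They reproduce the averages quoted in the slides for this step.
DIST = {
    ("A", "B"): 22, ("A", "C"): 39, ("A", "D"): 39, ("A", "E"): 41,
    ("B", "C"): 41, ("B", "D"): 41, ("B", "E"): 43,
    ("C", "D"): 18, ("C", "E"): 20,
    ("D", "E"): 10,
}


def dist(x, y):
    """Look up a pairwise distance regardless of argument order."""
    return DIST[(x, y)] if (x, y) in DIST else DIST[(y, x)]


def group_dist(group, leaf):
    """Average distance from every member of `group` to `leaf`."""
    return sum(dist(member, leaf) for member in group) / len(group)


def three_branch(x, y, rest):
    """Branch lengths for x, y, and the composite node made of `rest`."""
    d_xy = dist(x, y)
    d_rest_x = group_dist(rest, x)
    d_rest_y = group_dist(rest, y)
    branch_x = (d_xy + (d_rest_x - d_rest_y)) / 2
    branch_y = d_xy - branch_x
    branch_rest = d_rest_x - branch_x
    return branch_x, branch_y, branch_rest


d, e, abc = three_branch("D", "E", ["A", "B", "C"])
print(round(d, 2), round(e, 2), round(abc, 2))
```

Under the assumed matrix this gives the same d, e, and abc as the worked numbers above, up to floating-point rounding.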

• Now we can create a new node with distance 28.6… and set D and E to their respective distances
• Since D and E are leaves, their distances are kept. However, if they were not leaves, the average of the child distances would be subtracted, as seen later

[Figure: D and E joined at a new node, which connects to the composite node A, B, C; branch labels …]

• The final step in this iteration is to recalculate the nodes and distance matrix
  The nodes array has the new merged node DE appended to the end, and D and E are removed
  The distance matrix is updated with DE merged, and D and E are removed:

      A    B    C    DE
A     …    …    …    …
B     …    …    …    …
C     0    0    0    19
DE    0    0    0    0
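One way to express this update in Python is sketched below. Two things here are assumptions rather than statements from the slides: the starting distances (chosen to be consistent with the values the slides quote, such as dist(C, DE) = 19), and the rule that the distance from any remaining node to the merged node is the plain average of its distances to the two merged nodes. The helper names are hypothetical.

```python
from itertools import combinations


def key(a, b):
    """Order-independent dictionary key for a pair of node labels."""
    return frozenset((a, b))


def merge_pair(dist, nodes, x, y):
    """Merge nodes x and y; return the updated distances and node list."""
    merged = x + y
    remaining = [n for n in nodes if n not in (x, y)]
    # Distances between untouched nodes carry over unchanged.
    new_dist = {key(a, b): dist[key(a, b)] for a, b in combinations(remaining, 2)}
    # ASSUMED update rule: average the distances to the two merged nodes.
    for n in remaining:
        new_dist[key(n, merged)] = (dist[key(n, x)] + dist[key(n, y)]) / 2
    # The merged node is appended to the end, as described in the slides.
    return new_dist, remaining + [merged]


# ASSUMPTION: illustrative pairwise distances, not recovered from the slides.
pairs = {
    ("A", "B"): 22, ("A", "C"): 39, ("A", "D"): 39, ("A", "E"): 41,
    ("B", "C"): 41, ("B", "D"): 41, ("B", "E"): 43,
    ("C", "D"): 18, ("C", "E"): 20, ("D", "E"): 10,
}
dist = {key(a, b): v for (a, b), v in pairs.items()}
nodes = list("ABCDE")

dist, nodes = merge_pair(dist, nodes, "D", "E")
print(nodes, dist[key("C", "DE")])
```

Using string concatenation for the merged label keeps the names readable (`"D" + "E"` becomes `"DE"`), which matches how the slides label merged nodes.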

• Look at the new distance matrix and find the closest pair, C and DE
• Now there is a special step.
  C is a leaf, so it gets the calculated distance
  DE is not a leaf, so we need to subtract from DE the average child distance

[Figure: three branches c, de, and ab meeting at one point and leading to C, DE, and the composite node A, B]

dist(AB, C) is the average distance from AB to C
dist(AB, DE) is the average distance from AB to DE

c = (dist(C, DE) + (dist(AB, C) - dist(AB, DE))) / 2;
de = dist(C, DE) - c;
ab = dist(AB, C) - c;

• Merging A and B to calculate the average distances dist(AB, C) and dist(AB, DE)

      AB    C     DE
AB    0     40    41
C     0     0     19
DE    0     0     0

• Average child distance example
  Recursively take the average of each branch

((5 + ((2 + (4 + 6) / 2) + 3) / 2) + 1) / 2 = 5.5
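The recursion can be sketched in Python under one reading of the expression above: a leaf contributes 0, and an internal node contributes the mean, over its children, of the child's branch length plus the child's own average distance. The `Node` class and the function name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    length: float = 0.0  # length of the branch joining this node to its parent
    children: list = field(default_factory=list)


def average_distance(node: Node) -> float:
    """Average distance from `node` down to the leaves beneath it."""
    if not node.children:
        return 0.0
    total = sum(child.length + average_distance(child) for child in node.children)
    return total / len(node.children)


# The tree implied by the example expression on the slide.
example = Node(children=[
    Node(5, [Node(2, [Node(4), Node(6)]), Node(3)]),
    Node(1),
])

# The CDE subtree from the worked example: DE (5) over D (4) and E (6), plus C (9).
cde = Node(children=[Node(5, [Node(4), Node(6)]), Node(9)])

print(average_distance(example), average_distance(cde))
```

This averages per child rather than per leaf, so a deep subtree and a single leaf carry equal weight at each junction, which is what the nested expression on the slide does.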

• So for DE, which has two child nodes, we need to subtract the average of the children. Since DE has two leaf nodes, we perform:
  (4 + 6) / 2 = 5
• So now we calculate c, de, and ab:
  c = (dist(C, DE) + (dist(AB, C) - dist(AB, DE))) / 2 = (19 + (40 - 41)) / 2 = 9
  de = dist(C, DE) - c - AverageDistance(DE) = 19 - 9 - (4 + 6) / 2 = 5
  ab = dist(AB, C) - c = 40 - 9 = 31
• Notice that the distance at de replaces whatever was previously there

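• With the new node added:

[Figure: C joined to the DE node, with D (4) and E (6) beneath it, all connected to the composite node A, B]

• Recalculated distance matrix:

       A    B    CDE
A      …    …    …
B      …    …    …
CDE    0    0    0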

• As before, choose the next closest nodes by looking at the distance matrix
  A and B are chosen
  Now a and b can be calculated since they are leaves, but notice we're linking two trees at cde, so we need a special step to subtract the average distance

[Figure: three branches a, b, and cde meeting at one point and leading to A, B, and CDE]

dist(CDE, A) is the average distance from CDE to A
dist(CDE, B) is the average distance from CDE to B

a = (dist(A, B) + (dist(CDE, A) - dist(CDE, B))) / 2 = 10
b = dist(A, B) - a = 12
cde = dist(CDE, A) - a = 29.5

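• So AverageDistance(CDE) = ((5 + (4 + 6) / 2) + 9) / 2 = 9.5
  cde = 29.5 - 9.5 = 20

[Figure: the subtree with C (9), DE (5), D (4), and E (6) is joined to A and B through the cde branch, whose length goes from 29.5 to 20 once the average distance is subtracted]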

• So we have a completely defined unrooted tree. How do we root it?
  Just take the last branch and divide it by two

[Figure: the rooted tree, with labels C 9, 5, D 4, E 6, A, B, and 10]

• Original:

     A    B    C    D    E
A    …    …    …    …    …
B    0    0    …    …    43
C    …    …    …    …    …
D    …    …    …    …    …
E    0    0    0    0    0

• From the generated tree:

     A    B    C    D    E
A    …    …    …    …    …
B    0    0    …    …    43
C    …    …    …    …    …
D    …    …    …    …    …
E    0    0    0    0    0

• Exact match
  Rare to happen
  Usually off by a small amount
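The verification step can be sketched in Python by summing branch lengths along the path between each pair of leaves. The branch lengths below are the ones computed in the slides (a = 10, b = 12, cde = 20, c = 9, de = 5, d = 4, e = 6). The internal node names and the `EXPECTED` matrix are assumptions; the matrix is an illustrative one consistent with the values the slides quote, since the original table was not recoverable.

```python
from collections import defaultdict
from itertools import combinations

# Branch lengths from the worked example; "ab", "cde", and "de" are
# hypothetical names for the internal junctions of the unrooted tree.
EDGES = [
    ("A", "ab", 10), ("B", "ab", 12),
    ("ab", "cde", 20),
    ("C", "cde", 9), ("de", "cde", 5),
    ("D", "de", 4), ("E", "de", 6),
]

# ASSUMPTION: illustrative input distances, not recovered from the slides.
EXPECTED = {
    ("A", "B"): 22, ("A", "C"): 39, ("A", "D"): 39, ("A", "E"): 41,
    ("B", "C"): 41, ("B", "D"): 41, ("B", "E"): 43,
    ("C", "D"): 18, ("C", "E"): 20, ("D", "E"): 10,
}


def build_graph(edges):
    """Undirected adjacency map: node -> {neighbor: branch length}."""
    graph = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = w
        graph[v][u] = w
    return graph


def path_length(graph, start, goal):
    """Length of the unique path between two nodes of a tree."""
    stack = [(start, None, 0)]
    while stack:
        node, parent, total = stack.pop()
        if node == goal:
            return total
        for neighbor, weight in graph[node].items():
            if neighbor != parent:
                stack.append((neighbor, node, total + weight))
    raise ValueError(f"no path from {start} to {goal}")


graph = build_graph(EDGES)
mismatches = [
    (x, y) for x, y in combinations("ABCDE", 2)
    if path_length(graph, x, y) != EXPECTED[(x, y)]
]
print("exact match" if not mismatches else mismatches)
```

With real data the comparison would normally use a tolerance or an error score rather than strict equality, since an exact match is rare.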

• Distance-based methods such as the Fitch-Margoliash method quickly produce accurate trees when given an accurate distance matrix

Bacardit, J., Krasnogor, N. Phylogenetic Trees [PPT document]. Retrieved from
Louhisuo, K. (2004, May 4). Constructing phylogenetic trees with UPGMA and Fitch-Margoliash. Retrieved from