Lecture 6A – Introduction to Trees & Optimality Criteria

Branches: n taxa -> 2n-3 branches
  1, 2, 4, 6, & 7 are external branches (leaves)
  3 & 5 are internal branches (edges)

Nodes:
  A – E are terminals
  x, y, & z are internal nodes (vertices)

Newick format: ((A,B),C,(D,E)).

If we break branch 3, we have two sub-trees: (A,B) and (C,(D,E)).
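As a rough illustration, the Newick string can be mirrored as nested Python tuples and the effect of breaking a branch read off as two sets of terminals. The helper names `leaves`, `split_on`, and `count_branches` are hypothetical, and the sketch assumes the tuple nesting matches the Newick parentheses exactly.

```python
def leaves(node):
    """Return the set of terminal labels below a node (a str is a terminal)."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def split_on(tree, subtree):
    """Break the branch above `subtree`; return the two resulting leaf sets."""
    inside = leaves(subtree)
    return inside, leaves(tree) - inside


def count_branches(node):
    """Count branches: one per child of every internal node."""
    if isinstance(node, str):
        return 0
    return len(node) + sum(count_branches(child) for child in node)


# Nested tuples mirroring the Newick string ((A,B),C,(D,E)).
tree = (("A", "B"), "C", ("D", "E"))

print(split_on(tree, tree[0]))   # the two sides of the branch above (A,B)
print(count_branches(tree))      # total number of branches
```

Breaking the branch above (A,B) separates terminals A and B from C, D, and E, and the branch count for these five taxa is 7, in line with 2n-3.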

Rooting – The tree is an unrooted tree.

Also note that there is free rotation around nodes: swapping the subtrees at a node changes how the tree is drawn, not the tree itself.

Growth of tree space.

The Scope of the Problem

Taxa      Unrooted trees
3         1
4         3
5         15
6         105
7         945
8         10,395
9         135,135
10        2.027 × 10^6
22        3 × 10^23
50        3 × 10^74
100       2 × 10^182
1000      2 × 10^2,860
10 mil    5 × 10^68,667,340
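The counts in this table follow from the fact that the k-th taxon can be attached to any of the 2k-5 branches of a tree with k-1 taxa, so the number of unrooted bifurcating trees is the product 3 × 5 × ... × (2n-5). A minimal sketch, with `num_unrooted_trees` as a hypothetical function name:

```python
def num_unrooted_trees(n_taxa: int) -> int:
    """Count unrooted, fully bifurcating trees on n labeled taxa."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # A tree with k-1 taxa has 2(k-1)-3 = 2k-5 branches, and the k-th
    # taxon can be attached to any one of them.
    for k in range(4, n_taxa + 1):
        count *= 2 * k - 5
    return count


for n in range(3, 11):
    print(n, num_unrooted_trees(n))
```

Python integers have arbitrary precision, so the same function can be used for the larger rows, although printing the result for very large n becomes impractical.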

II. Optimality Criteria

A. Parsimony

First, the score of a tree (i.e., its length) for the entire data set is given by:

$$L(t) = \sum_{i} w_i \, l_i$$

where $l_i$ is the length of character i when optimized on tree t, and $w_i$ is the weight we assign to character i.
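The weighted sum can be sketched in a few lines. The function name `tree_length` and the example lengths and weights are made up for illustration; equal weights of 1 are assumed when none are given.

```python
def tree_length(char_lengths, weights=None):
    """Sum per-character lengths l_i, each multiplied by its weight w_i."""
    if weights is None:
        weights = [1] * len(char_lengths)  # assumption: equal weighting
    if len(weights) != len(char_lengths):
        raise ValueError("one weight is needed per character")
    return sum(w * l for w, l in zip(weights, char_lengths))


# Hypothetical lengths for three characters on one tree.
print(tree_length([2, 1, 3]))             # equal weights
print(tree_length([2, 1, 3], [1, 2, 0]))  # third character excluded
```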

The Fitch Algorithm (1971): state sets and accumulated lengths. (Unordered states with equal transformation costs.)

We erect a state set at each terminal node and assign an accumulated length of zero to terminal nodes. The accumulated length at a node is the minimum number of changes in the subtree descending from that node.

For each internal node, working from the terminals toward the root:

1 – Form the intersection of the state sets of the two daughter nodes. If the intersection is non-empty, assign it as the state set of the internal node. The accumulated length of the internal node is the sum of those of the daughter nodes.
2 – If the intersection is empty, assign the union of the two daughter state sets to the internal node. The accumulated length is the sum of those of the daughter nodes plus one.

Worked example (three internal nodes):
  Empty intersection, take the union: 0 + 0 + 1 = 1
  Non-empty intersection: 0 + 0 + 0 = 0
  Empty intersection, take the union: 1 + 0 + 1 = 2

So $l_i$ = 2.
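The two rules can be sketched as a short recursive function. This is an illustration, not the lecture's code: the tree is written as nested tuples, the name `fitch` is hypothetical, and the terminal states G, A, C, C are an assumption chosen to reproduce the three accumulated lengths in the worked example.

```python
def fitch(node):
    """Return (state_set, accumulated_length) for a rooted binary tree.

    A terminal is a one-letter string; an internal node is a (left, right) tuple.
    """
    if isinstance(node, str):
        return {node}, 0  # terminals start with length zero
    left_set, left_len = fitch(node[0])
    right_set, right_len = fitch(node[1])
    common = left_set & right_set
    if common:
        # Rule 1: non-empty intersection, no extra change needed.
        return common, left_len + right_len
    # Rule 2: empty intersection, take the union and add one change.
    return left_set | right_set, left_len + right_len + 1


# Assumed terminal states for a four-taxon tree.
state_set, length = fitch((("G", "A"), ("C", "C")))
print(sorted(state_set), length)
```

With these terminal states the three internal nodes receive lengths 1, 0, and 2, so the character has length 2 on this tree.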

Sankoff Algorithm – Character-state vectors and step matrices.

Step matrix – define $c_{i,j}$, the cost of a change between states i and j (diagonal entries, shown as --, cost 0):

      A   C   G   T
  A   --  4   1   4
  C   4   --  4   1
  G   1   4   --  4
  T   4   1   4   --

Step one: Fill in the character-state vectors for terminal nodes. Each cell holds $s_k(i)$, the cost of having state i at node k.

Step two: Fill in vectors for other nodes, descending the tree.

Node 1 (k = 1):
  $s_1(A) = c_{AG} + c_{AA} = 1 + 0 = 1$
  $s_1(C) = c_{CG} + c_{CA} = 4 + 4 = 8$
  $s_1(G) = c_{GG} + c_{GA} = 0 + 1 = 1$
  $s_1(T) = c_{TG} + c_{TA} = 4 + 4 = 8$

Node 2 (k = 2):
  $s_2(A) = 4 + 4 = 8$
  $s_2(C) = 0 + 0 = 0$
  $s_2(G) = 4 + 4 = 8$
  $s_2(T) = 1 + 1 = 2$

For nodes below, we must calculate the cost for each possible state assignment for daughter nodes. In each bracket, the first term, $s_k(j)$, comes from the daughter node's vector and the second, $c_{ij}$, from the step matrix; the minimum is taken over the daughter's states j = A, C, G, T.

$s_3(A) = \min_j[s_1(j) + c_{Aj}] + \min_j[s_2(j) + c_{Aj}] = \min[1,12,2,12] + \min[8,4,9,6] = 1 + 4 = 5$
$s_3(C) = \min_j[s_1(j) + c_{Cj}] + \min_j[s_2(j) + c_{Cj}] = \min[5,8,5,9] + \min[12,0,12,3] = 5 + 0 = 5$
$s_3(G) = \min_j[s_1(j) + c_{Gj}] + \min_j[s_2(j) + c_{Gj}] = \min[2,12,1,12] + \min[9,4,8,6] = 1 + 4 = 5$
$s_3(T) = \min_j[s_1(j) + c_{Tj}] + \min_j[s_2(j) + c_{Tj}] = \min[5,9,5,8] + \min[12,1,12,2] = 5 + 1 = 6$

So we fill in the character-state vector for node 3:

  A   C   G   T
  5   5   5   6

So, $l_i$ = 5, the smallest cost in the vector for node 3.

Points to note:

1) Two types of weighting are possible: weighting of transformations within characters (which we demonstrated with the step matrix) and weighting among characters, which is reflected in the weighted sum of lengths across characters ($w_i$).
2) One can’t compare tree lengths across weighting schemes. In the first example, with all transformations having the same cost, the length of the character on this tree was 2. In the second, with a 4:1 step matrix that up-weights transversions, the length was 5.
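The Sankoff recursion and the second point can both be checked with a small sketch. It is an illustration under stated assumptions: the names `step_matrix` and `sankoff` are hypothetical, the terminal states G, A, C, C are inferred from the worked arithmetic, and terminals use the usual convention of cost 0 for the observed state and infinite cost for every other state, which the slides do not spell out.

```python
INF = float("inf")
STATES = "ACGT"
PURINES = {"A", "G"}


def step_matrix(transition_cost, transversion_cost):
    """Build c[i][j]: 0 on the diagonal, then transition or transversion cost."""
    def cost(a, b):
        if a == b:
            return 0
        same_class = (a in PURINES) == (b in PURINES)
        return transition_cost if same_class else transversion_cost
    return {a: {b: cost(a, b) for b in STATES} for a in STATES}


def sankoff(node, c):
    """Return the character-state vector {state: cost} for a node.

    A terminal is a one-letter string; an internal node is a tuple of daughters.
    """
    if isinstance(node, str):
        # Assumed convention: only the observed state has finite cost.
        return {s: (0 if s == node else INF) for s in STATES}
    daughters = [sankoff(child, c) for child in node]
    # For each state i, add up the cheapest way to reach each daughter.
    return {
        i: sum(min(vec[j] + c[i][j] for j in STATES) for vec in daughters)
        for i in STATES
    }


tree = (("G", "A"), ("C", "C"))
weighted = sankoff(tree, step_matrix(1, 4))  # 4:1 step matrix
equal = sankoff(tree, step_matrix(1, 1))     # all changes cost 1
print(weighted, min(weighted.values()))
print(equal, min(equal.values()))
```

Under the 4:1 matrix the vector for the last node is 5, 5, 5, 6 with a minimum of 5, while equal costs give a minimum of 2, matching the Fitch result. The two lengths are measured in different units, which is why they cannot be compared directly.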