Comp. Genomics Recitation 8 Phylogeny. Outline Phylogeny: Distance based Probabilistic Parsimony.


Comp. Genomics Recitation 8 Phylogeny

Outline Phylogeny: distance-based, probabilistic, parsimony

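Exercise Show that in UPGMA, when clusters i and j are merged into a new cluster k, the distances $d_{kl}$ are given by
$d_{kl} = \frac{d_{il}|C_i| + d_{jl}|C_j|}{|C_i| + |C_j|}$
for any other cluster l.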

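Solution Since the members of k are the members of i and j, the sum of distances between members of k and l can be written as:
$\sum_{p \in C_k,\, q \in C_l} d_{pq} = \sum_{p \in C_i,\, q \in C_l} d_{pq} + \sum_{p \in C_j,\, q \in C_l} d_{pq}$
This is equal to:
$d_{il}|C_i||C_l| + d_{jl}|C_j||C_l|$

Solution By the definition of distance between clusters, we divide the latter sum by $|C_k| \cdot |C_l|$:
$d_{kl} = \frac{d_{il}|C_i||C_l| + d_{jl}|C_j||C_l|}{|C_k||C_l|}$
Since $|C_k| = |C_i| + |C_j|$, this can also be written as:
$d_{kl} = \frac{d_{il}|C_i| + d_{jl}|C_j|}{|C_i| + |C_j|}$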
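The identity can be checked numerically. The sketch below is only an illustration: the distance matrix, the cluster memberships and the function names `avg_linkage` and `upgma_update` are made up for the example and do not come from the slides. It compares the average distance computed directly from the members of k and l with the size-weighted formula.

```python
from itertools import product

def avg_linkage(dist, ck, cl):
    """Average pairwise distance between the members of clusters ck and cl."""
    return sum(dist[p][q] for p, q in product(ck, cl)) / (len(ck) * len(cl))

def upgma_update(d_il, d_jl, size_i, size_j):
    """Size-weighted average of d_il and d_jl, as in the exercise."""
    return (d_il * size_i + d_jl * size_j) / (size_i + size_j)

# Hypothetical symmetric distances between five leaves 0..4.
dist = [
    [0, 2, 6, 7, 9],
    [2, 0, 5, 8, 10],
    [6, 5, 0, 4, 3],
    [7, 8, 4, 0, 1],
    [9, 10, 3, 1, 0],
]

# Hypothetical clusters: i = {0, 1}, j = {2}, l = {3, 4}; k is the union of i and j.
ci, cj, cl = [0, 1], [2], [3, 4]

direct = avg_linkage(dist, ci + cj, cl)
via_formula = upgma_update(
    avg_linkage(dist, ci, cl), avg_linkage(dist, cj, cl), len(ci), len(cj)
)
print(direct, via_formula)
```

Both values agree up to floating-point rounding, which is what the derivation predicts.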

Exercise Show that a parent node in a tree constructed by UPGMA is never lower than its daughter nodes.

Exercise (Figure: node k, with daughters i and j, has height $h_k = d_{ij}/2$; node n, with daughters k and l, has height $h_n = d_{kl}/2$.) Can n be lower than k?

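Solution Since $h_n = d_{kl}/2$ and $h_k = d_{ij}/2$, it suffices to show that $d_{kl} \ge d_{ij}$ for every cluster l; node n is then at least as high as node k. According to the previous exercise:
$d_{kl} = \frac{d_{il}|C_i| + d_{jl}|C_j|}{|C_i| + |C_j|}$

Solution Since i and j were merged, and not i and l or j and l, we can conclude that $d_{il} \ge d_{ij}$ and $d_{jl} \ge d_{ij}$. Therefore:
$d_{kl} \ge \frac{d_{ij}|C_i| + d_{ij}|C_j|}{|C_i| + |C_j|} = d_{ij}$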

Exercise Show an example in which the parent node height is equal to the child node height (UPGMA).

Solution Suppose there are 3 sequences and all 3 pairs have the same distance d. We choose to merge leaves 1 and 2 and produce node 4, with height d/2. The new distance, $d_{43}$, is exactly d. So when we merge node 4 and leaf 3, we create a new node 5, also of height d/2.

Solution (Figure: the resulting tree, with node 5 at height d/2.)
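Both UPGMA exercises can be tried out with a small script. The following is a minimal sketch, not an optimized or complete UPGMA: the function name `upgma_heights` and the value d = 4 are assumptions made for the illustration, and ties are broken by taking the first minimal pair. It records the height of each merge, so the equidistant example should give two nodes at the same height d/2.

```python
def upgma_heights(dist):
    """Run UPGMA on a symmetric matrix and return the merge heights in order."""
    n = len(dist)
    sizes = {i: 1 for i in range(n)}  # cluster id -> number of leaves
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    heights = []
    next_id = n
    while len(sizes) > 1:
        # Closest pair of clusters; ties go to the first pair encountered.
        (i, j), d_ij = min(d.items(), key=lambda kv: kv[1])
        heights.append(d_ij / 2)
        k = next_id
        next_id += 1
        for l in sizes:
            if l in (i, j):
                continue
            d_il = d[(min(i, l), max(i, l))]
            d_jl = d[(min(j, l), max(j, l))]
            # Size-weighted update from the first exercise.
            d[(l, k)] = (d_il * sizes[i] + d_jl * sizes[j]) / (sizes[i] + sizes[j])
        # Drop every distance that involves the merged clusters.
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        sizes[k] = sizes[i] + sizes[j]
        del sizes[i], sizes[j]
    return heights

# Three leaves, all pairwise distances equal to d = 4 (hypothetical value).
equidistant = [[0, 4, 4], [4, 0, 4], [4, 4, 0]]
print(upgma_heights(equidistant))
```

On other matrices the returned heights should be non-decreasing, in line with the claim that a parent is never lower than its daughters.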

Exercise The famous paleontologist R. Geller argued to his sister that the last common ancestor of birds and dinosaurs lived 100 million years ago. His sister claimed that the ancestor lived 200 million years ago. The evidence is a pair of 1000-nt-long homologous genes with 350 differences (it's not contamination this time…)

Exercise Both accept the Jukes-Cantor model. Both accept the assumption of a molecular clock. If mutations occur independently, with rate $10^{-9}$ mutations per year, whose theory is more likely to be correct?

Solution According to Jukes-Cantor, the probability of a nucleotide remaining unchanged over t time units is:
$r_t = \frac{1}{4}(1 + 3e^{-4\alpha t})$
The probability for a specific change:
$s_t = \frac{1}{4}(1 - e^{-4\alpha t})$

Solution (Figure: tree T, in which the ancestor is joined to Bird and to Dinosaur by two branches, each of length t.) Molecular clock: both species evolve at the same rate.

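Solution The likelihood of the tree at site i, where $x_i$ and $y_i$ are the nucleotides observed in the two species and $q_a$ is the probability of nucleotide a at the ancestor, is:
$L_i = \sum_a q_a P(x_i \mid a, t) P(y_i \mid a, t)$ (likelihood of a tree)
$= q_{x_i} \sum_a P(a \mid x_i, t) P(y_i \mid a, t)$ (Jukes-Cantor reversibility property)
$= q_{x_i} P(y_i \mid x_i, 2t)$ (Jukes-Cantor additivity)
Less work to do: only the total distance 2t between the two species matters.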
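The reversibility and additivity steps can be sanity-checked numerically. The sketch below assumes the standard Jukes-Cantor transition probabilities with parameter α, a uniform prior of 1/4 on the ancestral nucleotide, and 3α = 10^-9; the function names are invented for the illustration. It compares the full sum over ancestral nucleotides with the closed form that uses a single branch of length 2t.

```python
import math

ALPHA = 1e-9 / 3  # assumption: 3 * alpha = 1e-9 substitutions per year

def jc_prob(x, y, t, alpha=ALPHA):
    """Jukes-Cantor probability that nucleotide x is replaced by y after time t."""
    e = math.exp(-4 * alpha * t)
    return 0.25 * (1 + 3 * e) if x == y else 0.25 * (1 - e)

def site_likelihood_sum(x, y, t):
    """Sum over the ancestral nucleotide a of q_a * P(x|a,t) * P(y|a,t), q_a = 1/4."""
    return sum(0.25 * jc_prob(a, x, t) * jc_prob(a, y, t) for a in "ACGT")

def site_likelihood_closed(x, y, t):
    """Closed form q_x * P(y|x,2t) obtained from reversibility and additivity."""
    return 0.25 * jc_prob(x, y, 2 * t)

t = 100e6  # years; any positive value works for this check
print(site_likelihood_sum("A", "A", t), site_likelihood_closed("A", "A", t))
print(site_likelihood_sum("A", "C", t), site_likelihood_closed("A", "C", t))
```

Each printed pair should agree up to rounding, for a match as well as for a mismatch.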

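Solution Since the distance between the species is 2t, and the ancestral nucleotide has probability 1/4 under Jukes-Cantor, the probability of every site in which there is a match is:
$\frac{1}{4} \cdot \frac{1}{4}(1 + 3e^{-8\alpha t})$
For a mismatch, the probability is:
$\frac{1}{4} \cdot \frac{1}{4}(1 - e^{-8\alpha t})$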

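Solution So, with 650 matching and 350 mismatching sites, the likelihood of the tree T is
$L(T) = \left[\frac{1}{16}(1 + 3e^{-8\alpha t})\right]^{650} \left[\frac{1}{16}(1 - e^{-8\alpha t})\right]^{350}$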

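Solution With $3\alpha = 10^{-9}$, that is $\alpha = \frac{1}{3} \cdot 10^{-9}$, the log likelihood of the trees suggested by Dr. Geller ($t = 10^8$ years) and his sister ($t = 2 \cdot 10^8$ years) is:
$\log L(T) = 650 \log\left[\frac{1}{16}(1 + 3e^{-8\alpha t})\right] + 350 \log\left[\frac{1}{16}(1 - e^{-8\alpha t})\right]$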
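The two hypotheses can be compared by evaluating the log likelihood directly. This is a sketch under the exercise's assumptions (Jukes-Cantor, molecular clock, 3α = 10^-9, 650 matching and 350 mismatching sites out of 1000); the function name `log_likelihood` and the inclusion of the uniform 1/4 prior on the ancestral nucleotide are choices made for the illustration.

```python
import math

def log_likelihood(t, n_match=650, n_mismatch=350, alpha=1e-9 / 3):
    """Log likelihood of a two-leaf tree whose branches both have length t (years)."""
    e = math.exp(-8 * alpha * t)    # total distance between the two species is 2t
    p_match = (1 + 3 * e) / 16      # 1/4 prior times P(unchanged over 2t)
    p_mismatch = (1 - e) / 16       # 1/4 prior times P(specific change over 2t)
    return n_match * math.log(p_match) + n_mismatch * math.log(p_mismatch)

ll_100 = log_likelihood(100e6)  # Dr. Geller: 100 million years
ll_200 = log_likelihood(200e6)  # his sister: 200 million years
print(ll_100, ll_200)
```

Under these assumptions `ll_200` comes out larger than `ll_100`, so the 200-million-year estimate is the more likely of the two. The constant factor 1/16 shifts both values equally and does not affect the comparison.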

Solution Yay!

Exercise Assume that the substitution cost for a weighted parsimony algorithm is a metric, i.e. it satisfies S(a,a)=0, S(a,b)=S(b,a) and S(a,c)≤S(a,b)+S(b,c). Show that the minimal cost of the tree is independent of the position of the root.

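Solution We have a set of species and we are given a minimal weight tree for it. Denote the root in this tree by k. (Figure: a tree with root k, daughter nodes i and j, and nodes l and m below them; the target edge is marked.) We will show that deleting k and moving it to the marked adjacent edge does not change the cost of the tree.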

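Solution What is the cost of the tree before translocation of the root? Let $S_i(a)$ denote the minimal cost of the subtree rooted at node i given character a at i. For a specific choice of character c at the root:
$S_k(c) = \min_a [S_i(a) + S(c,a)] + \min_b [S_j(b) + S(c,b)]$
The minimal choice of c gives the cost of this tree:
$\min_c S_k(c)$

Solution The minimal cost of the tree is therefore:
$\min_{a,b,c} [S_i(a) + S_j(b) + S(a,c) + S(c,b)]$
Due to the triangle inequality, $S(a,b) \le S(a,c) + S(c,b)$. If we set c to a (or equivalently to b), the bound is attained because $S(a,a) = 0$, and we get:
$\min_{a,b} [S_i(a) + S_j(b) + S(a,b)]$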

Solution Now we move the root. (Figure: the original tree, with the root k moved onto the adjacent edge.) Call this tree T'.

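Solution Denote the character at l as d. (Figure: tree T', in which the root k has daughters l and i, and i has daughters j and m.) The new cost is:
$\min_{a,d} [S'_i(a) + S_l(d) + S(a,d)]$
where the $S'$ is due to the change in the subtree rooted at i.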

Solution (Figure: tree T', with the root on the adjacent edge, shown next to the original tree.)

We proved that moving the root to an adjacent position does not change the minimal cost. Why is the case of moving the root to a non-adjacent position easier to prove?
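The root-independence claim can be tested on a small example. The sketch below is an illustration only: the five-leaf tree, the leaf characters and the cost function `S` (0 for identity, 1 for a transition, 2 for a transversion, which is a metric) are assumptions, not taken from the slides. It computes the weighted parsimony cost with the root placed on every edge of the unrooted tree.

```python
import math

ALPHABET = "ACGT"
PURINES = {"A", "G"}

def S(a, b):
    """Hypothetical metric cost: 0 identity, 1 transition, 2 transversion."""
    if a == b:
        return 0
    return 1 if (a in PURINES) == (b in PURINES) else 2

def subtree_cost(tree, leaves, node, parent):
    """Minimal cost of the subtree at `node` (seen from `parent`) per character at node."""
    if node in leaves:
        return {c: (0 if c == leaves[node] else math.inf) for c in ALPHABET}
    cost = {c: 0 for c in ALPHABET}
    for child in tree[node]:
        if child == parent:
            continue
        child_cost = subtree_cost(tree, leaves, child, node)
        for c in ALPHABET:
            cost[c] += min(child_cost[a] + S(c, a) for a in ALPHABET)
    return cost

def cost_rooted_on_edge(tree, leaves, u, v):
    """Minimal tree cost when a root node is inserted on the edge (u, v)."""
    su = subtree_cost(tree, leaves, u, v)
    sv = subtree_cost(tree, leaves, v, u)
    return min(
        min(su[a] + S(c, a) for a in ALPHABET) + min(sv[b] + S(c, b) for b in ALPHABET)
        for c in ALPHABET
    )

# Hypothetical unrooted tree as an adjacency list; u, v, w are internal nodes.
tree = {
    "u": ["L1", "L2", "v"], "v": ["u", "L3", "w"], "w": ["v", "L4", "L5"],
    "L1": ["u"], "L2": ["u"], "L3": ["v"], "L4": ["w"], "L5": ["w"],
}
leaves = {"L1": "A", "L2": "G", "L3": "C", "L4": "T", "L5": "A"}

costs = {
    (u, v): cost_rooted_on_edge(tree, leaves, u, v)
    for u in tree for v in tree[u] if u < v
}
print(costs)
```

Every edge should report the same cost. Replacing `S` with a cost that breaks the triangle inequality is a quick way to see where the metric assumption is used.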

Question Does every symmetric distance with 0 on the diagonal have a tree?

Answer No! Example:
      a     b     c     d
a     0     1     1     0.25
b     1     0     1     0.25
c     1     1     0     …
d     0.25  0.25  …     0
If d(a,d) = 0.25 and d(b,d) = 0.25, then in any tree it must be that d(a,b) ≤ d(a,d) + d(d,b) = 0.5, but here d(a,b) = 1.
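The violation can be found mechanically by testing the triangle inequality on every ordered triple. The sketch below uses only the three taxa a, b and d, with d(a,b) = 1 and d(a,d) = d(b,d) = 0.25; the function name `triangle_violations` is made up for the illustration.

```python
from itertools import permutations

def triangle_violations(d):
    """Return all triples (x, y, z) with d[x][y] > d[x][z] + d[z][y]."""
    return [
        (x, y, z)
        for x, y, z in permutations(d, 3)
        if d[x][y] > d[x][z] + d[z][y]
    ]

# Distances among taxa a, b and d from the example.
d = {
    "a": {"a": 0, "b": 1, "d": 0.25},
    "b": {"a": 1, "b": 0, "d": 0.25},
    "d": {"a": 0.25, "b": 0.25, "d": 0},
}

violations = triangle_violations(d)
print(violations)
```

A non-empty result means no tree with non-negative branch lengths can realize the distances, because path lengths in such a tree always satisfy the triangle inequality.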