An introduction to maximum parsimony and compatibility


An introduction to maximum parsimony and compatibility Trevor Bruen PhD Candidate McGill Centre for Bioinformatics

Overview The point of this talk is to give a sense of how discrete mathematics enters into phylogenetic and genetic inference. I will illustrate these ideas by describing two approaches in detail, namely maximum compatibility and maximum parsimony. I will also show how ideas from these two criteria can be used to develop applications such as bounds and tests for recombination. My goal is to give a basis for further study in this area and greater insight into these methods.

Outline Introduction to compatibility and parsimony Overview of basic notation/concepts Compatibility Compatibility as a graph theory problem Compatibility for pairs of characters Interpretation of compatibility Parsimony Parsimony score with connections to graph theory Connections between parsimony and compatibility Homoplasy Parsimony for pairs of characters Connections between SPRs/TBRs and parsimony Applications to recombination Parsimony as a consensus method

Introduction Maximum parsimony and maximum compatibility are criteria used in phylogenetics, linguistics and population genetics. In phylogenetics the goal is to infer an evolutionary tree, and in linguistics the goal is often the same. Population genetics uses compatibility in the study of recombination. For general phylogenetic inference with molecular data, likelihood (probability-based) methods are generally preferred. However, compatibility and parsimony are computationally tractable, and the mathematics behind them is very well developed. Parsimony can also be shown to coincide with likelihood in certain circumstances (Tuffley and Steel 1997). This gives us insight into where to go in terms of research.

Formalism A character is a mapping from a set of taxa to a set of states. In this example, the taxa set is X = {S1, S2, S3, S4} and the state set is C = {A, C}. Informally, a character is a “column” in a multiple sequence alignment.

Binary Character / Splits If a character has two states, then it induces a split of the taxa set. Example: Let X be the taxa set {S1, S2, S3, S4}. Let C be the state set {A, C}. Then {S1, S2} | {S3, S4} is the split induced by the first character. In general, a character induces a partition of the taxa set into equivalence classes, one class per state.
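
The partition induced by a character can be computed directly from its definition as a mapping. The sketch below is only illustrative: the function name blocks and the particular states assigned to S1..S4 are assumptions chosen so that the result matches the split {S1, S2} | {S3, S4} from the slide.

```python
from collections import defaultdict


def blocks(character):
    """Return the partition of the taxa induced by a character.

    `character` maps each taxon to its state; each block is the set of
    taxa that share one state.
    """
    by_state = defaultdict(set)
    for taxon, state in character.items():
        by_state[state].add(taxon)
    return {state: frozenset(taxa) for state, taxa in by_state.items()}


# Hypothetical first column of an alignment of S1..S4 (assumed values).
chi = {"S1": "A", "S2": "A", "S3": "C", "S4": "C"}
split = blocks(chi)
```

A character with two states yields exactly two blocks, which is the split; a character with more states yields one block per state.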

Tree and Labeling Informally, we would like to be able to describe a tree and a labeling structure mathematically. In graph theory, a tree T=(V,E) is a connected graph with no cycles. We would also like to be able to attach taxa (members of X) to the leaves of our tree. Define a labeling function phi from X to V(T) such that the leaves of T are labeled by members of X.

X-Trees An X-tree is a pair (T, phi), where phi is a labeling function that labels the leaves of T. Recall:

Extensions Informally, we have an X-tree consisting of the pair (T, phi). We also have a character chi. We need to relate the character to the tree. Define an extension of chi to T as a function from V(T) to the state set C that is consistent with chi at the leaves, that is, the leaf labeled by taxon x receives the state chi(x). Informally, an extension provides a description of how the internal vertices are labeled.

Quick Summary Summary so far: X-trees are trees together with functions labeling the leaves with members of X. A character is a function from X into a state set C. An extension is a labeling of all the vertices of T with states of C that agrees with the character at the leaves.

Compatibility - Definition A character is compatible with a tree if and only if there exists an extension of the character to the tree so that the subgraphs induced by each of the states are connected. Example: On the first tree, the character is compatible with the tree. On the second tree, the character is incompatible, since the two A’s are disconnected.

Compatibility Problem definition: Given a sequence of characters, determine whether there exists a tree with which all the characters are compatible. Related problem: Given a sequence of characters, determine the largest set of characters that are compatible with some tree.

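Intersection Graph Suppose we have a sequence of characters chi_1, ..., chi_m, where each chi_i is a character on X. Then each character induces a partition of X, i.e. the collection of subsets of X on which the character is constant. Create a graph whose vertex set consists of the pairs (chi_i, A), where A is a block of the partition induced by chi_i. There is an edge between two vertices if and only if the intersection of the two subsets is non-empty.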
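
The construction above can be sketched in a few lines. This is a minimal illustration under stated assumptions: characters are represented as dictionaries from taxa to states, a vertex is named by the pair (character index, state), and the function name intersection_graph and the example data are hypothetical.

```python
from itertools import combinations


def intersection_graph(characters):
    """Build the partition intersection graph of a list of characters.

    Vertices are (character index, state) pairs, one per block.
    Two vertices are adjacent when their blocks share at least one taxon.
    """
    blocks = {}
    for i, chi in enumerate(characters):
        for taxon, state in chi.items():
            blocks.setdefault((i, state), set()).add(taxon)
    vertices = sorted(blocks)
    edges = {
        frozenset((u, v))
        for u, v in combinations(vertices, 2)
        if blocks[u] & blocks[v]
    }
    return vertices, edges


# Two assumed binary characters on S1..S4, inducing 12|34 and 13|24.
chi1 = {"S1": "A", "S2": "A", "S3": "C", "S4": "C"}
chi2 = {"S1": "A", "S2": "C", "S3": "A", "S4": "C"}
vertices, edges = intersection_graph([chi1, chi2])
```

Blocks of the same character are disjoint, so no edge ever joins two vertices that come from the same character.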

Intersection Graph Whether the sequence of characters is compatible can be determined directly from the intersection graph. First we need to define two concepts: a chordal graph and a restricted chordal completion of the intersection graph.

Chordal Graphs A graph G=(V,E) is a chordal graph if every cycle with at least four vertices contains a chord (an edge connecting two non-consecutive vertices of the cycle). A chordalization of a graph G=(V,E) is a graph G’=(V,E’), where E is a subset of E’, such that G’ is chordal.
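
Chordality can be tested without enumerating cycles. The sketch below uses a standard characterization that is not stated on the slide: a graph is chordal exactly when its vertices can be deleted one at a time, always choosing a vertex whose remaining neighbours are pairwise adjacent (a simplicial vertex). The function name is_chordal and the representation of edges as pairs are assumptions.

```python
def is_chordal(vertices, edges):
    """Return True if the graph is chordal.

    Repeatedly delete a simplicial vertex (one whose remaining
    neighbours are pairwise adjacent). The graph is chordal exactly
    when this process removes every vertex.
    """
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    remaining = set(vertices)
    while remaining:
        for v in remaining:
            nbrs = adj[v] & remaining
            if all(b in adj[a] for a in nbrs for b in nbrs if a != b):
                remaining.remove(v)
                break  # restart the scan on the smaller graph
        else:
            return False  # no simplicial vertex is left
    return True


square = [(1, 2), (2, 3), (3, 4), (4, 1)]
square_is_chordal = is_chordal([1, 2, 3, 4], square)
with_chord_is_chordal = is_chordal([1, 2, 3, 4], square + [(1, 3)])
```

The four-cycle fails the test, and adding the single chord (1, 3) gives a chordalization of it.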

Restricted Chordal Completions Imagine the vertices of our graph G=(V,E) are colored. Then a restricted chordalization of G is a chordalization G’=(V,E’) of G in which every edge of G’ connects vertices of different colors.

Restricted chordal completions A restricted chordal completion of the intersection graph is a chordalization in which there is no edge between vertices that come from the same character. In this case, the “colors” correspond to characters.

Main Theorem for Compatibility A collection of characters is compatible if and only if there is a restricted chordal completion of its intersection graph.

Pairs of Characters A simple corollary of the main theorem arises when we restrict our attention to two characters. Corollary: Two characters are compatible if and only if their intersection graph G is acyclic. Proof: (backward direction) If G is acyclic then it is chordal, so G is its own restricted chordal completion and the characters are compatible. (forward direction) Suppose G contains a cycle. Then any chordal completion of G contains that cycle and therefore, being chordal, must contain a three-cycle. But with only two characters, two of the three vertices of a three-cycle would come from the same character, so no restricted completion of G can contain a three-cycle. Hence no restricted chordal completion exists and the characters are incompatible.
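
The corollary turns pairwise compatibility into a cycle check. The sketch below assumes both characters are dictionaries over the same taxa and uses a union-find structure to detect a cycle. The function name pairwise_compatible and the example characters are hypothetical.

```python
def pairwise_compatible(chi1, chi2):
    """Return True if the intersection graph of two characters is acyclic.

    Both characters must be defined on the same taxa.
    """
    # Blocks of chi1 and chi2 intersect exactly when some taxon has
    # that pair of states, so each distinct state pair is one edge.
    edges = {(("1", chi1[x]), ("2", chi2[x])) for x in chi1}
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge closes a cycle
        parent[ru] = rv
    return True


chi1 = {"S1": "A", "S2": "A", "S3": "C", "S4": "C"}  # 12|34
chi2 = {"S1": "A", "S2": "C", "S3": "A", "S4": "C"}  # 13|24
chi3 = {"S1": "A", "S2": "C", "S3": "C", "S4": "C"}  # 1|234
```

Here chi1 and chi2 produce a four-cycle in the intersection graph and are incompatible, while chi1 and chi3 produce a path and are compatible.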

Interpretation Recall: a character is compatible with an X-tree if and only if there exists an extension of the character to the tree so that the subgraphs induced by each of the states are connected. Informally speaking, this is a very strict condition. It is an “all or nothing” condition: either a character is compatible with a tree or it isn’t. Relaxing this condition is the subject of the next section.

Parsimony Informally: given a leaf-labeled tree and a character, how can we define the fit of the character to the tree? Consider a character, along with an extension to a leaf-labeled tree. Then the length of the extension is the number of edges whose two endpoints are assigned different states by the extension. Define the parsimony score of a character on a tree as the length of a minimal extension of the character to the tree. Denote this value by l(chi, T).

Parsimony Then the maximum parsimony score for a set of characters on a tree is defined as the sum of the parsimony scores of the individual characters on that tree. The tree that minimizes this score is referred to as the maximum parsimony tree.
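
For a fixed tree, the parsimony score of a character can be computed in one bottom-up pass. The sketch below uses Fitch's algorithm, which is not mentioned on the slides, and assumes a rooted binary tree written as nested pairs of leaf names. The function names and the example data are hypothetical.

```python
def fitch(tree, chi):
    """Return (candidate states, change count) for a rooted binary tree.

    `tree` is either a leaf name or a pair (left, right) of subtrees;
    `chi` maps leaf names to states.
    """
    if isinstance(tree, str):
        return {chi[tree]}, 0
    left_states, left_cost = fitch(tree[0], chi)
    right_states, right_cost = fitch(tree[1], chi)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    # No shared state: at least one change is needed below this vertex.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, characters):
    """Sum the per-character parsimony scores on one tree."""
    return sum(fitch(tree, chi)[1] for chi in characters)


tree = (("S1", "S2"), ("S3", "S4"))
chi1 = {"S1": "A", "S2": "A", "S3": "C", "S4": "C"}  # 12|34
chi2 = {"S1": "A", "S2": "C", "S3": "A", "S4": "C"}  # 13|24
total = parsimony_score(tree, [chi1, chi2])
```

On this tree chi1 needs one change and chi2 needs two, so the total score is three. Finding the tree that minimizes the total still requires a search over trees, which this sketch does not attempt.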

Parsimony and graph theory A minimal cut-set for a leaf-labeled tree T=(V,E) and a character chi is a minimal set of edges whose removal ensures that if chi(x) and chi(y) differ, then the leaves labeled x and y are in different components. Claim: There is a bijection between the set of minimal cut sets and minimal extensions. So the cardinality of a minimum cut set is equal to the parsimony score.

Parsimony and Graph Theory Recall Menger’s Theorem (1927): Let G=(V,E) be a graph with V1 and V2 as two disjoint subsets of V. Then the minimum number of edges whose removal from G leaves the vertices of V1 and V2 in different components is equal to the maximum number of edge-disjoint paths between V1 and V2. Corollary: For a binary character, the maximum number of edge-disjoint paths between the leaves in one state and the leaves in the other state equals the parsimony score.

Compatibility and parsimony Recall: a collection of characters is compatible if and only if there is a restricted chordal completion of its intersection graph. Question: How can we characterize parsimony with respect to an intersection graph?

Compatibility Graph Recall: Each character induces a partition of X. A block of a character is a maximal subset of taxa on which the character is constant. Thus we may identify the blocks of the characters with the vertices of the intersection graph.

Character Refinement A character chi’ refines another character chi if chi’(x) = chi’(y) implies chi(x) = chi(y) for all taxa x and y. Thus characters that refine other characters correspond to refinements of the partition.
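
Refinement can be checked by confirming that every block of the finer character sits inside a single block of the coarser one. This sketch assumes the usual definition of refinement (taxa that agree under the finer character also agree under the coarser one) and uses hypothetical names and data.

```python
def refines(fine, coarse):
    """Return True if `fine` refines `coarse`.

    Every state of `fine` must correspond to exactly one state of
    `coarse`, so each block of `fine` lies inside a block of `coarse`.
    """
    image = {}
    for taxon, state in fine.items():
        # Remember the coarse state first seen with this fine state.
        if image.setdefault(state, coarse[taxon]) != coarse[taxon]:
            return False
    return True


coarse = {"S1": "A", "S2": "A", "S3": "C", "S4": "C"}  # 12|34
fine = {"S1": "A", "S2": "C", "S3": "G", "S4": "G"}    # 1|2|34
```

Here fine refines coarse because it only splits the block {S1, S2}; the converse does not hold.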

Compatibility and Parsimony Recall: a collection of characters is compatible if and only if there is a restricted chordal completion of its intersection graph. Main:

Special Case: Two characters Recall: Two characters are compatible if and only if their intersection graph G is acyclic. Using the previous theorem we can show that the parsimony score for two characters corresponds to: where k is the number of components in the graph. Note: This is the score of the maximum parsimony tree, that is, the optimum over all trees.

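Homoplasy Recall: The parsimony score of a character on a tree is the minimum number of state changes of the character on the tree. Informally: What is an intuitive way to think about the parsimony score? Define the homoplasy of a character chi on a tree T as h(chi, T) = l(chi, T) - (r - 1), where l(chi, T) is the parsimony score of chi on T and r is the number of states that chi takes.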

Homoplasy Note that the homoplasy of a character on a tree T is non-negative, with equality to zero if and only if the character is convex on T. Informally: Homoplasy corresponds to the number of “extra” mutations of the character on the tree. These “extra” mutations correspond to recurrent mutations. Informally: Thus a character is not compatible with a tree if and only if it cannot be placed on the tree without “extra” mutations.

Homoplasy For Two Characters Recall: The parsimony score for a pair of characters can be found directly from the bipartite intersection graph. Recall: This score corresponds to an optimum over all trees. Thus for two characters, we can define a pairwise homoplasy score as the minimum, over all trees, of the total homoplasy of the two characters. Recall: Up to now homoplasy refers to “extra” mutations on a tree.
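
The explicit formula for the pairwise score is not reproduced here. One natural candidate, assumed in the sketch below and not taken from the slides, is the number of independent cycles of the intersection graph, |E| - |V| + k, where k is the number of connected components. It is zero exactly when the graph is acyclic, which agrees with the corollary for pairs of characters. The function name and example data are hypothetical.

```python
def pairwise_homoplasy(chi1, chi2):
    """Return |E| - |V| + k for the intersection graph of two characters.

    ASSUMPTION: this cycle count is used as the pairwise homoplasy
    score; it is not a formula taken from the slides.
    """
    edges = {(("1", chi1[x]), ("2", chi2[x])) for x in chi1}
    vertices = {v for edge in edges for v in edge}
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    components = len(vertices)
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1  # two components merged
    return len(edges) - len(vertices) + components


chi1 = {"S1": "A", "S2": "A", "S3": "C", "S4": "C"}  # 12|34
chi2 = {"S1": "A", "S2": "C", "S3": "A", "S4": "C"}  # 13|24
chi3 = {"S1": "A", "S2": "C", "S3": "C", "S4": "C"}  # 1|234
```

Under this assumption the incompatible pair chi1, chi2 scores one, and the compatible pair chi1, chi3 scores zero.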

A second look at homoplasy Example: Two characters with a pairwise homoplasy score equal to one. Informally: We have seen that the homoplasy corresponds to the number of “extra” mutations on a tree. But in certain situations, this is biologically implausible. The state 1 may correspond to a mutation that has only arisen once. In this case, the fact that the pair of characters is incompatible can be explained by a recombination event. This will be defined more precisely later.

A quick aside - tree distances. Differences between leaf-labeled trees can be measured using various metrics, e.g. subtree prune and regraft (SPR) distance. A “subtree prune and regraft” is a specific rearrangement of a tree. For two leaf-labeled trees, dSPR(T1, T2) is the minimum number of SPRs needed to transform T1 into T2.

Homoplasy for two characters Theorem: If chi_1 and chi_2 are two characters, then their pairwise homoplasy score equals the minimum number of SPRs between a leaf-labeled tree with which chi_1 is compatible and a leaf-labeled tree with which chi_2 is compatible, taken over all such pairs of trees. Informally: Thus we have a whole new interpretation of homoplasy.

Application - Testing for Recombination If recombination has occurred, sites will have different histories. Nearby sites will tend to have “greater” genealogical correlation than distant sites. Idea: If recombination has occurred, genealogical correlation will be partially reflected by a tendency for pairs of closely linked sites to have less homoplasy than distant sites.

Test for Recombination Idea: We would like to distinguish between two possibilities: recurrent mutation and recombination. Idea: Use the previous observations to develop a test for recombination. H0: A single history describes all sites. H0’: Nearby sites share no more compatibility than arbitrary pairs of sites. Use a statistic to capture this information and solve analytically for p-values.

Application: Parsimony and supertrees Supertree: MRP - parsimony with characters that represent trees. What does homoplasy mean in this context? Courtesy of TREE 12:315-322

Parsimony as a consensus tree Recall: If chi_1 and chi_2 are two characters, then their pairwise homoplasy score equals the minimum number of SPRs between a leaf-labeled tree with which chi_1 is compatible and a leaf-labeled tree with which chi_2 is compatible. Informally: This can be generalized to show that the maximum parsimony tree for a set of characters minimizes the SPR distance to the sets of trees with which each character is compatible…

Acknowledgements Thanks for listening! Background and further reading: Phylogenetics, Semple and Steel (book, 2003). Some of the results I presented are not in this book; they come from my own work. Please talk to me if you are interested. I have many other references, so please see me if you would like them.