Incomplete Directed Perfect Phylogeny
Itsik Pe'er, Tal Pupko, Ron Shamir, and Roded Sharan
SIAM Journal on Computing, Volume 33, Number 3, pp. 590-607.



Abstract

Perfect phylogeny is one of the fundamental models for studying evolution. We investigate the following variant of the model: The input is a species-characters matrix. The characters are binary and directed, i.e., a species can only gain characters. The difference from standard perfect phylogeny is that for some species the states of some characters are unknown. The question is whether one can complete the missing states in a way that admits a perfect phylogeny. The problem arises in classical phylogenetic studies, when some states are missing or undetermined.

Abstract (cont.)

Quite recently, studies that infer phylogenies using inserted repeat elements in DNA gave rise to the same problem. Extant solutions for it take time $O(n^2 m)$ for n species and m characters. We provide a graph theoretic formulation of the problem as a graph sandwich problem, and give near-optimal $\tilde{O}(nm)$-time algorithms for the problem. We also study the problem of finding a single, general solution tree, from which any other solution can be obtained by node splitting. We provide an algorithm to construct such a tree, or determine that none exists.

Problem

An incomplete matrix A:

      c1  c2  c3  c4  c5
  s1   1   ?   0   0   1
  s2   ?   ?   0   1   0
  s3   ?   0   1   ?   ?

A completion B of A: [matrix over the same species s1, s2, s3 and characters c1 to c5; its entries are not preserved in the transcript]

[Figure: a phylogenetic tree that explains A via B, with leaves s1, s2, s3 and edges labeled c1, c5, c3, and c2, c4.]
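The notion of a completion can be made concrete with a small check. The sketch below is only an illustration: the function name `is_completion` and the string-per-row encoding are assumptions, and the matrix `B` is a hypothetical completion chosen for the example, since the entries of the slide's own B did not survive in the transcript.

```python
def is_completion(incomplete, candidate):
    """Return True if `candidate` is a fully binary matrix that agrees
    with every known (non-'?') entry of `incomplete`.

    Both matrices are lists of equal-length strings, one string per species.
    """
    if len(incomplete) != len(candidate):
        return False
    for row_a, row_b in zip(incomplete, candidate):
        if len(row_a) != len(row_b):
            return False
        for a, b in zip(row_a, row_b):
            if b not in ("0", "1"):
                return False          # a completion has no unknown states
            if a != "?" and a != b:
                return False          # known states must be preserved
    return True


# Matrix A from the slide (rows s1, s2, s3; columns c1..c5).
A = ["1?001", "??010", "?01??"]
# Hypothetical completion, NOT taken from the slide.
B = ["10001", "11010", "00100"]

print(is_completion(A, B))
```

Being a completion is only half of the question; the completed matrix must also admit a perfect phylogeny.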

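Problem (cont.)

A binary matrix B has a phylogenetic tree iff the 1-sets of every two characters are compatible. (Two sets are compatible if they are either disjoint or one of them contains the other.)

The Σ subgraph and its matrix:

      c1  c2
  s1   1   1
  s2   1   0
  s3   0   1

[Figure: the Σ subgraph on vertices s1, s2, s3, c1, c2.]

In this matrix the 1-sets of c1 and c2 are {s1, s2} and {s1, s3}: they intersect, but neither contains the other, so the two characters are incompatible.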
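The pairwise compatibility condition translates directly into code. The following is a minimal sketch for complete binary matrices, with rows given as strings of '0' and '1'; the function names are illustrative and not taken from the paper.

```python
from itertools import combinations


def one_sets(matrix, species):
    """Return, for each character (column), the set of species with state 1."""
    n_chars = len(matrix[0])
    return [
        frozenset(s for s, row in zip(species, matrix) if row[j] == "1")
        for j in range(n_chars)
    ]


def compatible(x, y):
    """Two sets are compatible if they are disjoint or one contains the other."""
    return x.isdisjoint(y) or x <= y or y <= x


def has_phylogenetic_tree(matrix, species):
    """Check the criterion from the slide: every two 1-sets are compatible."""
    sets = one_sets(matrix, species)
    return all(compatible(x, y) for x, y in combinations(sets, 2))


species = ["s1", "s2", "s3"]
sigma = ["11", "10", "01"]      # the matrix of the Σ subgraph
nested = ["11", "10", "00"]     # 1-set of c2 is contained in that of c1

print(has_phylogenetic_tree(sigma, species))   # False
print(has_phylogenetic_tree(nested, species))  # True
```

This check compares all pairs of characters, so it is quadratic in the number of characters; it is meant to mirror the definition, not to be efficient.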

Algorithm (Divide and Conquer)

Alg(A = ((S, C), E_0, E_?, E_1)):
1. If |S| > 1 then do:
   (a) Remove all S-semi-universal characters and all null characters from G(A).
   (b) If the resulting graph G' is connected then output False and halt.
   (c) Otherwise, let K_1, ..., K_r be the connected components of G', and let A_1, ..., A_r be the corresponding submatrices of A.
   (d) For i = 1, ..., r do: Alg(A_i).
2. Output S.
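The recursion above can be sketched in a few lines of Python. The slide does not define its terms, so this sketch rests on stated assumptions: G(A) is taken to be the bipartite species-character graph with one edge per 1 entry, a character is treated as S-semi-universal when it has no 0 entry among the species of S, and as null when it has no 1 entry among them. The helper `build` and the dictionary encoding are hypothetical conveniences. The traversal is deliberately naive and does not attempt to reach the near-optimal running time claimed in the abstract.

```python
def build(rows):
    """Turn {'s1': '1?001', ...} into (matrix, characters),
    where matrix[s][c] is '0', '1' or '?'."""
    n = len(next(iter(rows.values())))
    chars = [f"c{j + 1}" for j in range(n)]
    return {s: dict(zip(chars, r)) for s, r in rows.items()}, chars


def alg(matrix, species, characters, clusters):
    """Append every output set S to `clusters`; return False if the
    algorithm halts with False, True otherwise."""
    species = list(species)
    if len(species) > 1:
        # (a) Drop semi-universal (no 0) and null (no 1) characters,
        #     both judged on the current species set only (assumed definitions).
        kept = [c for c in characters
                if any(matrix[s][c] == "0" for s in species)
                and any(matrix[s][c] == "1" for s in species)]
        # Connected components of the graph whose edges are the 1 entries.
        components, seen = [], set()
        for start in species:
            if start in seen:
                continue
            comp_species, comp_chars = {start}, set()
            stack = [start]
            while stack:
                s = stack.pop()
                for c in kept:
                    if matrix[s][c] == "1" and c not in comp_chars:
                        comp_chars.add(c)
                        for t in species:
                            if matrix[t][c] == "1" and t not in comp_species:
                                comp_species.add(t)
                                stack.append(t)
            seen |= comp_species
            components.append((comp_species, comp_chars))
        # (b) A connected graph means no solution.
        if len(components) == 1:
            return False
        # (c), (d) Recurse on the submatrix of each component.
        for comp_species, comp_chars in components:
            if not alg(matrix, comp_species, comp_chars, clusters):
                return False
    # Step 2: output S.
    clusters.append(frozenset(species))
    return True


matrix, chars = build({"s1": "1?001", "s2": "??010", "s3": "?01??"})
clusters = []
ok = alg(matrix, list(matrix), chars, clusters)
print(ok, sorted(sorted(c) for c in clusters))
```

On the three-species matrix A from the Problem slide, this sketch removes c1 (no 0 entry) and c2 (no 1 entry), splits the species into three singleton components, and outputs {s1}, {s2}, {s3}, and {s1, s2, s3}. On the Σ matrix the graph stays connected and the call returns False. When the call returns False, `clusters` may hold partial output and should be discarded.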

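Example

An incomplete matrix:

      c1  c2  c3  c4  c5
  s1   1   ?   0   0   ?
  s2   1   1   ?   ?   0
  s3   ?   1   1   ?   0
  s4   ?   ?   1   1   ?
  s5   ?   0   ?   1   0

[A second matrix over the same species s1 to s5 and characters c1 to c5 follows on the slide; its entries are not preserved in the transcript.]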