The restriction mapping problem revisited Gopal Pandurangan and H. Ramesh Journal of Computer and System Sciences 526~544(2002)

Slides:



Advertisements
Similar presentations
Improved Algorithms for Inferring the Minimum Mosaic of a Set of Recombinants Yufeng Wu and Dan Gusfield UC Davis CPM 2007.
Advertisements

An Improved Succinct Dynamic k-Ary Tree Representation (work in progress) Diego Arroyuelo Department of Computer Science, Universidad de Chile.
A new method of finding similarity regions in DNA sequences Laurent Noé Gregory Kucherov LORIA/UHP Nancy, France LORIA/INRIA Nancy, France Corresponding.
Reference Assisted Nucleic Acid Sequence Reconstruction from Mass Spectrometry Data Gabriel Ilie 1, Alex Zelikovsky 2 and Ion Măndoiu 1 1 CSE Department,
CS 171: Introduction to Computer Science II
Simulating a CRCW algorithm with an EREW algorithm Efficient Parallel Algorithms COMP308.
CSE182 CSE182-L12 Mass Spectrometry Peptide identification.
Fa 05CSE182 CSE182-L7 Protein sequencing and Mass Spectrometry.
Complexity Analysis (Part I)
Pattern Discovery in RNA Secondary Structure Using Affix Trees (when computer scientists meet real molecules) Giulio Pavesi& Giancarlo Mauri Dept. of Computer.
Peptide Identification by Tandem Mass Spectrometry Behshad Behzadi April 2005.
© 2006 Pearson Addison-Wesley. All rights reserved13 A-1 Chapter 13 Hash Tables.
The max flow problem
Sequence Alignment Variations Computing alignments using only O(m) space rather than O(mn) space. Computing alignments with bounded difference Exclusion.
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract.
© 2006 Pearson Addison-Wesley. All rights reserved13 B-1 Chapter 13 (continued) Advanced Implementation of Tables.
Fa 05CSE182 CSE182-L8 Mass Spectrometry. Fa 05CSE182 Bio. quiz What is a gene? What is a transcript? What is translation? What are microarrays? What is.
© 2006 Pearson Addison-Wesley. All rights reserved13 B-1 Chapter 13 (excerpts) Advanced Implementation of Tables CS102 Sections 51 and 52 Marc Smith and.
Physical Mapping II + Perl CIS 667 March 2, 2004.
The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis Zo ë Abrams Ho-Lin Chen
1 An Algorithmic Approach to Peptide Sequencing via Tandem Mass Spectrometry Ming-Yang Kao Department of Computer Science Northwestern University Evanston,
Introduction to Bioinformatics Algorithms Exhaustive Search and Branch-and-Bound Algorithms for Partial Digest Mapping.
Simulating a CRCW algorithm with an EREW algorithm Lecture 4 Efficient Parallel Algorithms COMP308.
UCSC 1 Aman ShaikhICNP 2003 An Efficient Algorithm for OSPF Subnet Aggregation ICNP 2003 Aman Shaikh Dongmei Wang, Guangzhi Li, Jennifer Yates, Charles.
Protein sequencing and Mass Spectrometry. Sample Preparation Enzymatic Digestion (Trypsin) + Fractionation.
Restriction Mapping of Plasmid DNA. Restriction Maps Restriction enzymes can be used to construct maps of plasmid DNA Restriction enzymes can be used.
February 17, 2015Applied Discrete Mathematics Week 3: Algorithms 1 Double Summations Table 2 in 4 th Edition: Section th Edition: Section th.
Orthogonal Range Searching I Range Trees. Range Searching S = set of geometric objects Q = query object Report/Count objects in S that intersect Q Query.
1 Physical Mapping --An Algorithm and An Approximation for Hybridization Mapping Shi Chen CSE497 04Mar2004.
1 Geometric Intersection Determining if there are intersections between graphical objects Finding all intersecting pairs Brute Force Algorithm Plane Sweep.
Distance Approximating Trees in Graphs
Physical Mapping of DNA Shanna Terry March 2, 2004.
1 Plaxton Routing. 2 Introduction Plaxton routing is a scalable mechanism for accessing nearby copies of objects. Plaxton mesh is a data structure that.
1 Outline Last time: –Molecular biology primer (sections ) –PCR Today: –More basic techniques for manipulating DNA (Sec. 3.8) Cutting into shorter.
The dynamic nature of the proteome
PROTEIN STRUCTURE NAME: ANUSHA. INTRODUCTION Frederick Sanger was awarded his first Nobel Prize for determining the amino acid sequence of insulin, the.
DNA Computing on a Chip Mitsunori Ogihara and Animesh Ray Nature, vol. 403, pp Cho, Dong-Yeon.
Chapter 3 Sec 3.3 With Question/Answer Animations 1.
An Approximation Algorithm for Binary Searching in Trees Marco Molinaro Carnegie Mellon University joint work with Eduardo Laber (PUC-Rio)
INF380 - Proteomics-91 INF380 – Proteomics Chapter 9 – Identification and characterization by MS/MS The MS/MS identification problem can be formulated.
Molecular phylogenetics 1 Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Sections
© 2006 Pearson Addison-Wesley. All rights reserved13 B-1 Chapter 13 (continued) Advanced Implementation of Tables.
Analysis of Complex Proteomic Datasets Using Scaffold Free Scaffold Viewer can be downloaded at:
JM - 1 Introduction to Bioinformatics: Lecture III Genome Assembly and String Matching Jarek Meller Jarek Meller Division of Biomedical.
Fall 2002CMSC Discrete Structures1 Enough Mathematical Appetizers! Let us look at something more interesting: Algorithms.
MCA-2012Data Structure1 Algorithms Rizwan Rehman CCS, DU.
INF380 - Proteomics-101 INF380 – Proteomics Chapter 10 – Spectral Comparison Spectral comparison means that an experimental spectrum is compared to theoretical.
Combinatorial Optimization Problems in Computational Biology Ion Mandoiu CSE Department.
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
PEAKS: De Novo Sequencing using Tandem Mass Spectrometry Bin Ma Dept. of Computer Science University of Western Ontario.
CSE182 CSE182-L12 Mass Spectrometry Peptide identification.
Segmented Hash: An Efficient Hash Table Implementation for High Performance Networking Subsystems Sailesh Kumar Patrick Crowley.
Introduction to Bioinformatics Algorithms DNA Mapping and Brute Force Algorithms.
CSE182 CSE182-L11 Protein sequencing and Mass Spectrometry.
7.3 Divide-and-Conquer Algorithms and Recurrence Relations If f(n) represents the number of operations required to solve the problem of size n, it follow.
1 Application of Algorithm Research to Molecular Biology R. C. T. Lee Dept. Of Computer Science National Chinan University.
1 Parallel Sorting Algorithm. 2 Bitonic Sequence A bitonic sequence is defined as a list with no more than one LOCAL MAXIMUM and no more than one LOCAL.
MINRMS: an efficient algorithm for determining protein structure similarity using root-mean-squared-distance Andrew I. Jewett, Conrad C. Huang and Thomas.
Tag-based Blind Identification of PTMs with Point Process Model 1 Chunmei Liu, 2 Bo Yan, 1 Yinglei Song, 2 Ying Xu, 1 Liming Cai 1 Dept. of Computer Science.
Dipankar Ranjan Baisya, Mir Md. Faysal & M. Sohel Rahman CSE, BUET Dhaka 1000 Degenerate String Reconstruction from Cover Arrays (Extended Abstract) 1.
Suffix Tree 6 Mar MinKoo Seo. Contents  Basic Text Searching  Introduction to Suffix Tree  Suffix Trees and Exact Matching  Longest Common Substring.
10/30/2013BCHB Edwards Project/Review BCHB Lecture 17.
Enough Mathematical Appetizers!
Accessing nearby copies of replicated objects
Applied Discrete Mathematics Week 6: Computation
Minimizing the Aggregate Movements for Interval Coverage
Advanced Implementation of Tables
Dynamic Programming II DP over Intervals
CSE 5290: Algorithms for Bioinformatics Fall 2009
(Journal of Computational Biology, 2001) (SODA, 2000)
Presentation transcript:

The restriction mapping problem revisited Gopal Pandurangan and H. Ramesh Journal of Computer and System Sciences 526~544(2002)

Abstract-1 In computational molecular biology, the aim of restriction mapping is to locate the restriction sites of a given enzyme on a DNA molecule. Double digest and partial digest are two well-studied techniques for restriction mapping. While double digest is NP-complete, there is no known polynomial-time algorithm for partial digest. Another disadvantage of the above techniques is that there can be multiple solutions for reconstruction.

Abstract-2 In this paper, we study a simple technique called labeled partial digest for restriction mapping. We give a fast polynomial time (O(n 2 log n) worst-case) algorithm for finding all the n sites of a DNA molecule using this technique. An important advantage of the algorithm is the unique reconstruction of the DNA molecule from the digest. The technique is also robust in handling errors in fragment lengths which arises in the laboratory. We give a robust O(n 4 ) worst-case algorithm that can provably tolerate an absolute error of (where is the minimum inter-site distance), while giving a unique reconstruction. We test our theoretical results by simulating the performance of the algorithm on a real DNA molecule.

Abstract-3 Motivated by the similarity to the labeled partial digest problem, we address a related problem of interest––the de novo peptide sequencing problem (ACM-SIAM Symposium on Discrete Algorithms (SODA), 2000, pp. 389–398), which arises in the reconstruction of the peptide sequence of a protein molecule. We give a simple and efficient algorithm for the problem without using dynamic programming. The algorithm runs in time O(k log k), where k is the number of ions and is an improvement over the algorithm in Chen et al.

Partial digest problem Also called turnpike reconstruction problem. reconstruct a set of n points (restriction sites) on a line given the set of (n,2) distance (fragment) between them. Possible solutions between 1/2n and 1/2n Fragment length could be with error in laboratory

Labeled partial digest Label both ends of the DNA molecule Primary fragments : fragments have one endpoint Others are called secondary fragments Complementary pair: c i =(x i, y i ), x i <= y i,x i + y i =t, where t is the length of parent fragment

An algorithm for error-free labeled partial digest

Running Time Step 1 takes O(n 2 ) time as it involves scanning O(n 2 ) fragments and determining the ones that are labeled Step 2 can be done in O(n log n) time by sorting Steps 4.1, 4.2, involve operations of searching, deleting and finding a maximum in a set of size O(n 2 ) Steps 4.4 and 4.5 involve O(n) delete operations per round. Each operation can be implemented in O(log n) time (by a balanced binary tree) and there are a total of O(n 2 ) search and delete operations (O(n) per loop), and there are a total of O(n) find- max operations

Tolerating errors in fragment lengths Fragment length lies in an interval [f-e,f+e] Let Δdenote the minimum inter-site distance, technique is robust in tolerating an absolute error up to e= Δ /(6n+2) a complementary pair if and only if f1+f2 lies in [t-3e,t+3e] the (open) interval.

Robust labeled partial digest algorithm

The de novo peptide sequencing problem The de novo peptide sequencing problem is the reconstruction of the peptide sequence from a given tandem mass spectral data a peptide (R1-R2-R3) leads to the following prefix ions (also called as b-ions): (R1)+,(R1- R2)+ and (R1-R2-R3)+ and to the following suffix ions (also called as y-ions): (R1-R2- R3)+, (R2-R3)+ and (R3)+

The de novo peptide sequencing problem (continue) 1. it is unknown whether a mass peak of some ion corresponds to a prefix or suffix sequence 2. some ions may be lost in experiments and the corresponding mass peaks disappear in the spectrum

Algorithm for peptide sequencing

Running Time Determining feasibility between a pair of sites takes O(1) time since we have precomputed the mass array Step 1 takes O(k log k) time2 due to sorting2 Steps 3 and 4 can be implemented in O(k) time by noting that each site is handled at most twice

two important open problems in the partial digest approach Suppose there are multiple copies of each fragment, i.e., different numbers of copies for different fragment, then our algorithm as stated above does not work. (This also includes the possibility of distances being lost––missing fragments). the status of the partial digest problem is still unresolved. It is strongly suspected that there is a polynomial time algorithm