The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis. Zoë Abrams, Ho-Lin Chen

Presentation transcript:

Restriction Site Analysis
An enzyme cuts a target DNA strand into DNA fragments, and these fragments are used to reconstruct the locations of the enzyme's restriction sites.
Two common approaches:
- Double Digest Problem (NP-complete) [Goldstein, Waterman ’87]
- Partial Digest Problem

Partial Digest Problem
Reconstruct the site locations from the lengths of all fragments that can possibly be produced.
The hardness of the problem is unknown. [Skiena, Sundaram ’93] [Lemke, Skiena, Smith ’02]
If the primary fragments are added to the information used, a unique reconstruction can be found in polynomial time. [Pandurangan, Ramesh ’01]
The information is susceptible to experimental error caused by missing fragments.

Simplified Partial Digest Problem
Proposed by Blazewicz et al. ’01.
Uses primary fragments and base fragments to reconstruct the restriction sites:
- Primary fragments: one endpoint is an endpoint of the original DNA strand.
- Base fragments: the two endpoints are consecutive sites on the DNA strand.

Problem Definition
Given:
- $X_0 = 0$, $X_{n+1} = D$
- A set of base fragments $\{X_i - X_{i-1}\}_{1 \le i \le n+1}$
- A set of primary fragments $\{X_{n+1} - X_i\}_{1 \le i \le n} \cup \{X_i - X_0\}_{1 \le i \le n}$
Reconstruct the original series $X_1, \dots, X_n$.
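To make the definition concrete, the following minimal sketch computes both fragment multisets from a known set of sites. The function name `spdp_fragments` and the use of integer positions are assumptions made for illustration; they do not come from the slides.

```python
from collections import Counter

def spdp_fragments(sites, D):
    """Return (base, primary) fragment multisets for sites strictly inside (0, D)."""
    xs = [0] + sorted(sites) + [D]          # X_0 = 0, X_1..X_n, X_{n+1} = D
    # Base fragments: distances between consecutive sites.
    base = Counter(xs[i] - xs[i - 1] for i in range(1, len(xs)))
    # Primary fragments: distance from each inner site to either end of the strand.
    primary = Counter()
    for x in xs[1:-1]:
        primary[x - xs[0]] += 1
        primary[xs[-1] - x] += 1
    return base, primary

base, primary = spdp_fragments([2, 5, 9], 12)
print(sorted(base.elements()))     # [2, 3, 3, 4]
print(sorted(primary.elements()))  # [2, 3, 5, 7, 9, 10]
```

The reconstruction problem is the inverse of this function: only the two multisets and $D$ are known, and the sites must be recovered.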

Theoretical and Algorithmic Issues
- The algorithm that finds the exact solution may take $2^n$ time in the worst case. [Blazewicz, Jaroszewski ’03]
- The Simplified Partial Digest Problem may have an exponential number of solutions.
- The problem is APX-hard.
- Simple algorithms can give the correct solution with high probability.
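The $2^n$ behavior can be illustrated with a naive exhaustive search. This is a sketch written for this transcript, not the exact algorithm of Blazewicz and Jaroszewski: each site $x$ contributes the primary fragments $x$ and $D - x$, so every pair $\{a, D - a\}$ leaves two candidate positions, and the search tries all combinations. The function name and the integer inputs are assumptions.

```python
from collections import Counter
from itertools import product

def solve_spdp_brute_force(base, primary, D):
    """Enumerate every site set consistent with the base and primary multisets."""
    base = Counter(base)
    remaining = Counter(primary)
    # Group the primary fragments into one pair (a, D - a) per unknown site.
    pairs = []
    for a in sorted(remaining):
        while remaining[a] > 0:
            b = D - a
            remaining[a] -= 1
            if remaining[b] <= 0:
                return []                    # fragments cannot be paired up
            remaining[b] -= 1
            pairs.append((a, b))
    solutions = set()
    # 2^n candidate placements: pick one position from each pair.
    for choice in product(*pairs):
        sites = sorted(set(choice))
        if len(sites) != len(pairs):
            continue                         # two sites collapsed onto one position
        xs = [0] + sites + [D]
        gaps = Counter(xs[i] - xs[i - 1] for i in range(1, len(xs)))
        if gaps == base:
            solutions.add(tuple(sites))
    return sorted(solutions)

print(solve_spdp_brute_force([2, 3, 3, 4], [2, 3, 5, 7, 9, 10], 12))
# [(2, 5, 9), (3, 7, 10)]
```

The two solutions in the example are mirror images of each other, which shows that even a small instance need not have a unique reconstruction.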

Proof of APX-Hardness
We prove that the Simplified Partial Digest Problem is APX-hard by reducing the Tripartite Matching Problem to it.
Tripartite Matching Problem: given a set $S$ of triples in $\{1, 2, \dots, n\}^3$ with $|S| = T$, decide whether there exists a subset $M \subseteq S$ such that $|M| = n$ and no two triples in $M$ agree in any coordinate.

Tripartite Matching Problem
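As a small illustration of the problem statement, the sketch below decides Tripartite Matching by trying every subset of $n$ triples. It runs in exponential time and is meant only to pin down the definition; the function name is an assumption.

```python
from itertools import combinations

def has_tripartite_matching(triples, n):
    """Return True if some n triples use n distinct values in every coordinate."""
    for subset in combinations(triples, n):
        # No two chosen triples may agree in coordinate k, for k = 0, 1, 2.
        if all(len({t[k] for t in subset}) == n for k in range(3)):
            return True
    return False

print(has_tripartite_matching([(1, 1, 1), (2, 2, 2), (1, 2, 2)], 2))  # True
print(has_tripartite_matching([(1, 1, 1), (1, 2, 2), (2, 2, 1)], 2))  # False
```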

Proof of APX-Hardness
Use pairs of symmetric restriction sites to cut the segment into 2T equal-length segments, numbered 1, 2, ..., 2T.

Proof of APX-Hardness
Use symmetric restriction sites to cut the segment into 2T equal-length segments. In each pair of equal-length segments, there are seven restriction sites that can be put on either side.
[Figure: segments 1, 2, ..., 2T; the sites marked “x” can be on either side.]

Proof of APX-Hardness
Those seven restriction sites can be divided into two groups, denoted by “o” and “x” respectively. In each segment, restriction sites in the same group must be put on the same side.
Each placement of restriction sites corresponds to a set of triples chosen in the Tripartite Matching Problem.
[Figure: placements labeled “not chosen” and “chosen”.]
A placement of restriction sites is a solution iff the corresponding set of triples is a solution to the Tripartite Matching Problem.

A Simple Algorithm
- Put all symmetric points at their correct locations.
- Put all asymmetric points on the left side.
- For each site, working from the endpoints toward the middle:
  - If the base segment is matched, fix its location.
  - If the base segment isn't matched, move it and all points toward the middle to the other side.
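The steps above can be sketched roughly as follows. This is a simplified variant under stated assumptions, not the authors' exact procedure: instead of moving all inner points to the other side on a mismatch, it decides each asymmetric site independently, trying the left position first. It also assumes integer positions and distinct sites, so a pair $\{a, D - a\}$ is symmetric exactly when $a$ appears twice among the primary fragments. The name `greedy_spdp` is hypothetical.

```python
from collections import Counter

def greedy_spdp(base, primary, D):
    """Outside-in greedy placement; the result always matches the primary fragments."""
    remaining_base = Counter(base)
    counts = Counter(primary)
    left, right = [0], [D]                 # fixed sites, growing inward from both ends
    for a in sorted(x for x in counts if 2 * x < D):
        b = D - a
        if counts[a] >= 2:
            # Symmetric pair: both a and D - a are sites.
            remaining_base[a - left[-1]] -= 1
            remaining_base[right[-1] - b] -= 1
            left.append(a)
            right.append(b)
        elif remaining_base[a - left[-1]] > 0:
            # The base segment on the left is matched: fix the site there.
            remaining_base[a - left[-1]] -= 1
            left.append(a)
        else:
            # Not matched on the left: place the site on the other side.
            remaining_base[right[-1] - b] -= 1
            right.append(b)
    if D % 2 == 0 and counts[D // 2] > 0:
        left.append(D // 2)                # a site exactly in the middle
    return sorted(left[1:] + right[1:])

print(greedy_spdp([2, 3, 3, 4], [2, 3, 5, 7, 9, 10], 12))  # [2, 5, 9]
```

Because each site is placed at one of its two candidate positions, the primary fragments of the output match the input by construction; only the base fragments can be mismatched, which is the quantity the analysis bounds.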

Analysis of the Algorithm
- Assuming a uniform distribution of restriction sites, for many practical parameters* the algorithm outputs the correct locations with probability at least 0.4.
- All the primary fragments are matched, and at least ¼ of all base fragments are matched in the worst case.
- Runs in time linear in the number of sites.
*Ex: Length of the DNA strand around 20,000, restriction sites

Future Work
- Construct better heuristics to solve SPDP.
- Analyze the hardness of the Partial Digest Problem.
- Find other characterizations of restriction sites that are both easy to measure and usable for reconstructing the sites.