Fixed-Parameter Algorithms for CLOSEST STRING and Related Problems. Algorithmica (2003). Jens Gramm, Rolf Niedermeier, Peter Rossmanith.

Presentation transcript:

Outline: Introduction; Preliminaries; Linear-time solution for constant d; Related problems; Linear-time solution for fixed k; Conclusion

Intro: Problem Definition. Input: Strings s_1, s_2, …, s_k over alphabet Σ, each of length L, and a nonnegative integer d. Question: Is there a string s of length L such that d_H(s, s_i) ≤ d for all i = 1, …, k? Here d_H is the Hamming distance: d_H(s_1, s_2) = |{i | s_1[i] ≠ s_2[i]}| for |s_1| = |s_2|.
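The definition above can be checked directly. The following is a minimal sketch, not code from the paper; the function names `hamming`, `radius` and `is_center_string` are chosen here for illustration only.

```python
def hamming(a, b):
    """Hamming distance d_H of two strings of equal length."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def radius(s, strings):
    """Largest Hamming distance from s to any of the given strings."""
    return max(hamming(s, t) for t in strings)


def is_center_string(s, strings, d):
    """True iff d_H(s, s_i) <= d for every input string s_i."""
    return radius(s, strings) <= d


strings = ["abcd", "aadb", "bcda", "accc"]
print(is_center_string("acdd", strings, 2))
```

Verifying a given candidate takes O(kL) time; the hard part of CLOSEST STRING is finding the candidate.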

NP-completeness: CLOSEST STRING is NP-complete. In biological applications, d is usually small. Result of this paper: an O(kL + kd · d^d) time algorithm. A PTAS was given by Li et al.

Extended problems: d-MISMATCH; DISTINGUISHING STRING SELECTION; DISTINGUISHING SUBSTRING SELECTION

Preliminaries: Given a set of strings S = {s_1, …, s_k}, each of length L. s is an optimal center string iff there is no s' such that max_{i=1,…,k} d_H(s', s_i) < max_{i=1,…,k} d_H(s, s_i). s is an optimal median string iff there is no s' such that Σ_{i=1,…,k} d_H(s', s_i) < Σ_{i=1,…,k} d_H(s, s_i).

Given a set of k strings of length L, think of the set as a k × L matrix. Example:
s1: a b c d
s2: a a d b
s3: b c d a
s4: a c c c
Optimal median string: a c c a
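An optimal median string can be read off column by column, because the sum of Hamming distances decomposes over columns: picking a most frequent symbol in each column minimizes that column's contribution. A minimal sketch, with the hypothetical helper names `median_string` and `total_distance`:

```python
from collections import Counter


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def total_distance(s, strings):
    """Sum of Hamming distances from s to all given strings."""
    return sum(hamming(s, t) for t in strings)


def median_string(strings):
    """Pick a most frequent symbol in every column of the k x L matrix."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*strings)
    )


strings = ["abcd", "aadb", "bcda", "accc"]
m = median_string(strings)
print(m, total_distance(m, strings))
```

Ties may be broken arbitrarily, so the string returned here can differ from "acca" in the last two columns while having the same total distance.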

Main idea: Search! Fixed-parameter tractability; reduction to a problem kernel.

LEMMA 1. Given a set of strings S = {s_1, …, s_k}, each of length L, and a permutation σ: {1, …, L} → {1, …, L}. Then s is an optimal center string for {s_1, …, s_k} iff σ(s) is an optimal center string for {σ(s_1), σ(s_2), …, σ(s_k)}.

LEMMA 2. To compute an optimal center string, it is sufficient to solve a normalized and reordered instance. From this, the solution of the original instance can be derived in linear time. Example:
Original instance:
s1: a b c d
s2: a a d b
s3: b c d a
s4: a c c c
Normalized:
s1: a b a a
s2: a c b b
s3: b a b c
s4: a a a d
Reordered:
s1: b a a a
s2: c a b b
s3: a b b c
s4: a a a d

LEMMA 3. A CLOSEST STRING instance with arbitrary alphabet Σ, |Σ| > k, is isomorphic to a CLOSEST STRING instance with alphabet Σ', |Σ'| = k. Proof: by normalization.
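Normalization renames the symbols of each column independently, so a column of k entries never needs more than k distinct symbols. The sketch below assumes one particular renaming rule: the most frequent symbol of a column becomes "a", the next "b", and so on, with ties broken by first occurrence. That rule is an assumption made here for illustration; it also assumes k ≤ 26 so that lowercase letters suffice.

```python
from collections import Counter
import string


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def normalize(strings):
    """Rename symbols column by column (assumed rule: by frequency,
    ties broken by first occurrence in the column)."""
    columns = []
    for column in zip(*strings):
        counts = Counter(column)
        ranked = sorted(counts, key=lambda ch: (-counts[ch], column.index(ch)))
        rename = {ch: string.ascii_lowercase[rank] for rank, ch in enumerate(ranked)}
        columns.append([rename[ch] for ch in column])
    # Transpose the renamed columns back into k row strings.
    return ["".join(row) for row in zip(*columns)]


original = ["abcd", "aadb", "bcda", "accc"]
print(normalize(original))
```

Because each column is renamed by a bijection, all pairwise Hamming distances are unchanged, which is why the normalized instance is equivalent to the original one.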

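LEMMA 4. Given a CLOSEST STRING instance s_1, …, s_k of length L and d. If the resulting k × L matrix has more than kd dirty columns, then there is no string s with max_{i=1,…,k} d_H(s, s_i) ≤ d. A column is dirty iff it contains at least two different symbols from alphabet Σ. Proof: by the pigeonhole principle. Every dirty column forces a mismatch between s and at least one s_i, so with more than kd dirty columns some s_i has more than d mismatches.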
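The dirty-column test of Lemma 4 is a simple pass over the matrix. A minimal sketch follows; the names `dirty_columns` and `exceeds_dirty_bound` are hypothetical.

```python
def dirty_columns(strings):
    """Indices (0-based) of columns containing at least two different symbols."""
    return [c for c, column in enumerate(zip(*strings)) if len(set(column)) > 1]


def exceeds_dirty_bound(strings, d):
    """True iff there are more than k*d dirty columns, so no center string exists."""
    k = len(strings)
    return len(dirty_columns(strings)) > k * d


strings = ["aabca", "aabda", "aacca"]
print(dirty_columns(strings), exceeds_dirty_bound(strings, 0))
```

Clean columns can be dropped, since the solution simply copies their common symbol. The remaining instance has at most kd columns.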

A Linear-Time Solution for Constant d: Bounded search tree algorithm. LEMMA 5. Given a set of strings S = {s_1, …, s_k} and a positive integer d. If there are i, j ∈ {1, …, k} with d_H(s_i, s_j) > 2d, then there is no string s with max_{i=1,…,k} d_H(s, s_i) ≤ d. This follows from the triangle inequality: d_H(s_i, s_j) ≤ d_H(s_i, s) + d_H(s, s_j) ≤ 2d.
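Lemma 5 gives a quick rejection test before any search starts. A minimal sketch, with the hypothetical name `too_far_apart`; it compares all pairs and therefore takes O(k²L) time.

```python
from itertools import combinations


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def too_far_apart(strings, d):
    """True iff some pair of input strings differs in more than 2d positions."""
    return any(hamming(a, b) > 2 * d for a, b in combinations(strings, 2))


strings = ["abcd", "aadb", "bcda", "accc"]
print(too_far_apart(strings, 1), too_far_apart(strings, 2))
```

In the example, "abcd" and "bcda" differ in all four positions, so d = 1 is ruled out immediately while d = 2 is not.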

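Theorem 1. Given a set of strings S = {s_1, …, s_k} and an integer d, Algorithm D determines in O(kL + kd · d^d) time whether there is a string s with max_{i=1,…,k} d_H(s, s_i) ≤ d. Proof idea: By Lemma 4, the input instance is reduced to O(kd) columns in O(kL) time. The search tree has depth d. Steps D0 to D3 together take O(kd) time, using a table that contains the distances of the candidate string s_1 to all other given strings.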
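The bounded search tree idea can be sketched as follows. This is a reading of the slides, not the authors' exact Algorithm D: the candidate starts as s_1, and whenever some s_i is farther than d, the search branches on d + 1 positions where the candidate differs from s_i, moving the candidate toward s_i at that position. The names `closest_string` and `search` are hypothetical, and this sketch recomputes distances at every node instead of maintaining the distance table mentioned in the theorem.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def closest_string(strings, d):
    """Return a string within distance d of all inputs, or None if none exists."""

    def search(candidate, budget):
        # budget: how many more positions of the candidate may still be changed.
        if budget < 0:
            return None
        dists = [hamming(candidate, s) for s in strings]
        if max(dists) > d + budget:
            return None  # some string can no longer be reached
        if max(dists) <= d:
            return candidate
        # Pick a string that is too far away and branch on d + 1 mismatches.
        i = next(idx for idx, dist in enumerate(dists) if dist > d)
        mismatches = [p for p in range(len(candidate))
                      if candidate[p] != strings[i][p]]
        for p in mismatches[:d + 1]:
            moved = candidate[:p] + strings[i][p] + candidate[p + 1:]
            result = search(moved, budget - 1)
            if result is not None:
                return result
        return None

    return search(strings[0], d)


strings = ["abcd", "aadb", "bcda", "accc"]
print(closest_string(strings, 2))
```

Each node has at most d + 1 children and the depth is at most d, so the tree has at most (d + 1)^d leaves, independent of k and L.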

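Correctness: We show only the correctness of the first step. Suppose s_1 is not a solution but there exists a center string s. Then some s_i has d_H(s_1, s_i) > d; choose P ⊆ {p | s_1[p] ≠ s_i[p]} with |P| = d+1. Let P_{s_1≠s=s_i} := {p | s_1[p] ≠ s[p] = s_i[p]}; hitting one of these positions is the goal, since changing s_1 there moves it closer to s. P splits into the disjoint sets P ∩ P_{s_1≠s=s_i} and P ∩ P_{s≠s_i}, where P_{s≠s_i} := {p | s[p] ≠ s_i[p]} and |P_{s≠s_i}| ≤ d. Hence at least one position of P lies in P_{s_1≠s=s_i}, so d+1 subcases are sufficient.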

Related Problems: d-MISMATCH problem. s_{i,p,L} denotes the length-L substring of a given string s_i starting at position p. Question: Is there a string s of length L and a position p with 1 ≤ p ≤ n-L+1 such that d_H(s, s_{i,p,L}) ≤ d for all i? Stojanovic et al. give a linear-time algorithm for 1-MISMATCH. Theorem 2. d-MISMATCH is solvable in O(kL + (n-L) · kd · d^d) time, which is O(n · k) for fixed d. Naively: O(n · (kL + kd · d^d)). Idea: maintain a queue of dirty columns. Considering only the first L columns, we can build a FIFO queue in O(kL) time, and update it at each position in O(k) time.
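The queue of dirty columns can be sketched as a sliding window. The code below shows only the filtering part, under the assumption that all k strings have the same length n: it reports the window start positions (0-based here, unlike the 1-based p above) whose windows have at most kd dirty columns. The CLOSEST STRING search would then be run on those windows only. The name `candidate_windows` is hypothetical.

```python
from collections import deque


def candidate_windows(strings, L, d):
    """Start positions whose length-L window has at most k*d dirty columns."""
    k, n = len(strings), len(strings[0])

    def is_dirty(col):
        # O(k): a column is dirty iff it holds at least two different symbols.
        return len({s[col] for s in strings}) > 1

    # Build the queue for the first window in O(kL) time.
    queue = deque(c for c in range(L) if is_dirty(c))
    positions = []
    for p in range(n - L + 1):
        if p > 0:
            # Slide by one: drop the column that left, test the one that entered.
            if queue and queue[0] == p - 1:
                queue.popleft()
            entering = p + L - 1
            if is_dirty(entering):
                queue.append(entering)
        if len(queue) <= k * d:
            positions.append(p)
    return positions


strings = ["aabca", "aabda", "aacca"]
print(candidate_windows(strings, 2, 0))
```

Since each slide inspects exactly one new column, the update costs O(k) per position, which is what removes the O(kL) factor from the naive per-window bound.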

DSS problem: DISTINGUISHING STRING SELECTION. Given S = {s_1, …, s_k1} and S' = {s'_1, …, s'_k2}, all of the same length L, and d_1, d_2 ≥ 0, is there a string s such that d_H(s, s_i) ≤ d_1 for all s_i ∈ S and d_H(s, s'_j) ≥ L-d_2 for all s'_j ∈ S'? LEMMA 6. Given two sets of strings S = {s_1, …, s_k1} and S' = {s'_1, …, s'_k2} and positive d_1, d_2. If there are i ∈ {1, …, k_1} and j ∈ {1, …, k_2} with d_H(s_i, s'_j) < L-(d_1+d_2), then there is no string s satisfying both max_{i=1,…,k1} d_H(s, s_i) ≤ d_1 and min_{j=1,…,k2} d_H(s, s'_j) ≥ L-d_2. Proof: d_H(s, s'_j) ≤ d_H(s, s_i) + d_H(s_i, s'_j).
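Both the DSS condition and the Lemma 6 rejection test translate directly into code. A minimal sketch; the names `is_dss_solution`, `lemma6_rules_out`, `good` (for S) and `bad` (for S') are chosen here for illustration.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def is_dss_solution(s, good, bad, d1, d2):
    """s is close to every string in good and far from every string in bad."""
    L = len(s)
    return (all(hamming(s, g) <= d1 for g in good)
            and all(hamming(s, b) >= L - d2 for b in bad))


def lemma6_rules_out(good, bad, d1, d2):
    """True iff some good/bad pair is closer than L - (d1 + d2)."""
    L = len(good[0])
    return any(hamming(g, b) < L - (d1 + d2) for g in good for b in bad)


print(lemma6_rules_out(["aaaa"], ["aaab"], 0, 0))
print(is_dss_solution("aaaa", ["aaaa"], ["bbbb"], 0, 0))
```

The lemma is only a necessary condition: when `lemma6_rules_out` returns False, a solution may or may not exist.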

A Linear-Time Solution for Fixed k: Is CLOSEST STRING fixed-parameter tractable with respect to k? Use integer linear programming (ILP). Lenstra: an ILP with a fixed number of variables can be solved in linear time (exponential space).

CLOSEST STRING in ILP: Column types for k strings. For k = 3: (a,a,a)^t, (a,a,b)^t, (a,b,a)^t, (b,a,a)^t, (a,b,c)^t. The number of column types is B(k) ≤ k!. Variables x_{t,φ}, where t is a column type and φ ∈ Σ: the number of columns of type t whose corresponding character in the desired solution string of CLOSEST STRING is set to φ. B(k) · k variables are needed. Minimize the maximum Hamming distance of the solution string to the input strings, where φ_{t,i} denotes the alphabet symbol at the i-th entry of column type t.
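Column types can be computed by replacing each column with its pattern of equal and unequal entries, and their number is bounded by the Bell number B(k), the number of partitions of a k-element set. The sketch below encodes a type as a tuple of integers, which is an encoding chosen here for illustration; the names `column_type`, `column_type_counts` and `bell` are hypothetical.

```python
from collections import Counter


def column_type(column):
    """Canonical pattern of a column: symbols numbered by first occurrence."""
    first_seen = {}
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in column)


def column_type_counts(strings):
    """How many columns of the k x L matrix have each column type."""
    return Counter(column_type(column) for column in zip(*strings))


def bell(n):
    """Bell number B(n), computed with the Bell triangle."""
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


strings = ["abcd", "aadb", "bcda", "accc"]
print(column_type_counts(strings))
print(bell(3))
```

The point of grouping columns by type is that the number of ILP variables depends only on k, not on L, which is what makes Lenstra's result applicable.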

Conclusion: CLOSEST STRING is fixed-parameter tractable with respect to d and with respect to k. Improves previous work on d-MISMATCH. DSS; CLOSEST SUBSTRING?