Closest String with Wildcards ( CSW ) Parameterized Complexity Analysis for the Closest String with Wildcards ( CSW ) Problem Danny Hermelin Liat Rozenberg.

Slides:



Advertisements
Similar presentations
Complexity Classes: P and NP
Advertisements

 2004 SDU Lecture17-P,NP, NPC.  2004 SDU 2 1.Decision problem and language decision problem decision problem and language 2.P and NP Definitions of.
Fixed-Parameter Algorithms for CLOSEST STRING and Related Problems Algorithmica(2003) Jens Gramm, Rolf Niedermeier, Peter Rossmanith.
Fingerprint Clustering - CPM Fingerprint Clustering with Bounded Number of Missing Values Paola Bonizzoni, Gianluca Della Vedova, Giancarlo Mauri.
CSC5160 Topics in Algorithms Tutorial 2 Introduction to NP-Complete Problems Feb Jerry Le
Complexity 11-1 Complexity Andrei Bulatov NP-Completeness.
On the complexity of finding common approximate substrings Theoretical Computer Science 306 (2003) Patricia A. Evans, Andrew D. Smith, H. Todd.
Computability and Complexity 23-1 Computability and Complexity Andrei Bulatov Search and Optimization.
Complexity 15-1 Complexity Andrei Bulatov Hierarchy Theorem.
CS21 Decidability and Tractability
Complexity and Approximation of the Minimum Recombinant Haplotype Configuration Problem Authors: Lan Liu, Xi Chen, Jing Xiao & Tao Jiang.
1 Introduction to Linear and Integer Programming Lecture 9: Feb 14.
1 Optimization problems such as MAXSAT, MIN NODE COVER, MAX INDEPENDENT SET, MAX CLIQUE, MIN SET COVER, TSP, KNAPSACK, BINPACKING do not have a polynomial.
Finding Compact Structural Motifs Presented By: Xin Gao Authors: Jianbo Qian, Shuai Cheng Li, Dongbo Bu, Ming Li, and Jinbo Xu University of Waterloo,
1 CSE 417: Algorithms and Computational Complexity Winter 2001 Lecture 23 Instructor: Paul Beame.
1 CSE 417: Algorithms and Computational Complexity Winter 2001 Lecture 24 Instructor: Paul Beame.
Integer Programming Difference from linear programming –Variables x i must take on integral values, not real values Lots of interesting problems can be.
Chapter 11: Limitations of Algorithmic Power
Linear Programming and Parameterized Algorithms. Linear Programming n real-valued variables, x 1, x 2, …, x n. Linear objective function. Linear (in)equality.
Hardness Results for Problems
Data reduction lower bounds: Problems without polynomial kernels Hans L. Bodlaender Joint work with Downey, Fellows, Hermelin, Thomasse, Yeo.
Complexity Classes Kang Yu 1. NP NP : nondeterministic polynomial time NP-complete : 1.In NP (can be verified in polynomial time) 2.Every problem in NP.
Fixed Parameter Complexity Algorithms and Networks.
1 The TSP : NP-Completeness Approximation and Hardness of Approximation All exact science is dominated by the idea of approximation. -- Bertrand Russell.
The Complexity of Optimization Problems. Summary -Complexity of algorithms and problems -Complexity classes: P and NP -Reducibility -Karp reducibility.
Final Exam Review Final exam will have the similar format and requirements as Mid-term exam: Closed book, no computer, no smartphone Calculator is Ok Final.
MCS 312: NP Completeness and Approximation algorthms Instructor Neelima Gupta
Week 10Complexity of Algorithms1 Hard Computational Problems Some computational problems are hard Despite a numerous attempts we do not know any efficient.
NP-COMPLETENESS PRESENTED BY TUSHAR KUMAR J. RITESH BAGGA.
EMIS 8373: Integer Programming NP-Complete Problems updated 21 April 2009.
Approximation Schemes Open Shop Problem. O||C max and Om||C max {J 1,..., J n } is set of jobs. {M 1,..., M m } is set of machines. J i : {O i1,..., O.
CSCI 3160 Design and Analysis of Algorithms Tutorial 10 Chengyu Lin.
1 Chapter 8 NP and Computational Intractability Slides by Kevin Wayne. Copyright © 2005 Pearson-Addison Wesley. All rights reserved.
Unit 9: Coping with NP-Completeness
NP-Complete Problems. Running Time v.s. Input Size Concern with problems whose complexity may be described by exponential functions. Tractable problems.
NP-COMPLETE PROBLEMS. Admin  Two more assignments…  No office hours on tomorrow.
Efficient Algorithms for Some Variants of the Farthest String Problem Chih Huai Cheng, Ching Chiang Huang, Shu Yu Hu, Kun-Mao Chao.
Lecture 6 NP Class. P = ? NP = ? PSPACE They are central problems in computational complexity.
NP-Complete problems.
Instructor Neelima Gupta Table of Contents Class NP Class NPC Approximation Algorithms.
ON THE EFFICIENCY OF THE HAMMING C-CENTERSTRING PROBLEMS Amihood Amir Liam Roditty Jessica Ficler Oren Sar Shalom.
Linear Program Set Cover. Given a universe U of n elements, a collection of subsets of U, S = {S 1,…, S k }, and a cost function c: S → Q +. Find a minimum.
Minicourse on parameterized algorithms and complexity Part 4: Linear programming Dániel Marx (slides by Daniel Lokshtanov) Jagiellonian University in Kraków.
CS 3343: Analysis of Algorithms Lecture 25: P and NP Some slides courtesy of Carola Wenk.
CSE 589 Part V One of the symptoms of an approaching nervous breakdown is the belief that one’s work is terribly important. Bertrand Russell.
Strings Basic data type in computational biology A string is an ordered succession of characters or symbols from a finite set called an alphabet Sequence.
CS6045: Advanced Algorithms NP Completeness. NP-Completeness Some problems are intractable: as they grow large, we are unable to solve them in reasonable.
Lecture 25 NP Class. P = ? NP = ? PSPACE They are central problems in computational complexity.
Chapter 11 Introduction to Computational Complexity Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 1.
Algorithms for hard problems Introduction Juris Viksna, 2015.
NP Completeness Piyush Kumar. Today Reductions Proving Lower Bounds revisited Decision and Optimization Problems SAT and 3-SAT P Vs NP Dealing with NP-Complete.
Conditional Lower Bounds for Dynamic Programming Problems Karl Bringmann Max Planck Institute for Informatics Saarbrücken, Germany.
TU/e Algorithms (2IL15) – Lecture 12 1 Linear Programming.
The NP class. NP-completeness Lecture2. The NP-class The NP class is a class that contains all the problems that can be decided by a Non-Deterministic.
1 Design and Analysis of Algorithms Yoram Moses Lecture 13 June 17, 2010
Approximation algorithms
ICS 353: Design and Analysis of Algorithms NP-Complete Problems King Fahd University of Petroleum & Minerals Information & Computer Science Department.
The NP class. NP-completeness
Chapter 10 NP-Complete Problems.
Lecture 2-2 NP Class.
ICS 353: Design and Analysis of Algorithms
Bart M. P. Jansen June 3rd 2016, Algorithms for Optimization Problems
REDUCESEARCH Polynomial Kernels for Hitting Forbidden Minors under Structural Parameterizations Bart M. P. Jansen Astrid Pieterse ESA 2018 August.
On the k-Closest Substring and k-Consensus Pattern Problems
Chapter 11 Limitations of Algorithm Power
“(More) Consequences of Falsifying SETH
On The Quantitative Hardness of the Closest Vector Problem
Complexity Theory in Practice
CS21 Decidability and Tractability
Presentation transcript:

Closest String with Wildcards ( CSW ) Parameterized Complexity Analysis for the Closest String with Wildcards ( CSW ) Problem Danny Hermelin Liat Rozenberg CPM - Moscow - June 2014

The Closest String problem Given : – a set of m strings, each of length n – a positive integer d Find Find : – a string s of length n such that the Hamming Distance between s and each input string ≤ d s is called a closest string ( or consensus, or center string )

The Closest String problem s = t g a c S = S = {“c g a t”, “t c g c”, “a g a c” } d = 2 c g a t t c g c a g a c m n input : S and d Output : s

The Closest String problem s = t g a c S = S = {“c g a t”, “t c g c”, “a g a c” } d = 2 c g a t t c g c a g a c m n input : S and d Output : s

The Closest String problem s = t g a c S = S = {“c g a t”, “t c g c”, “a g a c” } d = 2 c g a t t c g c a g a c m n input : S and d Output : s

The Closest String problem s = t g a c S = S = {“c g a t”, “t c g c”, “a g a c” } d = 2 c g a t t c g c a g a c m n input : S and d Output : s

The Closest String problem S = S = {“c g a t”, “t c g c”, “a g a c” } d = 1 c g a t t c g c a g a c m n input : S and d Output : s No closest string with d=1

The Closest String problem Proved to be NP-hard, even for binary strings Frances and Litman (1997) Practical solutions mainly focused on Polynomial Time Approximation Scheme (PTAS) algorithms Exact algorithms based on Fixed Parameter solutions (FPT) -Lanctot, Li, Ma and Wang (1999) -Li, Ma and Wang (2002) -Marx (2008) -Ma and Sun (2009) -Amir, Paryenty and Roditty (2011) -Chimani, Wost and Bocker (2012) -… -Stojanovic, Berman, Gumucio, Hardison and Miller (1997) -Gramm, Niedermeier and Rossmanith (2003) -Ma and Sun (2009) -…

The Closest String with Wildcards (CSW) problem

Closest String with Wildcards The input strings may include wildcard characters * * matches all characters in ∑ The closest string is required NOT to contain any wildcards

Closest String with Wildcards S = S = {“c * a t”, “t c * c”, “* g a *” } d = 1 c * a t t c * c * g a * m n input : S and d Output : s s = t c a t

Closest String with Wildcards s = t c a t S = S = {“c * a t”, “t c * c”, “* g a *” } d = 1 c * a t t c * c * g a * m n input : S and d Output : s

Generalization of the classical Closest String problem For biological applications, the wildcards version represents more realistic data Closest String with Wildcards

Many biological applications ask for a representative string that satisfy a distance constraint from a set of generated sequences These sequences are subject to errors or have missing fragments, which create gaps that we model as wildcards Closest String with Wildcards

Mentioned as a possible extension of the 1- region problem described in Stojanovic et al. (1997) Yet, the problem left open As far as we know, there are no published results for this problem Closest String with Wildcards

CSW is NP-hard Parameterized complexity of: Our results m # input strings n strings length d distance bound k min # of wildcards in any string whether or not these yield fixed - parame ter algorithms

A problem is Fixed-Parameter Tractable (FPT) if it has a solution in time Parameterized Complexity parameterInput length

A problem is Fixed-Parameter Tractable (FPT) if it has a solution in time Parameterized Complexity parameterInput length arbitrary (typically super-polynomial) function independent of n

CSW is NP-hard Parameterized complexity of: Our results m # input strings n strings length d distance bound k min # of wildcards in any string show FPT for: - m - n - d + k prove NO FPT for: - k - d ≥ 2 The case of d = 1

NO FPT with respect to k k = min # of wildcards in any string Immediate from the fact that the Closest String problem is NP-hard k = 0

NO FPT with respect to d We prove that CSW problem is NP-hard for d ≥ 2, even when the strings are over binary alphabet

No FPT when d =2 1 * 0 * * 0 * * * - Each clause creates a string - Each position j corresponds to a variable : - if x j appears = 1 / 0 - Otherwise = *

No FPT when d =2 1 * 0 * * 0 * * * - A satisfying assignment corresponds to a string that has Hamming Distance at most 2 from each input string

No FPT when d =2 1 * 0 * * 0 * * * - In the satisfying assignment : - x 1 = 1 and / or - x 3 = 0 and / or - x 6 = 0

No FPT when d =2 1 * 0 * * 0 * * * - On the other way, a string with Hamming Distance ≤ 2 must have : - x 1 = 1 and / or - x 3 = 0 and / or - x 6 = 0

No FPT when d =2 Thus, a solution to the CSW instance with d =2, corresponds to a solution of our 3- SAT instance Which proves that the CSW problem with d =2 is NP - complete What about larger distances?

No FPT when d >2 A reduction from NP-hardness for some d implies hardness for d+1 input string s m n m To each string s i we add a suffix which has 1 in the i ’ s position and 0’ s in all other positions

No FPT when d >2 A reduction from NP-hardness for some d implies hardness for d+1 input string s m n s ≤d≤d

No FPT when d >2 A reduction from NP-hardness for some d implies hardness for d+1 input string s m n s ≤ d +1

NO FPT with respect to d Thus, if CSW is NP - hard for d, then it also NP - hard for d +1 The CSW problem is NP - hard for d ≥ 2 even for binary alphabet What about d =1 ?

d =1 and binary strings We show a polynomial - time algorithm for binary alphabet Open problem : arbitrary alphabet

d =1 and binary strings Our polynomial-time algorithm uses a reduction to the 2-SAT problem Determining whether a 2-SAT formula is satisfiable can be done in linear-time ( Aspvall et al. [1979], Even et al.[1976] )

d =1 and binary strings the input strings as m x n matrix * 1 0 * 1 * * 1 0 n variables, one for each column Up to 4 clauses For each pair of columns Formula size O ( n 2 ) x1 x2 x3x4x1 x2 x3x4

d =1 and binary strings the input strings as m x n matrix * 1 0 * 1 * * 1 0 x1 x2 x3x4x1 x2 x3x4 The main idea : any assignment that does NOT satisfy a clause, has exactly 2 errors with one of the strings

d =1 and binary strings the input strings as m x n matrix * 1 0 * 1 * * 1 0 x1 x2 x3x4x1 x2 x3x4 Conversely : If there is a satisfying assignment, then it corresponds to a binary string with distance ≤ 1 from each input string

d =1 and binary strings the input strings as m x n matrix * 1 0 * 1 * * 1 0 x1 x2 x3x4x1 x2 x3x4 Conclusion : The resulting 2- SAT is satisfiable iff there is a closest string with d ≤ 1

Conclusions & Open Problems New variant of the CS problem – CSW We focused only in optimal exact solutions Interesting directions : Optimization versions (e.g. minimum possible d, maximum number of strings m’≤m for some d, etc..)

Conclusions & Open Problems e Note that it is likely that most techniques for CS won ’ t carry through to CSW since the triangle inequality does not hold d = 0 d = n * 0n0n 1n1n n

Thank You for listening