On the k-Closest Substring and k-Consensus Pattern Problems


On the k-Closest Substring and k-Consensus Pattern Problems
Yishan Jiao, Jingyi Xu (Institute of Computing Technology, Chinese Academy of Sciences); Ming Li (University of Waterloo)
July 5, 2004

Outline
1. Motivation & background
2. Our contributions
   - A PTAS for the k-Closest Substring problem
   - The NP-hardness of (2-ε)-approximation of the HRC problem
   - A PTAS for the k-Consensus Pattern problem
3. Conclusion
First, a general introduction to the main content of the paper; then the most closely related previous work.

Motivation
Given n protein sequences, find the “conserved” regions separately.
[Figure: n sequences, each with a region of length L highlighted in red or blue.]
1. This study started from separating repeats in our DNA sequence assembly project, but we quickly realized that the problems we abstracted relate to many widely studied problems in different areas, from geometric clustering to DNA multiple motif finding.
2. The red and blue regions are different conserved regions, or motifs. They do not have to be exactly the same; they match with higher scores than other regions.

Focused problem: the k-Closest Substring problem (k-CSS)
Given a set S of strings, each of length m, find k center strings c1, ..., ck, each of length L. A special case is k = 2.
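To make the problem concrete, here is a minimal Python sketch. It assumes the usual k-CSS objective, which this transcript does not spell out: every string contributes its best length-L substring against its best center, and the goal is to minimize the largest such Hamming distance. The function names and the exhaustive search are illustrative only; the search is exponential and is not the PTAS discussed in the talk.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def kcss_radius(strings, centers):
    """Assumed objective: for each string, take the smallest distance between
    any center and any length-L substring; return the worst case over strings."""
    L = len(centers[0])
    radius = 0
    for s in strings:
        best = min(
            hamming(c, s[i:i + L])
            for c in centers
            for i in range(len(s) - L + 1)
        )
        radius = max(radius, best)
    return radius


def brute_force_kcss(strings, k, L, alphabet):
    """Try every k-tuple of length-L centers (exponential; tiny inputs only)."""
    candidates = ["".join(p) for p in product(alphabet, repeat=L)]
    best = None
    for centers in product(candidates, repeat=k):
        r = kcss_radius(strings, centers)
        if best is None or r < best[0]:
            best = (r, centers)
    return best
```

For example, with the strings AACGT, TACGA, GGTTA and CGTTC, the centers ACG and GTT each occur exactly in two of the strings, so the radius of that 2-CSS solution is 0.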

2-CSS
[Figure: the strings of S, each with a length-L substring marked.]
c1 and c2 are the center strings of two separate clusters.

Related work
[Diagram: relationships among the Closest Substring problem, the Closest String problem (the case L = m), the Hamming Radius k-clustering problem (HRC, the case L = m), and its geometric counterpart, the geometric k-center problem.]
- Closest Substring problem: a PTAS; M. Li et al., JACM 49(2):157-171, 2002.
- Hamming Radius O(1)-clustering problem (O(1)-HRC): an RPTAS; J. Jansson, doctoral dissertation, 2003.

The PTAS for k-CSS
Difficulties:
- How to choose the n closest substrings?
- How to partition the strings into k sets accordingly?
Method:
- Extend the random sampling strategy in [M. Li et al., JACM 49(2):157-171, 2002].
- Construct h to approximate the Hamming distance.
Result: a PTAS for O(1)-CSS.

P-Q decomposition
[Figure: the L positions divided into the position sets Q and P, with a sample R taken from P.]

P-Q decomposition
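A minimal Python sketch of such a decomposition follows. It assumes the convention of the single-center scheme of M. Li et al. that the talk extends: Q is the set of positions where a few chosen length-L substrings all agree, and P is the set of remaining positions. The function name and the example strings are illustrative.

```python
def pq_decomposition(chosen):
    """Split positions 0..L-1 into P (some chosen strings disagree)
    and Q (all chosen strings agree), under the assumed convention."""
    L = len(chosen[0])
    if any(len(t) != L for t in chosen):
        raise ValueError("all chosen substrings must have length L")
    P, Q = [], []
    for j in range(L):
        letters = {t[j] for t in chosen}
        (Q if len(letters) == 1 else P).append(j)
    return P, Q
```

For the three substrings ACGTA, ACCTA and ACGTT, positions 0, 1 and 3 agree and form Q, while positions 2 and 4 form P.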

Random sampling strategy: R1 (R2) is obtained by randomly picking O(log(mn)) positions from P1 (P2).

Random sampling strategy: h approximates the Hamming distance well.
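The idea of estimating a Hamming distance from a small sample of positions can be sketched in Python as follows. This is an illustration under assumptions, not the function h from the paper: the distance is taken exactly on a position set Q and is estimated on a position set P from positions sampled uniformly with replacement, scaled by |P|/|R|. The names `sampled_distance` and `hamming_on` are hypothetical.

```python
import random


def hamming_on(a, b, positions):
    """Hamming distance restricted to the given positions (with multiplicity)."""
    return sum(a[j] != b[j] for j in positions)


def sampled_distance(a, b, P, Q, sample_size, rng):
    """Exact distance on Q plus a scaled estimate on P.

    Assumption: the sample R is drawn uniformly with replacement from P;
    in the talk its size is O(log(mn)).
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    exact_q = hamming_on(a, b, Q)
    if not P:
        return float(exact_q)
    R = [rng.choice(P) for _ in range(sample_size)]
    return exact_q + len(P) / len(R) * hamming_on(a, b, R)
```

Sampling only inside P matters because the estimation error scales with the size of the sampled region, which is why the talk pairs the sampling with the P-Q decomposition.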

Scheme of PTAS

Scheme of PTAS
5. Get the final approximating center strings.
Output (c1'', c2'') and {t1, t2, …, tn} in polynomial time, satisfying with high probability: …
Extension to the k = O(1) case: trivial.

Sum up: we partition the string set S based on c1', c2' and h. According to Lemma 1, c1' and c2' approximate the optimal c1 and c2 within an error of O(dopt). According to Lemma 3 and Lemma 1, h approximates the Hamming distance within an error of O(dopt). If we used only the random sampling strategy, without the P-Q decomposition, and sampled positions over the whole string, Lemma 3 would only guarantee an approximation bound of O(length of the string), which is not sufficient when dopt is small. Thanks to the P-Q decomposition, we can sample over the set P instead, so the approximation bound is O(dopt). Hence h approximates the true Hamming distance very well, and the partition and the choice of T1, T2 obtained from (c1', c2', h) are a good approximation.

The NP-hardness of (2-ε)-approximation of the HRC problem
Main idea: given any instance G = (V, E) of the Vertex Cover problem with |V| = n and |E| = m', construct an instance <S, k> of the Hamming radius k-clustering problem that has a k-clustering with maximum cluster radius not exceeding 2 if and only if G has a vertex cover with k - m' vertices.

Thus finding an approximate solution within an approximation factor less than 2 is no easier than finding an exact solution.

We can prove: given k ≤ 2m', k - m' vertices in V can cover E if and only if there is a k-clustering of S with maximum cluster radius equal to 2. If there were a polynomial-time algorithm for the Hamming radius k-clustering problem with an approximation factor less than 2, the exact vertex cover number of any instance G could be computed in polynomial time. This is a contradiction unless P = NP.

A PTAS for the k-Consensus Pattern problem
Another contribution of our paper is a PTAS for the k-Consensus Pattern problem. It is a simple extension of some previous work; due to time limitations, we skip it.

Conclusion
- A nice combination of a combinatorial argument (the P-Q decomposition) with the random sampling strategy in solving the k-CSS problem.
- An alternative and direct proof of the NP-hardness of (2-ε)-approximation of the HRC problem.
As mentioned in the key ideas of the proof, neither technique is sufficient on its own. The role of the combination is illustrated by our example.

Contact Us Authors Yishan Jiao, Jingyi Xu : {jys,xjy}@ict.ac.cn Bioinformatics lab, Institute of Computing Technology, Chinese Academy of Sciences Ming Li: mli@uwaterloo.ca University of Waterloo

Thank You!

Deterministic PTAS for the O(1)-Consensus Pattern problem 1
The k-Consensus Pattern problem. Most related work:
- The Hamming O(1)-median clustering problem is the O(1)-Consensus Pattern problem with L = m. An RPTAS; R. Ostrovsky et al., JACM 49(2):139-156, 2002.
- The Consensus Pattern problem is the k-Consensus Pattern problem with k = 1. A PTAS; M. Li et al., STOC'99.
We give a deterministic PTAS for the O(1)-Consensus Pattern problem and prove its correctness.

DPTAS for O(1)-CP 1
Outline:
1. Suppose the optimal solution is ({c1, c2}, {t1, t2, …, tn}, {C1, C2}), where C1 and C2 are instances of the Consensus Pattern problem.
2. Trying all possibilities, get … and … satisfying Lemma 3 in M. Li et al., STOC'99.

DPTAS for O(1)-CP 2
Outline:
3. Get c1', c2', where c1' is the column-wise majority string of … and c2' is the column-wise majority string of …
4. Partition each … into C1', C2' as follows: … otherwise …
5. Get the closest substrings (tl') in T1', T2' satisfying …

DPTAS for O(1)-CP 3
Outline:
6. Get a good approximate solution …, where c1'', c2'' are the column-wise majority strings of all strings in T1', T2' respectively.
7. Conclusion: output a solution in polynomial time with total cost at most …
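The column-wise majority string used in steps 3 and 6 can be sketched in Python as below. The tie-breaking rule (first letter seen in the column) and the helper `total_cost` are assumptions made for illustration; the slides do not specify them. For a fixed set of equal-length strings, a column-wise majority string minimizes the sum of Hamming distances to them, which is the quantity the Consensus Pattern problem sums up.

```python
from collections import Counter


def column_majority(strings):
    """Most frequent letter in every column; ties go to the letter seen first."""
    L = len(strings[0])
    if any(len(s) != L for s in strings):
        raise ValueError("all strings must have the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*strings)
    )


def total_cost(center, strings):
    """Sum of Hamming distances from the center to every string."""
    return sum(
        sum(x != y for x, y in zip(center, s)) for s in strings
    )
```

For ACGT, ACGA and TCGA the majority string is ACGA, with total cost 2.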

PTAS for 2-Consensus Pattern problem

Definition of PTAS
A family of approximation algorithms {A_k}_k for a problem P is called a polynomial (time) approximation scheme, or PTAS, if algorithm A_k is a (1 + ε_k)-approximation algorithm and its running time is polynomial in the size of the input for a fixed k.

Vertex-cover problem
Vertex cover: given an undirected graph G = (V, E), a subset V' ⊆ V such that if (u, v) ∈ E, then u ∈ V' or v ∈ V' (or both).
Size of a vertex cover: the number of vertices in it.
Vertex-cover problem: find a vertex cover of minimum size.
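These definitions translate directly into a short Python sketch. The function names are illustrative, and the exhaustive search takes exponential time in the worst case; it only makes the definition concrete.

```python
from itertools import combinations


def is_vertex_cover(edges, cover):
    """True if every edge (u, v) has u or v (or both) in the cover."""
    cover = set(cover)
    return all(u in cover or v in cover for u, v in edges)


def min_vertex_cover(vertices, edges):
    """Smallest vertex cover, found by trying subsets in order of size."""
    vertices = list(vertices)
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            if is_vertex_cover(edges, subset):
                return set(subset)
    return set(vertices)  # unreachable: the full vertex set always covers
```

On the path 1-2-3-4 no single vertex touches all three edges, so a minimum vertex cover has size 2.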

Vertex-cover problem
The vertex-cover problem is NP-complete (see Section 34.5.2).
- Vertex-cover belongs to NP.
- Vertex-cover is NP-hard (CLIQUE ≤P vertex-cover): reduce an instance <G, k> of CLIQUE, where G = <V, E>, to the vertex-cover instance <G', |V| - k>, where G' = <V, E'> and E' = {(u, v): u, v ∈ V, u ≠ v and <u, v> ∉ E}.
So we look for an approximation algorithm.
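The reduction above can be sketched in Python as follows. The function names are illustrative. The code builds the complement edge set E' and the target size |V| - k; the reduction is correct because a vertex set is a clique in G exactly when its complement in V is a vertex cover of G'.

```python
from itertools import combinations


def complement_edges(vertices, edges):
    """E' = all pairs u != v of vertices that are not edges of G."""
    present = {frozenset(e) for e in edges}
    return [
        (u, v)
        for u, v in combinations(sorted(vertices), 2)
        if frozenset((u, v)) not in present
    ]


def reduce_clique_to_vertex_cover(vertices, edges, k):
    """Map the CLIQUE instance <G, k> to the vertex-cover instance <G', |V| - k>."""
    return list(vertices), complement_edges(vertices, edges), len(vertices) - k


def is_clique(edges, subset):
    """True if every pair of vertices in the subset is joined by an edge."""
    present = {frozenset(e) for e in edges}
    return all(frozenset(p) in present for p in combinations(subset, 2))


def is_vertex_cover(edges, cover):
    """True if every edge has at least one endpoint in the cover."""
    cover = set(cover)
    return all(u in cover or v in cover for u, v in edges)
```

For the graph with edges (1,2), (1,3), (2,3), (3,4), the clique {1, 2, 3} of size 3 corresponds to the vertex cover {4} of size 4 - 3 = 1 in the complement graph, whose edges are (1,4) and (2,4).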

Conclusion for the approximation solution
Outline:
Get a good approximate solution, where …
10. Conclusion: output (c1'', c2'') in polynomial time, satisfying with high probability: …
Can be derandomized by a standard method [MR95].
Extension to the k = O(1) case: trivial.

PTAS for 2-CSS

Notation
