Longest Common Rigid Subsequence Bin Ma and Kaizhong Zhang Department of Computer Science University of Western Ontario Ontario, Canada.


1 Longest Common Rigid Subsequence Bin Ma and Kaizhong Zhang Department of Computer Science University of Western Ontario Ontario, Canada.

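2 (Rigid) Subsequence
Subsequence: CPM is a subsequence of COMBINATORIALPATTERNMATCHING.
Rigid subsequence: CPM, (13,7) is a rigid subsequence of the same string. The gaps between consecutive characters are fixed: C is at position 0, P at position 13, M at position 20.
0123456789012345678901234567
COMBINATORIALPATTERNMATCHING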
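The definition on this slide can be checked mechanically. The following is a minimal sketch, not code from the talk: the function name `rigid_occurrences` and the representation of a rigid subsequence as a string plus a tuple of gaps are assumptions made for illustration.

```python
def rigid_occurrences(text, chars, gaps):
    """Return every start index where `chars` occurs in `text` with the given gaps."""
    if len(gaps) != len(chars) - 1:
        raise ValueError("need exactly one gap between consecutive characters")
    # Offset of each pattern character relative to the first one.
    offsets = [0]
    for g in gaps:
        offsets.append(offsets[-1] + g)
    span = offsets[-1]
    # A start is valid only if the last character still falls inside the text.
    return [
        start
        for start in range(len(text) - span)
        if all(text[start + off] == ch for off, ch in zip(offsets, chars))
    ]


print(rigid_occurrences("COMBINATORIALPATTERNMATCHING", "CPM", (13, 7)))
```

For the slide's example the only occurrence starts at position 0. Changing a single gap, for instance to (13, 6), leaves no occurrence, which is what distinguishes a rigid subsequence from an ordinary one.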

3 Common (Rigid) Subsequence
Longest Common Subsequence (LCS):
– combinatorial pattern matching
– longest common rigid subsequence
LCS: comnienc
Longest Common Rigid Subsequence (LCRS):
– combinatorial pattern matching
– longest common rigid subsequence
LCRS: comni, (1,1,3,5)
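For two strings, a common rigid subsequence fixes a single relative shift between them, so it suffices to try every shift and collect the positions that agree. The sketch below rests on that observation. The name `lcrs_two` is hypothetical, and spaces are treated as ordinary characters, which is what the gaps (1,1,3,5) in the example imply.

```python
def lcrs_two(s, t):
    """Longest common rigid subsequence of two strings, returned as (chars, gaps)."""
    best = []  # positions in s that match under the best shift found so far
    # Under shift d, position i of s is compared with position i + d of t.
    for d in range(-(len(s) - 1), len(t)):
        lo = max(0, -d)
        hi = min(len(s), len(t) - d)
        matches = [i for i in range(lo, hi) if s[i] == t[i + d]]
        if len(matches) > len(best):
            best = matches
    chars = "".join(s[i] for i in best)
    gaps = tuple(b - a for a, b in zip(best, best[1:]))
    return chars, gaps


print(lcrs_two("combinatorial pattern matching",
               "longest common rigid subsequence"))
```

With the two example strings, shift 8 aligns c, o, m, n, i with gaps (1,1,3,5), the pattern shown on the slide. The loop examines O(|s| + |t|) shifts with linear work each, consistent with the two-string case being polynomial-time solvable. When several shifts tie, this sketch keeps the first one it encounters.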

4 Previous Results
LCS and LCRS of two strings:
– Polynomial-time solvable.
LCS of many strings:
– Cannot be approximated within ratio [formula not preserved] in polynomial time (Jiang and Li 1995, SIAM J. Comput.).
– For random instances, a simple greedy algorithm gives an almost optimal solution with only a small error.
LCRS of many strings:
– Exponential-time algorithms are known.
– Our CPM paper addresses the time complexity of the problem.

5 Motivation in Bioinformatics
In biochemistry, a motif is a recurring pattern in DNA or protein sequences.
Example: a protein motif (the SH3 domain binding motif) reported in J. Biological Chemistry 269:24034-9.
Many motifs can be found in the PROSITE database at ExPASy.

6 Motivation
Rigoutsos and Floratos proposed the following problem (Bioinformatics 14:55-67, 1998):
– Given n strings and a positive number K, find a longest "rigid pattern" (rigid subsequence) that occurs in at least K of the n strings.
When K = n, this is exactly LCRS.
Exponential-time algorithms were studied. Whether the problem is NP-hard was unknown.
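The quorum condition "occurs in at least K of the n strings" can be written as a small predicate. This is an illustrative sketch only: the helper names `occurs`, `support`, and `meets_quorum` are hypothetical and do not come from Rigoutsos and Floratos' paper or from the talk.

```python
def occurs(text, chars, gaps):
    """True if `chars` appears in `text` with exactly the given gaps."""
    offsets = [0]
    for g in gaps:
        offsets.append(offsets[-1] + g)
    return any(
        all(text[start + off] == ch for off, ch in zip(offsets, chars))
        for start in range(len(text) - offsets[-1])
    )


def support(strings, chars, gaps):
    """Number of input strings that contain the rigid pattern."""
    return sum(occurs(s, chars, gaps) for s in strings)


def meets_quorum(strings, chars, gaps, k):
    """Quorum test: the pattern occurs in at least k of the strings."""
    return support(strings, chars, gaps) >= k


strings = ["xaxbx", "a-b", "ab", "ba"]
print(support(strings, "ab", (2,)))          # pattern a?b
print(meets_quorum(strings, "ab", (2,), 2))
```

Setting `k = len(strings)` recovers the LCRS condition, where the pattern must occur in every string.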

7 Our Results
LCRS is MAX-SNP hard.
– Therefore, Rigoutsos and Floratos' problem is also MAX-SNP hard.
For random instances, there is an algorithm that solves LCRS in quasi-polynomial average running time.
– With simple modifications, the algorithm also works for Rigoutsos and Floratos' problem.

8 MAX-SNP hard
L-reduction from Max-Cut.
[Figure: the constructed strings, with parts labeled vertex, edge, and delimiter]

9 The construction of each edge
Three possible configurations in an ungapped alignment, each drawn over the rows aaa, aba, bab:
– the first contributes 0
– the second contributes 1
– the third contributes 1
[Figure: the three alignments]

10 The Algorithm
Let S_i be the set of length-i common rigid subsequences.
We only need to prove that [formula not preserved].
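The transcript does not preserve how the algorithm builds the sets S_i. One plausible reading, offered here purely as an assumption, is a levelwise enumeration: every prefix of a common rigid subsequence is itself common, so S_{i+1} can be obtained by extending members of S_i by one gap and one character. The function names below are hypothetical.

```python
def occurs(text, chars, gaps):
    """True if `chars` appears in `text` with exactly the given gaps."""
    offsets = [0]
    for g in gaps:
        offsets.append(offsets[-1] + g)
    return any(
        all(text[start + off] == ch for off, ch in zip(offsets, chars))
        for start in range(len(text) - offsets[-1])
    )


def common_rigid_levels(strings):
    """Yield S_1, S_2, ... as sets of (chars, gaps) until a level is empty."""
    shortest = min(len(s) for s in strings)
    alphabet = set(strings[0])
    level = {(c, ()) for c in alphabet if all(c in s for s in strings)}
    while level:
        yield level
        nxt = set()
        for chars, gaps in level:
            span = sum(gaps)
            # The extended pattern must still fit inside the shortest string.
            for g in range(1, shortest - span):
                for c in alphabet:
                    cand = (chars + c, gaps + (g,))
                    if all(occurs(s, *cand) for s in strings):
                        nxt.add(cand)
        level = nxt


def longest_common_rigid(strings):
    """Return the last non-empty level, i.e. all longest common rigid subsequences."""
    levels = list(common_rigid_levels(strings))
    return levels[-1] if levels else set()


print(longest_common_rigid(["xaxbx", "a-b", "yya.b"]))
```

The running time of this enumeration is governed by the sizes of the sets S_i, which is presumably why the slide focuses on bounding them. In the worst case the sets can be exponentially large.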

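11 Sketch of Proof
– For each rigid subsequence in S_i, the probability that it occurs in one random string of length n: [formula not preserved].
– The probability that it occurs in every input string: [formula not preserved].
– The total number of length-i rigid subsequences: [formula not preserved].
– The bound can be shown by considering two cases, according to how i compares with 2 log n.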
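The formulas on this slide are not preserved, so the following is only a union-bound sketch of how the three quantities could fit together, not the bound from the paper. It assumes m input strings, each of length n, drawn independently and uniformly over an alphabet of size σ. The symbols m and σ are introduced here for illustration.

```latex
% Assumptions (not from the slides): m independent uniform random strings,
% each of length n, over an alphabet of size \sigma.
\begin{align*}
\Pr[\text{a fixed length-$i$ pattern occurs in one string}]
  &\le n\,\sigma^{-i}
  && \text{(union bound over at most $n$ start positions)} \\
\Pr[\text{it occurs in all $m$ strings}]
  &\le \left(n\,\sigma^{-i}\right)^{m}
  && \text{(the strings are independent)} \\
\#\{\text{length-$i$ rigid patterns with span} < n\}
  &= \sigma^{i}\binom{n-1}{i-1}
  && \text{(characters times gap vectors)} \\
\mathbb{E}\,|S_i|
  &\le \sigma^{i}\binom{n-1}{i-1}\left(n\,\sigma^{-i}\right)^{m}
  && \text{(linearity of expectation)}
\end{align*}
```

Under these assumptions the factor σ^{-im} dominates once i is a sufficiently large multiple of log n, while for small i the count of patterns is itself small. That matches the slide's split into two cases around 2 log n.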

12 Acknowledgement Supported by NSERC, PREA and CRC.

