Published by Deonte Summerhays. Modified over 9 years ago.
1
Longest Common Rigid Subsequence
Bin Ma and Kaizhong Zhang
Department of Computer Science, University of Western Ontario
Ontario, Canada
2
(Rigid) Subsequence
Subsequence:
COMBINATORIALPATTERNMATCHING
CPM
Rigid Subsequence:
0123456789012345678901234567
COMBINATORIALPATTERNMATCHING
CPM, (13,7)
3
Common (Rigid) Subsequence
Longest Common Subsequence (LCS)
– combinatorial pattern matching
– longest common rigid subsequence
comnienc
Longest Common Rigid Subsequence (LCRS)
– combinatorial pattern matching
– longest common rigid subsequence
comni, (1,1,3,5)
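For two strings, one simple polynomial-time approach is to slide one string against the other and, for each shift, collect the columns where the letters agree. Those columns form a common rigid subsequence, and the best shift gives a longest one. This is a sketch under that assumption, not necessarily the method the authors have in mind. The names `aligned_pattern` and `lcrs_two` are hypothetical, and spaces are treated as ordinary characters.

```python
def aligned_pattern(s, t, shift):
    """Common rigid subsequence obtained by aligning s[i] with t[i + shift].

    Returns (letters, gaps), with gaps[k] the distance between matched
    columns k and k+1.
    """
    cols = [
        i
        for i in range(len(s))
        if 0 <= i + shift < len(t) and s[i] == t[i + shift]
    ]
    letters = "".join(s[i] for i in cols)
    gaps = tuple(b - a for a, b in zip(cols, cols[1:]))
    return letters, gaps


def lcrs_two(s, t):
    """Try every ungapped alignment of s against t and keep the one with
    the most matching columns. Runs in O(len(s) * len(t)) time."""
    best = ("", ())
    for shift in range(-(len(s) - 1), len(t)):
        candidate = aligned_pattern(s, t, shift)
        if len(candidate[0]) > len(best[0]):
            best = candidate
    return best
```

With the slide's two phrases, `aligned_pattern("combinatorial pattern matching", "longest common rigid subsequence", 8)` yields `("comni", (1, 1, 3, 5))`, which matches the pattern shown on the slide.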
4
Previous Results
LCS and LCRS of two strings:
– Solvable in polynomial time.
LCS of many strings:
– Cannot be approximated within ratio [formula missing] in polynomial time (Jiang and Li 1995, SIAM J COMP).
– For random instances, a simple greedy algorithm gives an almost optimal solution with only a small error.
LCRS of many strings:
– Exponential time algorithms.
– Our CPM paper tries to answer the time complexity question.
5
Motivation in Bioinformatics
In biochemistry, a motif is a recurring pattern in DNA/protein sequences.
A protein motif (SH3 domain binding motif) in J. Biological Chemistry 269:24034-9.
Many motifs can be found in the PROSITE database of ExPASy.
6
Motivation
Rigoutsos and Floratos proposed the following problem (Bioinformatics 14:55-67, 1998).
– Given n strings and a positive number K, find a longest “rigid pattern” (rigid subsequence) that occurs in at least K of the n strings.
When K=n, it is LCRS.
Exponential time algorithms were studied.
NP-hardness unknown.
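The "occurs in at least K of the n strings" condition can be sketched as a support count. This is only an illustration of the problem statement, not an algorithm from the cited paper. The helper names `occurs` and `support` are hypothetical, and a pattern is again assumed to be a string of letters plus a tuple of gaps between consecutive letters.

```python
def occurs(text, letters, gaps):
    """True if the rigid pattern (letters, gaps) occurs somewhere in `text`."""
    # Offsets of each letter from the first one, e.g. gaps (2, 3) -> [0, 2, 5].
    offsets = [0]
    for g in gaps:
        offsets.append(offsets[-1] + g)
    span = offsets[-1]
    return any(
        all(text[start + off] == ch for off, ch in zip(offsets, letters))
        for start in range(len(text) - span)
    )


def support(strings, letters, gaps):
    """Number of input strings that contain the rigid pattern."""
    return sum(1 for s in strings if occurs(s, letters, gaps))


def is_frequent(strings, letters, gaps, k):
    """The Rigoutsos-Floratos condition: the pattern occurs in at least k strings.
    With k == len(strings) this is the LCRS condition."""
    return support(strings, letters, gaps) >= k
```

For example, with the strings `["xxAzBxx", "AqB", "AB", "zzzAyB"]` the pattern `"AB", (2,)` has support 3: it is missing only from `"AB"`, where the two letters are adjacent.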
7
Our Results
LCRS is MAX-SNP hard.
– Therefore, Rigoutsos and Floratos’ problem is also MAX-SNP hard.
For random instances, there is an algorithm that solves LCRS with quasi-polynomial average running time.
– The algorithm also works for Rigoutsos and Floratos’ problem with simple modifications.
8
MAX-SNP hard
L-reduction from Max-Cut
[Figure: constructed string, with parts labelled vertex, edge, and delimiter]
9
The construction of each edge
Three possible configurations in an ungapped alignment:
aaa aba bab contributes 0
aaa aba bab contributes 1
aaa aba bab contributes 1
10
The Algorithm
Let S_i be the set of length-i common rigid subsequences.
We only need to prove that [formula missing]
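The slide does not preserve the algorithm itself, so the following is only one plausible reading: build the sets level by level, extending each pattern of the current level by one more letter and keeping the extensions that still occur in every input string. This works because every prefix of a common rigid subsequence is itself a common rigid subsequence. The function names and the offset-based pattern representation are assumptions made for this sketch, and its cost is proportional to the total size of the levels, which can be exponential in the worst case.

```python
def occurs(text, letters, offsets):
    """True if letters[k] appears at start + offsets[k] for some start."""
    span = offsets[-1]
    return any(
        all(text[start + off] == ch for off, ch in zip(offsets, letters))
        for start in range(len(text) - span)
    )


def lcrs_levelwise(strings):
    """Enumerate the levels S_1, S_2, ... and return one pattern from the
    last non-empty level as (letters, gaps), or None if no letter is shared."""
    shortest = min(len(s) for s in strings)
    alphabet = set.intersection(*(set(s) for s in strings))
    # S_1: single letters common to all strings, stored as (letters, offsets).
    level = [((ch,), (0,)) for ch in sorted(alphabet)]
    best = None
    while level:
        best = level[0]
        next_level = []
        for letters, offsets in level:
            # Extend by one letter placed strictly after the last one.
            for off in range(offsets[-1] + 1, shortest):
                for ch in alphabet:
                    cand = (letters + (ch,), offsets + (off,))
                    if all(occurs(s, *cand) for s in strings):
                        next_level.append(cand)
        level = next_level
    if best is None:
        return None
    letters, offsets = best
    gaps = tuple(b - a for a, b in zip(offsets, offsets[1:]))
    return "".join(letters), gaps
```

On the three strings `"aXbYc"`, `"zzaQbRc"`, and `"a-b-c!"`, the levels have sizes 3, 3, and 1, and the function returns `("abc", (2, 2))`.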
11
Sketch of Proof
For each rigid subsequence in S_i, the probability that it occurs in one random string of length n: [formula missing]
The probability that it occurs in every input string: [formula missing]
There are in total [formula missing] length-i rigid subsequences.
This can be done by considering two cases, according to how i compares with 2 log n.
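The formulas on this slide did not survive, so the following is a rough stand-in for the kind of counting the sketch describes, not the authors' bound. It assumes m independent strings of length n, each drawn uniformly over an alphabet of size sigma. A fixed pattern with i letters matches at a fixed start with probability sigma^(-i), so a union bound over at most n starts gives at most n * sigma^(-i) for one string. There are at most sigma^i * C(n-1, i-1) patterns with i letters that fit in length n. The function name and all parameters are hypothetical.

```python
from math import comb


def expected_level_size_bound(n, m, sigma, i):
    """Upper bound on the expected number of length-i common rigid
    subsequences of m uniform random strings of length n (alphabet size sigma)."""
    # Choices of letters times choices of the i-1 later offsets in 1..n-1.
    num_patterns = sigma**i * comb(n - 1, i - 1)
    # Union bound over start positions, capped at 1 because it is a probability.
    p_one_string = min(1.0, n * sigma ** (-i))
    # Independence across the m input strings.
    p_all_strings = p_one_string**m
    return num_patterns * p_all_strings
```

Under these assumptions, the cap at 1 is what applies when i is small relative to log n, and the factor sigma^(-i*m) takes over once i is large, which is in the spirit of the two-case split mentioned above.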
12
Acknowledgement Supported by NSERC, PREA and CRC.