Longest Common Rigid Subsequence
Bin Ma and Kaizhong Zhang
Department of Computer Science, University of Western Ontario, Ontario, Canada

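(Rigid) Subsequence
– Subsequence: CPM is a subsequence of COMBINATORIALPATTERNMATCHING.
– Rigid subsequence: CPM, (13,7) is a rigid subsequence of COMBINATORIALPATTERNMATCHING. C, P and M occur at positions 1, 14 and 21, so consecutive characters are exactly 13 and 7 positions apart.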
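
The difference between the two notions can be sketched in a few lines of Python. The function names `is_subsequence` and `rigid_occurrences` are hypothetical, not from the talk. A rigid pattern is assumed to be represented as its characters plus the distances between consecutive characters, and positions are 0-based, so the slide's positions 1, 14 and 21 become 0, 13 and 20.

```python
from typing import List, Sequence


def is_subsequence(pattern: str, text: str) -> bool:
    """Ordinary subsequence: characters appear in order, with arbitrary gaps."""
    it = iter(text)
    # `ch in it` advances the iterator past the first match, preserving order.
    return all(ch in it for ch in pattern)


def rigid_occurrences(chars: str, gaps: Sequence[int], text: str) -> List[int]:
    """0-based start positions where `chars` occurs with the fixed distances `gaps`.

    gaps[k] is the distance between chars[k] and chars[k + 1].
    """
    if not chars or len(gaps) != len(chars) - 1 or any(g < 1 for g in gaps):
        raise ValueError("need a non-empty pattern and one positive gap per pair")
    # Offset of every character relative to the first one.
    offsets = [0]
    for g in gaps:
        offsets.append(offsets[-1] + g)
    span = offsets[-1]
    return [
        start
        for start in range(len(text) - span)
        if all(text[start + off] == ch for off, ch in zip(offsets, chars))
    ]


if __name__ == "__main__":
    text = "COMBINATORIALPATTERNMATCHING"
    print(is_subsequence("CPM", text))              # True
    print(rigid_occurrences("CPM", (13, 7), text))  # [0]
```

The ordinary check accepts any gaps. The rigid check fixes them, which is why CPM, (13,7) matches only at start 0.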

Common (Rigid) Subsequence
For the strings "combinatorial pattern matching" and "longest common rigid subsequence":
– Longest Common Subsequence (LCS): comnienc
– Longest Common Rigid Subsequence (LCRS): comni, (1,1,3,5), where the distances count spaces as characters. A common rigid subsequence must occur with the same distances in every string.
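
The LCS half of the example can be reproduced with the textbook dynamic program. The following is a Python sketch, and the name `lcs` is hypothetical. Because of ties in the traceback, the string it returns may differ from comnienc. Its length is still at least 8, since comnienc is a common subsequence of length 8.

```python
from typing import List


def lcs(x: str, y: str) -> str:
    """Return one longest common subsequence of x and y in O(|x|*|y|) time."""
    m, n = len(x), len(y)
    # dp[i][j] = length of an LCS of the prefixes x[:i] and y[:j].
    dp: List[List[int]] = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Walk back from dp[m][n] to recover one optimal subsequence.
    out: List[str] = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    a = "combinatorial pattern matching"
    b = "longest common rigid subsequence"
    best = lcs(a, b)
    print(best, len(best))
```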

Previous Results
LCS and LCRS of two strings:
– solvable in polynomial time.
LCS of many strings:
– cannot be approximated within ratio … in polynomial time (Jiang and Li 1995, SIAM J. Comput.);
– for random instances, a simple greedy algorithm gives an almost optimal solution with only a small error.
LCRS of many strings:
– exponential-time algorithms;
– our CPM paper addresses its computational complexity.
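
One way to see the gap between two strings and many strings is the following observation. It is our own reasoning, not necessarily the authors' algorithm. If a rigid pattern starts at position p in the first string and at q in string j, every one of its characters shifts by the same amount q − p. A common rigid subsequence is therefore a set of matching positions in an ungapped alignment. Trying every shift costs O(|x|·(|x|+|y|)) for two strings, but the number of shift vectors grows exponentially with the number of strings. The function name `lcrs_by_shifts` is hypothetical.

```python
from itertools import product
from typing import List, Tuple


def lcrs_by_shifts(strings: List[str]) -> Tuple[str, Tuple[int, ...]]:
    """Longest common rigid subsequence, returned as (characters, gaps).

    Exhaustive over shift vectors: polynomial for two strings, but the number
    of shift vectors grows exponentially with the number of strings.
    """
    first, rest = strings[0], strings[1:]
    # Shift d_j aligns position p of `first` with position p + d_j of rest[j].
    shift_ranges = [range(-(len(first) - 1), len(s)) for s in rest]
    best: List[int] = []
    for shifts in product(*shift_ranges):
        # Positions of `first` that match in every string under these shifts.
        positions = [
            p
            for p in range(len(first))
            if all(
                0 <= p + d < len(s) and s[p + d] == first[p]
                for d, s in zip(shifts, rest)
            )
        ]
        if len(positions) > len(best):
            best = positions
    chars = "".join(first[p] for p in best)
    gaps = tuple(b - a for a, b in zip(best, best[1:]))
    return chars, gaps


if __name__ == "__main__":
    print(lcrs_by_shifts(["abxd", "zabyd"]))          # ('abd', (1, 2))
    print(lcrs_by_shifts(["abxd", "zabyd", "abqd"]))  # ('abd', (1, 2))
```

On the slide's example, the shift that aligns "combinatorial" with "common" produces comni, (1,1,3,5).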

Motivation in Bioinformatics
In biochemistry, a motif is a recurring pattern in DNA or protein sequences.
– Example: a protein motif, the SH3 domain binding motif (J. Biological Chemistry 269: …).
– Many motifs can be found in the PROSITE database of ExPASy.

Motivation
Rigoutsos and Floratos proposed the following problem (Bioinformatics 14:55-67, 1998):
– Given n strings and a positive integer K, find a longest “rigid pattern” (rigid subsequence) that occurs in at least K of the n strings.
– When K = n, this is LCRS.
– Exponential-time algorithms were studied; NP-hardness was unknown.
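
A rigid pattern translates directly into a regular expression with fixed-length wildcards, much like the x(n) notation of PROSITE patterns. The Python sketch below uses the hypothetical helpers `rigid_pattern_regex`, `support` and `is_frequent`. It checks the Rigoutsos and Floratos condition for one candidate pattern, namely that the pattern occurs in at least K of the input strings. It only verifies a given pattern. Finding the longest such pattern is the hard part.

```python
import re
from typing import List, Sequence


def rigid_pattern_regex(chars: str, gaps: Sequence[int]) -> "re.Pattern[str]":
    """Compile a rigid pattern, e.g. CPM with gaps (13, 7) -> C.{12}P.{6}M."""
    if not chars or len(gaps) != len(chars) - 1 or any(g < 1 for g in gaps):
        raise ValueError("need a non-empty pattern and one positive gap per pair")
    parts = [re.escape(chars[0])]
    for g, ch in zip(gaps, chars[1:]):
        # A distance of g leaves exactly g - 1 arbitrary characters in between.
        parts.append(".{%d}" % (g - 1))
        parts.append(re.escape(ch))
    # DOTALL lets '.' match any character, including newlines.
    return re.compile("".join(parts), re.DOTALL)


def support(chars: str, gaps: Sequence[int], strings: List[str]) -> int:
    """Number of input strings that contain the rigid pattern at least once."""
    rx = rigid_pattern_regex(chars, gaps)
    return sum(1 for s in strings if rx.search(s) is not None)


def is_frequent(chars: str, gaps: Sequence[int], strings: List[str], k: int) -> bool:
    """Rigoutsos-Floratos condition: the pattern occurs in at least k strings."""
    return support(chars, gaps, strings) >= k


if __name__ == "__main__":
    inputs = ["COMBINATORIALPATTERNMATCHING", "C" + "x" * 12 + "P" + "x" * 6 + "M", "CPM"]
    print(rigid_pattern_regex("CPM", (13, 7)).pattern)  # C.{12}P.{6}M
    print(support("CPM", (13, 7), inputs))              # 2
```

With K = n, `is_frequent` reduces to asking whether the pattern is a common rigid subsequence.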

Our Results
– LCRS is MAX-SNP hard; therefore, Rigoutsos and Floratos’ problem is also MAX-SNP hard.
– For random instances, there is an algorithm that solves LCRS in quasi-polynomial average running time. With simple modifications, the algorithm also works for Rigoutsos and Floratos’ problem.

MAX-SNP Hardness
L-reduction from Max-Cut.
[Figure: the construction, built from vertex, edge and delimiter segments.]

The Construction of Each Edge
[Figure: three possible configurations of the blocks aaa, aba and bab in an ungapped alignment; one configuration contributes 0 and the other two each contribute 1.]

The Algorithm
Let S_i be the set of length-i common rigid subsequences. We only need to prove that …
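
The slide does not say how the S_i are computed. One natural reading, assumed here, is a level-wise search. Every prefix of a common rigid subsequence is itself a common rigid subsequence, so S_{i+1} can be built by extending each member of S_i by one gap and one character. The running time is then governed by the sizes |S_i|, which is why the proof concentrates on them. This is a minimal sketch; `levelwise_common_rigid` and the `Pattern` alias are hypothetical names.

```python
from typing import Dict, List, Set, Tuple

Pattern = Tuple[str, Tuple[int, ...]]  # (characters, gaps between consecutive characters)


def levelwise_common_rigid(strings: List[str]) -> Tuple[List[Pattern], List[int]]:
    """Build S_1, S_2, ... level by level.

    Returns the patterns of the last non-empty level (the longest common rigid
    subsequences) and the sizes |S_1|, |S_2|, ...
    """
    # S_1: characters present in every string, with all start positions per string.
    common = set(strings[0]).intersection(*strings[1:])
    level: Dict[Pattern, List[Set[int]]] = {
        (c, ()): [{p for p, ch in enumerate(s) if ch == c} for s in strings]
        for c in common
    }
    max_len = min(len(s) for s in strings)
    sizes: List[int] = []
    longest: List[Pattern] = []
    while level:
        sizes.append(len(level))
        longest = sorted(level)
        nxt: Dict[Pattern, List[Set[int]]] = {}
        for (chars, gaps), starts in level.items():
            span = sum(gaps)
            # The extended pattern must still fit inside the shortest string.
            for g in range(1, max_len - span):
                offset = span + g
                # Per string, group surviving start positions by the character
                # found `offset` positions after the start.
                grouped: List[Dict[str, Set[int]]] = []
                for s, st in zip(strings, starts):
                    by_char: Dict[str, Set[int]] = {}
                    for p in st:
                        if p + offset < len(s):
                            by_char.setdefault(s[p + offset], set()).add(p)
                    grouped.append(by_char)
                # Characters that extend the pattern in every string.
                for c in set(grouped[0]).intersection(*grouped[1:]):
                    nxt[(chars + c, gaps + (g,))] = [d[c] for d in grouped]
        level = nxt
    return longest, sizes


if __name__ == "__main__":
    longest, sizes = levelwise_common_rigid(["abxd", "zabyd"])
    print(longest)  # [('abd', (1, 2))]
    print(sizes)    # [3, 3, 1]
```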

Sketch of Proof
– For each rigid subsequence in S_i, the probability that it occurs in one random string of length n is …
– The probability that it occurs in every input string is …
– There are … length-i rigid subsequences in total.
– The bound is proved by distinguishing two cases, comparing i with 2 log n.
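
The count in the third step did not survive extraction. Under one natural reading, assumed here, it counts all length-i rigid patterns over an alphabet Σ whose span fits in a string of length n. That number is |Σ|^i · C(n − 1, i − 1): |Σ|^i choices of characters times C(n − 1, i − 1) gap vectors (g_1, …, g_{i−1}) with every g ≥ 1 and total span at most n − 1. The sketch below checks this identity by brute force. The names `rigid_pattern_count` and `rigid_pattern_count_brute` are hypothetical, and the formula illustrates the counting step rather than reproducing the slide's expression.

```python
from itertools import product
from math import comb


def rigid_pattern_count(sigma: int, n: int, i: int) -> int:
    """|Sigma|^i * C(n - 1, i - 1) rigid patterns of length i fitting in length n."""
    return sigma ** i * comb(n - 1, i - 1)


def rigid_pattern_count_brute(sigma: int, n: int, i: int) -> int:
    """Count gap vectors (g_1, ..., g_{i-1}) with every g >= 1 and span <= n - 1."""
    gap_vectors = sum(
        1 for gaps in product(range(1, n), repeat=i - 1) if sum(gaps) <= n - 1
    )
    # Each admissible gap vector can carry any of sigma**i character choices.
    return sigma ** i * gap_vectors


if __name__ == "__main__":
    for sigma in (1, 2, 4):
        for n in range(1, 8):
            for i in range(1, n + 1):
                assert rigid_pattern_count(sigma, n, i) == rigid_pattern_count_brute(sigma, n, i)
    print(rigid_pattern_count(4, 10, 3))  # 2304
```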

Acknowledgement Supported by NSERC, PREA and CRC.