1
Finding Compact Structural Motifs. Presented by: Xin Gao. Authors: Jianbo Qian, Shuai Cheng Li, Dongbo Bu, Ming Li, and Jinbo Xu. University of Waterloo, Ontario, Canada. j3qian@cs.uwaterloo.ca
2
Outline: Introduction to Structural Motifs; Related Work; Compact Motif-Finding Problem Formulation; NP-Hardness of the Compact Motif-Finding Problem; A Polynomial-Time Approximation Scheme
4
Introduction. A protein is a sequence of amino acids that folds into a specific 3-D shape. Structure is important to proteins: the functional properties of a protein depend on its 3-D structure, and structure is more conserved than sequence during protein evolution.
5
Structural Motif. A structural motif is a frequently occurring substructure of proteins. Motifs are thought to be tightly related to protein function. Identifying the motifs in a set of proteins can help us understand their evolutionary history and functions.
6
Structural Motif Finding Problem. Given a set of protein structures, find the frequently occurring substructure. Informally: select one substructure from each protein so that the selected substructures exhibit the highest degree of similarity.
7
How to measure the similarity of two substructures? Two popular measurements: cRMSD (coordinate RMSD): the root mean square Euclidean distance between corresponding residues of the two structures after optimal superposition. dRMSD (distance RMSD): calculate the internal distance matrix of each structure and compare the two distance matrices.
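The two measures can be sketched as follows. This is a minimal illustration, not code from the talk: the function names are hypothetical, the superposition is computed with the standard Kabsch algorithm (an assumption about how the optimal rigid transformation is found), and both inputs are assumed to be arrays of corresponding 3D points.

```python
import numpy as np

def superposition_rmsd(a, b):
    """Coordinate-based RMSD between corresponding points after optimal
    rigid superposition (Kabsch algorithm)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a0 = a - a.mean(axis=0)                  # centring removes the translation
    b0 = b - b.mean(axis=0)
    u, _, vt = np.linalg.svd(b0.T @ a0)      # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(u) * np.linalg.det(vt))  # forbid reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T          # proper rotation taking b0 onto a0
    diff = a0 - b0 @ rot.T
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def distance_matrix_rmsd(a, b):
    """Distance-based RMSD: compare the internal distance matrices."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
    db = np.linalg.norm(b[:, None, :] - b[None, :, :], axis=-1)
    iu = np.triu_indices(len(a), k=1)        # each unordered pair once
    return float(np.sqrt(((da[iu] - db[iu]) ** 2).mean()))

# A small chiral point set, a rigidly moved copy, and its mirror image.
points = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]], dtype=float)
c, s = np.cos(0.7), np.sin(0.7)
rz = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
moved = points @ rz.T + np.array([4.0, -2.0, 1.0])
mirror = points * np.array([1.0, 1.0, -1.0])
```

Both measures are zero for the rigidly moved copy. The mirror image has identical internal distances, so only the superposition-based measure tells it apart from the original.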
9
Related Work. L. P. Chew proposed an iterative algorithm to compute the conserved shape and proved its convergence (2002). D. Bandyopadhyay applied graph-based data-mining tools to find family-specific fingerprints (2006). M. Shatsky presented an algorithm to uncover binding patterns (2006). DALI and CE attempt to identify the structural alignment with minimal dRMSD. STRUCTAL and TM-align employ heuristics to detect the alignment with minimal cRMSD.
10
Related Work (continued). However, these methods are all heuristic; their solutions are not guaranteed to be optimal or near-optimal. The first PTAS for pairwise structural alignment: R. Kolodny exploited the Lipschitz property of the scoring function (2004). Although this algorithm can be extended to multiple structure alignment, the simple extension has a time complexity exponential in the number of proteins. Is there a PTAS for multiple-structure motif finding?
12
We focus on (R, C)-Compact Motifs. What is an (R, C)-compact motif? The motif is bounded by a minimum enclosing ball of radius R, and at most C residues inside this ball do not belong to the motif. An (R, C)-compact motif is biologically meaningful because we focus on globular proteins and we allow at most C exceptions.
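The definition can be checked with a short sketch. Everything here is an assumption made for illustration: the function names are hypothetical, residues are represented as 3D points, and the minimum enclosing ball is only approximated with the Badoiu-Clarkson iteration (a swapped-in technique, not something the slides specify), so the radius test is slightly conservative.

```python
import numpy as np

def approx_enclosing_ball(points, iterations=1000):
    """Badoiu-Clarkson iteration: approximate centre and radius of the
    minimum enclosing ball of a point set."""
    pts = np.asarray(points, dtype=float)
    centre = pts[0].copy()
    for k in range(1, iterations + 1):
        # Step towards the current farthest point, with a shrinking step size.
        farthest = pts[np.argmax(np.linalg.norm(pts - centre, axis=1))]
        centre += (farthest - centre) / (k + 1)
    radius = float(np.linalg.norm(pts - centre, axis=1).max())
    return centre, radius

def is_compact(protein, motif_idx, R, C):
    """True if the motif's (approximate) enclosing ball has radius <= R and
    contains at most C residues that are not part of the motif."""
    protein = np.asarray(protein, dtype=float)
    motif_idx = sorted(set(motif_idx))
    centre, radius = approx_enclosing_ball(protein[motif_idx])
    if radius > R:
        return False
    inside = np.linalg.norm(protein - centre, axis=1) <= radius + 1e-9
    exceptions = int(inside.sum()) - len(motif_idx)   # non-motif residues in the ball
    return exceptions <= C

# Toy protein: residues 0..2 form the motif, residue 3 intrudes, residue 4 is far away.
toy = [[0, 0, 0], [1, 0, 0], [2, 0, 0], [1, 0.5, 0], [10, 0, 0]]
```

With this toy input the motif `[0, 1, 2]` has an enclosing ball of radius about 1 that contains exactly one foreign residue, so it is compact for C = 1 but not for C = 0.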
13
(R, C)-Compact Motif Finding Problem
Input: protein structures $S_1, \ldots, S_n$ and a length $l$.
Output: a consensus $q = (q_1, \ldots, q_l)$ consisting of $l$ 3D points, and a substructure $u_i$ from each protein $S_i$.
Objective: minimize $\left( \sum_{1 \le i \le n} d^2(q, u_i) \right)^{1/2}$.
Here the distance is the coordinate-based RMSD distance under optimal rigid superposition, i.e. $d(q, u_i) = \min_{\tau} \lVert q - \tau(u_i) \rVert_2$, where $\tau$ consists of a rotation and a translation and $\lVert \cdot \rVert_2$ is the Euclidean norm.
15
(R,C)-compact motif finding is still NP-hard.
Reduction from the Sequence Consensus Problem:
Input: $n$ binary strings $S_1, \ldots, S_n$, each of length $m$.
Output: a substring $t_i$ of length $l$ from each string $S_i$, $1 \le i \le n$.
Objective: minimize $\sum_{1 \le i < i' \le n} d_H(t_i, t_{i'})$, where $d_H$ is the Hamming distance.
Basic idea: construct the reduction so that the structural distance (RMSD) between encoded substructures equals the Hamming distance between the original substrings.
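To make the source problem of the reduction concrete, here is a brute-force sketch of the Sequence Consensus objective. The function names are hypothetical, and the exhaustive search is exponential in the number of strings, so it is only meant for tiny instances.

```python
from itertools import combinations, product

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def sum_of_pairs_cost(substrings):
    """Objective of the Sequence Consensus Problem: sum of pairwise Hamming distances."""
    return sum(hamming(a, b) for a, b in combinations(substrings, 2))

def brute_force_consensus(strings, l):
    """Try every choice of one length-l substring per string (exponential time)."""
    windows = [[s[i:i + l] for i in range(len(s) - l + 1)] for s in strings]
    best = min(product(*windows), key=sum_of_pairs_cost)
    return list(best), sum_of_pairs_cost(best)

chosen, cost = brute_force_consensus(["0110", "1101", "0011"], 2)
```

In this toy instance every string contains a common substring of length 2, so the optimal cost is 0.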
17
(R,C)-compact motif finding is still NP-hard.
Each l-mer is transformed into 6l 3D points. Example: 110 → 110 001 000000 111111 (the l-mer, its complement, 2l zeros, 2l ones).
A bit at position i is mapped to a point: 0 → (0, 2i, 0), 1 → (1, 2i, 0).
The centroid will be (1/2, 2i, 0), so the translation is easy.
Large “tail” → no rotation → RMSD = Hamming distance.
A small distortion is applied to each point to make the structure protein-like.
Sequence Consensus Problem → (1,0)-Compact Motif Finding Problem
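The encoding can be sketched in a few lines. This follows one reading of the slide's example (l-mer, then its complement, then 2l zeros, then 2l ones, with positions i counted from 1); the function names are hypothetical and the final "protein-like" distortion is omitted.

```python
import numpy as np

def encode(lmer):
    """Map a binary l-mer to 6l points: bit 0 at position i -> (0, 2i, 0),
    bit 1 at position i -> (1, 2i, 0)."""
    l = len(lmer)
    complement = "".join("1" if b == "0" else "0" for b in lmer)
    bits = lmer + complement + "0" * (2 * l) + "1" * (2 * l)
    return np.array([(int(b), 2 * i, 0) for i, b in enumerate(bits, start=1)],
                    dtype=float)

def squared_distance(p, q):
    """Sum of squared distances between corresponding points, with no superposition."""
    return float(((p - q) ** 2).sum())

p = encode("110")
q = encode("100")
```

Under this reading every encoded l-mer contains exactly 3l zeros and 3l ones, so all encodings share the same centroid. Each differing bit contributes once in the l-mer and once in the complement, so the unsuperposed squared distance is twice the Hamming distance.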
19
The Basic Idea of Our PTAS. There are always a few “important” substructures whose consensus holds most of the “secrets” of the true optimal motif. Therefore, if we find these few substructures by exhaustive search, the easily computed optimal solution for them is a good approximation to the true optimal solution.
20
Technique 1: Sampling. We sample only r proteins and consider every candidate motif in each sampled protein. The consensus of the right sample is already very close to the optimal solution.
21
Sampling introduces only a small error. There is at least one selection scheme whose consensus has a cost less than (1+1/r)·OPT, so we can find this scheme by simple enumeration.
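The enumeration idea can be sketched under strong simplifying assumptions that are not in the slides: the structures are taken to be already superimposed (no rotation or translation is searched), candidate substructures are contiguous windows of length l, and the compactness constraint is ignored. All function names are hypothetical.

```python
from itertools import product
import numpy as np

def windows(structure, l):
    """All contiguous length-l substructures of one structure."""
    s = np.asarray(structure, dtype=float)
    return [s[i:i + l] for i in range(len(s) - l + 1)]

def cost(q, chosen):
    """Objective from the problem statement, without superposition."""
    return float(np.sqrt(sum(((q - u) ** 2).sum() for u in chosen)))

def sampled_consensus(structures, l, r):
    """Enumerate every sample of r candidate substructures, use their mean as
    the consensus, and keep the consensus with the lowest total cost."""
    candidates = [windows(s, l) for s in structures]
    pool = [u for per_protein in candidates for u in per_protein]
    best = None
    for sample in product(pool, repeat=r):
        q = np.mean(sample, axis=0)                      # consensus of the sample
        # For a fixed consensus, each protein independently picks its closest window.
        chosen = [min(c, key=lambda u: ((q - u) ** 2).sum()) for c in candidates]
        value = cost(q, chosen)
        if best is None or value < best[0]:
            best = (value, q, chosen)
    return best

motif = [[0, 0, 0], [1, 0, 0]]
structures = [
    [[5, 5, 5], [0, 0, 0], [1, 0, 0]],
    [[0, 0, 0], [1, 0, 0], [9, 9, 9]],
    [[7, 0, 0], [0, 0, 0], [1, 0, 0], [3, 3, 3]],
]
value, consensus, chosen = sampled_consensus(structures, l=2, r=2)
```

The loop visits a number of samples that grows as the pool size to the power r, which is polynomial for constant r.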
22
Technique 2: Discretize the Rotation Space. Each rotation is parameterized by three angles $\theta_1, \theta_2, \theta_3 \in [0, 2\pi)$. Discretizing the angles with step size $\epsilon'$ gives an $\epsilon'$-rotation net.
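A rotation net of this kind can be sketched as a grid over three angles. The slides do not say which angle convention is used, so the z-y-x Euler composition below is an assumption, as are the function names. The grid spacing is rounded down so that it divides the full circle evenly.

```python
import math
from itertools import product
import numpy as np

def rotation_matrix(a, b, c):
    """Rotation Rz(a) @ Ry(b) @ Rx(c) built from three angles (assumed convention)."""
    rz = np.array([[math.cos(a), -math.sin(a), 0],
                   [math.sin(a),  math.cos(a), 0],
                   [0, 0, 1]])
    ry = np.array([[ math.cos(b), 0, math.sin(b)],
                   [0, 1, 0],
                   [-math.sin(b), 0, math.cos(b)]])
    rx = np.array([[1, 0, 0],
                   [0, math.cos(c), -math.sin(c)],
                   [0, math.sin(c),  math.cos(c)]])
    return rz @ ry @ rx

def rotation_net(step):
    """All rotations whose three angles lie on a grid over [0, 2*pi)
    with spacing at most `step`."""
    k = math.ceil(2 * math.pi / step)            # grid points per angle
    angles = [2 * math.pi * j / k for j in range(k)]
    return [rotation_matrix(a, b, c) for a, b, c in product(angles, repeat=3)]

net = rotation_net(math.pi / 2)
```

The size of the net depends only on the step size, not on the input proteins, which is what keeps the rotation search within a constant factor.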
23
Discretizing the rotation space does not introduce a large error, either. Reference: J. Xu, F. Jiao, and B. Berger. A parameterized algorithm for protein structure alignment. RECOMB 2006.
24
PTAS
26
Performance Ratio Analysis
27
Running Time. Each protein contains M motifs, where M is polynomial in the protein length. Each motif can adopt W rotations, where W depends only on a constant (the step size of the rotation net). So the number of candidate consensuses is $O(n^r (MW)^r) = O((nMW)^r)$.
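The count can be reproduced with a one-line calculation. The function name and the sample values are hypothetical and chosen only for illustration.

```python
def candidate_count(n, M, W, r):
    """Number of enumerated consensuses: each of the r sampled slots picks one
    of n proteins, one of its M motifs, and one of W rotations."""
    per_slot = n * M * W
    return per_slot ** r

example = candidate_count(n=3, M=4, W=8, r=2)
```

For a fixed r the count is polynomial in n and M, while the exponent r grows as the target approximation ratio tightens.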
28
Conclusion and Future Work. We prove that the (R,C)-compact motif finding problem is NP-hard, and we obtain a PTAS for this problem. Future work: further reduce the time complexity; design practical algorithms; solve a more general case.
29
Thank You. Questions…