1
1 Alignment of Flexible Protein Structures Based on: FlexProt: Alignment of Flexible Protein Structures Without a Pre-definition of Hinge Regions / M. Shatsky, R. Nussinov, H. Wolfson Presented by: Einat Engel
2
2 Introduction Proteins are flexible structures
3
3 Outline Introduction Proteins (reminder) Protein motion Structural alignment – rigid & flexible General Description: Problem’s description Discussion
4
4 Outline Detailed Description: FPSA problem description FlexProt algorithm for the FPSA problem Experimental results Heuristic Algorithm for FPSA Clustering Conclusions & Discussion: Summary of algorithm Major results Discussion
5
5 Reminder: Protein Structure Proteins are made up of 20 different amino acids (or "residues"). Different levels of protein structures: Primary – amino acid sequence Secondary – local folding of amino acid chains Tertiary – 3D structure of a protein Quaternary – forming multi-chained proteins
6
6 Reminder: Protein Structure [Figure: primary structure and tertiary structure; example protein: lysozyme]
7
7 Flexibility & Protein Motion Proteins are flexible molecules that undergo significant structural changes as part of their normal function. Motion often serves as an essential link between structure and function.
8
8 Flexibility & Protein Motion Protein motions are involved in numerous basic functions. In fact, highly mobile proteins have been implicated in a number of diseases, e.g., the motion of gp41 in AIDS
9
9 Structural Alignment When flexible molecules are compared to each other as rigid bodies, even strong similarities can be missed. Yet most existing protein alignment algorithms treat proteins as rigid objects. We’ll see a technique for the alignment of flexible proteins.
10
10 The Goal
11
11 Existing Approaches – Rigid Structural Alignment Exhaustive 3D search: search all possible rotations (Matthews & Rossmann). Fragment alignment: comparison of contiguous fragments. Geometric hashing: local reference frames, preprocessing and recognition (Fischer). Curve matching: match curves using the Fourier transform (Schwartz & Sharir).
12
12 Existing Approaches – Flexible Structural Alignment Domain detection: requires a priori knowledge of the corresponding pairs of amino-acid residues (Wriggers & Schulten). Geometric hashing: requires a priori knowledge of the hinge location (Verbitsky). Database screening: requires a priori knowledge of hinges (Rigoutsos).
13
13 Outline Introduction Proteins (reminder) Protein motion Structural alignment – rigid & flexible General Description: Problem’s description Discussion
14
14 Terminology Two fragments are almost congruent (matched) if: 1. Their sequence length is the same. 2. There exists a 3D rotation and translation that superimposes the corresponding atoms with a small RMSD. (Reminder: RMSD measures alignment error.)
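A minimal sketch of the RMSD reminder above, assuming the two fragments have already been superimposed and are given as equal-length lists of (x, y, z) coordinates. The function name `rmsd` is illustrative; finding the best rotation and translation is a separate step that this sketch does not perform.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between corresponding 3D points.

    Assumes the fragments are already superimposed; no rotation or
    translation is searched for here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("fragments must be non-empty and of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))
```

Two fragments would then count as matched when this value, taken after the best superposition, stays below the chosen threshold.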
15
15 Problem Definition Input: two protein molecules M1 and M2. Task: divide the two molecules into fragments of maximal size, such that the matched fragments will be almost congruent.
16
16 Problem Discussion The regions between the fragments are called flexible (hinge) regions. We’d like to minimize the number of flexible regions and maximize the alignment size. These two aims conflict, so our goal is to find a balanced solution.
17
17 Problem Discussion Consider two different solutions: I. 3 rigid parts. Total size = 200 atoms II. 2 rigid parts. Total size = 150 atoms Q: Which is better? A: I don’t know. Let’s divide the results according to the number of rigid parts.
18
18 Major Results Introducing FlexProt, a new technique for the alignment of flexible proteins. Unlike other algorithms, FlexProt does not require a priori knowledge of the locations of the flexible, hinge-bending sites. The pairs of rigid matching fragments and the flexible regions are detected simultaneously.
19
19 Outline Detailed Description: FPSA problem description FlexProt algorithm for the FPSA problem Experimental results Heuristic Algorithms for FPSA Clustering Conclusions & Discussion: Summary of algorithm Major results Discussion
20
20 Flexible Protein Structural Alignment (FPSA) Input: two proteins, M1 and M2; a threshold error, MaxRMSD; a MaxFlexNum parameter; a weight function w.
21
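21 FPSA Problem Terminology A rigid fragment pair consists of two contiguous fragments of the same length l, one starting at position i in M1 and the other at position j in M2. It has the following property: the RMSD between the two fragments, taken under the best transformation T, is at most MaxRMSD. T is a 3D rigid transformation, meaning rotation and translation.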
22
22 FPSA Problem Terminology Let F be a list of rigid fragment pairs, ordered so that the fragments ascend along both molecules. Let w be a weight function that reflects the “goodness” of linking two rigid fragment pairs.
23
23 The FPSA Problem Example:
24
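24 The FPSA Problem For each number of rigid parts s, up to MaxFlexNum, detect a list of s rigid fragment pairs whose links have minimal total weight under w. Remember, each candidate solution is a list of rigid fragment pairs.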
25
25 The FlexProt Algorithm for FPSA I. Detection of all rigid fragment pairs that satisfy the MaxRMSD constraint. II. Detection of optimal configurations between rigid fragment pairs.
26
26 I. Detect all Rigid Fragment Pairs To find all possible pairs, iterate over three indices (the start position i in M1, the start position j in M2, and the fragment length l) and select the pairs that satisfy the MaxRMSD constraint.
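The exhaustive enumeration can be sketched as below. This is an illustration under stated assumptions, not the FlexProt implementation: to stay self-contained it swaps in distance RMSD (dRMSD), which compares intra-fragment distances and so needs no superposition, in place of the RMSD under the best rigid transformation. Unlike the latter, dRMSD cannot tell a fragment from its mirror image. The names `drmsd`, `all_rigid_fragment_pairs` and the `min_len` parameter are hypothetical.

```python
import math
from itertools import combinations

def drmsd(frag_a, frag_b):
    """Distance RMSD between two equal-length fragments (stand-in error measure)."""
    index_pairs = list(combinations(range(len(frag_a)), 2))
    if not index_pairs:
        return 0.0
    sq = 0.0
    for p, q in index_pairs:
        da = math.dist(frag_a[p], frag_a[q])
        db = math.dist(frag_b[p], frag_b[q])
        sq += (da - db) ** 2
    return math.sqrt(sq / len(index_pairs))

def all_rigid_fragment_pairs(m1, m2, max_rmsd, min_len=3):
    """Return every triplet (i, j, l) whose fragments pass the error threshold."""
    found = []
    for i in range(len(m1)):
        for j in range(len(m2)):
            longest = min(len(m1) - i, len(m2) - j)
            for l in range(min_len, longest + 1):
                if drmsd(m1[i:i + l], m2[j:j + l]) <= max_rmsd:
                    found.append((i, j, l))
    return found
```

Recomputing the error from scratch for every triplet, as done here, is more expensive than an incremental update; the sketch favors clarity over speed.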
27
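27 I. Complexity We assume that both molecules have on the order of n residues. Remember, a rigid fragment pair is identified by an index triplet (i, j, l). We iterate over all such triplets and compute the RMSD for each one, which is linear in the detected fragment size (Sharir). Total complexity, as listed in the summary of the algorithm: $O(n^3)$.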
28
28 II. Detect Optimal Configuration Now we have a set of congruent fragment pairs. Let’s find an optimal subset of it. This subset will describe an alignment of M2 with M1. We’ll use dynamic programming, which solves an optimization problem by caching subproblem solutions rather than recomputing them.
29
29 II. Detect Optimal Configuration In general: define a graph. Vertices represent the rigid fragment pairs. The directed edges represent flexible regions connecting the rigid fragment pairs. A weight function w is applied to the edges; it reflects the goodness of connecting two rigidly matched fragment pairs.
30
30 II. Detect Optimal Configuration Vertices: the rigid fragment pairs. A directed edge between two vertices is defined if: 1. The fragments are ascending. 2. The gaps between consecutive fragments are limited by MaxGap1 and MaxGap2 (user-defined).
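The two edge conditions can be sketched as a predicate. This is a simplified reading and an assumption on my part: fragments are not allowed to overlap, and a gap is the number of residues skipped between the end of one fragment and the start of the next in the same molecule. `FragmentPair` and `has_edge` are hypothetical names.

```python
from typing import NamedTuple

class FragmentPair(NamedTuple):
    i: int  # start index of the fragment in M1
    j: int  # start index of the fragment in M2
    l: int  # common length of both fragments

def has_edge(u, v, max_gap1, max_gap2):
    """True if v may follow u: ascending in both molecules, gaps within limits."""
    gap1 = v.i - (u.i + u.l)  # residues skipped in M1
    gap2 = v.j - (u.j + u.l)  # residues skipped in M2
    # gap >= 0 enforces the ascending order (no overlap in this sketch)
    return 0 <= gap1 <= max_gap1 and 0 <= gap2 <= max_gap2
```

For example, with MaxGap1 = MaxGap2 = 3, a pair starting two residues after the previous one ends in M1 and one residue after in M2 is linked, while a pair fifteen residues away is not.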
31
31 II. Detect Optimal Configuration [Figure: example with fragment pairs A, B and C] Define: MaxGap1 = 3, MaxGap2 = 3.
32
32 II. Detect Optimal Configuration The weight function (smaller is better) has three parts, where Δ is half of the maximal overlapping interval. Part A quadratically rewards the size of the matched fragments. Part B punishes large gaps. Part C punishes the difference between Gap1 and Gap2.
33
33 II. Detect Optimal Configuration [Figure: graph with vertices A, B and C and edges e1 and e2]
34
34 II. Detect Optimal Configuration We built a weighted directed acyclic graph (DAG). Shortest weighted paths correspond to alignments of consecutive, long, congruent matching fragments. Almost finished.
35
35 Reminder: Shortest Paths in DAGs First, we perform a topological sort of the directed acyclic graph (DAG). Then we make just one pass over the vertices in that order. For each vertex, we relax each edge that leaves it. [Figure: worked example of edge relaxation on a small weighted DAG]
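The reminder above can be sketched as follows: a topological sort (Kahn's algorithm here, one of several valid choices) followed by a single relaxation pass. The function name, the edge-list format `(u, v, weight)` and the example graph are illustrative assumptions.

```python
import math
from collections import deque

def dag_shortest_paths(n, edges, source):
    """Single-source shortest paths in a DAG with vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v, w in edges:
        adj[u].append((v, w))
        indeg[v] += 1

    # Topological sort: repeatedly remove vertices with no incoming edges.
    queue = deque(v for v in range(n) if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    # One pass in topological order, relaxing every outgoing edge.
    dist = [math.inf] * n
    dist[source] = 0
    for u in order:
        if dist[u] == math.inf:
            continue  # unreachable from the source
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist

# Small example with negative weights, which a DAG handles without trouble.
example_edges = [(0, 1, 5), (0, 2, 3), (1, 2, 2), (1, 3, 6), (2, 3, 7),
                 (2, 4, 4), (2, 5, 2), (3, 4, -1), (3, 5, 1), (4, 5, -2)]
print(dag_shortest_paths(6, example_edges, source=1))
# prints [inf, 0, 2, 6, 5, 3]
```

Because each vertex and each edge is handled once, the running time is linear in the size of the graph.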
36
36 II. Detect Optimal Configuration We run the shortest-paths-in-DAGs algorithm. A simple case (no limit on the number of nodes in the shortest path): the shortest path in G corresponds to a minimal-weight sequence of rigid fragment pairs, F**. Complexity: $O(|V| + |E|)$.
37
37 II. Detect Optimal Configuration We’ll make a small change in the algorithm, since we need to detect shortest paths with exactly s nodes for each s up to MaxFlexNum. In the simple case, each node holds a pointer to a preceding node with the shortest path. Instead, each node will hold MaxFlexNum pointers. Pointer s points to a preceding node with a shortest path of size s-1.
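A minimal sketch of this modification, assuming the vertices are already numbered in topological order (every edge goes from a lower to a higher index), that any vertex may start a path, and that a single-vertex path has weight 0. The names `shortest_paths_by_node_count` and `recover_path` are hypothetical.

```python
import math

def shortest_paths_by_node_count(n, edges, max_nodes):
    """best[v][s]: minimal weight of a path ending at v with exactly s nodes.

    prev[v][s] is the pointer to the preceding node on that path.
    Index s = 0 is unused.
    """
    incoming = [[] for _ in range(n)]
    for u, v, w in edges:
        if not u < v:
            raise ValueError("vertices must be numbered in topological order")
        incoming[v].append((u, w))

    best = [[math.inf] * (max_nodes + 1) for _ in range(n)]
    prev = [[None] * (max_nodes + 1) for _ in range(n)]
    for v in range(n):
        best[v][1] = 0.0  # a path consisting of v alone
        for s in range(2, max_nodes + 1):
            for u, w in incoming[v]:
                candidate = best[u][s - 1] + w
                if candidate < best[v][s]:
                    best[v][s] = candidate
                    prev[v][s] = u
    return best, prev

def recover_path(prev, end, s):
    """Follow the stored pointers back from `end` for a path of exactly s nodes."""
    path = [end]
    while s > 1:
        end = prev[end][s]
        path.append(end)
        s -= 1
    return path[::-1]
```

Each relaxation now considers every path size, so the work per edge grows by a factor of the maximum number of nodes.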
38
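38 II. Complexity During relaxation, we check all MaxFlexNum possibilities, so the cost of the shortest-paths step grows by a factor of MaxFlexNum. The number of nodes in the graph can be proportional to $n^3$, one per index triplet, and a graph with that many vertices can have $O(n^6)$ edges. Total complexity of stage II: $O(n^6)$ when MaxFlexNum is treated as a constant.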
39
39 Summary of FlexProt Algorithm The theoretical worst-case complexity is $O(n^6)$. In practice, FlexProt is highly efficient (with some changes): the average running time is approximately seven seconds for molecules of 300 amino acids. So, what does it look like?
40
40 Experimental Results
41
41 Experimental Results
42
42 Running FlexProt http://bioinfo3d.math.tau.ac.il/FlexProt http://www.umass.edu/microbio/chime/explorer/pe.htm
43
43 Heuristic Improvement of Step I In step I, we detected all of the rigid fragment pairs. Time complexity: $O(n^3)$. The procedure takes several minutes, even for small proteins. Instead, we can use a greedy algorithm that takes only $O(n^2)$.
44
44 Heuristic Improvement of Step I Start by aligning a single matching atom pair (a, b), where a is an atom of M1 and b is an atom of M2. Iteratively add one matching atom pair to the left and one to the right. Stop when the RMSD threshold would be exceeded, that is, when the list can’t be extended to the left or to the right.
45
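45 Heuristic Improvement of Step I [Figure: seed pair (a, b) extended through (a-1, b-1) and (a+1, b+1) to a match-list spanning i..i+l-1 and j..j+l-1] After the extension process, we have a match-list in which the fragment i..i+l-1 of M1 is almost congruent to the fragment j..j+l-1 of M2. The next alignment is initiated at (i+l, j+l) and not at (a+1, b+1).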
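The greedy extension can be sketched as below. As an assumption made to keep the example self-contained, the error measure is distance RMSD (dRMSD) rather than RMSD under the best rigid transformation, and it is recomputed from scratch at each step instead of being updated incrementally. The names `drmsd` and `extend_seed` are hypothetical.

```python
import math
from itertools import combinations

def drmsd(frag_a, frag_b):
    """Distance RMSD between two equal-length fragments (stand-in error measure)."""
    index_pairs = list(combinations(range(len(frag_a)), 2))
    if not index_pairs:
        return 0.0
    sq = sum((math.dist(frag_a[p], frag_a[q]) - math.dist(frag_b[p], frag_b[q])) ** 2
             for p, q in index_pairs)
    return math.sqrt(sq / len(index_pairs))

def extend_seed(m1, m2, a, b, max_rmsd):
    """Grow the seed atom pair (a, b) left and right; return (i, j, l)."""
    i, j, l = a, b, 1
    while True:
        grew = False
        # Try to add one matching atom pair on the left.
        if i > 0 and j > 0 and drmsd(m1[i - 1:i + l], m2[j - 1:j + l]) <= max_rmsd:
            i, j, l = i - 1, j - 1, l + 1
            grew = True
        # Try to add one matching atom pair on the right.
        if i + l < len(m1) and j + l < len(m2) and \
                drmsd(m1[i:i + l + 1], m2[j:j + l + 1]) <= max_rmsd:
            l += 1
            grew = True
        if not grew:
            return i, j, l  # cannot extend in either direction
```

Seeding in the middle of two straight six-atom chains grows the match to the full length, while a bend in one chain stops the extension at the bend.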
46
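46 Complexity Updating the RMSD at each extension step takes constant time. Thus, finding a particular rigid fragment pair is linear in the length of its fragments. The overall time complexity then depends on how many fragment pairs each atom pair can participate in.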
47
47 Complexity Theoretically, an atom pair might participate in as many as n fragment pairs. In practice, a pair of atoms participates in at most 2 fragment pairs, so there are $O(n^2)$ rigid fragment pairs.
48
48 Clustering This stage can be viewed as an extension to the FPSA problem. The algorithm clusters consecutive fragment pairs that have a similar 3D transformation, even if they are not directly linked.
49
49 Clustering Example: Two β-strands (A and B) are connected by loops of different lengths. Stage I of the FlexProt algorithm aligns each separately. A and B have almost the same 3D rigid transformation and in the clustering stage, they are joined into one structure.
50
50 The Clustering Algorithm We take each path detected in stage II. Remember, each vertex is a congruent fragment pair. The first vertex is a singleton cluster. Take the second vertex and check whether there is a rigid transformation that superimposes both fragment pairs. If successful, add the vertex to the cluster and do the same for the next vertex. Otherwise, start a new cluster with the vertex that failed to join the previous cluster.
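The loop above can be sketched as a single pass over a path. The superposition check is left as a caller-supplied predicate, `can_join`, because the slides do not spell out how it is computed; both names are hypothetical. The toy predicate in the usage lines compares a single number per vertex and only stands in for a real rigid-transformation test.

```python
def cluster_path(path, can_join):
    """Group consecutive vertices of a path into clusters.

    can_join(cluster, vertex) should return True when one rigid
    transformation superimposes the cluster together with the new vertex.
    """
    clusters = []
    for vertex in path:
        if clusters and can_join(clusters[-1], vertex):
            clusters[-1].append(vertex)   # joins the current cluster
        else:
            clusters.append([vertex])     # starts a new cluster
    return clusters

# Toy usage: each vertex carries a label and a stand-in "transformation" value.
toy_path = [("A", 0.0), ("B", 0.1), ("C", 5.0)]
similar = lambda cluster, v: abs(cluster[0][1] - v[1]) < 0.5
print(cluster_path(toy_path, similar))
# prints [[('A', 0.0), ('B', 0.1)], [('C', 5.0)]]
```

The pass visits each vertex once, which matches the statement that the number of iterations equals the number of rigid fragment pairs in the solution.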
51
51 Clustering Complexity The number of iterations equals the number of rigid fragment pairs in the flexible alignment solution. The time complexity depends on the number of flexible alignments; it is bounded by the $n^2$ vertices.
52
52 Running FlexProt http://bioinfo3d.math.tau.ac.il/FlexProt http://www.umass.edu/microbio/chime/explorer/pe.htm
53
53 Outline Detailed Description: FPSA problem description FlexProt algorithm for the FPSA problem Experimental results Heuristic Algorithms for FPSA Clustering Conclusions & Discussion: Summary of algorithm Major results Discussion
54
54 Summary of Algorithm
Exact solution of FPSA: Step I (detection of fragment pairs) $O(n^3)$; Step II (detection of optimal configuration) $O(n^6)$.
Heuristic solution: Step I $O(n^2)$; Step II $O(n^4)$; Clustering $O(n^2)$.
55
55 Major Results Unlike other algorithms, FlexProt does not require a priori knowledge of the locations of the flexible, hinge-bending sites. The pairs of rigid matching fragments and the flexible regions are detected simultaneously. The speed of the method allows extensive database comparison.
56
56 Significance of Results Proteins are flexible. They may appear in different conformations. FlexProt incorporates flexibility in structure comparison. Proteins function through binding. Flexibility is one of the characteristics of binding sites, so it is important to detect hinge-bending sites.
57
57 Significance of Results Comparing proteins despite the motion that they have undergone is helpful for protein classification. These comparisons are also useful in drug design, in detecting binding sites, and in studying the range of motions that proteins display.
58
58 FlexProt – Discussion Differs from other flexible alignment algorithms (and of course, rigid) Does not violate the protein sequence order Given two alignments, each giving better results in different measures, which is better? Clustering is optional. Which proteins are compared?
59
59 Websites PDB – http://www.rcsb.org/pdb/ SCOP – http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.d.bbh.b.b.be.html Database of motions – http://molmovdb.mbb.yale.edu/molmovdb/
60
60 Bibliography Cormen, “Introduction to Algorithms”, chapter 24, Single-Source Shortest Paths. Gerstein, Database of Macromolecular Movements. Shatsky, Nussinov and Wolfson, “Flexible Protein Alignment and Hinge Detection”, Proteins: Structure, Function, and Genetics 48, 242-256 (2002). Wolfson, “Structural Bioinformatics – 2003” presentation.