1
CTSS: A Robust and Efficient Method for Protein Structure Alignment Based on Local Geometrical and Biological Features
Tolga Can and Yuan-Fang Wang
Department of Computer Science, University of California, Santa Barbara
August 11-14, 2003
2
CSB2003, August 11-14, 2003
Introduction
Importance of discovering structural relationships between proteins
Structural alignment: NP-hard
Protein structure representation: no standard, unlike in sequence alignment
Many algorithms:
  Inter-atomic distances (CE, DALI)
  SSE vectors (VAST, 3D-Lookup)
Different similarity measures: RMSD, p-value, etc.
3
Problem Definition
Given a protein structure, find similar protein structures in a database of protein structures.
[Figure: chain labels 1fse:A, 1jek:B, 1alu:_, 2spc:A, 1l3l:C, 1k61:D, 1kzu:B, 1et1:A, 1jig:A, 1wdc:A, 1nkd:_, 1fmh:A, 1gl2:A, with a "?" marking the query; the chains shown after "=" are 1l3l:C, 1kzu:B, 1jig:A, 1nkd:_]
4
Protein Structure?
PDB File:
  HEADER PHEROMONE 20-DEC-95 2ERL
  ..................................
  SEQRES 1 40 ASP ALA CYS GLU GLN ALA
  ..................................
  ATOM  1 N  ASP 1 -1.115 8.537 7.075
  ATOM  2 CA ASP 1 -1.925 7.470 6.547
  ATOM  3 C  ASP 1 -2.009 6.333 7.522
  ATOM  4 O  ASP 1 -1.467 6.394 8.624
  ATOM  5 CB ASP 1 -1.526 6.993 5.163
  ATOM  6 N  ALA 2 -2.745 5.280 7.165
  ATOM  7 CA ALA 2 -2.945 4.152 7.987
  ATOM  8 C  ALA 2 -1.606 3.448 8.305
  ATOM  9 O  ALA 2 -1.440 3.010 9.454
  ATOM 10 CB ALA 2 -3.966 3.256 7.436
  ATOM 11 N  CYS 3 -0.777 3.267 7.329
  ATOM 12 CA CYS 3  0.570 2.624 7.511
  ATOM 13 C  CYS 3  1.328 3.308 8.626
  ATOM 14 O  CYS 3  1.802 2.679 9.562
  ATOM 15 CB CYS 3  1.351 2.667 6.209
  ATOM 16 SG CYS 3  2.981 1.901 6.318
  ..................................
We use Cα coordinates to represent the protein structure.
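As an illustration of reducing a structure to its Cα trace, the following is a minimal sketch that pulls the CA atoms out of ATOM records laid out as in the slide excerpt. It assumes the abbreviated, whitespace-separated eight-field layout shown on the slide; real PDB files use fixed-width columns with additional fields, so a production parser would differ. The names `SLIDE_EXCERPT` and `extract_ca_trace` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]

# A few ATOM records copied from the slide excerpt, in its abbreviated layout:
# record, serial, atom name, residue, residue number, x, y, z.
SLIDE_EXCERPT = """\
ATOM 1 N ASP 1 -1.115 8.537 7.075
ATOM 2 CA ASP 1 -1.925 7.470 6.547
ATOM 3 C ASP 1 -2.009 6.333 7.522
ATOM 6 N ALA 2 -2.745 5.280 7.165
ATOM 7 CA ALA 2 -2.945 4.152 7.987
ATOM 12 CA CYS 3 0.570 2.624 7.511
"""


def extract_ca_trace(text: str) -> List[Point]:
    """Return the (x, y, z) coordinates of every CA atom, in file order."""
    trace: List[Point] = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: the 8-field abbreviated layout above, not full PDB columns.
        if len(fields) == 8 and fields[0] == "ATOM" and fields[2] == "CA":
            x, y, z = (float(value) for value in fields[5:8])
            trace.append((x, y, z))
    return trace


ca_trace = extract_ca_trace(SLIDE_EXCERPT)
print(ca_trace)
```

Each residue contributes exactly one point, so the resulting list is an ordered polyline through the backbone.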
5
Protein Structure
(PDB file excerpt as on the previous slide)
The Cα coordinates of a protein define a curve in 3D space.
6
Spline Approximation
(PDB file excerpt as on the previous slides)
We smooth the Cα curve based on secondary structure information.
7
Spline Approximation
(PDB file excerpt as on the previous slides)
We smooth the Cα curve based on secondary structure information.
[Figure: smoothed curve with regions labeled Helix and Turn]
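The slides do not say which spline formulation is used, nor how the smoothing depends on secondary structure. As a stand-in, the sketch below approximates a Cα trace with a uniform cubic B-spline that treats the input points as control points, so the curve passes near the points rather than through them. The function names and the zigzag test data are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def bspline_point(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate one uniform cubic B-spline segment at t in [0, 1]."""
    b0 = (1 - t) ** 3 / 6.0
    b1 = (3 * t**3 - 6 * t**2 + 4) / 6.0
    b2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6.0
    b3 = t**3 / 6.0
    return tuple(
        b0 * a + b1 * b + b2 * c + b3 * d for a, b, c, d in zip(p0, p1, p2, p3)
    )


def smooth_trace(points: Sequence[Point], samples_per_segment: int = 1) -> List[Point]:
    """Sample a uniform cubic B-spline whose control points are the input trace."""
    if len(points) < 4:
        raise ValueError("need at least 4 points for a cubic B-spline")
    curve: List[Point] = []
    for i in range(len(points) - 3):
        for k in range(samples_per_segment):
            t = k / samples_per_segment
            curve.append(bspline_point(*points[i:i + 4], t))
    # Close the final segment at t = 1.
    curve.append(bspline_point(*points[-4:], 1.0))
    return curve


# Hypothetical zigzag input: the smoothed samples have damped peaks.
zigzag = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 0.0),
          (3.0, 1.0, 0.0), (4.0, 0.0, 0.0)]
smooth = smooth_trace(zigzag)
print(smooth)
```

An approximating spline of this kind damps coordinate noise at the cost of pulling the curve away from the measured atoms; a method that adapts the amount of smoothing per helix or turn region, as the slide suggests, would need extra parameters not given in the source.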
8
Matching Two Curves
Are they similar?
9
Curvature and Torsion
Curvature κ: a measure of how far the curve deviates from being linear.
Torsion τ: a measure of how far the curve deviates from being planar.
Fundamental Theorem of Space Curves: If two single-valued continuous functions κ(s) and τ(s) are given for s > 0, then there exists exactly one space curve, determined except for orientation and position in space (i.e., up to a Euclidean motion), where s is the intrinsic arc length, κ is the curvature, and τ is the torsion.
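The formulas for curvature and torsion did not survive in this transcript. For reference, the standard textbook definitions for a regular curve r(t) with an arbitrary parameter t are given below; whether the slide used this form or the arc-length form is an assumption.

```latex
\kappa(t) = \frac{\lVert \mathbf{r}'(t) \times \mathbf{r}''(t) \rVert}{\lVert \mathbf{r}'(t) \rVert^{3}},
\qquad
\tau(t) = \frac{\bigl(\mathbf{r}'(t) \times \mathbf{r}''(t)\bigr) \cdot \mathbf{r}'''(t)}{\lVert \mathbf{r}'(t) \times \mathbf{r}''(t) \rVert^{2}}
```

A straight line has κ = 0 everywhere, and a curve confined to a plane has τ = 0 everywhere, which matches the two "deviation" descriptions on the slide.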
10
Curvature and Torsion
They are invariant to rotation and translation.
They are localized.
[Figure: curvature and torsion plots]
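The invariance claim can be checked numerically. The sketch below evaluates the standard curvature and torsion formulas from the first three derivatives of a curve, using a circular helix (whose curvature and torsion are known in closed form) as a hypothetical test curve, and then repeats the computation after rotating the curve about the z axis. Translation adds a constant to the curve and so leaves all derivatives unchanged. All function names are illustrative.

```python
import math


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def curvature_torsion(d1, d2, d3):
    """Curvature and torsion from the first, second and third derivatives."""
    c = cross(d1, d2)
    kappa = math.sqrt(dot(c, c)) / math.sqrt(dot(d1, d1)) ** 3
    tau = dot(c, d3) / dot(c, c)
    return kappa, tau


def helix_derivatives(a, b, t):
    """Derivatives of the helix r(t) = (a cos t, a sin t, b t)."""
    d1 = (-a * math.sin(t), a * math.cos(t), b)
    d2 = (-a * math.cos(t), -a * math.sin(t), 0.0)
    d3 = (a * math.sin(t), -a * math.cos(t), 0.0)
    return d1, d2, d3


def rotate_z(v, angle):
    """Rotate a vector about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1], v[2])


# Helix with radius a = 2 and pitch parameter b = 1 (hypothetical values).
derivs = helix_derivatives(2.0, 1.0, 0.7)
kappa, tau = curvature_torsion(*derivs)

# Derivatives of the rotated curve are the rotated derivatives.
rotated = [rotate_z(d, 1.234) for d in derivs]
kappa_rot, tau_rot = curvature_torsion(*rotated)
print(kappa, tau, kappa_rot, tau_rot)
```

For this helix the closed-form values are κ = a / (a² + b²) and τ = b / (a² + b²), so both pairs printed should agree with 0.4 and 0.2 up to rounding.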
11
Feature Extraction
For each amino acid, a (curvature, torsion) tuple is computed, and the secondary structure assignment is gathered from the PDB web site.
Together these constitute a sequence of n three-dimensional feature vectors, where n is the number of amino acids in the protein.
[Figure: curvature and torsion plots; the third dimension, secondary structure information, is not shown]
12
Indexing the Features
Why is indexing necessary?
Hash table (shown in 2D; the third dimension is the SSE type)
[Figure: grid over the torsion and curvature axes, with one cell labeled as a hash bin]
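One way to picture such a hash table is sketched below: each (curvature, torsion, SSE type) feature is quantized into a bin key, and every bin stores the chains and residue positions that fall into it. The bin widths, the SSE labels "H" and "T", and the chain names are all hypothetical; the slides do not give the actual quantization.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Feature = Tuple[float, float, str]   # (curvature, torsion, SSE type)
BinKey = Tuple[int, int, str]

CURV_STEP = 0.1   # assumed bin width along the curvature axis
TORS_STEP = 0.1   # assumed bin width along the torsion axis


def bin_key(feature: Feature) -> BinKey:
    """Quantize curvature and torsion; keep the SSE type as the third dimension."""
    kappa, tau, sse = feature
    return (math.floor(kappa / CURV_STEP), math.floor(tau / TORS_STEP), sse)


def build_index(database: Dict[str, List[Feature]]) -> Dict[BinKey, List[Tuple[str, int]]]:
    """Map every bin to the (chain, residue position) entries that land in it."""
    index: Dict[BinKey, List[Tuple[str, int]]] = defaultdict(list)
    for chain_id, features in database.items():
        for position, feature in enumerate(features):
            index[bin_key(feature)].append((chain_id, position))
    return index


# Toy database with made-up feature values.
toy_database = {
    "chainA": [(0.42, 0.15, "H"), (0.05, -0.15, "T")],
    "chainB": [(0.47, 0.11, "H")],
}
index = build_index(toy_database)
print(dict(index))
```

Residues with similar local geometry and the same SSE type share a bin, so a query feature can reach all of its near matches with a single lookup instead of a scan over the whole database.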
13
Query Execution
Hierarchical approach: pruning before detailed pairwise alignment
  Look up query features in the hash table
  Accumulate votes: vote[protein]++
  Normalize votes: vote[protein] / length[protein]
  Threshold
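The screening steps on this slide can be sketched as follows, assuming a hash index that maps bin keys to (chain, position) entries. The index contents, chain names, chain lengths, threshold value, and the choice of "greater than or equal" for the cutoff are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple


def screen(index: Dict[Hashable, List[Tuple[str, int]]],
           query_keys: Iterable[Hashable],
           chain_lengths: Dict[str, int],
           threshold: float) -> List[str]:
    """Return the chains whose length-normalized vote count reaches the threshold."""
    votes: Counter = Counter()
    for key in query_keys:
        for chain_id, _position in index.get(key, []):
            votes[chain_id] += 1                      # accumulate: vote[protein]++
    scores = {chain: count / chain_lengths[chain]     # normalize by chain length
              for chain, count in votes.items()}
    return sorted(chain for chain, score in scores.items() if score >= threshold)


# Toy index and query with made-up bin keys.
toy_index = {
    (4, 1, "H"): [("chainA", 0), ("chainB", 3)],
    (0, -2, "T"): [("chainA", 1)],
}
toy_lengths = {"chainA": 2, "chainB": 10}
toy_query = [(4, 1, "H"), (0, -2, "T"), (9, 9, "H")]

candidates = screen(toy_index, toy_query, toy_lengths, threshold=0.5)
print(candidates)
```

Normalizing by length keeps long chains from winning simply because they have more residues to collect votes; only the surviving candidates go on to the more expensive pairwise alignment.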
14
Query Execution
Pairwise alignment by the Smith-Waterman (SW) dynamic programming technique is performed after the screening process.
[Figure: distance matrix and SW alignment of 1fse:A and 1l3l:C, with a gap labeled]
length: 63, RMSD: 1.61 Å
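A minimal sketch of Smith-Waterman local alignment over two feature sequences is shown below. The scoring function (1 minus the L1 distance between curvature and torsion pairs), the linear gap penalty, and the sample feature values are assumptions; the slides only state that a distance matrix feeds the SW step.

```python
from typing import Callable, List, Sequence, Tuple

Feature = Tuple[float, float]   # (curvature, torsion)


def feature_score(f: Feature, g: Feature) -> float:
    """Hypothetical similarity: 1 minus the L1 distance between two features."""
    return 1.0 - (abs(f[0] - g[0]) + abs(f[1] - g[1]))


def smith_waterman(a: Sequence[Feature], b: Sequence[Feature],
                   score: Callable[[Feature, Feature], float],
                   gap: float = 1.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Return the best local alignment score and the aligned index pairs."""
    n, m = len(a), len(b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          H[i - 1][j] - gap,
                          H[i][j - 1] - gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    pairs: List[Tuple[int, int]] = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


# Made-up feature sequences sharing a three-residue stretch.
seq_a = [(0.9, 0.9), (0.1, 0.2), (0.3, 0.1), (0.5, 0.4)]
seq_b = [(0.1, 0.2), (0.3, 0.1), (0.5, 0.4), (0.0, 0.9)]
best_score, aligned = smith_waterman(seq_a, seq_b, feature_score)
print(best_score, aligned)
```

Because the recurrence clamps scores at zero, the alignment is local: it recovers the best-matching stretch of residues rather than forcing the two chains to align end to end, which suits motif search.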
15
SW Alignment Result
[Figure: aligned structures of 1fse:A and 1l3l:C]
16
Sample Query Results
Query: 1faz:A; database: 1938 protein chains
Screening time: 18 seconds
Pairwise alignment time: 29 seconds
  1faz:A & 1ytf:D: length 42, RMSD 2.8 Å
  1faz:A & 1dj7:A: length 38, RMSD 3.68 Å
17
Sample Query Results
Query: 1b16:A; database: 1938 protein chains
Screening time: 25 seconds
Pairwise alignment time: 68 seconds
  1b16:A & 1h05:A: length 35, RMSD 3.26 Å
  1b16:A & 1qp8:A: length 35, RMSD 1.58 Å
18
Current and Future Work
Evaluation of accuracy
  Comparison with SCOP classification
Efficiency
  Comparison with other techniques such as CE or DALI
Better index structures
  Faster and more accurate screening of candidates
Incorporating biological and chemical properties of amino acids into the structure signatures of proteins
19
Conclusions
A new method for protein structure alignment is presented.
The extracted structural features are:
  Compact: O(n)
  Localized: computed for each amino acid
  Robust: error handling by spline approximation
  Invariant: suitable for indexing
  Meaningful: biological and chemical properties can be incorporated easily
An indexing technique is deployed to avoid an exhaustive scan of the structure database.
Experimental results show that this method is suitable for finding structural motifs.
20
Thank you for your attention!
For More Information:
Tolga Can
Department of Computer Science
University of California at Santa Barbara
Santa Barbara, CA 93106, U.S.
Email: tcan@cs.ucsb.edu
URL: http://www.cs.ucsb.edu/~tcan/CTSS/