Department of Computer Science, University of California, Santa Barbara August 11-14, 2003 CTSS: A Robust and Efficient Method for Protein Structure Alignment.

Presentation transcript:

CTSS: A Robust and Efficient Method for Protein Structure Alignment Based on Local Geometrical and Biological Features
Tolga Can and Yuan-Fang Wang
Department of Computer Science, University of California, Santa Barbara
August 11-14, 2003

2 CSB2003, August 11-14, 2003 Introduction
- Importance of discovering structural relationships between proteins
- Structural alignment: NP-hard
- Protein structure representation: no standard as in sequence alignment
- Many algorithms
  - Inter-atomic distances (CE, DALI)
  - SSE vectors (VAST, 3D-Lookup)
- Different similarity measures
  - RMSD, p-value, etc.

3 Problem Definition
Given a protein structure, find similar protein structures from a database of protein structures.
[Figure: chain labels 1fse:A, 1jek:B, 1alu:_, 2spc:A, 1l3l:C, 1k61:D, 1kzu:B, 1et1:A, 1jig:A, 1wdc:A, 1nkd:_, 1fmh:A, 1gl2:A; a "?" marking the query step; result set 1l3l:C, 1kzu:B, 1jig:A, 1nkd:_]

4 Protein Structure?
PDB file (excerpt):
HEADER PHEROMONE 20-DEC-95 2ERL
SEQRES 1 40 ASP ALA CYS GLU GLN ALA
ATOM 1 N ASP
ATOM 2 CA ASP
ATOM 3 C ASP
ATOM 4 O ASP
ATOM 5 CB ASP
ATOM 6 N ALA
ATOM 7 CA ALA
ATOM 8 C ALA
ATOM 9 O ALA
ATOM 10 CB ALA
ATOM 11 N CYS
ATOM 12 CA CYS
ATOM 13 C CYS
ATOM 14 O CYS
ATOM 15 CB CYS
ATOM 16 SG CYS
We use Cα coordinates to represent the protein structure.

5 Protein Structure
The Cα coordinates of a protein define a curve in 3D space.
[PDB file excerpt for 2ERL, same as on slide 4]
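As an illustration of this step, the sketch below pulls the Cα trace out of fixed-column PDB ATOM records. It is a minimal example and not the authors' code: the function names are hypothetical, the demo coordinates are made up (they are not taken from 2ERL), and alternate locations and multi-model files are ignored.

```python
def parse_ca_coordinates(pdb_lines, chain_id=None):
    """Return [(residue_name, (x, y, z)), ...] for every C-alpha ATOM record."""
    trace = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skips HEADER, SEQRES, HETATM, ...
        if line[12:16].strip() != "CA":
            continue  # atom name lives in columns 13-16
        if chain_id is not None and line[21] != chain_id:
            continue  # chain identifier is column 22
        # x, y, z occupy columns 31-38, 39-46 and 47-54
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        trace.append((line[17:20], xyz))
    return trace


def make_atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Hypothetical helper: format one fixed-column ATOM record for the demo."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Made-up coordinates, for illustration only.
demo_lines = [
    "HEADER    PHEROMONE",
    make_atom_line(1, " N  ", "ASP", "A", 1, 0.0, 0.0, 0.0),
    make_atom_line(2, " CA ", "ASP", "A", 1, 1.5, 0.0, 0.0),
    make_atom_line(3, " C  ", "ASP", "A", 1, 2.0, 1.4, 0.0),
    make_atom_line(7, " CA ", "ALA", "A", 2, 3.8, 1.9, 1.1),
]
ca_trace = parse_ca_coordinates(demo_lines, chain_id="A")
```

The resulting list of points, taken in residue order, is the polyline that the later slides treat as a space curve.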

6 Spline Approximation
We smooth the Cα curve based on secondary structure information.
[PDB file excerpt for 2ERL, same as on slide 4]

7 Spline Approximation
We smooth the Cα curve based on secondary structure information.
[PDB file excerpt for 2ERL, same as on slide 4; figure labels: Helix, Turn]
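The transcript does not say which spline family is used or how the secondary structure information changes the smoothing. Purely as an illustration of what an approximating spline does to a Cα polyline, the sketch below assumes a uniform cubic B-spline and ignores secondary structure; `smooth_trace` and `samples_per_segment` are hypothetical names.

```python
def bspline_point(p0, p1, p2, p3, t):
    """Evaluate one uniform cubic B-spline segment at t in [0, 1]."""
    b0 = (1 - t) ** 3 / 6.0
    b1 = (3 * t**3 - 6 * t**2 + 4) / 6.0
    b2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6.0
    b3 = t**3 / 6.0
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))


def smooth_trace(points, samples_per_segment=4):
    """Approximate a polyline (used as control points) by a cubic B-spline."""
    if len(points) < 4:
        return list(points)  # too short to form a single segment
    smoothed = []
    for i in range(len(points) - 3):
        for k in range(samples_per_segment):
            t = k / samples_per_segment
            smoothed.append(bspline_point(*points[i:i + 4], t))
    # Close the curve at the end of the last segment (t = 1).
    smoothed.append(bspline_point(*points[-4:], 1.0))
    return smoothed


# A zigzag polyline: the smoothed curve stays inside the zigzag's amplitude.
zigzag = [(0, 0, 0), (1, 1, 0), (2, 0, 0), (3, 1, 0), (4, 0, 0)]
smooth_zigzag = smooth_trace(zigzag)
```

Because a B-spline approximates rather than interpolates its control points, small coordinate errors are damped, which matches the "robust" property claimed for the features later in the talk.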

8 Matching Two Curves
Are they similar?

9 Curvature and Torsion
Curvature: measure of how far the curve deviates from being linear
Torsion: measure of how far the curve deviates from being planar
Fundamental Theorem of Space Curves: If two single-valued continuous functions κ(s) and τ(s) are given for s > 0, then there exists exactly one space curve, determined except for orientation and position in space (i.e., up to a Euclidean motion), where s is the intrinsic arc length, κ is the curvature, and τ is the torsion.
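The formulas shown on this slide did not survive in the transcript. For reference, the standard textbook definitions for a regular curve r(t) with an arbitrary parameter t are given below; they are presumably what the slide displayed, but that is an assumption.

```latex
\kappa(t) = \frac{\lVert \mathbf{r}'(t) \times \mathbf{r}''(t) \rVert}{\lVert \mathbf{r}'(t) \rVert^{3}},
\qquad
\tau(t) = \frac{\bigl(\mathbf{r}'(t) \times \mathbf{r}''(t)\bigr) \cdot \mathbf{r}'''(t)}{\lVert \mathbf{r}'(t) \times \mathbf{r}''(t) \rVert^{2}}
```

For a straight line the cross product vanishes, so κ = 0; for a planar curve r''' lies in the plane spanned by r' and r'', so τ = 0.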

10 Curvature and Torsion
- They are invariant to rotation and translation.
- They are localized.
[Figure: curvature and torsion plots]
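Both properties can be checked numerically. The sketch below estimates curvature and torsion with central finite differences (a swapped-in discretization; the talk computes them on the spline-smoothed curve, and the details are not in the transcript) and compares a helix before and after a rigid motion. All function names are hypothetical.

```python
import math


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def curvature_torsion(points):
    """Estimate (curvature, torsion) at each interior sample of a 3D curve.

    Assumes samples are uniformly spaced in some curve parameter. The first
    and last two samples are skipped because the 5-point stencil needs them,
    which also shows the locality: each value depends on 5 neighbours only.
    """
    result = []
    for i in range(2, len(points) - 2):
        pm2, pm1, p0, pp1, pp2 = points[i - 2:i + 3]
        d1 = tuple((a - b) / 2.0 for a, b in zip(pp1, pm1))          # r'
        d2 = tuple(a - 2.0 * b + c for a, b, c in zip(pp1, p0, pm1))  # r''
        d3 = tuple((a - 2.0 * b + 2.0 * c - d) / 2.0                  # r'''
                   for a, b, c, d in zip(pp2, pp1, pm1, pm2))
        c = _cross(d1, d2)
        c_sq = _dot(c, c)
        speed = math.sqrt(_dot(d1, d1))
        kappa = math.sqrt(c_sq) / speed ** 3
        tau = _dot(c, d3) / c_sq if c_sq > 0.0 else 0.0  # straight line: torsion undefined
        result.append((kappa, tau))
    return result


def rigid_motion(points, angle, shift):
    """Rotate about the z axis by `angle` (radians), then translate by `shift`."""
    ca, sa = math.cos(angle), math.sin(angle)
    return [(ca * x - sa * y + shift[0], sa * x + ca * y + shift[1], z + shift[2])
            for x, y, z in points]


# Test curve: helix (cos t, sin t, 0.5 t); analytically curvature = 0.8, torsion = 0.4.
helix = [(math.cos(0.01 * k), math.sin(0.01 * k), 0.5 * 0.01 * k) for k in range(200)]
original = curvature_torsion(helix)
moved = curvature_torsion(rigid_motion(helix, angle=1.1, shift=(5.0, -3.0, 2.0)))
```

The two result lists agree up to floating-point noise, so the features can be compared across proteins without first superimposing the structures.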

11 Feature Extraction
- For each amino acid, a (curvature, torsion) tuple is computed, and the secondary structure assignment is gathered from the PDB web site.
- Together these form a sequence of n three-dimensional feature vectors, where n is the number of amino acids in the protein.
[Figure: curvature and torsion profiles plus secondary structure information; the 3rd dimension is not shown]

12 Indexing the Features
- Why is indexing necessary?
- Hash table (shown in 2D below; the 3rd dimension is the SSE type)
[Figure: grid over torsion and curvature axes, with one cell labeled "A Hash Bin"]
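A hash table of this kind can be sketched as a dictionary keyed by quantized features. The bin widths, the SSE letter codes, the protein identifiers and the function names below are all assumptions for illustration; the transcript does not give the actual bin layout.

```python
import math
from collections import defaultdict


def bin_key(curvature, torsion, sse, curvature_step=0.1, torsion_step=0.1):
    """Quantize one residue's (curvature, torsion, SSE type) into a hash-bin key."""
    return (math.floor(curvature / curvature_step),
            math.floor(torsion / torsion_step),
            sse)


def build_index(database):
    """Map each hash bin to the (protein_id, residue_index) entries that fall in it.

    `database` is {protein_id: [(curvature, torsion, sse), ...]}.
    """
    index = defaultdict(list)
    for protein_id, features in database.items():
        for residue_index, (curvature, torsion, sse) in enumerate(features):
            index[bin_key(curvature, torsion, sse)].append((protein_id, residue_index))
    return index


# Made-up feature values and hypothetical protein identifiers.
toy_database = {
    "protA": [(0.32, -0.12, "H"), (0.35, -0.18, "H"), (0.81, 0.44, "T")],
    "protB": [(0.33, -0.15, "H"), (0.05, 0.02, "E")],
}
index = build_index(toy_database)
```

Looking up a query residue then costs one dictionary access instead of a scan over every residue of every database protein, which is the reason for indexing given in the conclusions.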

13 Query Execution
Hierarchical approach:
- Pruning before detailed pairwise alignment (hash table)
  - Accumulate vote: vote_protein++
  - Normalize vote: vote_protein / length_protein
  - Threshold
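The three screening steps can be sketched as follows. The index layout (bin key mapped to a list of (protein, residue) entries), the rule that every matching entry casts one vote, and the threshold value are assumptions; the transcript only names the steps.

```python
from collections import Counter


def screen(query_bins, index, lengths, threshold):
    """Return candidate protein ids, best first, after vote/normalize/threshold.

    query_bins: hash-bin keys of the query's residues
    index:      {bin_key: [(protein_id, residue_index), ...]}
    lengths:    {protein_id: number of residues}
    """
    votes = Counter()
    for key in query_bins:
        for protein_id, _residue_index in index.get(key, ()):
            votes[protein_id] += 1                      # accumulate: vote_protein++
    scores = {pid: votes[pid] / lengths[pid] for pid in votes}  # normalize by length
    survivors = [pid for pid, score in scores.items() if score >= threshold]
    return sorted(survivors, key=lambda pid: -scores[pid])


# Toy index with hypothetical protein identifiers.
toy_index = {
    (3, -2, "H"): [("p1", 0), ("p1", 1), ("p2", 0)],
    (8, 1, "E"): [("p2", 1)],
    (0, 0, "T"): [("p3", 0)],
}
toy_lengths = {"p1": 2, "p2": 4, "p3": 10}
candidates = screen([(3, -2, "H"), (8, 1, "E")], toy_index, toy_lengths, threshold=0.6)
```

Only the proteins that survive the threshold are passed on to the more expensive pairwise alignment.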

14 Query Execution
Pairwise alignment by the Smith-Waterman (SW) dynamic programming technique is performed after the screening process.
[Figure: distance matrix and SW alignment of 1fse:A and 1l3l:C; labels: Gap, length: 63, RMSD: 1.61 Å]
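A minimal Smith-Waterman sketch over per-residue feature sequences is shown below. The scoring function (a cutoff minus the Euclidean distance between (curvature, torsion) tuples), the cutoff of 0.5 and the linear gap penalty are assumptions chosen for the example, not values from the talk.

```python
import math


def feature_score(a, b, cutoff=0.5):
    """Hypothetical score: positive when two feature tuples are closer than `cutoff`."""
    return cutoff - math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smith_waterman(seq_a, seq_b, score=feature_score, gap=1.0):
    """Local alignment; returns (best_score, [(index_in_a, index_in_b), ...])."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    H = [[0.0] * cols for _ in range(rows)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1]),
                          H[i - 1][j] - gap,
                          H[i][j - 1] - gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


# Made-up (curvature, torsion) sequences sharing a three-residue stretch.
seq_a = [(0.1, 0.1), (0.8, 0.4), (0.8, 0.4), (0.8, 0.4), (0.2, -0.3)]
seq_b = [(0.5, -0.5), (0.8, 0.4), (0.8, 0.4), (0.8, 0.4)]
best_score, aligned_pairs = smith_waterman(seq_a, seq_b)
```

The aligned residue pairs are what a subsequent superposition step would use to report an aligned length and an RMSD like the ones shown in the figure.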

15 SW Alignment Result
[Figure: 1fse:A aligned with 1l3l:C]

16 Sample Query Results
Query: 1faz:A; database: 1938 protein chains
Screening time: 18 seconds
Pairwise alignment time: 29 seconds
[Figure labels: length: 42, RMSD: 2.8 Å, 1faz:A & 1ytf:D; length: 38, RMSD: 3.68 Å, 1faz:A & 1dj7:A]

17 Sample Query Results
Query: 1b16:A; database: 1938 protein chains
Screening time: 25 seconds
Pairwise alignment time: 68 seconds
[Figure labels: length: 35, RMSD: 3.26 Å, 1b16:A & 1h05:A; length: 35, RMSD: 1.58 Å, 1b16:A & 1qp8:A]

18 Current and Future Work
- Evaluation of
  - Accuracy: comparison with SCOP classification
  - Efficiency: comparison with other techniques such as CE or DALI
- Better index structures
  - Faster and more accurate screening of candidates
- Incorporating biological and chemical properties of amino acids into the structure signatures of proteins

19 Conclusions
- A new method for protein structure alignment is presented:
  - Extracted structural features are:
    - Compact: O(n)
    - Localized: computed for each amino acid
    - Robust: error handling by spline approximation
    - Invariant: suitable for indexing
    - Meaningful: biological and chemical properties can be incorporated easily
  - An indexing technique is deployed to avoid an exhaustive scan of the structure database
- Experimental results show that this method is suitable for finding structural motifs.

20 Thank you for your attention!
For more information: Tolga Can, Department of Computer Science, University of California at Santa Barbara, Santa Barbara, CA 93106, U.S.
URL: