Shape Modeling and Matching in Protein Structure Identification Sasakthi Abeysinghe, Tao Ju Washington University, St. Louis, USA Matthew Baker, Wah Chiu.

Presentation transcript:

Shape Modeling and Matching in Protein Structure Identification Sasakthi Abeysinghe, Tao Ju Washington University, St. Louis, USA Matthew Baker, Wah Chiu Baylor College of Medicine, Houston, USA

Shape Matching Shape comparison – How similar are shape A and shape B? – Application: 3D model retrieval Shape alignment – What is the best alignment of A onto B? – Application: object recognition and registration [Figure labels: 3D protein image; 1D protein sequence]

Structural Biology Protein: a sequence of amino acids – Folds into a 3D structure in order to interact with other molecules – Protein function is derived from its 3D structure Identifying protein structure – Imaging methods: X-ray crystallography, NMR – Drawback: cannot resolve large assemblies such as viruses. …

Domain Problem Cryo-electron microscopy (Cryo-EM) – Produces 3D density volumes – Drawback: insufficient resolution to resolve atom locations How to determine protein structure in a cryo-EM volume?

Shape Matching Formulation Matching a 1D protein sequence with a 3D density volume Intermediate goal: Matching alpha-helices – One of the basic building blocks in a protein – Identified as cylindrical densities in the volume [Baker 07] How to align the protein sequence with the cryo-EM volume to match the two sets of helices?

Method Overview Compatible shape representation – 1D sequence and 3D volume as attributed relational graphs Graph-based shape matching – A new constrained graph matching problem and an optimal solution – Error-tolerant (inexact) matching

Shape Representation Protein sequence as attributed relational graph – An edge: a helix segment or a non-helix segment Attribute: number of amino acids in the segment – A node: the end of a helix or the end of the sequence – Add additional edges that skip at most m helix segments To allow matching with a cryo-EM volume that has missing helices
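The sequence graph described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the input format (a secondary-structure string with 'H' marking helix residues), the function name build_sequence_graph, the edge tuple layout, and the choice to give a skip edge the total residue count of the span it bypasses are all assumptions made for the example.

```python
from itertools import groupby

def build_sequence_graph(ss, m=1):
    """Build a sequence graph from a secondary-structure string.

    ss: string where 'H' marks a helix residue and anything else a non-helix residue.
    m:  maximum number of helix segments a skip edge may bypass.
    Returns (node_count, edges); each edge is (from_node, to_node, kind, residues).
    Node i is the boundary before segment i; the last node is the sequence end.
    """
    # Run-length encode the string into alternating (is_helix, length) segments.
    segs = [(is_h, len(list(run))) for is_h, run in groupby(ss, key=lambda c: c == "H")]
    n = len(segs)
    edges = [(i, i + 1, "helix" if is_h else "loop", length)
             for i, (is_h, length) in enumerate(segs)]
    # Skip edges (assumed form): a non-helix edge that starts right after a helix
    # (or at the sequence start), ends right before a helix (or at the sequence
    # end), and bypasses between 1 and m helix segments.
    for a in range(n):
        if a > 0 and not segs[a - 1][0]:
            continue
        for b in range(a + 2, n + 1):
            if b < n and not segs[b][0]:
                continue
            helices = sum(1 for is_h, _ in segs[a:b] if is_h)
            if helices > m:
                break
            if helices >= 1:
                edges.append((a, b, "loop", sum(length for _, length in segs[a:b])))
    return n + 1, edges

nodes, edges = build_sequence_graph("CCHHHHCCCHHHHHCC", m=1)
for edge in edges:
    print(edge)
```

Raising m adds longer skip edges, which tolerates more undetected helices in the volume at the price of a larger search space.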

Shape Representation Graph representation of Cryo-EM volume via skeletons – 3D Skeleton [Ju 06] builds connectivity among detected helices – An edge: a detected helix or a skeleton path between two helices Attribute: length of the helix or skeleton path – A node: the end of a helix or the end of the protein – Add additional edges between helix-ends less than d apart To account for missing helix connectivity in the skeleton
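A comparable sketch for the volume graph follows. It assumes the detected helices are given as pairs of 3D endpoints and that the skeleton paths have already been traced and measured; the function name build_volume_graph, the node numbering, and the use of straight-line distance as the attribute of an added edge are illustrative assumptions, not details taken from the slides.

```python
import math
from itertools import combinations

def build_volume_graph(helices, skeleton_paths, d):
    """Build a volume graph from detected helices and skeleton paths.

    helices:        list of (p, q) endpoint coordinates; helix i owns nodes 2*i and 2*i+1.
    skeleton_paths: list of (node_a, node_b, path_length) connecting helix ends.
    d:              distance threshold for adding extra edges between helix ends.
    Returns a list of edges (node_a, node_b, kind, length).
    """
    points = [pt for helix in helices for pt in helix]
    edges = [(2 * i, 2 * i + 1, "helix", math.dist(p, q))
             for i, (p, q) in enumerate(helices)]
    edges += [(a, b, "loop", length) for a, b, length in skeleton_paths]
    # Add an edge between any two helix ends closer than d that are not yet
    # connected, to compensate for connectivity missing from the skeleton.
    linked = {frozenset((a, b)) for a, b, _, _ in edges}
    for a, b in combinations(range(len(points)), 2):
        gap = math.dist(points[a], points[b])
        if gap < d and frozenset((a, b)) not in linked:
            edges.append((a, b, "loop", gap))
    return edges

helices = [((0, 0, 0), (0, 0, 10)), ((3, 0, 10), (3, 0, 0)), ((20, 0, 0), (20, 0, 8))]
for edge in build_volume_graph(helices, skeleton_paths=[(1, 2, 4.0)], d=5):
    print(edge)
```

Here the skeleton links only nodes 1 and 2, so the threshold rule adds one further edge between nodes 0 and 3, which lie 3 units apart.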

Shape Matching - Problem Finding two matching chains of helices – Same number of edges – Alternating types between non-helix and helix – Minimal attribute matching error Uniqueness of this problem: – Inexact: not all edges/nodes in the two graphs are used in the matched sequence – Constrained: the match must have a linear topology
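The constraints on a matched pair of chains can be made concrete with a small scoring function. The slides do not state the error formula, so the one below is an assumption chosen for illustration: helix edges are compared through the standard rise of about 1.5 Å per residue, and a non-helix edge is penalized only when the path in the volume is longer than the segment could stretch at about 3.8 Å per residue. The names edge_error and chain_error are hypothetical.

```python
HELIX_RISE = 1.5  # approximate rise per residue along an alpha-helix axis, in angstroms
LOOP_SPAN = 3.8   # approximate distance between consecutive C-alpha atoms, in angstroms

def edge_error(kind, residues, length):
    """Assumed error between a sequence edge (residue count) and a volume edge (length)."""
    if kind == "helix":
        return abs(residues * HELIX_RISE - length)
    # A non-helix segment can be shorter than its contour length but not longer.
    return max(0.0, length - residues * LOOP_SPAN)

def chain_error(seq_chain, vol_chain):
    """Total error of two chains, each an ordered list of (kind, attribute) edges."""
    kinds = [kind for kind, _ in seq_chain]
    if len(seq_chain) != len(vol_chain):
        raise ValueError("chains must have the same number of edges")
    if kinds != [kind for kind, _ in vol_chain]:
        raise ValueError("edge types must agree position by position")
    if any(a == b for a, b in zip(kinds, kinds[1:])):
        raise ValueError("edge types must alternate between helix and non-helix")
    return sum(edge_error(kind, residues, length)
               for (kind, residues), (_, length) in zip(seq_chain, vol_chain))

seq_chain = [("helix", 10), ("loop", 4), ("helix", 8)]
vol_chain = [("helix", 16.0), ("loop", 12.0), ("helix", 12.0)]
print(chain_error(seq_chain, vol_chain))
```

The three checks mirror the three bullet points on the slide: equal edge counts, alternating types, and an attribute error that the search then minimizes.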

Shape Matching - Review Previous work on graph matching – Exact matching Graph mono-morphism [Wong 90] Sub-graph isomorphism [Ullmann 76, Cordella 99] – Inexact matching A* search [Nilsson 80], simulated annealing [Herault 90], neural networks [Feng 94], probabilistic relaxation [Christmas 95], genetic algorithms [Wang 97], graph decomposition [Messmer 98] All designed for un-constrained problems where there is no restriction on the topology of the matched sub-graphs.

Shape Matching - Method Key idea: utilize the linearity of chains. Perform a depth-first tree search – Append matching nodes to the incomplete chain with minimal matching error A*-search – Reduce node expansion by estimating future matching error – Optimal if the estimated future error never exceeds the actual future error – 3 future error functions are designed [Figure: search tree of candidate node pairs between the sequence graph and the volume graph, each labeled with its matching error]
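The A* idea can be illustrated on a deliberately simplified version of the problem. In this sketch only helix edges are scored, every sequence helix must be matched (no skip edges), and connectivity in the volume is given as an adjacency map. The future-error estimate used here, where each remaining sequence helix takes its cheapest volume helix regardless of connectivity or reuse, is an assumption made for the example and is not one of the three functions mentioned on the slide. It never overestimates the remaining error, so the first complete chain popped from the queue is optimal for this simplified problem.

```python
import heapq

def astar_match(seq_lengths, vol_lengths, vol_adjacent, rise=1.5):
    """Match sequence helices, in order, to a connected chain of distinct volume helices.

    seq_lengths:  residue counts of the sequence helices, in sequence order.
    vol_lengths:  lengths of the detected volume helices.
    vol_adjacent: dict mapping a volume helix to the set of helices it connects to.
    Returns (total_error, chain) or None if no chain exists.
    """
    n = len(seq_lengths)

    def cost(i, j):
        return abs(seq_lengths[i] * rise - vol_lengths[j])

    # Lower bound on the error of sequence helices i..n-1.
    best = [min(cost(i, j) for j in range(len(vol_lengths))) for i in range(n)]
    future = [sum(best[i:]) for i in range(n + 1)]

    # Queue entries: (estimated total error, error so far, partial chain).
    heap = [(future[0], 0.0, ())]
    while heap:
        _, so_far, chain = heapq.heappop(heap)
        i = len(chain)
        if i == n:
            return so_far, chain
        for j in range(len(vol_lengths)):
            if j in chain:
                continue  # each volume helix is used at most once
            if chain and j not in vol_adjacent[chain[-1]]:
                continue  # the chain must follow edges of the volume graph
            extended = so_far + cost(i, j)
            heapq.heappush(heap, (extended + future[i + 1], extended, chain + (j,)))
    return None

adjacent = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
print(astar_match([10, 6, 12], [9.5, 17.0, 15.0, 14.0], adjacent))
```

Because the partial chain is stored in each queue entry, the search explores a tree of chains rather than a graph of states, which matches the tree-search framing on the slide.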

Experimental Setup Test data – Simulated data: 8 proteins (taken from Protein Data Bank) – Authentic data: 3 proteins (produced at Baylor) Test modes – Automatic – With a few user-specified helix correspondences Validation with the actual helix correspondence – Produce a list of candidates sorted by their matching errors – Find out where the actual correspondence ranks in the list
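The validation step amounts to sorting candidates by error and locating the known correspondence. A minimal sketch follows, assuming each candidate is an (error, correspondence) pair; the function name rank_of_actual and the tuple encoding of a correspondence are hypothetical.

```python
def rank_of_actual(candidates, actual):
    """Return the 1-based rank of the actual correspondence, or None if absent.

    candidates: list of (matching_error, correspondence) pairs in any order.
    actual:     the known correct correspondence.
    """
    ordered = sorted(candidates, key=lambda candidate: candidate[0])
    for rank, (_, correspondence) in enumerate(ordered, start=1):
        if correspondence == actual:
            return rank
    return None

candidates = [(4.5, (2, 0, 3)), (1.5, (2, 0, 1)), (2.5, (3, 0, 1))]
print(rank_of_actual(candidates, (3, 0, 1)))
```

A result of None corresponds to the case reported later in the deck where the actual correspondence is not in the candidate list at all.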

Results - 1 Bluetongue Virus (simulated, 10 helices, 0 missing) – Actual correspondence ranks #1 [Figure panels: sequence; cryo-EM volume and its skeleton; top matching]

Results - 2 Human Insulin Receptor (simulated, 9 helices, 1 missing) – Actual correspondence ranks #1 [Figure panels: sequence; cryo-EM volume and its skeleton; top matching]

Results - 3 Bacteriophage P22 (authentic, 11 helices, 6 missing) – Actual correspondence ranks #4 [Figure panels: sequence; cryo-EM skeleton; top matching; actual correspondence]

Results - 4 Triose Phosphate Isomerase (simulated, 12 helices, 3 missing) – Before user-specification: actual correspondence not in the candidate list – Given 2 specified helix pairs: actual correspondence ranks #9 [Figure panels: sequence; cryo-EM skeleton with 2 user-specified helix pairs; top matching without user-specification; actual correspondence]

Result - Summary Among the 11 proteins, the correct correspondence ranks as follows in the candidate list computed by our method: – Top 1: 4 proteins – Within top 10: 2 proteins (1 simulated) – Top 1 after user-interaction: 2 proteins (both simulated; 4 specified helix pairs in a 14/20-helix protein) – Within top 10 after user-interaction: 3 proteins (2 specified helix pairs in a 6/9/12-helix protein) Performance – Under 4 seconds for proteins with 20 helices – Compare: [Wu 05] uses exhaustive search and takes 16 hours to find correspondences in proteins with 8 helices

Conclusion Formulate protein structure identification as shape matching – 1D protein sequence vs. 3D cryo-EM density volume – Compatible representation of disparate biological data as graphs Formulate a constrained inexact matching problem and propose an optimal solution – Based on A*-search Validation on simulated and authentic data

Future Work (Bio) Incorporating beta-sheets for improved accuracy – Challenge: the match is no longer a linear chain Integrating homology and ab initio modeling – Utilizing known 3D structure of segments – Refining the alignment by molecular energy minimization

Future Work (CS) Faster graph matching algorithm – Explore variants of A*-search to reduce running time for larger proteins (>20 helices) Better skeleton generation – Generate skeletons directly from gray-scale density volume for iso-value-independent representation – Utilize cell-complex-based skeleton for better skeleton geometry Currently used for topology editing, see [Ju, Zhou and Hu. Siggraph 2007]

Pacific Graphics Hawaii 2007 Oct 29 – Nov 2, in Maui, Hawaii Conference Chair: Ron Goldman Program co-chairs: Marc Alexa, Steven Gortler, Tao Ju
