Published by Kerry Miles. Modified over 9 years ago.
1
Shape Modeling and Matching in Protein Structure Identification. Sasakthi Abeysinghe, Tao Ju (Washington University, St. Louis, USA); Matthew Baker, Wah Chiu (Baylor College of Medicine, Houston, USA)
2
Shape Matching Shape comparison – How similar are shape A and shape B? – Application: 3D model retrieval Shape alignment – What is the best alignment of A onto B? – Application: object recognition and registration
3
Shape Matching [Same text as the previous slide, with figure labels: 3D Protein Image, 1D Protein Sequence]
4
Structural Biology Protein: a sequence of amino acids – Folds into a 3D structure in order to interact with other molecules – Protein function is derived from its 3D structure Identifying protein structure – Imaging methods: X-ray, NMR – Drawback: cannot resolve large assemblies, such as viruses. …
5
Domain Problem Cryo-electron microscopy (cryo-EM) – Produces 3D density volumes – Drawback: insufficient resolution to resolve atom locations How to determine protein structure in a cryo-EM volume?
6
Shape Matching Formulation Matching a 1D protein sequence with a 3D density volume Intermediate goal: matching alpha-helices – One of the basic building blocks in a protein – Identified as cylindrical densities in the volume [Baker 07] How to align the protein sequence with the cryo-EM volume to match the two sets of helices?
7
Method Overview Compatible shape representation – 1D sequence and 3D volume as attributed relational graphs Graph-based shape matching – A new constrained graph matching problem and an optimal solution – Error-tolerant (inexact) matching
8
Shape Representation Protein sequence as attributed relational graph – An edge: a helix segment or a non-helix segment (attribute: number of amino acids in the segment) – A node: end of a helix or end of the sequence – Add extra edges that skip at most m helix segments, to allow matching with a cryo-EM volume that has missing helices
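The sequence graph described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `sequence_graph`, the tuple layout, and the convention that the input alternates loop, helix, ..., loop are all assumptions made for the example.

```python
from typing import List, Tuple

Segment = Tuple[str, int]         # ("helix" or "loop", number of amino acids)
Edge = Tuple[int, int, str, int]  # (from_node, to_node, kind, amino-acid count)

def sequence_graph(segments: List[Segment], m: int) -> List[Edge]:
    """Build edges of a sequence graph (hypothetical layout).

    Node i is the boundary just before segments[i]; node len(segments)
    is the end of the sequence. Assumes loop, helix, ..., loop order.
    """
    kinds = [kind for kind, _ in segments]
    expected = ["loop" if i % 2 == 0 else "helix" for i in range(len(kinds))]
    if kinds != expected or len(kinds) % 2 == 0:
        raise ValueError("segments must alternate loop, helix, ..., loop")

    # One edge per original segment, carrying its amino-acid count.
    edges: List[Edge] = [
        (i, i + 1, kind, length) for i, (kind, length) in enumerate(segments)
    ]

    # Skip edges: start at a loop and absorb up to m following helices
    # (plus the loops after them) into one longer non-helix edge.
    n = len(segments)
    for i in range(0, n, 2):
        total = segments[i][1]
        j = i + 1
        skipped = 0
        while j + 1 < n and skipped < m:
            total += segments[j][1] + segments[j + 1][1]
            skipped += 1
            j += 2
            edges.append((i, j, "loop", total))
    return edges

segments = [("loop", 3), ("helix", 10), ("loop", 4), ("helix", 8), ("loop", 2)]
for edge in sequence_graph(segments, m=1):
    print(edge)
```

With m = 0 the graph is a plain chain; each increment of m adds longer non-helix edges, so a helix that the volume fails to show can be bypassed during matching.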
9
Shape Representation Graph representation of cryo-EM volume via skeletons – 3D skeleton [Ju 06] builds connectivity among detected helices – An edge: a detected helix or a skeleton path between two helices (attribute: length of the helix or skeleton path) – A node: end of a helix or end of the protein – Add extra edges between helix ends less than d apart, to account for missing helix connectivity in the skeleton
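A comparable sketch for the volume graph follows. It assumes the helix detector returns two 3D endpoints per helix and that skeleton paths are already available as lengths between helix-end nodes; the function name `volume_graph`, the node numbering, and the use of straight-line distance for the extra edges are illustrative assumptions, not details given in the slides.

```python
import math
from itertools import combinations

def volume_graph(helix_ends, skeleton_paths, d):
    """Build edges of a volume graph (hypothetical layout).

    helix_ends: list of (p, q) 3D endpoints, one pair per detected helix.
                Nodes 2k and 2k+1 are the two ends of helix k.
    skeleton_paths: dict mapping (node_a, node_b) to skeleton path length.
    d: distance threshold for the extra helix-end edges.
    """
    edges = []
    for k, (p, q) in enumerate(helix_ends):
        edges.append((2 * k, 2 * k + 1, "helix", math.dist(p, q)))

    linked = set()
    for (a, b), length in skeleton_paths.items():
        edges.append((a, b, "loop", length))
        linked.add(frozenset((a, b)))

    # Extra edges between ends of different helices closer than d,
    # unless the skeleton already connects them.
    points = [pt for ends in helix_ends for pt in ends]
    for a, b in combinations(range(len(points)), 2):
        if a // 2 == b // 2 or frozenset((a, b)) in linked:
            continue
        gap = math.dist(points[a], points[b])
        if gap < d:
            edges.append((a, b, "loop", gap))
    return edges

helices = [((0, 0, 0), (0, 0, 10)), ((3, 0, 10), (3, 0, 0)), ((20, 0, 0), (20, 0, 10))]
for edge in volume_graph(helices, {(1, 2): 3.5}, d=5.0):
    print(edge)
```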
10
Shape Matching - Problem Finding two matching chains of helices, one in each graph – Same number of edges – Alternating types between non-helix and helix – Minimal attribute matching error What makes this problem unique: – Inexact: not all edges/nodes in the two graphs are used in the matched chains – Constrained: the match must have a linear topology
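The slides do not state how the attribute matching error is computed, so the following is only one plausible sketch. It assumes helix edges are compared after converting amino-acid counts to length at 1.5 Å per residue (the rise of an ideal alpha-helix), and that a non-helix edge is penalized only when the path in the volume is longer than the loop could stretch, taken here as 3.8 Å per residue. The names `edge_error` and `chain_error` and both constants are assumptions for illustration.

```python
HELIX_RISE = 1.5  # Å per amino acid in an ideal alpha-helix
LOOP_REACH = 3.8  # Å per amino acid; assumed upper bound for a stretched loop

def edge_error(kind, residues, length):
    """Error between a sequence edge (residue count) and a volume edge (Å)."""
    if kind == "helix":
        return abs(residues * HELIX_RISE - length)
    # Loop: penalize only the part of the path the loop cannot reach.
    return max(0.0, length - residues * LOOP_REACH)

def chain_error(seq_chain, vol_chain):
    """Total error of two chains given as lists of (kind, attribute)."""
    if len(seq_chain) != len(vol_chain):
        raise ValueError("chains must have the same number of edges")
    total = 0.0
    previous = None
    for (seq_kind, residues), (vol_kind, length) in zip(seq_chain, vol_chain):
        if seq_kind != vol_kind:
            raise ValueError("edge types must agree position by position")
        if seq_kind == previous:
            raise ValueError("edge types must alternate along the chain")
        previous = seq_kind
        total += edge_error(seq_kind, residues, length)
    return total

seq_chain = [("helix", 10), ("loop", 4), ("helix", 8)]
vol_chain = [("helix", 16.0), ("loop", 20.0), ("helix", 12.0)]
print(chain_error(seq_chain, vol_chain))
```

The two checks in `chain_error` mirror the constraints listed on the slide: equal edge counts and alternating helix and non-helix edges.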
11
Shape Matching - Review Previous work on graph matching – Exact matching: graph monomorphism [Wong 90], subgraph isomorphism [Ullmann 76, Cordella 99] – Inexact matching: A* search [Nilsson 80], simulated annealing [Herault 90], neural networks [Feng 94], probabilistic relaxation [Christmas 95], genetic algorithms [Wang 97], graph decomposition [Messmer 98] All are designed for unconstrained problems, where there is no restriction on the topology of the matched subgraphs.
12
Shape Matching - Method Key idea: utilize the linearity of chains. Depth-first tree search – Append matching nodes to the incomplete chain with minimal matching error A* search – Reduces node expansion by estimating the future matching error – Optimal if the estimated future error never exceeds the actual future error – 3 future error functions are designed [Figure: search tree whose nodes are labeled with index pairs such as {1,1} and {2,2} and numeric values; panels: Sequence Graph, Volume Graph]
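To make the A* idea concrete, here is a deliberately simplified sketch. It is not the authors' algorithm and does not use their three future error functions: it matches sequence helices in order against volume helices by length only, lets a sequence helix be skipped at a fixed penalty, and requires consecutive matched volume helices to be linked. The function `astar_match`, the `links` set, the `skip_penalty`, and the estimate used are all assumptions. The estimate sums, for each remaining sequence helix, its cheapest conceivable cost, so it never exceeds the true remaining cost.

```python
import heapq
from itertools import count

def astar_match(seq, vol, links, skip_penalty):
    """Best-first tree search over partial chains (simplified sketch).

    seq: expected helix lengths in sequence order.
    vol: detected helix lengths in the volume.
    links: set of (a, b) pairs of volume helices that may be adjacent.
    Returns (total_error, assignment); assignment[i] is a volume helix
    index, or None if sequence helix i was skipped.
    """
    n = len(seq)
    # Lower bound on the cost of each sequence helix, ignoring constraints.
    best = [min([skip_penalty] + [abs(s - v) for v in vol]) for s in seq]
    h = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        h[i] = h[i + 1] + best[i]

    tie = count()  # tie-breaker so the heap never compares assignments
    heap = [(h[0], next(tie), 0.0, 0, ())]
    while heap:
        _, _, g, i, assign = heapq.heappop(heap)
        if i == n:
            return g, list(assign)  # first complete chain popped is optimal
        # Option 1: leave sequence helix i unmatched.
        g_skip = g + skip_penalty
        heapq.heappush(heap, (g_skip + h[i + 1], next(tie), g_skip, i + 1, assign + (None,)))
        # Option 2: append an unused volume helix linked to the previous one.
        matched = [v for v in assign if v is not None]
        last = matched[-1] if matched else None
        for v, length in enumerate(vol):
            if v in matched:
                continue
            if last is not None and (last, v) not in links and (v, last) not in links:
                continue
            g_next = g + abs(seq[i] - length)
            heapq.heappush(heap, (g_next + h[i + 1], next(tie), g_next, i + 1, assign + (v,)))
    return None

print(astar_match([15.0, 12.0, 9.0], [9.5, 15.5, 12.0], {(1, 2), (2, 0)}, 5.0))
```

Because the estimate is a lower bound, partial chains whose cost plus estimate already exceeds the best complete chain are never expanded, which is where the saving over exhaustive search comes from.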
13
Experimental Setup Test data – Simulated data: 8 proteins (taken from Protein Data Bank) – Authentic data: 3 proteins (produced at Baylor) Test modes – Automatic – With a few user-specified helix correspondences Validation with the actual helix correspondence – Produce a list of candidates sorted by their matching errors – Find out where the actual correspondence ranks in the list
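The validation step amounts to sorting candidates by matching error and looking up the position of the known correspondence. A minimal sketch, assuming each candidate is an (error, correspondence) pair and using the hypothetical helper name `rank_of_actual`:

```python
def rank_of_actual(candidates, actual):
    """Return the 1-based rank of `actual` among candidates sorted by error.

    candidates: list of (matching_error, correspondence) pairs.
    Returns None if the actual correspondence is not in the list.
    """
    ordered = sorted(candidates, key=lambda c: c[0])
    for rank, (_, correspondence) in enumerate(ordered, start=1):
        if correspondence == actual:
            return rank
    return None

candidates = [(7.2, [0, 2, 1]), (3.1, [1, 0, 2]), (4.5, [0, 1, 2])]
print(rank_of_actual(candidates, [0, 1, 2]))  # 2
```

A return value of None corresponds to the case reported later in the slides where the actual correspondence does not appear in the candidate list at all.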
14
Results - 1 Bluetongue Virus (simulated, 10 helices, 0 missing) – Actual correspondence ranks #1 [Figure panels: Sequence; Cryo-EM volume and its skeleton; Top Matching]
15
Results - 2 Human Insulin Receptor (simulated, 9 helices, 1 missing) – Actual correspondence ranks #1 [Figure panels: Sequence; Cryo-EM volume and its skeleton; Top Matching]
16
Results - 3 Bacteriophage P22 (authentic, 11 helices, 6 missing) – Actual correspondence ranks #4 [Figure panels: Sequence; Cryo-EM skeleton; Top Matching; Actual Correspondence]
17
Results - 4 Triose Phosphate Isomerase (simulated, 12 helices, 3 missing) – Before user specification: actual correspondence not in the candidate list – Given 2 specified helix pairs: actual correspondence ranks #9 [Figure panels: Sequence; Cryo-EM skeleton with 2 user-specified helix pairs; Top Matching without user specification; Actual Correspondence]
18
Result - Summary Rank of the actual correspondence in the candidate list computed by our method, over the 11 proteins: – Top 1: 4 proteins – Within top 10: 2 proteins (1 simulated) – Top 1 after user interaction: 2 proteins (both simulated), with 4 specified helix pairs in proteins of 14 and 20 helices – Within top 10 after user interaction: 3 proteins, with 2 specified helix pairs in proteins of 6, 9, and 12 helices Performance – Under 4 seconds for proteins with 20 helices – Compare: [Wu 05] uses exhaustive search and takes 16 hours to find correspondences in proteins with 8 helices
19
Conclusion Formulate protein structure identification as shape matching – 1D protein sequence vs. 3D cryo-EM density volume – Compatible representation of disparate biological data as graphs Formulate a constrained inexact matching problem and propose an optimal solution – Based on A*-search Validation on simulated and authentic data
20
Future Work (Bio) Incorporating beta-sheets for improved accuracy – Challenge: the match is no longer a linear chain Integrating homology and ab initio modeling – Utilizing known 3D structure of segments – Refining the alignment by molecular energy minimization
21
Future Work (CS) Faster graph matching algorithm – Explore variants of A* search to reduce running time for larger proteins (>20 helices) Better skeleton generation – Generate skeletons directly from the gray-scale density volume for an iso-value-independent representation – Utilize cell-complex-based skeletons for better skeleton geometry (currently used for topology editing, see [Ju, Zhou and Hu. Siggraph 2007])
22
Pacific Graphics Hawaii 2007 Oct 29 – Nov 2, in Maui, Hawaii Conference Chair: Ron Goldman Program co-chairs: Marc Alexa, Steven Gortler, Tao Ju