Inbar, Y.1, Wolfson, H.J.1, Nussinov, R.2,3

Presentation transcript:

Protein Folding Prediction: Combinatorial Assembly of Protein Building Blocks Using CombDock
Inbar, Y. (1), Wolfson, H.J. (1), Nussinov, R. (2,3)
(1) School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Israel
(2) Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Israel
(3) Laboratory of Experimental and Computational Biology, NCI-FCRDC, Frederick, MD, USA

Motivation: Protein folding is considered to be a hierarchical event. Local fragments of the sequence fold to form stand-alone local structural elements (building blocks); the building blocks then fold further to form the overall structure. This suggests a way to reduce the complexity of protein folding prediction: predict the structures of subsequences, then obtain the overall structure by combining these substructures.

Folding prediction flow: The target sequence (MHCKCDITLQEII……) → Building Blocks Assignment → building blocks (structures of subsequences) → Combinatorial Assembly → complexes of the building blocks → Structure Completion and Refinement → complete structural models.

The CombDock strategy: assembly of protein building blocks by solving a 3D puzzle*. *The assembly of protein building blocks has an additional constraint: a distance constraint between the N terminus and C terminus of consecutive building blocks.

CombDock method: Combinatorial assembly through multiple pairwise docking. The algorithm has the following three modules. All Pairs Docking: In this module we check how any original “puzzle piece” can connect to any other original piece. Since our puzzle pieces are protein structural units, we simply run a docking application for each pair of building blocks. Given N building blocks, there are N(N-1)/2 pairs. The Combinatorial Assembly stage: This module generates complexes of the N given building blocks, using the transformations that were found in the previous stage. The Final Scoring stage: For the top C results (sorted by the temporary score) produced by the combinatorial assembly stage, we compute a final score. The scoring function weighs compactness, hydrophobicity, and the surface area between all the building blocks. In this stage we also cluster the results in order to avoid reporting similar results.

1. All Pairs Docking: This stage outputs K possible orientations (transformations) between each pair of building blocks. This is done by applying a geometric docking algorithm and keeping the K best-scoring solutions.
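
The all-pairs stage can be sketched as follows. This is only an illustration under stated assumptions: `dock` stands for whatever docking application is used, `toy_dock` is a made-up stand-in that returns arbitrary scores, and a higher score is assumed to be better.

```python
import heapq
from itertools import combinations

def all_pairs_docking(blocks, dock, k):
    """Return {(i, j): K best (score, transformation) candidates} for every pair i < j."""
    results = {}
    for i, j in combinations(range(len(blocks)), 2):
        candidates = dock(blocks[i], blocks[j])  # iterable of (score, transformation)
        results[(i, j)] = heapq.nlargest(k, candidates, key=lambda c: c[0])
    return results

def toy_dock(a, b):
    # Hypothetical stand-in for a docking program: the scores are arbitrary
    # and the "transformations" are just labels.
    return [(len(a) + len(b) - t, f"{a}->{b}#{t}") for t in range(5)]

table = all_pairs_docking(["A1", "A2", "A3", "A4"], toy_dock, k=3)
```

With N = 4 building blocks the table holds N(N-1)/2 = 6 pairs, each with its K = 3 retained transformations.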

2. Combinatorial Assembly: The input is the N building blocks and the K transformations between each pair. The output is a set of C complexes, each containing all N building blocks. We reduce the problem of finding complexes from the pairwise transformations to the problem of finding spanning trees* in a complete graph with parallel edges. Each node represents a building block and each edge represents a transformation. The input is a complete graph with parallel edges; the output is a set of spanning trees*. Each spanning tree represents a different complex of all of the given building blocks. * A tree is a graph with one connected component and no cycles. A spanning tree of a graph is a tree that contains all of the graph's nodes.
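
The footnote's definition can be checked mechanically. The sketch below assumes a hypothetical encoding in which each edge is a triple `(i, j, t)`: building blocks `i` and `j`, joined by the `t`-th of the K parallel transformations. It uses a union-find structure to test whether a chosen edge set is a spanning tree, and therefore describes one candidate complex.

```python
def is_spanning_tree(n, edges):
    """True if `edges` (triples (i, j, t)) connect all n nodes without a cycle."""
    if len(edges) != n - 1:
        return False  # a tree on n nodes has exactly n - 1 edges
    parent = list(range(n))

    def find(x):
        # Follow parent links to the representative of x's component.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, _ in edges:
        root_i, root_j = find(i), find(j)
        if root_i == root_j:
            return False  # cycle, e.g. two parallel edges between the same pair
        parent[root_i] = root_j
    return True  # n - 1 edges and no cycle imply a single connected component
```

Picking two parallel edges between the same pair of blocks is rejected, since it would place one block relative to another in two different ways.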

3. Final Scoring: A final score is computed for the best-scoring complexes that were generated by the combinatorial assembly. The score is based on geometric features (compactness and penetration rate) and on chemical attributes (non-polar buried surface area). The results are clustered at this stage with an efficient procedure that gives a fine clustering at a low cost in running time.

More about the Combinatorial Assembly: By the reduction to graph theory we know the number of potential complexes (which is the number of spanning trees): N^(N-2) · K^(N-1), where N is the number of building blocks and K is the number of transformations between a pair of building blocks. Not every tree represents a valid complex: there might be significant penetrations, or one of the distance constraints might be violated. Because some of the trees are invalid, a general algorithm for finding the best spanning tree cannot be applied. The problem is NP-Complete. We developed an algorithm which provides a heuristic solution to the problem.
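
The size of the search space follows from Cayley's formula: a complete graph on N nodes has N^(N-2) spanning trees, and each of the N-1 edges of a tree can be any of the K parallel edges, giving N^(N-2) · K^(N-1) trees. The sketch below checks this count by brute force on small inputs; the function names are illustrative only.

```python
from itertools import combinations

def count_formula(n, k):
    """Spanning trees of a complete graph on n >= 2 nodes with k parallel edges per pair."""
    return n ** (n - 2) * k ** (n - 1)

def count_brute_force(n, k):
    """Count acyclic (n - 1)-edge subsets by exhaustive enumeration."""
    edges = [(i, j, t) for i, j in combinations(range(n), 2) for t in range(k)]
    count = 0
    for subset in combinations(edges, n - 1):
        parent = list(range(n))

        def find(x):
            while parent[x] != x:
                x = parent[x]
            return x

        acyclic = True
        for i, j, _ in subset:
            root_i, root_j = find(i), find(j)
            if root_i == root_j:
                acyclic = False
                break
            parent[root_i] = root_j
        count += acyclic
    return count
```

Even for modest inputs the count is far beyond exhaustive enumeration: `count_formula(13, 10)` is 13^11 · 10^12, which is why a heuristic search is needed.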

Basic concepts of the algorithm: Restrict the topology of the generated trees by using sequential folding steps. Use an efficient search (and generation) method. Keep only the best subtrees at each level, in order to reduce complexity (greedy selection). The algorithm exploits both the hierarchical nature of the problem and the kinetics of protein folding.

Restricted topology: Local interactions in proteins play an important role in the kinetics of protein folding. We build only sequential trees: a sequential tree is either a tree with one vertex, or a join of two adjacent sequential subtrees. * In the figure, numbers represent the building blocks and their sequential order. Letters represent super building blocks (SBBs), which are complexes of one or more building blocks. Edges represent transformations between the building blocks they connect. Folding steps: we use only sequential folding steps. In order to use an edge, it must connect two super building blocks that are consecutive: edge (1,3) can be used only after the first step was taken.
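
The rule for when an edge may be used can be written down directly. The sketch assumes a hypothetical representation in which each super building block is an inclusive range `(first, last)` of building-block numbers, and the list of ranges is kept in sequence order.

```python
def edge_is_usable(edge, sbbs):
    """True if `edge` connects building blocks in two consecutive super building blocks.

    sbbs: ordered list of inclusive (first, last) ranges covering all blocks.
    """
    i, j = sorted(edge)
    owner = {b: idx for idx, (lo, hi) in enumerate(sbbs) for b in range(lo, hi + 1)}
    return owner[j] - owner[i] == 1

def join(sbbs, idx):
    """Merge super building block idx with its right-hand neighbour."""
    first, _ = sbbs[idx]
    _, last = sbbs[idx + 1]
    return sbbs[:idx] + [(first, last)] + sbbs[idx + 2:]

start = [(1, 1), (2, 2), (3, 3), (4, 4)]  # four single building blocks
after_first_step = join(start, 0)         # blocks 1 and 2 form one super building block
```

Here `edge_is_usable((1, 3), start)` is false, while `edge_is_usable((1, 3), after_first_step)` is true, matching the slide's example for edge (1,3).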

The search method: Different trees share common subtrees. We join two valid subtrees in order to get a new one, so we need to check only the constraints between the two subtrees. Since we build sequential subtrees, we can apply the backbone constraints at an early stage.

Greedy Selection of Subtrees: When we build a tree, we join two subtrees of smaller size. We limit the number of trees that we save for each size at each position to D, keeping only the best D. A tree's score is the sum of the scores of its transformations. The memory complexity of each stage is O(DN), its runtime complexity is The memory complexity of the whole algorithm and its runtime complexity is
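
A minimal sketch of the greedy selection, assuming building blocks are numbered 0..N-1, a higher score is better, and `is_valid` is a hypothetical hook standing in for the penetration and distance-constraint checks. It is an illustration of the idea, not the authors' implementation. For every contiguous range of blocks it keeps only the D best sequential subtrees and builds larger trees by joining two adjacent ranges with one edge.

```python
import heapq
from itertools import product

def assemble(n, edge_scores, d, is_valid=lambda left, right, edge: True):
    """edge_scores: {(i, j): [score of each parallel transformation]} for i < j.

    Returns up to d (score, edges) trees over blocks 0..n-1, best first.
    """
    best = {(i, i): [(0, ())] for i in range(n)}  # single blocks: empty trees
    for size in range(2, n + 1):
        for lo in range(n - size + 1):
            hi = lo + size - 1
            found = {}  # edge set -> score; removes duplicates from different splits
            for mid in range(lo, hi):
                pairs = product(best[(lo, mid)], best[(mid + 1, hi)])
                for (l_score, l_edges), (r_score, r_edges) in pairs:
                    for i, j in product(range(lo, mid + 1), range(mid + 1, hi + 1)):
                        for t, s in enumerate(edge_scores[(i, j)]):
                            edge = (i, j, t)
                            if not is_valid(l_edges, r_edges, edge):
                                continue
                            edges = tuple(sorted(l_edges + r_edges + (edge,)))
                            found[edges] = l_score + r_score + s
            ranked = ((score, edges) for edges, score in found.items())
            best[(lo, hi)] = heapq.nlargest(d, ranked, key=lambda c: c[0])
    return best[(0, n - 1)]

toy_scores = {(0, 1): [5], (0, 2): [1], (1, 2): [3]}  # made-up scores, K = 1
top = assemble(3, toy_scores, d=2)
```

Because only the joining edge is new, `is_valid` needs to examine just the constraints between the two subtrees being joined.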

Initial Folding Steps: Many studies of long-range and short-range interactions within proteins have been conducted. Sandeep et al. have shown the appearance of critical building blocks (BBs). These BBs, which are well buried, play an important role in the stability of proteins. A critical BB usually has many non-sequential interactions with the other BBs. In the restricted tree topology that we have presented, a non-sequential interaction, even a significant one, might be lost. An initial folding step is a join of three SBBs: a pair of adjacent SBBs and one SBB which is not adjacent to either of the first two. We call it an initial folding step because we make this type of step only as the first step of the complex construction. We do not assume any prior knowledge about critical BBs; rather, we enable each BB to have these non-local interactions. After an initial step is applied, we continue generating the complex using only the sequential folding steps.
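
The candidate initial folding steps can be enumerated from the definition above. In this sketch the SBBs are numbered 0..n-1 in sequence order, which is an assumed convention; each step is an adjacent pair plus a third SBB that is adjacent to neither member of the pair.

```python
def initial_folding_steps(n):
    """List ((i, i + 1), k): an adjacent pair and a third SBB adjacent to neither."""
    steps = []
    for i in range(n - 1):
        for k in range(n):
            # k must not be in the pair and must not neighbour either of its ends.
            if k < i - 1 or k > i + 2:
                steps.append(((i, i + 1), k))
    return steps
```

With four SBBs only two such steps exist, `((0, 1), 3)` and `((2, 3), 0)`, and with three SBBs there are none, so this step adds only a limited number of extra candidates.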

Experimental Results ...

Input Type 1: Building blocks from the same protein. Input is generated by cutting a protein into its building blocks. Each building block is then separated and translated. (Figure: building blocks A1, A2, A3 and A4, shown separately and in pairs.)

Example 1: Interleukin-4, 130 amino acids, 4 building blocks. We can see the steps that formed the suggested complex. In the first step, a transformation is applied between building blocks 1 and 2. In the second step, a transformation is applied between building blocks 1 and 3. In the third step, a transformation is applied between building blocks 1 and 4. The resulting complex has an RMSD of 0.8 Å from the native (crystal) structure.
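
For reference, the RMSD between a model and the native structure is the root of the mean squared distance between matched atoms. The sketch below assumes the two coordinate lists are already matched atom by atom and already superimposed; the slide does not say which atoms were used, so that choice is left to the caller.

```python
import math

def rmsd(model, native):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points."""
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum((a - b) ** 2 for p, q in zip(model, native) for a, b in zip(p, q))
    return math.sqrt(total / len(model))
```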

Example 2: Citrate synthase, 377 amino acids, 13 building blocks (each in a different color). (Figure panels: native arrangement; model arrangement.)

Input Type 2: Building blocks from homologous proteins. Input is generated by cutting related proteins, then taking one building block from each protein. (Figure: building blocks of proteins A, B, C and D, with one block selected from each: A1, B2, C5, D4.)

Example 3: Nucleoside diphosphate kinase. (Figure panels: complex generated by CombDock; native structure.)
Figure labels (PDB entry, residues, %, length):
1ehwA, 11-30, 52%, 143 aa
2nckR, 66-89, 46%, 144 aa
1nsp, 150 aa
1nsqA, 32-48, 62%, 152 aa
1be4A, 91-117, 63%, 151 aa
1nue, 48-66, 63%, 151 aa
1nsp, 122-154

Example 4: Legume lectin. (Figure panels: complex generated by CombDock; native structure.)
Figure labels (PDB entry, residues, %, length):
1sbf, 4-32, 46%, 234 aa
1bqpA, 118-156, 40%, 181 aa
1led (target), 242 aa
1lenA, 28-94, 40%, 181 aa
1ax0, 166-199, 38%, 239 aa
1lgAb, 93-115, 40%, 181 aa
1led, 206-243

Input Type 3: Building blocks from non-homologous proteins. Input is generated by cutting unrelated proteins. (Figure: building blocks of proteins A, B, C and D; selected blocks A1, B2, C5, D4.)

Example 5: Myoglobin. (Figure panels: complex generated by CombDock; native structure.)
Native structure: 2spl, Myoglobin, Globin-like, all α.
Figure labels (PDB entry, residues, protein, fold, class, length, %):
1def, 21-44, Peptide deformylase, Peptide deformylase, α+β, 147, 32%
1flp, 24-69, Hemoglobin I, Globin-like, all α, 142, 36%
119l, 83-103, Phage T4 lysozyme, Lysozyme-like, α+β, 164, 31%
1abv, 16-36, ATP synthesis, F1F0-ATP synthase, all α, 105, 26%
1utg, 19-43, Steroid binding, Uteroglobin-like, all α, 70, 19%
1vsd, 162-178, Retroviral integrase, Ribonuclease H-like motif, α+β, 146, 32%

Conclusions and future work: The goal of finding the right spatial arrangement of the building blocks was achieved for all of the inputs we generated. The accuracy ranged from ~1 Å to ~8 Å, depending on the accuracy of the building block structures and the size of the unassigned fragments between the building blocks. Assuming that a near-native arrangement of the building blocks is enough for molecular dynamics techniques to refine the model into a near-native overall model, the CombDock algorithm may play a major role in protein folding prediction applications. A better scoring function should be applied: in most of the cases we checked, the algorithm finds an arrangement very similar to the arrangement in the native structure, but that arrangement was not necessarily ranked among the top-scoring solutions.

…Conclusions and future work: Finding good solutions depends strongly on the quality of the transformations generated by the all pairs docking stage. A docking algorithm developed specifically for the assembly of building blocks might generate better transformations than the generic docking algorithm we use. Generating a final model from a complex produced by the CombDock algorithm is not trivial: penetrations must be dealt with before molecular dynamics techniques can be applied. The longer the gaps between consecutive building blocks, the less accurate the results of CombDock are and the longer its runtime. An assignment of building blocks with restricted gaps is preferred.

Potential applications: Assembly of domains to form multi-domain folds. Oligomer formation. Docking molecules to create multi-molecular assemblies. Flexible hinge-based docking of rigid subunits.

Reassembly of the 6 domains of Gelsolin. (Figure panels: complex generated by CombDock; native structure. Each domain is a different color, linkers in grey.)

Acknowledgements: The research of H.J. Wolfson and R. Nussinov (in Israel) has been supported in part by the "Center of Excellence in Geometric Computing and its Applications" funded by the Israel Science Foundation (administered by the Israel Academy of Sciences), by the Tel Aviv University Research Foundation grants, and by the Adams Brain Center. The research of H.J.W. is partially supported by the Hermann Minkowski-Minerva Center for Geometry at Tel Aviv University. This project has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under contract number NO1-CO-56000. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.