Protein Structure Prediction Samantha Chui Oct. 26, 2004.


Central Dogma of Biology
Question: Given a protein sequence, to what conformation will it fold?
DNA sequence → (transcription & translation) → protein sequence → (folding) → protein structure

How does nature do it?
Interactions that drive folding: hydrophobicity vs. hydrophilicity, van der Waals interactions, electrostatic interactions, hydrogen bonds, disulfide bonds

Current Approaches
Experimental methods: X-ray crystallography, NMR spectroscopy
Computational methods:
  Homology modeling: similar sequences fold into similar structures
  Threading: dissimilar sequences may fold into similar structures
  Ab initio: no similarity assumptions; conformational search

Assembly of sub-structural units
[Figure: known structures, fragment library, protein sequence, predicted structure]

“Small Libraries of Protein Fragments Model Native Protein Structures Accurately”
Rachel Kolodny, Patrice Koehl, Leonidas Guibas, and Michael Levitt, 2002
Goal: Find a finite set of protein fragments that can be used to construct accurate discrete conformations for any protein.
1. Generate fragments from known proteins.
2. Cluster the fragments to identify common structural motifs.
3. Test library accuracy on proteins not in the initial set.

Datasets of protein fragments
200 unique protein domains from the Protein Data Bank (PDB), 36,397 residues in total
Four sets of backbone fragments: 4-, 5-, 6-, and 7-residue fragments
Each protein domain is divided into consecutive fragments of length f, beginning at a random initial position
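
A minimal Python sketch of the fragment-generation step. It assumes the random starting position is drawn from [0, f) and that leftover residues at either end, too few to fill a whole fragment, are simply dropped; the function name `consecutive_fragments` is hypothetical, not taken from the paper.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def consecutive_fragments(residues: Sequence[T], f: int, rng: random.Random) -> List[List[T]]:
    """Split a chain into consecutive, non-overlapping fragments of length f.

    Assumption: the first fragment starts at a random offset in [0, f), and
    residues at either end that cannot fill a whole fragment are dropped.
    """
    start = rng.randrange(f)
    return [list(residues[i:i + f]) for i in range(start, len(residues) - f + 1, f)]


# Usage: split a 20-residue toy chain (residue indices stand in for coordinates).
rng = random.Random(0)
fragments = consecutive_fragments(list(range(20)), 5, rng)
```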

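Fragment structural similarity
Coordinate root-mean-square (cRMS) deviation of the Cα atoms: $\mathrm{cRMS}(A,B) = \sqrt{\frac{1}{N}\sum_{i=1}^{N} d_i^2}$
There is a one-to-one mapping between the N atoms of structure A and structure B, and $d_i$ is the distance between the i-th pair of mapped atoms after A is translated and rotated to find the best alignment.
cRMS is 0 if the two structures superimpose perfectly.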

Pruning and clustering
Outliers have a large cRMS deviation from all other fragments; they are discarded according to a fragment-length-specific threshold.
k-means simulated annealing clustering: repeatedly run k-means clustering, merge nearby clusters, and split dispersed clusters.
Scoring function: total variance = $\sum_x (x - \mu_x)^2$, summed over all fragments x, where $\mu_x$ is the center of the cluster containing x
This procedure is less sensitive to the initial choice of cluster centers than plain k-means.
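
The k-means simulated annealing procedure, with its merging and splitting of clusters, is more involved than a short example allows. As a simpler stand-in, this sketch runs plain Lloyd k-means from several random seeds and keeps the clustering with the lowest total variance, the score named on the slide. It assumes fragments have already been encoded as fixed-length numeric vectors (the paper itself compares fragments by cRMS), and all function names are hypothetical.

```python
import numpy as np


def total_variance(points, labels, centers) -> float:
    """Sum of squared distances from each point to its cluster center."""
    return float(((points - centers[labels]) ** 2).sum())


def kmeans(points, k, rng, iters=100):
    """Plain Lloyd k-means started from k randomly chosen points."""
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        sq_dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        # Leave a center where it is if its cluster became empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment consistent with the returned centers.
    labels = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return labels, centers


def best_of_seeds(points, k, n_seeds=50, seed=0):
    """Run k-means from n_seeds random starts and keep the lowest total variance."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_seeds):
        labels, centers = kmeans(points, k, rng)
        score = total_variance(points, labels, centers)
        if best is None or score < best[0]:
            best = (score, labels, centers)
    return best
```

The default of 50 starts mirrors the number of random seeds mentioned for library compilation; restarting and keeping the best score is the simplest way to reduce dependence on initial centers, whereas the split/merge annealing in the paper attacks the same problem more directly.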

Compiling the libraries
Select the cluster centroids as library entries: in each cluster, the fragment with the minimum sum of cRMS deviations from all the other fragments in the cluster.
Together, these form a representative set of protein fragments.
Library contents depend heavily on the clustering procedure, so for each set of fragments, start with 50 random seeds and choose the library with the minimal total variance score.
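
Picking the fragment with the minimum sum of cRMS deviations to the rest of its cluster is a medoid selection. A short sketch, where the distance function is passed in (in the slides it would be cRMS); `cluster_representative` is a hypothetical name, and the toy check uses absolute difference on plain numbers only to keep the example small.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def cluster_representative(members: Sequence[T], dist: Callable[[T, T], float]) -> T:
    """Return the member with the minimum sum of distances to all other members."""
    # dist(m, m) is assumed to be 0, so including m itself does not change the sum.
    return min(members, key=lambda m: sum(dist(m, other) for other in members))
```

Unlike a mean, the medoid is always an actual member of the cluster, which is what a library of real backbone fragments requires.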

Evaluating quality of a library
Local fit: how well the library fits the local conformations of all proteins in the test set
Global fit: how well the library fits the global three-dimensional conformations of all proteins in the test set

Local-fit method
Break the protein structures into the set of all overlapping fragments of length f.
For each protein fragment, find the most similar fragment in the library (by cRMS).
Score = average cRMS value over all fragments in all proteins in the test set
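
The local-fit score can be sketched as follows. The distance is a parameter (cRMS in the slides), each protein is any sliceable sequence of per-residue coordinates, and `local_fit` is a hypothetical name; the toy check swaps in a sum-of-absolute-differences distance on 1D values so the expected score is easy to verify by hand.

```python
def local_fit(proteins, library, f, dist) -> float:
    """Average, over every overlapping length-f fragment of every test protein,
    of the distance to the closest library fragment."""
    best = []
    for protein in proteins:
        for i in range(len(protein) - f + 1):
            fragment = protein[i:i + f]
            best.append(min(dist(fragment, entry) for entry in library))
    return sum(best) / len(best)
```

For the toy protein `[0, 0, 1, 1]` with library `[[0, 0], [1, 1]]` and f = 2, the three overlapping fragments score 0, 1, and 0, so the local fit is 1/3.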

Local-fit results

Global-fit method
Concatenate the best local-fit library fragments just found.
Determine each fragment's orientation by superimposing its first three Cα atoms onto the last three Cα atoms of the preceding fragment.
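
A sketch of the orientation step, assuming NumPy and the Kabsch algorithm for the three-atom superposition (the slides do not say how the superposition is computed). `kabsch` and `append_fragment` are hypothetical helpers. The fragment's first three Cα atoms are superimposed onto the chain's last three, and, as an assumption consistent with that overlap, only the fragment's remaining atoms are appended, so each fragment of length f adds f - 3 residues.

```python
import numpy as np


def kabsch(p, q):
    """Rotation R and translation t minimizing sum ||R p_i + t - q_i||^2."""
    p_mean, q_mean = p.mean(axis=0), q.mean(axis=0)
    u, _, vt = np.linalg.svd((p - p_mean).T @ (q - q_mean))
    d = np.sign(np.linalg.det(vt.T @ u.T))  # exclude reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return rot, q_mean - rot @ p_mean


def append_fragment(chain, fragment, overlap=3):
    """Superimpose the fragment's first `overlap` C-alpha atoms onto the chain's
    last `overlap` atoms, then append the fragment's remaining atoms."""
    chain = np.asarray(chain, dtype=float)
    fragment = np.asarray(fragment, dtype=float)
    rot, t = kabsch(fragment[:overlap], chain[-overlap:])
    placed = fragment @ rot.T + t
    return np.vstack([chain, placed[overlap:]])
```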

Global-fit method
The number of possible sequences of fragments is exponential in the protein's length, so a greedy algorithm is used; it finds a good rather than the best global-fit approximation.
Start at the N terminus and approximate increasingly larger segments of the protein.
At each step, concatenate the library fragment that yields the structure with minimal cRMS deviation from the corresponding segment.
The algorithm is deterministic and runs in linear time.
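
The greedy procedure can be sketched under several assumptions: all library fragments share one length f, the first fragment is the library entry with the lowest cRMS to the first f native residues, every later step tries each library entry and keeps the extension whose chain has the lowest cRMS to the matching native prefix, and the chain is trimmed to the native length at the end. Kabsch superposition is assumed for cRMS, and all helper names are hypothetical.

```python
import numpy as np


def kabsch(p, q):
    """Rotation R and translation t minimizing sum ||R p_i + t - q_i||^2."""
    p_mean, q_mean = p.mean(axis=0), q.mean(axis=0)
    u, _, vt = np.linalg.svd((p - p_mean).T @ (q - q_mean))
    d = np.sign(np.linalg.det(vt.T @ u.T))  # exclude reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return rot, q_mean - rot @ p_mean


def crms(a, b):
    """cRMS after optimal superposition of a onto b."""
    rot, t = kabsch(a, b)
    return float(np.sqrt(((a @ rot.T + t - b) ** 2).sum(axis=1).mean()))


def append_fragment(chain, fragment, overlap=3):
    """Orient the fragment on the chain's last `overlap` atoms and extend the chain."""
    rot, t = kabsch(fragment[:overlap], chain[-overlap:])
    return np.vstack([chain, (fragment @ rot.T + t)[overlap:]])


def prefix_crms(candidate, native):
    """cRMS between a candidate chain and the native prefix of the same length."""
    m = min(len(candidate), len(native))
    return crms(candidate[:m], native[:m])


def greedy_global_fit(native, library, overlap=3):
    """Greedy N-to-C approximation of `native` by concatenated library fragments.

    Assumes every fragment has the same length f > overlap and len(native) >= f.
    """
    native = np.asarray(native, dtype=float)
    library = [np.asarray(frag, dtype=float) for frag in library]
    f = len(library[0])
    chain = min(library, key=lambda frag: crms(frag, native[:f]))
    while len(chain) < len(native):
        chain = min(
            (append_fragment(chain, frag, overlap) for frag in library),
            key=lambda cand: prefix_crms(cand, native),
        )
    return chain[:len(native)]
```

This simple version recomputes the cRMS over the whole prefix at every step, so its cost grows quadratically with the protein length; it illustrates the greedy choice, not the linear-time bound stated on the slide.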

Global-fit results
Library size    Fragment length    States/residue    Global-fit cRMS
100 fragments   5 residues         10                0.91 Å
20 fragments    5 residues         4.47              1.85 Å
50 fragments    7 residues         2.66              2.78 Å
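
The states/residue column appears to follow from how fragments are concatenated: each new fragment of length f overlaps the previous one by three Cα atoms, so it adds f - 3 residues while choosing among L library entries. Assuming that reading, the per-residue branching factor is s = L^{1/(f-3)}, which reproduces all three values in the table:

```latex
s = L^{1/(f-3)}
100^{1/(5-3)} = 10, \qquad 20^{1/(5-3)} \approx 4.47, \qquad 50^{1/(7-3)} \approx 2.66
```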


“Protein structure prediction via combinatorial assembly of sub-structural units”
Yuval Inbar, Hadar Benyamini, Ruth Nussinov, and Haim J. Wolfson, 2003

CombDock
Input: structural units (SUs) with known 3D conformations
SUs are treated as rigid bodies that are rotated and translated with respect to each other.
Goal: predict the overall structure
Constraints:
  Penetration: avoid steric clashes
  Backbone: the distance between consecutive SUs is restricted to a maximum
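
The two CombDock constraints can be sketched as simple geometric checks. Every number here is an assumption: a clash is any inter-SU atom pair closer than a hypothetical 3.0 Å cutoff, and each backbone gap limit is supplied by the caller (for example, 3.8 Å times the number of Cα to Cα steps separating the two chain ends, since consecutive Cα atoms are about 3.8 Å apart). SU coordinates are assumed to be Cα atoms in sequence order, and the function names are hypothetical.

```python
import numpy as np


def penetrates(su_a, su_b, clash_dist=3.0):
    """True if any atom of su_a is closer than clash_dist (Å) to any atom of su_b."""
    a = np.asarray(su_a, dtype=float)
    b = np.asarray(su_b, dtype=float)
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return bool((dists < clash_dist).any())


def backbone_ok(sus, max_gaps):
    """True if, for each consecutive pair of SUs, the last C-alpha of one lies
    within the allowed distance (max_gaps[i], in Å) of the first C-alpha of the next."""
    for prev, nxt, max_gap in zip(sus, sus[1:], max_gaps):
        gap = np.linalg.norm(np.asarray(prev[-1], dtype=float) - np.asarray(nxt[0], dtype=float))
        if gap > max_gap:
            return False
    return True
```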

All-pairs docking
Dock each of the N(N-1)/2 pairs of SUs:
  Calculate candidate transformations by matching complementary local features on the surfaces of the SUs.
  Apply each transformation to the second SU of the pair.
  Keep the K best transformations for each pair, using clustering to ensure that the K transformations yield significantly different complexes.
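
A sketch of the bookkeeping around all-pairs docking. The docking itself (matching complementary surface features) is not shown: `candidates` stands in for its output as hypothetical `(score, rotation, translation)` triples, higher scores are assumed better, and the slide's clustering step is approximated by a greedy filter that keeps a candidate only if the placed second SU differs by at least `min_rmsd` Å RMSD from every candidate already kept. The RMSD is taken without re-superposition, because both placements are expressed in the frame of the fixed first SU. Function names and the 2.0 Å default are hypothetical.

```python
from itertools import combinations

import numpy as np


def docking_pairs(n):
    """All N(N-1)/2 unordered pairs of SU indices."""
    return list(combinations(range(n), 2))


def keep_k_diverse(candidates, su_b, k, min_rmsd=2.0):
    """Keep up to k high-scoring transformations of su_b that are mutually distinct."""
    su_b = np.asarray(su_b, dtype=float)
    kept = []  # (score, rot, t, placed coordinates of su_b)
    for score, rot, t in sorted(candidates, key=lambda c: c[0], reverse=True):
        placed = su_b @ np.asarray(rot, dtype=float).T + np.asarray(t, dtype=float)
        if all(np.sqrt(((placed - other) ** 2).sum(axis=1).mean()) >= min_rmsd
               for *_, other in kept):
            kept.append((score, rot, t, placed))
            if len(kept) == k:
                break
    return [(score, rot, t) for score, rot, t, _ in kept]
```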

Combinatorial assembly
Multigraph representation:
  Vertices = SUs
  Edges = transformations between two SUs, with K parallel edges between any two vertices
Final protein conformation = a spanning tree (N SUs, one connected component, no cycles)
[Figure: vertices i, j, k joined by parallel edges 1, 2, …, K]
The transformation between i and k is induced by the transformations (i, j) and (j, k).
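
A sketch of turning a spanning tree into a full conformation, assuming each edge stores a 4×4 homogeneous matrix T_ij that maps SU j's coordinates into SU i's frame (a convention chosen here, not stated on the slides). Under that convention the induced transformation is T_ik = T_ij · T_jk, and a breadth-first walk from a root SU composes these products along the tree; traversing an edge backwards uses the matrix inverse. Function names are hypothetical.

```python
from collections import deque

import numpy as np


def rigid(rot, t):
    """4x4 homogeneous matrix for x -> rot @ x + t."""
    m = np.eye(4)
    m[:3, :3] = rot
    m[:3, 3] = t
    return m


def apply_transform(m, points):
    """Apply a 4x4 rigid transformation to one point or an array of points."""
    pts = np.atleast_2d(np.asarray(points, dtype=float))
    return pts @ m[:3, :3].T + m[:3, 3]


def place_tree(n, tree_edges, root=0):
    """tree_edges maps (i, j) to T_ij. Returns, for every SU v, the matrix
    mapping v's coordinates into the root SU's frame."""
    adj = {v: [] for v in range(n)}
    for (i, j), m in tree_edges.items():
        adj[i].append((j, m))                 # T_ij: j's coords -> i's frame
        adj[j].append((i, np.linalg.inv(m)))  # T_ji is the inverse of T_ij
    pose = {root: np.eye(4)}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v, m_uv in adj[u]:
            if v not in pose:
                pose[v] = pose[u] @ m_uv      # T_root,v = T_root,u @ T_uv
                queue.append(v)
    return [pose[v] for v in range(n)]
```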

Combinatorial Assembly
There are $N^{N-2} K^{N-1}$ different spanning trees, and not all of them are valid complexes, so a heuristic algorithm is used.
Two subtrees are adjacent iff there exists an index i such that vertex i is in one subtree and vertex i+1 is in the other.
Sequential tree (recursive definition): either a single vertex, or a tree formed by an edge that connects two adjacent sequential trees.
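
The count follows from Cayley's formula: there are N^{N-2} labeled trees on N vertices, and each of a tree's N-1 edges can be any of the K parallel transformations. A short sketch of that count and of the adjacency test used in the sequential-tree definition, with subtrees given as sets of SU indices; both function names are hypothetical.

```python
def num_spanning_trees(n: int, k: int) -> int:
    """Spanning trees of a multigraph on n >= 2 vertices with k parallel edges
    between every pair: n**(n-2) labeled trees (Cayley) times k choices per edge."""
    return n ** (n - 2) * k ** (n - 1)


def adjacent(sub_a: set, sub_b: set) -> bool:
    """True iff some index i is in one subtree and i + 1 is in the other."""
    return any(i + 1 in sub_b for i in sub_a) or any(i + 1 in sub_a for i in sub_b)
```

Even N = 10 SUs with K = 10 transformations per pair gives 10^17 spanning trees, which is why exhaustive enumeration is not an option.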

Combinatorial Assembly
Hierarchical algorithm of N stages:
  Stage i: generate sequential trees with i vertices by connecting adjacent, smaller sequential trees generated earlier.
  Keep the D best sequential trees at each step.
  Discard trees that do not meet the backbone and penetration constraints.
Score = sum of the scores of the tree's transformations
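
The stage-by-stage construction can be sketched as a beam search over index ranges. Starting from single vertices, the recursive definition only ever produces sequential trees that cover a contiguous range of SU indices, so trees are tracked per range. Other details are assumptions rather than statements from the slides: the D best trees are kept per range, higher transformation scores are better, the connecting edge may join any SU of the left part to any SU of the right part, and `is_valid` is a hypothetical hook where the backbone and penetration checks would go.

```python
import heapq
from itertools import product


def assemble(n, edge_scores, d, is_valid=lambda edges: True):
    """Hierarchical assembly of sequential trees, keeping a beam of d per range.

    n: number of SUs, indexed 0..n-1 in sequence order
    edge_scores: dict mapping (u, v) with u < v to a list of K transformation scores
    d: number of trees kept for each index range
    is_valid: hypothetical constraint check on a tuple of edges
    Returns up to d (score, edges) pairs covering all SUs, best first; each edge
    is (u, v, k), the k-th transformation between SUs u and v.
    """
    best = {(i, i): [(0.0, ())] for i in range(n)}
    for size in range(2, n + 1):                   # stage `size`
        for a in range(n - size + 1):
            b = a + size - 1
            found = {}
            for m in range(a, b):                 # left part a..m, right part m+1..b
                for (s1, e1), (s2, e2) in product(best[(a, m)], best[(m + 1, b)]):
                    for u in range(a, m + 1):
                        for v in range(m + 1, b + 1):
                            for k, s in enumerate(edge_scores.get((u, v), [])):
                                edges = e1 + e2 + ((u, v, k),)
                                key = frozenset(edges)  # same tree via another split
                                if key not in found and is_valid(edges):
                                    found[key] = (s1 + s2 + s, tuple(sorted(edges)))
            best[(a, b)] = heapq.nlargest(d, found.values(), key=lambda t: t[0])
    return best[(0, n - 1)]
```

The same tree can be reached by splitting its range at different points, so candidates are deduplicated by their edge set before the beam is cut to size.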

Combinatorial Assembly

CombDock Results

Conclusion
Experimental methods: X-ray crystallography, NMR spectroscopy
Computational methods:
  Homology modeling: similar sequences fold into similar structures
  Threading: dissimilar sequences may fold into similar structures
  Ab initio: no similarity assumptions; conformational search
[Figure: known structures, fragment library, protein sequence, predicted structure]

References
Kolodny, Koehl, Guibas, and Levitt (2002), “Small libraries of protein fragments model native protein structures accurately”
Inbar, Benyamini, Nussinov, and Wolfson (2003), “Protein structure prediction via combinatorial assembly of sub-structural units”