Bioinformatics Ayesha M. Khan Spring 2013.



Protein Modelling

What is protein modelling? Suppose we have no resources or expertise for X-ray crystallography or NMR, only the protein sequence (the target), and we would like to know its 3D structure. Computational methods can provide a useful model and fill the gap between sequence space and structure space. Protein modelling is the only way to obtain structural information when an experiment is difficult to pursue. Many proteins are too large for NMR analysis or cannot be crystallized for X-ray diffraction, so protein modelling acts as a substitute in these cases. Experimental structure determination is time consuming as well.

Methods of protein modelling
Two broad classes: comparative modelling and de novo modelling.
1. Comparative protein modelling
   • Homology modelling
   • Protein threading
2. De novo or ab initio protein modelling

Comparative modelling uses previously solved structures as starting points or “templates”. Why is it effective? Although the number of actual proteins is vast, most of them belong to a limited set of tertiary structural motifs. There are only around 2000 distinct protein folds in nature, although there are several million different proteins. When the structure of one protein in a family has been determined by experimentation, the other members of the same family can be modelled, based on their alignment to the known structure.

Homology modelling is based on the reasonable assumption that two homologous proteins will share very similar structures. Given the amino acid sequence of an unknown structure and the solved structure of a homologous protein, each amino acid in the solved structure is mutated, computationally, into the corresponding amino acid from the unknown structure. It is the prediction of the 3D structure of a target protein from its amino acid sequence and the X-ray or NMR structure of a homologous (template) protein.
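The residue-by-residue "mutation" step described above can be sketched in a few lines. This is a minimal illustration, not how any particular modelling program works: the function name `mutate_template`, the gapped alignment strings, and the use of `-` as the gap character are all assumptions made for the example. It only lists which template residue would be swapped for which target residue; building coordinates is out of scope here.

```python
def mutate_template(template_aln: str, target_aln: str) -> list[tuple[int, str, str]]:
    """List (template_position, template_residue, target_residue) per aligned column.

    Both inputs are rows of the same pairwise alignment, with '-' marking gaps
    (an assumption of this sketch). Columns containing a gap are skipped.
    """
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    substitutions = []
    pos = 0  # 1-based residue index within the template sequence
    for t_res, q_res in zip(template_aln, target_aln):
        if t_res != "-":
            pos += 1
        if t_res == "-" or q_res == "-":
            continue  # insertions and deletions cannot be handled by substitution
        substitutions.append((pos, t_res, q_res))
    return substitutions


# Hypothetical toy alignment, for illustration only.
template = "MKV-LAG"
target = "MRVQL-G"
for pos, old, new in mutate_template(template, target):
    action = "keep" if old == new else f"mutate to {new}"
    print(f"template residue {old}{pos}: {action}")
```

Columns where either sequence has a gap are left out on purpose: those regions have no one-to-one counterpart in the template and have to be built separately.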

Programs for homology modeling: Many programs for automated homology modeling are now available, so anyone can construct a homology model on a regular PC. However, construction of a “good” homology model (at least for sequences that are not highly similar) usually requires some expertise and should be done with human intervention, rather than in a fully automated fashion.
A few of the freely available programs for homology modeling:
• SWISS-MODEL: Produces accurate models; fast; good tutorials available. http://swissmodel.expasy.org/
• I-TASSER: Produces accurate models; easy to use, but slow. http://zhanglab.ccmb.med.umich.edu/I-TASSER/
• Modeller: Must be downloaded and installed locally. http://salilab.org/modeller/modeller.html

Databases of homology models: The rate of new protein sequence determination is far outpacing the rate of structure determination by X-ray crystallography and NMR. Therefore, initiatives are underway to automatically generate homology models for large numbers of new protein sequences. One database of automatically generated homology models is SWISS-MODEL Repository: http://swissmodel.expasy.org/repository/

Is a homology model CORRECT? Since the actual (experimentally determined) structure of the target is not known, there is no way to say whether or not the homology model is “correct.” The best a researcher can do is compare the homology model to the structure of the template from which it was derived. If the atom positions in the model do not deviate very much from those of the template, the homology model is said to be “accurate.” The greater the deviation between model and template, the lower the accuracy of the model.
When is a homology model definitely INCORRECT? A homology model has regions that are incorrect if it contains structural features that do not occur in native proteins, such as:
• Hydrophobic side chains on the surface of the model (these side chains should be buried)
• Buried polar or ionic groups that do not have their hydrogen-bonding or ionic-bonding capabilities “satisfied” by neighboring groups
• Unreasonable bond lengths or angles
• Unfavorable noncovalent contacts between atoms (clashes)
• Unreasonable dihedral angles
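One common way to put a number on "how much the atom positions deviate" is the root-mean-square deviation (RMSD) over matched atoms. The slides do not name a metric, so treating deviation as RMSD is an assumption of this sketch, as is the premise that the two structures are already superimposed and that atoms are listed in matching order. The coordinates below are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    Assumes the structures are already superimposed and that the i-th point
    in each list refers to the same atom.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical C-alpha positions in angstroms, for illustration only.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
template = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(f"RMSD = {rmsd(model, template):.2f} angstroms")
```

A small value only says the model stays close to its template; it does not show that the model matches the true structure of the target.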

Accuracy of homology modeling: Template selection and alignment accuracy are crucial to the accuracy of a homology model. The accuracy of the model depends on the percentage of sequence identity between the target and template. The average coordinate agreement between the modeled structure and the actual structure worsens by ~0.3 Å for each 10% reduction in sequence identity. The largest structural differences between homologous proteins are in surface loops; the structure of the protein core is more highly conserved. Therefore, the regions that are most likely to be in error in a homology model are the surface loops.
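As a worked example of the ~0.3 Å per 10% rule of thumb, the sketch below estimates how much extra coordinate deviation to expect when a less similar template is used. Treating the relationship as strictly linear over the whole range is an assumption made for illustration; the function and constant names are hypothetical.

```python
ANGSTROM_PER_10_PERCENT = 0.3  # rule-of-thumb slope quoted in the slide


def added_deviation(identity_from: float, identity_to: float) -> float:
    """Extra average deviation (in angstroms) when identity drops between two percentages."""
    drop = identity_from - identity_to
    if drop < 0:
        raise ValueError("identity_to must not exceed identity_from")
    return ANGSTROM_PER_10_PERCENT * drop / 10.0


# Going from a 60%-identity template to a 30%-identity template.
print(f"about {added_deviation(60, 30):.1f} angstroms of additional deviation")
```

The rule gives only a slope, not an absolute error, so it is best used to compare candidate templates against each other.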

Accuracy of homology modeling (contd) High-accuracy homology models can be built when the target and template have 50% or greater sequence identity. Errors are mostly mistakes in side-chain packing, small shifts of the core backbone regions, and occasionally larger errors in loops. Medium-accuracy homology models can be built when the proteins share 30-50% sequence identity. There can be alignment mistakes, and there are more frequent side-chain packing, core distortion, and loop modeling errors. Low-accuracy homology models are based on proteins that share <30% sequence identity. If a model is based on an almost insignificant alignment to a known structure, the model may have an entirely incorrect fold. The best model-building programs will produce models of similar accuracy, provided that the methods are used optimally.
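The three accuracy bands above can be applied mechanically once the sequence identity of a target-template alignment is known. The sketch below is illustrative only: the function names are hypothetical, and computing identity over columns where neither sequence has a gap is one of several conventions in use, chosen here as an assumption.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical residues over aligned columns without gaps ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not aligned:
        return 0.0
    matches = sum(a == b for a, b in aligned)
    return 100.0 * matches / len(aligned)


def accuracy_tier(identity: float) -> str:
    """Map percent identity to the bands in the slide: >=50 high, 30-50 medium, <30 low."""
    if identity >= 50:
        return "high"
    if identity >= 30:
        return "medium"
    return "low"


# Hypothetical toy alignment, for illustration only.
identity = percent_identity("MKV-LAG", "MRVQL-G")
print(f"{identity:.0f}% identity, expected accuracy: {accuracy_tier(identity)}")
```

Real alignments are far longer than this toy pair, and identity over a very short alignment says little about the expected model quality.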

[Figure: Building the model. Steps shown: building the framework, building non-conserved loops, backbone generation, side chain modelling.]

Protein threading Protein threading scans the amino acid sequence of an unknown structure against a database of solved structures.

Threading for tertiary structure prediction Structure is more conserved than sequence, so many proteins share similar folds, even in the absence of sequence similarity. If a suitable template does not exist for homology modeling of a target sequence, threading can be used to identify a potential structure for the target from among known structures of proteins that do not share significant sequence similarity with the target sequence. Threading predicts the structural fold of a protein by fitting its sequence into a structural database and selecting the best-fitting fold. Essentially, the target sequence is tested for compatibility with all structures in the database. Various methods are used to compare the target sequence to the known structures and determine which one, if any, it fits best. Unlike homology modeling, threading does not result in an all-atom structural model for the target sequence. Nevertheless, these relatively poor models can still potentially provide insight into the function of a new protein. There is a high rate of false positives when using threading.
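The idea of testing a sequence for compatibility with every structure in a library can be shown with a deliberately crude toy. Everything here is an assumption made for illustration: each fold is reduced to a string of buried (`B`) or exposed (`E`) positions, the scoring rewards hydrophobic residues in buried positions and polar residues in exposed ones, no gaps are allowed, and the fold names are invented. Real threading programs use much richer scoring and alignment schemes.

```python
HYDROPHOBIC = set("AVLIMFWC")  # simplified residue classification for this sketch


def position_score(residue: str, environment: str) -> int:
    """+1 if the residue suits the position (hydrophobic-buried or polar-exposed), else -1."""
    buried = environment == "B"
    hydrophobic = residue in HYDROPHOBIC
    return 1 if buried == hydrophobic else -1


def thread_score(sequence: str, environments: str):
    """Best ungapped score of the sequence slid along one fold; None if the fold is too short."""
    best = None
    for offset in range(len(environments) - len(sequence) + 1):
        window = environments[offset:]
        score = sum(position_score(r, e) for r, e in zip(sequence, window))
        if best is None or score > best:
            best = score
    return best


def best_fold(sequence: str, library: dict):
    """Return (fold_name, score) for the best-fitting fold, or None if nothing fits."""
    scored = {name: thread_score(sequence, env) for name, env in library.items()}
    scored = {name: s for name, s in scored.items() if s is not None}
    if not scored:
        return None
    name = max(scored, key=scored.get)
    return name, scored[name]


# Hypothetical two-fold library and target sequence.
library = {"fold_A": "BBEEBBEE", "fold_B": "EEEEBBBB"}
print(best_fold("LVKDAIEN", library))
```

The top-scoring fold is only a candidate: the best fit in a library is not necessarily a correct fit, which is one source of the false positives noted above.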

A few of the freely available programs for threading:
• GenTHREADER: Another version, called pGenTHREADER, makes use of profiles and predicted secondary structure to increase accuracy. http://bioinf.cs.ucl.ac.uk/web_servers/
• 3D-PSSM: Beware, the template library may be outdated. http://www.sbg.bio.ic.ac.uk/~3dpssm/index2.html
• 3D-JIGSAW: http://bmm.cancerresearchuk.org/~3djigsaw/

Ab initio structural prediction: Ab initio predictions are based on sequence information only, without the aid of any known structures. Since proteins fold on their own to their correct structures, there must be information about that structure inherent in the amino acid sequence. Ab initio methods try to use what is currently known about the physicochemical laws governing protein folding to predict the structure of a protein from its sequence. The normal, functional structure of a protein (its “native state”) is often a conformation that has the lowest possible free energy. Ab initio methods predict a structure for a target protein by attempting to find the lowest energy conformation that the polypeptide chain can adopt. One approach would be to try out ALL possible conformations to determine which one has the lowest energy. However, this is not computationally possible at this time. (It would take 10^20 years for a 40-residue protein!) Ab initio methods, therefore, use a variety of heuristic approaches to sample only some of the possible conformations in an attempt to find the one with the lowest energy.
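A back-of-the-envelope calculation shows why exhaustive search is hopeless. The slide does not say how its figure was derived, so the inputs below are assumptions chosen for illustration: 10 possible conformations per residue and 10^13 conformations evaluated per second. Under those assumptions a 40-residue chain takes a few times 10^19 years, the same order of magnitude as the slide's estimate.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_search_years(n_residues: int,
                            conformations_per_residue: int = 10,
                            samples_per_second: float = 1e13) -> float:
    """Years needed to enumerate every conformation of a chain.

    The default rates are illustrative assumptions, not values from the slides.
    """
    total_conformations = conformations_per_residue ** n_residues
    return total_conformations / samples_per_second / SECONDS_PER_YEAR


years = exhaustive_search_years(40)
print(f"about {years:.1e} years for a 40-residue protein")
```

Because the count grows exponentially with chain length, changing the assumed sampling rate by a few orders of magnitude does not make the search feasible, which is why heuristic sampling is used instead.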

Ab initio structural prediction: Ab initio methods are largely unsuccessful. Ab initio methods are useful only in cases where homology modeling and threading fail, and then the prediction should be interpreted very cautiously. Structural proteomics efforts are underway which may soon make ab initio methods largely obsolete. It is estimated that most of the possible structural folds have already been solved by x-ray diffraction or NMR. When at least one protein structure from each possible fold has been determined experimentally, it will then be possible to predict other structures using homology modeling and threading, obviating the need for ab initio methods. In the meantime, ab initio methods are useful for exploring the relationship between sequence and structure, and providing insight into the process of protein folding. In addition, ab initio methods can sometimes be useful during homology modeling for building portions of proteins not present in the template structure (loop building).

Accuracy and application of protein structure models. Structures A-C are homology models based on about 60% (A), 40% (B), and 30% (C) sequence identity to their template structure. Structures D and E are ab initio predictions using a program called Rosetta. Predicted structures are in red, and actual structures are in blue. The accuracy of the models decreases significantly in going from A to E, but the overall structure is still roughly correct.

CASP: Critical Assessment of Techniques for Protein Structure Prediction CASP is an international contest held every two years in which scientists try to predict the structures of proteins using methods they have developed that include homology modeling, threading, and ab initio techniques. Contestants are given the sequences of proteins whose structures have been determined by x-ray crystallography or NMR but have not yet been made public. After contestants have made and submitted their predictions, the actual structures are released, the predictions are compared to the actual structures, and the predictions are assessed for accuracy. The CASP contest is a major driving force in the development of tertiary structure prediction methods. CASP began in 1994; CASP10 was held in 2012.

When dealing with predicted protein structures, it is important to remember: “Models are not molecules observed” No matter how they are obtained, before we ask what they tell us, we must ask how well macromolecular models fit with other things we already know. A model is like any scientific theory: it is useful only to the extent that it supports predictions that we can test by experiment. Our initial confidence in it is justified only to the extent that it fits what we already know. Our confidence can grow only if its predictions are verified.