Representations of Molecular Structure: Bonds Only.

Presentation transcript:

Representations of Molecular Structure: Bonds Only

Representations of Molecular Structure: Atoms Only

Representations of Molecular Structure: Atoms and Bonds

Representations of Molecular Structure: Ribbons

Representations of Molecular Structure: Mixed

Representations of Molecular Structure: van der Waals Surface

Representations of Molecular Structure: Solvent Excluded Surface

Protein Structure Prediction

Protein folding is different from structure prediction. Folding is concerned with the process by which a chain adopts its 3D shape, usually studied from physical principles. Prediction uses any statistical, theoretical, or empirical data to try to get at the end result.

Protein Structure Prediction A bit of history: Asilomar, 1994, 1996, 1998, 2000, 2002, & 2004 (pending) Three approaches to structure prediction: a. Homology modeling b. Sequence-structure threading c. Ab initio prediction

Asilomar Experimentalists who had structures that would be solved before date of CASP meeting submitted the sequences of the unknowns to a central repository. Predictors could download sequence and minimal information about protein (name), and could enter one of three categories. Assessors use automatic programs for analysis in addition to expertise to evaluate quality of predictions.

CASP6 in Numbers
Number of human expert groups registered: 228
Number of prediction servers registered: 65
Number of targets released: 87
Targets canceled: 11
Valid targets: 76
Targets for human expert prediction: 76
Targets for server prediction: 76

CASP6: Accepted Predictions
Columns: Prediction format, No. groups, No. 1 models, All models
Rows: 3D coordinates; Alignments to PDB; Residue-residue contacts; Domain assignments; Disordered regions; Function prediction; All
All: 228 groups (unique)

Asilomar Categories Homology Modeling (sequences with high sequence identity to sequences of known structure) Given a sequence with > 25-30% identity to a known structure in the PDB, use the known structure as the starting point for a model of the 3D structure of the sequence. Takes advantage of knowledge of a closely related protein. Use sequence alignment techniques to establish correspondences between the known “template” and the unknown.

Asilomar Categories Fold recognition (sequences with no significant sequence identity (<= 30%) to sequences of known structure) Given the sequence and a set of folds observed in the PDB, see if the sequence could adopt one of the known folds. Takes advantage of knowledge of existing structures and of the principles by which they are stabilized (favorable interactions).

Fold Recognition New sequence: MLDTNMKTQLKAYLEKLTKPVELIATLDDSAKSAEIKELL… Library of known folds:

Asilomar Categories Ab initio prediction (no known homology with any sequence of known structure) Given only the sequence, predict the 3D structure from “first principles”, based on energetic or statistical principles. Secondary structure prediction and multiple alignment techniques are used to predict features of these molecules. Some method is then needed to assemble the 3D structure.

Ab initio prediction
New sequence:
MLDTNMKTQLKAYLEKLTKPVELIATLDDSAKSAEIKELL…
Predict secondary structure:
MLDTNMKTQLKAYLEKLTKPVELIATLDDSAKSAEIKELL…
HHHHHCCCCCHHHHHHHHHHCCCCBBBBBBBCCBBBB…
Predict 3D structure entirely:

Asilomar Results How to evaluate predictions? RMSD; overall identification and topology of secondary structures; energy considerations (contacts, H-bonds); similarity of hydrophobic core; sequence alignment quality (and systematic shift).
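
RMSD is the first evaluation criterion on the slide above. The following is a minimal sketch of the calculation, assuming the model and the experimental structure are already superimposed and that atoms are paired by index. The coordinates are invented for illustration, and the function name `rmsd` is not from the slides.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes the two structures are already superimposed and that point i in
    coords_a corresponds to point i in coords_b (no fitting is done here).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical three-atom example (coordinates in Å, invented for illustration).
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
native = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 2.0)]
print(round(rmsd(model, native), 3))
```

In practice an optimal superposition step would normally precede this calculation; it is left out here to keep the sketch short.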

Homology Modeling When sequence homology is > 70%, high resolution models are possible (< 3 Å RMSD). Sophisticated energy minimization techniques do not dramatically improve upon initial guess. Rigorous criteria applied such as torsion angles, van der Waals violations, RMSD.

Homology Modeling Samples Thick backbone shows known structure. Thin lines show modeled structures. Some sidechains are not positioned correctly, but backbone and other sidechains look quite good.

Homology Modeling Mistakes a. Sidechain mistakes b. Shifts with correct alignment c. No template d. Misalignment e. Incorrect template

Limitations of Homology Modeling

Useful Conclusions from CASP Use of sensitive multiple alignment techniques helped get the best alignments. Side chain modeling uses libraries of known amino acid conformations. Success ranged from 45% to 80% correct (correct = angles within 30° of the experimental structure). Energy-based refinement is still not improving the structures.
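
The "within 30°" criterion for side-chain angles can be sketched as follows. This is an illustration only: the angle values are invented, the function names are hypothetical, and the 30° threshold is treated as inclusive, which the slide does not specify.

```python
def angle_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles in degrees, with wrap-around."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)

def percent_correct(predicted, experimental, tolerance=30.0):
    """Percentage of predicted angles within `tolerance` degrees of experiment."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("angle lists must be non-empty and of equal length")
    hits = sum(
        angle_difference(p, e) <= tolerance
        for p, e in zip(predicted, experimental)
    )
    return 100.0 * hits / len(predicted)

# Hypothetical side-chain torsion angles in degrees (invented for illustration).
predicted = [60.0, 170.0, -65.0, 180.0]
experimental = [75.0, -170.0, 60.0, -175.0]
print(percent_correct(predicted, experimental))
```

The wrap-around matters because 170° and -170° are only 20° apart, not 340°.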

Ab Initio Predictions – From Primary to Secondary Range of accuracy from 66% to 77% (3 state labeling: helix, coil or beta). Human hand editing improves the accuracy. Multiple sequence alignments improve the performance of secondary structure prediction.
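
Three-state accuracy of the kind quoted above is the percentage of residues whose predicted label matches the observed one. A minimal sketch, using the H/C/B letters from the earlier slide; the two label strings are invented for illustration and `three_state_accuracy` is a hypothetical name.

```python
def three_state_accuracy(predicted, observed):
    """Percentage of positions where the predicted state equals the observed state."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("label strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(predicted)

# Hypothetical labels: H = helix, C = coil, B = beta (invented for illustration).
predicted = "CCCCCHHHHHCCCCBBBBCC"
observed = "CCCCHHHHHHCCCBBBBBCC"
print(three_state_accuracy(predicted, observed))
```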

Ab Initio Predictions – From Secondary to Tertiary Sensitive to errors in secondary structure. Predictors were more likely to predict previously known structures.

Ab Initio Predictions – From Primary to Tertiary Predict inter-residue contacts and then compute the structure (mild success). Use a simplified energy term plus a reduced search space (phi/psi or lattice) (moderate success). Find creative ways to memorize sequence-structure correlations in short segments from the PDB, and use these to model new structures: the database method (moderate success).

Ab Initio Predictions – Tertiary (1 to 3): Good Methods Associate the unknown sequence with a library of known 3D structures, then optimize the contact frequency of amino acids as measured in the PDB (Baker et al). Generate all folds on a lattice and then filter out the bad ones (Samudrala et al). Combine multiple sequence alignment, secondary structure prediction, and a lattice (Skolnick et al).

Lattice Model: Overcoming Entropic Barriers

Substructure/Fragment Model: Overcoming Entropic Barriers Break target into fragments of 9 amino acids Search for similar PDB sequences based on sequence similarity Start with extended chain, and evaluate the effect of introducing the fragments into the chain.
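
The first step, breaking the target into 9-residue fragments, can be sketched as below. The slide does not say whether the fragments overlap; overlapping windows that advance one residue at a time are an assumption here. The sequence is the truncated example from the earlier slides, and `nine_residue_windows` is a hypothetical name.

```python
def nine_residue_windows(sequence, size=9):
    """Return (start_index, fragment) for every window of `size` residues.

    Overlapping windows are an assumption made for this sketch.
    """
    if len(sequence) < size:
        return []
    return [(i, sequence[i:i + size]) for i in range(len(sequence) - size + 1)]

# Truncated example sequence taken from the earlier slides.
target = "MLDTNMKTQLKAYLEKLTKPVELIATLDDSAKSAEIKELL"
windows = nine_residue_windows(target)
print(len(windows), windows[0], windows[-1])
```

Each window would then be compared against PDB sequences to collect candidate fragments; that search is not shown.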

Substructure/Fragment Model: Overcoming Entropic Barriers Use a Metropolis-type algorithm for optimization, with the following terms: hydrophobic burial; polar side-chain interactions; hydrogen bonding between beta-strands; hard sphere repulsion (van der Waals). Create 1000 structures and cluster them. Choose one representative from each cluster as a possible prediction…
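
A Metropolis-type search accepts every move that lowers the energy and accepts an uphill move with probability exp(-ΔE/kT). The sketch below shows only that acceptance rule on a toy problem. The energy function and the "fragment insertion" move are invented stand-ins, not the scoring terms listed on the slide, and all names are hypothetical.

```python
import math
import random

def metropolis_accept(delta_e, kt, rng):
    """Metropolis criterion: always accept downhill moves, sometimes uphill ones."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kt)

def optimize(initial_state, propose, energy, kt, steps, rng):
    """Run a Metropolis search and return the lowest-energy state seen."""
    state, e = initial_state, energy(initial_state)
    best_state, best_e = state, e
    for _ in range(steps):
        candidate = propose(state, rng)
        cand_e = energy(candidate)
        if metropolis_accept(cand_e - e, kt, rng):
            state, e = candidate, cand_e
            if e < best_e:
                best_state, best_e = state, e
    return best_state, best_e

def toy_energy(state):
    """Toy stand-in for a real scoring function: sum of squares."""
    return float(sum(v * v for v in state))

def toy_propose(state, rng):
    """Toy stand-in for fragment insertion: overwrite three consecutive values."""
    start = rng.randrange(len(state) - 2)
    new_state = list(state)
    new_state[start:start + 3] = [rng.randint(-3, 3) for _ in range(3)]
    return new_state

rng = random.Random(0)
start = [3] * 12
best, best_e = optimize(start, toy_propose, toy_energy, kt=1.0, steps=2000, rng=rng)
print(toy_energy(start), best_e)
```

Repeating such a run many times from different seeds would yield the set of structures that the slide says are then clustered.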

Success Stories of Rosetta

Fold Recognition Becoming More Important CASP1: Of 21 target proteins, 11 wound up having folds that were previously known. CASP2: Of 22 targets, 15 with available folds. CASP3: Of 43 targets, 36 with available folds. …
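
Expressed as percentages, the counts above show the trend more directly. A small worked calculation using only the numbers on the slide:

```python
# (targets with a previously known fold, total targets), as given on the slide.
casp_targets = {"CASP1": (11, 21), "CASP2": (15, 22), "CASP3": (36, 43)}

fractions = {
    name: 100.0 * known / total
    for name, (known, total) in casp_targets.items()
}

for name, pct in fractions.items():
    print(f"{name}: {pct:.1f}% of targets had a previously known fold")
```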

Fold Recognition Every predictor does well on something. Common folds (more examples) are easier to recognize. Fold recognition was the surprise performer at the first competition. Incremental progress at second, third, fourth …

Fold Recognition Not “all or none”. List of top N hits much better than top hit. Common folds easier to recognize. Quality of alignments that result is NOT good. Potentials include: residue pair contact terms, hydrophobicity, polarity, H-bonds, local structure terms.

1 = target, 2 = Fold in PDB

Elements of a fold recognition algorithm
Library of protein structures, suitably processed
- All structures
- Representative subset
- Structures with loops removed
Scoring function
- contact potential
- environmental evaluation function
Method for generating initial alignments and/or searching for better alignments.

Scoring: Contact Potential Instead of modeling energies from first physical principles, simplify the problem by positioning only amino acids, and compute empirical energies from the observed associations of amino acids. “GLU is attracted to LYS” = E(glu, lys)

Scoring: Contact Potential Create energy terms between amino acids: $E(\text{interaction}) = -kT \ln[\text{frequency of interaction}]$ The frequency of interaction is measured in a database of known structures. The higher the frequency, the more favorable the interaction.
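
The relation between observed frequency and energy can be sketched as below. The contact counts are invented for illustration, kT is set to 1 so energies are in units of kT, and `contact_energies` is a hypothetical name.

```python
import math

def contact_energies(pair_counts, kt=1.0):
    """Convert contact counts to energies via E = -kT ln(frequency).

    Frequency is taken as count / total count, which is an assumption of this
    sketch; pairs with zero count are skipped because ln(0) is undefined.
    """
    total = sum(pair_counts.values())
    return {
        pair: -kt * math.log(count / total)
        for pair, count in pair_counts.items()
        if count > 0
    }

# Hypothetical contact counts from a structure database (invented numbers).
counts = {("GLU", "LYS"): 60, ("LYS", "LYS"): 30, ("GLU", "GLU"): 10}
energies = contact_energies(counts)
for pair, e in sorted(energies.items(), key=lambda item: item[1]):
    print(pair, round(e, 3))
```

With these invented counts the most frequent pair, GLU-LYS, gets the lowest (most favorable) energy.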

Sippl Contact Potential Given: a = amino acid type a (ALA, VAL, etc.), b = amino acid type b, s = separation in sequence: $\Delta E^{ab}_{s}(r) = E^{ab}_{s}(r) - E_{s}(r)$ The energy of interaction between a and b, minus the average energy at that separation, equals the energy difference that contributes to stability.

Sippl Contact Potential Thus we have: $\Delta E^{ab}_{s}(r) = -kT \ln\left[ f^{ab}_{s}(r) / f_{s}(r) \right]$ For any given sequence in 3D, compute distances between all pairs of amino acids (usually up to r = 10-15 Å), and sum: $\Delta E_{tot} = \sum_{a,b\ \text{pairs}} \Delta E^{ab}_{s}(r)$
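
The summation over residue pairs can be sketched as follows. The frequency tables, the 1 Å distance bins, the 15 Å cutoff, and the choice to skip pairs missing from the tables are all assumptions made for illustration; the coordinates are invented. `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations

def delta_e(f_abs, f_s, kt=1.0):
    """Energy difference -kT ln(f_abs / f_s) for one residue pair."""
    return -kt * math.log(f_abs / f_s)

def total_energy(residues, coords, freq_pair, freq_ref,
                 cutoff=15.0, bin_width=1.0, kt=1.0):
    """Sum delta_e over all residue pairs closer than `cutoff` (in Å).

    freq_pair maps ((type_a, type_b), separation, distance_bin) to a frequency;
    freq_ref maps (separation, distance_bin) to the type-independent frequency.
    Pairs absent from either table are skipped (an assumption of this sketch).
    """
    total = 0.0
    for i, j in combinations(range(len(residues)), 2):
        r = math.dist(coords[i], coords[j])
        if r > cutoff:
            continue
        separation = j - i
        r_bin = int(r // bin_width)
        pair_key = (tuple(sorted((residues[i], residues[j]))), separation, r_bin)
        ref_key = (separation, r_bin)
        if pair_key in freq_pair and ref_key in freq_ref:
            total += delta_e(freq_pair[pair_key], freq_ref[ref_key], kt)
    return total

# Hypothetical three-residue example (all numbers invented for illustration).
residues = ["GLU", "LYS", "ALA"]
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
freq_pair = {(("GLU", "LYS"), 1, 4): 0.2}
freq_ref = {(1, 4): 0.1}
print(round(total_energy(residues, coords, freq_pair, freq_ref), 4))
```

Here only the GLU-LYS pair lies within the cutoff, and it is seen twice as often as the reference, so the total is negative (stabilizing).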

Using Contact Potential Given a 3D structure, we need to mount the sequence on the structure:
- dynamic programming (okay)
- exhaustive enumeration (too expensive; a recent paper shows that this is NP-hard)
- heuristic enumeration, with limits on gap lengths and loop lengths (heuristic)
Evaluate the contact potential for the alignment. [Optional] Locally optimize the potential score. Compare the potential with that of random shuffles of the sequence, and with other sequences, to approximate a z-score.
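
The last step, comparing the score with shuffled sequences to approximate a z-score, can be sketched as below. The scoring function is a toy stand-in (it rewards hydrophobic residues at a fixed set of "buried" positions) and is not a real contact potential; the sequence, the buried positions, and all names are hypothetical.

```python
import random
import statistics

def shuffle_z_score(sequence, score_fn, n_shuffles=200, seed=0):
    """Z-score of the native score against scores of shuffled sequences."""
    rng = random.Random(seed)
    native_score = score_fn(sequence)
    residues = list(sequence)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        shuffled_scores.append(score_fn("".join(residues)))
    mean = statistics.mean(shuffled_scores)
    spread = statistics.pstdev(shuffled_scores)
    if spread == 0:
        raise ValueError("shuffled scores have zero spread; z-score undefined")
    return (native_score - mean) / spread

HYDROPHOBIC = set("AVLIMFWC")
BURIED_POSITIONS = range(6)  # hypothetical buried sites in the template fold

def toy_score(sequence):
    """Toy energy: -1 for each hydrophobic residue at a buried position."""
    return -float(sum(sequence[i] in HYDROPHOBIC for i in BURIED_POSITIONS))

sequence = "LLLLLLKKKKKKEEEEEEDD"  # invented sequence, hydrophobic at buried sites
z = shuffle_z_score(sequence, toy_score)
print(z < 0)
```

A strongly negative z-score indicates that the native arrangement scores better than most shuffles of the same composition.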

Future of Structure Predictions Protein fold recognition will get asymptotically better as we get more folds. The best ab initio methods use knowledge from the database, and will thus also improve. Estimates are that we now have between 30% and 50% of the folds that occur. Given a fold, we need to improve refinement with homology modeling techniques.