
Protein Structure Prediction

Historical Perspective
"Protein Folding: From the Levinthal Paradox to Structure Prediction", Barry Honig, 1999. A personal perspective on advances and developments in protein folding over the last 40 years.

Levinthal Paradox
Cyrus Levinthal (Columbia University, 1968) observed that a protein does not have enough time to randomly search its entire conformational space. Resolution: proteins must fold through some directed process. The goal is to understand the dynamics of this process.
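
Levinthal's argument is usually illustrated with a back-of-the-envelope count. The sketch below uses illustrative assumptions that do not come from the slides: 3 backbone states per residue, a 100-residue chain, and 10^13 conformations sampled per second. It works in log10 so that long chains do not overflow floating point.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def log10_search_years(n_residues: int,
                       states_per_residue: int = 3,
                       conformations_per_second: float = 1e13) -> float:
    """log10 of the years needed to visit every conformation once by exhaustive search.

    The default parameters are illustrative assumptions, not measured values.
    """
    # log10(states_per_residue ** n_residues), computed without building the huge number
    log10_conformations = n_residues * math.log10(states_per_residue)
    return (log10_conformations
            - math.log10(conformations_per_second)
            - math.log10(SECONDS_PER_YEAR))


if __name__ == "__main__":
    print(f"~10^{log10_search_years(100):.1f} years")
```

Under these assumptions a 100-residue chain needs on the order of 10^27 years, far longer than the age of the universe (about 1.4 × 10^10 years). That gap is the paradox.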

Old vs. New Views
Old:
- Hierarchical view of protein folding
- Secondary structures form first, then interact to form tertiary structures
- A general order of events
New:
- Statistical ensembles of states
- Potential energy landscape
- Folding "funnel"
The two views are not all that different; most of the important ideas were theorized many years ago.

Secondary Structures
The consensus view is that secondary structure formation is the earliest part of the folding process. Numerous studies indicate that local sequence codes for local structure: helical sequences in a folded protein tend to be helical in isolation. As of 1993, secondary structure element (SSE) prediction algorithms were about 70% correct. The remaining failures indicate that tertiary interactions help stabilize some SSEs.
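
Figures like "about 70% correct" are commonly reported as Q3. Q3 is the fraction of residues whose three-state label (helix H, strand E, coil C) matches the observed structure. Assuming that is the measure meant here, it can be computed as follows. The label strings are made up for illustration.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose 3-state label (H, E, C) matches the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not observed:
        raise ValueError("empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


if __name__ == "__main__":
    print(q3_accuracy("HHHHCCEEEC", "HHHCCCEEEE"))
```

Q3 treats every residue equally. It therefore says nothing about whether whole helices or strands were found, which is one reason per-segment measures are sometimes reported alongside it.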

However…
It is not clear which sequence elements code for the overall topology. One factor is the existence of hydrophobic faces on the surface of SSEs. Predicting the topology of SSEs remains challenging, even when the protein class is known.
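
One way to quantify a hydrophobic face on a helix is Eisenberg's hydrophobic moment. Each residue's hydrophobicity is treated as a vector pointing out from the helix axis, rotated 100° per residue, and the vectors are summed. This sketch swaps in the Kyte-Doolittle hydropathy scale (Eisenberg used his own consensus scale) and averages the moment per residue. It is an illustration only; the slides do not describe this method.

```python
import math

# Kyte-Doolittle hydropathy values (swapped in for illustration; other scales exist).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq: str, angle_deg: float = 100.0) -> float:
    """Mean helical hydrophobic moment: |sum of h_i * unit vector at i * angle| / length."""
    if not seq:
        raise ValueError("empty sequence")
    sin_sum = cos_sum = 0.0
    for i, aa in enumerate(seq.upper()):
        h = KYTE_DOOLITTLE[aa]
        theta = math.radians(angle_deg * i)  # ~100 degrees per residue in an alpha-helix
        sin_sum += h * math.sin(theta)
        cos_sum += h * math.cos(theta)
    return math.hypot(sin_sum, cos_sum) / len(seq)


if __name__ == "__main__":
    print(hydrophobic_moment("LKKLLKKLKKLLKK"))  # hydrophobic Leu on one face
    print(hydrophobic_moment("L" * 18))          # uniform: vectors cancel
```

A peptide whose hydrophobic residues cluster on one side scores high. Eighteen identical residues span exactly five full turns at 100° per residue (1800° in total), so their vectors cancel and the moment is essentially zero.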

Atomic-level calculations
Molecular calculations have had a great impact on our understanding of protein folding: Harold Scheraga (1968), Shneior Lifson (1969), and Martin Karplus's laboratory (~1979). Early calculations had trouble dealing with solvent effects.

Secondary Structure
Many of the essential elements of protein energetics can be derived from studying SSE formation. Early experimental work:
- Ingwall et al., 1968
- Baldwin et al., 1989, worked on stabilizing shorter helices
- Dyson and Wright, 1991, demonstrated that even short peptides in solution can be partially structured

Results
Yang and Honig, 1995:
- Alpha-helices are stabilized by hydrophobic interactions and close packing; hydrogen bonding has little effect.
- Beta-sheets are stabilized by non-polar interactions between residues on adjacent strands.
This work supports the idea that SSEs are coded for locally in the sequence.

Folding Pathways
SSEs can change conformation in the presence of a relatively small number of tertiary interactions. The free-energy differences between alpha-helix, beta-sheet, and coil are not large, and individual helices can be converted into beta-sheets by changing just a few amino acids. This suggests that proteins have a "structural plasticity" that allows them to change conformation.

Folding Pathways
Early in the folding process, many different combinations of SSEs have very similar stabilities. In the end, tertiary interactions drive the chain toward the native topology. Early in folding, SSEs "flicker"; they are eventually stabilized by tertiary interactions and converge to the native state. This suggests that multiple folding pathways exist, all of which can lead to the same end result once stabilized.
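
Boltzmann populations make concrete why small free-energy differences let SSEs interconvert. The free energies in this sketch are hypothetical values chosen for illustration, given in kcal/mol relative to the most stable state.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def boltzmann_populations(free_energies: dict[str, float],
                          temperature_k: float = 298.15) -> dict[str, float]:
    """Equilibrium fraction of each state given relative free energies (kcal/mol)."""
    rt = R_KCAL_PER_MOL_K * temperature_k
    g_min = min(free_energies.values())  # shift energies so the exponents stay small
    weights = {state: math.exp(-(g - g_min) / rt) for state, g in free_energies.items()}
    z = sum(weights.values())  # partition function over the listed states
    return {state: w / z for state, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical relative free energies, for illustration only.
    pops = boltzmann_populations({"helix": 0.0, "sheet": 0.5, "coil": 1.0})
    for state, p in pops.items():
        print(f"{state}: {p:.2f}")
```

With gaps of 0.5 to 1 kcal/mol, close to RT (about 0.59 kcal/mol at 298 K), the most stable state is only about 62% populated. The alternatives remain substantially populated.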

Structure Prediction
Recently, a split has developed between two problems:
- The protein prediction problem: predicting the end result of folding, relying heavily on comparisons between known and unknown structures.
- The protein folding problem: understanding the folding pathway that leads to that end result, typically through molecular dynamics (MD) simulations or energy calculations.
The author contends that both areas will need to be used together to fully understand protein folding.

PrISM
Yang and Honig, 1999. A software suite that integrates simulation-based prediction with known information about structures:
- Sequence analysis
- Structure-based sequence alignment
- Fast structure-structure superposition using a structural domain database
- Multiple structure alignment
- Fold recognition and homology model building
It was used to make predictions for all 43 targets of CASP3 (more on CASP later).
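
The slides do not say how PrISM performs its structure-structure superposition. As a generic illustration, the standard Kabsch algorithm finds the rotation that minimizes the RMSD between two sets of matched coordinates, for example Cα atoms. This sketch assumes NumPy is available and that the atom-to-atom correspondence is already known.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid-body superposition."""
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two arrays of shape (N, 3)")
    P_c = P - P.mean(axis=0)  # remove translation by centering both sets
    Q_c = Q - Q.mean(axis=0)
    H = P_c.T @ Q_c  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # -1 would mean a reflection, not a rotation
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T  # proper rotation mapping P_c onto Q_c
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum() / len(P)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P = rng.normal(size=(10, 3))
    c, s = np.cos(0.7), np.sin(0.7)
    Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    Q = P @ Rz.T + np.array([1.0, -2.0, 3.0])  # rotated and translated copy
    print(kabsch_rmsd(P, Q))  # essentially zero
```

The determinant check matters. Without it, the SVD solution can return a mirror image, which would give a misleadingly low RMSD for a chiral molecule.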

Conclusions
Much of the current understanding of protein folding was theorized long ago. Vague and speculative ideas have since been replaced by carefully defined theoretical concepts and rigorous experimental observations.

Conclusions
The polypeptide backbone is the most important determinant of structure. SSEs are "meta-stable", so the statement that sequence determines structure is not wholly accurate. A more accurate statement is that sequence chooses from a limited set of available SSEs and determines how they are arranged in space.

Conclusions
Free-energy differences between alternative conformations are not large; this may provide a basis for rapid evolutionary change.

CASP
"A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction", John Moult. CASP (Critical Assessment of Structure Prediction) was first held in 1994 and every 2 years afterwards. Teams make structure predictions from sequence alone.

CASP
There are two categories of predictors:
- Automated: automatic servers must complete their analysis within 48 hours. This shows what is possible through computer analysis alone.
- Non-automated: groups spend considerable time and effort on each target, combining computational techniques with human analysis.

CASP
CASP6, 2004:
- 200 prediction teams from 24 countries
- Over 30,000 predictions for 64 protein targets collected and evaluated
- A conference held afterwards to discuss the results, with many teams presenting their individual results and methodologies
This helps to steer future work.

Modeling classes
- Comparative modeling based on a clear sequence relationship
- Modeling based on more distant evolutionary relationships
- Modeling based on non-homologous fold relationships
- Template-free modeling

Comparative modeling based on a clear sequence relationship
There is an easily detectable sequence relationship between the target protein and one or more known protein structures, typically found through BLAST. The model is copied from the template; however:
- Target and template sequences must be aligned.
- In general, reliably building regions not present in the template is still a challenge.
- Side-chain accuracy is poor.
- Refinement remains a challenge.
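
The first step is aligning target and template. BLAST itself is a heuristic local-alignment search. As a simpler stand-in, this sketch performs exact global alignment by dynamic programming (Needleman-Wunsch) with toy scores: match +1, mismatch -1, and a linear gap penalty of -2. A real pipeline would use a substitution matrix such as BLOSUM62. The sequences in the usage example are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    s, target_aln, template_aln = needleman_wunsch("MKTAYIAKQR", "MKTAHIAKR")
    print(s)
    print(target_aln)
    print(template_aln)
```

Global alignment suits the case where target and template are expected to match end to end. Local alignment (Smith-Waterman, or BLAST's heuristic) is the better choice when only a domain is shared.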

Comparative modeling based on a clear sequence relationship
Progress in MD is needed for refinement. Models are useful for identifying which members of a protein family have similar functions and which differ.

Modeling based on more distant evolutionary relationships
These methods use PSI-BLAST and hidden Markov models. A profile is compiled for the target sequence and compared against other known profiles, so structures can be predicted even when sequence similarity is low. Between CASP4 and CASP5, the use of metaservers to find consensus structures led to improved accuracy.
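
The profile idea can be sketched as a position-specific scoring matrix (PSSM). Per-column residue frequencies from a multiple alignment are converted into log-odds scores against a background distribution. This is a simplification, not the actual procedure of PSI-BLAST or an HMM. It assumes an ungapped alignment, a uniform background, a flat pseudocount and no sequence weighting, and the aligned sequences are made up.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log2-odds scores per column of an ungapped alignment, against a uniform background."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        # Pseudocounts keep unseen residues from getting a score of minus infinity.
        pssm.append({aa: math.log2(((counts[aa] + pseudocount) / total) / background)
                     for aa in AMINO_ACIDS})
    return pssm


def score_sequence(pssm: list[dict[str, float]], seq: str) -> float:
    """Sum of per-position log-odds scores for an ungapped sequence of profile length."""
    if len(seq) != len(pssm):
        raise ValueError("sequence length must match the profile length")
    return sum(column[aa] for column, aa in zip(pssm, seq))


if __name__ == "__main__":
    profile = build_pssm(["ACDE", "ACDE", "ACDF", "GCDE"])
    print(score_sequence(profile, "ACDE"), score_sequence(profile, "WWWW"))
```

Because the scores come from the whole family rather than a single sequence, a distant homolog that keeps the conserved positions can still score well against the profile.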

Modeling based on more distant evolutionary relationships
Limitations:
- The correct template may not be identified.
- Aligning the target sequence to the template is not trivial.
- A significant fraction of residues will have no structural equivalent in the template; modeling of these regions is hit or miss.
- Although target and template structures are similar, they are not identical, and the greater the difference, the higher the error.
Details are therefore not accurate, but the overall structure can be useful. Further improvement requires combining these methods with template-free methodologies.

Modeling based on more distant evolutionary relationships

Modeling based on non-homologous fold relationships
Protein "threading". In recent CASP experiments, these methods have not been competitive with template-free modeling.
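
Threading scores how well a target sequence fits a known fold, rather than how similar it is to the template's sequence. The toy below is a hypothetical, heavily simplified version. The template is reduced to a string of buried (B) and exposed (E) sites, residues are crudely classed as hydrophobic or not with made-up scores, and the target is slid along the template without gaps. Real threading methods use statistical potentials and gapped alignment.

```python
HYDROPHOBIC = set("AVILMFWC")  # crude, assumed classification for illustration


def environment_score(residue: str, environment: str) -> int:
    """Toy compatibility of a residue with a buried ('B') or exposed ('E') template site."""
    hydrophobic = residue in HYDROPHOBIC
    if environment == "B":
        return 1 if hydrophobic else -1  # reward buried hydrophobics, penalize buried polars
    return 0 if hydrophobic else 1       # exposed sites mildly prefer polar residues


def best_ungapped_threading(target: str, template_env: str) -> tuple[int, int]:
    """Slide the target along the template's environment string; return (score, offset)."""
    if len(target) > len(template_env):
        raise ValueError("target is longer than the template")
    best = None
    for offset in range(len(template_env) - len(target) + 1):
        s = sum(environment_score(r, e) for r, e in zip(target, template_env[offset:]))
        if best is None or s > best[0]:
            best = (s, offset)
    return best


if __name__ == "__main__":
    print(best_ungapped_threading("KLLLK", "EEEBBBEEE"))
```

Here the call returns `(5, 2)`: at offset 2 the hydrophobic core `LLL` lands on the three buried sites and both lysines sit on exposed sites.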

Template-free Modeling
Used for sequences where no template is available. Historically, physics-based approaches were used. Newer methods focus on substructures: while we have not seen all folds, we have probably seen nearly all substructures. These methods make use of substructure relationships, ranging from a few residues through SSEs to super-secondary structures.

Template-free Modeling
A range of possible conformations is considered. The most successful package has been ROSETTA. For proteins of fewer than ~100 residues, it produces one or several approximately correct structures (4-6 Å RMSD over Cα atoms). Selecting the most accurate structures from all candidates remains unsolved; current methods typically rely on clustering. Development of accurate atomic-level models is crucial to further progress.
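
Clustering-based selection assumes that near-native conformations are sampled more often than others. The decoy with the most structural neighbours is therefore a reasonable guess. A minimal sketch, assuming a precomputed pairwise Cα RMSD matrix and a hypothetical 3 Å neighbour cutoff:

```python
def select_cluster_center(rmsd: list[list[float]], cutoff: float = 3.0) -> int:
    """Index of the decoy with the most neighbours within `cutoff` angstroms RMSD."""
    n = len(rmsd)
    if n == 0 or any(len(row) != n for row in rmsd):
        raise ValueError("expected a non-empty square RMSD matrix")
    neighbour_counts = [
        sum(1 for j in range(n) if j != i and rmsd[i][j] <= cutoff)
        for i in range(n)
    ]
    # max() returns the first maximal index, so ties go to the lowest index.
    return max(range(n), key=lambda i: neighbour_counts[i])


if __name__ == "__main__":
    # Made-up pairwise RMSDs (angstroms) for four decoys.
    rmsd = [
        [0.0, 1.5, 4.0, 8.0],
        [1.5, 0.0, 2.5, 7.5],
        [4.0, 2.5, 0.0, 9.0],
        [8.0, 7.5, 9.0, 0.0],
    ]
    print(select_cluster_center(rmsd))
```

In this made-up example, decoy 1 has two neighbours within 3 Å, more than any other decoy, so it is selected. The outcome depends on the cutoff, which in practice has to be tuned to protein size and sampling density.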

Template-free Modeling

CASP Progress