Protein Structure Prediction

Historical Perspective
Protein Folding: From the Levinthal Paradox to Structure Prediction, Barry Honig, 1999
A personal perspective on advances and developments in protein folding over the last 40 years

Levinthal Paradox
Cyrus Levinthal, Columbia University, 1968
Observed that there is insufficient time to randomly search the entire conformational space of a protein
Resolution: proteins must fold through some directed process
Goal is to understand the dynamics of this process
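
The size of the search problem on the slide above can be made concrete with a back-of-the-envelope calculation. The numbers used here (100 residues, 3 conformational states per residue, 10^13 conformations sampled per second) are illustrative assumptions, not figures from the talk; the function name is also hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_search_time(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Return (number of conformations, years needed to visit each one once).

    All three parameters are assumed values chosen for illustration.
    """
    conformations = states_per_residue ** n_residues
    seconds = conformations / samples_per_second
    return conformations, seconds / SECONDS_PER_YEAR


conformations, years = levinthal_search_time(100)
print(f"{conformations:.2e} conformations, about {years:.1e} years to enumerate")
```

Under these assumptions the exhaustive search takes on the order of 10^27 years, which is why a random search cannot explain folding on biological timescales.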

Old vs. New Views
Old: hierarchical view of protein folding
  Secondary structures form, then interact to form tertiary structures
  General order of events
New: statistical ensembles of states
  Potential energy landscape
  Folding “funnel”
The two views are not all that different; the most important ideas were theorized many years ago

Secondary Structures
Consensus view is that secondary structure formation is the earliest part of the folding process
Numerous studies indicate that local sequence codes for local structures
Helical sequences in a folded protein tend to be helical in isolation
Prediction algorithms for secondary structure elements (SSEs) were about 70% correct as of 1993
The failures indicate that some tertiary interactions are involved in stabilizing SSEs
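
A figure such as "about 70% correct" is usually a per-residue accuracy. The sketch below assumes the common three-state measure (helix H, strand E, coil C), often called Q3; the slide does not say which measure was used, and the example strings are made up.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose three-state label (H, E or C) is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical labels for an 18-residue fragment.
observed = "CCHHHHHHCCEEEEECCC"
predicted = "CCHHHHCCCCEEEECCCC"
print(f"Q3 = {q3_accuracy(predicted, observed):.2f}")
```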

However…
It is not clear which sequence elements code for overall topology
One factor is the existence of hydrophobic faces on the surface of SSEs
Predicting the topology of SSEs remains a challenge, even when the protein class is known

Atomic-level calculations
Molecular calculations have had a great impact on our understanding of protein folding
  Harold Scheraga, 1968
  Shneior Lifson, 1969
  Martin Karplus’s laboratory, ~1979
Early calculations had trouble dealing with solvent effects

Secondary Structure
Many of the essential elements of protein energetics can be derived from looking at SSE formation
Early experimental work:
  Ingwall et al., 1968
  Baldwin et al., 1989: worked on stabilizing shorter helices
  Dyson and Wright, 1991: demonstrated that even short peptides in solution can be partially structured

Results
Yang and Honig, 1995
Alpha-helices are stabilized by hydrophobic interactions and close packing; hydrogen bonding has little effect
Beta-sheets are stabilized by non-polar interactions between residues on adjacent strands
The work supports the idea that SSEs are coded for locally in the sequence

Folding Pathways
SSEs can change conformation in the presence of a relatively small number of tertiary interactions
The free-energy difference between alpha-helix, beta-sheet, and coil is not great
Individual helices can be changed into beta-sheets by changing just a few amino acids
This suggests that proteins have a “structural plasticity” which allows for changes in conformation

Folding Pathways
Early in the folding process, many different combinations of SSEs have very similar stabilities
In the end, it is the tertiary interactions which drive the chain towards the native topology
Early in folding, SSEs “flicker”; they are eventually stabilized by tertiary interactions and converge to the native state
This suggests that multiple folding pathways exist, all of which can lead to the same end result once stabilized

Structure Prediction
Recently, the field has split into two problems
Protein prediction problem
  Trying to predict the end result of folding, relying heavily on comparison between known and unknown structures
Protein folding problem
  Trying to understand the folding path which leads to the end result of folding, typically by MD simulations or energy calculations
Author's contention is that both areas will need to be used together to fully understand protein folding

PrISM
Yang and Honig, 1999
Software suite which integrates prediction based on simulations with known information about structures
  Sequence analysis
  Structure-based sequence alignment
  Fast structure-structure superposition using a structural domain database
  Multiple structure alignment
  Fold recognition and homology model building
Used to make predictions for all 43 targets of the CASP3 conference (more on CASP later)

Conclusions
Much of the current understanding of protein folding was theorized long ago
Vague and speculative ideas have been replaced by carefully defined theoretical concepts and rigorous experimental observations

Conclusions
The polypeptide backbone is the most important determinant of structure
SSEs are “meta-stable”; the statement that sequence determines structure is not wholly accurate
A more accurate statement is that sequence chooses from a limited set of available SSEs and determines how they are ordered in space

Conclusions
Free-energy differences between alternate conformations are not large: this may provide a basis for rapid evolutionary change

CASP
A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction, John Moult
CASP = Critical Assessment of Structure Prediction
First held in 1994, and every 2 years afterwards
Teams make structure predictions from sequences alone

CASP
Two categories of predictors
Automated
  Automatic servers, which must complete their analysis within 48 hours
  Shows what is possible through computer analysis alone
Non-automated
  Groups spend considerable time and effort on each target
  Use both computer techniques and human analysis

CASP
CASP6, 2004
200 prediction teams from 24 countries
Over 30,000 predictions for 64 protein targets collected and evaluated
Conference held afterwards to discuss results, with many teams presenting individual results and methodologies
Helps to steer future work

Modeling classes
Comparative modeling based on a clear sequence relationship
Modeling based on more distant evolutionary relationships
Modeling based on non-homologous fold relationships
Template-free modeling

Comparative modeling based on a clear sequence relationship
Easily detectable sequence relationship between the target protein and one or more known protein structures, typically found through BLAST
Coordinates are copied from the template, however:
  Target and template sequences must first be aligned
  In general, reliably building regions not present in the template is still a challenge
  Sidechain accuracy is poor
  Refinement remains a challenge
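
The alignment step, and the way it exposes regions with no template coordinates, can be sketched with a textbook Needleman-Wunsch global alignment. This is a minimal illustration: the match, mismatch and gap scores are assumed values (real pipelines typically use a substitution matrix and affine gap penalties), and the two sequences are invented.

```python
def global_align(target, template, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(target), len(template)
    # score[i][j] = best score for aligning target[:i] with template[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if target[i - 1] == template[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if target[i - 1] == template[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                a.append(target[i - 1])
                b.append(template[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(target[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(template[j - 1])
            j -= 1
    return "".join(reversed(a)), "".join(reversed(b))


def unmodeled_positions(aligned_target, aligned_template):
    """Target residue indices (0-based) that face a gap in the template."""
    positions, index = [], 0
    for t, s in zip(aligned_target, aligned_template):
        if t == "-":
            continue  # template residue with no target counterpart
        if s == "-":
            positions.append(index)
        index += 1
    return positions


aligned_target, aligned_template = global_align("MKTAYIAKQR", "MKTAYAKQR")
print(aligned_target)
print(aligned_template)
print("no template coordinates for target positions:",
      unmodeled_positions(aligned_target, aligned_template))
```

Residues returned by `unmodeled_positions` are the ones that cannot simply be copied from the template and would have to be built some other way.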

Comparative modeling based on a clear sequence relationship
Progress in MD is needed for refinement
Models are useful for identifying which members of a protein family have similar functionalities, and which are different

Modeling based on more distant evolutionary relationships
Makes use of PSI-BLAST and hidden Markov models
Compile a profile for the sequence, then compare this profile to other known profiles
Allows for prediction of structures even when the sequences are not close
Between CASP4 and CASP5, the use of metaservers to find consensus structures led to improved accuracy
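
The idea of a profile can be illustrated with a toy position-specific scoring scheme. This is a simplified sketch, not PSI-BLAST's actual procedure: it assumes an ungapped alignment, a uniform background frequency and a fixed pseudocount, and it scores a single sequence against a profile rather than comparing two profiles. The family sequences are invented.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_seqs, pseudocount=1.0):
    """Per-column log-odds scores from an ungapped alignment of equal-length sequences."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    profile = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned_seqs:
            counts[seq[col]] += 1
        total = sum(counts.values())
        profile.append({aa: math.log2((count / total) / background)
                        for aa, count in counts.items()})
    return profile


def score_sequence(profile, seq):
    """Sum of per-position log-odds scores for a sequence of the profile's length."""
    if len(seq) != len(profile):
        raise ValueError("sequence length must match profile length")
    return sum(column[aa] for column, aa in zip(profile, seq))


family = ["MKVLA", "MRVLA", "MKILA", "MKVLG"]  # hypothetical aligned family
profile = build_profile(family)
print("family-like sequence:", round(score_sequence(profile, "MKVLA"), 2))
print("unrelated sequence:  ", round(score_sequence(profile, "GPWDE"), 2))
```

A sequence that follows the family's per-position preferences scores well above an unrelated one, which is what lets profile methods detect relationships that a plain pairwise comparison would miss.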

Modeling based on more distant evolutionary relationships
Limitations:
  The correct template may not be identified
  Alignment of the target sequence to the template is not trivial
  A significant fraction of residues will have no structural equivalent in the template; modeling of these regions is hit or miss
  Although regions are similar, they are not identical, and the greater the difference, the higher the error
Details are thus not accurate, but the overall structure can be useful
For improvements, these methods must work together with template-free methodologies

Modeling based on non-homologous fold relationships
Protein “threading”
In recent CASP experiments, these methods have not been competitive with template-free models

Template-free Modeling
For sequences where no template is available
Historically, physics-based approaches were used
Newer methods focus on substructures
  While we have not seen all folds, we have probably seen nearly all substructures
  Make use of substructure relationships
  From a few residues through SSEs to super-secondary structures

Template-free Modeling
A range of possible conformations is considered
Most successful package has been ROSETTA
For proteins of less than ~100 residues, it produces one or several approximately correct structures (4-6 Å RMSD for C-alpha atoms)
Selecting the most accurate structures from all possibilities is still unsolved; clustering is typically used at present
Development of atomic models is crucial to further progress
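
The C-alpha RMSD measure and the clustering-based selection mentioned above can be sketched as follows. This is a generic illustration, not ROSETTA's implementation: it assumes the candidate structures are already superposed (a real pipeline would first compute an optimal superposition), the coordinates are toy values, and the cutoff is arbitrary.

```python
import math


def ca_rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) C-alpha coordinates.

    Assumes the two structures are already superposed.
    """
    if len(a) != len(b) or not a:
        raise ValueError("structures must be non-empty and of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(a, b))
    return math.sqrt(squared / len(a))


def largest_cluster_center(decoys, cutoff):
    """Index of the decoy with the most neighbours within `cutoff`, and that count."""
    best_index, best_count = -1, -1
    for i, decoy_i in enumerate(decoys):
        count = sum(1 for j, decoy_j in enumerate(decoys)
                    if j != i and ca_rmsd(decoy_i, decoy_j) <= cutoff)
        if count > best_count:
            best_index, best_count = i, count
    return best_index, best_count


# Four hypothetical three-residue decoys; the last one is an outlier.
decoys = [
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)],
    [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (7.6, 0.5, 0.0)],
    [(0.0, -0.5, 0.0), (3.8, -0.5, 0.0), (7.6, -0.5, 0.0)],
    [(0.0, 0.0, 9.0), (3.8, 0.0, 9.0), (7.6, 0.0, 9.0)],
]
index, neighbours = largest_cluster_center(decoys, cutoff=0.75)
print(f"selected decoy {index} with {neighbours} neighbours")
```

The design choice is to trust the conformation that many independent candidates agree on, since the scoring function alone may not single out the most accurate model.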
CASP Progress