1
Protein Structure Prediction
Ming-Jing Hwang (黃明經) N121, Institute of Biomedical Sciences Academia Sinica
2
Objective: To understand the what, why, and how of protein structure prediction, as well as its current status and uses.
3
Science 2005
4
Why structure? Most proteins fold to function. Structure allows us to understand how a protein functions, often with mechanistic details, more than sequence can. With the knowledge we can design experiments to further probe the protein’s function, or, in the case of a disease protein, devise ways to counter the disease process (e.g. drug design).
5
Ex: Structure & function of potassium channel
MacKinnon, 1998 (2003 Nobelist)
6
Some structural biology Nobel winners
F. H. C. Crick, J. D. Watson, M. H. C. Wilkins (Physiology or Medicine, 1962) for their discoveries concerning the molecular structure of nucleic acids and its significance for information transfer in living material
M. F. Perutz, Sir J. C. Kendrew (Chemistry, 1962) for their studies of the structures of globular proteins
D. Crowfoot Hodgkin (Chemistry, 1964) for her determinations by X-ray techniques of the structures of important biochemical substances
Sir A. Klug (Chemistry, 1982) for his development of crystallographic electron microscopy and his structural elucidation of biologically important nucleic acid-protein complexes
J. Deisenhofer, R. Huber, H. Michel (Chemistry, 1988) for the determination of the three-dimensional structure of a photosynthetic reaction centre
P. D. Boyer, J. E. Walker, J. C. Skou (Chemistry, 1997) for their elucidation of the enzymatic mechanism underlying the synthesis of adenosine triphosphate (ATP) [Boyer, Walker]; for the first discovery of an ion-transporting enzyme, Na+, K+-ATPase [Skou]
J. B. Fenn, K. Tanaka, K. Wüthrich (Chemistry, 2002) for the development of methods for identification and structure analyses of biological macromolecules: for their development of soft desorption ionisation methods for mass spectrometric analyses of biological macromolecules [Fenn, Tanaka]; for his development of nuclear magnetic resonance spectroscopy for determining the three-dimensional structure of biological macromolecules in solution [Wüthrich]
R. D. Kornberg (Chemistry, 2006) for his studies of the molecular basis of eukaryotic transcription
V. Ramakrishnan, T. A. Steitz, A. E. Yonath (Chemistry, 2009) for studies of the structure and function of the ribosome
7
Why prediction? Structure determination by experimental methods (X-ray, NMR, etc.) is still hard, especially with obstacles at early steps (e.g. expression and crystallization). Prediction helps to bridge the widening gap between sequence and structure.
8
Sequence/Structure Gap
As of June 2, 2009, the number of entries in the protein sequence and structure databases: SWISS-PROT/TrEMBL: 468,851/7,916,844; PDB: 57,835. (Figure labels: Sequence, Structure)
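The size of the gap follows directly from the counts on this slide. The short sketch below only divides the quoted numbers; the function and variable names are made up for illustration.

```python
# Entry counts quoted on the slide (2 June 2009).
SWISS_PROT = 468_851
TREMBL = 7_916_844
PDB = 57_835


def sequences_per_structure(n_sequences: int, n_structures: int) -> float:
    """Return how many sequence entries exist for each structure entry."""
    return n_sequences / n_structures


ratio_all = sequences_per_structure(SWISS_PROT + TREMBL, PDB)
ratio_curated = sequences_per_structure(SWISS_PROT, PDB)

print(f"All sequence entries per PDB entry: {ratio_all:.0f}")
print(f"SWISS-PROT entries per PDB entry: {ratio_curated:.1f}")
```

With these counts there are roughly 145 sequence entries for every PDB entry, or about 8 if only the SWISS-PROT entries are counted.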
9
Structural genomics and drug design
Structural genomics: homology modeling (HM) as workhorse. Baker & Sali, 2001
10
Structure prediction: 1D->3D, then Function
MADWVTGKVTKVQNWTDALFSLTVHAPVLPFTAGQFTKLGLEIDGERVQRAYSYVNSPDNPDLEFYLVTVPDGKLSPRLAALKPGDEVQVVSEAAGFFVLDEVPHCETLWMLATGTAIGPYLSILR UNKNOWN KNOWN
11
Prediction is very hard, especially if you are predicting unknowns.
12
Why do we believe in prediction at all?
Christian Anfinsen, in an elegant experiment in 1957, showed that ribonuclease A (124 aa’s), after having been completely denatured using 8 M urea and 2-mercaptoethanol (2-ME), regained full enzymatic activity when the urea and 2-ME were slowly removed by dialysis. All the information needed to fold is contained within the primary sequence.
13
Theory of Structure Prediction
Energy Landscape Theory of Structure Prediction. Nature makes the landscapes of real proteins funneled. You have to work to make the energy landscapes of structure prediction schemes funneled. Let me show you some of the things you have to consider. Zaida (Zan) Luthey-Schulten
14
How to do 1D->3D? (I) Physics-based approach: computing energy as a function of structure (surfing the energy surface)
15
Molecular Mechanics (Force Field)
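As a rough illustration of "energy as a function of structure", the sketch below sums a harmonic bond term, a 12-6 Lennard-Jones term and a Coulomb term over a set of atoms. It is a toy under stated assumptions, not the force field shown on the slide: every parameter value and function name is made up for illustration, angle and torsion terms are left out, and only directly bonded pairs are excluded from the nonbonded sum.

```python
import math


def bond_energy(r, r0=1.53, k=300.0):
    """Harmonic bond stretch k * (r - r0)^2 (illustrative parameters)."""
    return k * (r - r0) ** 2


def lennard_jones(r, epsilon=0.1, sigma=3.4):
    """12-6 Lennard-Jones energy for one nonbonded pair (illustrative parameters)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, qi, qj, ke=332.06):
    """Coulomb energy; ke converts e^2 / angstrom to kcal/mol."""
    return ke * qi * qj / r


def total_energy(coords, bonds, charges):
    """Sum bonded and nonbonded terms for a list of 3D coordinates."""
    bonded = {frozenset(pair) for pair in bonds}
    energy = sum(bond_energy(math.dist(coords[i], coords[j])) for i, j in bonds)
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if frozenset((i, j)) in bonded:
                continue  # bonded pairs are handled by the bond term only
            r = math.dist(coords[i], coords[j])
            energy += lennard_jones(r) + coulomb(r, charges[i], charges[j])
    return energy


# Three neutral atoms in a line, each bond at its rest length.
coords = [(0.0, 0.0, 0.0), (1.53, 0.0, 0.0), (3.06, 0.0, 0.0)]
energy = total_energy(coords, bonds=[(0, 1), (1, 2)], charges=[0.0, 0.0, 0.0])
print(f"Toy energy: {energy:.3f} kcal/mol")
```

Moving any atom changes the distances and therefore the energy, which is what "surfing the energy surface" refers to.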
16
Levitt
18
A POP study: 1-microsecond MD simulation
980 ns; villin headpiece, 36 a.a.; 3000 H2O; 12,000 atoms; 256 CPUs (CRAY); ~4 months; single trajectory. Duan & Kollman, 1998
19
Science 2010 (1 millisecond; previous longest 10 microseconds; Amber FF). Fig. 1 Folding proteins at x-ray resolution, showing comparison of x-ray structures (blue) (15, 24) and last frame of MD simulation (red): (A) simulation of villin at 300 K, (B) simulation of FiP35 at 337 K. Simulations were initiated from completely extended structures. Villin and FiP35 folded to their native states after 68 µs and 38 µs, respectively, and simulations were continued for an additional 20 µs after the folding event to verify the stability of the native fold.
20
Massively distributed computing
Letters to Nature (2002): engineered protein (BBA5), zinc finger fold (w/o metal), 23 a.a.; solvation model; thousands of trajectories, each of 5-20 ns, totaling 700 µs; 30,000 internet volunteers; several months, or ~a million CPU days of simulation
21
Worldwide distributed computing
Pande group
22
Massively distributed computing
…
23
The problem: timescales
Figure: timescale axis in seconds: 10^-15 (femto), 10^-12 (pico), 10^-9 (nano), 10^-6 (micro), 10^-3 (milli), 10^0 (seconds).
Events along the axis, fastest to slowest: bond vibration, isomerisation, water dynamics, helix formation, fastest folders, typical folders, slow folders.
Annotations: MD step; long MD run; "where we need to be"; "where we’d love to be"; 16 orders of magnitude range.
Femtosecond timesteps. Need to simulate micro- to milliseconds. Pande group
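A back-of-the-envelope calculation makes the timescale problem concrete. The sketch below counts femtosecond MD steps for a target simulated time and converts that to wall-clock years at a reference throughput. The reference rate is an assumption taken from the Duan & Kollman run quoted earlier (about 1 µs in about 4 months, with 4 months taken as 120 days); the function names are made up for illustration.

```python
TIMESTEP_S = 1e-15  # one femtosecond per MD step

# Assumed reference throughput: ~1000 ns simulated in ~120 days (1998 hardware).
REFERENCE_NS_PER_DAY = 1000 / 120


def md_steps(target_s, dt=TIMESTEP_S):
    """Number of MD steps needed to cover target_s seconds of simulated time."""
    return round(target_s / dt)


def years_needed(target_s, ns_per_day=REFERENCE_NS_PER_DAY):
    """Wall-clock years for a single trajectory at a fixed ns/day throughput."""
    target_ns = target_s * 1e9
    return target_ns / ns_per_day / 365.25


for label, seconds in [("nanosecond", 1e-9), ("microsecond", 1e-6), ("millisecond", 1e-3)]:
    print(f"{label}: {md_steps(seconds):.0e} steps, "
          f"{years_needed(seconds):.3g} years at the reference rate")
```

At that rate a single millisecond trajectory would take centuries, which is why the slides turn next to distributed computing and to knowledge-based shortcuts.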
24
Biology Can’t Wait! (Evolution to rescue)
One Big Family.
25
How to do 1D->3D ab initio
(II) Biology-based approach: data (knowledge)-mining. Ignore the actual folding process in the cell; instead, focus on the end point!
26
The 1-2-3 (1D -> fragment -> 3D) approach
Primary: LGINCRGSSQCGLSGGNLMVRIRDQACGNQGQTWCPGERRAKVCGTGNSISAYVQSTNNCISGTEACRHLTNLVNHGCRVCGSDPLYAGNDVSRGQLTVNYVNSC
seq. to str. mapping -> fragments (structural motifs) -> fragment assembly -> Tertiary
27
The I-sites library (Baker’s group)
28
Fragment insertion Monte Carlo
Rosetta: a folding simulation program (a trial-and-error process). Fragment insertion Monte Carlo on backbone torsion angles: choose a fragment, change backbone angles, convert to 3D, evaluate with the energy function, then accept or reject the fragment. course/ 2002/ cbio/ handouts/ Class8
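The loop on this slide can be sketched in a few lines. What follows is a toy under stated assumptions, not Rosetta itself: the two-entry fragment library, the (phi, psi) values and the energy function (a count of residues that are not helical) are all hypothetical stand-ins, and the "convert to 3D" step is skipped because the toy energy is computed directly from the torsion angles. Acceptance uses the standard Metropolis criterion.

```python
import math
import random

HELIX = (-60.0, -45.0)     # illustrative (phi, psi) pairs, not from the slides
STRAND = (-120.0, 130.0)
EXTENDED = (180.0, 180.0)

# Hypothetical fragment library: 3-residue stretches of backbone torsions.
FRAGMENTS = [[HELIX] * 3, [STRAND] * 3]


def toy_energy(angles, target=HELIX):
    """Toy stand-in for an energy function: residues not in the target state."""
    return sum(1 for pair in angles if pair != target)


def fragment_insertion_mc(n_res, fragments, energy, steps=500, kT=0.5, seed=0):
    """Metropolis Monte Carlo over backbone torsions using fragment insertions."""
    rng = random.Random(seed)
    angles = [EXTENDED] * n_res                       # start from an extended chain
    current = energy(angles)
    for _ in range(steps):
        frag = rng.choice(fragments)                  # choose a fragment
        start = rng.randrange(n_res - len(frag) + 1)  # choose where to insert it
        trial = angles[:start] + list(frag) + angles[start + len(frag):]
        trial_energy = energy(trial)                  # evaluate
        delta = trial_energy - current
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            angles, current = trial, trial_energy     # accept; otherwise reject
    return angles, current


final_angles, final_energy = fragment_insertion_mc(12, FRAGMENTS, toy_energy)
print(f"Final toy energy: {final_energy} (started at 12)")
```

Because the temperature-like parameter kT is nonzero, uphill moves are sometimes accepted, which lets the search escape local minima at the cost of occasionally undoing a good insertion.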
29
Does it work? The CASP experiments
30
One lab dominated in CASP4
Baker’s group dominates the ab initio (knowledge-based) prediction in CASP4
31
Some CASP4 successes Baker’s group
32
# of residues with cRMS below 4Å/6Å
ROSETTA results at CASP5: # of residues with cRMS below 4Å/6Å
Name    Length    Human    Automatic    Best decoy
T135    106       83/98    54/64        94/105
T149    116       52/71    44/62        76/92
T161    154       45/83    57/79        55/95
Rosetta predictions in CASP5: Successes, failures, and prospect for complete automation. Baker et al., Proteins, 53: (2003)
33
Toward High-Resolution de Novo Structure Prediction for Small Proteins
--Philip Bradley, Kira M. S. Misura, David Baker (Science 2005) The prediction of protein structure from amino acid sequence is a grand challenge of computational molecular biology. By using a combination of improved low- and high-resolution conformational sampling methods, improved atomically detailed potential functions that capture the jigsaw puzzle–like packing of protein cores, and high-performance computing, high-resolution structure prediction (<1.5 angstroms) can be achieved for small protein domains (<85 residues). The primary bottleneck to consistent high-resolution prediction appears to be conformational sampling.
34
Still, not practical for most …
Small proteins only. Expensive (computationally): sampling. Not for everyday biologists …
35
HM: the poor man’s solution
36
Similar sequences
37
Similar structures with low sequence similarity
9% sequence identity Shapiro & Harris, 2000
38
Another example: FtsZ and tubulin would not be recognized as homologous by sequence comparison. Burns, R., Nature 391: (1998)
39
Fold recognition: Query sequence -> Library of known folds -> Best-fit fold
Mark Gerstein Lab
40
FR by threading
Thread the query sequence onto the fold template. Use structural properties to evaluate the fit: environment and pairwise interactions. Mark Gerstein Lab
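A minimal sketch of the threading idea, under strong simplifying assumptions: each template position is labelled only as buried ("B") or exposed ("E"), the score rewards hydrophobic residues in buried positions and polar residues in exposed ones, no gaps are allowed, and the pairwise interaction term is left out. The fold names, the hydrophobic residue set and the scoring values are all hypothetical.

```python
HYDROPHOBIC = set("AVILMFWC")  # illustrative choice of hydrophobic residues


def environment_score(residue, environment):
    """+1 if the residue type suits the environment ('B' or 'E'), else -1."""
    is_hydrophobic = residue in HYDROPHOBIC
    is_buried = environment == "B"
    return 1 if is_hydrophobic == is_buried else -1


def thread(sequence, template):
    """Gapless threading: return (best score, offset) over all placements."""
    best = (float("-inf"), None)
    for offset in range(len(template) - len(sequence) + 1):
        score = sum(
            environment_score(res, template[offset + i])
            for i, res in enumerate(sequence)
        )
        if score > best[0]:
            best = (score, offset)
    return best


def recognize_fold(sequence, library):
    """Return (fold name, score, offset) of the best-fitting template."""
    name = max(library, key=lambda fold: thread(sequence, library[fold])[0])
    score, offset = thread(sequence, library[name])
    return name, score, offset


# Hypothetical fold library: one environment string per template.
library = {"foldA": "BBEEBBEE", "foldB": "EEEEBBBB"}
best_fold = recognize_fold("LLKKVVDD", library)
print(best_fold)
```

Real threading programs add gaps, pairwise contact potentials and statistical significance estimates; the sketch keeps only the "score the sequence in each structural environment" step.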
41
Pitfalls of comparative (homology) modeling
Alignment errors are difficult to detect and correct. Models tend to be more similar to the template than to the true structure. Cannot predict novel folds (template may be wrong!).
42
Structure Prediction Methods
Figure: prediction methods (homology modeling, fold recognition, ab initio) plotted against % sequence identity, with the twilight zone marked.
43
Protein Structure Prediction
clickable map
44
Reliability and uses of comparative models
Marti-Renom et al. (2000)
45
Success and limitations of structure prediction
Limitations: Models of large and remotely related proteins are not very accurate. Domain boundaries are difficult to define. Models often do not provide enough detail for functional annotation.
Success: Accuracy scores almost doubled from CASP1 to CASP6, possibly because of database size. Models of small targets are very accurate.
Kryshtafovych et al 2005 Manager/ Files/ Panchenko/ shaitan_kurs_lab2.ppt
46
Structural Bioinformatics: Sequence/Structure Relationship
Figure: nested sets (all possible sequences of amino acids; protein sequences observed in nature; protein structures observed in nature) shown against a percent identity axis running from 100 down to 10, with the twilight zone and midnight zone marked.
47
Final exam assignment
Find a protein sequence of any organism sharing no greater than 40% sequence identity with any accessible entry in PDB.
Predict the 3D structure of your protein using whatever method/tool/server/database.
Write a ~5 page report to document how you found the sequence, how you did (or got) the prediction, and how you visualize/describe the predicted model, along with thoughts/comments on your learning process.
Submit your report to Cathy by 6/22/2011.
Need help? Ask, read/surf, and try it!
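The 40% identity cutoff in the assignment can be checked from a pairwise alignment. The sketch below is one hedged way to do it: it assumes the two sequences are already aligned (equal length, "-" for gaps) and divides the number of matches by the number of columns where neither sequence has a gap. Other tools use other denominators, so values near the cutoff should be compared with care. The function names and example strings are made up for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_assignment_cutoff(aligned_a, aligned_b, cutoff=40.0):
    """True if the pair shares no more than `cutoff` percent identity."""
    return percent_identity(aligned_a, aligned_b) <= cutoff


# Hypothetical aligned pair: 6 comparable columns, 5 of them identical.
identity = percent_identity("ACDE-FG", "ACDQWFG")
print(f"{identity:.1f}% identity, meets cutoff: {meets_assignment_cutoff('ACDE-FG', 'ACDQWFG')}")
```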
48
PDB: the one-stop shop for structural bioinformatics
49
Selected Structural Biology Databases, Servers and Services
CASP-certified protein structure prediction servers: I-TASSER, ROBETTA, HHpred, METATASSER, MULTICOM, Pcons, SAM-T08, 3D-Jury, THREADER
Comparative modeling servers: SwissModel, MODELLER
Protein secondary structure prediction servers: PSIpred, JPRED
Database of protein structures: PDB - Protein Data Bank
Structural classifications of proteins: SCOP, CATH
Structural neighbors database: Dali Database
50
Thank You!
51
3D to 1D? Science 2003
52
A computer-designed protein (93 aa) with 1.2 Å resolution
53
Structure prediction servers
54
Hybrid approach for solving macromolecular complex structures
55
(Rost, 1996)
56
Levinthal’s paradox (1969)
If we assume three possible states for every flexible dihedral angle in the backbone of a 100-residue protein, the number of possible backbone configurations is 3^200. Even an incredibly fast computational or physical sampling in 10^-15 s would mean that a complete sampling would take 10^80 s, which exceeds the age of the universe by more than 60 orders of magnitude. Yet proteins fold in seconds or less! Berendsen
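The arithmetic behind the paradox is easy to reproduce. The sketch below assumes two flexible backbone dihedrals per residue (200 for a 100-residue protein), three states per dihedral, one femtosecond per sampled conformation, and an age of the universe of about 13.8 billion years; the last figure and all variable names are assumptions added for illustration.

```python
import math

STATES_PER_DIHEDRAL = 3
DIHEDRALS = 2 * 100            # assumed: two flexible backbone dihedrals per residue
SECONDS_PER_SAMPLE = 1e-15     # one conformation sampled per femtosecond
AGE_OF_UNIVERSE_S = 13.8e9 * 365.25 * 24 * 3600  # assumed ~13.8 billion years

# Work in log10 to keep the huge numbers readable.
log10_conformations = DIHEDRALS * math.log10(STATES_PER_DIHEDRAL)
log10_total_seconds = log10_conformations + math.log10(SECONDS_PER_SAMPLE)
orders_beyond_universe = log10_total_seconds - math.log10(AGE_OF_UNIVERSE_S)

print(f"Conformations: about 10^{log10_conformations:.0f}")
print(f"Exhaustive sampling time: about 10^{log10_total_seconds:.0f} s")
print(f"Orders of magnitude beyond the age of the universe: {orders_beyond_universe:.0f}")
```

The exact exponents depend on the assumed number of states and sampling rate, but no plausible choice brings an exhaustive search anywhere near the seconds in which real proteins fold.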
57
The Rosetta method
DECOYS: Generate a large number of possible shapes. Need good decoy structures.
DISCRIMINATION: Select the correct, native-like fold. Need a good energy function.
Kochl
58
Nature 2007