Protein Structure Prediction

Ming-Jing Hwang (黃明經), N121, Institute of Biomedical Sciences, Academia Sinica

Objective: To understand the what, why, and how of protein structure prediction, as well as its current status and uses.

Science 2005

Why structure?
Most proteins fold to function. Structure lets us understand how a protein functions, often in mechanistic detail, better than sequence alone can. With this knowledge we can design experiments to further probe the protein's function or, in the case of a disease protein, devise ways to counter the disease process (e.g. drug design).

Ex: Structure & function of potassium channel
MacKinnon, 1998 (2003 Nobelist)

Some structural biology Nobel laureates
F. H. C. Crick, J. D. Watson, M. H. C. Wilkins (Physiology or Medicine, 1962): for their discoveries concerning the molecular structure of nucleic acids and its significance for information transfer in living material
M. F. Perutz, Sir J. C. Kendrew (Chemistry, 1962): for their studies of the structures of globular proteins
D. Crowfoot Hodgkin (Chemistry, 1964): for her determinations by X-ray techniques of the structures of important biochemical substances
Sir A. Klug (Chemistry, 1982): for his development of crystallographic electron microscopy and his structural elucidation of biologically important nucleic acid-protein complexes
J. Deisenhofer, R. Huber, H. Michel (Chemistry, 1988): for the determination of the three-dimensional structure of a photosynthetic reaction centre
P. D. Boyer, J. E. Walker, J. C. Skou (Chemistry, 1997): for their elucidation of the enzymatic mechanism underlying the synthesis of adenosine triphosphate (ATP) [Boyer, Walker]; for the first discovery of an ion-transporting enzyme, Na+, K+-ATPase [Skou]
J. B. Fenn, K. Tanaka, K. Wüthrich (Chemistry, 2002): for the development of methods for identification and structure analyses of biological macromolecules; for their development of soft desorption ionisation methods for mass spectrometric analyses of biological macromolecules [Fenn, Tanaka]; for his development of nuclear magnetic resonance spectroscopy for determining the three-dimensional structure of biological macromolecules in solution [Wüthrich]
R. D. Kornberg (Chemistry, 2006): for his studies of the molecular basis of eukaryotic transcription
V. Ramakrishnan, T. A. Steitz, A. E. Yonath (Chemistry, 2009): for studies of the structure and function of the ribosome

Why prediction?
Structure determination by experimental methods (X-ray, NMR, etc.) is still hard, especially because of obstacles at early steps (e.g. expression and crystallization).
Prediction helps bridge the widening gap between sequence and structure.

Sequence/Structure Gap
As of June 02, 2009, the number of entries in the protein sequence and structure databases:
Sequence: SWISS-PROT/TREMBL: 468,851/7,916,844
Structure: PDB: 57,835

Structural genomics and drug design
Structural genomics: HM (homology modeling) as the workhorse
Baker & Sali, 2001

Structure prediction: 1D->3D, then Function
MADWVTGKVTKVQNWTDALFSLTVHAPVLPFTAGQFTKLGLEIDGERVQRAYSYVNSPDNPDLEFYLVTVPDGKLSPRLAALKPGDEVQVVSEAAGFFVLDEVPHCETLWMLATGTAIGPYLSILR UNKNOWN KNOWN

Prediction is very hard, especially if you are predicting unknowns.

Why do we believe in prediction at all?
Christian Anfinsen, in an elegant experiment in 1957, showed that ribonuclease A (124 aa), after being completely denatured with 8 M urea and 2-mercaptoethanol (2-ME), regained full enzymatic activity when the urea and 2-ME were slowly removed by dialysis. All the information needed to fold is contained within the primary sequence.

Theory of Structure Prediction
Energy Landscape Theory of Structure Prediction
Nature makes the landscapes of real proteins funneled. You have to work to make the energy landscapes of structure prediction schemes funneled. Let me show you some of the things you have to consider.
Zaida (Zan) Luthey-Schulten

How to do 1D3D? (I) Physics-based approach: computing energy as a function of structure (surfing the energy surface)

Molecular Mechanics (Force Field)

Levitt

A POP study: 1-microsecond MD simulation
Villin headpiece (36 a.a.) in 3000 H2O, 12,000 atoms
980 ns, single trajectory
256 CPUs (CRAY), ~4 months
Duan & Kollman, 1998

Science 2010 (1 millisecond; previous longest 10 microseconds; Amber FF)
Fig. 1 Folding proteins at x-ray resolution, showing comparison of x-ray structures (blue) (15, 24) and last frame of MD simulation (red): (A) simulation of villin at 300 K, (B) simulation of FiP35 at 337 K. Simulations were initiated from completely extended structures. Villin and FiP35 folded to their native states after 68 µs and 38 µs, respectively, and simulations were continued for an additional 20 µs after the folding event to verify the stability of the native fold.

Massively distributed computing
Letters to Nature (2002)
Engineered protein (BBA5): zinc finger fold (w/o metal), 23 a.a.
Solvation model
Thousands of trajectories, each of 5-20 ns, totaling 700 µs
30,000 internet volunteers
Several months, or ~a million CPU days of simulation

Worldwide distributed computing
Pande group


The problem: timescales
Timescale (seconds): 10^-15 (femto), 10^-12 (pico), 10^-9 (nano), 10^-6 (micro), 10^-3 (milli), 10^0 (seconds)
Events along the axis, fastest to slowest: bond vibration, isomerisation, water dynamics, helix formation, fastest folders, typical folders, slow folders
Markers: MD step; long MD run; where we need to be; where we'd love to be
16 orders of magnitude range
Femtosecond timesteps
Need to simulate micro- to milliseconds
Pande group
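The gap between femtosecond timesteps and micro- to millisecond folding times is easiest to see as a step count. A small sketch, assuming a fixed 1 fs timestep; the function name md_steps_needed is made up for illustration.

```python
def md_steps_needed(simulated_time_s, timestep_s=1e-15):
    """Number of integration steps needed to cover simulated_time_s."""
    return round(simulated_time_s / timestep_s)

for label, t in [("nanosecond", 1e-9), ("microsecond", 1e-6), ("millisecond", 1e-3)]:
    print(f"1 {label}: {md_steps_needed(t):.0e} steps")
```

Each step requires a full force evaluation over all atoms, which is why a single long trajectory is so expensive and why splitting the work over many short trajectories is attractive.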

Biology Can’t Wait! (Evolution to the rescue)
One Big Family.

How to do 1D->3D ab initio? (II)
Biology-based approach: data (knowledge)-mining
Ignore the actual folding process in the cell; instead, focus on the end point!

The 1-2-3 (1D -> fragment -> 3D) approach
Primary: LGINCRGSSQCGLSGGNLMVRIRDQACGNQGQTWCPGERRAKVCGTGNSISAYVQSTNNCISGTEACRHLTNLVNHGCRVCGSDPLYAGNDVSRGQLTVNYVNSC
Sequence-to-structure mapping -> fragments (structural motifs)
Fragment assembly -> tertiary structure

The I-sites library (Baker’s group)

Fragment insertion Monte Carlo
Rosetta: a folding simulation program (a trial-and-error process)
Fragment insertion Monte Carlo cycle: choose a fragment; change the backbone torsion angles; convert to 3D; evaluate with the energy function; accept or reject the fragment
course/ 2002/ cbio/ handouts/ Class8
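The choose / change / evaluate / accept-or-reject cycle can be sketched as a Metropolis Monte Carlo loop. This is a toy illustration under stated assumptions, not Rosetta's implementation: the energy function is a placeholder that scores torsion angles directly (so the "convert to 3D" step is skipped), and the fragment library, temperature, and all names are made up.

```python
import math
import random

def toy_energy(torsions, target=(-60.0, -45.0)):
    """Placeholder score: squared deviation from helix-like (phi, psi).
    A real scoring function would be evaluated on 3D coordinates."""
    return sum((phi - target[0]) ** 2 + (psi - target[1]) ** 2
               for phi, psi in torsions)

def fragment_insertion_mc(torsions, fragments, energy, n_steps=2000,
                          temperature=500.0, seed=0):
    """Metropolis Monte Carlo over backbone torsions via fragment insertion.

    torsions  : list of (phi, psi) tuples, one per residue
    fragments : list of candidate fragments, each a list of (phi, psi) tuples
    """
    rng = random.Random(seed)
    current = list(torsions)
    e_current = energy(current)
    for _ in range(n_steps):
        frag = rng.choice(fragments)                      # choose a fragment
        start = rng.randrange(len(current) - len(frag) + 1)
        trial = list(current)
        trial[start:start + len(frag)] = frag             # change backbone angles
        e_trial = energy(trial)                           # evaluate
        delta = e_trial - e_current
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_trial
    return current, e_current

# Extended starting chain and a tiny made-up library of 3-residue fragments.
start = [(-150.0, 150.0)] * 12
library = [
    [(-60.0, -45.0)] * 3,                            # helix-like
    [(-120.0, 130.0)] * 3,                           # strand-like
    [(-65.0, -40.0), (-90.0, 0.0), (60.0, 30.0)],    # turn-like
]
final, e_final = fragment_insertion_mc(start, library, toy_energy)
print(e_final)
```

In practice the fragments offered at each position would come from a sequence-to-structure library rather than a fixed list, and many independent runs would be made to produce a set of candidate structures.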

Does it work? The CASP experiments

One lab dominated in CASP4
Baker’s group dominated the ab initio (knowledge-based) prediction category in CASP4

Some CASP4 successes Baker’s group

# of residues with cRMS below 4Å/6Å
ROSETTA results at CASP5
Name   Length   Human    Automatic   Best decoy
T135   106      83/98    54/64       94/105
T149   116      52/71    44/62       76/92
T161   154      45/83    57/79       55/95
Rosetta predictions in CASP5: Successes, failures, and prospects for complete automation. Baker et al., Proteins, 53: (2003)

Toward High-Resolution de Novo Structure Prediction for Small Proteins
--Philip Bradley, Kira M. S. Misura, David Baker (Science 2005) The prediction of protein structure from amino acid sequence is a grand challenge of computational molecular biology. By using a combination of improved low- and high-resolution conformational sampling methods, improved atomically detailed potential functions that capture the jigsaw puzzle–like packing of protein cores, and high-performance computing, high-resolution structure prediction (<1.5 angstroms) can be achieved for small protein domains (<85 residues). The primary bottleneck to consistent high-resolution prediction appears to be conformational sampling.

Still, not practical for most …
Small proteins only
Expensive (computationally): sampling
Not for everyday biologists …

HM: the poor man’s solution

Similar sequences

Similar structures with low sequence similarity
9% sequence identity Shapiro & Harris, 2000

Another example
FtsZ and tubulin would not be recognized as homologous by sequence comparison
Burns, R., Nature 391: (1998)

Fold recognition
Query sequence + library of known folds -> best-fit fold
Mark Gerstein Lab

FR (fold recognition) by threading
Thread the query sequence onto the fold template
Use structural properties to evaluate the fit: environment, pairwise interactions
Mark Gerstein Lab

Pitfalls of comparative (homology) modeling
Difficult to detect and correct alignment errors
More similar to template than to true structure
Cannot predict novel folds (template may be wrong!)

Structure Prediction Methods
[Figure: prediction methods (homology modeling, fold recognition, ab initio) plotted against % sequence identity, with the twilight zone marked]

Protein Structure Prediction
clickable map

Reliability and uses of comparative models
Marti-Renom et al. (2000)

Success and limitations of structure prediction
Limitations:
Models of large and remotely related proteins are not very accurate
Domain boundaries are difficult to define
Models often do not provide details for functional annotation
Success:
Accuracy scores almost doubled from CASP1 to CASP6, possibly because of the growth in database size
Models of small targets are very accurate
Kryshtafovych et al. 2005
Manager/ Files/ Panchenko/ shaitan_kurs_lab2.ppt

Structural Bioinformatics: Sequence/Structure Relationship
[Figure: percent identity axis from 100 down to 10, with the twilight zone and midnight zone marked; nested sets: all possible sequences of amino acids, protein sequences observed in nature, protein structures observed in nature]
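Percent identity is the quantity on this axis. A minimal sketch of computing it from a pairwise alignment, assuming the two sequences are already aligned with "-" for gaps and that identity is counted over columns where neither sequence has a gap. Other denominators (shorter sequence length, full alignment length) are also in use, so this convention is an assumption, as are the function name and the example strings.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Made-up ten-column alignment for illustration.
print(percent_identity("MADW-VTGKV", "MSDWQVT-KL"))
```

The value depends on the alignment as well as on the sequences, which is one reason alignment errors become the dominant problem as identity falls toward the twilight zone.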

Final exam assignment
Find a protein sequence from any organism that shares no greater than 40% sequence identity with any accessible entry in PDB.
Predict the 3D structure of your protein using whatever method/tool/server/database you choose.
Write a ~5 page report to document how you found the sequence, how you did (or got) the prediction, and how you visualized/described the predicted model, along with thoughts/comments on your learning process.
Submit your report to Cathy by 6/22/2011.
Need help? Ask, read/surf, and try it!

PDB: the one-stop shop for structural bioinformatics

Selected Structural Biology Databases, Servers and Services
CASP-certified protein structure prediction servers: I-TASSER, ROBETTA, HHpred, METATASSER, MULTICOM, Pcons, SAM-T08, 3D-Jury, THREADER
Comparative modeling servers: SwissModel, MODELLER
Protein secondary structure prediction servers: PSIpred, JPRED
Database of protein structures: PDB (Protein Data Bank)
Structural classifications of proteins: SCOP, CATH
Structural neighbors database: Dali Database

Thank You!

3D to 1D? Science 2003

A computer-designed protein (93 aa) with 1.2 Å resolution

Structure prediction servers

Hybrid approach for solving macromolecular complex structures

(Rost, 1996)

Levinthal’s paradox (1969)
If we assume three possible states for every flexible dihedral angle in the backbone of a 100-residue protein, the number of possible backbone configurations is 3^200. Even an incredibly fast computational or physical sampling in 10^-15 s would mean that a complete sampling would take 10^80 s, which exceeds the age of the universe by more than 60 orders of magnitude.
Yet proteins fold in seconds or less!
Berendsen
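The arithmetic behind the paradox can be checked in a few lines. This sketch assumes two flexible backbone dihedrals (phi and psi) per residue, 10^-15 s per sampled conformation, and an age of the universe of about 4.35e17 s (roughly 13.8 billion years); the function name is made up for illustration. Logarithms are used so the numbers stay readable.

```python
import math

def levinthal_log10(residues=100, dihedrals_per_residue=2, states=3,
                    time_per_conformation_s=1e-15):
    """Return (log10 of conformation count, log10 of exhaustive search time in s)."""
    log10_conformations = residues * dihedrals_per_residue * math.log10(states)
    log10_seconds = log10_conformations + math.log10(time_per_conformation_s)
    return log10_conformations, log10_seconds

AGE_OF_UNIVERSE_S = 4.35e17  # assumed value, about 13.8 billion years

log10_conf, log10_time = levinthal_log10()
excess = log10_time - math.log10(AGE_OF_UNIVERSE_S)
print(f"conformations ~ 10^{log10_conf:.1f}")
print(f"search time   ~ 10^{log10_time:.1f} s")
print(f"exceeds the age of the universe by ~{excess:.1f} orders of magnitude")
```

Since real proteins fold in seconds or less, folding cannot be an exhaustive random search; this is the motivation for the funneled energy landscape picture.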

The Rosetta method
DECOYS: Generate a large number of possible shapes
DISCRIMINATION: Select the correct, native-like fold
Need good decoy structures
Need a good energy function
Kochl

Nature 2007
