Protein Structure Prediction


Protein Structure Prediction Ming-Jing Hwang (黃明經) N121, Institute of Biomedical Sciences Academia Sinica http://gln.ibms.sinica.edu.tw/

Objective To understand the what, why, and how aspects of protein structure prediction, as well as its current status and use.

Science 2005

Why structure? Most proteins fold to function. Structure allows us to understand how a protein functions, often with mechanistic details, more than sequence can. With the knowledge we can design experiments to further probe the protein’s function, or, in the case of a disease protein, devise ways to counter the disease process (e.g. drug design).

Ex: Structure & function of potassium channel MacKinnon, 1998 (2003 Nobelist)

Some structural biology Nobel winners
F. H. C. Crick, J. D. Watson, M. H. C. Wilkins (Physiology or Medicine, 1962): for their discoveries concerning the molecular structure of nucleic acids and its significance for information transfer in living material
M. F. Perutz, Sir J. C. Kendrew (Chemistry, 1962): for their studies of the structures of globular proteins
D. Crowfoot Hodgkin (Chemistry, 1964): for her determinations by X-ray techniques of the structures of important biochemical substances
Sir A. Klug (Chemistry, 1982): for his development of crystallographic electron microscopy and his structural elucidation of biologically important nucleic acid-protein complexes
J. Deisenhofer, R. Huber, H. Michel (Chemistry, 1988): for the determination of the three-dimensional structure of a photosynthetic reaction centre
P. D. Boyer, J. E. Walker, J. C. Skou (Chemistry, 1997): for their elucidation of the enzymatic mechanism underlying the synthesis of adenosine triphosphate (ATP) [Boyer, Walker]; for the first discovery of an ion-transporting enzyme, Na+, K+-ATPase [Skou]
J. B. Fenn, K. Tanaka, K. Wüthrich (Chemistry, 2002): for the development of methods for identification and structure analyses of biological macromolecules; for their development of soft desorption ionisation methods for mass spectrometric analyses of biological macromolecules [Fenn, Tanaka]; for his development of nuclear magnetic resonance spectroscopy for determining the three-dimensional structure of biological macromolecules in solution [Wüthrich]
R. D. Kornberg (Chemistry, 2006): for his studies of the molecular basis of eukaryotic transcription
V. Ramakrishnan, T. A. Steitz, A. E. Yonath (Chemistry, 2009): for studies of the structure and function of the ribosome
http://www.imb-jena.de/IMAGE_NOBEL.html

Why prediction? Structure determination by experimental methods (X-ray, NMR, etc.) is still hard, especially with obstacles at early steps (e.g. expression and crystallization). Prediction helps to bridge the widening gap between sequence and structure.

Sequence/Structure Gap
As of June 2, 2009, the number of entries in the protein sequence and structure databases:
SWISS-PROT/TrEMBL: 468,851/7,916,844
PDB: 57,835

Structural genomics and drug design. Structural genomics: HM (homology modeling) as the workhorse. Baker & Sali, 2001

Structure prediction: 1D->3D, then Function MADWVTGKVTKVQNWTDALFSLTVHAPVLPFTAGQFTKLGLEIDGERVQRAYSYVNSPDNPDLEFYLVTVPDGKLSPRLAALKPGDEVQVVSEAAGFFVLDEVPHCETLWMLATGTAIGPYLSILR UNKNOWN KNOWN

Prediction is very hard, especially if you are predicting unknowns.

Why do we believe in prediction at all? Christian Anfinsen, in an elegant experiment in 1957, showed that ribonuclease A (124 amino acids), after having been completely denatured using 8 M urea and 2-mercaptoethanol (2-ME), regained full enzymatic activity when the urea and 2-ME were slowly removed by dialysis. All the information needed to fold is contained within the primary sequence.

Energy Landscape Theory of Structure Prediction. "Nature makes the landscapes of real proteins funneled. You have to work to make the energy landscapes of structure prediction schemes funneled. Let me show you some of the things you have to consider." Zaida (Zan) Luthey-Schulten

How to do 1D3D? (I) Physics-based approach: computing energy as a function of structure (surfing the energy surface)

Molecular Mechanics (Force Field) http://cmm.info.nih.gov/modeling/guide_documents/molecular_mechanics_document.html
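As a rough illustration of "computing energy as a function of structure", the sketch below sums two textbook force-field terms, a harmonic bond stretch and a 12-6 Lennard-Jones interaction, over a toy three-atom system. The function names and all parameter values (k_bond, r0, epsilon, sigma) are illustrative assumptions and are not taken from any real force field; real force fields also include angle, torsion, and electrostatic terms and exclude close bonded neighbours from the non-bonded sum.

```python
import math

def bond_energy(r, k_bond=300.0, r0=1.5):
    """Harmonic bond stretch, E = k * (r - r0)^2 (illustrative parameters)."""
    return k_bond * (r - r0) ** 2

def lennard_jones(r, epsilon=0.1, sigma=3.4):
    """12-6 Lennard-Jones, E = 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def total_energy(coords, bonds):
    """Bond term for bonded pairs, Lennard-Jones term for all other pairs."""
    bonded = {tuple(sorted(pair)) for pair in bonds}
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            if (i, j) in bonded:
                energy += bond_energy(r)
            else:
                energy += lennard_jones(r)
    return energy

# Toy system: atoms 0 and 1 are bonded at the ideal length,
# atom 2 sits about 4 units away and interacts only through Lennard-Jones.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 4.0, 0.0)]
bonds = [(0, 1)]
print(total_energy(coords, bonds))
```

A molecular dynamics or minimization program evaluates a function of this kind (and its gradient) at every step, which is why the cost of physics-based prediction grows so quickly with system size and simulated time.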

Levitt

A POP study: 1-microsecond MD simulation (980 ns). Villin headpiece, 36 a.a.; 3000 H2O; 12,000 atoms; 256 CPUs (CRAY); ~4 months; single trajectory. Duan & Kollman, 1998

Science 2010 (1 millisecond; previous longest 10 microsecond; Amber FF) Fig. 1 Folding proteins at x-ray resolution, showing comparison of x-ray structures (blue) (15, 24) and last frame of MD simulation (red): (A) simulation of villin at 300 K, (B) simulation of FiP35 at 337 K. Simulations were initiated from completely extended structures. Villin and FiP35 folded to their native states after 68 µs and 38 µs, respectively, and simulations were continued for an additional 20 µs after the folding event to verify the stability of the native fold.

Massively distributed computing. Letters to Nature (2002). Engineered protein (BBA5): zinc finger fold (w/o metal), 23 a.a., solvation model. Thousands of trajectories, each of 5-20 ns, totaling 700 µs. Folding@home: 30,000 internet volunteers, several months, or ~a million CPU days of simulation.

Worldwide distributed computing Pande group

Massively distributed computing: SETI@home, Folding@home, FightAIDS@home, …

The problem: timescales. [Figure: time axis running from 10^-15 s (femto) through 10^-12 (pico), 10^-9 (nano), 10^-6 (micro), and 10^-3 (milli) to 10^0 seconds. Events marked along the axis, fastest to slowest: bond vibration, isomerisation, water dynamics, helix formation, fastest folders, typical folders, slow folders. Annotations: MD step; long MD run; where we need to be; where we'd love to be.] 16 orders of magnitude range. Femtosecond timesteps. Need to simulate micro- to milliseconds. Pande group
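The arithmetic behind the timescale problem can be sketched directly: with a femtosecond timestep, the number of MD steps needed is the simulated time divided by 10^-15 s. The function name below is a hypothetical helper introduced only for this illustration.

```python
TIMESTEP_S = 1e-15  # one femtosecond per MD step, as on the slide

def md_steps_needed(simulated_time_s, timestep_s=TIMESTEP_S):
    """Number of integration steps needed to cover the simulated time."""
    return round(simulated_time_s / timestep_s)

for label, seconds in [("1 nanosecond", 1e-9),
                       ("1 microsecond", 1e-6),
                       ("1 millisecond", 1e-3)]:
    print(f"{label}: {md_steps_needed(seconds):.0e} steps")
```

Each step requires a full energy and force evaluation, so going from microseconds to milliseconds multiplies the work by a further factor of a thousand.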

Biology Can’t Wait! (Evolution to rescue) One Big Family.

How to do 1D->3D ab initio? (II) Biology-based approach: data (knowledge) mining. Ignore the actual folding process in the cell; instead, focus on the end point!

The 123 (1D fragment3D) approach Primary LGINCRGSSQCGLSGGNLMVRIRDQACGNQGQTWCPGERRAKVCGTGNSISAYVQSTNNCISGTEACRHLTNLVNHGCRVCGSDPLYAGNDVSRGQLTVNYVNSC seq. to str. mapping fragment (structural motifs) Tertiary fragment assembly

The I-sites library (Baker’s group)

Fragment insertion Monte Carlo. Rosetta: a folding simulation program (a trial-and-error process). Cycle: choose a fragment -> change backbone torsion angles -> convert to 3D -> evaluate with the energy function -> accept or reject the fragment. http://www.cs.huji.ac.il/course/2002/cbio/handouts/Class8
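The fragment-insertion cycle can be sketched as a small Metropolis Monte Carlo loop over backbone torsion angles. This is a toy illustration, not Rosetta's implementation: the energy function is a made-up stand-in that simply rewards helix-like (phi, psi) values, the three-residue fragment library is invented, and the same library is used at every position, whereas a real fragment library is chosen per position from the local sequence. All names and parameters are assumptions.

```python
import math
import random

FRAG_LEN = 3  # residues per fragment (illustrative choice)

def toy_energy(torsions, target=(-60.0, -45.0)):
    """Stand-in energy: squared deviation of (phi, psi) from helical values."""
    return sum((phi - target[0]) ** 2 + (psi - target[1]) ** 2
               for phi, psi in torsions) / 1000.0

def fragment_insertion_mc(torsions, library, n_steps=2000, kT=1.0, seed=0):
    """Repeatedly insert random fragments; accept by the Metropolis rule."""
    rng = random.Random(seed)
    current = list(torsions)
    e_current = toy_energy(current)
    for _ in range(n_steps):
        # Choose an insertion window and a fragment.
        start = rng.randrange(len(current) - FRAG_LEN + 1)
        fragment = rng.choice(library)
        # Change the backbone angles in that window.
        trial = current[:start] + list(fragment) + current[start + FRAG_LEN:]
        # Evaluate, then accept or reject.
        e_trial = toy_energy(trial)
        delta = e_trial - e_current
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, e_current = trial, e_trial
    return current, e_current

library = [
    [(-60.0, -45.0)] * FRAG_LEN,    # helix-like fragment
    [(-120.0, 130.0)] * FRAG_LEN,   # strand-like fragment
    [(-75.0, 150.0)] * FRAG_LEN,    # extended, loop-like fragment
]
start = [(180.0, 180.0)] * 12       # fully extended starting chain
final, energy = fragment_insertion_mc(start, library)
print(toy_energy(start), energy)
```

The "convert to 3D" step is skipped here because the toy energy is defined directly on the torsion angles; a realistic energy function would need Cartesian coordinates built from those angles.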

Does it work? The CASP experiments

One lab dominated in CASP4: Baker's group dominates the ab initio (knowledge-based) prediction in CASP4

Some CASP4 successes Baker’s group

ROSETTA results at CASP5: # of residues with cRMS below 4Å/6Å

Name   Length   Human   Automatic   Best decoy
T135   106      83/98   54/64       94/105
T149   116      52/71   44/62       76/92
T161   154      45/83   57/79       55/95

Rosetta predictions in CASP5: Successes, failures, and prospect for complete automation. Baker et al., Proteins, 53:457-468 (2003)
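For readers unfamiliar with the measure, cRMS (coordinate root-mean-square deviation) can be computed as below. This is a minimal sketch that assumes the model and native coordinates are already optimally superimposed and matched residue by residue; the superposition step and the search for the largest residue subset under a 4 Å or 6 Å cutoff, which the numbers above depend on, are not implemented here. The function name is hypothetical.

```python
import math

def crms(model, native):
    """Coordinate RMSD between two equal-length lists of (x, y, z) points.

    Assumes the two structures have already been superimposed.
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(m, n) ** 2 for m, n in zip(model, native))
    return math.sqrt(squared / len(model))

native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(x + 3.0, y + 4.0, z) for x, y, z in native]  # rigid shift of 5 units
print(crms(model, native))
```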

Toward High-Resolution de Novo Structure Prediction for Small Proteins --Philip Bradley, Kira M. S. Misura, David Baker (Science 2005) The prediction of protein structure from amino acid sequence is a grand challenge of computational molecular biology. By using a combination of improved low- and high-resolution conformational sampling methods, improved atomically detailed potential functions that capture the jigsaw puzzle–like packing of protein cores, and high-performance computing, high-resolution structure prediction (<1.5 angstroms) can be achieved for small protein domains (<85 residues). The primary bottleneck to consistent high-resolution prediction appears to be conformational sampling.

Still, not practical for most … Small proteins Expensive (computationally): sampling Not for everyday biologists …

HM: the poor man’s solution

Similar sequences

Similar structures with low sequence similarity 9% sequence identity Shapiro & Harris, 2000

Another example FtsZ and tubulin would not be recognized as homologous by sequence comparison Burns, R., Nature 391:121-123 (1998)

Fold recognition Query sequence Library of known folds Best-fit fold Mark Gerstein Lab

FR by threading. Thread the query sequence onto the fold template. Use structural properties to evaluate the fit: environment, pairwise interactions. Mark Gerstein Lab
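A heavily simplified sketch of threading follows: each template is reduced to a string of per-position environments (B for buried, E for exposed), the query is placed on it without gaps, and the fit is scored by rewarding hydrophobic residues in buried positions and polar residues in exposed ones. The hydropathy values are the Kyte-Doolittle scale; the scoring rule, the fold names, and the environment strings are invented for illustration, and pairwise interaction terms and gapped alignment are left out.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}

def environment_score(sequence, environments):
    """Score an ungapped threading of a sequence onto B/E environments."""
    if len(sequence) != len(environments):
        raise ValueError("this sketch only handles equal-length, ungapped threading")
    score = 0.0
    for residue, env in zip(sequence, environments):
        hydropathy = KD[residue]
        # Buried positions favour hydrophobic residues, exposed ones polar.
        score += hydropathy if env == "B" else -hydropathy
    return score

def best_fit_fold(sequence, fold_library):
    """Return the best-scoring fold name and the full score table."""
    scores = {name: environment_score(sequence, envs)
              for name, envs in fold_library.items()}
    return max(scores, key=scores.get), scores

query = "LKVEIDLK"
fold_library = {          # hypothetical templates
    "fold_A": "BEBEBEBE",
    "fold_B": "EBEBEBEB",
    "fold_C": "BBBBEEEE",
}
best, scores = best_fit_fold(query, fold_library)
print(best, scores)
```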

Pitfalls of comparative (homology) modeling: difficult to detect and correct alignment errors; models are more similar to the template than to the true structure; cannot predict novel folds (template may be wrong!)

Structure Prediction Methods. [Figure: applicability of homology modeling, fold recognition, and ab initio prediction along a 0-100 % sequence identity axis, with the twilight zone marked.]

Protein Structure Prediction clickable map http://speedy.embl-heidelberg.de/gtsp/flowchart2.html

Reliability and uses of comparative models Marti-Renom et al. (2000)

Success and limitations of structure prediction. Limitations: Models of large and remotely related proteins are not very accurate. Domain boundaries are difficult to define. Models often do not provide details for functional annotation. Success: Accuracy scores almost doubled from CASP1 to CASP6, possibly because of database size. Models of small targets are very accurate. Kryshtafovych et al., 2005 http://www.bioeng.ru/Manager/Files/Panchenko/shaitan_kurs_lab2.ppt

Structural Bioinformatics: Sequence/Structure Relationship. [Figure: percent identity axis from 100 down to 10, with the twilight zone and midnight zone marked. Labels: all possible sequences of amino acids; protein sequences observed in nature; protein structures observed in nature.]

Final exam assignment. Find a protein sequence from any organism that shares no more than 40% sequence identity with any accessible entry in PDB. Predict the 3D structure of your protein using whatever method/tool/server/database. Write a ~5 page report to document how you found the sequence, how you did (or got) the prediction, and how you visualize/describe the predicted model, along with thoughts/comments on your learning process. Submit your report to Cathy by 6/22/2011. Need help? Ask, read/surf, and try it!
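Checking the 40% threshold requires a percent identity figure from a pairwise alignment. The sketch below is one simple way to compute it, assuming the two sequences are already aligned and use "-" for gaps. Tools differ in what they use as the denominator; here it is the number of aligned, gap-free columns, which is an assumption of this example rather than a standard. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared

identity = percent_identity("ACDE-G", "ACDFHG")
print(identity, identity <= 40.0)
```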

PDB: the one-stop shop for structural bioinformatics

Selected Structural Biology Databases, Servers and Services
CASP-certified protein structure prediction servers: I-TASSER, ROBETTA, HHpred, METATASSER, MULTICOM, Pcons, SAM-T08, 3D-Jury, THREADER
Comparative modeling servers: SwissModel, MODELLER
Protein secondary structure prediction servers: PSIpred, JPRED
Database of protein structures: PDB (Protein Data Bank)
Structural classifications of proteins: SCOP, CATH
Structural neighbors database: Dali Database

Thank You!

3D to 1D? Science 2003

A computer-designed protein (93 aa) with 1.2 Å resolution

Structure prediction servers http://bioinfo.pl/cafasp/list.html

Hybrid approach for solving macromolecular complex structures

(Rost, 1996)

Levinthal's paradox (1969). If we assume three possible states for every flexible dihedral angle in the backbone of a 100-residue protein, the number of possible backbone configurations is 3^200. Even an incredibly fast computational or physical sampling in 10^-15 s would mean that a complete sampling would take 10^80 s, which exceeds the age of the universe by more than 60 orders of magnitude. Yet proteins fold in seconds or less! Berendsen
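The numbers in Levinthal's estimate can be checked with a few lines of arithmetic, working in base-10 logarithms. The count of 200 dihedrals corresponds to two flexible backbone angles (phi and psi) per residue; the age of the universe used here (about 13.8 billion years, roughly 4.35 x 10^17 s) is an approximate value supplied for this example.

```python
import math

n_dihedrals = 200            # two flexible backbone angles per residue, 100 residues
states_per_dihedral = 3
sampling_time_s = 1e-15      # time to sample one conformation
age_of_universe_s = 4.35e17  # approximate, ~13.8 billion years (assumption)

log10_conformations = n_dihedrals * math.log10(states_per_dihedral)
log10_total_time = log10_conformations + math.log10(sampling_time_s)
log10_excess = log10_total_time - math.log10(age_of_universe_s)

print(f"conformations    ~ 10^{log10_conformations:.1f}")
print(f"exhaustive search ~ 10^{log10_total_time:.1f} s")
print(f"exceeds the age of the universe by ~{log10_excess:.1f} orders of magnitude")
```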

The Rosetta method. DECOYS: Generate a large number of possible shapes. DISCRIMINATION: Select the correct, native-like fold. Need good decoy structures. Need a good energy function. Kochl

Nature 2007