Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Slides:



Advertisements
Similar presentations
Protein Structure Prediction using ROSETTA
Advertisements

Protein Threading Zhanggroup Overview Background protein structure protein folding and designability Protein threading Current limitations.
1 Protein Structure, Structure Classification and Prediction Bioinformatics X3 January 2005 P. Johansson, D. Madsen Dept.of Cell & Molecular Biology, Uppsala.
Protein Tertiary Structure Prediction
Structural bioinformatics
Structure Prediction. Tertiary protein structure: protein folding Three main approaches: [1] experimental determination (X-ray crystallography, NMR) [2]
CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Homology Modeling Anne Mølgaard, CBS, BioCentrum, DTU.
Chapter 9 Structure Prediction. Motivation Given a protein, can you predict molecular structure Want to avoid repeated x-ray crystallography, but want.
Protein Structure, Databases and Structural Alignment
Thomas Blicher Center for Biological Sequence Analysis
The Protein Data Bank (PDB)
. Protein Structure Prediction [Based on Structural Bioinformatics, section VII]
1 Protein Structure Prediction Reporter: Chia-Chang Wang Date: April 1, 2005.
Protein Tertiary Structure. Primary: amino acid linear sequence. Secondary:  -helices, β-sheets and loops. Tertiary: the 3D shape of the fully folded.
1 Protein Structure Prediction Charles Yan. 2 Different Levels of Protein Structures The primary structure is the sequence of residues in the polypeptide.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Homology Modelling Thomas Blicher Center for Biological Sequence Analysis.
Protein Tertiary Structure Prediction Structural Bioinformatics.
Bioinformatics Ayesha M. Khan Spring 2013.
Protein Structure Prediction and Analysis
Protein Structural Prediction. Protein Structure is Hierarchical.
Bioinf. Data Analysis & Tools Molecular Simulations & Sampling Techniques117 Jan 2006 Bioinformatics Data Analysis & Tools Molecular simulations & sampling.
Computational Structure Prediction Kevin Drew BCH364C/391L Systems Biology/Bioinformatics 2/12/15.
Homology Modeling David Shiuan Department of Life Science and Institute of Biotechnology National Dong Hwa University.
Protein Tertiary Structure Prediction
Construyendo modelos 3D de proteinas ‘fold recognition / threading’
Chapter 12 Protein Structure Basics. 20 naturally occurring amino acids Free amino group (-NH2) Free carboxyl group (-COOH) Both groups linked to a central.
Practical session 2b Introduction to 3D Modelling and threading 9:30am-10:00am 3D modeling and threading 10:00am-10:30am Analysis of mutations in MYH6.
COMPARATIVE or HOMOLOGY MODELING
CRB Journal Club February 13, 2006 Jenny Gu. Selected for a Reason Residues selected by evolution for a reason, but conservation is not distinguished.
 Four levels of protein structure  Linear  Sub-Structure  3D Structure  Complex Structure.
Representations of Molecular Structure: Bonds Only.
ProteinShop: A Tool for Protein Structure Prediction and Modeling Silvia Crivelli Computational Research Division Lawrence Berkeley National Laboratory.
Lecture 12 CS5661 Structural Bioinformatics Motivation Concepts Structure Prediction Summary.
Protein Folding Programs By Asım OKUR CSE 549 November 14, 2002.
Protein Structure Prediction Ram Samudrala University of Washington.
Modelling Genome Structure and Function Ram Samudrala University of Washington.
Protein Classification II CISC889: Bioinformatics Gang Situ 04/11/2002 Parts of this lecture borrowed from lecture given by Dr. Altman.
Part I : Introduction to Protein Structure A/P Shoba Ranganathan Kong Lesheng National University of Singapore.
Protein Structure & Modeling Biology 224 Instructor: Tom Peavy Nov 18 & 23, 2009
Protein secondary structure Prediction Why 2 nd Structure prediction? The problem Seq: RPLQGLVLDTQLYGFPGAFDDWERFMRE Pred:CCCCCHHHHHCCCCEEEECCHHHHHHCC.
Applied Bioinformatics Week 12. Bioinformatics & Functional Proteomics How to classify proteins into functional classes? How to compare one proteome with.
Protein Tertiary Structure. Protein Data Bank (PDB) Contains all known 3D structural data of large biological molecules, mostly proteins and nucleic acids:
Protein Modeling Protein Structure Prediction. 3D Protein Structure ALA CαCα LEU CαCαCαCαCαCαCαCα PRO VALVAL ARG …… ??? backbone sidechain.
Protein Structure Prediction ● Why ? ● Type of protein structure predictions – Sec Str. Pred – Homology Modelling – Fold Recognition – Ab Initio ● Secondary.
Predicting Protein Structure: Comparative Modeling (homology modeling)
Protein Structure Prediction: Homology Modeling & Threading/Fold Recognition D. Mohanty NII, New Delhi.
Modelling protein tertiary structure Ram Samudrala University of Washington.
Introduction to Protein Structure Prediction BMI/CS 576 Colin Dewey Fall 2008.
Structure prediction: Ab-initio Lecture 9 Structural Bioinformatics Dr. Avraham Samson Let’s think!
Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson.
Protein Structure Prediction Graham Wood Charlotte Deane.
Modelling proteomes Ram Samudrala Department of Microbiology How does the genome of an organism specify its behaviour and characteristics?
Protein Structure and Bioinformatics. Chapter 2 What is protein structure? What are proteins made of? What forces determines protein structure? What is.
Structural classification of Proteins SCOP Classification: consists of a database Family Evolutionarily related with a significant sequence identity Superfamily.
CS-ROSETTA Yang Shen et al. Presented by Jonathan Jou.
Structure/function studies of HIV proteins HIV gp120 V3 loop modelling using de novo approaches HIV protease-inhibitor binding energy prediction.
Ab-initio protein structure prediction ? Chen Keasar BGU Any educational usage of these slides is welcomed. Please acknowledge.
Modelling genome structure and function Ram Samudrala University of Washington.
Protein Tertiary Structure Prediction Structural Bioinformatics.
Modelling Genome Structure and Function Ram Samudrala University of Washington.
Modelling proteomes Ram Samudrala University of Washington How does the genome of an organism specify its behaviour and characteristics?
Computational Structure Prediction
University of Washington
Modelling the rice proteome
Protein Structure Prediction and Protein Homology modeling
Protein dynamics Folding/unfolding dynamics
Protein Structures.
Molecular Modeling By Rashmi Shrivastava Lecturer
Protein structure prediction.
Protein structure prediction
Presentation transcript:

Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115

2 Protein Structure Prediction Ram Samudrala University of Washington

STAT1153 Outline Motivations and introduction Protein 2 nd structure predictionProtein 2 nd structure prediction Protein 3D structure prediction –CASPCASP –Homology modelingHomology modeling –Fold recognitionFold recognition –ab initio predictionab initio prediction –Manual vs automationManual vs automation Structural genomics

STAT1154 Protein Structure Sequence determines structure, structure determines function Most proteins can fold by itself very quickly Folded structure: lowest energy state

5 Protein Structure Main forces for considerations –Steric complementarity –Secondary structure preferences (satisfy H bonds) –Hydrophobic/polar patterning –Electrostatics

6 Rationale for understanding protein structure and function Protein sequence -large numbers of sequences, including whole genomes Protein function - rational drug design and treatment of disease - protein and genetic engineering - build networks to model cellular pathways - study organismal function and evolution ? structure determination structure prediction homology rational mutagenesis biochemical analysis model studies Protein structure - three dimensional - complicated - mediates function

STAT1157 Protein Databases SwissProt: protein knowledgebaseSwissProt PDB: Protein Data Bank, 3D structurePDB

8 View Protein Structure Free interactive viewers Download 3D coordinate file from PDB Quick and dirty: –VRML –Rasmol –Chime More powerful –Swiss-PdbViewer

9 Compare Protein Structures Structure is more conserved than sequence Why compare? –Detect evolutionary relationships –Identify recurring structural motifs –Predicting function based on structure –Assess predicted structures Protein structure comparison and classification –Manual: SCOP –Automated: DALI

10 Compare protein structures Need ways to determine if two protein structures are related and to compare predicted models to experimental structures Commonly used measure is the root mean square deviation (RMSD) of the Cartesian atoms between two structures after optimal superposition (McLachlan, 1979): Usually use C  atoms 3.6 Å2.9 Å NK-lysin (1nkl)Bacteriocin T102/as48 (1e68)T102 best model Other measures include contact maps and torsion angle RMSDs

STAT11511 SCOP Compare protein structure, identify recurring structural motifs, predict function A. Murzin et al, 1995 –Manual classification –A few folds are highly populated –5 folds contain 20% of all homologous superfamilies –Some folds are multifunctional

STAT11512 Determine Protein Structure X-ray crystallography (gold standard) –Grow crystals, rate limiting, relies on the repeating structure of a crystalline lattice –Collect a diffraction pattern –Map to real space electron density, build and refine structural model –Painstaking and time consuming

STAT11513 Protein Structure Prediction Since AA sequence determines structure, can we predict protein structure from its AA sequence? = predicting the three angles, unlimited DoF! Physical properties that determine fold –Rigidity of the protein backbone –Interactions among amino acids, including Electrostatic interactions van der Waals forces Volume constraints Hydrogen, disulfide bonds –Interactions of amino acids with water

14 unfolded Protein folding landscape Large multi-dimensional space of changing conformations free energy folding reaction molten globule J=10 -8 s native J=10 -3 s *G**G* barrier height

15 Protein primary structure twenty types of amino acids R H C OHOH O N H H Cα two amino acids join by forming a peptide bond R Cα H C O N H HN H C O OHOH R H R H C O N H N H C O R H R H C O N H N H C O R H             each residue in the amino acid main chain has two degrees of freedom (  and  the amino acid side chains can have up to four degrees of freedom  1-4 

STAT nd Structure Prediction  helix,  sheet, turn/loop

STAT nd Structure Prediction Chou-Fasman 1974 Base on 15 proteins (2473 AAs) of known conformation, determine P , P  fromP , P   Empirical rules for 2 nd struct nucleation –4 H  or h  out of 6 AA, extends to both dir, P  > 1.03, P  > P , no  breakers –3 H  or h  out of 5 AA, extends to both dir, P  > 1.05, P  > P , no  breakers Have ~50-60% accuracy

STAT11518 P  and P 

STAT nd Structure Prediction Garnier, Osguthorpe, Robson, 1978 Assumption: each AA influenced by flanking positions GOR scoring tables (problem: limited dataset) Add scores, assign 2 nd with highest score

STAT nd Structure Prediction D. Eisenberg, 1986 –Plot hydrophobicity as function of sequence position, look for periodic repeats –Period = 3-4 AA,  (3.6 aa / turn) –Period = 2 AA,  sheet Best overall JPRED by Geoffrey Barton, use many different approaches, get consensusJPRED –Overall accuracy: 72.9%

STAT D Protein Structure Prediction CASP contest: Critical Assessment of Structure Prediction Biannual meeting since 1994 at Asilomar, CA Experimentalists: before CASP, submit sequence of to-be-solved structure to central repository Predictors: download sequence and minimal information, make predictions in three categories Assessors: automatic programs and experts to evaluate predictions quality

STAT11522 CASP Category I Homology Modeling (sequences with high homology to sequences of known structure) Given a sequence with homology > 25-30% with known structure in PDB, use known structure as starting point to create a model of the 3D structure of the sequence Takes advantage of knowledge of a closely related protein. Use sequence alignment techniques to establish correspondences between known “template” and unknown.

STAT11523 CASP Category II Fold recognition (sequences with no sequence identity (<= 30%) to sequences of known structure Given the sequence, and a set of folds observed in PDB, see if any of the sequences could adopt one of the known folds Takes advantage of knowledge of existing structures, and principles by which they are stabilized (favorable interactions)

STAT11524 CASP Category III Ab initio prediction (no known homology with any sequence of known structure) Given only the sequence, predict the 3D structure from “first principles”, based on energetic or statistical principles Secondary structure prediction and multiple alignment techniques used to predict features of these molecules. Then, some method necessary for assembling 3D structure.

STAT11525 Structure Prediction Evaluation Hydrophobic core similar? 2 nd struct identified? Energy: minimized? H-bond contacts? Compare with solved crystal structure: gold standard

26 Comparative modelling of protein structure KDHPFGFAVPTKNPDGTMNLMNWECAIP KDPPAGIGAPQDN----QNIMLWNAVIP ** * * * * * * * ** …… scan align build initial model construct non-conserved side chains and main chains refine

STAT11527 Homology Modeling Results When sequence homology is > 70%, high resolution models are possible (< 3 Å RMSD) MODELLER (Sali et al)MODELLER –Find homologous proteins with known structure and align –Collect distance distributions between atoms in known protein structures –Use these distributions to compute positions for equivalent atoms in alignment –Refine using energetics

STAT11528 Homology Modeling Results Many places can go wrong: –Bad template - it doesn’t have the same structure as the target after all –Bad alignment (a very common problem) –Good alignment to good template still gives wrong local structure –Bad loop construction –Bad side chain positioning

STAT11529 Homology Modeling Results Use of sensitive multiple alignment (e.g. PSI-BLAST) techniques helped get best alignments Sophisticated energy minimization techniques do not dramatically improve upon initial guess

STAT11530 Fold Recognition Results Also called protein threading Given new sequence and library of known folds, find best alignment of sequence to each fold, returned the most favorable one

STAT11531 Fold Recognition with Dynamic Programming Environmental class for each AA based on known folds (buried status, polarity, 2 nd struct)

STAT11532 Protein Folding with Dynamic Programming D. Eisenburg 1994 Align sequence to each fold (a string of environmental classes) Advantages: fast and works pretty well Disadvantages: do not consider AA contacts

STAT11533 Fold Recognition Results Each predictor can submit N top hits Every predictor does well on something Common folds (more examples) are easier to recognize Fold recognition was the surprise performer at CASP1. Incremental progress at CASP2, CASP3, CASP4…

STAT11534 Fold Recognition Results Alignment (seq to fold) is a big problem

STAT11535 ab initio Predict interresidue contacts and then compute structure (mild success) Simplified energy term + reduced search space (phi/psi or lattice) (moderate success) Creative ways to memorize sequence  structure correlations in short segments from the PDB, and use these to model new structures: ROSETTA

36 Ab initio prediction of protein structure sample conformational space such that native-like conformations are found astronomically large number of conformations 5 states/100 residues = = select hard to design functions that are not fooled by non-native conformations (“decoys”)

37 Sampling conformational space – continuous approaches Most work in the field - Molecular dynamics - Continuous energy minimization (follow a valley) - Monte Carlo simulation - Genetic Algorithms Like real polypeptide folding process Cannot be sure if native-like conformations are sampled energy

38 Molecular dynamics Force = -dU/dx (slope of potential U); acceleration, force = m ×a(t) All atoms are moving so forces between atoms are complicated functions of time Analytical solution for x(t) and v(t) is impossible; numerical solution is trivial Atoms move for very short times of seconds or picoseconds (ps) x(t+  t) = x(t) + v(t)  t + [4a(t) – a(t-  t)]  t 2 /6 v(t+  t) = v(t) + [2a(t+  t)+5a(t)-a(t-  t)]  t/6 U kinetic = ½ Σ m i v i (t) 2 = ½ n K B T Total energy (U potential + U kinetic ) must not change with time new position old position new velocity old velocity acceleration n is number of coordinates (not atoms)

39 Energy minimization For a given protein, the energy depends on thousands of x,y,z Cartesian atomic coordinates; reaching a deep minimum is not trivial Furthermore, we want to minimize the free energy, not just the potential energy. energy number of steps deep minimum starting conformation

40 Monte Carlo Simulation Propose moves in torsion or Cartesian conformation space Evaluate energy after every move, compute  E Accept the new conformation based on If run infinite time, the simulated conformation follows the Boltzmann distribution Many variations, including simulated annealing and other heuristic approaches.

41 Scoring/energy functions Need a way to select native-like conformations from non-native ones Physics-based functions: electrostatics, van der Waals, solvation, bond/angle terms. Knowledge-based scoring functions: –Derive information about atomic properties from a database of experimentally determined conformations –Common parameters include pairwise atomic distances and amino acid burial/exposure.

STAT11542 Rosetta D. Baker, U. Wash Break sequence into short segments (7-9 AA) Sample 3D from library of known segment structures, parallel computation Use simulated annealing (metropolis-type algorithm) for global optimization –Propose a change, if better energy, take; otherwise take at smaller probability Create 1000 structures, cluster and choose one representative from each cluster to submit

STAT11543 Manual Improvements and Automation Very often manual examination could improve prediction –Catch errors –Need domain knowledge –A. Murzin’s success at CASP2 CAFASP: Critical Assessment of Fully Automated Structure Prediction –Murzin Can’t play!! MetaServers: combine different methods to get consensus

STAT11544 CAFASP Evaluation

STAT11545 Structural Genomics With more and more solved structures and novel folds, computational protein structure prediction is going to improve Structural genomics:Structural genomics –Worldwide initiative to high throughput determine many protein structures –Especially, solve structures that have no homology

STAT11546 Summary Protein structures: 1 st, 2 nd, 3 rd, 4 th –Different DB: SwissProt, PDB and SCOP –Determine structure: X-ray crystallography Protein structure prediction: –2 nd structure prediction –Homology modeling –Fold recognition –Ab initio –Evaluation: energy, RMSD, etc –CASP and CAFASP contest Manual improvement and combination of computational approaches work better Structural Genomics, still very difficult problem…

STAT11547 Acknowledgement Amy Keating Michael Yaffe Mark Craven Russ Altman