Protein Structure and Prediction

Presentation transcript:

Protein Structure and Prediction
Michael Strong, PhD, National Jewish Health, 11/16/10

From Sequence to Structure: Experimental Approach
HIV protease sequence:
PQITLWKRPLVTIRIGGQLKEALLDTGADDTVLEEMNLPGKWKPKMIGGIGGFIKVRQYDQIPIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF
Structures shown: HIV protease; HIV protease with inhibitor

From Sequence to Structure: Computational Approach
H1N1 NA sequence:
MNPNQKIITIGSVCMTIGMANLILQIGNIISIWISHSIQLGNQN
QIETCNQSVITYENNTWVNQTYVNISNTNFAAGQSVVSVKLAGNSSLCPVSGWAIYSK
DNSVRIGSKGDVFVIREPFISCSPLECRTFFLTQGALLNDKHSNGTIKDRSPYRTLMS
CPIGEVPSPYNSRFESVAWSASACHDGINWLTIGISGPDNGAVAVLKYNGIITDTIKS
WRNNILRTQESECACVNGSCFTVMTDGPSNGQASYKIFRIEKGKIVKSVEMNAPNYHY
EECSCYPDSSEITCVCRDNWHGSNRPWVSFNQNLEYQIGYICSGIFGDNPRPNDKTGS
CGPVSSNGANGVKGFSFKYGNGVWIGRTKSISSRNGFEMIWDPNGWTGTDNNFSIKQD
IVGINEWSGYSGSFVQHPELTGLDCIRPCFWVELIRGRPKENTIWTSGSSISFCGVNS
DTVGWSWPDGAELPFTIDK
http://www.proteopedia.org/wiki/index.php/User:Michael_Strong/H1N1/NA

Protein Building Blocks
Typical protein sequence: MNPNQKIITIGSVCMTIGMANLILQIGNIISIWISHSIQLGNQN

Amino Acid Side Chain (R groups)

Most Proteins Spontaneously Fold
DNA -> (transcribed by RNA polymerase) -> RNA -> (translated by the ribosome) -> folded protein
Some proteins require chaperones for correct folding

Most Proteins Spontaneously Fold: Anfinsen's Experiment
- Folded protein (native state) -> denaturing conditions -> unfolded protein
- Unfolded protein -> native conditions -> folded protein (native state), by spontaneous self-organisation (~1 second)

Most Proteins Spontaneously Fold
This is important to computational biologists because it suggests that all the information needed for the correct folding of a protein is contained in its primary amino acid sequence, but …

Most Proteins Spontaneously Fold
But proteins lack simple folding rules compared to DNA (figure: DNA vs. protein)

Many Factors Influence Protein Folding
Proteins assume the lowest-energy structure. Factors that influence folding include:
- Hydrophobic interactions / collapse (particularly within the core)
- Hydrogen bonds, which lead to secondary structures
- Disulfide bonds (between cysteine residues)
- Salt bridges / ionic interactions (among charged residues)
- Multimeric interactions with proteins of the same type or with other proteins

Common Secondary Structures Alpha helix

Common Secondary Structures Beta Sheet

Common Secondary Structures Loop Regions

Experimental Methods of Structure Determination: X-ray Crystallography
- High-resolution structure determination
- First step: grow a protein crystal

Experimental Methods of Structure Determination: X-ray Crystallography
- High-resolution structure determination
- Intensities and phases of all reflections are combined in a Fourier transform to provide maps of electron density
- Phases are determined by using heavy metals or selenomethionine (MAD)
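Written out, the Fourier synthesis mentioned on this slide takes the following standard form. The symbols are not from the slides and are introduced here for illustration: V is the unit-cell volume, |F_hkl| the structure-factor amplitude of reflection (h, k, l), alpha_hkl its phase, and (x, y, z) fractional coordinates in the unit cell.

```latex
\rho(x, y, z) = \frac{1}{V} \sum_{h,k,l} |F_{hkl}| \, e^{i\alpha_{hkl}} \, e^{-2\pi i (hx + ky + lz)},
\qquad |F_{hkl}| \propto \sqrt{I_{hkl}}
```

Only the intensities I_hkl (and hence the amplitudes) are measured directly, which is why the phases have to be obtained separately, for example with heavy-atom or selenomethionine (MAD) data.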

Experimental Methods of Structure Determination: NMR (Nuclear Magnetic Resonance)
- High-resolution structure determination
- Limited to smaller proteins than X-ray crystallography
- Measures distances between pairs of hydrogen atoms: an NOE cross-peak is observed if the two atoms are within 5.0 Å
- Provides lots of information about dynamics
- Requires soluble, non-aggregating material
- Assignment is sometimes difficult

Experimental Methods of Structure Determination: Cryo-Electron Microscopy
- Low- to medium-resolution structure determination (~10-15 Å)
- Limited information about dynamics
- Can be used for very large molecules and complexes

Database of Protein Structures PDB – Protein Data Bank 64,036 protein structures as of 11/16/2010

Database of Protein Structures: PDB (Protein Data Bank)
Even so, the number of solved structures lags far behind the number of newly sequenced genes …
Solution: Computational Structural Methods

GenBank Sequences

Database of Protein Structures: PDB (Protein Data Bank) Files
Atoms in PDB files are defined by their Cartesian (x, y, z) coordinates.
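To make the coordinate layout concrete, here is a minimal sketch that reads ATOM/HETATM records from legacy fixed-width PDB text using only the Python standard library. The names `Atom`, `parse_atoms`, and `format_atom`, and the two sample atoms, are made up for illustration; the column positions follow the legacy PDB format, and real files are usually better handled by an established parser.

```python
from typing import Iterable, NamedTuple


class Atom(NamedTuple):
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(lines: Iterable[str]) -> list[Atom]:
    """Extract atom records from fixed-width PDB lines (0-based slices)."""
    atoms = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip HEADER, REMARK, and other record types
        atoms.append(Atom(
            name=line[12:16].strip(),      # columns 13-16: atom name
            res_name=line[17:20].strip(),  # columns 18-20: residue name
            chain=line[21],                # column 22: chain identifier
            res_seq=int(line[22:26]),      # columns 23-26: residue number
            x=float(line[30:38]),          # columns 31-38: x in angstroms
            y=float(line[38:46]),          # columns 39-46: y in angstroms
            z=float(line[46:54]),          # columns 47-54: z in angstroms
        ))
    return atoms


def format_atom(serial, name, res_name, chain, res_seq, x, y, z) -> str:
    """Hypothetical helper: build an ATOM record up to the z coordinate."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}"
            f"{res_seq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}")


# Made-up sample records, not taken from a real PDB entry.
sample = [
    "HEADER    MADE-UP EXAMPLE",
    format_atom(1, " N", "MET", "A", 1, 27.340, 24.430, 2.614),
    format_atom(2, " CA", "MET", "A", 1, 26.266, 25.413, 2.842),
]
atoms = parse_atoms(sample)
for atom in atoms:
    print(atom.name, atom.res_name, atom.res_seq, atom.x, atom.y, atom.z)
```

Slicing by column rather than splitting on whitespace matters here, because adjacent fields in a PDB record can touch when values are wide.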

Visualization of PDB Files
PyMOL, Jmol, Chimera, etc.

From Sequence to Structure: Computational Approach
H1N1 NA sequence:
MNPNQKIITIGSVCMTIGMANLILQIGNIISIWISHSIQLGNQN
QIETCNQSVITYENNTWVNQTYVNISNTNFAAGQSVVSVKLAGNSSLCPVSGWAIYSK
DNSVRIGSKGDVFVIREPFISCSPLECRTFFLTQGALLNDKHSNGTIKDRSPYRTLMS
CPIGEVPSPYNSRFESVAWSASACHDGINWLTIGISGPDNGAVAVLKYNGIITDTIKS
WRNNILRTQESECACVNGSCFTVMTDGPSNGQASYKIFRIEKGKIVKSVEMNAPNYHY
EECSCYPDSSEITCVCRDNWHGSNRPWVSFNQNLEYQIGYICSGIFGDNPRPNDKTGS
CGPVSSNGANGVKGFSFKYGNGVWIGRTKSISSRNGFEMIWDPNGWTGTDNNFSIKQD
IVGINEWSGYSGSFVQHPELTGLDCIRPCFWVELIRGRPKENTIWTSGSSISFCGVNS
DTVGWSWPDGAELPFTIDK
Secondary structure prediction: alpha helix, beta strand, or other
Tertiary predictions:
- Homology modeling
- Fold recognition
- De novo protein structure prediction

Secondary Structure Prediction
1st and 2nd generation: estimated the probability that an amino acid is in a helix, strand, or other (coil/loop) state, based on known structures.
- Methods: Chou-Fasman (short runs of amino acids), GOR (Bayesian, takes neighbors into account)
- Helices: no prolines, periodicity of 3.6 residues/turn
- Strands: alternating hydropathy, or hydrophilic ends with a hydrophobic center
- Other: small, polar, flexible residues, and prolines
- But accuracy stalled at 55-60%
3rd generation: also used position-specific profiles based on multiple sequence alignments (evolutionary information; e.g., insertions/deletions are more likely in coil/turn regions), using PSI-BLAST and HMMs together with neural networks (NN) and support vector machines (SVM). Accuracy improved to about 75-80%.
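The first-generation idea (per-residue propensities averaged over a short run of amino acids) can be sketched as follows. This is only a toy: the propensity numbers in `TOY_PROPENSITY` are invented for illustration and are NOT the published Chou-Fasman parameters, and the real method adds nucleation and extension rules that are omitted here. The function name, window size, and threshold are likewise assumptions.

```python
# Toy (helix, strand) propensities. Invented for illustration only;
# these are NOT the published Chou-Fasman values.
TOY_PROPENSITY = {
    "A": (1.4, 0.8), "E": (1.5, 0.4), "L": (1.2, 1.3), "M": (1.4, 1.0),
    "V": (1.0, 1.7), "I": (1.1, 1.6), "Y": (0.7, 1.5),
    "G": (0.6, 0.7), "P": (0.5, 0.5), "S": (0.8, 0.7), "N": (0.7, 0.9),
}
DEFAULT = (1.0, 1.0)  # neutral value for residues missing from the toy table


def predict_states(seq: str, window: int = 5, threshold: float = 1.05) -> str:
    """Assign H (helix), E (strand), or C (coil) to each residue.

    Each position gets the state whose propensity, averaged over a window
    centered on that position, is highest and at least `threshold`.
    """
    half = window // 2
    states = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half): i + half + 1]
        helix = sum(TOY_PROPENSITY.get(r, DEFAULT)[0] for r in segment) / len(segment)
        strand = sum(TOY_PROPENSITY.get(r, DEFAULT)[1] for r in segment) / len(segment)
        if max(helix, strand) < threshold:
            states.append("C")
        elif helix >= strand:
            states.append("H")
        else:
            states.append("E")
    return "".join(states)


print(predict_states("AEAEAEAEGPGPGPVIVIVIVI"))
```

Because each residue is scored only from its immediate neighbors, a scheme like this cannot use evolutionary information, which is the gap the third-generation profile-based methods addressed.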

Secondary Structure Prediction But we really want to know how the protein folds in three dimensions

CASP: Critical Assessment of Techniques for Protein Structure Prediction
- Started in 1994; helped push the field of structure prediction forward
- "Contest-like" setup
- Categories include:
  - Homology modeling / comparative modeling
  - Fold recognition / threading
  - Ab initio / de novo
  - Partially vs. fully automated methods (results are now quite similar)
- Goal: predict the structures of proteins that have been solved but not yet published or released; the experimental structures are then used to evaluate the predictions
- With each round, predictions and algorithms get better

Comparative Modeling ("Homology Modeling")
Proteins that have similar sequences (i.e., are related by evolution) are likely to have similar three-dimensional structures.
- First step: BLAST the sequence of interest against the PDB to identify a template
- Multiple templates can be used if desired
- Templates with bound ligands can be used to identify binding sites and interacting residues in the homology model
- The sequence identity required depends on protein length. A good rule of thumb is at least 40% sequence identity; higher is better
- Below 25% sequence identity, models are not reliable (zone of uncertainty)
- Above 75% sequence identity, the homology model is usually quite reliable
- Accurate sequence alignments are very important
- Programs include MODELLER and SWISS-MODEL
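As a rough sketch of how the identity thresholds on this slide might be applied, the code below computes percent identity from a pairwise alignment and maps it onto the 25%, 40%, and 75% cutoffs. The function names and the toy alignment are hypothetical, and the choice to count identity over aligned (non-gap) columns is an assumption; other denominators are also in common use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def template_outlook(identity: float) -> str:
    """Map percent identity onto the rule-of-thumb ranges from the slide."""
    if identity > 75:
        return "usually quite reliable"
    if identity >= 40:
        return "meets the 40% rule of thumb"
    if identity >= 25:
        return "below the 40% rule of thumb, use with caution"
    return "not reliable (zone of uncertainty)"


# Toy alignment of a target against a candidate template (made-up sequences).
target = "ACDE-G"
template = "ACDFHG"
identity = percent_identity(target, template)
print(f"{identity:.1f}% identity: {template_outlook(identity)}")
```

Since the required identity also depends on protein length, a fixed cutoff like this is only a first filter before inspecting the alignment itself.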

Comparative Modeling ("Homology Modeling")
Steps include:
1. Template recognition and initial alignment
2. Alignment correction (a multiple sequence alignment can help)
3. Backbone generation (transfer coordinates from the template)
4. Loop modeling (loops with insertions are hard to predict)
5. Side chain modeling (torsion angles are usually similar at high sequence identity)
6. Model optimization (minor energy minimization steps, or restrain some atom positions)
7. Model validation (higher identity usually means a more accurate model; calculate the energy or a normality index for bond lengths and torsion angles)
8. Iteration (to refine)

Protein Threading, Fold Recognition
Often, seemingly unrelated proteins adopt similar folds (through divergent or convergent evolution). Threading is used for sequences with low or no sequence homology to known structures.
Protein threading:
- A generalization of the homology modeling method
  - Homology modeling: align sequence to sequence
  - Threading: align sequence to structure (templates)
- For each alignment, the probability that each amino acid residue would occur in such an environment is calculated from preferences observed in determined structures
- Rationale:
  - A limited number of basic folds is found in nature
  - Amino acid preferences for different structural environments provide sufficient information to choose the best-fitting protein fold (structure)
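The sequence-to-structure scoring idea can be illustrated with a deliberately small sketch. Everything here is an assumption made for illustration: the two-letter environment alphabet (B = buried, E = exposed), the hydrophobic residue set, the made-up preference probabilities, and the template names. Real threading programs use gapped alignments and much richer environment descriptions.

```python
import math

# Residues treated as hydrophobic in this toy model (an assumption).
HYDROPHOBIC = set("AVLIMFWC")

# Made-up P(residue class | environment); B = buried, E = exposed.
PREFERENCE = {
    ("B", True): 0.7, ("B", False): 0.3,
    ("E", True): 0.3, ("E", False): 0.7,
}
BACKGROUND = 0.5  # made-up background probability of either class


def threading_score(sequence: str, environments: str) -> float:
    """Sum of log-odds for placing each residue in its template environment."""
    if len(sequence) != len(environments):
        raise ValueError("this ungapped sketch needs equal lengths")
    return sum(
        math.log(PREFERENCE[(env, res in HYDROPHOBIC)] / BACKGROUND)
        for res, env in zip(sequence, environments)
    )


def best_template(sequence: str, templates: dict[str, str]) -> str:
    """Return the name of the template whose environments fit best."""
    return max(templates, key=lambda name: threading_score(sequence, templates[name]))


# Hypothetical templates described only by their buried/exposed pattern.
templates = {"alternating": "BEBEBE", "inverted": "EBEBEB"}
query = "LKVEIK"
print(best_template(query, templates))
```

The log-odds form means a residue sitting in an environment it prefers adds to the score and one in a disfavored environment subtracts from it, so the total acts as the "goodness of fit" used to rank templates.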

Fold recognition
- The number of possible protein structures/folds is limited: there is a large number of sequences but relatively few folds (some estimate ~1000)
- This became most apparent when 50% of solved structures with no sequence homology to known proteins had folds similar to known structures
- 90% of new structures deposited in the PDB have folds similar to those already known
- Proteins that do not have similar sequences sometimes have similar three-dimensional structures (such as the TIM-barrel fold)
- A sequence whose structure is not known is fitted directly (or "threaded") onto a known structure, and the "goodness of fit" is evaluated using a discriminatory function
- Need ways to move the model closer to the native structure
Example (figure): NK-lysin (1nkl) vs. Bacteriocin T102/as48 (1e68), labeled 3.6 Å, 5% ID

Ab initio prediction of protein structure: concept
- Goal: predict a structure given only its amino acid sequence
- In theory: find the lowest-energy conformation
- Difficult because the conformational search space is huge
- Go from sequence to structure by sampling the conformational space in a reasonable manner and selecting a native-like conformation using a good discrimination function
- Difficult for sequences larger than 150 aa
- Rosetta (David Baker lab) is one of the best methods (CASP evaluation)
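A back-of-the-envelope calculation, in the spirit of Levinthal's argument, shows why exhaustive search is not an option. The numbers are illustrative assumptions rather than values from the slides: 3 allowed conformations per residue and 10^13 conformations sampled per second.

```python
import math


def log10_conformations(n_residues: int, states_per_residue: int = 3) -> float:
    """log10 of states_per_residue ** n_residues."""
    return n_residues * math.log10(states_per_residue)


def years_to_enumerate(n_residues: int, states_per_residue: int = 3,
                       samples_per_second: float = 1e13) -> float:
    """Years needed to visit every conformation once, at the assumed rate."""
    seconds_per_year = 3600 * 24 * 365
    log10_years = (log10_conformations(n_residues, states_per_residue)
                   - math.log10(samples_per_second)
                   - math.log10(seconds_per_year))
    return 10 ** log10_years


n = 150  # the sequence length mentioned on the slide
print(f"about 10^{log10_conformations(n):.1f} conformations")
print(f"about {years_to_enumerate(n):.1e} years to enumerate them")
```

Even with these generous assumptions the enumeration time is astronomically long, which is why practical methods sample the space with heuristics instead of searching it exhaustively.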

Rosetta structure prediction: 2 phases
1. Low-resolution phase: statistical scoring function and fragment assembly
   A. Local structure conformations using information from the PDB (3- and 9-mer stretches)
   B. Multiple fragment substitutions with simulated annealing to find the best arrangement of the fragments (Monte Carlo search)
   C. Low-resolution ensemble of decoy conformations
2. Atomic refinement phase: uses rotamers and small backbone angle moves (in populated regions of the Ramachandran plot)
   A. Refinement
   B. Structures are then clustered based on RMSD
   C. Centers of the largest clusters are chosen as representative folds (likely to be the correct fold)
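Two generic ingredients of this pipeline can be sketched in a few lines: a Metropolis acceptance test of the kind used in Monte Carlo simulated annealing, and a simple neighbor-count way to pick the center of the largest cluster from a pairwise RMSD matrix. This is not Rosetta's code or API; the function names, the cutoff, and the RMSD values are made up for illustration.

```python
import math
import random


def metropolis_accept(delta_energy: float, temperature: float, rng=random) -> bool:
    """Accept downhill moves always, uphill moves with Boltzmann probability."""
    if delta_energy <= 0:
        return True
    return rng.random() < math.exp(-delta_energy / temperature)


def largest_cluster_center(rmsd: list[list[float]], cutoff: float):
    """Return (index, members) for the decoy with the most neighbors.

    A decoy's members are all decoys within `cutoff` RMSD of it, itself
    included. Ties go to the decoy that appears first.
    """
    best_index, best_members = -1, []
    for i, row in enumerate(rmsd):
        members = [j for j, d in enumerate(row) if d <= cutoff]
        if len(members) > len(best_members):
            best_index, best_members = i, members
    return best_index, best_members


# Made-up pairwise RMSD matrix (in angstroms) for four decoys.
rmsd = [
    [0.0, 1.0, 1.5, 9.0],
    [1.0, 0.0, 1.2, 8.0],
    [1.5, 1.2, 0.0, 9.5],
    [9.0, 8.0, 9.5, 0.0],
]
center, members = largest_cluster_center(rmsd, cutoff=1.3)
print("representative decoy:", center, "cluster members:", members)
```

In an annealing run the temperature passed to `metropolis_accept` would be lowered gradually, so that uphill moves become rarer as the search settles.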

Quality Assessment: Ramachandran Plot (Phi/Psi Angles)
- Used to identify residues that may be in the wrong conformation
- Programs: Procheck, What_check
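The phi and psi values plotted in a Ramachandran plot are backbone dihedral angles, each defined by four consecutive atoms. The sketch below computes a dihedral from four points with plain Python; the function name and the test coordinates are illustrative, not from the slides.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Dihedral angle in degrees about the p1-p2 bond, in (-180, 180].

    phi(i) = dihedral(C[i-1], N[i], CA[i], C[i])
    psi(i) = dihedral(N[i], CA[i], C[i], N[i+1])
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Four made-up points: the outer atoms are a quarter turn apart.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying this to every residue of a model gives the (phi, psi) pairs that tools of this kind compare against the populated regions of the plot.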

DALI Structural Alignments
- Aligns protein structures (structure superposition)
- Generates a comparison matrix (transforms each protein into a 2D array of distances between C-alpha atoms)
- Z score reflects reliability; the lowest RMSD is identified
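The "2D array of distances between C-alpha atoms" can be sketched directly. The code below builds such a matrix and compares two of them with a simple mean absolute difference; that comparison function is a hypothetical stand-in for illustration and is not DALI's actual scoring scheme. The coordinates are made up, and `math.dist` requires Python 3.8 or later.

```python
import math


def distance_matrix(ca_coords):
    """Pairwise distances between C-alpha positions as a list of lists."""
    return [[math.dist(a, b) for b in ca_coords] for a in ca_coords]


def mean_abs_difference(m1, m2) -> float:
    """Average absolute difference between two equally sized matrices.

    Hypothetical comparison for illustration; not the DALI score.
    """
    n = len(m1)
    total = sum(abs(m1[i][j] - m2[i][j]) for i in range(n) for j in range(n))
    return total / (n * n)


# Made-up C-alpha coordinates, and the same points shifted by (1, 1, 1).
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
shifted = [(x + 1.0, y + 1.0, z + 1.0) for x, y, z in coords]

original = distance_matrix(coords)
moved = distance_matrix(shifted)
print(original[0][1], original[0][2], original[1][2])
print(mean_abs_difference(original, moved))
```

Because internal distances do not change when a structure is translated or rotated, two proteins can be compared through their distance matrices without superimposing them first.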