Chapter 13 Protein structure Bioinformatics and Functional Genomics

Slides:



Advertisements
Similar presentations
Secondary structure prediction from amino acid sequence.
Advertisements

PROTEOMICS 3D Structure Prediction. Contents Protein 3D structure. –Basics –PDB –Prediction approaches Protein classification.
Protein Structure Prediction
PDB-Protein Data Bank SCOP –Protein structure classification CATH –Protein structure classification genTHREADER–3D structure prediction Swiss-Model–3D.
1 Protein Structure, Structure Classification and Prediction Bioinformatics X3 January 2005 P. Johansson, D. Madsen Dept.of Cell & Molecular Biology, Uppsala.
Structure Prediction. Tertiary protein structure: protein folding Three main approaches: [1] experimental determination (X-ray crystallography, NMR) [2]
Strict Regularities in Structure-Sequence Relationship
Protein structure determination. Tertiary protein structure: protein folding Three main approaches: [1] experimental determination (X-ray crystallography,
An Introduction to Bioinformatics Protein Structure Prediction.
Protein Structure, Databases and Structural Alignment
Protein structure (Part 2 of 2).
Structure Prediction. Tertiary protein structure: protein folding Three main approaches: [1] experimental determination (X-ray crystallography, NMR) [2]
Computational Biology, Part 10 Protein Structure Prediction and Display Robert F. Murphy Copyright  1996, 1999, All rights reserved.
Summary Protein design seeks to find amino acid sequences which stably fold into specific 3-D structures. Modeling the inherent flexibility of the protein.
The Protein Data Bank (PDB)
CISC667, F05, Lec20, Liao1 CISC 467/667 Intro to Bioinformatics (Fall 2005) Protein Structure Prediction Protein Secondary Structure.
Protein structure prediction May 30, 2002 Quiz#4 on June 4 Learning objectives-Understand difference between primary secondary and tertiary structure.
Protein structure determination & prediction. Tertiary protein structure: protein folding Three main approaches: [1] experimental determination (X-ray.
Computing for Bioinformatics Lecture 8: protein folding.
Protein Structure Analysis - I
BMI 731 Protein Structures and Related Database Searches.
Protein Tertiary Structure Prediction Structural Bioinformatics.
Protein Structures.
Protein Structure Prediction and Analysis
Protein Structure Prediction Dr. G.P.S. Raghava Protein Sequence + Structure.
Protein Tertiary Structure Prediction
Chapter 12 Protein Structure Basics. 20 naturally occurring amino acids Free amino group (-NH2) Free carboxyl group (-COOH) Both groups linked to a central.
Macromolecular structure
Genomics and Personalized Care in Health Systems Lecture 9 RNA and Protein Structure Leming Zhou, PhD School of Health and Rehabilitation Sciences Department.
PDBe-fold (SSM) A web-based service for protein structure comparison and structure searches Gaurav Sahni, Ph.D.
Molecules, Genes, and Diseases Sun 23/2/2014 Session 2 Protein Structure and Folding Dr. Mona A. Rasheed.
 Four levels of protein structure  Linear  Sub-Structure  3D Structure  Complex Structure.
CS 177 Proteins, part 2 (Computational modeling) Review of protein structures Computational Modeling Three-dimensional structural analysis in laboratory.
1 P9 Extra Discussion Slides. Sequence-Structure-Function Relationships Proteins of similar sequences fold into similar structures and perform similar.
Protein Folding & Biospectroscopy F14PFB David Robinson Mark Searle Jon McMaster
Molecular visualization
Protein Structure Prediction and Structural Genomics Computer Science Department North Dakota State University Fargo, ND.
1 Enter the following Micro-RNA sequence into the box Run MFold and look at the results MFold Using MFold to predict RNA secondary structure
Protein Classification II CISC889: Bioinformatics Gang Situ 04/11/2002 Parts of this lecture borrowed from lecture given by Dr. Altman.
Part I : Introduction to Protein Structure A/P Shoba Ranganathan Kong Lesheng National University of Singapore.
Protein Structure & Modeling Biology 224 Instructor: Tom Peavy Nov 18 & 23, 2009
Protein Strucure Comparison Chapter 6,7 Orengo. Helices α-helix4-turn helix, min. 4 residues helix3-turn helix, min. 3 residues π-helix5-turn helix,
Protein secondary structure Prediction Why 2 nd Structure prediction? The problem Seq: RPLQGLVLDTQLYGFPGAFDDWERFMRE Pred:CCCCCHHHHHCCCCEEEECCHHHHHHCC.
1 Protein Structure Prediction (Lecture for CS397-CXZ Algorithms in Bioinformatics) April 23, 2004 ChengXiang Zhai Department of Computer Science University.
Module 3 Protein Structure Database/Structure Analysis Learning objectives Understand how information is stored in PDB Learn how to read a PDB flat file.
Protein Tertiary Structure. Protein Data Bank (PDB) Contains all known 3D structural data of large biological molecules, mostly proteins and nucleic acids:
Protein Modeling Protein Structure Prediction. 3D Protein Structure ALA CαCα LEU CαCαCαCαCαCαCαCα PRO VALVAL ARG …… ??? backbone sidechain.
Protein Structure Prediction ● Why ? ● Type of protein structure predictions – Sec Str. Pred – Homology Modelling – Fold Recognition – Ab Initio ● Secondary.
Introduction to Protein Structure Prediction BMI/CS 576 Colin Dewey Fall 2008.
Protein Structure and Bioinformatics. Chapter 2 What is protein structure? What are proteins made of? What forces determines protein structure? What is.
Structural classification of Proteins SCOP Classification: consists of a database Family Evolutionarily related with a significant sequence identity Superfamily.
Lecture 11 CS5661 Structural Bioinformatics – Structure Comparison Motivation Concepts Structure Comparison.
Lecture 10 CS566 Fall Structural Bioinformatics Motivation Concepts Structure Solving Structure Comparison Structure Prediction Modeling Structural.
Protein Tertiary Structure Prediction Structural Bioinformatics.
Proteins Structure Predictions Structural Bioinformatics.
EBI is an Outstation of the European Molecular Biology Laboratory. PDBe-fold (SSM) A web-based service for protein structure comparison and structure searches.
Protein Structure BL
Protein Proteins are biochemical compounds consisting of one or more polypeptides typically folded into a globular or fibrous form in a biologically functional.
Protein Structure Prediction and Protein Homology modeling
The Peptide Bond Amino acids are joined together in a condensation reaction that forms an amide known as a peptide bond.
The Peptide Bond Amino acids are joined together in a condensation reaction that forms an amide known as a peptide bond.
There are four levels of structure in proteins
Protein Structure Prediction
Protein Structures.
CS 177 Proteins, part 2 (Computational modeling)
Homology Modeling.
Protein structure prediction.
David R Buckler, Yuchen Zhou, Ann M Stock  Structure 
The Three-Dimensional Structure of Proteins
SUBMITTED BY: DEEPTI SHARMA BIOLOGICAL DATABASE AND SEQUENCE ANALYSIS.
Presentation transcript:

Chapter 13 Protein structure Bioinformatics and Functional Genomics 3rd edition (Wiley, 2015) Jonathan Pevsner, Ph.D. You may use (and modify) these slides for teaching purposes

Learning objectives Upon completing this material you should be able to: ■ understand the principles of protein primary, secondary, tertiary, and quaternary structure; ■use the NCBI tool CN3D to view a protein structure; ■use the NCBI tool VAST to align two structures; ■explain the role of PDB including its purpose, contents, and tools; ■explain the role of structure annotation databases such as SCOP and CATH; and ■describe approaches to modeling the three-dimensional structure of proteins. B&FG 3e Page 589

Outline Overview of protein structure Principles of protein structure Protein Data Bank Protein structure prediction Intrinsically disordered proteins Protein structure and disease

Overview: protein structure The three-dimensional structure of a protein determines its capacity to function. Christian Anfinsen and others denatured ribonuclease, observed rapid refolding, and demonstrated that the primary amino acid sequence determines its three-dimensional structure. We can study protein structure to understand problems such as the consequence of disease-causing mutations; the properties of ligand-binding sites; and the functions of homologs. B&FG 3e Page 590

Outline Overview of protein structure Principles of protein structure Protein Data Bank Protein structure prediction Intrinsically disordered proteins Protein structure and disease

Protein primary and secondary structure Results from three secondary structure programs are shown, with their consensus. h: alpha helix; c: random coil; e: extended strand B&FG 3e Fig. 13.1 Page 592

Protein tertiary and quaternary structure Quarternary structure: the four subunits of hemoglobin are shown (with an α 2β2 composition and one beta globin chain high- lighted) as well as four noncovalently attached heme groups. B&FG 3e Fig. 13.1 Page 592

The peptide bond; phi and psi angles B&FG 3e Fig. 13.2 Page 593

The peptide bond; phi and psi angles in DeepView B&FG 3e Fig. 13.2 Page 593

Protein secondary structure Protein secondary structure is determined by the amino acid side chains. Myoglobin is an example of a protein having many a-helices. These are formed by amino acid stretches 4-40 residues in length. Thioredoxin from E. coli is an example of a protein with many b sheets, formed from b strands composed of 5-10 residues. They are arranged in parallel or antiparallel orientations. B&FG 3e Page 594

Protein secondary structure: myoglobin (alpha helical) B&FG 3e Fig. 13.3 Page 595 Myoglobin (John Kendrew, 1958) in Cn3D software (NCBI)

Protein secondary structure: pepsin (beta sheets) Click residues in the sequence viewer (highlighted in yellow) to see the corresponding residues (here a beta strand; arrow at top) highlighted in the structure image. B&FG 3e Fig. 13.3 Page 595

Thioredoxin: structure having beta sheets (brown arrows) and alpha helices (green cylinders).

Protein secondary structure: Ramachandran plot Myoglobin (left) is mainly alpha helical (see arrow 1); pepsin (right) has beta sheets (see region of arrow 2) B&FG 3e Fig. 13.4 Page 596

Secondary structure prediction Chou and Fasman (1974) developed an algorithm based on the frequencies of amino acids found in a helices, b-sheets, and turns. Proline: occurs at turns, but not in a helices. GOR (Garnier, Osguthorpe, Robson): related algorithm Modern algorithms: use multiple sequence alignments and achieve higher success rate (about 70-75%) B&FG 3e Page 594

Secondary structure prediction: conformational preferences of the amino acids B&FG 3e Tab. 13.1 Page 597

Secondary structure prediction Web servers include: GOR4 Jpred NNPREDICT PHD Predator PredictProtein PSIPRED SAM-T99sec B&FG 3e Tab. 13.2 Page 597

Secondary structure prediction: codes from the DSSP database DSSP is a dictionary of secondary structure, including a standardized code for secondary structure assignment. B&FG 3e Tab. 13.3 Page 598

Tertiary protein structure: protein folding Main approaches: [1] Experimental determination (X-ray crystallography, NMR) [2] Prediction ► Comparative modeling (based on homology) ► Threading ► Ab initio (de novo) prediction B&FG 3e Page 598

Experimental approaches to protein structure [1] X-ray crystallography -- Used to determine 80% of structures -- Requires high protein concentration -- Requires crystals -- Able to trace amino acid side chains -- Earliest structure solved was myoglobin [2] NMR -- Magnetic field applied to proteins in solution -- Largest structures: 350 amino acids (40 kD) -- Does not require crystallization B&FG 3e Page 599

X-ray crystallography B&FG 3e Box 13.1 Page 599

Steps in obtaining a protein structure Target selection Obtain, characterize protein Determine, refine, model the structure Deposit in repository

Target selection for protein structure determination B&FG 3e Fig. 13.5 Page 599

Priorities for target selection for protein structures Historically, small, soluble, abundant proteins were studied (e.g. hemoglobin, cytochromes c, insulin). Modern criteria: Represent all branches of life Represent previously uncharacterized families Identify medically relevant targets Some are attempting to solve all structures within an individual organism (Methanococcus jannaschii, Mycobacterium tuberculosis) B&FG 3e Page 600

From classical structural biology to structural genomics B&FG 3e Fig. 13.6 Page 601

Outline Overview of protein structure Principles of protein structure Protein Data Bank Protein structure prediction Intrinsically disordered proteins Protein structure and disease

The Protein Data Bank (PDB) PDB is the principal repository for protein structures Established in 1971 Accessed at http://www.rcsb.org/pdb or simply http://www.pdb.org Currently contains >100,000 structure entities B&FG 3e Page 602

Protein Data Bank (PDB) B&FG 3e Fig. 13.7 Page 603

Protein Data Bank (PDB) holdings B&FG 3e Tab. 13.4 Page 604 Accessed 2015

PDB: number of searchable structures per year B&FG 3e Fig. 13.8 Page 604

PDB query for myoglobin Result of a PDB query for myoglobin. There are several hundred results organized into categories such as UniProt gene names, structural domains, and ontology terms. B&FG 3e Fig. 13.9 Page 605

Interactive visualization tools for PDB protein structures B&FG 3e Tab. 13.5 Page 606 Visualization tools are available within PDB and elsewhere.

Visualizing myoglobin structure 3RGK: Jmol applet B&FG 3e Fig. 13.11 Page 607 Jmol is available at PDB.

Visualizing structures: Jmol applet options B&FG 3e Fig. 13.11 Page 607

Viewing structures at PDB: WebMol

Protein Data Bank gateways to access PDB files Swiss-Prot, NCBI, EMBL CATH, Dali, SCOP, FSSP databases that interpret PDB files

Access to PDB through NCBI You can access PDB data at the NCBI several ways. Go to the Structure site, from the NCBI homepage Perform a DELTA BLAST (or BLASTP) search, restricting the output to the PDB database

Access to PDB structures through NCBI Molecular Modeling DataBase (MMDB) Cn3D (“see in 3D” or three dimensions): structure visualization software Vector Alignment Search Tool (VAST): view multiple structures

Access to PDB through NCBI: visit the Structure home page

Access to PDB through NCBI: query the Structure home page

Molecular Modeling Database (MMDB) at NCBI B&FG 3e Fig. 13.12 Page 608 You can study PDB structures from NCBI. MMDB offers tools to analyze protein (and other) structures.

Cn3D: NCBI software for visualizing protein structures Several display formats are shown

Do a DELTA BLAST (or BLASTP) search; set the database to pdb (Protein Data Bank)

Access protein structures by using DELTA-BLAST (or BLASTP) restricting searches to proteins with PDB entries Structure accession (e.g. 2JTZ) B&FG 3e Fig. 13.13 Page 609

Access to structure data at NCBI: VAST Vector Alignment Search Tool (VAST) offers a variety of data on protein structures, including -- PDB identifiers -- root-mean-square deviation (RMSD) values to describe structural similarities -- NRES: the number of equivalent pairs of alpha carbon atoms superimposed -- percent identity B&FG 3e Page 609

Vector Alignment Search Tool (VAST) at NCBI: comparison of two or more structures B&FG 3e Fig. 13.14 Page 610

VAST: NCBI tool to compare two structures B&FG 3e Fig. 13.15 Page 611

VAST: information listed for each structural neighbor B&FG 3e Box 13.2 Page 611

Integrated views of universe of protein folds Chothia (1992) predicted a total of 1500 protein folds It is challenging to map protein fold space because of the varying definitions of domains, folds, and structural elements We can consider three resources: CATH, SCOP, and the Dali Domain Dictionary Structural Classification of Proteins (SCOP) database provides a comprehensive description of protein structures and evolutionary relationships based upon a hierarchical classification scheme. SCOPe is a SCOP extended database. B&FG 3e Page 609

Holdings of the SCOP-e database B&FG 3e Table 13.6 Page 612

SCOP-e database: hierarchy of terms The results of a search for myoglobin are shown, including its membership in a class (all alpha proteins), fold, superfamily, and family. Two of the myoglobin structures are shown B&FG 3e Fig. 13.6 Page 613

The CATH Hierarchy

Globins are highlighted. CATH organizes protein structures by a hierarchical scheme of class, architecture, topology (fold family), and homologous superfamily Globins are highlighted. B&FG 3e Fig. 13.17 Page 614

CATH globin superfamily B&FG 3e Fig. 13.18 Page 615

Superposition of globin superfamily members in CATH B&FG 3e Fig. 13.18 Page 615

DaliLite pairwise structural alignment: myoglobin and alpha globin Dali is an acronym for distance matrix alignment. The Dali server allows a comparison of two 3D structures. Left: PDB identifiers for myoglobin and beta globin are entered. Right: output includes a pairwise structural alignment. B&FG 3e Fig. 13.1 Page 597

DaliLite pairwise structural alignment: myoglobin and alpha globin DALI server output includes a Z score (here a highly significant value of 21.4) based on quality measures such as: the resolution and amount of shared secondary structure; a root mean squared deviation (RMSD); percent identity; and a sequence alignment indicating secondary structure features. B&FG 3e Fig. 13.1 Page 597

Comparisons of SCOP, CATH, and Dali For some proteins (such as those listed here) these three authoritative resources list different numbers of domains: The field of structural biology provides rigorous measurements of the three-dimensional structure of proteins, and yet classifying domains can be a complex problem requiring expert human judgments. SCOP is especially oriented towards classify- ing whole proteins, while CATH is oriented towards classifying domains. B&FG 3e Tab. 13.8 Page 617

Beyond PDB, CATH, SCOP, Dali: partial list of protein structure databases B&FG 3e Tab. 13.8 Page 617

Outline Overview of protein structure Principles of protein structure Protein Data Bank Protein structure prediction Intrinsically disordered proteins Protein structure and disease

Three main structure prediction strategies There are three main approaches to protein structure prediction. Homology modeling (comparative modeling). This is most useful when a template (protein of interest) can be matched (e.g. by BLAST) to proteins of known structure. Fold recognition (threading). A target sequence lacks identifiable sequence matches and yet may have folds in common with proteins of known structure. Ab initio prediction (template-free modeling). Assumes: (1) all the information about the structure of a protein is contained in its amino acid sequence; and (2) a globular protein folds into the structure with the lowest free energy. B&FG 3e Page 617

Three main structure prediction strategies B&FG 3e Fig. 13.20 Page 618

Structure prediction techniques as a function of sequence identity B&FG 3e Fig. 13.21 Page 620

Websites for structure prediction by comparative modeling, and for quality assessment B&FG 3e Tab. 13.9 Page 620

Predicting protein structure: CASP competition Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (“CASP”) http://predictioncenter.org/ The competition organizers and selected experts solve a wide range of three dimensional structures (typically by X-ray crystallography) and hold the correct answers. Participants in the competition are given the primary amino acid sequence and a set amount of time to submit predictions. These predictions are then assessed by comparison to the correct structures. CASP allows the structural biology community to assess which methods perform best (and to identify challenging areas). B&FG 3e Page 621

CASP competition: Hubbard plots Distance cutoff (Å) Percent of residues (Ca) Most groups had the right prediction for this structure (arrow 1) but some did not (arrow 2). In correct predictions, a large percent of residues (e.g. >>50%) are a small distance (e.g. <2Å) from the true structure coordinates. B&FG 3e Fig. 13.22 Page 623

CASP competition: Hubbard plots most groups had the wrong prediction for this structure B&FG 3e Fig. 13.22 Page 623

CASP competition: Hubbard plots Half the groups got it wrong (arrow 3), half got it right (arrow 4). The key reason is differences in a multiple sequence alignment. B&FG 3e Fig. 13.22 Page 623

Outline Overview of protein structure Principles of protein structure Protein Data Bank Protein structure prediction Intrinsically disordered proteins Protein structure and disease

Intrinsic disorder Many proteins do not adopt stable three-dimensional structures, and this may be an essential aspect of their ability to function properly. Intrinsically disordered proteins are defined as having unstructured regions of significant size such as at least 30 or 50 amino acids. Such regions do not adopt a fixed three-dimensional structure under physiological conditions, but instead exist as dynamic ensembles in which the backbone amino acid positions vary over time without adopting stable equilibrium values. The Database of Intrinsic Disorder is available at http://www.disprot.org. B&FG 3e Page 622

Outline Overview of protein structure Principles of protein structure Protein Data Bank Protein structure prediction Intrinsically disordered proteins Protein structure and disease

Protein structure and human disease In some cases, a single amino acid substitution can induce a dramatic change in protein structure. For example, the DF508 mutation of CFTR alters the a helical content of the protein, and disrupts intracellular trafficking. Other changes are subtle. The E6V mutation in the gene encoding hemoglobin beta causes sickle- cell anemia. The substitution introduces a hydrophobic patch on the protein surface, leading to clumping of hemoglobin molecules. B&FG 3e Page 622

Protein structure and disease Examples of proteins associated with diseases for which subtle change in protein sequence leads to change in structure. B&FG 3e Tab. 13.10 Page 624

Perspective The aim of structural genomics is to define structures that span the entire space of protein folds. This project has many parallels to the Human Genome Project. Both are ambitious endeavors that require the international cooperation of many laboratories. Both involve central repositories for the deposit of raw data, and in each the growth of the databases is exponential. It is realistic to expect that the great majority of protein folds will be defined in the near future. Each year, the proportion of novel folds declines rapidly. A number of lessons are emerging: proteins assume a limited number of folds; a single three-dimensional fold may be used by proteins to perform entirely distinct functions; and the same function may be performed by proteins using entirely different folds. B&FG 3e Page 625