The Protein Data Bank (PDB)

Presentation transcript:

The Protein Data Bank (PDB)
-- PDB is the principal repository for protein structures
-- Established in 1971
-- Accessed at http://www.rcsb.org/pdb or simply http://www.pdb.org
-- Currently contains over 32,000 structure entries (updated 9/05)
Page 287

PDB content growth (www.pdb.org)
[Figure: number of structures by year]
Fig. 9.6 Page 281

PDB holdings (September, 2005)
Category                         Entries
Proteins, peptides                29,876
Protein/nucleic acid complexes     1,338
Nucleic acids                      1,500
Carbohydrates                         13
Total                             32,727
Table 9-2 Page 281
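The holdings table can be checked with a few lines of arithmetic. The sketch below uses only the counts listed in the table. The dictionary keys and the helper name `percent_share` are chosen here for illustration.

```python
# Counts copied from the September 2005 holdings table.
holdings = {
    "proteins, peptides": 29876,
    "protein/nucleic acid complexes": 1338,
    "nucleic acids": 1500,
    "carbohydrates": 13,
}

# The sum of the four categories should equal the stated total of 32,727.
total = sum(holdings.values())


def percent_share(count, total_count):
    """Return count as a percentage of total_count, rounded to one decimal."""
    return round(100.0 * count / total_count, 1)


shares = {category: percent_share(n, total) for category, n in holdings.items()}

for category, share in shares.items():
    print(f"{category}: {share}%")
```

The categories do sum to the stated total, and proteins and peptides account for roughly nine in ten entries.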

Protein Data Bank
-- Gateways to access PDB files: Swiss-Prot, NCBI, EMBL
-- Databases that interpret PDB files: CATH, Dali, SCOP, FSSP
Fig. 9.10 Page 285
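To make "interpreting a PDB file" concrete, here is a minimal sketch of reading atom coordinates from the fixed-column ATOM records of the legacy PDB text format. The function names are hypothetical and the sample coordinates are invented for illustration. The sample lines are produced by a small formatter so that the column positions are explicit. Real files contain many other record types that this sketch ignores.

```python
def format_atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a simplified ATOM record (columns 1-54 only) for the example."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def parse_atom_line(line):
    """Parse one ATOM record using the fixed column layout of the PDB format."""
    return {
        "serial": int(line[6:11]),         # columns 7-11
        "atom_name": line[12:16].strip(),  # columns 13-16
        "res_name": line[17:20].strip(),   # columns 18-20
        "chain_id": line[21],              # column 22
        "res_seq": int(line[22:26]),       # columns 23-26
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
    }


def alpha_carbons(pdb_text):
    """Return the parsed alpha-carbon (CA) atoms found in ATOM records."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            atom = parse_atom_line(line)
            if atom["atom_name"] == "CA":
                atoms.append(atom)
    return atoms


# Invented coordinates for two residues of a hypothetical chain A.
sample_text = "\n".join([
    format_atom_line(1, "N", "MET", "A", 1, 27.340, 24.430, 2.614),
    format_atom_line(2, "CA", "MET", "A", 1, 26.266, 25.413, 2.842),
    format_atom_line(3, "C", "MET", "A", 1, 26.913, 26.639, 3.531),
    format_atom_line(4, "CA", "ALA", "A", 2, 27.500, 28.900, 4.100),
])

ca_atoms = alpha_carbons(sample_text)
for atom in ca_atoms:
    print(atom["res_name"], atom["res_seq"], atom["x"], atom["y"], atom["z"])
```

Restricting the search to ATOM records keeps alpha carbons apart from calcium ions, which use the same two-letter name but are stored as HETATM records.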

Access to PDB through NCBI
You can access PDB data at NCBI in several ways:
-- Go to the Structure site from the NCBI homepage
-- Use Entrez
-- Perform a BLAST search, restricting the output to the PDB database
Page 289

Access to PDB through NCBI
-- Molecular Modeling DataBase (MMDB)
-- Cn3D (“see in 3D” or three dimensions): structure visualization software
-- Vector Alignment Search Tool (VAST): view multiple structures
Page 291

Fig. 9.15 Page 290

Fig. 9.16 Page 291

Fig. 9.17 Page 292

Access to structure data at NCBI: VAST
Vector Alignment Search Tool (VAST) offers a variety of data on protein structures, including
-- PDB identifiers
-- root-mean-square deviation (RMSD) values to describe structural similarities
-- NRES: the number of equivalent pairs of alpha carbon atoms superimposed
-- percent identity
Page 294
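The RMSD value reported for a structural comparison can be illustrated with a short calculation. This is a minimal sketch that assumes the two structures have already been superimposed and that the equivalent alpha carbon pairs are listed in the same order. It does not reproduce how VAST finds the superposition. The function name and the coordinates are invented for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates.

    Assumes the two structures are already superimposed and that
    coords_a[i] and coords_b[i] are equivalent alpha carbon atoms.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Two invented sets of alpha carbon positions, in Angstroms.
structure_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]

nres = len(structure_a)  # number of equivalent pairs, analogous to NRES
print(nres, rmsd(structure_a, structure_b))
```

Here every pair is displaced by exactly 1 Å along z, so the RMSD is 1 Å over three equivalent pairs.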

Many databases explore protein structures
-- SCOP
-- CATH
-- Dali Domain Dictionary
-- FSSP
Page 293

Structural Classification of Proteins (SCOP)
SCOP describes protein structures using a hierarchical classification scheme:
-- Classes
-- Folds
-- Superfamilies (likely evolutionary relationship)
-- Families
-- Domains
-- Individual PDB entries
http://scop.mrc-lmb.cam.ac.uk/scop/
Page 293
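The hierarchy can be sketched as a small data structure. The example below assumes the dotted "class.fold.superfamily.family" identifier style that SCOP uses for its concise classification strings, which the slides themselves do not show. The specific identifiers and all function names are illustrative only.

```python
from typing import NamedTuple, Optional


class ScopClassification(NamedTuple):
    """The top four levels of the SCOP hierarchy for one domain."""
    scop_class: str
    fold: int
    superfamily: int
    family: int


def parse_sccs(sccs: str) -> ScopClassification:
    """Parse a dotted string such as 'a.1.1.2' (assumed identifier format)."""
    parts = sccs.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected class.fold.superfamily.family, got {sccs!r}")
    return ScopClassification(parts[0], int(parts[1]), int(parts[2]), int(parts[3]))


def deepest_shared_level(a: ScopClassification,
                         b: ScopClassification) -> Optional[str]:
    """Return the most specific level at which two domains still agree."""
    levels = ("class", "fold", "superfamily", "family")
    shared = None
    for level, value_a, value_b in zip(levels, a, b):
        if value_a != value_b:
            break
        shared = level
    return shared


domain_1 = parse_sccs("a.1.1.2")
domain_2 = parse_sccs("a.1.1.3")
print(deepest_shared_level(domain_1, domain_2))
```

Two domains that agree down to the superfamily level but differ in family are, in SCOP's terms, likely to share an evolutionary relationship.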

Class, Architecture, Topology, and Homologous Superfamily (CATH) database
CATH clusters proteins at four levels:
-- C: Class (α, β, α&β folds)
-- A: Architecture (shape of domain, e.g. jelly roll)
-- T: Topology (fold families; not necessarily homologous)
-- H: Homologous superfamily
http://www.biochem.ucl.ac.uk/basm/cath_new
Page 293

SCOP statistics (September, 2005)
Class      # folds   # superfamilies   # families
All α          218               376          608
All β          144               290          560
α/β            136               222          629
α+β            279               409          717
…
Total          945              1539         2845
α/β = parallel β sheets
α+β = antiparallel β sheets
Table 9-4 Page 298

Fig. 9.23 Page 298

Fig. 9.24 Page 299

Fig. 9.25 Page 300

Fig. 9.26 Page 301

Fig. 9.27 Page 302

Fig. 9.28 Page 303

Dali Domain Dictionary
-- Dali contains a numerical taxonomy of all known structures in PDB.
-- Dali integrates additional data for entries within a domain class, such as secondary structure predictions and solvent accessibility.
Page 302

Fig. 9.29 Page 303

Fig. 9.30 Page 304

Fold classification based on structure-structure alignment of proteins (FSSP)
-- FSSP is based on a comprehensive comparison of PDB proteins (greater than 30 amino acids in length).
-- Representative sets exclude sequence homologs sharing > 25% amino acid identity.
-- The output includes a “fold tree.”
http://www.ebi.ac.uk/dali/fssp
Page 293
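The idea of a representative set can be sketched as a greedy filter. It keeps a protein only if it shares no more than 25% identity with every protein already kept. The greedy strategy, the function name, and the identity values are assumptions for illustration. The slides do not describe the selection procedure FSSP actually uses.

```python
def pick_representatives(names, identity, threshold=25.0):
    """Greedily select proteins so that no two kept proteins exceed the
    identity threshold (percent). Pairs missing from `identity` are
    treated as 0% identical.
    """
    representatives = []
    for name in names:
        is_novel = all(
            identity.get(frozenset((name, kept)), 0.0) <= threshold
            for kept in representatives
        )
        if is_novel:
            representatives.append(name)
    return representatives


# Invented pairwise percent identities for four hypothetical chains.
pairwise_identity = {
    frozenset(("protA", "protB")): 90.0,
    frozenset(("protA", "protC")): 10.0,
    frozenset(("protB", "protC")): 12.0,
    frozenset(("protC", "protD")): 40.0,
}

representative_set = pick_representatives(
    ["protA", "protB", "protC", "protD"], pairwise_identity
)
print(representative_set)
```

The result of a greedy pass depends on the input order. Here protB is dropped because protA was seen first, and protD is dropped because it is too similar to protC.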

Fig. 9.31 Page 305

FSSP: fold tree Fig. 9.32 Page 306

Fig. 9.33 Page 307

Fig. 9.34 Page 307

Approaches to predicting protein structures
-- There are more than 20,000 structures in PDB, and about 1 million protein sequences in SwissProt/TrEMBL.
-- For most proteins, structural models derive from computational biology approaches rather than experimental methods.
-- The most reliable method of modeling and evaluating new structures is by comparison to previously known structures. This is comparative modeling.
-- An alternative is ab initio modeling.
Pages 303-305

Approaches to predicting protein structures
[Flowchart: obtain sequence (target) -> fold assignment -> comparative modeling or ab initio modeling -> build, assess model]
Fig. 9.35 Page 308

Comparative modeling of protein structures
[1] Perform fold assignment (e.g. BLAST, CATH, SCOP); identify structurally conserved regions
[2] Align the target (unknown protein) with the template. This is typically done when the two share >30% amino acid identity over a sufficient length
[3] Build a model
[4] Evaluate the model
Page 305
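The identity criterion in step [2] can be sketched as a simple check on a pairwise alignment. The sketch assumes the target and template have already been aligned, with "-" marking gaps. The function names, the minimum aligned length of 50, and the example sequences are hypothetical, since the slide says only "a sufficient length".

```python
def percent_identity(aligned_target, aligned_template):
    """Percent of aligned (non-gap) positions with identical residues."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for target_res, template_res in zip(aligned_target, aligned_template):
        if target_res == "-" or template_res == "-":
            continue  # skip positions where either sequence has a gap
        aligned += 1
        if target_res == template_res:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def aligned_length(aligned_target, aligned_template):
    """Number of positions where neither sequence has a gap."""
    return sum(1 for t, s in zip(aligned_target, aligned_template)
               if t != "-" and s != "-")


def is_usable_template(aligned_target, aligned_template,
                       min_identity=30.0, min_length=50):
    """Apply the >30% identity rule plus a hypothetical length cutoff."""
    return (percent_identity(aligned_target, aligned_template) > min_identity
            and aligned_length(aligned_target, aligned_template) >= min_length)


# Short invented alignment; min_length is lowered so the toy example passes.
target = "ACDE-FG"
template = "ACDQWFG"
print(percent_identity(target, template))
print(is_usable_template(target, template, min_length=5))
```

Counting identity over non-gap positions only is one of several common conventions. Other definitions divide by the full alignment length or by the length of the shorter sequence.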

Errors in comparative modeling
Errors may occur for many reasons:
[1] Errors in side-chain packing
[2] Distortions within correctly aligned regions
[3] Errors in regions of target that do not match template
[4] Errors in sequence alignment
[5] Use of incorrect templates
Page 306

Comparative modeling
In general, the accuracy of structure prediction depends on the percent amino acid identity shared between target and template. For >50% identity, RMSD is often only 1 Å.
Page 306

Fig. 9.36 Page 308 Baker and Sali (2000)

Comparative modeling
Many web servers offer comparative modeling services. Examples are
-- SWISS-MODEL (ExPASy)
-- Predict Protein server (Columbia)
-- WHAT IF (CMBI, Netherlands)
Page 309