Worldwide Protein Data Bank www.wwpdb.org What the Protein Data Bank teaches us about structural biology Helen M. Berman NCMI Workshop December 13, 2008.

1960s
• Protein crystallography begins to take off
• Emerging interest in protein folding
• Use of computer graphics to represent structure
• Nobel Prize awarded for the first 3D protein structures: myoglobin and hemoglobin
Structures shown: lysozyme, hemoglobin, ribonuclease, myoglobin
Myoglobin: Kendrew, Bodo, Dintzis, Parrish, Wyckoff, Phillips (1958) Nature
Hemoglobin: Perutz (1962) Proc. R. Soc. A265
Lysozyme: Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature
Ribonuclease: Kartha, Bello, Harker (1967) Nature 213; Wyckoff, Hardman, Allewell, Inagami, Johnson, Richards (1967) J. Biol. Chem. 242

1970s
• Grass roots community efforts to archive data
• Protein crystallographers discuss how to archive data
• June 1971: Cold Spring Harbor meeting brings groups together (Cold Spring Harbor Symposia on Quantitative Biology, vol. XXXVI, 1972)
• October 1971: PDB is announced in Nature New Biology (7 structures; vol. 233, 1971, page 223)
• 1975: PDB receives first funding from NSF (~32 structures)

Hemoglobin: M.F. Perutz (1962) Proc. R. Soc. A265
Carboxypeptidase A: F.A. Quiocho, W.N. Lipscomb (1971) Adv. Protein Chem. 25:1-78
Myoglobin: J.C. Kendrew, G. Bodo, H.M. Dintzis, R.G. Parrish, H. Wyckoff, D.C. Phillips (1958) Nature 181
Subtilisin: R.A. Alden, J.J. Birktoft, J. Kraut, J.D. Robertus, C.S. Wright (1971) Biochem. Biophys. Res. Commun. 45
Alpha-chymotrypsin: J.J. Birktoft, D.M. Blow (1972) J. Mol. Biol. 68
Pancreatic trypsin inhibitor: R. Huber, D. Kukla, A. Ruhlmann, O. Epp, H. Formanek (1970) Naturwissenschaften 57
Rubredoxin: K.D. Watenpaugh, L.C. Sieker, J.R. Herriott, L.H. Jensen (1973) Acta Crystallogr. B29
Lactate dehydrogenase: J.L. White, M.L. Hackert, M. Buehner, M.J. Adams, G.C. Ford, P.J. Lentz Jr., I.E. Smiley, S.J. Steindel, M.G. Rossmann (1976) J. Mol. Biol. 102
Cytochrome b5: F.S. Mathews, P. Argos, M. Levine (1972) Cold Spring Harb. Symp. Quant. Biol. 36
Papain: J. Drenth, J.N. Jansonius, R. Koekoek, H.M. Swen, B.G. Wolthers (1968) Nature 218

In the beginning: enzymes
Chart: proportion of enzyme classes relative to total enzyme structures, by decade (percent)
Enzyme classes: oxidoreductases, transferases, hydrolases, lyases, isomerases, ligases
Lysozyme: Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature
Ribonuclease: Kartha, Bello, Harker (1967) Nature 213; Wyckoff, Hardman, Allewell, Inagami, Johnson, Richards (1967) J. Biol. Chem. 242

In the beginning: RNA-containing structures (1317)
Chart: number of structures by decade
Categories: protein/RNA complexes, RNA only, DNA/RNA hybrid, protein/DNA/RNA complexes
tRNA: J.L. Sussman, S.-H. Kim (1976) Biochem. Biophys. Res. Commun. 68:89-96; J.D. Robertus, J.E. Ladner, J.T. Finch, D. Rhodes, R.S. Brown, B.F.C. Clark, A. Klug (1974) Nature 250

1980s
• Technology takes off
• Structural biology is able to focus on medical problems
• Community efforts to promote data sharing
• IUCr guidelines requiring data deposition in the PDB are published

In the beginning: DNA-containing structures (2474)
Chart: number of structures by decade
Categories: protein/DNA complexes, DNA only, DNA/RNA hybrid, protein/DNA/RNA complexes
B-DNA (1bna): Dickerson & Drew (1981) J. Mol. Biol. 149
Z-DNA (dcg): Wang, Quigley, Kolpak, Crawford, van Boom, van der Marel, Rich (1979) Nature 282

In the beginning: protein-nucleic acid complexes (1920)
Chart: number of structures by decade
Categories: protein/DNA complexes, protein/RNA complexes, protein/DNA/RNA complexes
Phage 434 repressor-operator (2or1): Aggarwal, Rodgers, Drottar, Ptashne, & Harrison (1988) Science 242

In the beginning: viruses (280 total)
Chart: number of structures by decade
Hopper, Harrison, Sauer (1984) Structure of tomato bushy stunt virus. V. Coat protein sequence determination and its structural implications. J. Mol. Biol. 177
Silva, Rossmann (1985) The refinement of southern bean mosaic virus in reciprocal space. Acta Crystallogr. B41

Cooperative community action
• Individual letters to editors of journals
• Committees
  – IUCr Commission on Biological Macromolecules
  – ACA/USNCCr
  – Richards committee
• Funding agencies
• Articles in journals
Pictured: Marvin Cassman, Fred Richards, Richard Dickerson

1990s
• Number of structures increases exponentially
• Complexity of structures increases
• mmCIF dictionary created
• New databases begin to emerge
• User base expands dramatically
• PDB archive moves
Pictured: mmCIF Working Group members

In the beginning: electron microscopy structures
Bacteriorhodopsin: Henderson, Baldwin, Ceska, Zemlin, Beckmann, Downing (1990) J. Mol. Biol. 213

In the beginning: ribosome structures (214)
Categories: prokaryotic, eukaryotic
Structures shown: ribosome, 30S, 50S
Ban, Nissen, Hansen, Moore, & Steitz (2000) Science 289
Clemons Jr., May, Wimberly, McCutcheon, Capel, & Ramakrishnan (1999) Nature 400
Schluenzen, Tocilj, Zarivach, Harms, Gluehmann, Janell, Bashan, Bartels, Agmon, Franceschi, Yonath (2000) Cell 102
Yusupova, Yusupov, Cate, & Noller (2001) Cell 106

2000s
• wwPDB is formed
• Continued growth in structures
• Structural genomics takes off

Depositions to the PDB by decade
Chart: number of released entries by year

July 2008

What can we learn from the PDB?

Structure distribution
Chart categories: protein only, protein-DNA complexes, DNA only, protein-RNA complexes, RNA only, RNA-DNA hybrid, other
Second panel: GO process

Structure determination methods (April 30, 2008)
Chart: number of structures by decade

Resolution distribution
Charts (resolution by year): all structures, protein structures, other structures

Distinct and novel protein sequences
Chart: percent of distinct/novel structures by decade
Series: structures containing distinct protein sequences (<98%); structures containing novel protein sequences (<30%); subset of PSI structures; subset of other SG structures
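The two cutoffs on this slide can be illustrated with a small sketch. It assumes the percentages are sequence-identity thresholds and that sequences are already aligned to equal length; the function names and the ungapped identity measure are illustrative assumptions, not the procedure used to produce the chart.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions in two pre-aligned, equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and aligned to equal length")
    # Gap characters ("-") never count as matches.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def classify(new_seq, known_seqs, distinct_cutoff=98.0, novel_cutoff=30.0):
    """Label a sequence by its best identity to previously seen sequences.

    Hypothetical labels: "novel" (<30% to everything known),
    "distinct" (<98%), otherwise "redundant". Every novel sequence is
    also distinct; the most specific label is returned.
    """
    best = max((percent_identity(new_seq, k) for k in known_seqs), default=0.0)
    if best < novel_cutoff:
        return "novel"
    if best < distinct_cutoff:
        return "distinct"
    return "redundant"
```

For example, a ten-residue sequence that differs from a known one at a single position has 90% identity and would be counted as distinct but not novel.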

Redundancy: protein clusters
Table columns: cluster #, total distinct chains in cluster, protein cluster, first structure, deposition date
Protein cluster                                  First structure
Bacteriophage T4 lysozyme (cluster 1, 459)       2LZM
Hen egg white lysozyme                           2LYZ
Human lysozyme                                   1GFE
Mouse immunoglobulin Fc & Fab fragments          1GIG
Human immunoglobulin Fc & Fab fragments          1FC
HIV-1 protease                                   2HVP
Trypsin (serine protease)                        5PTP
Thrombin                                         2HGT
Human carbonic anhydrase II                      1CA
Whale myoglobin                                  1MBN
Human leukocyte antigen                          1HLA
Human hemoglobin subunits (two clusters)         3HHB
Ribonuclease A                                   2RNS
Human cyclin-dependent kinase 2 (CDK2)           1HCK

Lysozyme: Lessons learned
Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature 206: 757.
T4 bacteriophage (459 structures)
• Amino acid replacement studies suggest that the fraction of amino acid residues that define the structure of T4 lysozyme is about 50%
  B.W. Matthews (1996) FASEB J. 10
Insight into folding and catalysis
Hen egg white (297 structures)
• Low sequence identity
• Structural similarity of active site to T4
  B.W. Matthews, S.J. Remington, M.G. Grutter, W.F. Anderson (1981) J. Mol. Biol. 147
Insight into evolution and catalysis

Myoglobin and hemoglobin: Lessons learned
Whale myoglobin (185 structures)
• Different ligands: oxygen, carbon monoxide [1]
• Amino acid substitution studies [2]
• Laue studies [3]
Insight into function and dynamics
Other species myoglobin
• Low sequence identity, same structure [4]
Insight into evolution
Human hemoglobin (178 structures)
Insight into function and disease (sickle cell anemia, thalassemia) [5]
Other species hemoglobin
• Low sequence identity, same structure [4]
Profound insight into evolution
Image: Lodish et al. [6]
[1] Kuriyan, Wilz, Karplus, Petsko (1986) J. Mol. Biol. 192:133–154
[2] Quillin, Arduini, Olson, Phillips Jr. (1993) J. Mol. Biol. 234:140–155; Carver, Brantley Jr., Singleton, Arduini, Quillin, Phillips Jr., Olson (1992) J. Biol. Chem. 267:14443–14450
[3] Bourgeois, Vallone, Schotte, Arcovito, Miele, Sciara, Wulff, Anfinrud, Brunori (2003) PNAS 100
[4] Dickerson, Geis (1983) Hemoglobin: structure, function, and pathology
[5] Kidd, Baker, Mathews, Brittain, Baker (2001) Prot. Sci. 10; Harrington, Adachi, Royer Jr. (1998) J. Biol. Chem. 273
[6] Lodish, Berk, Zipursky, Matsudaira, Baltimore, Darnell (2000) Molecular Cell Biology, WH Freeman & Co.

TIM barrel proteins: Lessons learned
TIM barrel structures (1727)
• Share the same fold but represent significant sequence and functional diversity
• Are enzymes or enzyme-related proteins involved in molecular or energy metabolism
• Comparative structure analysis indicates evolutionary relatedness of TIM barrel proteins
Banner, Bloomer, Petsko, Phillips, Wilson (1976) Biochem. Biophys. Res. Commun. 72
Nagano, Orengo, Thornton (2002) J. Mol. Biol. 321

HIV-related structures (609)
Chart: number of structures by decade
Categories: protease, reverse transcriptase, Gag protein, integrase, other

HIV-1 protease (311)
Drugs: Amprenavir (GSK), Fosamprenavir (GSK), Lopinavir (Abbott), Atazanavir (BMS), Nelfinavir (Agouron), Darunavir (Tibotec), Tipranavir (BI), Indinavir (Merck), Ritonavir (Abbott), Saquinavir (Roche)
Navia, Fitzgerald, McKeever, Leu, Heimbach, Herber, Sigal, Darke, Springer (1989) Nature 337
Wlodawer, Miller, Jaskolski, Sathyanarayana, Baldwin, Weber, Selk, Clawson, Schneider, Kent (1989) Science 245
Structures with ligands (PDB IDs, grouped as on the slide):
• 2R5P, 2B7Z, 2AVV, 2AVO, 2AVS, 1SGU, 1SDT, 1SDV, 1SDU, 1K6C, 1C6Y, 2BPX, 1HSG, 1HSH
• 1T7J, 1HPV
• 2B60, 1RL8, 1SH9, 1N49, 1HXW
• 2QAK, 2PYM, 2Q63, 2PYN, 2Q64, 2R5Q, 1OHR
• 2O4N, 2O4L, 2O4P, 1D4Y, 1D4S
• 3D1X, 3D1Y, 3CYX, 2NMW, 2NMZ, 2NNP, 2NMY, 2NNK, 1C6Z, 1FB7
• 2FXE, 2FXD, 2O4K, 2AQU, 2FND
• 2RKG, 2RKF, 2QHC, 2Z54, 2Q5K, 2O4S, 1RV7, 1MUI

HIV-1 reverse transcriptase (110)
Drugs: Abacavir (GSK), Nevirapine (BI), Stavudine (BMS), Efavirenz (BMS), Lamivudine (GSK), Zidovudine (GSK), Emtricitabine (Gilead), Tenofovir (Gilead), Zalcitabine (Hoffmann-La Roche), Etravirine (Tibotec), Delavirdine (Pfizer)
Chart: number of structures by year
Wang, Smerdon, Jager, Kohlstaedt, Rice, Friedman, Steitz (1994) Proc. Natl. Acad. Sci. USA 91
Structures with ligands (PDB IDs, grouped as on the slide):
• 2HND, 2HNY, 1S1U, 1S1X, 1LW0, 1LWE, 1LWC, 1LWF, 1JLB, 1JLF, 1FKP, 1VRT, 3HVT
• 1JKH, 1IKW, 1IKV, 1FKO, 1FK9
• 1T05
• 1S6P

Structural coverage of KEGG pathways
Structures associated with a KEGG pathway: 33%
KEGG pathway                                Number of structures
Complement and coagulation cascades         506
Small cell lung cancer                      506
Regulation of actin cytoskeleton            449
Non-small cell lung cancer                  407
Pyrimidine metabolism                       402
Nitrogen metabolism                         399
Two-component system - General              360
Ribosome                                    333
Base excision repair                        328
Purine metabolism                           310
Antigen processing and presentation         281
Nicotinate and nicotinamide metabolism      252
Insulin signaling pathway                   248
Porphyrin and chlorophyll metabolism        248
ABC transporters - General                  246
Prostate cancer                             244

Human biological pathways
Genes that contain a PDB structure are in red
Pathways shown: complement and coagulation cascades, small cell lung cancer, non-small cell lung cancer, regulation of actin cytoskeleton
Source: KEGG

EM maps and Models in the PDB

How EM experiments are archived

EMDataBank
• Created by EBI in 2002 for archiving EM maps
• US deposition/annotation site added this year
• Maps stored in CCP4/MRC format
• Associated metadata stored in XML format
580 entries total
Examples: nuclear pore complex, 85 Å (EMD-1097); rotavirus V6 protein, 3.8 Å (EMD-1461)
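As a rough illustration of what "maps stored in CCP4/MRC format" means in practice, the sketch below reads a few fields from the fixed 1024-byte header of such a file. It assumes a little-endian file and the usual header layout (grid size and data mode in the first four 32-bit words, cell lengths in words 11 to 13, the "MAP " tag at byte 208); the function name and returned keys are illustrative, and older files may not carry the tag.

```python
import struct

HEADER_SIZE = 1024  # fixed-size main header of a CCP4/MRC map


def parse_mrc_header(header: bytes) -> dict:
    """Extract grid size, data mode and cell lengths from a map header.

    Assumes little-endian byte order; a robust reader would check the
    machine stamp in the header instead of assuming it.
    """
    if len(header) < HEADER_SIZE:
        raise ValueError("need at least 1024 header bytes")
    # Words 1-4: columns, rows, sections, and the data mode (2 = 32-bit float).
    nx, ny, nz, mode = struct.unpack_from("<4i", header, 0)
    # Words 11-13: unit cell lengths in angstroms.
    cell = struct.unpack_from("<3f", header, 40)
    return {
        "grid": (nx, ny, nz),
        "mode": mode,
        "cell_angstrom": cell,
        "has_map_tag": header[208:212] == b"MAP ",
    }
```

Usage would look like `parse_mrc_header(open(path, "rb").read(1024))`, where `path` points to a downloaded map file.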

EM entries in the PDB
• Atomic coordinate models fitted to EM maps
• Storage format for models and metadata is CIF
• Matrix representations possible
• Some large entries “break” PDB format
230 entries total
Examples: PBCV-1 (1m4x, 1680 matrices); 80S ribosome (1s1h + 1s1i)
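One reading of "matrix representations" is that an entry stores a single set of coordinates plus a list of transformation matrices, and the full particle is generated by applying each matrix in turn. The sketch below shows that idea under this assumption; the function names and the (rotation, translation) pairing are illustrative and do not reflect the archive's file layout.

```python
def apply_operator(coords, rotation, translation):
    """Return coordinates transformed as x' = R x + t.

    coords: list of (x, y, z) tuples; rotation: 3x3 nested list;
    translation: length-3 sequence.
    """
    transformed = []
    for x, y, z in coords:
        transformed.append(tuple(
            rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z
            + translation[i]
            for i in range(3)
        ))
    return transformed


def expand_assembly(coords, operators):
    """Apply every (rotation, translation) pair to the same starting coordinates.

    Returns one transformed copy per operator, in operator order.
    """
    return [apply_operator(coords, r, t) for r, t in operators]
```

With this representation, an entry such as PBCV-1 needs to store only one copy of the model and its 1680 operators rather than every generated atom.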

PDBj

Goals
• Common data model
• Data harvesting tools
• “One-stop shop” for deposition and retrieval
• Tools for visualization, segmentation, and assessment

Acknowledgements Wellcome Trust, EU, CCP4, BBSRC, MRC, EMBL NLM BIRD-JST, MEXT NSF, NIGMS, DOE, NLM, NCI, NCRR, NIBIB, NINDS, NIDDK

Acknowledgements NIH GM (Baylor, Rutgers, EBI) EU Network of Excellence LSHG-CT (EBI)