Worldwide Protein Data Bank
What the Protein Data Bank teaches us about structural biology
Helen M. Berman
NCMI Workshop, December 13, 2008
1960s
– Protein crystallography begins to take off
– Emerging interest in protein folding
– Use of computer graphics to represent structure
– Nobel Prize awarded for the first 3D protein structures: myoglobin and hemoglobin
Structures shown: Lysozyme, Hemoglobin, Ribonuclease, Myoglobin
References:
– Myoglobin: Kendrew, Bodo, Dintzis, Parrish, Wyckoff, Phillips (1958) Nature
– Hemoglobin: Perutz (1962) Proc. R. Soc. A265
– Lysozyme: Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature
– Ribonuclease: Kartha, Bello, Harker (1967) Nature 213; Wyckoff, Hardman, Allewell, Inagami, Johnson, Richards (1967) J. Biol. Chem. 242
1970s
– Grass roots community efforts to archive data
– Protein crystallographers discuss how to archive data
– June 1971: Cold Spring Harbor meeting brings groups together (Cold Spring Harbor Symposia on Quantitative Biology, vol. XXXVI, 1972)
– October 1971: PDB is announced in Nature New Biology (7 structures; vol. 233, 1971, page 223)
– 1975: PDB receives first funding from NSF (~32 structures)
– Hemoglobin: M.F. Perutz (1962) Proc. R. Soc. A265
– Carboxypeptidase A: F.A. Quiocho, W.N. Lipscomb (1971) Adv. Protein Chem. 25:1-78
– Myoglobin: J.C. Kendrew, G. Bodo, H.M. Dintzis, R.G. Parrish, H. Wyckoff, D.C. Phillips (1958) Nature 181
– Subtilisin: R.A. Alden, J.J. Birktoft, J. Kraut, J.D. Robertus, C.S. Wright (1971) Biochem. Biophys. Res. Commun. 45
– Alpha-chymotrypsin: J.J. Birktoft, D.M. Blow (1972) J. Mol. Biol. 68
– Pancreatic trypsin inhibitor: R. Huber, D. Kukla, A. Ruhlmann, O. Epp, H. Formanek (1970) Naturwissenschaften 57
– Rubredoxin: K.D. Watenpaugh, L.C. Sieker, J.R. Herriott, L.H. Jensen (1973) Acta Crystallogr. B29
– Lactate dehydrogenase: J.L. White, M.L. Hackert, M. Buehner, M.J. Adams, G.C. Ford, P.J. Lentz Jr., I.E. Smiley, S.J. Steindel, M.G. Rossmann (1976) J. Mol. Biol. 102
– Cytochrome b5: F.S. Mathews, P. Argos, M. Levine (1972) Cold Spring Harb. Symp. Quant. Biol. 36
– Papain: J. Drenth, J.N. Jansonius, R. Koekoek, H.M. Swen, B.G. Wolthers (1968) Nature 218
Proportion of enzyme classes relative to total enzyme structures
[Figure: percent of enzyme structures by decade for each enzyme class (oxidoreductases, transferases, hydrolases, lyases, isomerases, ligases), with a table of totals per enzyme class]
In the beginning:
– Lysozyme: Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature
– Ribonuclease: Kartha, Bello, Harker (1967) Nature 213; Wyckoff, Hardman, Allewell, Inagami, Johnson, Richards (1967) J. Biol. Chem. 242
RNA-containing structures (1317)
[Figure: number of structures by decade for protein/RNA complexes, RNA only, DNA/RNA hybrid, and protein/DNA/RNA complexes]
In the beginning:
– tRNA: J.L. Sussman, S.-H. Kim (1976) Biochem. Biophys. Res. Commun. 68:89-96; J.D. Robertus, J.E. Ladner, J.T. Finch, D. Rhodes, R.S. Brown, B.F.C. Clark, A. Klug (1974) Nature 250
1980s
– Technology takes off
– Structural biology is able to focus on medical problems
– Community efforts to promote data sharing
– IUCr guidelines requiring data deposition in the PDB are published
DNA-containing structures (2474)
[Figure: number of structures by decade for protein/DNA complexes, DNA only, DNA/RNA hybrid, and protein/DNA/RNA complexes]
In the beginning:
– B-DNA: 1bna, Dickerson & Drew (1981) J. Mol. Biol. 149
– Z-DNA: dcg, Wang, Quigley, Kolpak, Crawford, van Boom, van der Marel, Rich (1979) Nature 282
Protein-nucleic acid complexes (1920)
[Figure: number of structures by decade for protein/DNA complexes, protein/RNA complexes, and protein/DNA/RNA complexes]
In the beginning:
– Phage 434 repressor-operator: 2or1, Aggarwal, Rodgers, Drottar, Ptashne, Harrison (1988) Science 242
Viruses (280 total)
[Figure: number of structures by decade, with the last bin labeled >=2000]
In the beginning:
– Hopper, Harrison, Sauer (1984) Structure of tomato bushy stunt virus. V. Coat protein sequence determination and its structural implications. J. Mol. Biol. 177
– Silva, Rossmann (1985) The refinement of southern bean mosaic virus in reciprocal space. Acta Crystallogr. B41
Cooperative community action
– Individual letters to editors of journals
– Committees
  – IUCr commission on Biological Macromolecules
  – ACA/USNCCr
  – Richards committee
– Funding agencies
– Articles in journals
Marvin Cassman, Fred Richards, Richard Dickerson
1990s
– Number of structures increases exponentially
– Complexity of structures increases
– mmCIF dictionary created
– New databases begin to emerge
– User base expands dramatically
– PDB archive moves
mmCIF Working Group Members
Electron Microscopy structures
In the beginning:
– Bacteriorhodopsin: Henderson, Baldwin, Ceska, Zemlin, Beckmann, Downing (1990) J. Mol. Biol. 213
Ribosome structures (214)
[Figure: prokaryotic and eukaryotic ribosome structures; panels labeled Ribosome, 30S, 50S]
In the beginning:
– Ban, Nissen, Hansen, Moore, Steitz (2000) Science 289
– Clemons Jr., May, Wimberly, McCutcheon, Capel, Ramakrishnan (1999) Nature 400
– Schluenzen, Tocilj, Zarivach, Harms, Gluehmann, Janell, Bashan, Bartels, Agmon, Franceschi, Yonath (2000) Cell 102
– Yusupova, Yusupov, Cate, Noller (2001) Cell 106
2000s
– wwPDB is formed
– Continued growth in structures
– Structural genomics takes off
Depositions to the PDB by decade
[Figure: number of released entries by year]
July 2008
What can we learn from the PDB?
Structure distribution
[Figure: distribution of structures across protein only, protein-DNA complexes, DNA only, protein-RNA complexes, RNA only, RNA-DNA hybrid, and other; a second panel is labeled GO process]
Structure determination methods (April 30, 2008)
[Figure: number of structures by decade for each structure determination method]
Resolution distribution
[Figure panels: resolution distribution of protein structures; resolution distribution of other structures; resolution distribution of all structures. Axes: year, resolution]
Distinct and novel protein sequences
[Figure: percent of distinct/novel structures by decade]
– Structures containing distinct protein sequences (<98%)
– Structures containing novel protein sequences (<30%)
– Subset of PSI structures
– Subset of other SG structures
Percent labels on the chart: 37%, 51%, 27%, 32%, 14%, 39%, 16%, 7%, 25%, 4%, 2%, 10%
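The two cutoffs on this slide (<98% for distinct, <30% for novel) read as sequence identity thresholds. A minimal sketch of how such a cutoff could group chains is shown here, assuming pre-aligned, equal-length sequences and a simple greedy scheme. The function names and toy sequences are hypothetical, and this is not necessarily the procedure used to produce the PDB statistics.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def greedy_cluster(seqs, threshold):
    """Put each sequence in the first cluster whose representative is at least
    `threshold` percent identical; otherwise start a new cluster."""
    clusters = []  # each cluster is a list; element 0 is its representative
    for s in seqs:
        for c in clusters:
            if percent_identity(s, c[0]) >= threshold:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters


# Toy sequences (hypothetical): two identical, one point variant, one unrelated.
seqs = ["MKTAYIAKQR", "MKTAYIAKQR", "MKTAYIAKQK", "GSHMLEDPVA"]
distinct = greedy_cluster(seqs, 98.0)  # one cluster per "distinct" sequence
novel = greedy_cluster(seqs, 30.0)     # one cluster per "novel" sequence
```

Under the 98% cutoff the point variant counts as its own distinct sequence, while under the 30% cutoff it collapses into the same cluster as its parent.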
Redundancy: protein clusters
Cluster #  Total distinct chains in cluster  Protein cluster                           First structure  Deposition Date
1          459                               Bacteriophage T4 lysozyme                 2LZM
                                             Hen egg white lysozyme                    2LYZ
                                             Human lysozyme                            1GFE
                                             Mouse immunoglobulin Fc & Fab fragments   1GIG
                                             Human immunoglobulin Fc & Fab fragments   1FC
                                             HIV-1 protease                            2HVP
                                             Trypsin (serine protease)                 5PTP
                                             Thrombin                                  2HGT
                                             Human carbonic anhydrase II               1CA
                                             Whale myoglobin                           1MBN
                                             Human leukocyte antigen                   1HLA
                                             Human hemoglobin α-subunit                3HHB
                                             Human hemoglobin β-subunit                3HHB
                                             Ribonuclease A                            2RNS
                                             Human cyclin-dependent kinase 2 (CDK2)    1HCK
Lysozyme: Lessons learned
Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature 206: 757.
T4 bacteriophage (459 structures)
– Amino acid replacement studies suggest that the fraction of amino acid residues that define the structure of T4 lysozyme is about 50% (B.W. Matthews (1996) FASEB J. 10)
– Insight into folding and catalysis
Hen egg white (297 structures)
– Low sequence identity
– Structural similarity of active site to T4 (B.W. Matthews, M.G. Remington, M.G. Grutter, W.F. Anderson (1981) J. Mol. Biol. 147)
– Insight into evolution and catalysis
Myoglobin and hemoglobin: Lessons learned
Whale myoglobin (185 structures)
– Different ligands: oxygen, carbon monoxide [1]
– Amino acid substitution studies [2]
– Laue studies [3]
– Insight into function and dynamics
Other species myoglobin
– Low sequence identity, same structure [4]
– Insight into evolution
Human hemoglobin (178 structures)
– Insight into function and disease (sickle cell anemia, thalassemia) [5]
Other species hemoglobin
– Low sequence identity, same structure [4]
– Profound insight into evolution
Figure: Lodish et al. [6]
References:
[1] Kuriyan, Wilz, Karplus, Petsko (1986) J. Mol. Biol. 192:133–154
[2] Quillin, Arduini, Olson, Phillips Jr. (1993) J. Mol. Biol. 234:140–155; Carver, Brantley Jr., Singleton, Arduini, Quillin, Phillips Jr., Olson (1992) J. Biol. Chem. 267:14443–14450
[3] Bourgeois, Vallone, Schotte, Arcovito, Miele, Sciara, Wulff, Anfinrud, Brunori (2003) PNAS 100
[4] Dickerson, Geis (1983) Hemoglobin: structure, function, and pathology
[5] Kidd, Baker, Mathews, Brittain, Baker (2001) Prot. Sci. 10; Harrington, Adachi, Royer Jr. (1998) J. Biol. Chem. 273
[6] Lodish, Berk, Zipursky, Matsudaira, Baltimore, Darnell (2000) Molecular Cell Biology, WH Freeman & Co.
TIM barrel proteins: Lessons learned
TIM barrel structures (1727)
– Share the same fold but represent significant sequence and functional diversity
– Are enzymes or enzyme-related proteins involved in molecular or energy metabolism
– Comparative structure analysis indicates evolutionary relatedness of TIM barrel proteins
References:
– Banner, Bloomer, Petsko, Phillips, Wilson (1976) Biochem. Biophys. Res. Commun. 72
– Nagano, Orengo, Thornton (2002) J. Mol. Biol. 321
HIV-related structures (609)
[Figure: number of structures by decade for protease, reverse transcriptase, Gag protein, integrase, and other]
HIV-1 protease (311)
Drugs:
– Amprenavir (GSK)
– Fosamprenavir (GSK)
– Lopinavir (Abbott)
– Atazanavir (BMS)
– Nelfinavir (Agouron)
– Darunavir (Tibotec)
– Tipranavir (BI)
– Indinavir (Merck)
– Ritonavir (Abbott)
– Saquinavir (Roche)
References:
– Navia, Fitzgerald, McKeever, Leu, Heimbach, Herber, Sigal, Darke, Springer (1989) Nature 337
– Wlodawer, Miller, Jaskolski, Sathyanarayana, Baldwin, Weber, Selk, Clawson, Schneider, Kent (1989) Science 245
Structures with ligands (one group per line, as grouped on the slide):
– 2R5P, 2B7Z, 2AVV, 2AVO, 2AVS, 1SGU, 1SDT, 1SDV, 1SDU, 1K6C, 1C6Y, 2BPX, 1HSG, 1HSH
– 1T7J, 1HPV
– 2B60, 1RL8, 1SH9, 1N49, 1HXW
– 2QAK, 2PYM, 2Q63, 2PYN, 2Q64, 2R5Q, 1OHR
– 2O4N, 2O4L, 2O4P, 1D4Y, 1D4S
– 3D1X, 3D1Y, 3CYX, 2NMW, 2NMZ, 2NNP, 2NMY, 2NNK, 1C6Z, 1FB7
– 2FXE, 2FXD, 2O4K, 2AQU, 2FND
– 2RKG, 2RKF, 2QHC, 2Z54, 2Q5K, 2O4S, 1RV7, 1MUI
HIV-1 reverse transcriptase (110)
[Figure: number of structures by year]
Drugs:
– Abacavir (GSK)
– Nevirapine (BI)
– Stavudine (BMS)
– Efavirenz (BMS)
– Lamivudine (GSK)
– Zidovudine (GSK)
– Emtricitabine (Gilead)
– Tenofovir (Gilead)
– Zalcitabine (Hoffmann-La Roche)
– Etravirine (Tibotec)
– Delavirdine (Pfizer)
Reference: Wang, Smerdon, Jager, Kohlstaedt, Rice, Friedman, Steitz (1994) Proc. Natl. Acad. Sci. USA 91
Structures with ligands (one group per line, as grouped on the slide):
– 2HND, 2HNY, 1S1U, 1S1X, 1LW0, 1LWE, 1LWC, 1LWF, 1JLB, 1JLF, 1FKP, 1VRT, 3HVT
– 1JKH, 1IKW, 1IKV, 1FKO, 1FK9
– 1T05
– 1S6P
Structural coverage of KEGG pathways
KEGG Pathway                              Number of Structures
Complement and coagulation cascades       506
Small cell lung cancer                    506
Regulation of actin cytoskeleton          449
Non-small cell lung cancer                407
Pyrimidine metabolism                     402
Nitrogen metabolism                       399
Two-component system - General            360
Ribosome                                  333
Base excision repair                      328
Purine metabolism                         310
Antigen processing and presentation       281
Nicotinate and nicotinamide metabolism    252
Insulin signaling pathway                 248
Porphyrin and chlorophyll metabolism      248
ABC transporters - General                246
Prostate cancer                           244
Structures associated with a KEGG pathway: 33%
Human biological pathways
Genes that contain a PDB structure are in red
[Figure panels: Complement and coagulation cascades pathway; Small cell lung cancer; Non-small cell lung cancer; Regulation of actin cytoskeleton]
KEGG
EM maps and Models in the PDB
How EM experiments are archived
EMDataBank
– Created by EBI in 2002 for archiving EM maps
– US deposition/annotation site added this year
– Maps stored in CCP4/MRC format
– Associated metadata stored in XML format
– 580 entries total
Examples shown:
– Nuclear pore complex, 85 Å (EMD-1097)
– Rotavirus VP6 protein, 3.8 Å (EMD-1461)
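As an illustration of the CCP4/MRC map format mentioned on this slide, the sketch below reads a few fields from the fixed 1024-byte header: the grid dimensions (NX, NY, NZ), the data mode, and the "MAP " identifier at byte offset 208. It assumes a little-endian file and the MRC2014 field layout. The function name is hypothetical and the header is synthetic rather than taken from a real EMDB entry.

```python
import struct

HEADER_SIZE = 1024  # fixed header length in bytes (MRC2014 layout assumed)


def parse_mrc_header(header: bytes) -> dict:
    """Extract grid size, data mode, and the MAP identifier from an MRC/CCP4 header.

    Assumes little-endian byte order; real files declare their byte order
    in the machine stamp, which this sketch does not inspect.
    """
    if len(header) < HEADER_SIZE:
        raise ValueError("header must be at least 1024 bytes")
    # Words 1-4: NX, NY, NZ (columns, rows, sections) and MODE, each a 32-bit int.
    nx, ny, nz, mode = struct.unpack_from("<4i", header, 0)
    # Word 53 (byte offset 208): the four characters 'MAP ' identify the file type.
    is_map = header[208:212] == b"MAP "
    return {"nx": nx, "ny": ny, "nz": nz, "mode": mode, "is_map": is_map}


# Build a synthetic header: a 64 x 64 x 32 grid of 32-bit floats (mode 2).
raw = bytearray(HEADER_SIZE)
struct.pack_into("<4i", raw, 0, 64, 64, 32, 2)
raw[208:212] = b"MAP "
info = parse_mrc_header(bytes(raw))
```

For real maps a maintained reader is a safer choice than hand-parsing, since it also handles byte order, extended headers, and the voxel data itself.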
EM entries in the PDB
– Atomic coordinate models fitted to EM maps
– Storage format for models and metadata is CIF
– Matrix representations possible
– Some large entries “break” PDB format
  – PBCV-1 (1m4x, 1680 matrices)
  – 80S ribosome (1s1h + 1s1i)
– 230 entries total
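The matrix representation noted on this slide lets an entry store one copy of the coordinates plus a set of transformation matrices, as in the PBCV-1 entry with its 1680 matrices. A minimal sketch of the expansion step follows, assuming each matrix is a 3x4 array holding a 3x3 rotation and a translation column. The function names and the toy matrices are hypothetical and do not come from any PDB entry.

```python
def apply_matrix(coord, m):
    """Apply a 3x4 matrix (3x3 rotation plus translation column) to one (x, y, z) point."""
    x, y, z = coord
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in m)


def expand(coords, matrices):
    """Generate one transformed copy of the coordinate list per matrix."""
    return [[apply_matrix(c, m) for c in coords] for m in matrices]


# Toy example: the identity, and a 180-degree rotation about z with a shift of 5 along z.
identity = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0]]
rot_z_180 = [[-1.0, 0.0, 0.0, 0.0],
             [0.0, -1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0, 5.0]]

asymmetric_unit = [(1.0, 2.0, 3.0)]
copies = expand(asymmetric_unit, [identity, rot_z_180])
```

Storing matrices instead of expanded coordinates keeps a large assembly compact: the coordinate list is written once, and each additional copy costs only twelve numbers.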
PDBj
Goals
– Common data model
– Data harvesting tools
– “One-stop shop” for deposition and retrieval
– Tools for visualization, segmentation, and assessment
Acknowledgements
– Wellcome Trust, EU, CCP4, BBSRC, MRC, EMBL
– NLM
– BIRD-JST, MEXT
– NSF, NIGMS, DOE, NLM, NCI, NCRR, NIBIB, NINDS, NIDDK
Acknowledgements
– NIH GM (Baylor, Rutgers, EBI)
– EU Network of Excellence LSHG-CT (EBI)