Worldwide Protein Data Bank www.wwpdb.org What the Protein Data Bank teaches us about structural biology Helen M. Berman NCMI Workshop December 13, 2008.

1 Worldwide Protein Data Bank www.wwpdb.org What the Protein Data Bank teaches us about structural biology Helen M. Berman NCMI Workshop December 13, 2008

2

3 1960’s
- Protein crystallography begins to take off
- Emerging interest in protein folding
- Use of computer graphics to represent structure
- Nobel Prize awarded for the first 3D protein structures: myoglobin and hemoglobin
Structures shown: Lysozyme, Hemoglobin, Ribonuclease, Myoglobin
References:
Myoglobin: Kendrew, Bodo, Dintzis, Parrish, Wyckoff, Phillips (1958) Nature 181: 662-666
Hemoglobin: Perutz (1962) Proc. R. Soc. A265: 161-187
Lysozyme: Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature 206: 757
Ribonuclease: Kartha, Bello, Harker (1967) Nature 213: 862-865; Wyckoff, Hardman, Allewell, Inagami, Johnson, Richards (1967) J. Biol. Chem. 242: 3753-3757

4 1970’s: Grass roots community efforts to archive data
- Protein crystallographers discuss how to archive data
- June 1971: Cold Spring Harbor meeting brings groups together (Cold Spring Harbor Symposia on Quantitative Biology, vol. XXXVI, 1972)
- October 1971: PDB is announced in Nature New Biology (7 structures; vol. 233, 1971, page 223)
- 1975: PDB receives first funding from NSF (~32 structures)

5
Hemoglobin: M.F. Perutz (1962) Proc. R. Soc. A265: 161-187
Carboxypeptidase A: F.A. Quiocho, W.N. Lipscomb (1971) Adv Protein Chem 25: 1-78
Myoglobin: J.C. Kendrew, G. Bodo, H.M. Dintzis, R.G. Parrish, H. Wyckoff, D.C. Phillips (1958) Nature 181: 662-666
Subtilisin: R.A. Alden, J.J. Birktoft, J. Kraut, J.D. Robertus, C.S. Wright (1971) Biochem Biophys Res Commun 45: 337-344
Alpha-chymotrypsin: J.J. Birktoft, D.M. Blow (1972) J Mol Biol 68: 187-240
Pancreatic trypsin inhibitor: R. Huber, D. Kukla, A. Ruhlmann, O. Epp, H. Formanek (1970) Naturwissenschaften 57: 389-392
Rubredoxin: K.D. Watenpaugh, L.C. Sieker, J.R. Herriott, L.H. Jensen (1973) Acta Crystallogr B29: 943-956
Lactate dehydrogenase: J.L. White, M.L. Hackert, M. Buehner, M.J. Adams, G.C. Ford, P.J. Lentz Jr., I.E. Smiley, S.J. Steindel, M.G. Rossmann (1976) J Mol Biol 102: 759-779
Cytochrome b5: F.S. Mathews, P. Argos, M. Levine (1972) Cold Spring Harb Symp Quant Biol 36: 387-395
Papain: J. Drenth, J.N. Jansonius, R. Koekoek, H.M. Swen, B.G. Wolthers (1968) Nature 218: 929-932

6 Enzymes
In the beginning:
Lysozyme: Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature 206: 757
Ribonuclease: Kartha, Bello, Harker (1967) Nature 213: 862-865; Wyckoff, Hardman, Allewell, Inagami, Johnson, Richards (1967) J. Biol. Chem. 242: 3753-3757
Chart: Proportion of enzyme classes relative to total enzyme structures (axes: Decade, Percent)
Number of structures by enzyme class and decade:

Enzyme class      1972-79  1980-89  1990-99  2000-08  Total
Oxidoreductases         5       25      918     2977   3925
Transferases            3       29     1423     5246   6701
Hydrolases             29      123     2797     6846   9795
Lyases                  2        3      451     1337   1793
Isomerases              1        2      280      716    999
Ligases                 0        4      123      652    779
Total                  40      186     5992    17774  23992
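
The chart on this slide plots each enzyme class as a percentage of all enzyme structures in a decade. A minimal sketch of that arithmetic, using the counts transcribed from the slide's table, might look like the following. The function names are illustrative and not part of any PDB tool, and rounding to one decimal place is an assumption.

```python
# Counts transcribed from the slide's table: structures per enzyme class per decade.
DECADES = ["1972-79", "1980-89", "1990-99", "2000-08"]
COUNTS = {
    "Oxidoreductases": [5, 25, 918, 2977],
    "Transferases": [3, 29, 1423, 5246],
    "Hydrolases": [29, 123, 2797, 6846],
    "Lyases": [2, 3, 451, 1337],
    "Isomerases": [1, 2, 280, 716],
    "Ligases": [0, 4, 123, 652],
}


def decade_totals(counts):
    """Sum the counts of all enzyme classes for each decade column."""
    return [sum(column) for column in zip(*counts.values())]


def class_percentages(counts):
    """Return {class: [percent of that decade's enzyme structures, ...]}."""
    totals = decade_totals(counts)
    return {
        name: [round(100.0 * n / total, 1) for n, total in zip(row, totals)]
        for name, row in counts.items()
    }


if __name__ == "__main__":
    percentages = class_percentages(COUNTS)
    print("Class".ljust(16) + "".join(d.rjust(9) for d in DECADES))
    for name, row in percentages.items():
        print(name.ljust(16) + "".join(f"{p:9.1f}" for p in row))
```

The per-decade totals computed this way match the "Total" row of the table, which is a useful check that the flattened numbers were split correctly.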

7 RNA-containing structures (1317)
In the beginning: tRNA. J.L. Sussman, S.-H. Kim (1976) Biochem Biophys Res Commun. 68: 89-96; J.D. Robertus, J.E. Ladner, J.T. Finch, D. Rhodes, R.S. Brown, B.F.C. Clark, & A. Klug (1974) Nature 250: 546-551
Chart: Number of structures by decade (1972-1979, 1980-1989, 1990-1999, 2000-2008) for Protein/RNA complexes, RNA only, DNA/RNA hybrid, and Protein/DNA/RNA complexes

8 1980’s
- Technology takes off
- Structural biology is able to focus on medical problems
- Community efforts to promote data sharing
- IUCr guidelines requiring data deposition in the PDB are published

9 DNA-containing structures (2474)
In the beginning:
B-DNA: 1bna, Dickerson & Drew (1981) J. Mol. Biol. 149: 761-786
Z-DNA: 2dcg, Wang, Quigley, Kolpak, Crawford, van Boom, van der Marel, Rich (1979) Nature 282: 680-686
Chart: Number of structures by decade for Protein/DNA complexes, DNA only, DNA/RNA hybrid, and Protein/DNA/RNA complexes

10 Protein-nucleic acid complexes (1920)
In the beginning: Phage 434 repressor-operator (2or1). Aggarwal, Rodgers, Drottar, Ptashne, & Harrison (1988) Science 242: 899-907
Chart: Number of structures by decade for Protein/DNA complexes, Protein/RNA complexes, and Protein/DNA/RNA complexes

11 Viruses (280 total)
In the beginning:
Hopper, Harrison, Sauer (1984) Structure of tomato bushy stunt virus. V. Coat protein sequence determination and its structural implications. J. Mol. Biol. 177: 701-713
Silva, Rossmann (1985) The refinement of southern bean mosaic virus in reciprocal space. Acta Crystallogr. B41: 147-157
Number of structures by decade: 1980-1989: 20; 1990-1999: 121; >=2000: 139

12 Cooperative community action
- Individual letters to editors of journals
- Committees
  - IUCr Commission on Biological Macromolecules
  - ACA/USNCCr
  - Richards committee
- Funding agencies
- Articles in journals
Marvin Cassman, Fred Richards, Richard Dickerson

13 1990’s
- Number of structures increases exponentially
- Complexity of structures increases
- mmCIF dictionary created
- New databases begin to emerge
- User base expands dramatically
- PDB archive moves
mmCIF Working Group Members

14 In the beginning Electron Microscopy structures Bacteriorhodopsin Henderson, Baldwin, Ceska, Zemlin, Beckmann, Downing (1990) J.Mol.Biol. 213: 899-929.

15 Ribosome structures (214): Prokaryotic, Eukaryotic
In the beginning: Ribosome 30S and 50S subunits
Ban, Nissen, Hansen, Moore, & Steitz (2000) Science 289: 905-920
Clemons Jr., May, Wimberly, McCutcheon, Capel, & Ramakrishnan (1999) Nature 400: 833-840
Schluenzen, Tocilj, Zarivach, Harms, Gluehmann, Janell, Bashan, Bartels, Agmon, Franceschi, Yonath (2000) Cell 102: 615-623
Yusupova, Yusupov, Cate, & Noller (2001) Cell 106: 233-241

16 2000’s
- wwPDB is formed
- Continued growth in structures
- Structural genomics takes off

17 www.wwpdb.org

18 Depositions to the PDB by decade
Chart axes: Year, Number of released entries

19 July 2008

20 What can we learn from the PDB?

21 Structure distribution
Categories: Other, Protein only, Protein-DNA complexes, DNA only, Protein-RNA complexes, RNA only, RNA-DNA hybrid
Values shown: 17988, 23466, 819, 4445, 500, 2911, 218, 280
GO process

22 Number of structures Structure determination methods April 30, 2008 Decade 6 176

23 Resolution distribution of protein structures Resolution distribution of other structures Year Resolution Resolution distribution of all structures

24 Distinct and novel protein sequences
Chart: Percent of distinct/novel structures by decade (1972-1979, 1980-1989, 1990-1999, 2000-2008)
Series: Structures containing distinct protein sequences (<98%); Structures containing novel protein sequences (<30%); Subset of PSI structures; Subset of other SG structures
Values shown: 63%, 37%, 51%, 27%, 32%, 14%, 39%, 16%, 7%, 25%, 4%, 2%, 10%

25 Redundancy: protein clusters

Cluster #  Total distinct chains  Protein cluster                         First structure  Deposition date
 1         459                    Bacteriophage T4 lysozyme               2LZM             1977-03-28
 2         297                    Hen egg white lysozyme                  2LYZ             1975-02-01
 3         196                    Human lysozyme                          1GFE             1984-10-12
 4         445                    Mouse immunoglobulin Fc & Fab fragments 1GIG             1993-01-20
 5         218                    Human immunoglobulin Fc & Fab fragments 1FC1             1981-05-21
 6         330                    HIV-1 protease                          2HVP             1989-04-10
 7         302                    Trypsin (serine protease)               5PTP             1977-12-19
 8         254                    Thrombin                                2HGT             1991-06-03
 9         229                    Human carbonic anhydrase II             1CA2             1976-05-22
10         185                    Whale myoglobin                         1MBN             1973-04-05
11         182                    Human leukocyte antigen                 1HLA             1987-10-15
12         178                    Human hemoglobin subunit                3HHB             1975-04-01
13         176                    Human hemoglobin subunit                3HHB             1975-04-01
14         160                    Ribonuclease A                          2RNS             1973-04-01
15         153                    Human cyclin-dependent kinase 2 (CDK2)  1HCK             1996-06-03

26 Lysozyme: Lessons learned
Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature 206: 757
T4 bacteriophage (459 structures)
- Amino acid replacement studies suggest that the fraction of amino acid residues that define the structure of T4 lysozyme is about 50%. B.W. Matthews (1996) FASEB J. 10: 35-41
Insight into folding and catalysis
Hen egg white (297 structures)
- Low sequence identity
- Structural similarity of active site to T4. B.W. Matthews, M.G. Remington, M.G. Grutter, W.F. Anderson (1981) J. Mol. Biol. 147: 545-58
Insight into evolution and catalysis

27 Myoglobin and hemoglobin: Lessons learned
Whale myoglobin (185 structures)
- Different ligands: oxygen, carbon dioxide (1)
- Amino acid substitution studies (2)
- Laue studies (3)
Insight into function and dynamics
Other species myoglobin
- Low sequence identity, same structure (4)
Insight into evolution
Human hemoglobin (178 structures)
Insight into function and disease (sickle cell anemia, thalassemia) (5)
Other species hemoglobin
- Low sequence identity, same structure (4)
Profound insight into evolution
Figure: Lodish et al. (6)
References:
(1) Kuriyan, Wilz, Karplus, Petsko (1986) J. Mol. Biol. 192: 133-154
(2) Quillin, Arduini, Olson, Phillips Jr. (1993) J. Mol. Biol. 234: 140-155; Carver, Brantley Jr., Singleton, Arduini, Quillin, Phillips Jr., Olson (1992) J. Biol. Chem. 267: 14443-14450
(3) Bourgeois, Vallone, Schotte, Arcovito, Miele, Sciara, Wulff, Anfinrud, Brunori (2003) PNAS 100: 8704-8709
(4) Dickerson, Geis (1983) Hemoglobin: structure, function, and pathology
(5) Kidd, Baker, Mathews, Brittain, Baker (2001) Prot. Sci. 10: 1739-1749; Harrington, Adachi, Royer Jr. (1998) J. Biol. Chem. 273: 32690-32696
(6) Lodish, Berk, Zipursky, Matsudaira, Baltimore, Darnell (2000) Molecular Cell Biology, WH Freeman & Co.

28 TIM barrel proteins: Lessons learned
TIM barrel structures (1727) http://www.cathdb.info
- Share the same fold but represent significant sequence and functional diversity
- Are enzymes or enzyme-related proteins involved in molecular or energy metabolism
- Comparative structure analysis indicates evolutionary relatedness of TIM barrel proteins
Banner, Bloomer, Petsko, Phillips, Wilson (1976) Biochem. Biophys. Res. Commun. 72: 146-155
Nagano, Orengo, Thornton (2002) J. Mol. Biol. 321: 741-65

29 HIV-related structures (609)
Number of structures by protein (chart also broken down by decade):
- Protease: 311
- Reverse Transcriptase: 110
- Gag protein: 39
- Integrase: 27
- Other: 122

30 HIV-1 protease (311)
Drugs: Amprenavir (GSK), Fosamprenavir (GSK), Lopinavir (Abbott), Atazanavir (BMS), Nelfinavir (Agouron), Darunavir (Tibotec), Tipranavir (BI), Indinavir (Merck), Ritonavir (Abbott), Saquinavir (Roche)
Navia, Fitzgerald, McKeever, Leu, Heimbach, Herber, Sigal, Darke, Springer (1989) Nature 337: 615-620; Wlodawer, Miller, Jaskolski, Sathyanarayana, Baldwin, Weber, Selk, Clawson, Schneider, Kent (1989) Science 245: 616-621
226 structures with ligands. PDB IDs, grouped as on the slide:
2R5P, 2B7Z, 2AVV, 2AVO, 2AVS, 1SGU, 1SDT, 1SDV, 1SDU, 1K6C, 1C6Y, 2BPX, 1HSG, 1HSH
1T7J, 1HPV
2B60, 1RL8, 1SH9, 1N49, 1HXW
2QAK, 2PYM, 2Q63, 2PYN, 2Q64, 2R5Q, 1OHR
2O4N, 2O4L, 2O4P, 1D4Y, 1D4S
3D1X, 3D1Y, 3CYX, 2NMW, 2NMZ, 2NNP, 2NMY, 2NNK, 1C6Z, 1FB7
2FXE, 2FXD, 2O4K, 2AQU, 2FND
2RKG, 2RKF, 2QHC, 2Z54, 2Q5K, 2O4S, 1RV7, 1MUI

31 HIV-1 reverse transcriptase (110)
Drugs: Abacavir (GSK), Nevirapine (BI), Stavudine (BMS), Efavirenz (BMS), Lamivudine (GSK), Zidovudine (GSK), Emtricitabine (Gilead), Tenofovir (Gilead), Zalcitabine (Hoffmann-La Roche), Etravirine (Tibotec), Delavirdine (Pfizer)
Wang, Smerdon, Jager, Kohlstaedt, Rice, Friedman, Steitz (1994) Proc. Natl. Acad. Sci. USA 91: 7242-7246
Chart axes: Year, Number of Structures
76 structures with ligands. PDB IDs, grouped as on the slide:
2HND, 2HNY, 1S1U, 1S1X, 1LW0, 1LWE, 1LWC, 1LWF, 1JLB, 1JLF, 1FKP, 1VRT, 3HVT
1JKH, 1IKW, 1IKV, 1FKO, 1FK9
1T05
1S6P

32 Structural coverage of KEGG pathways
50136 structures; 16526 structures associated with a KEGG pathway (33%)

KEGG pathway                              Number of structures
Complement and coagulation cascades       506
Small cell lung cancer                    506
Regulation of actin cytoskeleton          449
Non-small cell lung cancer                407
Pyrimidine metabolism                     402
Nitrogen metabolism                       399
Two-component system - General            360
Ribosome                                  333
Base excision repair                      328
Purine metabolism                         310
Antigen processing and presentation       281
Nicotinate and nicotinamide metabolism    252
Insulin signaling pathway                 248
Porphyrin and chlorophyll metabolism      248
ABC transporters - General                246
Prostate cancer                           244

33 Human biological pathways
Genes that contain a PDB structure are in red
Pathways shown: Complement and coagulation cascades; Small cell lung cancer; Non-small cell lung cancer; Regulation of actin cytoskeleton
KEGG (http://www.genome.jp/kegg/)

34 EM maps and Models in the PDB

35 How EM experiments are archived

36

37 EMDataBank
- Created by EBI in 2002 for archiving EM maps
- US deposition/annotation site added this year
- Maps stored in CCP4/MRC format
- Associated metadata stored in XML format
580 entries total
Examples: Nuclear pore complex, 85 Å (EMD-1097); Rotavirus VP6 protein, 3.8 Å (EMD-1461)
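
The slide notes that EM maps are stored in CCP4/MRC format. As a rough illustration of what that means in practice, the sketch below reads a few fields from a map header. It assumes the commonly documented MRC2014 layout (a 1024-byte header with NX, NY, NZ and MODE as the first four 32-bit integers, the sampling MX, MY, MZ at byte 28, the cell lengths in Å at byte 40, and the tag "MAP " at byte 208) and little-endian byte order. The function names are hypothetical, and the demo header is synthetic rather than taken from any EMDataBank entry.

```python
import struct

HEADER_SIZE = 1024  # size in bytes of the fixed header (assumed MRC2014 layout)


def parse_map_header(header: bytes) -> dict:
    """Read grid size, data mode, and voxel size from a little-endian header."""
    if len(header) < HEADER_SIZE:
        raise ValueError("header must be at least 1024 bytes")
    if header[208:212] != b"MAP ":
        raise ValueError("missing 'MAP ' tag; not an MRC/CCP4 map header")
    nx, ny, nz, mode = struct.unpack_from("<4i", header, 0)
    mx, my, mz = struct.unpack_from("<3i", header, 28)
    cell = struct.unpack_from("<3f", header, 40)
    # Voxel size is the cell length divided by the sampling along each axis.
    voxel = tuple(length / n for length, n in zip(cell, (mx, my, mz)))
    return {"grid": (nx, ny, nz), "mode": mode, "voxel_size": voxel}


def make_demo_header(nx: int, ny: int, nz: int, voxel: float) -> bytes:
    """Build a synthetic header for demonstration (mode 2 = 32-bit floats)."""
    header = bytearray(HEADER_SIZE)
    struct.pack_into("<4i", header, 0, nx, ny, nz, 2)
    struct.pack_into("<3i", header, 28, nx, ny, nz)
    struct.pack_into("<3f", header, 40, nx * voxel, ny * voxel, nz * voxel)
    header[208:212] = b"MAP "
    return bytes(header)


if __name__ == "__main__":
    print(parse_map_header(make_demo_header(64, 64, 32, 2.0)))
```

Real map files can be big-endian or carry an extended header, so a maintained reader is preferable to hand parsing for anything beyond inspection.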

38 EM entries in the PDB
- Atomic coordinate models fitted to EM maps
- Storage format for models and metadata is CIF
- Matrix representations possible
- Some large entries “break” PDB format: PBCV-1 (1m4x, 1680 matrices); 80S ribosome (1s1h + 1s1i)
230 entries total
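
"Matrix representations" here presumably means that an entry can store one copy of the repeating unit plus a set of transformation matrices, from which the full assembly is generated. A minimal sketch of that expansion follows, assuming each operator is a 3x3 rotation matrix with a translation vector. The function names and the two-operator example are illustrative and are not taken from entry 1m4x.

```python
def apply_operator(coords, rotation, translation):
    """Return rotation * xyz + translation for every (x, y, z) in coords."""
    transformed = []
    for x, y, z in coords:
        transformed.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + shift
            for row, shift in zip(rotation, translation)
        ))
    return transformed


def expand_assembly(coords, operators):
    """Generate one transformed copy of the coordinates per operator."""
    return [apply_operator(coords, rot, trans) for rot, trans in operators]


if __name__ == "__main__":
    # Hypothetical repeating unit with a single atom.
    unit = [(1.0, 2.0, 3.0)]
    identity = ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], (0.0, 0.0, 0.0))
    # A 180-degree rotation about the z axis, with no translation.
    twofold_z = ([[-1, 0, 0], [0, -1, 0], [0, 0, 1]], (0.0, 0.0, 0.0))
    for copy in expand_assembly(unit, [identity, twofold_z]):
        print(copy)
```

Storing operators instead of explicit copies keeps a large symmetric particle compact, since the coordinate list is multiplied only when the full assembly is needed.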

39 PDBj

40 Goals
- Common data model
- Data harvesting tools
- “One-stop shop” for deposition and retrieval
- Tools for visualization, segmentation, and assessment

41 Acknowledgements Wellcome Trust, EU, CCP4, BBSRC, MRC, EMBL NLM BIRD-JST, MEXT NSF, NIGMS, DOE, NLM, NCI, NCRR, NIBIB, NINDS, NIDDK

42

43 Acknowledgements
NIH GM079429 (Baylor, Rutgers, EBI) 2007-2012
EU Network of Excellence LSHG-CT-2004-50282 (EBI) 2004-2009

