The Protein Data Bank: Evolution of a key resource in biology
  Helen M. Berman
  September 9, 2010
What is the Protein Data Bank?
  Single international archive for all information about the structure of large biological molecules (>67,000 entries)
  Archival database with hundreds of thousands of users who depend on the data
  Used by structural biologists, computational biologists, biophysicists, biochemists, geneticists, cell biologists, molecular biologists, educators, students, and the general public
Early structures
  1960s: Protein crystallography begins to take off
  Emerging interest in protein folding
  Use of computer graphics to represent structure
  Nobel Prize awarded for the first 3D protein structures: myoglobin and hemoglobin
  [Figure: structures of lysozyme, hemoglobin, ribonuclease, and myoglobin]
  References
    Myoglobin: Kendrew, Bodo, Dintzis, Parrish, Wyckoff, Phillips (1958) Nature 181, 662-666
    Hemoglobin: Perutz (1962) Proc. R. Soc. A265, 161-187
    Lysozyme: Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature 206, 757
    Ribonuclease: Kartha, Bello, Harker (1967) Nature 213, 862-865; Wyckoff, Hardman, Allewell, Inagami, Johnson, Richards (1967) J. Biol. Chem. 242, 3753-3757
PDB Depositors and PDB Access
  [Figures: PDB Depositors; PDB Access]
  PDB FTP & RSYNC Traffic (July 2009 – June 2010)
    RCSB PDB: 173,416,704 data downloads
    PDBe: 32,344,547 data downloads
    PDBj: 14,053,071 data downloads
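The three traffic figures above can be combined into a total and a per-site share. The following is a minimal sketch; the counts are copied from the slide, while the function name `download_shares` and the dictionary layout are illustrative assumptions, not part of any wwPDB tool.

```python
# Download counts copied from the slide (FTP and rsync, July 2009 to June 2010).
downloads = {
    "RCSB PDB": 173_416_704,
    "PDBe": 32_344_547,
    "PDBj": 14_053_071,
}


def download_shares(counts):
    """Return the combined total and each site's percentage share of it.

    Hypothetical helper for illustration only.
    """
    total = sum(counts.values())
    shares = {site: 100.0 * n / total for site, n in counts.items()}
    return total, shares


total, shares = download_shares(downloads)
print(f"Total: {total:,} data downloads")
for site, pct in shares.items():
    print(f"{site}: {pct:.1f}%")
```

Under these numbers the three sites together served roughly 220 million downloads in the twelve-month window, with RCSB PDB accounting for the largest share.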
PDB History
  1970s
    Community discussions about a protein structure archive
    Cold Spring Harbor meeting in protein crystallography
    PDB established at Brookhaven (Oct 1971; 7 structures)
  1980s
    Number of structures increases as technology improves
    Community discussions about requiring depositions
    IUCr guidelines established
    Number of structures deposited increases
  [Presenter note: show petition and IUCr article; illustrate mmCIF]
PDB History
  1990s
    mmCIF standard created
    Structural genomics begins
    PDB moves to RCSB PDB
  2000s
    wwPDB formed
    New methods for structure determination
    Demand for new validation standards
wwPDB
  Formalization of current working practice
  MOU signed July 1, 2003
  Announced in Nature Structural Biology, November 21, 2003
wwPDB guidelines and responsibilities
  All members issue PDB IDs and serve as distribution sites for data
  One member is the archive keeper (RCSB PDB)
  All format documentation publicly available
  Strict rules for redistribution of PDB files
  All sites can create their own websites
Community involvement at every step
  Formation of the resource
  Guidelines for deposition
  Standards for the data
  Global cooperation
Contributing factors for success
  The science that is being archived must be important enough for people to want to access results
  The technology for data archiving must be continually evaluated and changed as IT changes
  The creation of an international organization recognizes the fact that science is global
  Understanding sociological issues of both the data users and the data producers
  Attribution of the work of data producers
Wellcome Trust, EU, CCP4, BBSRC, MRC, EMBL
BIRD-JST, MEXT
NLM
NSF, NIGMS, DOE, NLM, NCI, NINDS, NIDDK