Experimental approaches for structural biology
–X-ray crystallography
–NMR
–cryoEM
cryoEM
Where to get structural data?
biological molecules (free)
–PDB: Protein Data Bank
–NDB: Nucleic Acid Database
organic molecules (paid)
–CSD: Cambridge Structural Database
PDB History
1957
–Myoglobin structure determined
1970’s
–Discussions on how to establish an archive of protein structures
–PDB established at Brookhaven (Oct 1971, 7 structures)
1980’s
–Technology takes off: molecular biology, instrumentation, computer hardware and software
–Number of structures increases
–Structural biology is able to focus on medical problems
–IUCr requires data deposition to the PDB
1990’s
–Complexity of structures increases
–Structural genomics begins
Current state of the PDB
– structures in the PDB archive
– new structures deposited in 2012 so far
Depositions by macromolecule type
–92.6% Proteins ( structures)
–2.8% Nucleic acids (2456 structures)
–4.5% Protein-nucleic acid complexes (3905 structures)
Depositions by experimental technique
–88.0% X-ray diffraction ( structures)
–11.2% solution NMR (9702 structures)
–0.5% cryo-EM (468 structures)
data as of
PDB ID
Each structure in the PDB is represented by a 4-character identifier of the form [0-9][a-z0-9][a-z0-9][a-z0-9] (letters are not case-sensitive), e.g. 1B3T
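The identifier pattern above can be checked with a regular expression. This is a minimal sketch: the function name `is_pdb_id` is hypothetical, and the pattern only tests the shape of the string, not whether such an entry actually exists in the archive.

```python
import re

# One digit followed by three letters or digits; case is ignored,
# so both "1B3T" and "1b3t" are accepted.
PDB_ID_PATTERN = re.compile(r"[0-9][a-z0-9]{3}", re.IGNORECASE | re.ASCII)


def is_pdb_id(text: str) -> bool:
    """Return True if text has the shape of a 4-character PDB identifier."""
    return PDB_ID_PATTERN.fullmatch(text) is not None


for candidate in ["1B3T", "1b3t", "B3T1", "1B3", "1B3TX"]:
    print(candidate, is_pdb_id(candidate))
```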
Data formats of PDB
PDB format, mmCIF (and the derived XML format PDBML)
Dictionary resources at:
mmCIF is the PDB archival format
–all data released in all three formats
PDB Format
–legacy format
–Fortran-like, 80 columns wide
–not structured enough to describe complicated 3D objects
–its limits have been exceeded several times: 99,999 atoms, 34 (or 58) chains
–readable by most programs
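Because the format is column-based, a record is read by slicing fixed character positions rather than by splitting on whitespace. The sketch below assumes the standard ATOM/HETATM column layout (serial in columns 7-11, atom name in 13-16, residue name in 18-20, chain in 22, residue number in 23-26, coordinates in 31-54). The names `AtomRecord` and `parse_atom_line` are hypothetical, and the sample line carries illustrative values, not data from a real entry. The 5-column serial field is also where the 99,999-atom limit comes from.

```python
from typing import NamedTuple


class AtomRecord(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> AtomRecord:
    """Parse one ATOM/HETATM line by fixed columns (0-based slices)."""
    if line[0:6].strip() not in ("ATOM", "HETATM"):
        raise ValueError("not an ATOM/HETATM record")
    return AtomRecord(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain_id=line[21],             # column 22, a single character
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


# Illustrative record (made-up coordinates), truncated after the B-factor field.
sample = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67"
atom = parse_atom_line(sample)
print(atom)
```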
model – chain – residue – atom
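The hierarchy can be pictured as nested containers. The following is a minimal sketch with hypothetical class names, not the data model of any particular library; the coordinates are illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Atom:
    name: str
    x: float
    y: float
    z: float


@dataclass
class Residue:
    name: str
    number: int
    atoms: List[Atom] = field(default_factory=list)


@dataclass
class Chain:
    chain_id: str
    residues: List[Residue] = field(default_factory=list)


@dataclass
class Model:
    number: int
    chains: List[Chain] = field(default_factory=list)


def count_atoms(model: Model) -> int:
    """Walk model -> chain -> residue and count the atoms at the bottom."""
    return sum(len(res.atoms) for ch in model.chains for res in ch.residues)


# A one-chain, one-residue model with two atoms (made-up coordinates).
met = Residue("MET", 1, [Atom("N", 27.340, 24.430, 2.614),
                         Atom("CA", 26.266, 25.413, 2.842)])
model = Model(1, [Chain("A", [met])])
print(count_atoms(model))
```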
mmCIF
–language based on community-agreed definitions
–allows adding new features and customization
–mmCIF categories are easily transformed to database tables
–not designed to be read by humans; data should be viewed through programs and databases
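To illustrate why a category maps naturally onto a table, the sketch below turns one `loop_` block into a list of rows keyed by item name. It is deliberately simplified: it assumes whitespace-separated values on a single line per row and does not handle quoted strings, multi-line values, or more than one loop. The function name `loop_to_rows` is hypothetical, and the sample values are illustrative.

```python
from typing import Dict, List


def loop_to_rows(block: str) -> List[Dict[str, str]]:
    """Convert a single simple mmCIF loop_ block into table-like rows."""
    columns: List[str] = []
    rows: List[Dict[str, str]] = []
    for raw in block.splitlines():
        line = raw.strip()
        if not line or line == "loop_" or line.startswith("#"):
            continue
        if line.startswith("_"):
            # "_atom_site.Cartn_x" -> column name "Cartn_x"
            columns.append(line.split(".", 1)[1])
            continue
        rows.append(dict(zip(columns, line.split())))
    return rows


sample_block = """
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.Cartn_x
ATOM 1 N MET 27.340
ATOM 2 CA MET 26.266
"""

rows = loop_to_rows(sample_block)
for row in rows:
    print(row)
```

Each item name becomes a column and each data line becomes a row, which is the same shape a database table for the `atom_site` category would have.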
PubMed, MEDLINE, Entrez etc.
NCBI
National Institutes of Health (NIH), U.S. government
–National Library of Medicine (NLM)
–National Center for Biotechnology Information (NCBI)
NCBI (founded 1988)
Genomic sequences
–GenBank: open-access annotated collection of all available nucleotide sequences; doubles every 18 months (October 2008 – bp); new release every 2 months; an accession number (e.g. U49845) is required upon publication
OMIM
–Online Mendelian Inheritance in Man, a database of diseases together with their genetic components
PubChem
–database of small organic molecules, includes information about their bioactivities
Entrez
–federated search engine offering unified access to all NCBI databases
MEDLINE
–journal citations and abstracts for biomedical literature since
–free access to MEDLINE via PubMed
PubMed
–Web-based retrieval system developed by the NCBI at the NLM; it is part of NCBI's Entrez
PubMed contains
–abstracts
–links to full-text articles
–links to other databases
–…and much more
What’s in PubMed
Most PubMed records are MEDLINE citations.
–citations and author abstracts from approx. biomedical journals
–diverse topics: microbiology, delivery of health care, nutrition, pharmacology and environmental health
–currently over 19 million references dating back to 1948
–new material added Tuesday through Saturday
–about 90% of records are from English-language sources or have English abstracts
–approximately 79% of the citations include the published abstract
What’s in PubMed
PubMed Central (PMC)
–database of free full texts
–since 2007, papers funded by the NIH must be freely available through PMC no later than 12 months after publication
NCBI Bookshelf
–free biomedical books (biochemistry, molecular biology, …)
MeSH
created in 1960 by the NLM
"Medical Subject Headings"
–the authority list of biomedical terms
–used for indexing journal articles for MEDLINE
It imposes uniformity and consistency on the indexing of biomedical literature.
MeSH Tree.
Citations are indexed manually.
MeSH vocabulary is organized into 16 main branches:
1. Anatomy
2. Organisms
3. Diseases
4. Chemicals and Drugs
5. Analytical, Diagnostic and Therapeutic Techniques and Equipment
6. Psychiatry and Psychology
7. Biological Sciences
8. Natural Sciences
9. Anthropology, Education, Sociology and Social Phenomena
10. Technology, Industry, Agriculture
11. Humanities
12. Information Science
13. Named Groups
14. Health Care
15. Publication Characteristics
16. Geographic Locations
Search PubMed
each citation has a unique PubMed ID (PMID)
Boolean operators
–must be UPPERCASE!
–AND is the default
–parentheses: salmonella AND (hamburger OR eggs)
phrase searching
–“kidney failure”, kidney failure*, kidney failure[tw]
author names
–natural or inverted order (“julia wong”, “wong julia”)
–searching by last name only: use the [au] tag ( wheeler[au] )
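The Boolean rules above can be sketched as small string helpers that assemble a query before it is pasted into the search box. The helper names (`phrase`, `any_of`, `all_of`) are hypothetical; they only build text and do not contact PubMed.

```python
def phrase(text: str) -> str:
    """Wrap a multi-word term in double quotes for phrase searching."""
    return f'"{text}"'


def any_of(*terms: str) -> str:
    """Join alternatives with uppercase OR and group them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms: str) -> str:
    """Join required terms with uppercase AND."""
    return " AND ".join(terms)


# Reproduces the example from the slide.
query = all_of("salmonella", any_of("hamburger", "eggs"))
print(query)

# A phrase combined with a last-name author search.
print(all_of(phrase("kidney failure"), "wheeler[au]"))
```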
Search tags
[ad] – affiliation of the first author
[all] – all fields
[au] – author
[dp] – date of publication, yyyy/mm/dd; month and day are optional
[ta] – journal title (abbreviated or full), see the Journals database
[mh] – MeSH term
[majr] – MeSH major topic
[ti] – title
[tiab] – title + abstract
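Field tags are simply appended to a term in square brackets, so a tagged query can be built programmatically. The sketch below also URL-encodes the query for the NCBI E-utilities `esearch` endpoint. That endpoint and its `db`, `term` and `retmax` parameters are not mentioned in the slides and are included here as an assumption for illustration. The helper names are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

# Assumption: E-utilities esearch endpoint, not part of the original slides.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tagged(term: str, tag: str) -> str:
    """Attach a search field tag, e.g. tagged("wheeler", "au") -> "wheeler[au]"."""
    return f"{term}[{tag}]"


def esearch_url(query: str, retmax: int = 20) -> str:
    """Build (but do not fetch) an esearch URL for a PubMed query."""
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


query = " AND ".join([
    tagged("wheeler", "au"),          # author
    tagged("2008", "dp"),             # publication date, year only
    tagged("kidney failure", "tiab"), # title + abstract
])
url = esearch_url(query)
print(query)
print(url)
```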
citation sensor
–choi blood 2008
related articles
–sorted from most to least relevant
–All, Review, Free full text