The Protein Databank
Working with protein data-files
Determining Biomolecule Structures
● X-ray crystallography
● Nuclear magnetic resonance
The Protein Databank
The PDB Growth Chart figGROWTH.eps
Maxim 10.1 Beware of anything in the PDB Header Section
The PDB Data-File Formats
Example PDB structure 1LQT fig1LQT.eps
Example PDB structure 1M7T fig1M7T.eps
Downloading PDB data-files
Accessing Data In PDB Entries
● Accessing PDB Annotation Data
● Free R and resolution
Example PDB data-file
    REMARK 2
    REMARK 2 RESOLUTION ANGSTROMS.
    REMARK 215 NMR STUDY
    REMARK 215 THE COORDINATES IN THIS ENTRY WERE GENERATED FROM SOLUTION
    REMARK 215 NMR DATA. PROTEIN DATA BANK CONVENTIONS REQUIRE THAT
    REMARK 215 CRYST1 AND SCALE RECORDS BE INCLUDED, BUT THE VALUES ON
    REMARK 215 THESE RECORDS ARE MEANINGLESS.
Example PDB data-file, cont.
    .
    REMARK 3 FIT TO DATA USED IN REFINEMENT.
    REMARK 3 CROSS-VALIDATION METHOD : THROUGHOUT
    REMARK 3 FREE R VALUE TEST SET SELECTION : RANDOM
    REMARK 3 R VALUE (WORKING + TEST SET) :
    REMARK 3 R VALUE (WORKING SET) :
    REMARK 3 FREE R VALUE :
    REMARK 3 FREE R VALUE TEST SET SIZE (%) : NULL
    REMARK 3 FREE R VALUE TEST SET COUNT :
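The resolution and free R value can be pulled from the REMARK 2 and REMARK 3 records with two regular expressions. The following is a minimal sketch, not the course's own program: the subroutine name `resolution_and_free_r` is hypothetical, and the header lines in the demonstration carry made-up numbers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pull the resolution (REMARK 2) and the free R value (REMARK 3) out of
# PDB header lines. A value that is absent or non-numeric (for example
# NULL) is returned as undef.
sub resolution_and_free_r {
    my @lines = @_;
    my ( $resolution, $free_r );
    for my $line (@lines) {
        if ( $line =~ /^REMARK\s+2\s+RESOLUTION\.?\s+(\d+(?:\.\d+)?)\s+ANGSTROMS/ ) {
            $resolution = $1;
        }
        elsif ( $line =~ /^REMARK\s+3\s+FREE R VALUE\s*:\s*(\d*\.?\d+)\s*$/ ) {
            $free_r = $1;
        }
    }
    return ( $resolution, $free_r );
}

# Hypothetical header lines; the numbers are invented for illustration.
my @header = (
    'REMARK   2 RESOLUTION.    2.00 ANGSTROMS.',
    'REMARK   3   R VALUE            (WORKING SET) : 0.200',
    'REMARK   3   FREE R VALUE                     : 0.250',
    'REMARK   3   FREE R VALUE TEST SET SIZE   (%) : NULL',
);
my ( $res, $free_r ) = resolution_and_free_r(@header);
printf "%s\t%s\n", $res // 'NA', $free_r // 'NA';
```

Run over many entries, the tab-separated pairs give the input for a plot of free R against resolution. NMR entries carry no usable resolution, so they come back as undef and can be skipped.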
Plotting Free R Values against Resolution figFREER.eps
Database cross references
    DBREF 1LQT A GB AAK
    DBREF 1LQT B GB AAK
    DBREF 1AFI 1 72 SWS P04129 MERP_SHIFL
    DBREF 1M7T A 1 66 SWS P10599 THIO_HUMAN 0 65
    DBREF 1M7T A SWS P00274 THIO_ECOLI
Coordinates section
    REMARK 210
    REMARK 210 BEST REPRESENTATIVE CONFORMER IN THIS ENSEMBLE : 21
    REMARK 210
Data section
    ATOM 1 N ARG A N
    ANISOU 1 N ARG A N
    ATOM 2 CA ARG A C
    ANISOU 2 CA ARG A C
    ATOM 3 C ARG A C
    ANISOU 3 C ARG A C
    ATOM 4 O ARG A O
    .
    TER 7215 GLY A 456
    ATOM 7216 N ARG B N
    ANISOU 7216 N ARG B N
    ATOM 7217 CA ARG B C
    ANISOU 7217 CA ARG B C
    ATOM 7218 C ARG B C
    ANISOU 7218 C ARG B C
    ATOM 7219 O ARG B O
Data section, cont.
    TER GLY B 456
    HETATM14290 C ACT C
    ANISOU14290 C ACT C
    .
    CONECT
    CONECT
    CONECT
    TER
    .
    CONECT
    MASTER
    END
Data section, cont.
    MODEL 1
    ATOM 1 N MET A N
    ATOM 2 CA MET A C
    ATOM 3 C MET A C
    ATOM 4 O MET A O
    ATOM 5 CB MET A C
    ATOM 6 CG MET A C
    ATOM 7 SD MET A S
    ATOM 8 CE MET A C
    ATOM 9 1H MET A H
    ATOM 10 2H MET A H
    ATOM 11 3H MET A H
    ATOM 12 HA MET A H
    ATOM 13 1HB MET A H
    ATOM 14 2HB MET A H
    ATOM 15 1HG MET A H
    ATOM 16 2HG MET A H
    ATOM 17 1HE MET A H
    ATOM 18 2HE MET A H
    ATOM 19 3HE MET A H
    ATOM 20 N VAL A N
Data section, cont.
    TER 1659 VAL A 107
    ENDMDL
    MODEL 2
    ATOM 1 N MET A N
    ATOM 2 CA MET A C
    .
    TER 1660 VAL A 107
    ENDMDL
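DBREF records are fixed-column, so splitting on whitespace breaks as soon as a field such as the chain identifier is blank. The sketch below reads the fields by column position instead. The column layout is taken from the published PDB format description as an assumption, the subroutine name `parse_dbref` is hypothetical, and the test record is rebuilt with `sprintf` from the 1M7T values shown on the slide.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one DBREF record into named fields using fixed columns.
# Returns a hash reference, or undef if the line is not a DBREF record.
sub parse_dbref {
    my ($line) = @_;
    return unless $line =~ /^DBREF /;

    # Pad to 80 columns so that unpack never runs off a short line.
    my @f = unpack 'x7 A4 x A x A4 x2 A4 x2 A6 x A8 x A12 x A5 x2 A5',
        sprintf( '%-80s', $line );
    s/^\s+// for @f;    # numeric fields are right-justified

    my %rec;
    @rec{qw(id chain seq_begin seq_end db accession db_id db_begin db_end)} = @f;
    return \%rec;
}

# Rebuild a correctly aligned record from the values on the slide.
my $line = sprintf 'DBREF  %-4s %1s %4d%1s %4d%1s %-6s %-8s %-12s %5d%1s %5d%1s',
    '1M7T', 'A', 1, ' ', 66, ' ', 'SWS', 'P10599', 'THIO_HUMAN', 0, ' ', 65, ' ';

my $rec = parse_dbref($line);
print "$rec->{id} chain $rec->{chain}: residues ",
      "$rec->{seq_begin}-$rec->{seq_end} map to ",
      "$rec->{db} $rec->{accession} ($rec->{db_id})\n";
```

The two 1M7T records on the slide point at different database entries for different residue ranges, so code that consumes DBREF data should expect several records per chain.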
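NMR entries repeat the whole chain once per MODEL ... ENDMDL block, so code that reads every ATOM line sees each atom many times. A minimal sketch of grouping the records by model follows; the subroutine name `split_models` is hypothetical and the input lines are abbreviated stand-ins, not real records.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group ATOM/HETATM lines by the MODEL record that encloses them.
# Files without MODEL records are treated as a single model numbered 1.
sub split_models {
    my @lines = @_;
    my %atoms_in;
    my $model = 1;
    for my $line (@lines) {
        if ( $line =~ /^MODEL\s+(\d+)/ ) {
            $model = $1;
        }
        elsif ( $line =~ /^(?:ATOM|HETATM)/ ) {
            push @{ $atoms_in{$model} }, $line;
        }
    }
    return \%atoms_in;
}

# Abbreviated, made-up records: only the leading fields are present.
my @lines = (
    'MODEL        1',
    'ATOM      1  N   MET A',
    'ATOM      2  CA  MET A',
    'ENDMDL',
    'MODEL        2',
    'ATOM      1  N   MET A',
    'ENDMDL',
);

my $models = split_models(@lines);
for my $m ( sort { $a <=> $b } keys %$models ) {
    printf "model %d: %d atom records\n", $m, scalar @{ $models->{$m} };
}
```

With the records grouped this way, a single conformer can be selected by number, for instance the one named in the REMARK 210 "best representative conformer" line.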
Extracting 3D co-ordinate data
    my ( $X, $Y, $Z ) = ( substr( $_, 30, 8 ),
                          substr( $_, 38, 8 ),
                          substr( $_, 46, 8 ) );
The simple_coord_extract program
    #! /usr/bin/perl -w

    # simple_coord_extract - Demonstrates the extraction of
    #                        C-Alpha co-ordinates from a PDB
    #                        data-file.

    use strict;

    while ( <> )
    {
        # The atom name occupies columns 13-16 (offset 12, length 4);
        # a C-alpha atom is written there as " CA ".
        if ( /^ATOM/ && substr( $_, 12, 4 ) eq " CA " )
        {
            my ( $X, $Y, $Z ) = ( substr( $_, 30, 8 ),
                                  substr( $_, 38, 8 ),
                                  substr( $_, 46, 8 ) );

            # Strip the padding spaces from each fixed-width field.
            $X =~ s/ //g;
            $Y =~ s/ //g;
            $Z =~ s/ //g;

            print "X, Y & Z: $X, $Y, $Z\n";
        }
    }
Results from simple_coord_extract...
    X, Y & Z: , ,
    X, Y & Z: , ,
    X, Y & Z: , ,
    X, Y & Z: , ,
    X, Y & Z: , ,
    X, Y & Z: , ,
    X, Y & Z: , ,
    X, Y & Z: 6.507, ,
The graphic image contact map figCONTACTMAP.eps
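A contact map marks every pair of residues whose C-alpha atoms lie within a chosen distance of each other. The sketch below shows the calculation only, not the graphic. The 8.0 Angstrom cutoff, the subroutine names and the three coordinates are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Euclidean distance between two points given as [x, y, z] array refs.
sub distance {
    my ( $p, $q ) = @_;
    return sqrt( ( $p->[0] - $q->[0] )**2
               + ( $p->[1] - $q->[1] )**2
               + ( $p->[2] - $q->[2] )**2 );
}

# Build a square 0/1 matrix: 1 where two points lie within $cutoff.
sub contact_map {
    my ( $cutoff, @points ) = @_;
    my @map;
    for my $i ( 0 .. $#points ) {
        for my $j ( 0 .. $#points ) {
            $map[$i][$j] =
                distance( $points[$i], $points[$j] ) <= $cutoff ? 1 : 0;
        }
    }
    return \@map;
}

# Made-up coordinates standing in for C-alpha positions.
my @ca  = ( [ 0, 0, 0 ], [ 3, 4, 0 ], [ 30, 0, 0 ] );
my $map = contact_map( 8.0, @ca );
print join( ' ', @$_ ), "\n" for @$map;
```

In practice the points would come from the X, Y and Z values printed by simple_coord_extract, one triple per residue, and the matrix would be handed to a plotting tool.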
STRIDE: Secondary Structure Assignment
Maxim 10.2 It is often easier and desirable to regenerate database annotation than trawl through entries reconstituting the annotation using custom code.
Installation of STRIDE
    $ tar -zxvf stride.tar.gz
    $ cd stride
    $ make
    $ ./stride
Assigning Secondary Structures
Simplified definition of a Hydrogen Bond figSIMPLIFIED.eps
Example of Secondary Structure Elements in Proteins figSSDEMO.eps
Definition of Dihedral angles in the backbone of protein structures figPSIPSI.eps
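A backbone dihedral is the signed angle between two planes defined by four consecutive atoms: C(i-1), N(i), CA(i), C(i) for phi and N(i), CA(i), C(i), N(i+1) for psi. The sketch below uses the common atan2 formulation of that angle. The subroutine names are hypothetical, and STRIDE's own implementation is not shown in the source and may differ in detail.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Small vector helpers; every vector is an [x, y, z] array reference.
sub vsub { my ( $u, $v ) = @_; return [ map { $u->[$_] - $v->[$_] } 0 .. 2 ]; }
sub dot  { my ( $u, $v ) = @_; return $u->[0] * $v->[0] + $u->[1] * $v->[1] + $u->[2] * $v->[2]; }
sub cross {
    my ( $u, $v ) = @_;
    return [ $u->[1] * $v->[2] - $u->[2] * $v->[1],
             $u->[2] * $v->[0] - $u->[0] * $v->[2],
             $u->[0] * $v->[1] - $u->[1] * $v->[0] ];
}

# Signed dihedral angle, in degrees, defined by four points.
sub dihedral {
    my ( $p1, $p2, $p3, $p4 ) = @_;
    my $b1 = vsub( $p2, $p1 );       # first bond
    my $b2 = vsub( $p3, $p2 );       # central bond (rotation axis)
    my $b3 = vsub( $p4, $p3 );       # last bond
    my $n1 = cross( $b1, $b2 );      # normal of the first plane
    my $n2 = cross( $b2, $b3 );      # normal of the second plane
    my $y  = sqrt( dot( $b2, $b2 ) ) * dot( $b1, $n2 );
    my $x  = dot( $n1, $n2 );
    return atan2( $y, $x ) * 180 / ( 4 * atan2( 1, 1 ) );
}

# Made-up points: the outer bonds are perpendicular when viewed
# along the central bond.
printf "%.1f\n", dihedral( [ 0, 1, 0 ], [ 0, 0, 0 ], [ 1, 0, 0 ], [ 1, 0, 1 ] );
```

Using atan2 rather than acos keeps the sign of the angle, which is what separates the left-hand and right-hand regions of a Ramachandran plot.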
Using STRIDE and parsing the output
    $ ./stride
    You must specify input file
    Action: secondary structure assignment
    Usage: stride [Options] InputFile [ > file ]
    Options:
      -f File     Output file
      -mFile      MolScript file
      -o          Report secondary structure summary Only
      -h          Report Hydrogen bonds
      -rId1Id2..  Read only chains Id1, Id2...
      -cId1Id2..  Process only Chains Id1, Id2...
      -q[File]    Generate SeQuence file in FASTA format and die
    Options are position and case insensitive
    $ stride -cA 1lqt.pdb
Using gawk...
    $ gawk '/^ASG/ {print $8 " " $9}' 1lqt.A.stride
    $ gawk '(/^ASG/ && /Strand/) {print $8 " " $9}' 1lqt.A.stride
    $ gawk '(/^ASG/ && /AlphaHelix/) {print $8 " " $9}' 1lqt.A.stride
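The same extraction can be written in Perl, which is convenient when the angles feed further processing. This is a sketch under two assumptions: that fields 8 and 9 of each ASG line hold the two dihedral angles, as the gawk commands imply, and that field 7 holds the structure name. The subroutine name `phi_psi` and the sample lines are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return [field 8, field 9] pairs from STRIDE ASG lines. When $wanted is
# defined, keep only residues whose structure name (field 7) equals it.
sub phi_psi {
    my ( $wanted, @lines ) = @_;
    my @angles;
    for my $line (@lines) {
        next unless $line =~ /^ASG/;
        my @field = split ' ', $line;
        next if defined $wanted && $field[6] ne $wanted;
        push @angles, [ @field[ 7, 8 ] ];
    }
    return @angles;
}

# Hypothetical ASG lines; the residues and numbers are invented.
my @stride = (
    'ASG  ARG A    1    1    C          Coil    360.00    150.00     100.0      XXXX',
    'ASG  PRO A    2    2    E        Strand   -120.00    130.00      50.0      XXXX',
    'ASG  TYR A    3    3    H    AlphaHelix    -60.00    -45.00      20.0      XXXX',
);

print "@$_\n" for phi_psi( undef,        @stride );    # every residue
print "@$_\n" for phi_psi( 'AlphaHelix', @stride );    # helices only
```

Unlike the gawk patterns, which match Strand or AlphaHelix anywhere on the line, this version compares one field exactly, so a residue name or file name containing the same text cannot cause a false match.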
Ramachandran Plot of dihedral angles of chain A from 1LQT fig1LQTPHIPSI.eps
Extracting amino acid sequences using STRIDE
    $ stride -q 1lqt.pdb
    >1lqt.pdb A
    RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKI K.
    >1lqt.pdb B
    RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKI K.
    $ stride -cA -q 1lqt.pdb
    >1lqt.pdb A
    RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKI K.
Introducing The mmCIF Protein Format
Converting mmCIF
● Converting mmCIF to PDB
● Converting mmCIFs to PDB with CIFTr
The CIFTr program
    $ cd
    $ tar -zxvf ciftr-v2.0-linux.tar.gz
    $ cd ciftr-v2.0-linux/
    $ setenv RCSBROOT ~/ciftr-v2.0-linux     (csh/tcsh)
    $ export RCSBROOT=~/ciftr-v2.0-linux     (sh/bash: no spaces around the =)
    $ ./CIFTr -i 1lqt.cif
More on mmCIF
● Problems with the CIFTr conversion
● Some advice on using mmCIF
● Automated conversion of mmCIF to PDB
Where To From Here