The Protein Databank: Working with protein data-files

1 The Protein Databank: Working with protein data-files

2 Determining Biomolecule Structures
● X-ray crystallography
● Nuclear magnetic resonance

3 The Protein Databank

4 The PDB Growth Chart [figure]

5 Maxim 10.1 Beware of anything in the PDB Header Section

6 The PDB Data-File Formats

7 Example PDB structure 1LQT [figure]

8 Example PDB structure 1M7T [figure]

9 Downloading PDB data-files
http://www.rcsb.org/pdb/
http://www.ebi.ac.uk/services/

10 Accessing Data In PDB Entries
● Accessing PDB Annotation Data
● Free R and resolution

11 Example PDB data-file
REMARK 2
REMARK 2 RESOLUTION. 1.05 ANGSTROMS.
REMARK 215 NMR STUDY
REMARK 215 THE COORDINATES IN THIS ENTRY WERE GENERATED FROM SOLUTION
REMARK 215 NMR DATA. PROTEIN DATA BANK CONVENTIONS REQUIRE THAT
REMARK 215 CRYST1 AND SCALE RECORDS BE INCLUDED, BUT THE VALUES ON
REMARK 215 THESE RECORDS ARE MEANINGLESS.

12 Example PDB data-file, cont.
...
REMARK 3 FIT TO DATA USED IN REFINEMENT.
REMARK 3 CROSS-VALIDATION METHOD : THROUGHOUT
REMARK 3 FREE R VALUE TEST SET SELECTION : RANDOM
REMARK 3 R VALUE (WORKING + TEST SET) : 0.134
REMARK 3 R VALUE (WORKING SET) : 0.134
REMARK 3 FREE R VALUE : 0.153
REMARK 3 FREE R VALUE TEST SET SIZE (%) : NULL
REMARK 3 FREE R VALUE TEST SET COUNT : 2200
...

13 Plotting Free R Values against Resolution [figure]
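Slides 11 to 13 show where the resolution and free R values sit in a PDB header. A minimal Perl sketch, assuming the REMARK 2 and REMARK 3 wording shown above, pulls both numbers out of a list of header lines so they can be collected for a plot like the one on slide 13. The helper name resolution_and_free_r is hypothetical, and the sample lines reuse the slides' values re-spaced for illustration.

```perl
#! /usr/bin/perl -w
use strict;

# resolution_and_free_r - hypothetical helper. Given PDB header lines,
# return ( resolution, free R ); either is undef when absent or non-numeric.
sub resolution_and_free_r {
    my ( $resolution, $free_r );
    for my $line ( @_ ) {
        if ( $line =~ /^REMARK\s+2\s+RESOLUTION\.\s+([\d.]+)\s+ANGSTROMS/ ) {
            $resolution = $1;
        }
        # Matches "FREE R VALUE : 0.153" but not "FREE R VALUE TEST SET ..."
        elsif ( $line =~ /^REMARK\s+3\s+FREE R VALUE\s*:\s*([\d.]+)\s*$/ ) {
            $free_r = $1;
        }
    }
    return ( $resolution, $free_r );
}

# Sample header lines using the values from slides 11 and 12 (re-spaced).
my @header = (
    "REMARK   2 RESOLUTION.    1.05 ANGSTROMS.",
    "REMARK   3   R VALUE            (WORKING SET) : 0.134",
    "REMARK   3   FREE R VALUE                     : 0.153",
    "REMARK   3   FREE R VALUE TEST SET SIZE   (%) : NULL",
);

my ( $res, $rfree ) = resolution_and_free_r( @header );
print "Resolution: $res  Free R: $rfree\n";
```

Entries without a numeric resolution (NMR structures typically state NOT APPLICABLE in REMARK 2) come back as undef, so they drop out of a resolution plot without special handling.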

14 Database cross references
DBREF 1LQT A 1 456 GB 13882996 AAK47528 1 456
DBREF 1LQT B 1 456 GB 13882996 AAK47528 1 456
DBREF 1AFI 1 72 SWS P04129 MERP_SHIFL 20 91
DBREF 1M7T A 1 66 SWS P10599 THIO_HUMAN 0 65
DBREF 1M7T A 67 106 SWS P00274 THIO_ECOLI 68 107

15 Coordinates section
REMARK 210
REMARK 210 BEST REPRESENTATIVE CONFORMER IN THIS ENSEMBLE : 21
REMARK 210

16 Data section
ATOM 1 N ARG A 2 26.318 -8.010 39.090 1.00 20.71 N
ANISOU 1 N ARG A 2 2040 3071 2755 114 -339 -393 N
ATOM 2 CA ARG A 2 25.150 -8.702 38.505 1.00 18.85 C
ANISOU 2 CA ARG A 2 2029 2677 2455 67 -321 -209 C
ATOM 3 C ARG A 2 24.846 -8.176 37.123 1.00 17.23 C
ANISOU 3 C ARG A 2 1689 2429 2429 143 -282 -258 C
ATOM 4 O ARG A 2 25.151 -7.048 36.775 1.00 18.14 O
...
TER 7215 GLY A 456
ATOM 7216 N ARG B 2 -19.423 25.709 6.980 1.00 21.57 N
ANISOU 7216 N ARG B 2 2476 3012 2707 -165 -370 95 N
ATOM 7217 CA ARG B 2 -18.718 26.510 8.024 1.00 19.01 C
ANISOU 7217 CA ARG B 2 2127 2672 2424 -63 -285 91 C
ATOM 7218 C ARG B 2 -17.250 26.207 8.002 1.00 17.22 C
ANISOU 7218 C ARG B 2 1955 2392 2196 -91 -299 121 C
ATOM 7219 O ARG B 2 -16.851 25.158 7.535 1.00 18.15 O

17 Data section, cont.
TER 14289 GLY B 456
HETATM14290 C ACT 1866 -13.075 1.733 10.218 1.00 27.25 C
ANISOU14290 C ACT 1866 3493 3560 3299 -39 -36 -44 C
...
CONECT14290142911429214293
CONECT1429114290
CONECT1429214290
TER
...
CONECT1469014663
MASTER 389 0 15 46 38 0 0 620280 2 401 72
END
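Slides 16 and 17 show why PDB records are best read by column rather than split on whitespace: once a serial number reaches five digits it runs straight into the record name (HETATM14290), and CONECT records fuse completely. A minimal sketch in the same substr style as simple_coord_extract, assuming the standard fixed-column layout for ATOM and HETATM records; parse_coord_record and %COLUMN are hypothetical names, and the sample line is the slide 16 C-alpha record re-spaced into those columns.

```perl
#! /usr/bin/perl -w
use strict;

# 1-based start column and width of selected ATOM/HETATM fields,
# following the fixed-column PDB layout.
my %COLUMN = (
    record  => [  1, 6 ],
    serial  => [  7, 5 ],
    name    => [ 13, 4 ],
    resname => [ 18, 3 ],
    chain   => [ 22, 1 ],
    resseq  => [ 23, 4 ],
    x       => [ 31, 8 ],
    y       => [ 39, 8 ],
    z       => [ 47, 8 ],
    element => [ 77, 2 ],
);

# parse_coord_record - hypothetical helper. Returns a hash reference of
# trimmed fields for an ATOM or HETATM line, or undef for other records.
sub parse_coord_record {
    my ( $line ) = @_;
    return unless $line =~ /^(?:ATOM  |HETATM)/;
    chomp $line;
    $line = sprintf( "%-80s", $line );    # pad short lines so substr stays in range
    my %field;
    for my $key ( keys %COLUMN ) {
        my ( $start, $width ) = @{ $COLUMN{$key} };
        ( my $value = substr( $line, $start - 1, $width ) ) =~ s/^\s+|\s+$//g;
        $field{$key} = $value;
    }
    return \%field;
}

# The C-alpha record from slide 16, re-spaced into standard columns.
my $ca_line = "ATOM      2  CA  ARG A   2      25.150  -8.702  38.505  1.00 18.85           C";
my $atom = parse_coord_record( $ca_line );
print "$atom->{record} $atom->{serial} $atom->{name} $atom->{resname} ",
      "$atom->{chain}$atom->{resseq}: $atom->{x}, $atom->{y}, $atom->{z}\n";
```

Because every field is taken from fixed columns, HETATM14290 parses into record HETATM and serial 14290 with no special case.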

18 Data section, cont.
MODEL 1
ATOM 1 N MET A 1 3.110 -4.682 -3.025 1.00 0.00 N
ATOM 2 CA MET A 1 2.546 -3.712 -2.053 1.00 0.00 C
ATOM 3 C MET A 1 1.134 -3.295 -2.450 1.00 0.00 C
ATOM 4 O MET A 1 0.882 -2.130 -2.758 1.00 0.00 O
ATOM 5 CB MET A 1 3.466 -2.491 -2.002 1.00 0.00 C
ATOM 6 CG MET A 1 3.781 -1.903 -3.370 1.00 0.00 C
ATOM 7 SD MET A 1 4.256 -0.166 -3.285 1.00 0.00 S
ATOM 8 CE MET A 1 6.004 -0.307 -2.920 1.00 0.00 C
ATOM 9 1H MET A 1 2.906 -4.327 -3.980 1.00 0.00 H
ATOM 10 2H MET A 1 2.650 -5.601 -2.859 1.00 0.00 H
ATOM 11 3H MET A 1 4.134 -4.738 -2.858 1.00 0.00 H
ATOM 12 HA MET A 1 2.517 -4.178 -1.079 1.00 0.00 H
ATOM 13 1HB MET A 1 2.996 -1.724 -1.405 1.00 0.00 H
ATOM 14 2HB MET A 1 4.397 -2.778 -1.536 1.00 0.00 H
ATOM 15 1HG MET A 1 4.596 -2.461 -3.807 1.00 0.00 H
ATOM 16 2HG MET A 1 2.907 -1.993 -3.998 1.00 0.00 H
ATOM 17 1HE MET A 1 6.344 -1.302 -3.167 1.00 0.00 H
ATOM 18 2HE MET A 1 6.169 -0.120 -1.869 1.00 0.00 H
ATOM 19 3HE MET A 1 6.553 0.416 -3.505 1.00 0.00 H
ATOM 20 N VAL A 2 0.215 -4.256 -2.446 1.00 0.00 N

19 Data section, cont.
TER 1659 VAL A 107
ENDMDL
MODEL 2
ATOM 1 N MET A 1 2.750 -6.779 -1.627 1.00 0.00 N
ATOM 2 CA MET A 1 2.487 -5.475 -2.290 1.00 0.00 C
...
TER 1660 VAL A 107
ENDMDL
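Slides 15, 18 and 19 together suggest a common task: pick the best representative conformer named in REMARK 210, then keep only the atoms of that MODEL. A minimal sketch, assuming each model is bracketed by MODEL n and ENDMDL exactly as shown; the helper names are hypothetical, and the short ensemble below (which marks conformer 2 as best, where the slide's entry names 21) is illustrative only.

```perl
#! /usr/bin/perl -w
use strict;

# best_conformer - hypothetical helper. Returns N from
# "REMARK 210 BEST REPRESENTATIVE CONFORMER IN THIS ENSEMBLE : N", else undef.
sub best_conformer {
    for my $line ( @_ ) {
        return $1
            if $line =~ /^REMARK\s+210\s+BEST REPRESENTATIVE CONFORMER IN THIS ENSEMBLE\s*:\s*(\d+)/;
    }
    return;
}

# atoms_of_model - hypothetical helper. Returns the ATOM/HETATM lines found
# between "MODEL n" (for the wanted n) and the next ENDMDL.
sub atoms_of_model {
    my ( $wanted, @lines ) = @_;
    my ( $current, @atoms );
    for my $line ( @lines ) {
        if    ( $line =~ /^MODEL\s+(\d+)/ ) { $current = $1 }
        elsif ( $line =~ /^ENDMDL/ )        { undef $current }
        elsif ( defined $current && $current == $wanted
                && $line =~ /^(?:ATOM|HETATM)/ ) {
            push @atoms, $line;
        }
    }
    return @atoms;
}

# A shortened, illustrative ensemble; conformer 2 is marked as best here.
my @ensemble = (
    "REMARK 210 BEST REPRESENTATIVE CONFORMER IN THIS ENSEMBLE : 2",
    "MODEL        1",
    "ATOM      1  N   MET A   1       3.110  -4.682  -3.025  1.00  0.00           N",
    "TER    1659      VAL A 107",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   MET A   1       2.750  -6.779  -1.627  1.00  0.00           N",
    "ATOM      2  CA  MET A   1       2.487  -5.475  -2.290  1.00  0.00           C",
    "TER    1660      VAL A 107",
    "ENDMDL",
);

my $best = best_conformer( @ensemble );
print "Best representative conformer: $best\n";
print "$_\n" for atoms_of_model( $best, @ensemble );
```

Files holding a single structure usually have no MODEL records at all, so atoms_of_model returns nothing for them; a caller would then fall back to reading every ATOM line.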

20 Extracting 3D co-ordinate data
my ( $X, $Y, $Z ) = ( substr( $_, 30, 8 ), substr( $_, 38, 8 ), substr( $_, 46, 8 ) );

21 The simple_coord_extract program
#! /usr/bin/perl -w
# simple_coord_extract - Demonstrates the extraction of
# C-Alpha co-ordinates from a PDB
# data-file.
use strict;
while ( <> ) {
    # The atom name occupies columns 13-16 (offset 12, width 4),
    # so a C-alpha atom reads " CA " there.
    if ( /^ATOM/ && substr( $_, 12, 4 ) eq " CA " ) {
        # X, Y and Z occupy columns 31-38, 39-46 and 47-54.
        my ( $X, $Y, $Z ) = ( substr( $_, 30, 8 ), substr( $_, 38, 8 ), substr( $_, 46, 8 ) );
        $X =~ s/ //g;
        $Y =~ s/ //g;
        $Z =~ s/ //g;
        print "X, Y & Z: $X, $Y, $Z\n";
    }
}

22 Results from simple_coord_extract...
X, Y & Z: 25.150, -8.702, 38.505
X, Y & Z: 23.675, -8.497, 35.069
X, Y & Z: 20.747, -6.252, 34.332
X, Y & Z: 17.545, -8.297, 34.292
X, Y & Z: 15.182, -7.484, 31.454
X, Y & Z: 11.736, -8.952, 30.942
X, Y & Z: 10.261, -9.014, 27.451
X, Y & Z: 6.507, -9.548, 27.173

23 The graphic image contact map [figure]
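The transcript does not say how the contact map on slide 23 was computed. One common construction, sketched here as an assumption, marks residues i and j as in contact when their C-alpha atoms are closer than a cutoff; the 8 angstrom cutoff and the helper names are illustrative, and the coordinates are the eight C-alpha positions printed on slide 22. The map is printed as a text grid (# for contact, . otherwise) rather than a graphic image.

```perl
#! /usr/bin/perl -w
use strict;

# C-alpha co-ordinates, as printed by simple_coord_extract on slide 22.
my @ca = (
    [ 25.150, -8.702, 38.505 ], [ 23.675, -8.497, 35.069 ],
    [ 20.747, -6.252, 34.332 ], [ 17.545, -8.297, 34.292 ],
    [ 15.182, -7.484, 31.454 ], [ 11.736, -8.952, 30.942 ],
    [ 10.261, -9.014, 27.451 ], [  6.507, -9.548, 27.173 ],
);

# Euclidean distance between two [x, y, z] points.
sub distance {
    my ( $p, $q ) = @_;
    return sqrt( ( $p->[0] - $q->[0] )**2
               + ( $p->[1] - $q->[1] )**2
               + ( $p->[2] - $q->[2] )**2 );
}

# contact_map - hypothetical helper. Entry [i][j] is 1 when points i and j
# are closer than $cutoff angstroms, else 0.
sub contact_map {
    my ( $cutoff, @points ) = @_;
    my @map;
    for my $i ( 0 .. $#points ) {
        for my $j ( 0 .. $#points ) {
            $map[$i][$j] = distance( $points[$i], $points[$j] ) < $cutoff ? 1 : 0;
        }
    }
    return \@map;
}

my $map = contact_map( 8.0, @ca );
for my $row ( @$map ) {
    print join( "", map { $_ ? "#" : "." } @$row ), "\n";
}
```

Consecutive C-alpha atoms sit roughly 3.8 angstroms apart, so the diagonal band always shows as contacts; the off-diagonal pattern is what carries structural information.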

24 STRIDE: Secondary Structure Assignment

25 Maxim 10.2 It is often easier, and more desirable, to regenerate database annotation than to trawl through entries reconstituting it with custom code.

26 Installation of STRIDE
$ tar -zxvf stride.tar.gz
$ cd stride
$ make
$ ./stride

27 Assigning Secondary Structures

28 Simplified definition of a Hydrogen Bond [figure]

29 Example of Secondary Structure Elements in Proteins [figure]

30 Definition of Dihedral angles in the backbone of protein structures [figure]
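Slide 30 defines the backbone dihedral angles: phi is the torsion about the N-CA bond (atoms C of residue i-1, N, CA, C) and psi the torsion about the CA-C bond (N, CA, C, N of residue i+1). STRIDE reports these values (slide 32), but a minimal Perl sketch of the standard atan2 torsion formula shows how a dihedral follows from four positions. The helper names are hypothetical and the points are toy coordinates, not taken from 1LQT.

```perl
#! /usr/bin/perl -w
use strict;

# Small vector helpers for [x, y, z] array references.
sub subtract { my ( $u, $v ) = @_; return [ map { $u->[$_] - $v->[$_] } 0 .. 2 ]; }
sub dot      { my ( $u, $v ) = @_; return $u->[0] * $v->[0] + $u->[1] * $v->[1] + $u->[2] * $v->[2]; }
sub cross {
    my ( $u, $v ) = @_;
    return [ $u->[1] * $v->[2] - $u->[2] * $v->[1],
             $u->[2] * $v->[0] - $u->[0] * $v->[2],
             $u->[0] * $v->[1] - $u->[1] * $v->[0] ];
}

# dihedral - hypothetical helper. Torsion angle in degrees (-180 to 180)
# about the p1-p2 bond, defined by the four points p0, p1, p2, p3.
# For phi pass C(i-1), N, CA, C; for psi pass N, CA, C, N(i+1).
sub dihedral {
    my ( $p0, $p1, $p2, $p3 ) = @_;
    my $b1 = subtract( $p1, $p0 );
    my $b2 = subtract( $p2, $p1 );
    my $b3 = subtract( $p3, $p2 );
    my $n2 = cross( $b2, $b3 );
    my $y  = sqrt( dot( $b2, $b2 ) ) * dot( $b1, $n2 );
    my $x  = dot( cross( $b1, $b2 ), $n2 );
    return atan2( $y, $x ) * 45 / atan2( 1, 1 );    # 45 / atan2(1,1) is 180 / pi
}

# Toy coordinates: p1 at the origin, p2 one unit along z.
my ( $p0, $p1, $p2 ) = ( [ 1, 0, 0 ], [ 0, 0, 0 ], [ 0, 0, 1 ] );
printf "cis:   %7.2f\n", dihedral( $p0, $p1, $p2, [  1, 0, 1 ] );
printf "trans: %7.2f\n", dihedral( $p0, $p1, $p2, [ -1, 0, 1 ] );
printf "other: %7.2f\n", dihedral( $p0, $p1, $p2, [  0, 1, 1 ] );
```

The three calls give 0 (cis), 180 (trans) and +90. The sign follows the usual convention: positive when, looking from p1 towards p2, the front bond must turn clockwise to eclipse the back bond.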

31 Using STRIDE and parsing the output
$ ./stride
You must specify input file
Action: secondary structure assignment
Usage: stride [Options] InputFile [ > file ]
Options:
  -f File      Output file
  -mFile       MolScript file
  -o           Report secondary structure summary Only
  -h           Report Hydrogen bonds
  -rId1Id2..   Read only chains Id1, Id2...
  -cId1Id2..   Process only Chains Id1, Id2...
  -q[File]     Generate SeQuence file in FASTA format and die
Options are position and case insensitive
$ stride -cA 1lqt.pdb

32 Using gawk...
$ gawk '/^ASG/ {print $8 " " $9}' 1lqt.A.stride
360.00 156.52
-75.72 161.36
-71.26 145.24
-111.08 119.10
-118.65 131.78
...
$ gawk '(/^ASG/ && /Strand/) {print $8 " " $9}' 1lqt.A.stride
$ gawk '(/^ASG/ && /AlphaHelix/) {print $8 " " $9}' 1lqt.A.stride
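The gawk one-liners on slide 32 print fields 8 and 9 (phi and psi) of each ASG line. A Perl equivalent is sketched below, assuming the ASG layout those one-liners rely on (field 7 is the structure name, fields 8 and 9 are phi and psi). Unlike /Strand/, which matches anywhere on the line, it compares the structure-name field exactly. The helper name is hypothetical; in the sample lines the residue names and phi/psi values follow slides 16, 32 and 34, but the structure classes and the final column are placeholders, not STRIDE's real assignment for 1LQT.

```perl
#! /usr/bin/perl -w
use strict;

# phi_psi - hypothetical helper. Collects [phi, psi] pairs from STRIDE ASG
# lines; if $class is defined, only residues whose structure name (field 7)
# equals it are kept. Fields 8 and 9 ($8 and $9 in gawk) are phi and psi.
sub phi_psi {
    my ( $class, @lines ) = @_;
    my @pairs;
    for my $line ( @lines ) {
        next unless $line =~ /^ASG/;
        my @f = split ' ', $line;    # awk-style split on runs of whitespace
        next if defined $class && $f[6] ne $class;
        push @pairs, [ $f[7], $f[8] ];
    }
    return @pairs;
}

# Illustrative ASG lines: residue names and phi/psi follow slides 16, 32
# and 34; the structure classes and the final column are placeholders.
my @stride = (
    "ASG  ARG A    2    1    C          Coil    360.00    156.52       0.0",
    "ASG  PRO A    3    2    E        Strand    -75.72    161.36       0.0",
    "ASG  TYR A    4    3    E        Strand    -71.26    145.24       0.0",
);

# Equivalent of: gawk '(/^ASG/ && /Strand/) {print $8 " " $9}'
print "$_->[0] $_->[1]\n" for phi_psi( "Strand", @stride );
```

Passing undef as the class reproduces the first one-liner, printing every ASG pair.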

33 Ramachandran Plot of dihedral angles of chain A from 1LQT [figure]

34 Extracting amino acid sequences using STRIDE
$ stride -q 1lqt.pdb
>1lqt.pdb A 452 1.050
RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKI
K
...
>1lqt.pdb B 454 1.050
RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKI
K
...
$ stride -cA -q 1lqt.pdb
>1lqt.pdb A 452 1.050
RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKI
K
...
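The FASTA headers on slide 34 appear to hold the file name, chain identifier, number of residues and resolution (1.050 matches the REMARK 2 value on slide 11); that reading is an inference from the output, not something the slides document. Under that assumption, a minimal sketch collects each record and joins its wrapped sequence lines. read_stride_fasta is a hypothetical name, and the sample is the truncated chain A output from the slide, so its 60 residues fall short of the 452 in the header.

```perl
#! /usr/bin/perl -w
use strict;

# read_stride_fasta - hypothetical helper. Splits "stride -q" output into
# records, assuming headers of the form ">file chain residues resolution".
sub read_stride_fasta {
    my @records;
    for my $line ( @_ ) {
        if ( $line =~ /^>(\S+)\s+(\S+)\s+(\d+)\s+([\d.]+)/ ) {
            push @records, { file => $1, chain => $2, length => $3,
                             resolution => $4, sequence => "" };
        }
        elsif ( @records ) {
            ( my $residues = $line ) =~ s/\s+//g;    # drop line breaks and spaces
            $records[-1]{sequence} .= $residues;
        }
    }
    return @records;
}

# Chain A output from slide 34 (truncated on the slide).
my @output = (
    ">1lqt.pdb A 452 1.050",
    "RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKI",
    "K",
);

for my $rec ( read_stride_fasta( @output ) ) {
    printf "%s chain %s: header gives %d residues, %d read, resolution %s\n",
        $rec->{file}, $rec->{chain}, $rec->{length},
        length( $rec->{sequence} ), $rec->{resolution};
}
```

Comparing the length of the assembled sequence with the header count is a cheap check that the whole chain was captured.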

35 Introducing The mmCIF Protein Format

36 Converting mmCIF
● Converting mmCIF to PDB
● Converting mmCIFs to PDB with CIFTr

37 The CIFTr program
$ cd
$ tar -zxvf ciftr-v2.0-linux.tar.gz
$ cd ciftr-v2.0-linux/
(csh/tcsh) $ setenv RCSBROOT ~/ciftr-v2.0-linux
(bash) $ export RCSBROOT=~/ciftr-v2.0-linux
$ ./CIFTr -i 1lqt.cif

38 More on mmCIF
● Problems with the CIFTr conversion
● Some advice on using mmCIF
● Automated conversion of mmCIF to PDB

39 Where To From Here

