1
The Protein Databank
Working with protein data-files
2
Determining Biomolecule Structures
● X-ray crystallography
● Nuclear magnetic resonance
3
The Protein Databank
4
The PDB Growth Chart figGROWTH.eps
5
Maxim 10.1 Beware of anything in the PDB Header Section
6
The PDB Data-File Formats
7
Example PDB structure 1LQT fig1LQT.eps
8
Example PDB structure 1M7T fig1M7T.eps
9
Downloading PDB data-files
http://www.rcsb.org/pdb/
http://www.ebi.ac.uk/services/
10
Accessing Data In PDB Entries
● Accessing PDB Annotation Data
● Free R and resolution
11
Example PDB data-file
REMARK   2
REMARK   2 RESOLUTION. 1.05 ANGSTROMS.

REMARK 215 NMR STUDY
REMARK 215 THE COORDINATES IN THIS ENTRY WERE GENERATED FROM SOLUTION
REMARK 215 NMR DATA. PROTEIN DATA BANK CONVENTIONS REQUIRE THAT
REMARK 215 CRYST1 AND SCALE RECORDS BE INCLUDED, BUT THE VALUES ON
REMARK 215 THESE RECORDS ARE MEANINGLESS.
12
Example PDB data-file, cont.
.
REMARK   3  FIT TO DATA USED IN REFINEMENT.
REMARK   3   CROSS-VALIDATION METHOD          : THROUGHOUT
REMARK   3   FREE R VALUE TEST SET SELECTION  : RANDOM
REMARK   3   R VALUE     (WORKING + TEST SET) : 0.134
REMARK   3   R VALUE            (WORKING SET) : 0.134
REMARK   3   FREE R VALUE                     : 0.153
REMARK   3   FREE R VALUE TEST SET SIZE   (%) : NULL
REMARK   3   FREE R VALUE TEST SET COUNT      : 2200
.
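The resolution and free R value shown in the REMARK 2 and REMARK 3 records can be pulled out with a few lines of code. The following is a minimal sketch in Python (not the Perl used elsewhere on these slides); the function name `parse_quality` and the sample list are illustrative assumptions, and the regular expressions only cover the record wording shown above.

```python
import re

def parse_quality(lines):
    """Return (resolution, free_r) parsed from REMARK 2 and REMARK 3 records.

    Either value is None when the matching record is absent or non-numeric.
    """
    resolution = None
    free_r = None
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[0] != "REMARK":
            continue
        if fields[1] == "2":
            m = re.search(r"RESOLUTION\.\s+(\d+\.\d+)\s+ANGSTROMS", line)
            if m:
                resolution = float(m.group(1))
        elif fields[1] == "3":
            # Matches "FREE R VALUE : 0.153" but not the TEST SET lines.
            m = re.search(r"FREE R VALUE\s*:\s*(\d+\.\d+)", line)
            if m:
                free_r = float(m.group(1))
    return resolution, free_r

# Records copied from the example data-file on the slides.
sample = [
    "REMARK   2 RESOLUTION. 1.05 ANGSTROMS.",
    "REMARK   3   R VALUE            (WORKING SET) : 0.134",
    "REMARK   3   FREE R VALUE                     : 0.153",
    "REMARK   3   FREE R VALUE TEST SET SIZE   (%) : NULL",
    "REMARK   3   FREE R VALUE TEST SET COUNT      : 2200",
]
print(parse_quality(sample))
```

Running the same function over many entries would give the pairs needed for a plot of free R against resolution; NMR entries, which carry no resolution, would return None for both values.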
13
Plotting Free R Values against Resolution figFREER.eps
14
Database cross references
DBREF  1LQT A    1   456  GB     13882996 AAK47528         1    456
DBREF  1LQT B    1   456  GB     13882996 AAK47528         1    456

DBREF  1AFI      1    72  SWS    P04129   MERP_SHIFL      20     91

DBREF  1M7T A    1    66  SWS    P10599   THIO_HUMAN       0     65
DBREF  1M7T A   67   106  SWS    P00274   THIO_ECOLI      68    107
15
Coordinates section
REMARK 210
REMARK 210 BEST REPRESENTATIVE CONFORMER IN THIS ENSEMBLE : 21
REMARK 210
16
Data section
ATOM      1  N   ARG A   2      26.318  -8.010  39.090  1.00 20.71           N
ANISOU    1  N   ARG A   2     2040   3071   2755    114   -339   -393       N
ATOM      2  CA  ARG A   2      25.150  -8.702  38.505  1.00 18.85           C
ANISOU    2  CA  ARG A   2     2029   2677   2455     67   -321   -209       C
ATOM      3  C   ARG A   2      24.846  -8.176  37.123  1.00 17.23           C
ANISOU    3  C   ARG A   2     1689   2429   2429    143   -282   -258       C
ATOM      4  O   ARG A   2      25.151  -7.048  36.775  1.00 18.14           O
.
TER    7215      GLY A 456
ATOM   7216  N   ARG B   2     -19.423  25.709   6.980  1.00 21.57           N
ANISOU 7216  N   ARG B   2     2476   3012   2707   -165   -370     95       N
ATOM   7217  CA  ARG B   2     -18.718  26.510   8.024  1.00 19.01           C
ANISOU 7217  CA  ARG B   2     2127   2672   2424    -63   -285     91       C
ATOM   7218  C   ARG B   2     -17.250  26.207   8.002  1.00 17.22           C
ANISOU 7218  C   ARG B   2     1955   2392   2196    -91   -299    121       C
ATOM   7219  O   ARG B   2     -16.851  25.158   7.535  1.00 18.15           O
17
Data section, cont.
TER   14289      GLY B 456
HETATM14290  C   ACT  1866     -13.075   1.733  10.218  1.00 27.25           C
ANISOU14290  C   ACT  1866     3493   3560   3299    -39    -36    -44       C
.
CONECT14290142911429214293
CONECT1429114290
CONECT1429214290
TER
.
CONECT1469014663
MASTER      389    0   15   46   38    0    0    620280    2  401   72
END
18
Data section, cont.
MODEL        1
ATOM      1  N   MET A   1       3.110  -4.682  -3.025  1.00  0.00           N
ATOM      2  CA  MET A   1       2.546  -3.712  -2.053  1.00  0.00           C
ATOM      3  C   MET A   1       1.134  -3.295  -2.450  1.00  0.00           C
ATOM      4  O   MET A   1       0.882  -2.130  -2.758  1.00  0.00           O
ATOM      5  CB  MET A   1       3.466  -2.491  -2.002  1.00  0.00           C
ATOM      6  CG  MET A   1       3.781  -1.903  -3.370  1.00  0.00           C
ATOM      7  SD  MET A   1       4.256  -0.166  -3.285  1.00  0.00           S
ATOM      8  CE  MET A   1       6.004  -0.307  -2.920  1.00  0.00           C
ATOM      9 1H   MET A   1       2.906  -4.327  -3.980  1.00  0.00           H
ATOM     10 2H   MET A   1       2.650  -5.601  -2.859  1.00  0.00           H
ATOM     11 3H   MET A   1       4.134  -4.738  -2.858  1.00  0.00           H
ATOM     12  HA  MET A   1       2.517  -4.178  -1.079  1.00  0.00           H
ATOM     13 1HB  MET A   1       2.996  -1.724  -1.405  1.00  0.00           H
ATOM     14 2HB  MET A   1       4.397  -2.778  -1.536  1.00  0.00           H
ATOM     15 1HG  MET A   1       4.596  -2.461  -3.807  1.00  0.00           H
ATOM     16 2HG  MET A   1       2.907  -1.993  -3.998  1.00  0.00           H
ATOM     17 1HE  MET A   1       6.344  -1.302  -3.167  1.00  0.00           H
ATOM     18 2HE  MET A   1       6.169  -0.120  -1.869  1.00  0.00           H
ATOM     19 3HE  MET A   1       6.553   0.416  -3.505  1.00  0.00           H
ATOM     20  N   VAL A   2       0.215  -4.256  -2.446  1.00  0.00           N
19
Data section, cont.
TER    1659      VAL A 107
ENDMDL
MODEL        2
ATOM      1  N   MET A   1       2.750  -6.779  -1.627  1.00  0.00           N
ATOM      2  CA  MET A   1       2.487  -5.475  -2.290  1.00  0.00           C
.
TER    1660      VAL A 107
ENDMDL
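NMR entries hold several conformers, each wrapped in a MODEL ... ENDMDL pair, so code that reads coordinates usually needs to keep the models apart. A minimal sketch in Python (rather than the Perl used on these slides) is given here; `split_models` and the abbreviated sample records are illustrative assumptions, and records outside any MODEL block are simply ignored.

```python
def split_models(lines):
    """Group ATOM/HETATM records by the serial number of their MODEL block."""
    models = {}
    current = None
    for line in lines:
        record = line[:6].strip()
        if record == "MODEL":
            current = int(line[6:].split()[0])
            models[current] = []
        elif record == "ENDMDL":
            current = None
        elif record in ("ATOM", "HETATM") and current is not None:
            models[current].append(line)
    return models

# Abbreviated records in the layout of the NMR example on the slides.
sample = [
    "MODEL        1",
    "ATOM      1  N   MET A   1       3.110  -4.682  -3.025  1.00  0.00           N",
    "ATOM      2  CA  MET A   1       2.546  -3.712  -2.053  1.00  0.00           C",
    "TER    1659      VAL A 107",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   MET A   1       2.750  -6.779  -1.627  1.00  0.00           N",
    "ATOM      2  CA  MET A   1       2.487  -5.475  -2.290  1.00  0.00           C",
    "TER    1660      VAL A 107",
    "ENDMDL",
]
models = split_models(sample)
for serial in sorted(models):
    print(serial, len(models[serial]))
```

With the full file, the conformer named in the REMARK 210 "BEST REPRESENTATIVE CONFORMER" record could then be looked up by its model number.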
20
Extracting 3D co-ordinate data
my ( $X, $Y, $Z ) = ( substr( $_, 30, 8 ),
                      substr( $_, 38, 8 ),
                      substr( $_, 46, 8 ) );
21
The simple_coord_extract program
#! /usr/bin/perl -w

# simple_coord_extract - Demonstrates the extraction of
#                        C-Alpha co-ordinates from a PDB
#                        data-file.

use strict;

while ( <> )
{
    # The atom name field is four characters wide, so "CA" is
    # followed by two spaces at offset 13.
    if ( /^ATOM/ && substr( $_, 13, 4 ) eq "CA  " )
    {
        my ( $X, $Y, $Z ) = ( substr( $_, 30, 8 ),
                              substr( $_, 38, 8 ),
                              substr( $_, 46, 8 ) );

        # Strip the padding spaces from each fixed-width field.
        $X =~ s/ //g;
        $Y =~ s/ //g;
        $Z =~ s/ //g;

        print "X, Y & Z: $X, $Y, $Z\n";
    }
}
22
Results from simple_coord_extract...
X, Y & Z: 25.150, -8.702, 38.505
X, Y & Z: 23.675, -8.497, 35.069
X, Y & Z: 20.747, -6.252, 34.332
X, Y & Z: 17.545, -8.297, 34.292
X, Y & Z: 15.182, -7.484, 31.454
X, Y & Z: 11.736, -8.952, 30.942
X, Y & Z: 10.261, -9.014, 27.451
X, Y & Z: 6.507, -9.548, 27.173
23
The graphic image contact map figCONTACTMAP.eps
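A contact map marks which pairs of residues lie close together in space, typically judged by the distance between their C-alpha atoms. The sketch below, in Python rather than the Perl used on these slides, builds a small map from the first four C-alpha co-ordinates printed by simple_coord_extract. The 8.0 Angstrom cutoff and the names `contact_map` and `CUTOFF` are assumptions made for illustration; the slides do not state which threshold the figure uses.

```python
import math

# C-alpha co-ordinates taken from the simple_coord_extract results.
ca_coords = [
    (25.150, -8.702, 38.505),
    (23.675, -8.497, 35.069),
    (20.747, -6.252, 34.332),
    (17.545, -8.297, 34.292),
]

CUTOFF = 8.0  # Angstroms; an assumed threshold, not one given on the slides

def contact_map(coords, cutoff=CUTOFF):
    """Return a square matrix of booleans, True where two atoms lie within cutoff."""
    return [[math.dist(a, b) <= cutoff for b in coords] for a in coords]

cmap = contact_map(ca_coords)
for row in cmap:
    # '#' marks a contact, '.' marks a pair further apart than the cutoff.
    print("".join("#" if hit else "." for hit in row))
```

The matrix is symmetric by construction, so a plotting routine only needs one triangle of it.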
24
STRIDE: Secondary Structure Assignment
25
Maxim 10.2 It is often easier and desirable to regenerate database annotation than trawl through entries reconstituting the annotation using custom code.
26
Installation of STRIDE
$ tar -zxvf stride.tar.gz
$ cd stride
$ make
$ ./stride
27
Assigning Secondary Structures
28
Simplified definition of a Hydrogen Bond figSIMPLIFIED.eps
29
Example of Secondary Structure Elements in Proteins figSSDEMO.eps
30
Definition of Dihedral angles in the backbone of protein structures figPSIPSI.eps
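For a residue i, the backbone angle phi is the dihedral defined by the atoms C(i-1), N(i), CA(i), C(i), and psi is the dihedral defined by N(i), CA(i), C(i), N(i+1). A dihedral can be computed from any four points with a little vector arithmetic. The following Python sketch (not the Perl used on these slides) shows one common formulation; the helper names and the test points are illustrative assumptions, not values from a PDB entry.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = sub(p0, p1)   # points back from p1 to p0
    b1 = sub(p2, p1)   # central bond
    b2 = sub(p3, p2)   # bond leaving p2
    length = math.sqrt(dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))

# Four made-up points forming a right-angle twist about the z axis.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 2))
```

Feeding it the N, CA and C co-ordinates of consecutive residues would give phi and psi values comparable to those STRIDE reports.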
31
Using STRIDE and parsing the output
$ ./stride
You must specify input file
Action: secondary structure assignment
Usage: stride [Options] InputFile [ > file ]
Options:
  -f File       Output file
  -mFile        MolScript file
  -o            Report secondary structure summary Only
  -h            Report Hydrogen bonds
  -rId1Id2..    Read only chains Id1, Id2...
  -cId1Id2..    Process only Chains Id1, Id2...
  -q[File]      Generate SeQuence file in FASTA format and die
Options are position and case insensitive
$ stride -cA 1lqt.pdb
32
Using gawk...
$ gawk '/^ASG/ {print $8 " " $9}' 1lqt.A.stride
360.00 156.52
-75.72 161.36
-71.26 145.24
-111.08 119.10
-118.65 131.78
.
$ gawk '(/^ASG/ && /Strand/) {print $8 " " $9}' 1lqt.A.stride
$ gawk '(/^ASG/ && /AlphaHelix/) {print $8 " " $9}' 1lqt.A.stride
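The gawk one-liners print the eighth and ninth whitespace-separated fields of each ASG line, which hold phi and psi. The same selection can be sketched in Python (rather than gawk or the Perl used on these slides) as below. The sample lines are hypothetical: the phi/psi pairs come from the output above, but the residue names, secondary structure labels and area values are invented placeholders. Note one difference in behaviour: this sketch matches the structure name field exactly, whereas the gawk patterns match the word anywhere on the line.

```python
def phi_psi(lines, structure=None):
    """Collect (phi, psi) pairs from STRIDE ASG lines.

    When structure is given (for example "Strand"), keep only residues
    whose secondary structure name field equals it.
    """
    angles = []
    for line in lines:
        fields = line.split()
        if len(fields) < 9 or fields[0] != "ASG":
            continue
        if structure is not None and fields[6] != structure:
            continue
        angles.append((float(fields[7]), float(fields[8])))
    return angles

# Hypothetical ASG lines; only the phi/psi columns come from the slides.
sample = [
    "ASG  ARG A    2    1    C          Coil    360.00    156.52       0.0      1LQT",
    "ASG  PRO A    3    2    C          Coil    -75.72    161.36       0.0      1LQT",
    "ASG  TYR A    4    3    E        Strand    -71.26    145.24       0.0      1LQT",
    "REM  a non-ASG line that should be skipped",
]
print(phi_psi(sample))
print(phi_psi(sample, "Strand"))
```

The resulting pairs can be handed straight to a plotting tool to draw a Ramachandran plot.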
33
Ramachandran Plot of dihedral angles of chain A from 1LQT fig1LQTPHIPSI.eps
34
Extracting amino acid sequences using STRIDE
$ stride -q 1lqt.pdb
>1lqt.pdb A 452 1.050
RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKIK
.
>1lqt.pdb B 454 1.050
RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKIK
.
$ stride -cA -q 1lqt.pdb
>1lqt.pdb A 452 1.050
RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKIK
.
35
Introducing The mmCIF Protein Format
36
Converting mmCIF
● Converting mmCIF to PDB
● Converting mmCIFs to PDB with CIFTr
37
The CIFTr program
$ cd
$ tar -zxvf ciftr-v2.0-linux.tar.gz
$ cd ciftr-v2.0-linux/
$ setenv RCSBROOT ~/ciftr-v2.0-linux    # csh or tcsh
$ export RCSBROOT=~/ciftr-v2.0-linux    # sh or bash (no spaces around =)
$ ./CIFTr -i 1lqt.cif
38
More on mmCIF
● Problems with the CIFTr conversion
● Some advice on using mmCIF
● Automated conversion of mmCIF to PDB
39
Where To From Here