The Protein Databank: Working with protein data-files

Determining Biomolecule Structures
● X-ray crystallography
● Nuclear magnetic resonance

The Protein Databank

The PDB Growth Chart [figure: figGROWTH.eps]

Maxim 10.1 Beware of anything in the PDB Header Section

The PDB Data-File Formats

Example PDB structure 1LQT [figure: fig1LQT.eps]

Example PDB structure 1M7T [figure: fig1M7T.eps]

Downloading PDB data-files

Accessing Data In PDB Entries
● Accessing PDB Annotation Data
● Free R and resolution

Example PDB data-file
    REMARK 2
    REMARK 2 RESOLUTION ANGSTROMS.
    REMARK 215 NMR STUDY
    REMARK 215 THE COORDINATES IN THIS ENTRY WERE GENERATED FROM SOLUTION
    REMARK 215 NMR DATA. PROTEIN DATA BANK CONVENTIONS REQUIRE THAT
    REMARK 215 CRYST1 AND SCALE RECORDS BE INCLUDED, BUT THE VALUES ON
    REMARK 215 THESE RECORDS ARE MEANINGLESS.

Example PDB data-file, cont.
    .
    REMARK 3 FIT TO DATA USED IN REFINEMENT.
    REMARK 3 CROSS-VALIDATION METHOD : THROUGHOUT
    REMARK 3 FREE R VALUE TEST SET SELECTION : RANDOM
    REMARK 3 R VALUE (WORKING + TEST SET) :
    REMARK 3 R VALUE (WORKING SET) :
    REMARK 3 FREE R VALUE :
    REMARK 3 FREE R VALUE TEST SET SIZE (%) : NULL
    REMARK 3 FREE R VALUE TEST SET COUNT :

Plotting Free R Values against Resolution [figure: figFREER.eps]
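To plot Free R against resolution, both numbers first have to be pulled out of the REMARK 2 and REMARK 3 header records. The following is a minimal Python sketch of that step, not the deck's own method. The function name `resolution_and_free_r` and the loose whitespace matching are assumptions made for illustration. Entries that report NULL or omit a value (as NMR entries do for resolution) yield `None`.

```python
import re

# Loose patterns for the two header records. Real files use fixed
# columns; matching on whitespace instead is an assumption of this sketch.
RESOLUTION_RE = re.compile(
    r"^REMARK\s+2\s+RESOLUTION\.?\s+(\d+(?:\.\d+)?)\s+ANGSTROMS")
FREE_R_RE = re.compile(
    r"^REMARK\s+3\s+FREE R VALUE\s*:\s*(\d+(?:\.\d+)?)")


def resolution_and_free_r(lines):
    """Return (resolution, free_r); either is None when not reported."""
    resolution = None
    free_r = None
    for line in lines:
        match = RESOLUTION_RE.match(line)
        if match and resolution is None:
            resolution = float(match.group(1))
        match = FREE_R_RE.match(line)
        if match and free_r is None:
            free_r = float(match.group(1))
    return resolution, free_r
```

Called once per downloaded data-file (for example `resolution_and_free_r(open("1lqt.pdb"))`), the resulting pairs can be fed to any plotting tool. Given the first maxim's warning about the header section, it is worth checking how many entries come back with `None` before trusting the plot.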

Database cross references
    DBREF 1LQT A GB AAK
    DBREF 1LQT B GB AAK
    DBREF 1AFI 1 72 SWS P04129 MERP_SHIFL
    DBREF 1M7T A 1 66 SWS P10599 THIO_HUMAN 0 65
    DBREF 1M7T A SWS P00274 THIO_ECOLI

Coordinates section
    REMARK 210
    REMARK 210 BEST REPRESENTATIVE CONFORMER IN THIS ENSEMBLE : 21
    REMARK 210

Data section
    ATOM 1 N ARG A N
    ANISOU 1 N ARG A N
    ATOM 2 CA ARG A C
    ANISOU 2 CA ARG A C
    ATOM 3 C ARG A C
    ANISOU 3 C ARG A C
    ATOM 4 O ARG A O
    .
    TER 7215 GLY A 456
    ATOM 7216 N ARG B N
    ANISOU 7216 N ARG B N
    ATOM 7217 CA ARG B C
    ANISOU 7217 CA ARG B C
    ATOM 7218 C ARG B C
    ANISOU 7218 C ARG B C
    ATOM 7219 O ARG B O

Data section, cont.
    TER GLY B 456
    HETATM14290 C ACT C
    ANISOU14290 C ACT C
    .
    CONECT
    CONECT
    CONECT
    TER
    .
    CONECT
    MASTER
    END

Data section, cont.
    MODEL 1
    ATOM 1 N MET A N
    ATOM 2 CA MET A C
    ATOM 3 C MET A C
    ATOM 4 O MET A O
    ATOM 5 CB MET A C
    ATOM 6 CG MET A C
    ATOM 7 SD MET A S
    ATOM 8 CE MET A C
    ATOM 9 1H MET A H
    ATOM 10 2H MET A H
    ATOM 11 3H MET A H
    ATOM 12 HA MET A H
    ATOM 13 1HB MET A H
    ATOM 14 2HB MET A H
    ATOM 15 1HG MET A H
    ATOM 16 2HG MET A H
    ATOM 17 1HE MET A H
    ATOM 18 2HE MET A H
    ATOM 19 3HE MET A H
    ATOM 20 N VAL A N

Data section, cont.
    TER 1659 VAL A 107
    ENDMDL
    MODEL 2
    ATOM 1 N MET A N
    ATOM 2 CA MET A C
    .
    TER 1660 VAL A 107
    ENDMDL
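NMR entries repeat the whole atom list once per conformer, bracketed by MODEL and ENDMDL records, so code that reads co-ordinates usually needs to keep the models apart. The Python sketch below shows one possible way to do that. The function name `split_models` and the choice to number a file without MODEL records as model 1 are assumptions of this example, not PDB conventions.

```python
def split_models(lines):
    """Group ATOM and HETATM lines by MODEL number.

    Files without MODEL records (typical X-ray entries) come back as a
    single model numbered 1; that numbering is this sketch's own choice.
    """
    models = {}
    current = 1
    for line in lines:
        record = line[:6].strip()   # record name sits in columns 1-6
        if record == "MODEL":
            current = int(line[6:].split()[0])
        elif record in ("ATOM", "HETATM"):
            models.setdefault(current, []).append(line.rstrip("\n"))
    return models
```

For an ensemble such as 1M7T, `split_models(open("1m7t.pdb"))` would return one list of atom lines per conformer, and the conformer named in the REMARK 210 "best representative" line can then be selected by its number.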

Extracting 3D co-ordinate data
    my ( $X, $Y, $Z ) = ( substr( $_, 30, 8 ),
                          substr( $_, 38, 8 ),
                          substr( $_, 46, 8 ) );

The simple_coord_extract program
    #! /usr/bin/perl -w
    # simple_coord_extract - Demonstrates the extraction of
    #                        C-Alpha co-ordinates from a PDB
    #                        data-file.
    use strict;
    while ( <> )
    {
        # The atom name occupies columns 13-16 of an ATOM record
        # (offset 12, length 4); a C-Alpha atom reads " CA ".
        if ( /^ATOM/ && substr( $_, 12, 4 ) eq " CA " )
        {
            my ( $X, $Y, $Z ) = ( substr( $_, 30, 8 ),
                                  substr( $_, 38, 8 ),
                                  substr( $_, 46, 8 ) );
            # Strip the padding spaces from each fixed-width field.
            $X =~ s/ //g;
            $Y =~ s/ //g;
            $Z =~ s/ //g;
            print "X, Y & Z: $X, $Y, $Z\n";
        }
    }

Results from simple_coord_extract...
    X, Y & Z: , ,
    X, Y & Z: , ,
    X, Y & Z: , ,
    X, Y & Z: , ,
    X, Y & Z: , ,
    X, Y & Z: , ,
    X, Y & Z: , ,
    X, Y & Z: 6.507, ,

The graphic image contact map [figure: figCONTACTMAP.eps]
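A contact map can be computed directly from the C-Alpha co-ordinates extracted above: two residues are marked as in contact when their C-Alpha atoms lie within some distance of each other. The Python sketch below illustrates the idea. The function names and the 8.0 Angstrom default cutoff are assumptions chosen for this example; the slides do not state which threshold was used for the figure.

```python
import math


def ca_coordinates(lines):
    """Collect (x, y, z) for every C-Alpha ATOM record, in file order."""
    coords = []
    for line in lines:
        # Atom name: columns 13-16. Co-ordinates: columns 31-54, the
        # same offsets (30, 38, 46; width 8) used by the Perl program.
        if line.startswith("ATOM") and line[12:16] == " CA ":
            coords.append((float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54])))
    return coords


def contact_map(coords, cutoff=8.0):
    """Return a square list of lists of booleans: True where two
    C-Alpha atoms lie within `cutoff` Angstroms (hypothetical default)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) <= cutoff for j in range(n)]
            for i in range(n)]
```

The result is symmetric with a True diagonal, and can be drawn as an image by colouring each True cell. For a multi-model NMR file the atom lines should be restricted to one model first, otherwise every conformer contributes its own copy of each residue.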

STRIDE: Secondary Structure Assignment

Maxim 10.2 It is often easier, and preferable, to regenerate database annotation than to trawl through entries reconstituting it with custom code.

Installation of STRIDE
    $ tar -zxvf stride.tar.gz
    $ cd stride
    $ make
    $ ./stride

Assigning Secondary Structures

Simplified definition of a Hydrogen Bond [figure: figSIMPLIFIED.eps]

Example of Secondary Structure Elements in Proteins [figure: figSSDEMO.eps]

Definition of Dihedral angles in the backbone of protein structures [figure: figPSIPSI.eps]

Using STRIDE and parsing the output
    $ ./stride
    You must specify input file
    Action: secondary structure assignment
    Usage: stride [Options] InputFile [ > file ]
    Options:
      -f File Output file
      -mFile MolScript file
      -o Report secondary structure summary Only
      -h Report Hydrogen bonds
      -rId1Id2.. Read only chains Id1, Id2...
      -cId1Id2.. Process only Chains Id1, Id2...
      -q[File] Generate SeQuence file in FASTA format and die
    Options are position and case insensitive
    $ stride -cA 1lqt.pdb

Using gawk...
    $ gawk '/^ASG/ {print $8 " " $9}' 1lqt.A.stride
    $ gawk '(/^ASG/ && /Strand/) {print $8 " " $9}' 1lqt.A.stride
    $ gawk '(/^ASG/ && /AlphaHelix/) {print $8 " " $9}' 1lqt.A.stride
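The gawk one-liners pick whitespace-separated fields 8 and 9 from each ASG record, optionally keeping only lines that mention a given secondary-structure name. The Python sketch below shows the same filtering. It assumes, as the Ramachandran plot suggests, that fields 8 and 9 hold the phi and psi angles, and that field 7 holds the structure name. The function name `phi_psi_from_stride` is hypothetical.

```python
def phi_psi_from_stride(lines, ss_name=None):
    """Yield (phi, psi) pairs from STRIDE ASG records.

    Assumes field 7 is the secondary-structure name and fields 8 and 9
    are the two angles, matching the $8 and $9 used with gawk.
    """
    for line in lines:
        if not line.startswith("ASG"):
            continue
        fields = line.split()
        # Unlike gawk's /Strand/, which matches anywhere on the line,
        # this compares the structure-name field exactly.
        if ss_name is not None and fields[6] != ss_name:
            continue
        yield float(fields[7]), float(fields[8])
```

For example, `list(phi_psi_from_stride(open("1lqt.A.stride"), "AlphaHelix"))` would give the helix angle pairs ready for plotting. The exact comparison on the name field is a deliberate difference from the gawk version, since a pattern match anywhere on the line could also hit other columns.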

Ramachandran Plot of dihedral angles of chain A from 1LQT [figure: fig1LQTPHIPSI.eps]

Extracting amino acid sequences using STRIDE
    $ stride -q 1lqt.pdb
    >1lqt.pdb A
    RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKIK
    .
    >1lqt.pdb B
    RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKIK
    .
    $ stride -cA -q 1lqt.pdb
    >1lqt.pdb A
    RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKIK
    .

Introducing The mmCIF Protein Format

Converting mmCIF
● Converting mmCIF to PDB
● Converting mmCIFs to PDB with CIFTr

The CIFTr program
    $ cd
    $ tar -zxvf ciftr-v2.0-linux.tar.gz
    $ cd ciftr-v2.0-linux/
    $ setenv RCSBROOT ~/ciftr-v2.0-linux    # csh/tcsh users
    $ export RCSBROOT=~/ciftr-v2.0-linux    # sh/bash users (no spaces around "=")
    $ ./CIFTr -i 1lqt.cif

More on mmCIF
● Problems with the CIFTr conversion
● Some advice on using mmCIF
● Automated conversion of mmCIF to PDB

Where To From Here