10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li.

Presentation transcript:


Outline
- Review: Python modules; Biopython sequence modules
- Biopython’s Bio.PDB: protein structure primer / PyMOL; PDB file parsing; PDB data navigation (SMCRA)
- Examples

Python Modules Review
- Access the program environment: sys, os, os.path
- Specialized functions: math, random
- Access file-like resources as files: zipfile, gzip, urllib
- Make specialized formats into “lists” and “dictionaries”: csv (XML, …)
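
As a reminder of how these modules combine, here is a minimal sketch that reads a gzip-compressed CSV file as if it were an ordinary file and turns each row into a dictionary. The file name, column names, and notes are placeholders chosen for this illustration only.

```python
import csv
import gzip
import os
import tempfile

# Placeholder table: two PDB identifiers used later in this lecture.
rows = [
    {"pdb_id": "1HPV", "note": "dimer example"},
    {"pdb_id": "1KCW", "note": "disulfide example"},
]
path = os.path.join(tempfile.mkdtemp(), "structures.csv.gz")

# gzip.open in text mode ("wt") gives a file-like object that csv can write to.
with gzip.open(path, "wt", newline="") as handle:
    writer = csv.DictWriter(handle, fieldnames=["pdb_id", "note"])
    writer.writeheader()
    writer.writerows(rows)

# Reading works the same way: the compressed file behaves like a plain file,
# and csv.DictReader makes each line a dictionary keyed by column name.
with gzip.open(path, "rt", newline="") as handle:
    records = list(csv.DictReader(handle))

for record in records:
    print(record["pdb_id"], "-", record["note"])
```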

Biopython Sequence Modules
- Provide a “sequence” abstraction
  - More powerful than a Python string
  - Knows its alphabet!
  - Basic tasks already available
- Easy parsing of (many) downloadable sequence database formats: FASTA, GenBank, SwissProt/UniProt, etc.
- Simplify access to large collections of sequences
  - Access by iteration; get sequence and accession
  - Other content available as lists and dictionaries
  - Little semantic extraction or interpretation

Biopython Bio.SeqIO
- Access to additional information: the annotations dictionary and the features list
- Information, keys, and keywords vary with database!
- Semantic content extraction is (still) up to you!

    import sys
    import Bio.SeqIO

    # Pass the filename so Bio.SeqIO opens the file in the mode the parser expects
    for seq_record in Bio.SeqIO.parse(sys.argv[1], "uniprot-xml"):
        print("\n------NEW SEQRECORD------\n")
        print("seq_record.annotations\n\t", seq_record.annotations)
        print("seq_record.features\n\t", seq_record.features)
        print("seq_record.dbxrefs\n\t", seq_record.dbxrefs)
        print("seq_record.format('fasta')\n", seq_record.format('fasta'))

Proteins are… a linear sequence of amino acids, produced by transcription of DNA into mRNA and translation of the mRNA into protein.

Proteins are… 3-D molecules that interact with other (biological) molecules to carry out biological functions. Examples shown: DNA polymerase, hemoglobin.

Proteins are… organized at four levels of structure: primary, secondary, tertiary, quaternary. (Figure from Introduction to Protein Structure, by Carl Branden and John Tooze.)

Proteins are Composed of 20 Amino Acids

Representations of Protein 3D Structures
- Ball-Stick: atoms and bonds
- Ribbon: helix, sheet and random coil
- Surface: solvent-accessible surface

Protein Data Bank (PDB)
- Repository of the 3-D conformation(s) / structure of proteins.
- The result of laborious and expensive experiments using X-ray crystallography and/or nuclear magnetic resonance (NMR).
- (x,y,z) position of every atom of every amino acid.
- Some entries contain multi-protein complexes, small-molecule ligands, docked epitopes and antibody-antigen complexes…
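
To make "(x,y,z) position of every atom" concrete, the sketch below writes and then re-reads one fixed-width ATOM record of the PDB flat-file format using only the standard library. The helper names (`format_atom`, `parse_atom`) and the coordinates are invented for this illustration; Bio.PDB's own parser does far more (alternate locations, insertion codes, HETATM records, multiple models).

```python
# Fixed-width layout of an ATOM record. Column numbers in the format
# description are 1-based; the Python slices below are 0-based, end-exclusive.
ATOM_FORMAT = (
    "{record:<6}{serial:>5} {name:<4}{altloc:1}{resname:>3} {chain:1}"
    "{resseq:>4}{icode:1}   {x:8.3f}{y:8.3f}{z:8.3f}"
    "{occupancy:6.2f}{bfactor:6.2f}          {element:>2}"
)

def format_atom(serial, name, resname, chain, resseq, x, y, z, element):
    """Build one ATOM line; `name` must already be padded to 4 characters."""
    return ATOM_FORMAT.format(
        record="ATOM", serial=serial, name=name, altloc=" ",
        resname=resname, chain=chain, resseq=resseq, icode=" ",
        x=x, y=y, z=z, occupancy=1.0, bfactor=0.0, element=element,
    )

def parse_atom(line):
    """Extract the fields most analyses need from one ATOM line."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "resname": line[17:20].strip(),
        "chain": line[21],
        "resseq": int(line[22:26]),
        "coord": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "element": line[76:78].strip(),
    }

# Made-up coordinates for a C-alpha atom of residue PRO 1 in chain A.
line = format_atom(2, " CA ", "PRO", "A", 1, 30.307, 38.663, 5.319, "C")
atom = parse_atom(line)
print(line)
print(atom["chain"], atom["resname"], atom["resseq"], atom["name"], atom["coord"])
```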

Visualization (PyMOL)

Molecular Visualization Tools: PyMOL, Chimera, PMV, Coot, CCP4mg, mmLib, VMD, MMTK

Overall Layout of a Structure Object
- A structure consists of models
- A model consists of chains
- A chain consists of residues
- A residue consists of atoms

Biopython Bio.PDB
- Parser for PDB format files
- Navigate a structure and answer atom-atom distance/angle questions
- Structure (PDB file) >> Model >> Chain >> Residue >> Atom >> (x,y,z) coordinates
- The SMCRA representation mirrors the PDB format

SMCRA Data-Model
- Each PDB file represents one “structure”
- Each structure may contain many models
  - In most cases there is only one model, index 0
- Each polypeptide (amino-acid sequence) is a “chain”
  - A single-protein structure typically has one chain, “A”
  - 1HPV is a dimer and has chains “A” and “B”
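
One way to picture the SMCRA hierarchy before touching Bio.PDB is as nested dictionaries, one level per letter. The sketch below is only a toy stand-in with invented coordinates; real Bio.PDB objects are indexed in the same spirit (model number, chain letter, residue id, atom name), but iterating over them yields child objects rather than keys.

```python
# Toy SMCRA hierarchy: structure -> model -> chain -> residue -> atom.
# All coordinates are invented for illustration.
structure = {
    0: {                                   # model 0
        "A": {                             # chain A
            1: {"N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.0)},
            2: {"N": (3.8, 0.0, 0.0), "CA": (5.3, 0.0, 0.0)},
        },
        "B": {                             # chain B
            1: {"N": (0.0, 9.0, 0.0), "CA": (1.5, 9.0, 0.0)},
        },
    }
}

# Index down the hierarchy, one level at a time.
first_model = structure[0]
achain = first_model["A"]
ca_of_residue_2 = achain[2]["CA"]
print("CA of residue 2 in chain A:", ca_of_residue_2)

# Or walk it top-down with one loop per level.
atom_count = 0
for model_id, model in structure.items():
    for chain_id, chain in model.items():
        for residue_id, residue in chain.items():
            for atom_name, coord in residue.items():
                atom_count += 1
print("atoms visited:", atom_count)
```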

SMCRA Data-Model

    #eg1.py
    import Bio.PDB

    # Use QUIET=True to avoid lots of warnings...
    parser = Bio.PDB.PDBParser(QUIET=True)
    structure = parser.get_structure("1HPV", "1HPV.pdb")

    model = structure[0]

    # This structure is a dimer with two chains
    achain = model['A']
    bchain = model['B']

SMCRA
- Chains are composed of amino-acid residues
  - Access by iteration, or by index
  - The residue “index” may not be the sequence position
- Residues are composed of atoms
  - Access by iteration, or by atom name
  - …except for H!
- Water molecules are also represented: residue name HOH, het flag “W”
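
The point that a residue "index" is not a sequence position can be sketched without any structure file. Bio.PDB identifies each residue by a tuple of (hetero flag, residue number, insertion code); the list of ids below is invented to show numbering that starts at 10, contains an inserted residue 11A, skips 12 and 13, and ends with a water.

```python
# Residue ids in the (hetero flag, number, insertion code) style.
# This particular list is made up for illustration.
residue_ids = [
    (" ", 10, " "),
    (" ", 11, " "),
    (" ", 11, "A"),
    (" ", 14, " "),
    ("W", 301, " "),   # a water: hetero flag "W"
]

# Keep standard residues only (blank hetero flag), then count from 1
# to get each residue's position along the polypeptide.
standard = [rid for rid in residue_ids if rid[0] == " "]
position = {rid: i for i, rid in enumerate(standard, start=1)}

for rid in standard:
    label = str(rid[1]) + rid[2].strip()
    print("residue number", label, "-> sequence position", position[rid])
```

The residue numbered 14 is only the fourth residue in this toy chain, which is why code that assumes `index + 1` is always the next residue needs a check such as `has_id`.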

SMCRA Data-Model

    #eg2.py
    import Bio.PDB

    # Use QUIET=True to avoid lots of warnings...
    parser = Bio.PDB.PDBParser(QUIET=True)
    structure = parser.get_structure("1HPV", "1HPV.pdb")

    model = structure[0]

    # One loop per level of the hierarchy: chain, residue, atom
    for chain in model:
        for residue in chain:
            for atom in residue:
                print(chain, residue, atom, atom.get_coord())

Polypeptide molecules S-G-Y-A-L

SMCRA Atom names

Check polypeptide backbone

    #eg3.py
    import Bio.PDB

    # Use QUIET=True to avoid lots of warnings...
    parser = Bio.PDB.PDBParser(QUIET=True)
    structure = parser.get_structure("1HPV", "1HPV.pdb")

    model = structure[0]
    achain = model['A']

    for residue in achain:
        index = residue.get_id()[1]
        calpha = residue['CA']
        carbon = residue['C']
        nitrogen = residue['N']
        oxygen = residue['O']

        # Subtracting two atoms gives the distance between them
        print("Residue:", residue.get_resname(), index)
        print("N - Ca", (nitrogen - calpha))
        print("Ca - C ", (calpha - carbon))
        print("C - O ", (carbon - oxygen))
        print()

        # Only look at the first residue
        break

Check polypeptide backbone

    #eg4.py
    # As before...
    for residue in achain:
        # Skip ligands and waters, which have no backbone atoms
        if residue.get_id()[0] != ' ':
            continue

        index = residue.get_id()[1]
        calpha = residue['CA']
        carbon = residue['C']
        nitrogen = residue['N']
        oxygen = residue['O']

        print("Residue:", residue.get_resname(), index)
        print("N - Ca", (nitrogen - calpha))
        print("Ca - C ", (calpha - carbon))
        print("C - O ", (carbon - oxygen))

        # Peptide bond to the next residue, if there is one
        if achain.has_id(index+1):
            nextresidue = achain[index+1]
            nextnitrogen = nextresidue['N']
            print("C - N ", (carbon - nextnitrogen))
        print()
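
The `atom - atom` expressions in the backbone examples rely on Bio.PDB atoms defining subtraction to mean "distance between". A minimal sketch of that idea, using a toy `Atom` class (not Bio.PDB's) and invented coordinates, might look like the following. The reference bond lengths are approximate textbook values, included only to show what the printed distances can be compared against.

```python
import math

class Atom:
    """Toy atom: a name plus (x, y, z) coordinates in Angstroms."""

    def __init__(self, name, coord):
        self.name = name
        self.coord = coord

    def __sub__(self, other):
        # Euclidean distance, so `a - b` reads like the Bio.PDB idiom.
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(self.coord, other.coord)))

# Approximate backbone bond lengths in Angstroms (textbook values).
TYPICAL = {"N-CA": 1.46, "CA-C": 1.52, "C-O": 1.23, "C-N": 1.33}

def looks_bonded(distance, bond, tolerance=0.1):
    """True if a measured distance is within `tolerance` of the typical value."""
    return abs(distance - TYPICAL[bond]) <= tolerance

# Invented coordinates chosen so the distance is exactly 1.5.
nitrogen = Atom("N", (0.0, 0.0, 0.0))
calpha = Atom("CA", (1.0, 1.0, 0.5))

d = nitrogen - calpha
print("N - Ca", round(d, 2), "plausible bond:", looks_bonded(d, "N-CA"))
```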

Find potential disulfide bonds
- The sulfur (SG) atoms of Cys residues can form “di-sulfide” bonds when they are close together. An actual S-S bond is about 2 Å long, so the 8 Å cutoff used here is a generous screen for candidate pairs.
- Compare with the PDB file contents: SSBOND records
- Bio.PDB does not provide an easy way to access the SSBOND annotations

Find potential disulfide bonds

    #eg5.py
    import Bio.PDB

    # Use QUIET=True to avoid lots of warnings...
    parser = Bio.PDB.PDBParser(QUIET=True)
    structure = parser.get_structure("1KCW", "1KCW.pdb")

    model = structure[0]
    achain = model['A']

    cysresidues = []
    for residue in achain:
        if residue.get_resname() == 'CYS':
            cysresidues.append(residue)

    for c1 in cysresidues:
        c1index = c1.get_id()[1]
        for c2 in cysresidues:
            c2index = c2.get_id()[1]
            # Report each pair once, and never pair a residue with itself
            if c2index <= c1index:
                continue
            dist = c1['SG'] - c2['SG']
            if dist < 8.0:
                print("possible di-sulfide bond:",
                      "Cys", c1index, "-", "Cys", c2index, round(dist, 2))
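
The all-pairs search can also be written with `itertools.combinations`, which yields every unordered pair exactly once and so never compares a residue with itself. The sketch below runs on invented SG coordinates keyed by made-up residue numbers, not on 1KCW; `close_pairs` is a hypothetical helper name.

```python
import itertools
import math

# Invented sulfur (SG) coordinates, keyed by made-up cysteine residue numbers.
sg_coords = {
    12: (10.0, 10.0, 10.0),
    47: (11.2, 11.2, 11.1),
    90: (30.0, 5.0, 12.0),
    133: (12.0, 40.0, 3.0),
}

def close_pairs(coords, cutoff):
    """Return (residue_i, residue_j, distance) for every pair closer than cutoff."""
    pairs = []
    for (i, a), (j, b) in itertools.combinations(sorted(coords.items()), 2):
        d = math.dist(a, b)
        if d < cutoff:
            pairs.append((i, j, round(d, 2)))
    return pairs

candidates = close_pairs(sg_coords, 8.0)
for i, j, d in candidates:
    print("possible di-sulfide bond: Cys", i, "- Cys", j, d)
```

With n cysteines this checks n(n-1)/2 pairs, which is fine for the handful of cysteines in a typical chain.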

Find contact residues in a dimer

    #eg6.py
    import Bio.PDB

    # Use QUIET=True to avoid lots of warnings...
    parser = Bio.PDB.PDBParser(QUIET=True)
    structure = parser.get_structure("1HPV", "1HPV.pdb")

    achain = structure[0]['A']
    bchain = structure[0]['B']

    for res1 in achain:
        # Skip hetero residues ("H_...") and waters ("W"); compare strings
        # with ==/!=, not "is"
        if res1.get_id()[0] != ' ':
            continue
        r1ca = res1['CA']
        r1ind = res1.get_id()[1]
        r1sym = res1.get_resname()

        for res2 in bchain:
            if res2.get_id()[0] != ' ':
                continue
            r2ca = res2['CA']
            r2ind = res2.get_id()[1]
            r2sym = res2.get_resname()

            if (r1ca - r2ca) < 6.0:
                print("Residues", r1sym, r1ind, "in chain A",
                      "and", r2sym, r2ind, "in chain B",
                      "are close to each other:", round(r1ca - r2ca, 2))

Find contact residues in a dimer: better version

    #eg7.py
    import Bio.PDB

    # Use QUIET=True to avoid lots of warnings...
    parser = Bio.PDB.PDBParser(QUIET=True)
    structure = parser.get_structure("1HPV", "1HPV.pdb")

    achain = structure[0]['A']
    bchain = structure[0]['B']

    # Index the C-alpha atoms of chain B once; skip ligands and waters,
    # which have no CA atom
    bchainca = [r['CA'] for r in bchain if r.get_id()[0] == ' ']
    neighbors = Bio.PDB.NeighborSearch(bchainca)

    for res1 in achain:
        if res1.get_id()[0] != ' ':
            continue
        r1ca = res1['CA']
        r1ind = res1.get_id()[1]
        r1sym = res1.get_resname()

        # Ask only for chain B atoms within 6.0 of this atom
        for r2ca in neighbors.search(r1ca.get_coord(), 6.0):
            res2 = r2ca.get_parent()
            r2ind = res2.get_id()[1]
            r2sym = res2.get_resname()
            print("Residues", r1sym, r1ind, "in chain A",
                  "and", r2sym, r2ind, "in chain B",
                  "are close to each other:", round(r1ca - r2ca, 2))
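
Why is the NeighborSearch version better? It builds a spatial index once, so each query inspects only nearby atoms instead of every atom in the other chain. Bio.PDB uses a KD tree for this; the sketch below swaps in a simpler technique, a uniform grid of cells, purely to illustrate the idea with the standard library. The function names and the random test points are assumptions of this example.

```python
import math
import random
from collections import defaultdict

def build_grid(points, cell):
    """Bin each point index by the integer grid cell that contains it."""
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[tuple(math.floor(c / cell) for c in p)].append(idx)
    return grid

def grid_search(grid, points, center, radius, cell):
    """Indices of points within radius of center. Requires cell >= radius,
    so that every hit lies in the center's cell or one of its 26 neighbors."""
    cx, cy, cz = (math.floor(c / cell) for c in center)
    hits = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                for idx in grid.get((cx + dx, cy + dy, cz + dz), []):
                    if math.dist(points[idx], center) <= radius:
                        hits.append(idx)
    return sorted(hits)

def brute_force(points, center, radius):
    """Reference answer: test every point."""
    return [i for i, p in enumerate(points) if math.dist(p, center) <= radius]

# Random points in a 50 x 50 x 50 box stand in for C-alpha coordinates.
rng = random.Random(0)
points = [tuple(rng.uniform(0.0, 50.0) for _ in range(3)) for _ in range(300)]
grid = build_grid(points, cell=6.0)

query = points[0]
fast = grid_search(grid, points, query, radius=6.0, cell=6.0)
slow = brute_force(points, query, radius=6.0)
print("grid and brute force agree:", fast == slow)
```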

Superimpose two structures

    import Bio.PDB

    # Use QUIET=True to avoid lots of warnings...
    parser = Bio.PDB.PDBParser(QUIET=True)
    structure1 = parser.get_structure("2WFJ", "2WFJ.pdb")
    structure2 = parser.get_structure("2GW2", "2GW2a.pdb")

    ppb = Bio.PDB.PPBuilder()

    # Manually figure out how the query and subject peptides correspond...
    # query has an extra residue at the front
    # subject has two extra residues at the back
    query = ppb.build_peptides(structure1)[0][1:]
    target = ppb.build_peptides(structure2)[0][:-2]

    query_atoms = [r['CA'] for r in query]
    target_atoms = [r['CA'] for r in target]

    # First list is held fixed; the second list is moved onto it
    superimposer = Bio.PDB.Superimposer()
    superimposer.set_atoms(query_atoms, target_atoms)

    print("Query and subject superimposed, RMS:", superimposer.rms)
    superimposer.apply(list(structure2.get_atoms()))

    # Write the transformed structure to a file
    outfile = open("2GW2-modified.pdb", "w")
    io = Bio.PDB.PDBIO()
    io.set_structure(structure2)
    io.save(outfile)
    outfile.close()

Superimpose two chains

    import Bio.PDB

    parser = Bio.PDB.PDBParser(QUIET=True)
    structure = parser.get_structure("1HPV", "1HPV.pdb")
    model = structure[0]

    ppb = Bio.PDB.PPBuilder()

    # Get the polypeptide chains (assumes exactly two, of equal length)
    achain, bchain = ppb.build_peptides(model)

    aatoms = [r['CA'] for r in achain]
    batoms = [r['CA'] for r in bchain]

    # Chain A is held fixed; chain B is moved onto it
    superimposer = Bio.PDB.Superimposer()
    superimposer.set_atoms(aatoms, batoms)

    print("Query and subject superimposed, RMS:", superimposer.rms)
    superimposer.apply(list(model['B'].get_atoms()))

    # Write structure to file
    outfile = open("1HPV-modified.pdb", "w")
    io = Bio.PDB.PDBIO()
    io.set_structure(structure)
    io.save(outfile)
    outfile.close()
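
The RMS value reported by the superimposer is a root-mean-square deviation over paired atoms. The sketch below computes that quantity for two invented coordinate lists and shows how moving both sets to a common centroid removes a pure translation. It deliberately omits the optimal rotation that a full superposition also solves for, so it is not a replacement for Bio.PDB's Superimposer.

```python
import math

def rmsd(a, b):
    """Root-mean-square deviation between two paired lists of (x, y, z)."""
    if len(a) != len(b):
        raise ValueError("coordinate lists must have the same length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))

def centered(coords):
    """Translate coordinates so that their centroid sits at the origin."""
    n = len(coords)
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    return [tuple(c[i] - centroid[i] for i in range(3)) for c in coords]

# Invented C-alpha trace, plus a copy shifted by (5, -2, 1) without rotation.
fixed = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
moving = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in fixed]

raw = rmsd(fixed, moving)
aligned = rmsd(centered(fixed), centered(moving))
print("RMSD before centering:", round(raw, 3))
print("RMSD after centering: ", round(aligned, 3))
```

As in the Bio.PDB examples, the two lists must pair up atom by atom, which is why the slides trim the extra residues by hand before calling `set_atoms`.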

Exercises
- Read through and try the examples from Chapter 10 of the Biopython Tutorial and the Bio.PDB FAQ.
- Write a program that analyzes a PDB file (filename provided on the command-line!) to find pairs of lysine residues that might be linked if the BS3 cross-linker is used.
  - The rigid BS3 cross-linker is approximately 11 Å long.
  - Write two versions: one that computes the distance between all pairs of lysine residues, and one that uses the NeighborSearch technique.