Published by Cecil Garrett. Modified over 8 years ago.
1
10/19/2015 BCHB524 - 2015
Protein Structure Informatics using Bio.PDB
BCHB524 2015 Lecture 12
By Edwards & Li
2
Outline
Review
  Python modules
  Biopython Sequence modules
Biopython's Bio.PDB
  Protein structure primer / PyMOL
  PDB file parsing
  PDB data navigation: SMCRA
  Examples
3
Python Modules Review
Access the program environment
  sys, os, os.path
Specialized functions
  math, random
Access file-like resources as files
  zipfile, gzip, urllib
Make specialized formats into "lists" and "dictionaries"
  csv (XML, …)
4
Biopython Sequence Modules
Provide "sequence" abstraction
  More powerful than a Python string
  Knows its alphabet!
  Basic tasks already available
Easy parsing of (many) downloadable sequence database formats
  FASTA, GenBank, SwissProt/UniProt, etc.
Simplify access to large collections of sequences
  Access by iteration, get sequence and accession
  Other content available as lists and dictionaries
  Little semantic extraction or interpretation
5
Biopython Bio.SeqIO
Access to additional information
  annotations dictionary
  features list
Information, keys, and keywords vary with database!
Semantic content extraction (still) up to you!

import sys
import Bio.SeqIO

# Bio.SeqIO.parse accepts a filename as well as an open file handle
for seq_record in Bio.SeqIO.parse(sys.argv[1], "uniprot-xml"):
    print("\n------NEW SEQRECORD------\n")
    print("seq_record.annotations\n\t%s" % (seq_record.annotations,))
    print("seq_record.features\n\t%s" % (seq_record.features,))
    print("seq_record.dbxrefs\n\t%s" % (seq_record.dbxrefs,))
    print("seq_record.format('fasta')\n%s" % seq_record.format('fasta'))
6
Proteins are…
…a linear sequence of amino acids, produced by transcription of DNA into mRNA and translation of the mRNA.
7
Proteins are…
…3-D molecules that interact with other (biological) molecules to carry out biological functions…
[Figures: DNA Polymerase; Hemoglobin]
8
Proteins are…
www.uniprot.org
www.rcsb.org
[Figure: Primary, Secondary, Tertiary, Quaternary structure. Source: Introduction to Protein Structure, by Carl Branden and John Tooze]
9
Proteins are Composed of 20 Amino Acids
10
Representations of Protein 3D Structures
Ball-Stick: atom-bond
Ribbon: helix, sheet and random coil
Surface: solvent accessible surface
11
Protein Data Bank (PDB)
Repository of the 3-D conformation(s) / structure of proteins.
The result of laborious and expensive experiments using X-ray crystallography and/or nuclear magnetic resonance (NMR).
(x,y,z) position of every atom of every amino acid
Some entries contain multi-protein complexes, small-molecule ligands, docked epitopes and antibody-antigen complexes…
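To make the "(x,y,z) position of every atom" point concrete, here is a minimal sketch of how one fixed-width ATOM record from a PDB file can be sliced into fields. The column positions follow the PDB file format; the record itself and the function name parse_atom_line are made up for illustration, and this is not how Bio.PDB is implemented.

```python
# Sketch: slice one fixed-width ATOM record into its fields.
# The record below is made up for illustration; it is not copied from a PDB entry.
SAMPLE = (
    "ATOM  "      # columns 1-6: record name
    "    1"       # columns 7-11: atom serial number
    " "           # column 12: unused
    " N  "        # columns 13-16: atom name
    " "           # column 17: alternate location indicator
    "PRO"         # columns 18-20: residue name
    " "           # column 21: unused
    "A"           # column 22: chain identifier
    "   1"        # columns 23-26: residue sequence number
    "    "        # column 27: insertion code, then columns 28-30 unused
    "  29.361"    # columns 31-38: x coordinate
    "  39.686"    # columns 39-46: y coordinate
    "   5.862"    # columns 47-54: z coordinate
)


def parse_atom_line(line):
    """Return the main fields of an ATOM/HETATM record as a dictionary."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "resname": line[17:20].strip(),
        "chain": line[21],
        "resseq": int(line[22:26]),
        "coord": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


atom = parse_atom_line(SAMPLE)
print("%s %s %d %s %s" % (atom["chain"], atom["resname"], atom["resseq"],
                          atom["name"], atom["coord"]))
```

In practice the Bio.PDB parser does this work and also handles the many special cases in real files.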
12
Visualization (PyMOL)
13
Molecular Visualization Tools
PyMol: http://pymol.sourceforge.net/
Chimera: http://www.cgl.ucsf.edu/chimera/
PMV: http://www.scripps.edu/~sanner/python/
Coot: http://www.ysbl.york.ac.uk/~emsley/coot/
CCP4mg: http://www.ysbl.york.ac.uk/~lizp/molgraphics.html
mmLib: http://pymmlib.sourceforge.net/
VMD: http://www.ks.uiuc.edu/Research/vmd/
MMTK: http://starship.python.net/crew/hinsen/MMTK/
http://biopython.org/
14
Overall Layout of a Structure Object
A structure consists of models
A model consists of chains
A chain consists of residues
A residue consists of atoms
http://biopython.org/
15
Biopython Bio.PDB
Parser for PDB format files
Navigate structure and answer atom-atom distance/angle questions.
Structure (PDB File) >> Model >> Chain >> Residue >> Atom >> (x,y,z) coordinates
SMCRA representation mirrors PDB format
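The Structure >> Model >> Chain >> Residue >> Atom nesting can be pictured with plain dictionaries. This is only a stand-in to show the shape of the hierarchy, assuming made-up coordinates and a hypothetical helper named walk; Bio.PDB uses its own entity classes rather than dictionaries.

```python
# Stand-in for the SMCRA hierarchy using nested dictionaries (not Bio.PDB objects).
# All coordinates are made up for illustration.
structure = {
    0: {                                  # model index
        "A": {                            # chain identifier
            1: {                          # residue number
                "N": (0.0, 0.0, 0.0),     # atom name -> (x, y, z)
                "CA": (1.5, 0.0, 0.0),
            },
            2: {
                "N": (3.0, 1.0, 0.0),
                "CA": (4.0, 2.0, 0.0),
            },
        },
    },
}


def walk(structure):
    """Yield (model, chain, residue, atom, coord) for every atom, in sorted order."""
    for model_id, model in sorted(structure.items()):
        for chain_id, chain in sorted(model.items()):
            for res_id, residue in sorted(chain.items()):
                for atom_name, coord in sorted(residue.items()):
                    yield model_id, chain_id, res_id, atom_name, coord


rows = list(walk(structure))
# Indexing mirrors the structure[0]['A'] style of access used with Bio.PDB
print("%s" % (structure[0]["A"][1]["CA"],))
```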
16
SMCRA Data-Model
Each PDB file represents one "structure"
Each structure may contain many models
  In most cases there is only one model, index 0.
Each polypeptide (amino-acid sequence) is a "chain".
  A single-protein structure has one chain, "A"
  1HPV is a dimer and has chains "A" and "B".
17
SMCRA Data-Model

# eg1.py
import Bio.PDB

# Use QUIET=True to avoid lots of warnings...
parser = Bio.PDB.PDBParser(QUIET=True)
structure = parser.get_structure("1HPV", "1HPV.pdb")
model = structure[0]

# This structure is a dimer with two chains
achain = model['A']
bchain = model['B']
18
SMCRA
Chains are composed of amino-acid residues
  Access by iteration, or by index
  Residue "index" may not be sequence position
Residues are composed of atoms
  Access by iteration or by atom name
  …except for H!
Water molecules are also represented as atoms
  HOH residue name, het="W"
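A Bio.PDB residue id is a tuple of (hetero flag, sequence number, insertion code). The hetero flag is a blank for standard residues, "W" for water, and "H_" followed by the residue name for other hetero residues. The sketch below works on plain tuples with made-up values, and the helper name classify is hypothetical; it shows why the residue "index" alone does not identify a sequence position.

```python
def classify(res_id):
    """Label a (hetero flag, sequence number, insertion code) residue id."""
    hetfield, resseq, icode = res_id
    if hetfield == " ":
        return "standard"
    if hetfield == "W":
        return "water"
    return "hetero"


# Made-up ids: two residues share number 10 and differ only by insertion code
ids = [(" ", 10, " "), (" ", 10, "A"), ("W", 301, " "), ("H_GLC", 400, " ")]
labels = [classify(res_id) for res_id in ids]
for res_id, label in zip(ids, labels):
    print("%s %s" % (res_id, label))
```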
19
SMCRA Data-Model

# eg2.py
import Bio.PDB

# Use QUIET=True to avoid lots of warnings...
parser = Bio.PDB.PDBParser(QUIET=True)
structure = parser.get_structure("1HPV", "1HPV.pdb")
model = structure[0]

# Visit every atom of every residue of every chain
for chain in model:
    for residue in chain:
        for atom in residue:
            print("%s %s %s %s" % (chain, residue, atom, atom.get_coord()))
20
Polypeptide molecules
S-G-Y-A-L
21
SMCRA Atom names
22
Check polypeptide backbone

# eg3.py
import Bio.PDB

# Use QUIET=True to avoid lots of warnings...
parser = Bio.PDB.PDBParser(QUIET=True)
structure = parser.get_structure("1HPV", "1HPV.pdb")
model = structure[0]
achain = model['A']

for residue in achain:
    index = residue.get_id()[1]
    calpha = residue['CA']
    carbon = residue['C']
    nitrogen = residue['N']
    oxygen = residue['O']
    # Subtracting two atoms gives the distance between them
    print("Residue: %s %s" % (residue.get_resname(), index))
    print("N - Ca %s" % (nitrogen - calpha))
    print("Ca - C  %s" % (calpha - carbon))
    print("C - O  %s" % (carbon - oxygen))
    print("")
    # Only look at the first residue
    break
23
Check polypeptide backbone

# eg4.py
# As before...
for residue in achain:
    # Skip waters and other hetero residues, which may lack backbone atoms
    if residue.get_id()[0] != ' ':
        continue
    index = residue.get_id()[1]
    calpha = residue['CA']
    carbon = residue['C']
    nitrogen = residue['N']
    oxygen = residue['O']
    print("Residue: %s %s" % (residue.get_resname(), index))
    print("N - Ca %s" % (nitrogen - calpha))
    print("Ca - C  %s" % (calpha - carbon))
    print("C - O  %s" % (carbon - oxygen))
    # Peptide bond to the next residue, if there is one
    if achain.has_id(index+1):
        nextresidue = achain[index+1]
        nextnitrogen = nextresidue['N']
        print("C - N  %s" % (carbon - nextnitrogen))
    print("")
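The backbone checks come down to Euclidean distances between atom coordinates, which is what subtracting two Bio.PDB atoms returns. Below is a small standalone sketch, assuming approximate textbook bond lengths in angstroms and a hypothetical tolerance of 0.1 Å; the names distance, TYPICAL and looks_bonded are illustrative and not part of Bio.PDB.

```python
import math


def distance(a, b):
    """Euclidean distance between two (x, y, z) points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Approximate textbook backbone bond lengths in angstroms (assumed reference values)
TYPICAL = {"N-CA": 1.46, "CA-C": 1.52, "C-O": 1.23, "C-N": 1.33}


def looks_bonded(kind, a, b, tolerance=0.1):
    """True if the a-b distance is within `tolerance` of the typical length for `kind`."""
    return abs(distance(a, b) - TYPICAL[kind]) <= tolerance


print("%.2f" % distance((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
print("%s" % looks_bonded("N-CA", (0.0, 0.0, 0.0), (1.46, 0.0, 0.0)))
```

A C - N distance far from about 1.33 Å between consecutive residue numbers would suggest a chain break or a numbering gap rather than a peptide bond.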
24
Find potential disulfide bonds
The sulfur (SG) atoms of Cys residues often form "di-sulfide" bonds if they are close enough.
  The example uses a cutoff of less than 8 Å, a loose screen for candidates: the S-S distance in an actual disulfide bond is about 2 Å.
Compare with PDB file contents: SSBOND
Bio.PDB does not provide an easy way to access the SSBOND annotations
25
Find potential disulfide bonds

# eg5.py
import Bio.PDB

# Use QUIET=True to avoid lots of warnings...
parser = Bio.PDB.PDBParser(QUIET=True)
structure = parser.get_structure("1KCW", "1KCW.pdb")
model = structure[0]
achain = model['A']

# Collect the cysteine residues of chain A
cysresidues = []
for residue in achain:
    if residue.get_resname() == 'CYS':
        cysresidues.append(residue)

for c1 in cysresidues:
    c1index = c1.get_id()[1]
    for c2 in cysresidues:
        c2index = c2.get_id()[1]
        # Report each pair once, and never pair a residue with itself
        if c1index >= c2index:
            continue
        dist = c1['SG'] - c2['SG']
        if dist < 8.0:
            print("possible di-sulfide bond: Cys %s - Cys %s %.2f" % (c1index, c2index, dist))
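When comparing every cysteine against every other, it helps to visit each unordered pair exactly once so that self-pairs (distance 0) and mirrored duplicates are not reported. One way to sketch this, assuming made-up residue numbers and SG coordinates and a hypothetical helper close_pairs, is with itertools.combinations:

```python
import itertools
import math

# Made-up residue numbers and SG coordinates, for illustration only
cys_sg = {
    22: (0.0, 0.0, 0.0),
    65: (2.0, 0.0, 0.0),
    101: (20.0, 0.0, 0.0),
    140: (25.0, 0.0, 0.0),
}


def close_pairs(coords, cutoff):
    """Return (i, j, distance) for each unordered pair closer than `cutoff`."""
    pairs = []
    for (i, a), (j, b) in itertools.combinations(sorted(coords.items()), 2):
        d = math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
        if d < cutoff:
            pairs.append((i, j, round(d, 2)))
    return pairs


candidates = close_pairs(cys_sg, 8.0)
for i, j, d in candidates:
    print("possible di-sulfide bond: Cys %d - Cys %d %.2f" % (i, j, d))
```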
26
Find contact residues in a dimer

# eg6.py
import Bio.PDB

# Use QUIET=True to avoid lots of warnings...
parser = Bio.PDB.PDBParser(QUIET=True)
structure = parser.get_structure("1HPV", "1HPV.pdb")
achain = structure[0]['A']
bchain = structure[0]['B']

for res1 in achain:
    # Skip hetero residues ('H_...') and waters ('W'); compare strings with ==, not "is"
    if res1.get_id()[0][0] in ('H', 'W'):
        continue
    r1ca = res1['CA']
    r1ind = res1.get_id()[1]
    r1sym = res1.get_resname()
    for res2 in bchain:
        if res2.get_id()[0][0] in ('H', 'W'):
            continue
        r2ca = res2['CA']
        r2ind = res2.get_id()[1]
        r2sym = res2.get_resname()
        if (r1ca - r2ca) < 6.0:
            print("Residues %s %s in chain A and %s %s in chain B are close to each other: %.2f"
                  % (r1sym, r1ind, r2sym, r2ind, r1ca - r2ca))
27
Find contact residues in a dimer: better version

# eg7.py
import Bio.PDB

# Use QUIET=True to avoid lots of warnings...
parser = Bio.PDB.PDBParser(QUIET=True)
structure = parser.get_structure("1HPV", "1HPV.pdb")
achain = structure[0]['A']
bchain = structure[0]['B']

# Index the C-alpha atoms of chain B; skip hetero residues and waters, which have no 'CA'
bchainca = [r['CA'] for r in bchain if r.get_id()[0] == ' ']
neighbors = Bio.PDB.NeighborSearch(bchainca)

for res1 in achain:
    if res1.get_id()[0] != ' ':
        continue
    r1ca = res1['CA']
    r1ind = res1.get_id()[1]
    r1sym = res1.get_resname()
    # Only chain B C-alpha atoms within 6.0 of this atom are returned
    for r2ca in neighbors.search(r1ca.get_coord(), 6.0):
        res2 = r2ca.get_parent()
        r2ind = res2.get_id()[1]
        r2sym = res2.get_resname()
        print("Residues %s %s in chain A and %s %s in chain B are close to each other: %.2f"
              % (r1sym, r1ind, r2sym, r2ind, r1ca - r2ca))
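The gain from a neighbor search is that each query only examines nearby atoms instead of every atom in the other chain. Bio.PDB.NeighborSearch is backed by a KD tree; the sketch below swaps in a simpler uniform grid (cell list) to show the same idea in pure Python and checks it against the all-pairs loop. The coordinates and the function names brute_force and grid_search are made up for illustration.

```python
import math
from collections import defaultdict


def dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def brute_force(points_a, points_b, radius):
    """All (i, j) index pairs within `radius`, by comparing every pair."""
    return sorted((i, j)
                  for i, a in enumerate(points_a)
                  for j, b in enumerate(points_b)
                  if dist(a, b) <= radius)


def cell_of(point, size):
    return tuple(int(math.floor(c / size)) for c in point)


def grid_search(points_a, points_b, radius):
    """Same result as brute_force, but only looks in the 27 cells around each query."""
    cells = defaultdict(list)
    for j, b in enumerate(points_b):
        cells[cell_of(b, radius)].append(j)
    hits = []
    for i, a in enumerate(points_a):
        cx, cy, cz = cell_of(a, radius)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                        if dist(a, points_b[j]) <= radius:
                            hits.append((i, j))
    return sorted(hits)


# Made-up coordinates standing in for C-alpha atoms of two chains
chain_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (-3.0, -3.0, -3.0)]
chain_b = [(5.0, 0.0, 0.0), (0.0, 5.9, 0.0), (30.0, 30.0, 30.0), (-6.0, -3.0, -3.0)]

contacts = grid_search(chain_a, chain_b, 6.0)
print("%s" % (contacts,))
```

Because the cell side equals the search radius, any point within the radius must sit in the query's cell or one of its 26 neighbors, so no contact is missed.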
28
Superimpose two structures

import Bio.PDB

# Use QUIET=True to avoid lots of warnings...
parser = Bio.PDB.PDBParser(QUIET=True)
structure1 = parser.get_structure("2WFJ", "2WFJ.pdb")
structure2 = parser.get_structure("2GW2", "2GW2a.pdb")

ppb = Bio.PDB.PPBuilder()

# Manually figure out how the query and subject peptides correspond...
# query has an extra residue at the front
# subject has two extra residues at the back
query = ppb.build_peptides(structure1)[0][1:]
target = ppb.build_peptides(structure2)[0][:-2]

query_atoms = [r['CA'] for r in query]
target_atoms = [r['CA'] for r in target]

# set_atoms(fixed, moving): the target atoms are fitted onto the query atoms
superimposer = Bio.PDB.Superimposer()
superimposer.set_atoms(query_atoms, target_atoms)
print("Query and subject superimposed, RMS: %s" % superimposer.rms)

# Move every atom of the second structure with the fitted transformation
superimposer.apply(structure2.get_atoms())

# Write the modified structure to a file
outfile = open("2GW2-modified.pdb", "w")
io = Bio.PDB.PDBIO()
io.set_structure(structure2)
io.save(outfile)
outfile.close()
29
Superimpose two chains

import Bio.PDB

parser = Bio.PDB.PDBParser(QUIET=True)
structure = parser.get_structure("1HPV", "1HPV.pdb")
model = structure[0]

ppb = Bio.PDB.PPBuilder()

# Get the polypeptide chains
achain, bchain = ppb.build_peptides(model)

aatoms = [r['CA'] for r in achain]
batoms = [r['CA'] for r in bchain]

# set_atoms(fixed, moving): chain B is fitted onto chain A
superimposer = Bio.PDB.Superimposer()
superimposer.set_atoms(aatoms, batoms)
print("Query and subject superimposed, RMS: %s" % superimposer.rms)

# Move every atom of chain B with the fitted transformation
superimposer.apply(model['B'].get_atoms())

# Write structure to file
outfile = open("1HPV-modified.pdb", "w")
io = Bio.PDB.PDBIO()
io.set_structure(structure)
io.save(outfile)
outfile.close()
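The RMS value reported by the superimposer is the root-mean-square distance between paired atoms after the best fit. The sketch below shows only the RMSD formula and the translation step (moving both point sets to their centroids). It does not compute the optimal rotation, which Bio.PDB's Superimposer also fits. Coordinates and function names are made up for illustration.

```python
import math


def rmsd(points_p, points_q):
    """Root-mean-square distance between paired points, with no fitting."""
    total = 0.0
    for p, q in zip(points_p, points_q):
        total += sum((a - b) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / len(points_p))


def centered(points):
    """Translate points so that their centroid sits at the origin."""
    n = float(len(points))
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    return [tuple(p[k] - centroid[k] for k in range(3)) for p in points]


# Made-up coordinates: the second set is the first shifted by (5, 5, 5)
fixed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
moving = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in fixed]

before = rmsd(fixed, moving)
after = rmsd(centered(fixed), centered(moving))
print("before: %.3f after centering: %.3f" % (before, after))
```

Here a pure shift is removed entirely by centering; two real structures would generally also need a rotation before the RMSD reaches its minimum.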
30
Exercises
Read through and try the examples from Chapter 10 of the Biopython Tutorial and the Bio.PDB FAQ.
Write a program that analyzes a PDB file (filename provided on the command-line!) to find pairs of lysine residues that might be linked if the BS3 cross-linker is used.
  The rigid BS3 cross-linker is approximately 11 Å long.
  Write two versions, one that computes the distance between all pairs of lysine residues, and one that uses the NeighborSearch technique.