Protein Structure Informatics using Bio.PDB

Slides:



Advertisements
Similar presentations
The Structure of Proteins
Advertisements

Determination of Protein Structure. Methods for Determining Structures X-ray crystallography – uses an X-ray diffraction pattern and electron density.
Tutorial Homology Modelling. A Brief Introduction to Homology Modeling.
Slide 1 Powerpoint Assignment for Micbio 565 Eric Martz Yeast Gal4 transcriptional regulator, 1d66. X-ray crystallography, resolution 2.7 Angstroms. 2.
FIGURE 9.1.  -amino acids and the peptide bond..
Tertiary protein structure modelling May 31, 2005 Graded papers will handed back Thursday Quiz#4 today Learning objectives- Continue to learn how to manipulate.
1. Primary Structure: Polypeptide chain Polypeptide chain Amino acid monomers Peptide linkages Figure 3.6 The Four Levels of Protein Structure.
Slide 1 Powerpoint Assignment for Micbio 565 Eric Martz Yeast Gal4 transcriptional regulator, 1d66. X-ray crystallography, resolution 2.7 Angstroms. 2.
Computing for Bioinformatics Lecture 8: protein folding.
Protein Basics Protein function Protein structure –Primary Amino acids Linkage Protein conformation framework –Dihedral angles –Ramachandran plots Sequence.
10/8/2014BCHB Edwards Protein Structure Informatics using Bio.PDB BCHB Lecture 12.
Nucleic acids: Information Molecules
Comparing protein structure and sequence similarities Sumi Singh Sp 2015.
Doris Lee Even Zheng Joanna Tang Kiki Jang Rachel Zhang Vincent Ma.
Biomolecules: Peptides and Proteins Lecture 5, Medical Biochemistry.
Biomolecules: Nucleic Acids and Proteins
ProteiN proteiN – “N” stands for nitrogen. There is an “N” in the word proteiN The element Nitrogen is always present in proteiNs.
Answer all questions fully in your exercise books 1)What causes the colour change seen in the reducing sugars test? 2)Why is vitamin K2 important and how.
SMART Teams: Students Modeling A Research Topic Jmol Training 101!
 Four levels of protein structure  Linear  Sub-Structure  3D Structure  Complex Structure.
AP Biology Proteins. AP Biology Proteins  Most structurally & functionally diverse group of biomolecules  Function:  involved in almost everything.
EBI is an Outstation of the European Molecular Biology Laboratory. Annotation Procedures for Structural Data Deposited in the PDBe at EBI.
PROTEIN SYNTHESIS THE FORMATION OF PROTEINS USING THE INFORMATION CODED IN DNA WITHIN THE NUCLEUS AND CARRIED OUT BY RNA IN THE CYTOPLASM.
BIOCHEMISTRY pp CARBON COMPOUNDS CARBON BONDING Has 4 electrons in the outer level so it can bond 4 times Has 4 electrons in the outer level so.
AP Biology Proteins AP Biology Proteins Multipurpose molecules.
AP Biology Proteins. AP Biology Proteins  Most structurally & functionally diverse group of biomolecules  Function:  involved in.
Module 3 Protein Structure Database/Structure Analysis Learning objectives Understand how information is stored in PDB Learn how to read a PDB flat file.
Introduction to Protein Structure Prediction BMI/CS 576 Colin Dewey Fall 2008.
AP Biology Proteins AP Biology Proteins Multipurpose molecules.
10/19/2015BCHB Protein Structure Informatics using Bio.PDB BCHB Lecture 12 By Edwards & Li.
CARBON COMPOUNDS CHAPTER 2, SECTION 3. CARBON is the principle element in the large molecules that organisms make and use ORGANIC compounds contain carbon.
AP Biology Organic Chemistry: Proteins AP Biology Proteins Multipurpose molecules.
Protein Structure and Bioinformatics. Chapter 2 What is protein structure? What are proteins made of? What forces determines protein structure? What is.
Structural alignment methods Like in sequence alignment, try to find best correspondence: –Look at atoms –A 3-dimensional problem –No a priori knowledge.
Structural classification of Proteins SCOP Classification: consists of a database Family Evolutionarily related with a significant sequence identity Superfamily.
PROTEINS Characteristics of Proteins Contain carbon, hydrogen, oxygen, nitrogen, and sulfur Serve as structural components of animals Serve as control.
AP Biology Proteins AP Biology Proteins Multipurpose molecules.
AP Biology Proteins AP Biology Proteins Multipurpose molecules.
Marlou Snelleman 2012 Protein structure. Overview Sequence to structure Hydrogen bonds Helices Sheets Turns Hydrophobicity Helices Sheets Structure and.
단백질의 다양성 ( 그림 5.1) 5.1 아미노산 - 아미노산 이름 및 약어 ( 표 5.1), 표준아미노산 ( 그림 5.2), - 일반구조 ( 그림 5.3): α- 탄소원자, 곁사슬, 카르복실기, 아미노기 - 프로린은 고리모양 ( 곁사슬과 아미노질소사이 ) -pH7 에서.
Sequence: PFAM Used example: Database of protein domain families. It is based on manually curated alignments.
Sequence File Parsing using Biopython
Functional Variety of Proteins
Computational Structure Prediction
Proteins & Enzymes.
Proteins Primary structure: Amino acids link together to form a linear polypeptide. The primary structure of a protein is a linear chain of amino acids.
Amino Acids and Proteins
Chemistry of Living Things
Proteins.
Molecular Docking Profacgen. The interactions between proteins and other molecules play important roles in various biological processes, including gene.
Sequence File Parsing using Biopython
Section 1 Powerpoint Assignment for Micbio 565, 2012
Protein Structure Informatics using Bio.PDB
Protein 3D representation
There are four levels of structure in proteins
Haixu Tang School of Inforamtics
Biological Molecules -Biological molecules consist primarily of carbon, oxygen, hydrogen, and nitrogen. -These elements share valence electrons to form.
Section 1: Identity Powerpoint Assignment for Micbio 565, 2014
Figure: 22.1 Title: Table Examples of the many functions of proteins in biological systems. Caption: The functions of proteins are described.
Chaperone-Assisted Crystallography with DARPins
Proteins and Enzymes 2:3.
Hydration and DNA Recognition by Homeodomains
PROTEINS.
Protein Structure Informatics using Bio.PDB
Protein 3D representation
Volume 14, Issue 5, Pages (May 2006)
A Drug-Drug Interaction Crystallizes a New Entry Point into the UPR
Sequence File Parsing using Biopython
Proteins and Enzymes 2:3.
Peter König, Rafael Giraldo, Lynda Chapman, Daniela Rhodes  Cell 
Presentation transcript:

Protein Structure Informatics using Bio.PDB BCHB524 Lecture 13 BCHB524 - Edwards

Proteins are… …3-D molecules that interact with other (biological) molecules to carry out biological functions… DNA Polymerase Hemoglobin BCHB524 - Edwards

Protein Data Bank (PDB) Repository of the 3-D conformation(s) / structure of proteins. The result of laborious and expensive experiments using X-ray crystallography and/or nuclear magnetic resonance (NMR). (x,y,z) position of every atom of every amino-acid Some entries contain multi-protein complexes, small-molecule ligands, docked epitopes and antibody-antigen complexes… BCHB524 - Edwards

Visualization (PyMOL) BCHB524 - Edwards

Biopython Bio.PDB Parser for PDB format files Navigate structure and answer atom-atom distance/angle questions. Structure (PDB File) >> Model >> Chain >> Residue >> Atom >> (x,y,z) coordinates SMCRA representation mirrors PDB format BCHB524 - Edwards

SMCRA Data-Model Each PDB file represents one “structure” Each structure may contain many models In most cases there is only one model, index 0. Each polypeptide (amino-acid sequence) is a “chain”. A single-protein structure has one chain, “A” 1HPV is a dimer and has chains “A” and “B”. BCHB524 - Edwards

SMCRA Data-Model import Bio.PDB.PDBParser import sys # Use QUIET=True to avoid lots of warnings... parser = Bio.PDB.PDBParser(QUIET=True) structure = parser.get_structure("1HPV", "1HPV.pdb") model = structure[0] # This structure is a dimer with two chains achain = model['A'] bchain = model['B'] BCHB524 - Edwards

SMCRA Chains are composed of amino-acid residues Access by iteration, or by index Residue “index” may not be sequence position Residues are composed of atoms: Access by iteration or by atom name …except for H! Water molecules are also represented as atoms – HOH residue name, het=“W” BCHB524 - Edwards

SMCRA Data-Model import Bio.PDB.PDBParser import sys # Use QUIET=True to avoid lots of warnings... parser = Bio.PDB.PDBParser(QUIET=True) structure = parser.get_structure("1HPV", "1HPV.pdb") model = structure[0] for chain in model:   for residue in chain:     for atom in residue:       print chain, residue, atom, atom.get_coord() BCHB524 - Edwards

Polypeptide molecules S-G-Y-A-L BCHB524 - Edwards

SMCRA Atom names BCHB524 - Edwards

Check polypeptide backbone import Bio.PDB.PDBParser import sys # Use QUIET=True to avoid lots of warnings... parser = Bio.PDB.PDBParser(QUIET=True) structure = parser.get_structure("1HPV", "1HPV.pdb") model = structure[0] achain = model['A'] for residue in achain:     index = residue.get_id()[1]     calpha = residue['CA']     carbon = residue['C']     nitrogen = residue['N']     oxygen = residue['O']     print "Residue:",residue.get_resname(),index     print "N  - Ca",(nitrogen - calpha)     print "Ca - C ",(calpha - carbon)     print "C  - O ",(carbon - oxygen)     print BCHB524 - Edwards

Check polypeptide backbone # As before... for residue in achain:     index = residue.get_id()[1]     calpha = residue['CA']     carbon = residue['C']     nitrogen = residue['N']     oxygen = residue['O']     print "Residue:",residue.get_resname(),index     print "N  - Ca",(nitrogen - calpha)     print "Ca - C ",(calpha - carbon)     print "C  - O ",(carbon - oxygen)     if achain.has_id(index+1):         nextresidue = achain[index+1]         nextnitrogen = nextresidue['N']         print "C  - N ",(carbon - nextnitrogen)     print BCHB524 - Edwards

Find potential disulfide bonds The sulfur atoms of Cys amino-acids often form “di-sulfide” bonds if they are close enough – less than 8 Å. Compare with PDB file contents: SSBOND Bio.PDB does not provide an easy way to access the SSBOND annotations BCHB524 - Edwards

Find potential disulfide bonds import Bio.PDB.PDBParser import sys # Use QUIET=True to avoid lots of warnings... parser = Bio.PDB.PDBParser(QUIET=True) structure = parser.get_structure("1KCW", "1KCW.pdb") model = structure[0] achain = model['A'] cysresidues = [] for residue in achain:     if residue.get_resname() == 'CYS':         cysresidues.append(residue) for c1 in cysresidues:     c1index = c1.get_id()[1]     for c2 in cysresidues:         c2index = c2.get_id()[1]         if (c1['SG'] - c2['SG']) < 8.0:             print "possible di-sulfide bond:",             print "Cys",c1index,"-",             print "Cys",c2index,             print round(c1['SG'] - c2['SG'],2) BCHB524 - Edwards

Find contact residues in a dimer import Bio.PDB.PDBParser import sys # Use QUIET=True to avoid lots of warnings... parser = Bio.PDB.PDBParser(QUIET=True) structure = parser.get_structure("1HPV","1HPV.pdb") achain = structure[0]['A'] bchain = structure[0]['B'] for res1 in achain:     r1ca = res1['CA']     r1ind = res1.get_id()[1]     r1sym = res1.get_resname()     for res2 in bchain:         r2ca = res2['CA']         r2ind = res2.get_id()[1]         r2sym = res2.get_resname()         if (r1ca - r2ca) < 6.0:             print "Residues",r1sym,r1ind,"in chain A",             print "and",r2sym,r2ind,"in chain B",             print "are close to each other:",round(r1ca-r2ca,2) BCHB524 - Edwards

Find contact residues in a dimer – better version import Bio.PDB.PDBParser import sys # Use QUIET=True to avoid lots of warnings... parser = Bio.PDB.PDBParser(QUIET=True) structure = parser.get_structure("1HPV","1HPV.pdb") achain = structure[0]['A'] bchain = structure[0]['B'] bchainca = [ r['CA'] for r in bchain ] neighbors = Bio.PDB.NeighborSearch(bchainca) for res1 in achain:     r1ca = res1['CA']     r1ind = res1.get_id()[1]     r1sym = res1.get_resname()     for r2ca in neighbors.search(r1ca.get_coord(), 6.0):         res2 = r2ca.get_parent()         r2ind = res2.get_id()[1]         r2sym = res2.get_resname()         print "Residues",r1sym,r1ind,"in chain A",         print "and",r2sym,r2ind,"in chain B",         print "are close to each other:",round(r1ca-r2ca,2) BCHB524 - Edwards

Superimpose two structures import Bio.PDB import Bio.PDB.PDBParser import sys # Use QUIET=True to avoid lots of warnings... parser = Bio.PDB.PDBParser(QUIET=True) structure1 = parser.get_structure("2WFJ","2WFJ.pdb") structure2 = parser.get_structure("2GW2","2GW2a.pdb") ppb=Bio.PDB.PPBuilder() # Manually figure out how the query and subject peptides correspond... # query has an extra residue at the front # subject has two extra residues at the back query = ppb.build_peptides(structure1)[0][1:] target = ppb.build_peptides(structure2)[0][:-2] query_atoms = [ r['CA'] for r in query ] target_atoms = [ r['CA'] for r in target ] superimposer = Bio.PDB.Superimposer() superimposer.set_atoms(query_atoms, target_atoms) print "Query and subject superimposed, RMS:", superimposer.rms superimposer.apply(structure2.get_atoms()) # Write modified structures to one file outfile=open("2GW2-modified.pdb", "w")  io=Bio.PDB.PDBIO()  io.set_structure(structure2)  io.save(outfile)  outfile.close()  BCHB524 - Edwards

Superimpose two chains import Bio.PDB parser = Bio.PDB.PDBParser(QUIET=1) structure = parser.get_structure("1HPV","1HPV.pdb") model = structure[0] ppb=Bio.PDB.PPBuilder() # Get the polypeptide chains achain,bchain = ppb.build_peptides(model) aatoms = [ r['CA'] for r in achain ] batoms = [ r['CA'] for r in bchain ] superimposer = Bio.PDB.Superimposer() superimposer.set_atoms(aatoms, batoms) print "Query and subject superimposed, RMS:", superimposer.rms superimposer.apply(model['B'].get_atoms()) # Write structure to file outfile=open("1HPV-modified.pdb", "w")  io=Bio.PDB.PDBIO()  io.set_structure(structure)  io.save(outfile)   outfile.close()  BCHB524 - Edwards

Exercises (Optional) Read through and try the examples from Chapter 11 of the Biopython Tutorial and the Bio.PDB FAQ. Write a program that analyzes a PDB file (filename provided on the command-line!) to find pairs of lysine residues that might be linked if the BS3 cross-linker is used. The rigid BS3 cross-linker is approximately 11 Å long. Write two versions, one that computes the distance between all pairs of lysine residues, and one that uses the NeighborSearch technique. BCHB524 - Edwards