Secondary Structure & Solvent accessible surface Calculation Lecture 6 Structural Bioinformatics Dr. Avraham Samson
DSSP: Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Wolfgang Kabsch, Christian Sander. Biopolymers, Volume 22, Issue 12, pages 2577–2637, December 1983. (2012, Avraham Samson - Faculty of Medicine - Bar Ilan University)
Amino Acids Secondary Structure Solvent Accessibility
Hydrogen bond donors and acceptors: the amide nitrogen is the main-chain hydrogen bond donor; the carbonyl oxygen is the main-chain hydrogen bond acceptor. There are also side-chain acceptors and donors.
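DSSP decides whether a main-chain hydrogen bond exists from an electrostatic energy between the C=O group of one residue and the N-H group of another (Kabsch & Sander, 1983). A minimal Python sketch of that criterion follows. The function names and the example coordinates are illustrative assumptions, and the amide hydrogen position has to be supplied by the caller because many PDB files omit hydrogens.

```python
import math

# Constants from Kabsch & Sander (1983): partial charges (in units of e)
# and a dimensional factor giving energies in kcal/mol for distances in angstroms.
Q1, Q2, F = 0.42, 0.20, 332.0
HBOND_CUTOFF = -0.5  # kcal/mol; an energy below this counts as a hydrogen bond


def hbond_energy(c, o, n, h):
    """Electrostatic energy between the C=O of one residue and the N-H of another."""
    d = math.dist
    return Q1 * Q2 * F * (1 / d(o, n) + 1 / d(c, h) - 1 / d(o, h) - 1 / d(c, n))


def is_hbond(c, o, n, h):
    """True if the C=O ... H-N pair satisfies the energy cutoff."""
    return hbond_energy(c, o, n, h) < HBOND_CUTOFF


# Illustrative linear geometry (made-up coordinates): O...N separation of 2.9 angstroms.
c, o, h, n = (0.0, 0.0, 0.0), (1.23, 0.0, 0.0), (3.13, 0.0, 0.0), (4.13, 0.0, 0.0)
print(round(hbond_energy(c, o, n, h), 2), is_hbond(c, o, n, h))
```

Using an energy rather than a pure distance test lets one criterion cover both the distance and the alignment of the two groups.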
Hydrogen bonded turns
Hydrogen bonded bridges
Bend
Chirality
Dihedral angle calculation
The book "Crystal Structure Analysis for Chemists and Biologists" by Jenny P. Glusker gives four different ways of calculating the dihedral angle, p. Probably the most direct is the following.
Consider a chain of four atoms, numbered 1 to 4. The distance between any two atoms i and j is denoted d(ij); for example, d13 is the distance between atoms 1 and 3. Since you already have Cartesian coordinates, this is easily calculated as
d13 = SQRT( SQ(x3-x1) + SQ(y3-y1) + SQ(z3-z1) )
The dihedral angle is defined as follows:
cos(angle) = P/SQRT(Q)
where
P = SQ(d12) * ( SQ(d23)+SQ(d34)-SQ(d24))
  + SQ(d23) * (-SQ(d23)+SQ(d34)+SQ(d24))
  + SQ(d13) * ( SQ(d23)-SQ(d34)+SQ(d24))
  - 2 * SQ(d23) * SQ(d14)
and
Q = (d12 + d23 + d13) * (d12 + d23 - d13) * (d12 - d23 + d13) * (-d12 + d23 + d13)
  * (d23 + d34 + d24) * (d23 + d34 - d24) * (d23 - d34 + d24) * (-d23 + d34 + d24)
A test case: d12 = 2.38, d23 = 1.48, d34 = 1.48, d13 = 3.56, d14 = 3.61, d24 = 2.40 gives P = 20.83, SQRT(Q) = 21.40, angle = 13.3 degrees.
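The distance-based formula can be sketched in Python as below. The function names are illustrative assumptions. Because only the cosine is computed, the result is an unsigned angle between 0 and 180 degrees. With the rounded distances of the test case the function returns a value near the quoted 13.3 degrees; small angles are sensitive to rounding of the input distances.

```python
import math


def dihedral_from_distances(d12, d23, d34, d13, d14, d24):
    """Unsigned dihedral angle (degrees) about the 2-3 bond from six interatomic distances."""
    def sq(x):
        return x * x

    p = (sq(d12) * (sq(d23) + sq(d34) - sq(d24))
         + sq(d23) * (-sq(d23) + sq(d34) + sq(d24))
         + sq(d13) * (sq(d23) - sq(d34) + sq(d24))
         - 2 * sq(d23) * sq(d14))
    q = ((d12 + d23 + d13) * (d12 + d23 - d13) * (d12 - d23 + d13) * (-d12 + d23 + d13)
         * (d23 + d34 + d24) * (d23 + d34 - d24) * (d23 - d34 + d24) * (-d23 + d34 + d24))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, p / math.sqrt(q)))
    return math.degrees(math.acos(cos_angle))


def dihedral_from_coords(a1, a2, a3, a4):
    """Same angle, starting from four Cartesian coordinates (x, y, z)."""
    d = math.dist
    return dihedral_from_distances(d(a1, a2), d(a2, a3), d(a3, a4),
                                   d(a1, a3), d(a1, a4), d(a2, a4))


# Planar cis and trans arrangements give 0 and 180 degrees.
print(dihedral_from_coords((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)))
print(dihedral_from_coords((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))
```

If the sign of the angle matters (for example to tell left-handed from right-handed turns), a cross-product formulation on the coordinates is needed instead.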
Helices
Ladders and sheets
More details: SS-bonds, chain breaks, handedness (chirality). MOLMOL uses DSSP to assign secondary structure; PyMOL uses its own dss command, which approximates DSSP.
Because of the repetitive nature of secondary structures, and particularly beta-sheets, proteins can form fibrillar structures and aggregates. Figure: amyloid-like fibril (left) of peptide GNNQNNY from the yeast prion protein Sup35, and its atomic structure (right), showing the amide stacks and the fibril axis. In the case of this fibril the side chains also hydrogen bond to each other. These types of fibrils are important in Huntington’s disease etc. Nelson et al. (Eisenberg lab), Nature 435:773 (2005). For background on “polar zippers”: Perutz et al., PNAS 91:5355 (1991).
Fibrillar helical structures: the leucine zipper. GCN4 “leucine zipper” (green) bound as a dimer (two copies of the polypeptide) to target DNA. The GCN4 dimer is formed through hydrophobic interactions between leucines (Leu, red) in the two polypeptide chains.
DSSP Code:
H = alpha helix
G = 3-helix (3/10 helix)
I = 5 helix (pi helix)
B = residue in isolated beta-bridge
E = extended strand, participates in beta ladder
T = hydrogen bonded turn
S = bend
Blank = loop
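The code table maps naturally onto a lookup dictionary. The sketch below counts how many residues fall in each class for a per-residue assignment string. The `summarize` helper and the example string are illustrative assumptions, not DSSP output.

```python
from collections import Counter

# One-letter DSSP codes and their meanings, as listed in the table above.
DSSP_CODES = {
    "H": "alpha helix",
    "G": "3-helix (3/10 helix)",
    "I": "5 helix (pi helix)",
    "B": "residue in isolated beta-bridge",
    "E": "extended strand, participates in beta ladder",
    "T": "hydrogen bonded turn",
    "S": "bend",
    " ": "loop",
}


def summarize(ss):
    """Count residues per DSSP class in a per-residue assignment string."""
    return {DSSP_CODES[code]: n for code, n in Counter(ss).items()}


# Made-up assignment string: four helix residues, two loop residues, two strand residues.
print(summarize("HHHH  EE"))
```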
Question: How would you assign structural neighbors (< 5 Å) from a PDB file? Answer: Parse the PDB file for atoms separated by less than 5 Ångströms!
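One way to carry out that answer is sketched below in plain Python. It reads the fixed-column ATOM/HETATM records of a PDB file and reports pairs of residues that have at least one atom pair closer than the cutoff. The function names are illustrative assumptions, and the all-against-all loop is chosen for clarity rather than speed.

```python
import math


def parse_atoms(pdb_text):
    """Return (chain, residue number, residue name, atom name, (x, y, z)) per ATOM/HETATM record."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append((
                line[21],                # chain identifier (column 22)
                int(line[22:26]),        # residue sequence number (columns 23-26)
                line[17:20].strip(),     # residue name (columns 18-20)
                line[12:16].strip(),     # atom name (columns 13-16)
                (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            ))
    return atoms


def neighbors(atoms, cutoff=5.0):
    """Pairs of residues (chain, number) with at least one atom pair closer than cutoff."""
    pairs = set()
    for i, a in enumerate(atoms):
        for b in atoms[i + 1:]:
            res_a, res_b = a[:2], b[:2]
            if res_a != res_b and math.dist(a[4], b[4]) < cutoff:
                pairs.add((res_a, res_b))
    return pairs
```

For a file on disk, `neighbors(parse_atoms(open(path).read()))` would return the neighbor set, where `path` is whatever PDB file is being analysed. For large structures a spatial grid or k-d tree avoids the quadratic loop.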
Contact maps of protein structures: 1avg, structure of triabin. Map of Cα-Cα distances < 6 Å; both axes are the sequence of the protein. Near the diagonal: local contacts in the sequence. Off-diagonal: long-range (nonlocal) contacts. Rainbow ribbon diagram, blue to red: N to C.
Contact maps of protein structures: structure of N15 Cro. Map of Cα-Cα distances < 6 Å; both axes are the sequence of the protein. Rainbow ribbon diagram, blue to red: N to C.
Contact maps of protein structures: structure of N15 Cro. Map of all heavy atom distances < 6 Å (includes side chains); both axes are the sequence of the protein. Rainbow ribbon diagram, blue to red: N to C.
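A contact map of this kind can be sketched in a few lines of Python. The example assumes one representative atom per residue (C-alpha here) and the 6 Å threshold from the slides. The `min_separation` value that separates near-diagonal from off-diagonal contacts is an illustrative assumption, as are the function names.

```python
import math


def contact_map(ca_coords, threshold=6.0):
    """Symmetric n x n boolean matrix: True where two atoms are closer than threshold (angstroms)."""
    n = len(ca_coords)
    return [[math.dist(ca_coords[i], ca_coords[j]) < threshold for j in range(n)]
            for i in range(n)]


def split_contacts(cmap, min_separation=4):
    """Split contacts into local (near the diagonal) and long-range (off-diagonal) residue pairs."""
    local, long_range = [], []
    n = len(cmap)
    for i in range(n):
        for j in range(i + 1, n):
            if cmap[i][j]:
                (local if j - i < min_separation else long_range).append((i, j))
    return local, long_range


# Made-up coordinates: five residues in a line plus one that folds back next to the first.
coords = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0), (15.2, 0, 0), (0, 5, 0)]
local, long_range = split_contacts(contact_map(coords))
print(local)       # sequence neighbours
print(long_range)  # pairs far apart in sequence but close in space
```

Switching to an all-heavy-atom map only changes the distance test: two residues are in contact if any pair of their heavy atoms is within the threshold.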
Surface and interior of globular proteins: solvent accessible surface; molecular surface; residue fractional accessibility; pockets and cavities; “hydrophobic core”; ordered waters in protein structures
“Accessible Surface” (Lee & Richards, 1971; Shrake & Rupley, 1973): represent atoms as spheres with appropriate radii and eliminate the overlapping parts; mathematically roll a sphere all around that surface; the sphere’s center traces out a surface as it rolls.
Now look at a cross-section (slice) of a protein structure (solvent accessible surface figure from Lee & Richards, 1971). Inner surfaces here are van der Waals surfaces. The outer surface is that traced out by the center of the sphere as it rolls around the van der Waals surface. If any part of the arc around a given atom is traced out, that atom is accessible to solvent. The solvent accessible surface of the atom is defined as the sum of the arcs traced around that atom. Note that there is not much solvent accessible surface in the middle.
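The rolling-sphere idea is commonly implemented numerically in the style of Shrake & Rupley: cover each atom's expanded sphere with test points and keep the points that do not fall inside any neighbouring expanded sphere. The sketch below is a minimal version under stated assumptions: a probe radius of 1.4 Å, 960 test points per atom, and a golden-spiral point distribution are choices made for illustration, and the van der Waals radii are supplied by the caller.

```python
import math


def sphere_points(n):
    """Roughly uniform points on a unit sphere (golden-spiral construction)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(golden * k), r * math.sin(golden * k), z))
    return pts


def accessible_area(atoms, probe=1.4, n_points=960):
    """Per-atom solvent accessible area for atoms given as (x, y, z, vdw_radius)."""
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe  # radius of the surface traced by the probe centre
        exposed = 0
        for ux, uy, uz in unit:
            p = (xi + big_r * ux, yi + big_r * uy, zi + big_r * uz)
            # A test point is buried if it lies inside another atom's expanded sphere.
            buried = any(
                math.dist(p, (xj, yj, zj)) < rj + probe
                for j, (xj, yj, zj, rj) in enumerate(atoms) if j != i
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_points)
    return areas


# Two overlapping atoms (made-up coordinates): each shields part of the other.
print(accessible_area([(0.0, 0.0, 0.0, 1.8), (1.5, 0.0, 0.0, 1.8)]))
```

Summing the per-atom values over a residue gives the residue accessible area, and summing over all atoms gives the total for the structure.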
“Accessible surface”/“Molecular surface” note: these are alternative ways of representing the same reality, namely the surface which is essentially in contact with solvent.
Molecular and accessible surfaces are both useful representations, but the molecular surface is more closely related to the actual atomic surfaces. This makes it somewhat better for visualizing the texture of the outer surface, as well as for assessing the shape and volume of any internal cavities. You will often hear the term Connolly surface, after Michael Connolly. A Connolly surface is a particular way of calculating the molecular surface. The accessible surface is also occasionally called the Richards surface, after Fred Richards.
Molecular surface of proteins: depiction of heavy atoms (O, N, C, S) in a protein as van der Waals spheres, and depiction of the corresponding “molecular surface”. The volume contained by this surface is the vdW volume plus the “interstitial volume” (the spaces in between).
The irregular surface of proteins: pockets and cavities. A pocket is an empty concavity on a protein surface which is accessible to solvent from the outside. A cavity or void is an empty space inside the protein which has no opening to the outside. Pockets and cavities can be critical features of proteins in terms of their binding behavior, and identifying them is usually a first step in structure-based ligand design etc.
Fractional accessibility: calculate the total solvent accessible surface of the protein structure (the solvent accessible surface can also be calculated for individual residues/sidechains within the protein). The accessible surface area of the disordered or unfolded protein can be modelled using accessible surface area calculations on model tripeptides such as Ala-X-Ala or Gly-X-Gly. Dividing the accessible surface area in the native protein structure by the accessible surface area in the modelled unfolded protein gives the fractional accessibility; one minus this ratio is the fraction of the surface that is buried (inaccessible to solvent) by virtue of being within the folded, native structure of the protein. The residue fractional accessibility and side chain fractional accessibility refer to the same quantity calculated for individual residues/sidechains within the structure.
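As a small illustration of the ratio described above, the sketch below divides a residue's accessible area in the folded structure by its area in the tripeptide model. The function names are illustrative assumptions, the 5% cutoff for calling a residue buried is one of the two cutoffs used later in these slides, and the numbers in the example are hypothetical placeholders rather than measured values.

```python
def fractional_accessibility(asa_native, asa_reference):
    """Accessible area in the folded structure divided by the area in the tripeptide model."""
    if asa_reference <= 0:
        raise ValueError("reference accessible area must be positive")
    return asa_native / asa_reference


def is_buried(asa_native, asa_reference, cutoff=0.05):
    """True if the fractional accessibility does not exceed the cutoff (5% by default)."""
    return fractional_accessibility(asa_native, asa_reference) <= cutoff


# Hypothetical placeholder values in square angstroms, for illustration only.
example_native, example_reference = 12.0, 160.0
print(fractional_accessibility(example_native, example_reference))
print(is_buried(example_native, example_reference))
```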
Accessible surface area in globular protein structures: the accessible surface area $A_s$ in native states of proteins is a non-linear function of molecular weight (Miller, Janin, Lesk & Chothia, 1987): $A_s = 6.3\,M_r^{0.73}$, where $M_r$ is the molecular weight. This is an empirical correlation, but it comes close to the expected two-thirds power law relating surface area to volume or mass for a set of bodies of similar shape and density.
How much surface area is buried when a protein adopts its native structure in solution? Estimate the total accessible surface area $A_t$ of the extended/disordered polypeptide chain using the accessible surface areas in Gly-X-Gly or Ala-X-Ala models. This is a linear function of molecular weight: $A_t = 1.48\,M_r + 21$. The total fractional accessibility is $A_s/A_t$, and the fraction of surface area buried is $1 - A_s/A_t$. What is the total fractional surface area buried for a protein of molecular weight 10,000? 20,000? Is the fraction higher for small proteins or large?
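The question can be explored numerically with the two empirical relations of Miller et al. (1987), A_s = 6.3 M_r^0.73 for the native state and A_t = 1.48 M_r + 21 for the extended chain. The short Python sketch below evaluates the buried fraction 1 - A_s/A_t; the function names are illustrative assumptions.

```python
def native_asa(m_r):
    """Accessible surface area of the folded protein, A_s = 6.3 * M_r**0.73."""
    return 6.3 * m_r ** 0.73


def extended_asa(m_r):
    """Accessible surface area of the extended chain, A_t = 1.48 * M_r + 21."""
    return 1.48 * m_r + 21.0


def fraction_buried(m_r):
    """Fraction of the extended-chain surface that is buried on folding."""
    return 1.0 - native_asa(m_r) / extended_asa(m_r)


for m_r in (10_000, 20_000):
    print(m_r, round(fraction_buried(m_r), 2))
```

Because A_s grows more slowly than linearly with molecular weight while A_t grows linearly, the buried fraction rises with size: roughly 0.65 at 10,000 and 0.71 at 20,000.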
Distribution of residue fractional accessibilities (from Miller et al., 1987): note that a sizeable group of residues are completely buried (hatched) or nearly completely buried; note the broad distribution among non-buried residues, with a mean fractional accessibility of around 0.5; note that few residues are completely exposed to solvent, but that a fractional accessibility of >1 is possible.
Buried residues in proteins. Table: size class (small, medium, large, XL, all), mean Mr, and fraction of buried residues at 0% ASA and 5% ASA cutoffs. The fraction of buried residues (defined by 0% or 5% ASA cutoffs) increases as a function of molecular weight; for your average protein around 25% of the residues will be buried. These form the core.
Residue fractional accessibility correlates with free energies of transfer for amino acids between water and organic solvents (Miller, Janin, Lesk & Chothia, 1987; Fauchere & Pliska, 1983). The interior of a protein is akin to a nonpolar solvent in which the nonpolar sidechains are buried. Polar sidechains, on the other hand, are usually on the surface. However, some polar side chains do get buried, and it must also be remembered that the backbone of every residue is polar, including those with nonpolar side chains. So a lot of polar moieties do get buried in proteins.
The hydrophobic core of a small protein: N15 Cro. 0% ASA: Pro 3, Leu 6, Ala 16, Val 27, Ile 36, Ile 44. < 5% ASA: Met 1, Ala 17, Val 20, Gln 41, Ser of 66 ordered residues have less than 5% ASA. Note that some polar residues are buried.
The outer surface: water in protein structures. Structures of water-soluble proteins determined at reasonably high resolution will be decorated on their outer surfaces with water molecules (cyan balls) with relatively well-defined positions, and waters may also occur internally. Water is not just surrounding the protein; it is interacting with it.
Water interacts with protein surfaces. First shell waters: in contact with/hydrogen bonded to the protein. Second shell waters: only contact other waters. Most waters visible in crystal structures make hydrogen bonds to each other and/or to the protein, as donor, acceptor, or both.
DSSP Web Service
Amino Acids Secondary Structure Solvent Accessibility
STRIDE web service bin/stride/stridecgi.py
REM Detailed secondary structure assignment 1L4W
REM 1L4W
REM |---Residue---| |--Structure--| |-Phi-| |-Psi-| |-Area-| 1L4W
ASG ILE A 1 1 C Coil 1L4W
ASG VAL A 2 2 E Strand 1L4W
ASG CYS A 3 3 E Strand 1L4W
ASG HIS A 4 4 E Strand 1L4W
ASG THR A 5 5 E Strand 1L4W
ASG THR A 6 6 E Strand 1L4W
ASG ALA A 7 7 C Coil 1L4W
ASG THR A 8 8 T Turn 1L4W
ASG SER A 9 9 T Turn 1L4W
ASG PRO A T Turn 1L4W
ASG ILE A E Strand 1L4W
ASG SER A E Strand 1L4W
ASG ALA A E Strand 1L4W
ASG VAL A E Strand 1L4W
ASG THR A E Strand 1L4W
ASG CYS A C Coil 1L4W
ASG PRO A C Coil 1L4W
ASG PRO A T Turn 1L4W
ASG GLY A T Turn 1L4W
ASG GLU A T Turn 1L4W
ASG ASN A T Turn 1L4W
ASG LEU A E Strand 1L4W
ASG CYS A E Strand 1L4W
ASG TYR A E Strand 1L4W
ASG ARG A E Strand 1L4W
ASG LYS A E Strand 1L4W
ASG MET A E Strand 1L4W
ASG TRP A E Strand 1L4W
ASG CYS A E Strand 1L4W
ASG ASP A E Strand 1L4W
ASG ALA A B Bridge 1L4W
ASG PHE A T Turn 1L4W
ASG CYS A T Turn 1L4W
ASG SER A T Turn 1L4W
ASG SER A T Turn 1L4W
ASG ARG A C Coil 1L4W
ASG GLY A E Strand 1L4W
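Records like these are easy to post-process. The sketch below pulls the residue name, chain, residue number and one-letter state out of ASG records and merges consecutive residues with the same state into segments. It assumes whitespace-separated records in which both residue numbers are present, as in the first nine records of the listing; the function names are illustrative assumptions, and real STRIDE output carries further columns (phi, psi, area) that are ignored here.

```python
from itertools import groupby


def parse_asg(lines):
    """Extract (residue name, chain, residue number, one-letter state) from ASG records."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 7 and fields[0] == "ASG":
            # Residue number is kept as text so that insertion codes would not break parsing.
            records.append((fields[1], fields[2], fields[3], fields[5]))
    return records


def segments(records):
    """Group consecutive residues sharing a state into (state, first residue, last residue)."""
    out = []
    for state, group in groupby(records, key=lambda r: r[3]):
        group = list(group)
        out.append((state, group[0][2], group[-1][2]))
    return out


sample = [
    "REM |---Residue---| |--Structure--| |-Phi-| |-Psi-| |-Area-|",
    "ASG ILE A 1 1 C Coil",
    "ASG VAL A 2 2 E Strand",
    "ASG CYS A 3 3 E Strand",
    "ASG HIS A 4 4 E Strand",
    "ASG THR A 5 5 E Strand",
    "ASG THR A 6 6 E Strand",
    "ASG ALA A 7 7 C Coil",
    "ASG THR A 8 8 T Turn",
    "ASG SER A 9 9 T Turn",
]
print(segments(parse_asg(sample)))
```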
Structure Analysis: assign secondary structure for amino acids from the 3D structure; generate solvent accessible area for amino acids from the 3D structure. Most widely used tool: DSSP (Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Kabsch and Sander, 1983).
2D: Contact Map Prediction. Figure: a 3D structure and its 2D contact map, an n × n matrix indexed by residues i and j (1 … n). Distance threshold = 8 Å. Cheng, Randall, Sweredoski, Baldi. Nucleic Acids Research, 2005.
3D Structure Prediction Tools: MULTICOM, I-TASSER, HHpred, Robetta, 3D-Jury, FFAS, Pcons, Sparks (sp3.html), FUGUE (cryst.bioc.cam.ac.uk/%7Efugue/prfsearch.html), FOLDpro, SAM, Phyre, 3D-PSSM, mGenThreader