Protein Structure Databases Databases of three dimensional structures of proteins, where structure has been solved using X-ray crystallography or nuclear.

Slides:



Advertisements
Similar presentations
Secondary structure prediction from amino acid sequence.
Advertisements

Protein Structure C483 Spring 2013.
Secondary structure assignment
Protein Structure Prediction
Protein Structure – Part-2 Pauling Rules The bond lengths and bond angles should be distorted as little as possible. No two atoms should approach one another.
The amino acids in their natural habitat. Topics: Hydrogen bonds Secondary Structure Alpha helix Beta strands & beta sheets Turns Loop Tertiary & Quarternary.
Prediction to Protein Structure Fall 2005 CSC 487/687 Computing for Bioinformatics.
1 Protein Structure, Structure Classification and Prediction Bioinformatics X3 January 2005 P. Johansson, D. Madsen Dept.of Cell & Molecular Biology, Uppsala.
Structure Prediction. Tertiary protein structure: protein folding Three main approaches: [1] experimental determination (X-ray crystallography, NMR) [2]
Protein Secondary Structures
The Structure and Functions of Proteins BIO271/CS399 – Bioinformatics.
Predicting local Protein Structure Morten Nielsen.
1 Levels of Protein Structure Primary to Quaternary Structure.
Structure Prediction. Tertiary protein structure: protein folding Three main approaches: [1] experimental determination (X-ray crystallography, NMR) [2]
Protein Structure Modeling (1). Protein Folding Problem A protein folds into a unique 3D structure under physiological conditions Lysozyme sequence: KVFGRCELAA.
Protein Secondary Structures Assignment and prediction.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU April 8, 2003Claus Lundegaard Protein Secondary Structures Assignment and prediction.
Computational Biology, Part 10 Protein Structure Prediction and Display Robert F. Murphy Copyright  1996, 1999, All rights reserved.
Protein Secondary Structures Assignment and prediction.
CISC667, F05, Lec20, Liao1 CISC 467/667 Intro to Bioinformatics (Fall 2005) Protein Structure Prediction Protein Secondary Structure.
Protein Secondary Structures Assignment and prediction Pernille Haste Andersen
Protein Secondary Structure Prediction Dong Xu Computer Science Department 271C Life Sciences Center 1201 East Rollins Road University of Missouri-Columbia.
Structure Prediction in 1D
CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU October 29, 2004Claus Lundegaard Protein Secondary Structures Assignment and.
Computing for Bioinformatics Lecture 8: protein folding.
Protein Secondary Structures Assignment and prediction.
Predicting local Protein Structure Morten Nielsen.
Class 7: Protein Secondary Structure
Protein Tertiary Structure Prediction Structural Bioinformatics.
Protein Structures.
Protein structure prediction
Chapter 12 Protein Structure Basics. 20 naturally occurring amino acids Free amino group (-NH2) Free carboxyl group (-COOH) Both groups linked to a central.
Proteins Secondary Structure Predictions Structural Bioinformatics.
Prediction to Protein Structure Fall 2005 CSC 487/687 Computing for Bioinformatics.
The most important secondary structural elements of proteins are: A. α-Helix B. Pleated-sheet structures C. β Turns The most common secondary structures.
Proteins. Proteins? What is its How does it How is its How does it How is it Where is it What are its.
Protein Secondary Structure Prediction Some of the slides are adapted from Dr. Dong Xu’s lecture notes.
Protein Secondary Structure Prediction. Input: protein sequence Output: for each residue its associated Secondary structure (SS): alpha-helix, beta-strand,
Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices Yan Liu Sep 29, 2003.
© Wiley Publishing All Rights Reserved. Protein 3D Structures.
Protein Secondary Structure Prediction
Secondary structure prediction
2 o structure, TM regions, and solvent accessibility Topic 13 Chapter 29, Du and Bourne “Structural Bioinformatics”
CS790 – BioinformaticsProtein Structure and Function1 Review of fundamental concepts  Know how electron orbitals and subshells are filled Know why atoms.
Protein Classification II CISC889: Bioinformatics Gang Situ 04/11/2002 Parts of this lecture borrowed from lecture given by Dr. Altman.
Web Servers for Predicting Protein Secondary Structure (Regular and Irregular) Dr. G.P.S. Raghava, F.N.A. Sc. Bioinformatics Centre Institute of Microbial.
Protein structure prediction May 26, 2011 HW #8 due today Quiz #3 on Tuesday, May 31 Learning objectives-Understand the biochemical basis of secondary.
Protein Structure & Modeling Biology 224 Instructor: Tom Peavy Nov 18 & 23, 2009
The α-helix forms within a continuous strech of the polypeptide chain 5.4 Å rise, 3.6 aa/turn  1.5 Å/aa N-term C-term prototypical  = -57  ψ = -47 
Protein secondary structure Prediction Why 2 nd Structure prediction? The problem Seq: RPLQGLVLDTQLYGFPGAFDDWERFMRE Pred:CCCCCHHHHHCCCCEEEECCHHHHHHCC.
Protein Secondary Structure Prediction G P S Raghava.
1 Protein Structure Prediction (Lecture for CS397-CXZ Algorithms in Bioinformatics) April 23, 2004 ChengXiang Zhai Department of Computer Science University.
Protein Tertiary Structure. Protein Data Bank (PDB) Contains all known 3D structural data of large biological molecules, mostly proteins and nucleic acids:
Protein Modeling Protein Structure Prediction. 3D Protein Structure ALA CαCα LEU CαCαCαCαCαCαCαCα PRO VALVAL ARG …… ??? backbone sidechain.
Protein Structure Prediction ● Why ? ● Type of protein structure predictions – Sec Str. Pred – Homology Modelling – Fold Recognition – Ab Initio ● Secondary.
Introduction to Protein Structure Prediction BMI/CS 576 Colin Dewey Fall 2008.
Comparative methods Basic logics: The 3D structure of the protein is deduced from: 1.Similarities between the protein and other proteins 2.Statistical.
Structural classification of Proteins SCOP Classification: consists of a database Family Evolutionarily related with a significant sequence identity Superfamily.
Protein backbone Biochemical view:
Proteins Structure Predictions Structural Bioinformatics.
Tymoczko • Berg • Stryer © 2015 W. H. Freeman and Company
Protein Structure Prediction. Protein Synthesis the national health museum.
Structure and Function
Structural organization of proteins
Mir Ishruna Muniyat. Primary structure (Amino acid sequence) ↓ Secondary structure ( α -helix, β -sheet ) ↓ Tertiary structure ( Three-dimensional.
Protein Structure Prediction Dr. G.P.S. Raghava Protein Sequence + Structure.
Introduction to Bioinformatics II
Levels of Protein Structure
Protein structure prediction.
Protein structure prediction
Presentation transcript:

Protein Structure Databases Databases of three dimensional structures of proteins, where structure has been solved using X-ray crystallography or nuclear magnetic resonance (NMR) techniques Databases of three dimensional structures of proteins, where structure has been solved using X-ray crystallography or nuclear magnetic resonance (NMR) techniques Protein Databases: Protein Databases: PDB (protein data bank) PDB (protein data bank) Swiss-Prot Swiss-Prot PIR PIR (Protein Information Resource) SCOP (Structural Classification of Proteins) SCOP (Structural Classification of Proteins)

Protein Structure Databases Most extensive for 3-D structure is PDB Most extensive for 3-D structure is PDB

Visualization of Proteins A number of programs convert atomic coordinates of 3-d structures into views of the molecule A number of programs convert atomic coordinates of 3-d structures into views of the molecule allow the user to manipulate the molecule by rotation, zooming, etc. allow the user to manipulate the molecule by rotation, zooming, etc. Critical in drug design -- yields insight into how the protein might interact with ligands at active sites Critical in drug design -- yields insight into how the protein might interact with ligands at active sites

Visualization of Proteins Most popular programs for viewing 3-D structures: Protein explorer: Rasmol: Chime: Cn3D: Mage: Swiss 3D viewer:

Alignment of Protein Structure Compare 3D structure of one protein against 3D structure of second protein Compare 3D structure of one protein against 3D structure of second protein Compare positions of atoms in three-dimensional structures Compare positions of atoms in three-dimensional structures Look for positions of secondary structural elements (helices and strands) within a protein domain Look for positions of secondary structural elements (helices and strands) within a protein domain Exam distances between carbon atoms to determine degree structures may be superimposed Exam distances between carbon atoms to determine degree structures may be superimposed Side chain information can be incorporated Side chain information can be incorporated Buried; visible Buried; visible Structural similarity between proteins does not necessarily mean evolutionary relationship Structural similarity between proteins does not necessarily mean evolutionary relationship

Alignment of Protein Structure

T Simple case – two closely related proteins with the same number of amino acids. Structure alignment Find a transformation to achieve the best superposition

Transformations  Translation  Translation and Rotation -- Rigid Motion (Euclidian space)

Types of Structure Comparison  Sequence-dependent vs. sequence-independent structural alignment  Global vs. local structural alignment  Pairwise vs. multiple structural alignment

ASCRKLE ¦¦¦¦¦¦¦ ASCRKLE Minimize rmsd of distances 1-1,...,7-7 Sequence-dependent Structure Comparison

Can be solved in O(n) time. Can be solved in O(n) time. Useful in comparing structures of the same protein solved in different methods, under different conformation, through dynamics. Useful in comparing structures of the same protein solved in different methods, under different conformation, through dynamics. Evaluation protein structure prediction. Evaluation protein structure prediction.

Sequence-independent Structure Comparison Given two configurations of points in the three dimensional space: find T which produces “largest” superimpositions of corresponding 3-D points. T

Evaluating Structural Alignments 1. 1.Number of amino acid correspondences created RMSD of corresponding amino acids 3. 3.Percent identity in aligned residues 4. 4.Number of gaps introduced 5. 5.Size of the two proteins 6. 6.Conservation of known active site environments 7. 7.… No universally agreed upon criteria. It depends on what you are using the alignment for.

Protein Secondary Structure Prediction

Why secondary structure prediction? Accurate secondary structure prediction can be an important information for the tertiary structure prediction Accurate secondary structure prediction can be an important information for the tertiary structure prediction Protein function prediction Protein function prediction Protein classification Protein classification Predicting structural change Predicting structural change An easier problem than 3D structure prediction (more than 40 years of history). An easier problem than 3D structure prediction (more than 40 years of history).

 helix α-helix (30-35%) Hydrogen bond between C=O (carbonyl) & NH (amine) groups within strand (4 positions apart) 3.6 residues / turn, 1.5 Å rise / residue Typically right hand turn Most abundant secondary structure α-helix formers: A,C,L,M,E,Q,H,K

 sheet &  turn β-sheet / β-strand (20-25%) Hydrogen bond between groups across strands Forms parallel and antiparallel pleated sheets Amino acids less compact – 3.5 Å between adjacent residues Residues alternate above and below β-sheet β-sheet formers: V,I,P,T,W β-turn Short turn (4 residues) Hydrogen bond between C=O & NH groups within strand (3 positions apart) Usually polar, found near surface β-turn formers: S,D,N,P,R

Others Loop Regions between α-helices and β-sheets On the surface, vary in length and 3D configurations Do not have regular periodic structures Loop formers: small polar residues Coil (40-50%) Generally speaking, anything besides α-helix, β-sheet, β-turn

Assigning Secondary Structure Defining features Defining features Dihedral angles Dihedral angles Hydrogen bonds Hydrogen bonds Geometry Geometry Assigned manually by crystallographers or Assigned manually by crystallographers or Automatic Automatic DSSP (Definition of secondary structure of proteins, Kabsch & Sander,1983) DSSP (Definition of secondary structure of proteins, Kabsch & Sander,1983) DSSP STRIDE (Frishman & Argos, 1995) STRIDE (Frishman & Argos, 1995) STRIDE Continuum (Claus Andersen, Burkhard Rost, 2001) Continuum (Claus Andersen, Burkhard Rost, 2001) Continuum

Definition of secondary structure of proteins (DSSP) The DSSP code The DSSP code H = alpha helix H = alpha helix B = residue in isolated beta-bridge B = residue in isolated beta-bridge E = extended strand, participates in beta ladder E = extended strand, participates in beta ladder G = 3-helix (3/10 helix) G = 3-helix (3/10 helix) I = 5 helix (pi helix) I = 5 helix (pi helix) T = hydrogen bonded turn T = hydrogen bonded turn S = bend S = bend CASP Standard CASP Standard H = (H, G, I), E = (E, B), C = (T, S) H = (H, G, I), E = (E, B), C = (T, S)

Secondary Structure Prediction Given a protein sequence (primary structure) Given a protein sequence (primary structure) HWIATGQLIREAYEDYSS GHWIATRGQLIREAYEDYRHFSSECPFIP Predict its secondary structure content (C=Coils H=Alpha Helix E=Beta Strands) HWIATGQLIREAYEDYSS GHWIATRGQLIREAYEDYRHFSSECPFIP EEEEEHHHHHHHHHHHHH CEEEEECHHHHHHHHHHHCCCHHCCCCCC

Algorithm Chou-Fasman Method Chou-Fasman Method Examining windows of residues to predict structure Examining windows of residues to predict structure

From PDB database, calculate the propensity for a given amino acid to adopt a certain ss-type From PDB database, calculate the propensity for a given amino acid to adopt a certain ss-type (aa i --- amino acid i,  --- ss type) l Example: #Alanine=2,000, #residues=20,000, #helix=4,000, #Ala in helix=500 P=? Secondary structure propensity

From PDB database, calculate the propensity for a given amino acid to adopt a certain ss-type From PDB database, calculate the propensity for a given amino acid to adopt a certain ss-type l Example: #Ala=2,000, #residues=20,000, #helix=4,000, #Ala in helix=500 P( ,aa i ) = 500/20,000, p(  p(aa i ) = 2,000/20,000 P = 500 / (4,000/10) = 1.25

Chou-Fasman parameters Note: The parameters given in the textbook are 100*P  i

Chou-FasmanChou-Fasman algorithm Chou-Fasman Helix: Helix: Scan through the peptide and identify regions where 4 out of 6 contiguous residues have P(H) > That region is declared an alpha-helix. Scan through the peptide and identify regions where 4 out of 6 contiguous residues have P(H) > That region is declared an alpha-helix. Extend the helix in both directions until a set of four contiguous residues that have an average P(H) < 1.00 is reached. That is declared the end of the helix. Extend the helix in both directions until a set of four contiguous residues that have an average P(H) < 1.00 is reached. That is declared the end of the helix. If the segment defined by this procedure is longer than 5 residues and the average P(H) > P(E) for that segment, the segment can be assigned as a helix. If the segment defined by this procedure is longer than 5 residues and the average P(H) > P(E) for that segment, the segment can be assigned as a helix. Repeat this procedure to locate all of the helical regions in the sequence. Repeat this procedure to locate all of the helical regions in the sequence.

TSPTAELMRSTG P(H) TSPTAELMRSTG P(H) Initiation Identify regions where 4/6 have a P(H) >1.00 “alpha-helix nucleus”

Propagation Extend helix in both directions until a set of four residues have an average P(H) <1.00. TSPTAELMRSTG P(H) If the average P(H) > P(E) for that segment, the segment can be assigned as a helix. P(H)=107.5%>P(E)=85.9%

Prediction TSPTAELMRSTG P(H) HHHHHHHH

Chou-FasmanChou-Fasman algorithm Chou-Fasman B-strand: B-strand: Scan through the peptide and identify a region where 3 out of 5 of the residues have a value of P(E)>1.00. That region is declared as a beta-sheet. Scan through the peptide and identify a region where 3 out of 5 of the residues have a value of P(E)>1.00. That region is declared as a beta-sheet. Extend the sheet in both directions until a set of four contiguous residues that have an average P(E) < 1.00 is reached. That is declared the end of the beta-sheet. Extend the sheet in both directions until a set of four contiguous residues that have an average P(E) < 1.00 is reached. That is declared the end of the beta-sheet. Any segment of the region located by this procedure is assigned as a beta-sheet if the average P(E)>1.05 and the average P(E)>P(H) for that region. Any segment of the region located by this procedure is assigned as a beta-sheet if the average P(E)>1.05 and the average P(E)>P(H) for that region. Any region containing overlapping alpha-helical and beta-sheet assignments are taken to be helical if the average P(H) > P(E) for that region. It is a beta sheet if the average P(E) > P(H) for that region. Any region containing overlapping alpha-helical and beta-sheet assignments are taken to be helical if the average P(H) > P(E) for that region. It is a beta sheet if the average P(E) > P(H) for that region.

Chou-FasmanChou-Fasman algorithm Chou-Fasman Beta-turn Beta-turn To identify a bend at residue number j, calculate the following value To identify a bend at residue number j, calculate the following value p(t) = f(j)f(j+1)f(j+2)f(j+3) If If (1) p(t) > , (2) the average value for P(turn) > 1.00 in the tetrapeptide and (3) the averages for the tetrapeptide obey the inequality P(H) P(E), then a beta-turn is predicted at that location.

Exercise Predict the secondary structure of the following protein sequence: Predict the secondary structure of the following protein sequence: Ala Pro Ala Phe Ser Val Ser Leu Ala Ser Gly Ala

exercise Predict the secondary structure of the following protein sequence: Predict the secondary structure of the following protein sequence: Ala Pro Ala Phe Ser Val Ser Leu Ala Ser Gly Ala H H H H H H E E E E E E E E E E E E E E T T T T T T T T T T H H E E E E E E T T T T

Prediction Methods Single sequence Examine single protein sequence Base prediction on Statistics – composition of amino acids Neural networks – patterns of amino acids Multiple sequence alignment First create MSA Use sequences from PSI-BLAST, CLUSTALW, etc… Align sequence with related proteins in family Predict secondary structure based on consensus/profile Generally improves prediction 8-9%

Accuracyaccuracy Statistical method (single sequence) Statistical method (single sequence) 1974 Chou & Fasman~50-53% 1978 Garnier 63% Statistical method (Multiple seqs) Statistical method (Multiple seqs) 1987 Zvelebil66% 1993 Yi & Lander68% Neural network Neural network 1988 Qian & Sejnowski1988 Qian & Sejnowski64.3% 1988 Qian & Sejnowski 1993 Rost & Sander1993 Rost & Sander % 1993 Rost & Sander 1997 Frishman & Argos1997 Frishman & Argos<75% 1997 Frishman & Argos 1999 Cuff & Barton1999 Cuff & Barton72.9% 1999 Cuff & Barton 1999 Jones1999 Jones76.5% 1999 Jones 2000 Petersen et al.2000 Petersen et al.77.9% 2000 Petersen et al.

Neural network Input signals are summed and turned into zero or one J1J1 J2J2 J3J3 J4J4 Feed-forward multilayer network Input layerHidden layerOutput layer neurons

Enter sequences Compare Prediction to Reality Adjust Weights Neural network training

Neural net for secondary structure

Neural net for SS Prediction Jury decisions Use multiple neural networks & combine results Average output Majority decision

Neural net for SS Prediction JPRED [Cuff+ 1998] Finds consensus from PHD, PREDATOR, DSC, NNSSP, etc…