CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS, TECHNICAL UNIVERSITY OF DENMARK (DTU)
October 29, 2004, Claus Lundegaard
Protein Secondary Structures
Assignment and prediction

Presentation transcript:

Protein Secondary Structures
Assignment and prediction

Secondary Structure Elements
[Figure labels: ß-strand, Helix, Turn, Bend]

Use of secondary structure
- Classification of protein structures
- Definition of loops/core
- Use in fold recognition methods
- Improvements of alignments
- Definition of domain boundaries

Classification of secondary structure
Defining features
- Dihedral angles
- Hydrogen bonds
- Geometry
Assigned manually by crystallographers, or automatically
- DSSP (Kabsch & Sander, 1983)
- STRIDE (Frishman & Argos, 1995)
- Continuum (Andersen et al., 2002)

Dihedral Angles
- phi: dihedral angle about the N-Calpha bond
- psi: dihedral angle about the Calpha-C bond
- omega: dihedral angle about the C-N (peptide) bond
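The three backbone angles above are all instances of one geometric operation: the signed angle between two planes defined by four consecutive atoms. The following is a minimal sketch of that calculation, not the code used by any of the assignment programs. The function name `dihedral` and the coordinate tuples are illustrative assumptions; for phi the four atoms would be C(i-1), N(i), Calpha(i), C(i), and for psi N(i), Calpha(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond.

    Uses the atan2 form, which is numerically stable near 0 and 180 degrees.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)          # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

With this sign choice a trans arrangement gives 180 degrees and a cis arrangement gives 0, which matches the slide's statement that omega is 180 degrees for the usual peptide bond.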

Helices
Helix type                  H-bond pattern
right-handed alpha-helix    i+4
pi-helix                    i+5
3/10 helix                  i+3
(omega is 180 deg in all cases; the phi(deg) and psi(deg) columns are not preserved in this transcript)

Beta Strands
[Table: phi(deg), psi(deg), omega(deg) for a beta strand; values not preserved]
Hydrogen bond patterns in beta sheets. Here a four-stranded beta sheet is drawn schematically which contains three antiparallel and one parallel strand. Hydrogen bonds are indicated with red lines (antiparallel strands) and green lines (parallel strands) connecting the hydrogen and receptor oxygen.

Secondary Structure Type Descriptions
* H = alpha helix
* B = residue in isolated beta-bridge
* E = extended strand, participates in beta ladder
* G = 3-helix (3/10 helix)
* I = 5 helix (pi helix)
* T = hydrogen bonded turn
* S = bend

Automatic assignment programs
- DSSP
- Continuum
- STRIDE
DSSP output columns: # RESIDUE AA STRUCTURE BP1 BP2 ACC N-H-->O O-->H-N N-H-->O O-->H-N TCO KAPPA ALPHA PHI PSI X-CA Y-CA Z-CA
[DSSP output excerpt, chain A; only the AA and STRUCTURE columns are legible, "." marks a blank assignment]
AA:        EHVIIQAEFYLNPDQSGEFMFDFDGDE
STRUCTURE: ...EEEEEEEETTTTEEEEEEEETTEE

Secondary Structure Prediction
What to predict?
- All 8 types, or pool types into groups (H, E, C)
* H = alpha helix
* B = residue in isolated beta-bridge
* E = extended strand, participates in beta ladder
* G = 3-helix (3/10 helix)
* I = 5 helix (pi helix)
* T = hydrogen bonded turn
* S = bend
* C/. = random coil
[Figure labels: Straight HEC, CASP, Q3]
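Pooling the eight DSSP types into three states and scoring a prediction with Q3 can be sketched as follows. The slide does not spell out which types go into which group, so the mapping below (H, G, I to H; E, B to E; everything else to C) is an assumption, chosen because it is a commonly used pooling. Q3 is taken here as the percentage of residues whose three-state prediction matches the observed state. The names `DSSP_TO_HEC`, `reduce_to_hec` and `q3` are illustrative.

```python
# Assumed pooling of the 8 DSSP types into 3 states; other poolings exist.
DSSP_TO_HEC = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_to_hec(dssp_string):
    """Map a string of DSSP codes to H/E/C; anything unlisted becomes C."""
    return "".join(DSSP_TO_HEC.get(code, "C") for code in dssp_string)


def q3(predicted, observed):
    """Percentage of residues predicted in the correct one of three states."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)
```

For example, `q3("HHHC", "HHCC")` counts three matching positions out of four.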

Secondary Structure Prediction
Simple alignments
- Align to a close homolog for which the structure has been experimentally solved.
Heuristic methods (e.g., Chou-Fasman, 1974)
- Apply scores for each amino acid and sum up over a window.
Neural networks (different inputs)
- Raw sequence (late 80's)
- Blosum matrix (e.g., PHD, early 90's)
- Position specific alignment profiles (e.g., PsiPred, late 90's)
- Multiple networks balloting, probability conversion, output expansion (Petersen et al., 2000).

Improvement of accuracy
Year  Method             Accuracy
1974  Chou & Fasman      ~50-53%
1978  Garnier            63%
1987  Zvelebil           66%
1988  Qian & Sejnowski   64.3%
1993  Rost & Sander      [value not preserved]
1997  Frishman & Argos   <75%
1999  Cuff & Barton      72.9%
1999  Jones              76.5%
2000  Petersen et al.    77.9%

Simple Alignments
- Solved structure of a homolog to query is needed
- Homologous proteins have ~88% identical (3 state) secondary structure
- If no close homologue can be identified, alignments will give almost random results

Amino acid preferences in alpha-helix

Amino acid preferences in beta-strand

Amino acid preferences in coil

Chou-Fasman
[Table columns: Name, P(a), P(b), P(turn), f(i), f(i+1), f(i+2), f(i+3)]
[Table rows: Ala, Arg, Asp, Asn, Cys, Glu, Gln, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Trp, Tyr, Val; numeric values not preserved]

Chou-Fasman
1. Assign all of the residues in the peptide the appropriate set of parameters.
2. Scan through the peptide and identify regions where 4 out of 6 contiguous residues have P(a-helix) > 100. That region is declared an alpha-helix. Extend the helix in both directions until a set of four contiguous residues that have an average P(a-helix) < 100 is reached. If the segment defined by this procedure is longer than 5 residues and the average P(a-helix) > P(b-sheet) for that segment, the segment can be assigned as a helix.
3. Repeat this procedure to locate all of the helical regions in the sequence.
4. Scan through the peptide and identify a region where 3 out of 5 of the residues have a value of P(b-sheet) > 100. That region is declared as a beta-sheet. Extend the sheet in both directions until a set of four contiguous residues that have an average P(b-sheet) < 100 is reached. Any segment located by this procedure is assigned as a beta-sheet if the average P(b-sheet) > 105 and the average P(b-sheet) > P(a-helix) for that region.
5. Any region containing overlapping alpha-helical and beta-sheet assignments is taken to be helical if the average P(a-helix) > P(b-sheet) for that region. It is a beta sheet if the average P(b-sheet) > P(a-helix) for that region.
6. To identify a bend at residue number j, calculate the following value: p(t) = f(j)f(j+1)f(j+2)f(j+3), where the f(j+1) value for the j+1 residue is used, the f(j+2) value for the j+2 residue is used and the f(j+3) value for the j+3 residue is used. If: (1) p(t) > [threshold not preserved]; (2) the average value for P(turn) > 1.00 in the tetra-peptide; and (3) the averages for the tetra-peptide obey the inequality P(a-helix) < P(turn) > P(b-sheet), then a beta-turn is predicted at that location.
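Two pieces of the procedure above lend themselves to a short sketch: the helix nucleation scan of step 2 (4 of 6 contiguous residues with P(a-helix) > 100) and the turn product p(t) of step 6. The code below is an illustration under stated assumptions, not a full Chou-Fasman implementation: extension, overlap resolution and the turn cutoff are left out, and the tables `TOY_P_HELIX` and `TOY_F` hold made-up placeholder numbers, not the published parameters. The function names are hypothetical.

```python
def helix_nuclei(seq, p_helix, window=6, needed=4, cutoff=100):
    """Start indices of windows where at least `needed` of `window`
    residues have a helix propensity above `cutoff` (step 2, nucleation)."""
    starts = []
    for i in range(len(seq) - window + 1):
        hits = sum(1 for aa in seq[i:i + window] if p_helix[aa] > cutoff)
        if hits >= needed:
            starts.append(i)
    return starts


def turn_product(seq, j, f):
    """p(t) for a turn starting at index j (step 6).

    `f[aa]` is a 4-tuple holding f(i), f(i+1), f(i+2), f(i+3) for that
    residue type; each position in the tetra-peptide uses its own column.
    """
    return (f[seq[j]][0] * f[seq[j + 1]][1]
            * f[seq[j + 2]][2] * f[seq[j + 3]][3])


# Placeholder values for illustration only (NOT the Chou-Fasman parameters).
TOY_P_HELIX = {"A": 140, "E": 150, "G": 60, "P": 55}
TOY_F = {"A": (0.1, 0.2, 0.3, 0.4), "G": (0.5, 0.5, 0.5, 0.5)}
```

The beta-sheet scan of step 4 would reuse `helix_nuclei` with `window=5`, `needed=3` and a sheet propensity table.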

Chou-Fasman
- Generally applicable
- Works for sequences with no solved homologs
- But: low accuracy

Neural Networks
Benefits
- Generally applicable
- Can capture higher order correlations
- Inputs other than sequence information
Drawbacks
- Needs a lot of data (different solved structures). However, these do exist today (nearly 2000 solved structures with low sequence identity).
- Complex methods with several pitfalls.

Architecture
[Figure: window I K E E H V I I Q A E over the sequence IKEEHVIIQAEFYLNPDQSGEF...; labels: Window, Input Layer, Weights, Hidden Layer, Output Layer (H, E, C)]

Sparse encoding
[Table: input neuron values per amino acid (rows A, R, N, D, C, Q, E shown); numeric values not preserved]

Input Layer
[Figure: encoded window I K E E H V I I Q A E]
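Sparse encoding of a sequence window can be sketched as below: each residue becomes a vector with a single 1 at the position of its amino acid, and the vectors for all window positions are concatenated to form the network input. The 11-residue window follows the example on the slide. The amino acid order is taken from the BLOSUM 62 slide, and the choice to encode positions that fall outside the sequence as all zeros is an assumption (a dedicated extra input unit is another common option). The names `ALPHABET` and `encode_window` are illustrative.

```python
ALPHABET = "ARNDCQEGHILKMFPSTWYV"  # 20 amino acids, order as on the BLOSUM slide


def encode_window(seq, center, window=11):
    """Concatenated one-hot vectors for the window centred on `center`.

    Positions outside the sequence are encoded as all zeros (assumption).
    Returns a flat list of window * 20 integers.
    """
    half = window // 2
    inputs = []
    for pos in range(center - half, center + half + 1):
        unit = [0] * len(ALPHABET)
        if 0 <= pos < len(seq):
            unit[ALPHABET.index(seq[pos])] = 1
        inputs.extend(unit)
    return inputs
```

For the window I K E E H V I I Q A E centred on V this yields 220 input values, of which exactly 11 are set.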

BLOSUM 62
[Table: substitution matrix with columns A R N D C Q E G H I L K M F P S T W Y V B Z X * and rows A R N D C Q E G H I L K M F P S T W Y V; scores not preserved]

Input Layer
[Figure: encoded window I K E E H V I I Q A E]

Secondary networks (Structure-to-Structure)
[Figure: window of H/E/C outputs over the sequence IKEEHVIIQAEFYLNPDQSGEF...; labels: Window, Input Layer, Weights, Hidden Layer, Output Layer]

PHD method (Rost and Sander)
- Combine neural networks with sequence profiles
  - 6-8 percentage points increase in prediction accuracy over standard neural networks
- Use second layer "Structure to structure" network to filter predictions
- Jury of predictors
- Set up as mail server

PSI-Pred (Jones, DT)
- Use alignments from iterative sequence searches (PSI-Blast) as input to a neural network
- Better predictions due to better sequence profiles
- Available as stand alone program and via the web

Position specific scoring matrices (BLAST profiles)
[Table: one row per sequence position (I K E E H V I I Q A E F Y L N P D ...), one column per amino acid (A R N D C Q E G H I L K M F P S T W Y V); scores not preserved]

Benchmarking secondary structure predictions
CASP
- Critical Assessment of Structure Predictions
- Sequences from about-to-be-solved structures are given to groups who submit their predictions before the structure is published
EVA
- Newly solved structures are sent to prediction servers.

EVA results
PROFphd      77.0%
PSIPRED      76.8%
SAM-T99sec   76.1%
SSpro        76.0%
Jpred2       75.5%
PHD          71.7%
- Cubic.columbia.edu/eva

Several different architectures
Sequence-to-structure
- Window sizes: 15, 17, 19 and 21
- Hidden units: 50 and 75
- 10-fold cross validation => 80 predictions
Structure-to-structure
- Window size: 17
- Hidden units: 40
- 10-fold cross validation => 800 predictions
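The counts of 80 and 800 follow from enumerating the combinations. The sketch below assumes one reading of the slide: every first-level network (window size x hidden units x cross-validation fold) produces a prediction, and each of those is passed through one second-level network per fold. The variable names are illustrative.

```python
from itertools import product

WINDOW_SIZES = (15, 17, 19, 21)   # sequence-to-structure window sizes
HIDDEN_UNITS = (50, 75)           # sequence-to-structure hidden layer sizes
FOLDS = range(10)                 # 10-fold cross validation

# One first-level network per (window, hidden units, fold) combination.
first_level = list(product(WINDOW_SIZES, HIDDEN_UNITS, FOLDS))

# Assumed reading: each first-level output is filtered by one
# structure-to-structure network (window 17, 40 hidden units) per fold.
second_level = list(product(first_level, FOLDS))

print(len(first_level), len(second_level))
```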

Balloting procedure
Confidence of a per residue prediction
- P(highest) - P(second highest)
- H: 0.80, E: 0.05, C: 0.15 => conf. = 0.65
Mean per chain confidence for all 800 predictions
- Calculate mean and standard deviation
- Averaging of per chain predictions with [criterion not preserved]
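The confidence measure above is the gap between the two highest class probabilities for a residue, averaged over a chain. The sketch below reproduces the slide's worked example and computes the mean and standard deviation over a set of per-chain confidences. The rule for selecting which predictions to average is not legible in the transcript, so it is deliberately left out. All function names are illustrative.

```python
from statistics import mean, stdev


def residue_confidence(probs):
    """P(highest) minus P(second highest) for one residue.

    `probs` maps class labels (H, E, C) to probabilities.
    """
    top, second = sorted(probs.values(), reverse=True)[:2]
    return top - second


def chain_confidence(chain_probs):
    """Mean per-residue confidence for one predicted chain."""
    return mean([residue_confidence(p) for p in chain_probs])


def confidence_stats(chain_confidences):
    """Mean and sample standard deviation over several predictions."""
    return mean(chain_confidences), stdev(chain_confidences)
```

With the slide's numbers, `residue_confidence({"H": 0.80, "E": 0.05, "C": 0.15})` gives 0.80 - 0.15 = 0.65 up to floating-point rounding.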

Activities to probabilities
[Figure labels: Helix activities (output), Strand activities (output), Coil probabilities (calculated), Coil conversion]

Links to servers
- Database of links: http://mmtsb.scripps.edu/cgi-bin/renderrelres?protmodel
- ProfPHD
- PSIPRED
- JPred: www.compbio.dundee.ac.uk/Software/JPred/jpred.html

Practical Conclusion
If you need a secondary structure prediction, use one of the newer ones such as
- ProfPHD,
- PSIPRED, and
- JPred
And not one of the older ones such as
- Chou-Fasman, and
- Garnier