Protein Secondary Structure Prediction G P S Raghava.

Protein Structure Prediction
- Importance
- What is secondary structure
- Assignment of secondary structure (SS)
- Types of SS prediction methods
- Description of various methods
- Role of multiple sequence alignments/profiles
- How to use

Assignment of Secondary Structure
Programs:
- DSSP (Sander Group)
- Stride (Argos Group)
- Pcurve
DSSP states:
- 3 helix states (I = 3, 4, 5)
- 2 sheet states (isolated and extended)
- Irregular regions

dssp
The DSSP program defines secondary structure, geometrical features and solvent exposure of proteins, given atomic coordinates in Protein Data Bank format.
Usage: dssp [-na] [-v] pdb_file [dssp_file]
Output: [Example DSSP output listing; columns not recoverable]

Automatic assignment programs
- DSSP
- STRIDE
DSSP output columns: # RESIDUE AA STRUCTURE BP1 BP2 ACC N-H-->O O-->H-N N-H-->O O-->H-N TCO KAPPA ALPHA PHI PSI X-CA Y-CA Z-CA
[Example DSSP listing for chain A, with residues assigned mainly E (strand, in ladders A and B) and T (turn); numeric columns not recoverable]

Secondary Structure Types
*H = alpha helix
*B = residue in isolated beta-bridge
*E = extended strand, participates in beta ladder
*G = 3-helix (3/10 helix)
*I = 5-helix (pi helix)
*T = hydrogen-bonded turn
*S = bend

Secondary Structure Prediction
What to predict?
- All 8 types, or pool the types into three groups: H, E, C
*H = α-helix
*B = residue in isolated β-bridge
*E = extended strand, participates in β-ladder
*G = 3-helix (3/10 helix)
*I = 5-helix (π-helix)
*T = hydrogen-bonded turn
*S = bend
*C/. = random coil
Diagram labels: Straight HEC, CASP, Q3
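
Pooling the eight DSSP states into H, E and C can be sketched as a lookup table. The two groupings below are assumptions based on common practice (a "straight" scheme that keeps only H and E, and a pooled scheme that merges all helix types and both strand types); the slide itself does not spell out the mappings, and the names `STRAIGHT_HEC`, `POOLED_HEC` and `reduce_states` are illustrative.

```python
# Two illustrative 8-state to 3-state reductions.
# Assumption: groupings follow common practice, not the slide itself.
STRAIGHT_HEC = {"H": "H", "E": "E"}           # every other state becomes C
POOLED_HEC = {"H": "H", "G": "H", "I": "H",   # all helix types become H
              "E": "E", "B": "E"}             # strand and bridge become E


def reduce_states(dssp_string, mapping):
    """Map each DSSP state to H, E or C; states not in `mapping` become C."""
    return "".join(mapping.get(state, "C") for state in dssp_string)


print(reduce_states("HHGGTEEBS ", STRAIGHT_HEC))  # HHCCCEECCC
print(reduce_states("HHGGTEEBS ", POOLED_HEC))    # HHHHCEEECC
```

The choice of reduction changes the reported accuracy, so the same scheme should be applied to every method being compared.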

Types of Secondary Structure Prediction
Information-based classification:
- Property-based methods (manual / subjective)
- Residue-based methods
- Segment- or peptide-based approaches
- Application of multiple sequence alignment
Technical classification:
- Statistical methods
  - Chou & Fasman (1974)
  - GOR
- Artificial intelligence based methods
  - Neural network based methods (1988)
  - Nearest neighbour methods (1992)
  - Hidden Markov model (1993)
  - Support vector machine based methods

How to evaluate a prediction?
The Q3 test:
$$Q_3 = \frac{\text{number of correctly predicted residues}}{\text{number of residues}}$$
Of course, all methods should be tested on the same proteins.
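
As a minimal sketch of the Q3 test, the function below compares a predicted three-state string with the observed one, residue by residue. The function name `q3` and the example strings are made up for illustration.

```python
def q3(predicted, observed):
    """Fraction of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


# 8 of the 10 positions agree, so this prints 0.8.
print(q3("HHHHCCEEEE", "HHHCCCEEEC"))
```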

CHOU-FASMAN ALGORITHM
Calculate the propensity of each amino acid type i for each secondary structure type. A propensity > 1 means that residue type i is likely to be found in the corresponding secondary structure type.
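
The propensity formula is not preserved in the transcript. The sketch below assumes the usual definition: the fraction of residues of type i found in a structure type, divided by the fraction of all residues found in that structure type. The function name `propensities` and the toy data are illustrative only.

```python
from collections import Counter


def propensities(sequence, states):
    """Return {(amino_acid, state): propensity} from two aligned strings.

    Assumed definition: (n_is / n_i) / (N_s / N), where n_is counts residue
    type i in state s, n_i counts residue type i, N_s counts state s and N is
    the total number of residues.
    """
    if len(sequence) != len(states):
        raise ValueError("sequence and states must have the same length")
    total = len(sequence)
    aa_counts = Counter(sequence)
    state_counts = Counter(states)
    pair_counts = Counter(zip(sequence, states))
    result = {}
    for (aa, state), n in pair_counts.items():
        fraction_in_state = n / aa_counts[aa]      # n_is / n_i
        background = state_counts[state] / total   # N_s / N
        result[(aa, state)] = fraction_in_state / background
    return result


# Toy data: A is mostly helical here, so its helix propensity exceeds 1.
table = propensities("AAGGAG", "HHHCCC")
print(round(table[("A", "H")], 3), round(table[("G", "H")], 3))
```

In practice the counts would come from a large set of proteins with known structure rather than from a single short string.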

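Chou-Fasman Rules (Mathews, Van Holde, Ahern)
Table columns: Amino Acid | α-Helix | β-Sheet | Turn
Amino acids listed: Ala, Cys, Leu, Met, Glu, Gln, His, Lys, Val, Ile, Phe, Tyr, Trp, Thr, Gly, Ser, Asp, Asn, Pro, Arg
Row groups: Favors α-Helix; Favors β-Sheet; Favors Turns
[Propensity values not recoverable]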

Chou-Fasman
First widely used procedure.
- If the propensity in a window of six residues (for a helix) is above a certain threshold, helix is chosen as the secondary structure.
- If the propensity in a window of five residues (for a beta strand) is above a certain threshold, beta strand is chosen.
- The segment is extended until the average propensity in a 4-residue window falls below a threshold value.
Output: helix, strand or turn.
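
The nucleate-and-extend idea can be sketched for the helix case as follows. This is a simplified reading of the procedure, not the full Chou-Fasman rule set: the thresholds (`nucleate`, `extend`), the way the 4-residue extension window is positioned, and the toy propensity values are all assumptions made for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def find_helix(sequence, propensity, window=6, nucleate=1.03, extend=1.0):
    """Mark residues covered by helix segments found by nucleation + extension.

    `propensity` maps each amino acid to its helix propensity.
    Thresholds are illustrative assumptions.
    """
    scores = [propensity[aa] for aa in sequence]
    n = len(scores)
    marked = [False] * n
    for start in range(n - window + 1):
        # Nucleation: average propensity of the window must reach the threshold.
        if mean(scores[start:start + window]) < nucleate:
            continue
        left, right = start, start + window  # half-open interval [left, right)
        # Extend right while the 4-residue window ending at the new residue holds.
        while right < n and mean(scores[right - 3:right + 1]) >= extend:
            right += 1
        # Extend left while the 4-residue window starting at the new residue holds.
        while left > 0 and mean(scores[left - 1:left + 3]) >= extend:
            left -= 1
        for i in range(left, right):
            marked[i] = True
    return "".join("H" if m else "-" for m in marked)


# Made-up propensities, for illustration only.
toy = {"A": 1.4, "G": 0.2}
print(find_helix("GGGGAAAAAAGGGG", toy))  # ---HHHHHHHH---
```

Because extension works on window averages, the predicted segment can run one residue past the strong helix formers on each side, as it does in this toy example.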

GOR method
Garnier, Osguthorpe & Robson
- Assumes that amino acids up to 8 residues on each side influence the secondary structure (SS) of the central residue.
- The frequency of each amino acid at the central position in the window, and at each offset from -8 to +8, is determined for α, β and turns (later also other, or coil) to give three 17 x 20 scoring matrices.
- Calculates the score that the central residue is in one type of SS and not another.
- Correctly predicts ~64% of residues.

$$I(S_j; R_1, R_2, \ldots, R_{\text{last}}) \simeq \sum_{m=-8}^{+8} I(S_j; R_{j+m})$$
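
The sum over window offsets can be sketched as below. The nested table layout `info[state][m][aa]`, the function name `gor_predict` and the toy values are assumptions for illustration; real GOR matrices are estimated from proteins of known structure.

```python
def gor_predict(sequence, info, states="HEC", half_window=8):
    """Predict one state per residue by summing per-offset information values.

    info[state][m][aa] is the information that residue `aa` at offset `m`
    (from -half_window to +half_window) carries about `state` at the centre.
    Missing entries, and offsets that fall outside the sequence, count as 0.
    """
    prediction = []
    for j in range(len(sequence)):
        totals = {}
        for state in states:
            total = 0.0
            for m in range(-half_window, half_window + 1):
                k = j + m
                if 0 <= k < len(sequence):
                    total += info.get(state, {}).get(m, {}).get(sequence[k], 0.0)
            totals[state] = total
        # Choose the state with the highest summed information.
        prediction.append(max(states, key=lambda s: totals[s]))
    return "".join(prediction)


# Made-up values: an A one position before the centre (m = -1) supports helix.
toy_info = {
    "H": {0: {"A": 1.0}, -1: {"A": 2.0}},
    "E": {0: {"V": 1.0}},
    "C": {0: {"G": 1.0}},
}
print(gor_predict("AVG", toy_info))  # HHC
```

In the toy run the central V would favour E on its own, but the neighbouring A outweighs it, which is the point of summing over the window.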

Artificial Neural Network
General structure of an ANN:
- One input layer
- One or more hidden layers
- One output layer

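Architecture
[Figure: a window (I K E E H V I I Q A E) slides along the sequence IKEEHVIIQAEFYLNPDQSGEF….. and feeds the input layer; weights connect the input layer to the hidden layer and the hidden layer to the output layer, which has units H, E and C.]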
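
One common way to present a sequence window to the input layer is one-hot encoding, sketched below. The window length of 11 matches the residues shown in the figure, but the 21-unit layout (20 amino acids plus one padding/unknown unit) and the name `encode_window` are assumptions, not details taken from the slide.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
UNITS_PER_POSITION = len(AMINO_ACIDS) + 1  # last unit marks padding/unknown


def encode_window(sequence, centre, window=11):
    """One-hot encode the residues in a window centred on index `centre`."""
    half = window // 2
    vector = []
    for k in range(centre - half, centre + half + 1):
        units = [0] * UNITS_PER_POSITION
        inside = 0 <= k < len(sequence)
        index = AMINO_ACIDS.find(sequence[k]) if inside else -1
        if index >= 0:
            units[index] = 1   # known amino acid
        else:
            units[-1] = 1      # past the sequence end, or unknown residue
        vector.extend(units)
    return vector


seq = "IKEEHVIIQAEFYLNPDQSGEF"
inputs = encode_window(seq, centre=5)
print(len(inputs), sum(inputs))  # 231 11
```

Each window position contributes exactly one active unit, so the input vector for an 11-residue window has 231 entries with 11 ones.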

Secondary Structure Prediction
Application of multiple sequence alignment:
- Segment based (-8 to +8 residues)
- Input is a multiple alignment instead of a single sequence
- Application of PSI-BLAST
Current methods (a combination of):
- Segment based
- Neural network
- Multiple sequence alignment (PSI-BLAST)
- Combination of neural network + nearest neighbour method

PSIPRED
Reliability numbers:
- Used by many methods.
- Correlate with accuracy.
- They are the ANN's way of indicating how confident it is in an assignment.

PSIPRED
- Uses multiple aligned sequences for prediction.
- Uses a training set of folds with known structure.
- Uses a two-stage neural network to predict structure based on position-specific scoring matrices generated by PSI-BLAST (Jones, 1999).
  - The first network converts a window of 15 amino acids into a raw score of h (helix), e (sheet), c (coil) or terminus.
  - The second network filters the first output. For example, an output of hhhhehhhh might be converted to hhhhhhhhh.
- Can obtain a Q3 value of 70-78% (may be the highest achievable).
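
To illustrate what filtering a first-stage output means, the sketch below uses a simple sliding majority vote. This is a rule-based stand-in chosen for clarity, not PSIPRED's second network, which is a trained neural network; the function name `smooth` and the window size are assumptions.

```python
from collections import Counter


def smooth(states, window=3):
    """Replace each state by the strict-majority state of its neighbourhood.

    If no state holds a strict majority, the original call is kept.
    """
    half = window // 2
    out = []
    for i, current in enumerate(states):
        neighbourhood = states[max(0, i - half):i + half + 1]
        top, count = Counter(neighbourhood).most_common(1)[0]
        out.append(top if count > len(neighbourhood) / 2 else current)
    return "".join(out)


print(smooth("hhhhehhhh"))  # hhhhhhhhh
print(smooth("hhhheeee"))   # hhhheeee
```

The isolated e inside the helix is removed, while a genuine helix-to-strand boundary is left alone.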

CAFASP3 in the Spotlight of EVA. PROTEINS: Structure, Function, and Genetics (2003) 53:548