Protein Secondary Structures


1 Protein Secondary Structures
Assignment and prediction
Pernille Haste Andersen

2 Outline
What is protein secondary structure?
How can it be used?
Different prediction methods:
Alignment to homologues
Propensity methods
Neural networks
Evaluation of prediction methods
Links to prediction servers

3 Secondary Structure Elements
β-strand
Helix
Bend
Turn

4 Use of secondary structure
Classification of protein structures
Definition of loops (active sites)
Use in fold recognition methods
Improvement of alignments
Definition of domain boundaries

5 Classification of secondary structure
Defining features:
Dihedral angles
Hydrogen bonds
Geometry
Assigned manually by crystallographers, or automatically by:
DSSP (Kabsch & Sander, 1983)
STRIDE (Frishman & Argos, 1995)
DSSPcont (Andersen et al., 2002)

6 Dihedral Angles
phi: dihedral angle about the N-Cα bond
psi: dihedral angle about the Cα-C bond
omega: dihedral angle about the C-N (peptide) bond
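For a concrete example, the backbone dihedrals can be computed directly from atomic coordinates. A minimal sketch using Biopython (the file name 1abc.pdb is a placeholder for a local structure file):

import math
from Bio.PDB import PDBParser, PPBuilder

# Parse a structure from a local PDB file (file name is a placeholder).
structure = PDBParser(QUIET=True).get_structure("query", "1abc.pdb")

# Build peptides and extract (phi, psi) pairs; Biopython reports them in radians.
for peptide in PPBuilder().build_peptides(structure):
    for residue, (phi, psi) in zip(peptide, peptide.get_phi_psi_list()):
        phi_deg = math.degrees(phi) if phi is not None else None  # None at chain termini
        psi_deg = math.degrees(psi) if psi is not None else None
        print(residue.get_resname(), phi_deg, psi_deg)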

7 Helices
Helix type and hydrogen-bond pattern (i to i+n):
alpha-helix: i+4
pi-helix: i+5
3_10 helix: i+3
(omega = 180 deg)

8 Beta Strands
Antiparallel and parallel beta strands, characterized by their typical phi, psi and omega dihedral values.

9 Secondary Structure Elements
β-strand
Helix
Bend
Turn

10 Secondary Structure Type Descriptions
* H = alpha helix
* G = 3_10 helix
* I = 5 helix (pi helix)
* E = extended strand, participates in beta ladder
* B = residue in isolated beta-bridge
* T = hydrogen bonded turn
* S = bend
* C = coil

11 Automatic assignment programs
DSSP
STRIDE
DSSPcont
The Protein Data Bank visualizes DSSP assignments on structures in the database.
DSSP output lists one line per residue with the columns:
# RESIDUE AA STRUCTURE BP1 BP2 ACC N-H-->O O-->H-N N-H-->O O-->H-N TCO KAPPA ALPHA PHI PSI X-CA Y-CA Z-CA
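A sketch of obtaining DSSP assignments programmatically through Biopython's DSSP wrapper (assumes the dssp/mkdssp executable is installed and that 1abc.pdb is a placeholder local file):

from Bio.PDB import PDBParser
from Bio.PDB.DSSP import DSSP

# Parse the first model of a structure (file name is a placeholder).
structure = PDBParser(QUIET=True).get_structure("query", "1abc.pdb")
model = structure[0]

# Run the external DSSP program on the model; requires dssp/mkdssp on the PATH.
dssp = DSSP(model, "1abc.pdb")

# Each entry holds (dssp index, amino acid, secondary structure code, relative ASA, phi, psi, ...).
for key in list(dssp.keys())[:10]:
    aa = dssp[key][1]
    ss = dssp[key][2]   # one of H, G, I, E, B, T, S, or '-' for coil
    print(key, aa, ss)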

12 Secondary Structure Prediction
What to predict? All 8 types, or pool types into groups (DSSP Q3 reduction):
* H = alpha helix -> H
* G = 3_10 helix -> H
* I = 5 helix (pi helix) -> H
* E = extended strand -> E
* B = beta-bridge -> E
* T = hydrogen bonded turn -> C
* S = bend -> C
* C = coil -> C

13 Secondary Structure Prediction
What to predict? All 8 types, or pool types into groups (straight HEC reduction, Q3):
* H = alpha helix -> H
* E = extended strand -> E
* T = hydrogen bonded turn -> C
* S = bend -> C
* C = coil -> C
* G = 3_10 helix -> C
* I = 5 helix (pi helix) -> C
* B = beta-bridge -> C
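A small sketch of the two reductions as lookup tables. The "DSSP Q3" grouping below follows the common convention of pooling G and I with H, and B with E; treat the exact grouping as an assumption rather than the slide's definitive mapping.

# Two common ways to pool the 8 DSSP states into 3 classes (Q3).
DSSP_Q3 = {            # helical states pooled with H, bridge pooled with E
    "H": "H", "G": "H", "I": "H",
    "E": "E", "B": "E",
    "T": "C", "S": "C", "C": "C",
}

STRAIGHT_HEC = {       # only H and E kept; everything else becomes coil
    "H": "H", "E": "E",
    "G": "C", "I": "C", "B": "C", "T": "C", "S": "C", "C": "C",
}

def reduce_states(eight_state: str, mapping: dict) -> str:
    """Convert an 8-state DSSP string to a 3-state H/E/C string."""
    return "".join(mapping.get(s, "C") for s in eight_state)

print(reduce_states("HHHGGGTTEEEBSS", DSSP_Q3))       # HHHHHHCCEEEECC
print(reduce_states("HHHGGGTTEEEBSS", STRAIGHT_HEC))  # HHHCCCCCEEECCC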

14 Secondary Structure Prediction
Simple alignments: align to a close homolog for which the structure has been experimentally solved.
Heuristic methods (e.g., Chou-Fasman, 1974): apply scores for each amino acid and sum up over a window.
Neural networks:
Raw sequence (late 80's)
Blosum matrix (e.g., PHD, early 90's)
Position-specific alignment profiles (e.g., PSIPRED, late 90's)
Multiple-network balloting, probability conversion, output expansion (Petersen et al., 2000)

15 Improvement of accuracy
1974 Chou & Fasman ~50-53%
1978 Garnier 63%
1987 Zvelebil 66%
1988 Qian & Sejnowski 64.3%
1993 Rost & Sander
1997 Frishman & Argos <75%
1999 Cuff & Barton 72.9%
1999 Jones
2000 Petersen et al. 80%

16 Simple Alignments
A solved structure of a homolog to the query is needed.
Homologous proteins have ~88% identical (3-state) secondary structure.
If no close homologue can be identified, alignments will give almost random results.
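As an illustration of the idea, a minimal sketch that aligns the query to a homolog of known structure and copies the homolog's 3-state string across aligned positions. The sequences and the homolog's structure string are made-up examples, and the gap penalties are just reasonable defaults.

from Bio import Align
from Bio.Align import substitution_matrices

# Toy example: query of unknown structure, homolog with a known 3-state string.
query = "IKEEHVIIQAEFYLNPDQSGEF"
homolog = "IKEDHVLIQAEFYLNPDESGEF"
homolog_ss = "EEEEEEEECCCHHHHHHHHCCC"  # made-up assignment, same length as homolog

aligner = Align.PairwiseAligner()
aligner.substitution_matrix = substitution_matrices.load("BLOSUM62")
aligner.open_gap_score = -10
aligner.extend_gap_score = -0.5

alignment = aligner.align(query, homolog)[0]

# Copy the homolog's state onto each aligned query residue; unaligned positions stay coil.
predicted = ["C"] * len(query)
for (q_start, q_end), (h_start, h_end) in zip(*alignment.aligned):
    for offset in range(q_end - q_start):
        predicted[q_start + offset] = homolog_ss[h_start + offset]

print("".join(predicted))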

17 Propensities: Amino acid preferences in α-helix
(Figure: per-residue helix propensities, including helix-capping preferences)

18 Propensities: Amino acid preferences in β-strand

19 Propensities: Amino acid preferences in coil

20 Chou-Fasman propensities
Table of Chou-Fasman parameters for the 20 amino acids (Ala, Arg, Asp, Asn, Cys, Glu, Gln, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Trp, Tyr, Val): helix propensity P(a), strand propensity P(b), turn propensity P(turn), and turn-position frequencies f(i), f(i+1), f(i+2), f(i+3).

21 Chou-Fasman
Generally applicable: works for sequences with no solved homologs.
But the accuracy is low!
The problem is that the method does not use enough information about the structural context of a residue.
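To make the window idea concrete, here is a minimal sketch of propensity-based scoring in the spirit of Chou-Fasman: each residue carries per-state propensities, and the scores are averaged over a sliding window. The propensity values below are a small illustrative subset with rough placeholder numbers, not the published parameter set.

# Illustrative helix (P_a) and strand (P_b) propensities for a few residues;
# the values are rough placeholders, not the published Chou-Fasman parameters.
PROPENSITY = {
    "A": (1.4, 0.8), "E": (1.5, 0.3), "L": (1.2, 1.3), "V": (1.1, 1.7),
    "G": (0.6, 0.8), "P": (0.6, 0.6), "I": (1.1, 1.6), "K": (1.2, 0.7),
}
DEFAULT = (1.0, 1.0)  # neutral propensity for residues not listed above

def window_scores(sequence: str, window: int = 7):
    """Average helix/strand propensities over a sliding window centered on each residue."""
    half = window // 2
    for i in range(len(sequence)):
        region = sequence[max(0, i - half): i + half + 1]
        helix = sum(PROPENSITY.get(aa, DEFAULT)[0] for aa in region) / len(region)
        strand = sum(PROPENSITY.get(aa, DEFAULT)[1] for aa in region) / len(region)
        # Call helix or strand only if the averaged propensity favors it; otherwise coil.
        state = "H" if helix > max(strand, 1.0) else "E" if strand > 1.0 else "C"
        yield sequence[i], helix, strand, state

for aa, h, e, state in window_scores("IKEEHVIIQAEFYLNPDQSGEF"):
    print(aa, round(h, 2), round(e, 2), state)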

22 Neural Networks
Benefits:
Generally applicable
Can capture higher-order correlations
Can take inputs other than sequence information
Drawbacks:
Needs a large amount of data (different solved structures); however, today nearly 2500 structures with low sequence identity and high resolution are solved
Complex method with several pitfalls

23 Architecture
(Figure: feed-forward network. A sliding window over the sequence IKEEHVIIQAEFYLNPDQSGEF... feeds the input layer, which connects through weighted links to a hidden layer and then to an output layer with one unit per class: H, E, C.)
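A minimal sketch of this architecture as a NumPy forward pass: a window of encoded residues goes in, a hidden layer transforms it, and three output units give H/E/C scores. The window size, hidden-layer size and random weights below are placeholders; a real predictor learns the weights from solved structures.

import numpy as np

WINDOW = 13                  # residues seen by the network at a time (placeholder size)
INPUTS_PER_POSITION = 20     # one input per amino acid type (sparse encoding, next slides)
HIDDEN = 40                  # hidden units (placeholder size)
CLASSES = ["H", "E", "C"]

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(WINDOW * INPUTS_PER_POSITION, HIDDEN))  # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(HIDDEN, len(CLASSES)))                  # hidden -> output weights

def forward(x: np.ndarray) -> np.ndarray:
    """One forward pass: window encoding in, H/E/C scores out."""
    hidden = np.tanh(x @ W1)               # hidden-layer activations
    scores = np.exp(hidden @ W2)
    return scores / scores.sum()           # normalize to a probability-like output

x = rng.random(WINDOW * INPUTS_PER_POSITION)   # stand-in for an encoded window
probabilities = forward(x)
print(dict(zip(CLASSES, probabilities.round(3))))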

24 Sparse encoding
(Table: one input neuron per amino acid type A, R, N, D, C, Q, E, ...; each residue sets exactly one of the 20 input neurons to 1 and the rest to 0.)

25 Input Layer
(Example: the window I K E E H V I I Q A E encoded with sparse encoding; each position contributes a 20-element vector with a single 1 at the index of its amino acid.)
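A small sketch of that encoding. The residue ordering ARNDCQEGHILKMFPSTWYV is an assumption; it only needs to be used consistently.

import numpy as np

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def sparse_encode(window: str) -> np.ndarray:
    """One-hot encode a sequence window: 20 inputs per position, a single 1 per residue."""
    x = np.zeros((len(window), len(AMINO_ACIDS)))
    for pos, aa in enumerate(window):
        x[pos, AMINO_ACIDS.index(aa)] = 1.0
    return x.ravel()          # flatten to one input vector for the network

encoded = sparse_encode("IKEEHVIIQAE")
print(encoded.shape)          # (220,) = 11 positions x 20 amino acids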

26 BLOSUM 62
(Substitution matrix with one row and column per amino acid: A R N D C Q E G H I L K M F P S T W Y V B Z X *; the entries are log-odds substitution scores.)

27 Input Layer
(Example: the window I K E E H V I I Q A E encoded with BLOSUM 62; each position contributes the 20 substitution scores of its amino acid's BLOSUM row instead of a single 1.)
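A sketch of this BLOSUM-based encoding using the matrices bundled with Biopython (a minimal example, assuming Biopython is installed):

import numpy as np
from Bio.Align import substitution_matrices

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
BLOSUM62 = substitution_matrices.load("BLOSUM62")

def blosum_encode(window: str) -> np.ndarray:
    """Encode each window position by its amino acid's row of BLOSUM 62 scores."""
    rows = [[BLOSUM62[aa, other] for other in AMINO_ACIDS] for aa in window]
    return np.array(rows, dtype=float).ravel()

encoded = blosum_encode("IKEEHVIIQAE")
print(encoded.shape)     # (220,) = 11 positions x 20 scores
print(encoded[:5])       # BLOSUM row of I scored against A, R, N, D, C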

28 Secondary networks (Structure-to-Structure)
(Figure: a second feed-forward network. Instead of amino acids, its input window contains the H/E/C outputs of the first network along the sequence IKEEHVIIQAEFYLNPDQSGEF...; it again outputs H, E or C for the central position, filtering the first-level predictions.)
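A hedged sketch of the structure-to-structure input: the second network sees a window of class probabilities produced by the first network rather than residues. The window size, toy probabilities and the simple averaging stand-in below are illustrative assumptions, not the published setup.

import numpy as np

# First-level output: one (pH, pE, pC) triplet per residue (toy values).
first_level = np.array([
    [0.7, 0.1, 0.2], [0.8, 0.1, 0.1], [0.2, 0.1, 0.7],
    [0.6, 0.2, 0.2], [0.7, 0.2, 0.1], [0.1, 0.7, 0.2],
])

WINDOW = 3   # window of first-level predictions fed to the second network
half = WINDOW // 2

# Pad the ends with uniform probabilities so terminal residues also get a full window.
padded = np.vstack([np.full((half, 3), 1 / 3), first_level, np.full((half, 3), 1 / 3)])

for i in range(len(first_level)):
    x = padded[i:i + WINDOW].ravel()   # 3 positions x 3 classes = 9 inputs to the second network
    # Here the second network's weights would be applied; as a stand-in,
    # average the window and pick the majority class (a crude smoothing filter).
    smoothed = padded[i:i + WINDOW].mean(axis=0)
    print(i, "HEC"[int(np.argmax(smoothed))], x.shape)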

29 PHD method (Rost and Sander)
Combines neural networks with sequence profiles: 6-8 percentage points increase in prediction accuracy over standard neural networks
Uses a second-layer "structure-to-structure" network to filter predictions
Jury of predictors
Set up as a mail server

30 PSIPRED (Jones)
Uses alignments from iterative sequence searches (PSI-BLAST) as input to a neural network
Better predictions due to better sequence profiles
Available as a stand-alone program and via the web

31 Position specific scoring matrices (PSI-BLAST profiles)
(Table: one row per sequence position, 1 I, 2 K, 3 E, 4 E, 5 H, 6 V, 7 I, 8 I, 9 Q, 10 A, 11 E, 12 F, 13 Y, 14 L, 15 N, 16 P, 17 D, ..., and one column per amino acid A R N D C Q E G H I L K M F P S T W Y V, holding position-specific substitution scores.)
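A sketch of generating such a profile with NCBI BLAST+ from Python. It assumes psiblast is installed and a formatted database is available; query.fasta, nr, query.pssm and query.blast are placeholder names.

import subprocess

# Run an iterative PSI-BLAST search and write the position-specific scoring matrix.
subprocess.run(
    [
        "psiblast",
        "-query", "query.fasta",        # placeholder query file
        "-db", "nr",                    # placeholder sequence database
        "-num_iterations", "3",
        "-out_ascii_pssm", "query.pssm",
        "-out", "query.blast",
    ],
    check=True,
)

# The ASCII PSSM has a header, then one line per position: 20 log-odds scores
# followed by 20 weighted observed percentages; keep the first 20 numbers per row.
profile = []
with open("query.pssm") as handle:
    for line in handle:
        fields = line.split()
        if len(fields) >= 22 and fields[0].isdigit():
            profile.append([int(v) for v in fields[2:22]])

print(len(profile), "positions x", len(profile[0]), "scores")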

32 Several different architectures
Sequence-to-structure networks:
Window sizes 15, 17, 19 and 21
Hidden units 50 and 75
10-fold cross validation => 4 x 2 x 10 = 80 predictions
Structure-to-structure networks:
Window size 17
Hidden units 40
10-fold cross validation => 800 predictions
Example outputs: C C H H C C C and C C C C C C C

33 The majority rules
Combining predictions from several networks improves the prediction.
Combinations of 800 different networks were used in the method described by Petersen TN et al. (2000), "Prediction of protein secondary structure at 80% accuracy", Proteins.
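A minimal sketch of combining an ensemble: average the per-class outputs of many networks before picking the state. The numbers are toy values and only three networks are used instead of 800.

import numpy as np

CLASSES = "HEC"

# Per-residue class activities from three toy networks (rows: residues, columns: H, E, C).
network_outputs = [
    np.array([[0.6, 0.2, 0.2], [0.5, 0.3, 0.2], [0.2, 0.6, 0.2]]),
    np.array([[0.7, 0.1, 0.2], [0.3, 0.4, 0.3], [0.1, 0.7, 0.2]]),
    np.array([[0.5, 0.2, 0.3], [0.4, 0.3, 0.3], [0.3, 0.5, 0.2]]),
]

# Ensemble average: mean activity per class across networks, then argmax per residue.
mean_activity = np.mean(network_outputs, axis=0)
consensus = "".join(CLASSES[i] for i in np.argmax(mean_activity, axis=1))
print(consensus)   # HHE for the toy numbers above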

34 Activities to probabilities
(Figure: conversion curves mapping raw helix and strand network activities (output) to probabilities; coil probabilities are not predicted directly but calculated from the helix and strand values.)
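A hedged sketch of one way to do this conversion. The clipping, rescaling and coil-by-subtraction rule below are assumptions for illustration, not the calibration used by Petersen et al.

import numpy as np

def activities_to_probabilities(helix_act: float, strand_act: float) -> tuple:
    """Map raw helix/strand activities in [0, 1] to H/E/C probabilities.

    Helix and strand activities are clipped and, if needed, rescaled so they
    sum to at most 1; the coil probability is whatever remains.
    """
    h = float(np.clip(helix_act, 0.0, 1.0))
    e = float(np.clip(strand_act, 0.0, 1.0))
    if h + e > 1.0:                       # rescale so the probabilities stay valid
        h, e = h / (h + e), e / (h + e)
    c = 1.0 - h - e                       # coil probability is calculated, not predicted
    return h, e, c

print(activities_to_probabilities(0.80, 0.10))   # (0.8, 0.1, 0.1)
print(activities_to_probabilities(0.90, 0.30))   # rescaled: (0.75, 0.25, 0.0)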

35 Benchmarking secondary structure predictions
EVA: newly solved structures are sent to prediction servers every week.
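Benchmarks like EVA report Q3, the fraction of residues whose 3-state prediction matches the observed state. A small sketch:

def q3(predicted: str, observed: str) -> float:
    """Per-residue 3-state accuracy: fraction of positions where prediction matches observation."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and observation must have the same length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

print(q3("HHHHCCEEEC", "HHHCCCEEEE"))   # 0.8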

36 EVA results (Rost et al., 2001)
PROFphd 77.0%
PSIPRED 76.8%
SAM-T99sec 76.1%
SSpro
Jpred
PHD
Cubic.columbia.edu/eva

37 Links to servers
Several links:
ProfPHD
PSIPRED
JPred

38 Practical Conclusions
If you need a secondary structure prediction, use one of the newer methods based on advanced machine learning, such as:
ProfPHD
PSIPRED
JPred
and not one of the older ones, such as:
Chou-Fasman
Garnier


Download ppt "Protein Secondary Structures"

Similar presentations


Ads by Google