Lecture 14 Secondary Structure Prediction

Slides:



Advertisements
Similar presentations
Review.
Advertisements

Amino Acids PHC 211.  Characteristics and Structures of amino acids  Classification of Amino Acids  Essential and Nonessential Amino Acids  Levels.
A Ala Alanine Alanine is a small, hydrophobic
Secondary structure prediction from amino acid sequence.
Review of Basic Principles of Chemistry, Amino Acids and Proteins Brian Kuhlman: The material presented here is available on the.
Structure Prediction (I): Secondary structure Structure Prediction (I): Secondary structure DNA/Protein structure-function analysis and prediction Lecture.
1 Lesson 5 Protein Prediction and Classification.
Secondary Structure Prediction Victor A. Simossis Bioinformatics Center IBIVU.
Protein Secondary Structures
Structure-Function Analysis 11 Jan DNA/Protein structure-function analysis and prediction Basics of Protein Structure: –Short Introduction to Molecular.
Protein-a chemical view A chain of amino acids folded in 3D Picture from on-line biology bookon-line biology book Peptide Protein backbone N / C terminal.
1 Levels of Protein Structure Primary to Quaternary Structure.
Applied Bioinformatics The amino acids. Overview Proteins (sneak preview) – Primary structure – Secondary structure – Tertiary structure The amino acids.
Structure Prediction. Tertiary protein structure: protein folding Three main approaches: [1] experimental determination (X-ray crystallography, NMR) [2]
Sequence analysis June 18, 2008 Learning objectives-Understand the concept of sliding window programs. Understand difference between identity, similarity.
Protein Secondary Structures Assignment and prediction Pernille Haste Andersen
C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E Master Course DNA/Protein Structure- function Analysis and Prediction Lecture 7.
Computing for Bioinformatics Lecture 8: protein folding.
©CMBI 2001 A Ala Alanine Alanine is a small, hydrophobic residue. Its side chain, R, is just a methyl group. Alanine likes to sit in an alpha helix,it.
©CMBI 2006 Amino Acids “ When you understand the amino acids, you understand everything ”
You Must Know How the sequence and subcomponents of proteins determine their properties. The cellular functions of proteins. (Brief – we will come back.
Proteins account for more than 50% of the dry mass of most cells
Proteins account for more than 50% of the dry mass of most cells
Proteins Secondary Structure Predictions Structural Bioinformatics.
©CMBI 2006 Amino Acids “ When you understand the amino acids, you understand everything ”
STRUCTURAL ORGANIZATION
Protein Secondary Structure Prediction Some of the slides are adapted from Dr. Dong Xu’s lecture notes.
Lecture 15 Secondary Structure Prediction Bioinformatics Center IBIVU.
Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices Yan Liu Sep 29, 2003.
Protein Secondary Structure Prediction
Secondary structure prediction
Protein Structure Prediction
Protein Secondary Structure Prediction G P S Raghava.
Protein Structure Prediction ● Why ? ● Type of protein structure predictions – Sec Str. Pred – Homology Modelling – Fold Recognition – Ab Initio ● Secondary.
Amino Acids ©CMBI 2001 “ When you understand the amino acids, you understand everything ”
Proteins.
Chapter 3 Proteins.
Protein Structure and Bioinformatics. Chapter 2 What is protein structure? What are proteins made of? What forces determines protein structure? What is.
Introduction to Bioinformatics Lecture 13 Protein Secondary Structure Elements and their Prediction IBIVU Centre C E N T R F O R I N T E G R A T I V E.
Protein structure prediction Haixu Tang School of Informatics.
Proteins Structure Predictions Structural Bioinformatics.
Proteins Tertiary Protein Structure of Enzyme Lactasevideo Video 2.
Amino acids.
Protein Folding Notes.
Proteins account for more than 50% of the dry mass of most cells
Protein Structure and Properties
Protein Structure September 7,
Protein Sequence Alignments
Proteins.
Conformationally changed Stability
Introduction to Bioinformatics II
Proteins account for more than 50% of the dry mass of most cells
Chemistry 121 Winter 2016 Introduction to Organic Chemistry and Biochemistry Instructor Dr. Upali Siriwardane (Ph.D. Ohio State)
Chapter 3 Proteins.
Fig. 5-UN1  carbon Amino group Carboxyl group.
A Ala Alanine Alanine is a small, hydrophobic
Amino acids R-groups non-polar polar acidic basic proteins
Amino acids R-groups non-polar polar acidic basic proteins
Proteins account for more than 50% of the dry mass of most cells
Proteins Genetic information in DNA codes specifically for the production of proteins Cells have thousands of different proteins, each with a specific.
Conformationally changed Stability
The 20 amino acids.
Levels of Protein Structure
The 20 amino acids.
The Chemical Building Blocks of Life
Example of regression by RBF-ANN
Proteins Proteins have many structures, resulting in a wide range of functions Proteins do most of the work in cells and act as enzymes 2. Proteins are.
Lecture 10 Secondary Structure Prediction
“When you understand the amino acids,
The Three-Dimensional Structure of Proteins
Presentation transcript:

Lecture 14 Secondary Structure Prediction F O I G A V B M S U Lecture 14 Secondary Structure Prediction Bioinformatics Center IBIVU

Protein structure

Linus Pauling (1951) Atomic Coordinates and Structure Factors for Two Helical Configurations of Polypeptide Chains Alpha-helix

James Watson & Francis Crick (1953) Molecular structure of nucleic acids

James Watson & Francis Crick (1953) Molecular structure of nucleic acids

The Building Blocks (proteins) Proteins consist of chains of amino acids Bound together through the peptide bond Special folding of the chain yields structure Structure determines the function

Chains of amino acids

Three-dimensional Structures Four levels of protein architecture

Amino acids classes Hydrophobic aminoacids Charged aminoacids Alanine Ala A Valine Val V Phenylalanine Phe F Isoleucine Ile I Leucine Leu L Proline Pro P Methionine Met M Charged aminoacids Aspartate (-) Asp D Glutamate (-) Glu E Lysine (+) Lys K Arginine (+) Arg R Polar aminoacids Serine Ser S Threonine Thr T Tyrosine Tyr Y Cysteine Cys C Asparagine Asn N Glutamine Gln Q Histidine His H Tryptophane Trp W Glycine (sidechain is only a hydrogen) Glycine Gly G

Disulphide bridges Two cysteines can form disulphide bridges Anchoring of secondary structure elements

Ramachandran plot Only certain combinations of values of phi (f) and psi (y) angles are observed psi psi phi omega phi

Motifs of protein structure Global structural characteristics: Outside hydrophylic, inside hydrophobic (unless…) Often globular form (unless…) Artymiuk et al, Structure of Hen Egg White Lysozyme (1981)

Secondary structure elements Alpha-helix Beta-strand

Renderings of proteins Irving Geis:

Renderings of proteins Jane Richardson (1981):

Alpha helix Hydrogen bond: from N-H at position n, to C=O at position n-4 (‘n-n+4’)

Other helices Alternative helices are also possible 310-helix: hydrogen bond from N-H at position n, to C=O at position n-3 Bigger chance of bad contacts a-helix: hydrogen bond from N-H at position n, to C=O at position n-4 p-helix: hydrogen bond from N-H at position n, to C=O at position n-5 structure more open: no contacts Hollow in the middle too small for e.g. water At the edge of the Ramachandran plot

Helices Backbone hydrogenbridges form the structure Directed through hydrophobic center of protein Sidechains point outwards Possibly: one side hydrophobic, one side hydrophylic

Globin fold Common theme 8 helices (ABCDEFGH), short loops Still much variation (16 – 99 % similarity) Helix length Exact position Shift through the ridges

Beta-strands form beta-sheets Beta-strands next to each other form hydrogen bridges Sidechains alternating (up, down)

Parallel or Antiparallel sheets Usually only parallel or anti-parallel Occasionally mixed

Beta structures barrels propeller like structure beta helix up-and-down barrels greek key barrels jelly roll barrels propeller like structure beta helix

Greek key barrels Greek key motif occurs also in barrels two greek keys (g crystallin) combination greek key / up-and-down

Turns and motifs Secondary structure elements are connected by loops Very short loops between twee b-strands: turn Different secundary structure elementen often appear together: motifs Helix-turn-helix Calcium binding motif Hairpin Greek key motif b-a-b-motif

Helix-turn-helix motif Helix-turn-helix important for DNA recognition by proteins EF-hand: calcium binding motif

Hairpin / Greek key motif Different possible hairpins : type I/II Greek key: anti-parallel beta-sheets

b-a-b motif Most common way to obtain parallel b-sheets Usually the motif is ‘right-handed’

Domains formed by motifs Within protein different domains can be identified For example: ligand binding domain DNA binding domain Catalytic domain Domains are built from motifs of secundary structure elements Domains often are a functional unit of proteins

Protein structure summary Amino acids form polypeptide chains Chains fold into three-dimensional structure Specific backbone angles are permitted or not: Ramachandran plot Secundary structure elements: a-helix, b-sheet Common structural motifs: Helix-turn-helix, Calcium binding motif, Hairpin, Greek key motif, b-a-b-motif Combination of elements and motifs: tertiary structure Many protein structures available: Protein Data Bank (PDB)

Now we go into predicting Secondary Structure Elements

Protein primary structure 20 amino acid types A generic residue Peptide bond SARS Protein From Staphylococcus Aureus 1 MKYNNHDKIR DFIIIEAYMF RFKKKVKPEV 31 DMTIKEFILL TYLFHQQENT LPFKKIVSDL 61 CYKQSDLVQH IKVLVKHSYI SKVRSKIDER 91 NTYISISEEQ REKIAERVTL FDQIIKQFNL 121 ADQSESQMIP KDSKEFLNLM MYTMYFKNII 151 KKHLTLSFVE FTILAIITSQ NKNIVLLKDL 181 IETIHHKYPQ TVRALNNLKK QGYLIKERST 211 EDERKILIHM DDAQQDHAEQ LLAQVNQLLA 241 DKDHLHLVFE

Protein secondary structure Alpha-helix Beta strands/sheet SARS Protein From Staphylococcus Aureus 1 MKYNNHDKIR DFIIIEAYMF RFKKKVKPEV DMTIKEFILL TYLFHQQENT SHHH HHHHHHHHHH HHHHHHTTT SS HHHHHHH HHHHS S SE 51 LPFKKIVSDL CYKQSDLVQH IKVLVKHSYI SKVRSKIDER NTYISISEEQ EEHHHHHHHS SS GGGTHHH HHHHHHTTS EEEE SSSTT EEEE HHH 101 REKIAERVTL FDQIIKQFNL ADQSESQMIP KDSKEFLNLM MYTMYFKNII HHHHHHHHHH HHHHHHHHHH HTT SS S SHHHHHHHH HHHHHHHHHH 151 KKHLTLSFVE FTILAIITSQ NKNIVLLKDL IETIHHKYPQ TVRALNNLKK HHH SS HHH HHHHHHHHTT TT EEHHHH HHHSSS HHH HHHHHHHHHH 201 QGYLIKERST EDERKILIHM DDAQQDHAEQ LLAQVNQLLA DKDHLHLVFE HTSSEEEE S SSTT EEEE HHHHHHHHH HHHHHHHHTS SS TT SS

Protein secondary structure prediction Why bother predicting them? SS Information can be used for downstream analysis: Framework model of protein folding, collapse secondary structures Fold prediction by comparing to database of known structures Can be used as information to predict function Can also be used to help align sequences (e.g. SS- Praline)

Why predict when you can have the real thing? UniProt Release 1.3 (02/2004) consists of: Swiss-Prot Release : 144731 protein sequences TrEMBL Release : 1017041 protein sequences PDB structures : : ~35000 protein structures Primary structure Secondary structure Tertiary structure Quaternary structure Function ‘Mind the gap’

Secondary Structure An easier question – what is the secondary structure when the 3D structure is known?

DSSP DSSP (Dictionary of Secondary Structure of a Protein) – assigns secondary structure to proteins which have a crystal (x-ray) or NMR (Nuclear Magnetic Resonance) structure H = alpha helix B = beta bridge (isolated residue) E = extended beta strand G = 3-turn (3/10) helix I = 5-turn () helix T = hydrogen bonded turn S = bend DSSP uses hydrogen-bonding structure to assign Secondary Structure Elements (SSEs). The method is strict but consistent (as opposed to expert assignments in PDB

A more challenging task: Predicting secondary structure from primary sequence alone

What we need to do Train a method on a diverse set of proteins of known structure Test the method on a test set separate from our training set Assess our results in a useful way against a standard of truth Compare to already existing methods using the same assessment

How to develop a method Other method(s) prediction Test set of T<<N sequences with known structure Database of N sequences with known structure Standard of truth Assessment method(s) Method Prediction Training set of K<N sequences with known structure Trained Method

Some key features ALPHA-HELIX: Hydrophobic-hydrophilic residue periodicity patterns BETA-STRAND: Edge and buried strands, hydrophobic-hydrophilic residue periodicity patterns OTHER: Loop regions contain a high proportion of small polar residues like alanine, glycine, serine and threonine. The abundance of glycine is due to its flexibility and proline for entropic reasons relating to the observed rigidity in its kinking the main-chain. As proline residues kink the main-chain in an incompatible way for helices and strands, they are normally not observed in these two structures (breakers), although they can occur in the N-terminal two positions of a-helices. Edge Buried

Burried and Edge strands Parallel -sheet Anti-parallel -sheet

History (1) Using computers in predicting protein secondary has its onset >30 years ago (Nagano (1973) J. Mol. Biol., 75, 401) on single sequences. The accuracy of the computational methods devised early-on was in the range 50-56% (Q3). The highest accuracy was achieved by Lim with a Q3 of 56% (Lim, V. I. (1974) J. Mol. Biol., 88, 857). The most widely used early method was that of Chou-Fasman (Chou, P. Y. , Fasman, G. D. (1974) Biochemistry, 13, 211). Random prediction would yield about 40% (Q3) correctness given the observed distribution of the three states H, E and C in globular proteins (with generally about 30% helix, 20% strand and 50% coil).

History (2) Nagano 1973 – Interactions of residues in a window of 6. The interactions were linearly combined to calculate interacting residue propensities for each SSE type (H, E or C) over 95 crystallographically determined protein tertiary structures. Lim 1974 – Predictions are based on a set of complicated stereochemical prediction rules for a-helices and b-sheets based on their observed frequencies in globular proteins. Chou-Fasman 1974 - Predictions are based on differences in residue type composition for three states of secondary structure: a-helix, b-strand and turn (i.e., neither a-helix nor b-strand). Neighbouring residues were checked for helices and strands and predicted types were selected according to the higher scoring preference and extended as long as unobserved residues were not detected (e.g. proline) and the scores remained high.

How do secondary structure prediction methods work? They often use a window approach to include a local stretch of amino acids around a considered sequence position in predicting the secondary structure state of that position The next slides provide basic explanations of the window approach (for the GOR method as an example) and two basic techniques to train a method and predict SSEs: k-nearest neighbour and neural nets

Anything else – turn/loop Secondary Structure Reminder- secondary structure is usually divided into three categories: Anything else – turn/loop Alpha helix Beta strand (sheet)

Sliding window H H H E E E E Central residue Sliding window Sequence of known structure H H H E E E E The frequencies of the residues in the window are converted to probabilities of observing a SS type The GOR method uses three 17*20 windows for predicting helix, strand and coil; where 17 is the window length and 20 the number of a.a. types At each position, the highest probability (helix, strand or coil) is taken. A constant window of n residues long slides along sequence

Sliding window H H H E E E E Sliding window Sequence of known structure H H H E E E E The frequencies of the residues in the window are converted to probabilities of observing a SS type The GOR method uses three 17*20 windows for predicting helix, strand and coil; where 17 is the window length and 20 the number of a.a. types At each position, the highest probability (helix, strand or coil) is taken. A constant window of n residues long slides along sequence

Sliding window H H H E E E E Sliding window Sequence of known structure H H H E E E E The frequencies of the residues in the window are converted to probabilities of observing a SS type The GOR method uses three 17*20 windows for predicting helix, strand and coil; where 17 is the window length and 20 the number of a.a. types At each position, the highest probability (helix, strand or coil) is taken. A constant window of n residues long slides along sequence

Sliding window H H H E E E E Sliding window Sequence of known structure H H H E E E E The frequencies of the residues in the window are converted to probabilities of observing a SS type The GOR method uses three 17*20 windows for predicting helix, strand and coil; where 17 is the window length and 20 the number of a.a. types At each position, the highest probability (helix, strand or coil) is taken. A constant window of n residues long slides along sequence

Chou and Fasman (1974) Name P(a) P(b) P(turn) Alanine 142 83 66 Arginine 98 93 95 Aspartic Acid 101 54 146 Asparagine 67 89 156 Cysteine 70 119 119 Glutamic Acid 151 037 74 Glutamine 111 110 98 Glycine 57 75 156 Histidine 100 87 95 Isoleucine 108 160 47 Leucine 121 130 59 Lysine 114 74 101 Methionine 145 105 60 Phenylalanine 113 138 60 Proline 57 55 152 Serine 77 75 143 Threonine 83 119 96 Tryptophan 108 137 96 Tyrosine 69 147 114 Valine 106 170 50 The propensity of an amino acid to be part of a certain secondary structure (e.g. – Proline has a low propensity of being in an alpha helix or beta sheet  breaker)

Chou-Fasman prediction Look for a series of >4 amino acids which all have (for instance) alpha helix values >100 Extend (…) Accept as alpha helix if average alpha score > average beta score Ala Pro Tyr Phe Phe Lys Lys His Val Ala Thr α 142 57 69 113 113 114 114 100 106 142 83 β 83 55 147 138 138 74 74 87 170 83 119

Chou and Fasman (1974) Success rate of 50%

GOR: the older standard The GOR method (version IV) was reported by the authors to perform single sequence prediction accuracy with an accuracy of 64.4% as assessed through jackknife testing over a database of 267 proteins with known structure. (Garnier, J. G., Gibrat, J.-F., , Robson, B. (1996) In: Methods in Enzymology (Doolittle, R. F., Ed.) Vol. 266, pp. 540-53.) The GOR method relies on the frequencies observed in the database for residues in a 17- residue window (i.e. eight residues N-terminal and eight C-terminal of the central window position) for each of the three structural states. 17 H E C GOR-I GOR-II GOR-III GOR-IV 20

Improvements in the 1990’s Conservation in MSA Smarter algorithms (e.g. HMM, neural networks).

K-nearest neighbour Sequence fragments from database of known structures (exemplars) Sliding window Compare window with exemplars Qseq Central residue Get k most similar exemplars PSS HHE

Neural nets Sequence database of known structures Sliding window Qseq Central residue Neural Network The weights are adjusted according to the model used to handle the input data.

Neural nets Training an NN: Forward pass: the outputs are calculated and the error at the output units calculated. Backward pass: The output unit error is used to alter weights on the output units. Then the error at the hidden nodes is calculated (by back-propagating the error at the output units through the weights), and the weights on the hidden nodes altered using these values. For each data pair to be learned a forward pass and backwards pass is performed. This is repeated over and over again until the error is at a low enough level (or we give up). Y = 1 / (1+ exp(-k.(Σ Win * Xin)), where Win is weight and Xin is input The graph shows the output for k=0.5, 1, and 10, as the activation varies from -10 to 10.

Example of widely used neural net method: PHD, PHDpsi, PROFsec The three above names refer to the same basic technique and come from the same laboratory (Rost’s lab at Columbia, NYC) Three neural networks: A 13 residue window slides over the alignment and produces 3-state raw secondary structure predictions. A 17-residue window filters the output of network 1. The output of the second network then comprises for each alignment position three adjusted state probabilities. This post-processing step for the raw predictions of the first network is aimed at correcting unfeasible predictions and would, for example, change (HHHEEHH) into (HHHHHHH). A network for a so-called jury decision over a set of independently trained networks 1 and 2 (extra predictions to correct for training biases). The predictions obtained by the jury network undergo a final simple filtering step to delete predicted helices of one or two residues and changing those into coil.

Multiple Sequence Alignments are the superior input to a secondary structure prediction method Multiple sequence alignment: three or more sequences that are aligned so that overall the greatest number of similar characters are matched in the same column of the alignment. Enables detection of: Regions of high mutation rates over evolutionary time. Evolutionary conservation. Regions or domains that are critical to functionality. Sequence changes that cause a change in functionality. Modern SS prediction methods all use Multiple Sequence Alignments (compared to single sequence prediction >10% better)

Rules of thumb when looking at a multiple alignment (MA) Hydrophobic residues are internal Gly (Thr, Ser) in loops MA: hydrophobic block -> internal -strand MA: alternating (1-1) hydrophobic/hydrophilic => edge -strand MA: alternating 2-2 (or 3-1) periodicity => -helix MA: gaps in loops MA: Conserved column => functional? => active site

Rules of thumb when looking at a multiple alignment (MA) Active site residues are together in 3D structure MA: ‘inconsistent’ alignment columns and alignment match errors! Helices often cover up core of strands Helices less extended than strands => more residues to cross protein -- motif is right-handed in >95% of cases (with parallel strands) Secondary structures have local anomalies, e.g. -bulges

How to optimise? Differentiate along SSEs – The Yaspin method (Lin et al., 2005) Helices and strands are dissected in (begin, middle, end) sections. The Yaspin method then tries to regognise these sections. Lin K., Simossis V.A., Taylor W.R. and Heringa J. (2005) A simple and fast secondary structure prediction algorithm using hidden neural networks. Bioinformatics. 21(2):152-9.

How to optimise? Capture long-range interactions (Important for -strand prediction) Predator (Frishman and Argos, 1995) side-chains show subtle patterns in cross-strand contacts SSPro (Polastri et al., 2002) – uses bidirectional recurrent neural networks One basic sliding window is used, with two more windows that slight in from opposite sites at each basic window position. This way all-possible long-range interactions are checked.

A stepwise hierarchy Sequence database searching PSI-BLAST, SAM-T2K These basically are local alignment techniques to collect homologous sequences from a database so a multiple alignment containing the query sequence can be made Sequence database searching PSI-BLAST, SAM-T2K 2) Multiple sequence alignment of selected sequences PSSMs, HMM models, MSAs 3) Secondary structure prediction of query sequences based on the generated MSAs Single methods: PHD, PROFsec, PSIPred, SSPro, JNET, YASPIN consensus

The current picture Step 1: Database sequence search Step 2: MSA Single sequence Step 1: Database sequence search SAM-T2K PSI-BLAST Sequence database Step 2: MSA Check file PSSM Homologous sequences HMM model MSA method MSA Step 3: SS Prediction Trained machine-learning Algorithm(s) Secondary structure prediction

Jackknife test A jackknife test is a test scenario for prediction methods that need to be tuned using a training database. In its simplest form: For a database containing N sequences with known tertiary (and hence secondary) structure, a prediction is made for one test sequence after training the method on a training database containing the N-1 remaining sequences (one-at-a-time jackknife testing). A complete jackknife test involves N such predictions, after which for all sequences a prediction is made. If N is large enough, meaningful statistics can be derived from the observed performance. For example, the mean prediction accuracy and associated standard deviation give a good indication of the sustained performance of the method tested. If the jackknife test is computationally too expensive, the database can be split in larger groups, which are then jackknifed. The latter is called Cross-validation

Cross validation To save on computation time relative to the Jackknife, the database is split up in a number of non-overlapping sub-databases. For example, with 10-fold cross-validation, the database is divided into 10 equally (or near equally) sized groups. One group is then taken out of the database as a test set, the method trained on the remaining nine groups, after which predictions are made for the sequences in the test group and the predictions assessed. The amount of training required is now only 10% of what would be needed with jackknife testing.

Standards of truth What is a standard of truth? - a structurally derived secondary structure assignment (using a 3D structure from the PDB) Why do we need one? - it dictates how accurate our prediction is How do we get it? - methods use hydrogen-bonding patterns along the main-chain to define the Secondary Structure Elements (SSEs).

Some examples of programs that assign secondary structures in 3D structures DSSP (Kabsch and Sander, 1983) – most popular STRIDE (Frishman and Argos, 1995) DEFINE (Richards and Kundrot, 1988) Annotation: Helix: 3/10-helix (G), -helix (H), -helix (I) H Strand: -strand (E), -bulge (B)  E Turn: H-bonded turn (T), bend (S) Rest: Coil (“ “)  C

Assessing a prediction How do we decide how good a prediction is? 1. Qn : the number of correctly predicted n SSE states over the total number of predicted states Q3 = [(PH + PE + PC)/N]  100% 2. Segment OVerlap (SOV): the number of correctly predicted n SSE states over the total number of predictions with higher penalties for core segment regions (Zemla et al, 1999)

Assessing a prediction How do we decide how good a prediction is? 3. Matthews Correlation Coefficients (MCC): the number of correctly predicted n SSE states over the total number of predictions taking into account how many prediction errors were made for each state: ~P = false positive, ~N = false negative, S = one of three states (H, E or C)

Single vs. Consensus predictions The current standard ~1% better on average Predictions from different methods H E C Max observations are kept as correct

Accuracy Accuracy of prediction seems to hit a ceiling of 70-80% accuracy Long-range interactions are not included Beta-strand prediction is difficult Accuracy Method 50% Chou & Fasman 69% Adding the MSA 70-80% MSA+ sophisticated computations

Some Servers PSI-pred uses PSI-BLAST profiles JPRED Consensus prediction PHD home page – all-in-one prediction, includes secondary structure nnPredict – uses neural networks BMERC PSA Server IBIVU YASPIN server BMC launcher – choose your prediction program