Secondary Structure Prediction


Secondary Structure Prediction. Protein Analysis Workshop 2006. Alain Schenkel, Chris Wilton. Bioinformatics group, Institute of Biotechnology, University of Helsinki.

Overview Review of protein structure. Introduction to structure prediction: Different approaches. Prediction of 1D strings of structural elements. Server/soft review: COILS, MPEx, … The PredictProtein metaserver.

Proteins Proteins play a crucial role in virtually all biological processes with a broad range of functions. The activity of an enzyme or the function of a protein is governed by the three-dimensional structure. H11_MOUSE histocompatibility antigen VE2_BPV1 Bovine DNA-binding domain

20 amino acids - the building blocks Clickable map at: http://www.russell.embl-heidelberg.de/aas/

The Amino Acids - hydrophobic

The Amino Acids - polar

The Amino Acids - charged

Secondary Structure: α-helix The α-helix is a 4_13 helix: each residue is hydrogen-bonded to the fourth residue further along the chain, and each hydrogen bond closes a ring of 13 atoms. Also found, but very rarely: the 3_10 helix and the 5_16 helix (π-helix).

Secondary Structure: α-helix 3.6 residues per turn. Axial dipole moment. Hydrogen-bonded. Protein surfaces. Typically no proline or glycine (“helix breakers”).

Secondary Structure: β-sheets Parallel or antiparallel. Alternating side chains. Connecting loops often have polar amino acids.

Terminology Primary structure: The sequence of amino acid residues FTPAVHAFLDKFLAS …

Terminology Secondary structure: A first level of structural organization. Provides rigidity. The structural form adopted by each amino-acid residue: H: helix ( alpha ) E: extended ( beta strand ) T: turn ( often Proline ) C: coil ( random, unstructured )

Terminology Secondary structure elements (SSE): Stretches of residues in H conformation are helical SSEs. Stretches of residues in E conformation are beta-strand SSEs. Stretches of residues in C conformation are loops or coil. Turns (T) are isolated residues, usually Proline or Glycine. Other notation (in 3 states): L for all but H,E.

Secondary Structure Elements Example: one helix, one beta strand, three loops Primary: MSEGEDDFPRKRTPWCFDDEHMC Secondary: CCHHHHHHCCCCEEEEEECCCCC
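
The grouping of per-residue states into SSEs can be sketched in a few lines of Python. This is only an illustration using the example strings from this slide; the function name segment_sses is made up for the sketch.

```python
from itertools import groupby

def segment_sses(secondary):
    """Split a per-residue state string into (state, start, end) runs.

    start is 0-based and inclusive, end is exclusive.
    """
    segments = []
    position = 0
    for state, run in groupby(secondary):
        length = len(list(run))
        segments.append((state, position, position + length))
        position += length
    return segments

primary = "MSEGEDDFPRKRTPWCFDDEHMC"
secondary = "CCHHHHHHCCCCEEEEEECCCCC"

# Print each element with 1-based residue numbers and its residues.
for state, start, end in segment_sses(secondary):
    print(state, start + 1, end, primary[start:end])
```

The five runs found are coil, helix, coil, strand, coil, which matches the "one helix, one beta strand, three loops" reading of the example.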

Terminology Tertiary structure: The full 3D structure of a single polypeptide chain. Secondary structure elements pack together to form a structural core. Called a protein “fold”.

Terminology Quaternary structure: How several fully folded protein chains pack together to form a fully functional protein. Example: 1jch (ribosome inhibitor), where 1jch is a PDB identifier. The Protein Data Bank (PDB) is the principal repository for solved structures.

Example: 1jch has 4 chains (docking). The elongated 2-helix structures in the center are called coiled-coils. Note how exposed to solvent they are, which is typical for coiled-coils. We will come back to coiled-coils later.

Structural classification of folds For example (CATH): alpha, beta, alpha+beta, alpha/beta, irregular. More on structural classification next week.

Biochemical classification of folds Globular proteins: in aqueous environment, compact fold, hydrophobic core and polar surfaces. Membrane proteins: attached to or across the cell membrane, hydrophobic surface within membrane. Fibrous proteins: structural role, repeat of regular/atypical SSE or irregular structure.

Globular (2 domains) Transmembrane Fibrous

INTRODUCTION TO STRUCTURE PREDICTION

Why is 3D Structure Important? A prerequisite for understanding function: processes of molecular recognition, e.g. DNA recognition by 2bop. Catalytic mechanisms of enzymes often require key residues to be close together in 3D space. Structure is often preserved under evolution when sequence is not. Drug design.

Structure Prediction GPSRYIVDL… ?

Approaches to structure prediction Ab initio: from physical principles only. De novo: knowledge-based potentials from PDB. Fold recognition: thread sequence through known structures for compatibility. Homology modeling: use sequence alignment to infer possible template structure. Only homology modeling is reliable, so one needs to be lucky, i.e. to have a homologous protein whose structure has been solved. The other approaches have varying levels of accuracy. The problem is that accuracy can vary a lot and cannot be tested for a given query, so their results serve only as hints. More on homology modeling next week.

Prediction in One-Dimension Simplification: project 3D structure onto strings of structural assignments. E.g.: coiled-coils; membrane helices; solvent accessibility (residue is buried or exposed): …eeebbbbeebbbbee…; secondary structure elements: …HHHLLLEEEEEELLEEE… If accurate, these can be used to improve predictions of 3D structures (e.g. in fold recognition).

A Flow Chart for Structure Prediction http://speedy.embl-heidelberg.de/gtsp/flowchart2.html

Structure Prediction Why is structure prediction, and in particular ab initio prediction, a difficult problem? Many degrees of freedom: atoms of all residues and solvent. The search space grows exponentially with the number of residues. Remote noncovalent interactions complicate matters. A delicate problem of stability. Cannot exhaustively search all possible conformations. A folding protein does not try all conformations either (Levinthal paradox).
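
A back-of-the-envelope calculation shows why exhaustive search is hopeless. All three numbers below are assumptions chosen only for illustration (they are not from the slides): 3 conformations per residue, a 100-residue chain, and 10^13 conformations sampled per second.

```python
# Assumed, illustrative numbers (not from the slides):
conformations_per_residue = 3      # hypothetical number of backbone states per residue
residues = 100                     # hypothetical chain length
samples_per_second = 1e13          # hypothetical sampling rate

# The number of conformations grows exponentially with chain length.
total_conformations = conformations_per_residue ** residues
seconds = total_conformations / samples_per_second
years = seconds / (3600 * 24 * 365)

print(f"{total_conformations:.2e} conformations")
print(f"{years:.2e} years to enumerate them all")
```

With these assumed numbers the count is of the order of 10^47 conformations and the enumeration time of the order of 10^27 years, which is the point of the Levinthal paradox.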

Basic Principle of Folding (globular protein) Pack hydrophobic side chains into the interior of the molecule, away from solvent. So: hydrophobic residues lie predominantly within a central structural core, with tight (crystal-like) packing; hydrophilic residues lie predominantly on the protein surface, exposed to solvent. But the main chain is highly polar, due to its oxygen and hydrogen atoms. This forces the formation of SSEs in the core, where the main-chain polar groups pair up through hydrogen bonds. So: core residues tend to be in SSEs; loops are on the outside of the protein.

Protein Structure and Evolution Rate of evolution of genomic DNA sequence reflects degree of functional constraint. Protein coding regions evolve much more slowly than non-coding regions: need to maintain stable 3D protein structure, need to maintain vital biological function.

Rates of Protein Sequence Evolution Sequences of highly constrained structures evolve very slowly (eg: histones). Less constrained ones evolve more quickly (eg: immunoglobulins). In general: response to mutation is structural change, but many mutations will not (or only slightly) change the structure => Structure is better conserved than sequence.

Evolution of SSEs and Loops Residues in the hydrophobic core (SSEs) are constrained by the need for tight packing: changes rarely accepted - evolution is slow. Residues on the surface (loops) are less constrained (simply need to be hydrophilic): aa substitution less restricted – evolution is quicker.

Evolution of Key Residues Residues with key functional roles will be conserved. E.g.: active site residues involved in catalysis. BUT: gene duplication can lead to change of function without changing structure. Residues with a key structural role also tend to be conserved. E.g.: GLY: high conformational flexibility => tight turns, … PRO: side chain bonds back to the backbone => tight turns. CYS: disulfide bridges.

Structure Prediction by Homology Multiple sequence / structure alignments measure differences in evolutionary rates of residues, and thus: contain more information than a single sequence for applications such as homology modeling and secondary structure prediction; give the location of conserved regions and motifs, of residues buried in the protein core or exposed to solvent, and of important secondary structures. More on homology modeling next week.

Secondary Structure Prediction Three generations: Single residue statistical analysis: For each amino acid type, assign its ‘propensity’ to be in a helix, sheet, or coil. Limited accuracy: ~55-60% on average. Eg: Chou-Fasman (1974), not used any more.
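
As a rough sketch of what a 'propensity' is, the code below computes P(state | residue) / P(state) from labelled data. The formula is the usual ratio-of-frequencies definition, used here as an assumption; it is not claimed to be the exact Chou-Fasman procedure. The training set is a toy one, namely the single example pair from the Secondary Structure Elements slide, so the numbers carry no statistical weight.

```python
from collections import Counter

def propensities(sequences, structures):
    """Return {(residue, state): propensity} from aligned sequence/state strings.

    propensity = P(state | residue) / P(state); values above 1 mean the
    residue is over-represented in that state in the training data.
    """
    pair_counts = Counter()
    residue_counts = Counter()
    state_counts = Counter()
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings must align")
        for residue, state in zip(seq, ss):
            pair_counts[(residue, state)] += 1
            residue_counts[residue] += 1
            state_counts[state] += 1
    total = sum(state_counts.values())
    return {
        (residue, state): (count / residue_counts[residue]) / (state_counts[state] / total)
        for (residue, state), count in pair_counts.items()
    }

# Toy training set: one sequence with its known secondary structure.
table = propensities(["MSEGEDDFPRKRTPWCFDDEHMC"], ["CCHHHHHHCCCCEEEEEECCCCC"])
print(round(table[("E", "H")], 2))   # helix propensity of glutamate in the toy data
```

A real table would be estimated from many solved structures. A single-residue predictor then smooths these values along the sequence, which is why its accuracy stays limited.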

Secondary Structure Prediction Segment-based statistics: Look for correlations (within 11-21 aa windows). Many algorithms have been tried: sequence patterns, statistical information, physico-chemical properties, graph theory, expert rules. Best performing: Neural Networks. Input: a number of protein sequences with their known secondary structure. Output: a trained network that predicts secondary structure elements for given query sequences. Accuracy < 70%. E.g.: GORII, COMBINE.
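
To make the window idea concrete, here is a minimal sketch of how a sequence could be turned into fixed-size network inputs. The one-hot encoding, the padding symbol and the window length of 13 are assumptions for illustration; the slides do not say how any particular method encodes its input.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"                       # marks window positions beyond the chain ends
ALPHABET = AMINO_ACIDS + PAD

def encode_windows(sequence, window=13):
    """One-hot encode a sliding window centred on every residue.

    Returns one flat vector of length window * 21 per residue.
    Raises ValueError for residues outside the 20 standard amino acids.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")
    half = window // 2
    padded = PAD * half + sequence + PAD * half
    vectors = []
    for centre in range(len(sequence)):
        vector = []
        for residue in padded[centre:centre + window]:
            one_hot = [0] * len(ALPHABET)
            one_hot[ALPHABET.index(residue)] = 1
            vector.extend(one_hot)
        vectors.append(vector)
    return vectors

inputs = encode_windows("MSEGEDDFPRKRTPWCFDDEHMC", window=13)
print(len(inputs), len(inputs[0]))
```

Each vector would be paired with the known state (H, E or C) of its central residue during training, so the network learns from the local sequence context and not from the residue alone.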

Neural Networks [Figure: a query is presented to the trained network, whose 3-state output gives the prediction for one residue; picture from B. Rost, 1999]

Secondary Structure Prediction Using information from evolution: Compute a sequence profile from a multiple sequence alignment. Use the profile instead of the query as input to the Neural Network. 6-8 percentage points increase in accuracy over Neural Network only. E.g.: PHD/PROF: alignments by MaxHom (B. Rost, 1996/2000). PSI-PRED: alignments from Psi-Blast (D.T. Jones, 1999). Accuracy: 72% ± 11%. Accuracy is measured as Q3 = (number of residues with correctly predicted secondary structure state) / (total number of residues).
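
The Q3 measure is simple enough to compute by hand. The sketch below assumes both strings use the same 3-state alphabet; the predicted string is hypothetical.

```python
def q3(predicted, observed):
    """Fraction of residues whose predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have the same length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)

observed  = "CCHHHHHHCCCCEEEEEECCCCC"
predicted = "CCCHHHHHHCCCEEEECCCCCCC"   # hypothetical prediction
print(f"Q3 = {q3(predicted, observed):.1%}")
```

Here 19 of the 23 residues agree. Note that Q3 counts residues, not elements: a prediction can shift or shorten every helix and strand and still score well.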

Accuracy Illustration Psi-Pred benchmark on a set of 187 chains (D.T. Jones, 1999). Your query could be anywhere in this distribution! In particular, accuracy can be as low as 50% for a given query => Use many different methods and compare answers.

Other Structural Features There are other structural features that one can try to predict: coiled-coils, membrane helices, solvent accessibility, globularity, disulfide bridges, conformational switches, …

POPULAR SERVERS FOR DEALING WITH SECONDARY STRUCTURES Coiled-coils Transmembrane helices Secondary structure Metaservers

Prediction of coiled-coils Coiled-coils are generally solvent-exposed, multi-stranded helix structures (e.g. two-stranded). Helix periodicity and solvent exposure impose a special pattern, the heptad repeat … abcdefg …, with hydrophobic residues at some positions (typically a and d) and hydrophilic residues at the others. [Figure: helical diagram of 2 interacting helices, from the Wikipedia Leucine zipper article]
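
The heptad pattern can be illustrated with a naive register scan. This is a sketch only and not the COILS algorithm: it assumes that positions a and d of the heptad are the hydrophobic ones, uses an assumed hydrophobic residue set, and runs on a made-up sequence.

```python
HYDROPHOBIC = set("AILMFVWY")   # assumed hydrophobic set, for illustration only

def heptad_score(sequence, offset):
    """Fraction of 'a' and 'd' positions that are hydrophobic when
    sequence[offset] is taken as heptad position 'a'."""
    core = [
        residue
        for index, residue in enumerate(sequence)
        if index >= offset and (index - offset) % 7 in (0, 3)   # 'a' -> 0, 'd' -> 3
    ]
    if not core:
        return 0.0
    return sum(residue in HYDROPHOBIC for residue in core) / len(core)

def best_register(sequence):
    """Try all seven registers and return (offset, score) of the best one."""
    return max(
        ((offset, heptad_score(sequence, offset)) for offset in range(7)),
        key=lambda pair: pair[1],
    )

toy = "GS" + "LEELKKK" * 4     # hypothetical sequence with a clean heptad repeat
print(best_register(toy))
```

In the toy sequence the repeat starts at the third residue, so the best register is offset 2 with every a and d position hydrophobic. Real coiled-coil predictors score all seven positions against residue statistics instead of using a yes/no hydrophobicity test.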

The COILS server at EMBnet Compares a sequence to a database of known, parallel two-stranded coiled-coils, and derives a similarity score. By comparing this score to the distribution of scores in globular and coiled-coil proteins, the program then calculates the probability that the sequence will adopt a coiled-coil conformation. Options: scoring matrices, window size (score may vary), weighting options.

COILS Limitations The program works well for parallel two-stranded structures that are solvent-exposed but runs progressively into problems with the addition of more helices, their antiparallel orientation and their decreasing length. The program fails entirely on buried structures.

COILS Demo Let us submit the sequence
>1jch_A
VAAPVAFGFPALSTPGAGGLAVSISAGALSAAIADIMAALKGPFKFGLWGVALYGVLPSQ
IAKDDPNMMSKIVTSLPADDITESPVSSLPLDKATVNVNVRVVDDVKDERQNISVVSGVP
MSVPVVDAKPTERPGVFTASIPGAPVLNISVNNSTPAVQTLSPGVTNNTDKDVRPAFGTQ
GGNTRDAVIRFPKDSGHNAVYVSVSDVLSPDQVKQRQDEENRRQQEWDATHPVEAAERNY
ERARAELNQANEDVARNQERQAKAVQVYNSRKSELDAANKTLADAIAEIKQFNRFAHDPM
AGGHRMWQMAGLKAQRAQTDVNNKQAAFDAAAKEKSDADAALSSAMESRKKKEDKKRSAE
NNLNDEKNKPRKGFKDYGHDYHPAPKTENIKGLGDLKPGIPKTPKQNGGGKRKRWTGDKG
RKIYEWDSQHGELEGYRASDGQHLGSFDPKTGNQLKGPDPKRNIKKYL
to the COILS server at EMBnet: http://www.ch.embnet.org/software/COILS_form.html

mtidk matrix, no weights, all window lengths

Frame probabilities at each residue. Columns: window size of 14, 21, 28 aa. high probability heptads

Transmembrane Region Prediction Transmembrane regions: Usually contain residues with hydrophobic side chains (surface must be hydrophobic). Usually ~20 residues long, can be up to 30 if not perpendicular through membrane. Methods: Hydropathy plots (historical, better methods now available) Threading (TMpred, MEMSAT), Hidden Markov Model (TMHMM), Neural Network (PHDhtm).

Hydropathy Plots (Kyte-Doolittle) Compute an average hydropathy value for each position in the query sequence; a window length of 19 is usually chosen for membrane-spanning region prediction. Peaks between 1 and 2 on the scale?
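
A hydropathy profile is just a moving average over the Kyte-Doolittle scale. The sketch below uses the published scale values and the window of 19 mentioned on the slide; the test sequence is made up, and the choice of cut-off for calling a membrane-spanning segment is left open, as on the slide.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=19):
    """Average Kyte-Doolittle hydropathy of each full window.

    Entry i covers residues i .. i + window - 1 (0-based).
    """
    values = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]

toy = "D" * 10 + "L" * 19 + "D" * 10   # hypothetical sequence with one hydrophobic stretch
profile = hydropathy_profile(toy)
peak = max(profile)
print(round(peak, 2), profile.index(peak))
```

The peak sits on the window that covers exactly the 19 leucines. On a real sequence such as RCEM_RHOVI one would plot the whole profile and look for several such peaks.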

Hydropathy Plot Servers Membrane Explorer (also available as the standalone MPEx), Grease (http://fasta.bioch.virginia.edu/fasta/grease.htm). Let us submit the sequence
>sp|P06010|RCEM_RHOVI Reaction center protein M chain (Photosynthetic reaction center M subunit) - Rhodopseudomonas viridis.
ADYQTIYTQIQARGPHITVSGEWGDNDRVGKPFYSYWLGKIGDAQIGPIYLGASGIAAFAFGSTAILIILFNMAAEVHFDPLQFFRQFFWLGLYPPKAQYGMGIPPLHDGGWWLMAGLFMTLSLGSWWIRVYSRARALGLGTHIAWNFAAAIFFVLCIGCIHPTLVGSWSEGVPFGIWPHIDWLTAFSIRYGNFYYCPWHGFSIGFAYGCGLLFAAHGATILAVARFGGDREIEQITDRGTAVERAALFWRWTIGFNATIESVHRWGWFFSLMVMVSASVGILLTGTFVDNWYLWCVKHG
AAPDYPAYLPATPDPASLPGAPK
to http://blanco.biomol.uci.edu/mpex/ (Membrane Explorer)

TM Pred Method summary: Scans a candidate sequence for matches to a sequence scoring matrix, obtained by aligning the sequences of all transmembrane alpha-helical regions that are known from structures. These sequences are collected in a database called TMBase. Remark: Authors do not suggest this method for genomic sequences. Automatic methods recommended, eg, TMHMM, PHDhtm.

TM Pred Server Let us submit RCEM_RHOVI again
>sp|P06010|RCEM_RHOVI Reaction center protein M chain (Photosynthetic reaction center M subunit) - Rhodopseudomonas viridis.
ADYQTIYTQIQARGPHITVSGEWGDNDRVGKPFYSYWLGKIGDAQIGPIYLGASGIAAFAFGSTAILIILFNMAAEVHFDPLQFFRQFFWLGLYPPKAQYGMGIPPLHDGGWWLMAGLFMTLSLGSWWIRVYSRARALGLGTHIAWNFAAAIFFVLCIGCIHPTLVGSWSEGVPFGIWPHIDWLTAFSIRYGNFYYCPWHGFSIGFAYGCGLLFAAHGATILAVARFGGDREIEQITDRGTAVERAALFWRWTIGFNATIESVHRWGWFFSLMVMVSASVGILLTGTFVDNWYLWCVKHG
AAPDYPAYLPATPDPASLPGAPK
to the TMPred server at EMBnet: http://www.ch.embnet.org/software/TMPRED_form.html

Annotation for RCEM_RHOVI Uniprot entry for RCEM_RHOVI: Chain M of photosynthetic reaction center. Integral membrane protein. Can we see the predicted helices in the structure? Let's try at SCOP.

The Psi-Pred Server Secondary structure prediction (PSIPRED). Transmembrane topology prediction (MEMSAT). Fold recognition (GenTHREADER). Let's submit
>uniprot|P00772|ELA1_PIG Elastase-1 precursor
MLRLLVVASLVLYGHSTQDFPETNARVVGGTEAQRNSWPSQISLQYRSGSSWAHTCGGTL
IRQNWVMTAAHCVDRELTFRVVVGEHNLNQNDGTEQYVGVQKIVVHPYWNTDDVAAGYDI
ALLRLAQSVTLNSYVQLGVLPRAGTILANNSPCYITGWGLTRTNGQLAQTLQQAYLPTVD
YAICSSSSYWGSTVKNSMVCAGGDGVRSGCQGDSGGPLHCLVNGQYAVHGVTSFVSRLGC
NVTRKPTVFTRVSAYISWINNVIASN
to http://bioinf.cs.ucl.ac.uk/psipred/

PSIPRED PREDICTION RESULTS (see later for comparison with solved structure)
Key
Conf: Confidence (0=low, 9=high)
Pred: Predicted secondary structure (H=helix, E=strand, C=coil)
  AA: Target sequence
# PSIPRED HFORMAT (PSIPRED V2.5 by David Jones)

Conf: 978999999997404555676678816988988788877499999934884158982897
Pred: CHHHHHHHHHHHHHCCCCCCCCCCCCEECCEECCCCCCCCEEEEEEECCCCCEEEEEEEE
  AA: MLRLLVVASLVLYGHSTQDFPETNARVVGGTEAQRNSWPSQISLQYRSGSSWAHTCGGTL
              10        20        30        40        50        60

Conf: 138734320122478742368754345663179827995679998026888865344411
Pred: CCCCEEEEECCCCCCCCCEEEEEEEEEEEECCCCCEEEEEEEEEEECCCCCCCCCCCCCH
  AA: IRQNWVMTAAHCVDRELTFRVVVGEHNLNQNDGTEQYVGVQKIVVHPYWNTDDVAAGYDI
              70        80        90       100       110       120

Conf: 010005863201367530113433210010268995234110254467622168863110
Pred: HHEECCCCCCEEEEEEEECCCCCCCCCCCCEEEEEEECCCCCCCCCCCCCCEEEEEEEEE
  AA: ALLRLAQSVTLNSYVQLGVLPRAGTILANNSPCYITGWGLTRTNGQLAQTLQQAYLPTVD
             130       140       150       160       170       180

Conf: 024554202566567752773344343221110467438998993899999972376889
Pred: CHHHHHHHCCCCCCCCCEEEECCCCCCCCCEEECCCCEEEEECCEEEEEEEEEECCCCCC
  AA: YAICSSSSYWGSTVKNSMVCAGGDGVRSGCQGDSGGPLHCLVNGQYAVHGVTSFVSRLGC
             190       200       210       220       230       240

Conf: 88988779999687678899886049
Pred: CCCCCCEEEEEHHHHHHHHHHHHHCC
  AA: NVTRKPTVFTRVSAYISWINNVIASN
             250       260

Meta-Servers A meta-server either lets you obtain predictions from several methods in parallel under one browser window, e.g. PredictProtein: http://predictprotein.org, or makes predictions based on several methods (consensus), e.g. 3D-Jury: http://bioinfo.pl/meta, GeneSilico: http://www.genesilico.pl/meta

The PredictProtein meta-server Sequence motif search: ProSite, ProDom, SEG. One-Dim structure prediction: secondary structure, transmembrane helices, solvent accessibility, globularity, disulfide bridge, conformational switch. Links to a multitude of other servers (numerous links also from 3D-Jury).

Motif Search at PP SEG: finds low complexity regions. ProSite: database of functional motifs, i.e. biologically relevant short patterns. ProDom: a comprehensive set of protein domain families automatically generated from the SWISS-PROT and TrEMBL sequence databases. A motif keeps reappearing across related sequences, so it can be used to identify the family of a new sequence. More on domains and protein family classification next week (ADDA, Pfam etc.). ProSite: http://au.expasy.org/prosite/ ProDom: http://protein.toulouse.inra.fr/prodom/current/html/home.php

One-Dim predictions at PP Use information from evolution: Sequence database is scanned for similar sequences (Blast, Psi-Blast). Multiple sequence alignment profiles are generated by weighted dynamic programming (MaxHom). The PROF (improved PHD) series: PROFsec (PHDsec): secondary structure, PROFacc (PHDacc): solvent accessibility, PHDhtm: transmembrane helices.
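
What a profile contains can be shown with plain column frequencies. This sketch is a simplification under stated assumptions: it uses a made-up four-sequence alignment and unweighted counts, whereas the slide describes profiles built by weighted dynamic programming (MaxHom).

```python
from collections import Counter

def column_frequencies(alignment):
    """Per-column residue frequencies of an alignment, ignoring gaps ('-')."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        counts = Counter(residue for residue in column if residue != "-")
        total = sum(counts.values())
        profile.append({r: n / total for r, n in counts.items()} if total else {})
    return profile

# Hypothetical four-sequence alignment, for illustration only.
alignment = ["MKLV", "MRLV", "MKIV", "M-LA"]
for position, frequencies in enumerate(column_frequencies(alignment), start=1):
    print(position, frequencies)
```

A fully conserved column (position 1 here) and a variable one (position 2) look identical in a single sequence but very different in the profile, and that extra information is what the network is given.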

Meta-PP PredictProtein lets you automatically submit a query to other servers: Secondary structure prediction: Psi-Pred, SAM-T02, Jpred, … Membrane helices prediction: TMHMM, … Tertiary structure prediction: Homology: Swiss-Model, 3D-Jigsaw, … Threading: Superfamily, AGAPE, … Inter-residue contact prediction: CMAPpro, …

PredictProtein Demo Let's submit again
>uniprot|P00772|ELA1_PIG Elastase-1 precursor
MLRLLVVASLVLYGHSTQDFPETNARVVGGTEAQRNSWPSQISLQYRSGSSWAHTCGGTL
IRQNWVMTAAHCVDRELTFRVVVGEHNLNQNDGTEQYVGVQKIVVHPYWNTDDVAAGYDI
ALLRLAQSVTLNSYVQLGVLPRAGTILANNSPCYITGWGLTRTNGQLAQTLQQAYLPTVD
YAICSSSSYWGSTVKNSMVCAGGDGVRSGCQGDSGGPLHCLVNGQYAVHGVTSFVSRLGC
NVTRKPTVFTRVSAYISWINNVIASN
to http://predictprotein.org/ For a list of mirror sites: http://predictprotein.org/newwebsite/doc/mirrors.html

Let's explore the results here.

Comparison with solved structure ELA1_PIG Elastase-1 has a solved structure: 1EST

DSSP: ??????????????????????????CBTCEECCTTTCTTEEEEEEEETTEEEEEEEEEEEETTEEEECSGGGCSCCCEE
PSIP: .HHHHHHHHHHHHH............EE..EE........EEEEEEE.....EEEEEEEE....EEEEE.........EE
PROF: ..HHHHHHHHHHH............EEEE.EE.......EEEEEEEE......EEEEEEEE...EEEEEEEEE.....EE

DSSP: EEESCSBTTSCCSCCEEEEEEEEEECTTCCTTCGGGCCCCEEEEESSCCCCBTTBCCCCCCCTTCCCCTTCCEEEEESCB
PSIP: EEEEEEEEEE.....EEEEEEEEEEE.............HHHEE......EEEEEEEE............EEEEEEE...
PROF: EEEEEEE........EEEEEEEEEEE.............EEEEEE........EEEEEE............EEEEEEEE.

DSSP: SSTTCCBCSBCEEEECCEECHHHHTSTTTTGGGSCTTEEEECCSSSSBCCTTCTTCEEEEEETTEEEEEEEEEECBTTBS
PSIP: ...........EEEEEEEEE.HHHHHHH.........EEEE.........EEE....EEEEE..EEEEEEEEEE......
PROF: ..........EEEEEEEEE..................EEEE...............EEEEEE...EEEEEEEE.......

DSSP: SBTTBCEEEEEGGGSHHHHHHHHHTC
PSIP: ......EEEEEHHHHHHHHHHHHH..
PROF: .......EEEEHHHHHHHHHHHH...

DSSP: secondary structure assignment from PDB (Kabsch-Sander, 1983)
H = alpha helix
B = residue in isolated beta-bridge
E = extended strand, participates in beta ladder
G = 3-helix (3/10 helix)
I = 5 helix (pi helix)
T = hydrogen bonded turn
S = bend
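
To compare DSSP assignments with 3-state predictions, the DSSP states first have to be reduced to H, E and C. The grouping below is one common convention and is an assumption here; other groupings are in use and the slides do not say which one was applied.

```python
# One common reduction (an assumption; other groupings are also in use).
DSSP_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # alpha, 3/10 and pi helices -> helix
    "E": "E", "B": "E",             # strand and isolated bridge -> strand
    "?": "?",                       # no assignment available: keep as unknown
}

def reduce_dssp(states):
    """Map DSSP states to H/E/C; every other assigned state (T, S, C, ...) becomes C."""
    return "".join(DSSP_TO_3.get(state, "C") for state in states)

# A short fragment taken from the first DSSP line of the comparison.
print(reduce_dssp("CBTCEECCTTTCTTEEEEEEEE"))
```

Positions kept as '?' have no assignment and are best left out of any accuracy count, since there is nothing to compare the prediction with.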

Conclusions Both predictions agree quite well and are quite accurate. But: it may not be as good next time. => Compare predictions from different methods to check whether there is a consensus. Use servers that automatically combine different methods (3D-Jury, ...).
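
Checking for a consensus can be as simple as a per-residue majority vote. This sketch is only an illustration of the idea with three made-up predictions; it is not how 3D-Jury or any other meta-server combines its methods.

```python
from collections import Counter

def consensus(predictions):
    """Per-residue majority vote; '?' where no state has a strict majority."""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        state, votes = Counter(column).most_common(1)[0]
        result.append(state if votes * 2 > len(column) else "?")
    return "".join(result)

# Three hypothetical 3-state predictions for the same 12-residue query.
print(consensus([
    "CCHHHHHHCCEE",
    "CHHHHHHCCCEE",
    "CCCHHHHHCEEC",
]))
```

Residues where the methods disagree, typically at the ends of helices and strands, are the ones to treat with most caution.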

Benchmarks LiveBench: http://bioinfo.pl/meta/livebench.pl CASP (critical assessment of structure prediction): http://predictioncenter.gc.ucdavis.edu/ CAFASP (critical assessment of fully automated structure prediction): http://www.cs.bgu.ac.il/~dfisher/CAFASP5/index.html

References
Documentation:
COILS: http://www.ch.embnet.org/software/coils/COILS_doc.html
TMPred: http://www.ch.embnet.org/software/tmbase/TMBASE_doc.html
MPEx: http://blanco.biomol.uci.edu/mpex/MPEXdoc.html
Articles:
B. Rost: Evolution teaches neural networks. In Scientific applications of neural nets. Ed. J.W. Clark, T. Lindenau, M.L. Ristig, 207-223 (1999).
D.T. Jones: Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices. J. Mol. Biol. 292, 195-202 (1999).
B. Rost: Prediction in 1D: Secondary Structure, Membrane Helices, and Accessibility. In Structural Bioinformatics (reference below).
Books:
P.E. Bourne, H. Weissig: Structural Bioinformatics. Wiley-Liss, 2003.
A. Tramontano: Protein Structure Prediction. Wiley-VCH, 2006.