Protein Secondary Structure Prediction
Specific databases of protein sequences and structures
  Swiss-Prot
  PIR
  TrEMBL (translated from DNA)
  PDB (three-dimensional structures)
Protein Structure
  Primary: amino acid sequence
  Secondary: alpha helices, beta sheets and loops
  Tertiary: packing of secondary elements
  Quaternary: packing of several polypeptide chains
Symbols for the 20 amino acids
  A  ala  alanine          M  met  methionine
  C  cys  cysteine         N  asn  asparagine
  D  asp  aspartic acid    P  pro  proline
  E  glu  glutamic acid    Q  gln  glutamine
  F  phe  phenylalanine    R  arg  arginine
  G  gly  glycine          S  ser  serine
  H  his  histidine        T  thr  threonine
  I  ile  isoleucine       V  val  valine
  K  lys  lysine           W  trp  tryptophan
  L  leu  leucine          Y  tyr  tyrosine
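The symbols listed above map naturally onto a lookup table. The following is a minimal sketch; the names `ONE_TO_THREE` and `to_three_letter` are illustrative and not part of the slides, and the three-letter codes are written with conventional capitalisation.

```python
# One-letter -> three-letter codes, transcribed from the symbol list above.
ONE_TO_THREE = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}


def to_three_letter(sequence):
    """Translate a one-letter sequence into hyphen-separated three-letter codes."""
    return "-".join(ONE_TO_THREE[residue] for residue in sequence.upper())


print(to_three_letter("MKV"))  # Met-Lys-Val
```

A residue outside the 20 standard symbols raises `KeyError`, which is usually preferable to silently passing unknown letters through.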
The 20 Amino Acids
Grouping amino acids by physicochemical properties
Myoglobin: the first high-resolution protein structure
  Solved in 1958 by Max Perutz and John Kendrew of Cambridge University, who won the 1962 Nobel Prize in Chemistry.
  “Perhaps the most remarkable features of the molecule are its complexity and its lack of symmetry. The arrangement seems to be almost totally lacking in the kind of regularities which one instinctively anticipates.”
Alpha Helices
  Right-handed spiral
  5 to 40 amino acids (10 on average)
  3.6 amino acids per turn
  Some amino acids are more frequent than others in helices
Beta Sheets
  Parallel: strands run in the same direction (C to N)
  Anti-parallel: strands run in opposite directions
  Each strand has 5-10 amino acids (6 on average)
  Some amino acids are more frequent than others
Loop Regions
  All other protein regions
  Irregular shape and size
  Connect the secondary structure elements
Structure Presentation
  Ribbon diagram [figure labels: alpha helix, beta sheet]
Structure Presentation
  TOPS cartoon: beta sheets are triangles, alpha helices are circles.
  The peptide chain runs from the N terminus to the C terminus.
Structure Prediction: Motivation
  Hundreds of thousands of gene sequences translated to proteins (GenBank, Swiss-Prot, PIR)
  Only about 28,000 solved structures (PDB)
  Goal: predict protein structure based on sequence information
Structure Prediction: Motivation
  Understand protein function
    Locate binding sites
  Broaden homology
    Detect similar function where sequence differs
  Explain disease
    See effect of amino acid changes
    Design suitable compensatory drugs
Prediction Approaches
  Primary (sequence) to secondary structure
    Sequence characteristics
  Secondary to tertiary structure
    Fold recognition
    Threading against known structures
  Primary to tertiary structure
    Ab initio modelling
Can we predict the secondary structure from sequence?
  Secondary structures have an amphiphilic nature: one face polar and the other non-polar.
  [Figure: α-helix and β-sheet with their polar and non-polar faces labelled]
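One way to make the amphiphilic idea concrete is to ask whether the non-polar residues of a segment line up on one face when the chain is wound with a given periodicity. The sketch below is an illustration only and rests on several assumptions that are not in the slides: a crude binary polar/non-polar split (`NON_POLAR`), 100° per residue for a helix (360° / 3.6 residues per turn), and 180° per residue as an idealised strand.

```python
import math

# Assumption: a crude binary hydrophobicity scale (listed residues = non-polar).
NON_POLAR = set("AVLIMFWC")


def hydrophobic_moment(sequence, degrees_per_residue):
    """Length of the summed non-polar direction vectors, divided by sequence length.

    Values near 0 mean non-polar residues point in all directions;
    larger values mean they cluster on one face for this periodicity.
    """
    x = y = 0.0
    for i, residue in enumerate(sequence):
        if residue in NON_POLAR:
            angle = math.radians(i * degrees_per_residue)
            x += math.cos(angle)
            y += math.sin(angle)
    return math.hypot(x, y) / len(sequence)


helix_like = "LKKLLKKLLKKL"   # non-polar at i, i+3, i+4, i+7, ...
strand_like = "VKVKVKVK"      # non-polar at every second position

for name, seq in [("helix-like", helix_like), ("strand-like", strand_like)]:
    as_helix = hydrophobic_moment(seq, 100.0)
    as_strand = hydrophobic_moment(seq, 180.0)
    print(name, "helix periodicity:", round(as_helix, 2),
          "strand periodicity:", round(as_strand, 2))
```

The helix-like pattern scores higher at 100° per residue and the alternating pattern scores higher at 180°, which is the sequence signal that frequency-based and neural-network predictors can pick up.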
Secondary Structure Prediction: why is it complex?
  A huge space of possible structures
    Assume a 100 aa chain with only 2 possible conformations for each residue
    2^100 ≈ 10^30 different conformations for the chain as a whole
  Inferring secondary structure from sequence is problematic:
    Similar sequences may result in different structures (mutations, different environments)
    Different sequences may result in similar structures (the globin fold)
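The size of the search space on this slide can be checked directly. This is plain arithmetic; the variable names are illustrative.

```python
import math

residues = 100            # chain length assumed on the slide
states_per_residue = 2    # conformations per residue assumed on the slide

conformations = states_per_residue ** residues
print(conformations)                                   # 1267650600228229401496703205376
print(f"about 10^{math.log10(conformations):.1f}")     # about 10^30.1
```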
Secondary Structure Prediction Methods
  Chou-Fasman / GOR method
    Based on amino acid frequencies
    No more than 60% accurate
  Artificial Neural Network (ANN) methods: PHDsec and PSIpred
    Use multiple sequences
    Secondary structure based on family
    Best accuracy now ~78%
PHDsec and PSIpred
  PHDsec
    Rost & Sander, 1993
    Based on sequence family alignments
  PSIpred
    Jones, 1999
    Based on a Position-Specific Scoring Matrix generated by PSI-BLAST
  Both consider long-range interactions
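Frequency-based methods start from a table of how often each amino acid occurs in a given structural state compared with how often it occurs overall. The sketch below computes such a helix propensity from labelled examples. It is a simplified illustration in the spirit of Chou-Fasman, not the published parameter set, and the toy sequence and labels are made up for the example.

```python
from collections import Counter


def helix_propensities(sequences, structures):
    """Propensity = (share of residue X among helix residues) / (share of X overall).

    `structures` uses one letter per residue, with 'H' marking helix.
    Assumes the data contain at least one helix residue.
    """
    total = Counter()
    in_helix = Counter()
    for seq, ss in zip(sequences, structures):
        for residue, state in zip(seq, ss):
            total[residue] += 1
            if state == "H":
                in_helix[residue] += 1
    n_total = sum(total.values())
    n_helix = sum(in_helix.values())
    return {
        residue: (in_helix[residue] / n_helix) / (total[residue] / n_total)
        for residue in total
    }


# Toy data (hypothetical): one short chain with its helix/coil labels.
propensities = helix_propensities(["AEAAGPGA"], ["HHHHCCCC"])
for residue, value in sorted(propensities.items()):
    print(residue, round(value, 2))
```

Values above 1 mark residues that are over-represented in helices and values below 1 mark residues that are under-represented; a real table would be estimated from many solved structures.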
Brain Neurons
  Outgoing signal determined by incoming signals
  Connected together in networks
  Learn from experience
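The artificial counterpart of these three points can be sketched in a few lines: the output is a function of the weighted inputs, and "experience" is a small weight adjustment after each example. This is a generic single sigmoid unit with a squared-error update, given as an assumption-laden illustration rather than the unit used by any particular predictor.

```python
import math


def neuron(inputs, weights, bias):
    """Outgoing signal: sigmoid of the weighted sum of the incoming signals."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-activation))


def train_step(inputs, target, weights, bias, rate=0.5):
    """One gradient-descent update on the squared error for a single example."""
    out = neuron(inputs, weights, bias)
    delta = (out - target) * out * (1.0 - out)
    new_weights = [w - rate * delta * x for w, x in zip(weights, inputs)]
    return new_weights, bias - rate * delta


weights, bias = [0.0, 0.0], 0.0
example, target = [1.0, 0.0], 1.0

before = neuron(example, weights, bias)
weights, bias = train_step(example, target, weights, bias)
after = neuron(example, weights, bias)
print(before, after)  # the output moves toward the target after one update
```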
SS prediction using ANN
  [Figure: input layer. Inputs for one position: one unit per symbol (A C D E F G H I K L M N P Q R S T V W Y .), set according to the amino acid at that position]
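The input layer for one position can be sketched as a one-hot vector over the 20 amino acids plus a `.` symbol for "no residue" (positions beyond the chain ends). The window of 13 positions used below is an assumption made for illustration, as are the names `one_hot` and `window_inputs`.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY."  # 20 amino acids plus '.' for "no residue"


def one_hot(symbol):
    """21 inputs for one position: 1 for the matching symbol, 0 elsewhere."""
    return [1 if symbol == a else 0 for a in ALPHABET]


def window_inputs(sequence, center, half_width=6):
    """Concatenate the one-hot vectors of a window centred on `center`.

    Positions that fall outside the sequence are encoded as '.'.
    """
    inputs = []
    for i in range(center - half_width, center + half_width + 1):
        symbol = sequence[i] if 0 <= i < len(sequence) else "."
        inputs.extend(one_hot(symbol))
    return inputs


vector = window_inputs("MKVLAAGIVGLLLAQ", center=0)
print(len(vector))   # 273 = 13 positions x 21 inputs
print(sum(vector))   # 13: exactly one input is on per position
```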
Position-Specific Scoring Matrix
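A position-specific scoring matrix holds one score per amino acid for every column of a multiple alignment. The sketch below builds log-odds scores from a toy alignment. It assumes a uniform background frequency and a simple pseudocount, which is much cruder than the sequence weighting PSI-BLAST applies, and the alignment is made up for the example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Return one {amino acid: log2-odds score} dict per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for column in zip(*alignment):
        # Count residues only; gap characters such as '-' are skipped.
        counts = Counter(c for c in column if c in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


# Toy alignment (hypothetical): three aligned sequences of equal length.
pssm = build_pssm(["MKV", "MRV", "MKI"])
for position, scores in enumerate(pssm):
    best = max(scores, key=scores.get)
    print(position, best, round(scores[best], 2))
```

Conserved columns give one strongly positive score and negative scores elsewhere, while variable columns spread the score across several residues; that per-column profile is what replaces the single one-hot residue as network input.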
PHDsec Neural Net
  Inputs for one position: A C D E F G H I K L M N P Q R S T V W Y . (set by the amino acid at that position)
  Hidden layer
  Outputs: H = helix, E = strand, C = coil
  Confidence: 0 = low, 9 = high
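The flow from inputs through a hidden layer to the three outputs and a 0-9 confidence can be sketched as follows. Everything here is illustrative: the weights are random rather than trained, the layer sizes are arbitrary, and taking the confidence from the margin between the two highest outputs is an assumption about how such an index can be derived.

```python
import math
import random

STATES = "HEC"  # helix, strand, coil


def forward(inputs, w_hidden, w_out):
    """One hidden layer of sigmoid units, then a softmax over H, E, C."""
    hidden = [
        1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, inputs))))
        for row in w_hidden
    ]
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(outputs):
    """Return (state, confidence 0-9); confidence is the top-two margin x 10."""
    ranked = sorted(outputs, reverse=True)
    confidence = min(9, int(10 * (ranked[0] - ranked[1])))
    return STATES[outputs.index(ranked[0])], confidence


random.seed(0)
n_inputs, n_hidden = 21, 5  # illustrative sizes only
w_hidden = [[random.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
w_out = [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(len(STATES))]

inputs = [0] * n_inputs
inputs[0] = 1  # one-hot encoding of 'A' at a single position
outputs = forward(inputs, w_hidden, w_out)
print(predict(outputs))
```

With untrained weights the prediction is meaningless; the point is only the shape of the computation and how three output values collapse into one state letter plus a confidence digit.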
Secondary structure prediction
  AGADIR - An algorithm to predict the helical content of peptides
  APSSP - Advanced Protein Secondary Structure Prediction Server
  GOR - Garnier et al., 1996
  HNN - Hierarchical Neural Network method (Guermeur, 1997)
  Jpred - A consensus method for protein secondary structure prediction at University of Dundee
  JUFO - Protein secondary structure prediction from sequence (neural network)
  nnPredict - University of California at San Francisco (UCSF)
  PredictProtein - PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader, MaxHom, EvalSec from Columbia University
  Prof - Cascaded Multiple Classifiers for Secondary Structure Prediction
  PSA - BioMolecular Engineering Research Center (BMERC) / Boston
  PSIpred - Various protein structure prediction methods at Brunel University
  SOPMA - Geourjon and Deléage, 1995
  SSpro - Secondary structure prediction using bidirectional recurrent neural networks at University of California
  DLP - Domain linker prediction at RIKEN