Secondary structure prediction. Amino acid sequence -> Secondary structure Alpha helix Beta strand Disordered/coil 70% accuracy 1991, 81% accuracy in.

Slides:



Advertisements
Similar presentations
Secondary structure prediction from amino acid sequence.
Advertisements

Blast to Psi-Blast Blast makes use of Scoring Matrix derived from large number of proteins. What if you want to find homologs based upon a specific gene.
Chimera. Chimera 1/3 Starting Chimera Open Chimera from desktop (ZDV app) (If there is an update it will take a minute or two) Open a 3D structure by.
Prediction to Protein Structure Fall 2005 CSC 487/687 Computing for Bioinformatics.
1 Protein Structure, Structure Classification and Prediction Bioinformatics X3 January 2005 P. Johansson, D. Madsen Dept.of Cell & Molecular Biology, Uppsala.
PROTEIN SECONDARY STRUCTURE PREDICTION WITH NEURAL NETWORKS.
Protein Secondary Structures
Progressive MSA Do pair-wise alignment Develop an evolutionary tree Most closely related sequences are then aligned, then more distant are added. Genetic.
Computational Biology, Part 10 Protein Structure Prediction and Display Robert F. Murphy Copyright  1996, 1999, All rights reserved.
The Protein Data Bank (PDB)
CISC667, F05, Lec20, Liao1 CISC 467/667 Intro to Bioinformatics (Fall 2005) Protein Structure Prediction Protein Secondary Structure.
Protein Secondary Structures Assignment and prediction Pernille Haste Andersen
Structure Prediction in 1D
Similar Sequence Similar Function Charles Yan Spring 2006.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
PREDICTION OF PROTEIN FEATURES Beyond protein structure (TM, signal/target peptides, coiled coils, conservation…)
Detecting the Domain Structure of Proteins from Sequence Information Niranjan Nagarajan and Golan Yona Department of Computer Science Cornell University.
Artificial Neural Networks for Secondary Structure Prediction CSC391/691 Bioinformatics Spring 2004 Fetrow/Burg/Miller (slides by J. Burg)
Situations where generic scoring matrix is not suitable Short exact match Specific patterns.
© Wiley Publishing All Rights Reserved. Searching Sequence Databases.
Multiple sequence alignment
Protein Tertiary Structure Prediction
Protein domains. Protein domains are structural units (average 160 aa) that share: Function Folding Evolution Proteins normally are multidomain (average.
Protein Sequence Alignment and Database Searching.
Rising accuracy of protein secondary structure prediction Burkhard Rost
Prediction to Protein Structure Fall 2005 CSC 487/687 Computing for Bioinformatics.
Protein Secondary Structure Prediction. Input: protein sequence Output: for each residue its associated Secondary structure (SS): alpha-helix, beta-strand,
Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices Yan Liu Sep 29, 2003.
© Wiley Publishing All Rights Reserved. Protein 3D Structures.
Neural Networks for Protein Structure Prediction Brown, JMB 1999 CS 466 Saurabh Sinha.
Multiple Alignments Motifs/Profiles What is multiple alignment? HOW does one do this? WHY does one do this? What do we mean by a motif or profile? BIO520.
Sequence Based Analysis Tutorial NIH Proteomics Workshop Lai-Su Yeh, Ph.D. Protein Information Resource at Georgetown University Medical Center.
Web Servers for Predicting Protein Secondary Structure (Regular and Irregular) Dr. G.P.S. Raghava, F.N.A. Sc. Bioinformatics Centre Institute of Microbial.
Protein structure prediction May 26, 2011 HW #8 due today Quiz #3 on Tuesday, May 31 Learning objectives-Understand the biochemical basis of secondary.
Copyright OpenHelix. No use or reproduction without express written consent1.
Protein Secondary Structure Prediction G P S Raghava.
Protein Sequence Analysis - Overview - NIH Proteomics Workshop 2007 Raja Mazumder Scientific Coordinator, PIR Research Assistant Professor, Department.
Motif discovery and Protein Databases Tutorial 5.
Manually Adjusting Multiple Alignments Chris Wilton.
Generic substitution matrix based sequence comparison Q: M A T W L I. A: M A - W T V. Scr: 45 -?11 3 Scr: Q: M A T W L I. A: M A W T V A. Total:
(PSI-)BLAST & MSA via Max-Planck. Where? (to find homologues) Structural templates- search against the PDB Sequence homologues- search against SwissProt.
Sequence Based Analysis Tutorial March 26, 2004 NIH Proteomics Workshop Lai-Su L. Yeh, Ph.D. Protein Science Team Lead Protein Information Resource at.
Sequence Based Analysis Tutorial
Point Specific Alignment Methods PSI – BLAST & PHI – BLAST.
Exercises Pairwise alignment Homology search (BLAST) Multiple alignment (CLUSTAL W) Iterative Profile Search: Profile Search –Pfam –Prosite –PSI-BLAST.
Doug Raiford Lesson 5.  Dynamic programming methods  Needleman-Wunsch (global alignment)  Smith-Waterman (local alignment)  BLAST Fixed: best Linear:
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
HANDS-ON ConSurf! Web-Server: The ConSurf webserver.
HANDS-ON ConSurf! Web-Server: The ConSurf webserver.
Protein motif /domain Structural unit Functional unit Signature of protein family How are they defined?
Predicting Structural Features Chapter 12. Structural Features Phosphorylation sites Transmembrane helices Protein flexibility.
Improved Protein Secondary Structure Prediction. Secondary Structure Prediction Given a protein sequence a 1 a 2 …a N, secondary structure prediction.
Protein domains Miguel Andrade Mainz, Germany Faculty of Biology,
Homology 3D modeling Miguel Andrade Mainz, Germany Faculty of Biology,
Protein 3D representation
Prediction of protein features. Beyond protein structure
Secondary structure prediction
Protein domains Miguel Andrade Mainz, Germany Faculty of Biology,
Protein 3D representation
Secondary structure prediction
Protein 3D representation
Protein domains Miguel Andrade Mainz, Germany Faculty of Biology,
Homology 3D modeling and effect of mutations
Sequence Based Analysis Tutorial
Adva Yeheskel Bioinformatics Unit, Tel Aviv University 8/5/2018
Sequence Based Analysis Tutorial
Protein domains Miguel Andrade Mainz, Germany Faculty of Biology,
Protein 3D representation
Neural Networks for Protein Structure Prediction Dr. B Bhunia.
Presentation transcript:

Secondary structure prediction

Amino acid sequence -> Secondary structure Alpha helix Beta strand Disordered/coil 70% accuracy 1991, 81% accuracy in 2009

Secondary structure prediction Limits: Limited to globular proteins Not for membrane proteins

Secondary structure prediction Applications Site directed mutagenesis Locate functionally important residues Find structural units / domains

Secondary structure prediction Techniques Linear statistics Physicochemical properties Linear discrimination Machine learning Neural Networks K-nearest neighbours Evolutionary trees Residue substitution matrices Using evolutionary information = Multiple sequence alignments.

Secondary structure prediction Jnet Cuff and Barton (2000) Neural Network Training set: 480 proteins (non homologous) Construction of MSA for each using BLAST

Secondary structure prediction Jnet Neural network Ni Neuron i, Nj neuron j Wij weight from Ni to Nj Ni Nj Wij Signal forward propagation Output from Ni * Weight Ni to Nj Input to Nj is Ij = Oi * Wij

Secondary structure prediction Jnet Neural network Ni Neuron i, Nj neuron j Wij weight from Ni to Nj Ni Nz Wij Nj Nk Input layer Output layer

Secondary structure prediction Jnet Neural network The network receives input values Ni Nz Wiz Nj Nk Wjz Wkz Ii Ij Ik Input layer Output layer

Secondary structure prediction Jnet Neural network Signal forward propagation Ni Nz Wiz Nj Nk Sum of outputs from Ni, Nj, Nk Wjz Wkz Oz Ii Ij Ik

Secondary structure prediction Jnet Neural network Compute the error Ni Nz Wiz Nj Nk Desired value is 1, Oz is 0.8 Error is = Oz – desired value = =0.2 Wjz Wkz Oz Ii Ij Ik

Secondary structure prediction Jnet Neural network Error backpropagation Ni Nz Wiz Nj Nk Weights are modified so that the result is a bit closer to what we wanted Wjz Wkz Oz

Secondary structure prediction Jnet Neural network Ni Nz Nj Nk Input layer Output layer Hidden layer Np Nq

Secondary structure prediction Jnet Neural network Input layer: read a sequence A A C C D D … Y Y CTEIL...

Secondary structure prediction Jnet Neural network Input layer: read a sequence A A C C D D … Y Y CDEKL

Secondary structure prediction Jnet Neural network Input layer: read a sequence A A C C D D … Y Y CDEKL

Secondary structure prediction Jnet Neural network Input layer: read a sequence A A C C D D … Y Y CDEKL...

Secondary structure prediction Jnet Neural network Output layer: structure Desired output: known structure A A C C D D … Y Y CDEKL... b b c c a a Alpha-helix 1 0 0

Secondary structure prediction Jnet Neural network Output layer: structure Desired output: known structure A A C C D D … Y Y CDEKL... b b c c a a Alpha-helix

Secondary structure prediction Jnet Neural network Error backpropagation = weights are modified A A C C D D … Y Y CDEKL... b b c c a a Alpha-helix

Secondary structure prediction Jnet Jnet architecture LAPEDCDEKLKLEPNAC b b c c a a Sequence to structure network Input layer = window of 17 residues Hidden layer = 9 neurons Output layer = 3 neurons

Secondary structure prediction Jnet Jnet architecture FLAPEDCDEKLKLEPNACW b b c c a a ccaacaaccbbbbbcbbbc

Secondary structure prediction Jnet Jnet architecture FLAPEDCDEKLKLEPNACW b b c c a a Structure to structure network b b c c a a Input layer = window of 19 residues Hidden layer = 9 neurons Output layer = 3 neurons ccaacaaccbbbbbcbbbc ccaaaaacccbbbbbbbbc

Secondary structure prediction Cole et al (2008) Nucleic Acids Research Geoff Barton, University of Dundee

Secondary structure prediction Uses algorithm Jnet2.0 Three state prediction Alpha, beta, coil Accuracy 81.5% (2008) But if no homolog (orphan sequence) 65.9%! PSIBLAST PSSM matrix HMMer profiles (instead of aa frequencies) Multiple neural networks 100 hidden layer units Jpred

Secondary structure prediction First, search against PDB sequences using BLAST (but only for warning) PSIBLAST search of UniRef90, 3 iterations, Alignment of hits (filtered at 75% id) Profiles from alignment (PSSM and HMMer) Profiles are input to JNet Alternative: user provides alignment (faster) Jpred

Secondary structure prediction Advanced Jpred4 usage

Secondary structure prediction Jpred output

Secondary structure prediction Jpred output / Jalview JPred

Secondary structure prediction Jpred output / view all JPred

Secondary structure prediction Jpred output / PDF output JPred

Starting Jalview Open Firefox with JRE (from ZDV) Go to Click the pink arrow “Launch Jalview Desktop” You can close all the demo windows that appear Exercise 1/4 Jalview 2D prediction

Load an alignment Use MR1_fasta.txt This is an alignment of a fragment of the mineralocorticoid receptor Open it from File > Input alignment > From file (Hint: You can load it directly as an URL, e.g. mainz.de/files/2015/02/MR1_fasta.txt) mainz.de/files/2015/02/MR1_fasta.txt The alignment has its own Menu tabs Try Colour > Clustalx to see conservation Exercise 1/4 Jalview 2D prediction

Exercise 2/4 Jalview 2D prediction Build a phylogenetic tree Also using MR1_fasta.txt Try Calculate > Calculate tree > Neighbour joining using PAM 250 Hint: We used identifiers indicating species names Try now Calculate > Calculate tree > Average distance using PAM 250 Compare results. Which one is better?

Exercise 2/4 Jalview 2D prediction Xenopus Anolis Taeniopygia Gallus Monodelphis Danio

Web service -> Secondary Structure Prediction -> Jnet secondary str pred No selection (or all sequences selected) = Jnet runs on top sequence using the alignment (fast) One sequence (or region) selected = Jnet runs on that sequence using homologs (slow) Some sequences selected = Jnet runs on top one using homologs (slow) Try with all sequences selected Exercise 3/4 Jalview 2D prediction

If this doesn’t work you can run directly MR1_fasta.txt on jpred4. Use the advanced option Upload a file option Select type of input = Multiple alignment (use format FASTA) Tick the skip PDB search option There is an option to view output in Jalview Exercise 3/4 Jalview 2D prediction

Annotations: Lupas_21, Lupas_14, Lupas_28 Coiled-coil predictions for the sequence. 21, 14 and 28 are windows used. Exercise 3/4 Jalview 2D prediction

Annotations: JNETHMM, JNETALIGN: predictions using diff profiles Jnetpred: Consensus prediction. Beta sheets: green arrows. Alpha helices: red tubes. Exercise 3/4 Jalview 2D prediction

Annotations: JNETCONF Confidence in the prediction. Exercise 3/4 Jalview 2D prediction

Annotations: JNETSOL25,JNETSOL5,JNETSOL0 Solvent accessibility predictions - binary predictions of 25%, 5% or 0% solvent accessibility. Exercise 3/4 Jalview 2D prediction

Obtain the sequence of the human glutamine synthetase. Run BLAST with the human sequence against: 1) the archaea Methanosarcina 2) the bacteria Escherichia coli 3) the fungi Pseudozima Antarctica Get the best homolog, align the sequences (including the human protein, on top) and use the input in Jalview. Exercise 4/4 2D prediction of known 3D

NCBI BLAST against single species is faster! Exercise 4/4 2D prediction of known 3D

Obtain the sequence of the human glutamine synthetase. Run BLAST with the human sequence against: 1) the archaea Methanosarcina 2) the bacteria Escherichia coli 3) the fungi Pseudozima Antarctica Get the best homolog from each, align the sequences and use the input in Jalview. Put the human protein on top. Exercise 4/4 2D prediction of known 3D

Load the alignment in Jalview and run web prediction. Alternative. Run the alignment in the Jpred4 server. (Hint: You could run the human sequence alone but that will search for homologs and will take very long) Compare the prediction with the known 3D of the human protein (open it in Chimera, File > Fetch by ID > PDB 2QC8) Exercise 4/4 2D prediction of known 3D

We need to hide all chains except one. Select one of the chains (ctrl + click on a residue, then arrow up). Invert selection (press arrow right). Actions > Ribbon > Hide Actions > Atoms/bonds > Hide Select the chain and focus on it Actions > Focus Exercise 4/4 2D prediction of known 3D

Compare the output of jpred/jalview 2D pref with the 3D structure of this protein. For example, locate a predicted helix or beta-strand in Jalview. Find out the start and end positions hovering over the human sequence with the mouse (the numbers on top of the alignment are different from the amino acid positions in each sequence). Color the corresponding residues it in the 3D view using Select > Atom specifier And ranges: e.g. : (predicted as helix) Actions > color > red Apply color some helices red and strands in green. Do you see differences? Where are they? Would you say that the 2D prediction was reasonable? Exercise 4/4 2D prediction of known 3D