Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices
Yan Liu, Sep 29, 2003

Protein Secondary Structure
The Dictionary of Secondary Structure of Proteins (DSSP) assigns secondary structure from hydrogen-bonding patterns and geometric constraints. It uses 8 labels:
- Helix types: H (alpha-helix), G (3-10 helix), I (pi-helix)
- Sheet types: E (extended strand, participates in a beta-ladder), B (residue in an isolated beta-bridge)
- Coil types: T (turn), S (bend), _ (coil, none of the above)
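Three-state accuracy measures such as Q3 compare helix/strand/coil labels, so the 8 DSSP labels are usually collapsed to 3. The sketch below uses one common reduction (an assumption; the slides do not say which mapping was used): H, G, I to helix; E, B to strand; everything else to coil.

```python
# One common 8-to-3 DSSP reduction (assumed; conventions differ between papers).
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}

def reduce_dssp(states: str) -> str:
    """Map a string of DSSP labels to H (helix), E (strand) or C (coil)."""
    # Any label not in the table (T, S, blank, '_' or '-') is treated as coil.
    return "".join(EIGHT_TO_THREE.get(s, "C") for s in states)

print(reduce_dssp("HHHGGGTTEEEB S_"))  # HHHHHHCCEEEECCC
```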

Protein Secondary Structure Prediction
Given a protein sequence, e.g. APAFSVSPASGA, predict its secondary structure sequence, e.g. CCEEEEECCCC.
Application: provides constraints for tertiary structure prediction, or serves as part of fold recognition.

Related Work
Standard SS prediction methods: PHD (Rost & Sander 1993)
- Uses multiple sequence profiles, based on the observation that conserved regions are functionally important and/or buried in the protein core. Benner & Gerloff demonstrated that the degree of solvent accessibility can be predicted with reasonable accuracy.
- Two-layered feed-forward neural networks.
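A multiple sequence profile summarizes an alignment column by column. The following is a minimal sketch of an unweighted frequency profile; PHD's actual profile construction (sequence weighting, handling of gaps) is not described on the slide, so plain counts here are a simplifying assumption, and the function name is hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def frequency_profile(alignment):
    """alignment: equal-length aligned sequences, '-' marks a gap.
    Returns an (L, 20) array of per-column amino-acid frequencies (gaps ignored)."""
    length = len(alignment[0])
    counts = np.zeros((length, len(AMINO_ACIDS)))
    for seq in alignment:
        if len(seq) != length:
            raise ValueError("all aligned sequences must have the same length")
        for pos, aa in enumerate(seq):
            if aa in INDEX:  # skip gaps and non-standard symbols
                counts[pos, INDEX[aa]] += 1
    totals = counts.sum(axis=1, keepdims=True)
    # All-gap columns would divide by zero; leave them as all-zero rows instead.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

profile = frequency_profile(["ACD", "ACE", "A-E"])
print(profile.shape)  # (3, 20)
```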

PSIPRED
1. Generation of a sequence profile: position-specific scoring matrices
2. Prediction of initial secondary structure: standard feed-forward back-propagation networks
3. Filtering of the predicted structures

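Position-specific scoring matrices (PSSM) - 1
PSSM (Altschul et al., 1997), or profiles: given a protein sequence of length N together with its multiple sequence alignment, construct an N × 20 matrix.
Score definition: for amino acid i at a given position, with background probability $P_i$,
$$S_i = \log \frac{Q_i}{P_i}, \qquad Q_i = \frac{\alpha f_i + \beta g_i}{\alpha + \beta}, \qquad g_i = \sum_j f_j \frac{q_{ij}}{P_j}$$
where $f_i$ is the weighted observed frequency, $g_i$ is a pseudocount frequency derived from the substitution target frequencies $q_{ij}$, $\alpha = N_c - 1$ ($N_c$: mean number of different residue types observed per alignment column) and $\beta = 10$.
There are different methods for estimating $Q_i$; the one above is the PSI-BLAST estimator, and other estimators also exist.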
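The score definition can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not PSI-BLAST itself: the sequence-weighting step that produces $f_i$ is skipped (weighted frequencies are passed in), $N_c$ is computed as the mean number of distinct residue types over the supplied columns, and the scores are left as unscaled natural-log ratios (real PSI-BLAST also scales and rounds them). The function name `pssm_scores` is hypothetical.

```python
import numpy as np

def pssm_scores(f, P, q, beta=10.0):
    """Pseudocount log-odds scores S_i = log(Q_i / P_i) for each position.

    f: (N, 20) weighted observed frequencies f_i (one row per query position)
    P: (20,)   background probabilities P_i
    q: (20, 20) joint target frequencies q_ij, with q.sum(axis=1) == P
    """
    f = np.asarray(f, dtype=float)
    # N_c: mean number of distinct residue types per column (simplified reading)
    Nc = (f > 0).sum(axis=1).mean()
    alpha = Nc - 1.0
    # Pseudocount frequencies g_i = sum_j f_j * q_ij / P_j
    g = (f / P) @ q.T
    # Mix observed and pseudocount frequencies, then take log-odds vs background
    Q = (alpha * f + beta * g) / (alpha + beta)
    return np.log(Q / P)

# Toy example: uniform background and substitution data with no preferences.
P = np.full(20, 0.05)
q = np.outer(P, P)
f = np.vstack([np.eye(20)[0], P])  # a fully conserved column and a uniform one
S = pssm_scores(f, P, q)
print(S.shape)  # (2, 20)
```

With uninformative $q_{ij}$, a uniform column scores 0 everywhere, while a conserved column gets a positive score for its residue and negative scores for the rest.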

Position-specific scoring matrices (PSSM) - 2
Advantages:
- A more sensitive scoring system
- Improved estimation of the probability of each amino acid occurring at each pattern position
- Relatively precise definition of the boundaries of important motifs
Disadvantages:
- Too sensitive to biases in the sequence databanks
- Prone to erroneously incorporating repetitive sequences into the profiles

PSSM in PSIPRED
Input to the neural networks: the PSSM produced by PSI-BLAST after three iterations. The window size is set to 15 residues, and the scores are scaled to the 0-1 range with the standard logistic function.
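The input encoding can be sketched as follows: logistic scaling of each PSSM score, then a 15-residue window around every position, with one extra flag per window position for "beyond the chain terminus". How PSIPRED fills those off-chain positions is an assumption here (zeros for the scores, 1 for the flag), and `window_features` is a hypothetical name.

```python
import numpy as np

WINDOW = 15          # window size from the slide
HALF = WINDOW // 2   # 7 residues on either side of the centre residue

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def window_features(pssm):
    """pssm: (N, 20) raw PSSM scores -> (N, 21 * WINDOW) input rows."""
    n = pssm.shape[0]
    scaled = logistic(np.asarray(pssm, dtype=float))  # scores mapped into (0, 1)
    # Padded rows stand for positions outside the chain; column 20 flags them.
    padded = np.zeros((n + 2 * HALF, 21))
    padded[HALF:HALF + n, :20] = scaled
    padded[:HALF, 20] = 1.0
    padded[HALF + n:, 20] = 1.0
    # Row i holds the 15 positions centred on residue i, flattened.
    return np.stack([padded[i:i + WINDOW].ravel() for i in range(n)])

feats = window_features(np.zeros((10, 20)))
print(feats.shape)  # (10, 315)
```

Each row has 21 × 15 = 315 values, which matches the first-stage input size on the next slide.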

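Neural network architecture - 1
Two-stage neural networks:
- 1st stage, sequence-to-structure mapping: 315 inputs (21 × 15), a hidden layer, and 3 outputs (helix, strand, coil)
- 2nd stage, structure-to-structure mapping: 60 inputs (4 × 15), a hidden layer, and 3 outputs
In both stages, each of the 15 window positions has one extra input indicating that the window spans a chain terminus; the other inputs per position are the 20 PSSM scores (1st stage) or the 3 first-stage outputs (2nd stage).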
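The data flow through the two stages can be sketched as a forward pass. Only the input and output sizes come from the slide; the hidden-layer size (64), the sigmoid hidden units, the softmax outputs and the random untrained weights are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
WINDOW, HALF = 15, 7
HIDDEN = 64  # hypothetical hidden-layer size, not taken from the slide

def windows(x):
    """(N, k) per-residue features -> (N, (k + 1) * WINDOW), with a terminus flag."""
    n, k = x.shape
    padded = np.zeros((n + 2 * HALF, k + 1))
    padded[HALF:HALF + n, :k] = x
    padded[:HALF, k] = 1.0
    padded[HALF + n:, k] = 1.0
    return np.stack([padded[i:i + WINDOW].ravel() for i in range(n)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def mlp(n_in, n_hidden, n_out):
    # Randomly initialised weights; a real predictor would be trained.
    return (rng.normal(0, 0.1, (n_in, n_hidden)), np.zeros(n_hidden),
            rng.normal(0, 0.1, (n_hidden, n_out)), np.zeros(n_out))

def forward(params, x):
    w1, b1, w2, b2 = params
    return softmax(sigmoid(x @ w1 + b1) @ w2 + b2)  # per-residue H/E/C scores

stage1 = mlp(21 * WINDOW, HIDDEN, 3)  # 315 inputs
stage2 = mlp(4 * WINDOW, HIDDEN, 3)   # 60 inputs

pssm = rng.normal(size=(40, 20))              # toy PSSM for a 40-residue chain
p1 = forward(stage1, windows(sigmoid(pssm)))  # sequence-to-structure
p2 = forward(stage2, windows(p1))             # structure-to-structure (filtering)
print(p1.shape, p2.shape)  # (40, 3) (40, 3)
```

The second stage sees a window of first-stage predictions, which is what lets it smooth out locally implausible structure assignments.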

Neural network architecture - 2
Training parameters:
- Momentum term: 0.9
- Learning rate:
- Overfitting prevention: 10% of the training set is held out for validation
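Two of the training details above can be illustrated directly: a gradient step with a momentum term of 0.9, and a random 10% validation hold-out. This is a sketch, not the PSIPRED training code; the learning rate used below is a hypothetical value, and the helper names are invented for illustration.

```python
import numpy as np

def momentum_step(w, v, grad, lr, momentum=0.9):
    """Classical momentum: v <- momentum * v - lr * grad, then w <- w + v."""
    v = momentum * v - lr * grad
    return w + v, v

def holdout_split(n_examples, frac=0.1, seed=0):
    """Shuffle example indices and reserve `frac` of them for validation."""
    idx = np.random.default_rng(seed).permutation(n_examples)
    n_val = int(round(frac * n_examples))
    return idx[n_val:], idx[:n_val]  # (train indices, validation indices)

# Toy check: minimise ||w||^2 / 2, whose gradient is w itself.
w, v = np.array([5.0, -3.0]), np.zeros(2)
for _ in range(300):
    w, v = momentum_step(w, v, grad=w, lr=0.05)  # lr = 0.05 is hypothetical

train, val = holdout_split(100)
print(len(train), len(val))  # 90 10
```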

Experimental results
Training and testing data: proteins were collected so as to remove structural similarity, using CATH to detect homologous protein sequences. A total of 187 protein sequences were split into three sets of 62, 62 and 63 for three-way cross-validation.
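Three-way cross-validation with the 62/62/63 split can be sketched as below. A random assignment of chains to folds is an assumption that is reasonable only because the set was already filtered for structural similarity; the slide does not say how chains were assigned. The generator name is hypothetical.

```python
import numpy as np

def three_way_splits(n_chains=187, fold_sizes=(62, 62, 63), seed=0):
    """Yield (train, test) index arrays; each fold is the test set exactly once."""
    if sum(fold_sizes) != n_chains:
        raise ValueError("fold sizes must add up to the number of chains")
    idx = np.random.default_rng(seed).permutation(n_chains)
    folds = np.split(idx, np.cumsum(fold_sizes)[:-1])  # cut points at 62 and 124
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        yield train, test

splits = list(three_way_splits())
for train, test in splits:
    print(len(train), len(test))  # 125 62, then 125 62, then 124 63
```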

Experimental results
Per-chain results (figure: distribution of Q3 and SOV scores): average Q3 76.0%, average SOV 73.5%.
Per-residue results: Q3 76.5%.
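The per-chain and per-residue Q3 figures differ because averaging per chain gives every protein equal weight, while pooling all residues gives longer chains more weight. A small sketch of both (SOV is more involved and is not shown; the function names and toy strings are hypothetical):

```python
def q3(pred: str, true: str) -> float:
    """Percentage of residues whose predicted 3-state label matches the observed one."""
    if len(pred) != len(true):
        raise ValueError("prediction and observation must have the same length")
    return 100.0 * sum(p == t for p, t in zip(pred, true)) / len(true)

def per_chain_and_per_residue_q3(chains):
    """chains: list of (pred, true) pairs -> (mean per-chain Q3, pooled per-residue Q3)."""
    per_chain = sum(q3(p, t) for p, t in chains) / len(chains)
    correct = sum(sum(a == b for a, b in zip(p, t)) for p, t in chains)
    total = sum(len(t) for _, t in chains)
    return per_chain, 100.0 * correct / total

chains = [("HHHC", "HHHH"), ("EEEEEEEECC", "EEEEEEEECC")]
per_chain, per_res = per_chain_and_per_residue_q3(chains)
print(round(per_chain, 2), round(per_res, 2))  # 87.5 92.86
```

Here the short chain with one error pulls the per-chain average down more than it affects the pooled per-residue score.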

Experimental results
Ranked first in CASP3: average Q3 73.4% (69.0% for the second-ranked method, 66.7% for PHD); average SOV 71.9% (65.7% for the second-ranked method, 63.8% for PHD).
Also ranked first in CASP4 (Dec 2000).

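Conclusion
PSIPRED was the best-performing secondary structure prediction method in CASP3 and CASP4, leading the second-ranked method in CASP3 by 4.4 points in average Q3.
The differences between PHD and PSIPRED lie in the use of position-specific scoring matrices and in the training data.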