Template-based Prediction of Protein 8-state Secondary Structures. June 12th, 2013. Ashraf Yaseen and Yaohang Li, Department of Computer Science, Old Dominion University.

Template-based Prediction of Protein 8-state Secondary Structures
Ashraf Yaseen and Yaohang Li
Department of Computer Science, Old Dominion University, Norfolk, VA
3rd IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), June 12th, 2013

Contents
- Introduction
  - Secondary Structure Definition & Representation
  - Secondary Structure Prediction
  - C8-Scorpion
- Materials & Methods
  - Data Sets, Template Construction, and Encoding
  - Neural Network Model
- Results & Discussions
- Summary

Protein Secondary Structure Prediction in Protein Modeling
- Proteins: from proteios, "primary", "of prime importance"; the primary components of living things
- In nature, proteins fold into specific 3D structures, which are critical to their functions
- Protein modeling: sequence → intermediate prediction steps → 3D structure
- Correctly predicting protein secondary structure is a critical stepping stone toward obtaining correct 3D models

Secondary Structures - Definition
- General 3D form of local segments of residues
- Identified from determined protein 3D structures (DSSP)
- Figure: Protein 1BOO, chain A, with segments labeled α-helix, 3-10 helix, π-helix, β-strand, turn, bend, and other

Secondary Structures - Representation
- 3-10 helix (G)
- α-helix (H)
- π-helix (I)
- β-strand (E)
- β-bridge (B)
- turn (T)
- bend (S)
- others (C)

Secondary Structure Prediction - Effectiveness
- Correctly predicting secondary structure:
  - reduces the degrees of freedom in protein structure modeling
  - reduces the difficulty of obtaining high-resolution 3D models
  - yields a much smaller range of possible torsion angles

Secondary Structure Prediction - Background
- Secondary structure prediction is a classification problem: a predictor assigns each residue Ri to one of a few structural states
  - 3-state: helix, sheet, coil
  - 8-state: α-helix, π-helix, 3-10 helix, β-strand, β-bridge, turn, bend, and others
- Machine learning methods: ANN, SVM, HMM, ...
- 3-state examples: GOR4, PSI-Pred, PHD, SAM, Porter, JPred, SPINE, SSPRO, NETSURF, and many others; ~80% Q3
- 8-state examples:
  - SSpro8: 62-63% Q8
  - RaptorXss8: 67.9% Q8
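The Q3 and Q8 figures quoted on this slide are per-residue accuracies: the fraction of residues whose predicted state equals the observed state. A minimal Python sketch follows. The 8-to-3 reduction used here (G, H, I to helix; E, B to sheet; T, S, C to coil) is one common convention and is an assumption, since the slides do not say which mapping is used; the names `EIGHT_TO_THREE`, `reduce_to_3state`, and `q_accuracy` and the example strings are illustrative only.

```python
# Illustrative sketch. The 8-to-3 reduction below is an assumed, commonly
# used convention, not necessarily the mapping used by the authors.
EIGHT_TO_THREE = {
    "G": "H", "H": "H", "I": "H",  # 3-10, alpha and pi helices -> helix
    "E": "E", "B": "E",            # strand and bridge -> sheet
    "T": "C", "S": "C", "C": "C",  # turn, bend, others -> coil
}


def reduce_to_3state(ss8):
    """Collapse an 8-state string to a 3-state string."""
    return "".join(EIGHT_TO_THREE[state] for state in ss8)


def q_accuracy(predicted, observed):
    """Fraction of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Made-up example strings, not data from the talk.
observed = "CHHHHGGGTTEEEEBSC"
predicted = "CHHHHHHHTTEEEECSC"
q8 = q_accuracy(predicted, observed)
q3 = q_accuracy(reduce_to_3state(predicted), reduce_to_3state(observed))
```

In this toy case the 3-10 helix residues predicted as α-helix count as errors for Q8 but not for Q3, which is one reason Q8 is lower than Q3 for the same predictor.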

Secondary Structure Prediction - 8-state
- Figure: Prediction accuracy of RaptorXss8 on the CB513, CASP9, Manesh215, and Carugo338 benchmarks
- Prediction accuracies for 3-10 helices (G), π-helices (I), β-bridges (B), and bends (S) are particularly low because these states occur infrequently
- Figure: Distribution of 3-10 helices (G), α-helices (H), π-helices (I), β-sheets (E), β-bridges (B), turns (T), bends (S), and coils (C) in Cull5547
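Both quantities on this slide, the distribution of the eight states and the accuracy per state, can be computed directly from aligned state strings. The sketch below is a hedged illustration: the function names and the toy strings are assumptions, and per-state accuracy is taken here as the fraction of residues observed in a state that are predicted in that same state.

```python
from collections import Counter

STATES = "GHIEBTSC"


def state_frequencies(ss8):
    """Relative frequency of each of the eight states in a state string."""
    if not ss8:
        raise ValueError("state string must be non-empty")
    counts = Counter(ss8)
    return {s: counts.get(s, 0) / len(ss8) for s in STATES}


def per_state_accuracy(predicted, observed):
    """For each state, the fraction of residues observed in that state that
    were predicted correctly; None when the state never occurs."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    totals = Counter()
    correct = Counter()
    for p, o in zip(predicted, observed):
        totals[o] += 1
        if p == o:
            correct[o] += 1
    return {s: (correct[s] / totals[s] if totals[s] else None) for s in STATES}


# Made-up example strings, not data from the talk.
observed = "HHHHGGEEEC"
predicted = "HHHHHHEEEC"
frequencies = state_frequencies(observed)
accuracies = per_state_accuracy(predicted, observed)
```

A rare state such as G contributes few residues, so a predictor can score well overall while getting that state entirely wrong, as in this toy example.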

Secondary Structure Prediction - Template-based
- Most current methods for secondary structure prediction are ab initio
- However, many protein sequences have some degree of similarity among themselves
- Latest version of Porter (3-state):
  - improvement in prediction accuracy with >30% sequence similarity
  - decline in efficiency with low sequence similarity (<20%)

Template-based C8-SCORPION
- An extension of our previous method, C3-SCORPION
- Input encoding: sequence and evolutionary information (PSSM) + structure information from templates or context-based scores
- Predictor output: structural feature (state) of residue Ri
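One way to picture the input encoding is a sliding window centered on residue Ri, where each window position contributes sequence, PSSM, and template-derived features. The sketch below is only an illustration under stated assumptions: the window half-width of 7, the one-hot layout, the use of "-" for positions with no aligned template residue, and the padding flag are all hypothetical choices, not details taken from the slides.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
STATES = "GHIEBTSC"
# Per residue: 20 one-hot amino acid values, 20 PSSM columns,
# and 8 one-hot template-state values (all assumed layout choices).
PER_RESIDUE = len(AMINO_ACIDS) + len(AMINO_ACIDS) + len(STATES)


def one_hot(symbol, alphabet):
    """One-hot vector; all zeros if the symbol is not in the alphabet."""
    return [1.0 if symbol == a else 0.0 for a in alphabet]


def encode_window(sequence, pssm, template, center, half_width=7):
    """Concatenate features for the residues around `center`.

    Positions that fall outside the chain are zero-filled and marked with
    a trailing padding flag of 1.0; real residues get a flag of 0.0.
    """
    features = []
    for i in range(center - half_width, center + half_width + 1):
        if 0 <= i < len(sequence):
            features += one_hot(sequence[i], AMINO_ACIDS)
            features += [float(v) for v in pssm[i]]
            features += one_hot(template[i], STATES)
            features.append(0.0)
        else:
            features += [0.0] * PER_RESIDUE
            features.append(1.0)
    return features


# Toy input: a 3-residue chain, an all-zero PSSM, and a template that
# covers residues 0 and 2 only ("-" means no template residue aligned).
sequence = "ACD"
pssm = [[0.0] * 20 for _ in sequence]
template = "H-E"
window = encode_window(sequence, pssm, template, center=0, half_width=1)
```

Keeping the template block all-zero where no template residue is aligned lets the same encoding serve template-less and template-based inputs.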

Materials & Methods
- Data sets
  - Cull5547: from the PISCES server, at most 25% sequence identity, 2.0 Å resolution
  - Benchmarks: CASP9, Manesh215, Carugo338, CB513
- Template construction
- Encoding
  - Context-based scores: statistical potential scores, derived from the protein data sets, that estimate the favorability of residues adopting specific structural states within their amino acid environment

Materials & Methods (cont.)
- Neural network model
- Figure: Two phases of template-based 8-state secondary structure prediction (architecture and encoding)

Results & Discussions
- Table: Overall 7-fold cross-validation accuracy in template-based 8-state prediction (Q8 and SOV8 for states G, H, I, E, B, S, T, and C; the only surviving entry is 0.00 for I)
- Table: Comparison between 8-state predictions with and without templates on the CB513, CASP9, Manesh215, and Carugo338 benchmarks (Q8 and SOV8)
- Figure: Distribution of 8-state secondary structure prediction accuracy (Q8) as a function of sequence similarity; the first group of bars corresponds to template-less predictions
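The cross-validation results are reported over 7 folds. A minimal sketch of splitting a set of chains into 7 folds is shown below; the shuffling, the seed, and the name `k_fold_splits` are assumptions for illustration, and the slides do not describe how the authors actually assigned chains to folds.

```python
import random


def k_fold_splits(chain_ids, k=7, seed=0):
    """Yield (train, test) lists of chain IDs for k-fold cross-validation.

    Chains are shuffled once with a fixed seed, dealt into k folds, and
    each fold is used as the test set exactly once.
    """
    ids = list(chain_ids)
    if len(ids) < k:
        raise ValueError("need at least k chains")
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [c for j, fold in enumerate(folds) if j != i for c in fold]
        yield train, test


# Hypothetical chain identifiers, for illustration only.
chains = ["chain%d" % i for i in range(21)]
splits = list(k_fold_splits(chains))
```

Splitting by whole chain rather than by residue keeps all residues of a protein on the same side of the train/test boundary.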

Results & Discussions (cont.)
Table: Comparison of 7-fold cross-validation prediction accuracies in eight states when templates with different sequence similarities are used. The accuracy rows (QH, QG, QI, QE, QB, QT, QS, QC, Q8) are not preserved except for a 0.00 entry in the QI row.
Sequence similarity (%)   (0, 10]   (10, 20]   (20, 40]   (40, 70]   (70, 95]
# of chains               4,426     4,215      3,204      1,437      1,133
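Grouping chains by the similarity of their best template into the half-open bins used in this table can be sketched as follows. The bin edges come from the slide; the function names and the choice to return `None` for values outside (0, 95] are assumptions made for illustration.

```python
import bisect

BIN_EDGES = [0, 10, 20, 40, 70, 95]
BIN_LABELS = ["(0, 10]", "(10, 20]", "(20, 40]", "(40, 70]", "(70, 95]"]


def similarity_bin(similarity_percent):
    """Return the label of the bin (low, high] containing the value,
    or None if it lies outside (0, 95]."""
    if not (BIN_EDGES[0] < similarity_percent <= BIN_EDGES[-1]):
        return None
    # bisect_left places a value equal to an edge in the bin that the
    # edge closes, which matches the right-closed intervals.
    index = bisect.bisect_left(BIN_EDGES, similarity_percent) - 1
    return BIN_LABELS[index]


def mean_accuracy_by_bin(records):
    """Average Q8 per bin from (similarity_percent, q8) pairs."""
    sums = {}
    counts = {}
    for similarity, q8 in records:
        label = similarity_bin(similarity)
        if label is None:
            continue
        sums[label] = sums.get(label, 0.0) + q8
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


# Made-up (similarity, Q8) pairs, not results from the talk.
example = [(5.0, 0.60), (10.0, 0.70), (35.0, 0.80), (99.0, 0.95)]
means = mean_accuracy_by_bin(example)
```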

Results & Discussions (cont.)
- Figure: Comparison between template-less and template-based predictions on 1BTN chain A

Working with C8-Scorpion
- Input a title
- Input your sequence
- Input your email
- Submit, then wait for the results...
- “C8-Scorpion” available at:

Working with C8-Scorpion (cont.)
- Check your email and click the link provided
- The results are displayed

Summary
- Our computational results, in 7-fold cross-validation as well as on benchmarks, demonstrate the effectiveness of using structural information from templates: prediction accuracies improve in both settings.
- Overall, 78.85% Q8 accuracy and 80.10% SOV8 accuracy are achieved in 7-fold cross-validation.
- More importantly, when good templates are available, the prediction accuracy of less frequent secondary structure states, such as 3-10 helices, turns, and bends, improves substantially, making these predictions suitable for practical use in applications.
- A webserver (C8-Scorpion) implementing template-less 8-state secondary structure prediction is currently available; the integration of template-based prediction into the C8-Scorpion webserver is under development.

Acknowledgement
This work is partially supported by an NSF grant and the ODU 2013 Multidisciplinary Seed grant.

Questions?
Thank You