Predicting Protein Solvent Accessibility with Sequence, Evolutionary Information and Context-based Features 12/05/2013 Ashraf Yaseen Department of Mathematics.


Predicting Protein Solvent Accessibility with Sequence, Evolutionary Information and Context-based Features
12/05/2013
Ashraf Yaseen, Department of Mathematics & Computer Science, Central State University, Wilberforce, Ohio
Yaohang Li, Computer Science Department, Old Dominion University, Norfolk, Virginia
BIOT 2013: Biotechnology and Bioinformatics Symposium

Contents
- Introduction
- Research Objective
- Background
- Method
  - Protein data sets
  - Context-based features
  - Neural Network model
- Results
- Summary

Introduction
- The solvent-accessible surface area, or accessibility, of a residue is the surface area of the residue that is exposed to solvent.
- Residue accessibility is a useful indicator of the residue's location: on the surface or in the core.
[Figure: surface area of a protein segment]

Introduction-cont.
- The DSSP program calculates the absolute solvent accessibility values of proteins.
- Relative values are calculated as the ratio between the absolute solvent accessibility value of a residue and its value in an extended tripeptide (Ala-X-Ala) conformation.
  - This allows comparisons between the accessibility of the different amino acids in proteins.
- A threshold of 0.25 defines the 2-state classification (exposed if > 0.25, buried otherwise).
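The relative accessibility ratio and the 2-state rule can be sketched as follows. This is a minimal illustration rather than the DSSP program or the authors' code: the function names are hypothetical, and the areas in the usage lines are placeholder numbers, since the slides do not list reference (Ala-X-Ala) values.

```python
def relative_accessibility(absolute_asa, reference_asa):
    """Ratio of a residue's absolute accessible surface area to its
    reference area in an extended Ala-X-Ala tripeptide."""
    if reference_asa <= 0:
        raise ValueError("reference_asa must be positive")
    return absolute_asa / reference_asa


def two_state(relative_asa, threshold=0.25):
    """Return 'E' (exposed) if the relative value is above the threshold,
    otherwise 'B' (buried), following the 0.25 rule on the slide."""
    return "E" if relative_asa > threshold else "B"


# Placeholder numbers for illustration only (not real reference areas).
rsa = relative_accessibility(absolute_asa=30.0, reference_asa=200.0)
print(rsa, two_state(rsa))  # 0.15 B
```

Note that a value exactly equal to the threshold is classified as buried, because the slide states the rule as "exposed if > 0.25".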

Prediction effectiveness
- Residue solvent accessibility plays an important role in folding and in enhancing proteins' thermodynamic and mechanical stability.
  - The burial of residues in the core (hydrophobic residues) is a major driving force for folding.
  - Active sites of proteins are located on their surfaces.
- Predicted accessibility reduces the conformational space, which aids modeling protein structures in three dimensions.
- It helps predict important protein functions.

Predicting Structural Features in Protein Modeling
- Correctly predicting structural features is a critical stepping stone toward obtaining correct 3D models.
[Figure: protein modeling, from sequence to 3D structure via intermediate prediction steps]

Protein Structural Features
[Figure: protein 1BOO, chain A, with panels:]
- Secondary structure: general 3D form of local segments of residues
- Disulfide bond in protein chain
- Surface area of a protein segment
- Properties of the residues in proteins

Background
- Many methods exist, using different protein datasets and different computational methods:
  - Neural networks, support vector machines, nearest neighbor, information theory, and Bayesian statistics
- The prediction is in a discrete fashion
- Significant accuracy increase when using evolutionary information
  - 2-state prediction accuracy of ~75% with 0.25 threshold
  - PSI-BLAST derived profiles
  - 2-state prediction accuracy of ~78%

Background-cont.
Structural feature prediction as classification: each residue is predicted to be in one of a few states.
- Secondary Structure Prediction
  - 3-state (helix, sheet, coil)
  - 8-state (α-helix, π-helix, 3₁₀-helix, β-strand, β-bridge, turn, bend and others)
- Residue Solvent Accessibility Prediction
  - 2-state (buried or exposed)
- Disulfide Bonding Prediction
  - Stage 1: Bonding state prediction (bonded/free)
  - Stage 2: Connectivity prediction (connected, not connected)
[Figure: a machine learning predictor (ANN, SVM, HMM, ...) outputs the structural feature (state) of residue Ri]

Statement of the Problem
- The improvement of prediction methods benefits from the incorporation of effective features
  - MSA (multiple sequence alignment) in machine learning
- The accuracy of current prediction methods has stagnated over the past few years
  - 2-state solvent accessibility: ~78%
  - 3-state secondary structure: ~76-80%
  - 8-state secondary structure: ~68%

Statement of the Problem-cont.
- How can the accuracy of predicting protein structural features be continuously improved toward its theoretical upper bound?
- Reducing the inaccuracy of protein structural feature prediction will be very useful in improving the efficiency of protein tertiary structure prediction
  - The search space for finding a tertiary structure grows super-linearly with the fraction of inaccuracy in structural feature prediction

Our Approach
- Extracting and selecting "good" features can significantly enhance the prediction performance
- Probably the most effective features, when predicting the structural state of a residue, are the structural states of the neighboring residues
  - With true states: >90%
[Figure: residue Ri surrounded by the structural states of its neighbors. Solvent accessibility: B = buried, E = exposed. Secondary structure: H = helix, E = sheet, C = coil]

Our Approach-cont.
- Unfortunately, using the true structural states as features is not feasible
- However, this suggests that the favorability of a residue adopting a certain structural state can also be an effective feature
- Statistical scores measuring the favorability of a residue adopting a certain structural state within its amino acid environment can be evaluated from the experimentally determined protein structures in the Protein Data Bank (PDB)

Our Approach-cont.
[Figure: input encoding of sequence & evolutionary info (MSA) + structure info (context-based scores), fed to a predictor that outputs the structural feature (state) of Ri]
We expect that our approach will improve the prediction of protein structural features, with the goal of achieving high accuracy levels.
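One common way to turn per-residue information into predictor input is a sliding window over neighboring residues. The sketch below is an assumption about the general shape of such an encoding, not the authors' implementation: the window size, the zero-padding at chain ends, and the function name `encode_windows` are hypothetical, and each residue is assumed to already have a feature vector (for example profile columns followed by context-based scores).

```python
def encode_windows(features, half_window=2):
    """Build one flat input vector per residue by concatenating the feature
    vectors of the residue and its neighbors on both sides.
    Positions beyond the chain ends are padded with zeros."""
    if not features:
        return []
    width = len(features[0])
    pad = [0.0] * width
    encoded = []
    for i in range(len(features)):
        row = []
        for j in range(i - half_window, i + half_window + 1):
            row.extend(features[j] if 0 <= j < len(features) else pad)
        encoded.append(row)
    return encoded


# Three residues, two toy features each, window of 3 residues.
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
for vector in encode_windows(toy, half_window=1):
    print(vector)
```

Each output vector has `(2 * half_window + 1) * width` entries, so every residue yields an input of the same size regardless of its position in the chain.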

Method
Context-based features
- Potential scores
  - calculated based on the context-based statistics derived from the protein datasets
  - estimate the favorability of residues in adopting specific structural states within their amino acid environment
[Figure: context-based model]
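The idea of scoring how favorable a structural state is within an amino acid environment can be illustrated with a generic log-odds statistic. This sketch is not the authors' potential, whose exact form is not given in the slides: the single-neighbor context, the pseudocount, and the function name `context_scores` are all assumptions made for illustration.

```python
import math
from collections import Counter


def context_scores(chains, offset=1, pseudocount=1.0):
    """Estimate log-odds scores for residue i being in state s given its own
    amino acid and the amino acid at position i + offset.

    chains: list of (sequence, states) string pairs of equal length.
    Returns a dict mapping (aa_i, aa_neighbor, state) -> log-odds score,
    where positive values mean the state is favored in that context."""
    context_state = Counter()   # counts of (aa_i, aa_neighbor, state)
    context_total = Counter()   # counts of (aa_i, aa_neighbor)
    state_total = Counter()     # background counts of each state
    for sequence, states in chains:
        for i, (aa, state) in enumerate(zip(sequence, states)):
            state_total[state] += 1
            j = i + offset
            if 0 <= j < len(sequence):
                context_state[(aa, sequence[j], state)] += 1
                context_total[(aa, sequence[j])] += 1

    n_states = len(state_total)
    n_residues = sum(state_total.values())
    scores = {}
    for (aa, neighbor) in context_total:
        for state in state_total:
            # Smoothed probability of the state in this context.
            observed = (context_state[(aa, neighbor, state)] + pseudocount) / (
                context_total[(aa, neighbor)] + pseudocount * n_states)
            # Background probability of the state over all residues.
            background = state_total[state] / n_residues
            scores[(aa, neighbor, state)] = math.log(observed / background)
    return scores


# Toy data (not real proteins): A is always buried, L is always exposed.
toy = [("ALAL", "BEBE"), ("LALA", "EBEB")]
scores = context_scores(toy)
print(scores[("A", "L", "B")] > 0, scores[("A", "L", "E")] < 0)
```

Scores of this kind could then be appended to each residue's feature vector alongside the sequence and evolutionary information.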

Context-based Statistics & Potentials
[Figure: context diagrams relating residue Ri, its structural state Ci, and neighboring residues X and Y]

Encoding & Neural Network Model

Results

Comparison of Q2 accuracy between our and other popularly used solvent accessibility prediction servers
Benchmarks (columns): CASP9, Manesh215, Carugo338
Servers (rows, each reporting Q2, QB, QE): NETASA, Sable (t=0.2), Sable (t=0.3), Netsurf, SPINE, ACCpro, Casa
[Table values not recovered]

Comparison of prediction performance of solvent accessibility using PSSM only and PSSM with context-based scores on Cull using 7-fold cross validation

             QB       QE       Q2
PSSM Only    78.44%   80.61%   79.50%
PSSM+Score   79.21%   82.00%   80.76%

- QB and QE measure the quality of predicting the buried state and the exposed state, respectively
- Q2 = total number of residues correctly predicted / total number of residues
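The three measures can be computed from an observed and a predicted state string. In this sketch Q2 follows the definition given on the slide; QB and QE are taken to be the fraction of observed buried (or exposed) residues that are predicted correctly, which is an assumption since the slide does not spell out their formula. The function name and the `skip` parameter are hypothetical.

```python
def accessibility_scores(observed, predicted, skip="."):
    """Compute Q2, QB and QE (as fractions) from two equal-length strings of
    'B' (buried) and 'E' (exposed) labels. Positions whose observed label
    equals `skip` are ignored."""
    if len(observed) != len(predicted):
        raise ValueError("strings must have the same length")
    correct = {"B": 0, "E": 0}
    total = {"B": 0, "E": 0}
    for obs, pred in zip(observed, predicted):
        if obs == skip:
            continue
        total[obs] += 1
        if obs == pred:
            correct[obs] += 1
    n = total["B"] + total["E"]
    # Q2: correctly predicted residues over all residues.
    q2 = (correct["B"] + correct["E"]) / n if n else 0.0
    # QB / QE (assumed definition): per-state fraction predicted correctly.
    qb = correct["B"] / total["B"] if total["B"] else 0.0
    qe = correct["E"] / total["E"] if total["E"] else 0.0
    return q2, qb, qe


print(accessibility_scores("BBEE", "BEEE"))  # (0.75, 0.5, 1.0)
```

Multiplying each value by 100 gives percentages in the form used in the table.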

Results-cont.
Solvent accessibility prediction on protein 3NRF(A)

3NRF-A      DAVMVFARQGDKGSVSVGDKHFRTQAFKVRLVNAAKSEISLKNSCLVAQSAAGQSFRLDTVDEELTADTLKPGASVEGDAIFASEDDAVYGASLVRLSDRCK
DSSP SA2    EEB.BEBEEEEEEEEEEEEEEEEBBBBEBEBBBEBEEEBEBEEEBBBBBBEEEEEBEEEEEEEEBEEEEBEEEEEBEBEBEBBBEEEBBEEBBBBBBBEEEE
PSSM Only   EEB.BBBBEEEEBBBBEEEEEEBBBEBEBBBBEBEEEEBEBEEBBBBBBBEEEEEBEBEEBEEEBEEEBBEEEEEBEBBBBBBBEEEEBBEBEBBEBBEEBE
PSSM+Score  EEB.BBBEEEEEEEBEEEEEEBBBBEBEBBBBEEEEEEBEBEEBBBBBBBEEEEEBEBEEBEEEBEEEEBEEEEEBEBBBBBBBEEEBBBEBBBBEBBEEBE

[Q2 column: values not recovered]

Working with Casa
1. Input title
2. Input your sequence
3. Input your email
4. Submit, then wait for the results...
"Casa" available at:

Working with Casa
1. Check your email and click the link provided
2. The results are displayed

Summary
- The effectiveness of using context-based features has been demonstrated in our computational results, in N-fold cross validation as well as on benchmarks, where improvements in prediction accuracy for secondary structure, disulfide bonding and solvent accessibility are observed.
- Web servers implementing our prediction methods are currently available.
  - Dinosolve, available at:
  - C3-Scorpion, available at:
  - C8-Scorpion, available at:
  - Casa, available at:

Publications
1. Ashraf Yaseen and Yaohang Li, "Enhancing Protein Disulfide Bonding Prediction Accuracy with Context-based Features", Proceedings of Biotechnology and Bioinformatics Symposium (BIOT2012), Provo,
2. Ashraf Yaseen and Yaohang Li, "Dinosolve: A Protein Disulfide Bonding Prediction Server using Context-based Features to Enhance Prediction Accuracy". Accepted, BMC Bioinformatics
3. Ashraf Yaseen and Yaohang Li, "Template-based Prediction of Protein 8-state Secondary Structures". 3rd IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), New Orleans, April. Accepted, BMC Bioinformatics
4. Ashraf Yaseen and Yaohang Li, "Predicting Protein Solvent Accessibility with Sequence, Evolutionary Information and Context-based Features". Accepted at BIOT
5. Ashraf Yaseen and Yaohang Li, "Context-based features can enhance protein secondary structure prediction accuracy". Submitted to Bioinformatics.
6. Ashraf Yaseen and Yaohang Li, "Accelerating Knowledge-based Energy Evaluation in Protein Structure Modeling with Graphics Processing Units", Journal of Parallel and Distributed Computing, 72(2): , 2012

Acknowledgement
- This work is partially supported by NSF through grant and ODU SEECR grant

Questions? Thank You