We propose an accurate potential that combines several useful features:
- HP, HH and PP interactions among the amino acids
- Sequence-based accessibility obtained for each amino acid
- 3D structure-based properties, i.e. uPhi and uPsi

The improved potential can be used for:
- Protein-ligand binding site prediction
- Ab initio protein structure prediction
- Fold recognition
- Drug design and enzyme design

The proposed potential outperforms the state-of-the-art approaches it was compared against.

3D structure prediction is useful in the design of drugs and novel enzymes. Energy functions can aid in protein structure prediction and fold recognition. We propose the 3DIGARS3.0 potential for improved accuracy.

We introduce two 3D structural features:
- uPhi-based energy
- uPsi-based energy

The motivation is that 3D structural features help improve the accuracy of the potential.

uPhi and uPsi are linearly combined with the prior energy components:
- 3DIGARS energy, which is based on HP, HH and PP interactions and their respective ideal gas reference states
- ASA energy, computed by modeling the error between the real accessibility and the accessibility predicted from protein sequences

The linearly combined energies are optimized using a genetic algorithm (GA).

Three decoy sets were used in optimization:
- Moulder
- Rosetta
- I-Tasser

Five independent test decoy sets were used to evaluate the accuracy:
- 4state_reduced
- fisa_casp3
- hg_structal
- ig_structal
- ig_structal_hires

Based on the independent test datasets, 3DIGARS3.0 outperformed the state-of-the-art approaches:
- DFIRE by [...]%
- RWplus by [...]%
- dDFIRE by 72.46%
- GOAP by 20.20%
- 3DIGARS by [...]%
- 3DIGARS2.0 by [...]%

The percentage weighted average improvement is calculated as [equation not preserved in the transcript], where y_i represents the new value and x_i represents the old value.

Figure 1: (a) Native-like protein conformation, presented in a 3D hexagonal-close-packing (HCP) configuration using hydrophobic (H) and hydrophilic or polar (P) residues. The H-H interaction space is smaller than the P-P interaction space, since hydrophobic residues (black balls) avoid water and tend to remain inside the central space. (b) 3D metaphoric HP folding kernels, depicted using the HP model in the HCP configuration, showing the three layers of amino-acid distribution.

Figure 5: Process flow of the design and development of the 3DIGARS3.0 energy function.

3DIGARS potential
- Core statistical function based on HP, HH and PP interactions (see Fig. 1)
- Segregated ideal gas reference states and libraries for the HP, HH and PP groups
- Better training dataset (a 100% sequence identity cutoff can capture the natural frequency distribution)
- Three shape parameters (α_hp, α_hh and α_pp) control the shape of the assumed spherical protein surface
- Three contribution parameters (β_hp, β_hh and β_pp) control the contribution of each group

3DIGARS2.0 potential
- Integration of the core energy and sequence-specific features
- The sequence-specific feature is computed by modeling the error between the real and predicted ASA (see Fig. 2)
- Real and predicted ASA are obtained from DSSP and REGAd3p, respectively
- 3DIGARS2.0 is a linearly weighted accumulation of 3DIGARS and mined ASA

3DIGARS3.0 potential
- Integration of core energy, sequence-specific energy and 3D structural features (see Fig. 5)
- The added 3D structural features are derived from the uPhi and uPsi angles
- uPhi and uPsi are computed using the Cartesian coordinates of a set of 4 atoms (see Fig. 3 and 4)
- uPhi- and uPsi-based energies are computed in the following steps (see Fig. 4):
  1. The cosine value range (-1 to 1) of the angles uPhi and uPsi is divided into 20 bins, each of width 0.1
  2. Individual frequency tables for uPhi and uPsi are computed
  3. The frequency tables are used to compute individual energy score libraries
  4. The energy scores are then used to compute the uPhi and uPsi energies for a given protein

Protein folding and structure prediction problems rely on an accurate energy function.
The accuracy of a potential function depends on:
- Interaction distance between atom pairs
- Hydrophobic (H) and hydrophilic (P) properties
- Sequence-specific information
- Orientation-dependent interactions
- Optimization techniques

We develop a potential function that is an optimized, linearly weighted accumulation of:
- The 3-Dimensional Ideal Gas Reference State based Energy Function (3DIGARS), formulated using the HP, HH and PP properties of amino acids
- Mined accessible surface area (ASA)
- Ubiquitously computed Phi (uPhi) and Psi (uPsi) energies

Optimization is performed using a Genetic Algorithm (GA). Based on the independent test dataset, the proposed energy function significantly outperformed state-of-the-art approaches.

An Eclectic Energy Function to Discriminate Native From Decoys
Avdesh Mishra, Sumaiya Iqbal, Md Tamjidul Hoque
{amishra2, siqbal1,
Department of Computer Science, University of New Orleans, New Orleans, LA, USA

Sections: Methods, Introduction, Results, Discussions, Conclusions, Acknowledgements

Figure 4: (a) The arrangement of the atoms and the vectors created from their Cartesian coordinates. (b) The dihedral angle involving the four atoms.

Figure 3: Definition of the angle formed by four atoms (At1, At2, At3 and At4). uPhi is computed using At1 belonging to one residue and a set of atoms At2, At3, At4 belonging to some other residue. Similarly, uPsi is computed using a set of atoms At1, At2, At3 belonging to one residue and an atom At4 belonging to some other residue.

Figure 2: The dark central area, composed of atoms, can be thought of as a 3D protein, and the green and red outlines around it can be thought of as the real and predicted accessible surface area, respectively. The error between the real and predicted ASA is modelled as an energy feature.

Table 1: Performance comparison of different energy functions on the optimization datasets, based on correct native count.

| Decoy set (no. of targets) | DFIRE | RWplus | dDFIRE | GOAP | 3DIGARS | 3DIGARS2.0 | 3DIGARS3.0 |
|---|---|---|---|---|---|---|---|
| Moulder (20) | 19 (-2.97) | 19 (-2.84) | 18 (-2.74) | 19 (-3.58) | 19 (-2.99) | 19 (-2.68) | 20 (-3.851) |
| Rosetta (58) | 20 (-1.82) | 20 (-1.47) | 12 (-0.83) | 45 (-3.70) | 31 (-2.023) | 49 (-2.987) | 46 (-2.683) |
| I-Tasser (56) | 49 (-4.02) | 56 (-5.77) | 48 (-5.03) | 45 (-5.36) | 53 (-4.036) | 56 (-4.296) | 56 (-5.573) |
| Weighted Average in % | | | | | | | |

Legend: Entry format is native-count (z-score). Bold indicates best scores. Underscore indicates close to best scores.

Table 2: Performance comparison of different energy functions on the independent test datasets, based on correct native count.

| Decoy set (no. of targets) | DFIRE | RWplus | dDFIRE | GOAP | 3DIGARS | 3DIGARS2.0 | 3DIGARS3.0 |
|---|---|---|---|---|---|---|---|
| 4state_reduced (7) | 6 (-3.48) | 6 (-3.51) | 7 (-4.15) | 7 (-4.38) | 6 (-3.371) | 4 (-2.642) | 7 (-3.456) |
| fisa_casp3 (5) | 4 (-4.80) | 4 (-5.17) | 4 (-4.83) | 5 (-5.27) | 5 (-4.319) | 5 (-4.682) | 4 (-4.076) |
| hg_structal (29) | 12 (-1.97) | 12 (-1.74) | 16 (-1.33) | 22 (-2.73) | 12 (-1.914) | 12 (-1.589) | 28 (-3.678) |
| ig_structal (61) | 0 (0.92) | 0 (1.11) | 26 (-1.02) | 47 (-1.62) | 0 (0.645) | 0 (0.268) | 60 (-2.526) |
| ig_structal_hires (20) | 0 (0.17) | 0 (0.32) | 16 (-2.05) | 18 (-2.35) | 0 (-0.002) | 1 (0.030) | 20 (-2.378) |
| Weighted Average in % | | | | | | | |

Legend: Entry format is native-count (z-score). Bold indicates best scores. Underscore indicates close to best scores.

We gratefully acknowledge the Louisiana Board of Regents through the Board of Regents Support Fund, LEQSF ( )-RD-A-19.
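The uPhi and uPsi energy steps listed for the 3DIGARS3.0 potential (dihedral angle from four atoms, 20 cosine bins, frequency table, energy score library, per-protein energy) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the pseudocount, and in particular the conversion from frequencies to energies (a negative log ratio against a uniform reference) are assumptions made for illustration; the poster does not state which formula is used.

```python
import math
from collections import Counter

N_BINS = 20  # cosine range [-1, 1] split into 20 bins of width 0.1


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_cosine(at1, at2, at3, at4):
    """Cosine of the dihedral angle defined by four atoms (x, y, z tuples)."""
    b1, b2, b3 = _sub(at2, at1), _sub(at3, at2), _sub(at4, at3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    norm = math.sqrt(_dot(n1, n1) * _dot(n2, n2))
    if norm == 0.0:
        raise ValueError("atoms are collinear; dihedral angle is undefined")
    return max(-1.0, min(1.0, _dot(n1, n2) / norm))


def cosine_bin(c):
    """Map a cosine in [-1, 1] to a bin index in 0..19 (1.0 goes to the last bin)."""
    return min(int((c + 1.0) * N_BINS / 2.0), N_BINS - 1)


def energy_library(training_cosines, pseudocount=1.0):
    """Frequency table -> energy scores.

    ASSUMPTION: energy = -ln(observed / uniform), with a pseudocount so
    that empty bins stay finite. The source does not give this formula.
    """
    counts = Counter(cosine_bin(c) for c in training_cosines)
    total = len(training_cosines) + pseudocount * N_BINS
    uniform = 1.0 / N_BINS
    return [-math.log(((counts[b] + pseudocount) / total) / uniform)
            for b in range(N_BINS)]


def angle_energy(cosines, library):
    """Sum the library scores over all angle cosines observed in one protein."""
    return sum(library[cosine_bin(c)] for c in cosines)


# Hypothetical toy data: one eclipsed (cis) arrangement of four atoms.
cis = dihedral_cosine((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
library = energy_library([cis, cis, cis, -1.0])
print(cosine_bin(cis), angle_energy([cis, 0.0], library))
```

In this sketch, separate libraries would be built for uPhi and uPsi by calling `energy_library` once per angle type, which mirrors the "individual frequency tables" step. Frequently observed bins receive lower (more favorable) scores than rarely observed ones.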