Emidio Capriotti, Piero Fariselli and Rita Casadio Biocomputing Unit

Presentation transcript:

A neural network-based method for predicting protein stability changes upon single point mutations Emidio Capriotti, Piero Fariselli and Rita Casadio Biocomputing Unit Department of Biology, University of Bologna, Italy www.biocomp.unibo.it

Outline
Problem Definition
The State of the Art
Data Base
Neural Network Predictor
Results
Comparison with other Methods
I-Mutant

Problem Definition (I)
If we replace Alanine 35 with a Leucine (mutation A35L: native A, mutant L), is the protein stability increased or decreased?
ΔΔG_f = ΔG_f(mut) - ΔG_f(nat), with ΔG_f = G_u - G_f
[Figure: free energy levels of the unfolded (U) and folded (F) states for the native and the mutant protein, with ΔG_f(nat) and ΔG_f(mut) marked.]

Problem Definition (II)
The sign of ΔΔG_f identifies the direction of the stability change, and the sign is more informative than the magnitude |ΔΔG_f|.
ΔΔG_f < 0 => the mutation increases the protein stability
ΔΔG_f > 0 => the mutation decreases the protein stability
Our neural networks are trained to predict the sign of the stability change.
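
The sign rule stated on this slide can be sketched in a few lines of Python. This is only an illustration: the function names, the "neutral" label for a value of exactly zero, and the numeric values are assumptions and do not come from the slides.

```python
def ddg(dg_mut: float, dg_nat: float) -> float:
    """Stability change as written on the slide: mutant value minus native value."""
    return dg_mut - dg_nat


def stability_direction(ddg_f: float) -> str:
    """Map the sign of the stability change to a direction, following the slide."""
    if ddg_f < 0:
        return "increase"   # slide: negative sign => stability increases
    if ddg_f > 0:
        return "decrease"   # slide: positive sign => stability decreases
    return "neutral"        # hypothetical label; the slide does not discuss zero


# Made-up values in kcal/mol, for illustration only.
example = ddg(dg_mut=-6.5, dg_nat=-5.0)
print(example, stability_direction(example))
```

Only the sign is used for the class label, which matches the two-class prediction task described above.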

The State of the Art
Energy-based predictive methods
1) physical effective energy potentials (classical MM force fields): E = ½ k_s,ij (r_ij - r_0)^2 + ½ k_b,ij (θ_ij - θ_0)^2 + ...
2) statistical potentials: E(i,j) = - kT log( f(i,j) )
3) empirical energies: ΔG = W_vdw ΔG_vdw + W_solv ΔG_solv + W_sc TΔS_sc + ...
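
As a rough illustration of the three kinds of energy terms listed on this slide, the sketch below evaluates one harmonic force-field term, one statistical-potential term and one weighted empirical sum. All function names, the choice of kcal/mol units, the default temperature and the numbers in the example are assumptions made for the sketch; f(i,j) is simply treated as a positive relative frequency.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol*K)


def harmonic_term(k: float, x: float, x0: float) -> float:
    """One force-field term of the form 1/2 * k * (x - x0)^2 (bond or angle)."""
    return 0.5 * k * (x - x0) ** 2


def statistical_potential(f_ij: float, temperature: float = 298.15) -> float:
    """E(i,j) = -kT log(f(i,j)); f_ij is assumed to be a positive frequency."""
    return -K_B * temperature * math.log(f_ij)


def empirical_energy(weights, terms) -> float:
    """Weighted sum of energy contributions, as in the empirical-energy formula."""
    return sum(w * t for w, t in zip(weights, terms))


# Illustrative numbers only.
print(harmonic_term(k=300.0, x=1.6, x0=1.5))
print(statistical_potential(0.5))
print(empirical_energy([0.3, 0.2], [-4.0, 1.0]))
```

A frequency below 1 gives a positive (unfavourable) statistical energy, and a frequency of exactly 1 gives zero.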

[Figure: predictions marked as OK, over-predictions (+) and under-predictions (-).]

The Data Base
http://www.rtc.riken.go.jp/jouhou/Protherm/
ProTherm is a collection of numerical data on thermodynamic parameters, including Gibbs free energy change, enthalpy change, heat capacity change, transition temperature, etc., for wild-type and mutant proteins.
Total number of entries: 15379
Number of unique proteins: 471
Total number of all proteins: 668
Number of proteins with mutants: 195
Number of single mutations: 7586
Number of double mutations: 1192
Number of multiple mutations: 563
Number of wild type: 6038
Gromiha et al. (2000). Nucleic Acids Res. 28, 283-285

Training/testing Data set (I)
The data set of proteins was extracted from ProTherm with the following constraints:
i) the ΔΔG value was experimentally determined and reported in the data base;
ii) the protein structure is known at atomic resolution and deposited in the PDB (Berman et al., 2000);
iii) the data refer to single mutations (no multiple mutations have been taken into account).

Training/testing Data set (II)
After this filtering procedure, we ended up with 2 data sets:
S1615: 1615 different single mutations
S388: 388 mutations, a subset of S1615 containing only experiments performed at physiological conditions (T 20-40 °C, pH 6-8)
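
The physiological-conditions filter that defines S388 can be sketched as a simple predicate over measurement records. The record layout, the field names and the example entries below are hypothetical, and treating the temperature and pH bounds as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    # Hypothetical record layout; ProTherm's real fields are not shown on the slides.
    mutation: str
    ddg: float            # kcal/mol
    temperature_c: float  # degrees Celsius
    ph: float


def is_physiological(m: Measurement) -> bool:
    """Keep experiments at T 20-40 °C and pH 6-8 (bounds assumed inclusive)."""
    return 20.0 <= m.temperature_c <= 40.0 and 6.0 <= m.ph <= 8.0


# Made-up records, for illustration only.
records = [
    Measurement("A35L", -0.8, 25.0, 7.0),
    Measurement("E12A", 1.2, 55.0, 7.0),   # too hot
    Measurement("G7V", 0.4, 25.0, 3.0),    # too acidic
]
subset = [m for m in records if is_physiological(m)]
print([m.mutation for m in subset])
```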

Neural Network Predictor (I)
N1: a 20-element vector that describes the amino acid mutation, plus pH and T (22 inputs)
N2: adds to the N1 input one more neuron for the relative solvent accessibility of the mutated residue (23 inputs)
N3: adds to N2 20 more input neurons (43 in total) encoding the three-dimensional residue environment

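Neural Network Predictor (II)
[Figure: input encoding for the example mutation E->A.
Network N1: mutation vector over the 20 residue types (A C D E F G H I K L M N P Q R S T V W Y), with 1 at the new residue (A) and -1 at the native residue (E), plus T and pH.
Network N2: adds the relative solvent accessibility of the mutated residue.
Network N3: adds the environment, a vector over the 20 residue types holding the counts (e.g. 2, 1, 3) of the residues found within a given radius of the mutated residue.]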
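
A minimal sketch of how the N1, N2 and N3 input vectors could be assembled is shown below. It assumes that the mutation vector holds -1 at the native residue, 1 at the new residue and 0 elsewhere, that the environment vector holds raw residue counts, and that no input scaling is applied. The function names, the ordering of pH and T, and the example values are hypothetical.

```python
RESIDUES = "ACDEFGHIKLMNPQRSTVWY"  # the 20 residue types, in the order shown on the slide


def encode_mutation(native: str, new: str) -> list:
    """20-element vector: -1 at the native residue, 1 at the new one (assumed)."""
    vec = [0.0] * len(RESIDUES)
    vec[RESIDUES.index(native)] = -1.0
    vec[RESIDUES.index(new)] = 1.0
    return vec


def encode_environment(neighbours: str) -> list:
    """20-element vector counting residue types found within the chosen radius."""
    vec = [0.0] * len(RESIDUES)
    for res in neighbours:
        vec[RESIDUES.index(res)] += 1.0
    return vec


def build_inputs(native, new, ph, temperature, rel_accessibility, neighbours):
    """Return the (N1, N2, N3) input vectors for one mutation."""
    n1 = encode_mutation(native, new) + [ph, temperature]   # 20 + 2 = 22 inputs
    n2 = n1 + [rel_accessibility]                           # 23 inputs
    n3 = n2 + encode_environment(neighbours)                # 43 inputs
    return n1, n2, n3


# Made-up example for the mutation E->A.
n1, n2, n3 = build_inputs("E", "A", ph=7.0, temperature=25.0,
                          rel_accessibility=0.35, neighbours="AALGEI")
print(len(n1), len(n2), len(n3))
```

The three lengths reproduce the input sizes implied by the network descriptions, ending at the 43 inputs stated for N3.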

Cross-validation performance of the different neural networks on S1615
(+) and (-): the index is evaluated for positive and negative signs of the protein stability change, respectively.
Method  Q2    P(+)  Q(+)  P(-)  Q(-)  C
N1      0.74  0.59  0.23  0.76  0.94  0.24
N2      0.75  0.57  0.45  0.80  0.87  0.34
N3      0.81  0.71  0.52  0.83  0.91  0.49

Cross-validation performance of N3 as a function of the protein environment (different radii) centred on the mutated residue
Method   Radius (Å)  Q2    P(+)  Q(+)  P(-)  Q(-)  C
N3-4.5   4.5         0.79  0.63  0.55  0.83  0.88  0.45
N3-6.0   6.0         0.79  0.63  0.57  0.84  0.87  0.46
N3-9.0   9.0         0.81  0.71  0.52  0.83  0.91  0.49
N3-12.0  12.0        0.79  0.63  0.59  0.84  0.87  0.47

Q2 accuracy of the neural network (N3-9.0) as a function of the reliability index (Rel)

Q2 accuracy of the neural network (N3-9.0) as a function of the absolute value of the protein stability change upon mutation (|Stability Change|, kcal/mol)

Comparison of the neural network with other methods on S388
Method       Q2    P(+)  Q(+)  P(-)  Q(-)  C
FOLDX(1)     0.75  0.26  0.56  0.93  0.78  0.25
DFIRE(2)     0.68  0.18  0.44  0.90  0.71  0.11
PoPMuSiC(3)  0.85  0.33  0.25  0.90  0.93  0.20
N3-9.0       0.87  0.44  0.21  0.90  0.96  0.25
(1) http://fold-x.embl-heidelberg.de
(2) http://phyyz4.med.buffalo.edu/hzhou/dmutation.html
(3) http://babylone.ulb.ac.be/popmusic/

Accuracy of joint methods on subsets of S388
Method                Agreement  Q2    P(+)  Q(+)  P(-)  Q(-)  C
N3-9.0 + FOLDX(1)     72%        0.93  0.88  0.28  0.93  0.99  0.47
N3-9.0 + DFIRE(2)     69%        0.90  0.36  0.16  0.92  0.97  0.19
N3-9.0 + PoPMuSiC(3)  86%        0.91  0.67  0.07  0.92  0.99  0.19
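
One plausible reading of this table is that "Agreement" is the fraction of mutations on which the two methods predict the same sign, and that the indexes are then computed only on that subset. The slides do not state this explicitly, so the sketch below is an assumption, and the function name and toy data are hypothetical.

```python
def joint_accuracy(pred_a, pred_b, truth):
    """Return (agreement fraction, Q2 on the subset where both methods agree)."""
    agree = [i for i in range(len(truth)) if pred_a[i] == pred_b[i]]
    agreement = len(agree) / len(truth)
    correct = sum(1 for i in agree if pred_a[i] == truth[i])
    q2 = correct / len(agree) if agree else float("nan")
    return agreement, q2


# Toy sign predictions (+1 / -1), for illustration only.
pred_a = [+1, -1, -1, +1, -1]
pred_b = [+1, -1, +1, +1, -1]
truth = [+1, -1, -1, -1, -1]
agreement, q2 = joint_accuracy(pred_a, pred_b, truth)
print(agreement, q2)
```

Restricting the evaluation to the agreeing subset trades coverage for accuracy: fewer mutations are scored, but the joint prediction is more often right.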

I-Mutant

I-Mutant Web Server http://gpcr.biocomp.unibo.it/cgi/predictors/I-Mutant/I-Mutant.cgi/

Thank you for your attention, that's all! Stability test. Emidio Capriotti, Piero Fariselli, Rita Casadio

Measures of Accuracy
The efficiency of the predictor is scored using the following statistical indexes: overall accuracy (Q2), coverage (Q), probability of correct prediction (P) and correlation coefficient (C), where N is the total number of predictions, p the number of correct predictions, and u and o the numbers of under- and over-predictions.
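
The formulas themselves did not survive in the transcript. The sketch below uses one common set of definitions that is consistent with the symbols named on this slide: Q2 = p/N, Q(s) = p(s)/(p(s)+u(s)), P(s) = p(s)/(p(s)+o(s)), and a Matthews-style correlation C(s). Treat these definitions, the function names and the toy data as assumptions rather than as the authors' exact formulas.

```python
import math


def overall_accuracy(truth, pred):
    """Q2: fraction of correct predictions over all N predictions."""
    return sum(1 for t, y in zip(truth, pred) if t == y) / len(truth)


def class_indexes(truth, pred, s):
    """Return (Q(s), P(s), C(s)) for the class label s."""
    pairs = list(zip(truth, pred))
    p = sum(1 for t, y in pairs if t == s and y == s)   # correct predictions of s
    n = sum(1 for t, y in pairs if t != s and y != s)   # correct rejections of s
    u = sum(1 for t, y in pairs if t == s and y != s)   # under-predictions of s
    o = sum(1 for t, y in pairs if t != s and y == s)   # over-predictions of s
    coverage = p / (p + u) if p + u else 0.0
    prob_correct = p / (p + o) if p + o else 0.0
    denom = math.sqrt((p + u) * (p + o) * (n + u) * (n + o))
    corr = (p * n - u * o) / denom if denom else 0.0
    return coverage, prob_correct, corr


# Toy sign labels (+1 / -1), for illustration only.
truth = [1, 1, -1, -1, -1, -1]
pred = [1, -1, -1, -1, -1, 1]
print(overall_accuracy(truth, pred), class_indexes(truth, pred, 1))
```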

Q2 accuracy of the neural network (N3-9.0) as a function of the relative accessibility value of the mutated residue

Q2 accuracy as a function of the residue mutation type
native \ new  Charged     Polar        Apolar
Charged       0.62 (4%)   0.77 (8%)    0.72 (9%)
Polar         0.69 (6%)   0.82 (10%)   0.77 (17%)
Apolar        0.75 (3%)   0.92 (12%)   0.87 (31%)