A neural network-based method for predicting protein stability changes upon single point mutations Emidio Capriotti, Piero Fariselli and Rita Casadio Biocomputing Unit Department of Biology, University of Bologna, Italy www.biocomp.unibo.it
Problem Definition The State of the Art Data Base Neural Network Predictor Results Comparison with other Methods I-Mutant
Problem Definition (I)
If we replace Alanine 35 with a Leucine (A35L), is the protein stability increased or decreased?
[Figure: native structure (A) and mutant structure (L)]
$\Delta\Delta G_f = \Delta G_f^{mut} - \Delta G_f^{nat}$, where $\Delta G_f = G_u - G_f$
[Figure: free-energy diagrams of the unfolded (U) and folded (F) states for the native and the mutant protein, showing $\Delta G_f^{nat}$ and $\Delta G_f^{mut}$]
Problem Definition (II)
The sign of $\Delta\Delta G_f$ identifies the direction of the stability change.
The sign is more informative than the magnitude $|\Delta\Delta G_f|$.
$\Delta\Delta G_f < 0$ => the mutation increases the protein stability
$\Delta\Delta G_f > 0$ => the mutation decreases the protein stability
Our neural networks are trained to predict the sign of the stability change.
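The classification target described on this slide can be sketched in a few lines. This is only an illustration that follows the sign convention stated on the slide; the function names and the handling of an exactly zero change are assumptions, not part of the presented method.

```python
def ddg_f(dg_f_mut: float, dg_f_nat: float) -> float:
    """Stability change: DDGf = DGf(mutant) - DGf(native)."""
    return dg_f_mut - dg_f_nat


def stability_change(ddg: float) -> str:
    """Map the sign of DDGf to a class label, using the slide's convention.

    DDGf < 0 -> stability increased, DDGf > 0 -> stability decreased.
    Returning "unchanged" for exactly zero is an assumption of this sketch.
    """
    if ddg < 0:
        return "increased"
    if ddg > 0:
        return "decreased"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical values in kcal/mol, for illustration only.
    print(stability_change(ddg_f(-6.0, -5.0)))
```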
The State of the Art
Energy-based predictive methods:
1) physical effective energy potentials (classical MM force fields): $E = \frac{1}{2} k_{s,ij} (r_{ij} - r_o)^2 + \frac{1}{2} k_{b,ij} (\theta_{ij} - \theta_o)^2 + \dots$
2) statistical potentials: $E(i,j) = -kT \log f(i,j)$
3) empirical energies: $\Delta G = W_{vdw} \Delta G_{vdw} + W_{solv} \Delta G_{solv} + W_{sc} T \Delta S_{sc} + \dots$
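As a small worked example of the statistical-potential form $E(i,j) = -kT \log f(i,j)$, the sketch below converts a frequency into an energy. The slide does not say how $f(i,j)$ is normalised; here it is assumed to be an observed-to-expected ratio, and the temperature default and function name are likewise assumptions.

```python
import math

# Boltzmann constant in kcal/(mol*K).
K_B = 0.0019872041


def statistical_energy(f_ij: float, temperature: float = 298.15) -> float:
    """Return E(i,j) = -kT * log(f(i,j)) in kcal/mol.

    f_ij is assumed to be a positive frequency ratio: values above 1 give
    a favourable (negative) energy, values below 1 an unfavourable one.
    """
    if f_ij <= 0:
        raise ValueError("f_ij must be positive")
    return -K_B * temperature * math.log(f_ij)


if __name__ == "__main__":
    for f in (0.5, 1.0, 2.0):
        print(f, round(statistical_energy(f), 3))
```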
[Figure: + and - classes, with correct (OK) predictions and over-/under-predictions]
The Data Base
http://www.rtc.riken.go.jp/jouhou/Protherm/
ProTherm is a collection of numerical thermodynamic data, including Gibbs free energy change, enthalpy change, heat capacity change and transition temperature, for wild-type and mutant proteins.
Total number of entries: 15379
Number of unique proteins: 471
Total number of all proteins: 668
Number of proteins with mutants: 195
Number of single mutations: 7586
Number of double mutations: 1192
Number of multiple mutations: 563
Number of wild type: 6038
Gromiha et al. (2000). Nucleic Acids Res. 28, 283-285
Training/testing Data set (I)
The data set of proteins was extracted from ProTherm with the following constraints:
i) the $\Delta\Delta G$ value was experimentally determined and reported in the data base;
ii) the protein structure is known at atomic resolution and deposited in the PDB (Berman et al., 2000);
iii) the data refer to single mutations (multiple mutations were not taken into account).
Training/testing Data set (II)
After this filtering procedure, we ended up with two data sets:
S1615: 1615 different single mutations
S388: 388 mutations, containing only experiments performed at physiological conditions (T 20-40 °C, pH 6-8)
[Figure: the S388 and S1615 data sets]
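The filtering described on these two slides could look roughly like the sketch below. The `Entry` record and its field names are hypothetical and do not reflect the actual ProTherm format; multiple mutations are assumed to be written as a comma-separated string, and the physiological set is assumed to be selected from the entries that pass the three constraints.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    """Hypothetical record for one thermodynamic measurement."""
    protein: str
    mutation: str            # e.g. "A35L"; "A35L,G40A" for a double mutant
    ddg: Optional[float]     # kcal/mol, None if not reported
    pdb_id: Optional[str]    # None if no structure is deposited
    temperature_c: float
    ph: float


def passes_base_filter(e: Entry) -> bool:
    """Constraints i)-iii): DDG reported, structure known, single mutation."""
    return e.ddg is not None and bool(e.pdb_id) and "," not in e.mutation


def is_physiological(e: Entry) -> bool:
    """Physiological conditions: T 20-40 degrees C and pH 6-8."""
    return 20.0 <= e.temperature_c <= 40.0 and 6.0 <= e.ph <= 8.0


def build_datasets(entries: List[Entry]) -> Tuple[List[Entry], List[Entry]]:
    """Return (all filtered single mutations, physiological subset)."""
    filtered = [e for e in entries if passes_base_filter(e)]
    physiological = [e for e in filtered if is_physiological(e)]
    return filtered, physiological
```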
Neural Network Predictor (I)
N1: a 20-element vector that describes the amino acid mutation, plus two inputs for pH and T (22 inputs in total)
N2: adds to the N1 input one more neuron for the relative solvent accessibility of the mutated residue (23 inputs in total)
N3: adds to N2 20 more input neurons (43 in total) encoding the three-dimensional residue environment
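Neural Network Predictor (II)
[Figure: input encoding for the mutation E->A. Network N1: a 20-element mutation vector over the residue types A C D E F G H I K L M N P Q R S T V W Y, with entries 1 and -1 marking the residues involved in the mutation, plus T and pH. Network N2: adds the relative solvent accessibility. Network N3: adds a 20-element environment vector holding counts of the residue types (e.g. 2, 1, 3) found within a given radius of the mutated residue.]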
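The three input layouts can be sketched as follows. This is a minimal illustration, assuming that N1 has 22 inputs (20 for the mutation plus pH and T, which is consistent with the stated total of 43 for N3), that the native residue is marked with -1 and the new residue with 1, and that pH, T and accessibility are passed without rescaling. None of these details is confirmed by the slides, and all function names are hypothetical.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_mutation(native: str, new: str) -> List[float]:
    """20-element vector: -1 at the native residue, 1 at the new one (assumed)."""
    if native == new:
        raise ValueError("native and new residue must differ")
    vec = [0.0] * len(AMINO_ACIDS)
    vec[AMINO_ACIDS.index(native)] = -1.0
    vec[AMINO_ACIDS.index(new)] = 1.0
    return vec


def encode_environment(neighbours: Sequence[str]) -> List[float]:
    """20-element vector counting each residue type among the neighbours."""
    return [float(sum(1 for n in neighbours if n == aa)) for aa in AMINO_ACIDS]


def build_n1(native: str, new: str, ph: float, temp: float) -> List[float]:
    """N1 input: mutation vector + pH + T (22 values)."""
    return encode_mutation(native, new) + [ph, temp]


def build_n2(native: str, new: str, ph: float, temp: float,
             rel_accessibility: float) -> List[float]:
    """N2 input: N1 + relative solvent accessibility (23 values)."""
    return build_n1(native, new, ph, temp) + [rel_accessibility]


def build_n3(native: str, new: str, ph: float, temp: float,
             rel_accessibility: float, neighbours: Sequence[str]) -> List[float]:
    """N3 input: N2 + residue environment counts (43 values)."""
    return (build_n2(native, new, ph, temp, rel_accessibility)
            + encode_environment(neighbours))
```

For the E->A example of the figure, `build_n3("E", "A", 7.0, 25.0, 0.3, ["L", "L", "G"])` returns a 43-element list.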
Cross-validation performance of the different neural networks on S1615
Method   Q2     P(+)   Q(+)   P(-)   Q(-)   C
N1       0.74   0.59   0.23   0.76   0.94   0.24
N2       0.75   0.57   0.45   0.80   0.87   0.34
N3       0.81   0.71   0.52   0.83   0.91   0.49
+ and -: the index is evaluated for positive and negative signs of the protein stability change, respectively.
Cross-validation performance of N3 as a function of the protein environment radius centred on the mutated residue
Method    Radius   Q2     P(+)   Q(+)   P(-)   Q(-)   C
N3-4.5    4.5      0.79   0.63   0.55   0.83   0.88   0.45
N3-6.0    6.0      0.79   0.63   0.57   0.84   0.87   0.46
N3-9.0    9.0      0.81   0.71   0.52   0.83   0.91   0.49
N3-12.0   12.0     0.79   0.63   0.59   0.84   0.87   0.47
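A radius-dependent environment can be computed along the lines of the sketch below. It assumes one representative coordinate per residue and a simple Euclidean distance cutoff; the slides do not specify which atoms or distance criterion were actually used, so this only illustrates the idea of varying the radius.

```python
import math
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# (one-letter residue type, (x, y, z) coordinate); one point per residue
# is an assumption of this sketch.
Residue = Tuple[str, Tuple[float, float, float]]


def environment_counts(residues: List[Residue], center_index: int,
                       radius: float) -> List[int]:
    """Count residue types within `radius` of the residue at `center_index`.

    The central (mutated) residue itself is not counted.
    """
    center = residues[center_index][1]
    counts = [0] * len(AMINO_ACIDS)
    for i, (aa, xyz) in enumerate(residues):
        if i == center_index:
            continue
        if math.dist(center, xyz) <= radius:
            counts[AMINO_ACIDS.index(aa)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    toy = [("A", (0.0, 0.0, 0.0)), ("L", (3.0, 0.0, 0.0)),
           ("G", (8.0, 0.0, 0.0)), ("L", (0.0, 4.0, 0.0)),
           ("E", (20.0, 0.0, 0.0))]
    for r in (4.5, 6.0, 9.0, 12.0):
        print(r, sum(environment_counts(toy, 0, r)))
```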
[Figure: Q2 accuracy of the neural network (N3-9.0) as a function of the reliability index (Rel)]
[Figure: Q2 accuracy of the neural network (N3-9.0) as a function of the absolute value of the protein stability change upon mutation, |Stability Change| (kcal/mol)]
Comparison of the neural network with other methods on S388
Method        Q2     P(+)   Q(+)   P(-)   Q(-)   C
FOLDX(1)      0.75   0.26   0.56   0.93   0.78   0.25
DFIRE(2)      0.68   0.18   0.44   0.90   0.71   0.11
PoPMuSiC(3)   0.85   0.33   0.25   0.90   0.93   0.20
N3-9.0        0.87   0.44   0.21   0.90   0.96   0.25
(1) http://fold-x.embl-heidelberg.de
(2) http://phyyz4.med.buffalo.edu/hzhou/dmutation.html
(3) http://babylone.ulb.ac.be/popmusic/
Accuracy of joint methods on subsets of S388
Method                   Agreement   Q2     P(+)   Q(+)   P(-)   Q(-)   C
N3-9.0 + FOLDX(1)        72%         0.93   0.88   0.28   0.93   0.99   0.47
N3-9.0 + DFIRE(2)        69%         0.90   0.36   0.16   0.92   0.97   0.19
N3-9.0 + PoPMuSiC(3)     86%         0.91   0.67   0.07   0.92   0.99   0.19
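One plausible reading of the joint-method table is that accuracy is evaluated only on the mutations for which the two predictors give the same sign, with "Agreement" being the fraction of the data set covered by that subset. The sketch below implements this reading; it is an assumption about the procedure, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def joint_prediction(pred_a: Sequence[str], pred_b: Sequence[str],
                     observed: Sequence[str]) -> Tuple[float, float]:
    """Return (agreement, Q2 on the agreed subset) for two sign predictors.

    agreement: fraction of mutations where both methods predict the same sign.
    Q2: fraction of correct predictions among those agreed cases
        (NaN if the two methods never agree).
    """
    if not (len(pred_a) == len(pred_b) == len(observed)) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    agreed = [(a, t) for a, b, t in zip(pred_a, pred_b, observed) if a == b]
    agreement = len(agreed) / len(observed)
    if not agreed:
        return agreement, float("nan")
    q2 = sum(1 for a, t in agreed if a == t) / len(agreed)
    return agreement, q2
```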
I-Mutant
I-Mutant Web Server http://gpcr.biocomp.unibo.it/cgi/predictors/I-Mutant/I-Mutant.cgi/
Thank you for your attention. That's all! Stability test. Emidio Capriotti, Piero Fariselli and Rita Casadio
Measures of Accuracy
The efficiency of the predictor is scored using the following statistical indexes: overall accuracy (Q2), coverage (Q), probability of correct prediction (P) and correlation coefficient (C).
Here N is the total number of predictions, p the number of correct predictions, and u and o the numbers of under- and over-predictions.
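The formulas for these indexes are not reproduced in the text of this slide. The sketch below assumes the usual two-class definitions for a class s: Q2 = (all correct predictions) / N, Q(s) = p(s) / (p(s) + u(s)), P(s) = p(s) / (p(s) + o(s)), and a Matthews-style correlation C = (p(s) n(s) - u(s) o(s)) / sqrt((p+u)(p+o)(n+u)(n+o)), where n(s) counts correct predictions of the other class. Treat these as assumed definitions rather than a transcription of the slide.

```python
import math
from typing import Dict, Sequence


def accuracy_measures(predicted: Sequence[str], observed: Sequence[str],
                      cls: str = "+") -> Dict[str, float]:
    """Compute Q2, Q(cls), P(cls) and C for a two-class sign prediction."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    p = n = u = o = 0
    for pred, obs in zip(predicted, observed):
        if pred == cls and obs == cls:
            p += 1          # correct prediction of class cls
        elif pred != cls and obs != cls:
            n += 1          # correct prediction of the other class
        elif obs == cls:
            u += 1          # under-prediction: cls observed, not predicted
        else:
            o += 1          # over-prediction: cls predicted, not observed
    denom = math.sqrt((p + u) * (p + o) * (n + u) * (n + o))
    return {
        "Q2": (p + n) / len(observed),
        # Indexes with an empty denominator are reported as 0.0 here.
        "Q": p / (p + u) if p + u else 0.0,
        "P": p / (p + o) if p + o else 0.0,
        "C": (p * n - u * o) / denom if denom else 0.0,
    }
```

Calling the function once with `cls="+"` and once with `cls="-"` gives the P(+), Q(+), P(-) and Q(-) columns used in the result tables.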
[Figure: Q2 accuracy of the neural network (N3-9.0) as a function of the relative accessibility value of the mutated residue]
Q2 accuracy as a function of the residue mutation type
native \ new   Charged      Polar        Apolar
Charged        0.62 (4%)    0.77 (8%)    0.72 (9%)
Polar          0.69 (6%)    0.82 (10%)   0.77 (17%)
Apolar         0.75 (3%)    0.92 (12%)   0.87 (31%)