1 A neural network-based method for predicting protein stability changes upon single point mutations
Emidio Capriotti, Piero Fariselli and Rita Casadio Biocomputing Unit Department of Biology, University of Bologna, Italy

2 Problem Definition; The State of the Art; Data Base; Neural Network Predictor; Results; Comparison with other Methods; I-Mutant

3 Problem Definition (I)
If we replace Alanine 35 with a Leucine (mutation A35L), does the protein stability increase or decrease?
[Figure: native structure with A and mutant structure with L at the mutated position]

4 Problem Definition (I)
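If we replace Alanine 35 with a Leucine, does the protein stability increase or decrease?
$\Delta\Delta G_f = \Delta G_f^{mut} - \Delta G_f^{nat}$
$\Delta G_f = G_u - G_f$
[Figure: free-energy diagram of the unfolded (U) and folded (F) states for the native and the mutant protein, with $\Delta G_f^{nat}$ and $\Delta G_f^{mut}$ marked]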

5 Problem Definition (II)
The sign of $\Delta\Delta G_f$ identifies the direction of the stability change.
The sign is more informative than the magnitude $|\Delta\Delta G|$.
$\Delta\Delta G_f < 0$ => the mutation increases the protein stability
$\Delta\Delta G_f > 0$ => the mutation decreases the protein stability
Our neural networks are trained to predict the sign of the stability change.
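As a small worked illustration of the definition and sign convention stated on these slides, the sketch below computes the stability change from two free-energy values and classifies its sign. The function names and the numerical values are hypothetical and chosen only for illustration.

```python
def stability_change(dg_nat: float, dg_mut: float) -> float:
    """Return ddG = dG_mut - dG_nat (same units as the inputs, e.g. kcal/mol)."""
    return dg_mut - dg_nat


def stability_sign(ddg: float) -> str:
    """Classify a mutation using the convention stated on the slide:
    ddG < 0 increases stability, ddG > 0 decreases it."""
    if ddg < 0:
        return "increases stability"
    if ddg > 0:
        return "decreases stability"
    return "no change"


# Hypothetical free-energy values, for illustration only.
ddg = stability_change(dg_nat=-5.0, dg_mut=-6.5)
print(ddg, stability_sign(ddg))
```

Only the sign of the result is used as the prediction target, so the magnitude is discarded once the class is assigned.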

6 Problem Definition; The State of the Art; Data Base; Neural Network Predictor; Results; Comparison with other Methods; I-Mutant

7 The State of the Art
Energy-based predictive methods:
1) physical effective energy potentials (classical MM force fields)
$E = \frac{1}{2} k_{s,ij} (r_{ij} - r_0)^2 + \frac{1}{2} k_{b,ij} (\theta_{ij} - \theta_0)^2 + \dots$
2) statistical potentials
$E(i,j) = -kT \log f(i,j)$
3) empirical energies
$\Delta G = W_{vdw} \Delta G_{vdw} + W_{solv} \Delta G_{solv} + W_{sc} T \Delta S_{sc} + \dots$
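The statistical-potential formula can be illustrated with a minimal sketch. It assumes that f(i,j) is a relative frequency (for example an observed/expected ratio for a residue pair), that the energy is wanted in kcal/mol, and that the temperature is 298.15 K; none of these choices is stated on the slide, and the function name is hypothetical.

```python
import math

# Gas constant in kcal/(mol*K): the molar form of Boltzmann's constant (assumed units).
K_B = 0.0019872


def statistical_potential(freq: float, temperature: float = 298.15) -> float:
    """Return E = -kT * log(f) for a relative frequency f > 0."""
    if freq <= 0:
        raise ValueError("frequency must be positive")
    return -K_B * temperature * math.log(freq)


# Pairs seen more often than expected (f > 1) get a favourable (negative) energy,
# pairs seen less often than expected (f < 1) get an unfavourable (positive) one.
for f in (0.5, 1.0, 2.0):
    print(f, statistical_potential(f))
```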

8 [Figure: sign predictions (+ / -) classified as correct (OK) or as over-/under-predictions]

9 Problem Definition; The State of the Art; Data Base; Neural Network Predictor; Results; Comparison with other Methods; I-Mutant

10 The Data Base
ProTherm is a collection of numerical data on thermodynamic parameters, including Gibbs free energy change, enthalpy change, heat capacity change and transition temperature, for wild-type and mutant proteins.
[Table: ProTherm statistics, in the order listed on the slide: total number of entries; number of unique proteins; total number of all proteins; number of proteins with mutants 195; number of single mutations; number of double mutations; number of multiple mutations; number of wild type. The value 195 is the only one preserved in this transcript.]
Gromiha et al. (2000). Nucleic Acids Res. 28,

11 Training/testing Data set (I)
The data set of proteins was extracted from ProTherm with the following constraints:
i) the DDG value was experimentally determined and reported in the database;
ii) the protein structure is known at atomic resolution and deposited in the PDB (Berman et al., 2000);
iii) the data refer to single mutations (no multiple mutations were taken into account).

12 Training/testing Data set (II)
After this filtering procedure, we ended up with 2 data sets:
S1615: 1615 different single mutations
S388: 388 mutations from S1615, containing only experiments performed at physiological conditions (T °C, pH 6-8)
[Figure: the sets S388 and S1615]
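The selection steps described on these two slides can be sketched as a pair of filters. The record layout, field names and function names below are hypothetical (the actual ProTherm format is not shown in the slides), and the temperature window is left as a caller-supplied parameter because its bounds are not preserved in this transcript.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    """Hypothetical, simplified view of one thermodynamic database record."""
    protein: str
    mutations: Tuple[str, ...]   # e.g. ("A35L",)
    ddg: Optional[float]         # experimental value, None if not reported
    pdb_id: Optional[str]        # None if no deposited structure
    ph: float
    temperature: float           # in degrees Celsius


def select_single_mutations(entries: List[Entry]) -> List[Entry]:
    """Apply constraints i)-iii): reported DDG, known structure, single mutation."""
    return [
        e for e in entries
        if e.ddg is not None and e.pdb_id is not None and len(e.mutations) == 1
    ]


def physiological_subset(
    entries: List[Entry],
    ph_range: Tuple[float, float] = (6.0, 8.0),
    temp_range: Optional[Tuple[float, float]] = None,
) -> List[Entry]:
    """Keep entries measured at pH 6-8; the temperature window is optional
    and must be supplied by the caller."""
    kept = [e for e in entries if ph_range[0] <= e.ph <= ph_range[1]]
    if temp_range is not None:
        kept = [e for e in kept if temp_range[0] <= e.temperature <= temp_range[1]]
    return kept
```

Applying `select_single_mutations` first and `physiological_subset` second mirrors the order on the slides: the larger set is built from the structural and experimental constraints, and the smaller one is a restriction of it by experimental conditions.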

13 Problem Definition; The State of the Art; Data Base; Neural Network Predictor; Results; Comparison with other Methods; I-Mutant

14 Neural Network Predictor (I)
N1: a 20-element vector that describes the amino acid mutation, plus pH and T (22 inputs)
N2: adds to the N1 input one more neuron for the relative solvent accessibility of the mutated residue (23 inputs)
N3: adds to N2 20 more input neurons (43 in total) encoding the three-dimensional residue environment

15 Neural Network Predictor (II)
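[Figure: input encoding for the example mutation E->A. Network N1: a 20-element mutation vector over the residue alphabet A C D E F G H I K L M N P Q R S T V W Y, with the values 1 and -1 marking the residues involved in the mutation, plus T and pH. Network N2: adds the Relative Solvent Accessibility. Network N3: adds a 20-element environment vector over the same alphabet, describing the residues found within a given radius of the mutated residue.]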
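The three input layouts can be sketched as follows. This is an illustrative encoding under stated assumptions, not the authors' code: it assumes that the native residue is marked with -1 and the new residue with 1, that pH, temperature and relative accessibility are passed as raw numbers, and that the environment is encoded as counts of each residue type among the neighbours within the chosen radius. All function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # residue alphabet as shown on the slide


def encode_n1(native: str, new: str, ph: float, temperature: float) -> list:
    """22 inputs: 20-element mutation vector plus pH and temperature."""
    vec = [0.0] * len(AMINO_ACIDS)
    vec[AMINO_ACIDS.index(native)] = -1.0  # assumption: native residue marked -1
    vec[AMINO_ACIDS.index(new)] = 1.0      # assumption: new residue marked 1
    return vec + [ph, temperature]


def encode_n2(native, new, ph, temperature, rsa: float) -> list:
    """23 inputs: N1 plus the relative solvent accessibility."""
    return encode_n1(native, new, ph, temperature) + [rsa]


def encode_n3(native, new, ph, temperature, rsa, neighbours: str) -> list:
    """43 inputs: N2 plus counts of each residue type in the environment."""
    env = [float(neighbours.count(aa)) for aa in AMINO_ACIDS]
    return encode_n2(native, new, ph, temperature, rsa) + env


# Mutation E->A with hypothetical conditions and a hypothetical neighbour list.
x = encode_n3("E", "A", ph=7.0, temperature=25.0, rsa=0.3, neighbours="ALGEIAL")
print(len(x))
```

The input sizes (22, 23 and 43) follow from the slide: 20 mutation inputs plus pH and T, one accessibility input, and 20 environment inputs.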

16 Problem Definition; The State of the Art; Data Base; Neural Network Predictor; Results; Comparison with other Methods; I-Mutant

17 Cross-validation performance of the different neural networks on S1615
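+ and -: the index is evaluated for positive and negative signs of the protein stability change, respectively.
[Table: columns Method, Q, P(+), Q(+), P(-), Q(-), C; one row per neural network (three rows). The numerical values are not preserved in this transcript.]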

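18 Cross-validation performance of N3 as a function of different protein environments (different radii) centred on the mutated residue
[Table: columns Method, Radius, Q2, P(+), Q(+), P(-), Q(-), C; four rows, one per radius. The numerical values are not preserved in this transcript.]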

19 Q2 accuracy of neural network (N3-9.0) as a function of the reliability index (Rel)

20 Q2 accuracy of neural network (N3-9.0) as a function of the absolute value of protein stability changes upon mutation (|Stability Change|, kcal/mol)

21 Problem Definition; The State of the Art; Data Base; Neural Network Predictor; Results; Comparison with other Methods; I-Mutant

22 Comparison of neural network with other methods on S388
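[Table: columns Method, Q, P(+), Q(+), P(-), Q(-), C; rows FOLDX (1), DFIRE (2), PoPMuSiC (3) and the neural network (N). The numerical values and the footnotes (1), (2), (3) are not preserved in this transcript.]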

23 Accuracy of joint-methods on subsets of S388
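[Table: columns Method, Agreement (%), Q, P(+), Q(+), P(-), Q(-), C; rows for the neural network (N) combined with FOLDX (1), with DFIRE (2) and with PoPMuSiC (3). The numerical values are not preserved in this transcript.]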
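One plausible reading of the joint-method evaluation is: take the subset of mutations on which two predictors give the same sign, report the size of that subset as the agreement, and score accuracy on it. The sketch below follows that reading; it is an assumption about the procedure, and the function name and toy labels are hypothetical.

```python
from typing import List, Optional, Tuple


def joint_accuracy(
    pred_a: List[str], pred_b: List[str], truth: List[str]
) -> Tuple[float, Optional[float]]:
    """Return (agreement fraction, accuracy on the subset where both agree).

    Accuracy is None when the two predictors never agree.
    """
    agree = [i for i, (a, b) in enumerate(zip(pred_a, pred_b)) if a == b]
    agreement = len(agree) / len(truth)
    if not agree:
        return agreement, None
    correct = sum(1 for i in agree if pred_a[i] == truth[i])
    return agreement, correct / len(agree)


# Toy sign labels, for illustration only.
network = ["+", "-", "-", "+"]
other = ["+", "-", "+", "+"]
observed = ["+", "-", "-", "-"]
print(joint_accuracy(network, other, observed))
```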

24 Problem Definition; The State of the Art; Data Base; Neural Network Predictor; Results; Comparison with other Methods; I-Mutant

25 I-Mutant

26 I-Mutant Web Server

27 thank you for your attention
that’s all ! Stability test Emidio Capriotti, Piero Fariselli Rita Casadio

28 Measures of Accuracy
The efficiency of the predictor is scored using the following statistical indexes:
Overall accuracy
Coverage
Probability of correct prediction
Correlation coefficient
where N is the total number of predictions, p is the number of correct predictions, and u and o are the numbers of under- and over-predictions.
[The formulas for the four indexes are not preserved in this transcript.]
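The four indexes named on this slide are commonly defined as in the sketch below. These are the standard two-class definitions and are given here as an assumption, not as a transcription of the slide: for a class s, p, u and o count correct, under- and over-predictions of s, and n (a symbol not defined on the slide) counts correct predictions of the other class. Function names are hypothetical.

```python
import math
from typing import Dict, List


def q2(truth: List[str], pred: List[str]) -> float:
    """Overall accuracy: correct predictions divided by total predictions N."""
    return sum(t == y for t, y in zip(truth, pred)) / len(truth)


def class_scores(truth: List[str], pred: List[str], sign: str) -> Dict[str, float]:
    """Coverage Q(s), probability of correct prediction P(s) and correlation C(s)."""
    p = sum(t == sign and y == sign for t, y in zip(truth, pred))  # correct
    u = sum(t == sign and y != sign for t, y in zip(truth, pred))  # under-predicted
    o = sum(t != sign and y == sign for t, y in zip(truth, pred))  # over-predicted
    n = sum(t != sign and y != sign for t, y in zip(truth, pred))  # correct rejections
    denom = math.sqrt((p + u) * (p + o) * (n + u) * (n + o))
    return {
        "Q": p / (p + u) if p + u else 0.0,  # returns 0.0 when undefined
        "P": p / (p + o) if p + o else 0.0,
        "C": (p * n - u * o) / denom if denom else 0.0,
    }


# Toy sign labels, for illustration only.
observed = ["+", "+", "-", "-", "-"]
predicted = ["+", "-", "-", "-", "+"]
print(q2(observed, predicted), class_scores(observed, predicted, "+"))
```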

29 Q2 accuracy of the neural network (N3-9.0) as a function of the relative accessibility value of the mutated residue

30 Q2 accuracy as a function of the residue mutation type
native \ new    Charged       Polar         Apolar
Charged         0.62 (4%)     0.77 (8%)     0.72 (9%)
Polar           0.69 (6%)     0.82 (10%)    0.77 (17%)
Apolar          0.75 (3%)     0.92 (12%)    0.87 (31%)

