
1 Artificial Neural Networks 2 Morten Nielsen Department of Systems Biology, DTU

2 Outline Optimization procedures –Gradient descent (this you already know) Network training –Back-propagation –Cross-validation –Over-fitting –Examples –Deep learning

3 Neural network. Error estimate [Figure: inputs I1 and I2 with weights w1 and w2 feeding a linear output unit o]

4 Neural networks

5 Gradient descent (from Wikipedia) Gradient descent is based on the observation that if the real-valued function F(x) is defined and differentiable in a neighborhood of a point a, then F(x) decreases fastest if one goes from a in the direction of the negative gradient of F at a. It follows that, if b = a − γ∇F(a) for a small enough step size γ > 0, then F(b) < F(a).
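A minimal Python sketch of this update rule (the quadratic example and the function name are mine, not from the slides): starting from a, repeatedly step against the gradient with a small step size γ.

```python
def gradient_descent(grad_F, a, gamma=0.1, steps=100):
    """Repeatedly step against the gradient: b = a - gamma * grad_F(a)."""
    for _ in range(steps):
        a = a - gamma * grad_F(a)
    return a

# Example: F(x) = (x - 3)^2 has gradient 2*(x - 3); the minimum is at x = 3.
print(gradient_descent(lambda x: 2.0 * (x - 3.0), a=0.0))  # -> approximately 3.0
```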

6 Gradient descent (example)

7

8 Gradient descent. Example Weights are changed in the opposite direction of the gradient of the error. [Figure: inputs I1 and I2 with weights w1 and w2 feeding a linear output unit o]

9 Network architecture [Figure: input layer Ik, hidden layer hj/Hj, output layer o/O; weights vjk connect input to hidden, weights wj connect hidden to output]

10 What about the hidden layer?

11 Hidden to output layer

12

13

14 Input to hidden layer

15 Summary

16 Or

17 Ik = X[0][k]   Hj = X[1][j]   Oi = X[2][i]
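A sketch of how a forward pass might fill such a layered array, assuming logistic hidden and output units as in the exercise on the following slides; the names g and forward are my own.

```python
import math

def g(x):
    """Logistic activation, g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(I, v, w):
    """Store activations layer by layer: X[0] = inputs, X[1] = hidden, X[2] = output."""
    X = [list(I), [], []]
    for j in range(len(v)):                       # hidden layer: h_j = sum_k v[j][k] * I_k
        h_j = sum(v[j][k] * X[0][k] for k in range(len(I)))
        X[1].append(g(h_j))
    o = sum(w[j] * X[1][j] for j in range(len(w)))
    X[2].append(g(o))                             # single output unit
    return X

# forward([1, 1], [[1, 1], [-1, 1]], [-1, 1]) reproduces the small exercise network below.
```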

18 Can you do it yourself? Network: I1=1, I2=1; v11=1, v12=1, v21=-1, v22=1; w1=-1, w2=1. What is the output (O) from the network? What are the Δwj and Δvjk values if the target value is 0 and ε=0.5?

19 Can you do it yourself (ε=0.5)? Has the error decreased? Before: I1=1, I2=1; v11=1, v12=1, v21=-1, v22=1; w1=-1, w2=1; h1/H1, h2/H2 and o/O to be filled in. After: all values to be filled in.

20 Can you do it yourself (ε=0.5)? Has the error decreased? Before: I1=1, I2=1; v11=1, v12=1, v21=-1, v22=1; w1=-1, w2=1; h1=2, H1=0.88; h2=0, H2=0.5; o=-0.38, O=0.41. After: to be filled in.

21 Can you do it yourself? I1=1, I2=1; v11=1, v12=1, v21=-1, v22=1; w1=-1, w2=1; h1=2, H1=0.88; h2=0, H2=0.5; o=-0.38, O=0.41. What is the output (O) from the network? What are the Δwj and Δvjk values if the target value is 0?

22 Can you do it yourself (ε=0.5)? Has the error decreased? Before: I1=1, I2=1; v11=1, v12=1, v21=-1, v22=1; w1=-1, w2=1; h1=2, H1=0.88; h2=0, H2=0.5; o=-0.38, O=0.41. After: v11=1.005, v12=1.005, v21=-1.01, v22=0.99; w1=-1.043, w2=0.975; h1=2.01, H1=0.882; h2=-0.02, H2=0.495; o=-0.44, O=0.39.
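The before/after numbers above can be checked with a short script. This is a sketch assuming a logistic activation g, the squared error E = ½(O − t)², and the standard updates Δwj = −ε·δ·Hj with δ = (O − t)·g′(o) and Δvjk = −ε·δj·Ik with δj = δ·wj·g′(hj); the variable names are mine.

```python
import math

g = lambda x: 1.0 / (1.0 + math.exp(-x))

I = [1.0, 1.0]
v = [[1.0, 1.0], [-1.0, 1.0]]   # v[j][k]: input k -> hidden j
w = [-1.0, 1.0]
eps, target = 0.5, 0.0

# Forward pass
h = [sum(v[j][k] * I[k] for k in range(2)) for j in range(2)]   # [2.0, 0.0]
H = [g(x) for x in h]                                           # [0.88, 0.5]
o = sum(w[j] * H[j] for j in range(2))                          # -0.38
O = g(o)                                                        # 0.41

# Backward pass (delta rule); g'(x) = g(x) * (1 - g(x))
delta_o = (O - target) * O * (1 - O)
delta_h = [delta_o * w[j] * H[j] * (1 - H[j]) for j in range(2)]
w = [w[j] - eps * delta_o * H[j] for j in range(2)]             # [-1.043, 0.975]
v = [[v[j][k] - eps * delta_h[j] * I[k] for k in range(2)] for j in range(2)]
# v -> [[1.005, 1.005], [-1.01, 0.99]], matching the "After" values on slide 22
```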

23 Sequence encoding The change in a weight depends linearly on the input value, so "true" sparse encoding (i.e. 1/0) is highly inefficient. Sparse input is therefore most often encoded as +1/-1 or 0.9/0.05.
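As an illustration, a small sketch of sparse sequence encoding with the 0.9/0.05 convention; the alphabet ordering and function name are my own choices.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids

def sparse_encode(peptide, on=0.9, off=0.05):
    """Encode each residue as a 20-vector; a non-zero 'off' value means weights
    from unset positions still receive updates, unlike a strict 1/0 encoding."""
    return [[on if aa == letter else off for letter in ALPHABET] for aa in peptide]

encoded = sparse_encode("FMIDWILDA")   # 9 residues -> 9 x 20 input values
```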

24 Sequence encoding – rescaling Rescale the input values: if the input to a unit (o or h) is too large or too small, g′ is zero and the weights are not changed. Optimal performance is obtained when o and h are close to 0.5.

25 Training and error reduction

26

27 Size matters

28 Example

29 Neural network training. Cross validation Cross validation: train on 4/5 of the data and test on the remaining 1/5 => produces 5 different neural networks, each with a different prediction focus.
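A minimal sketch of the 4/5 – 1/5 partitioning; the fold assignment by slicing and the function name are my own choices.

```python
def five_fold_partitions(data):
    """Split data into 5 folds; each pass trains on 4/5 and tests on the remaining 1/5."""
    folds = [data[i::5] for i in range(5)]
    for i in range(5):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# Each of the 5 (train, test) pairs is used to train one network with its own prediction focus.
```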

30 Neural network training curve Maximum test set performance Most capable of generalizing
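A sketch of stopping at the point of maximum test-set performance; the network interface (train_one_epoch, evaluate, get_weights, set_weights) is hypothetical, not from the slides.

```python
def train_with_early_stopping(net, train_set, test_set, epochs=500):
    """Keep the weights from the epoch with the best test-set performance."""
    best_perf, best_weights = float("-inf"), net.get_weights()
    for epoch in range(epochs):
        net.train_one_epoch(train_set)          # hypothetical training interface
        perf = net.evaluate(test_set)
        if perf > best_perf:
            best_perf, best_weights = perf, net.get_weights()
    net.set_weights(best_weights)               # restore the most general network
    return net
```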

31 5 fold training Which network to choose?

32 5 fold training

33 Conventional 5 fold cross validation

34 “Nested” 5 fold cross validation

35 When to be careful When data is scarce, the difference in performance obtained using "conventional" versus "nested" cross-validation can be very large. When data is abundant the difference is small, and "nested" cross-validation might even give higher performance than "conventional" cross-validation due to the ensemble aspect of the "nested" approach.
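A sketch of one common reading of the "nested" scheme: the outer test fold is never used for training or stopping, and the inner four folds take turns as the early-stopping set, giving an ensemble of four networks per outer fold. The function name and fold assignment are my own.

```python
def nested_five_fold(data):
    """Outer loop: 1/5 of the data is held out for evaluation only.
    Inner loop: the remaining 4/5 is re-split so each inner fold serves once as the
    early-stopping set, giving an ensemble of 4 networks per outer test fold."""
    folds = [data[i::5] for i in range(5)]
    for i in range(5):
        test = folds[i]
        inner = [fold for j, fold in enumerate(folds) if j != i]
        for k in range(4):
            stop_set = inner[k]
            train = [x for m, fold in enumerate(inner) if m != k for x in fold]
            yield train, stop_set, test   # train one network per (train, stop_set) pair
```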

36 Do hidden neurons matter? The environment matters NetMHCpan

37 Context matters
FMIDWILDA YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY 0.89 A0201
FMIDWILDA YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY 0.08 A0101
DSDGSFFLY YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY 0.08 A0201
DSDGSFFLY YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY 0.85 A0101

38 Example

39 Summary Gradient descent is used to determine the updates for the synapses in the neural network Some relatively simple math defines the gradients –Networks without hidden layers can be solved on the back of an envelope (SMM exercise) –Hidden layers are a bit more complex, but still OK Always train networks using a test set to stop training –Be careful when reporting predictive performance Use "nested" cross-validation for small data sets And hidden neurons do matter (sometimes)

40 And some more stuff for the long cold winter nights Can it be made differently?

41 Predicting accuracy Can it be made differently? Reliability

42 Identification of position specific receptor ligand interactions by use of artificial neural network decomposition. An investigation of interactions in the MHC:peptide system. Master's thesis by Frederik Otzen Bagger and Piotr Chmura. Making sense of ANN weights

43

44

45

46

47

48

49 Deep learning http://www.slideshare.net/hammawan/deep-neural-networks

50 Deep(er) Network architecture

51 Deeper network architecture [Figure: input layer l (Il), 1st hidden layer k (h1k/H1k) reached through weights ukl, 2nd hidden layer j (h2j/H2j) reached through weights vjk, and output layer (h3/H3) reached through weights wj]

52 Network architecture (hidden to hidden)

53 Network architecture (input to hidden)

54

55 Speed. Use deltas [Figure: layers l, k, j with weights ukl and vjk and deltas δk, δj] Bishop, Christopher (1995). Neural Networks for Pattern Recognition. Oxford: Clarendon Press. ISBN 0-19-853864-2.
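A sketch of the delta recursion the figure refers to, δk = g′(ak) Σj wjk δj, for a network with logistic units. The data layout (activations indexed by layer, weights[layer][j][k] connecting unit k of one layer to unit j of the layer above) is my own convention, not Bishop's.

```python
def backprop_deltas(activations, weights, delta_out):
    """Propagate deltas backwards: delta_k = g'(a_k) * sum_j weights[layer][j][k] * delta_j.
    Every weight is visited exactly once, so the cost is linear in the number of weights."""
    deltas = [delta_out]                               # deltas of the output layer
    for layer in reversed(range(1, len(weights))):     # hidden layers only
        upper = deltas[0]                              # deltas of the layer above
        A = activations[layer]                         # logistic outputs of this layer
        lower = [A[k] * (1 - A[k]) *
                 sum(w_j[k] * d_j for w_j, d_j in zip(weights[layer], upper))
                 for k in range(len(A))]
        deltas.insert(0, lower)
    return deltas
```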

56 Use deltas

57 Deep learning – time is not an issue

58 Deep learning http://www.slideshare.net/hammawan/deep-neural-networks

59 Deep learning http://www.slideshare.net/hammawan/deep-neural-networks

60 Deep learning http://www.slideshare.net/hammawan/deep-neural-networks

61 Deep learning http://www.slideshare.net/hammawan/deep-neural-networks

62 Deep learning http://www.slideshare.net/hammawan/deep-neural-networks Auto encoder
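For illustration, a minimal numpy sketch of an auto-encoder with tied weights and no biases, trained by back-propagating the reconstruction error. This is only an illustration of the idea behind pre-training, not the implementation used on the slides; all names are my own.

```python
import numpy as np

rng = np.random.default_rng(0)
g = lambda x: 1.0 / (1.0 + np.exp(-x))

def pretrain_autoencoder(X, n_hidden, epochs=200, eps=0.1):
    """Learn W so that g(g(X @ W) @ W.T) reconstructs X (tied weights, no biases).
    The hidden code g(X @ W) can then initialise one layer of a deeper network."""
    W = rng.normal(0.0, 0.1, size=(X.shape[1], n_hidden))
    for _ in range(epochs):
        H = g(X @ W)                                  # encode
        R = g(H @ W.T)                                # decode
        d_out = (R - X) * R * (1 - R)                 # output-layer deltas
        d_hid = (d_out @ W) * H * (1 - H)             # hidden-layer deltas
        W -= eps * (X.T @ d_hid + d_out.T @ H)        # gradient from both uses of W
    return W
```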

63 Deep learning http://www.slideshare.net/hammawan/deep-neural-networks

64 Deep learning http://www.slideshare.net/hammawan/deep-neural-networks

65 Deep learning http://www.slideshare.net/hammawan/deep-neural-networks

66 Deep learning http://www.slideshare.net/hammawan/deep-neural-networks

67 Deep learning http://www.slideshare.net/hammawan/deep-neural-networks

68 Pan-specific prediction methods NetMHC NetMHCpan

69 Example
Peptide | Amino acids of HLA pockets | HLA | Aff
VVLQQHSIA YFAVLTWYGEKVHTHVDTLVRYHY A0201 0.131751
SQVSFQQPL YFAVLTWYGEKVHTHVDTLVRYHY A0201 0.487500
SQCQAIHNV YFAVLTWYGEKVHTHVDTLVRYHY A0201 0.364186
LQQSTYQLV YFAVLTWYGEKVHTHVDTLVRYHY A0201 0.582749
LQPFLQPQL YFAVLTWYGEKVHTHVDTLVRYHY A0201 0.206700
VLAGLLGNV YFAVLTWYGEKVHTHVDTLVRYHY A0201 0.727865
VLAGLLGNV YFAVWTWYGEKVHTHVDTLLRYHY A0202 0.706274
VLAGLLGNV YFAEWTWYGEKVHTHVDTLVRYHY A0203 1.000000
VLAGLLGNV YYAVLTWYGEKVHTHVDTLVRYHY A0206 0.682619
VLAGLLGNV YYAVWTWYRNNVQTDVDTLIRYHY A6802 0.407855

70 Going Deep – one hidden layer [Figure: training and test set performance]

71 Going Deep – 3 hidden layers [Figure: training and test set performance]

72 Going Deep – more than 3 hidden layers [Figure: training and test set performance]

73 Going Deep – using auto-encoders [Figure: training and test set performance]

74 Deep learning http://www.slideshare.net/hammawan/deep-neural-networks

75 Conclusions - 2 Implementing deep networks using deltas [1] makes CPU time scale linearly with respect to the number of weights –So going deep is not more CPU intensive than going wide Back-propagation is an efficient method for NN training for shallow networks with up to 3 hidden layers For deeper networks, pre-training is required, using for instance auto-encoders [1] Bishop, Christopher (1995). Neural Networks for Pattern Recognition. Oxford: Clarendon Press. ISBN 0-19-853864-2.

