Artificial Neural Networks 2 Morten Nielsen Department of Systems Biology, DTU IIB-INTECH, UNSAM, Argentina
Outline Optimization procedures –Gradient descent (this you already know) Network training –Back-propagation –Cross-validation –Over-fitting –Examples
Neural network. Error estimate [Figure: two inputs I_1 and I_2 with weights w_1 and w_2 feeding a linear output unit o]
Neural networks
Gradient descent (from Wikipedia) Gradient descent is based on the observation that if the real-valued function F(x) is defined and differentiable in a neighborhood of a point a, then F(x) decreases fastest if one goes from a in the direction of the negative gradient of F at a. It follows that, if b = a - γ∇F(a) for γ > 0 a small enough number, then F(b) ≤ F(a)
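As a minimal illustration (not from the slides), one gradient-descent loop in Python might look like this; the objective F, its gradient, and the step size gamma are made-up examples:

```python
import numpy as np

def F(x):
    # Example objective: a quadratic bowl with its minimum at (1, -2)
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

def grad_F(x):
    # Analytical gradient of F
    return np.array([2.0 * (x[0] - 1.0), 2.0 * (x[1] + 2.0)])

a = np.array([3.0, 0.0])        # starting point
gamma = 0.1                     # small positive step size
for _ in range(100):
    a = a - gamma * grad_F(a)   # b = a - gamma * grad F(a), so F(b) <= F(a)

print(a)                        # converges towards the minimum (1, -2)
```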
Gradient descent (example)
Gradient descent. Example. Weights are changed in the opposite direction of the gradient of the error. [Figure: two inputs I_1 and I_2 with weights w_1 and w_2 feeding a linear output unit o]
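For the two-input linear unit in the figure, this update amounts to the delta rule. A hedged sketch, where the input values, target, and learning rate eta are assumed examples:

```python
import numpy as np

I = np.array([1.0, 2.0])      # inputs I_1, I_2 (assumed example values)
w = np.array([0.5, -0.3])     # weights w_1, w_2
t = 1.0                       # target value
eta = 0.1                     # learning rate

o = np.dot(w, I)              # linear output o = w_1*I_1 + w_2*I_2
E = 0.5 * (o - t) ** 2        # squared error
dE_dw = (o - t) * I           # gradient of the error w.r.t. each weight
w = w - eta * dE_dw           # step in the opposite direction of the gradient
```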
Network architecture [Figure: input layer I_k, hidden layer h_j/H_j, output layer o/O; weights v_jk connect input to hidden, weights w_j connect hidden to output]
What about the hidden layer?
Hidden to output layer
Input to hidden layer
Summary
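The derivation itself sits in the slide figures; for reference, the standard gradients for a single hidden layer with activation g and error E = ½(O − t)², consistent with the worked example below, are:

$$
\frac{\partial E}{\partial w_j} = (O - t)\,g'(o)\,H_j,
\qquad
\frac{\partial E}{\partial v_{jk}} = (O - t)\,g'(o)\,w_j\,g'(h_j)\,I_k,
$$

with the updates $\Delta w_j = -\eta\,\partial E/\partial w_j$ and $\Delta v_{jk} = -\eta\,\partial E/\partial v_{jk}$.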
Or, in array notation: I_k = X[0][k], H_j = X[1][j], O_i = X[2][i]
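To make the X[layer][index] notation concrete, here is a small Python sketch of the forward pass for one hidden layer with a logistic activation g; the weight matrices v and w are random placeholders, not values from the lecture:

```python
import numpy as np

def g(x):
    # Logistic (sigmoid) activation
    return 1.0 / (1.0 + np.exp(-x))

def forward(I, v, w):
    """Return X = [input, hidden, output] activations.
    v[j, k]: input-to-hidden weights, w[i, j]: hidden-to-output weights."""
    X = [np.asarray(I, dtype=float)]   # X[0][k] = I_k
    X.append(g(v @ X[0]))              # X[1][j] = H_j = g(sum_k v_jk I_k)
    X.append(g(w @ X[1]))              # X[2][i] = O_i = g(sum_j w_ij H_j)
    return X

# Toy dimensions: 2 inputs, 2 hidden units, 1 output
rng = np.random.default_rng(0)
v = rng.normal(size=(2, 2))
w = rng.normal(size=(1, 2))
X = forward([1.0, 1.0], v, w)
```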
Can you do it yourself? Network: v_11 = 1, v_12 = 1, v_21 = -1, v_22 = 1, w_1 = -1, w_2 = 1; inputs I_1 = 1, I_2 = 1; hidden units h_1/H_1, h_2/H_2; output o/O. What is the output (O) from the network? What are the updated w_j and v_jk values if the target value is 0 and the learning rate is 0.5?
Can you do it yourself (learning rate = 0.5). Has the error decreased? Before: v_11 = 1, v_12 = 1, v_21 = -1, v_22 = 1, w_1 = -1, w_2 = 1; I_1 = 1, I_2 = 1; h_1, H_1, h_2, H_2, o, O = ? After: v_11, v_12, v_21, v_22, w_1, w_2, h_1, H_1, h_2, H_2, o, O = ?
Can you do it yourself? v_11 = 1, v_12 = 1, v_21 = -1, v_22 = 1, w_1 = -1, w_2 = 1; I_1 = 1, I_2 = 1; h_1 = 2, H_1 = 0.88, h_2 = 0, H_2 = 0.5; o = -0.38, O = 0.41. What is the output (O) from the network? What are the updated w_j and v_jk values if the target value is 0?
Can you do it yourself (learning rate = 0.5). Has the error decreased? Before: v_11 = 1, v_12 = 1, v_21 = -1, v_22 = 1, w_1 = -1, w_2 = 1; h_1 = 2, H_1 = 0.88, h_2 = 0, H_2 = 0.5; o = -0.38, O = 0.41 (I_1 = 1, I_2 = 1). After: v_11 = 1.005, v_12 = 1.005, v_21 = -1.01, v_22 = 0.99, w_1 ≈ -1.04, w_2 = 0.975; h_1 = 2.01, H_1 = 0.882, h_2 = -0.02, H_2 = 0.495; o = -0.44, O = 0.39 (I_1 = 1, I_2 = 1)
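The numbers on this slide can be reproduced with a short script. This is my own verification sketch (logistic activation, squared error, learning rate 0.5), not code from the lecture:

```python
import numpy as np

def g(x):
    return 1.0 / (1.0 + np.exp(-x))

I = np.array([1.0, 1.0])                  # I_1, I_2
v = np.array([[1.0, 1.0], [-1.0, 1.0]])   # v[j, k]: row j = hidden unit, column k = input
w = np.array([-1.0, 1.0])                 # w_1, w_2
t, eta = 0.0, 0.5                         # target and learning rate

# Forward pass
h = v @ I                                 # h_1 = 2, h_2 = 0
H = g(h)                                  # H_1 ~ 0.88, H_2 = 0.5
o = w @ H                                 # o ~ -0.38
O = g(o)                                  # O ~ 0.41

# Backward pass: gradient descent on E = 0.5 * (O - t)^2
delta_o = (O - t) * O * (1 - O)           # output delta
delta_h = delta_o * w * H * (1 - H)       # hidden deltas
w_new = w - eta * delta_o * H             # w_2 -> ~0.975, w_1 -> ~-1.04
v_new = v - eta * np.outer(delta_h, I)    # v_11, v_12 -> ~1.005; v_21 -> ~-1.01; v_22 -> ~0.99

# Forward pass with the updated weights:
# O drops from ~0.41 to ~0.39 with target 0, so the error has decreased
O_new = g(w_new @ g(v_new @ I))
```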
Example
Sequence encoding. The change in a weight is linearly dependent on the input value, so "true" sparse encoding (1/0) is highly inefficient: inputs of 0 produce no weight update. Sparse encoding is therefore most often done as +1/-1 or 0.9/0.05
Sequence encoding - rescaling. Rescaling the input values: if the input (o or h) is too large or too small, g′ is zero and the weights are not changed. Optimal performance is when o and h are close to 0.5
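As an illustration of the 0.9/0.05 variant of sparse encoding, here is a hedged Python sketch; the alphabet constant and the function name sparse_encode are my own choices, not the lecture's:

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def sparse_encode(peptide, on=0.9, off=0.05):
    """Encode a peptide as a flat vector of 20 values per position.
    Using 0.9/0.05 instead of 1/0 keeps every input non-zero, so all
    weights receive an update during training."""
    enc = np.full((len(peptide), len(ALPHABET)), off)
    for pos, aa in enumerate(peptide):
        enc[pos, ALPHABET.index(aa)] = on
    return enc.ravel()

x = sparse_encode("FMIDWILDA")   # 9-mer -> vector of length 180
```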
Training and error reduction
Size matters
Example
Neural network training. Cross-validation: train on 4/5 of the data, test on 1/5 => produces 5 different neural networks, each with a different prediction focus
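A minimal sketch of the 4/5 train, 1/5 test partitioning (my own code; the commented-out train_network call is a hypothetical stand-in for the lecture's back-propagation training):

```python
import numpy as np

def five_fold_partitions(n_examples, n_folds=5, seed=1):
    """Yield (train_idx, test_idx) pairs: each fold tests on 1/5
    of the data and trains on the remaining 4/5."""
    idx = np.random.default_rng(seed).permutation(n_examples)
    folds = np.array_split(idx, n_folds)
    for i in range(n_folds):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, test_idx

# networks = [train_network(X[tr], y[tr], stop_on=(X[te], y[te]))
#             for tr, te in five_fold_partitions(len(X))]
# => 5 networks, each with a different prediction focus
```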
Neural network training curve Maximum test set performance Most capable of generalizing
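The idea of stopping where test-set performance peaks can be sketched with a simple early-stopping loop. To keep it self-contained, this toy uses a linear model and synthetic data (both assumptions), but the bookkeeping is the same for a neural network:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))                          # toy data, for illustration only
y = X @ rng.normal(size=5) + 0.5 * rng.normal(size=60)
X_train, y_train = X[:40], y[:40]                     # training set
X_test, y_test = X[40:], y[40:]                       # test set used only to decide when to stop

w = np.zeros(5)
eta = 0.01
best_err, best_w = float("inf"), w.copy()
for epoch in range(1000):
    grad = X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= eta * grad                                   # gradient-descent update
    test_err = np.mean((X_test @ w - y_test) ** 2)    # test-set error at this epoch
    if test_err < best_err:                           # keep the most generalizing weights
        best_err, best_w = test_err, w.copy()
w = best_w   # "stop" training at the epoch with maximum test-set performance
```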
5 fold training Which network to choose?
5 fold training
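One common answer, and the one implied by the ensemble remark a few slides below, is not to choose at all but to average the predictions of the five networks. A minimal sketch, assuming each trained network exposes a predict method (an assumption, not the lecture's API):

```python
import numpy as np

def ensemble_predict(networks, x):
    """Average the predictions of the five cross-validation networks."""
    return np.mean([net.predict(x) for net in networks], axis=0)
```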
Conventional 5 fold cross validation
“Nested” 5 fold cross validation
When to be careful When data is scarce, the difference in performance obtained using "conventional" versus "nested" cross-validation can be very large When data is abundant the difference is small, and "nested" cross-validation might even give higher performance than "conventional" cross-validation due to the ensemble aspect of the "nested" cross-validation approach
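A hedged sketch of the index bookkeeping behind nested 5-fold cross-validation (my own illustration of the idea, not necessarily the exact procedure used in the lecture):

```python
import numpy as np

def nested_cv_indices(n_examples, n_folds=5, seed=1):
    """Nested 5-fold cross-validation index generator. The outer test fold is
    used only for the final performance estimate; the inner folds are used to
    stop training, so the test data never influence training."""
    idx = np.random.default_rng(seed).permutation(n_examples)
    folds = np.array_split(idx, n_folds)
    for outer in range(n_folds):                       # evaluation fold
        test_idx = folds[outer]
        inner_folds = [folds[j] for j in range(n_folds) if j != outer]
        for inner in range(n_folds - 1):               # early-stopping fold
            stop_idx = inner_folds[inner]
            train_idx = np.concatenate(
                [inner_folds[j] for j in range(n_folds - 1) if j != inner])
            yield train_idx, stop_idx, test_idx

# => 5 x 4 = 20 networks; predictions on each outer test fold are averaged
#    over its 4 inner networks (the ensemble aspect mentioned above)
```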
Do hidden neurons matter? The environment matters NetMHCpan
Context matters FMIDWILDA YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY 0.89 A0201 FMIDWILDA YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY 0.08 A0101 DSDGSFFLY YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY 0.08 A0201 DSDGSFFLY YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY 0.85 A0101
Summary Gradient descent is used to determine the updates for the synapses in the neural network Some relatively simple math defines the gradients –Networks without hidden layers can be solved on the back of an envelope (SMM exercise) –Hidden layers are a bit more complex, but still OK Always train networks using a test set to stop training –Be careful when reporting predictive performance Use "nested" cross-validation for small data sets And hidden neurons do matter (sometimes)
And some more stuff for the long cold winter nights Can it be made differently?
Predicting accuracy Can it be made differently? Reliability
Identification of position specific receptor ligand interactions by use of artificial neural network decomposition. An investigation of interactions in the MHC:peptide system Making sense of ANN weights
Deep learning
Deep learning Auto encoder
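A minimal auto-encoder sketch in NumPy (the layer sizes, toy data, and learning rate are assumptions): the network is trained to reconstruct its own input, so the hidden layer learns a compressed representation that can be used to pre-train deeper models.

```python
import numpy as np

def g(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = rng.random((100, 20))                 # toy data, for illustration only
W_enc = rng.normal(scale=0.1, size=(20, 5))   # encoder: 20 inputs -> 5 hidden units
W_dec = rng.normal(scale=0.1, size=(5, 20))   # decoder: 5 hidden units -> 20 outputs
eta = 0.1

for epoch in range(200):
    H = g(X @ W_enc)                      # encode
    X_hat = g(H @ W_dec)                  # decode (reconstruct the input)
    d_out = (X_hat - X) * X_hat * (1 - X_hat)   # output deltas (squared error)
    d_hid = (d_out @ W_dec.T) * H * (1 - H)     # hidden deltas (back-propagation)
    W_dec -= eta * H.T @ d_out / len(X)
    W_enc -= eta * X.T @ d_hid / len(X)
```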