Artificiel Neural Networks 2 Morten Nielsen Department of Systems Biology, DTU.

Slides:

Advertisements

Similar presentations

NEURAL NETWORKS Backpropagation Algorithm

Advertisements

1 Machine Learning: Lecture 4 Artificial Neural Networks (Based on Chapter 4 of Mitchell T.., Machine Learning, 1997)

Artificial Intelligence 13. Multi-Layer ANNs Course V231 Department of Computing Imperial College © Simon Colton.

Tuomas Sandholm Carnegie Mellon University Computer Science Department

Artificial Neural Networks 2 Morten Nielsen BioSys, DTU.

Optimization methods Morten Nielsen Department of Systems Biology, DTU.

Optimization methods Morten Nielsen Department of Systems biology, DTU.

Biological sequence analysis and information processing by artificial neural networks.

PROTEIN SECONDARY STRUCTURE PREDICTION WITH NEURAL NETWORKS.

Artificial Neural Networks 2 Morten Nielsen Depertment of Systems Biology, DTU.

An introduction to: Deep Learning aka or related to Deep Neural Networks Deep Structural Learning Deep Belief Networks etc,

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Prénom Nom Document Analysis: Artificial Neural Networks Prof. Rolf Ingold, University of Fribourg Master course, spring semester 2008.

Biological sequence analysis and information processing by artificial neural networks Morten Nielsen CBS.

Prénom Nom Document Analysis: Artificial Neural Networks Prof. Rolf Ingold, University of Fribourg Master course, spring semester 2008.

Artificial Neural Networks Thomas Nordahl Petersen & Morten Nielsen.

Back-Propagation Algorithm

Chapter 6: Multilayer Neural Networks

Biological sequence analysis and information processing by artificial neural networks.

Project list 1.Peptide MHC binding predictions using position specific scoring matrices including pseudo counts and sequences weighting clustering (Hobohm)

Artificial Neural Networks

CS 484 – Artificial Intelligence

1 Introduction to Artificial Neural Networks Andrew L. Nelson Visiting Research Faculty University of South Florida.

Artificial Neural Networks

Classification Part 3: Artificial Neural Networks

Neural Networks AI – Week 23 Sub-symbolic AI Multi-Layer Neural Networks Lee McCluskey, room 3/10

Machine Learning Chapter 4. Artificial Neural Networks

Chapter 3 Neural Network Xiu-jun GONG (Ph. D) School of Computer Science and Technology, Tianjin University

11 CSE 4705 Artificial Intelligence Jinbo Bi Department of Computer Science & Engineering

Introduction to Artificial Neural Network Models Angshuman Saha Image Source: ww.physiol.ucl.ac.uk/fedwards/ ca1%20neuron.jpg.

Machine Learning Dr. Shazzad Hosain Department of EECS North South Universtiy

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Neural Networks - Berrin Yanıkoğlu1 Applications and Examples From Mitchell Chp. 4.

Artificial Intelligence Methods Neural Networks Lecture 4 Rakesh K. Bissoondeeal Rakesh K. Bissoondeeal.

A note about gradient descent: Consider the function f(x)=(x-x 0 ) 2 Its derivative is: By gradient descent. x0x0 + -

CS344: Introduction to Artificial Intelligence (associated lab: CS386) Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 31: Feedforward N/W; sigmoid.

Artificiel Neural Networks 2 Morten Nielsen Department of Systems Biology, DTU IIB-INTECH, UNSAM, Argentina.

Artificial Intelligence Chapter 3 Neural Networks Artificial Intelligence Chapter 3 Neural Networks Biointelligence Lab School of Computer Sci. & Eng.

Neural Networks and Backpropagation Sebastian Thrun , Fall 2000.

Back-Propagation Algorithm AN INTRODUCTION TO LEARNING INTERNAL REPRESENTATIONS BY ERROR PROPAGATION Presented by: Kunal Parmar UHID:

CS621 : Artificial Intelligence

Neural Networks - Berrin Yanıkoğlu1 Applications and Examples From Mitchell Chp. 4.

Introduction to Neural Networks Introduction to Neural Networks Applied to OCR and Speech Recognition An actual neuron A crude model of a neuron Computational.

Neural Networks Teacher: Elena Marchiori R4.47 Assistant: Kees Jong S2.22

Image Source: ww.physiol.ucl.ac.uk/fedwards/ ca1%20neuron.jpg

Artificiel Neural Networks 3 Tricks for improved learning Morten Nielsen Department of Systems Biology, DTU.

Optimization methods Morten Nielsen Department of Systems biology, DTU IIB-INTECH, UNSAM, Argentina.

Introduction to Neural Networks Freek Stulp. 2 Overview Biological Background Artificial Neuron Classes of Neural Networks 1. Perceptrons 2. Multi-Layered.

Neural networks (2) Reminder Avoiding overfitting Deep neural network Brief summary of supervised learning methods.

Pattern Recognition Lecture 20: Neural Networks 3 Dr. Richard Spillman Pacific Lutheran University.

Learning with Neural Networks Artificial Intelligence CMSC February 19, 2002.

Supervised Learning in ANNs

an introduction to: Deep Learning

One-layer neural networks Approximation problems

第 3 章神经网络.

Prof. Carolina Ruiz Department of Computer Science

Machine Learning Today: Reading: Maria Florina Balcan

CSC 578 Neural Networks and Deep Learning

of the Artificial Neural Networks.

Artificial Intelligence Chapter 3 Neural Networks

Neural Networks and Deep Learning

Artificial Intelligence Chapter 3 Neural Networks

Artificial Intelligence Chapter 3 Neural Networks

Artificial Intelligence Chapter 3 Neural Networks

Seminar on Machine Learning Rada Mihalcea

An introduction to: Deep Learning aka or related to Deep Neural Networks Deep Structural Learning Deep Belief Networks etc,

Practical session on neural network modelling

Artificial Intelligence Chapter 3 Neural Networks

Prof. Carolina Ruiz Department of Computer Science

Presentation transcript:

Artificiel Neural Networks 2 Morten Nielsen Department of Systems Biology, DTU

Outline Optimization procedures –Gradient decent (this you already know) Network training –back propagation –cross-validation –Over-fitting –Examples –Deeplearning

Neural network. Error estimate I1I1 I2I2 w1w1 w2w2 Linear function o

Neural networks

Gradient decent (from wekipedia) Gradient descent is based on the observation that if the real-valued function F(x) is defined and differentiable in a neighborhood of a point a, then F(x) decreases fastest if one goes from a in the direction of the negative gradient of F at a. It follows that, if for  > 0 a small enough number, then F(b)<F(a)

Gradient decent (example)

Gradient decent. Example Weights are changed in the opposite direction of the gradient of the error I1I1 I2I2 w1w1 w2w2 Linear function o

Network architecture Input layer Hidden layer Output layer IkIk hjhj HjHj o O v jk wjwj

What about the hidden layer?

Hidden to output layer

Input to hidden layer

Summary

Or

I i =X[0][k] H j =X[1][j] O i =X[2][i]

Can you do it your self? v 22 =1 v 12 =1 v 11 =1 v 21 =-1 w 1 =-1 w 2 =1 h2H2h2H2 h1H1h1H1 oOoO I 1 =1I 2 =1 What is the output (O) from the network? What are the  w ij and  v jk values if the target value is 0 and  =0.5?

Can you do it your self (  =0.5). Has the error decreased? v 22 =1 v 12 =1 v 11 =1 v 21 =-1 w 1 =-1 w 2 =1 h2=H2=h2=H2= h 1= H 1 = o= O= I 1 =1I 2 =1 v 22 =. v 12 = V 11 = v 21 = w1=w1= w2=w2= h2=H2=h2=H2= h1=H1=h1=H1= o= O= I 1 =1I 2 =1 Before After

h 1 =2 H 1 =0.88 Can you do it your self (  =0.5). Has the error decreased? v 22 =. v 12 = V 11 = v 21 = w1=w1= w2=w2= h2=H2=h2=H2= h1=H1=h1=H1= o= O= I 1 =1I 2 =1 Before After v 22 =1 v 12 =1 v 11 =1 v 21 =-1 w 1 =-1 w 2 =1 h 2 =0 H 2 =0.5 o=-0.38 O=0.41 I 1 =1I 2 =1

Can you do it your self? v 22 =1 v 12 =1 v 11 =1 v 21 =-1 w 1 =-1 w 2 =1 h 2 =0 H 2 =0.5 h 1 =2 H 1 =0.88 o=-0.38 O=0.41 I 1 =1I 2 =1 What is the output (O) from the network? What are the  w ij and  v jk values if the target value is 0?

Can you do it your self (  =0.5). Has the error decreased? v 22=1 v 12 =1 v 11=1 v 21 =-1 w 1 =-1 w 2 =1 h 2 =0 H 2 =0.5 h 1 =2 H 1 =0.88 o=-0.38 O=0.41 I 1 =1I 2 =1 v 22 =.0.99 v 12 =1.005 v 11 =1.005 v 21 =-1.01 w 1 = w 2 =0.975 h 2 =-0.02 H 2 =0.495 h 1 =2.01 H 1 =0.882 o=-0.44 O=0.39 I 1 =1I 2 =1

Sequence encoding Change in weight is linearly dependent on input value “True” sparse encoding (i.e 1/0) is therefore highly inefficient Sparse is most often encoded as –+1/-1 or 0.9/0.05

Sequence encoding - rescaling Rescaling the input values If the input (o or h) is too large or too small, g´ is zero and the weights are not changed. Optimal performance is when o,h are close to 0.5

Training and error reduction 



 Size matters

Example

Neural network training. Cross validation Cross validation Train on 4/5 of data Test on 1/5 => Produce 5 different neural networks each with a different prediction focus

Neural network training curve Maximum test set performance Most cable of generalizing

5 fold training Which network to choose?

5 fold training

Conventional 5 fold cross validation

“Nested” 5 fold cross validation

When to be careful When data is scarce, the performance obtained used “conventional” versus “Nested” cross validation can be very large When data is abundant the difference is small, and “nested” cross validation might even be higher than “conventional” cross validation due to the ensemble aspect of the “nested” cross validation approach

Do hidden neurons matter? The environment matters NetMHCpan

Context matters FMIDWILDA YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY 0.89 A0201 FMIDWILDA YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY 0.08 A0101 DSDGSFFLY YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY 0.08 A0201 DSDGSFFLY YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY 0.85 A0101

Example

Summary Gradient decent is used to determine the updates for the synapses in the neural network Some relatively simple math defines the gradients –Networks without hidden layers can be solved on the back of an envelope (SMM exercise) –Hidden layers are a bit more complex, but still ok Always train networks using a test set to stop training –Be careful when reporting predictive performance Use “nested” cross-validation for small data sets And hidden neurons do matter (sometimes)

And some more stuff for the long cold winter nights Can it might be made differently?

Predicting accuracy Can it be made differently? Reliability

Identification of position specific receptor ligand interactions by use of artificial neural network decomposition. An investigation of interactions in the MHC:peptide system Master thesis by Frederik Otzen Bagger and Piotr Chmura Making sense of ANN weights

Deep learning

Deep(er) Network architecture

Deeper Network architecture IlIl Input layer, l 1. Hidden layer, k Output layer h1kh1k H1kH1k h3h3 H3H3 u kl wjwj 2. Hidden layer, j v jk h2jh2j H2jH2j

Network architecture (hidden to hidden)

Network architecture (input to hidden)

Speed. Use delta’s j k l jj v jk u kl kk Bishop, Christopher (1995). Neural networks for pattern recognition. Oxford: Clarendon Press. ISBN ISBN

Use delta’s

Deep learning – time is not an issue

Deep learning

Deep learning

Deep learning

Deep learning

Deep learning Auto encoder

Deep learning

Deep learning

Deep learning

Deep learning

Deep learning

Pan-specific prediction methods NetMHC NetMHCpan

Example Peptide Amino acids of HLA pockets HLA Aff VVLQQHSIA YFAVLTWYGEKVHTHVDTLVRYHY A SQVSFQQPL YFAVLTWYGEKVHTHVDTLVRYHY A SQCQAIHNV YFAVLTWYGEKVHTHVDTLVRYHY A LQQSTYQLV YFAVLTWYGEKVHTHVDTLVRYHY A LQPFLQPQL YFAVLTWYGEKVHTHVDTLVRYHY A VLAGLLGNV YFAVLTWYGEKVHTHVDTLVRYHY A VLAGLLGNV YFAVWTWYGEKVHTHVDTLLRYHY A VLAGLLGNV YFAEWTWYGEKVHTHVDTLVRYHY A VLAGLLGNV YYAVLTWYGEKVHTHVDTLVRYHY A VLAGLLGNV YYAVWTWYRNNVQTDVDTLIRYHY A

Going Deep – One hidden layer Train Test

Going Deep – 3 hidden layers Train Test

Going Deep – more than 3 hidden layers Train Test

Going Deep – Using Auto-encoders Train Test

Deep learning

Conclusions -2 Implementing Deep networks using deltas 1 makes CPU time scale linearly with respect to the number of weights –So going Deep is not more CPU intensive than going wide Back-propagation is an efficient method for NN training for shallow networks with up to 3 hidden layers For deeper network, pre-training is required using for instance Auto-encoders 1 Bishop, Christopher (1995). Neural networks for pattern recognition. Oxford: Clarendon Press. ISBN ISBN