COMP24111: Machine Learning and Optimisation


COMP24111: Machine Learning and Optimisation
Chapter 5: Neural Networks and Deep Learning
Dr. Tingting Mu
Email: tingting.mu@manchester.ac.uk

Outline
- Understand the perceptron algorithm.
- Understand the multi-layer perceptron.
- Understand the back-propagation method.
- Understand the concept of deep learning.

Neuron Structure
A biological neuron is an electrically excitable cell that processes and transmits information by electro-chemical signalling. Input signals are sent from other neurons; connection strengths determine how the signals are accumulated, and if enough signals accumulate, the neuron fires a signal.
Simulating a neuron: each artificial neural network (ANN) neuron receives multiple inputs and generates one output.
Figure is from http://2centsapiece.blogspot.co.uk/2015/10/identifying-subatomic-particles-with.html

Single Neuron Model
An ANN neuron has multiple inputs [x1, x2, …, xd] and one output y.
[Figure: the inputs x1, …, xd are weighted by w1, …, wd, summed together with a bias b by an adder, and passed through an activation function f to produce the output y = f(w1x1 + w2x2 + … + wdxd + b).]
Basic elements of a typical neuron include:
- A set of synapses or connections, each characterised by a weight (strength).
- An adder for summing the input signals, weighted by the respective synapses.
- An activation function, which squashes the permissible amplitude range of the output signal.
Given d inputs, a neuron is modelled by d+1 parameters.

Types of Activation Function
- Identity function: f(a) = a.
- Threshold function: f(a) = +1 if a >= 0, and -1 otherwise.
- Sigmoid function ("S"-shaped curve): f(a) = 1 / (1 + e^(-a)), with outputs in (0, 1); the tanh function is the related "S"-shaped curve with outputs in (-1, 1).
- Rectified linear unit (ReLU): f(a) = max(0, a).
[Figure: plots of the identity, threshold, sigmoid, tanh and ReLU functions.]
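The following is a minimal numpy sketch of the activation functions listed above; the function names are illustrative, not a fixed API from the course.

```python
import numpy as np

def identity(a):
    return a

def threshold(a):
    return np.where(a >= 0, 1.0, -1.0)   # outputs -1 or +1

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))      # outputs in (0, 1)

def tanh(a):
    return np.tanh(a)                     # outputs in (-1, 1)

def relu(a):
    return np.maximum(0.0, a)             # outputs in [0, inf)

a = np.linspace(-3, 3, 7)
for f in (identity, threshold, sigmoid, tanh, relu):
    print(f.__name__, f(a))
```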

The Perceptron Algorithm
When the activation function is set as the identity function, the single neuron model becomes the linear model we learned in previous chapters. The neuron weights and bias are equivalent to the coefficient vector of the linear model. When the activation function is set as the threshold function, the model is still linear, and it is known as the perceptron of Rosenblatt (1962). The perceptron algorithm is for two-class classification, and it occupies an important place in the history of pattern recognition algorithms.

The Perceptron Algorithm
Parameters stored in w are optimised by minimising an error function called the perceptron criterion:
- If a sample is correctly classified, it contributes an error penalty of zero.
- If a sample (xn, yn), with yn in {-1, +1}, is misclassified, it contributes the penalty -yn w^T xn, which is always positive for a misclassified sample.
We want to reduce the number of misclassified samples, and therefore minimise the total penalty summed over the set M of misclassified samples: E(w) = -Σ_{n in M} yn w^T xn.
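A small sketch of computing this criterion, assuming labels y in {-1, +1} and a weight vector w whose last entry plays the role of the bias b (the function and variable names are illustrative, not from the lecture):

```python
import numpy as np

def perceptron_criterion(w, X, y):
    """Sum of -y_n * w^T x_n over the misclassified samples (zero if all correct)."""
    X_ext = np.hstack([X, np.ones((X.shape[0], 1))])   # append 1 so the bias is the last weight
    scores = X_ext @ w                                  # w^T x for every sample
    misclassified = (y * scores) <= 0                   # penalty applies only to these
    return -np.sum(y[misclassified] * scores[misclassified])

X = np.array([[1.0, 2.0], [2.0, -1.0], [-1.0, -1.0]])
y = np.array([1, -1, -1])
w = np.array([0.5, 0.5, 0.0])
print(perceptron_criterion(w, X, y))   # only the second sample is misclassified here
```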

The Perceptron Algorithm
Stochastic gradient descent is used for training.
Estimate the gradient using a misclassified sample (xn, yn): the gradient of the penalty -yn w^T xn with respect to w is -yn xn.
Weight update equation: w(t+1) = w(t) + η yn xn, where η is the learning rate.
Update using a misclassified sample in the current iteration!

Training Algorithm
Perceptron Training: update the weights using only one misclassified sample at a time by the learning rule above (a minimal Python sketch of this loop is given below).
Initialise the weights (stored in w(0)) to random numbers in the range -1 to +1.
For t = 1 to NUM_ITERATIONS
    For each training sample (xi, yi)
        Calculate the activation using the current weights (stored in w(t)).
        Update the weights (stored in w(t+1)) by the learning rule.
    end
What weight changes do the following cases produce?
- True label = -1, activation output = -1: no change.
- True label = +1, activation output = +1: no change.
- True label = -1, activation output = +1: add -xi scaled by the learning rate (i.e. subtract the input).
- True label = +1, activation output = -1: add +xi scaled by the learning rate.
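A minimal sketch of the training loop in numpy, assuming labels in {-1, +1} and a bias folded into the weight vector; the function name, arguments and the toy data are illustrative, not from the lecture.

```python
import numpy as np

def train_perceptron(X, y, num_iterations=100, learning_rate=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n_samples, d = X.shape
    # Append a constant 1 to each sample so the bias b becomes the last weight.
    X_ext = np.hstack([X, np.ones((n_samples, 1))])
    # Initialise weights to random numbers in the range -1 to +1.
    w = rng.uniform(-1.0, 1.0, size=d + 1)

    for t in range(num_iterations):
        for x_i, y_i in zip(X_ext, y):
            # Activation: threshold function applied to the weighted sum.
            activation = 1.0 if np.dot(w, x_i) >= 0 else -1.0
            # Update only when the sample is misclassified.
            if activation != y_i:
                w += learning_rate * y_i * x_i
    return w

# Example: a small linearly separable problem (OR-like labels).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, 1])
print(train_perceptron(X, y))
```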

One neuron can be used to construct a linear model. It has only one layer (the input layer), and is called a single layer perceptron.
[Figure: input nodes x1, …, xd feed one neuron with weights w1, …, wd and bias b, whose adder and activation produce the output y.]
What can many connected neurons achieve?

Adding Hidden Layers!
The presence of hidden layers allows the network to formulate more complex functions. Each hidden node finds a partial solution to the problem, to be combined in the next layer.
[Figure: example networks in which the inputs x1, x2, …, xd feed one or two hidden layers before producing the output y.]

Multilayer Perceptron
A multilayer perceptron (MLP), also called a feedforward artificial neural network, consists of at least three layers of nodes (input, hidden and output layers).
- The number of neurons in the input layer is equal to the number of input features.
- The number of hidden layers is a hyperparameter to be set.
- The numbers of neurons in the hidden layers are also hyperparameters to be set.
- The number of neurons in the output layer depends on the task to be solved.

Multilayer Perceptron
An MLP example with one hidden layer consisting of four hidden neurons. It takes 9 input features and returns 2 output variables (9 input neurons in the input layer, 2 output neurons in the output layer).
Output of the j-th neuron in the hidden layer (j = 1, 2, 3, 4), for the n-th training sample:
    zj(n) = f( Σi wij(h) xi(n) + bj ), where the sum runs over the 9 inputs (9+1 weights per hidden neuron).
Output of the k-th neuron in the output layer (k = 1, 2), for the n-th training sample:
    yk(n) = f( Σj wjk(o) zj(n) + bk ), where the sum runs over the 4 hidden outputs (4+1 weights per output neuron).
The hidden layer has 10 x 4 = 40 weights and the output layer has 5 x 2 = 10 weights, so a total of 40 + 10 = 50 weights are to be optimised in this neural network (including bias parameters).
Information flows feed-forward when computing the output variables (a minimal forward-pass sketch is given below).
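A minimal forward-pass sketch in numpy for the 9-4-2 MLP above. Sigmoid activations and random weights are assumed purely for illustration; the lecture leaves the activation function generic.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
W_h = rng.uniform(-1, 1, size=(4, 9))   # hidden-layer weights W(h): 4 neurons x 9 inputs
b_h = rng.uniform(-1, 1, size=4)        # hidden-layer biases (the "+1" weights)
W_o = rng.uniform(-1, 1, size=(2, 4))   # output-layer weights W(o): 2 neurons x 4 hidden units
b_o = rng.uniform(-1, 1, size=2)        # output-layer biases

x = rng.uniform(-1, 1, size=9)          # one training sample with 9 features
z = sigmoid(W_h @ x + b_h)              # hidden outputs z_j(n), j = 1..4
y = sigmoid(W_o @ z + b_o)              # network outputs y_k(n), k = 1, 2

# Total number of parameters: (9+1)*4 + (4+1)*2 = 50
n_params = W_h.size + b_h.size + W_o.size + b_o.size
print(n_params, y)
```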

Neural Network Training
Neural network training is the process of finding the optimal setting of the neural network weights.

Neural Network Training
Neural network training is the process of finding the optimal setting of the neural network weights. A neural network can be viewed as a powerful feature extractor that computes an effective representation for each sample, which helps the prediction task.
[Figure: the hidden layers map the original features x to new features φ(x), which feed a prediction layer (the new output layer) on which the loss Loss(φ(x)) is computed.]

Neural Network Training
Treating φ(x) as the new features and using these as the input of a linear model, all the objective functions we learned in previous chapters can be used to optimise the neural network weights:
- Minimising the sum-of-squares error (least squares model, Chapter 2).
- Minimising a mixture of the sum-of-squares error and a regularisation term (regularised least squares model, Chapter 2).
- Maximising the (log) likelihood or minimising the cross-entropy error (logistic regression, Chapter 3).
- Optimising a mixture of the hinge loss error and the separation margin (SVM, Chapter 4).
Training (optimisation) methods: stochastic gradient descent, mini-batch gradient descent (a sketch of the latter is given below).
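A minimal mini-batch gradient descent sketch, assuming for simplicity that the hidden layers are fixed so that a matrix Phi holds the new features φ(x) for every training sample, and that we fit a linear output layer w by minimising the sum-of-squares error. In full network training the same loop also updates the hidden-layer weights, with their gradients supplied by backpropagation. All names and the synthetic data are illustrative.

```python
import numpy as np

def minibatch_gd(Phi, y, learning_rate=0.01, batch_size=16, num_epochs=100, seed=0):
    rng = np.random.default_rng(seed)
    n, d = Phi.shape
    w = np.zeros(d)
    for epoch in range(num_epochs):
        order = rng.permutation(n)                   # shuffle the samples each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            residual = Phi[idx] @ w - y[idx]         # prediction error on the mini-batch
            grad = Phi[idx].T @ residual / len(idx)  # gradient of the mean squared error
            w -= learning_rate * grad
    return w

# Tiny synthetic example: recover a known weight vector from noisy targets.
rng = np.random.default_rng(1)
Phi = rng.normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = Phi @ w_true + 0.1 * rng.normal(size=200)
print(minibatch_gd(Phi, y))
```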

Example: Two-class classification
Convert the output of the neural network into a single probability value using the logistic sigmoid function. Optimise the neural network weights and the prediction parameters w by likelihood maximisation (maximising the chances of observing the data).
[Figure: the hidden layers map the original features x = (x1, …, xd) to new features z = φ(x) = (z1, …, zD); a sigmoid prediction layer with parameters w converts z into the probability of the sample belonging to a class.]

Example: Multi-class classification
Convert the output of the neural network into a set of c probability values using the softmax function. Optimise the neural network weights and the softmax function parameters w1, …, wc by likelihood maximisation.
[Figure: the hidden layers map the original features x = (x1, …, xd) to new features z = φ(x) = (z1, …, zD); a softmax prediction layer with parameters w1, w2, …, wc converts z into class probabilities (e.g. red, green, purple).]
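A small sketch of a softmax prediction layer on top of the new features z = φ(x). The weight vectors w1, …, wc (here the rows of W), the biases and the feature values are illustrative random numbers, not the lecture's.

```python
import numpy as np

def softmax(a):
    a = a - np.max(a)             # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum()

rng = np.random.default_rng(0)
c, D = 3, 4                        # 3 classes (e.g. red, green, purple), 4 new features
W = rng.normal(size=(c, D))        # softmax parameters w_1, ..., w_c as rows
b = rng.normal(size=c)
z = rng.normal(size=D)             # new features z = phi(x) for one sample

p = softmax(W @ z + b)             # c probability values, summing to 1
print(p, p.sum())
```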

Backpropagation
Technically, backpropagation calculates the gradient of the loss function with respect to the weights in each layer of the neural network. It uses the chain rule to iteratively compute the gradients layer by layer. It can be viewed as a process of calculating the error contribution of each neuron after processing a batch of training data. A minimal sketch for a one-hidden-layer network is given below.
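A minimal backpropagation sketch for a one-hidden-layer network with a sigmoid hidden layer, a linear output layer and a sum-of-squares loss. The shapes, names and the single gradient step are illustrative assumptions, not the lecture's exact notation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop(x, y, W_h, b_h, W_o, b_o):
    # Forward pass: compute and store each layer's output.
    z = sigmoid(W_h @ x + b_h)                     # hidden-layer outputs
    y_hat = W_o @ z + b_o                          # network outputs (linear output layer)

    # Backward pass: apply the chain rule layer by layer.
    delta_o = y_hat - y                            # output-layer error for the squared loss
    grad_W_o = np.outer(delta_o, z)
    grad_b_o = delta_o
    delta_h = (W_o.T @ delta_o) * z * (1.0 - z)    # error propagated back to the hidden layer
    grad_W_h = np.outer(delta_h, x)
    grad_b_h = delta_h
    return grad_W_h, grad_b_h, grad_W_o, grad_b_o

# One gradient-descent step on a single sample of the 9-4-2 network above.
rng = np.random.default_rng(0)
W_h, b_h = rng.normal(size=(4, 9)), np.zeros(4)
W_o, b_o = rng.normal(size=(2, 4)), np.zeros(2)
x, y = rng.normal(size=9), np.array([1.0, 0.0])
grads = backprop(x, y, W_h, b_h, W_o, b_o)
lr = 0.1
W_h -= lr * grads[0]; b_h -= lr * grads[1]
W_o -= lr * grads[2]; b_o -= lr * grads[3]
```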

[Figure: backpropagation over the network layers, passing error information back from the new features z towards the original features x.]

Deep Learning
Deep learning refers to techniques for learning using neural networks with more hidden layers. Deep learning is considered a kind of representation (feature) learning technique.
Example: AlexNet contains a total of 5 convolutional layers and 3 fully connected layers.
The two figures are from Figs. 1.5 and 1.4 of the Deep Learning book (I. Goodfellow et al., 2016).

Popular Neural Networks
Convolutional neural networks (CNN) have neurons arranged along 3 dimensions: width, height and depth. This makes them suitable for processing images: a CNN automatically learns a good feature vector for an image from its pixels.
- NeuralStyle, https://github.com/jcjohnson/neural-style
- DeepDream, https://deepdreamgenerator.com
Recurrent neural networks (RNN) are especially useful for learning from sequential data. Each neuron can use its internal memory to maintain information about the previous input. This makes them suitable for processing natural language, speech, music, etc.
- PoemGenerator, https://github.com/dvictor/lstm-poetry
Other architectures are suitable for processing videos, and for joint language/text and image learning.
- NeuralTalk, http://cs.stanford.edu/people/karpathy/neuraltalk/
- TalkingMachines, https://deepmind.com/blog/wavenet-generative-model-raw-audio/
- Another example: a system that learns from images, sound, etc., https://teachablemachine.withgoogle.com

Goodbye! Enjoy your reading week! See you in revision week.