COMP 2208 Dr. Long Tran-Thanh University of Southampton Neural Networks

Topics covered in the remaining lectures
Classification: Neural networks (W7), K-NN (W8), Decision trees (W8)
Search: Local search (W9)
Reasoning: Bayes nets and Bayesian inference (W9)
Sequential decision making: Markov decision processes (W9), Bandit theory (W10)
Applied AI: Robotics + Vision (W10), Collaborative AI (W10)

But before neural nets: a little history John McCarthy, Marvin L. Minsky, Nathaniel Rochester, and Claude E. Shannon (1955)

A little history (cont’d). Seven key requirements of AI (John McCarthy, Marvin L. Minsky, Nathaniel Rochester, and Claude E. Shannon, 1955):
1. Automatic computers
2. Language understanding
3. Usage of neuron nets
4. Computational efficiency
5. Self-improvement
6. Abstractions
7. Creativity
Two ideas highlighted here: the concept of learning and the concept of an agent.

Learning agents (the agent-environment loop: perception in, behaviour out). Agent: anything capable of autonomous functioning in some environment (e.g., people, animals, robots, software agents). Learning: self-improvement through iterative actions / interactions. Russell and Norvig's book; Wooldridge and Jennings (1995).

Supervised vs. unsupervised learning. Supervised learning: given input X, predict output Y. Training set: a set of examples with correct input-output pairs (e.g., images labelled “dog” or “woman”). Labelled data: input data with its correct output.

Supervised vs. unsupervised learning (cont’d). Unsupervised learning: given input X, predict output Y, but with NO training set: there is no labelled data. Examples: predicting the outcome of an investment (there is no “correct” output); finding a common feature among inputs (we don’t know the correct output). Semi-supervised learning (not covered): a mix of supervised and unsupervised.

Offline vs. online learning. Offline learning: all the input-output pairs (X1, Y1), …, (Xn, Yn) are available from the beginning. Online learning: inputs come into the system one by one, as a stream.

Neural networks (finally)

What does a neural network do? Within the agent loop (perception, decision making, behaviour), neural networks can be used to categorize inputs, to update the belief model, and to update the decision-making policy.

Why neural nets? Idea: imitating human brains. Recall requirement 3 on McCarthy, Minsky, Rochester, and Shannon's (1955) list of key requirements of AI: usage of neuron nets. (The slide shows a real neuron.)

Inspiration from the brain: Warren McCulloch and Walter Pitts (1943), "A Logical Calculus of the Ideas Immanent in Nervous Activity". Their model contains key properties of real neurons: synaptic weights; cumulative effect; an "all or nothing" threshold for activation (the neuron fires an output signal if the sum of its inputs is above the threshold).

The perceptron model (Rosenblatt, 1957): a neuron with inputs X1, X2, X3, synaptic weights w1, w2, w3, and output Y. It keeps the McCulloch-Pitts ingredients ("all or nothing" threshold for activation, cumulative effect, synaptic weights) and adds self-training of the weights.

Nice! But how does it work? Intuition: consider a black box that takes numerical inputs X1, X2, X3, does something to them, and gives a numerical output Y. We can observe some input-output pairs. Wouldn't it be great to know the generic relationship between inputs and outputs? Regression analysis: estimate the relationship f from the observed data.

Weighted sum of the inputs. Idea 1: what if we consider f as a plain sum of the inputs? Not very expressive. Idea 2: if we allow the possibility of weighting each input differently (and adding a bias b), we gain expressivity: Y = w1 X1 + w2 X2 + w3 X3 + b, or in vector form Y = w · X + b. Does that remind you of anything?
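As a rough sketch of this weighted-sum black box (illustrative only; the weights, bias, and inputs below are made up, not values from the lecture):

```python
def weighted_sum(x, w, b):
    """Weighted sum of inputs: Y = w1*X1 + w2*X2 + ... + b (vector form: w . x + b)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Made-up example: three inputs, three weights, and a bias.
print(weighted_sum(x=[1.0, 2.0, 3.0], w=[0.5, -0.2, 0.1], b=1.0))  # approximately 1.4
```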

Weighted sum of the inputs (cont’d). One-dimensional version (i.e., there is only one input): Y = wX + b, the equation of a line in the plane. Y: dependent variable; X: independent variable; w: coefficient, the rate or slope of the line; b: intercept (where the line crosses the Y-axis). In higher dimensions (i.e., more than one input) we get hyperplanes. Any straight-line (hyperplane) relationship between X and Y can be expressed by our black box.

Weighted sum of the inputs (cont’d). Explaining the linear relationship: positive values of w mean that Y gets bigger as X gets bigger; negative values of w mean that Y gets smaller as X gets bigger; the value of b tells us what Y should be when X = 0. Problem: in many cases the relationship is not perfect. Not all the points lie on the line: noisy data.

Example: basketball ability vs. height

Linear regression No single straight line will match all of the X values (height) to the appropriate Y value (basketball skill). But we can imagine a "line of best fit" through the centre of the cloud of points that summarizes the relationship. This constitutes the statistical technique called linear regression.

Linear regression (cont’d). Which line is the best regression? How do we measure the quality of a particular regression line? Idea: the method of least squares. For a given line Y = wX + b, we can measure the differences between the actual Y values and those predicted by the line. The sum (or average) of the squared differences between the actual Y values and the predicted ones is a reasonable measure of goodness of fit. Minimizing this value is the "training method" for regression analysis.
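For concreteness, here is a minimal sketch of this idea in Python: the standard closed-form least-squares solution for one input variable. The height/skill numbers are hypothetical, not the lecture's data.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of the line y = w*x + b (1-D linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    w = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    # Intercept: chosen so the line passes through the point of means.
    b = mean_y - w * mean_x
    return w, b

# Hypothetical height (cm) vs. basketball-skill data.
heights = [170.0, 180.0, 190.0, 200.0]
skills = [50.0, 56.0, 59.0, 66.0]
w, b = fit_line(heights, skills)
print(f"best-fit line: skill = {w:.2f} * height + {b:.2f}")
```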

Example: mean squared error. (The slide shows a table with columns Height (X), True ability (Y), Estimated ability (Y’), Difference, and Squared difference for four data points.) The average of the squared differences, the mean squared error (MSE), is the sum of the four squared differences divided by 4, which here equals 100.5.
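Computing the MSE itself is a one-liner; the ability values below are hypothetical stand-ins, since the table's actual numbers are not reproduced in the transcript.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical stand-ins for the table's "True ability" and "Est. ability" columns.
true_ability = [55.0, 60.0, 62.0, 70.0]
est_ability = [51.0, 63.0, 66.0, 58.0]
print(mse(true_ability, est_ability))  # (16 + 9 + 16 + 144) / 4 = 46.25
```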

Back to the basketball example Basketball skill = 0.3*Height

Back to our perceptrons: the neuron computes the weighted sum w1 X1 + w2 X2 + w3 X3 of its inputs and passes it through an activation function f to produce the output Y.

Types of activation functions
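The slide shows these as a figure; as a sketch, a few common choices look like this in code (the default threshold of 0 is an assumption):

```python
import math

def step(z, threshold=0.0):
    """All-or-nothing activation: fire (output 1) iff the weighted sum exceeds the threshold."""
    return 1.0 if z > threshold else 0.0

def sigmoid(z):
    """Smooth, differentiable squashing of z into (0, 1); handy for backpropagation later."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Rectified linear unit, a common choice in modern deep networks."""
    return max(0.0, z)

print(step(1.2), sigmoid(1.2), relu(-1.2))  # 1.0, ~0.77, 0.0
```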

Basketball example (again): with a threshold activation, the output is split into a Y = 1 region and a Y = 0 region.

Expressiveness of perceptrons. Idea: consider the all-or-nothing threshold function. What sorts of problems can it solve? The binary output suggests a mapping to True and False, i.e., logic problems.

Expressiveness of perceptrons (cont’d) AND gate OR gate

Expressiveness of perceptrons (cont’d): with a threshold activation function f and suitable weights on inputs X1, X2 (plus a bias input fixed at 1), a single perceptron can act as an AND gate or as an OR gate.
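A minimal sketch of these two gates (the particular weights and bias values below are one valid choice, not necessarily the ones on the slide):

```python
def perceptron(x1, x2, w1, w2, bias):
    """Threshold unit: output 1 if the weighted sum (including the bias) is positive, else 0."""
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0

def and_gate(x1, x2):
    return perceptron(x1, x2, w1=1.0, w2=1.0, bias=-1.5)  # fires only when both inputs are 1

def or_gate(x1, x2):
    return perceptron(x1, x2, w1=1.0, w2=1.0, bias=-0.5)  # fires when at least one input is 1

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", and_gate(a, b), "OR:", or_gate(a, b))
```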

Training a perceptron. So far so good, but how do we find the optimal weight values? Well, we can minimise the MSE… but how? Hand-designing the weights is not very practical. We want to train the network by showing it examples and somehow getting it to learn the relevant pattern. This is where the perceptron learning rule (delta rule, Widrow-Hoff rule) comes in.

The Widrow-Hoff learning rule. Very simple idea: start with random weights. Present an example input to the neuron and calculate the output. Compare the output to the target value (i.e., y), and nudge each weight slightly in the direction that would have helped to produce the correct output. Repeat until happy with the performance. What you need to know is the update rule: each weight is changed by Δw_i = η (target − output) x_i, where η is a small learning rate.
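A minimal sketch of this loop (the learning rate, the random initial weights, and the AND-gate training data are illustrative choices, not taken from the lecture):

```python
import random

def train_perceptron(examples, n_inputs, lr=0.1, epochs=50):
    """Widrow-Hoff style training: nudge each weight towards the correct output."""
    w = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    b = 0.0
    for _ in range(epochs):
        for x, target in examples:
            output = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            error = target - output                             # compare output to target
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]  # nudge each weight
            b += lr * error                                     # bias acts as a weight on a constant 1
    return w, b

# Illustrative example: learn the AND gate from its truth table.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
print(train_perceptron(data, n_inputs=2))
```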

Limitations of the perceptron model. A perceptron cuts its input space into a "high output" (y = 1) region and a "low output" (y = 0) region. The cut is linear (a straight line, hyperplane, etc.), so the perceptron can only solve linearly separable problems. Linearly separable problems: the regions can be separated by a single line (hyperplane) in the input space. This means that there are problems a perceptron can't solve.

Limitations of the perceptron model (cont’d) Example: XOR gate (Minsky and Papert, 1969)

Limitations of the perceptron model (cont’d)

Multi-layered neural networks. How can we overcome this issue? Possible solution: multi-layer neural nets. Instead of having inputs feed directly into output neurons, let's add some intervening "hidden" neurons in between. The brain is certainly organized like that. Intuition: if we think of a perceptron as dividing a space into low vs. high output with a single line, then multiple perceptrons give multiple dividing lines, and a non-linear separation can be approximated by a set of linear boundaries.

Multi-layered neural networks (cont’d)

Input layer, hidden layers, output layer: perceptrons feeding into other perceptrons (inputs X1, X2 and a constant 1 feed hidden units with activation f, which in turn feed the outputs Y). Our black box is quite complicated now; it can approximate arbitrary functions given enough hidden neurons.
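A sketch of the forward pass through such a network (the 2-2-1 architecture and the hand-picked weights below are assumptions for illustration; with these particular weights the network happens to approximate XOR, which a single perceptron cannot represent):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One layer: each neuron applies the activation to a weighted sum of the previous layer."""
    return [sigmoid(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
            for neuron_w, b in zip(weights, biases)]

def forward(x, network):
    """Feed the inputs through every layer in turn: input -> hidden layer(s) -> output."""
    for weights, biases in network:
        x = layer(x, weights, biases)
    return x

# Made-up 2-2-1 network: two inputs, one hidden layer of two neurons, one output neuron.
net = [
    ([[4.0, 4.0], [-4.0, -4.0]], [-2.0, 6.0]),  # hidden layer weights and biases
    ([[4.0, 4.0]], [-6.0]),                     # output layer weights and bias
]
for x in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]):
    # Output is high for [0,1] and [1,0], low for [0,0] and [1,1]: an XOR-like pattern.
    print(x, round(forward(x, net)[0], 2))
```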

Training multi-layered neural networks. This sounds cool! But how can we train this complex black box? Idea 1: we could use the usual delta-rule approach to train the weights between the last hidden layer and the output layer. Issue: what about the weights of the other hidden layers? Solution: backpropagation of errors (Rumelhart, Hinton, and Williams, 1986).

The backpropagation method: an extension of the delta rule. We build an error function E = the sum of squared differences between the actual and target output values. Using a differentiable activation function, we employ a bit of calculus to calculate the partial derivative of E with respect to each weight (via the chain rule). We can thus know which way we need to "nudge" each weight for a given training example. In practice we use the sigmoid function.
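A minimal sketch of backpropagation on a tiny 2-2-1 sigmoid network trained on XOR (the architecture, learning rate, initialisation, and epoch count are illustrative assumptions; a particular random seed may still land in a poor local minimum):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass through a 2-2-1 network with sigmoid activations."""
    h = [sigmoid(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]) for j in range(2)]
    y = sigmoid(W2[0] * h[0] + W2[1] * h[1] + b2)
    return h, y

random.seed(0)
W1 = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(2)]  # input -> hidden weights
b1 = [0.0, 0.0]
W2 = [random.uniform(-1, 1), random.uniform(-1, 1)]                      # hidden -> output weights
b2 = 0.0
lr = 0.5
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]              # XOR truth table

for _ in range(10000):
    for x, t in data:
        h, y = forward(x, W1, b1, W2, b2)
        # Chain rule on E = (y - t)^2 / 2, using sigmoid'(z) = s * (1 - s).
        delta_y = (y - t) * y * (1 - y)                                    # error signal at the output
        delta_h = [delta_y * W2[j] * h[j] * (1 - h[j]) for j in range(2)]  # error propagated backwards
        for j in range(2):
            W2[j] -= lr * delta_y * h[j]                                   # nudge output-layer weights
            for i in range(2):
                W1[j][i] -= lr * delta_h[j] * x[i]                         # nudge hidden-layer weights
            b1[j] -= lr * delta_h[j]
        b2 -= lr * delta_y

print([round(forward(x, W1, b1, W2, b2)[1], 2) for x, _ in data])  # ideally close to [0, 1, 1, 0]
```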

Some further issues with neural networks. How large should the learning rate be? How many hidden neurons do I need for a given problem? How do I get things "just right"? Some guidelines are available, but the only reliable approach is to try different values and see how it goes. Other issues: large datasets, large input spaces, computational cost.

Modern neural nets: another historical summary. 1. A long time ago in a galaxy far, far away (the 1940s-60s): McCulloch-Pitts, the perceptron, multi-layer neural nets. 2. Minsky and Papert's book (1969): the XOR counterexample (… I feel a disturbance in the Force); it was mistakenly believed to imply the same limitations for multi-layer NNs. 3. Backpropagation (Hinton et al.), 1980s: a new hope.

Historical summary (cont’d). 4. Another disturbance: real-world applications are very complex, requiring new solutions to handle large data and complexity. 5. Deep learning (Hinton et al., 2007): new heroes.

Modern-day neural nets: deep learning. Main idea of deep learning: transform the input space into higher-level abstractions with lower dimensions (unsupervised learning). Multi-layer architecture (typically with many hidden layers), hence the name deep learning. Each layer is responsible for one space-transformation step; by doing so, the complexity of the non-linearity is decreased. This is, however, very expensive and needs to rely on new computational solutions: GPUs, grid computing.

Acknowledgement Thanks to Dr. Brendan Neville for many slides + contents