Back Propagation and Representation in PDP Networks

Presentation transcript:

Back Propagation and Representation in PDP Networks Psychology 209 Jan 24, 2017

Why is back propagation important?
- Provides a procedure that allows networks to learn weights that can solve any deterministic input-output problem.
- Contrary to expectation, it does not get stuck in local minima except in cases where the network is exceptionally tightly constrained.
- Allows networks to learn how to represent information as well as how to use it.
- Raises questions about the nature of representations and of what must be specified in order to learn them.
- It is the engine behind deep learning! It allows us to capture abilities that only humans had until very recently.

The Perceptron For input pattern p, teacher t_p, and output o_p, change the threshold and weights as follows. Note: including a bias of -θ in the net input, using a threshold of 0, and then treating the bias as a weight from a unit that is always on is equivalent.
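The update equations on this slide were images and did not survive in the transcript. A standard statement of the perceptron learning rule being described, with learning rate ε and inputs written i_{pj} (both symbols introduced here), is:

    \Delta w_j = \epsilon\,(t_p - o_p)\, i_{pj}
    \Delta \theta = -\epsilon\,(t_p - o_p)

Since (t_p - o_p) is zero whenever the output is correct, the weights and threshold change only on errors.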

AND, OR, XOR

Adding a unit to make XOR solvable

LMS Associator
- Output is a linear function of the inputs and weights.
- Find a learning rule that minimizes the summed squared error.
- Change each weight in proportion to its effect on the error E for each pattern p.
- After we do the math, we ignore the factor of 2 and just think in terms of the learning rate ε.
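The equations here were also images. Using the slide's notation (pattern p, target t_p, output o_p, learning rate ε) and writing the inputs as i_{pj} (a label introduced here), the delta/LMS rule being described is commonly written:

    o_p = \sum_j w_j\, i_{pj}
    E = \sum_p (t_p - o_p)^2
    \frac{\partial E}{\partial w_j} = -2 \sum_p (t_p - o_p)\, i_{pj}
    \Delta_p w_j = \epsilon\,(t_p - o_p)\, i_{pj} \quad \text{(dropping the factor of 2)}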

LMS Associator – TensorFlow Version
- Output is a linear function of the inputs and weights.
- We want to minimize the sum squared error.
- Change each weight in proportion to its effect on the error E for one pattern p.
- Convention: w_ij refers to the weight to unit i from unit j. We sometimes write w_rs, where r stands for receiver and s stands for sender.
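This is not the homework code (a later slide says that stays under the hood); it is a minimal TensorFlow 2 sketch of the same gradient step, with made-up shapes and names, following the convention that w[i, j] is the weight to unit i from unit j:

    import tensorflow as tf

    # Sketch only: a linear associator with 2 input units and 1 output unit.
    w = tf.Variable(tf.zeros([1, 2]))             # w[i, j]: weight to unit i from unit j
    lrate = 0.1                                   # learning rate (epsilon)

    x = tf.constant([[1.0], [0.0]])               # one input pattern as a column vector
    t = tf.constant([[1.0]])                      # its target

    with tf.GradientTape() as tape:
        o = tf.matmul(w, x)                       # linear output
        loss = tf.reduce_sum(tf.square(t - o))    # squared error for this pattern
    dw = tape.gradient(loss, w)                   # dE/dw
    w.assign_sub(lrate * dw)                      # gradient descent step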

Error Surface for OR function in LMS Associator

What if we want to learn how to solve XOR? We need to figure out how to adjust the weights coming into the 'hidden' unit, following the principle of gradient descent.

We start with an even simpler problem. Assume the units are linear, both weights equal 0.5, the input i = 1, and the target t = 1. We use the chain rule to calculate the derivative of the error with respect to each weight, as worked through below. Non-linear hidden units are necessary in general, but understanding learning in linear networks is useful to support a general understanding of the non-linear case.
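The slide's derivation appeared as an image. Labeling the input-to-hidden weight w_1, the hidden-to-output weight w_2, the hidden activation h, and the output o (labels introduced here), one way to work it out is:

    h = w_1 i = 0.5, \qquad o = w_2 h = 0.25, \qquad t - o = 0.75
    \frac{\partial E}{\partial w_2} = \frac{\partial E}{\partial o}\,\frac{\partial o}{\partial w_2} = -2\,(t - o)\,h
    \frac{\partial E}{\partial w_1} = \frac{\partial E}{\partial o}\,\frac{\partial o}{\partial h}\,\frac{\partial h}{\partial w_1} = -2\,(t - o)\,w_2\, i

With these values both derivatives equal -0.75, so (dropping the factor of 2 as before) each weight takes an upward step of 0.375 times the learning rate.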

The logistic function and its derivative (the slide plots activation as a function of net input).
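The formula itself appeared as an image on the slide; the logistic activation function and its derivative are:

    a_i = \frac{1}{1 + e^{-net_i}}, \qquad \frac{d a_i}{d\, net_i} = a_i\,(1 - a_i)

The derivative peaks at 0.25 when net_i = 0 and shrinks toward 0 as the unit saturates at either extreme.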

The Non-Linear 1:1:1 Network. Consider the 1:1:1 network shown on the slide, with training patterns 1 -> 1 and 0 -> 0, no bias, and a non-linear activation function at the hidden and output levels. (The slide shows the goodness landscape for this network.)

Including the activation function in the chain rule and including more than one output unit leads to:
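The equations on this slide were images. The standard form of the generalized delta rule being referred to, with w_{ij} the weight to unit i from unit j, a_j the activation of the sending unit, and f' the derivative of the activation function, is:

    \delta_i = (t_i - a_i)\, f'(net_i) \quad \text{(output units)}
    \delta_j = f'(net_j) \sum_i \delta_i\, w_{ij} \quad \text{(hidden units)}
    \Delta w_{ij} = \epsilon\, \delta_i\, a_j

The exact notation on the original slide may differ, but this is the rule the surrounding text describes.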

Back propagation algorithm
- Propagate activation forward.
- Propagate error backward.
- Change the weights.
Variants (see the sketch below):
- 'Full batch mode': accumulate the dE/dw's across all patterns in the training set before changing the weights.
- Stochastic gradient descent (batch size = N): process patterns in permuted order from the training set, adjusting the weights after each pattern (N = 1) or after every N patterns.
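A minimal sketch of the three update schedules, assuming a hypothetical per-pattern gradient function grad(w, x, t) that returns dE/dw; none of these names come from the course code:

    import numpy as np

    def full_batch_epoch(w, X, T, grad, lrate):
        # 'Full batch mode': accumulate dE/dw across all patterns,
        # then change the weights once.
        total = sum(grad(w, x, t) for x, t in zip(X, T))
        return w - lrate * total

    def sgd_epoch(w, X, T, grad, lrate):
        # Stochastic gradient descent: permuted order, update after each pattern.
        for p in np.random.permutation(len(X)):
            w = w - lrate * grad(w, X[p], T[p])
        return w

    def minibatch_epoch(w, X, T, grad, lrate, N):
        # Batch size = N: update after every N patterns.
        order = np.random.permutation(len(X))
        for start in range(0, len(order), N):
            total = sum(grad(w, X[p], T[p]) for p in order[start:start + N])
            w = w - lrate * total
        return w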

Adding Momentum and Weight Decay
The weight update step combines three terms:
- Gradient descent: ε times the gradient for the current pattern.
- Weight decay: a decay constant times the weight.
- Momentum: α times the previous weight step.
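Spelled out, with λ standing for the weight-decay constant and α for the momentum parameter (both symbols introduced here; only ε appears in the transcript), the update is:

    \Delta w_{ij}(t) = -\,\epsilon\,\frac{\partial E_p}{\partial w_{ij}} \;-\; \lambda\, w_{ij} \;+\; \alpha\, \Delta w_{ij}(t-1)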

XOR Problem from the Next Homework
- Uses the network architecture shown at right on the slide.
- Uses full batch mode with momentum and no weight decay.
- Will allow you to get a feel for the gradient and other issues and to explore the effects of variation in parameters.
- Is implemented in TensorFlow, but that will stay under the hood for now.
- Will be released before the weekend.

Our Screen View for Homework

Is backprop biologically plausible?
- Neurons do not send error signals backward across their weights through a chain of neurons, as far as anyone can tell.
- But we shouldn't be too literal-minded about the actual biological implementation of the learning rule.
- Some neurons appear to use error signals, and there are ways to use differences between activation signals to carry error information.