Fall 2004 Backpropagation CS478 - Machine Learning.

The Plague of Linear Separability
The good news: Learn-Perceptron is guaranteed to converge to a correct assignment of weights if such an assignment exists.
The bad news: Learn-Perceptron can only learn classes that are linearly separable (i.e., separable by a single hyperplane).
The really bad news: there is a very large number of interesting problems that are not linearly separable (e.g., XOR).
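To make the XOR claim concrete, here is the standard argument (added for illustration) that no single threshold unit with weights w1, w2 and threshold θ, outputting 1 exactly when w·x > θ, can compute XOR:

  f(0,0) = 0 requires 0 ≤ θ, so θ ≥ 0
  f(1,0) = 1 requires w1 > θ
  f(0,1) = 1 requires w2 > θ
  f(1,1) = 0 requires w1 + w2 ≤ θ

Adding the two middle constraints gives w1 + w2 > 2θ ≥ θ, contradicting the last one, so no choice of weights and threshold works.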

Linear Separability
Let d be the number of (Boolean) inputs. There are 2^(2^d) possible Boolean functions of d inputs, but the number of linearly separable ones grows far more slowly (roughly on the order of 2^(d^2)). Hence, there are far too many functions that escape the algorithm.

Historical Perspective
The result on linear separability (Minsky & Papert, 1969) virtually put an end to connectionist research.
The solution was obvious: since multi-layer networks could in principle handle arbitrary problems, one only needed to design a learning algorithm for them.
This proved to be a major challenge. AI would have to wait over 15 years for a general-purpose NN learning algorithm, popularized by Rumelhart, Hinton, and Williams in 1986.

Towards a Solution
Main problem: Learn-Perceptron implements a discrete model of error (i.e., it only identifies the existence of an error and adapts to it).
First thing to do: allow nodes to have real-valued activations (amount of error = difference between computed and target output).
Second thing to do: design a learning rule that adjusts weights based on that error.
Last thing to do: use the learning rule to implement a multi-layer algorithm.

Real-valued Activation
Replace the threshold unit (step function) with a linear unit whose output is o = w · x = Σ_i w_i x_i.
The error is no longer discrete: it is the real-valued difference t − o between the target and the computed output.

Training Error
We define the training error of a hypothesis, or weight vector, by

  E(w) = ½ Σ_{d ∈ D} (t_d − o_d)²

where D is the set of training examples, t_d is the target output for example d, and o_d is the output computed by the linear unit. This is the quantity we will seek to minimize.

The Delta Rule
Implements gradient descent (i.e., steepest descent) on the error surface:

  Δw_i = η Σ_{d ∈ D} (t_d − o_d) x_id

Note how the x_id multiplicative factor implicitly identifies the "active" input lines, as in Learn-Perceptron.
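One step the slide leaves implicit (a short derivation added here): differentiating the training error E(w) defined above with respect to w_i, using o_d = w · x_d, gives

  ∂E/∂w_i = ∂/∂w_i [ ½ Σ_d (t_d − o_d)² ] = Σ_d (t_d − o_d)(−x_id) = −Σ_d (t_d − o_d) x_id

so stepping in the direction of steepest descent, Δw_i = −η ∂E/∂w_i, yields exactly the delta rule above.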

Gradient-descent Learning (batch)
Initialize weights to small random values
Repeat
  Initialize each Δw_i to 0
  For each training example <x, t>
    Compute output o for x
    For each weight w_i: Δw_i ← Δw_i + η(t − o)x_i
  For each weight w_i: w_i ← w_i + Δw_i
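A minimal Python sketch of this batch rule for a single linear unit (the function name, the use of NumPy, and the default parameter values are illustrative choices, not from the slides; a bias can be handled by appending a constant 1 to each input):

import numpy as np

def train_linear_unit_batch(X, t, eta=0.05, epochs=100, rng=None):
    """Batch gradient descent (delta rule) for a single linear unit.

    X: (n_examples, n_inputs) array of inputs
    t: (n_examples,) array of real-valued targets
    """
    rng = np.random.default_rng(rng)
    w = rng.uniform(-0.05, 0.05, size=X.shape[1])  # small random initial weights
    for _ in range(epochs):
        delta_w = np.zeros_like(w)                 # initialize each Δw_i to 0
        for x, target in zip(X, t):
            o = w @ x                              # linear unit output
            delta_w += eta * (target - o) * x      # accumulate η(t − o)x_i
        w += delta_w                               # apply the batch update
    return w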

Gradient-descent Learning (incremental/stochastic)
Initialize weights to small random values
Repeat
  For each training example <x, t>
    Compute output o for x
    For each weight w_i: w_i ← w_i + η(t − o)x_i
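The incremental (stochastic) variant updates the weights immediately after each example; a sketch in the same style as above (again, names and defaults are illustrative):

import numpy as np

def train_linear_unit_incremental(X, t, eta=0.05, epochs=100, rng=None):
    """Incremental (stochastic) delta rule: update after every example."""
    rng = np.random.default_rng(rng)
    w = rng.uniform(-0.05, 0.05, size=X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            o = w @ x                      # linear unit output
            w += eta * (target - o) * x    # immediate update w_i ← w_i + η(t − o)x_i
    return w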

Discussion
Gradient-descent learning (with linear units) requires more than one pass through the training set.
The good news: convergence (to a minimum-error weight vector) is guaranteed, given a sufficiently small learning rate.
The bad news: it still produces only linear functions, even when used in a multi-layer context, since a composition of linear units is itself linear.
It needs to be further generalized!

Non-linear Activation
Introduce non-linearity with a sigmoid function: σ(net) = 1 / (1 + e^(−net))
Why the sigmoid?
1. Differentiable (required for gradient descent)
2. Most unstable in the middle
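A small Python sketch of the sigmoid and its derivative; the identity σ'(x) = σ(x)(1 − σ(x)) is the one backpropagation relies on below:

import numpy as np

def sigmoid(x):
    """Logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """Derivative of the sigmoid, largest at x = 0 where the output is 0.5."""
    s = sigmoid(x)
    return s * (1.0 - s)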

Sigmoid Function
Its derivative, σ'(net) = σ(net)(1 − σ(net)), reaches its maximum when the output is near 0.5, i.e., where the unit is most unstable. Hence, the weight change will be largest when the output is most uncertain.

Multi-layer Feed-forward NN
(Network diagram: input units indexed by i feed hidden units indexed by j, which feed output units indexed by k.)

Backpropagation (incremental)
Repeat
  Present a training instance
  Compute the error δ_k of the output units
  For each hidden layer (from the last back to the first)
    Compute the error δ_j of its units using the errors from the next layer
  Update all weights: w_ij ← w_ij + Δw_ij, where Δw_ij = η δ_j O_i
Until (E < CriticalError)

Error Computation
For an output unit k: δ_k = O_k (1 − O_k)(T_k − O_k)
For a hidden unit j: δ_j = O_j (1 − O_j) Σ_k w_jk δ_k
(The O(1 − O) factor is the derivative of the sigmoid.)
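A compact Python sketch of the incremental algorithm above for a network with one hidden layer of sigmoid units (the function name, the absence of bias units, and the NumPy-based layout are illustrative assumptions, not taken from the slides):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_train(X, T, n_hidden, eta=0.5, epochs=100, w_init=None, rng=None):
    """Incremental backpropagation for a network with one hidden layer of sigmoid units.

    X: (n_examples, n_inputs) inputs; T: (n_examples, n_outputs) targets in [0, 1].
    w_init: if given, every weight starts at this constant (as in Example (I));
            otherwise small random values are used. No bias units, to match the slides.
    """
    rng = np.random.default_rng(rng)
    n_in, n_out = X.shape[1], T.shape[1]
    if w_init is None:
        W_ih = rng.uniform(-0.05, 0.05, (n_in, n_hidden))    # input -> hidden weights
        W_ho = rng.uniform(-0.05, 0.05, (n_hidden, n_out))   # hidden -> output weights
    else:
        W_ih = np.full((n_in, n_hidden), float(w_init))
        W_ho = np.full((n_hidden, n_out), float(w_init))

    for _ in range(epochs):
        for x, t in zip(X, T):
            # Forward pass
            O_h = sigmoid(x @ W_ih)          # hidden outputs O_j
            O_o = sigmoid(O_h @ W_ho)        # output outputs O_k
            # Error terms
            delta_o = O_o * (1 - O_o) * (t - O_o)            # δ_k for output units
            delta_h = O_h * (1 - O_h) * (W_ho @ delta_o)     # δ_j for hidden units
            # Weight updates: Δw_ij = η δ_j O_i
            W_ho += eta * np.outer(O_h, delta_o)
            W_ih += eta * np.outer(x, delta_h)
    return W_ih, W_ho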

Example (I)
Consider a simple network composed of:
  3 inputs: a, b, c
  1 hidden node: h
  2 outputs: q, r
Assume η = 0.5, all weights are initialized to 0.2, and weight updates are incremental.
Consider the training set (inputs a b c – targets q r):
  1 0 1 – 0 1
  0 1 1 – 1 1
Run 4 iterations over the training set.

Example (II)
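The original slide works through the arithmetic by hand; the numbers can be reproduced with the sketch above using the setup from Example (I) (this usage snippet reuses backprop_train and numpy from the previous block; the printout format is illustrative):

X = np.array([[1, 0, 1],
              [0, 1, 1]], dtype=float)        # inputs a, b, c
T = np.array([[0, 1],
              [1, 1]], dtype=float)           # targets q, r

# η = 0.5, all weights initialized to 0.2, incremental updates, 4 passes over the data
W_ih, W_ho = backprop_train(X, T, n_hidden=1, eta=0.5, epochs=4, w_init=0.2)
print("input->hidden weights:\n", W_ih)
print("hidden->output weights:\n", W_ho)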

Dealing with Local Minima
There is no guarantee of convergence to the global minimum. Possible remedies:
Use a momentum term, Δw_ij(n) = η δ_j O_i + α Δw_ij(n − 1), to keep moving through small local (non-global) minima and along flat regions.
Use the incremental/stochastic version of the algorithm.
Train multiple networks with different starting weights; then either select the best on a hold-out validation set, or combine their outputs (e.g., by a weighted average).
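A minimal sketch of how a momentum term changes a single weight update, assuming a remembered previous step and an illustrative coefficient α = 0.9 (names are not from the slides):

def momentum_update(W, grad_step, prev_step, alpha=0.9):
    """Return (new_weights, step) where step = grad_step + α * previous step."""
    step = grad_step + alpha * prev_step
    return W + step, step

# In the backprop sketch above, the plain update
#   W_ho += eta * np.outer(O_h, delta_o)
# would become
#   W_ho, prev_step_ho = momentum_update(W_ho, eta * np.outer(O_h, delta_o), prev_step_ho)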

Discussion
3-layer backpropagation neural networks are universal function approximators.
Backpropagation is the standard training algorithm.
Extensions have been proposed to automatically set the various parameters (e.g., number of hidden layers, number of nodes per layer, learning rate).
Dynamic models have been proposed (e.g., ASOCS).
Other neural network models exist: Kohonen maps, Hopfield networks, Boltzmann machines, etc.