Biological neuron, artificial neuron.

Figure: biological neuron vs. artificial neuron.

A two-layer neural network (top to bottom):
- Output layer (activation represents the classification)
- Weighted connections
- Hidden layer ("internal representation")
- Weighted connections
- Input layer (activations represent the feature vector for one training example)

Example: ALVINN (Pomerleau, 1993). ALVINN learns to drive an autonomous vehicle at normal speeds on public highways (!). Input: a 30 x 32 grid of pixel intensities from a camera.

Each output unit corresponds to a particular steering direction; the most highly activated one gives the direction to steer.

What kinds of problems are suitable for neural networks?
- Sufficient training data is available
- Long training times are acceptable
- It is not necessary for humans to understand the learned target function or hypothesis

Advantages of neural networks:
- Designed to be parallelized
- Robust on noisy training data
- Fast to evaluate new examples
- "Bad at logic, good at frisbee" (Andy Clark)

Perceptrons. Invented by Rosenblatt in the 1950s. A single-layer feedforward network: inputs x1, ..., xn with weights w1, ..., wn, plus a bias weight w0 on a constant +1 input, feed a single output unit o, where o = sgn(w0 + w1 x1 + ... + wn xn). This can be used for classification: +1 = positive, -1 = negative.
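To make the model concrete, here is a minimal sketch of such a perceptron in Python (the function and variable names are illustrative, not from the slides): it sums the weighted inputs plus the bias weight w0 and thresholds the result at zero.

```python
import numpy as np

def perceptron_output(w0, w, x):
    """Single-layer perceptron: returns +1 or -1 for input vector x.

    w0 is the bias weight, w the input weights (illustrative names)."""
    summed = w0 + np.dot(w, x)
    return 1 if summed > 0 else -1

# Example: with these weights the unit computes logical AND on +1/-1 inputs
w0, w = -0.5, np.array([1.0, 1.0])
print(perceptron_output(w0, w, np.array([1, 1])))    # +1
print(perceptron_output(w0, w, np.array([1, -1])))   # -1
```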

Consider an example with two inputs, x1 and x2. We can view the trained network as defining a "separation line" in the (x1, x2) plane, with positive examples on one side and negative examples on the other. What is its equation? We have w0 + w1 x1 + w2 x2 = 0, i.e., x2 = -(w1/w2) x1 - w0/w2.

This is a "linear discriminant function" or "linear decision surface". The weights determine the slope and the bias determines the offset. This generalizes to n dimensions: w0 + w1 x1 + ... + wn xn = 0, so the decision surface is an (n-1)-dimensional "hyperplane".

Derivation of a learning rule for perceptrons. Let S = {(xi, yi): i = 1, 2, ..., m} be a training set. (Note: xi is a vector of inputs, and yi ∈ {+1, -1} for binary classification.) Let hw be the perceptron classifier represented by the weight vector w. Definition: the training error is E(w) = (1/2) Σi (yi - hw(xi))^2.

Gradient descent in weight space (figure from T. M. Mitchell, Machine Learning).

We want a learning method that will minimize E over S. For each weight wi, change the weight in the direction of steepest descent: Δwi = -η ∂E/∂wi, where η is the learning rate.
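As an illustration, below is a sketch of one steepest-descent step on the squared error when the unit is treated as linear (unthresholded), which makes E differentiable; this is the standard delta/LMS-style update rather than anything specific to these slides, and the names and data are illustrative.

```python
import numpy as np

def gradient_descent_step(w, X, y, eta=0.1):
    """One steepest-descent step on E(w) = 1/2 * sum((y_i - w . x_i)^2).

    Treats the unit as linear (no threshold), so dE/dw = -sum((y_i - o_i) * x_i).
    X: (m, n) array of inputs (bias column included), y: (m,) targets."""
    outputs = X @ w                      # o_i = w . x_i
    grad = -(y - outputs) @ X            # dE/dw
    return w - eta * grad                # w <- w - eta * dE/dw

# Illustrative usage on a tiny data set (first column of X is the bias input)
X = np.array([[1, 2.0, 1.0], [1, -1.0, -2.0]])
y = np.array([1.0, -1.0])
w = np.zeros(3)
for _ in range(50):
    w = gradient_descent_step(w, X, y, eta=0.05)
print(w, X @ w)   # outputs approach the targets
```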

Perceptron learning rule:
1. Start with random weights w = (w1, w2, ..., wn).
2. Select a training example (x, y) ∈ S.
3. Run the perceptron with input x and weights w to obtain the output g ∈ {+1, -1}.
4. Let η be the training rate (a user-set parameter). Update each weight: wi ← wi + η (y - g) xi. (We leave out the g'(in) term since it is the same for all weights.)
5. Go to 2.
A sketch of this procedure in code appears below.
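This is a minimal sketch of the rule above, under the assumption that a constant +1 input is appended so the bias weight w0 is learned like any other weight; function and variable names are illustrative.

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, max_epochs=100):
    """Perceptron learning rule sketch.

    X: (m, n) inputs, y: (m,) labels in {+1, -1}, eta: training rate."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend constant 1 for w0
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=Xb.shape[1])     # 1. start with random weights
    for _ in range(max_epochs):
        errors = 0
        for x_i, y_i in zip(Xb, y):                 # 2. select a training example
            g = 1 if np.dot(w, x_i) > 0 else -1     # 3. run the perceptron
            if g != y_i:
                w += eta * (y_i - g) * x_i          # 4. update weights; 5. go to 2
                errors += 1
        if errors == 0:                             # converged (linearly separable case)
            break
    return w

# Learns logical OR on +1/-1 inputs
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, 1])
print(train_perceptron(X, y))
```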

The perceptron learning rule has been proven to converge to correct weights in a finite number of steps, provided the training examples are “linearly separable”. Exercise: Give example of training examples that are not linearly separable.

Exclusive-Or. In their book Perceptrons, Minsky and Papert proved that a single-layer perceptron cannot represent the two-bit exclusive-or function:

x0   x1   x0 xor x1
-1   -1      -1
-1    1       1
 1   -1       1
 1    1      -1

The summed input to the output node is given by w0 + w1 x1 + w2 x2, so the decision surface is always a line. XOR is not "linearly separable".

1969: Minsky and Papert proved that perceptrons cannot represent non-linearly separable target functions. However, they also showed that for binary inputs, any transformation can be carried out by adding a fully connected hidden layer. In this case, a hidden node creates a three-dimensional space defined by the two inputs plus the hidden node, and the four XOR points can then be separated by a plane.
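A small sketch of this idea with hand-set (not learned) weights, using 0/1 inputs for simplicity: the hidden unit computes AND of the two inputs, and in the three-dimensional space (x1, x2, h) a single plane at the output unit separates the XOR cases.

```python
import numpy as np

def step(z):
    """Threshold unit: 1 if the summed input exceeds 0, else 0."""
    return (z > 0).astype(int)

def xor_net(x1, x2):
    """Two-layer network for XOR with hand-set weights (not learned)."""
    h = step(x1 + x2 - 1.5)              # hidden unit: fires only for (1, 1)
    return step(x1 + x2 - 2 * h - 0.5)   # output: x1 OR x2, vetoed by h

x1 = np.array([0, 0, 1, 1])
x2 = np.array([0, 1, 0, 1])
print(xor_net(x1, x2))   # [0 1 1 0]
```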

Good news: Adding hidden layer allows more target functions to be represented. Bad news: No algorithm for learning in multi-layered networks, and no convergence theorem! Quote from Perceptrons (Expanded edition, p. 231): “[The perceptron] has many features to attract attention: its linearity; its intriguing learning theorem; its clear paradigmatic simplicity as a kind of parallel computation. There is no reason to suppose that any of these virtues carry over to the many-layered version. Nevertheless, we consider it to be an important research problem to elucidate (or reject) our intuitive judgment that the extension is sterile.”

Two major problems they saw:
1. How can the learning algorithm apportion credit (or blame) to individual weights for incorrect classifications, given a (sometimes) large number of weights?
2. How can such a network learn useful higher-order features?
Good news: successful credit-apportionment learning algorithms (e.g., back-propagation) were developed soon afterwards, and they remain successful in spite of the lack of a convergence theorem.

Multi-layer Perceptrons (MLPs). Single-layer perceptrons can only represent linear decision surfaces; multi-layer perceptrons can represent non-linear decision surfaces.

Differentiable Threshold Units In order to represent non-linear functions, need non-linear activation function at each unit. In order to do gradient descent on weights, need differentiable activation function.

Sigmoid activation function: σ(z) = 1 / (1 + e^(-z)). To do gradient descent in multi-layer networks, we need to generalize the perceptron learning rule to units of this kind.

Why use the sigmoid activation function? It is non-linear, differentiable, and approximates the sgn function. Its derivative is easily expressed in terms of its output: σ'(z) = σ(z)(1 - σ(z)). This is useful in deriving the back-propagation algorithm.
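A short sketch of the sigmoid and its derivative (illustrative code, not from the slides):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: a smooth, differentiable approximation of sgn/step."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(output):
    """Derivative expressed in terms of the unit's output: s * (1 - s)."""
    return output * (1.0 - output)

z = np.array([-2.0, 0.0, 2.0])
s = sigmoid(z)
print(s)                      # approximately [0.119 0.5 0.881]
print(sigmoid_derivative(s))  # largest at z = 0, where the output is 0.5
```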

Training multi-layer perceptrons (the back-propagation algorithm). Assume two-layer networks.
I. For each training example:
   1. Present the input to the input layer.
   2. Forward propagate the activations times the weights to each node in the hidden layer.

   3. Forward propagate the activations times the weights from the hidden layer to the output layer.
   4. At each output unit, determine the error E.
   5. Run the back-propagation algorithm to update all weights in the network.
II. Repeat (I) for a given number of "epochs" or until accuracy on the training or test data is acceptable. (One forward/backward pass of step I is sketched in code below.)
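This is a minimal sketch of one forward/backward pass for such a two-layer sigmoid network, using the squared error and the sigmoid derivative above; the weight-matrix shapes and names are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W_hid, W_out, eta=0.3):
    """One forward + backward pass for a two-layer sigmoid network (a sketch).

    x: input vector with a bias 1 prepended, t: target output vector,
    W_hid: (n_hidden, n_inputs) weights, W_out: (n_outputs, n_hidden + 1)."""
    # Forward pass
    h = sigmoid(W_hid @ x)                       # hidden activations
    hb = np.concatenate(([1.0], h))              # add bias unit for the output layer
    o = sigmoid(W_out @ hb)                      # output activations

    # Backward pass (squared error; sigmoid derivative is o * (1 - o))
    delta_out = (t - o) * o * (1 - o)
    delta_hid = h * (1 - h) * (W_out[:, 1:].T @ delta_out)

    W_out = W_out + eta * np.outer(delta_out, hb)
    W_hid = W_hid + eta * np.outer(delta_hid, x)
    return W_hid, W_out

# Illustrative shapes: 3 inputs (incl. bias), 2 hidden units, 1 output
rng = np.random.default_rng(0)
W_hid = rng.normal(scale=0.1, size=(2, 3))
W_out = rng.normal(scale=0.1, size=(1, 3))
x = np.array([1.0, 0.5, -0.2])    # bias, then two features
t = np.array([0.9])
W_hid, W_out = backprop_step(x, t, W_hid, W_out)
```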

Example: Face recognition (from T. M. Mitchell, Machine Learning, Chapter 4).
Task: classify camera images of various people in various poses.
Data: photos, varying in:
- Facial expression: happy, sad, angry, neutral
- Direction the person is facing: left, right, straight ahead, up
- Wearing sunglasses?: yes, no
Within these, there is variation in background, clothes, and position of the face for a given person.

Example images: an2i_left_angry_open_4, an2i_right_sad_sunglasses_4, glickman_left_angry_open_4.

Preprocessing of each photo: create a 30x32 coarse-resolution version of the 120x128 image; this makes the size of the neural network more manageable. (A sketch of this step appears below.)
Input to the neural network: each photo is encoded as 30x32 = 960 pixel intensity values, scaled to be in [0,1], with one input unit per pixel.
Output units: encode the classification of the input photo.
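One plausible way to implement the preprocessing just described, assuming simple 4x4 block averaging (the slides do not specify the downsampling method, so this is an illustrative sketch):

```python
import numpy as np

def preprocess(image):
    """Downsample a 120x128 grayscale image to 30x32 and scale to [0, 1].

    Averages non-overlapping 4x4 pixel blocks to build the coarse version."""
    coarse = image.reshape(30, 4, 32, 4).mean(axis=(1, 3))
    return (coarse / 255.0).ravel()          # 960 values, one per input unit

image = np.random.randint(0, 256, size=(120, 128))   # stand-in for a photo
inputs = preprocess(image)
print(inputs.shape)   # (960,)
```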

Possible target functions for the neural network:
- Direction the person is facing
- Identity of the person
- Gender of the person
- Facial expression
- etc.
As an example, consider the target "direction the person is facing".

Network architecture:
- Input layer: 960 units
- Hidden layer: 3 units
- Output layer: 4 units (left, straight, right, up); the classification result is the most highly activated output unit
- The network is fully connected

Target function: the output unit corresponding to the correct classification should have activation 0.9, and every other output unit should have activation 0.1. These values are used instead of 1 and 0 because sigmoid units cannot produce activations of exactly 1 and 0.
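A tiny sketch of this target encoding (the class ordering and names are illustrative):

```python
import numpy as np

DIRECTIONS = ["left", "straight", "right", "up"]   # order is illustrative

def target_vector(direction):
    """Encode the correct class as 0.9 and all others as 0.1
    (instead of 1 and 0, which sigmoid units cannot reach)."""
    t = np.full(len(DIRECTIONS), 0.1)
    t[DIRECTIONS.index(direction)] = 0.9
    return t

print(target_vector("right"))   # [0.1 0.1 0.9 0.1]
```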

Other parameters: learning rate η = 0.3 and momentum α = 0.3 (both were found through trial and error). If these are set too high, training fails to converge on a network with acceptable error over the training set; if they are set too low, training takes much longer.
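For concreteness, here is a sketch of a weight update with momentum in the standard form Δw = η(-∂E/∂w) + α Δw_prev; this is a common formulation, not necessarily the exact one used in the original experiments.

```python
def momentum_update(w, grad, prev_delta, eta=0.3, alpha=0.3):
    """Weight update with momentum.

    delta = eta * (-grad) + alpha * prev_delta; returns (new_w, delta).
    grad is dE/dw, so -grad points downhill."""
    delta = eta * (-grad) + alpha * prev_delta
    return w + delta, delta

# Illustrative: part of each step's direction is carried into the next step
w, delta = 0.5, 0.0
for grad in [0.2, 0.1, 0.05]:
    w, delta = momentum_update(w, grad, delta)
print(w)
```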

Training. For a maximum of M epochs:
- For each training example:
  - Input the photo to the network
  - Propagate activations to the output units
  - Determine the error at the output units
  - Adjust the weights using the back-propagation algorithm
- Test the accuracy of the network on a validation set; if the accuracy is acceptable, stop. (This outer loop is sketched below.)
Software and data: http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html
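A sketch of this outer loop; train_one_epoch and accuracy are hypothetical helper functions standing in for one back-propagation pass over the training set and an accuracy measurement.

```python
def train(network, train_set, validation_set, train_one_epoch, accuracy,
          max_epochs=100, target_accuracy=0.9):
    """Outer training loop: at most max_epochs passes of back-propagation,
    stopping early once validation accuracy is acceptable.

    train_one_epoch(network, data) and accuracy(network, data) are
    hypothetical helpers supplied by the caller."""
    for epoch in range(max_epochs):
        train_one_epoch(network, train_set)          # one back-propagation pass
        if accuracy(network, validation_set) >= target_accuracy:
            break                                    # accuracy acceptable: stop
    return network
```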

Understanding the weight values after training:
- Weights from the input to the hidden layer are highly positive in certain facial regions.
- The "right" output unit has a strong positive weight from the second hidden unit and a strong negative weight from the third hidden unit.
- The second hidden unit has positive weights on the right side of the face (aligning with the bright skin of a person turned to the right) and negative weights on the top of the head (aligning with the dark hair of a person turned to the right).
- The third hidden unit has negative weights on the right side of the face, so it will output a value close to zero for a person turned to the right.

One advantage of neural networks is that "high-level feature extraction" can be done automatically. (We saw a similar idea in the face-recognition task: you didn't have to extract high-level features ahead of time.)

Other topics we won't cover here:
- Other methods for training the weights
- Recurrent networks
- Dynamically modifying the network structure