Neural Networks Sections 19.1 - 19.5

Biological analogy §The brain is composed of a mass of interconnected neurons l each neuron is connected to many other neurons §Neurons transmit signals to each other §Whether a signal is transmitted is an all-or-nothing event (the electrical potential in the cell body of the neuron is thresholded) §Whether a signal is sent depends on the strength of the bond (synapse) between two neurons

Neuron

Comparison

Are we studying the wrong stuff? §Humans can’t be doing the sequential analysis we are studying l Neurons are a million times slower than gates l Humans don’t need to be rebooted or debugged when one bit dies.

100-step program constraint §Neurons operate on the order of milliseconds §Humans can process information in a fraction of a second (face recognition) §Hence, at most a couple of hundred serial operations are possible §That is, even in parallel, no “chain of reasoning” can involve more than a couple of hundred steps

Standard structure of an artificial neural network §Input units l represent the input as a fixed-length vector of numbers (user defined) §Hidden units l calculate thresholded weighted sums of the inputs l represent intermediate calculations that the network learns §Output units l represent the output as a fixed-length vector of numbers

Representations §Logic rules l If color = red ^ shape = square then + §Decision trees l tree §Nearest neighbor l training examples §Probabilities l table of probabilities §Neural networks l inputs in [0, 1]

Feed-forward vs. Interactive Nets §Feed-forward l activation propagates in one direction l we’ll focus on this §Interactive l activation propagates forward & backwards l propagation continues until equilibrium is reached in the network

Ways of learning with an ANN §Add nodes & connections §Subtract nodes & connections §Modify connection weights l current focus l can simulate first two §I/O pairs: given the inputs, what should the output be? [“typical” learning problem]

History §1943: McCulloch & Pitts show that neurons can be combined to construct a Turing machine (using ANDs, ORs, & NOTs) §1958: Rosenblatt shows that perceptrons will converge if what they are trying to learn can be represented §1969: Minsky & Papert showed the limitations of perceptrons, killing research for a decade §1985: backprop algorithm revitalizes the field

Notation

Notation (cont.)

Operation of individual units §Output i = f(W i,j * Input j + W i,k * Input k + W i,l * Input l ) l where f(x) is a threshold (activation) function l “sigmoid”: f(x) = 1 / (1 + e^-x ) l f(x) = step function
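A minimal sketch of one unit's forward pass under the formula above, in plain Python; the helper names and the numbers in the example are illustrative, not from the slides:

```python
import math

def sigmoid(x):
    # "soft" threshold: f(x) = 1 / (1 + e^-x)
    return 1.0 / (1.0 + math.exp(-x))

def step(x, threshold=0.0):
    # "hard" threshold (LTU): all-or-nothing output
    return 1.0 if x >= threshold else 0.0

def unit_output(weights, inputs, f=sigmoid):
    # Output_i = f( sum over j of W_i,j * Input_j )
    return f(sum(w * x for w, x in zip(weights, inputs)))

# a unit with three incoming connections (hypothetical weights and inputs)
print(unit_output([0.5, -0.3, 0.8], [1.0, 0.2, 0.7]))
```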

Perceptron Diagram

Step Function Perceptrons

Sigmoid Perceptron

Perceptron learning rule §Teacher specifies the desired output for a given input §Network calculates what it thinks the output should be §Network changes its weights in proportion to the error between the desired & actual results §Δw i,j = α * [teacher i - output i ] * input j l where: α is the learning rate; teacher i - output i is the error term; & input j is the input activation l w i,j = w i,j + Δw i,j
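A one-function sketch of this rule (the variable names are my own, not the slides'):

```python
def perceptron_update(weights, inputs, teacher, output, alpha=0.1):
    # delta w_i,j = alpha * (teacher_i - output_i) * input_j, then w_i,j += delta w_i,j
    error = teacher - output
    return [w + alpha * error * x for w, x in zip(weights, inputs)]

# e.g. the unit output 0 where the teacher wanted 1, so weights move toward the input
print(perceptron_update([0.2, -0.4], [1.0, 0.5], teacher=1, output=0))  # [0.3, -0.35]
```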

2-layer Feed Forward example

Adjusting perceptron weights §Δw i,j = α * [teacher i - output i ] * input j l miss i is (teacher i - output i ) §Adjust each w i,j based on input j and miss i

Node biases §A node’s output is a weighted function of its input §How can we learn the bias value? §Answer: treat them like just another weight

Training biases (θ) §A node’s output: l 1 if w 1 x 1 + w 2 x 2 + … + w n x n >= θ l 0 otherwise §Rewrite l w 1 x 1 + w 2 x 2 + … + w n x n - θ >= 0 l w 1 x 1 + w 2 x 2 + … + w n x n + θ(-1) >= 0 §Hence, the bias is just another weight whose activation is always -1 §Just add one more input unit to the network topology
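A self-contained sketch combining the trick on this slide with the perceptron learning rule from the earlier slide: a constant -1 is appended to every input vector, so the threshold θ is learned as the last weight. Training on the OR function, the learning rate, and the number of epochs are all illustrative choices.

```python
def ltu(weights, inputs):
    # threshold is folded into the weights, so compare the weighted sum to 0
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= 0 else 0

# OR function; the trailing -1 in each input is the bias unit's constant activation
data = [([0, 0, -1], 0), ([0, 1, -1], 1), ([1, 0, -1], 1), ([1, 1, -1], 1)]

w, alpha = [0.0, 0.0, 0.0], 0.1
for epoch in range(20):
    for x, t in data:
        error = t - ltu(w, x)                                  # teacher - output
        w = [wi + alpha * error * xi for wi, xi in zip(w, x)]  # w += alpha * error * input

print(w)                               # w[2] is the learned threshold theta
print([ltu(w, x) for x, _ in data])    # [0, 1, 1, 1] -- OR is linearly separable
```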

Perceptron convergence theorem §If a set of pairs is learnable (representable), the delta rule will find the necessary weights l in a finite number of steps l independent of initial weights §However, a single-layer perceptron can only learn linearly separable concepts l it works iff gradient descent works

Linear separability §Consider an LTU perceptron §Its output is l 1, if W 1 X 1 + W 2 X 2 > θ l 0, otherwise §In terms of feature space l hence, it can only classify examples if a line (hyperplane more generally) can separate the positive examples from the negative examples

AND and OR linear Separators

Separation in n-1 dimensions

How do we compute XOR?

Perceptrons & XOR §XOR function l no way to draw a line to separate the positive from negative examples

Multi-Layer Neural Nets Sections

Need for hidden units §If there is one layer of enough hidden units, the input can be recoded (perhaps just memorized; example) §This recoding allows any mapping to be represented §Problem: how can the weights of the hidden units be trained?

XOR Solution

Majority of 11 Inputs (any 6 or more)

Other Examples §Need more than a 1-layer network for: l Parity l Error Correction l Connected Paths §Neural nets do well with l continuous inputs and outputs §But poorly with l logical combinations of boolean inputs

WillWait Restaurant example

N-layer FeedForward Network §Layer 0 is input nodes §Layers 1 to N-1 are hidden nodes §Layer N is output nodes §All nodes at any layer k are connected to all nodes at layer k+1 §There are no cycles

2 Layer FF net with LTUs §1 output layer + 1 hidden layer l Therefore, 2 stages to “assign reward” §Can compute functions with convex regions §Each hidden node acts like a perceptron, learning a separating line §Output units can compute intersections of half-planes given by hidden units

Backpropagation Learning §Method for learning weights in FF nets §Can’t use Perceptron Learning Rule l no teacher values for hidden units §Use gradient descent to minimize the error l propagate deltas to adjust for errors backward from outputs to hidden to inputs

Backprop Algorithm §Initialize weights (typically random!) §Keep doing epochs l for each example e in the training set do: forward pass to compute O = neural-net-output(network, e) and miss = (T - O) at each output unit; backward pass to calculate deltas to weights; update all weights l end §until tuning-set error stops improving

Backward Pass §Compute deltas to weights from hidden layer to output layer §Without changing any weights (yet), compute the actual contributions within the hidden layer(s) and compute deltas

Gradient Descent §Think of the N weights as a point in an N- dimensional space §Add a dimension for the observed error §Try to minimize your position on the “error surface”

Error Surface

Gradient §Trying to make error decrease the fastest §Compute: l Grad_E = [dE/dw1, dE/dw2,..., dE/dwn] §Change ith weight by l delta_wi = -alpha * dE/dwi §We need a derivative! Activation function must be continuous, differentiable, non-decreasing, and easy to compute
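A tiny illustration of the update delta_wi = -alpha * dE/dwi on a made-up one-weight error surface E(w) = (w - 3)^2, whose derivative is 2(w - 3):

```python
alpha, w = 0.1, 0.0
for step in range(50):
    dE_dw = 2 * (w - 3)      # gradient of E(w) = (w - 3)^2
    w += -alpha * dE_dw      # delta_w = -alpha * dE/dw
print(round(w, 4))           # approaches 3.0, the bottom of this error surface
```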

Can’t use LTU §To effectively assign credit / blame to units in hidden layers, we want to look at the first derivative of the activation function §Sigmoid function is easy to differentiate and easy to compute forward

Updating hidden-to-output §We have teacher-supplied desired values §delta_wji = alpha * aj * (Ti - Oi) * g’(in_i) = alpha * aj * (Ti - Oi) * Oi * (1 - Oi) l for sigmoid, g’(x) = g(x) * (1 - g(x))

Updating interior weights §Layer k units provide values to all layer k+1 units l “miss” is sum of misses from all units on k+1: miss_j = Σ i [ ai (1 - ai) (Ti - ai) wji ] l weights coming into this unit are adjusted based on their contribution: delta_kj = alpha * Ik * aj * (1 - aj) * miss_j
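Putting the two update rules together: a self-contained sketch of backpropagation for a 2-layer (one-hidden-layer) sigmoid network on XOR, the function the earlier slides showed a single perceptron cannot learn. The use of NumPy, the layer sizes, the learning rate, and the random seed are my own choices, not the slides'.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
T = np.array([[0], [1], [1], [0]], dtype=float)              # XOR teacher values

n_hidden, alpha = 4, 0.5
W1 = rng.normal(scale=0.5, size=(2, n_hidden))   # input  -> hidden weights
b1 = np.zeros(n_hidden)                          # hidden biases (the -1-input trick, folded in)
W2 = rng.normal(scale=0.5, size=(n_hidden, 1))   # hidden -> output weights
b2 = np.zeros(1)

for epoch in range(5000):
    # forward pass
    a_hid = sigmoid(X @ W1 + b1)                 # hidden activations a_j
    O = sigmoid(a_hid @ W2 + b2)                 # output activations O_i

    # backward pass: output deltas use (T_i - O_i) * O_i * (1 - O_i)
    delta_out = (T - O) * O * (1 - O)
    # hidden "miss": sum over output units of (output delta) * w_ji
    miss_hid = delta_out @ W2.T
    delta_hid = miss_hid * a_hid * (1 - a_hid)

    # weight updates: delta_w = alpha * (incoming activation) * (unit's delta)
    W2 += alpha * a_hid.T @ delta_out
    b2 += alpha * delta_out.sum(axis=0)
    W1 += alpha * X.T @ delta_hid
    b1 += alpha * delta_hid.sum(axis=0)

print(O.round(2))   # typically close to [[0], [1], [1], [0]] once training has converged
```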

How do we pick α? §Tuning set, or §Cross validation, or §Small for slow, conservative learning

How many hidden layers? §Usually just one (i.e., a 2-layer net) §How many hidden units in the layer? l Too few => can’t learn l Too many => poor generalization

How big a training set? §Determine your target error rate, e §Success rate is 1-e §Typical training set approx. n/e, where n is the number of weights in the net §Example: l e = 0.1, n = 80 weights l training set size 800 l trained until 95% correct classification on the training set, which should typically produce 90% correct classification on the testing set
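The slide's rule of thumb as a two-line check, using its own numbers:

```python
n_weights, target_error = 80, 0.1
print(int(n_weights / target_error))   # 800 training examples, per the n/e rule of thumb
```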

NETtalk (1987) §Mapping character strings into phonemes so they can be pronounced by a computer §Neural network trained to pronounce each letter in a word in a sentence, given the three letters before & after it [window] §Output was the correct phoneme §Results l 95% accuracy on the training data l 78% accuracy on the test set

Other Examples §Neurogammon (Tesauro & Sejnowski, 1989) l Backgammon learning program §Speech Recognition (Waibel, 1989) §Character Recognition (LeCun et al., 1989) §Face Recognition (Mitchell)

ALVINN §Steer a van down the road l 2-layer feedforward using backprop for learning l Raw input is a 480 x 512 pixel image, 15x per sec l Color image preprocessed into 960 input units l 4 hidden units l 30 output units, each is a steering direction l Teacher values were Gaussian with variance 10

Learning on-the-fly §ALVINN learned as the vehicle traveled l initially by observing a human driving l learns from its own driving by watching for future corrections l never saw bad driving: didn’t know what was dangerous or NOT correct, so it computes alternate views of the road (rotations, shifts, and fill-ins) to use as “bad” examples l keeps a buffer pool of 200 “pretty old” examples to avoid overfitting to only the most recent images