ADVANCED PERCEPTRON LEARNING David Kauchak CS 451 – Fall 2013

Admin: Assignment 2 contest due Sunday night!

Linear models: a linear model in n-dimensional space (i.e., n features) is defined by n+1 weights. In two dimensions it is a line; in three dimensions, a plane; in n dimensions, a hyperplane (where b = -a).
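
The formulas on this slide appeared as images in the original deck; a plausible reconstruction, in the notation used throughout these slides (features f_i, weights w_i, bias b), is:

    w_1 f_1 + w_2 f_2 + b = 0                (two dimensions: a line)
    w_1 f_1 + w_2 f_2 + w_3 f_3 + b = 0      (three dimensions: a plane)
    \sum_{i=1}^{n} w_i f_i + b = 0           (n dimensions: a hyperplane)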

Learning a linear classifier. Plot: axes f1 and f2, current model w = (1, 0). What does this model currently say?

Learning a linear classifier. Plot: axes f1 and f2, w = (1, 0); the line splits the plane into a POSITIVE side and a NEGATIVE side.

Learning a linear classifier. Plot: axes f1 and f2, w = (1, 0), with the training point (-1, 1). Is our current guess right or wrong?

Learning a linear classifier. Plot: axes f1 and f2, w = (1, 0), with the point (-1, 1). The model predicts negative, which is wrong. How should we update the model?

A closer look at why we got it wrong. For the example (-1, 1, positive) we'd like the score w1*f1 + w2*f2 to be positive, since the label is positive. w1 contributed in the wrong direction (its feature is negative), so decrease it: 1 -> 0. w2 could have contributed (positive feature) but didn't, so increase it: 0 -> 1.

Learning a linear classifier. Plot: axes f1 and f2, updated w = (0, 1), with the point (-1, 1). Graphically, this also makes sense!
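
Worked numerically, this matches the update rule wi = wi + fi * label introduced a few slides later. For the positive example (-1, 1):

    w_1 = 1 + (-1)(+1) = 0
    w_2 = 0 + (1)(+1) = 1

so w moves from (1, 0) to (0, 1), exactly the "1 -> 0, 0 -> 1" change described above.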

Learning a linear classifier. Plot: axes f1 and f2, w = (0, 1), with the point (1, -1). Is our current guess right or wrong?

Learning a linear classifier. Plot: axes f1 and f2, w = (0, 1), with the point (1, -1). The model predicts negative, which is correct. How should we update the model?

Learning a linear classifier. Plot: axes f1 and f2, w = (0, 1), with the point (1, -1). Already correct… don't change it!

Learning a linear classifier. Plot: axes f1 and f2, w = (0, 1), with the point (-1, -1). Is our current guess right or wrong?

Learning a linear classifier. Plot: axes f1 and f2, w = (0, 1), with the point (-1, -1). The model predicts negative, which is wrong. How should we update the model?

A closer look at why we got it wrong. For the example (-1, -1, positive) we'd like the score w1*f1 + w2*f2 to be positive, since the label is positive. w1 didn't contribute, but could have: decrease it, 0 -> -1 (with a negative feature, a negative weight contributes positively). w2 contributed in the wrong direction: decrease it, 1 -> 0.

Learning a linear classifier. Plot: axes f1 and f2, updated w = (-1, 0). Training data (f1, f2, label): (-1, -1, positive); (-1, 1, positive); (1, 1, negative); (1, -1, negative).

Perceptron learning algorithm
repeat until convergence (or for some # of iterations):
  for each training example (f1, f2, …, fn, label):
    check if it's correct based on the current model
    if not correct, update all the weights:
      if label positive and feature positive: increase weight (increase weight = predict more positive)
      if label positive and feature negative: decrease weight (decrease weight = predict more positive)
      if label negative and feature positive: decrease weight (decrease weight = predict more negative)
      if label negative and feature negative: increase weight (increase weight = predict more negative)

A trick… Let positive label = 1 and negative label = -1. Then the product label * fi captures all four cases:
  label positive, feature positive: 1 * 1 = 1 (increase weight = predict more positive)
  label positive, feature negative: 1 * -1 = -1 (decrease weight = predict more positive)
  label negative, feature positive: -1 * 1 = -1 (decrease weight = predict more negative)
  label negative, feature negative: -1 * -1 = 1 (increase weight = predict more negative)

Perceptron learning algorithm
repeat until convergence (or for some # of iterations):
  for each training example (f1, f2, …, fn, label):
    check if it's correct based on the current model
    if not correct, update all the weights:
      for each wi: wi = wi + fi * label
      b = b + label
How do we check if it's correct?

Perceptron learning algorithm
repeat until convergence (or for some # of iterations):
  for each training example (f1, f2, …, fn, label):
    if prediction * label ≤ 0:  // they don't agree
      for each wi: wi = wi + fi * label
      b = b + label
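
Not part of the original slides: a minimal runnable Python sketch of this algorithm, assuming labels are coded as +1/-1 and features are lists of numbers (function and variable names are my own).

    def score(weights, b, features):
        """Linear score w . f + b; its sign is the prediction."""
        return sum(w_i * f_i for w_i, f_i in zip(weights, features)) + b

    def perceptron_train(data, num_iterations=100):
        """data: list of (features, label) pairs with label in {+1, -1}."""
        n = len(data[0][0])
        weights = [0.0] * n
        b = 0.0
        for _ in range(num_iterations):
            mistakes = 0
            for features, label in data:
                if score(weights, b, features) * label <= 0:  # they don't agree
                    for i in range(n):
                        weights[i] += features[i] * label
                    b += label
                    mistakes += 1
            if mistakes == 0:  # a full pass with no mistakes: converged
                break
        return weights, b

On the four-point dataset shown earlier, perceptron_train([([-1, -1], 1), ([-1, 1], 1), ([1, 1], -1), ([1, -1], -1)]) converges after two passes.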

Perceptron learning algorithm (same update rule as above). Would this also work for non-binary, i.e. real-valued, features?

Your turn. Using the same update rule (for each misclassified example, wi = wi + fi * label), and the four points (-1, 1), (1, 1), (.5, -1), (-1, -1) shown in the f1/f2 plane, start from w = (1, 0). Repeat until convergence: keep track of w1 and w2 as they change, and redraw the line after each step.

Your turn (continued): the succeeding slides show w = (0, -1), (-1, 0), (-.5, -1), (-1.5, 0), (-1, -1), (-2, 0), and (-1.5, -1).

Which line will it find?

Only guaranteed to find some line that separates the data

Convergence. The algorithm above repeats until convergence, or for some # of iterations. Why do we also have the “some # iterations” check?

Handling non-separable data: if we ran the algorithm on data that is not linearly separable, it would never converge!

Convergence. Capping the number of iterations also helps avoid overfitting! (This is harder to see in 2-D examples, though.)

Ordering. In the loop over training examples, what order should we traverse the examples? Does it matter?

Order matters What would be a good/bad order?

Order matters: a bad order

Solution?

Ordering
repeat until convergence (or for some # of iterations):
  randomize the order of the training examples
  for each training example (f1, f2, …, fn, label):
    if prediction * label ≤ 0:  // they don't agree
      for each wi: wi = wi + fi * label
      b = b + label
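
In the Python sketch from earlier, this corresponds to shuffling the data at the top of each pass; a variant using the standard library's random.shuffle (again my own names):

    import random

    def perceptron_train_shuffled(data, num_iterations=100):
        """Same as perceptron_train above, but randomizes the example order each pass."""
        data = list(data)  # copy so we can shuffle without touching the caller's list
        n = len(data[0][0])
        weights, b = [0.0] * n, 0.0
        for _ in range(num_iterations):
            random.shuffle(data)  # randomize the order of the training examples
            mistakes = 0
            for features, label in data:
                if (sum(w * f for w, f in zip(weights, features)) + b) * label <= 0:
                    for i in range(n):
                        weights[i] += features[i] * label
                    b += label
                    mistakes += 1
            if mistakes == 0:
                break
        return weights, b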

Improvements What will happen when we examine this example?

Improvements Does this make sense? What if we had previously gone through ALL of the other examples correctly?

Improvements Maybe just move it slightly in the direction of correction

Voted perceptron learning
Training: every time a mistake is made on an example:
- store the weights (i.e., the weights before changing them for the current example)
- store the number of examples that set of weights got correct
Classify:
- calculate the prediction from ALL saved weight vectors
- multiply each prediction by the number of examples it got correct (i.e., a weighted vote) and take the sum over all predictions
- said another way: pick whichever prediction has the most votes
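
Not from the slides themselves: a rough Python sketch of this training and classification procedure, assuming +1/-1 labels (names are mine).

    def voted_perceptron_train(data, num_iterations=100):
        """Keep every weight vector that made a mistake, along with how many
        examples it classified correctly before being changed (its vote).
        data: list of (features, label) pairs with label in {+1, -1}."""
        n = len(data[0][0])
        weights, b = [0.0] * n, 0.0
        survived = 0   # examples the current weights have gotten correct
        saved = []     # list of (weights, b, vote)
        for _ in range(num_iterations):
            for features, label in data:
                score = sum(w * f for w, f in zip(weights, features)) + b
                if score * label <= 0:  # mistake: store the old weights, then update
                    saved.append((list(weights), b, survived))
                    for i in range(n):
                        weights[i] += features[i] * label
                    b += label
                    survived = 0
                else:
                    survived += 1
        saved.append((list(weights), b, survived))  # also keep the final weights
        return saved

    def voted_perceptron_classify(saved, features):
        """Each saved weight vector votes with weight equal to its count."""
        total = 0.0
        for w, b, vote in saved:
            score = sum(w_i * f_i for w_i, f_i in zip(w, features)) + b
            total += vote * (1 if score > 0 else -1)
        return 1 if total > 0 else -1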

Voted perceptron learning (figures). Training builds up a table of saved weight vectors, each with a vote count (e.g., a weight vector with vote 3): every time a mistake is made on an example, store the weights and store the number of examples that set of weights got correct. To classify, each saved weight vector makes its own prediction (POSITIVE or NEGATIVE), weighted by its vote count; in the example shown, the vote totals are 8 for negative and 2 for positive.

Voted perceptron learning: works much better in practice; avoids overfitting (though it can still happen); and avoids big swings in the result caused by the examples examined at the end of training.

Voted perceptron learning (training and classification as described above). Any issues/concerns?

Voted perceptron learning (as above). Issues: 1. It can require a lot of storage. 2. Classifying becomes very, very expensive.

Average perceptron (figure: the saved weight vectors and their vote counts). The final weights are the weighted average of the previously saved weights. How does this help us?

Average perceptron (figure: the saved weight vectors and their vote counts). The final weights are the weighted average of the previously saved weights. We can just keep a running average!
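
A rough sketch of the running-average idea (my own names). It keeps an unnormalized running sum of the weights; since dividing by the (positive) number of accumulation steps never changes the sign of the score, the sum can be used directly as the final model.

    def averaged_perceptron_train(data, num_iterations=100):
        """Averaged perceptron: accumulate the current weights after every example,
        so weight vectors that survive longer contribute more to the average."""
        n = len(data[0][0])
        weights, b = [0.0] * n, 0.0
        avg_weights, avg_b = [0.0] * n, 0.0
        for _ in range(num_iterations):
            for features, label in data:
                score = sum(w * f for w, f in zip(weights, features)) + b
                if score * label <= 0:  # mistake: standard perceptron update
                    for i in range(n):
                        weights[i] += features[i] * label
                    b += label
                for i in range(n):      # running sum, updated on every example
                    avg_weights[i] += weights[i]
                avg_b += b
        return avg_weights, avg_b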

Perceptron learning algorithm (the same mistake-driven update: wi = wi + fi * label, b = b + label). Why is it called the “perceptron” learning algorithm if what it learns is a line? Why not the “line learning” algorithm?

Our nervous system: the neuron (figure).

Our nervous system: the computer science view. The human brain is a large collection of interconnected neurons. A NEURON is a brain cell that collects, processes, and disseminates electrical signals. Neurons are connected via synapses. They FIRE depending on the conditions of the neighboring neurons.

(Figure: Node A connected to Node B, a neuron, by an edge with weight w.) w is the strength of the signal sent between A and B. If A fires and w is positive, then A stimulates B. If A fires and w is negative, then A inhibits B. If a node is stimulated enough, then it also fires; how much stimulation is required is determined by its threshold.

Neural networks (figure): nodes are neurons, edges are synapses.

A single neuron/perceptron (figure): inputs x1, x2, x3, x4 with weights w1, w2, w3, w4 feed into a threshold function, which produces the output y.

Possible threshold functions. Hard threshold: output 1 if in (the weighted sum of the inputs) is >= the threshold, and 0 otherwise. Sigmoid: a smooth alternative.
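
A small sketch of these two functions plus a hard-threshold unit, assuming the standard logistic sigmoid 1 / (1 + e^-x) (names are mine).

    import math

    def hard_threshold(weighted_sum, threshold):
        """Output 1 if the weighted sum reaches the threshold, 0 otherwise."""
        return 1 if weighted_sum >= threshold else 0

    def sigmoid(weighted_sum):
        """Smooth alternative: squashes the weighted sum into (0, 1)."""
        return 1.0 / (1.0 + math.exp(-weighted_sum))

    def neuron_output(inputs, weights, threshold):
        """A single neuron/perceptron with a hard threshold."""
        weighted_sum = sum(w * x for w, x in zip(weights, inputs))
        return hard_threshold(weighted_sum, threshold)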

A single neuron/perceptron (figure: specific inputs, weights, and a threshold). What is the output?

A single neuron/perceptron (figure). The weighted sum is 0.5, which is not equal to or larger than the threshold, so the neuron does not fire (output 0).

A single neuron/perceptron (figure: another set of inputs, weights, and a threshold). What is the output?

A single neuron/perceptron (figure). The weighted sum is 1.5, which is larger than the threshold, so the neuron fires (output 1).

A single neuron/perceptron (figure). The weighted sum is 1.5, which is larger than the threshold. Viewed as a linear model: what are the weights and what is b?
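
One way to see the answer (a general observation about threshold units, not spelled out on the slide): a hard-threshold unit with threshold T is the same linear model with bias b = -T, since

    \sum_i w_i x_i \ge T   \iff   \sum_i w_i x_i - T \ge 0

so the weights stay the same and the threshold moves over to become the negated bias.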

History of neural networks
- McCulloch and Pitts (1943): introduced a model of artificial neurons and suggested they could learn
- Hebb (1949): simple updating rule for learning
- Rosenblatt (1962): the perceptron model
- Minsky and Papert (1969): wrote Perceptrons
- Bryson and Ho (1969, but largely ignored until the 1980s): invented back-propagation learning for multilayer networks