Perceptron as one Type of Linear Discriminants

Perceptron as one Type of Linear Discriminants. Outline: Introduction; Design of Primitive Units; Perceptrons.

Introduction. Artificial neural networks are crude attempts to model the massively parallel and distributed processing we believe takes place in the brain. Consider: 1) the speed at which the brain recognizes images; 2) the enormous number of neurons populating a brain; 3) the comparatively slow speed at which a single neuron transmits signals.

Introduction. Neural network representation. [Diagram: a feed-forward network; input nodes connect to internal (hidden) nodes, which in turn connect to output nodes labeled Left, Straight, and Right.]

Introduction. Problems for Neural Networks. Problems appropriate for neural network learning have these characteristics:
- Examples may be described by a large number of attributes (e.g., the pixels in an image).
- The output value can be discrete, continuous, or a vector of discrete or continuous values.
- The data may contain errors.
- The time for training may be extremely long, although evaluating the network on a new example is relatively fast.
- Interpretability of the final hypothesis is not relevant (the network is treated as a black box).

Perceptron as one Type of Linear Discriminants. [Image: a biological neuron, showing the dendrites, soma, nucleus, axon, and axon terminals (image copied from Wikipedia, entry on "neuron").]

Design of Primitive Units. Perceptrons. Definition: a perceptron is a step function based on a linear combination of real-valued inputs. If the combination is above a threshold it outputs 1; otherwise it outputs -1. [Diagram: inputs x0 = 1, x1, ..., xn, with weights w0, w1, ..., wn, feed a summation unit Σ whose output is 1 or -1.]

Design of Primitive Units. Learning Perceptrons.
O(x1, x2, ..., xn) = 1 if w0 + w1x1 + w2x2 + ... + wnxn > 0, and -1 otherwise.
To simplify our notation we can write the function as O(X) = sgn(W·X), where sgn(y) = 1 if y > 0 and -1 otherwise.
Learning a perceptron means finding the right values for W. The hypothesis space of a perceptron is the space of all weight vectors.
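
A minimal sketch of this unit in Python (the function name and the use of NumPy are my choices, not the slides'); the bias w0 is handled by prepending the constant x0 = 1 to every input vector:

```python
import numpy as np

def perceptron_output(w, x):
    """O(X) = sgn(W . X), where x[0] is assumed to be the constant input x0 = 1."""
    return 1 if np.dot(w, x) > 0 else -1
```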

Design of Primitive Units. Representational Power. A perceptron draws a hyperplane as the decision boundary over the (n-dimensional) input space. [Figure: positive (+) examples on one side and negative (-) examples on the other side of the decision boundary W·X = 0.]

Design of Primitive Units. Representational Power. A perceptron can learn only examples that are "linearly separable", i.e., examples that can be perfectly separated by a hyperplane. [Figure: two scatter plots of + and - examples; in the linearly separable one a straight line separates the classes, in the non-linearly separable one no straight line can.]

Design of Primitive Units. Functions for Perceptrons. Perceptrons can learn many boolean functions (AND, OR, NAND, NOR) but not XOR. AND, for example, is implemented by the weights w1 = 0.5, w2 = 0.5, and w0 = -0.8 (with x0 = 1). Every boolean function can be represented by a perceptron network that is two or more levels deep.
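
A quick check of those AND weights, as a self-contained sketch assuming boolean inputs are encoded as 0/1:

```python
import numpy as np

w_and = np.array([-0.8, 0.5, 0.5])                  # [w0, w1, w2]
for x1 in (0, 1):
    for x2 in (0, 1):
        o = 1 if np.dot(w_and, [1, x1, x2]) > 0 else -1
        print((x1, x2), '->', o)                    # prints 1 only when x1 = x2 = 1
```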

Design of Primitive Units. Exercise: design a two-input perceptron that implements the boolean function X1 OR ~X2 (X1 OR NOT X2). Assume the bias weight is fixed at W0 = +0.5 (with X0 = 1); you simply have to provide adequate values for W1 and W2. [Diagram: inputs x0 = 1, x1, x2 with weights W0 = +0.5, W1 = ?, W2 = ?.]

Design of Primitive Units. There are different solutions, but the general solution (assuming boolean inputs encoded as 0/1) is: W2 < -0.5 and W1 > |W2| - 0.5, where |x| is the absolute value of x. One concrete solution: W1 = 1, W2 = -1 (with W0 = +0.5).
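
A check of that concrete solution against the truth table of X1 OR NOT X2, again assuming a 0/1 input encoding:

```python
import numpy as np

w = np.array([0.5, 1.0, -1.0])                      # [W0, W1, W2]
for x1 in (0, 1):
    for x2 in (0, 1):
        target = 1 if (x1 == 1 or x2 == 0) else -1  # X1 OR (NOT X2)
        o = 1 if np.dot(w, [1, x1, x2]) > 0 else -1
        assert o == target                          # holds for all four rows
```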

Design of Primitive Units. Perceptron Algorithms. How do we learn the weights of a single perceptron? There are two options: the perceptron rule and the delta rule. Algorithm for learning using the perceptron rule:
1. Assign random values to the weight vector.
2. Apply the perceptron rule to every training example.
3. Are all training examples correctly classified? If yes, quit; if no, go back to step 2.

Design of Primitive Units. A. Perceptron Rule. The perceptron training rule: for a new training example X = (x1, x2, ..., xn), update each weight according to
wi = wi + Δwi, where Δwi = η (t - o) xi
t: target output; o: output generated by the perceptron; η: a constant called the learning rate (e.g., 0.1).

Design of Primitive Units. A. Perceptron Rule. Comments about the perceptron training rule: if the example is correctly classified, the term (t - o) equals zero and no weight update is necessary. If the perceptron outputs -1 and the real answer is 1, the weight is increased (for a positive xi). If the perceptron outputs 1 and the real answer is -1, the weight is decreased. Provided the examples are linearly separable and a small value of η is used, the rule is proven to classify all training examples correctly (i.e., it is consistent with the training data).
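
Putting the rule and the loop together, a minimal sketch in Python (the function name, the epoch cap, and the random initialization are my choices, not the slides'):

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, max_epochs=100):
    """Perceptron rule. X: examples with a leading column of 1s; t: targets in {-1, 1}."""
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.05, 0.05, size=X.shape[1])  # step 1: random initial weights
    for _ in range(max_epochs):
        misclassified = 0
        for x, target in zip(X, t):                # step 2: apply the rule to every example
            o = 1 if np.dot(w, x) > 0 else -1
            w += eta * (target - o) * x            # wi <- wi + eta * (t - o) * xi
            misclassified += int(o != target)
        if misclassified == 0:                     # step 3: all correct? quit
            break
    return w
```

On linearly separable data (such as the AND example above) the loop exits once a full pass makes no mistakes; on non-separable data it stops at the epoch cap.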

Historical Background. Early attempts to implement artificial neural networks: McCulloch (a neuroscientist) and Pitts (a logician) (1943) proposed simple neurons, the MCP neurons, based on logical functions. [Photo: Walter Pitts (extracted from Wikipedia).]

Historical Background. Donald Hebb (1949), The Organization of Behavior: "Neural pathways are strengthened every time they are used." [Photo: Donald Hebb (extracted from Wikipedia).]

Design of Primitive Units. B. The Delta Rule. What happens if the examples are not linearly separable? To address this situation we try to approximate the target concept using the delta rule. The key idea is to use a gradient descent search. We will try to minimize the following error:
E = ½ Σd (td - od)²
where the sum goes over all training examples d. Here od is the inner product W·Xd, and not sgn(W·Xd) as with the perceptron algorithm.
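
This error as code, a small sketch assuming NumPy arrays (the function name is mine):

```python
import numpy as np

def error(w, X, t):
    """E(W) = 1/2 * sum_d (t_d - o_d)^2, with o_d = W . X_d (no sgn)."""
    o = X @ w
    return 0.5 * np.sum((t - o) ** 2)
```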

Design of Primitive Units. B. The Delta Rule. The idea is to find a minimum of the error function E in the space of weights. [Figure: the error surface E(W) plotted over two weights w1 and w2.]

Design of Primitive Units. Derivation of the Rule. The gradient of E with respect to the weight vector W, denoted ∇E(W), is the vector of partial derivatives of E with respect to each weight:
∇E(W) = [∂E/∂w0, ∂E/∂w1, ..., ∂E/∂wn]
Key concept: the gradient vector points in the direction of steepest increase in E, so gradient descent moves in the opposite direction, -∇E(W).

Design of Primitive Units. B. The Delta Rule. For a new training example X = (x1, x2, ..., xn), update each weight according to
wi = wi + Δwi, where Δwi = -η ∂E/∂wi
η: learning rate (e.g., 0.1). The minus sign makes the update a step in the direction that decreases E.

Design of Primitive Units. The Gradient. How do we compute ∇E(W)? Since od = W·Xd, it is easy to see that
∂E/∂wi = -Σd (td - od) xid
where xid is the i-th component of example Xd. That gives us the following equation:
Δwi = η Σd (td - od) xid

Design of Primitive Units. The Algorithm Using the Delta Rule. Algorithm for learning using the delta rule:
1. Assign random values to the weight vector.
2. Repeat until the stopping condition is met (e.g., the error E is small):
   a. Initialize each Δwi to zero.
   b. For each example, update Δwi: Δwi = Δwi + η (t - o) xi.
   c. Update each weight: wi = wi + Δwi.
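
A minimal sketch of this batch version in Python (names, the tolerance, and the epoch cap are my choices); note the output is the raw inner product W·X, not sgn(W·X):

```python
import numpy as np

def train_delta_rule(X, t, eta=0.1, tol=1e-4, max_epochs=1000):
    """Batch delta rule. X: examples with a leading column of 1s; t: real-valued targets."""
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.05, 0.05, size=X.shape[1])
    for _ in range(max_epochs):
        o = X @ w                                  # linear outputs, o_d = W . X_d
        if 0.5 * np.sum((t - o) ** 2) < tol:       # stop once the error E is small
            break
        w += eta * (t - o) @ X                     # delta_wi = eta * sum_d (t_d - o_d) * x_id
    return w
```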

Historical Background. Frank Rosenblatt (1962) showed how to make incremental changes to the strengths of the synapses; he invented the perceptron. Minsky and Papert (1969) criticized the idea of the perceptron: a single perceptron cannot solve the XOR problem, and in addition training time can grow exponentially with the size of the input. [Photo: Marvin Minsky (2008) (extracted from Wikipedia).]

Design of Primitive Units. Difficulties with Gradient Descent. There are two main difficulties with the gradient descent method: 1. convergence to a minimum may take a long time; 2. there is no guarantee we will find the global minimum rather than a local one.

Design of Primitive Units. Stochastic Approximation. Instead of updating the weights only after all examples have been observed, we update on every example:
Δwi = η (t - o) xi
In this case we update the weights "incrementally". Remarks: when there are multiple local minima, stochastic gradient descent may avoid getting stuck in one of them; standard (batch) gradient descent needs more computation per update but can be used with a larger step size.
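
The incremental variant as a sketch, changing only the update schedule of the batch trainer above (names are mine):

```python
import numpy as np

def train_delta_rule_stochastic(X, t, eta=0.1, max_epochs=1000):
    """Incremental (stochastic) delta rule: update w after every single example."""
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.05, 0.05, size=X.shape[1])
    for _ in range(max_epochs):
        for x, target in zip(X, t):
            o = np.dot(w, x)                       # linear output for this one example
            w += eta * (target - o) * x            # delta_wi = eta * (t - o) * xi
    return w
```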

Design of Primitive Units. Difference Between the Perceptron Rule and the Delta Rule. The perceptron rule is based on the output of a step function, whereas the delta rule uses the linear combination of inputs directly. The perceptron rule is guaranteed to converge to a consistent hypothesis, assuming the data is linearly separable. The delta rule converges only in the limit, but it does not require the data to be linearly separable.