Published by Ἀπολλωνία Ράγκος. Modified over 6 years ago.
1
Perceptron as one Type of Linear Discriminants
Introduction Design of Primitive Units Perceptrons What is machine learning?
2
Introduction
Artificial Neural Networks are crude attempts to model the massively parallel and distributed processing we believe takes place in the brain. Consider: 1) the speed at which the brain recognizes images; 2) the many neurons populating a brain; 3) the speed at which a single neuron transmits signals. A single neuron is slow compared with how quickly the brain recognizes an image, so that speed must come from many neurons working in parallel.
3
Introduction Neural Network Representation
[Figure: network diagram with input nodes, internal nodes, and output nodes labeled Left, Straight, and Right.]
4
Introduction Problems for Neural Networks
Neural networks are well suited to problems with these characteristics: Examples may be described by a large number of attributes (e.g., the pixels of an image). The output value can be discrete, continuous, or a vector of discrete or continuous values. The data may contain errors. Training may take a very long time, but evaluating the trained network on a new example is relatively fast. Interpretability of the final hypothesis is not required (the network is treated as a black box).
5
Perceptron as one Type of Linear Discriminants
Introduction Design of Primitive Units Perceptrons
6
Perceptron as one Type of Linear Discriminants
[Figure: biological neuron labeled dendrites, soma, nucleus, axon, and axon terminals (image copied from Wikipedia, entry on “neuron”).]
7
Perceptron as one Type of Linear Discriminants
[Figure: image copied from Wikipedia, entry on “neuron”.]
8
Design of Primitive Units Perceptrons
Definition: a perceptron is a step function applied to a linear combination of real-valued inputs. If the combination is above a threshold, it outputs 1; otherwise it outputs -1.
[Figure: inputs x0 = 1, x1, x2, …, xn with weights w0, w1, w2, …, wn feed a summation unit Σ whose thresholded output is 1 or -1.]
9
Design of Primitive Units Learning Perceptrons
$$O(x_1, x_2, \dots, x_n) = \begin{cases} 1 & \text{if } w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n > 0 \\ -1 & \text{otherwise} \end{cases}$$
To simplify our notation, we add the constant input $x_0 = 1$ and represent the function as
$$O(X) = \operatorname{sgn}(W \cdot X), \qquad \operatorname{sgn}(y) = \begin{cases} 1 & \text{if } y > 0 \\ -1 & \text{otherwise.} \end{cases}$$
Learning a perceptron means finding the right values for W. The hypothesis space of a perceptron is the space of all weight vectors.
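A minimal Python sketch of the output function above, assuming weights and inputs are plain lists and that `x[0]` holds the constant input x0 = 1. The names `sgn` and `perceptron_output` are illustrative, not from any library.

```python
from typing import Sequence


def sgn(y: float) -> int:
    """Sign function as defined above: 1 if y > 0, otherwise -1 (including y == 0)."""
    return 1 if y > 0 else -1


def perceptron_output(w: Sequence[float], x: Sequence[float]) -> int:
    """Compute O(X) = sgn(W . X); x[0] must be the constant input x0 = 1."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return sgn(sum(wi * xi for wi, xi in zip(w, x)))


# Usage with the AND weights from a later slide (w0 = -0.8, w1 = w2 = 0.5):
print(perceptron_output([-0.8, 0.5, 0.5], [1, 1, 1]))  # prints 1: -0.8 + 0.5 + 0.5 > 0
print(perceptron_output([-0.8, 0.5, 0.5], [1, 1, 0]))  # prints -1: -0.8 + 0.5 <= 0
```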
10
Design of Primitive Units Representational Power
A perceptron draws a hyperplane as the decision boundary over the (n-dimensional) input space.
[Figure: + and - examples on opposite sides of the decision boundary W·X = 0.]
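With two inputs the hyperplane W·X = 0 is a line. The sketch below (hypothetical helper names, Python assumed) evaluates which side of the boundary a point falls on and solves the boundary equation for x2, using the slide deck's AND weights as an example.

```python
from typing import Sequence


def activation(w: Sequence[float], x: Sequence[float]) -> float:
    """W . X with x[0] = 1: positive on the '+' side, <= 0 on the '-' side."""
    return sum(wi * xi for wi, xi in zip(w, x))


def boundary_x2(w0: float, w1: float, w2: float, x1: float) -> float:
    """Solve w0 + w1*x1 + w2*x2 = 0 for x2 (the boundary line in 2-D)."""
    if w2 == 0:
        raise ValueError("the boundary is vertical when w2 == 0")
    return -(w0 + w1 * x1) / w2


w = [-0.8, 0.5, 0.5]                  # w0, w1, w2
print(boundary_x2(*w, x1=0.0))        # the line crosses x1 = 0 at x2 = 1.6
print(activation(w, [1, 1, 1]) > 0)   # the point (1, 1) lies on the '+' side
```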
11
Design of Primitive Units Representational Power
A perceptron can learn only examples that are called “linearly separable”: examples that can be perfectly separated by a hyperplane.
[Figure: two panels of + and - examples, labeled “Linearly separable” and “Non-linearly separable”.]
12
Design of Primitive Units Functions for Perceptrons
Perceptrons can represent many Boolean functions, such as AND, OR, NAND, and NOR, but not XOR.
Example, AND: W0 = -0.8 (with X0 = 1), W1 = 0.5, W2 = 0.5.
Every Boolean function can be represented by a network of perceptrons that is at least two levels deep.
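A short Python sketch that checks these gates and the two-level construction XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2)). The AND weights are the slide's; the OR, NAND, and NOR weights are illustrative choices made here, not taken from the slides. Inputs are assumed to be 0 (false) and 1 (true), and outputs are -1/1.

```python
from typing import Callable

Gate = Callable[[float, float], int]


def make_perceptron(w0: float, w1: float, w2: float) -> Gate:
    """Two-input perceptron with constant input x0 = 1; outputs 1 or -1."""
    def gate(x1: float, x2: float) -> int:
        return 1 if w0 + w1 * x1 + w2 * x2 > 0 else -1
    return gate


and_gate = make_perceptron(-0.8, 0.5, 0.5)    # weights from the slide
or_gate = make_perceptron(-0.3, 0.5, 0.5)     # illustrative weights (assumption)
nand_gate = make_perceptron(0.8, -0.5, -0.5)  # AND with every weight negated
nor_gate = make_perceptron(0.3, -0.5, -0.5)   # OR with every weight negated


def xor_net(x1: float, x2: float) -> int:
    """Two-level network: AND(OR(x1, x2), NAND(x1, x2)).

    The first level outputs -1/1; the slide's AND weights also compute AND
    correctly for -1/1 inputs, so they are reused at the second level.
    """
    return and_gate(or_gate(x1, x2), nand_gate(x1, x2))


for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_gate(a, b), or_gate(a, b), nand_gate(a, b),
              nor_gate(a, b), xor_net(a, b))
```

No single perceptron can reproduce the `xor_net` column, because XOR is not linearly separable; the second level is what makes it possible.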
13
Design of Primitive Units
Design a two-input perceptron that implements the Boolean function X1 OR ~X2 (X1 OR not X2). Assume the independent weight is always +0.5 (that is, W0 = +0.5 with X0 = 1). You only have to provide adequate values for W1 and W2.
[Figure: inputs x0 = 1, x1, x2 with weights W0 = +0.5, W1 = ?, W2 = ? feeding a summation unit Σ.]
14
Design of Primitive Units
X1 OR ~X2 (X1 OR not X2).
[Figure: inputs x0 = 1, x1, x2 with weights W0 = +0.5, W1 = ?, W2 = ? feeding a summation unit Σ.]
15
Design of Primitive Units
There are different solutions; a general condition on the weights (with inputs encoded as 0 for false and 1 for true) is:
$$W_2 < -0.5, \qquad W_1 > |W_2| - 0.5$$
where $|x|$ is the absolute value of $x$. One solution for X1 OR not X2 is W1 = 1, W2 = -1 (with W0 = +0.5 and x0 = 1).
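A brute-force check of the exercise in Python, assuming inputs encoded as 0 (false) and 1 (true) and sgn(0) = -1 as in the definition of sgn. It confirms the solution W1 = 1, W2 = -1 and that every weight pair on a small grid satisfying the condition implements X1 OR not X2. The function names and the grid are hypothetical choices for illustration.

```python
from itertools import product


def sgn(y: float) -> int:
    return 1 if y > 0 else -1


def target(x1: int, x2: int) -> int:
    """X1 OR (NOT X2) for 0/1 inputs, written as a -1/1 output."""
    return 1 if (x1 == 1 or x2 == 0) else -1


def implements_target(w1: float, w2: float, w0: float = 0.5) -> bool:
    """True if the perceptron (w0, w1, w2) matches the whole truth table."""
    return all(sgn(w0 + w1 * x1 + w2 * x2) == target(x1, x2)
               for x1, x2 in product((0, 1), repeat=2))


def satisfies_condition(w1: float, w2: float) -> bool:
    """The slide's condition: W2 < -0.5 and W1 > |W2| - 0.5."""
    return w2 < -0.5 and w1 > abs(w2) - 0.5


print(implements_target(1, -1))  # the slide's solution
grid = [k * 0.25 for k in range(-12, 13)]
print(all(implements_target(a, b)
          for a in grid for b in grid if satisfies_condition(a, b)))
```

The condition is sufficient. Because sgn(0) = -1, the boundary value W2 = -0.5 (with W1 > 0) also happens to work, which the strict inequality leaves out.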
16
Design of Primitive Units Perceptron Algorithms
How do we learn the weights of a single perceptron? Two approaches: the perceptron rule and the delta rule.
Algorithm for learning using the perceptron rule:
1. Assign random values to the weight vector.
2. Apply the perceptron rule to every training example.
3. Are all training examples correctly classified? If yes, quit. If no, go back to step 2.
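A minimal Python sketch of the three steps above. Assumptions not in the slides: targets are -1/1, each input vector starts with x0 = 1, initial weights are drawn uniformly from [-0.05, 0.05], and a `max_epochs` cap stops the loop (without it, step 3 never succeeds on data that is not linearly separable). The AND and XOR toy datasets are illustrative.

```python
import random
from typing import List, Sequence, Tuple

# Each example: (inputs with x0 = 1 prepended, target in {-1, 1}).
Example = Tuple[Sequence[float], int]


def output(w: Sequence[float], x: Sequence[float]) -> int:
    """Perceptron output sgn(W . X), with sgn(0) = -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1


def train_perceptron(examples: List[Example], eta: float = 0.1,
                     max_epochs: int = 1000,
                     seed: int = 0) -> Tuple[List[float], bool]:
    """Perceptron-rule training; returns (weights, converged)."""
    rng = random.Random(seed)
    # Step 1: assign small random values to the weight vector.
    w = [rng.uniform(-0.05, 0.05) for _ in range(len(examples[0][0]))]
    for _ in range(max_epochs):
        # Step 2: apply the perceptron rule to every training example.
        for x, t in examples:
            o = output(w, x)
            w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
        # Step 3: quit once every training example is classified correctly.
        if all(output(w, x) == t for x, t in examples):
            return w, True
    # Safety cap (an assumption): reached only if the data is not separable
    # or max_epochs is too small.
    return w, False


AND_DATA = [([1, 0, 0], -1), ([1, 0, 1], -1), ([1, 1, 0], -1), ([1, 1, 1], 1)]
XOR_DATA = [([1, 0, 0], -1), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], -1)]

w_and, ok_and = train_perceptron(AND_DATA)
print(ok_and, [output(w_and, x) for x, _ in AND_DATA])
_, ok_xor = train_perceptron(XOR_DATA, max_epochs=200)
print(ok_xor)  # XOR is not linearly separable, so training never succeeds
```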
17
Design of Primitive Units A. Perceptron Rule
The perceptron training rule: for a new training example X = (x1, x2, …, xn), update each weight according to
$$w_i \leftarrow w_i + \Delta w_i, \qquad \Delta w_i = \eta\,(t - o)\,x_i$$
where t is the target output, o is the output generated by the perceptron, and η is a constant called the learning rate (e.g., 0.1).
18
Design of Primitive Units A. Perceptron Rule
Comments about the perceptron training rule: If the example is correctly classified, the term (t - o) equals zero and no weight update is necessary. If the perceptron outputs -1 and the real answer is 1, then t - o = 2, so each weight w_i whose input x_i is positive is increased. If the perceptron outputs 1 and the real answer is -1, then t - o = -2, so each such weight is decreased. Provided the examples are linearly separable and a sufficiently small η is used, the rule is proven to converge, after a finite number of updates, to weights that classify all training examples correctly (i.e., a hypothesis consistent with the training data).
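One application of the rule, sketched in Python (the helper name `perceptron_update` is hypothetical), makes these cases concrete. It assumes sgn(0) = -1 and uses an input vector with a negative component, so the sign of x_i decides whether w_i rises or falls.

```python
from typing import List, Sequence


def perceptron_update(w: Sequence[float], x: Sequence[float], t: int,
                      eta: float = 0.1) -> List[float]:
    """Apply w_i <- w_i + eta * (t - o) * x_i once, with o = sgn(W . X)."""
    o = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
    return [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]


x = [1.0, 1.0, -1.0]  # x0 = 1, x1 = 1, x2 = -1
# o = -1 but t = 1: w0 and w1 rise by 0.2, w2 falls by 0.2 because x2 < 0.
print(perceptron_update([0.0, 0.0, 0.0], x, t=1))
# W . X = 0.6 > 0, so o = 1 = t: the weights are unchanged.
print(perceptron_update([0.2, 0.2, -0.2], x, t=1))
# o = 1 but t = -1: every weight moves the opposite way, back to zero.
print(perceptron_update([0.2, 0.2, -0.2], x, t=-1))
```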
19
Historical Background
Early attempts to implement artificial neural networks: McCulloch (neuroscientist) and Pitts (logician), 1943. Their model was based on simple neurons (MCP neurons) and on logical functions.
[Figure: photograph including Walter Pitts (right), extracted from Wikipedia.]
20
Historical Background
Donald Hebb (1949), The Organization of Behavior: “Neural pathways are strengthened every time they are used.”
[Figure: picture of Donald Hebb, extracted from Wikipedia.]
21
Design of Primitive Units B. The Delta Rule
What happens if the examples are not linearly separable? To address this situation, we try to approximate the real concept using the delta rule. The key idea is to use gradient descent search. We will try to minimize the following error:
$$E(W) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$$
where D is the set of training examples, and t_d and o_d are the target and the output for example d. Here o_d is the inner product W·X_d, not sgn(W·X_d) as in the perceptron algorithm.
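A minimal Python sketch of this error, assuming each training example is a list of inputs with x0 = 1 prepended and targets are real numbers. The function names and the two-example dataset are illustrative only.

```python
from typing import Sequence


def linear_output(w: Sequence[float], x: Sequence[float]) -> float:
    """o = W . X, the unthresholded output used by the delta rule (x[0] = 1)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def squared_error(w: Sequence[float], xs: Sequence[Sequence[float]],
                  ts: Sequence[float]) -> float:
    """E(W) = 1/2 * sum over training examples d of (t_d - o_d)^2."""
    return 0.5 * sum((t - linear_output(w, x)) ** 2 for x, t in zip(xs, ts))


# Two examples, W = (0, 1): outputs are 2 and 0, both targets are 1.
print(squared_error([0.0, 1.0], [[1.0, 2.0], [1.0, 0.0]], [1.0, 1.0]))  # prints 1.0
```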
22
Design of Primitive Units B. The Delta Rule
The idea is to find a minimum of the error function E in the space of weights.
[Figure: error surface E(W) plotted over the weights w1 and w2.]
23
Design of Primitive Units Derivation of the Rule
The gradient of E with respect to the weight vector W, denoted $\nabla E(W)$, is the vector of partial derivatives of E with respect to each weight $w_i$:
$$\nabla E(W) = \left[ \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \dots, \frac{\partial E}{\partial w_n} \right]$$
Key concept: the gradient vector points in the direction of steepest increase in E.
24
Design of Primitive Units B. The Delta Rule
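For a new training example X = (x1, x2, …, xn), update each weight according to this rule:
$$w_i \leftarrow w_i + \Delta w_i, \qquad \Delta w_i = -\eta \frac{\partial E}{\partial w_i}$$
where η is the learning rate (e.g., 0.1). The minus sign moves the weights against the gradient, that is, in the direction of steepest decrease in E.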
25
Design of Primitive Units The gradient
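How do we compute $\nabla E(W)$? Differentiating E, with $o_d = W \cdot X_d$, gives
$$\frac{\partial E}{\partial w_i} = -\sum_{d \in D} (t_d - o_d)\, x_{id}$$
where D is the set of training examples and $x_{id}$ is input i of example d. Substituting into $\Delta w_i = -\eta\, \partial E / \partial w_i$ gives
$$\Delta w_i = \eta \sum_{d \in D} (t_d - o_d)\, x_{id}$$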
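The analytic gradient can be checked numerically. This Python sketch, with hypothetical function names and a made-up three-example dataset, compares the formula above against central finite differences of E, then forms one batch delta-rule step from it. Because E is quadratic in W, central differences match the formula up to rounding error.

```python
from typing import List, Sequence

Vector = Sequence[float]


def linear_output(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def squared_error(w: Vector, xs: Sequence[Vector], ts: Vector) -> float:
    return 0.5 * sum((t - linear_output(w, x)) ** 2 for x, t in zip(xs, ts))


def gradient(w: Vector, xs: Sequence[Vector], ts: Vector) -> List[float]:
    """Analytic gradient: dE/dw_i = -sum over d of (t_d - o_d) * x_id."""
    grad = [0.0] * len(w)
    for x, t in zip(xs, ts):
        err = t - linear_output(w, x)
        for i, xi in enumerate(x):
            grad[i] -= err * xi
    return grad


def numerical_gradient(w: Vector, xs: Sequence[Vector], ts: Vector,
                       h: float = 1e-6) -> List[float]:
    """Central finite differences of E, used here only as a check."""
    grad = []
    for i in range(len(w)):
        up, down = list(w), list(w)
        up[i] += h
        down[i] -= h
        grad.append((squared_error(up, xs, ts) - squared_error(down, xs, ts)) / (2 * h))
    return grad


W = [0.3, -0.2]
XS = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
TS = [1.0, 3.0, 5.0]
print(gradient(W, XS, TS))            # approximately [-8.7, -13.1]
print(numerical_gradient(W, XS, TS))
eta = 0.1
# One batch step: delta w_i = -eta * dE/dw_i = eta * sum_d (t_d - o_d) * x_id.
print([-eta * g for g in gradient(W, XS, TS)])
```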
26
Design of Primitive Units The algorithm using the Delta Rule
Algorithm for learning using the delta rule:
1. Assign random values to the weight vector.
2. Repeat until the stopping condition is met (the error is small):
   a. Initialize each ∆wi to zero.
   b. For each training example, compute o = W·X and update ∆wi = ∆wi + η (t - o) xi.
   c. Update each weight: wi = wi + ∆wi.
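A minimal Python sketch of this batch algorithm. Assumptions beyond the slide: initial weights are drawn from [-0.05, 0.05], “the error is small” means E < `tol`, a `max_epochs` cap guards the loop, and the toy data is generated exactly by t = 1 + 2x, so the weights should approach (1, 2).

```python
import random
from typing import List, Sequence


def linear_output(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def squared_error(w, xs, ts) -> float:
    return 0.5 * sum((t - linear_output(w, x)) ** 2 for x, t in zip(xs, ts))


def train_delta_batch(xs: List[List[float]], ts: List[float], eta: float = 0.05,
                      tol: float = 1e-14, max_epochs: int = 10_000,
                      seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    # 1. Assign small random values to the weight vector.
    w = [rng.uniform(-0.05, 0.05) for _ in range(len(xs[0]))]
    for _ in range(max_epochs):
        # Stopping condition: the error is small (tol is an assumption).
        if squared_error(w, xs, ts) < tol:
            break
        # a. Initialize each delta w_i to zero.
        delta = [0.0] * len(w)
        # b. Accumulate eta * (t - o) * x_i over all training examples.
        for x, t in zip(xs, ts):
            o = linear_output(w, x)
            for i, xi in enumerate(x):
                delta[i] += eta * (t - o) * xi
        # c. Update every weight once per pass over the data.
        w = [wi + di for wi, di in zip(w, delta)]
    return w


# Toy data generated by t = 1 + 2 * x (x0 = 1 prepended to each example).
XS = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
TS = [1.0, 3.0, 5.0, 7.0]
print(train_delta_batch(XS, TS))  # close to [1.0, 2.0]
```

The weights change only after a full pass over the data, which is what distinguishes this version from the incremental one.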
27
Historical Background
Frank Rosenblatt (1962): he showed how to make incremental changes to the strength of the synapses, and he invented the perceptron. Minsky and Papert (1969) criticized the perceptron: it could not solve the XOR problem, and, in addition, training time grows exponentially with the size of the input.
[Figure: picture of Marvin Minsky (2008), extracted from Wikipedia.]
28
Design of Primitive Units Difficulties with Gradient Descent
There are two main difficulties with the gradient descent method:
1. Convergence to a minimum may take a long time.
2. There is no guarantee we will find the global minimum.
29
Design of Primitive Units Stochastic Approximation
Instead of waiting until all examples have been observed before updating the weights, we update them after every example:
$$\Delta w_i = \eta\,(t - o)\,x_i$$
In this case we update the weights “incrementally”. Remarks:
-) When there are multiple local minima, stochastic gradient descent can sometimes avoid getting stuck in a local minimum.
-) Standard gradient descent needs more computation per weight update but can be used with a larger step size.
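A Python sketch of the incremental version, using the same hypothetical toy data (t = 1 + 2x) and initialization assumptions as a batch sketch would. The only change is that the weights are updated immediately after each example; a fixed number of epochs stands in for the stopping condition, which is an assumption.

```python
import random
from typing import List, Sequence


def linear_output(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_delta_incremental(xs: List[List[float]], ts: List[float],
                            eta: float = 0.05, epochs: int = 2000,
                            seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    w = [rng.uniform(-0.05, 0.05) for _ in range(len(xs[0]))]
    for _ in range(epochs):
        for x, t in zip(xs, ts):
            # Update right after each example: delta w_i = eta * (t - o) * x_i.
            o = linear_output(w, x)
            w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
    return w


XS = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
TS = [1.0, 3.0, 5.0, 7.0]
print(train_delta_incremental(XS, TS))  # close to [1.0, 2.0]
```

On this noise-free data every example agrees with the same weights, so the incremental updates settle there. With noisy data and a constant η, they would keep fluctuating around the minimum instead.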
30
Design of Primitive Units Difference between Perceptron and Delta Rule
The perceptron rule is based on the output of a step function, whereas the delta rule uses the linear combination of inputs directly. The perceptron rule is guaranteed to converge to a consistent hypothesis, assuming the data is linearly separable. The delta rule converges only in the limit (asymptotically, toward the minimum-error hypothesis), but it does not require the data to be linearly separable.
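The difference shows up even on a single example. This Python sketch (hypothetical function names, illustrative numbers) computes the weight change each rule makes for an example that the thresholded output already classifies correctly.

```python
from typing import List, Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def perceptron_rule_change(w, x, t, eta=0.1) -> List[float]:
    """Perceptron rule: o is the thresholded output sgn(W . X), sgn(0) = -1."""
    o = 1 if dot(w, x) > 0 else -1
    return [eta * (t - o) * xi for xi in x]


def delta_rule_change(w, x, t, eta=0.1) -> List[float]:
    """Incremental delta rule: o is the raw linear output W . X."""
    o = dot(w, x)
    return [eta * (t - o) * xi for xi in x]


w, x, t = [0.0, 0.5], [1.0, 1.0], 1
print(perceptron_rule_change(w, x, t))  # already classified correctly: no change
print(delta_rule_change(w, x, t))       # W . X = 0.5 < t, so the weights still move
```

The perceptron rule stops as soon as the sign is right, while the delta rule keeps reducing the squared error between t and W·X.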