Neural Networks and Machine Learning Applications
CSC 563
Prof. Mohamed Batouche
Computer Science Department, CCIS, King Saud University, Riyadh, Saudi Arabia
mbatouche@ccis.ksu.edu.sa
Artificial Complex Systems
Artificial Neural Networks
Perceptrons and Multi-Layer Perceptrons (MLP)
Artificial Neural Networks
The Perceptron
The Perceptron: the first model of a biological neuron.
[Diagram: inputs x1..x5, each weighted by w1..w5 plus a bias weight w0, feed a summation unit Σ whose thresholded result is the output y.]
Artificial Neuron: Perceptron
A perceptron is a step function applied to a linear combination of real-valued inputs: if the combination is above a threshold it outputs 1, otherwise it outputs -1.
[Diagram: inputs x1, x2, ..., xn with weights w1, w2, ..., wn, plus a fixed input x0 = 1 with weight w0, feed a summation unit Σ whose output is 1 or -1.]
Perceptron: Activation Rule
Activation rule (linear threshold, or step, unit):
o(x1, x2, ..., xn) = 1 if w0 + w1 x1 + w2 x2 + ... + wn xn > 0, and -1 otherwise.
To simplify, the function can be written as o(X) = sgn(W^T X), where sgn(y) = 1 if y > 0 and -1 otherwise.
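As a small illustration (not part of the original slides), the activation rule can be written directly in Matlab; the weight and input values below are arbitrary:
% Perceptron activation rule o = sgn(W'X); weights and inputs are arbitrary.
w = [0.3; -0.2; 0.5];      % [w0; w1; w2], w0 is the bias weight
x = [1; 0.7; -1.2];        % [x0; x1; x2], with fixed x0 = 1
if w' * x > 0
    o = 1;
else
    o = -1;
end
disp(o)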
What does a Perceptron do?
For a perceptron with two input variables x1 and x2, the equation W^T X = 0 defines the line w1 x1 + w2 x2 + w0 = 0, which separates positive from negative examples; the output is y = sgn(w1 x1 + w2 x2 + w0).
What does a Perceptron do?
For a perceptron with n input variables, the decision boundary is a hyperplane over the n-dimensional input space (for n = 3, the plane w1 x1 + w2 x2 + w3 x3 + w0 = 0). The perceptron classifies input patterns into two classes: it outputs 1 for instances lying on one side of the hyperplane and -1 for instances on the other side.
What can be represented using Perceptrons?
Representation theorem: perceptrons can only represent linearly separable functions. Examples: AND, OR, NOT.
Limits of the Perceptron
A perceptron can learn only examples that are "linearly separable": examples that can be perfectly separated by a hyperplane.
[Figure: two scatter plots of + and - points, one linearly separable, one non-linearly separable.]
Functions for Perceptrons
Perceptrons can learn many boolean functions: AND, OR, NAND, NOR, but not XOR.
AND can be implemented with x0 = 1, w0 = -0.8, and w1 = w2 = 0.5, as the sketch below verifies.
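As a quick check (an illustrative addition, not in the original slides), the AND weights from this slide can be verified over all four 0/1 input combinations:
% Verify the AND weights w0 = -0.8, w1 = w2 = 0.5 on all 0/1 inputs;
% the output is +1 only for input (1,1).
w0 = -0.8; w1 = 0.5; w2 = 0.5;
for x1 = [0 1]
    for x2 = [0 1]
        s = w0 + w1*x1 + w2*x2;
        o = 1; if s <= 0, o = -1; end
        fprintf('AND(%d,%d): sum = %+.1f -> output %d\n', x1, x2, s, o);
    end
end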
Learning Perceptrons
Learning is a process by which the free parameters of a neural network are adapted through stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place.
In the case of perceptrons, we use supervised learning.
Learning a perceptron means finding the right values for W that satisfy the input examples {(input_i, target_i)}.
The hypothesis space of a perceptron is the space of all weight vectors.
Learning Perceptrons
Principle of learning using the perceptron rule:
1. A set of training examples is given: {(x, t)}, where x is the input and t the target output (supervised learning).
2. Examples are presented to the network.
3. For each example, the network gives an output o.
4. If there is an error, the hyperplane is moved in order to correct the output error.
5. When all training examples are correctly classified, stop learning.
Learning Perceptrons
More formally, the algorithm for learning perceptrons is as follows:
1. Assign random values to the weight vector.
2. Apply the perceptron rule to every training example.
3. Are all training examples correctly classified? If yes, quit; if no, go back to step 2.
Perceptron Training Rule
For a new training example [X = (x1, x2, ..., xn), t], update each weight according to the rule:
w_i = w_i + Δw_i, where Δw_i = η (t - o) x_i
t: target output; o: output generated by the perceptron; η: a constant called the learning rate (e.g., 0.1).
Perceptron Training Rule
Comments about the perceptron training rule:
If the example is correctly classified, the term (t - o) equals zero, and no weight update is necessary.
If the perceptron outputs -1 and the real answer is 1, the weight is increased.
If the perceptron outputs 1 and the real answer is -1, the weight is decreased.
Provided the examples are linearly separable and a small value of η is used, the rule is proven to classify all training examples correctly after a finite number of updates. A minimal training-loop sketch follows.
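The training rule can be turned into a small from-scratch Matlab loop. This is a minimal sketch, not the toolbox code used later in these slides; the AND data, learning rate, and epoch limit are illustrative choices:
% Perceptron training rule on the AND function (inputs 0/1, targets +/-1).
X = [0 0; 0 1; 1 0; 1 1];          % training inputs, one example per row
T = [-1; -1; -1; 1];               % target outputs for AND
w = rand(1, 3) - 0.5;              % [w1 w2 w0], random initialization
eta = 0.1;                         % learning rate
for epoch = 1:100
    errors = 0;
    for i = 1:size(X,1)
        x = [X(i,:) 1];            % append the bias input x0 = 1
        o = sign(w * x');          % perceptron output
        if o == 0, o = -1; end     % treat sign(0) as -1
        if o ~= T(i)
            w = w + eta * (T(i) - o) * x;   % w_i <- w_i + eta*(t - o)*x_i
            errors = errors + 1;
        end
    end
    if errors == 0, break; end     % all examples correctly classified: stop
end
disp(w)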
Perceptron Training Rule
Consider the following example (two classes: Red and Green).
Perceptron Training Rule
Random initialization of the perceptron weights.
Perceptron Training Rule
Apply the perceptron training rule iteratively to the different examples; the separating line is adjusted after each misclassified example.
Perceptron Training Rule
All examples are correctly classified: stop learning.
Perceptron Training Rule
The straight line w1 x + w2 y + w0 = 0 separates the two classes.
Matlab Demo
Perceptron training rule demo
Learning the AND and OR functions
Try to learn XOR with a perceptron
Learning AND/OR Operations
P = [ 0 0 1 1; ...            % input patterns (each column is one example)
      0 1 0 1 ];
T = [ 0 1 1 1 ];              % desired outputs (here, the OR function)
net = newp([0 1; 0 1], 1);    % create a perceptron with two inputs in [0,1]
net.adaptParam.passes = 35;
net = adapt(net, P, T);       % adapt the weights on the training data
x = [1; 1];
y = sim(net, x);              % classify a new input
display(y);
Artificial Neural Networks
Multi-Layer Perceptron (MLP)
Solution for XOR: Add a hidden layer!
[Diagram: input nodes X1 and X2, a layer of internal (hidden) nodes, and an output node computing X1 XOR X2.]
Solution for XOR: Add a hidden layer!
The problem is: how do we learn multi-layer perceptrons? Solution: the backpropagation algorithm, described by Rumelhart and colleagues in 1986. A hand-wired XOR network is sketched below.
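Before turning to learning, it may help to see that a fixed two-layer network can indeed compute XOR. The following is a hand-wired sketch, assuming step units with 0/1 outputs; the particular weights (an OR unit, a NAND unit, and an AND output unit) are illustrative choices, not taken from the slides:
% Hand-wired 2-2-1 network computing XOR with 0/1 step units.
step = @(s) double(s > 0);
orW   = [ 1  1 -0.5];      % hidden unit 1: OR(x1, x2)
nandW = [-1 -1  1.5];      % hidden unit 2: NAND(x1, x2)
andW  = [ 1  1 -1.5];      % output unit: AND(h1, h2)
for x1 = [0 1]
    for x2 = [0 1]
        h1 = step(orW   * [x1; x2; 1]);
        h2 = step(nandW * [x1; x2; 1]);
        y  = step(andW  * [h1; h2; 1]);
        fprintf('XOR(%d,%d) = %d\n', x1, x2, y);
    end
end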
Multi-Layer Perceptron
In contrast to perceptrons, multilayer networks can learn not only multiple decision boundaries, but the boundaries may also be nonlinear.
[Diagram: a feed-forward network with input nodes, internal (hidden) nodes, and output nodes.]
Multi-Layer Perceptron Decision Boundaries
Single-layer: half plane bounded by a hyperplane.
Two-layer: convex open or closed regions.
Three-layer: arbitrary regions (complexity limited by the number of neurons).
[Figure: example decision regions separating classes A and B for each architecture.]
Example
[Figure: a decision region over inputs x1 and x2 formed by a multi-layer network.]
One Single Unit
To make nonlinear partitions of the space, we need to define each unit as a nonlinear function (unlike the perceptron). One solution is to use the sigmoid unit.
[Diagram: inputs x1, ..., xn with weights w1, ..., wn and a fixed input x0 = 1 with weight w0 feed a summation producing net; the unit's output is o = σ(net) = 1 / (1 + e^(-net)).]
Sigmoid or Logistic Function
o(x1, x2, ..., xn) = σ(W·X), where σ(W·X) = 1 / (1 + e^(-W·X)).
The function σ is called the sigmoid or logistic function. It is easy to differentiate and has the following property: dσ(y)/dy = σ(y) (1 - σ(y)).
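The derivative identity can be checked numerically with a short snippet (an illustrative addition; the test point is arbitrary):
% Check d(sigma)/dy = sigma(y)*(1 - sigma(y)) against a finite difference.
sig = @(y) 1 ./ (1 + exp(-y));
y = 0.3;                                          % arbitrary test point
analytic  = sig(y) * (1 - sig(y));
numerical = (sig(y + 1e-6) - sig(y - 1e-6)) / 2e-6;
fprintf('analytic = %.6f, numerical = %.6f\n', analytic, numerical);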
Learning Multi-Layer Perceptrons: the Backpropagation Algorithm
Goal: to learn the weights for all links in an interconnected multilayer network.
We begin by defining our measure of error:
E(W) = ½ Σ_d Σ_k (t_kd - o_kd)² = ½ Σ_examples (t - o)² = ½ Err²
where k ranges over the output nodes and d over the training examples.
The idea is to use gradient descent over the space of weights to find a global minimum (though there is no guarantee of reaching it).
Gradient Descent
[Figure: the error surface over weight space, with gradient descent steps moving toward a minimum.]
Minimizing Error Using Steepest Descent
The main idea: find the downhill direction and take a step in it.
downhill direction = -dE/dx
x ← x - η dE/dx, where η is the step size.
[Figure: the error curve E(x), its minimum, and one downhill step.]
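A one-dimensional numerical sketch of this idea (the error function, starting point, and step size below are illustrative choices, not from the slides):
% Steepest descent on E(x) = (x - 3)^2, whose minimum is at x = 3.
dEdx = @(x) 2 * (x - 3);          % derivative of E
x = 0;                            % starting point
eta = 0.1;                        % step size (learning rate)
for k = 1:50
    x = x - eta * dEdx(x);        % move in the downhill direction
end
fprintf('x after descent = %.4f (minimum at x = 3)\n', x);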
Reduction of Squared Error
Gradient descent reduces the squared error by computing the partial derivative of E with respect to each weight. For a single unit with output o = g(in), where in = Σ_j w_j x_j ("in" because it is the unit's weighted input) and Err = t - o:
∂E/∂w_j = Err · ∂Err/∂w_j            (chain rule for derivatives)
        = Err · ∂(t - g(in))/∂w_j     (expand the second Err to t - g(in))
        = -Err · g'(in) · x_j         (chain rule again, since ∂in/∂w_j = x_j)
Each weight is then updated by η times the negative of this gradient of the error in weight space: w_j ← w_j + η · Err · g'(in) · x_j. The fact that the weight moves in the correct direction (+/-) can be verified with examples. The gradient ∇E is a vector over the weights, and the learning rate η is typically set to a small value such as 0.1.
Backpropagation Algorithm
Create a network with n_in input nodes, n_hidden internal nodes, and n_out output nodes.
Initialize all weights to small random numbers in the range -0.5 to 0.5.
Until the error is small, do:
  For each example X, do:
    Propagate example X forward through the network.
    Propagate the errors backward through the network.
Backpropagation Algorithm
[Diagram: an input pattern X = (x1, ..., x5) is propagated forward to produce outputs Y = (y1, ..., y4); these are compared with the desired outputs D to obtain errors E = (e1, ..., e4), which are propagated backward.]
In the classification phase, only the forward propagation step is used to classify patterns.
The Backpropagation Algorithm for Three-Layer Networks with Sigmoid Units
Initialize all weights in the network to small random numbers.
Until the weights converge (this may take thousands of iterations), do:
  For each training example:
    Compute the network output vector o.
    For each output unit i, compute its error term (error backpropagation): δ_i = o_i (1 - o_i) (t_i - o_i), and update each weight into i: w_ij ← w_ij + η δ_i x_ij.
    For each hidden unit j, compute its error term: δ_j = o_j (1 - o_j) Σ_i w_ij δ_i, and update the weight from each input k to hidden unit j (error gradient): w_jk ← w_jk + η δ_j x_k.
A from-scratch Matlab sketch of this procedure follows.
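This is a minimal sketch for one hidden layer of sigmoid units, trained here on XOR; the network size, learning rate, and epoch count are illustrative choices rather than the toolbox code used later in the slides:
% Backpropagation for a 2-2-1 sigmoid network, per-example updates.
X = [0 0; 0 1; 1 0; 1 1];          % training inputs, one example per row
T = [0; 1; 1; 0];                  % target outputs (XOR)
nIn = 2; nHid = 2; nOut = 1;
eta = 0.5;                         % learning rate
W1 = rand(nHid, nIn + 1) - 0.5;    % hidden-layer weights (last column = bias)
W2 = rand(nOut, nHid + 1) - 0.5;   % output-layer weights (last column = bias)
sig = @(z) 1 ./ (1 + exp(-z));     % sigmoid activation
for epoch = 1:10000
    for d = 1:size(X,1)
        x = [X(d,:)'; 1];                          % input plus bias term
        h = sig(W1 * x);                           % hidden activations
        o = sig(W2 * [h; 1]);                      % network output
        deltaO = o .* (1 - o) .* (T(d,:)' - o);    % output error term
        deltaH = h .* (1 - h) .* (W2(:,1:nHid)' * deltaO);   % hidden error term
        W2 = W2 + eta * deltaO * [h; 1]';          % gradient-descent updates
        W1 = W1 + eta * deltaH * x';
    end
end
% Learned outputs for the four patterns (convergence to XOR is not guaranteed).
disp(round(sig(W2 * [sig(W1 * [X ones(4,1)]'); ones(1,4)])))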
The Problem of Overfitting
Approximation of the function y = f(x) with 2, 5, and 40 neurons in the hidden layer.
[Figure: the same data fitted by the three networks; the largest network overfits the training points.]
Overfitting is not detectable in the learning phase, so use cross-validation. (A hold-out validation sketch follows.)
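A simple hold-out validation sketch, reusing the newff/train/sim calls that appear elsewhere in these slides; the 70/31 data split, the epoch count, and the candidate hidden-layer sizes (2, 5, 40) are illustrative assumptions:
% Compare hidden-layer sizes on a validation set held out from training.
P = 0:0.1:10;  T = sin(P)*10.0;                 % the sine data also used in the demo below
idx = randperm(numel(P));
trainIdx = idx(1:70);  valIdx = idx(71:end);    % simple hold-out split
for nh = [2 5 40]
    net = newff([0.0 10.0], [nh 1], {'tansig' 'purelin'});
    net.trainParam.epochs = 500;
    net = train(net, P(trainIdx), T(trainIdx));
    Yval = sim(net, P(valIdx));
    fprintf('hidden = %2d, validation MSE = %.3f\n', nh, mean((Yval - T(valIdx)).^2));
end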
Application of ANNs
The general scheme when using ANNs is as follows: a stimulus is encoded into an input pattern (a binary vector), the network maps the input pattern to an output pattern, and the output pattern is decoded into a response.
[Figure: stimulus → encoding → input pattern → network → output pattern → decoding → response.]
Application: Digit Recognition
Matlab Demo
Learning the XOR function
Function approximation
Digit recognition
Learning the XOR Operation: Matlab Code
P = [ 0 0 1 1; ...                                    % input patterns (one per column)
      0 1 0 1 ];
T = [ 0 1 1 0 ];                                      % desired outputs (XOR)
net = newff([0 1; 0 1],[6 1],{'tansig' 'tansig'});    % 2-6-1 feed-forward network
net.trainParam.epochs = 4850;
net = train(net, P, T);
X = [0; 1];                                           % a new input (column vector)
Y = sim(net, X);
display(Y);
Function Approximation: Learning the Sine Function
P = 0:0.1:10;                                         % input samples
T = sin(P)*10.0;                                      % target values
net = newff([0.0 10.0],[8 1],{'tansig' 'purelin'});   % 1-8-1 feed-forward network
plot(P,T); pause;                                     % plot the target function
Y = sim(net,P);
plot(P,T,P,Y,'o'); pause;                             % output of the untrained network
net.trainParam.epochs = 4850;
net = train(net,P,T);                                 % train the network
Y = sim(net,P);
plot(P,T,P,Y,'o');                                    % output after training
Digit Recognition:
% Each column of P is one digit pattern (15 binary inputs);
% each column of T is the one-hot target for the corresponding digit.
P = [ 1 0 1 1 1 1 1 1 1 1 ;
      1 1 1 1 0 1 1 1 1 1 ;
      1 0 1 1 1 1 1 1 1 1 ;
      1 0 0 0 1 1 1 0 1 1 ;
      0 1 0 0 0 0 0 0 0 0 ;
      1 0 1 1 1 0 0 1 1 1 ;
      1 0 1 1 1 1 1 0 1 1 ;
      0 1 1 1 1 1 1 0 1 1 ;
      1 0 1 1 1 1 1 1 1 1 ;
      1 0 1 0 0 0 1 0 1 0 ;
      0 1 0 0 0 0 0 0 0 0 ;
      1 0 0 1 1 1 1 1 1 1 ;
      1 0 1 1 0 1 1 0 1 1 ;
      1 1 1 1 0 1 1 0 1 1 ;
      1 0 1 1 1 1 1 1 1 1 ];
T = [ 1 0 0 0 0 0 0 0 0 0 ;
      0 1 0 0 0 0 0 0 0 0 ;
      0 0 1 0 0 0 0 0 0 0 ;
      0 0 0 1 0 0 0 0 0 0 ;
      0 0 0 0 1 0 0 0 0 0 ;
      0 0 0 0 0 1 0 0 0 0 ;
      0 0 0 0 0 0 1 0 0 0 ;
      0 0 0 0 0 0 0 1 0 0 ;
      0 0 0 0 0 0 0 0 1 0 ;
      0 0 0 0 0 0 0 0 0 1 ];
Digit Recognition:
net = newff([0 1; 0 1; 0 1; 0 1; 0 1; 0 1; 0 1; 0 1; 0 1; 0 1; 0 1; 0 1; 0 1; 0 1; 0 1], ...
            [20 10], {'tansig' 'tansig'});    % 15 inputs, 20 hidden units, 10 outputs
net.trainParam.epochs = 4850;
net = train(net, P, T);