
Neural Networks and Machine Learning Applications CSC 563 Prof. Mohamed Batouche Computer Science Department CCIS – King Saud University Riyadh, Saudi Arabia

Artificial Complex Systems Artificial Neural Networks Perceptrons and Multi Layer Perceptrons (MLP)

Artificial Neural Networks Perceptron

4 The Perceptron: the first model of a biological neuron. (Diagram: inputs x1, …, x5 with weights w1, …, w5 and a bias weight w0 feeding a summation unit Σ that produces the output y.)

5 Artificial Neuron: Perceptron It is a step function based on a linear combination of real-valued inputs: if the combination is above a threshold it outputs 1, otherwise it outputs –1. (Diagram: inputs x0 = 1, x1, x2, …, xn with weights w0, w1, w2, …, wn feeding Σ; the output is 1 or –1.)

6 Perceptron: activation rule
O(x1, x2, …, xn) = 1 if w0 + w1·x1 + w2·x2 + … + wn·xn > 0, and –1 otherwise.
To simplify, we can represent the function as O(X) = sgn(W^T·X), where sgn(y) = 1 if y > 0 and –1 otherwise.
Activation rule: linear threshold (step) unit.
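To make the activation rule concrete, here is a minimal MATLAB sketch (the weight and input values are illustrative choices of my own, not taken from the slides):
% Activation rule O(X) = sgn(W'*X); X(1) = 1 is the constant input for the bias weight w0.
W = [-0.5; 1.0; -2.0; 0.3];     % w0, w1, w2, w3
X = [1; 0.8; 0.1; 0.5];         % x0 = 1, x1, x2, x3
if W' * X > 0
    o = 1;                      % weighted sum above the threshold
else
    o = -1;                     % weighted sum at or below the threshold
end
disp(o)                         % prints 1 for these values (W'*X = 0.25)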

7 What does a Perceptron do? For a perceptron with 2 input variables, x1 and x2, the equation W^T·X = 0 determines a line w1·x1 + w2·x2 + w0 = 0 separating positive from negative examples. (Diagram: inputs x1, x2 with weights w1, w2 and bias w0 feeding Σ; the output is y = sgn(w1·x1 + w2·x2 + w0).)

8 What does a Perceptron do? For a perceptron with n input variables, it draws a hyperplane as the decision boundary over the (n-dimensional) input space and classifies input patterns into two classes: it outputs 1 for instances lying on one side of the hyperplane and –1 for instances on the other side. (Diagram: the plane w1·x1 + w2·x2 + w3·x3 + w0 = 0 in the space of x1, x2, x3.)

9 What can be represented using Perceptrons? Representation theorem: perceptrons can only represent linearly separable functions. Examples: AND, OR, NOT. (Diagrams: linear decision boundaries for AND and OR.)

10 Limits of the Perceptron A perceptron can learn only examples that are called "linearly separable": examples that can be perfectly separated by a hyperplane. (Diagrams: a linearly separable data set and a non-linearly separable one.)

11 Functions for Perceptron Perceptrons can learn many boolean functions (AND, OR, NAND, NOR) but not XOR. Example, AND: inputs x0 = 1, x1, x2 with weights w0 = –0.8, w1 = 0.5, w2 = 0.5 feeding Σ.
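A quick MATLAB check that these weights do implement AND over all four input combinations (my own sketch, not from the slides):
% Verify that w0 = -0.8, w1 = w2 = 0.5 implement AND with a threshold unit.
W = [-0.8; 0.5; 0.5];
P = [0 0 1 1;               % all four combinations of x1 and x2
     0 1 0 1];
for k = 1:4
    X = [1; P(:,k)];        % prepend the constant bias input x0 = 1
    o = sign(W' * X);       % W'*X is never exactly 0 for these weights, so sign() is safe here
    fprintf('AND(%d,%d) -> %d\n', P(1,k), P(2,k), o);
end
% Output: -1, -1, -1, 1 -- only the input (1,1) lands on the positive side.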

12 Learning Perceptrons Learning is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place. In the case of perceptrons, we use supervised learning. Learning a perceptron means finding the right values for W that satisfy the input examples {(input_i, target_i)}. The hypothesis space of a perceptron is the space of all weight vectors.

13 Learning Perceptrons Principle of learning using the perceptron rule:
1. A set of training examples is given: {(x, t)} where x is the input and t the target output [supervised learning].
2. Examples are presented to the network.
3. For each example, the network gives an output o.
4. If there is an error, the hyperplane is moved in order to correct the output error.
5. When all training examples are correctly classified, stop learning.

14 Learning Perceptrons More formally, the algorithm for learning perceptrons is as follows:
1. Assign random values to the weight vector.
2. Apply the perceptron training rule to every training example.
3. Are all training examples correctly classified? Yes: quit. No: go back to step 2.

15 Perceptron Training Rule The perceptron training rule: for a new training example [X = (x1, x2, …, xn), t], update each weight according to this rule: wi = wi + Δwi, where Δwi = η·(t – o)·xi. Here t is the target output, o is the output generated by the perceptron, and η is a constant called the learning rate (e.g., 0.1).

16 Perceptron Training Rule Comments about the perceptron training rule: If the example is correctly classified, the term (t – o) equals zero, and no update of the weight is necessary. If the perceptron outputs –1 and the real answer is 1, the weight is increased. If the perceptron outputs 1 and the real answer is –1, the weight is decreased. Provided the examples are linearly separable and a small value for η is used, the rule is proven to classify all training examples correctly.
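Putting slides 14-16 together, a minimal MATLAB sketch of the whole training loop (the toy data set, variable names, and learning rate are my own choices, not from the slides):
% Perceptron training rule on a small linearly separable data set (the AND function).
P   = [0 0 1 1; 0 1 0 1];       % inputs, one column per example
T   = [-1 -1 -1 1];             % targets in {-1, 1}
W   = rand(3,1) - 0.5;          % random initial weights (w0, w1, w2)
eta = 0.1;                      % learning rate
done = false;
while ~done
    done = true;
    for k = 1:size(P,2)
        X = [1; P(:,k)];        % bias input x0 = 1
        if W' * X > 0
            o = 1;
        else
            o = -1;
        end
        if o ~= T(k)            % misclassified: move the hyperplane
            W = W + eta * (T(k) - o) * X;
            done = false;
        end
    end
end
disp(W')                        % a weight vector that separates the two classes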

17 Perceptron Training Rule Consider the following example: (two classes: Red and Green)

18 Perceptron Training Rule Random Initialization of perceptron weights …

19 Perceptron Training Rule Apply the perceptron training rule iteratively to the different examples:

20 Perceptron Training Rule Apply the perceptron training rule iteratively to the different examples:

21 Perceptron Training Rule Apply the perceptron training rule iteratively to the different examples:

22 Perceptron Training Rule Apply the perceptron training rule iteratively to the different examples:

23 Perceptron Training Rule All examples are correctly classified … stop Learning

24 Perceptron Training Rule The straight line w1·x + w2·y + w0 = 0 separates the two classes.

25 Matlab Demo Perceptron training rule demo: learning the AND and OR functions. Try to learn XOR with a perceptron.

26 Learning AND/OR operations
P = [0 0 1 1; ...      % Input patterns (the exact values were lost in the transcript;
     0 1 0 1];         %  these are the four combinations of two binary inputs)
T = [0 0 0 1];         % Desired outputs (here AND; use [0 1 1 1] for OR)
net = newp([0 1;0 1],1);
net.adaptParam.passes = 35;
net = adapt(net,P,T);
x = [1; 1];
y = sim(net,x);
display(y);
(Diagram: a perceptron with inputs x1, x2, weights w1, w2, bias weight w0 and output y.)

Artificial Neural Networks MultiLayer Perceptron (MLP)

28 Solution for XOR: add a hidden layer! (Diagram: a network with input nodes x1, x2, internal nodes, and an output node computing x1 XOR x2.)

29 Solution for XOR: add a hidden layer! The problem is: how to learn Multi-Layer Perceptrons? Solution: the Backpropagation Algorithm, invented by Rumelhart and colleagues in 1986. (Diagram: input nodes x1, x2, internal nodes, and an output node computing x1 XOR x2.)
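Before looking at backpropagation, it is worth checking that a hidden layer really is enough for XOR. Below is a minimal MATLAB sketch using the construction XOR(x1,x2) = AND(OR(x1,x2), NAND(x1,x2)); the particular weights are my own choices (the AND weights match slide 11), not values given on this slide:
% XOR with one hidden layer of threshold units: h1 = OR(x1,x2), h2 = NAND(x1,x2), y = AND(h1,h2).
step   = @(a) double(a > 0);    % 0/1 threshold unit, for readability
W_or   = [-0.3  0.5  0.5];      % w0, w1, w2 for OR
W_nand = [ 0.8 -0.5 -0.5];      % w0, w1, w2 for NAND
W_and  = [-0.8  0.5  0.5];      % w0, w1, w2 for AND
for x1 = 0:1
    for x2 = 0:1
        h1 = step(W_or   * [1; x1; x2]);
        h2 = step(W_nand * [1; x1; x2]);
        y  = step(W_and  * [1; h1; h2]);
        fprintf('XOR(%d,%d) = %d\n', x1, x2, y);
    end
end
% Output: 0, 1, 1, 0 -- the hidden units carve the input space so the output unit can separate it.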

30 MultiLayer Perceptron In contrast to perceptrons, multilayer networks can learn not only multiple decision boundaries, but the boundaries may be nonlinear. (Diagram: a network with input nodes, internal nodes, and output nodes.)

31 MultiLayer Perceptron Decision Boundaries — Single-layer: a half plane bounded by a hyperplane. Two-layer: convex open or closed regions. Three-layer: arbitrary regions (complexity limited by the number of neurons). (Diagram: example separations of classes A and B for each architecture.)

32 Example (Figure over inputs x1 and x2.)

33 One single unit To make nonlinear partitions of the space we need to define each unit as a nonlinear function (unlike the perceptron). One solution is to use the sigmoid unit. (Diagram: inputs x0 = 1, x1, x2, …, xn with weights w0, w1, w2, …, wn feeding Σ to produce net; the output is O = σ(net) = 1 / (1 + e^(–net)).)

34 Sigmoid or logistic function O(x1, x2, …, xn) = σ(W·X), where σ(W·X) = 1 / (1 + e^(–W·X)). The function σ is called the sigmoid or logistic function. It is easy to differentiate and has the following property: dσ(y)/dy = σ(y)·(1 – σ(y)).
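A quick numerical check of this derivative identity in MATLAB (my own sketch, not from the slides):
% Finite-difference check of d sigma(y)/dy = sigma(y) * (1 - sigma(y)).
sigma    = @(y) 1 ./ (1 + exp(-y));
y        = -3:0.5:3;
analytic = sigma(y) .* (1 - sigma(y));
h        = 1e-6;
numeric  = (sigma(y + h) - sigma(y - h)) / (2*h);   % central difference approximation
disp(max(abs(analytic - numeric)))                  % on the order of 1e-10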

35 Learning MultiLayer Perceptron — BackPropagation Algorithm. Goal: to learn the weights for all links in an interconnected multilayer network. We begin by defining our measure of error: E(W) = ½ Σ_d Σ_k (t_kd – o_kd)², where k ranges over the output nodes and d over the training examples; for a single output this is ½ Σ_examples (t – o)², i.e. ½ Err² per example. The idea is to use gradient descent over the space of weights to find a global minimum (no guarantee).

36 Gradient Descent

37 Minimizing Error Using Steepest Descent The main idea: find the way downhill and take a step. The downhill direction is –dE/dx, η is the step size, and the update is x ← x – η·dE/dx. (Figure: an error curve E(x) with its minimum and the downhill direction marked.)
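A one-dimensional MATLAB sketch of this update rule (the quadratic error function, starting point, and step size are my own choices for illustration):
% Steepest descent on E(x) = (x - 2)^2, whose minimum is at x = 2.
dEdx = @(x) 2*(x - 2);          % derivative of the error
x    = -1;                      % starting point
eta  = 0.1;                     % step size
for k = 1:50
    x = x - eta * dEdx(x);      % x <- x - eta * dE/dx
end
disp(x)                         % close to 2 after 50 steps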

38 Reduction of Squared Error Gradient descent reduces the squared error by calculating the partial derivative of E with respect to each weight (the gradient of E is a vector with one component per weight). With Err = t – g(in), where in = Σ_j wj·xj is the weighted input to the unit, the chain rule for derivatives gives ∂E/∂wj = Err · ∂Err/∂wj = Err · ∂(t – g(in))/∂wj = –Err · g′(in) · xj. The weight is updated by η times this gradient of error in weight space: wj ← wj + η · Err · g′(in) · xj. The fact that the weight is updated in the correct direction (+/–) can be verified with examples. The learning rate η is typically set to a small value such as 0.1.
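As a sketch of what this gradient looks like in code, here is one update step for a single sigmoid unit in MATLAB (the data and variable names are my own, for illustration):
% One gradient step for a single sigmoid unit on one training example.
sigma = @(y) 1 ./ (1 + exp(-y));
X    = [1; 0.5; -1.2];          % input with bias component x0 = 1
t    = 1;                       % target output
W    = [0.1; -0.3; 0.2];        % current weights
eta  = 0.1;                     % learning rate
in   = W' * X;                  % weighted input "in"
o    = sigma(in);               % unit output g(in)
Err  = t - o;
grad = Err * o * (1 - o) * X;   % Err * g'(in) * x_j, using g'(in) = o*(1 - o)
W    = W + eta * grad;          % move the weights downhill in error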

39 BackPropagation Algorithm
Create a network with n_in input nodes, n_hidden internal nodes, and n_out output nodes.
Initialize all weights to small random numbers in the range –0.5 to 0.5.
Until the error is small do:
  For each example X do:
    Propagate example X forward through the network.
    Propagate errors backward through the network.

40 BackPropagation Algorithm (Diagram: the input X is propagated forward through the network to produce outputs y1, …, y4; the errors e1, …, e4 between the outputs Y and the desired outputs D are propagated backward.) In the classification phase, only the forward propagation step is used to classify patterns.

41 The Backpropagation Algorithm for Three-Layer Networks with Sigmoid Units
Initialize all weights in the network to small random numbers.
Until the weights converge (may take thousands of iterations) do:
  For each training example:
    Compute the network output vector o.
    For each output unit i, compute its error gradient: δi = oi·(1 – oi)·(ti – oi).
    For each hidden unit j, backpropagate the error: δj = oj·(1 – oj)·Σi wji·δi.
    Update each network weight from each input k to unit j: wkj = wkj + η·δj·xkj.
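A compact MATLAB sketch of these update rules, trained on the XOR data (the network size, learning rate, and number of epochs are my own choices, not from the slides; as slide 35 notes, gradient descent gives no guarantee of reaching the global minimum):
% Backpropagation for a 2-3-1 network of sigmoid units on XOR.
sigma = @(y) 1 ./ (1 + exp(-y));
P  = [0 0 1 1; 0 1 0 1];  T = [0 1 1 0];             % XOR patterns and targets
W1 = rand(3,3) - 0.5;                                % hidden weights: 3 units x (2 inputs + bias)
W2 = rand(1,4) - 0.5;                                % output weights: 3 hidden units + bias
eta = 0.5;
for epoch = 1:20000
    for k = 1:4
        x   = [1; P(:,k)];                           % forward pass
        hid = sigma(W1 * x);
        o   = sigma(W2 * [1; hid]);
        d_o = o * (1 - o) * (T(k) - o);              % output error term (delta_i)
        d_h = hid .* (1 - hid) .* (W2(2:end)' * d_o);% hidden error terms (delta_j)
        W2  = W2 + eta * d_o * [1; hid]';            % w <- w + eta * delta * x
        W1  = W1 + eta * d_h * x';
    end
end
for k = 1:4                                          % outputs close to 0 1 1 0 after training
    fprintf('%d XOR %d -> %.2f\n', P(1,k), P(2,k), sigma(W2 * [1; sigma(W1 * [1; P(:,k)])]));
end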

42 The problem of overfitting Approximation of a function y = f(x) with 2, 5, and 40 neurons in the hidden layer. (Figure: the three fits plotted over x and y.) Overfitting is not detectable in the learning phase, so use cross-validation.
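One simple way to follow this advice in MATLAB is a hold-out split: train on part of the data and measure the error on the rest (a sketch under my own assumptions; the toolbox calls mirror those used in the demos below):
% Hold-out validation: an over-sized network fits the training set much better than the held-out set.
x = 0:0.05:10;  y = sin(x)*10.0 + randn(size(x));    % noisy samples of a smooth function
idx  = randperm(numel(x));
itr  = idx(1:150);  ival = idx(151:end);             % training / validation split
net  = newff([0 10],[40 1],{'tansig' 'purelin'});    % deliberately large hidden layer
net.trainParam.epochs = 2000;
net  = train(net, x(itr), y(itr));
err_train = mean((sim(net, x(itr)) - y(itr)).^2);
err_val   = mean((sim(net, x(ival)) - y(ival)).^2);
% A validation error well above the training error is the signature of overfitting.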

43 Application of ANNs The general scheme when using ANNs is as follows: (Diagram: a stimulus is encoded into an input pattern, the network maps it to an output pattern, and the output pattern is decoded into the response.)

44 Application: Digit Recognition

45 Matlab Demo Learning XOR function Function approximation Digit Recognition

46 Learning XOR Operation: Matlab Code
P = [0 0 1 1; 0 1 0 1];    % input patterns (values lost in the transcript; this is the XOR truth table)
T = [0 1 1 0];             % desired outputs
net = newff([0 1;0 1],[6 1],{'tansig' 'tansig'});
net.trainParam.epochs = 4850;
net = train(net,P,T);
X = [0; 1];                % one test pattern (column vector, one row per input)
Y = sim(net,X);
display(Y);

47 Function Approximation: Learning the Sine Function
P = 0:0.1:10;
T = sin(P)*10.0;
net = newff([0 10],[8 1],{'tansig' 'purelin'});   % input range lost in the transcript; [0 10] matches P
plot(P,T); pause;
Y = sim(net,P);                                   % output of the untrained network
plot(P,T,P,Y,'o'); pause;
net.trainParam.epochs = 4850;
net = train(net,P,T);
Y = sim(net,P);                                   % output after training
plot(P,T,P,Y,'o');

48 Digit Recognition: Matlab Code (data)
P = [ ... ];   % 15-row matrix of input patterns, one column per digit image (the pixel values were not preserved in the transcript)
T = [ ... ];   % 10-row matrix of desired outputs, one column per digit (values not preserved in the transcript)

49 Digit Recognition: Matlab Code (network)
net = newff([0 1;0 1;0 1;0 1;0 1;0 1;0 1;0 1;0 1;0 1;0 1;0 1;0 1;0 1;0 1], ...  % 15 binary inputs
            [20 10],{'tansig' 'tansig'});
net.trainParam.epochs = 4850;
net = train(net,P,T);